[
  {
    "path": "README.md",
    "content": "# Command Line Text Processing\n\nLearn about various commands available for common and exotic text processing needs. Examples have been tested on GNU/Linux - there'd be syntax/feature variations with other distributions, consult their respective `man` pages for details.\n\n---\n\n:warning: :warning: I'm no longer actively working on this repo. Instead, I've converted existing chapters into ebooks (see [ebook section](#ebooks) below for links), available under the same license. These ebooks are better formatted, updated for newer versions of the software, includes exercises, solutions, etc. Since all the chapters have been converted, I'm archiving this repo.\n\n---\n\n<br>\n\n## Ebooks\n\nIndividual online ebooks with better formatting, explanations, exercises, solutions, etc:\n\n* [CLI text processing with GNU grep and ripgrep](https://learnbyexample.github.io/learn_gnugrep_ripgrep/)\n* [CLI text processing with GNU sed](https://learnbyexample.github.io/learn_gnused/)\n* [CLI text processing with GNU awk](https://learnbyexample.github.io/learn_gnuawk/)\n* [Ruby One-Liners Guide](https://learnbyexample.github.io/learn_ruby_oneliners/)\n* [Perl One-Liners Guide](https://learnbyexample.github.io/learn_perl_oneliners/)\n* [CLI text processing with GNU Coreutils](https://learnbyexample.github.io/cli_text_processing_coreutils/)\n* [Linux Command Line Computing](https://learnbyexample.github.io/cli-computing/)\n\nSee https://learnbyexample.github.io/books/ for links to PDF/EPUB versions and other ebooks.\n\n<br>\n\n## Chapters\n\nAs mentioned earlier, I'm no longer actively working on these chapters:\n\n* [Cat, Less, Tail and Head](./tail_less_cat_head.md)\n    * cat, less, tail, head, Text Editors\n* [GNU grep](./gnu_grep.md)\n* [GNU sed](./gnu_sed.md)\n* [GNU awk](./gnu_awk.md)\n* [Perl the swiss knife](./perl_the_swiss_knife.md)\n* [Ruby one liners](./ruby_one_liners.md)\n* [Sorting stuff](./sorting_stuff.md)\n    * sort, uniq, comm, shuf\n* 
[Restructure text](./restructure_text.md)\n    * paste, column, pr, fold\n* [Whats the difference](./whats_the_difference.md)\n    * cmp, diff\n* [Wheres my file](./wheres_my_file.md)\n* [File attributes](./file_attributes.md)\n    * wc, du, df, touch, file\n* [Miscellaneous](./miscellaneous.md)\n    * cut, tr, basename, dirname, xargs, seq\n\n<br>\n\n## Webinar recordings\n\nI recorded a couple of videos based on content in the chapters; I'm not sure if I'll do more:\n\n* [Using the sort command](https://www.youtube.com/watch?v=qLfAwwb5vGs)\n* [Using uniq and comm](https://www.youtube.com/watch?v=uAb2kxA2TyQ)\n\nSee also my short videos on [Linux command line tips](https://www.youtube.com/watch?v=p0KCLusMd5Q&list=PLTv2U3HnAL4PNTmRqZBSUgKaiHbRL2zeY)\n\n<br>\n\n## Exercises\n\nCheck out the [exercises](./exercises) directory to solve practice questions on `grep`, right from the command line itself.\n\nSee also my [TUI-apps](https://github.com/learnbyexample/TUI-apps) repo for interactive CLI text processing exercises.\n\n<br>\n\n## Contributing\n\n* Please [open an issue](https://github.com/learnbyexample/Command-line-text-processing/issues) for typos or bugs\n    * As this repo is no longer actively worked upon, **please do not submit pull requests**\n* Share the repo with friends/colleagues, on social media, etc. to help reach other learners\n* In case you need to reach me, mail me at `echo 'yrneaolrknzcyr.arg@tznvy.pbz' | tr 'a-z' 'n-za-m'` or send a DM via [twitter](https://twitter.com/learn_byexample)\n\n<br>\n\n## Acknowledgements\n\n* [unix.stackexchange](https://unix.stackexchange.com/) and [stackoverflow](https://stackoverflow.com/) - for getting answers to pertinent questions as well as sharpening skills by understanding and answering questions\n* Forums like [Linux users](https://www.linkedin.com/groups/65688), [/r/commandline/](https://www.reddit.com/r/commandline/), [/r/linux/](https://www.reddit.com/r/linux/), [/r/ruby/](https://www.reddit.com/r/ruby/), 
[news.ycombinator](https://news.ycombinator.com/news), [devup](http://devup.in/) and others for valuable feedback (especially spotting mistakes) and encouragement\n* See [wikipedia entry 'Roses Are Red'](https://en.wikipedia.org/wiki/Roses_Are_Red) for `poem.txt` used as sample text input file\n\n<br>\n\n## License\n\nThis work is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-nc-sa/4.0/)\n"
  },
  {
    "path": "exercises/GNU_grep/.ref_solutions/ex01_basic_match.txt",
    "content": "1) Match lines containing the string: day\nSolution: grep 'day' sample.txt\n\n2) Match lines containing the string: it\nSolution: grep 'it' sample.txt\n\n3) Match lines containing the string: do you\nSolution: grep 'do you' sample.txt\n"
  },
  {
    "path": "exercises/GNU_grep/.ref_solutions/ex02_basic_options.txt",
    "content": "1) Match lines containing the string irrespective of lower/upper case: no\nSolution: grep -i 'no' sample.txt\n\n2) Match lines not containing the string: o\nSolution: grep -v 'o' sample.txt\n\n3) Match lines with line numbers containing the string: it\nSolution: grep -n 'it' sample.txt\n\n4) Output only number of matching lines containing the string: a\nSolution: grep -c 'a' sample.txt\n\n5) Match first two lines containing the string: do\nSolution: grep -m2 'do' sample.txt\n"
  },
  {
    "path": "exercises/GNU_grep/.ref_solutions/ex03_multiple_string_match.txt",
    "content": "1) Match lines containing either of these three strings\n        String1: Not\n        String2: he\n        String3: sun\nSolution: grep -e 'Not' -e 'he' -e 'sun' sample.txt\n\n2) Match lines containing both these strings\n        String1: He\n        String2: or\nSolution: grep 'He' sample.txt | grep 'or'\n\n3) Match lines containing either of these two strings\n        String1: a\n        String2: i\n   and contains this as well\n        String3: do\nSolution: grep -e 'a' -e 'i' sample.txt | grep 'do'\n\n4) Match lines containing the string\n        String1: it\n   but not these strings\n        String2: No\n        String3: no\nSolution: grep 'it' sample.txt | grep -vi 'no'\n"
  },
  {
    "path": "exercises/GNU_grep/.ref_solutions/ex04_filenames.txt",
    "content": "Note: All files present in the directory should be given as file inputs to grep\n\n1) Show only filenames containing the string: are\nSolution: grep -l 'are' *\n\n2) Show only filenames NOT containing the string: two\nSolution: grep -L 'two' *\n\n3) Match all lines containing the string: are\nSolution: grep 'are' *\n\n4) Match maximum of two matching lines along with filenames containing the character: a\nSolution: grep -m2 'a' *\n\n5) Match all lines without prefixing filename containing the string: to\nSolution: grep -h 'to' *\n"
  },
  {
    "path": "exercises/GNU_grep/.ref_solutions/ex05_word_line_matching.txt",
    "content": "Note: All files present in the directory should be given as file inputs to grep\n\n1) Match lines containing whole word: do\nSolution: grep -w 'do' *\n\n2) Match whole lines containing the string: Hello World\nSolution: grep -x 'Hello World' *\n\n3) Match lines containing these whole words:\n        Word1: He\n        Word2: far\nSolution: grep -w -e 'far' -e 'He' *\n\n4) Match lines containing the whole word: you\n    and NOT containing the case insensitive string: How\nSolution: grep -w 'you' * | grep -vi 'how'\n"
  },
  {
    "path": "exercises/GNU_grep/.ref_solutions/ex06_ABC_context_matching.txt",
    "content": "1) Get lines and 3 following it containing the string: you\nSolution: grep -A3 'you' sample.txt\n\n2) Get lines and 2 preceding it containing the string: is\nSolution: grep -B2 'is' sample.txt\n\n3) Get lines and 1 following/preceding containing the string: Not\nSolution: grep -C1 'Not' sample.txt\n\n4) Get lines and 1 following and 4 preceding containing the string: Not\nSolution: grep -A1 -B4 'Not' sample.txt\n\n5) Get lines and 1 preceding it containing the string: you\n        there should be no separator between the matches\nSolution: grep --no-group-separator -B1 'you' sample.txt\n\n6) Get lines and 1 preceding it containing the string: you\n        the separator between the matches should be: #####\nSolution: grep --group-separator='#####' -B1 'you' sample.txt\n"
  },
  {
    "path": "exercises/GNU_grep/.ref_solutions/ex07_recursive_search.txt",
    "content": "Note: Every file in this directory and sub-directories is input for grep, unless otherwise specified\n\n1) Match all lines containing the string: you\nSolution: grep -r 'you'\n\n2) Show only filenames matching the string: Hello\n    filenames should only end with .txt \nSolution: grep -rl --include='*.txt' 'Hello'\n\n3) Show only filenames matching the string: Hello\n    filenames should NOT end with .txt \nSolution: grep -rl --exclude='*.txt' 'Hello'\n\n4) Show only filenames matching the string: are\n    should not include the directory: progs\nSolution: grep -rl --exclude-dir='progs' 'are'\n\n5) Show only filenames matching the string: are\n    should NOT include these directories\n            dir1: progs\n            dir2: msg\nSolution: grep -rl --exclude-dir='progs' --exclude-dir='msg' 'are'\n\n6) Show only filenames matching the string: are\n    should include files only from sub-directories\n    hint: use shell glob pattern to specify directories to search\nSolution: grep -rl 'are' */\n"
  },
  {
    "path": "exercises/GNU_grep/.ref_solutions/ex08_search_pattern_from_file.txt",
    "content": "Note: words.txt has only whole words per line, use it as file input when task is to match whole words\n\n1) Match all strings from file words.txt in file baz.txt\nSolution: grep -f words.txt baz.txt \n\n2) Match all words from file words.txt in file foo.txt\n    should only match whole words\n    should print only matching words, not entire line\nSolution: grep -owf words.txt foo.txt\n\n3) Show common lines between foo.txt and baz.txt\nSolution: grep -Fxf foo.txt baz.txt\n\n4) Show lines present in baz.txt but not in foo.txt\nSolution: grep -Fxvf foo.txt baz.txt\n\n5) Show lines present in foo.txt but not in baz.txt\nSolution: grep -Fxvf baz.txt foo.txt\n\n6) Find all words common between all three files in the directory\n    should only match whole words\n    should print only matching words, not entire line\nSolution: grep -owf words.txt foo.txt | grep -owf- baz.txt\n"
  },
  {
    "path": "exercises/GNU_grep/.ref_solutions/ex09_regex_anchors.txt",
    "content": "1) Match all lines starting with: no\nSolution: grep '^no' sample.txt\n\n2) Match all lines ending with: it\nSolution: grep 'it$' sample.txt\n\n3) Match all lines containing whole word: do\nSolution: grep -w 'do' sample.txt\n\n4) Match all lines containing words starting with: do\nSolution: grep '\\<do' sample.txt\n\n5) Match all lines containing words ending with: do\nSolution: grep 'do\\>' sample.txt\n\n6) Match all lines starting with: ^\nSolution: grep '^^' sample.txt\n\n7) Match all lines ending with: $\nSolution: grep '$$' sample.txt\n\n8) Match all lines containing the string: in\n    not surrounded by word boundaries, for ex: mint but not tin or ink\nSolution: grep '\\Bin\\B' sample.txt\n"
  },
  {
    "path": "exercises/GNU_grep/.ref_solutions/ex10_regex_this_or_that.txt",
    "content": "1) Match all lines containing any of these strings:\n        String1: day\n        String2: not\nSolution: grep -E 'day|not' sample.txt\n\n2) Match all lines containing any of these whole words:\n        String1: he\n        String2: in\nSolution: grep -wE 'he|in' sample.txt\n\n3) Match all lines containing any of these strings:\n        String1: you\n        String2: be\n        String3: to\n        String4: he\nSolution: grep -E 'he|be|to|you' sample.txt\n\n4) Match all lines containing any of these strings:\n        String1: you\n        String2: be\n        String3: to\n        String4: he\n    but NOT these strings:\n        String1: it\n        String2: do\nSolution: grep -E 'he|be|to|you' sample.txt | grep -vE 'do|it'\n\n5) Match all lines starting with any of these strings:\n        String1: no\n        String2: to\nSolution: grep -E '^no|^to' sample.txt\n"
  },
  {
    "path": "exercises/GNU_grep/.ref_solutions/ex11_regex_quantifiers.txt",
    "content": "1) Extract all 3 character strings surrounded by word boundaries\nSolution: grep -ow '...' garbled.txt\n\n2) Extract largest string from each line\n        starting with character: d\n        ending with character  : g\nSolution: grep -o 'd.*g' garbled.txt\n\n3) Extract all strings from each line\n        starting with character: d\n        followed by zero or one: o\n        ending with character  : g\nSolution: grep -oE 'do?g' garbled.txt\n\n4) Extract all strings from each line\n        starting with character: d\n        followed by zero or one of any character\n        ending with character  : g\nSolution: grep -oE 'd.?g' garbled.txt\n\n5) Extract all strings from each line\n        starting with character: g\n        followed by atleast one: o\n        ending with character  : d\nSolution: grep -oE 'go+d' garbled.txt\n\n6) Extract all strings from each line\n        starting with character : g\n        followed by extactly six: o\n        ending with character   : d\nSolution: grep -oE 'go{6}d' garbled.txt\n\n7) Extract all strings from each line\n        starting with character         : g\n        followed by min two and max four: o\n        ending with character           : d\nSolution: grep -oE 'go{2,4}d' garbled.txt\n\n8) Extract all strings from each line\n        starting with character: d\n        followed by max of two : o\n        ending with character  : g\nSolution: grep -oE 'do{,2}g' garbled.txt\n\n9) Extract all strings from each line\n        starting with character : g\n        followed by min of three: o\n        ending with character   : d\nSolution: grep -oE 'go{3,}d' garbled.txt\n"
  },
  {
    "path": "exercises/GNU_grep/.ref_solutions/ex12_regex_character_class_part1.txt",
    "content": "1) Match all lines containing any of these characters:\n        character1: q\n        character2: x\n        character3: z\nSolution: grep '[qzx]' sample_words.txt\n\n2) Match all lines containing any of these characters:\n        character1: c\n        character2: f\n    followed by any character\n    followed by   : t\nSolution: grep '[cf].t' sample_words.txt\n\n3) Extract all words starting with character: s\n    ignore case\n    should contain only alphabets\n    minimum two letters\n    should be surrounded by word boundaries\nSolution: grep -iowE 's[a-z]+' sample_words.txt\n\n4) Extract all words made up of these characters:\n        character1: a\n        character2: c\n        character3: e\n        character4: r\n        character5: s\n    ignore case\n    should contain only alphabets\n    should be surrounded by word boundaries\nSolution: grep -iowE '[acers]+' sample_words.txt\n\n5) Extract all numbers surrounded by word boundaries\nSolution: grep -ow '[0-9]*' sample_words.txt\n\n6) Extract all numbers surrounded by word boundaries matching the condition\n    30 <= number <= 70\nSolution: grep -owE '[3-6][0-9]|70' sample_words.txt\n\n7) Extract all words made up of non-vowel characters\n    ignore case\n    should contain only alphabets and at least two\n    should be surrounded by word boundaries\nSolution: grep -iowE '[b-df-hj-np-tv-z]{2,}' sample_words.txt\n\n8) Extract all sequence of strings consisting of character: -\n    surrounded on either side by zero or more case insensitive alphabets    \nSolution: grep -io '[a-z]*-[a-z]*' sample_words.txt\n"
  },
  {
    "path": "exercises/GNU_grep/.ref_solutions/ex13_regex_character_class_part2.txt",
    "content": "1) Extract all characters before first occurrence of =\nSolution: grep -o '^[^=]*' sample.txt\n\n2) Extract all characters from start of line made up of these characters\n        upper or lower case alphabets\n        all digits\n        the underscore character\nSolution: grep -o '^\\w*' sample.txt\n\n3) Match all lines containing the sequence\n        String1: there\n        any number of whitespace\n        String2: have\nSolution: grep 'there\\s*have' sample.txt\n\n4) Extract all characters from start of line made up of these characters\n        upper or lower case alphabets\n        all digits\n        the characters [ and ]\n        ending with ]\nSolution: grep -oi '^[]a-z0-9[]*]' sample.txt\n\n5) Extract all punctuation characters from first line\nSolution: grep -om1 '[[:punct:]]' sample.txt\n"
  },
  {
    "path": "exercises/GNU_grep/.ref_solutions/ex14_regex_grouping_and_backreference.txt",
    "content": "1) Match lines containing these strings\n        String1: scare\n        String2: spore\nSolution: grep -E 's(po|ca)re' sample.txt\n\n2) Extract these words\n        Word1: handy\n        Word2: hand\n        Word3: hands\n        Word4: handful\nSolution: grep -oE 'hand([sy]|ful)?' sample.txt\n\n3) Extract all whole words with at least one letter occurring twice in the word\n    ignore case\n    only alphabets\n    the letter occurring twice need not be placed next to each other\nSolution: grep -ioE '[a-z]*([a-z])[a-z]*\\1[a-z]*' sample.txt\n\n4) Match lines where same sequence of three consecutive alphabets is matched another time in the same line\n    ignore case\nSolution: grep -iE '([a-z]{3}).*\\1' sample.txt\n"
  },
  {
    "path": "exercises/GNU_grep/.ref_solutions/ex15_regex_PCRE.txt",
    "content": "1) Extract all strings to the right of =\n    provided characters from start of line until = do not include [ or ]\nSolution: grep -oP '^[^][=]+=\\K.*' sample.txt\n\n2) Match all lines containing the string: Hi\n    but shouldn't be followed afterwards in the line by: are\nSolution: grep -P 'Hi(?!.*are)' sample.txt\n\n3) Extract from start of line up to the string: Hi\n    provided it is followed afterwards in the line by: you\nSolution: grep -oP '.*Hi(?=.*you)' sample.txt\n\n4) Extract all sequence of characters surrounded on both sides by space character\n    the space character should not be part of output\nSolution: grep -oP ' \\K[^ ]+(?= )' sample.txt\n\n5) Extract all words\n    made of upper or lower case alphabets\n    at least two letters in length\n    surrounded by word boundaries\n    should not contain consecutive repeated alphabets\nSolution: grep -iowP '[a-z]*([a-z])\\1[a-z]*(*SKIP)(*F)|[a-z]{2,}' sample.txt\n\n"
  },
  {
    "path": "exercises/GNU_grep/.ref_solutions/ex16_misc_and_extras.txt",
    "content": "Note: all files in directory are input to grep, unless otherwise specified\n\n1) Extract all negative numbers\n    starts with - followed by one or more digits\n    do not output filenames\nSolution: grep -hoE -- '-[0-9]+' *\n\n2) Display only filenames containing these two strings anywhere in the file\n        String1: day\n        String2: and\nSolution: grep -zlE 'day.*and|and.*day' *\n\n3) The below command\n        grep -c '^Solution:' ../.ref_solutions/*\n    will give number of questions in each exercise. Change it, using another command and pipe if needed, so that only overall total is printed\nSolution: cat ../.ref_solutions/* | grep -c '^Solution:'\n\n"
  },
  {
    "path": "exercises/GNU_grep/ex01_basic_match/sample.txt",
    "content": "Hello World!\n\nGood day\nHow do you do?\n\nJust do it\nBelieve it!\n\nToday is sunny\nNot a bit funny\nNo doubt you like it too\n\nMuch ado about nothing\nHe he he\n"
  },
  {
    "path": "exercises/GNU_grep/ex01_basic_match.txt",
    "content": "1) Match lines containing the string: day\n\n\n2) Match lines containing the string: it\n\n\n3) Match lines containing the string: do you\n\n"
  },
  {
    "path": "exercises/GNU_grep/ex02_basic_options/sample.txt",
    "content": "Hello World!\n\nGood day\nHow do you do?\n\nJust do it\nBelieve it!\n\nToday is sunny\nNot a bit funny\nNo doubt you like it too\n\nMuch ado about nothing\nHe he he\n"
  },
  {
    "path": "exercises/GNU_grep/ex02_basic_options.txt",
    "content": "1) Match lines containing the string irrespective of lower/upper case: no\n\n\n2) Match lines not containing the string: o\n\n\n3) Match lines with line numbers containing the string: it\n\n\n4) Output only number of matching lines containing the string: a\n\n\n5) Match first two lines containing the string: do\n\n"
  },
  {
    "path": "exercises/GNU_grep/ex03_multiple_string_match/sample.txt",
    "content": "Hello World!\n\nGood day\nHow do you do?\n\nJust do it\nBelieve it!\n\nToday is sunny\nNot a bit funny\nNo doubt you like it too\n\nMuch ado about nothing\nHe he he\n"
  },
  {
    "path": "exercises/GNU_grep/ex03_multiple_string_match.txt",
    "content": "1) Match lines containing either of these three strings\n        String1: Not\n        String2: he\n        String3: sun\n\n\n2) Match lines containing both these strings\n        String1: He\n        String2: or\n\n\n3) Match lines containing either of these two strings\n        String1: a\n        String2: i\n   and contains this as well\n        String3: do\n\n\n4) Match lines containing the string\n        String1: it\n   but not these strings\n        String2: No\n        String3: no\n\n"
  },
  {
    "path": "exercises/GNU_grep/ex04_filenames/greeting.txt",
    "content": "Hi, how are you?\n\nHola :)\n\nHello world\n\nGood day\n\nRock on\n"
  },
  {
    "path": "exercises/GNU_grep/ex04_filenames/poem.txt",
    "content": "Roses are red,\nViolets are blue,\nSugar is sweet,\nAnd so are you.\n"
  },
  {
    "path": "exercises/GNU_grep/ex04_filenames/sample.txt",
    "content": "Hello World!\n\nGood day\nHow do you do?\n\nJust do it\nBelieve it!\n\nToday is sunny\nNot a bit funny\nNo doubt you like it too\n\nMuch ado about nothing\nHe he he\n"
  },
  {
    "path": "exercises/GNU_grep/ex04_filenames.txt",
    "content": "Note: All files present in the directory should be given as file inputs to grep\n\n1) Show only filenames containing the string: are\n\n\n2) Show only filenames NOT containing the string: two\n\n\n3) Match all lines containing the string: are\n\n\n4) Match maximum of two matching lines along with filenames containing the character: a\n\n\n5) Match all lines without prefixing filename containing the string: to\n\n"
  },
  {
    "path": "exercises/GNU_grep/ex05_word_line_matching/greeting.txt",
    "content": "Hi, how are you?\n\nHola :)\n\nHello World\n\nGood day\n\nRock on\n"
  },
  {
    "path": "exercises/GNU_grep/ex05_word_line_matching/sample.txt",
    "content": "Hello World!\n\nGood day\nHow do you do?\n\nJust do it\nBelieve it!\n\nToday is sunny\nNot a bit funny\nNo doubt you like it too\n\nMuch ado about nothing\nHe he he\n"
  },
  {
    "path": "exercises/GNU_grep/ex05_word_line_matching/words.txt",
    "content": "afar\nfar\ncarfare\nfarce\nfaraway\nairfare\n"
  },
  {
    "path": "exercises/GNU_grep/ex05_word_line_matching.txt",
    "content": "Note: All files present in the directory should be given as file inputs to grep\n\n1) Match lines containing whole word: do\n\n\n2) Match whole lines containing the string: Hello World\n\n\n3) Match lines containing these whole words:\n        Word1: He\n        Word2: far\n\n\n4) Match lines containing the whole word: you\n    and NOT containing the case insensitive string: How\n\n"
  },
  {
    "path": "exercises/GNU_grep/ex06_ABC_context_matching/sample.txt",
    "content": "Hello World!\n\nGood day\nHow do you do?\n\nJust do it\nBelieve it!\n\nToday is sunny\nNot a bit funny\nNo doubt you like it too\n\nMuch ado about nothing\nHe he he\n"
  },
  {
    "path": "exercises/GNU_grep/ex06_ABC_context_matching.txt",
    "content": "1) Get lines and 3 following it containing the string: you\n\n\n2) Get lines and 2 preceding it containing the string: is\n\n\n3) Get lines and 1 following/preceding containing the string: Not\n\n\n4) Get lines and 1 following and 4 preceding containing the string: Not\n\n\n5) Get lines and 1 preceding it containing the string: you\n        there should be no separator between the matches\n\n\n6) Get lines and 1 preceding it containing the string: you\n        the separator between the matches should be: #####\n\n"
  },
  {
    "path": "exercises/GNU_grep/ex07_recursive_search/msg/greeting.txt",
    "content": "Hi, how are you?\n\nHola :)\n\nHello World\n\nGood day\n\nRock on\n"
  },
  {
    "path": "exercises/GNU_grep/ex07_recursive_search/msg/sample.txt",
    "content": "Hello World!\n\nGood day\nHow do you do?\n\nJust do it\nBelieve it!\n\nToday is sunny\nNot a bit funny\nNo doubt you like it too\n\nMuch ado about nothing\nHe he he\n"
  },
  {
    "path": "exercises/GNU_grep/ex07_recursive_search/poem.txt",
    "content": "Roses are red,\nViolets are blue,\nSugar is sweet,\nAnd so are you.\n"
  },
  {
    "path": "exercises/GNU_grep/ex07_recursive_search/progs/hello.py",
    "content": "#!/usr/bin/python3\n\nprint(\"Hello World\")\n"
  },
  {
    "path": "exercises/GNU_grep/ex07_recursive_search/progs/hello.sh",
    "content": "#!/bin/bash\n\necho \"Hello $USER\"\necho \"Today is $(date -u +%A)\"\necho 'Hope you are having a nice day'\n"
  },
  {
    "path": "exercises/GNU_grep/ex07_recursive_search/words.txt",
    "content": "afar\nfar\ncarfare\nfarce\nfaraway\nairfare\n"
  },
  {
    "path": "exercises/GNU_grep/ex07_recursive_search.txt",
    "content": "Note: Every file in this directory and sub-directories is input for grep, unless otherwise specified\n\n1) Match all lines containing the string: you\n\n\n2) Show only filenames matching the string: Hello\n    filenames should only end with .txt \n\n\n3) Show only filenames matching the string: Hello\n    filenames should NOT end with .txt \n\n\n4) Show only filenames matching the string: are\n    should not include the directory: progs\n\n\n5) Show only filenames matching the string: are\n    should NOT include these directories\n            dir1: progs\n            dir2: msg\n\n\n6) Show only filenames matching the string: are\n    should include files only from sub-directories\n    hint: use shell glob pattern to specify directories to search\n\n"
  },
  {
    "path": "exercises/GNU_grep/ex08_search_pattern_from_file/baz.txt",
    "content": "I saw a few red cars going that way\nTo the end!\nAre you coming today to the party?\na[5] = 'good';\nHave you read the Harry Potter series?\n"
  },
  {
    "path": "exercises/GNU_grep/ex08_search_pattern_from_file/foo.txt",
    "content": "part\na[5] = 'good';\nI saw a few red cars going that way\nBelieve it!\nto do list\n"
  },
  {
    "path": "exercises/GNU_grep/ex08_search_pattern_from_file/words.txt",
    "content": "car\npart\nto\nread\n"
  },
  {
    "path": "exercises/GNU_grep/ex08_search_pattern_from_file.txt",
    "content": "Note: words.txt has only whole words per line, use it as file input when task is to match whole words\n\n1) Match all strings from file words.txt in file baz.txt\n\n\n2) Match all words from file words.txt in file foo.txt\n    should only match whole words\n    should print only matching words, not entire line\n\n\n3) Show common lines between foo.txt and baz.txt\n\n\n4) Show lines present in baz.txt but not in foo.txt\n\n\n5) Show lines present in foo.txt but not in baz.txt\n\n\n6) Find all words common between all three files in the directory\n    should only match whole words\n    should print only matching words, not entire line\n\n"
  },
  {
    "path": "exercises/GNU_grep/ex09_regex_anchors/sample.txt",
    "content": "hello world!\n\ngood day\nhow do you do?\n\njust do it\nbelieve it!\n\ntoday is sunny\nnot a bit funny\nno doubt you like it too\n\nmuch ado about nothing\nhe he he\n\n^ could be exponentiation or xor operator\nscalar variables in perl start with $\n"
  },
  {
    "path": "exercises/GNU_grep/ex09_regex_anchors.txt",
    "content": "1) Match all lines starting with: no\n\n\n2) Match all lines ending with: it\n\n\n3) Match all lines containing whole word: do\n\n\n4) Match all lines containing words starting with: do\n\n\n5) Match all lines containing words ending with: do\n\n\n6) Match all lines starting with: ^\n\n\n7) Match all lines ending with: $\n\n\n8) Match all lines containing the string: in\n    not surrounded by word boundaries, for ex: mint but not tin or ink\n\n"
  },
  {
    "path": "exercises/GNU_grep/ex10_regex_this_or_that/sample.txt",
    "content": "hello world!\n\ngood day\nhow do you do?\n\njust do it\nbelieve it!\n\ntoday is sunny\nnot a bit funny\nno doubt you like it too\n\nmuch ado about nothing\nhe he he\n\n^ could be exponentiation or xor operator\nscalar variables in perl start with $\n"
  },
  {
    "path": "exercises/GNU_grep/ex10_regex_this_or_that.txt",
    "content": "1) Match all lines containing any of these strings:\n        String1: day\n        String2: not\n\n\n2) Match all lines containing any of these whole words:\n        String1: he\n        String2: in\n\n\n3) Match all lines containing any of these strings:\n        String1: you\n        String2: be\n        String3: to\n        String4: he\n\n\n4) Match all lines containing any of these strings:\n        String1: you\n        String2: be\n        String3: to\n        String4: he\n    but NOT these strings:\n        String1: it\n        String2: do\n\n\n5) Match all lines starting with any of these strings:\n        String1: no\n        String2: to\n\n"
  },
  {
    "path": "exercises/GNU_grep/ex11_regex_quantifiers/garbled.txt",
    "content": "gd\ngod\ngoood\noh gold\ngoooooodyyyy\ndog\ndg\ndig good gold\ndoogoodog\nc@t made forty justify\ndodging a toy\n"
  },
  {
    "path": "exercises/GNU_grep/ex11_regex_quantifiers.txt",
    "content": "1) Extract all 3 character strings surrounded by word boundaries\n\n\n2) Extract largest string from each line\n        starting with character: d\n        ending with character  : g\n\n\n3) Extract all strings from each line\n        starting with character: d\n        followed by zero or one: o\n        ending with character  : g\n\n\n4) Extract all strings from each line\n        starting with character: d\n        followed by zero or one of any character\n        ending with character  : g\n\n\n5) Extract all strings from each line\n        starting with character: g\n        followed by atleast one: o\n        ending with character  : d\n\n\n6) Extract all strings from each line\n        starting with character : g\n        followed by extactly six: o\n        ending with character   : d\n\n\n7) Extract all strings from each line\n        starting with character         : g\n        followed by min two and max four: o\n        ending with character           : d\n\n\n8) Extract all strings from each line\n        starting with character: d\n        followed by max of two : o\n        ending with character  : g\n\n\n9) Extract all strings from each line\n        starting with character : g\n        followed by min of three: o\n        ending with character   : d\n\n"
  },
  {
    "path": "exercises/GNU_grep/ex12_regex_character_class_part1/sample_words.txt",
    "content": "far 30 scarce f@$t 42 fit\nCute 34 quite pry far-fetched Sure\n70 cast-away 12 good hue he\ncry just Nymph race Peace. 67\nfoo;bar;baz;p@t\nARE 72 cut copy paste\np1ate rest 512 Sync\n"
  },
  {
    "path": "exercises/GNU_grep/ex12_regex_character_class_part1.txt",
    "content": "1) Match all lines containing any of these characters:\n        character1: q\n        character2: x\n        character3: z\n\n\n2) Match all lines containing any of these characters:\n        character1: c\n        character2: f\n    followed by any character\n    followed by   : t\n\n\n3) Extract all words starting with character: s\n    ignore case\n    should contain only alphabets\n    minimum two letters\n    should be surrounded by word boundaries\n\n\n4) Extract all words made up of these characters:\n        character1: a\n        character2: c\n        character3: e\n        character4: r\n        character5: s\n    ignore case\n    should contain only alphabets\n    should be surrounded by word boundaries\n\n\n5) Extract all numbers surrounded by word boundaries\n\n\n6) Extract all numbers surrounded by word boundaries matching the condition\n    30 <= number <= 70\n\n\n7) Extract all words made up of non-vowel characters\n    ignore case\n    should contain only alphabets and at least two\n    should be surrounded by word boundaries\n\n\n8) Extract all sequence of strings consisting of character: -\n    surrounded on either side by zero or more case insensitive alphabets    \n\n"
  },
  {
    "path": "exercises/GNU_grep/ex13_regex_character_class_part2/sample.txt",
    "content": "a[2]='sample string'\nfoo_bar=4232\nappx_pi=3.14\ngreeting=\"Hi  there\t\thave a nice   day\"\nfood[4]=\"dosa\"\nb[0][1]=42\n"
  },
  {
    "path": "exercises/GNU_grep/ex13_regex_character_class_part2.txt",
    "content": "1) Extract all characters before first occurrence of =\n\n\n2) Extract all characters from start of line made up of these characters\n        upper or lower case alphabets\n        all digits\n        the underscore character\n\n\n3) Match all lines containing the sequence\n        String1: there\n        any number of whitespace\n        String2: have\n\n\n4) Extract all characters from start of line made up of these characters\n        upper or lower case alphabets\n        all digits\n        the characters [ and ]\n        ending with ]\n\n\n5) Extract all punctuation characters from first line\n\n"
  },
  {
    "path": "exercises/GNU_grep/ex14_regex_grouping_and_backreference/sample.txt",
    "content": "hands hand library scare handy handful\nscared too big time eel candy\nspare food regulate circuit spore stare\ntire tempt cold malady\n"
  },
  {
    "path": "exercises/GNU_grep/ex14_regex_grouping_and_backreference.txt",
    "content": "1) Match lines containing these strings\n        String1: scare\n        String2: spore\n\n\n2) Extract these words\n        Word1: handy\n        Word2: hand\n        Word3: hands\n        Word4: handful\n\n\n3) Extract all whole words with at least one letter occurring twice in the word\n    ignore case\n    only alphabets\n    the letter occurring twice need not be placed next to each other\n\n\n4) Match lines where same sequence of three consecutive alphabets is matched another time in the same line\n    ignore case\n\n"
  },
  {
    "path": "exercises/GNU_grep/ex15_regex_PCRE/sample.txt",
    "content": "a[2]='Hi, how are you?'\nfoo_bar=4232\nappx_pi=3.14\ngreeting=\"Hi there have a nice day\"\nfood[4]=\"dosa\"\nb[0][1]=42\n"
  },
  {
    "path": "exercises/GNU_grep/ex15_regex_PCRE.txt",
    "content": "1) Extract all strings to the right of =\n    provided characters from start of line until = do not include [ or ]\n\n\n2) Match all lines containing the string: Hi\n    but shouldn't be followed afterwards in the line by: are\n\n\n3) Extract from start of line up to the string: Hi\n    provided it is followed afterwards in the line by: you\n\n\n4) Extract all sequence of characters surrounded on both sides by space character\n    the space character should not be part of output\n\n\n5) Extract all words\n    made of upper or lower case alphabets\n    at least two letters in length\n    surrounded by word boundaries\n    should not contain consecutive repeated alphabets\n\n\n"
  },
  {
    "path": "exercises/GNU_grep/ex16_misc_and_extras/garbled.txt",
    "content": "day and night\n-43 and 99 and 12\n"
  },
  {
    "path": "exercises/GNU_grep/ex16_misc_and_extras/poem.txt",
    "content": "Roses are red,\nViolets are blue,\nSugar is sweet,\nAnd so are you.\n\nGood day to you :)\n"
  },
  {
    "path": "exercises/GNU_grep/ex16_misc_and_extras/sample.txt",
    "content": "account balance: -2300\ngood day\nfoo and bar and baz\n"
  },
  {
    "path": "exercises/GNU_grep/ex16_misc_and_extras.txt",
    "content": "Note: all files in directory are input to grep, unless otherwise specified\n\n1) Extract all negative numbers\n    starts with - followed by one or more digits\n    do not output filenames\n\n\n2) Display only filenames containing these two strings anywhere in the file\n        String1: day\n        String2: and\n\n\n3) The below command\n        grep -c '^Solution:' ../.ref_solutions/*\n    will give number of questions in each exercise. Change it, using another command and pipe if needed, so that only overall total is printed\n\n\n"
  },
  {
    "path": "exercises/GNU_grep/solve",
    "content": "dir_name=$(basename \"$PWD\")\nref_file=\"../.ref_solutions/$dir_name.txt\"\nsol_file=\"../$dir_name.txt\"\ntmp_file='../.tmp.txt'\n\n# color output\ntcolors=$(tput colors)\nif [[ -n $tcolors && $tcolors -ge 8 ]]; then\n    red=$(tput setaf 1)\n    green=$(tput setaf 2)\n    blue=$(tput setaf 4)\n    clr_color=$(tput sgr0)\nelse\n    red=''\n    green=''\n    blue=''\n    clr_color=''\nfi\n\nsub_sol=0\nif [[ $1 == -s ]]; then\n    prev_cmd=$(fc -ln -2 | sed 's/^[ \\t]*//;q')\n    sub_sol=1\nelif [[ $1 == -q ]]; then\n    # highlight the question to be solved next\n    # or show only the (unanswered)? question to be solved next\n    cat \"$sol_file\"\n    return\nelif [[ -n $1 ]]; then\n    echo -e 'Unknown option...Exiting script'\n    return\nfi\n\ncount=0\nsol_count=0\nerr_count=0\nwhile IFS= read -u3 -r ref_line && read -u4 -r sol_line; do\n    if [[ \"${ref_line:0:9}\" == Solution: ]]; then\n        (( count++ ))\n\n        if [[ $sub_sol == 1 && -z $sol_line ]]; then\n            sol_line=\"$prev_cmd\"\n            sub_sol=0\n        fi\n\n        if [[ \"$(eval \"command ${ref_line:10}\")\" == \"$(eval \"command $sol_line\")\" ]]; then\n            (( sol_count++ ))\n            # use color if terminal supports\n            echo '---------------------------------------------'\n            echo \"Match for question $count:\"\n            echo \"${red}Submitted solution:${clr_color} $sol_line\"\n            echo \"${green}Reference solution:${clr_color} ${ref_line:10}\"\n            echo '---------------------------------------------'\n        else\n            (( err_count++ ))\n            if [[ $err_count == 1 && -n $sol_line ]]; then\n                echo '---------------------------------------------'\n                echo \"Mismatch for question $count:\"\n                echo \"$(tput bold)${red}Expected output is:${clr_color}$(tput rmso)\"\n                eval \"command ${ref_line:10}\"\n                echo 
'---------------------------------------------'\n            fi\n            sol_line=''\n        fi\n    fi\n\n    echo \"$sol_line\" >> \"$tmp_file\"\n\ndone 3<\"$ref_file\" 4<\"$sol_file\"\n\n((count==sol_count)) && printf \"\\t\\t$(tput bold)${blue}All Pass${clr_color}$(tput rmso)\\t\\t\\n\"\n\nmv \"$tmp_file\" \"$sol_file\"\n\n# vim: syntax=bash\n"
  },
  {
    "path": "exercises/README.md",
    "content": "# <a name=\"exercises\"></a>Exercises\n\nInstructions and shell script here assumes `bash` shell. Tested on *GNU bash, version 4.3.46*\n\n<br>\n\n* For example, the first exercise for **GNU_grep**\n    * directory: `ex01_basic_match`\n    * question file: `ex01_basic_match.txt`\n    * solution reference: `.ref_solutions/ex01_basic_match.txt`\n* Each exercise contains one or more question to be solved\n* The script `solve` will assist in checking solutions\n\n```bash\n$ git clone https://github.com/learnbyexample/Command-line-text-processing.git\n$ cd Command-line-text-processing/exercises/GNU_grep/\n$ ls\nex01_basic_match      ex02_basic_options      ex03_multiple_string_match      solve\nex01_basic_match.txt  ex02_basic_options.txt  ex03_multiple_string_match.txt\n\n$ find -name 'ex01*'\n./.ref_solutions/ex01_basic_match.txt\n./ex01_basic_match\n./ex01_basic_match.txt\n```\n\n<br>\n\n* Solving the questions\n    * Go to the exercise folder\n    * Use `ls` to see input file(s)\n    * To see the problems for that exercise, follow the steps below\n\n```bash\n$ cd ex01_basic_match\n$ ls\nsample.txt\n\n$ # to see the questions\n$ source ../solve -q\n1) Match lines containing the string: day\n\n\n2) Match lines containing the string: it\n\n\n3) Match lines containing the string: do you\n\n\n$ # or open the questions file with your fav editor\n$ gvim ../$(basename \"$PWD\").txt\n$ # create an alias to use from any ex* directory\n$ alias oq='gvim ../$(basename \"$PWD\").txt'\n$ oq\n```\n\n<br>\n\n* Submitting solutions one by one\n    * immediately after executing command that answers a question, call the `solve` script\n\n```bash\n$ grep 'day' sample.txt \nGood day\nToday is sunny\n$ source ../solve -s\n---------------------------------------------\nMatch for question 1:\nSubmitted solution: grep 'day' sample.txt \nReference solution: grep 'day' sample.txt\n---------------------------------------------\n```\n\n<br>\n\n* Submit all at once\n    * by 
editing the `../$(basename \"$PWD\").txt` file directly\n    * the answer should replace the empty line immediately following the question\n* **Note**\n    * there are different ways to solve the same question\n    * but for specific exercise like **GNU_grep** try to solve using `grep` only\n    * also, remember that `eval` is used to check equivalence. So be sure of commands submitted\n\n```bash\n$ cat ../$(basename \"$PWD\").txt\n1) Match lines containing the string: day\ngrep 'day' sample.txt\n\n2) Match lines containing the string: it\nsed -n '/it/p' sample.txt\n\n3) Match lines containing the string: do you\necho 'How do you do?'\n\n$ source ../solve\n---------------------------------------------\nMatch for question 1:\nSubmitted solution: grep 'day' sample.txt\nReference solution: grep 'day' sample.txt\n---------------------------------------------\n---------------------------------------------\nMatch for question 2:\nSubmitted solution: sed -n '/it/p' sample.txt\nReference solution: grep 'it' sample.txt\n---------------------------------------------\n---------------------------------------------\nMatch for question 3:\nSubmitted solution: echo 'How do you do?'\nReference solution: grep 'do you' sample.txt\n---------------------------------------------\n\t\tAll Pass\t\t\n```\n\n<br>\n\n* Then move on to next exercise directory\n* Create aliases for different commands for easy use, after checking that the aliases are available of course\n\n```bash\n$ type cs cq ca nq pq\nbash: type: cs: not found\nbash: type: cq: not found\nbash: type: ca: not found\nbash: type: nq: not found\nbash: type: pq: not found\n\n$ alias cs='source ../solve -s'\n$ alias cq='source ../solve -q'\n$ alias ca='source ../solve'\n$ # to go to directory of next question\n$ nq() { d=$(basename \"$PWD\"); nd=$(printf \"../ex%02d*/\" $((${d:2:2}+1))); cd $nd ; }\n$ # to go to directory of previous question\n$ pq() { d=$(basename \"$PWD\"); pd=$(printf \"../ex%02d*/\" $((${d:2:2}-1))); cd $pd ; 
}\n```\n\n<br>\n\nIf a wrong solution is submitted, the expected output is shown. This also helps in understanding the question, as I found it difficult to convey the intent of a question clearly with words alone.\n\n```bash\n$ source ../solve -q\n1) Match lines containing the string: day\n\n\n2) Match lines containing the string: it\n\n\n3) Match lines containing the string: do you\n\n$ grep 'do' sample.txt \nHow do you do?\nJust do it\nNo doubt you like it too\nMuch ado about nothing\n$ source ../solve -s\n---------------------------------------------\nMismatch for question 1:\nExpected output is:\nGood day\nToday is sunny\n---------------------------------------------\n```\n"
  },
  {
    "path": "file_attributes.md",
    "content": "# <a name=\"file-attributes\"></a>File attributes\n\n**Table of Contents**\n\n* [wc](#wc)\n    * [Various counts](#various-counts)\n    * [subtle differences](#subtle-differences)\n    * [Further reading for wc](#further-reading-for-wc)\n* [du](#du)\n    * [Default size](#default-size)\n    * [Various size formats](#various-size-formats)\n    * [Dereferencing links](#dereferencing-links)\n    * [Filtering options](#filtering-options)\n    * [Further reading for du](#further-reading-for-du)\n* [df](#df)\n    * [Examples](#examples)\n    * [Further reading for df](#further-reading-for-df)\n* [touch](#touch)\n    * [Creating empty file](#creating-empty-file)\n    * [Updating timestamps](#updating-timestamps)\n    * [Preserving timestamp](#preserving-timestamp)\n    * [Further reading for touch](#further-reading-for-touch)\n* [file](#file)\n    * [File type examples](#file-type-examples)\n    * [Further reading for file](#further-reading-for-file)\n\n<br>\n\n## <a name=\"wc\"></a>wc\n\n```bash\n$ wc --version | head -n1\nwc (GNU coreutils) 8.25\n\n$ man wc\nWC(1)                            User Commands                           WC(1)\n\nNAME\n       wc - print newline, word, and byte counts for each file\n\nSYNOPSIS\n       wc [OPTION]... [FILE]...\n       wc [OPTION]... --files0-from=F\n\nDESCRIPTION\n       Print newline, word, and byte counts for each FILE, and a total line if\n       more than one FILE is specified.  
A word is a non-zero-length  sequence\n       of characters delimited by white space.\n\n       With no FILE, or when FILE is -, read standard input.\n...\n```\n\n<br>\n\n#### <a name=\"various-counts\"></a>Various counts\n\n```bash\n$ cat sample.txt\nHello World\nGood day\nNo doubt you like it too\nMuch ado about nothing\nHe he he\n\n$ # by default, gives newline/word/byte count (in that order)\n$ wc sample.txt\n 5 17 78 sample.txt\n\n$ # options to get individual numbers\n$ wc -l sample.txt\n5 sample.txt\n$ wc -w sample.txt\n17 sample.txt\n$ wc -c sample.txt\n78 sample.txt\n\n$ # use shell input redirection if filename is not needed\n$ wc -l < sample.txt\n5\n```\n\n* multiple file input\n* automatically displays total at end\n\n```bash\n$ cat greeting.txt\nHello there\nHave a safe journey\n$ cat fruits.txt\nFruit   Price\napple   42\nbanana  31\nfig     90\nguava   6\n\n$ wc *.txt\n  5  10  57 fruits.txt\n  2   6  32 greeting.txt\n  5  17  78 sample.txt\n 12  33 167 total\n```\n\n* use `-L` to get length of longest line\n\n```bash\n$ wc -L < sample.txt\n24\n\n$ echo 'foo bar baz' | wc -L\n11\n$ echo 'hi there!' 
| wc -L\n9\n\n$ # last line will show max value, not sum of all input\n$ wc -L *.txt\n 13 fruits.txt\n 19 greeting.txt\n 24 sample.txt\n 24 total\n```\n\n<br>\n\n#### <a name=\"subtle-differences\"></a>subtle differences\n\n* byte count vs character count\n\n```bash\n$ # when input is ASCII\n$ printf 'hi there' | wc -c\n8\n$ printf 'hi there' | wc -m\n8\n\n$ # when input has multi-byte characters\n$ printf 'hi👍' | od -x\n0000000 6968 9ff0 8d91\n0000006\n\n$ printf 'hi👍' | wc -m\n3\n\n$ printf 'hi👍' | wc -c\n6\n```\n\n* `-l` option counts newline characters only, so a final line without a trailing newline is not counted\n\n```bash\n$ printf 'hi there\\ngood day' | wc -l\n1\n$ printf 'hi there\\ngood day\\n' | wc -l\n2\n$ printf 'hi there\\n\\n\\nfoo\\n' | wc -l\n4\n```\n\n* From `man wc` \"A word is a non-zero-length sequence of characters delimited by white space\"\n\n```bash\n$ echo 'foo        bar ;-*' | wc -w\n3\n\n$ # use other text processing as needed\n$ echo 'foo        bar ;-*' | grep -iowE '[a-z]+'\nfoo\nbar\n$ echo 'foo        bar ;-*' | grep -iowE '[a-z]+' | wc -l\n2\n```\n\n* `-L` won't count non-printable characters and tabs are converted to equivalent spaces\n\n```bash\n$ printf 'food\\tgood' | wc -L\n12\n$ printf 'food\\tgood' | wc -m\n9\n$ printf 'food\\tgood' | awk '{print length()}'\n9\n\n$ printf 'foo\\0bar\\0baz' | wc -L\n9\n$ printf 'foo\\0bar\\0baz' | wc -m\n11\n$ printf 'foo\\0bar\\0baz' | awk '{print length()}'\n11\n```\n\n<br>\n\n#### <a name=\"further-reading-for-wc\"></a>Further reading for wc\n\n* `man wc` and `info wc` for more options and detailed documentation\n* [wc Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/wc?sort=votes&pageSize=15)\n* [wc Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/wc?sort=votes&pageSize=15)\n\n<br>\n\n## <a name=\"du\"></a>du\n\n```bash\n$ du --version | head -n1\ndu (GNU coreutils) 8.25\n\n$ man du\nDU(1)                            User Commands                           DU(1)\n\nNAME\n       du 
- estimate file space usage\n\nSYNOPSIS\n       du [OPTION]... [FILE]...\n       du [OPTION]... --files0-from=F\n\nDESCRIPTION\n       Summarize disk usage of the set of FILEs, recursively for directories.\n...\n```\n\n<br>\n\n<br>\n\n#### <a name=\"default-size\"></a>Default size\n\n* By default, size is reported in units of **1024 bytes**\n* Individual files are not listed; sizes of all directories and sub-directories are reported recursively\n\n```bash\n$ ls -F\nprojs/  py_learn@  words.txt\n\n$ du\n17920   ./projs/full_addr\n14316   ./projs/half_addr\n32952   ./projs\n33880   .\n```\n\n* use `-a` to recursively show both files and directories\n* use `-s` to show only the total size for each argument, without listing its sub-directories\n\n```bash\n$ du -a\n712     ./projs/report.log\n17916   ./projs/full_addr/faddr.v\n17920   ./projs/full_addr\n14312   ./projs/half_addr/haddr.v\n14316   ./projs/half_addr\n32952   ./projs\n0       ./py_learn\n924     ./words.txt\n33880   .\n\n$ du -s\n33880   .\n\n$ du -s projs words.txt\n32952   projs\n924     words.txt\n```\n\n* use `-S` to show directory size without taking into account size of its sub-directories\n\n```bash\n$ du -S\n17920   ./projs/full_addr\n14316   ./projs/half_addr\n716     ./projs\n928     .\n```\n\n<br>\n\n<br>\n\n#### <a name=\"various-size-formats\"></a>Various size formats\n\n```bash\n$ # number of bytes\n$ stat -c %s words.txt\n938848\n$ du -b words.txt\n938848  words.txt\n\n$ # kilobytes = 1024 bytes\n$ du -sk projs\n32952   projs\n$ # megabytes = 1024 kilobytes\n$ du -sm projs\n33      projs\n\n$ # -B to specify custom byte scale size\n$ du -sB 5000 projs\n6749    projs\n$ du -sB 1048576 projs\n33      projs\n```\n\n* human readable and si units\n\n```bash\n$ # in terms of powers of 1024\n$ # M = 1048576 bytes and so on\n$ du -sh projs/* words.txt\n18M     projs/full_addr\n14M     projs/half_addr\n712K    projs/report.log\n924K    words.txt\n\n$ # in terms of powers of 1000\n$ # M = 1000000 bytes and so on\n$ du -s --si projs/* 
words.txt\n19M     projs/full_addr\n15M     projs/half_addr\n730k    projs/report.log\n947k    words.txt\n```\n\n* sorting\n\n```bash\n$ du -sh projs/* words.txt | sort -h\n712K    projs/report.log\n924K    words.txt\n14M     projs/half_addr\n18M     projs/full_addr\n\n$ du -sk projs/* | sort -nr\n17920   projs/full_addr\n14316   projs/half_addr\n712     projs/report.log\n```\n\n* to get size based on number of bytes in file (apparent size) rather than disk space allotted\n\n```bash\n$ du -b words.txt\n938848  words.txt\n\n$ du -h words.txt\n924K    words.txt\n\n$ # 938848/1024 = 916.84\n$ du --apparent-size -h words.txt\n917K    words.txt\n```\n\n<br>\n\n#### <a name=\"dereferencing-links\"></a>Dereferencing links\n\n* See `man` and `info` pages for other related options\n\n```bash\n$ # -D to dereference command line argument\n$ du py_learn\n0       py_learn\n$ du -shD py_learn\n503M    py_learn\n\n$ # -L to dereference links found by du\n$ du -sh\n34M     .\n$ du -shL\n536M    .\n```\n\n<br>\n\n#### <a name=\"filtering-options\"></a>Filtering options\n\n* `-d` to specify maximum depth\n\n```bash\n$ du -ah projs\n712K    projs/report.log\n18M     projs/full_addr/faddr.v\n18M     projs/full_addr\n14M     projs/half_addr/haddr.v\n14M     projs/half_addr\n33M     projs\n\n$ du -ah -d1 projs\n712K    projs/report.log\n18M     projs/full_addr\n14M     projs/half_addr\n33M     projs\n```\n\n* `-c` to also show total size at end\n\n```bash\n$ du -cshD projs py_learn\n33M     projs\n503M    py_learn\n535M    total\n```\n\n* `-t` to provide a threshold comparison\n\n```bash\n$ # >= 15M\n$ du -Sh -t 15M\n18M     ./projs/full_addr\n\n$ # <= 1M\n$ du -ah -t -1M\n712K    ./projs/report.log\n0       ./py_learn\n924K    ./words.txt\n```\n\n* excluding files/directories based on **glob** pattern\n* see also `--exclude-from=FILE` and `--files0-from=FILE` options\n\n```bash\n$ # note that excluded files affect directory size reported\n$ du -ah --exclude='*addr*' projs\n712K    
projs/report.log\n716K    projs\n\n$ # depending on shell, brace expansion can be used\n$ du -ah --exclude='*.'{v,log} projs\n4.0K    projs/full_addr\n4.0K    projs/half_addr\n12K     projs\n```\n\n<br>\n\n#### <a name=\"further-reading-for-du\"></a>Further reading for du\n\n* `man du` and `info du` for more options and detailed documentation\n* [du Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/disk-usage?sort=votes&pageSize=15)\n* [du Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/du?sort=votes&pageSize=15)\n\n<br>\n\n## <a name=\"df\"></a>df\n\n```bash\n$ df --version | head -n1\ndf (GNU coreutils) 8.25\n\n$ man df\nDF(1)                            User Commands                           DF(1)\n\nNAME\n       df - report file system disk space usage\n\nSYNOPSIS\n       df [OPTION]... [FILE]...\n\nDESCRIPTION\n       This  manual  page  documents  the  GNU version of df.  df displays the\n       amount of disk space available on the file system containing each  file\n       name  argument.   
If  no file name is given, the space available on all\n       currently mounted file systems is shown.\n...\n```\n\n<br>\n\n#### <a name=\"examples\"></a>Examples\n\n```bash\n$ # use df without arguments to get information on all currently mounted file systems\n$ df .\nFilesystem     1K-blocks     Used Available Use% Mounted on\n/dev/sda1       98298500 58563816  34734748  63% /\n\n$ # use -B option for custom size\n$ # use --si for size in powers of 1000 instead of 1024\n$ df -h .\nFilesystem      Size  Used Avail Use% Mounted on\n/dev/sda1        94G   56G   34G  63% /\n```\n\n* Use `--output` to report only specific fields of interest\n\n```bash\n$ df -h --output=size,used,file / /media/learnbyexample/projs\n Size  Used File\n  94G   56G /\n  92G   35G /media/learnbyexample/projs\n\n$ df -h --output=pcent .\nUse%\n 63%\n\n$ df -h --output=pcent,fstype | awk -F'%' 'NR>2 && $1>=40'\n 63% ext3\n 40% ext4\n 51% ext4\n```\n\n<br>\n\n#### <a name=\"further-reading-for-df\"></a>Further reading for df\n\n* `man df` and `info df` for more options and detailed documentation\n* [df Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/df?sort=votes&pageSize=15)\n* [Parsing df command output with awk](https://unix.stackexchange.com/questions/360865/parsing-df-command-output-with-awk)\n* [processing df output](https://www.reddit.com/r/bash/comments/68dbml/using_an_array_variable_in_an_awk_command/)\n\n<br>\n\n## <a name=\"touch\"></a>touch\n\n```bash\n$ touch --version | head -n1\ntouch (GNU coreutils) 8.25\n\n$ man touch\nTOUCH(1)                         User Commands                        TOUCH(1)\n\nNAME\n       touch - change file timestamps\n\nSYNOPSIS\n       touch [OPTION]... 
FILE...\n\nDESCRIPTION\n       Update  the  access  and modification times of each FILE to the current\n       time.\n\n       A FILE argument that does not exist is created empty, unless -c  or  -h\n       is supplied.\n...\n```\n\n<br>\n\n#### <a name=\"creating-empty-file\"></a>Creating empty file\n\n```bash\n$ ls foo.txt\nls: cannot access 'foo.txt': No such file or directory\n$ touch foo.txt\n$ ls foo.txt\nfoo.txt\n\n$ # use -c if new file shouldn't be created\n$ rm foo.txt\n$ touch -c foo.txt\n$ ls foo.txt\nls: cannot access 'foo.txt': No such file or directory\n```\n\n<br>\n\n#### <a name=\"updating-timestamps\"></a>Updating timestamps\n\n* Updating both access and modification timestamp to current time\n\n```bash\n$ # last access time\n$ stat -c %x fruits.txt\n2017-07-19 17:06:01.523308599 +0530\n$ # last modification time\n$ stat -c %y fruits.txt\n2017-07-13 13:54:03.576055933 +0530\n\n$ touch fruits.txt\n$ stat -c %x fruits.txt\n2017-07-21 10:11:44.241921229 +0530\n$ stat -c %y fruits.txt\n2017-07-21 10:11:44.241921229 +0530\n```\n\n* Updating only access or modification timestamp\n\n```bash\n$ touch -a greeting.txt\n$ stat -c %x greeting.txt\n2017-07-21 10:14:08.457268564 +0530\n$ stat -c %y greeting.txt\n2017-07-13 13:54:26.004499660 +0530\n\n$ touch -m sample.txt\n$ stat -c %x sample.txt\n2017-07-13 13:48:24.945450646 +0530\n$ stat -c %y sample.txt\n2017-07-21 10:14:40.770006144 +0530\n```\n\n* Using timestamp from another file to update\n\n```bash\n$ stat -c $'%x\\n%y' power.log report.log\n2017-07-19 10:48:03.978295434 +0530\n2017-07-14 20:50:42.850887578 +0530\n2017-06-24 13:00:31.773583923 +0530\n2017-06-24 12:59:53.316751651 +0530\n\n$ # copy both access and modification timestamp from power.log to report.log\n$ touch -r power.log report.log\n$ stat -c $'%x\\n%y' report.log\n2017-07-19 10:48:03.978295434 +0530\n2017-07-14 20:50:42.850887578 +0530\n\n$ # add -a or -m options to limit to only access or modification timestamp\n```\n\n* Using date 
string to update\n* See also `-t` option\n\n```bash\n$ # add -a or -m as needed\n$ touch -d '2010-03-17 17:04:23' report.log\n$ stat -c $'%x\\n%y' report.log\n2010-03-17 17:04:23.000000000 +0530\n2010-03-17 17:04:23.000000000 +0530\n```\n\n<br>\n\n#### <a name=\"preserving-timestamp\"></a>Preserving timestamp\n\n* Text processing on files would update the timestamps\n\n```bash\n$ stat -c $'%x\\n%y' power.log\n2017-07-21 11:11:42.862874240 +0530\n2017-07-13 21:31:53.496323704 +0530\n\n$ sed -i 's/foo/bar/g' power.log\n$ stat -c $'%x\\n%y' power.log\n2017-07-21 11:12:20.303504336 +0530\n2017-07-21 11:12:20.303504336 +0530\n```\n\n* `touch` can be used to restore timestamps after processing\n\n```bash\n$ # first copy the timestamps using touch -r\n$ stat -c $'%x\\n%y' story.txt\n2017-06-24 13:00:31.773583923 +0530\n2017-06-24 12:59:53.316751651 +0530\n$ # tmp.txt is temporary empty file\n$ touch -r story.txt tmp.txt\n$ stat -c $'%x\\n%y' tmp.txt\n2017-06-24 13:00:31.773583923 +0530\n2017-06-24 12:59:53.316751651 +0530\n\n$ # after text processing, copy back the timestamps and remove temporary file\n$ sed -i 's/cat/dog/g' story.txt\n$ touch -r tmp.txt story.txt && rm tmp.txt\n$ stat -c $'%x\\n%y' story.txt\n2017-06-24 13:00:31.773583923 +0530\n2017-06-24 12:59:53.316751651 +0530\n```\n\n<br>\n\n#### <a name=\"further-reading-for-touch\"></a>Further reading for touch\n\n* `man touch` and `info touch` for more options and detailed documentation\n* [touch Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/touch?sort=votes&pageSize=15)\n\n<br>\n\n## <a name=\"file\"></a>file\n\n```bash\n$ file --version | head -n1\nfile-5.25\n\n$ man file\nFILE(1)                   BSD General Commands Manual                  FILE(1)\n\nNAME\n     file — determine file type\n\nSYNOPSIS\n     file [-bcEhiklLNnprsvzZ0] [--apple] [--extension] [--mime-encoding]\n          [--mime-type] [-e testname] [-F separator] [-f namefile]\n          [-m magicfiles] [-P 
name=value] file ...\n     file -C [-m magicfiles]\n     file [--help]\n\nDESCRIPTION\n     This manual page documents version 5.25 of the file command.\n\n     file tests each argument in an attempt to classify it.  There are three\n     sets of tests, performed in this order: filesystem tests, magic tests,\n     and language tests.  The first test that succeeds causes the file type to\n     be printed.\n...\n```\n\n<br>\n\n<br>\n\n#### <a name=\"file-type-examples\"></a>File type examples\n\n```bash\n$ file sample.txt\nsample.txt: ASCII text\n$ # without file name in output\n$ file -b sample.txt\nASCII text\n\n$ printf 'hi👍\\n' | file -\n/dev/stdin: UTF-8 Unicode text\n$ printf 'hi👍\\n' | file -i -\n/dev/stdin: text/plain; charset=utf-8\n\n$ file ch\nch:  Bourne-Again shell script, ASCII text executable\n\n$ file sunset.jpg moon.png\nsunset.jpg: JPEG image data\nmoon.png: PNG image data, 32 x 32, 8-bit/color RGBA, non-interlaced\n```\n\n* different line terminators\n\n```bash\n$ printf 'hi' | file -\n/dev/stdin: ASCII text, with no line terminators\n\n$ printf 'hi\\r' | file -\n/dev/stdin: ASCII text, with CR line terminators\n\n$ printf 'hi\\r\\n' | file -\n/dev/stdin: ASCII text, with CRLF line terminators\n\n$ printf 'hi\\n' | file -\n/dev/stdin: ASCII text\n```\n\n* find all files of particular type in current directory, for example `image` files\n\n```bash\n$ find -type f -exec bash -c '(file -b \"$0\" | grep -wq \"image data\") && echo \"$0\"' {} \\;\n./sunset.jpg\n./moon.png\n\n$ # if filenames do not contain : or newline characters\n$ find -type f -exec file {} + | awk -F: '/\\<image data\\>/{print $1}'\n./sunset.jpg\n./moon.png\n```\n\n<br>\n\n#### <a name=\"further-reading-for-file\"></a>Further reading for file\n\n* `man file` and `info file` for more options and detailed documentation\n* See also `identify` command which `describes the format and characteristics of one or more image files`\n"
  },
  {
    "path": "gnu_awk.md",
    "content": "<br> <br> <br>\n\n---\n\n:information_source: :information_source: This chapter has been converted into a better formatted ebook: https://learnbyexample.github.io/learn_gnuawk/. The ebook also has content updated for newer version of the commands, includes a chapter on regular expressions, has exercises, solutions, etc.\n\nFor markdown source and links to buy pdf/epub versions, see: https://github.com/learnbyexample/learn_gnuawk\n\n---\n\n<br> <br> <br>\n\n## <a name=\"gnu-awk\"></a>GNU awk\n\n**Table of Contents**\n\n* [Field processing](#field-processing)\n    * [Default field separation](#default-field-separation)\n    * [Specifying different input field separator](#specifying-different-input-field-separator)\n    * [Specifying different output field separator](#specifying-different-output-field-separator)\n* [Filtering](#filtering)\n    * [Idiomatic print usage](#idiomatic-print-usage)\n    * [Field comparison](#field-comparison)\n    * [Regular expressions based filtering](#regular-expressions-based-filtering)\n    * [Fixed string matching](#fixed-string-matching)\n    * [Line number based filtering](#line-number-based-filtering)\n* [Case Insensitive filtering](#case-insensitive-filtering)\n* [Changing record separators](#changing-record-separators)\n    * [Paragraph mode](#paragraph-mode)\n    * [Multicharacter RS](#multicharacter-rs)\n* [Substitute functions](#substitute-functions)\n* [Inplace file editing](#inplace-file-editing)\n* [Using shell variables](#using-shell-variables)\n* [Multiple file input](#multiple-file-input)\n* [Control Structures](#control-structures)\n    * [if-else and loops](#if-else-and-loops)\n    * [next and nextfile](#next-and-nextfile)\n* [Multiline processing](#multiline-processing)\n* [Two file processing](#two-file-processing)\n    * [Comparing whole lines](#comparing-whole-lines)\n    * [Comparing specific fields](#comparing-specific-fields)\n    * [getline](#getline)\n* [Creating new 
fields](#creating-new-fields)\n* [Dealing with duplicates](#dealing-with-duplicates)\n* [Lines between two REGEXPs](#lines-between-two-regexps)\n    * [All unbroken blocks](#all-unbroken-blocks)\n    * [Specific blocks](#specific-blocks)\n    * [Broken blocks](#broken-blocks)\n* [Arrays](#arrays)\n* [awk scripts](#awk-scripts)\n* [Miscellaneous](#miscellaneous)\n    * [FPAT and FIELDWIDTHS](#fpat-and-fieldwidths)\n    * [String functions](#string-functions)\n    * [Executing external commands](#executing-external-commands)\n    * [printf formatting](#printf-formatting)\n    * [Redirecting print output](#redirecting-print-output)\n* [Gotchas and Tips](#gotchas-and-tips)\n* [Further Reading](#further-reading)\n\n<br>\n\n```bash\n$ awk --version | head -n1\nGNU Awk 4.1.3, API: 1.1 (GNU MPFR 3.1.4, GNU MP 6.1.0)\n\n$ man awk\nGAWK(1)                        Utility Commands                        GAWK(1)\n\nNAME\n       gawk - pattern scanning and processing language\n\nSYNOPSIS\n       gawk [ POSIX or GNU style options ] -f program-file [ -- ] file ...\n       gawk [ POSIX or GNU style options ] [ -- ] program-text file ...\n\nDESCRIPTION\n       Gawk  is  the  GNU Project's implementation of the AWK programming lan‐\n       guage.  It conforms to the definition of  the  language  in  the  POSIX\n       1003.1  Standard.   This version in turn is based on the description in\n       The AWK Programming Language, by Aho, Kernighan, and Weinberger.   
Gawk\n       provides  the additional features found in the current version of Brian\n       Kernighan's awk and a number of GNU-specific extensions.\n...\n```\n\n**Prerequisites and notes**\n\n* familiarity with programming concepts like variables, printing, control structures, arrays, etc\n* familiarity with regular expressions\n    * if not, check out **ERE** portion of [GNU sed regular expressions](./gnu_sed.md#regular-expressions) which is close enough to features available in `gawk`\n* this tutorial is primarily focussed on short programs that are easily usable from command line, similar to using `grep`, `sed`, etc\n* see [Gawk: Effective AWK Programming](https://www.gnu.org/software/gawk/manual/) manual for complete reference, has information on other `awk` versions as well as notes on POSIX standard\n\n<br>\n\n## <a name=\"field-processing\"></a>Field processing\n\n<br>\n\n#### <a name=\"default-field-separation\"></a>Default field separation\n\n* `$0` contains the entire input record\n    * default input record separator is newline character\n* `$1` contains the first field text\n    * default input field separator is one or more of continuous space, tab or newline characters\n* `$2` contains the second field text and so on\n* `$(2+3)` result of expressions can be used, this one evaluates to `$5` and hence gives fifth field\n    * similarly if variable `i` has value `2`, then `$(i+3)` will give fifth field\n    * See also [gawk manual - Expressions](https://www.gnu.org/software/gawk/manual/html_node/Expressions.html)\n* `NF` is a built-in variable which contains number of fields in the current record\n    * so, `$NF` will give last field\n    * `$(NF-1)` will give second last field and so on\n\n```bash\n$ cat fruits.txt\nfruit   qty\napple   42\nbanana  31\nfig     90\nguava   6\n\n$ # print only first field\n$ awk '{print $1}' fruits.txt\nfruit\napple\nbanana\nfig\nguava\n\n$ # print only second field\n$ awk '{print $2}' 
fruits.txt\nqty\n42\n31\n90\n6\n```\n\n<br>\n\n#### <a name=\"specifying-different-input-field-separator\"></a>Specifying different input field separator\n\n* by using `-F` command line option\n* by setting `FS` variable\n* See [FPAT and FIELDWIDTHS](#fpat-and-fieldwidths) section for other ways of defining input fields\n\n```bash\n$ # second field where input field separator is :\n$ echo 'foo:123:bar:789' | awk -F: '{print $2}'\n123\n\n$ # last field\n$ echo 'foo:123:bar:789' | awk -F: '{print $NF}'\n789\n\n$ # first and last field\n$ # note the use of , and space between output fields\n$ echo 'foo:123:bar:789' | awk -F: '{print $1, $NF}'\nfoo 789\n\n$ # second last field\n$ echo 'foo:123:bar:789' | awk -F: '{print $(NF-1)}'\nbar\n\n$ # use quotes to avoid clashes with shell special characters\n$ echo 'one;two;three;four' | awk -F';' '{print $3}'\nthree\n```\n\n* Regular expressions based input field separator\n\n```bash\n$ echo 'Sample123string54with908numbers' | awk -F'[0-9]+' '{print $2}'\nstring\n\n$ # first field will be empty as there is nothing before '{'\n$ echo '{foo}   bar=baz' | awk -F'[{}= ]+' '{print $1}'\n\n$ echo '{foo}   bar=baz' | awk -F'[{}= ]+' '{print $2}'\nfoo\n$ echo '{foo}   bar=baz' | awk -F'[{}= ]+' '{print $3}'\nbar\n```\n\n* default input field separator is one or more of continuous space, tab or newline characters (will be termed as whitespace here on)\n    * exact same behavior if `FS` is assigned single space character\n* in addition, leading and trailing whitespaces won't be considered when splitting the input record\n\n```bash\n$ printf ' a    ate b\\tc   \\n'\n a    ate b     c\n$ printf ' a    ate b\\tc   \\n' | awk '{print $1}'\na\n$ printf ' a    ate b\\tc   \\n' | awk '{print NF}'\n4\n$ # same behavior if FS is assigned to single space character\n$ printf ' a    ate b\\tc   \\n' | awk -F' ' '{print $1}'\na\n$ printf ' a    ate b\\tc   \\n' | awk -F' ' '{print NF}'\n4\n\n$ # for anything else, leading/trailing whitespaces will 
be considered\n$ printf ' a    ate b\\tc   \\n' | awk -F'[ \\t]+' '{print $2}'\na\n$ printf ' a    ate b\\tc   \\n' | awk -F'[ \\t]+' '{print NF}'\n6\n```\n\n* assigning empty string to FS will split the input record character wise\n* note the use of command line option `-v` to set FS\n\n```bash\n$ echo 'apple' | awk -v FS= '{print $1}'\na\n$ echo 'apple' | awk -v FS= '{print $2}'\np\n$ echo 'apple' | awk -v FS= '{print $NF}'\ne\n\n$ # detecting multibyte characters depends on locale\n$ printf 'hi👍 how are you?' | awk -v FS= '{print $3}'\n👍\n```\n\n**Further Reading**\n\n* [gawk manual - Field Splitting Summary](https://www.gnu.org/software/gawk/manual/html_node/Field-Splitting-Summary.html#Field-Splitting-Summary)\n* [stackoverflow - explanation on default FS](https://stackoverflow.com/questions/30405694/default-field-separator-for-awk)\n* [unix.stackexchange - filter lines if it contains a particular character only once](https://unix.stackexchange.com/questions/362550/how-to-remove-line-if-it-contains-a-character-exactly-once)\n* [stackoverflow - Processing 2 files with different field separators](https://stackoverflow.com/questions/24516141/awk-processing-2-files-with-different-field-separators)\n\n<br>\n\n#### <a name=\"specifying-different-output-field-separator\"></a>Specifying different output field separator\n\n* by setting `OFS` variable\n* also gets added between every argument to `print` statement\n    * use [printf](#printf-formatting) to avoid this\n* default is single space\n\n```bash\n$ # statements inside BEGIN are executed before processing any input text\n$ echo 'foo:123:bar:789' | awk 'BEGIN{FS=OFS=\":\"} {print $1, $NF}'\nfoo:789\n$ # can also be set using command line option -v\n$ echo 'foo:123:bar:789' | awk -F: -v OFS=':' '{print $1, $NF}'\nfoo:789\n\n$ # changing a field will re-build contents of $0\n$ echo ' a      ate b   ' | awk '{$2 = \"foo\"; print $0}' | cat -A\na foo b$\n\n$ # $1=$1 is an idiomatic way to re-build when there is 
nothing else to change\n$ echo 'foo:123:bar:789' | awk -F: -v OFS='-' '{print $0}'\nfoo:123:bar:789\n$ echo 'foo:123:bar:789' | awk -F: -v OFS='-' '{$1=$1; print $0}'\nfoo-123-bar-789\n\n$ # OFS is used to separate different arguments given to print\n$ echo 'foo:123:bar:789' | awk -F: -v OFS='\\t' '{print $1, $3}'\nfoo     bar\n\n$ echo 'Sample123string54with908numbers' | awk -F'[0-9]+' '{$1=$1; print $0}'\nSample string with numbers\n```\n\n<br>\n\n## <a name=\"filtering\"></a>Filtering\n\n<br>\n\n#### <a name=\"idiomatic-print-usage\"></a>Idiomatic print usage\n\n* `print` statement with no arguments will print contents of `$0`\n* if condition is specified without corresponding statements, contents of `$0` is printed if condition evaluates to true\n* `1` is typically used to represent always true condition and thus print contents of `$0`\n\n```bash\n$ cat poem.txt\nRoses are red,\nViolets are blue,\nSugar is sweet,\nAnd so are you.\n\n$ # displaying contents of input file(s) similar to 'cat' command\n$ # equivalent to using awk '{print $0}' and awk '1'\n$ awk '{print}' poem.txt\nRoses are red,\nViolets are blue,\nSugar is sweet,\nAnd so are you.\n```\n\n<br>\n\n#### <a name=\"field-comparison\"></a>Field comparison\n\n* Each block of statements within `{}` can be prefixed by an optional condition so that those statements will execute only if condition evaluates to true\n* Condition specified without corresponding statements will lead to printing contents of `$0` if condition evaluates to true\n\n```bash\n$ # if first field exactly matches the string 'apple'\n$ awk '$1==\"apple\"{print $2}' fruits.txt\n42\n\n$ # print first field if second field > 35\n$ # NR>1 to avoid the header line\n$ # NR built-in variable contains record number\n$ awk 'NR>1 && $2>35{print $1}' fruits.txt\napple\nfig\n\n$ # print header and lines with qty < 35\n$ awk 'NR==1 || $2<35' fruits.txt\nfruit   qty\nbanana  31\nguava   6\n```\n\n* If the above examples are too confusing, think of it 
as syntactical sugar\n* Statements are grouped within `{}`\n    * inside `{}`, we have an `if` control structure\n    * As in the `C` language, braces are not needed for a single statement within `if`, but `{}` is used here for clarity\n    * From this explicit syntax, remove the outer `{}`, `if` and `()` used for `if`\n* As we'll see later, this allows a few lines of program to be written compactly on the command line itself\n    * Of course, for medium to large programs, it is better to put the code in a separate file. See [awk scripts](#awk-scripts) section\n\n```bash\n$ # awk '$1==\"apple\"{print $2}' fruits.txt\n$ awk '{\n         if($1 == \"apple\"){\n            print $2\n         }\n       }' fruits.txt\n42\n\n$ # awk 'NR==1 || $2<35' fruits.txt\n$ awk '{\n         if(NR==1 || $2<35){\n            print $0\n         }\n       }' fruits.txt\nfruit   qty\nbanana  31\nguava   6\n```\n\n**Further Reading**\n\n* [gawk manual - Truth Values and Conditions](https://www.gnu.org/software/gawk/manual/html_node/Truth-Values-and-Conditions.html)\n* [gawk manual - Operator Precedence](https://www.gnu.org/software/gawk/manual/html_node/Precedence.html)\n* [unix.stackexchange - filtering columns by header name](https://unix.stackexchange.com/questions/359697/print-columns-in-awk-by-header-name)\n\n<br>\n\n#### <a name=\"regular-expressions-based-filtering\"></a>Regular expressions based filtering\n\n* the *REGEXP* is specified within `//` and by default acts upon `$0`\n* See also [stackoverflow - lines around matching regexp](https://stackoverflow.com/questions/17908555/printing-with-sed-or-awk-a-line-following-a-matching-pattern)\n\n```bash\n$ # all lines containing the string 'are'\n$ # same as: grep 'are' poem.txt\n$ awk '/are/' poem.txt\nRoses are red,\nViolets are blue,\nAnd so are you.\n\n$ # negating REGEXP, same as: grep -v 'are' poem.txt\n$ awk '!/are/' poem.txt\nSugar is sweet,\n\n$ # same as: grep 'are' poem.txt | grep -v 'so'\n$ awk '/are/ && !/so/' poem.txt\nRoses are 
red,\nViolets are blue,\n\n$ # lines starting with 'a' or 'b'\n$ awk '/^[ab]/' fruits.txt\napple   42\nbanana  31\n\n$ # print last field of all lines containing 'are'\n$ awk '/are/{print $NF}' poem.txt\nred,\nblue,\nyou.\n```\n\n* strings can be used as well, which will be interpreted as *REGEXP* if necessary\n* Allows [using shell variables](#using-shell-variables) instead of hardcoded *REGEXP*\n    * that section also notes the difference between using `//` and a string\n\n```bash\n$ awk '$0 !~ \"are\"' poem.txt\nSugar is sweet,\n\n$ awk '$0 ~ \"^[ab]\"' fruits.txt\napple   42\nbanana  31\n\n$ # also helpful if search strings have the / delimiter character\n$ cat paths.txt\n/foo/a/report.log\n/foo/y/power.log\n$ awk '/\\/foo\\/a\\//' paths.txt\n/foo/a/report.log\n$ awk '$0 ~ \"/foo/a/\"' paths.txt\n/foo/a/report.log\n```\n\n* *REGEXP* matching against specific field\n\n```bash\n$ # if first field contains 'a'\n$ awk '$1 ~ /a/' fruits.txt\napple   42\nbanana  31\nguava   6\n\n$ # if first field contains 'a' and qty > 20\n$ awk '$1 ~ /a/ && $2 > 20' fruits.txt\napple   42\nbanana  31\n\n$ # if first field does NOT contain 'a'\n$ awk '$1 !~ /a/' fruits.txt\nfruit   qty\nfig     90\n```\n\n<br>\n\n#### <a name=\"fixed-string-matching\"></a>Fixed string matching\n\n* to search a string literally, `index` function can be used instead of *REGEXP*\n    * similar to `grep -F`\n* the function returns the starting position of the match, or `0` if no match is found\n\n```bash\n$ cat eqns.txt\na=b,a-b=c,c*d\na+b,pi=3.14,5e12\ni*(t+9-g)/8,4-a+b\n\n$ # no output since '+' is meta character, would need '/a\\+b/'\n$ awk '/a+b/' eqns.txt\n$ # same as: grep -F 'a+b' eqns.txt\n$ awk 'index($0,\"a+b\")' eqns.txt\na+b,pi=3.14,5e12\ni*(t+9-g)/8,4-a+b\n\n$ # much easier than '/i\\*\\(t\\+9-g\\)/'\n$ awk 'index($0,\"i*(t+9-g)\")' eqns.txt\ni*(t+9-g)/8,4-a+b\n\n$ # check only last field\n$ awk -F, 'index($NF,\"a+b\")' eqns.txt\ni*(t+9-g)/8,4-a+b\n$ # index not needed if entire field/line is being compared\n$ 
awk -F, '$1==\"a+b\"' eqns.txt\na+b,pi=3.14,5e12\n```\n\n* return value is useful to match at specific position\n* for ex: at start/end of line\n\n```bash\n$ # start of line\n$ awk 'index($0,\"a+b\")==1' eqns.txt\na+b,pi=3.14,5e12\n\n$ # end of line\n$ # length function returns number of characters, by default acts on $0\n$ awk 'index($0,\"a+b\")==length()-length(\"a+b\")+1' eqns.txt\ni*(t+9-g)/8,4-a+b\n$ # to avoid repetitions, save the search string in variable\n$ awk -v s=\"a+b\" 'index($0,s)==length()-length(s)+1' eqns.txt\ni*(t+9-g)/8,4-a+b\n```\n\n<br>\n\n#### <a name=\"line-number-based-filtering\"></a>Line number based filtering\n\n* Built-in variable `NR` contains total records read so far\n* Use `FNR` if you need line numbers separately for [multiple file processing](#multiple-file-processing)\n\n```bash\n$ # same as: head -n2 poem.txt | tail -n1\n$ awk 'NR==2' poem.txt\nViolets are blue,\n\n$ # print 2nd and 4th line\n$ awk 'NR==2 || NR==4' poem.txt\nViolets are blue,\nAnd so are you.\n\n$ # same as: tail -n1 poem.txt\n$ # statements inside END are executed after processing all input text\n$ awk 'END{print}' poem.txt\nAnd so are you.\n\n$ awk 'NR==4{print $2}' fruits.txt\n90\n```\n\n* for large input, use `exit` to avoid unnecessary record processing\n\n```bash\n$ seq 14323 14563435 | awk 'NR==234{print; exit}'\n14556\n\n$ # sample time comparison\n$ time seq 14323 14563435 | awk 'NR==234{print; exit}'\n14556\n\nreal    0m0.004s\nuser    0m0.004s\nsys     0m0.000s\n$ time seq 14323 14563435 | awk 'NR==234{print}'\n14556\n\nreal    0m2.167s\nuser    0m2.280s\nsys     0m0.092s\n```\n\n* See also [unix.stackexchange - filtering list of lines from every X number of lines](https://unix.stackexchange.com/questions/325985/how-to-print-lines-number-15-and-25-out-of-each-50-lines)\n\n<br>\n\n## <a name=\"case-insensitive-filtering\"></a>Case Insensitive filtering\n\n```bash\n$ # same as: grep -i 'rose' poem.txt\n$ awk -v IGNORECASE=1 '/rose/' poem.txt\nRoses are 
red,\n\n$ # for small enough set, can also use REGEXP character class\n$ awk '/[rR]ose/' poem.txt\nRoses are red,\n\n$ # another way is to use built-in string function 'tolower'\n$ awk 'tolower($0) ~ /rose/' poem.txt\nRoses are red,\n```\n\n<br>\n\n## <a name=\"changing-record-separators\"></a>Changing record separators\n\n* `RS` to change input record separator\n* default is newline character\n\n```bash\n$ s='this is a sample string'\n\n$ # space as input record separator, printing all records\n$ printf \"$s\" | awk -v RS=' ' '{print NR, $0}'\n1 this\n2 is\n3 a\n4 sample\n5 string\n\n$ # print all records containing 'a'\n$ printf \"$s\" | awk -v RS=' ' '/a/'\na\nsample\n```\n\n* `ORS` to change output record separator\n* gets added to every `print` statement\n    * use [printf](#printf-formatting) to avoid this\n* default is newline character\n\n```bash\n$ seq 3 | awk '{print $0}'\n1\n2\n3\n$ # note that there is empty line after last record\n$ seq 3 | awk -v ORS='\\n\\n' '{print $0}'\n1\n\n2\n\n3\n\n$ # dynamically changing ORS\n$ # ?: ternary operator to select between two expressions based on a condition\n$ # can also use: seq 6 | awk '{ORS = NR%2 ? \" \" : RS} 1'\n$ seq 6 | awk '{ORS = NR%2 ? \" \" : \"\\n\"} 1'\n1 2\n3 4\n5 6\n$ seq 6 | awk '{ORS = NR%3 ? \"-\" : \"\\n\"} 1'\n1-2-3\n4-5-6\n```\n\n<br>\n\n#### <a name=\"paragraph-mode\"></a>Paragraph mode\n\n* When `RS` is set to empty string, one or more consecutive empty lines is used as input record separator\n* Can also use regular expression `RS=\\n\\n+` but there are subtle differences, see [gawk manual - multiline records](https://www.gnu.org/software/gawk/manual/html_node/Multiple-Line.html). Important points from that link quoted below\n\n>However, there is an important difference between ‘RS = \"\"’ and ‘RS = \"\\n\\n+\"’. 
In the first case, leading newlines in the input data file are ignored, and if a file ends without extra blank lines after the last record, the final newline is removed from the record. In the second case, this special processing is not done\n\n>Now that the input is separated into records, the second step is to separate the fields in the records. One way to do this is to divide each of the lines into fields in the normal manner. This happens by default as the result of a special feature. When RS is set to the empty string and FS is set to a single character, the newline character always acts as a field separator. This is in addition to whatever field separations result from FS\n\n>When FS is the null string (\"\") or a regexp, this special feature of RS does not apply. It does apply to the default field separator of a single space: ‘FS = \" \"’\n\nConsider the below sample file\n\n```bash\n$ cat sample.txt\nHello World\n\nGood day\nHow are you\n\nJust do-it\nBelieve it\n\nToday is sunny\nNot a bit funny\nNo doubt you like it too\n\nMuch ado about nothing\nHe he he\n```\n\n* Filtering paragraphs\n\n```bash\n$ # print all paragraphs containing 'it'\n$ # if extra newline at end is undesirable, can use\n$ # awk -v RS= '/it/{print c++ ? 
\"\\n\" $0 : $0}' sample.txt\n$ awk -v RS= -v ORS='\\n\\n' '/it/' sample.txt\nJust do-it\nBelieve it\n\nToday is sunny\nNot a bit funny\nNo doubt you like it too\n\n$ # based on number of lines in each paragraph\n$ awk -F'\\n' -v RS= -v ORS='\\n\\n' 'NF==1' sample.txt\nHello World\n\n$ awk -F'\\n' -v RS= -v ORS='\\n\\n' 'NF==2 && /do/' sample.txt\nJust do-it\nBelieve it\n\nMuch ado about nothing\nHe he he\n\n```\n\n* Re-structuring paragraphs\n\n```bash\n$ # default FS is one or more of continuous space, tab or newline characters\n$ # default OFS is single space\n$ # so, $1=$1 will change it uniformly to single space between fields\n$ awk -v RS= '{$1=$1} 1' sample.txt\nHello World\nGood day How are you\nJust do-it Believe it\nToday is sunny Not a bit funny No doubt you like it too\nMuch ado about nothing He he he\n\n$ # a better usecase\n$ awk 'BEGIN{FS=\"\\n\"; OFS=\". \"; RS=\"\"; ORS=\"\\n\\n\"} {$1=$1} 1' sample.txt\nHello World\n\nGood day. How are you\n\nJust do-it. Believe it\n\nToday is sunny. Not a bit funny. No doubt you like it too\n\nMuch ado about nothing. 
He he he\n\n```\n\n**Further Reading**\n\n* [unix.stackexchange - filtering line surrounded by empty lines](https://unix.stackexchange.com/questions/359717/select-line-with-empty-line-above-and-under)\n* [stackoverflow - excellent example and explanation of RS and FS](https://stackoverflow.com/questions/46142118/converting-regex-to-sed-or-grep-regex)\n\n<br>\n\n#### <a name=\"multicharacter-rs\"></a>Multicharacter RS\n\n* Some marker like `Error` or `Warning` etc\n\n```bash\n$ cat report.log\nblah blah\nError: something went wrong\nmore blah\nwhatever\nError: something surely went wrong\nsome text\nsome more text\nblah blah blah\n\n$ awk -v RS='Error:' 'END{print NR-1}' report.log\n2\n$ awk -v RS='Error:' 'NR==1' report.log\nblah blah\n\n$ # filter 'Error:' block matching particular string\n$ # to preserve formatting, use: '/whatever/{print RS $0}'\n$ awk -v RS='Error:' '/whatever/' report.log\n something went wrong\nmore blah\nwhatever\n\n$ # blocks with more than 3 lines\n$ # splitting string with 3 newlines will yield 4 fields\n$ awk -F'\\n' -v RS='Error:' 'NF>4{print RS $0}' report.log\nError: something surely went wrong\nsome text\nsome more text\nblah blah blah\n\n```\n\n* Regular expression based `RS`\n    * the `RT` variable will contain string matched by `RS`\n* Note that entire input is treated as single string, so `^` and `$` anchors will apply only once - not every line\n\n```bash\n$ s='Sample123string54with908numbers'\n$ printf \"$s\" | awk -v RS='[0-9]+' 'NR==1'\nSample\n\n$ # note the relationship between record and separators\n$ printf \"$s\" | awk -v RS='[0-9]+' '{print NR \" : \" $0 \" - \" RT}'\n1 : Sample - 123\n2 : string - 54\n3 : with - 908\n4 : numbers - \n\n$ # need to be careful of empty records\n$ printf '123string54with908' | awk -v RS='[0-9]+' '{print NR \" : \" $0}'\n1 : \n2 : string\n3 : with\n$ # and newline at end of input\n$ printf '123string54with908\\n' | awk -v RS='[0-9]+' '{print NR \" : \" $0}'\n1 : \n2 : string\n3 : with\n4 : 
\n\n```\n\n* Joining lines based on specific end of line condition\n\n```bash\n$ cat msg.txt\nHello there.\nIt will rain to-\nday. Have a safe\nand pleasant jou-\nrney.\n\n$ # join lines ending with - to next line\n$ # by manipulating RS and ORS\n$ awk -v RS='-\\n' -v ORS= '1' msg.txt\nHello there.\nIt will rain today. Have a safe\nand pleasant journey.\n\n$ # by manipulating ORS alone, sub function covered in later sections\n$ awk '{ORS = sub(/-$/,\"\") ? \"\" : \"\\n\"} 1' msg.txt\nHello there.\nIt will rain today. Have a safe\nand pleasant journey.\n$ # easier: perl -pe 's/-\\n//' msg.txt as newline is still part of input line\n```\n\n* processing null terminated input\n\n```bash\n$ printf 'foo\\0bar\\0' | cat -A\nfoo^@bar^@$\n$ printf 'foo\\0bar\\0' | awk -v RS='\\0' '{print}'\nfoo\nbar\n```\n\n**Further Reading**\n\n* [gawk manual - Records](https://www.gnu.org/software/gawk/manual/html_node/Records.html#Records)\n* [unix.stackexchange - Slurp-mode in awk](https://unix.stackexchange.com/questions/304457/slurp-mode-in-awk)\n* [stackoverflow - using RS to count number of occurrences of a given string](https://stackoverflow.com/questions/45102651/how-to-grep-double-quote-followed-by-a-string-at-same-time/45102962#45102962)\n\n<br>\n\n## <a name=\"substitute-functions\"></a>Substitute functions\n\n* Use `sub` string function for replacing first occurrence\n* Use `gsub` for replacing all occurrences\n* By default, `$0` which contains input record is modified, can specify any other field or variable as needed\n\n```bash\n$ # replacing first occurrence\n$ echo '1-2-3-4-5' | awk '{sub(\"-\", \":\")} 1'\n1:2-3-4-5\n\n$ # replacing all occurrences\n$ echo '1-2-3-4-5' | awk '{gsub(\"-\", \":\")} 1'\n1:2:3:4:5\n\n$ # return value for sub/gsub is number of replacements made\n$ echo '1-2-3-4-5' | awk '{n=gsub(\"-\", \":\"); print n} 1'\n4\n1:2:3:4:5\n\n$ # // format is better suited to specify search REGEXP\n$ echo '1-2-3-4-5' | awk '{gsub(/[^-]+/, \"abc\")} 
1'\nabc-abc-abc-abc-abc\n\n$ # replacing all occurrences only for third field\n$ echo 'one;two;three;four' | awk -F';' '{gsub(\"e\", \"E\", $3)} 1'\none two thrEE four\n```\n\n* Use `gensub` to return the modified string, unlike `sub` or `gsub` which modify the target in place\n* it also supports back-references and the ability to modify a specific match\n* acts upon `$0` if target is not specified\n\n```bash\n$ # replace second occurrence\n$ echo 'foo:123:bar:baz' | awk '{$0=gensub(\":\", \"-\", 2)} 1'\nfoo:123-bar:baz\n$ # use REGEXP as needed\n$ echo 'foo:123:bar:baz' | awk '{$0=gensub(/[^:]+/, \"XYZ\", 2)} 1'\nfoo:XYZ:bar:baz\n\n$ # or print the returned string directly\n$ echo 'foo:123:bar:baz' | awk '{print gensub(\":\", \"-\", 2)}'\nfoo:123-bar:baz\n\n$ # replace third occurrence\n$ echo 'foo:123:bar:baz' | awk '{$0=gensub(/[^:]+/, \"XYZ\", 3)} 1'\nfoo:123:XYZ:baz\n\n$ # replace all occurrences, similar to gsub\n$ echo 'foo:123:bar:baz' | awk '{$0=gensub(/[^:]+/, \"XYZ\", \"g\")} 1'\nXYZ:XYZ:XYZ:XYZ\n\n$ # target other than $0\n$ echo 'foo:123:bar:baz' | awk -F: -v OFS=: '{$1=gensub(/o/, \"b\", 2, $1)} 1'\nfob:123:bar:baz\n```\n\n* back-reference examples\n* use `\\\"` within double-quotes to represent `\"` character in replacement string\n* use `\\\\1` to represent `\\1` - the first captured group and so on\n* `&` or `\\0` will back-reference entire matched string\n\n```bash\n$ # replacing last occurrence without knowing how many occurrences are there\n$ echo 'foo:123:bar:baz' | awk '{$0=gensub(/(.*):/, \"\\\\1-\", 1)} 1'\nfoo:123:bar-baz\n$ echo 'foo and bar and baz land good' | awk '{$0=gensub(/(.*)and/, \"\\\\1XYZ\", 1)} 1'\nfoo and bar and baz lXYZ good\n\n$ # use word boundaries as necessary\n$ echo 'foo and bar and baz land good' | awk '{$0=gensub(/(.*)\\<and\\>/, \"\\\\1XYZ\", 1)} 1'\nfoo and bar XYZ baz land good\n\n$ # replacing last but one\n$ echo '456:foo:123:bar:789:baz' | awk '{$0=gensub(/(.*):(.*:)/, \"\\\\1-\\\\2\", 1)} 1'\n456:foo:123:bar-789:baz\n\n$ echo 
'foo:123:bar:baz' | awk '{$0=gensub(/[^:]+/, \"\\\"&\\\"\", \"g\")} 1'\n\"foo\":\"123\":\"bar\":\"baz\"\n```\n\n* saving quotes in variables - to avoid escaping double quotes or having to use octal code for single quotes\n\n```bash\n$ echo 'foo:123:bar:baz' | awk '{$0=gensub(/[^:]+/, \"\\047&\\047\", \"g\")} 1'\n'foo':'123':'bar':'baz'\n$ echo 'foo:123:bar:baz' | awk -v sq=\"'\" '{$0=gensub(/[^:]+/, sq\"&\"sq, \"g\")} 1'\n'foo':'123':'bar':'baz'\n\n$ echo 'foo:123:bar:baz' | awk '{$0=gensub(/[^:]+/, \"\\\"&\\\"\", \"g\")} 1'\n\"foo\":\"123\":\"bar\":\"baz\"\n$ echo 'foo:123:bar:baz' | awk -v dq='\"' '{$0=gensub(/[^:]+/, dq\"&\"dq, \"g\")} 1'\n\"foo\":\"123\":\"bar\":\"baz\"\n```\n\n**Further Reading**\n\n* [gawk manual - String-Manipulation Functions](https://www.gnu.org/software/gawk/manual/html_node/String-Functions.html)\n* [gawk manual - escape processing](https://www.gnu.org/software/gawk/manual/html_node/Gory-Details.html)\n\n<br>\n\n## <a name=\"inplace-file-editing\"></a>Inplace file editing\n\n* Use this option with caution, preferably after testing that the `awk` code is working as intended\n\n```bash\n$ cat greeting.txt\nHi there\nHave a nice day\n\n$ awk -i inplace '{gsub(\"e\", \"E\")} 1' greeting.txt\n$ cat greeting.txt\nHi thErE\nHavE a nicE day\n```\n\n* Multiple input files are treated individually and changes are written back to respective files\n\n```bash\n$ cat f1\nI ate 3 apples\n$ cat f2\nI bought two bananas and 3 mangoes\n\n$ awk -i inplace '{gsub(\"3\", \"three\")} 1' f1 f2\n$ cat f1\nI ate three apples\n$ cat f2\nI bought two bananas and three mangoes\n```\n\n* to create backups of original file, set `INPLACE_SUFFIX` variable\n* **Note** that in newer versions, you have to use `inplace::suffix` instead of `INPLACE_SUFFIX`\n\n```bash\n$ awk -i inplace -v INPLACE_SUFFIX='.bkp' '{gsub(\"three\", \"3\")} 1' f1\n$ cat f1\nI ate 3 apples\n$ cat f1.bkp\nI ate three apples\n```\n\n* See [gawk manual - Enabling In-Place File 
Editing](https://www.gnu.org/software/gawk/manual/html_node/Extension-Sample-Inplace.html) for implementation details\n\n<br>\n\n## <a name=\"using-shell-variables\"></a>Using shell variables\n\n* when `awk` code is part of shell program and shell variable needs to be passed as input to `awk` code\n* for example:\n    * command line argument passed to shell script, which is in turn passed on to `awk`\n    * control structures in shell script calling `awk` with different search strings\n* See also [stackoverflow - How do I use shell variables in an awk script?](https://stackoverflow.com/questions/19075671/how-do-i-use-shell-variables-in-an-awk-script)\n\n```bash\n$ # examples tested with bash shell\n\n$ f='apple'\n$ awk -v word=\"$f\" '$1==word' fruits.txt\napple   42\n$ f='fig'\n$ awk -v word=\"$f\" '$1==word' fruits.txt\nfig     90\n\n$ q='20'\n$ awk -v threshold=\"$q\" 'NR==1 || $2>threshold' fruits.txt\nfruit   qty\napple   42\nbanana  31\nfig     90\n```\n\n* accessing shell environment variables\n\n```bash\n$ # existing environment variable\n$ awk 'BEGIN{print ENVIRON[\"PWD\"]}'\n/home/learnbyexample\n$ awk 'BEGIN{print ENVIRON[\"SHELL\"]}'\n/bin/bash\n\n$ # defined along with awk code\n$ word='hello world' awk 'BEGIN{print ENVIRON[\"word\"]}'\nhello world\n\n$ # using ENVIRON also prevents awk's interpretation of escape sequences\n$ s='a\\n=c'\n$ foo=\"$s\" awk 'BEGIN{print ENVIRON[\"foo\"]}'\na\\n=c\n$ awk -v foo=\"$s\" 'BEGIN{print foo}'\na\n=c\n```\n\n* passing *REGEXP*\n* See also [gawk manual - Using Dynamic Regexps](https://www.gnu.org/software/gawk/manual/html_node/Computed-Regexps.html)\n\n```bash\n$ s='are'\n$ # for: awk '!/are/' poem.txt\n$ awk -v s=\"$s\" '$0 !~ s' poem.txt\nSugar is sweet,\n$ # for: awk '/are/ && !/so/' poem.txt\n$ awk -v s=\"$s\" '$0 ~ s && !/so/' poem.txt\nRoses are red,\nViolets are blue,\n\n$ r='[^-]+'\n$ echo '1-2-3-4-5' | awk -v r=\"$r\" '{gsub(r, \"abc\")} 1'\nabc-abc-abc-abc-abc\n\n$ # escape sequence has to be doubled 
when string is interpreted as REGEXP\n$ s='foo and bar and baz land good'\n$ echo \"$s\" | awk '{$0=gensub(\"(.*)\\\\<and\\\\>\", \"\\\\1XYZ\", 1)} 1'\nfoo and bar XYZ baz land good\n$ # hence passing as variable should be\n$ r='(.*)\\\\<and\\\\>'\n$ echo \"$s\" | awk -v r=\"$r\" '{$0=gensub(r, \"\\\\1XYZ\", 1)} 1'\nfoo and bar XYZ baz land good\n\n$ # or use ENVIRON\n$ r='(.*)\\<and\\>'\n$ echo \"$s\" | r=\"$r\" awk '{$0=gensub(ENVIRON[\"r\"], \"\\\\1XYZ\", 1)} 1'\nfoo and bar XYZ baz land good\n```\n\n<br>\n\n## <a name=\"multiple-file-input\"></a>Multiple file input\n\n* Example to show difference between `NR` and `FNR`\n\n```bash\n$ # NR for overall record number\n$ awk 'NR==1' poem.txt greeting.txt\nRoses are red,\n\n$ # FNR for individual file's record number\n$ # same as: head -q -n1 poem.txt greeting.txt\n$ awk 'FNR==1' poem.txt greeting.txt\nRoses are red,\nHi thErE\n```\n\n* Constructs to do some processing before starting each file as well as at the end\n* `BEGINFILE` - to add code to be executed before start of each input file\n* `ENDFILE` - to add code to be executed after processing each input file\n* `FILENAME` - file name of current input file being processed\n\n```bash\n$ # similar to: tail -n1 poem.txt greeting.txt\n$ awk 'BEGINFILE{print \"file: \"FILENAME}\n       ENDFILE{print $0\"\\n------\"}' poem.txt greeting.txt\nfile: poem.txt\nAnd so are you.\n------\nfile: greeting.txt\nHavE a nicE day\n------\n```\n\n* And of course, there can be usual `awk` code\n\n```bash\n$ awk 'BEGINFILE{print \"file: \"FILENAME}\n       FNR==1;\n       ENDFILE{print \"------\"}' poem.txt greeting.txt\nfile: poem.txt\nRoses are red,\n------\nfile: greeting.txt\nHi thErE\n------\n\n$ awk 'BEGINFILE{c++; print \"file: \"FILENAME}\n       FNR==2;\n       END{print \"\\nTotal input files: \"c}' poem.txt greeting.txt\nfile: poem.txt\nViolets are blue,\nfile: greeting.txt\nHavE a nicE day\n\nTotal input files: 2\n```\n\n**Further Reading**\n\n* [gawk manual - Using ARGC 
and ARGV](https://www.gnu.org/software/gawk/manual/html_node/ARGC-and-ARGV.html)\n* [gawk manual - ARGIND](https://www.gnu.org/software/gawk/manual/html_node/Auto_002dset.html#index-ARGIND-variable)\n* [gawk manual - ERRNO](https://www.gnu.org/software/gawk/manual/html_node/Auto_002dset.html#index-ERRNO-variable)\n* [stackoverflow - Finding common value across multiple files](https://stackoverflow.com/a/43473385/4082052)\n\n<br>\n\n## <a name=\"control-structures\"></a>Control Structures\n\n* Syntax is similar to the `C` language, and single statements inside control structures don't need to be grouped within `{}`\n* See [gawk manual - Control Statements](https://www.gnu.org/software/gawk/manual/html_node/Statements.html) for details\n\nRemember that by default there is a loop that goes over all input records, and constructs like `BEGIN` and `END` fall outside that loop\n\n```bash\n$ cat nums.txt\n42\n-2\n10101\n-3.14\n-75\n$ awk '{sum += $1} END{print sum}' nums.txt\n10062.9\n\n$ # uninitialized variables will have empty string\n$ printf '' | awk '{sum += $1} END{print sum}'\n\n$ # so either add '0' or use unary '+' operator to convert to number\n$ printf '' | awk '{sum += $1} END{print +sum}'\n0\n$ awk '{sum += $1} END{print sum+0}' /dev/null\n0\n```\n\n* See also [unix.stackexchange - change in behavior of unary + with gawk version 4.2.0](https://unix.stackexchange.com/questions/421904/regression-with-unary-plus)\n\n<br>\n\n#### <a name=\"if-else-and-loops\"></a>if-else and loops\n\n* We have already seen simple `if` examples in [Filtering](#filtering) section\n* See also [gawk manual - Switch](https://www.gnu.org/software/gawk/manual/html_node/Switch-Statement.html)\n\n```bash\n$ # same as: sed -n '/are/ s/so/SO/p' poem.txt\n$ # remember that sub/gsub returns number of substitutions made\n$ awk '/are/{if(sub(\"so\", \"SO\")) print}' poem.txt\nAnd SO are you.\n$ # of course, can also use\n$ awk '/are/ && sub(\"so\", \"SO\")' poem.txt\nAnd SO are you.\n\n$ # if-else 
example\n$ awk 'NR>1{if($2>40) $0=\"+\"$0; else $0=\"-\"$0} 1' fruits.txt\nfruit   qty\n+apple   42\n-banana  31\n+fig     90\n-guava   6\n```\n\n* ternary operator\n* See also [stackoverflow - finding min and max value of a column](https://stackoverflow.com/a/29784278/4082052)\n\n```bash\n$ cat nums.txt\n42\n-2\n10101\n-3.14\n-75\n\n$ # changing -ve to +ve and vice versa\n$ # same as: awk '{if($0 ~ /^-/) sub(/^-/,\"\"); else sub(/^/,\"-\")} 1' nums.txt\n$ awk '{$0 ~ /^-/ ? sub(/^-/,\"\") : sub(/^/,\"-\")} 1' nums.txt\n-42\n2\n-10101\n3.14\n75\n$ # can also use: awk '!sub(/^-/,\"\"){sub(/^/,\"-\")} 1' nums.txt\n```\n\n* for loop\n* similar to `C` language, `break` and `continue` statements are also available\n* See also [stackoverflow - find missing numbers from sequential list](https://stackoverflow.com/questions/38491676/how-can-i-find-the-missing-integers-in-a-unique-and-sequential-list-one-per-lin)\n\n```bash\n$ awk 'BEGIN{for(i=2; i<11; i+=2) print i}'\n2\n4\n6\n8\n10\n\n$ # looping each field\n$ s='scat:cat:no cat:abdicate:cater'\n$ echo \"$s\" | awk -F: -v OFS=: '{for(i=1;i<=NF;i++) if($i==\"cat\") $i=\"CAT\"} 1'\nscat:CAT:no cat:abdicate:cater\n$ # can also use sub function\n$ echo \"$s\" | awk -F: -v OFS=: '{for(i=1;i<=NF;i++) sub(/^cat$/,\"CAT\",$i)} 1'\nscat:CAT:no cat:abdicate:cater\n```\n\n* while loop\n* do-while is also available\n\n```bash\n$ awk 'BEGIN{i=2; while(i<11){print i; i+=2}}'\n2\n4\n6\n8\n10\n\n$ # recursive substitution\n$ # here again return value of sub/gsub is useful\n$ echo 'titillate' | awk '{while( gsub(/til/, \"\") ) print}'\ntilate\nate\n```\n\n<br>\n\n#### <a name=\"next-and-nextfile\"></a>next and nextfile\n\n* `next` will skip rest of statements and start processing next line of current file being processed\n    * there is a loop by default which goes over all input records, `next` is applicable for that\n    * it is similar to `continue` statement within loops\n* it is often used in [Two file 
processing](#two-file-processing)\n\n```bash\n$ # here 'next' is used to skip processing header line\n$ awk 'NR==1{print; next} /a.*a/{$0=\"*\"$0} /[eiou]/{$0=\"-\"$0} 1' fruits.txt\nfruit   qty\n-apple   42\n*banana  31\n-fig     90\n-*guava   6\n```\n\n* `nextfile` is useful to skip remaining lines from current file being processed and move on to next file\n\n```bash\n$ # same as: head -q -n1 poem.txt greeting.txt fruits.txt\n$ awk 'FNR>1{nextfile} 1' poem.txt greeting.txt fruits.txt\nRoses are red,\nHi thErE\nfruit   qty\n\n$ # specific field\n$ awk 'FNR>2{nextfile} {print $1}' poem.txt greeting.txt fruits.txt\nRoses\nViolets\nHi\nHavE\nfruit\napple\n\n$ # similar to 'grep -il'\n$ awk -v IGNORECASE=1 '/red/{print FILENAME; nextfile}' *\ncolors_1.txt\ncolors_2.txt\npoem.txt\n$ awk -v IGNORECASE=1 '$1 ~ /red/{print FILENAME; nextfile}' *\ncolors_1.txt\ncolors_2.txt\n```\n\n<br>\n\n## <a name=\"multiline-processing\"></a>Multiline processing\n\n* Processing consecutive lines\n\n```bash\n$ cat poem.txt\nRoses are red,\nViolets are blue,\nSugar is sweet,\nAnd so are you.\n\n$ # match two consecutive lines\n$ awk 'p~/are/ && /is/{print p ORS $0} {p=$0}' poem.txt\nViolets are blue,\nSugar is sweet,\n$ # if only the second line is needed\n$ awk 'p~/are/ && /is/; {p=$0}' poem.txt\nSugar is sweet,\n\n$ # match three consecutive lines\n$ awk 'p2~/red/ && p1~/blue/ && /is/{print p2} {p2=p1; p1=$0}' poem.txt\nRoses are red,\n\n$ # common mistake\n$ sed -n '/are/{N;/is/p}' poem.txt\n$ # would need something like this and not practical to extend for other cases\n$ sed '$!N; /are.*\\n.*is/p; D' poem.txt\nViolets are blue,\nSugar is sweet,\n```\n\nConsider this sample input file\n\n```bash\n$ cat range.txt\nfoo\nBEGIN\n1234\n6789\nEND\nbar\nBEGIN\na\nb\nc\nEND\nbaz\n```\n\n* extracting lines around matching line\n* See also [stackoverflow - lines around matching regexp](https://stackoverflow.com/questions/17908555/printing-with-sed-or-awk-a-line-following-a-matching-pattern)\n* 
how `n && n--` works:\n    * need to note that right hand side of `&&` is processed only if left hand side is `true`\n    * so for example, if initially `n=2`, then we get\n        * `2 && 2; n=1` - evaluates to `true`\n        * `1 && 1; n=0` - evaluates to `true`\n        * `0 && ` - evaluates to `false` ... no decrementing `n` and hence will be `false` until `n` is re-assigned non-zero value\n\n```bash\n$ # similar to: grep --no-group-separator -A1 'BEGIN' range.txt\n$ awk '/BEGIN/{n=2} n && n--' range.txt\nBEGIN\n1234\nBEGIN\na\n\n$ # only print the line after matching line\n$ # can also use: awk '/BEGIN/{n=1; next} n && n--' range.txt\n$ awk 'n && n--; /BEGIN/{n=1}' range.txt\n1234\na\n$ # generic case: print nth line after match\n$ awk 'n && !--n; /BEGIN/{n=3}' range.txt\nEND\nc\n\n$ # print second line prior to matched line\n$ awk '/END/{print p2} {p2=p1; p1=$0}' range.txt\n1234\nb\n$ # save all lines in an array for generic case\n$ # NR>n is checked to avoid printing empty line if there is a match\n$ # within first n lines\n$ awk -v n=3 '/BEGIN/ && NR>n{print a[NR-n]} {a[NR]=$0}' range.txt\n6789\n$ # or, use the reversing trick\n$ tac range.txt | awk 'n && !--n; /END/{n=3}' | tac\nBEGIN\na\n```\n\n* Checking if multiple strings are present at least once in entire input file\n* If there are lots of strings to check, use arrays\n\n```bash\n$ # can also use BEGINFILE instead of FNR==1\n$ awk 'FNR==1{s1=s2=0} /is/{s1=1} /are/{s2=1} s1&&s2{print FILENAME; nextfile}' *\npoem.txt\nsample.txt\n\n$ awk 'FNR==1{s1=s2=0} /foo/{s1=1} /report/{s2=1} s1&&s2{print FILENAME; nextfile}' *\npaths.txt\n```\n\n**Further Reading**\n\n* [stackoverflow - delete line based on content of previous/next lines](https://stackoverflow.com/questions/49112877/delete-line-if-line-matches-foo-line-above-matches-bar-and-line-below-match)\n* [softwareengineering - FSM examples](https://softwareengineering.stackexchange.com/questions/47806/examples-of-finite-state-machines)\n* [wikipedia - 
FSM](https://en.wikipedia.org/wiki/Finite-state_machine)\n\n<br>\n\n## <a name=\"two-file-processing\"></a>Two file processing\n\n* We'll use awk's associative arrays (key-value pairs) here\n    * key can be number or string\n    * See also [gawk manual - Arrays](https://www.gnu.org/software/gawk/manual/html_node/Arrays.html)\n* Unlike [comm](./sorting_stuff.md#comm) the input files need not be sorted and comparison can be done based on certain field(s) as well\n\n<br>\n\n#### <a name=\"comparing-whole-lines\"></a>Comparing whole lines\n\nConsider the following test files\n\n```bash\n$ cat colors_1.txt\nBlue\nBrown\nPurple\nRed\nTeal\nYellow\n\n$ cat colors_2.txt\nBlack\nBlue\nGreen\nRed\nWhite\n```\n\n* common lines and lines unique to one of the files\n* For two files as input, `NR==FNR` will be true only when first file is being processed\n* Using `next` will skip rest of code when first file is processed\n* `a[$0]` will create unique keys (here entire line content is used as key) in array `a`\n    * just referencing a key will create it if it doesn't already exist, with value as empty string (will also act as zero in numeric context)\n* `$0 in a` will be true if key already exists in array `a`\n\n```bash\n$ # common lines\n$ # same as: grep -Fxf colors_1.txt colors_2.txt\n$ awk 'NR==FNR{a[$0]; next} $0 in a' colors_1.txt colors_2.txt\nBlue\nRed\n\n$ # lines from colors_2.txt not present in colors_1.txt\n$ # same as: grep -vFxf colors_1.txt colors_2.txt\n$ awk 'NR==FNR{a[$0]; next} !($0 in a)' colors_1.txt colors_2.txt\nBlack\nGreen\nWhite\n\n$ # reversing the order of input files gives\n$ # lines from colors_1.txt not present in colors_2.txt\n$ awk 'NR==FNR{a[$0]; next} !($0 in a)' colors_2.txt colors_1.txt\nBrown\nPurple\nTeal\nYellow\n```\n\n<br>\n\n#### <a name=\"comparing-specific-fields\"></a>Comparing specific fields\n\nConsider the sample input file\n\n```bash\n$ cat marks.txt\nDept    Name    Marks\nECE     Raj     53\nECE     Joel    72\nEEE     Moi    
 68\nCSE     Surya   81\nEEE     Tia     59\nECE     Om      92\nCSE     Amy     67\n```\n\n* single field\n* For ex: only first field comparison by using `$1` instead of `$0` as key\n\n```bash\n$ cat list1\nECE\nCSE\n\n$ # extract only lines matching first field specified in list1\n$ awk 'NR==FNR{a[$1]; next} $1 in a' list1 marks.txt\nECE     Raj     53\nECE     Joel    72\nCSE     Surya   81\nECE     Om      92\nCSE     Amy     67\n\n$ # if header is needed as well\n$ awk 'NR==FNR{a[$1]; next} FNR==1 || $1 in a' list1 marks.txt\nDept    Name    Marks\nECE     Raj     53\nECE     Joel    72\nCSE     Surya   81\nECE     Om      92\nCSE     Amy     67\n```\n\n* multiple fields\n* create a string by adding some character between the fields to act as key\n    * for ex: to avoid matching two field values `abc` and `123` to match with two other field values `ab` and `c123`\n    * by adding character, say `_`, the key would be `abc_123` for first case and `ab_c123` for second case\n    * this can still lead to false match if input data has `_`\n    * there is also a built-in way to do this using [gawk manual - Multidimensional Arrays](https://www.gnu.org/software/gawk/manual/html_node/Multidimensional.html#Multidimensional)\n\n```bash\n$ cat list2\nEEE Moi\nCSE Amy\nECE Raj\n\n$ # extract only lines matching both fields specified in list2\n$ awk 'NR==FNR{a[$1\"_\"$2]; next} $1\"_\"$2 in a' list2 marks.txt\nECE     Raj     53\nEEE     Moi     68\nCSE     Amy     67\n\n$ # uses SUBSEP as separator, whose default value is non-printing character \\034\n$ awk 'NR==FNR{a[$1,$2]; next} ($1,$2) in a' list2 marks.txt\nECE     Raj     53\nEEE     Moi     68\nCSE     Amy     67\n```\n\n* field and value comparison\n\n```bash\n$ cat list3\nECE 70\nEEE 65\nCSE 80\n\n$ # extract line matching Dept and minimum marks specified in list3\n$ awk 'NR==FNR{d[$1]=$2; next} $1 in d && $3 >= d[$1]' list3 marks.txt\nECE     Joel    72\nEEE     Moi     68\nCSE     Surya   81\nECE     Om      
92\n```\n\n<br>\n\n#### <a name=\"getline\"></a>getline\n\n* `getline` is an alternative way to read from a file and could be faster than `NR==FNR` method for some cases\n* But use it with caution\n    * [gawk manual - getline](https://www.gnu.org/software/gawk/manual/html_node/Getline.html) for details, especially about corner cases, errors, etc\n    * [getline caveats](https://web.archive.org/web/20170524214527/http://awk.freeshell.org/AllAboutGetline)\n    * [gawk manual - Closing Input and Output Redirections](https://www.gnu.org/software/gawk/manual/html_node/Close-Files-And-Pipes.html) if you have to start from beginning of file again\n* `getline` return value: `1` if record is found, `0` if end of file, `-1` for errors such as file not found (use `ERRNO` variable to get details)\n\n```bash\n$ # replace mth line in poem.txt with nth line from nums.txt\n$ # return value handling is not shown here, but should be done ideally\n$ awk -v m=3 -v n=2 'BEGIN{while(n-- > 0) getline s < \"nums.txt\"}\n                     FNR==m{$0=s} 1' poem.txt\nRoses are red,\nViolets are blue,\n-2\nAnd so are you.\n\n$ # without getline, but slower due to NR==FNR check for every line processed\n$ awk -v m=3 -v n=2 'NR==FNR{if(FNR==n){s=$0; nextfile} next}\n                     FNR==m{$0=s} 1' nums.txt poem.txt\nRoses are red,\nViolets are blue,\n-2\nAnd so are you.\n\n$ # Note that if nums.txt has less than n lines:\n$ # getline version will use last line of nums.txt if any\n$ # NR==FNR version will give empty string as 's' would be uninitialized\n```\n\n* Another use case is if two files are to be processed simultaneously\n\n```bash\n$ # print line from fruits.txt if corresponding line from nums.txt is +ve number\n$ # the return value check ensures corresponding line number comparison\n$ awk -v file='nums.txt' '(getline num < file)==1 && num>0' fruits.txt\nfruit   qty\nbanana  31\n\n$ # without getline, but has to save entire file in array\n$ awk 'NR==FNR{n[FNR]=$0; next} 
n[FNR]>0' nums.txt fruits.txt\nfruit   qty\nbanana  31\n```\n\n* error handling\n\n```bash\n$ awk 'NR==FNR{n[FNR]=$0; next} n[FNR]>0' xyz.txt fruits.txt\nawk: fatal: cannot open file 'xyz.txt' for reading (No such file or directory)\n\n$ awk -v file='xyz.txt' '{ e=(getline num < file);\n                           if(e<0){print file \": \" ERRNO; exit} }\n                         e==1 && num>0' fruits.txt\nxyz.txt: No such file or directory\n```\n\n**Further Reading**\n\n* [stackoverflow - Fastest way to find lines of a text file from another larger text file](https://stackoverflow.com/questions/42239179/fastest-way-to-find-lines-of-a-text-file-from-another-larger-text-file-in-bash)\n* [unix.stackexchange - filter lines based on line numbers specified in another file](https://unix.stackexchange.com/questions/320651/read-numbers-from-control-file-and-extract-matching-line-numbers-from-the-data-f)\n* [stackoverflow - three file processing to extract a matrix subset](https://stackoverflow.com/questions/45036019/how-to-filter-the-values-from-selected-columns-and-rows)\n* [unix.stackexchange - column wise merging](https://unix.stackexchange.com/questions/294145/merging-two-files-one-column-at-a-time)\n* [stackoverflow - extract specific rows from a text file using an index file](https://stackoverflow.com/questions/40595990/print-many-specific-rows-from-a-text-file-using-an-index-file)\n\n<br>\n\n## <a name=\"creating-new-fields\"></a>Creating new fields\n\n* Number of fields in input record can be changed by simply manipulating `NF`\n\n```bash\n$ # reducing fields\n$ echo 'foo,bar,123,baz' | awk -F, -v OFS=, '{NF=2} 1'\nfoo,bar\n\n$ # creating new empty field(s)\n$ echo 'foo,bar,123,baz' | awk -F, -v OFS=, '{NF=5} 1'\nfoo,bar,123,baz,\n\n$ # assigning to field greater than NF will create empty fields as needed\n$ echo 'foo,bar,123,baz' | awk -F, -v OFS=, '{$7=42} 1'\nfoo,bar,123,baz,,,42\n```\n\n* adding a field based on existing fields\n\n```bash\n$ # adding a new 
'Grade' field\n$ awk 'BEGIN{OFS=\"\\t\"; g[9]=\"S\"; g[8]=\"A\"; g[7]=\"B\"; g[6]=\"C\"; g[5]=\"D\"}\n      {NF++; $NF = NR==1 ? \"Grade\" : g[int($(NF-1)/10)]} 1' marks.txt\nDept    Name    Marks   Grade\nECE     Raj     53      D\nECE     Joel    72      B\nEEE     Moi     68      C\nCSE     Surya   81      A\nEEE     Tia     59      D\nECE     Om      92      S\nCSE     Amy     67      C\n\n$ # can also use split (covered in a later section)\n$ # array assignment: split(\"DCBAS\",g,//)\n$ # index adjustment: g[int($(NF-1)/10)-4]\n```\n\n* two file example\n\n```bash\n$ cat list4\nRaj class_rep\nAmy sports_rep\nTia placement_rep\n\n$ awk -v OFS='\\t' 'NR==FNR{r[$1]=$2; next}\n         {$(NF+1) = FNR==1 ? \"Role\" : r[$2]} 1' list4 marks.txt\nDept    Name    Marks   Role\nECE     Raj     53      class_rep\nECE     Joel    72\nEEE     Moi     68\nCSE     Surya   81\nEEE     Tia     59      placement_rep\nECE     Om      92\nCSE     Amy     67      sports_rep\n```\n\n<br>\n\n## <a name=\"dealing-with-duplicates\"></a>Dealing with duplicates\n\n* default value of uninitialized variable is `0` in numeric context and empty string in text context\n    * and evaluates to `false` when used conditionally\n\n*Illustration to show default numeric value and array in action*\n\n```bash\n$ printf 'mad\\n42\\n42\\ndam\\n42\\n'\nmad\n42\n42\ndam\n42\n\n$ printf 'mad\\n42\\n42\\ndam\\n42\\n' | awk '{print $0 \"\\t\" int(a[$0]); a[$0]++}'\nmad     0\n42      0\n42      1\ndam     0\n42      2\n$ # only those entries with second column value zero will be retained\n$ printf 'mad\\n42\\n42\\ndam\\n42\\n' | awk '!a[$0]++'\nmad\n42\ndam\n```\n\n* first, examples that retain only first copy of duplicates\n* See also [iridakos: remove duplicates](https://iridakos.com/how-to/2019/05/16/remove-duplicate-lines-preserving-order-linux.html) for a detailed explanation\n* See also [stackoverflow - add a letter to duplicate 
entries](https://stackoverflow.com/questions/47774779/add-letter-to-second-third-fourth-occurrence-of-a-string)\n\n```bash\n$ cat duplicates.txt\nabc  7   4\nfood toy ****\nabc  7   4\ntest toy 123\ngood toy ****\n\n$ # whole line\n$ awk '!seen[$0]++' duplicates.txt\nabc  7   4\nfood toy ****\ntest toy 123\ngood toy ****\n\n$ # particular column\n$ awk '!seen[$2]++' duplicates.txt\nabc  7   4\nfood toy ****\n\n$ # total count\n$ awk '!seen[$2]++{c++} END{print +c}' duplicates.txt\n2\n```\n\n* if input is so large that integer numbers can overflow\n* See also [gawk manual - Arbitrary-Precision Integer Arithmetic](https://www.gnu.org/software/gawk/manual/html_node/Arbitrary-Precision-Integers.html)\n\n```bash\n$ # avoid unnecessary counting altogether\n$ awk '!($2 in seen); {seen[$2]}' duplicates.txt\nabc  7   4\nfood toy ****\n\n$ # use arbitrary-precision integers, limited only by available memory\n$ awk -M '!($2 in seen){c++} {seen[$2]} END{print +c}' duplicates.txt\n2\n```\n\n* For multiple fields, separate them using `,` or form a string with some character in between\n    * choose a character unlikely to appear in input data, else there can be false matches\n    * `FS` is a good choice as fields wouldn't contain separator character(s)\n\n```bash\n$ awk '!seen[$2 FS $3]++' duplicates.txt\nabc  7   4\nfood toy ****\ntest toy 123\n\n$ # can also use simulated multidimensional array\n$ # SUBSEP, whose default is \\034 non-printing character, is used as separator\n$ awk '!seen[$2,$3]++' duplicates.txt\nabc  7   4\nfood toy ****\ntest toy 123\n```\n\n* retaining specific numbered copy\n\n```bash\n$ # second occurrence of duplicate\n$ awk '++seen[$2]==2' duplicates.txt\nabc  7   4\ntest toy 123\n\n$ # third occurrence of duplicate\n$ awk '++seen[$2]==3' duplicates.txt\ngood toy ****\n```\n\n* retaining only last copy of duplicate\n\n```bash\n$ # reverse the input line-wise, retain first copy and then reverse again\n$ tac duplicates.txt | awk '!seen[$2]++' | tac\nabc  
7   4\ngood toy ****\n```\n\n* filtering based on duplicate count\n* allows emulating the [uniq](./sorting_stuff.md#uniq) command for specific fields\n* See also [unix.stackexchange - retain only parent directory paths](https://unix.stackexchange.com/questions/362571/filter-out-paths-from-a-text-file-that-are-deeper-than-their-immediate-predecces)\n\n```bash\n$ # all duplicates based on 1st column\n$ awk 'NR==FNR{a[$1]++; next} a[$1]>1' duplicates.txt duplicates.txt\nabc  7   4\nabc  7   4\n$ # all duplicates based on 3rd column\n$ awk 'NR==FNR{a[$3]++; next} a[$3]>1' duplicates.txt duplicates.txt\nabc  7   4\nfood toy ****\nabc  7   4\ngood toy ****\n\n$ # more than 2 duplicates based on 2nd column\n$ awk 'NR==FNR{a[$2]++; next} a[$2]>2' duplicates.txt duplicates.txt\nfood toy ****\ntest toy 123\ngood toy ****\n\n$ # only unique lines based on 3rd column\n$ awk 'NR==FNR{a[$3]++; next} a[$3]==1' duplicates.txt duplicates.txt\ntest toy 123\n```\n\n<br>\n\n## <a name=\"lines-between-two-regexps\"></a>Lines between two REGEXPs\n\n* This section deals with filtering lines bound by two *REGEXP*s (referred to as blocks)\n* For simplicity, the two *REGEXP*s used in most of the examples below are the strings **BEGIN** and **END**\n\n<br>\n\n#### <a name=\"all-unbroken-blocks\"></a>All unbroken blocks\n\nConsider the sample input file below, which doesn't have any broken blocks (i.e. **BEGIN** and **END** are always present in pairs)\n\n```bash\n$ cat range.txt\nfoo\nBEGIN\n1234\n6789\nEND\nbar\nBEGIN\na\nb\nc\nEND\nbaz\n```\n\n* Extracting lines between starting and ending *REGEXP*\n\n```bash\n$ # include both starting/ending REGEXP\n$ # can also use: awk '/BEGIN/,/END/' range.txt\n$ # which is similar to sed -n '/BEGIN/,/END/p'\n$ # but not suitable to extend for other cases\n$ awk '/BEGIN/{f=1} f; /END/{f=0}' range.txt\nBEGIN\n1234\n6789\nEND\nBEGIN\na\nb\nc\nEND\n\n$ # exclude both starting/ending REGEXP\n$ # can also use: awk '/BEGIN/{f=1; next} /END/{f=0} f' range.txt\n$ awk 
'/END/{f=0} f; /BEGIN/{f=1}' range.txt\n1234\n6789\na\nb\nc\n```\n\n* Include only start or end *REGEXP*\n\n```bash\n$ # include only starting REGEXP\n$ awk '/BEGIN/{f=1} /END/{f=0} f' range.txt\nBEGIN\n1234\n6789\nBEGIN\na\nb\nc\n\n$ # include only ending REGEXP\n$ awk 'f; /END/{f=0} /BEGIN/{f=1}' range.txt\n1234\n6789\nEND\na\nb\nc\nEND\n```\n\n* Extracting lines other than lines between the two *REGEXP*s\n\n```bash\n$ awk '/BEGIN/{f=1} !f; /END/{f=0}' range.txt\nfoo\nbar\nbaz\n\n$ # the other three cases would be\n$ awk '/END/{f=0} !f; /BEGIN/{f=1}' range.txt\n$ awk '!f; /BEGIN/{f=1} /END/{f=0}' range.txt\n$ awk '/BEGIN/{f=1} /END/{f=0} !f' range.txt\n```\n\n<br>\n\n#### <a name=\"specific-blocks\"></a>Specific blocks\n\n* Getting first block\n\n```bash\n$ awk '/BEGIN/{f=1} f; /END/{exit}' range.txt\nBEGIN\n1234\n6789\nEND\n\n$ # use other tricks discussed in previous section as needed\n$ awk '/END/{exit} f; /BEGIN/{f=1}' range.txt\n1234\n6789\n```\n\n* Getting last block\n\n```bash\n$ # reverse input linewise, change the order of REGEXPs, finally reverse again\n$ tac range.txt | awk '/END/{f=1} f; /BEGIN/{exit}' | tac\nBEGIN\na\nb\nc\nEND\n\n$ # or, save the blocks in a buffer and print the last one alone\n$ # ORS contains output record separator, which is newline by default\n$ seq 30 | awk '/4/{f=1; b=$0; next} f{b=b ORS $0} /6/{f=0} END{print b}'\n24\n25\n26\n```\n\n* Getting blocks based on a counter\n\n```bash\n$ # all blocks\n$ seq 30 | sed -n '/4/,/6/p'\n4\n5\n6\n14\n15\n16\n24\n25\n26\n\n$ # get only 2nd block\n$ # can also use: seq 30 | awk -v b=2 '/4/{c++} c==b{print; if(/6/) exit}'\n$ seq 30 | awk -v b=2 '/4/{c++} c==b; /6/ && c==b{exit}'\n14\n15\n16\n\n$ # to get all blocks greater than 'b' blocks\n$ seq 30 | awk -v b=1 '/4/{f=1; c++} f && c>b; /6/{f=0}'\n14\n15\n16\n24\n25\n26\n```\n\n* excluding a particular block\n\n```bash\n$ # excludes 2nd block\n$ seq 30 | awk -v b=2 '/4/{f=1; c++} f && c!=b; /6/{f=0}'\n4\n5\n6\n24\n25\n26\n```\n\n<br>\n\n#### 
<a name=\"broken-blocks\"></a>Broken blocks\n\n* If there are blocks with ending *REGEXP* but without corresponding start, `awk '/BEGIN/{f=1} f; /END/{f=0}'` will suffice\n* Consider the modified input file where starting *REGEXP* doesn't have corresponding ending\n\n```bash\n$ cat broken_range.txt\nfoo\nBEGIN\n1234\n6789\nEND\nbar\nBEGIN\na\nb\nc\nbaz\n\n$ # the file reversing trick comes in handy here as well\n$ tac broken_range.txt | awk '/END/{f=1} f; /BEGIN/{f=0}' | tac\nBEGIN\n1234\n6789\nEND\n```\n\n* But if both kinds of broken blocks are present, accumulate the records and print accordingly\n\n```bash\n$ cat multiple_broken.txt\nqqqqqqq\nBEGIN\nfoo\nBEGIN\n1234\n6789\nEND\nbar\nEND\n0-42-1\nBEGIN\na\nBEGIN\nb\nEND\nxyzabc\n\n$ awk '/BEGIN/{f=1; buf=$0; next}\n       f{buf=buf ORS $0}\n       /END/{f=0; if(buf) print buf; buf=\"\"}' multiple_broken.txt\nBEGIN\n1234\n6789\nEND\nBEGIN\nb\nEND\n```\n\n**Further Reading**\n\n* [stackoverflow - select lines between two regexps](https://stackoverflow.com/questions/38972736/how-to-select-lines-between-two-patterns)\n* [unix.stackexchange - print only blocks with lines > n](https://unix.stackexchange.com/questions/295600/deleting-lines-between-rows-in-a-text-file-using-awk-or-sed)\n* [unix.stackexchange - print a block only if it contains matching string](https://unix.stackexchange.com/a/335523/109046)\n* [unix.stackexchange - print a block matching two different strings](https://unix.stackexchange.com/questions/347368/grep-with-range-and-pass-three-filters)\n* [unix.stackexchange - extract block up to 2nd occurrence of ending REGEXP](https://unix.stackexchange.com/questions/404175/using-awk-to-print-lines-from-one-match-through-a-second-instance-of-a-separate)\n\n<br>\n\n## <a name=\"arrays\"></a>Arrays\n\nWe've already seen examples using arrays, some more examples discussed in this section\n\n* array looping\n\n```bash\n$ # average marks for each department\n$ awk 'NR>1{d[$1]+=$3; c[$1]++} END{for(i in d)print 
i, d[i]/c[i]}' marks.txt\nECE 72.3333\nEEE 63.5\nCSE 74\n```\n\n* Sorting\n* See [gawk manual - Predefined Array Scanning Orders](https://www.gnu.org/software/gawk/manual/html_node/Controlling-Scanning.html#Controlling-Scanning) for more details\n\n```bash\n$ # by default, keys are traversed in random order\n$ awk 'BEGIN{a[\"z\"]=1; a[\"x\"]=12; a[\"b\"]=42; for(i in a)print i, a[i]}'\nx 12\nz 1\nb 42\n\n$ # index sorted ascending order as strings\n$ awk 'BEGIN{PROCINFO[\"sorted_in\"] = \"@ind_str_asc\";\n       a[\"z\"]=1; a[\"x\"]=12; a[\"b\"]=42; for(i in a)print i, a[i]}'\nb 42\nx 12\nz 1\n\n$ # value sorted ascending order as numbers\n$ awk 'BEGIN{PROCINFO[\"sorted_in\"] = \"@val_num_asc\";\n       a[\"z\"]=1; a[\"x\"]=12; a[\"b\"]=42; for(i in a)print i, a[i]}'\nz 1\nx 12\nb 42\n```\n\n* deleting array elements\n\n```bash\n$ cat list5\nCSE     Surya   75\nEEE     Jai     69\nECE     Kal     83\n\n$ # update entry if a match is found\n$ # else append the new entries\n$ awk '{ky=$1\"_\"$2} NR==FNR{upd[ky]=$0; next}\n        ky in upd{$0=upd[ky]; delete upd[ky]} 1;\n        END{for(i in upd)print upd[i]}' list5 marks.txt\nDept    Name    Marks\nECE     Raj     53\nECE     Joel    72\nEEE     Moi     68\nCSE     Surya   75\nEEE     Tia     59\nECE     Om      92\nCSE     Amy     67\nECE     Kal     83\nEEE     Jai     69\n```\n\n* true multidimensional arrays\n* length of sub-arrays need not be same. 
See [gawk manual - Arrays of Arrays](https://www.gnu.org/software/gawk/manual/html_node/Arrays-of-Arrays.html#Arrays-of-Arrays) for details\n\n```bash\n$ awk 'NR>1{d[$1][$2]=$3} END{for(i in d[\"ECE\"])print i}' marks.txt\nJoel\nRaj\nOm\n\n$ awk -v f='CSE' 'NR>1{d[$1][$2]=$3} END{for(i in d[f])print i, d[f][i]}' marks.txt\nSurya 81\nAmy 67\n```\n\n**Further Reading**\n\n* [gawk manual - all array topics](https://www.gnu.org/software/gawk/manual/html_node/Arrays.html)\n* [unix.stackexchange - count words based on length](https://unix.stackexchange.com/questions/396855/is-there-an-easy-way-to-count-characters-in-words-in-file-from-terminal)\n* [unix.stackexchange - filtering specific lines](https://unix.stackexchange.com/a/326215/109046)\n\n<br>\n\n## <a name=\"awk-scripts\"></a>awk scripts\n\n* For larger programs, save the code in a file and use `-f` command line option\n* `;` is not needed to terminate a statement\n* See also [gawk manual - Command-Line Options](https://www.gnu.org/software/gawk/manual/html_node/Options.html#Options) for other related options\n\n```bash\n$ cat buf.awk\n/BEGIN/{\n    f=1\n    buf=$0\n    next\n}\n\nf{\n    buf=buf ORS $0\n}\n\n/END/{\n    f=0\n    if(buf)\n        print buf\n    buf=\"\"\n}\n\n$ awk -f buf.awk multiple_broken.txt\nBEGIN\n1234\n6789\nEND\nBEGIN\nb\nEND\n```\n\n* Another advantage is that single quotes can be freely used\n\n```bash\n$ echo 'foo:123:bar:baz' | awk '{$0=gensub(/[^:]+/, \"\\047&\\047\", \"g\")} 1'\n'foo':'123':'bar':'baz'\n\n$ cat quotes.awk\n{\n    $0 = gensub(/[^:]+/, \"'&'\", \"g\")\n}\n\n1\n\n$ echo 'foo:123:bar:baz' | awk -f quotes.awk\n'foo':'123':'bar':'baz'\n```\n\n* If the code has been first tried out on command line, add `-o` option to get a pretty printed version\n\n```bash\n$ awk -o -v OFS='\\t' 'NR==FNR{r[$1]=$2; next}\n         {$(NF+1) = FNR==1 ? 
\"Role\" : r[$2]} 1' list4 marks.txt\nDept    Name    Marks   Role\nECE     Raj     53      class_rep\nECE     Joel    72\nEEE     Moi     68\nCSE     Surya   81\nEEE     Tia     59      placement_rep\nECE     Om      92\nCSE     Amy     67      sports_rep\n```\n\nFile name can be passed along `-o` option, otherwise by default `awkprof.out` will be used\n\n```bash\n$ cat awkprof.out\n        # gawk profile, created Mon Mar 16 10:11:11 2020\n\n        # Rule(s)\n\n        NR == FNR {\n                r[$1] = $2\n                next\n        }\n\n        {\n                $(NF + 1) = (FNR == 1 ? \"Role\" : r[$2])\n        }\n\n        1 {\n                print $0\n        }\n\n$ # note that other command line options have to be provided as usual\n$ # for ex: awk -v OFS='\\t' -f awkprof.out list4 marks.txt\n```\n\n<br>\n\n## <a name=\"miscellaneous\"></a>Miscellaneous\n\n<br>\n\n#### <a name=\"fpat-and-fieldwidths\"></a>FPAT and FIELDWIDTHS\n\n* `FS` allows to define field separator\n* In contrast, `FPAT` allows to define what should the fields be made up of\n* See also [gawk manual - Defining Fields by Content](https://www.gnu.org/software/gawk/manual/html_node/Splitting-By-Content.html)\n\n```bash\n$ s='Sample123string54with908numbers'\n$ # define fields to be one or more consecutive digits\n$ echo \"$s\" | awk -v FPAT='[0-9]+' '{print $1, $2, $3}'\n123 54 908\n$ # define fields to be one or more consecutive alphabets\n$ echo \"$s\" | awk -v FPAT='[a-zA-Z]+' '{print $1, $2, $3, $4}'\nSample string with numbers\n```\n\n* For simpler **csv** input having quoted strings if fields themselves have `,` in them, using `FPAT` is reasonable approach\n* Use a proper parser if input can have other cases like newlines in fields\n    * See [unix.stackexchange - using csv parser](https://unix.stackexchange.com/a/238192) for a sample program in `perl`\n\n```bash\n$ s='foo,\"bar,123\",baz,abc'\n$ echo \"$s\" | awk -F, '{print $2}'\n\"bar\n$ echo \"$s\" | awk -v 
FPAT='\"[^\"]*\"|[^,]*' '{print $2}'\n\"bar,123\"\n```\n\n* if input has well defined fields based on number of characters, `FIELDWIDTHS` can be used to specify width of each field\n\n```bash\n$ awk -v FIELDWIDTHS='8 3' -v OFS= '/fig/{$2=35} 1' fruits.txt\nfruit   qty\napple   42\nbanana  31\nfig     35\nguava   6\n\n$ # without FIELDWIDTHS\n$ awk '/fig/{$2=35} 1' fruits.txt\nfruit   qty\napple   42\nbanana  31\nfig 35\nguava   6\n```\n\n**Further Reading**\n\n* [gawk manual - Processing Fixed-Width Data](https://www.gnu.org/software/gawk/manual/html_node/Fixed-width-data.html)\n* [unix.stackexchange - Modify records in fixed-width files](https://unix.stackexchange.com/questions/368574/modify-records-in-fixed-width-files)\n* [unix.stackexchange - detecting empty fields in fixed width files](https://unix.stackexchange.com/questions/321559/extracting-data-with-awk-when-some-lines-have-empty-missing-values)\n* [stackoverflow - count number of times value is repeated each line](https://stackoverflow.com/questions/37450880/how-do-i-filter-tab-separated-input-by-the-count-of-fields-with-a-given-value)\n* [stackoverflow - skip characters with FIELDWIDTHS in GNU Awk 4.2](https://stackoverflow.com/questions/46932189/how-do-you-skip-characters-with-fieldwidths-in-gnu-awk-4-2)\n\n<br>\n\n#### <a name=\"string-functions\"></a>String functions\n\n* `length` function - returns length of string, by default acts on `$0`\n\n```bash\n$ seq 8 13 | awk 'length()==1'\n8\n9\n\n$ awk 'NR==1 || length($1)>4' fruits.txt\nfruit   qty\napple   42\nbanana  31\nguava   6\n\n$ # character count and not byte count is calculated, similar to 'wc -m'\n$ printf 'hi👍' | awk '{print length()}'\n3\n\n$ # use -b option if number of bytes are needed\n$ printf 'hi👍' | awk -b '{print length()}'\n6\n```\n\n* `split` function - similar to `FS` splitting input record into fields\n* use `patsplit` function to get results similar to `FPAT`\n* See also [gawk manual - Split 
function](https://www.gnu.org/software/gawk/manual/gawk.html#index-split_0028_0029-function)\n* See also [unix.stackexchange - delimit second column](https://unix.stackexchange.com/questions/372253/awk-command-to-delimit-the-second-column)\n\n```bash\n$ # 1st argument is string to be split\n$ # 2nd argument is array to save results, indexed from 1\n$ # 3rd argument is separator, default is FS\n$ s='foo,1996-10-25,hello,good'\n$ echo \"$s\" | awk -F, '{split($2,d,\"-\"); print \"Month is: \" d[2]}'\nMonth is: 10\n\n$ # using regular expression to define separator\n$ # return value is number of fields after splitting\n$ s='Sample123string54with908numbers'\n$ echo \"$s\" | awk '{n=split($0,s,/[0-9]+/); for(i=1;i<=n;i++)print s[i]}'\nSample\nstring\nwith\nnumbers\n$ # use 4th argument if separators are needed as well\n$ echo \"$s\" | awk '{n=split($0,s,/[0-9]+/,seps); for(i=1;i<n;i++)print seps[i]}'\n123\n54\n908\n\n$ # single row to multiple rows based on splitting last field\n$ s='foo,baz,12:42:3'\n$ echo \"$s\" | awk -F, '{n=split($NF,a,\":\"); NF--; for(i=1;i<=n;i++) print $0,a[i]}'\nfoo baz 12\nfoo baz 42\nfoo baz 3\n```\n\n* `substr` function allows to extract specified number of characters from given string\n    * indexing starts with `1`\n* See [gawk manual - substr function](https://www.gnu.org/software/gawk/manual/gawk.html#index-substr_0028_0029-function) for corner cases and details\n\n```bash\n$ # 1st argument is string to be worked on\n$ # 2nd argument is starting position\n$ # 3rd argument is number of characters to be extracted\n$ echo 'abcdefghij' | awk '{print substr($0,1,5)}'\nabcde\n$ echo 'abcdefghij' | awk '{print substr($0,4,3)}'\ndef\n$ # if 3rd argument is not given, string is extracted until end\n$ echo 'abcdefghij' | awk '{print substr($0,6)}'\nfghij\n\n$ echo 'abcdefghij' | awk -v OFS=':' '{print substr($0,2,3), substr($0,6,3)}'\nbcd:fgh\n\n$ # if only few characters are needed from input line, can use empty FS\n$ echo 'abcdefghij' | awk -v 
FS= '{print $3}'\nc\n$ echo 'abcdefghij' | awk -v FS= '{print $3, $5}'\nc e\n```\n\n<br>\n\n#### <a name=\"executing-external-commands\"></a>Executing external commands\n\n* External commands can be issued using `system` function\n* Output would be as usual on `stdout` unless redirected while calling the command\n* Return value of `system` depends on `exit` status of executed command, see [gawk manual - Input/Output Functions](https://www.gnu.org/software/gawk/manual/html_node/I_002fO-Functions.html) for details\n\n```bash\n$ awk 'BEGIN{system(\"echo Hello World\")}'\nHello World\n\n$ wc poem.txt\n 4 13 65 poem.txt\n$ awk 'BEGIN{system(\"wc poem.txt\")}'\n 4 13 65 poem.txt\n\n$ awk 'BEGIN{system(\"seq 10 | paste -sd, > out.txt\")}'\n$ cat out.txt\n1,2,3,4,5,6,7,8,9,10\n\n$ ls xyz.txt\nls: cannot access 'xyz.txt': No such file or directory\n$ echo $?\n2\n$ awk 'BEGIN{s=system(\"ls xyz.txt\"); print \"Status: \" s}'\nls: cannot access 'xyz.txt': No such file or directory\nStatus: 2\n\n$ cat f2\nI bought two bananas and three mangoes\n$ echo 'f1,f2,odd.txt' | awk -F, '{system(\"cat \" $2)}'\nI bought two bananas and three mangoes\n```\n\n<br>\n\n#### <a name=\"printf-formatting\"></a>printf formatting\n\n* Similar to `printf` function in `C` and shell built-in command\n* use `sprintf` function to save result in variable instead of printing\n* See also [gawk manual - printf](https://www.gnu.org/software/gawk/manual/html_node/Printf.html)\n\n```bash\n$ awk '{sum += $1} END{print sum}' nums.txt\n10062.9\n\n$ # note that ORS is not appended and has to be added manually\n$ awk '{sum += $1} END{printf \"%.2f\\n\", sum}' nums.txt\n10062.86\n\n$ awk '{sum += $1} END{printf \"%10.2f\\n\", sum}' nums.txt\n  10062.86\n\n$ awk '{sum += $1} END{printf \"%010.2f\\n\", sum}' nums.txt\n0010062.86\n\n$ awk '{sum += $1} END{printf \"%d\\n\", sum}' nums.txt\n10062\n\n$ awk '{sum += $1} END{printf \"%+d\\n\", sum}' nums.txt\n+10062\n\n$ awk '{sum += $1} END{printf \"%e\\n\", sum}' 
nums.txt\n1.006286e+04\n```\n\n* to refer argument by positional number (starts with 1), use `<num>$`\n\n```bash\n$ # can also use: awk 'BEGIN{printf \"hex=%x\\noct=%o\\ndec=%d\\n\", 15, 15, 15}'\n$ awk 'BEGIN{printf \"hex=%1$x\\noct=%1$o\\ndec=%1$d\\n\", 15}'\nhex=f\noct=17\ndec=15\n\n$ # adding prefix to hex/oct numbers\n$ awk 'BEGIN{printf \"hex=%1$#x\\noct=%1$#o\\ndec=%1$d\\n\", 15}'\nhex=0xf\noct=017\ndec=15\n```\n\n* strings\n\n```bash\n$ # prefix remaining width with spaces\n$ awk 'BEGIN{printf \"%6s:%5s\\n\", \"foo\", \"bar\"}'\n   foo:  bar\n\n$ # suffix remaining width with spaces\n$ awk 'BEGIN{printf \"%-6s:%-5s\\n\", \"foo\", \"bar\"}'\nfoo   :bar  \n\n$ # truncate\n$ awk 'BEGIN{printf \"%.2s\\n\", \"foobar\"}'\nfo\n```\n\n* avoid using `printf` without format specifier\n\n```bash\n$ awk 'BEGIN{s=\"solve: 5 % x = 1\"; printf s}'\nawk: cmd. line:1: fatal: not enough arguments to satisfy format string\n    `solve: 5 % x = 1'\n               ^ ran out for this one\n\n$ awk 'BEGIN{s=\"solve: 5 % x = 1\"; printf \"%s\\n\", s}'\nsolve: 5 % x = 1\n```\n\n* See also [stackoverflow - concatenating columns in middle](https://stackoverflow.com/questions/49135518/linux-csv-file-concatenate-columns-into-one-column)\n\n<br>\n\n#### <a name=\"redirecting-print-output\"></a>Redirecting print output\n\n* redirecting to file instead of stdout using `>`\n* similar to behavior in shell, if file already exists it is overwritten\n    * use `>>` to append to an existing file without deleting content\n* however, unlike shell, subsequent redirections to same file will append to it\n* See also [gawk manual - Closing Input and Output Redirections](https://www.gnu.org/software/gawk/manual/html_node/Close-Files-And-Pipes.html) if you have too many redirections\n\n```bash\n$ seq 6 | awk 'NR%2{print > \"odd.txt\"; next} {print > \"even.txt\"}'\n$ cat odd.txt\n1\n3\n5\n$ cat even.txt\n2\n4\n6\n\n$ awk 'NR==1{col1=$1\".txt\"; col2=$2\".txt\"; next}\n       {print $1 > col1; print $2 > 
col2}' fruits.txt\n$ cat fruit.txt\napple\nbanana\nfig\nguava\n$ cat qty.txt\n42\n31\n90\n6\n```\n\n* redirecting to shell command\n* this is useful if you have different things to redirect to different commands, otherwise it can be done as usual in shell acting on `awk`'s output\n* all redirections to same command gets combined as single input to that command\n\n```bash\n$ # same as: echo 'foo good 123' | awk '{print $2}' | wc -c\n$ echo 'foo good 123' | awk '{print $2 | \"wc -c\"}'\n5\n$ # to avoid newline character being added to print\n$ echo 'foo good 123' | awk -v ORS= '{print $2 | \"wc -c\"}'\n4\n$ # assuming no format specifiers in input\n$ echo 'foo good 123' | awk '{printf $2 | \"wc -c\"}'\n4\n\n$ # same as: echo 'foo good 123' | awk '{printf $2 $3 | \"wc -c\"}'\n$ echo 'foo good 123' | awk '{printf $2 | \"wc -c\"; printf $3 | \"wc -c\"}'\n7\n```\n\n**Further Reading**\n\n* [gawk manual - Input/Output Functions](https://www.gnu.org/software/gawk/manual/html_node/I_002fO-Functions.html)\n* [gawk manual - Redirecting Output of print and printf](https://www.gnu.org/software/gawk/manual/html_node/Redirection.html)\n* [gawk manual - Two-Way Communications with Another Process](https://www.gnu.org/software/gawk/manual/html_node/Two_002dway-I_002fO.html)\n* [unix.stackexchange - inplace editing as well as stdout](https://unix.stackexchange.com/questions/321679/gawk-inplace-and-stdout)\n* [stackoverflow - redirect blocks to separate files](https://stackoverflow.com/questions/45098279/write-blocks-in-a-text-file-to-multiple-new-files)\n\n<br>\n\n## <a name=\"gotchas-and-tips\"></a>Gotchas and Tips\n\n* using `$` for variables\n* only input record `$0` and field contents `$1`, `$2` etc need `$`\n* See also [unix.stackexchange - Why does awk print the whole line when I want it to print a variable?](https://unix.stackexchange.com/questions/291126/why-does-awk-print-the-whole-line-when-i-want-it-to-print-a-variable)\n\n```bash\n$ # wrong\n$ awk -v word=\"apple\" 
'$1==$word' fruits.txt\n\n$ # right\n$ awk -v word=\"apple\" '$1==word' fruits.txt\napple   42\n```\n\n* dos style line endings\n* See also [unix.stackexchange - filtering when last column has \\r](https://unix.stackexchange.com/questions/399560/using-awk-to-select-rows-with-specific-value-in-specific-column)\n\n```bash\n$ # no issue with unix style line ending\n$ printf 'foo bar\\n123 789\\n' | awk '{print $2, $1}'\nbar foo\n789 123\n\n$ # dos style line ending causes trouble\n$ printf 'foo bar\\r\\n123 789\\r\\n' | awk '{print $2, $1}'\n foo\n 123\n\n$ # easy to deal by simply setting appropriate RS\n$ # note that ORS would still be newline character only\n$ printf 'foo bar\\r\\n123 789\\r\\n' | awk -v RS='\\r\\n' '{print $2, $1}'\nbar foo\n789 123\n```\n\n* relying on default initial value\n\n```bash\n$ # step 1 - works for single file\n$ awk '{sum += $1} END{print sum}' nums.txt\n10062.9\n\n$ # step 2 - change to work for multiple file\n$ awk '{sum += $1} ENDFILE{print FILENAME, sum}' nums.txt\nnums.txt 10062.9\n\n$ # step 3 - check with multiple file input\n$ # oops, default numerical value '0' for sum works only once\n$ awk '{sum += $1} ENDFILE{print FILENAME, sum}' nums.txt <(seq 3)\nnums.txt 10062.9\n/dev/fd/63 10068.9\n\n$ # step 4 - correctly initialize variables\n$ awk '{sum += $1} ENDFILE{print FILENAME, sum; sum=0}' nums.txt <(seq 3)\nnums.txt 10062.9\n/dev/fd/63 6\n```\n\n* use unary operator `+` to force numeric conversion\n\n```bash\n$ awk '{sum += $1} END{print FILENAME, sum}' nums.txt\nnums.txt 10062.9\n\n$ awk '{sum += $1} END{print FILENAME, sum}' /dev/null\n/dev/null \n\n$ awk '{sum += $1} END{print FILENAME, +sum}' /dev/null\n/dev/null 0\n```\n\n* concatenate empty string to force string comparison\n\n```bash\n$ echo '5 5.0' | awk '{print $1==$2 ? \"same\" : \"different\", \"string\"}'\nsame string\n\n$ echo '5 5.0' | awk '{print $1\"\"==$2 ? 
\"same\" : \"different\", \"string\"}'\ndifferent string\n```\n\n* beware of expressions going -ve for field calculations\n\n```bash\n$ cat misc.txt\nfoo\ngood bad ugly\n123 xyz\na b c d\n\n$ # trying to delete last two fields\n$ awk '{NF -= 2} 1' misc.txt\nawk: cmd. line:1: (FILENAME=misc.txt FNR=1) fatal: NF set to negative value\n$ # dynamically change it depending on number of fields\n$ awk '{NF = (NF<=2) ? 0 : NF-2} 1' misc.txt\n\ngood\n\na b\n\n$ # similarly, trying to access 3rd field from end\n$ awk '{print $(NF-2)}' misc.txt\nawk: cmd. line:1: (FILENAME=misc.txt FNR=1) fatal: attempt to access field -1\n$ awk 'NF>2{print $(NF-2)}' misc.txt\ngood\nb\n```\n\n* If input is ASCII alone, simple trick to improve speed\n* For simple non-regex based column filtering, using [cut](./miscellaneous.md#cut) command might give faster results\n    * See [stackoverflow - how to split columns faster](https://stackoverflow.com/questions/46882557/how-to-split-columns-faster-in-python/46883120#46883120) for example\n\n```bash\n$ # all words containing exactly 3 lowercase a\n$ time awk -F'a' 'NF==4{cnt++} END{print +cnt}' /usr/share/dict/words\n1019\n\nreal    0m0.075s\n\n$ time LC_ALL=C awk -F'a' 'NF==4{cnt++} END{print +cnt}' /usr/share/dict/words\n1019\n\nreal    0m0.045s\n```\n\n<br>\n\n## <a name=\"further-reading\"></a>Further Reading\n\n* Manual and related\n    * `man awk` and `info awk` for quick reference from command line\n    * [gawk manual](https://www.gnu.org/software/gawk/manual/gawk.html#SEC_Contents) for complete reference, extensions and more\n    * [awk FAQ](http://www.faqs.org/faqs/computer-lang/awk/faq/) - from 2002, but plenty of information, especially about all the various `awk` implementations\n* this tutorial has also been [converted to an ebook](https://github.com/learnbyexample/learn_gnuawk) with additional descriptions, examples, a chapter on regular expressions, etc.\n* What's up with different `awk` versions?\n    * [unix.stackexchange - brief 
explanation](https://unix.stackexchange.com/questions/29576/difference-between-gawk-vs-awk)\n    * [Differences between gawk, nawk, mawk, and POSIX awk](https://archive.is/btGky)\n    * [cheat sheet for awk/nawk/gawk](https://catonmat.net/ftp/awk.cheat.sheet.txt)\n* Tutorials and Q&A\n    * [code.snipcademy - gentle intro](https://code.snipcademy.com/tutorials/shell-scripting/awk/introduction)\n    * [funtoo - using examples](https://www.funtoo.org/Awk_by_Example,_Part_1)\n    * [grymoire - detailed tutorial](https://www.grymoire.com/Unix/Awk.html) - covers information about different `awk` versions as well\n    * [catonmat - one liners explained](https://catonmat.net/awk-one-liners-explained-part-one)\n    * [Why Learn AWK?](https://blog.jpalardy.com/posts/why-learn-awk/)\n    * [awk Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/awk?sort=votes&pageSize=15)\n    * [awk Q&A on unix.stackexchange](https://unix.stackexchange.com/questions/tagged/awk?sort=votes&pageSize=15)\n* Alternatives\n    * [GNU datamash](https://www.gnu.org/software/datamash/alternatives/)\n    * [bioawk](https://github.com/lh3/bioawk)\n    * [hawk](https://github.com/gelisam/hawk/blob/master/doc/README.md) - based on Haskell\n    * [miller](https://github.com/johnkerl/miller) - similar to awk/sed/cut/join/sort for name-indexed data such as CSV, TSV, and tabular JSON\n        * See this [ycombinator news](https://news.ycombinator.com/item?id=10066742) for other tools like this\n* miscellaneous\n    * [unix.stackexchange - When to use grep, sed, awk, perl, etc](https://unix.stackexchange.com/questions/303044/when-to-use-grep-less-awk-sed)\n    * [awk-libs](https://github.com/e36freak/awk-libs) - lots of useful functions\n    * [awkaster](https://github.com/TheMozg/awk-raycaster) - Pseudo-3D shooter written completely in awk using raycasting technique\n    * [awk REPL](https://awk.js.org/) - live editor on browser\n* examples for some of the stuff not covered in this tutorial\n  
  * [unix.stackexchange - rand/srand](https://unix.stackexchange.com/questions/372816/awk-get-random-lines-of-file-satisfying-a-condition)\n    * [unix.stackexchange - strftime](https://unix.stackexchange.com/questions/224969/current-date-in-awk)\n    * [unix.stackexchange - ARGC and ARGV](https://unix.stackexchange.com/questions/222146/awk-does-not-end/222150#222150)\n    * [stackoverflow - arbitrary precision integer extension](https://stackoverflow.com/questions/46904447/strange-output-while-comparing-engineering-numbers-in-awk)\n    * [stackoverflow - recognizing hexadecimal numbers](https://stackoverflow.com/questions/3683110/how-to-make-calculations-on-hexadecimal-numbers-with-awk)\n    * [unix.stackexchange - sprintf and close](https://unix.stackexchange.com/questions/223727/splitting-file-for-every-10000-numbers-not-lines/223739#223739)\n    * [unix.stackexchange - user defined functions and array passing](https://unix.stackexchange.com/questions/72469/gawk-passing-arrays-to-functions)\n    * [unix.stackexchange - rename csv files based on number of fields in header row](https://unix.stackexchange.com/questions/408742/count-number-of-columns-in-csv-files-and-rename-if-less-than-11-columns)\n"
  },
  {
    "path": "gnu_grep.md",
    "content": "<br> <br> <br>\n\n---\n\n:information_source: :information_source: This chapter has been converted into a better formatted ebook: https://learnbyexample.github.io/learn_gnugrep_ripgrep/. The ebook also has content updated for newer version of the commands, includes exercises, solutions, has a separate chapter for popular alternative `ripgrep`, etc.\n\nFor markdown source and links to buy pdf/epub versions, see: https://github.com/learnbyexample/learn_gnugrep_ripgrep\n\n---\n\n<br> <br> <br>\n\n# <a name=\"gnu-grep\"></a>GNU grep\n\n**Table of Contents**\n\n* [Simple string search](#simple-string-search)\n* [Case insensitive search](#case-insensitive-search)\n* [Invert matching lines](#invert-matching-lines)\n* [Line number, count and limiting output lines](#line-number-count-and-limiting-output-lines)\n* [Multiple search strings](#multiple-search-strings)\n* [File names in output](#file-names-in-output)\n* [Match whole word or line](#match-whole-word-or-line)\n* [Colored output](#colored-output)\n* [Get only matching portion](#get-only-matching-portion)\n* [Context matching](#context-matching)\n* [Recursive search](#recursive-search)\n    * [Basic recursive search](#basic-recursive-search)\n    * [Exclude/Include specific files/directories](#excludeinclude-specific-filesdirectories)\n    * [Recursive search with bash options](#recursive-search-with-bash-options)\n    * [Recursive search using find command](#recursive-search-using-find-command)\n    * [Passing file names to other commands](#passing-file-names-to-other-commands)\n* [Search strings from file](#search-strings-from-file)\n* [Options for scripting purposes](#options-for-scripting-purposes)\n* [Regular Expressions - BRE/ERE](#regular-expressions-breere)\n    * [Line Anchors](#line-anchors)\n    * [Word Anchors](#word-anchors)\n    * [Alternation](#alternation)\n    * [The dot meta character](#the-dot-meta-character)\n    * [Quantifiers](#quantifiers)\n    * [Character 
classes](#character-classes)\n    * [Grouping](#grouping)\n    * [Back reference](#back-reference)\n* [Multiline matching](#multiline-matching)\n* [Perl Compatible Regular Expressions](#perl-compatible-regular-expressions)\n    * [Backslash sequences](#backslash-sequences)\n    * [Non-greedy matching](#non-greedy-matching)\n    * [Lookarounds](#lookarounds)\n    * [Ignoring specific matches](#ignoring-specific-matches)\n    * [Re-using regular expression pattern](#re-using-regular-expression-pattern)\n* [Gotchas and Tips](#gotchas-and-tips)\n* [Regular Expressions Reference (ERE)](#regular-expressions-reference-ere)\n    * [Anchors](#anchors)\n    * [Character Quantifiers](#character-quantifiers)\n    * [Character classes and backslash sequences](#character-classes-and-backslash-sequences)\n    * [Pattern groups](#pattern-groups)\n    * [Basic vs Extended Regular Expressions](#basic-vs-extended-regular-expressions)\n* [Further Reading](#further-reading)\n\n<br>\n\n```bash\n$ grep -V | head -1\ngrep (GNU grep) 2.25\n\n$ man grep\nGREP(1)                     General Commands Manual                    GREP(1)\n\nNAME\n       grep, egrep, fgrep, rgrep - print lines matching a pattern\n\nSYNOPSIS\n       grep [OPTIONS] PATTERN [FILE...]\n       grep [OPTIONS] [-e PATTERN]...  [-f FILE]...  [FILE...]\n\nDESCRIPTION\n       grep searches the named input FILEs for lines containing a match to the\n       given PATTERN.  If no files are specified, or if the file “-” is given,\n       grep  searches  standard  input.   By default, grep prints the matching\n       lines.\n\n       In addition, the variant programs egrep, fgrep and rgrep are  the  same\n       as  grep -E,  grep -F,  and  grep -r, respectively.  
These variants are\n       deprecated, but are provided for backward compatibility.\n...\n```\n\n**Note** For more detailed documentation and examples, use `info grep`\n\n<br>\n\n## <a name=\"simple-string-search\"></a>Simple string search\n\n* First specify the search pattern (usually enclosed in single quotes) and then the file input\n* More than one file can be specified or input given from stdin\n\n```bash\n$ cat poem.txt\nRoses are red,\nViolets are blue,\nSugar is sweet,\nAnd so are you.\n\n$ grep 'are' poem.txt\nRoses are red,\nViolets are blue,\nAnd so are you.\n\n$ grep 'so are' poem.txt\nAnd so are you.\n```\n\n* If search string contains any regular expression meta characters like `^$\\.*[]` (covered later), use the `-F` option or `fgrep` if available\n\n```bash\n$ echo 'int a[5]' | grep 'a[5]'\n$ echo 'int a[5]' | grep -F 'a[5]'\nint a[5]\n$ echo 'int a[5]' | fgrep 'a[5]'\nint a[5]\n```\n\n* See [Gotchas and Tips](#gotchas-and-tips) section if you get strange issues\n\n<br>\n\n## <a name=\"case-insensitive-search\"></a>Case insensitive search\n\n```bash\n$ grep -i 'rose' poem.txt\nRoses are red,\n\n$ grep -i 'and' poem.txt\nAnd so are you.\n```\n\n<br>\n\n## <a name=\"invert-matching-lines\"></a>Invert matching lines\n\n* Use the `-v` option to get lines other than those matching the search string\n* Tip: Look out for other opposite pairs like `-l -L`, `-h -H`, opposites in regular expression, etc\n\n```bash\n$ grep -v 'are' poem.txt\nSugar is sweet,\n\n$ # example for input from stdin\n$ seq 5 | grep -v '3'\n1\n2\n4\n5\n```\n\n<br>\n\n## <a name=\"line-number-count-and-limiting-output-lines\"></a>Line number, count and limiting output lines\n\n* Show line number of matching lines\n\n```bash\n$ grep -n 'sweet' poem.txt\n3:Sugar is sweet,\n```\n\n* Count number of matching lines\n\n```bash\n$ grep -c 'are' poem.txt\n3\n```\n\n* Limit number of matching lines\n\n```bash\n$ grep -m2 'are' poem.txt\nRoses are red,\nViolets are blue,\n```\n\n<br>\n\n## <a 
name=\"multiple-search-strings\"></a>Multiple search strings\n\n* Match any\n\n```bash\n$ # search blue or you\n$ grep -e 'blue' -e 'you' poem.txt\nViolets are blue,\nAnd so are you.\n```\n\nIf there are lot of search strings, use a file input\n\n**Note** Be careful to avoid empty lines in the file, it would result in matching all the lines\n\n```bash\n$ printf 'rose\\nsugar\\n' > search_strings.txt\n$ cat search_strings.txt\nrose\nsugar\n\n$ # -f option accepts file input with search terms in separate lines\n$ grep -if search_strings.txt poem.txt\nRoses are red,\nSugar is sweet,\n```\n\n* Match all\n\n```bash\n$ # match line containing both are & And\n$ grep 'are' poem.txt | grep 'And'\nAnd so are you.\n```\n\n<br>\n\n## <a name=\"file-names-in-output\"></a>File names in output\n\n* `-l` to get files matching the search\n* `-L` to get files not matching the search\n* `grep` skips the rest of file once a match is found\n\n```bash\n$ grep -l 'Rose' poem.txt\npoem.txt\n\n$ grep -L 'are' poem.txt search_strings.txt\nsearch_strings.txt\n```\n\n* Prefix file name to search results\n* `-h` is default for single file input, no file name prefix in output\n* `-H` is default for multiple file input, file name prefix in output\n\n```bash\n$ grep -h 'Rose' poem.txt\nRoses are red,\n$ grep -H 'Rose' poem.txt\npoem.txt:Roses are red,\n\n$ # -H is default for multiple file input\n$ grep -i 'sugar' poem.txt search_strings.txt\npoem.txt:Sugar is sweet,\nsearch_strings.txt:sugar\n$ grep -ih 'sugar' poem.txt search_strings.txt\nSugar is sweet,\nsugar\n```\n\n<br>\n\n## <a name=\"match-whole-word-or-line\"></a>Match whole word or line\n\n* Word search using `-w` option\n    * word constitutes of alphabets, numbers and underscore character\n* This will ensure that given patterns are not surrounded by other word characters\n    * this is slightly different than using word boundaries in regular expressions\n* For example, this helps to distinguish `par` from `spar`, `part`, 
etc\n\n```bash\n$ printf 'par value\\nheir apparent\\n' | grep 'par'\npar value\nheir apparent\n\n$ printf 'par value\\nheir apparent\\n' | grep -w 'par'\npar value\n\n$ printf 'scare\\ncart\\ncar\\nmacaroni\\n' | grep -w 'car'\ncar\n```\n\n* Another useful option is `-x` to match only complete line, not anywhere in the line\n\n```bash\n$ printf 'see my book list\\nmy book\\n' | grep 'my book'\nsee my book list\nmy book\n\n$ printf 'see my book list\\nmy book\\n' | grep -x 'my book'\nmy book\n\n$ printf 'scare\\ncart\\ncar\\nmacaroni\\n' | grep -x 'car'\ncar\n```\n\n<br>\n\n## <a name=\"colored-output\"></a>Colored output\n\n* Highlight search strings, line numbers, file name, etc in different colors\n    * Depends on color support in terminal being used\n* options to `--color` are\n    * `auto` when output is redirected (another command, file, etc) the color information won't be passed\n    * `always` when output is redirected (another command, file, etc) the color information will also be passed\n    * `never` explicitly specify no highlighting\n\n```bash\n$ # can also use grep --color 'blue' as auto is default\n$ grep --color=auto 'blue' poem.txt\nViolets are blue,\n```\n\n* Sample screenshot\n\n![grep color output](./images/color_option.png)\n\n* Example to show difference between `auto` and `always`\n\n```bash\n$ grep --color=auto 'blue' poem.txt > saved_output.txt\n$ cat -v saved_output.txt\nViolets are blue,\n$ grep --color=always 'blue' poem.txt > saved_output.txt\n$ cat -v saved_output.txt\nViolets are ^[[01;31m^[[Kblue^[[m^[[K,\n\n$ # some commands like 'less' are capable of using the color information\n$ grep --color=always 'are' poem.txt | less -R\n$ # highlight multiple matching patterns\n$ grep --color=always 'are' poem.txt | grep --color 'd'\nRoses are red,\nAnd so are you.\n```\n\n<br>\n\n## <a name=\"get-only-matching-portion\"></a>Get only matching portion\n\n* The `-o` option to get only matched portion is more useful with regular expressions\n* 
Comes in handy if overall number of matches is required, instead of only line wise\n\n```bash\n$ grep -o 'are' poem.txt\nare\nare\nare\n\n$ # -c only gives count of matching lines\n$ grep -c 'e' poem.txt\n4\n$ grep -co 'e' poem.txt\n4\n$ # so need another command to get count of all matches\n$ grep -o 'e' poem.txt | wc -l\n9\n```\n\n<br>\n\n## <a name=\"context-matching\"></a>Context matching\n\n* The `-A`, `-B` and `-C` options are useful to get lines after/before/around matching line respectively\n\n```bash\n$ grep -A1 'blue' poem.txt\nViolets are blue,\nSugar is sweet,\n$ grep -B1 'blue' poem.txt\nRoses are red,\nViolets are blue,\n$ grep -C1 'blue' poem.txt\nRoses are red,\nViolets are blue,\nSugar is sweet,\n```\n\n* If there are multiple non-adjacent matching segments, by default `grep` adds a line `--` to separate them\n    * non-adjacent here implies that segments are separated by at least one line in input data\n\n```bash\n$ seq 29 | grep -A1 '3'\n3\n4\n--\n13\n14\n--\n23\n24\n```\n\n* Use `--no-group-separator` option if the separator line is a hindrance, for example feeding the output of `grep` to another program\n\n```bash\n$ seq 29 | grep --no-group-separator -A1 '3'\n3\n4\n13\n14\n23\n24\n```\n\n* Use `--group-separator` to customize the separator\n\n```bash\n$ seq 29 | grep --group-separator='*****' -A1 '3'\n3\n4\n*****\n13\n14\n*****\n23\n24\n```\n\n<br>\n\n## <a name=\"recursive-search\"></a>Recursive search\n\nFirst let's create some more test files\n\n```bash\n$ mkdir -p test_files/hidden_files\n$ printf 'Red\\nGreen\\nBlue\\nBlack\\nWhite\\n' > test_files/colors.txt\n$ printf 'Violet\\nIndigo\\nBlue\\nGreen\\nYellow\\nOrange\\nRed\\n' > test_files/vibgyor.txt\n$ printf '#!/usr/bin/python3\\n\\nprint(\"Hello World\")\\n' > test_files/hello.py\n$ printf 'I like yellow\\nWhat about you\\n' > test_files/hidden_files/.fav_color.info\n```\n\nFrom `man grep`\n\n```bash\n       -r, --recursive\n              Read all files  under  each  directory,  
recursively,  following\n              symbolic  links only if they are on the command line.  Note that\n              if  no  file  operand  is  given,  grep  searches  the   working\n              directory.  This is equivalent to the -d recurse option.\n\n       -R, --dereference-recursive\n              Read  all  files  under each directory, recursively.  Follow all\n              symbolic links, unlike -r.\n```\n\n<br>\n\n#### <a name=\"basic-recursive-search\"></a>Basic recursive search\n\n* Note that `-H` option automatically activates for multiple file input\n\n```bash\n$ # by default, current working directory is searched\n$ grep -r 'red'\npoem.txt:Roses are red,\n\n$ grep -ri 'red'\npoem.txt:Roses are red,\ntest_files/colors.txt:Red\ntest_files/vibgyor.txt:Red\n\n$ grep -rin 'red'\npoem.txt:1:Roses are red,\ntest_files/colors.txt:1:Red\ntest_files/vibgyor.txt:7:Red\n\n$ grep -ril 'red'\npoem.txt\ntest_files/colors.txt\ntest_files/vibgyor.txt\n```\n\n<br>\n\n#### <a name=\"excludeinclude-specific-filesdirectories\"></a>Exclude/Include specific files/directories\n\n* By default, recursive search includes hidden files as well\n* They can be excluded by file name or directory name\n    * [glob](https://github.com/learnbyexample/Linux_command_line/blob/master/Shell.md#wildcards) patterns can be used\n    * for example: `*.[ch]` to specify all files ending with `.c` or `.h`\n* The exclusion options can be used multiple times\n    * for example: `--exclude='*.txt' --exclude='*.log'` or specified from a file using `--exclude-from=FILE`\n* To search only files with specific pattern in their names, use `--include=GLOB`\n* **Note:** exclusion/inclusion applies only to basename of file/directory, not the entire path\n* To follow all symbolic links (not directly specified as arguments, but found on recursive search), use `-R` instead of `-r`\n\n```bash\n$ grep -ri 'you'\npoem.txt:And so are you.\ntest_files/hidden_files/.fav_color.info:What about you\n\n$ # exclude 
file names starting with `.` i.e hidden files\n$ grep -ri --exclude='.*' 'you'\npoem.txt:And so are you.\n\n$ # include only file names ending with `.info`\n$ grep -ri --include='*.info' 'you'\ntest_files/hidden_files/.fav_color.info:What about you\n\n$ # exclude a directory\n$ grep -ri --exclude-dir='hidden_files' 'you'\npoem.txt:And so are you.\n\n$ # If you are using git(or similar), this would be handy\n$ # grep --exclude-dir='.git' -rl 'search pattern'\n```\n\n<br>\n\n#### <a name=\"recursive-search-with-bash-options\"></a>Recursive search with bash options\n\n* Using `bash` options `globstar` (for recursion)\n    * Other options like `extglob` and `dotglob` come in handy too\n    * See [glob](https://github.com/learnbyexample/Linux_command_line/blob/master/Shell.md#wildcards) for more info on these options\n* The `-d skip` option tells grep to skip directories instead of trying to treat them as text file to be searched\n\n```bash\n$ grep -ril 'yellow'\ntest_files/hidden_files/.fav_color.info\ntest_files/vibgyor.txt\n\n$ # recursive search\n$ shopt -s globstar\n$ grep -d skip -il 'yellow' **/*\ntest_files/vibgyor.txt\n\n$ # include hidden files as well\n$ shopt -s dotglob\n$ grep -d skip -il 'yellow' **/*\ntest_files/hidden_files/.fav_color.info\ntest_files/vibgyor.txt\n\n$ # use extended glob patterns\n$ shopt -s extglob\n$ # other than poem.txt\n$ grep -d skip -il 'red' **/!(poem.txt)\ntest_files/colors.txt\ntest_files/vibgyor.txt\n$ # other than poem.txt or colors.txt\n$ grep -d skip -il 'red' **/!(poem|colors).txt\ntest_files/vibgyor.txt\n```\n\n<br>\n\n#### <a name=\"recursive-search-using-find-command\"></a>Recursive search using find command\n\n* `find` is obviously more versatile\n* See also [this guide](./wheres_my_file.md#find) for more examples/tutorials on using `find`\n\n```bash\n$ # all files, including hidden ones\n$ find -type f -exec grep -il 'red' {} +\n./poem.txt\n./test_files/colors.txt\n./test_files/vibgyor.txt\n\n$ # all files ending with 
.txt\n$ find -type f -name '*.txt' -exec grep -in 'you' {} +\n./poem.txt:4:And so are you.\n\n$ # all files not ending with .txt\n$ find -type f -not -name '*.txt' -exec grep -in 'you' {} +\n./test_files/hidden_files/.fav_color.info:2:What about you\n```\n\n<br>\n\n#### <a name=\"passing-file-names-to-other-commands\"></a>Passing file names to other commands\n\n* To pass files filtered to another command, see if the receiving command can differentiate file names by ASCII NUL character\n* If so, use the `-Z` so that `grep` output is terminated with NUL character and commands like `xargs` have option `-0` to understand it\n* This helps when file names can have characters like space, newline, etc\n* Typical use case: Search and replace something in all files matching some pattern, for ex: `grep -rlZ 'PAT1' | xargs -0 sed -i 's/PAT2/REPLACE/g'`\n\n```bash\n$ # prompt at end of line not shown for simplicity\n$ # ^@ here indicates the NUL character\n$ grep -rlZ 'you' | cat -A\npoem.txt^@test_files/hidden_files/.fav_color.info^@\n\n$ # print first column from all lines of all files\n$ grep -rlZ 'you' | xargs -0 awk '{print $1}'\nRoses\nViolets\nSugar\nAnd\nI\nWhat\n```\n\n* simple example to show filenames with space causing issue if `-Z` is not used\n\n```bash\n$ # 'abc xyz.txt' is a file with space in its name\n$ grep -ri 'are'\nabc xyz.txt:hi how are you\npoem.txt:Roses are red,\npoem.txt:Violets are blue,\npoem.txt:And so are you.\nsaved_output.txt:Violets are blue,\n\n$ # problem when -Z is not used\n$ grep -ril 'are' | xargs grep 'you'\ngrep: abc: No such file or directory\ngrep: xyz.txt: No such file or directory\npoem.txt:And so are you.\n\n$ # no issues if -Z is used\n$ grep -rilZ 'are' | xargs -0 grep 'you'\nabc xyz.txt:hi how are you\npoem.txt:And so are you.\n```\n\n* Example for matching more than one search string anywhere in file\n\n```bash\n$ # files containing 'you'\n$ grep -rl 'you'\npoem.txt\ntest_files/hidden_files/.fav_color.info\n\n$ # files 
containing 'you' as well as 'are'\n$ grep -rlZ 'you' | xargs -0 grep -l 'are'\npoem.txt\n\n$ # files containing 'you' but NOT 'are'\n$ grep -rlZ 'you' | xargs -0 grep -L 'are'\ntest_files/hidden_files/.fav_color.info\n```\n\n* another example\n\n```bash\n$ grep -rilZ 'red' | xargs -0 grep -il 'blue'\npoem.txt\ntest_files/colors.txt\ntest_files/vibgyor.txt\n\n$ # note the use of `-Z` for middle command\n$ grep -rilZ 'red' | xargs -0 grep -ilZ 'blue' | xargs -0 grep -il 'violet'\npoem.txt\ntest_files/vibgyor.txt\n```\n\n<br>\n\n## <a name=\"search-strings-from-file\"></a>Search strings from file\n\n* using file input to specify search terms\n* `-F` option will force matching strings literally(no regular expressions)\n* See also [stackoverflow - Fastest way to find lines of a text file from another larger text file](https://stackoverflow.com/questions/42239179/fastest-way-to-find-lines-of-a-text-file-from-another-larger-text-file-in-bash) - read all answers\n\n```bash\n$ grep -if test_files/colors.txt poem.txt\nRoses are red,\nViolets are blue,\n\n$ # get common lines between two files\n$ grep -Fxf test_files/colors.txt test_files/vibgyor.txt\nBlue\nGreen\nRed\n\n$ # get lines present in vibgyor.txt but not in colors.txt\n$ grep -Fvxf test_files/colors.txt test_files/vibgyor.txt\nViolet\nIndigo\nYellow\nOrange\n```\n\n<br>\n\n## <a name=\"options-for-scripting-purposes\"></a>Options for scripting purposes\n\n* In scripts, often it is needed just to know if a pattern matches or not\n* The `-q` option doesn't print anything on stdout and exit status is `0` if match is found\n    * Check out [this practical script](https://github.com/learnbyexample/command_help/blob/master/ch) using the `-q` option\n\n```bash\n$ grep -qi 'rose' poem.txt\n$ echo $?\n0\n$ grep -qi 'lily' poem.txt\n$ echo $?\n1\n\n$ if grep -qi 'rose' poem.txt; then echo 'match found!'; else echo 'match not found'; fi\nmatch found!\n$ if grep -qi 'lily' poem.txt; then echo 'match found!'; else echo 'match 
not found'; fi\nmatch not found\n```\n\n* The `-s` option will suppress error messages as well\n\n```bash\n$ grep 'rose' file_xyz.txt\ngrep: file_xyz.txt: No such file or directory\n$ grep -s 'rose' file_xyz.txt\n$ echo $?\n2\n\n$ touch foo.txt\n$ chmod -r foo.txt\n$ grep 'rose' foo.txt\ngrep: foo.txt: Permission denied\n$ grep -s 'rose' foo.txt\n$ echo $?\n2\n```\n\n<br>\n\n## <a name=\"regular-expressions-breere\"></a>Regular Expressions - BRE/ERE\n\nBefore diving into regular expressions, few examples to show default `grep` behavior vs `-F`\n\n```bash\n$ # oops, why did it not match?\n$ echo 'int a[5]' | grep 'a[5]'\n\n$ # where did that error come from??\n$ echo 'int a[5]' | grep 'a['\ngrep: Invalid regular expression\n\n$ # what is going on???\n$ echo 'int a[5]' | grep 'a[5'\ngrep: Unmatched [ or [^\n\n$ # phew, -F is a life saver\n$ echo 'int a[5]' | grep -F 'a[5]'\nint a[5]\n\n$ # [ and ] are meta characters, details in following sections\n$ echo 'int a[5]' | grep 'a\\[5]'\nint a[5]\n```\n\n* By default, `grep` treats the search pattern as BRE (Basic Regular Expression)\n    * `-G` option can be used to specify explicitly that BRE is used\n* The `-E` option allows to use ERE (Extended Regular Expression) which in GNU grep's case only differs in how meta characters are used, no difference in regular expression functionalities\n* If `-F` option is used, the search string is treated literally\n* If available, one can also use `-P` which indicates PCRE (Perl Compatible Regular Expression)\n\n<br>\n\n#### <a name=\"line-anchors\"></a>Line Anchors\n\n* Often, search must match from beginning of line or towards end of line\n* For example, an integer variable declaration in `C` will start with optional white-space, the keyword `int`, white-space and then variable(s)\n    * This way one can avoid matching declarations inside single line comments as well.\n* Similarly, one might want to match a variable at end of statement\n* The meta characters for line anchoring are 
`^` for beginning of line and `$` for end of line\n\n```bash\n$ echo 'Fantasy is my favorite genre' > fav.txt\n$ echo 'My favorite genre is Fantasy' >> fav.txt\n$ cat fav.txt\nFantasy is my favorite genre\nMy favorite genre is Fantasy\n\n$ # start of line\n$ grep '^Fantasy' fav.txt\nFantasy is my favorite genre\n\n$ # end of line\n$ grep 'Fantasy$' fav.txt\nMy favorite genre is Fantasy\n\n$ # without anchors\n$ grep 'Fantasy' fav.txt\nFantasy is my favorite genre\nMy favorite genre is Fantasy\n```\n\n* As the meta characters have special meaning (assuming `-F` option is not used), they have to be escaped using `\\` to match literally\n* The `\\` itself is meta character, so to match it literally, use `\\\\`\n* The line anchors `^` and `$` have special meaning only when they are present at start/end of regular expression\n\n```bash\n$ echo '^foo bar$' | grep '^foo'\n$ echo '^foo bar$' | grep '\\^foo'\n^foo bar$\n$ echo '^foo bar$' | grep '^^foo'\n^foo bar$\n\n$ echo '^foo bar$' | grep 'bar$'\n$ echo '^foo bar$' | grep 'bar\\$'\n^foo bar$\n$ echo '^foo bar$' | grep 'bar$$'\n^foo bar$\n\n$ echo 'foo $ bar' | grep ' $ '\nfoo $ bar\n\n$ printf 'foo\\cbar' | grep -o '\\c'\nc\n$ printf 'foo\\cbar' | grep -o '\\\\c'\n\\c\n```\n\n<br>\n\n#### <a name=\"word-anchors\"></a>Word Anchors\n\n* The `-w` option works well to match whole words. 
But what about matching only start or end of words?\n* Anchors `\\<` and `\\>` will match start/end positions of a word\n* `\\b` can also be used instead of `\\<` and `\\>` which matches both edges of a word\n\n```bash\n$ printf 'spar\\npar\\npart\\napparent\\n'\nspar\npar\npart\napparent\n\n$ # words ending with par\n$ printf 'spar\\npar\\npart\\napparent\\n' | grep 'par\\>'\nspar\npar\n\n$ # words starting with par\n$ printf 'spar\\npar\\npart\\napparent\\n' | grep '\\<par'\npar\npart\n```\n\n* `-w` option is same as specifying both start and end word boundaries\n\n```bash\n$ printf 'spar\\npar\\npart\\napparent\\n' | grep '\\<par\\>'\npar\n\n$ printf 'spar\\npar\\npart\\napparent\\n' | grep '\\bpar\\b'\npar\n\n$ printf 'spar\\npar\\npart\\napparent\\n' | grep -w 'par'\npar\n```\n\n* `\\b` has an opposite `\\B` which is quite useful too\n\n```bash\n$ # string not surrounded by word boundary either side\n$ printf 'spar\\npar\\npart\\napparent\\n' | grep '\\Bpar\\B'\napparent\n\n$ # word containing par but not as start of word\n$ printf 'spar\\npar\\npart\\napparent\\n' | grep '\\Bpar'\nspar\napparent\n\n$ # word containing par but not as end of word\n$ printf 'spar\\npar\\npart\\napparent\\n' | grep 'par\\B'\npart\napparent\n```\n\n* the word boundary escape sequences differ slightly from `-w` option\n\n```bash\n$ # this fails because there is no word boundary between space and +\n$ echo '2 +3 = 5' | grep '\\b+3\\b'\n$ # this works as -w only ensures that there are no surrounding word characters\n$ echo '2 +3 = 5' | grep -w '+3'\n2 +3 = 5\n\n$ # doesn't work as , isn't at start of word boundary\n$ echo 'hi, 2 one' | grep '\\<, 2\\>'\n$ # won't match as there are word characters before ,\n$ echo 'hi, 2 one' | grep -w ', 2'\n$ # works as \\b matches both edges and , is at end of word after i\n$ echo 'hi, 2 one' | grep '\\b, 2\\b'\nhi, 2 one\n```\n\n<br>\n\n#### <a name=\"alternation\"></a>Alternation\n\n* The `|` meta character is similar to using multiple `-e` 
option\n* Each side of `|` is a complete regular expression with its own start/end anchors\n* How each part of alternation is handled and order of evaluation/output is beyond the scope of this tutorial\n    * See [this](https://www.regular-expressions.info/alternation.html) for more info on this topic.\n* `|` is one of the meta characters that require different syntax between BRE/ERE\n\n```bash\n$ grep 'blue\\|you' poem.txt\nViolets are blue,\nAnd so are you.\n$ grep -E 'blue|you' poem.txt\nViolets are blue,\nAnd so are you.\n\n$ # extract case-insensitive e or f from anywhere in line\n$ echo 'Fantasy is my favorite genre' | grep -Eio 'e|f'\nF\nf\ne\ne\ne\n\n$ # extract case-insensitive e at end of line, f at start of line\n$ echo 'Fantasy is my favorite genre' | grep -Eio 'e$|^f'\nF\ne\n```\n\n* A cool use case of alternation is using `^` or `$` anchors to highlight the searched term as well as display the rest of the unmatched lines\n    * the line anchors will match every input line, even empty lines as they are position markers\n\n```bash\n$ grep --color=auto -E '^|are' poem.txt\nRoses are red,\nViolets are blue,\nSugar is sweet,\nAnd so are you.\n\n$ grep --color=auto -E 'is|$' poem.txt\nRoses are red,\nViolets are blue,\nSugar is sweet,\nAnd so are you.\n```\n\nScreenshot for above example:\n\n![highlighting string](./images/highlight_string_whole_file_op.png)\n\nSee also\n\n* [stackoverflow - Grep output with multiple Colors](https://stackoverflow.com/questions/17236005/grep-output-with-multiple-colors)\n* [unix.stackexchange - Multicolored Grep](https://unix.stackexchange.com/questions/104350/multicolored-grep)\n\n<br>\n\n#### <a name=\"the-dot-meta-character\"></a>The dot meta character\n\nThe `.` meta character is used to match any character\n\n```bash\n$ # any two characters surrounded by word boundaries\n$ echo 'I have 12, he has 132!' 
| grep -ow '..'\n12\nhe\n\n$ # match three characters from start of line\n$ # \\t (TAB) is single character here\n$ printf 'a\\tbcd\\n' | grep -o '^...'\na       b\n\n$ # all three character word starting with c\n$ echo 'car bat cod cope scat dot abacus' | grep -ow 'c..'\ncar\ncod\n\n$ echo '1 & 2' | grep -o '.'\n1\n \n&\n \n2\n```\n\n<br>\n\n#### <a name=\"quantifiers\"></a>Greedy Quantifiers\n\nDefines how many times a character (simplified for now) should be matched\n\n* `?` will try to match 0 or 1 time\n* For BRE, use `\\?`\n\n```bash\n$ printf 'late\\npale\\nfactor\\nrare\\nact\\n'\nlate\npale\nfactor\nrare\nact\n\n$ # match a followed by t, with or without c in between\n$ printf 'late\\npale\\nfactor\\nrare\\nact\\n' | grep -E 'ac?t'\nlate\nfactor\nact\n\n$ # same as using this alternation\n$ printf 'late\\npale\\nfactor\\nrare\\nact\\n' | grep -E 'at|act'\nlate\nfactor\nact\n```\n\n* `*` will try to match 0 or more times\n* There is no upper limit and `*` will try to match as many times as possible\n    * if matching maximum times results in overall regex failing, then next best count is chosen until overall regex passes\n    * if there are multiple quantifiers, left-most quantifier gets precedence\n\n```bash\n$ echo 'abbbc' | grep -o 'b*'\nbbb\n\n$ # matches 0 or more b only if surrounded by a and c\n$ echo 'abc ac adc abbc bbb bc' | grep -o 'ab*c'\nabc\nac\nabbc\n\n$ # see how it matched everything\n$ echo 'car bat cod map scat dot abacus' | grep -o '.*'\ncar bat cod map scat dot abacus\n\n$ # but here it stops at m\n$ echo 'car bat cod map scat dot abacus' | grep -o '.*m'\ncar bat cod m\n\n$ # stopped at dot, not bat or scat - match as much as possible\n$ echo 'car bat cod map scat dot abacus' | grep -o 'c.*t'\ncar bat cod map scat dot\n\n$ # matching overall expression gets preference\n$ echo 'car bat cod map scat dot abacus' | grep -o 'c.*at'\ncar bat cod map scat\n\n$ # precedence is left to right in case of multiple matches\n$ echo 'car bat cod map 
scat dot abacus' | grep -o 'b.*m'\nbat cod m\n$ echo 'car bat cod map scat dot abacus' | grep -o 'b.*m*'\nbat cod map scat dot abacus\n```\n\n* `+` will try to match 1 or more times\n* Another meta character that differs in syntax between BRE/ERE\n\n```bash\n$ echo 'abbbc' | grep -o 'b\\+'\nbbb\n$ echo 'abbbc' | grep -oE 'b+'\nbbb\n\n$ echo 'abc ac adc abbc bbb bc' | grep -oE 'ab+c'\nabc\nabbc\n$ echo 'abc ac adc abbc bbb bc' | grep -o 'ab*c'\nabc\nac\nabbc\n```\n\n* For more precise control on number of times to match, `{}` is useful\n    * use `\\{\\}` for BRE\n* It can take one of four forms, `{m,n}`, `{,n}`, `{m,}` and `{n}`\n\n```bash\n$ # {m,n} - m to n, including both m and n\n$ echo 'ac abc abbc abbbc' | grep -Eo 'ab{1,2}c'\nabc\nabbc\n\n$ # {,n} - 0 to n times\n$ echo 'ac abc abbc abbbc' | grep -Eo 'ab{,2}c'\nac\nabc\nabbc\n\n$ # {m,} - at least m times\n$ echo 'ac abc abbc abbbc' | grep -Eo 'ab{2,}c'\nabbc\nabbbc\n\n$ # {n} - exactly n times\n$ echo 'ac abc abbc abbbc' | grep -Eo 'ab{2}c'\nabbc\n```\n\n<br>\n\n#### <a name=\"character-classes\"></a>Character classes\n\n* The meta character pairs `[]` allow to match any of the multiple characters within `[]`\n* Meta characters like `^`, `$` have different meaning inside and outside of `[]`\n* Simple example first, matching any of the characters within `[]`\n\n```bash\n$ echo 'do so in to no on' | grep -ow '[nt]o'\nto\nno\n\n$ echo 'do so in to no on' | grep -ow '[sot][on]'\nso\nto\non\n```\n\n* Adding a quantifier\n* Check out [unix words](https://en.wikipedia.org/wiki/Words_(Unix)) and [sample words file](https://users.cs.duke.edu/~ola/ap/linuxwords)\n\n```bash\n$ # words made up of letters o and n, at least 2 letters\n$ grep -xE '[on]{2,}' /usr/share/dict/words\nno\nnon\nnoon\non\n\n$ # lines containing only digits\n$ printf 'cat\\nfoo\\n123\\nbaz\\n42\\n' | grep -xE '[0123456789]+'\n123\n42\n```\n\n* Character ranges\n* Matching any alphabet, number, hexadecimal number etc becomes cumbersome if every 
character has to be individually specified\n* So, there's a shortcut, using `-` to construct a range (has to be specified in ascending order)\n* See [ascii codes table](https://ascii.cl/) for reference\n    * Note that behavior of range will differ for other character encodings\n    * See **Character Classes and Bracket Expressions** as well as **LC_COLLATE under Environment Variables** sections in `info grep` for more detail\n* [Matching Numeric Ranges with a Regular Expression](https://www.regular-expressions.info/numericranges.html)\n\n```bash\n$ printf 'cat\\nfoo\\n123\\nbaz\\n42\\n' | grep -xE '[0-9]+'\n123\n42\n\n$ printf 'cat\\nfoo\\n123\\nbaz\\n42\\n' | grep -xiE '[a-z]+'\ncat\nfoo\nbaz\n\n$ # only valid decimal numbers\n$ printf '128\\n34\\nfe32\\nfoo1\\nbar\\n' | grep -xE '[0-9]+'\n128\n34\n\n$ # only valid octal numbers\n$ printf '128\\n34\\nfe32\\nfoo1\\nbar\\n' | grep -xE '[0-7]+'\n34\n\n$ # only valid hexadecimal numbers\n$ printf '128\\n34\\nfe32\\nfoo1\\nbar\\n' | grep -xiE '[0-9a-f]+'\n128\n34\nfe32\n\n$ # numbers between 10-29\n$ echo '23 54 12 92' | grep -owE '[12][0-9]'\n23\n12\n```\n\n* Negating character class\n* By using `^` as first character inside `[]`, we get inverted character class\n    * As pointed out earlier, some meta characters behave differently inside and outside of `[]`\n\n```bash\n$ # alphabetic words not starting with c\n$ echo '123 core not sink code finish' | grep -owE '[^c][a-z]+'\nnot\nsink\nfinish\n\n$ # excluding numbers 2,3,4,9\n$ # note that 200a 200; etc will also match, usage depends on knowing input\n$ echo '2001 2004 2005 2008 2009' | grep -ow '200[^2-49]'\n2001\n2005\n2008\n\n$ # get characters from start of line upto(not including) known identifier\n$ echo 'foo=bar; baz=123' | grep -oE '^[^=]+'\nfoo\n\n$ # get characters at end of line from(not including) known identifier\n$ echo 'foo=bar; baz=123' | grep -oE '[^=]+$'\n123\n\n$ # get all sequence of characters surrounded by unique identifier\n$ echo 'I like 
\"mango\" and \"guava\"' | grep -oE '\"[^\"]+\"'\n\"mango\"\n\"guava\"\n```\n\n* Matching meta characters inside `[]`\n* Most meta characters like `( ) . + { } | $` don't have special meaning inside `[]` and hence do not require special treatment\n* Some combination like `[.` or `=]` cannot be used in this order, as they have special meaning within `[]`\n    * See **Character Classes and Bracket Expressions** section in `info grep` for more detail\n\n```bash\n$ # to match - it should be first or last character within []\n$ echo 'Foo-bar 123-456 42 Co-operate' | grep -oiwE '[a-z-]+'\nFoo-bar\nCo-operate\n\n$ # to match ] it should be first character within []\n$ printf 'int a[5]\\nfoo=bar\\n' | grep '[]=]'\nint a[5]\nfoo=bar\n\n$ # to match [ use [ anywhere in the character list\n$ # [][] will match both [ and ]\n$ printf 'int a[5]\\nfoo=bar\\n' | grep '[[]'\nint a[5]\n\n$ # to match ^ it should be other than first in the list\n$ echo '(a+b)^2 = a^2 + b^2 + 2ab' | grep -owE '[a-z^0-9]{3,}'\na^2\nb^2\n2ab\n```\n\n* Named character classes\n* Equivalent class shown is for C locale and ASCII character encoding\n    * See [ascii codes table](https://ascii.cl/) for reference\n* See **Character Classes and Bracket Expressions** section in `info grep` for more detail\n\n| Character classes | Description |\n| ------------- | ----------- |\n| `[:digit:]` | Same as `[0-9]` |\n| `[:lower:]` | Same as `[a-z]` |\n| `[:upper:]` | Same as `[A-Z]` |\n| `[:alpha:]` | Same as `[a-zA-Z]` |\n| `[:alnum:]` | Same as `[0-9a-zA-Z]` |\n| `[:xdigit:]` | Same as `[0-9a-fA-F]` |\n| `[:cntrl:]` | Control characters - first 32 ASCII characters and 127th (DEL) |\n| `[:punct:]` | All the punctuation characters |\n| `[:graph:]` | `[:alnum:]` and `[:punct:]` |\n| `[:print:]` | `[:alnum:]`, `[:punct:]` and space |\n| `[:blank:]` | Space and tab characters |\n| `[:space:]` | white-space characters: tab, newline, vertical tab, form feed, carriage return and space |\n\n```bash\n$ printf 
'128\\n34\\nAB32\\nFoo\\nbar\\n' | grep -x '[[:alnum:]]*'\n128\n34\nAB32\nFoo\nbar\n\n$ printf '128\\n34\\nAB32\\nFoo\\nbar\\n' | grep -x '[[:lower:]]*'\nbar\n\n$ printf '128\\n34\\nAB32\\nFoo\\nbar\\n' | grep -x '[[:lower:]0-9]*'\n128\n34\nbar\n```\n\n* backslash character classes\n\n| Character classes | Description |\n| ------------- | ----------- |\n| `\\w` | Same as `[0-9a-zA-Z_]` or `[[:alnum:]_]` |\n| `\\W` | Same as `[^0-9a-zA-Z_]` or `[^[:alnum:]_]` |\n| `\\s` | Same as `[[:space:]]` |\n| `\\S` | Same as `[^[:space:]]` |\n\n```bash\n$ printf '123\\n$#\\ncmp_str\\nFoo_bar\\n' | grep -x '\\w*'\n123\ncmp_str\nFoo_bar\n$ printf '123\\n$#\\ncmp_str\\nFoo_bar\\n' | grep -x '[[:alnum:]_]*'\n123\ncmp_str\nFoo_bar\n\n$ printf '123\\n$#\\ncmp_str\\nFoo_bar\\n' | grep -x '\\W*'\n$#\n$ printf '123\\n$#\\ncmp_str\\nFoo_bar\\n' | grep -x '[^[:alnum:]_]*'\n$#\n```\n\n<br>\n\n#### <a name=\"grouping\"></a>Grouping\n\n* Character classes allow matching against a choice of multiple character list and then quantifier added if needed\n* One of the uses of grouping is analogous to character classes for whole regular expressions, instead of just list of characters\n* The meta characters `()` are used for grouping\n    * requires `\\(\\)` for BRE\n* Similar to `a(b+c)d = abd+acd` in maths, you get `a(b|c)d = abd|acd` in regular expressions\n\n```bash\n$ # 5 letter words starting with c and ending with ty or ly\n$ grep -xE 'c..(ty|ly)' /usr/share/dict/words\ncatty\ncoyly\ncurly\n\n$ # 7 letter words starting with e and ending with rged or sted\n$ grep -xE 'e..(rg|st)ed' /usr/share/dict/words\nemerged\nexisted\n\n$ # repeat a pattern 3 times\n$ grep -xE '([a-d][r-z]){3}' /usr/share/dict/words\navatar\nawards\ncravat\n\n$ # nesting of () is allowed\n$ grep -E '([as](p|c)[r-t]){2}' /usr/share/dict/words\nscraps\n\n$ # can be used to match specific columns in well defined tables\n$ echo 'foo:123:bar:baz' | grep -E '^([^:]+:){2}bar'\nfoo:123:bar:baz\n```\n\n* See also [stackoverflow - 
matching character exactly n times in a line](https://stackoverflow.com/questions/40187643/grep-search-with-regex)\n\n<br>\n\n#### <a name=\"back-reference\"></a>Back reference\n\n* The matched string within `()` can also be used to be matched again by back referencing the captured groups\n* `\\1` denotes the first matched group, `\\2` the second one and so on\n    * Order is leftmost `(` is `\\1`, next one is `\\2` and so on\n* Note that the matched string, not the regular expression itself is referenced\n    * for ex: if `([0-9][a-f])` matches `3b`, then back referencing will be `3b` not any other valid match of the regular expression like `8f`, `0a` etc\n    * Other regular expressions like PCRE do allow referencing the regular expression itself\n\n```bash\n$ # note how first three and last three letters are same\n$ grep -xE '([a-d]..)\\1' /usr/share/dict/words\nbonbon\ncancan\nchichi\n$ # note how adding quantifier is not same as back-referencing\n$ grep -m4 -xE '([a-d]..){2}' /usr/share/dict/words\nabacus\nabided\nabides\nablaze\n\n$ # words with consecutive repeated letters\n$ echo 'eel flee all pat ilk seen' | grep -iowE '[a-z]*(.)\\1[a-z]*'\neel\nflee\nall\nseen\n\n$ # 17 letter words with first and last as same letter\n$ grep -xE '(.)[a-z]{15}\\1' /usr/share/dict/words\nsemiprofessionals\ntranscendentalist\n```\n\n* Spotting repeated words\n\n```bash\n$ cat story.txt\nsinging tin in the rain\nwalking for for a cause\nhave a nice day\nday and night\n\n$ grep -wE '(\\w+)\\W+\\1' story.txt\nwalking for for a cause\n```\n\n* **Note** that there is an [issue for certain usage of back-reference and quantifier](https://debbugs.gnu.org/cgi/bugreport.cgi?bug=26864)\n\n```bash\n$ # no output\n$ grep -m5 -xiE '([a-z]*([a-z])\\2[a-z]*){2}' /usr/share/dict/words\n$ # works when nesting is unrolled\n$ grep -m5 -xiE '[a-z]*([a-z])\\1[a-z]*([a-z])\\2[a-z]*' /usr/share/dict/words\nAbbott\nAnnabelle\nAnnette\nAppaloosa\nAppleseed\n\n$ # no problem if PCRE is used instead of 
ERE\n$ grep -m5 -xiP '([a-z]*([a-z])\\2[a-z]*){2}' /usr/share/dict/words\nAbbott\nAnnabelle\nAnnette\nAppaloosa\nAppleseed\n```\n\n<br>\n\n## <a name=\"multiline-matching\"></a>Multiline matching\n\n* If input is small enough to meet memory requirements, the `-z` option comes in handy to match across multiple lines\n* Instead of newline being line separator, the ASCII NUL character is used\n    * So, multiline matching depends on whether or not input file itself contains the NUL character\n    * Usually text files won't have occasion to use the NUL character and presence of it marks it as binary file for `grep`\n\n```bash\n$ # \\0 for ASCII NUL character\n$ printf 'red\\nblue\\n\\0green\\n' | cat -e\nred$\nblue$\n^@green$\n\n$ # see --binary-files=TYPE option in info grep for binary details\n$ printf 'red\\nblue\\n\\0green\\n' | grep -a 'red'\nred\n\n$ # with -z, \\0 marks the different 'lines'\n$ printf 'red\\nblue\\n\\0green\\n' | grep -z 'red'\nred\nblue\n\n$ # if no \\0 in input, entire input read as single string\n$ printf 'red\\nblue\\ngreen\\n' | grep -z 'red'\nred\nblue\ngreen\n```\n\n* `\\n` is not defined in BRE/ERE\n    * see [unix.stackexchange - How to specify characters using hexadecimal codes](https://unix.stackexchange.com/questions/19491/how-to-specify-characters-using-hexadecimal-codes-in-grep) for a workaround\n* if some characteristics of input is known, `[[:space:]]` can be used as workaround, which matches all white-space characters\n\n```bash\n$ grep -oz 'Roses.*blue,[[:space:]]' poem.txt\nRoses are red,\nViolets are blue,\n```\n\n<br>\n\n## <a name=\"perl-compatible-regular-expressions\"></a>Perl Compatible Regular Expressions\n\n```bash\n$ # see also: https://github.com/learnbyexample/command_help\n$ man grep | sed -n '/^\\s*-P/,/^$/p'\n       -P, --perl-regexp\n              Interpret the pattern as a  Perl-compatible  regular  expression\n              (PCRE).   
This  is  highly  experimental and grep -P may warn of\n              unimplemented features.\n\n```\n\n* The man page notes that `-P` is *highly experimental*. I haven't faced any issues so far, but do keep this in mind.\n    * newer versions of `GNU grep` have fixes for some `-P` bugs, see [release notes](https://savannah.gnu.org/news/?group_id=67) for an overview of changes between versions\n* Only a few highlights are presented here\n* For more info\n    * `man pcrepattern` or [read it online](https://www.pcre.org/original/doc/html/pcrepattern.html)\n    * [perldoc - re](https://perldoc.perl.org/perlre.html) - Perl regular expression syntax, also links to other related tutorials\n    * [stackoverflow - What does this regex mean?](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean)\n\n<br>\n\n#### <a name=\"backslash-sequences\"></a>Backslash sequences\n\nSome of the backslash constructs available in PCRE, in addition to those already seen in ERE\n\n* `\\d` for `[0-9]`\n* `\\s` for `[ \\t\\r\\n\\f\\v]`\n* `\\h` for `[ \\t]`\n* `\\n` for newline character\n* `\\D`, `\\S`, `\\H`, `\\N` etc for their opposites\n\n```bash\n$ # example for [0-9] in ERE and \\d in PCRE\n$ echo 'foo=5, bar=3; x=83, y=120' | grep -oE '[0-9]+'\n5\n3\n83\n120\n$ echo 'foo=5, bar=3; x=83, y=120' | grep -oP '\\d+'\n5\n3\n83\n120\n\n$ # (?s) allows newlines to be matched as well when using . 
meta character\n$ grep -ozP '(?s)Roses.*blue,\\n' poem.txt\nRoses are red,\nViolets are blue,\n```\n\n* See **INTERNAL OPTION SETTING** in `man pcrepattern` for more info on `(?s)`, `(?m)` etc\n* [Specifying Modes Inside The Regular Expression](https://www.regular-expressions.info/modifiers.html) also has some detail on such options\n\n<br>\n\n#### <a name=\"non-greedy-matching\"></a>Non-greedy matching\n\n* Both BRE/ERE support only greedy matching quantifiers\n    * match as much as possible\n* PCRE supports non-greedy version by adding `?` after quantifiers\n    * match as minimal as possible\n* See [this Python notebook](https://nbviewer.jupyter.org/url/norvig.com/ipython/pal3.ipynb) for an interesting project on palindrome sentences\n\n```bash\n$ echo 'foo and bar and baz went shopping bytes' | grep -oi '\\w.*and'\nfoo and bar and\n\n$ echo 'foo and bar and baz went shopping bytes' | grep -oiP '\\w.*?and'\nfoo and\nbar and\n\n$ # recall that matching overall expression gets preference\n$ echo 'foo and bar and baz went shopping bytes' | grep -oi '\\w.*and baz'\nfoo and bar and baz\n$ echo 'foo and bar and baz went shopping bytes' | grep -oiP '\\w.*?and baz'\nfoo and bar and baz\n\n$ # minimal matching with single character has simple workaround\n$ echo 'A man, a plan, a canal, Panama' | grep -oi 'a.*,'\nA man, a plan, a canal,\n$ echo 'A man, a plan, a canal, Panama' | grep -oi 'a[^,]*,'\nA man,\na plan,\na canal,\n```\n\n<br>\n\n#### <a name=\"lookarounds\"></a>Lookarounds\n\n* Ability to add conditions to match before/after required pattern\n* There are four types\n    * positive lookahead `(?=`\n    * negative lookahead `(?!`\n    * positive lookbehind `(?<=`\n    * negative lookbehind `(?<!`\n* One way to remember is that **behind** uses `<` and **negative** uses `!` instead of `=`\n* When used with `-o` option, lookarounds portion won't be part of output\n\nFixed and variable length *lookbehind*\n\n```bash\n$ # extract digits preceded by single lowercase 
letter and =\n$ # this is fixed length lookbehind because length is known\n$ echo 'foo=5, bar=3; x=83, y=120' | grep -oP '(?<=\\b[a-z]=)\\d+'\n83\n120\n\n$ # error because {2,} induces variable length matching\n$ echo 'foo=5, bar=3; x=83, y=120' | grep -oP '(?<=\\b[a-z]{2,}=)\\d+'\ngrep: lookbehind assertion is not fixed length\n\n$ # use \\K for such cases\n$ echo 'foo=5, bar=3; x=83, y=120' | grep -oP '\\b[a-z]{2,}=\\K\\d+'\n5\n3\n```\n\n* Examples for lookarounds\n\n```bash\n$ # extract digits that follow =\n$ echo 'foo=5, bar=3; x=83, y=120' | grep -oP '=\\K\\d+'\n5\n3\n83\n120\n\n$ # digits that follow = and have , after\n$ echo 'foo=5, bar=3; x=83, y=120' | grep -oP '=\\K\\d+(?=,)'\n5\n83\n\n$ # extract words, but not those at start of line\n$ echo 'car bat cod map' | grep -owP '(?<!^)\\w+'\nbat\ncod\nmap\n\n$ # extract words, but not those at start of line or end of line\n$ echo 'car bat cod map' | grep -owP '(?<!^)\\w+(?!$)'\nbat\ncod\n\n$ # matching multiple search patterns in any order\n$ grep -P '(?=.*are)(?=.*s).*d' poem.txt\nRoses are red,\nAnd so are you.\n```\n\n<br>\n\n#### <a name=\"ignoring-specific-matches\"></a>Ignoring specific matches\n\n* A useful construct is `(*SKIP)(*F)` which allows unwanted matches to be discarded\n* A simple way to use it: write the regular expression to be discarded first, append `(*SKIP)(*F)`, and then add whatever is required after `|`\n* See [Excluding Unwanted Matches](https://www.rexegg.com/backtracking-control-verbs.html#skipfail) for more info\n\n```bash\n$ # all words except bat and map\n$ echo 'car bat cod map' | grep -oP '(bat|map)(*SKIP)(*F)|\\w+'\ncar\ncod\n\n$ # all words except those surrounded by double quotes\n$ echo 'I like \"mango\" and \"guava\"' | grep -oP '\"[^\"]+\"(*SKIP)(*F)|\\w+'\nI\nlike\nand\n```\n\n<br>\n\n#### <a name=\"re-using-regular-expression-pattern\"></a>Re-using regular expression pattern\n\n* `\\1`, `\\2` etc only match the exact string that was captured\n* `(?1)`, `(?2)` etc 
re-uses the regular expression itself\n\n```bash\n$ # (?1) refers to first group \\d{4}-\\d{2}-\\d{2}\n$ echo '2008-03-24 and 2012-08-12 foo' | grep -oP '(\\d{4}-\\d{2}-\\d{2})\\D+(?1)'\n2008-03-24 and 2012-08-12\n```\n\n<br>\n\n## <a name=\"gotchas-and-tips\"></a>Gotchas and Tips\n\n* Always quote the search string (unless you know what you are doing :P)\n\n```bash\n$ # spaces are special\n$ grep so are poem.txt\ngrep: are: No such file or directory\npoem.txt:And so are you.\n$ grep 'so are' poem.txt\nAnd so are you.\n\n$ # use of # indicates start of comment\n$ printf 'foo\\na#2\\nb#3\\n' | grep #2\nUsage: grep [OPTION]... PATTERN [FILE]...\nTry 'grep --help' for more information.\n$ printf 'foo\\na#2\\nb#3\\n' | grep '#2'\na#2\n```\n\n* Another common problem is unquoted search string will be open to shell's own globbing rules\n\n```bash\n$ # sample output on bash shell, might vary for different shells\n$ echo '*.txt' | grep -F *.txt\n$ echo '*.txt' | grep -F '*.txt'\n*.txt\n```\n\n* Use double quotes for variable expansion, command substitution, etc (Note: could vary based on shell used)\n* See [mywiki.wooledge Quotes](https://mywiki.wooledge.org/Quotes) for detailed discussion of quoting in `bash` shell\n\n```bash\n$ # sample output on bash shell, might vary for different shells\n$ color='blue'\n$ grep \"$color\" poem.txt\nViolets are blue,\n```\n\n* Pattern starting with `-`\n\n```bash\n$ # this issue is not specific to grep alone\n$ # the command assumes -2 is an option and hence the error\n$ echo '5*3-2=13' | grep '-2'\nUsage: grep [OPTION]... 
PATTERN [FILE]...\nTry 'grep --help' for more information.\n\n$ # workaround by using \\-\n$ echo '5*3-2=13' | grep '\\-2'\n5*3-2=13\n\n$ # or use -- to indicate no further options to process\n$ echo '5*3-2=13' | grep -- '-2'\n5*3-2=13\n\n$ # same issue with printf\n$ printf '-1+2=1\\n'\nbash: printf: -1: invalid option\nprintf: usage: printf [-v var] format [arguments]\n$ printf -- '-1+2=1\\n'\n-1+2=1\n```\n\n* Tip: Options can be specified at end of command as well, useful if option was forgotten and have to quickly add it to previous command from history\n\n```bash\n$ grep 'are' poem.txt\nRoses are red,\nViolets are blue,\nAnd so are you.\n\n$ # use previous command from history, for ex up arrow key in bash\n$ # then simply add the option at end\n$ grep 'are' poem.txt -n\n1:Roses are red,\n2:Violets are blue,\n4:And so are you.\n```\n\n* Speed boost if input file is ASCII\n* See also [unix.stackexchange - Counting the number of lines having a number > 100](https://unix.stackexchange.com/questions/312297/counting-the-number-of-lines-having-a-number-greater-than-100/312330#312330) - where `grep` is blazing fast compared to other solutions\n\n```bash\n$ time grep -xE '([a-d][r-z]){3}' /usr/share/dict/words\navatar\nawards\ncravat\n\nreal    0m0.145s\n\n$ time LC_ALL=C grep -xE '([a-d][r-z]){3}' /usr/share/dict/words\navatar\nawards\ncravat\n\nreal    0m0.011s\n```\n\n* Speed boost by using PCRE for back-references\n* might be faster when using quantifiers as well\n\n```bash\n$ time LC_ALL=C grep -xE '([a-z]..)\\1' /usr/share/dict/words\nbonbon\ncancan\nchichi\nmurmur\nmuumuu\npawpaw\npompom\ntartar\ntestes\n\nreal    0m0.174s\n$ time grep -xP '([a-z]..)\\1' /usr/share/dict/words\nbonbon\ncancan\nchichi\nmurmur\nmuumuu\npawpaw\npompom\ntartar\ntestes\n\nreal    0m0.008s\n```\n\n<br>\n\n## <a name=\"regular-expressions-reference-ere\"></a>Regular Expressions Reference (ERE)\n\n<br>\n\n#### <a name=\"anchors\"></a>Anchors\n\n* `^` match from start of line\n* `$` match 
end of line\n* `\\<` match beginning of word\n* `\\>` match end of word\n* `\\b` match edge of word\n* `\\B` match other than edge of word\n\n<br>\n\n#### <a name=\"character-quantifiers\"></a>Character Quantifiers\n\n* `.` match any single character\n* `*` match preceding character/group 0 or more times\n* `+` match preceding character/group 1 or more times\n* `?` match preceding character/group 0 or 1 times\n* `{m,n}` match preceding character/group m to n times, including m and n\n* `{m,}` match preceding character/group m or more times\n* `{,n}` match preceding character/group 0 to n times\n* `{n}` match preceding character/group exactly n times\n\n<br>\n\n#### <a name=\"character-classes-and-backslash-sequences\"></a>Character classes and backslash sequences\n\n* `[aeiou]` match any of these characters\n* `[^aeiou]` do not match any of these characters\n* `[a-z]` match any lowercase alphabet\n* `[0-9]` match any digit character\n* `\\w` match alphabets, digits and underscore character, short cut for `[a-zA-Z0-9_]`\n* `\\W` opposite of `\\w` , short cut for `[^a-zA-Z0-9_]`\n* `\\s` match white-space characters: tab, newline, vertical tab, form feed, carriage return, and space\n* `\\S` match other than white-space characters\n\n<br>\n\n#### <a name=\"pattern-groups\"></a>Pattern groups\n\n* `|` matches either of the given patterns\n* `()` patterns within `()` are grouped and treated as one pattern, useful in conjunction with `|`\n* `\\1` backreference to first grouped pattern within `()`\n* `\\2` backreference to second grouped pattern within `()` and so on\n\n<br>\n\n#### <a name=\"basic-vs-extended-regular-expressions\"></a>Basic vs Extended Regular Expressions\n\nBy default, the pattern passed to `grep` is treated as Basic Regular Expressions(BRE), which can be overridden using options like `-E` for ERE and `-P` for Perl Compatible Regular Expression(PCRE). Paraphrasing from `info grep`\n\n>In Basic Regular Expressions the meta-characters `? 
+ { | ( )` lose their special meaning, instead use the backslashed versions `\\? \\+ \\{ \\| \\( \\)`\n\n<br>\n\n## <a name=\"further-reading\"></a>Further Reading\n\n* `man grep` and `info grep`\n    * At least go through all options ;)\n    * **Usage section** in `info grep` has good examples as well\n* This chapter has also been [converted to a book](https://github.com/learnbyexample/learn_gnugrep_ripgrep) with additional examples, exercises and covers popular alternative `ripgrep`\n* A bit of history\n    * [Brian Kernighan remembers the origins of grep](https://thenewstack.io/brian-kernighan-remembers-the-origins-of-grep/)\n    * [how grep command was born](https://medium.com/@rualthanzauva/grep-was-a-private-command-of-mine-for-quite-a-while-before-i-made-it-public-ken-thompson-a40e24a5ef48)\n    * [why GNU grep is fast](https://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html)\n    * [unix.stackexchange - Difference between grep, egrep and fgrep](https://unix.stackexchange.com/questions/17949/what-is-the-difference-between-grep-egrep-and-fgrep)\n* Q&A on stackoverflow/stackexchange are good source of learning material, good for practice exercises as well\n    * [grep Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/grep?sort=votes&pageSize=15)\n    * [grep Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/grep?sort=votes&pageSize=15)\n* Learn Regular Expressions (has information on flavors other than BRE/ERE/PCRE too)\n    * [Regular Expressions Tutorial](https://www.regular-expressions.info/tutorial.html)\n    * [rexegg](https://www.rexegg.com/) - tutorials, tricks and more\n    * [regexcrossword](https://regexcrossword.com/)\n    * [stackoverflow - What does this regex mean?](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean)\n    * [online regex tester and debugger](https://regex101.com/) - by default `pcre` flavor\n* Alternatives\n    * 
[ripgrep](https://github.com/BurntSushi/ripgrep)\n    * [pcregrep](https://www.pcre.org/original/doc/html/pcregrep.html)\n    * [ag - silver searcher](https://github.com/ggreer/the_silver_searcher)\n* [unix.stackexchange - When to use grep, sed, awk, perl, etc](https://unix.stackexchange.com/questions/303044/when-to-use-grep-less-awk-sed)\n\n"
  },
  {
    "path": "gnu_sed.md",
    "content": "<br> <br> <br>\n\n---\n\n:information_source: :information_source: This chapter has been converted into a better formatted ebook: https://learnbyexample.github.io/learn_gnused/. The ebook also has content updated for newer version of the commands, includes exercises, solutions, etc.\n\nFor markdown source and links to buy pdf/epub versions, see: https://github.com/learnbyexample/learn_gnused\n\n---\n\n<br> <br> <br>\n\n# <a name=\"gnu-sed\"></a>GNU sed\n\n**Table of Contents**\n\n* [Simple search and replace](#simple-search-and-replace)\n    * [editing stdin](#editing-stdin)\n    * [editing file input](#editing-file-input)\n* [Inplace file editing](#inplace-file-editing)\n    * [With backup](#with-backup)\n    * [Without backup](#without-backup)\n    * [Multiple files](#multiple-files)\n    * [Prefix backup name](#prefix-backup-name)\n    * [Place backups in directory](#place-backups-in-directory)\n* [Line filtering options](#line-filtering-options)\n    * [Print command](#print-command)\n    * [Delete command](#delete-command)\n    * [Quit commands](#quit-commands)\n    * [Negating REGEXP address](#negating-regexp-address)\n    * [Combining multiple REGEXP](#combining-multiple-regexp)\n    * [Filtering by line number](#filtering-by-line-number)\n    * [Print only line number](#print-only-line-number)\n    * [Address range](#address-range)\n    * [Relative addressing](#relative-addressing)\n* [Using different delimiter for REGEXP](#using-different-delimiter-for-regexp)\n* [Regular Expressions](#regular-expressions)\n    * [Line Anchors](#line-anchors)\n    * [Word Anchors](#word-anchors)\n    * [Matching the meta characters](#matching-the-meta-characters)\n    * [Alternation](#alternation)\n    * [The dot meta character](#the-dot-meta-character)\n    * [Quantifiers](#quantifiers)\n    * [Character classes](#character-classes)\n    * [Escape sequences](#escape-sequences)\n    * [Grouping](#grouping)\n    * [Back reference](#back-reference)\n    * 
[Changing case](#changing-case)\n* [Substitute command modifiers](#substitute-command-modifiers)\n    * [g modifier](#g-modifier)\n    * [Replace specific occurrence](#replace-specific-occurrence)\n    * [Ignoring case](#ignoring-case)\n    * [p modifier](#p-modifier)\n    * [w modifier](#w-modifier)\n    * [e modifier](#e-modifier)\n    * [m modifier](#m-modifier)\n* [Shell substitutions](#shell-substitutions)\n    * [Variable substitution](#variable-substitution)\n    * [Command substitution](#command-substitution)\n* [z and s command line options](#z-and-s-command-line-options)\n* [change command](#change-command)\n* [insert command](#insert-command)\n* [append command](#append-command)\n* [adding contents of file](#adding-contents-of-file)\n    * [r for entire file](#r-for-entire-file)\n    * [R for line by line](#r-for-line-by-line)\n* [n and N commands](#n-and-n-commands)\n* [Control structures](#control-structures)\n    * [if then else](#if-then-else)\n    * [replacing in specific column](#replacing-in-specific-column)\n    * [overlapping substitutions](#overlapping-substitutions)\n* [Lines between two REGEXPs](#lines-between-two-regexps)\n    * [Include or Exclude matching REGEXPs](#include-or-exclude-matching-regexps)\n    * [First or Last block](#first-or-last-block)\n    * [Broken blocks](#broken-blocks)\n* [sed scripts](#sed-scripts)\n* [Gotchas and Tips](#gotchas-and-tips)\n* [Further Reading](#further-reading)\n\n<br>\n\n```bash\n$ sed --version | head -n1\nsed (GNU sed) 4.2.2\n\n$ man sed\nSED(1)                           User Commands                          SED(1)\n\nNAME\n       sed - stream editor for filtering and transforming text\n\nSYNOPSIS\n       sed [OPTION]... {script-only-if-no-other-script} [input-file]...\n\nDESCRIPTION\n       Sed  is a stream editor.  
A stream editor is used to perform basic text\n       transformations on an input stream (a file or input from  a  pipeline).\n       While  in  some  ways similar to an editor which permits scripted edits\n       (such as ed), sed works by making only one pass over the input(s),  and\n       is consequently more efficient.  But it is sed's ability to filter text\n       in a pipeline which particularly distinguishes it from other  types  of\n       editors.\n...\n```\n\n**Note:** [Multiline and manipulating pattern space](https://www.gnu.org/software/sed/manual/sed.html#Multiline-techniques) with h,x,D,G,H,P etc is not covered in this chapter and examples/information is based on ASCII encoded text input only\n\n<br>\n\n## <a name=\"simple-search-and-replace\"></a>Simple search and replace\n\nDetailed examples for **substitute** command will be covered in later sections, syntax is\n\n```\ns/REGEXP/REPLACEMENT/FLAGS\n```\n\nThe `/` character is idiomatically used as delimiter character. See also [Using different delimiter for REGEXP](#using-different-delimiter-for-regexp)\n\n<br>\n\n#### <a name=\"editing-stdin\"></a>editing stdin\n\n```bash\n$ # sample command output to be edited\n$ seq 10 | paste -sd,\n1,2,3,4,5,6,7,8,9,10\n\n$ # change only first ',' to ' : '\n$ seq 10 | paste -sd, | sed 's/,/ : /'\n1 : 2,3,4,5,6,7,8,9,10\n\n$ # change all ',' to ' : ' by using 'g' modifier\n$ seq 10 | paste -sd, | sed 's/,/ : /g'\n1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : 10\n```\n\n**Note:** As a good practice, all examples use single quotes around arguments to prevent shell interpretation. 
See [Shell substitutions](#shell-substitutions) section on use of double quotes\n\n<br>\n\n#### <a name=\"editing-file-input\"></a>editing file input\n\n* By default newline character is the line separator\n* See [Regular Expressions](#regular-expressions) section for qualifying search terms, for ex\n    * word boundaries to distinguish between 'hi', 'this', 'his', 'history', etc\n    * multiple search terms, specific set of character, etc\n\n```bash\n$ cat greeting.txt\nHi there\nHave a nice day\n\n$ # change first 'e' in each line to 'E'\n$ sed 's/e/E/' greeting.txt\nHi thEre\nHavE a nice day\n\n$ # change first 'nice day' in each line to 'safe journey'\n$ sed 's/nice day/safe journey/' greeting.txt\nHi there\nHave a safe journey\n\n$ # change all 'e' to 'E' and save changed text to another file\n$ sed 's/e/E/g' greeting.txt > out.txt\n$ cat out.txt\nHi thErE\nHavE a nicE day\n```\n\n<br>\n\n## <a name=\"inplace-file-editing\"></a>Inplace file editing\n\n* In previous section, the output from `sed` was displayed on stdout or saved to another file\n* To write the changes back to original file, use `-i` option\n\n**Note**:\n\n* Refer to `man sed` for details of how to use the `-i` option. It varies with different `sed` implementations. 
As mentioned at start of this chapter, `sed (GNU sed) 4.2.2` is being used here\n* See also [unix.stackexchange - working with symlinks](https://unix.stackexchange.com/questions/348693/sed-update-etc-grub-conf-in-spite-this-link-file)\n\n<br>\n\n#### <a name=\"with-backup\"></a>With backup\n\n* When extension is given, the original input file is preserved with name changed according to extension provided\n\n```bash\n$ # '.bkp' is extension provided\n$ sed -i.bkp 's/Hi/Hello/' greeting.txt\n$ # output from sed is written back to 'greeting.txt'\n$ cat greeting.txt\nHello there\nHave a nice day\n\n$ # original file is preserved in 'greeting.txt.bkp'\n$ cat greeting.txt.bkp\nHi there\nHave a nice day\n```\n\n<br>\n\n#### <a name=\"without-backup\"></a>Without backup\n\n* Use this option with caution, changes made cannot be undone\n\n```bash\n$ sed -i 's/nice day/safe journey/' greeting.txt\n\n$ # note, 'Hi' was already changed to 'Hello' in previous example\n$ cat greeting.txt\nHello there\nHave a safe journey\n```\n\n<br>\n\n#### <a name=\"multiple-files\"></a>Multiple files\n\n* Multiple input files are treated individually and changes are written back to respective files\n\n```bash\n$ cat f1\nI ate 3 apples\n$ cat f2\nI bought two bananas and 3 mangoes\n\n$ # -i can be used with or without backup\n$ sed -i 's/3/three/' f1 f2\n$ cat f1\nI ate three apples\n$ cat f2\nI bought two bananas and three mangoes\n```\n\n<br>\n\n#### <a name=\"prefix-backup-name\"></a>Prefix backup name\n\n* A `*` in argument given to `-i` will get expanded to input filename\n* This way, one can add prefix instead of suffix for backup\n\n```bash\n$ cat var.txt\nfoo\nbar\nbaz\n\n$ sed -i'bkp.*' 's/foo/hello/' var.txt\n$ cat var.txt\nhello\nbar\nbaz\n\n$ cat bkp.var.txt\nfoo\nbar\nbaz\n```\n\n<br>\n\n#### <a name=\"place-backups-in-directory\"></a>Place backups in directory\n\n* `*` also allows to specify an existing directory to place the backups instead of current working 
directory\n\n```bash\n$ mkdir bkp_dir\n$ sed -i'bkp_dir/*' 's/bar/hi/' var.txt\n$ cat var.txt\nhello\nhi\nbaz\n\n$ cat bkp_dir/var.txt\nhello\nbar\nbaz\n\n$ # extensions can be added as well\n$ # bkp_dir/*.bkp for suffix\n$ # bkp_dir/bkp.* for prefix\n$ # bkp_dir/bkp.*.2017 for both and so on\n```\n\n<br>\n\n## <a name=\"line-filtering-options\"></a>Line filtering options\n\n* By default, `sed` acts on entire file. Often, one needs to extract or change only specific lines based on text search, line numbers, lines between two patterns, etc\n* This filtering is much like using `grep`, `head` and `tail` commands in many ways and there are even more features\n    * Use `sed` for inplace editing, the filtered lines to be transformed etc. Not as substitute for those commands\n\n<br>\n\n#### <a name=\"print-command\"></a>Print command\n\n* It is usually used in conjunction with `-n` option\n* By default, `sed` prints every input line, including any changes made by commands like substitution\n    * printing here refers to line being part of `sed` output which may be shown on terminal, redirected to file, etc\n* Using `-n` option and `p` command together, only specific lines needed can be filtered\n* Examples below use the `/REGEXP/` addressing, other forms will be seen in sections to follow\n\n```bash\n$ cat poem.txt\nRoses are red,\nViolets are blue,\nSugar is sweet,\nAnd so are you.\n\n$ # all lines containing the string 'are'\n$ # same as: grep 'are' poem.txt\n$ sed -n '/are/p' poem.txt\nRoses are red,\nViolets are blue,\nAnd so are you.\n\n$ # all lines containing the string 'so are'\n$ # same as: grep 'so are' poem.txt\n$ sed -n '/so are/p' poem.txt\nAnd so are you.\n```\n\n* Using print and substitution together\n\n```bash\n$ # print only lines on which substitution happens\n$ sed -n 's/are/ARE/p' poem.txt\nRoses ARE red,\nViolets ARE blue,\nAnd so ARE you.\n\n$ # if line contains 'are', perform given command\n$ # print only if substitution succeeds\n$ sed -n '/are/ 
s/so/SO/p' poem.txt\nAnd SO are you.\n```\n\n* Duplicating every input line\n\n```bash\n$ # note, -n is not used and no filtering applied\n$ seq 3 | sed 'p'\n1\n1\n2\n2\n3\n3\n```\n\n<br>\n\n#### <a name=\"delete-command\"></a>Delete command\n\n* By default, `sed` prints every input line, including any changes like substitution\n* Using the `d` command, those specific lines will NOT be printed\n\n```bash\n$ # same as: grep -v 'are' poem.txt\n$ sed '/are/d' poem.txt\nSugar is sweet,\n\n$ # same as: seq 5 | grep -v '3'\n$ seq 5 | sed '/3/d'\n1\n2\n4\n5\n```\n\n* Modifier `I` allows to filter lines in case-insensitive way\n* See [Regular Expressions](#regular-expressions) section for more details\n\n```bash\n$ # /rose/I means match the string 'rose' irrespective of case\n$ sed '/rose/Id' poem.txt\nViolets are blue,\nSugar is sweet,\nAnd so are you.\n```\n\n<br>\n\n#### <a name=\"quit-commands\"></a>Quit commands\n\n* Exit `sed` without processing further input\n\n```bash\n$ # same as: seq 23 45 | head -n5\n$ # remember that printing is default action if -n is not used\n$ # here, 5 is line number based addressing\n$ seq 23 45 | sed '5q'\n23\n24\n25\n26\n27\n```\n\n* `Q` is similar to `q` but won't print the matching line\n\n```bash\n$ seq 23 45 | sed '5Q'\n23\n24\n25\n26\n\n$ # useful to print from beginning of file up to but not including line matching REGEXP\n$ sed '/is/Q' poem.txt\nRoses are red,\nViolets are blue,\n```\n\n* Use `tac` to get all lines starting from last occurrence of search string\n\n```bash\n$ # all lines from last occurrence of '7'\n$ seq 50 | tac | sed '/7/q' | tac\n47\n48\n49\n50\n\n$ # all lines from last occurrence of '7' excluding line with '7'\n$ seq 50 | tac | sed '/7/Q' | tac\n48\n49\n50\n```\n\n**Note**\n\n* This way of using quit commands won't work for inplace editing with multiple file input\n* See also [unix.stackexchange - applying changes to multiple 
files](https://unix.stackexchange.com/questions/309514/sed-apply-changes-in-multiple-files)\n\n<br>\n\n#### <a name=\"negating-regexp-address\"></a>Negating REGEXP address\n\n* Use `!` to invert the specified address\n\n```bash\n$ # same as: sed -n '/so are/p' poem.txt\n$ sed '/so are/!d' poem.txt\nAnd so are you.\n\n$ # same as: sed '/are/d' poem.txt\n$ sed -n '/are/!p' poem.txt\nSugar is sweet,\n```\n\n<br>\n\n#### <a name=\"combining-multiple-regexp\"></a>Combining multiple REGEXP\n\n* See also [sed manual - Multiple commands syntax](https://www.gnu.org/software/sed/manual/sed.html#Multiple-commands-syntax) for more details\n* See also [sed scripts](#sed-scripts) section for an alternate way\n\n```bash\n$ # each command as argument to -e option\n$ sed -n -e '/blue/p' -e '/you/p' poem.txt\nViolets are blue,\nAnd so are you.\n\n$ # each command separated by ;\n$ # not all commands can be specified so\n$ sed -n '/blue/p; /you/p' poem.txt\nViolets are blue,\nAnd so are you.\n\n$ # each command separated by literal newline character\n$ # might depend on whether the shell allows such multiline command\n$ sed -n '\n/blue/p\n/you/p\n' poem.txt\nViolets are blue,\nAnd so are you.\n```\n\n* Use `{}` command grouping for logical AND\n\n```bash\n$ # same as: grep 'are' poem.txt | grep 'And'\n$ # space between /REGEXP/ and {} is optional\n$ sed -n '/are/ {/And/p}' poem.txt\nAnd so are you.\n\n$ # same as: grep 'are' poem.txt | grep -v 'so'\n$ sed -n '/are/ {/so/!p}' poem.txt\nRoses are red,\nViolets are blue,\n\n$ # same as: grep -v 'red' poem.txt | grep -v 'blue'\n$ sed -n '/red/!{/blue/!p}' poem.txt\nSugar is sweet,\nAnd so are you.\n$ # many ways to do it, use whatever feels easier to construct\n$ # sed -e '/red/d' -e '/blue/d' poem.txt\n$ # grep -v -e 'red' -e 'blue' poem.txt\n```\n\n* Different ways to do same things. 
See also [Alternation](#alternation) and [Control structures](#control-structures)\n\n```bash\n$ # multiple commands can lead to duplication\n$ sed -n '/blue/p; /t/p' poem.txt\nViolets are blue,\nViolets are blue,\nSugar is sweet,\n$ # in such cases, use regular expressions instead\n$ sed -nE '/blue|t/p;' poem.txt\nViolets are blue,\nSugar is sweet,\n\n$ sed -nE '/red|blue/!p' poem.txt\nSugar is sweet,\nAnd so are you.\n\n$ sed -n '/so/b; /are/p' poem.txt\nRoses are red,\nViolets are blue,\n```\n\n<br>\n\n#### <a name=\"filtering-by-line-number\"></a>Filtering by line number\n\n* Exact line number can be specified to be acted upon\n* As a special case, `$` indicates last line of file\n* See also [sed manual - Multiple commands syntax](https://www.gnu.org/software/sed/manual/sed.html#Multiple-commands-syntax)\n\n```bash\n$ # here, 2 represents the address for print command, similar to /REGEXP/p\n$ # same as: head -n2 poem.txt | tail -n1\n$ sed -n '2p' poem.txt\nViolets are blue,\n\n$ # print 2nd and 4th line\n$ sed -n '2p; 4p' poem.txt\nViolets are blue,\nAnd so are you.\n\n$ # same as: tail -n1 poem.txt\n$ sed -n '$p' poem.txt\nAnd so are you.\n\n$ # delete except 3rd line\n$ sed '3!d' poem.txt\nSugar is sweet,\n\n$ # substitution only on 2nd line\n$ sed '2 s/are/ARE/' poem.txt\nRoses are red,\nViolets ARE blue,\nSugar is sweet,\nAnd so are you.\n```\n\n* For large input files, combine `p` with `q` for speedy exit\n* `sed` would immediately quit without processing further input lines when `q` is used\n\n```bash\n$ seq 3542 4623452 | sed -n '2452{p;q}'\n5993\n\n$ seq 3542 4623452 | sed -n '250p; 2452{p;q}'\n3791\n5993\n\n$ # here is a sample time comparison\n$ time seq 3542 4623452 | sed -n '2452{p;q}' > /dev/null\n\nreal    0m0.003s\nuser    0m0.000s\nsys     0m0.000s\n$ time seq 3542 4623452 | sed -n '2452p' > /dev/null\n\nreal    0m0.334s\nuser    0m0.396s\nsys     0m0.024s\n```\n\n* mimicking `head` command using `q`\n\n```bash\n$ # same as: seq 23 45 | head 
-n5\n$ # remember that printing is default action if -n is not used\n$ seq 23 45 | sed '5q'\n23\n24\n25\n26\n27\n```\n\n<br>\n\n#### <a name=\"print-only-line-number\"></a>Print only line number\n\n```bash\n$ # gives both line number and matching line\n$ grep -n 'blue' poem.txt\n2:Violets are blue,\n\n$ # gives only line number of matching line\n$ sed -n '/blue/=' poem.txt\n2\n\n$ sed -n '/are/=' poem.txt\n1\n2\n4\n```\n\n* If needed, matching line can also be printed. But there will be newline separation\n\n```bash\n$ sed -n '/blue/{=;p}' poem.txt\n2\nViolets are blue,\n\n$ # or\n$ sed -n '/blue/{p;=}' poem.txt\nViolets are blue,\n2\n```\n\n<br>\n\n#### <a name=\"address-range\"></a>Address range\n\n* So far, we've seen how to filter specific line based on *REGEXP* and line numbers\n* `sed` also allows to combine them to enable selecting a range of lines\n* Consider the sample input file for this section\n\n```bash\n$ cat addr_range.txt\nHello World\n\nGood day\nHow are you\n\nJust do-it\nBelieve it\n\nToday is sunny\nNot a bit funny\nNo doubt you like it too\n\nMuch ado about nothing\nHe he he\n```\n\n* Range defined by start and end *REGEXP*\n* For other cases like getting lines without the line matching start and/or end, unbalanced start/end, when end *REGEXP* doesn't match, etc see [Lines between two REGEXPs](#lines-between-two-regexps) section\n\n```bash\n$ sed -n '/is/,/like/p' addr_range.txt\nToday is sunny\nNot a bit funny\nNo doubt you like it too\n\n$ sed -n '/just/I,/believe/Ip' addr_range.txt\nJust do-it\nBelieve it\n\n$ # the second REGEXP will always be checked after the line matching first address\n$ sed -n '/No/,/No/p' addr_range.txt\nNot a bit funny\nNo doubt you like it too\n\n$ # all the matching ranges will be printed\n$ sed -n '/you/,/do/p' addr_range.txt\nHow are you\n\nJust do-it\nNo doubt you like it too\n\nMuch ado about nothing\n```\n\n* Range defined by start and end line numbers\n\n```bash\n$ # print lines numbered 3 to 7\n$ sed -n 
'3,7p' addr_range.txt\nGood day\nHow are you\n\nJust do-it\nBelieve it\n\n$ # print lines from line number 13 to last line\n$ sed -n '13,$p' addr_range.txt\nMuch ado about nothing\nHe he he\n\n$ # delete lines numbered 2 to 13\n$ sed '2,13d' addr_range.txt\nHello World\nHe he he\n```\n\n* Range defined by mix of line number and *REGEXP*\n\n```bash\n$ sed -n '3,/do/p' addr_range.txt\nGood day\nHow are you\n\nJust do-it\n\n$ sed -n '/Today/,$p' addr_range.txt\nToday is sunny\nNot a bit funny\nNo doubt you like it too\n\nMuch ado about nothing\nHe he he\n```\n\n* Negating address range, just add `!` to end of address range\n\n```bash\n$ # same as: seq 10 | sed '3,7d'\n$ seq 10 | sed -n '3,7!p'\n1\n2\n8\n9\n10\n\n$ # same as: sed '/Today/,$d' addr_range.txt\n$ sed -n '/Today/,$!p' addr_range.txt\nHello World\n\nGood day\nHow are you\n\nJust do-it\nBelieve it\n\n```\n\n<br>\n\n#### <a name=\"relative-addressing\"></a>Relative addressing\n\n* Prefixing `+` to a number for second address gives relative filtering\n* Similar to using `grep -A<num> --no-group-separator 'REGEXP'` but `grep` merges adjacent groups while `sed` does not\n\n```bash\n$ # line matching 'is' and 2 lines after\n$ sed -n '/is/,+2p' addr_range.txt\nToday is sunny\nNot a bit funny\nNo doubt you like it too\n\n$ # note that all matching ranges will be filtered\n$ sed -n '/do/,+2p' addr_range.txt\nJust do-it\nBelieve it\n\nNo doubt you like it too\n\nMuch ado about nothing\n```\n\n* The first address could be number too\n* Useful when using [Shell substitutions](#shell-substitutions)\n\n```bash\n$ sed -n '3,+4p' addr_range.txt\nGood day\nHow are you\n\nJust do-it\nBelieve it\n```\n\n* Another relative format is `i~j` which acts on ith line and i+j, i+2j, i+3j, etc\n    * `1~2` means 1st, 3rd, 5th, 7th, etc (i.e odd numbered lines)\n    * `5~3` means 5th, 8th, 11th, etc\n\n```bash\n$ # match odd numbered lines\n$ # for even, use 2~2\n$ seq 10 | sed -n '1~2p'\n1\n3\n5\n7\n9\n\n$ # match line numbers: 2, 
2+1*4, 2+2*4, etc\n$ seq 10 | sed -n '2~4p'\n2\n6\n10\n```\n\n* If `~j` is specified after `,` then meaning changes completely\n* After the matching line based on number or *REGEXP* of start address, the closest line number multiple of `j` will mark end address\n\n```bash\n$ # 2nd line is start address\n$ # closest multiple of 4 is 4th line\n$ seq 10 | sed -n '2,~4p'\n2\n3\n4\n$ # closest multiple of 4 is 8th line\n$ seq 10 | sed -n '5,~4p'\n5\n6\n7\n8\n\n$ # line matching on `Just` is 6th line, so ending is 10th line\n$ sed -n '/Just/,~5p' addr_range.txt\nJust do-it\nBelieve it\n\nToday is sunny\nNot a bit funny\n```\n\n<br>\n\n## <a name=\"using-different-delimiter-for-regexp\"></a>Using different delimiter for REGEXP\n\n* `/` is idiomatically used as the *REGEXP* delimiter\n    * See also [a bit of history on why / is commonly used as delimiter](https://www.reddit.com/r/commandline/comments/3lhgwh/why_did_people_standardize_on_using_forward/cvgie7j/)\n* But any character other than `\\` and newline character can be used instead\n* This helps to avoid/reduce use of `\\`\n\n```bash\n$ # instead of this\n$ echo '/home/learnbyexample/reports' | sed 's/\\/home\\/learnbyexample\\//~\\//'\n~/reports\n\n$ # use a different delimiter\n$ echo '/home/learnbyexample/reports' | sed 's#/home/learnbyexample/#~/#'\n~/reports\n```\n\n* For *REGEXP* used in address matching, syntax is a bit different `\\<char>REGEXP<char>`\n\n```bash\n$ printf '/foo/bar/1\\n/foo/baz/1\\n'\n/foo/bar/1\n/foo/baz/1\n\n$ printf '/foo/bar/1\\n/foo/baz/1\\n' | sed -n '\\;/foo/bar/;p'\n/foo/bar/1\n```\n\n<br>\n\n## <a name=\"regular-expressions\"></a>Regular Expressions\n\n* By default, `sed` treats *REGEXP* as BRE (Basic Regular Expression)\n* The `-E` option enables ERE (Extended Regular Expression) which in GNU sed's case only differs in how meta characters are used, no difference in functionalities\n    * Initially GNU sed only had `-r` option to enable ERE and `man sed` doesn't even mention `-E`\n  
  * Other `sed` versions use `-E` and `grep` uses `-E` as well. So `-r` won't be used in examples in this tutorial\n    * See also [sed manual - BRE-vs-ERE](https://www.gnu.org/software/sed/manual/sed.html#BRE-vs-ERE)\n* See [sed manual - Regular Expressions](https://www.gnu.org/software/sed/manual/sed.html#sed-regular-expressions) for more details\n\n<br>\n\n#### <a name=\"line-anchors\"></a>Line Anchors\n\n* Often, search must match from beginning of line or towards end of line\n* For example, an integer variable declaration in `C` will start with optional white-space, the keyword `int`, white-space and then variable(s)\n    * This way one can avoid matching declarations inside single line comments as well\n* Similarly, one might want to match a variable at end of statement\n\nConsider the input file and sample substitution without using any anchoring\n\n```bash\n$ cat anchors.txt\ncat and dog\ntoo many cats around here\nto concatenate, use the cmd cat\ncatapults laid waste to the village\njust scat and quit bothering me\nthat is quite a fabricated tale\ntry the grape variety muscat\n\n$ # without anchors, substitution will replace wherever the string is found\n$ sed 's/cat/XXX/g' anchors.txt\nXXX and dog\ntoo many XXXs around here\nto conXXXenate, use the cmd XXX\nXXXapults laid waste to the village\njust sXXX and quit bothering me\nthat is quite a fabriXXXed tale\ntry the grape variety musXXX\n```\n\n* The meta character `^` forces *REGEXP* to match only at start of line\n\n```bash\n$ # filtering lines starting with 'cat'\n$ sed -n '/^cat/p' anchors.txt\ncat and dog\ncatapults laid waste to the village\n\n$ # replace only at start of line\n$ # g modifier not needed as there can only be single match at start of line\n$ sed 's/^cat/XXX/' anchors.txt\nXXX and dog\ntoo many cats around here\nto concatenate, use the cmd cat\nXXXapults laid waste to the village\njust scat and quit bothering me\nthat is quite a fabricated tale\ntry the grape variety muscat\n\n$ # add 
something to start of line\n$ echo 'Have a good day' | sed 's/^/Hi! /'\nHi! Have a good day\n```\n\n* The meta character `$` forces *REGEXP* to match only at end of line\n\n```bash\n$ # filtering lines ending with 'cat'\n$ sed -n '/cat$/p' anchors.txt\nto concatenate, use the cmd cat\ntry the grape variety muscat\n\n$ # replace only at end of line\n$ sed 's/cat$/YYY/' anchors.txt\ncat and dog\ntoo many cats around here\nto concatenate, use the cmd YYY\ncatapults laid waste to the village\njust scat and quit bothering me\nthat is quite a fabricated tale\ntry the grape variety musYYY\n\n$ # add something to end of line\n$ echo 'Have a good day' | sed 's/$/. Cya later/'\nHave a good day. Cya later\n```\n\n<br>\n\n#### <a name=\"word-anchors\"></a>Word Anchors\n\n* A **word** character is any alphabet (irrespective of case) or any digit or the underscore character\n* The word anchors help in matching or not matching boundaries of a word\n    * For example, to distinguish between `par`, `spar` and `apparent`\n* `\\b` matches word boundary\n    * `\\` is meta character and certain combinations like `\\b` and `\\B` have special meaning\n\n```bash\n$ # words ending with 'cat'\n$ sed -n 's/cat\\b/XXX/p' anchors.txt\nXXX and dog\nto concatenate, use the cmd XXX\njust sXXX and quit bothering me\ntry the grape variety musXXX\n\n$ # words starting with 'cat'\n$ sed -n 's/\\bcat/YYY/p' anchors.txt\nYYY and dog\ntoo many YYYs around here\nto concatenate, use the cmd YYY\nYYYapults laid waste to the village\n\n$ # only whole words\n$ sed -n 's/\\bcat\\b/ZZZ/p' anchors.txt\nZZZ and dog\nto concatenate, use the cmd ZZZ\n\n$ # word is made up of alphabets, numbers and _\n$ echo 'foo, foo_bar and foo1' | sed 's/\\bfoo\\b/baz/g'\nbaz, foo_bar and foo1\n```\n\n* `\\B` is opposite of `\\b`, i.e it doesn't match word boundaries\n\n```bash\n$ # substitute only if 'cat' is surrounded by word characters\n$ sed -n 's/\\Bcat\\B/QQQ/p' anchors.txt\nto conQQQenate, use the cmd cat\nthat is quite 
a fabriQQQed tale\n\n$ # substitute only if 'cat' is not start of word\n$ sed -n 's/\\Bcat/RRR/p' anchors.txt\nto conRRRenate, use the cmd cat\njust sRRR and quit bothering me\nthat is quite a fabriRRRed tale\ntry the grape variety musRRR\n\n$ # substitute only if 'cat' is not end of word\n$ sed -n 's/cat\\B/SSS/p' anchors.txt\ntoo many SSSs around here\nto conSSSenate, use the cmd cat\nSSSapults laid waste to the village\nthat is quite a fabriSSSed tale\n```\n\n* One can also use these alternatives for `\\b`\n    * `\\<` for start of word\n    * `\\>` for end of word\n\n```bash\n$ # same as: sed 's/\\bcat\\b/X/g'\n$ echo 'concatenate cat scat cater' | sed 's/\\<cat\\>/X/g'\nconcatenate X scat cater\n\n$ # add something to both start/end of word\n$ echo 'hi foo_baz 3b' | sed 's/\\b/:/g'\n:hi: :foo_baz: :3b:\n\n$ # add something only at start of word\n$ echo 'hi foo_baz 3b' | sed 's/\\</:/g'\n:hi :foo_baz :3b\n\n$ # add something only at end of word\n$ echo 'hi foo_baz 3b' | sed 's/\\>/:/g'\nhi: foo_baz: 3b:\n```\n\n<br>\n\n#### <a name=\"matching-the-meta-characters\"></a>Matching the meta characters\n\n* Since meta characters like `^`, `$`, `\\` etc have special meaning in *REGEXP*, they have to be escaped using `\\` to match them literally\n\n```bash\n$ # here, '^' will match only start of line\n$ echo '(a+b)^2 = a^2 + b^2 + 2ab' | sed 's/^/**/g'\n**(a+b)^2 = a^2 + b^2 + 2ab\n\n$ # '\\` before '^' will match '^' literally\n$ echo '(a+b)^2 = a^2 + b^2 + 2ab' | sed 's/\\^/**/g'\n(a+b)**2 = a**2 + b**2 + 2ab\n\n$ # to match '\\' use '\\\\'\n$ echo 'foo\\bar' | sed 's/\\\\/ /'\nfoo bar\n\n$ echo 'pa$$' | sed 's/$/s/g'\npa$$s\n$ echo 'pa$$' | sed 's/\\$/s/g'\npass\n\n$ # '^' has special meaning only at start of REGEXP\n$ # similarly, '$' has special meaning only at end of REGEXP\n$ echo '(a+b)^2 = a^2 + b^2 + 2ab' | sed 's/a^2/A^2/g'\n(a+b)^2 = A^2 + b^2 + 2ab\n```\n\n* Certain characters like `&` and `\\` have special meaning in *REPLACEMENT* section of substitute as 
well. They too have to be escaped using `\\`\n* And the delimiter character has to be escaped of course\n* See [back reference](#back-reference) section for use of `&` in *REPLACEMENT* section\n\n```bash\n$ # & will refer to entire matched string of REGEXP section\n$ echo 'foo and bar' | sed 's/and/\"&\"/'\nfoo \"and\" bar\n$ echo 'foo and bar' | sed 's/and/\"\\&\"/'\nfoo \"&\" bar\n\n$ # use different delimiter where required\n$ echo 'a b' | sed 's/ /\\//'\na/b\n$ echo 'a b' | sed 's# #/#'\na/b\n\n$ # use \\\\ to represent literal \\\n$ echo '/foo/bar/baz' | sed 's#/#\\\\#g'\n\\foo\\bar\\baz\n```\n\n<br>\n\n#### <a name=\"alternation\"></a>Alternation\n\n* Two or more *REGEXP* can be combined as logical OR using the `|` meta character\n    * syntax is `\\|` for BRE and `|` for ERE\n* Each side of `|` is complete regular expression with their own start/end anchors\n* How each part of alternation is handled and order of evaluation/output is beyond the scope of this tutorial\n    * See [this](https://www.regular-expressions.info/alternation.html) for more info on this topic.\n\n```bash\n$ # BRE\n$ sed -n '/red\\|blue/p' poem.txt\nRoses are red,\nViolets are blue,\n\n$ # ERE\n$ sed -nE '/red|blue/p' poem.txt\nRoses are red,\nViolets are blue,\n\n$ # filter lines starting or ending with 'cat'\n$ sed -nE '/^cat|cat$/p' anchors.txt\ncat and dog\nto concatenate, use the cmd cat\ncatapults laid waste to the village\ntry the grape variety muscat\n\n$ # g modifier is needed for more than one replacement\n$ echo 'foo and temp and baz' | sed -E 's/foo|temp|baz/XYZ/'\nXYZ and temp and baz\n$ echo 'foo and temp and baz' | sed -E 's/foo|temp|baz/XYZ/g'\nXYZ and XYZ and XYZ\n```\n\n<br>\n\n#### <a name=\"the-dot-meta-character\"></a>The dot meta character\n\n* The `.` meta character matches any character once, including newline\n\n```bash\n$ # replace all sequence of 3 characters starting with 'c' and ending with 't'\n$ echo 'coat cut fit c#t' | sed 's/c.t/XYZ/g'\ncoat XYZ fit 
XYZ\n\n$ # replace all sequence of 4 characters starting with 'c' and ending with 't'\n$ echo 'coat cut fit c#t' | sed 's/c..t/ABCD/g'\nABCD cut fit c#t\n\n$ # space, tab etc are also characters which will be matched by '.'\n$ echo 'coat cut fit c#t' | sed 's/t.f/IJK/g'\ncoat cuIJKit c#t\n```\n\n<br>\n\n#### <a name=\"quantifiers\"></a>Quantifiers\n\nAll quantifiers in `sed` are greedy, i.e longest match wins as long as overall *REGEXP* is satisfied and precedence is left to right. In this section, we'll cover usage of quantifiers on characters\n\n* `?` will try to match 0 or 1 time\n* For BRE, use `\\?`\n\n```bash\n$ printf 'late\\npale\\nfactor\\nrare\\nact\\n'\nlate\npale\nfactor\nrare\nact\n\n$ # same as using: sed -nE '/at|act/p'\n$ printf 'late\\npale\\nfactor\\nrare\\nact\\n' | sed -nE '/ac?t/p'\nlate\nfactor\nact\n\n$ # greediness comes in handy in some cases\n$ # problem: '<' has to be replaced with '\\<' only if not preceded by '\\'\n$ echo 'blah \\< foo bar < blah baz <'\nblah \\< foo bar < blah baz <\n$ # this won't work as '\\<' gets replaced with '\\\\<'\n$ echo 'blah \\< foo bar < blah baz <' | sed -E 's/</\\\\</g'\nblah \\\\< foo bar \\< blah baz \\<\n$ # by using '\\\\?<' both '\\<' and '<' gets replaced by '\\<'\n$ echo 'blah \\< foo bar < blah baz <' | sed -E 's/\\\\?</\\\\</g'\nblah \\< foo bar \\< blah baz \\<\n```\n\n* `*` will try to match 0 or more times\n\n```bash\n$ printf 'abc\\nac\\nadc\\nabbc\\nbbb\\nbc\\nabbbbbc\\n'\nabc\nac\nadc\nabbc\nbbb\nbc\nabbbbbc\n\n$ # match 'a' and 'c' with any number of 'b' in between\n$ printf 'abc\\nac\\nadc\\nabbc\\nbbb\\nbc\\nabbbbbc\\n' | sed -n '/ab*c/p'\nabc\nac\nabbc\nabbbbbc\n\n$ # delete from start of line to 'te'\n$ echo 'that is quite a fabricated tale' | sed 's/.*te//'\nd tale\n$ # delete from start of line to 'te '\n$ echo 'that is quite a fabricated tale' | sed 's/.*te //'\na fabricated tale\n$ # delete from first 'f' in the line to end of line\n$ echo 'that is quite a fabricated tale' | sed 
's/f.*//'\nthat is quite a \n```\n\n* `+` will try to match 1 or more times\n* For BRE, use `\\+`\n\n```bash\n$ # match 'a' and 'c' with at least one 'b' in between\n$ # BRE\n$ printf 'abc\\nac\\nadc\\nabbc\\nbbb\\nbc\\nabbbbbc\\n' | sed -n '/ab\\+c/p'\nabc\nabbc\nabbbbbc\n\n$ # ERE\n$ printf 'abc\\nac\\nadc\\nabbc\\nbbb\\nbc\\nabbbbbc\\n' | sed -nE '/ab+c/p'\nabc\nabbc\nabbbbbc\n```\n\n* For more precise control on number of times to match, use `{}`\n\n```bash\n$ # exactly 5 times\n$ printf 'abc\\nac\\nadc\\nabbc\\nbbb\\nbc\\nabbbbbc\\n' | sed -nE '/ab{5}c/p'\nabbbbbc\n\n$ # between 1 to 3 times, inclusive of 1 and 3\n$ printf 'abc\\nac\\nadc\\nabbc\\nbbb\\nbc\\nabbbbbc\\n' | sed -nE '/ab{1,3}c/p'\nabc\nabbc\n\n$ # maximum of 2 times, including 0 times\n$ printf 'abc\\nac\\nadc\\nabbc\\nbbb\\nbc\\nabbbbbc\\n' | sed -nE '/ab{,2}c/p'\nabc\nac\nabbc\n\n$ # minimum of 2 times\n$ printf 'abc\\nac\\nadc\\nabbc\\nbbb\\nbc\\nabbbbbc\\n' | sed -nE '/ab{2,}c/p'\nabbc\nabbbbbc\n\n$ # BRE\n$ printf 'abc\\nac\\nadc\\nabbc\\nbbb\\nbc\\nabbbbbc\\n' | sed -n '/ab\\{2,\\}c/p'\nabbc\nabbbbbc\n```\n\n<br>\n\n#### <a name=\"character-classes\"></a>Character classes\n\n* The `.` meta character provides a way to match any character\n* Character class provides a way to match any character among a specified set of characters enclosed within `[]`\n\n```bash\n$ # same as: sed -nE '/lane|late/p'\n$ printf 'late\\nlane\\nfate\\nfete\\n' | sed -n '/la[nt]e/p'\nlate\nlane\n\n$ printf 'late\\nlane\\nfate\\nfete\\n' | sed -n '/[fl]a[nt]e/p'\nlate\nlane\nfate\n\n$ # quantifiers can be added similar to using for any other character\n$ # filter lines made up entirely of digits, containing at least one digit\n$ printf 'cat5\\nfoo\\n123\\n42\\n' | sed -nE '/^[0123456789]+$/p'\n123\n42\n$ # filter lines made up entirely of digits, containing at least three digits\n$ printf 'cat5\\nfoo\\n123\\n42\\n' | sed -nE '/^[0123456789]{3,}$/p'\n123\n```\n\nCharacter ranges\n\n* Matching any alphabet, number, 
hexadecimal number etc becomes cumbersome if every character has to be individually specified\n* So, there's a shortcut: use `-` to construct a range (which has to be specified in ascending order)\n* See [ascii codes table](https://ascii.cl/) for reference\n    * Note that the behavior of a range depends on locale settings\n    * [arch wiki - locale](https://wiki.archlinux.org/index.php/locale)\n    * [Linux: Define Locale and Language Settings](https://www.shellhacks.com/linux-define-locale-language-settings/)\n\n```bash\n$ # filter lines made up entirely of digits, at least one\n$ printf 'cat5\\nfoo\\n123\\n42\\n' | sed -nE '/^[0-9]+$/p'\n123\n42\n\n$ # filter lines made up entirely of lowercase letters, at least one\n$ printf 'cat5\\nfoo\\n123\\n42\\n' | sed -nE '/^[a-z]+$/p'\nfoo\n\n$ # filter lines made up entirely of lowercase letters and digits, at least one\n$ printf 'cat5\\nfoo\\n123\\n42\\n' | sed -nE '/^[a-z0-9]+$/p'\ncat5\nfoo\n123\n42\n```\n\n* Numeric ranges are easy to match in certain cases, but a regular expression is not always suitable. 
Use `awk` or `perl` for arithmetic computation\n* See also [Matching Numeric Ranges with a Regular Expression](https://www.regular-expressions.info/numericranges.html)\n\n```bash\n$ # numbers between 10 to 29\n$ printf '23\\n154\\n12\\n26\\n98234\\n' | sed -n '/^[12][0-9]$/p'\n23\n12\n26\n\n$ # numbers >= 100\n$ printf '23\\n154\\n12\\n26\\n98234\\n' | sed -nE '/^[0-9]{3,}$/p'\n154\n98234\n\n$ # numbers >= 100 if there are leading zeros\n$ printf '0501\\n035\\n154\\n12\\n26\\n98234\\n' | sed -nE '/^0*[1-9][0-9]{2,}$/p'\n0501\n154\n98234\n```\n\nNegating character class\n\n* Meta characters inside and outside of `[]` are completely different\n* For example, `^` as first character inside `[]` matches characters other than those specified inside character class\n\n```bash\n$ # delete zero or more characters before first =\n$ echo 'foo=bar; baz=123' | sed 's/^[^=]*//'\n=bar; baz=123\n\n$ # delete zero or more characters after last =\n$ echo 'foo=bar; baz=123' | sed 's/[^=]*$//'\nfoo=bar; baz=\n\n$ # same as: sed -n '/[aeiou]/!p'\n$ printf 'tryst\\nglyph\\npity\\nwhy\\n' | sed -n '/^[^aeiou]*$/p'\ntryst\nglyph\nwhy\n```\n\nMatching meta characters inside `[]`\n\n* Characters like `^`, `]`, `-`, etc need special attention to be part of list\n* Also, sequences like `[.` or `=]` have special meaning within `[]`\n    * See [sed manual - Character-Classes-and-Bracket-Expressions](https://www.gnu.org/software/sed/manual/sed.html#Character-Classes-and-Bracket-Expressions) for complete list\n\n```bash\n$ # to match - it should be first or last character within []\n$ printf 'Foo-bar\\nabc-456\\n42\\nCo-operate\\n' | sed -nE '/^[a-z-]+$/Ip'\nFoo-bar\nCo-operate\n\n$ # to match ] it should be first character within []\n$ printf 'int foo\\nint a[5]\\nfoo=bar\\n' | sed -n '/[]=]/p'\nint a[5]\nfoo=bar\n\n$ # to match [ use [ anywhere in the character list\n$ # [][] will match both [ and ]\n$ printf 'int foo\\nint a[5]\\nfoo=bar\\n' | sed -n '/[[]/p'\nint a[5]\n\n$ # to match ^ it 
should be other than first in the list\n$ printf 'c=a^b\\nd=f*h+e\\nz=x-y\\n' | sed -n '/[*^]/p'\nc=a^b\nd=f*h+e\n```\n\nNamed character classes\n\n* Equivalent class shown is for C locale and ASCII character encoding\n    * See [ascii codes table](https://ascii.cl/) for reference\n* See [sed manual - Character Classes and Bracket Expressions](https://www.gnu.org/software/sed/manual/sed.html#Character-Classes-and-Bracket-Expressions) for more details\n\n| Character classes | Description |\n| ------------- | ----------- |\n| `[:digit:]` | Same as `[0-9]` |\n| `[:lower:]` | Same as `[a-z]` |\n| `[:upper:]` | Same as `[A-Z]` |\n| `[:alpha:]` | Same as `[a-zA-Z]` |\n| `[:alnum:]` | Same as `[0-9a-zA-Z]` |\n| `[:xdigit:]` | Same as `[0-9a-fA-F]` |\n| `[:cntrl:]` | Control characters - first 32 ASCII characters and 127th (DEL) |\n| `[:punct:]` | All the punctuation characters |\n| `[:graph:]` | `[:alnum:]` and `[:punct:]` |\n| `[:print:]` | `[:alnum:]`, `[:punct:]` and space |\n| `[:blank:]` | Space and tab characters |\n| `[:space:]` | white-space characters: tab, newline, vertical tab, form feed, carriage return and space |\n\n```bash\n$ # lines containing only hexadecimal characters\n$ printf '128\\n34\\nfe32\\nfoo1\\nbar\\n' | sed -nE '/^[[:xdigit:]]+$/p'\n128\n34\nfe32\n\n$ # lines containing at least one non-hexadecimal character\n$ printf '128\\n34\\nfe32\\nfoo1\\nbar\\n' | sed -n '/[^[:xdigit:]]/p'\nfoo1\nbar\n\n$ # same as: sed -nE '/^[a-z-]+$/Ip'\n$ printf 'Foo-bar\\nabc-456\\n42\\nCo-operate\\n' | sed -nE '/^[[:alpha:]-]+$/p'\nFoo-bar\nCo-operate\n\n$ # remove all punctuation characters\n$ sed 's/[[:punct:]]//g' poem.txt\nRoses are red\nViolets are blue\nSugar is sweet\nAnd so are you\n```\n\nBackslash character classes\n\n* Equivalent class shown is for C locale and ASCII character encoding\n    * See [ascii codes table](https://ascii.cl/) for reference\n* See [sed manual - regular expression 
extensions](https://www.gnu.org/software/sed/manual/sed.html#regexp-extensions) for more details\n\n| Character classes | Description |\n| ------------- | ----------- |\n| `\\w` | Same as `[0-9a-zA-Z_]` or `[[:alnum:]_]` |\n| `\\W` | Same as `[^0-9a-zA-Z_]` or `[^[:alnum:]_]` |\n| `\\s` | Same as `[[:space:]]` |\n| `\\S` | Same as `[^[:space:]]` |\n\n```bash\n$ # lines containing only word characters\n$ printf '123\\na=b+c\\ncmp_str\\nFoo_bar\\n' | sed -nE '/^\\w+$/p'\n123\ncmp_str\nFoo_bar\n\n$ # backslash character classes cannot be used inside [] unlike perl\n$ # \\w would simply match w\n$ echo 'w=y-x+9*3' | sed 's/[\\w=]//g'\ny-x+9*3\n$ echo 'w=y-x+9*3' | perl -pe 's/[\\w=]//g'\n-+*\n```\n\n<br>\n\n#### <a name=\"escape-sequences\"></a>Escape sequences\n\n* Certain ASCII characters like tab, carriage return, newline, etc have escape sequence to represent them\n    * Unlike backslash character classes, these can be used within `[]` as well\n* Any ASCII character can be also represented using their decimal or octal or hexadecimal value\n    * See [ascii codes table](https://ascii.cl/) for reference\n* See [sed manual - Escapes](https://www.gnu.org/software/sed/manual/sed.html#Escapes) for more details\n\n```bash\n$ # example for representing tab character\n$ printf 'foo\\tbar\\tbaz\\n'\nfoo     bar     baz\n$ printf 'foo\\tbar\\tbaz\\n' | sed 's/\\t/ /g'\nfoo bar baz\n$ echo 'a b c' | sed 's/ /\\t/g'\na       b       c\n\n$ # using escape sequence inside character class\n$ printf 'a\\tb\\vc\\n'\na       b\n         c\n$ printf 'a\\tb\\vc\\n' | cat -vT\na^Ib^Kc\n$ printf 'a\\tb\\vc\\n' | sed 's/[\\t\\v]/ /g'\na b c\n\n$ # most common use case for hex escape sequence is to represent single quotes\n$ # equivalent is '\\d039' and '\\o047' for decimal and octal respectively\n$ echo \"foo: '34'\"\nfoo: '34'\n$ echo \"foo: '34'\" | sed 's/\\x27/\"/g'\nfoo: \"34\"\n$ echo 'foo: \"34\"' | sed 's/\"/\\x27/g'\nfoo: '34'\n```\n\n<br>\n\n#### <a 
name=\"grouping\"></a>Grouping\n\n* Character classes allow matching against a choice of multiple character list and then quantifier added if needed\n* One of the uses of grouping is analogous to character classes for whole regular expressions, instead of just list of characters\n* The meta characters `()` are used for grouping\n    * requires `\\(\\)` for BRE\n* Similar to maths `ab + ac = a(b+c)`, think of regular expression `a(b|c) = ab|ac`\n\n```bash\n$ # four letter words with 'on' or 'no' in middle\n$ printf 'known\\nmood\\nknow\\npony\\ninns\\n' | sed -nE '/\\b[a-z](on|no)[a-z]\\b/p'\nknow\npony\n$ # common mistake to use character class, will match 'oo' and 'nn' as well\n$ printf 'known\\nmood\\nknow\\npony\\ninns\\n' | sed -nE '/\\b[a-z][on]{2}[a-z]\\b/p'\nmood\nknow\npony\ninns\n\n$ # quantifier example\n$ printf 'handed\\nhand\\nhandy\\nhands\\nhandle\\n' | sed -nE '/^hand([sy]|le)?$/p'\nhand\nhandy\nhands\nhandle\n\n$ # remove first two columns where : is delimiter\n$ echo 'foo:123:bar:baz' | sed -E 's/^([^:]+:){2}//'\nbar:baz\n\n$ # can be nested as required\n$ printf 'spade\\nscore\\nscare\\nspare\\nsphere\\n' | sed -nE '/^s([cp](he|a)[rd])e$/p'\nspade\nscare\nspare\nsphere\n```\n\n<br>\n\n#### <a name=\"back-reference\"></a>Back reference\n\n* The matched string within `()` can also be used to be matched again by back referencing the captured groups\n* `\\1` denotes the first matched group, `\\2` the second one and so on\n    * Order is leftmost `(` is `\\1`, next one is `\\2` and so on\n    * Can be used both in *REGEXP* as well as in *REPLACEMENT* sections\n* `&` or `\\0` represents entire matched string in *REPLACEMENT* section\n* Note that the matched string, not the regular expression itself is referenced\n    * for ex: if `([0-9][a-f])` matches `3b`, then back referencing will be `3b` not any other valid match of the regular expression like `8f`, `0a` etc\n* As `\\` and `&` are special characters in *REPLACEMENT* section, use `\\\\` and `\\&` 
respectively for literal representation\n\n```bash\n$ # filter lines with consecutive repeated alphabets\n$ printf 'eel\\nflee\\nall\\npat\\nilk\\nseen\\n' | sed -nE '/([a-z])\\1/p'\neel\nflee\nall\nseen\n\n$ # reduce \\\\ to single \\ and delete if only single \\\n$ echo '\\[\\] and \\\\w and \\[a-zA-Z0-9\\_\\]' | sed -E 's/(\\\\?)\\\\/\\1/g'\n[] and \\w and [a-zA-Z0-9_]\n\n$ # remove two or more duplicate words separated by space\n$ # word boundaries prevent false matches like 'the theatre' 'sand and stone' etc\n$ echo 'a a a walking for for a cause' | sed -E 's/\\b(\\w+)( \\1)+\\b/\\1/g'\na walking for a cause\n\n$ # surround only third column with double quotes\n$ # note the nested capture groups and numbers used in REPLACEMENT section\n$ echo 'foo:123:bar:baz' | sed -E 's/^(([^:]+:){2})([^:]+)/\\1\"\\3\"/'\nfoo:123:\"bar\":baz\n\n$ # add first column data to end of line as well\n$ echo 'foo:123:bar:baz' | sed -E 's/^([^:]+).*/& \\1/'\nfoo:123:bar:baz foo\n\n$ # surround entire line with double quotes\n$ echo 'hello world' | sed 's/.*/\"&\"/'\n\"hello world\"\n$ # add something at start as well as end of line\n$ echo 'hello world' | sed 's/.*/Hi. &. Have a nice day/'\nHi. hello world. 
Have a nice day\n```\n\n<br>\n\n#### <a name=\"changing-case\"></a>Changing case\n\n* Applies only to *REPLACEMENT* section, unlike `perl` where these can be used in *REGEXP* portion as well\n* See [sed manual - The s Command](https://www.gnu.org/software/sed/manual/sed.html#The-_0022s_0022-Command) for more details and corner cases\n\n```bash\n$ # UPPERCASE all alphabets, will be stopped on \\L or \\E\n$ echo 'HeLlO WoRLD' | sed 's/.*/\\U&/'\nHELLO WORLD\n\n$ # lowercase all alphabets, will be stopped on \\U or \\E\n$ echo 'HeLlO WoRLD' | sed 's/.*/\\L&/'\nhello world\n\n$ # Uppercase only next character\n$ echo 'foo bar' | sed 's/\\w*/\\u&/g'\nFoo Bar\n$ echo 'foo_bar next_line' | sed -E 's/_([a-z])/\\u\\1/g'\nfooBar nextLine\n\n$ # lowercase only next character\n$ echo 'FOO BAR' | sed 's/\\w*/\\l&/g'\nfOO bAR\n$ echo 'fooBar nextLine Baz' | sed -E 's/([a-z])([A-Z])/\\1_\\l\\2/g'\nfoo_bar next_line Baz\n\n$ # titlecase if input has mixed case\n$ echo 'HeLlO WoRLD' | sed 's/.*/\\L&/; s/\\w*/\\u&/g'\nHello World\n$ # sed 's/.*/\\L\\u&/' also works, but not sure if it is defined behavior\n$ echo 'HeLlO WoRLD' | sed 's/.*/\\L&/; s/./\\u&/'\nHello world\n\n$ # \\E will stop conversion started by \\U or \\L\n$ echo 'foo_bar next_line baz' | sed -E 's/([a-z]+)(_[a-z]+)/\\U\\1\\E\\2/g'\nFOO_bar NEXT_line baz\n```\n\n<br>\n\n## <a name=\"substitute-command-modifiers\"></a>Substitute command modifiers\n\nThe `s` command syntax:\n\n```\ns/REGEXP/REPLACEMENT/FLAGS\n```\n\n* Modifiers (or FLAGS) like `g`, `p` and `I` have been already seen. For completeness, they will be discussed again along with rest of the modifiers\n* See [sed manual - The s Command](https://www.gnu.org/software/sed/manual/sed.html#The-_0022s_0022-Command) for more details and corner cases\n\n<br>\n\n#### <a name=\"g-modifier\"></a>g modifier\n\nBy default, substitute command will replace only first occurrence of match. 
`g` modifier is needed to replace all occurrences\n\n```bash\n$ # replace only first : with -\n$ echo 'foo:123:bar:baz' | sed 's/:/-/'\nfoo-123:bar:baz\n\n$ # replace all : with -\n$ echo 'foo:123:bar:baz' | sed 's/:/-/g'\nfoo-123-bar-baz\n```\n\n<br>\n\n#### <a name=\"replace-specific-occurrence\"></a>Replace specific occurrence\n\n* A number can be used to specify *N*th match to be replaced\n\n```bash\n$ # replace first occurrence\n$ echo 'foo:123:bar:baz' | sed 's/:/-/'\nfoo-123:bar:baz\n$ echo 'foo:123:bar:baz' | sed -E 's/[^:]+/XYZ/'\nXYZ:123:bar:baz\n\n$ # replace second occurrence\n$ echo 'foo:123:bar:baz' | sed 's/:/-/2'\nfoo:123-bar:baz\n$ echo 'foo:123:bar:baz' | sed -E 's/[^:]+/XYZ/2'\nfoo:XYZ:bar:baz\n\n$ # replace third occurrence\n$ echo 'foo:123:bar:baz' | sed 's/:/-/3'\nfoo:123:bar-baz\n$ echo 'foo:123:bar:baz' | sed -E 's/[^:]+/XYZ/3'\nfoo:123:XYZ:baz\n\n$ # choice of quantifier depends on knowing input\n$ echo ':123:bar:baz' | sed 's/[^:]*/XYZ/2'\n:XYZ:bar:baz\n$ echo ':123:bar:baz' | sed -E 's/[^:]+/XYZ/2'\n:123:XYZ:baz\n```\n\n* Replacing *N*th match from end of line when number of matches is unknown\n* Makes use of greediness of quantifiers\n\n```bash\n$ # replacing last occurrence\n$ # can also use sed -E 's/:([^:]*)$/-\\1/'\n$ echo 'foo:123:bar:baz' | sed -E 's/(.*):/\\1-/'\nfoo:123:bar-baz\n$ echo '456:foo:123:bar:789:baz' | sed -E 's/(.*):/\\1-/'\n456:foo:123:bar:789-baz\n$ echo 'foo and bar and baz land good' | sed -E 's/(.*)and/\\1XYZ/'\nfoo and bar and baz lXYZ good\n$ # use word boundaries as necessary\n$ echo 'foo and bar and baz land good' | sed -E 's/(.*)\\band\\b/\\1XYZ/'\nfoo and bar XYZ baz land good\n\n$ # replacing last but one\n$ echo 'foo:123:bar:baz' | sed -E 's/(.*):(.*:)/\\1-\\2/'\nfoo:123-bar:baz\n$ echo '456:foo:123:bar:789:baz' | sed -E 's/(.*):(.*:)/\\1-\\2/'\n456:foo:123:bar-789:baz\n\n$ # replacing last but two\n$ echo '456:foo:123:bar:789:baz' | sed -E 's/(.*):((.*:){2})/\\1-\\2/'\n456:foo:123-bar:789:baz\n$ # 
replacing last but three\n$ echo '456:foo:123:bar:789:baz' | sed -E 's/(.*):((.*:){3})/\\1-\\2/'\n456:foo-123:bar:789:baz\n```\n\n* Replacing all but first *N* occurrences by combining with `g` modifier\n\n```bash\n$ # replace all : with - except first two\n$ echo '456:foo:123:bar:789:baz' | sed -E 's/:/-/3g'\n456:foo:123-bar-789-baz\n\n$ # replace all : with - except first three\n$ echo '456:foo:123:bar:789:baz' | sed -E 's/:/-/4g'\n456:foo:123:bar-789-baz\n```\n\n* Replacing multiple *N*th occurrences\n\n```bash\n$ # replace first two occurrences of : with -\n$ echo '456:foo:123:bar:789:baz' | sed 's/:/-/; s/:/-/'\n456-foo-123:bar:789:baz\n\n$ # replace second and third occurrences of : with -\n$ # note the changes in number to be used for subsequent replacement\n$ echo '456:foo:123:bar:789:baz' | sed 's/:/-/2; s/:/-/2'\n456:foo-123-bar:789:baz\n\n$ # better way is to use descending order\n$ echo '456:foo:123:bar:789:baz' | sed 's/:/-/3; s/:/-/2'\n456:foo-123-bar:789:baz\n$ # replace second, third and fifth occurrences of : with -\n$ echo '456:foo:123:bar:789:baz' | sed 's/:/-/5; s/:/-/3; s/:/-/2'\n456:foo-123-bar:789-baz\n```\n\n<br>\n\n#### <a name=\"ignoring-case\"></a>Ignoring case\n\n* Either `i` or `I` can be used for replacing in case-insensitive manner\n* Since only `I` can be used for address filtering (for ex: `sed '/rose/Id' poem.txt`), use `I` for substitute command as well for consistency\n\n```bash\n$ echo 'hello Hello HELLO HeLlO' | sed 's/hello/hi/g'\nhi Hello HELLO HeLlO\n\n$ echo 'hello Hello HELLO HeLlO' | sed 's/hello/hi/Ig'\nhi hi hi hi\n```\n\n<br>\n\n#### <a name=\"p-modifier\"></a>p modifier\n\n* Usually used in conjunction with `-n` option to output only modified lines\n\n```bash\n$ # no output if no substitution\n$ echo 'hi there. have a nice day' | sed -n 's/xyz/XYZ/p'\n$ # modified line if there is substitution\n$ echo 'hi there. have a nice day' | sed -n 's/\\bh/H/pg'\nHi there. 
Have a nice day\n\n$ # only lines containing 'are'\n$ sed -n 's/are/ARE/p' poem.txt\nRoses ARE red,\nViolets ARE blue,\nAnd so ARE you.\n\n$ # only lines containing 'are' as well as 'so'\n$ sed -n '/are/ s/so/SO/p' poem.txt\nAnd SO are you.\n```\n\n<br>\n\n#### <a name=\"w-modifier\"></a>w modifier\n\n* Allows to write only the changes to specified file name instead of default **stdout**\n\n```bash\n$ # space between w and filename is optional\n$ # same as: sed -n 's/3/three/p' > 3.txt\n$ seq 20 | sed -n 's/3/three/w 3.txt'\n$ cat 3.txt\nthree\n1three\n\n$ # do not use -n if output should be displayed as well as written to file\n$ echo '456:foo:123:bar:789:baz' | sed -E 's/(:[^:]*){2}$//w col.txt'\n456:foo:123:bar\n$ cat col.txt\n456:foo:123:bar\n```\n\n* For multiple output files, use `-e` for each file\n\n```bash\n$ seq 20 | sed -n -e 's/5/five/w 5.txt' -e 's/7/seven/w 7.txt'\n$ cat 5.txt\nfive\n1five\n$ cat 7.txt\nseven\n1seven\n```\n\n* There are two predefined filenames\n    * `/dev/stdout` to write to **stdout**\n    * `/dev/stderr` to write to **stderr**\n\n```bash\n$ # inplace editing as well as display changes on terminal\n$ sed -i 's/three/3/w /dev/stdout' 3.txt\n3\n13\n$ cat 3.txt\n3\n13\n```\n\n<br>\n\n#### <a name=\"e-modifier\"></a>e modifier\n\n* Allows to use shell command output in *REPLACEMENT* section\n* Trailing newline from command output is suppressed\n\n```bash\n$ # replacing a line with output of shell command\n$ printf 'Date:\\nreplace this line\\n'\nDate:\nreplace this line\n$ printf 'Date:\\nreplace this line\\n' | sed 's/^replace.*/date/e'\nDate:\nThu May 25 10:19:46 IST 2017\n\n$ # when using p modifier with e, order is important\n$ printf 'Date:\\nreplace this line\\n' | sed -n 's/^replace.*/date/ep'\nThu May 25 10:19:46 IST 2017\n$ printf 'Date:\\nreplace this line\\n' | sed -n 's/^replace.*/date/pe'\ndate\n\n$ # entire modified line is executed as shell command\n$ echo 'xyz 5' | sed 's/xyz/seq/e'\n1\n2\n3\n4\n5\n```\n\n<br>\n\n#### 
<a name=\"m-modifier\"></a>m modifier\n\n* Either `m` or `M` can be used\n* So far, we've seen only line based operations (newline character being used to distinguish lines)\n* There are various ways (see [sed manual - How sed Works](https://www.gnu.org/software/sed/manual/sed.html#Execution-Cycle)) by which more than one line is there in pattern space and in such cases `m` modifier can be used\n* See also [unix.stackexchange - usage of multi-line modifier](https://unix.stackexchange.com/questions/298670/simple-significant-usage-of-m-multi-line-address-suffix) for more examples\n\nBefore seeing example with `m` modifier, let's see a simple example to get two lines in pattern space\n\n```bash\n$ # line matching 'blue' and next line in pattern space\n$ sed -n '/blue/{N;p}' poem.txt\nViolets are blue,\nSugar is sweet,\n\n$ # applying substitution, remember that . matches newline as well\n$ sed -n '/blue/{N;s/are.*is//p}' poem.txt\nViolets  sweet,\n```\n\n* When `m` modifier is used, it affects the behavior of `^`, `$` and `.` meta characters\n\n```bash\n$ # without m modifier, ^ will anchor only beginning of entire pattern space\n$ sed -n '/blue/{N;s/^/:: /pg}' poem.txt\n:: Violets are blue,\nSugar is sweet,\n$ # with m modifier, ^ will anchor each individual line within pattern space\n$ sed -n '/blue/{N;s/^/:: /pgm}' poem.txt\n:: Violets are blue,\n:: Sugar is sweet,\n\n$ # same applies to $ as well\n$ sed -n '/blue/{N;s/$/ ::/pg}' poem.txt\nViolets are blue,\nSugar is sweet, ::\n$ sed -n '/blue/{N;s/$/ ::/pgm}' poem.txt\nViolets are blue, ::\nSugar is sweet, ::\n\n$ # with m modifier, . 
will not match newline character\n$ sed -n '/blue/{N;s/are.*//p}' poem.txt\nViolets \n$ sed -n '/blue/{N;s/are.*//pm}' poem.txt\nViolets \nSugar is sweet,\n```\n\n<br>\n\n## <a name=\"shell-substitutions\"></a>Shell substitutions\n\n* The examples presented work with the `bash` shell and might differ for other shells\n* See also [stackoverflow - Difference between single and double quotes in Bash](https://stackoverflow.com/questions/6697753/difference-between-single-and-double-quotes-in-bash)\n* For robust substitutions that take care of meta characters in the *REGEXP* and *REPLACEMENT* sections, see\n    * [unix.stackexchange - How to ensure that string interpolated into sed substitution escapes all metachars](https://unix.stackexchange.com/questions/129059/how-to-ensure-that-string-interpolated-into-sed-substitution-escapes-all-metac)\n    * [unix.stackexchange - What characters do I need to escape when using sed in a sh script?](https://unix.stackexchange.com/questions/32907/what-characters-do-i-need-to-escape-when-using-sed-in-a-sh-script)\n    * [stackoverflow - Is it possible to escape regex metacharacters reliably with sed](https://stackoverflow.com/questions/29613304/is-it-possible-to-escape-regex-metacharacters-reliably-with-sed)\n\n<br>\n\n#### <a name=\"variable-substitution\"></a>Variable substitution\n\n* The entire command can be enclosed in double quotes for simple use cases\n\n```bash\n$ word='are'\n$ sed -n \"/$word/p\" poem.txt\nRoses are red,\nViolets are blue,\nAnd so are you.\n\n$ replace='ARE'\n$ sed \"s/$word/$replace/g\" poem.txt\nRoses ARE red,\nViolets ARE blue,\nSugar is sweet,\nAnd so ARE you.\n\n$ # choose a delimiter that does not occur in the variable's value\n$ echo 'home path is:' | sed \"s/$/ $HOME/\"\nsed: -e expression #1, char 7: unknown option to `s'\n$ echo 'home path is:' | sed \"s|$| $HOME|\"\nhome path is: /home/learnbyexample\n```\n\n* If the command has characters like `\\`, backtick, `!` etc, double quote only the variable\n\n```bash\n$ # if history expansion is enabled, ! 
is special\n$ word='are'\n$ sed \"/$word/!d\" poem.txt\nsed \"/$word/date +%A\" poem.txt\nsed: -e expression #1, char 7: extra characters after command\n\n$ # so double quote only the variable\n$ # the command is concatenation of '/' and \"$word\" and '/!d'\n$ sed '/'\"$word\"'/!d' poem.txt\nRoses are red,\nViolets are blue,\nAnd so are you.\n```\n\n<br>\n\n#### <a name=\"command-substitution\"></a>Command substitution\n\n* Much more flexible than using `e` modifier as part of line can be modified as well\n\n```bash\n$ echo 'today is date' | sed 's/date/'\"$(date +%A)\"'/'\ntoday is Tuesday\n\n$ # need to use delimiter as suitable\n$ echo 'current working dir is: ' | sed 's/$/'\"$(pwd)\"'/'\nsed: -e expression #1, char 6: unknown option to `s'\n$ echo 'current working dir is: ' | sed 's|$|'\"$(pwd)\"'|'\ncurrent working dir is: /home/learnbyexample/command_line_text_processing\n\n$ # multiline output cannot be substituted in this manner\n$ echo 'foo' | sed 's/foo/'\"$(seq 5)\"'/'\nsed: -e expression #1, char 7: unterminated `s' command\n```\n\n<br>\n\n## <a name=\"z-and-s-command-line-options\"></a>z and s command line options\n\n* We have already seen a few options like `-n`, `-e`, `-i` and `-E`\n* This section will cover `-z` and `-s` options\n* See [sed manual - Command line options](https://www.gnu.org/software/sed/manual/sed.html#Command_002dLine-Options) for other options and more details\n\nThe `-z` option will cause `sed` to separate input based on ASCII NUL character instead of newlines\n\n```bash\n$ # useful to process null separated data\n$ # for ex: output of grep -Z, find -print0, etc\n$ printf 'teal\\0red\\nblue\\n\\0green\\n' | sed -nz '/red/p' | cat -A\nred$\nblue$\n^@\n\n$ # also useful to process whole file(not having NUL characters) as a single string\n$ # adds ; to previous line if current line starts with c\n$ printf 'cat\\ndog\\ncoat\\ncut\\nmat\\n' | sed -z 's/\\nc/;&/g'\ncat\ndog;\ncoat;\ncut\nmat\n```\n\nThe `-s` option will cause `sed` to 
treat multiple input files separately instead of treating them as single concatenated input. If `-i` is being used, `-s` is implied\n\n```bash\n$ # without -s, there is only one first line\n$ # F command prints file name of current file\n$ sed '1F' f1 f2\nf1\nI ate three apples\nI bought two bananas and three mangoes\n\n$ # with -s, each file has its own address\n$ sed -s '1F' f1 f2\nf1\nI ate three apples\nf2\nI bought two bananas and three mangoes\n```\n\n<br>\n\n<br>\n\n## <a name=\"change-command\"></a>change command\n\nThe change command `c` will delete line(s) represented by address or address range and replace it with given string\n\n**Note** the string used cannot have literal newline character, use escape sequence instead\n\n```bash\n$ # white-space between c and replacement string is ignored\n$ seq 3 | sed '2c foo bar'\n1\nfoo bar\n3\n\n$ # note how all lines in address range are replaced\n$ seq 8 | sed '3,7cfoo bar'\n1\n2\nfoo bar\n8\n\n$ # escape sequences are allowed in string to be replaced\n$ sed '/red/,/is/chello\\nhi there' poem.txt\nhello\nhi there\nAnd so are you.\n```\n\n* command will apply for all matching addresses\n\n```bash\n$ seq 5 | sed '/[24]/cfoo'\n1\nfoo\n3\nfoo\n5\n```\n\n* `\\` is special immediately after `c`, see [sed manual - other commands](https://www.gnu.org/software/sed/manual/sed.html#Other-Commands) for details\n* If escape sequence is needed at beginning of replacement string, use an additional `\\`\n\n```bash\n$ # \\ helps to add leading spaces\n$ seq 3 | sed '2c  a'\n1\na\n3\n$ seq 3 | sed '2c\\ a'\n1\n a\n3\n\n$ seq 3 | sed '2c\\tgood day'\n1\ntgood day\n3\n$ seq 3 | sed '2c\\\\tgood day'\n1\n        good day\n3\n```\n\n* Since `;` cannot be used to distinguish between string and end of command, use `-e` for multiple commands\n\n```bash\n$ sed -e '/are/cHi;s/is/IS/' poem.txt\nHi;s/is/IS/\nHi;s/is/IS/\nSugar is sweet,\nHi;s/is/IS/\n\n$ sed -e '/are/cHi' -e 's/is/IS/' poem.txt\nHi\nHi\nSugar IS sweet,\nHi\n```\n\n* Using 
shell substitution\n\n```bash\n$ text='good day'\n$ seq 3 | sed '2c'\"$text\"\n1\ngood day\n3\n\n$ text='good day\\nfoo bar'\n$ seq 3 | sed '2c'\"$text\"\n1\ngood day\nfoo bar\n3\n\n$ seq 3 | sed '2c'\"$(date +%A)\"\n1\nThursday\n3\n\n$ # multiline command output will lead to error\n$ seq 3 | sed '2c'\"$(seq 2)\"\nsed: -e expression #1, char 5: missing command\n```\n\n<br>\n\n## <a name=\"insert-command\"></a>insert command\n\nThe insert command allows to add string before a line matching given address\n\n**Note** the string used cannot have literal newline character, use escape sequence instead\n\n```bash\n$ # white-space between i and string is ignored\n$ # same as: sed '2s/^/hello\\n/'\n$ seq 3 | sed '2i hello'\n1\nhello\n2\n3\n\n$ # escape sequences can be used\n$ seq 3 | sed '2ihello\\nhi'\n1\nhello\nhi\n2\n3\n```\n\n* command will apply for all matching addresses\n\n```bash\n$ seq 5 | sed '/[24]/ifoo'\n1\nfoo\n2\n3\nfoo\n4\n5\n```\n\n* `\\` is special immediately after `i`, see [sed manual - other commands](https://www.gnu.org/software/sed/manual/sed.html#Other-Commands) for details\n* If escape sequence is needed at beginning of replacement string, use an additional `\\`\n\n```bash\n$ seq 3 | sed '2i  foo'\n1\nfoo\n2\n3\n$ seq 3 | sed '2i\\ foo'\n1\n foo\n2\n3\n\n$ seq 3 | sed '2i\\tbar'\n1\ntbar\n2\n3\n$ seq 3 | sed '2i\\\\tbar'\n1\n        bar\n2\n3\n```\n\n* Since `;` cannot be used to distinguish between string and end of command, use `-e` for multiple commands\n\n```bash\n$ sed -e '/is/ifoobar;s/are/ARE/' poem.txt\nRoses are red,\nViolets are blue,\nfoobar;s/are/ARE/\nSugar is sweet,\nAnd so are you.\n\n$ sed -e '/is/ifoobar' -e 's/are/ARE/' poem.txt\nRoses ARE red,\nViolets ARE blue,\nfoobar\nSugar is sweet,\nAnd so ARE you.\n```\n\n* Using shell substitution\n\n```bash\n$ text='good day'\n$ seq 3 | sed '2i'\"$text\"\n1\ngood day\n2\n3\n\n$ text='good day\\nfoo bar'\n$ seq 3 | sed '2i'\"$text\"\n1\ngood day\nfoo bar\n2\n3\n\n$ seq 3 | sed '2iToday is 
'\"$(date +%A)\"\n1\nToday is Thursday\n2\n3\n\n$ # multiline command output will lead to error\n$ seq 3 | sed '2i'\"$(seq 2)\"\nsed: -e expression #1, char 5: missing command\n```\n\n<br>\n\n## <a name=\"append-command\"></a>append command\n\nThe append command allows to add string after a line matching given address\n\n**Note** the string used cannot have literal newline character, use escape sequence instead\n\n```bash\n$ # white-space between a and string is ignored\n$ # same as: sed '2s/$/\\nhello/'\n$ seq 3 | sed '2a hello'\n1\n2\nhello\n3\n\n$ # escape sequences can be used\n$ seq 3 | sed '2ahello\\nhi'\n1\n2\nhello\nhi\n3\n```\n\n* command will apply for all matching addresses\n\n```bash\n$ seq 5 | sed '/[24]/afoo'\n1\n2\nfoo\n3\n4\nfoo\n5\n```\n\n* `\\` is special immediately after `a`, see [sed manual - other commands](https://www.gnu.org/software/sed/manual/sed.html#Other-Commands) for details\n* If escape sequence is needed at beginning of replacement string, use an additional `\\`\n\n```bash\n$ seq 3 | sed '2a  foo'\n1\n2\nfoo\n3\n$ seq 3 | sed '2a\\ foo'\n1\n2\n foo\n3\n\n$ seq 3 | sed '2a\\tbar'\n1\n2\ntbar\n3\n$ seq 3 | sed '2a\\\\tbar'\n1\n2\n        bar\n3\n```\n\n* Since `;` cannot be used to distinguish between string and end of command, use `-e` for multiple commands\n\n```bash\n$ sed -e '/is/afoobar;s/are/ARE/' poem.txt\nRoses are red,\nViolets are blue,\nSugar is sweet,\nfoobar;s/are/ARE/\nAnd so are you.\n\n$ sed -e '/is/afoobar' -e 's/are/ARE/' poem.txt\nRoses ARE red,\nViolets ARE blue,\nSugar is sweet,\nfoobar\nAnd so ARE you.\n```\n\n* Using shell substitution\n\n```bash\n$ text='good day'\n$ seq 3 | sed '2a'\"$text\"\n1\n2\ngood day\n3\n\n$ text='good day\\nfoo bar'\n$ seq 3 | sed '2a'\"$text\"\n1\n2\ngood day\nfoo bar\n3\n\n$ seq 3 | sed '2aToday is '\"$(date +%A)\"\n1\n2\nToday is Thursday\n3\n\n$ # multiline command output will lead to error\n$ seq 3 | sed '2a'\"$(seq 2)\"\nsed: -e expression #1, char 5: missing command\n```\n\n* See 
also [stackoverflow - add newline character if last line of input doesn't have one](https://stackoverflow.com/questions/41343062/what-does-this-mean-in-linux-sed-a-a-txt)\n\n<br>\n\n## <a name=\"adding-contents-of-file\"></a>adding contents of file\n\n<br>\n\n#### <a name=\"r-for-entire-file\"></a>r for entire file\n\n* The `r` command adds the contents of a file after each line matching the given address\n* It is a robust way to add multiline content, or content with characters that would otherwise be interpreted\n* The special name `/dev/stdin` allows reading from **stdin** instead of a file\n* First, a simple example that adds the contents of one file into another at a specified address\n\n```bash\n$ cat 5.txt\nfive\n1five\n\n$ cat poem.txt\nRoses are red,\nViolets are blue,\nSugar is sweet,\nAnd so are you.\n\n$ # space between r and filename is optional\n$ sed '2r 5.txt' poem.txt\nRoses are red,\nViolets are blue,\nfive\n1five\nSugar is sweet,\nAnd so are you.\n\n$ # content cannot be added before first line\n$ sed '0r 5.txt' poem.txt\nsed: -e expression #1, char 2: invalid usage of line address 0\n$ # but that is trivial to solve: cat 5.txt poem.txt\n```\n\n* the command applies to all matching addresses\n\n```bash\n$ seq 5 | sed '/[24]/r 5.txt'\n1\n2\nfive\n1five\n3\n4\nfive\n1five\n5\n```\n\n* adding the content of a variable as it is, without any interpretation\n* this also shows an example of using `/dev/stdin`\n\n```bash\n$ text='Good day\\nfoo bar baz\\n'\n$ # escape sequence like \\n will be interpreted when 'a' command is used\n$ sed '/is/a'\"$text\" poem.txt\nRoses are red,\nViolets are blue,\nSugar is sweet,\nGood day\nfoo bar baz\n\nAnd so are you.\n\n$ # \\ is just another character, won't be treated as special with 'r' command\n$ echo \"$text\" | sed '/is/r /dev/stdin' poem.txt\nRoses are red,\nViolets are blue,\nSugar is sweet,\nGood day\\nfoo bar baz\\n\nAnd so are you.\n```\n\n* adding multiline command output is simple as well\n\n```bash\n$ seq 3 | sed '/is/r /dev/stdin' 
poem.txt\nRoses are red,\nViolets are blue,\nSugar is sweet,\n1\n2\n3\nAnd so are you.\n```\n\n* replacing a line or range of lines with contents of file\n* See also [unix.stackexchange - various ways to replace line M in file1 with line N in file2](https://unix.stackexchange.com/a/396450)\n\n```bash\n$ # replacing range of lines\n$ # order is important, first 'r' and then 'd'\n$ sed -e '/is/r 5.txt' -e '1,/is/d' poem.txt\nfive\n1five\nAnd so are you.\n\n$ # replacing a line\n$ seq 3 | sed -e '3r /dev/stdin' -e '3d' poem.txt\nRoses are red,\nViolets are blue,\n1\n2\n3\nAnd so are you.\n\n$ # can also use {} grouping to avoid repeating the address\n$ seq 3 | sed -e '/blue/{r /dev/stdin' -e 'd}' poem.txt\nRoses are red,\n1\n2\n3\nSugar is sweet,\nAnd so are you.\n```\n\n<br>\n\n#### <a name=\"r-for-line-by-line\"></a>R for line by line\n\n* add a line for every address match\n* Special name `/dev/stdin` allows to read from **stdin** instead of file input\n\n```bash\n$ # space between R and filename is optional\n$ seq 3 | sed '/are/R /dev/stdin' poem.txt\nRoses are red,\n1\nViolets are blue,\n2\nSugar is sweet,\nAnd so are you.\n3\n$ # to replace matching line\n$ seq 3 | sed -e '/are/{R /dev/stdin' -e 'd}' poem.txt\n1\n2\nSugar is sweet,\n3\n\n$ sed '2,3R 5.txt' poem.txt\nRoses are red,\nViolets are blue,\nfive\nSugar is sweet,\n1five\nAnd so are you.\n```\n\n* number of lines from file to be read different from number of matching address lines\n\n```bash\n$ # file has more lines than matching address\n$ # 2 lines in 5.txt but only 1 line matching 'is'\n$ sed '/is/R 5.txt' poem.txt\nRoses are red,\nViolets are blue,\nSugar is sweet,\nfive\nAnd so are you.\n\n$ # lines matching address is more than file to be read\n$ # 3 lines matching 'are' but only 2 lines from stdin\n$ seq 2 | sed '/are/R /dev/stdin' poem.txt\nRoses are red,\n1\nViolets are blue,\n2\nSugar is sweet,\nAnd so are you.\n```\n\n<br>\n\n## <a name=\"n-and-n-commands\"></a>n and N commands\n\n* These two 
commands will fetch next line (newline or NUL character separated, depending on options)\n\nQuoting from [sed manual - common commands](https://www.gnu.org/software/sed/manual/sed.html#Common-Commands) for `n` command\n\n>If auto-print is not disabled, print the pattern space, then, regardless, replace the pattern space with the next line of input. If there is no more input then sed exits without processing any more commands.\n\n```bash\n$ # if line contains 'blue', replace 'e' with 'E' only for following line\n$ sed '/blue/{n;s/e/E/g}' poem.txt\nRoses are red,\nViolets are blue,\nSugar is swEEt,\nAnd so are you.\n\n$ # better illustrated with -n option\n$ sed -n '/blue/{n;s/e/E/pg}' poem.txt\nSugar is swEEt,\n\n$ # if line contains 'blue', replace 'e' with 'E' only for next to next line\n$ sed -n '/blue/{n;n;s/e/E/pg}' poem.txt\nAnd so arE you.\n```\n\nQuoting from [sed manual - other commands](https://www.gnu.org/software/sed/manual/sed.html#Other-Commands) for `N` command\n\n>Add a newline to the pattern space, then append the next line of input to the pattern space. 
If there is no more input then sed exits without processing any more commands\n\n>When -z is used, a zero byte (the ascii ‘NUL’ character) is added between the lines (instead of a new line)\n\n* See also [stackoverflow - apply substitution every 4 lines but excluding the 4th line](https://stackoverflow.com/questions/40229578/how-to-insert-a-line-feed-into-a-sed-line-concatenation)\n\n```bash\n$ # if line contains 'blue', replace 'e' with 'E' both in current line and next\n$ sed '/blue/{N;s/e/E/g}' poem.txt\nRoses are red,\nViolEts arE bluE,\nSugar is swEEt,\nAnd so are you.\n\n$ # better illustrated with -n option\n$ sed -n '/blue/{N;s/e/E/pg}' poem.txt\nViolEts arE bluE,\nSugar is swEEt,\n\n$ sed -n '/blue/{N;N;s/e/E/pg}' poem.txt\nViolEts arE bluE,\nSugar is swEEt,\nAnd so arE you.\n```\n\n* Combination\n\n```bash\n$ # n will fetch next line, current line is out of pattern space\n$ # N will then add another line\n$ sed -n '/blue/{n;N;s/e/E/pg}' poem.txt\nSugar is swEEt,\nAnd so arE you.\n```\n\n* not necessary to qualify with an address\n\n```bash\n$ seq 6 | sed 'n;cXYZ'\n1\nXYZ\n3\nXYZ\n5\nXYZ\n\n$ seq 6 | sed 'N;s/\\n/ /'\n1 2\n3 4\n5 6\n```\n\n<br>\n\n## <a name=\"control-structures\"></a>Control structures\n\n* Using `:label` one can mark a command location to branch to conditionally or unconditionally\n* See [sed manual - Commands for sed gurus](https://www.gnu.org/software/sed/manual/sed.html#Programming-Commands) for more details\n\n<br>\n\n#### <a name=\"if-then-else\"></a>if then else\n\n* Simple if-then-else can be simulated using `b` command\n* `b` command will unconditionally branch to specified label\n* Without label, `b` will skip rest of commands and start next cycle\n* See [unix.stackexchange - processing only lines between REGEXPs](https://unix.stackexchange.com/questions/292819/remove-commented-lines-except-one-comment-using-sed) for interesting use case\n\n```bash\n$ # changing -ve to +ve and vice versa\n$ cat 
nums.txt\n42\n-2\n10101\n-3.14\n-75\n$ # same as: perl -pe '/^-/ ? s/// : s/^/-/'\n$ # empty REGEXP section will reuse previous REGEXP, in this case /^-/\n$ sed '/^-/{s///;b}; s/^/-/' nums.txt\n-42\n2\n-10101\n3.14\n75\n\n$ # same as: perl -pe '/are/ ? s/e/*/g : s/e/#/g'\n$ # if line contains 'are' replace 'e' with '*' else replace 'e' with '#'\n$ sed '/are/{s/e/*/g;b}; s/e/#/g' poem.txt\nRos*s ar* r*d,\nViol*ts ar* blu*,\nSugar is sw##t,\nAnd so ar* you.\n```\n\n<br>\n\n#### <a name=\"replacing-in-specific-column\"></a>replacing in specific column\n\n* `t` command will branch to specified label on successful substitution\n* Without label, `t` will skip rest of commands and start next cycle\n* More examples\n    * [stackoverflow - replace data after last delimiter](https://stackoverflow.com/questions/39907133/replace-data-after-last-delimiter-of-every-line-using-sed-or-awk/39908523#39908523)\n    * [stackoverflow - replace multiple occurrences in specific column](https://stackoverflow.com/questions/42886531/replace-mutliple-occurances-in-delimited-columns/42886919#42886919)\n\n```bash\n$ # replace space with underscore only in 3rd column\n$ # ^(([^|]+\\|){2} captures first two columns\n$ # [^|]* zero or more non-column separator characters\n$ # as long as match is found, command will be repeated on same input line\n$ echo 'foo bar|a b c|1 2 3|xyz abc' | sed -E ':a s/^(([^|]+\\|){2}[^|]*) /\\1_/; ta'\nfoo bar|a b c|1_2_3|xyz abc\n\n$ # use awk/perl for simpler syntax\n$ # for ex: awk 'BEGIN{FS=OFS=\"|\"} {gsub(/ /,\"_\",$3); print}'\n```\n\n* example to show difference between `b` and `t`\n\n```bash\n$ # whether or not 'R' is found on lines containing 'are', branch will happen\n$ sed '/are/{s/R/*/g;b}; s/e/#/g' poem.txt\n*oses are red,\nViolets are blue,\nSugar is sw##t,\nAnd so are you.\n\n$ # branch only if line contains 'are' and substitution of 'R' succeeds\n$ sed '/are/{s/R/*/g;t}; s/e/#/g' poem.txt\n*oses are red,\nViol#ts ar# blu#,\nSugar is sw##t,\nAnd so 
ar# you.\n```\n\n<br>\n\n#### <a name=\"overlapping-substitutions\"></a>overlapping substitutions\n\n* `t` command looping with label comes in handy for overlapping substitutions as well\n* Note that in general this method will work recursively, see [stackoverflow - substitute recursively](https://stackoverflow.com/questions/9983646/sed-substitute-recursively) for example\n\n```bash\n$ # consider the problem of replacing empty columns with something\n$ # case1: no consecutive empty columns - no problem\n$ echo 'foo::bar::baz' | sed 's/::/:0:/g'\nfoo:0:bar:0:baz\n$ # case2: consecutive empty columns are present - problematic\n$ echo 'foo:::bar::baz' | sed 's/::/:0:/g'\nfoo:0::bar:0:baz\n\n$ # t command looping will handle both cases\n$ echo 'foo::bar::baz' | sed ':a s/::/:0:/; ta'\nfoo:0:bar:0:baz\n$ echo 'foo:::bar::baz' | sed ':a s/::/:0:/; ta'\nfoo:0:0:bar:0:baz\n```\n\n<br>\n\n## <a name=\"lines-between-two-regexps\"></a>Lines between two REGEXPs\n\n* Simple cases were seen in [address range](#address-range) section\n* This section will deal with more cases and some corner cases\n\n<br>\n\n#### <a name=\"include-or-exclude-matching-regexps\"></a>Include or Exclude matching REGEXPs\n\nConsider the sample input file below. For simplicity, the two REGEXPs are the literal strings **BEGIN** and **END** instead of regular expressions\n\n```bash\n$ cat range.txt\nfoo\nBEGIN\n1234\n6789\nEND\nbar\nBEGIN\na\nb\nc\nEND\nbaz\n```\n\nFirst, lines between the two *REGEXP*s are to be printed\n\n* Case 1: both starting and ending *REGEXP* part of output\n\n```bash\n$ sed -n '/BEGIN/,/END/p' range.txt\nBEGIN\n1234\n6789\nEND\nBEGIN\na\nb\nc\nEND\n```\n\n* Case 2: both starting and ending *REGEXP* not part of output\n\n```bash\n$ # remember that empty REGEXP section will reuse previously matched REGEXP\n$ sed -n '/BEGIN/,/END/{//!p}' range.txt\n1234\n6789\na\nb\nc\n```\n\n* Case 3: only starting *REGEXP* part of output\n\n```bash\n$ sed -n '/BEGIN/,/END/{/END/!p}' 
range.txt\nBEGIN\n1234\n6789\nBEGIN\na\nb\nc\n```\n\n* Case 4: only ending *REGEXP* part of output\n\n```bash\n$ sed -n '/BEGIN/,/END/{/BEGIN/!p}' range.txt\n1234\n6789\nEND\na\nb\nc\nEND\n```\n\nSecond, lines between the two *REGEXP*s are to be deleted\n\n* Case 5: both starting and ending *REGEXP* not part of output\n\n```bash\n$ sed '/BEGIN/,/END/d' range.txt\nfoo\nbar\nbaz\n```\n\n* Case 6: both starting and ending *REGEXP* part of output\n\n```bash\n$ # remember that empty REGEXP section will reuse previously matched REGEXP\n$ sed '/BEGIN/,/END/{//!d}' range.txt\nfoo\nBEGIN\nEND\nbar\nBEGIN\nEND\nbaz\n```\n\n* Case 7: only starting *REGEXP* part of output\n\n```bash\n$ sed '/BEGIN/,/END/{/BEGIN/!d}' range.txt\nfoo\nBEGIN\nbar\nBEGIN\nbaz\n```\n\n* Case 8: only ending *REGEXP* part of output\n\n```bash\n$ sed '/BEGIN/,/END/{/END/!d}' range.txt\nfoo\nEND\nbar\nEND\nbaz\n```\n\n<br>\n\n#### <a name=\"first-or-last-block\"></a>First or Last block\n\n* Getting the first block is simple with the `q` command, which quits as soon as the first ending *REGEXP* is seen\n\n```bash\n$ sed -n '/BEGIN/,/END/{p;/END/q}' range.txt\nBEGIN\n1234\n6789\nEND\n\n$ # use other tricks discussed in previous section as needed\n$ sed -n '/BEGIN/,/END/{//!p;/END/q}' range.txt\n1234\n6789\n```\n\n* To get the last block, reverse the input linewise, swap the order of the *REGEXP*s, and finally reverse the output again\n\n```bash\n$ tac range.txt | sed -n '/END/,/BEGIN/{p;/BEGIN/q}' | tac\nBEGIN\na\nb\nc\nEND\n\n$ # use other tricks discussed in previous section as needed\n$ tac range.txt | sed -n '/END/,/BEGIN/{//!p;/BEGIN/q}' | tac\na\nb\nc\n```\n\n* To get a specific block, say 3rd one, `awk` or `perl` would be a better choice\n    * See [Specific blocks](./gnu_awk.md#specific-blocks) for `awk` examples\n\n<br>\n\n#### <a name=\"broken-blocks\"></a>Broken blocks\n\n* If there are blocks with ending *REGEXP* but without corresponding starting *REGEXP*, `sed -n '/BEGIN/,/END/p'` will suffice\n* Consider the modified input file where final starting *REGEXP* doesn't have 
corresponding ending\n\n```bash\n$ cat broken_range.txt\nfoo\nBEGIN\n1234\n6789\nEND\nbar\nBEGIN\na\nb\nc\nbaz\n```\n\n* All lines till the end of the file get printed with a plain `sed -n '/BEGIN/,/END/p'`, since the final starting *REGEXP* never meets an ending *REGEXP*\n* The file reversing trick comes in handy here as well\n* But if both kinds of broken blocks are present, further processing will be required. Better to use `awk` or `perl` in such cases\n    * See [Broken blocks](./gnu_awk.md#broken-blocks) for `awk` examples\n\n```bash\n$ sed -n '/BEGIN/,/END/p' broken_range.txt\nBEGIN\n1234\n6789\nEND\nBEGIN\na\nb\nc\nbaz\n\n$ tac broken_range.txt | sed -n '/END/,/BEGIN/p' | tac\nBEGIN\n1234\n6789\nEND\n```\n\n* If a block has multiple starting *REGEXP*s but a single ending *REGEXP*, the reversing trick comes in handy again\n\n```bash\n$ cat uneven_range.txt\nfoo\nBEGIN\n1234\nBEGIN\n42\n6789\nEND\nbar\nBEGIN\na\nBEGIN\nb\nBEGIN\nc\nBEGIN\nd\nBEGIN\ne\nEND\nbaz\n\n$ tac uneven_range.txt | sed -n '/END/,/BEGIN/p' | tac\nBEGIN\n42\n6789\nEND\nBEGIN\ne\nEND\n```\n\n<br>\n\n## <a name=\"sed-scripts\"></a>sed scripts\n\n* `sed` commands can be placed in a file and called using `-f` option or directly executed using [shebang](https://en.wikipedia.org/wiki/Shebang_(Unix))\n* See [sed manual - Some Sample Scripts](https://www.gnu.org/software/sed/manual/sed.html#Examples) for more examples\n* See [sed manual - Often-Used Commands](https://www.gnu.org/software/sed/manual/sed.html#Common-Commands) for more details on using comments\n\n```bash\n$ cat script.sed\n# each line is a command\n/is/cfoo bar\n/you/r 3.txt\n/you/d\n# single quotes can be used freely\ns/are/'are'/g\n\n$ sed -f script.sed poem.txt\nRoses 'are' red,\nViolets 'are' blue,\nfoo bar\n3\n13\n\n$ # command line options are specified as usual\n$ sed -nf script.sed poem.txt\nfoo bar\n3\n13\n```\n\n* command line options can be specified along with shebang as well as added at time of invocation\n* See also [stackoverflow - usage of options along with shebang depends on lot of 
factors](https://stackoverflow.com/questions/4303128/how-to-use-multiple-arguments-with-a-shebang-i-e)\n\n```bash\n$ type sed\nsed is /bin/sed\n\n$ cat executable.sed\n#!/bin/sed -f\n/is/cfoo bar\n/you/r 3.txt\n/you/d\ns/are/'are'/g\n\n$ chmod +x executable.sed\n\n$ ./executable.sed poem.txt\nRoses 'are' red,\nViolets 'are' blue,\nfoo bar\n3\n13\n\n$ ./executable.sed -n poem.txt\nfoo bar\n3\n13\n```\n\n<br>\n\n## <a name=\"gotchas-and-tips\"></a>Gotchas and Tips\n\n* dos style line endings\n\n```bash\n$ # no issue with unix style line ending\n$ printf 'foo bar\\n123 789\\n' | sed -E 's/\\w+$/xyz/'\nfoo xyz\n123 xyz\n\n$ # dos style line ending causes trouble\n$ printf 'foo bar\\r\\n123 789\\r\\n' | sed -E 's/\\w+$/xyz/'\nfoo bar\n123 789\n\n$ # can be corrected by adding \\r as well to match\n$ # if needed, add \\r in replacement section as well\n$ printf 'foo bar\\r\\n123 789\\r\\n' | sed -E 's/\\w+\\r$/xyz/'\nfoo xyz\n123 xyz\n```\n\n* changing dos to unix style line ending and vice versa\n\n```bash\n$ # bash functions\n$ unix2dos() { sed -i 's/$/\\r/' \"$@\" ; }\n$ dos2unix() { sed -i 's/\\r$//' \"$@\" ; }\n\n$ cat -A 5.txt\nfive$\n1five$\n\n$ unix2dos 5.txt\n$ cat -A 5.txt\nfive^M$\n1five^M$\n\n$ dos2unix 5.txt\n$ cat -A 5.txt\nfive$\n1five$\n```\n\n* variable/command substitution\n* See also [stackoverflow - Is it possible to escape regex metacharacters reliably with sed](https://stackoverflow.com/questions/29613304/is-it-possible-to-escape-regex-metacharacters-reliably-with-sed)\n\n```bash\n$ # variables don't get expanded within single quotes\n$ printf 'user\\nhome\\n' | sed '/user/ s/$/: $USER/'\nuser: $USER\nhome\n$ printf 'user\\nhome\\n' | sed '/user/ s/$/: '\"$USER\"'/'\nuser: learnbyexample\nhome\n\n$ # variable being substituted cannot have the delimiter character\n$ printf 'user\\nhome\\n' | sed '/home/ s/$/: '\"$HOME\"'/'\nsed: -e expression #1, char 15: unknown option to `s'\n$ printf 'user\\nhome\\n' | sed '/home/ s#$#: '\"$HOME\"'#'\nuser\nhome: 
/home/learnbyexample\n\n$ # use r command for robust insertion from file/command-output\n$ sed '1a'\"$(seq 2)\" 5.txt\nsed: -e expression #1, char 5: missing command\n$ seq 2 | sed '1r /dev/stdin' 5.txt\nfive\n1\n2\n1five\n```\n\n* common regular expression mistakes #1 - greediness\n\n```bash\n$ s='foo and bar and baz land good'\n$ echo \"$s\" | sed 's/foo.*ba/123 789/'\n123 789z land good\n\n$ # use a more restrictive version\n$ echo \"$s\" | sed -E 's/foo \\w+ ba/123 789/'\n123 789r and baz land good\n\n$ # or use a tool with non-greedy feature available\n$ echo \"$s\" | perl -pe 's/foo.*?ba/123 789/'\n123 789r and baz land good\n\n$ # for single characters, use negated character class\n$ echo 'foo=123,baz=789,xyz=42' | sed 's/foo=.*,//'\nxyz=42\n$ echo 'foo=123,baz=789,xyz=42' | sed 's/foo=[^,]*,//'\nbaz=789,xyz=42\n```\n\n* common regular expression mistakes #2 - BRE vs ERE syntax\n\n```bash\n$ # + needs to be escaped with BRE or enable ERE\n$ echo 'like 42 and 37' | sed 's/[0-9]+/xxx/g'\nlike 42 and 37\n$ echo 'like 42 and 37' | sed -E 's/[0-9]+/xxx/g'\nlike xxx and xxx\n\n$ # or escaping when not required\n$ echo 'get {} and let' | sed 's/\\{\\}/[]/'\nsed: -e expression #1, char 10: Invalid preceding regular expression\n$ echo 'get {} and let' | sed 's/{}/[]/'\nget [] and let\n```\n\n* common regular expression mistakes #3 - using PCRE syntax/features\n    * especially by trying out solution on online sites like [regex101](https://regex101.com/) and expecting it to work with `sed` as well\n\n```bash\n$ # \\d is not available as backslash character class, will match 'd' instead\n$ echo 'like 42 and 37' | sed -E 's/\\d+/xxx/g'\nlike 42 anxxx 37\n$ echo 'like 42 and 37' | sed -E 's/[0-9]+/xxx/g'\nlike xxx and xxx\n\n$ # features like lookarounds/non-greedy/etc not available\n$ echo 'foo,baz,,xyz,,,123' | sed -E 's/,\\K(?=,)/NaN/g'\nsed: -e expression #1, char 16: Invalid preceding regular expression\n$ echo 'foo,baz,,xyz,,,123' | perl -pe 
's/,\\K(?=,)/NaN/g'\nfoo,baz,NaN,xyz,NaN,NaN,123\n```\n\n* common regular expression mistakes #4 - end of line white-space\n\n```bash\n$ printf 'foo bar \\n123 789\\t\\n' | sed -E 's/\\w+$/xyz/'\nfoo bar \n123 789 \n\n$ printf 'foo bar \\n123 789\\t\\n' | sed -E 's/\\w+\\s*$/xyz/'\nfoo xyz\n123 xyz\n```\n\n* and many more... see also\n    * [unix.stackexchange - Why does my regular expression work in X but not in Y?](https://unix.stackexchange.com/questions/119905/why-does-my-regular-expression-work-in-x-but-not-in-y)\n    * [stackoverflow - Greedy vs. Reluctant vs. Possessive Quantifiers](https://stackoverflow.com/questions/5319840/greedy-vs-reluctant-vs-possessive-quantifiers)\n    * [stackoverflow - How to replace everything between but only until the first occurrence of the end string?](https://stackoverflow.com/questions/45168607/how-to-replace-everything-between-but-only-until-the-first-occurrence-of-the-end)\n    * [stackoverflow - How to match a specified pattern with multiple possibilities](https://stackoverflow.com/questions/43650926/how-to-match-a-specified-pattern-with-multiple-possibilities)\n    * [stackoverflow - mixing different regex syntax](https://stackoverflow.com/questions/45389684/cant-comment-a-line-in-my-cnf/45389833#45389833)\n    * [sed manual - BRE-vs-ERE](https://www.gnu.org/software/sed/manual/sed.html#BRE-vs-ERE)\n\n* Speed boost for ASCII encoded input\n\n```bash\n$ time sed -nE '/^([a-d][r-z]){3}$/p' /usr/share/dict/words\navatar\nawards\ncravat\n\nreal    0m0.058s\n$ time LC_ALL=C sed -nE '/^([a-d][r-z]){3}$/p' /usr/share/dict/words\navatar\nawards\ncravat\n\nreal    0m0.038s\n\n$ time sed -nE '/^([a-z]..)\\1$/p' /usr/share/dict/words > /dev/null\n\nreal    0m0.111s\n$ time LC_ALL=C sed -nE '/^([a-z]..)\\1$/p' /usr/share/dict/words > /dev/null\n\nreal    0m0.073s\n```\n\n<br>\n\n## <a name=\"further-reading\"></a>Further Reading\n\n* Manual and related\n    * `man sed` and `info sed` for more details, known issues/limitations as 
well as options/commands not covered in this tutorial\n    * [GNU sed manual](https://www.gnu.org/software/sed/manual/sed.html) has even more detailed information and examples\n    * [sed FAQ](http://sed.sourceforge.net/sedfaq.html), last modified '10 March 2003'\n    * [stackoverflow - BSD/macOS Sed vs GNU Sed vs the POSIX Sed specification](https://stackoverflow.com/questions/24275070/sed-not-giving-me-correct-substitute-operation-for-newline-with-mac-difference/24276470#24276470)\n    * [unix.stackexchange - Differences between sed on Mac OSX and other standard sed](https://unix.stackexchange.com/questions/13711/differences-between-sed-on-mac-osx-and-other-standard-sed)\n* This chapter has also been [converted to a book](https://github.com/learnbyexample/learn_gnused) with additional description, examples and exercises.\n* Tutorials and Q&A\n    * [sed basics](https://code.snipcademy.com/tutorials/shell-scripting/sed/introduction)\n    * [sed detailed tutorial](https://www.grymoire.com/Unix/Sed.html) - has details on differences between various `sed` versions as well\n    * [sed one-liners explained](https://catonmat.net/sed-one-liners-explained-part-one)\n    * [cheat sheet](https://catonmat.net/ftp/sed.stream.editor.cheat.sheet.txt)\n    * [unix.stackexchange - common search and replace examples](https://unix.stackexchange.com/questions/112023/how-can-i-replace-a-string-in-a-files)\n    * [sed Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/sed?sort=votes&pageSize=15)\n    * [sed Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/sed?sort=votes&pageSize=15)\n* Selected examples - portable solutions, commands not covered in this tutorial, same problem solved using different tools, etc\n    * [unix.stackexchange - replace multiline string](https://unix.stackexchange.com/questions/26284/how-can-i-use-sed-to-replace-a-multi-line-string)\n    * [stackoverflow - deleting empty lines with optional white 
spaces](https://stackoverflow.com/questions/16414410/delete-empty-lines-using-sed)\n    * [unix.stackexchange - print only line above the matching line](https://unix.stackexchange.com/questions/264489/find-each-line-matching-a-pattern-but-print-only-the-line-above-it)\n    * [stackoverflow - How to select lines between two patterns?](https://stackoverflow.com/questions/38972736/how-to-select-lines-between-two-patterns)\n    * [stackoverflow - get lines between two patterns only if there is third pattern between them](https://stackoverflow.com/questions/39960075/bash-how-to-get-lines-between-patterns-only-if-there-is-pattern2-between-them)\n        * [unix.stackexchange - similar example](https://unix.stackexchange.com/questions/228699/sed-print-lines-matched-by-a-pattern-range-if-one-line-matches-a-condition)\n* Learn Regular Expressions (has information on flavors other than BRE/ERE too)\n    * [Regular Expressions Tutorial](https://www.regular-expressions.info/tutorial.html)\n    * [regexcrossword](https://regexcrossword.com/)\n    * [stackoverflow - What does this regex mean?](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean)\n* Related tools\n    * [rpl](https://unix.stackexchange.com/questions/112023/how-can-i-replace-a-string-in-a-files/251742#251742) - search and replace tool, has interesting options like interactive mode and recursive mode\n    * [sd](https://github.com/chmln/sd) - simple search and replace, implemented in Rust\n    * [sedsed](https://github.com/aureliojargas/sedsed) - Debugger, indenter and HTMLizer for sed scripts\n    * [xo](https://github.com/ezekg/xo) - composes regular expression match groups\n* [unix.stackexchange - When to use grep, sed, awk, perl, etc](https://unix.stackexchange.com/questions/303044/when-to-use-grep-less-awk-sed)\n\n"
  },
  {
    "path": "miscellaneous.md",
    "content": "# <a name=\"miscellaneous\"></a>Miscellaneous\n\n**Table of Contents**\n\n* [cut](#cut)\n    * [select specific fields](#select-specific-fields)\n    * [suppressing lines without delimiter](#suppressing-lines-without-delimiter)\n    * [specifying delimiters](#specifying-delimiters)\n    * [complement](#complement)\n    * [select specific characters](#select-specific-characters)\n    * [Further reading for cut](#further-reading-for-cut)\n* [tr](#tr)\n    * [translation](#translation)\n    * [escape sequences and character classes](#escape-sequences-and-character-classes)\n    * [deletion](#deletion)\n    * [squeeze](#squeeze)\n    * [Further reading for tr](#further-reading-for-tr)\n* [basename](#basename)\n* [dirname](#dirname)\n* [xargs](#xargs)\n* [seq](#seq)\n    * [integer sequences](#integer-sequences)\n    * [specifying separator](#specifying-separator)\n    * [floating point sequences](#floating-point-sequences)\n    * [Further reading for seq](#further-reading-for-seq)\n\n<br>\n\n## <a name=\"cut\"></a>cut\n\n```bash\n$ cut --version | head -n1\ncut (GNU coreutils) 8.25\n\n$ man cut\nCUT(1)                           User Commands                          CUT(1)\n\nNAME\n       cut - remove sections from each line of files\n\nSYNOPSIS\n       cut OPTION... 
[FILE]...\n\nDESCRIPTION\n       Print selected parts of lines from each FILE to standard output.\n\n       With no FILE, or when FILE is -, read standard input.\n...\n```\n\n<br>\n\n#### <a name=\"select-specific-fields\"></a>select specific fields\n\n* Default delimiter is **tab** character\n* `-f` option allows to print specific field(s) from each input line\n\n```bash\n$ printf 'foo\\tbar\\t123\\tbaz\\n'\nfoo     bar     123     baz\n\n$ # single field\n$ printf 'foo\\tbar\\t123\\tbaz\\n' | cut -f2\nbar\n\n$ # multiple fields can be specified by using ,\n$ printf 'foo\\tbar\\t123\\tbaz\\n' | cut -f2,4\nbar     baz\n\n$ # output is always ascending order of field numbers\n$ printf 'foo\\tbar\\t123\\tbaz\\n' | cut -f3,1\nfoo     123\n\n$ # range can be specified using -\n$ printf 'foo\\tbar\\t123\\tbaz\\n' | cut -f1-3\nfoo     bar     123\n$ # if ending number is omitted, select till last field\n$ printf 'foo\\tbar\\t123\\tbaz\\n' | cut -f3-\n123     baz\n```\n\n<br>\n\n#### <a name=\"suppressing-lines-without-delimiter\"></a>suppressing lines without delimiter\n\n```bash\n$ cat marks.txt\njan 2017\nfoobar  12      45      23\nfeb 2017\nfoobar  18      38      19\n\n$ # by default lines without delimiter will be printed\n$ cut -f2- marks.txt\njan 2017\n12      45      23\nfeb 2017\n18      38      19\n\n$ # use -s option to suppress such lines\n$ cut -s -f2- marks.txt\n12      45      23\n18      38      19\n```\n\n<br>\n\n#### <a name=\"specifying-delimiters\"></a>specifying delimiters\n\n* use `-d` option to specify input delimiter other than default **tab** character\n* only single character can be used, for multi-character/regex based delimiter use `awk` or `perl`\n\n```bash\n$ echo 'foo:bar:123:baz' | cut -d: -f3\n123\n\n$ # by default output delimiter is same as input\n$ echo 'foo:bar:123:baz' | cut -d: -f1,4\nfoo:baz\n\n$ # quote the delimiter character if it clashes with shell special characters\n$ echo 'one;two;three;four' | cut -d; -f3\ncut: option 
requires an argument -- 'd'\nTry 'cut --help' for more information.\n-f3: command not found\n$ echo 'one;two;three;four' | cut -d';' -f3\nthree\n```\n\n* use `--output-delimiter` option to specify different output delimiter\n* since this option accepts a string, more than one character can be specified\n* See also [using $ prefixed string](https://unix.stackexchange.com/questions/48106/what-does-it-mean-to-have-a-dollarsign-prefixed-string-in-a-script)\n\n```bash\n$ printf 'foo\\tbar\\t123\\tbaz\\n' | cut --output-delimiter=: -f1-3\nfoo:bar:123\n\n$ echo 'one;two;three;four' | cut -d';' --output-delimiter=' ' -f1,3-\none three four\n\n$ # tested on bash, might differ with other shells\n$ echo 'one;two;three;four' | cut -d';' --output-delimiter=$'\\t' -f1,3-\none     three   four\n\n$ echo 'one;two;three;four' | cut -d';' --output-delimiter=' - ' -f1,3-\none - three - four\n```\n\n<br>\n\n#### <a name=\"complement\"></a>complement\n\n```bash\n$ echo 'one;two;three;four' | cut -d';' -f1,3-\none;three;four\n\n$ # to print other than specified fields\n$ echo 'one;two;three;four' | cut -d';' --complement -f2\none;three;four\n```\n\n<br>\n\n#### <a name=\"select-specific-characters\"></a>select specific characters\n\n* similar to `-f` for field selection, use `-c` for character selection\n* See manual for what defines a character and differences between `-b` and `-c`\n\n```bash\n$ echo 'foo:bar:123:baz' | cut -c4\n:\n\n$ printf 'foo\\tbar\\t123\\tbaz\\n' | cut -c1,4,7\nf       r\n\n$ echo 'foo:bar:123:baz' | cut -c8-\n:123:baz\n\n$ echo 'foo:bar:123:baz' | cut --complement -c8-\nfoo:bar\n\n$ echo 'foo:bar:123:baz' | cut -c1,6,7 --output-delimiter=' '\nf a r\n\n$ echo 'abcdefghij' | cut --output-delimiter='-' -c1-3,4-7,8-\nabc-defg-hij\n\n$ cut -c1-3 marks.txt\njan\nfoo\nfeb\nfoo\n```\n\n<br>\n\n#### <a name=\"further-reading-for-cut\"></a>Further reading for cut\n\n* `man cut` and `info cut` for more options and detailed documentation\n* [cut Q&A on unix 
stackexchange](https://unix.stackexchange.com/questions/tagged/cut?sort=votes&pageSize=15)\n\n<br>\n\n## <a name=\"tr\"></a>tr\n\n```bash\n$ tr --version | head -n1\ntr (GNU coreutils) 8.25\n\n$ man tr\nTR(1)                            User Commands                           TR(1)\n\nNAME\n       tr - translate or delete characters\n\nSYNOPSIS\n       tr [OPTION]... SET1 [SET2]\n\nDESCRIPTION\n       Translate, squeeze, and/or delete characters from standard input, writ‐\n       ing to standard output.\n...\n```\n\n<br>\n\n#### <a name=\"translation\"></a>translation\n\n* one-to-one mapping of characters, all occurrences are translated\n* as good practice, enclose the arguments in single quotes to avoid issues due to shell interpretation\n\n```bash\n$ echo 'foo bar cat baz' | tr 'abc' '123'\nfoo 21r 31t 21z\n\n$ # use - to represent a range in ascending order\n$ echo 'foo bar cat baz' | tr 'a-f' '1-6'\n6oo 21r 31t 21z\n\n$ # changing case\n$ echo 'foo bar cat baz' | tr 'a-z' 'A-Z'\nFOO BAR CAT BAZ\n$ echo 'Hello World' | tr 'a-zA-Z' 'A-Za-z'\nhELLO wORLD\n\n$ echo 'foo;bar;baz' | tr ; :\ntr: missing operand\nTry 'tr --help' for more information.\n$ echo 'foo;bar;baz' | tr ';' ':'\nfoo:bar:baz\n```\n\n* rot13 example\n\n```bash\n$ echo 'foo bar cat baz' | tr 'a-z' 'n-za-m'\nsbb one png onm\n$ echo 'sbb one png onm' | tr 'a-z' 'n-za-m'\nfoo bar cat baz\n\n$ echo 'Hello World' | tr 'a-zA-Z' 'n-za-mN-ZA-M'\nUryyb Jbeyq\n$ echo 'Uryyb Jbeyq' | tr 'a-zA-Z' 'n-za-mN-ZA-M'\nHello World\n```\n\n* use shell input redirection for file input\n\n```bash\n$ cat marks.txt\njan 2017\nfoobar  12      45      23\nfeb 2017\nfoobar  18      38      19\n\n$ tr 'a-z' 'A-Z' < marks.txt\nJAN 2017\nFOOBAR  12      45      23\nFEB 2017\nFOOBAR  18      38      19\n```\n\n* if arguments are of different lengths\n\n```bash\n$ # when second argument is longer, the extra characters are ignored\n$ echo 'foo bar cat baz' | tr 'abc' '1-9'\nfoo 21r 31t 21z\n\n$ # when first argument is longer\n$ # 
the last character of second argument gets re-used\n$ echo 'foo bar cat baz' | tr 'a-z' '123'\n333 213 313 213\n\n$ # use -t option to truncate first argument to same length as second\n$ echo 'foo bar cat baz' | tr -t 'a-z' '123'\nfoo 21r 31t 21z\n```\n\n<br>\n\n#### <a name=\"escape-sequences-and-character-classes\"></a>escape sequences and character classes\n\n* Certain characters like newline, tab, etc can be represented using escape sequences or octal representation\n* Certain commonly useful groups of characters like alphabets, digits, punctuations etc have character class as shortcuts\n* See [gnu tr manual](http://www.gnu.org/software/coreutils/manual/html_node/Character-sets.html#Character-sets) for all escape sequences and character classes\n\n```bash\n$ printf 'foo\\tbar\\t123\\tbaz\\n' | tr '\\t' ':'\nfoo:bar:123:baz\n\n$ echo 'foo:bar:123:baz' | tr ':' '\\n'\nfoo\nbar\n123\nbaz\n$ # makes it easier to transform\n$ echo 'foo:bar:123:baz' | tr ':' '\\n' | pr -2ats'-'\nfoo-bar\n123-baz\n\n$ echo 'foo bar cat baz' | tr '[:lower:]' '[:upper:]'\nFOO BAR CAT BAZ\n```\n\n* since `-` is used for character ranges, place it at the end to represent it literally\n    * cannot be used at start of argument as it would get treated as option\n    * or use `--` to indicate end of option processing\n* similarly, to represent `\\` literally, use `\\\\`\n\n```bash\n$ echo '/foo-bar/baz/report' | tr '-a-z' '_A-Z'\ntr: invalid option -- 'a'\nTry 'tr --help' for more information.\n\n$ echo '/foo-bar/baz/report' | tr 'a-z-' 'A-Z_'\n/FOO_BAR/BAZ/REPORT\n\n$ echo '/foo-bar/baz/report' | tr -- '-a-z' '_A-Z'\n/FOO_BAR/BAZ/REPORT\n\n$ echo '/foo-bar/baz/report' | tr '/-' '\\\\_'\n\\foo_bar\\baz\\report\n```\n\n<br>\n\n#### <a name=\"deletion\"></a>deletion\n\n* use `-d` option to specify characters to be deleted\n* add complement option `-c` if it is easier to define which characters are to be retained\n\n```bash\n$ echo '2017-03-21' | tr -d '-'\n20170321\n\n$ echo 'Hi123 there. 
How a32re you' | tr -d '1-9'\nHi there. How are you\n\n$ # delete all punctuation characters\n$ echo '\"Foo1!\", \"Bar.\", \":Baz:\"' | tr -d '[:punct:]'\nFoo1 Bar Baz\n\n$ # deleting carriage return character\n$ cat -v greeting.txt\nHi there^M\nHow are you^M\n$ tr -d '\\r' < greeting.txt | cat -v\nHi there\nHow are you\n\n$ # retain only alphabets, comma and newline characters\n$ echo '\"Foo1!\", \"Bar.\", \":Baz:\"' | tr -cd '[:alpha:],\\n'\nFoo,Bar,Baz\n```\n\n<br>\n\n#### <a name=\"squeeze\"></a>squeeze\n\n* to change consecutive repeated characters to single copy of that character\n\n```bash\n$ # only lower case alphabets\n$ echo 'FFoo seed 11233' | tr -s 'a-z'\nFFo sed 11233\n\n$ # alphabets and digits\n$ echo 'FFoo seed 11233' | tr -s '[:alnum:]'\nFo sed 123\n\n$ # squeeze other than alphabets\n$ echo 'FFoo seed 11233' | tr -sc '[:alpha:]'\nFFoo seed 123\n\n$ # only characters present in second argument is used for squeeze\n$ echo 'FFoo seed 11233' | tr -s 'A-Z' 'a-z'\nfo sed 11233\n\n$ # multiple consecutive horizontal spaces to single space\n$ printf 'foo\\t\\tbar \\t123     baz\\n'\nfoo             bar     123     baz\n$ printf 'foo\\t\\tbar \\t123     baz\\n' | tr -s '[:blank:]' ' '\nfoo bar 123 baz\n```\n\n<br>\n\n#### <a name=\"further-reading-for-tr\"></a>Further reading for tr\n\n* `man tr` and `info tr` for more options and detailed documentation\n* [tr Q&A on unix stackexchange](http://unix.stackexchange.com/questions/tagged/tr?sort=votes&pageSize=15)\n\n<br>\n\n## <a name=\"basename\"></a>basename\n\n```bash\n$ basename --version | head -n1\nbasename (GNU coreutils) 8.25\n\n$ man basename\nBASENAME(1)                      User Commands                     BASENAME(1)\n\nNAME\n       basename - strip directory and suffix from filenames\n\nSYNOPSIS\n       basename NAME [SUFFIX]\n       basename OPTION... NAME...\n\nDESCRIPTION\n       Print  NAME  with  any leading directory components removed.  
If speci‐\n       fied, also remove a trailing SUFFIX.\n...\n```\n\n<br>\n\n**Examples**\n\n```bash\n$ # same as using pwd command\n$ echo \"$PWD\"\n/home/learnbyexample\n\n$ basename \"$PWD\"\nlearnbyexample\n\n$ # use -a option if there are multiple arguments\n$ basename -a foo/a/report.log bar/y/power.log\nreport.log\npower.log\n\n$ # use single quotes if arguments contain space and other special shell characters\n$ # use suffix option -s to strip file extension from filename\n$ basename -s '.log' '/home/learnbyexample/proj adder/power.log'\npower\n$ # -a is implied when using -s option\n$ basename -s'.log' foo/a/report.log bar/y/power.log\nreport\npower\n```\n\n* Can also use [Parameter expansion](http://mywiki.wooledge.org/BashFAQ/073) if working on file paths saved in variables\n    * assumes `bash` shell and similar that support this feature\n\n```bash\n$ # remove from start of string up to last /\n$ file='/home/learnbyexample/proj adder/power.log'\n$ basename \"$file\"\npower.log\n$ echo \"${file##*/}\"\npower.log\n\n$ t=\"${file##*/}\"\n$ # remove .log from end of string\n$ echo \"${t%.log}\"\npower\n```\n\n* See `man basename` and `info basename` for detailed documentation\n\n<br>\n\n## <a name=\"dirname\"></a>dirname\n\n```bash\n$ dirname --version | head -n1\ndirname (GNU coreutils) 8.25\n\n$ man dirname\nDIRNAME(1)                       User Commands                      DIRNAME(1)\n\nNAME\n       dirname - strip last component from file name\n\nSYNOPSIS\n       dirname [OPTION] NAME...\n\nDESCRIPTION\n       Output each NAME with its last non-slash component and trailing slashes\n       removed; if NAME contains no  /'s,  output  '.'  
(meaning  the  current\n       directory).\n...\n```\n\n<br>\n\n**Examples**\n\n```bash\n$ echo \"$PWD\"\n/home/learnbyexample\n\n$ dirname \"$PWD\"\n/home\n\n$ # use single quotes if arguments contain space and other special shell characters\n$ dirname '/home/learnbyexample/proj adder/power.log'\n/home/learnbyexample/proj adder\n\n$ # unlike basename, by default dirname handles multiple arguments\n$ dirname foo/a/report.log bar/y/power.log\nfoo/a\nbar/y\n\n$ # if no / in argument, output is . to indicate current directory\n$ dirname power.log\n.\n```\n\n* Use `$()` command substitution to further process output as needed\n\n```bash\n$ dirname '/home/learnbyexample/proj adder/power.log'\n/home/learnbyexample/proj adder\n\n$ dirname \"$(dirname '/home/learnbyexample/proj adder/power.log')\"\n/home/learnbyexample\n\n$ basename \"$(dirname '/home/learnbyexample/proj adder/power.log')\"\nproj adder\n```\n\n* Can also use [Parameter expansion](http://mywiki.wooledge.org/BashFAQ/073) if working on file paths saved in variables\n    * assumes `bash` shell and similar that support this feature\n\n```bash\n$ # remove from last / in the string to end of string\n$ file='/home/learnbyexample/proj adder/power.log'\n$ dirname \"$file\"\n/home/learnbyexample/proj adder\n$ echo \"${file%/*}\"\n/home/learnbyexample/proj adder\n\n$ # remove from second last / to end of string\n$ echo \"${file%/*/*}\"\n/home/learnbyexample\n\n$ # apply basename trick to get just directory name instead of full path\n$ t=\"${file%/*}\"\n$ echo \"${t##*/}\"\nproj adder\n```\n\n* See `man dirname` and `info dirname` for detailed documentation\n\n<br>\n\n## <a name=\"xargs\"></a>xargs\n\n```bash\n$ xargs --version | head -n1\nxargs (GNU findutils) 4.7.0-git\n\n$ whatis xargs\nxargs (1)            - build and execute command lines from standard input\n\n$ # from 'man xargs'\n       This manual page documents the GNU version of xargs.  
xargs reads items\n       from  the  standard  input, delimited by blanks (which can be protected\n       with double or single quotes or a backslash) or newlines, and  executes\n       the  command (default is /bin/echo) one or more times with any initial-\n       arguments followed by items read from standard input.  Blank  lines  on\n       the standard input are ignored.\n```\n\nWhile `xargs` is [primarily used](https://unix.stackexchange.com/questions/24954/when-is-xargs-needed) for passing output of command or file contents to another command as input arguments and/or parallel processing, it can be quite handy for certain text processing stuff with default `echo` command\n\n```bash\n$ printf ' foo\\t\\tbar \\t123     baz \\n' | cat -e\n foo\t\tbar \t123     baz $\n$ # tr helps to change consecutive blanks to single space\n$ # but what if blanks at start and end have to be removed as well?\n$ printf ' foo\\t\\tbar \\t123     baz \\n' | tr -s '[:blank:]' ' ' | cat -e\n foo bar 123 baz $\n$ # xargs does this by default\n$ printf ' foo\\t\\tbar \\t123     baz \\n' | xargs | cat -e\nfoo bar 123 baz$\n\n$ # -n option limits number of arguments per line\n$ printf ' foo\\t\\tbar \\t123     baz \\n' | xargs -n2\nfoo bar\n123 baz\n\n$ # same as using: paste -d' ' - - -\n$ # or: pr -3ats' '\n$ seq 6 | xargs -n3\n1 2 3\n4 5 6\n```\n\n* use `-a` option to specify file input instead of stdin\n\n```bash\n$ cat marks.txt\njan 2017\nfoobar  12      45      23\nfeb 2017\nfoobar  18      38      19\n\n$ xargs -a marks.txt\njan 2017 foobar 12 45 23 feb 2017 foobar 18 38 19\n\n$ # use -L option to limit max number of lines per command line\n$ xargs -L2 -a marks.txt\njan 2017 foobar 12 45 23\nfeb 2017 foobar 18 38 19\n```\n\n* **Note** since `echo` is the command being executed, it will cause issue with option interpretation\n\n```bash\n$ printf ' -e foo\\t\\tbar \\t123     baz \\n' | xargs -n2\nfoo\nbar 123\nbaz\n\n$ # use -t option to see what is happening (verbose output)\n$ 
printf ' -e foo\\t\\tbar \\t123     baz \\n' | xargs -n2 -t\necho -e foo \nfoo\necho bar 123 \nbar 123\necho baz \nbaz\n```\n\n* See `man xargs` and `info xargs` for detailed documentation\n\n<br>\n\n## <a name=\"seq\"></a>seq\n\n```bash\n$ seq --version | head -n1\nseq (GNU coreutils) 8.25\n\n$ man seq\nSEQ(1)                           User Commands                          SEQ(1)\n\nNAME\n       seq - print a sequence of numbers\n\nSYNOPSIS\n       seq [OPTION]... LAST\n       seq [OPTION]... FIRST LAST\n       seq [OPTION]... FIRST INCREMENT LAST\n\nDESCRIPTION\n       Print numbers from FIRST to LAST, in steps of INCREMENT.\n...\n```\n\n<br>\n\n#### <a name=\"integer-sequences\"></a>integer sequences\n\n* see `info seq` for details of how large numbers are handled\n    * for ex: `seq 50000000000000000000 2 50000000000000000004` may not work\n\n```bash\n$ # default start=1 and increment=1\n$ seq 3\n1\n2\n3\n\n$ # default increment=1\n$ seq 25434 25437\n25434\n25435\n25436\n25437\n$ seq -5 -3\n-5\n-4\n-3\n\n$ # different increment value\n$ seq 1000 5 1011\n1000\n1005\n1010\n\n$ # use negative increment for descending order\n$ seq 10 -5 -7\n10\n5\n0\n-5\n```\n\n* use `-w` option for leading zeros\n* largest length of start/end value is used to determine padding\n\n```bash\n$ seq 008 010\n8\n9\n10\n\n$ # or: seq -w 8 010\n$ seq -w 008 010\n008\n009\n010\n\n$ seq -w 0003\n0001\n0002\n0003\n```\n\n<br>\n\n#### <a name=\"specifying-separator\"></a>specifying separator\n\n* As seen already, default is newline separator between numbers\n* `-s` option allows to use custom string between numbers\n* A newline is always added at end\n\n```bash\n$ seq -s: 4\n1:2:3:4\n\n$ seq -s' ' 4\n1 2 3 4\n\n$ seq -s' - ' 4\n1 - 2 - 3 - 4\n```\n\n<br>\n\n#### <a name=\"floating-point-sequences\"></a>floating point sequences\n\n```bash\n$ # default increment=1\n$ seq 0.5 2.5\n0.5\n1.5\n2.5\n\n$ seq -s':' -2 0.75 3\n-2.00:-1.25:-0.50:0.25:1.00:1.75:2.50\n\n$ # Scientific notation is 
supported\n$ seq 1.2e2 1.22e2\n120\n121\n122\n```\n\n* formatting numbers, see `info seq` for details\n\n```bash\n$ seq -f'%.3f' -s':' -2 0.75 3\n-2.000:-1.250:-0.500:0.250:1.000:1.750:2.500\n\n$ seq -f'%.3e' 1.2e2 1.22e2\n1.200e+02\n1.210e+02\n1.220e+02\n```\n\n<br>\n\n#### <a name=\"further-reading-for-seq\"></a>Further reading for seq\n\n* `man seq` and `info seq` for more options, corner cases and detailed documentation\n* [seq Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/seq?sort=votes&pageSize=15)\n"
  },
  {
    "path": "overview_presentation/baz.json",
    "content": "{\n   \"abc\": {\n      \"@attr\": \"good\",\n      \"text\": \"Hi there\"\n   },\n   \"xyz\": {\n      \"@attr\": \"bad\",\n      \"text\": \"I am good. How are you?\"\n   }\n}\n"
  },
  {
    "path": "overview_presentation/foo.xml",
    "content": "<foo>\n    <abc attr=\"good\">Hi there</abc>\n    <xyz attr=\"bad\">I am good. How are you?</xyz>\n</foo>\n"
  },
  {
    "path": "overview_presentation/greeting.txt",
    "content": "Hi there\nHave a nice day\n"
  },
  {
    "path": "overview_presentation/sample.txt",
    "content": "Hello World!\n\nGood day\nHow do you do?\n\nJust do it\nBelieve 42 it!\n\nToday is sunny\nNot a bit funny\nNo doubt you like it too\n\nMuch ado about nothing\nHe he 123 he he\n"
  },
  {
    "path": "perl_the_swiss_knife.md",
    "content": "<br> <br> <br>\n\n---\n\n:information_source: :information_source: This chapter has been converted into a better formatted ebook - https://learnbyexample.github.io/learn_perl_oneliners/. The ebook also has content updated for newer version of `perl`, includes exercises, solutions, etc.\n\nFor markdown source and links to buy pdf/epub versions, see: https://github.com/learnbyexample/learn_perl_oneliners\n\n---\n\n<br> <br> <br>\n\n# <a name=\"perl-one-liners\"></a>Perl one liners\n\n**Table of Contents**\n\n* [Executing Perl code](#executing-perl-code)\n* [Simple search and replace](#simple-search-and-replace)\n    * [inplace editing](#inplace-editing)\n* [Line filtering](#line-filtering)\n    * [Regular expressions based filtering](#regular-expressions-based-filtering)\n    * [Fixed string matching](#fixed-string-matching)\n    * [Line number based filtering](#line-number-based-filtering)\n* [Field processing](#field-processing)\n    * [Field comparison](#field-comparison)\n    * [Specifying different input field separator](#specifying-different-input-field-separator)\n    * [Specifying different output field separator](#specifying-different-output-field-separator)\n* [Changing record separators](#changing-record-separators)\n    * [Input record separator](#input-record-separator)\n    * [Output record separator](#output-record-separator)\n* [Multiline processing](#multiline-processing)\n* [Perl regular expressions](#perl-regular-expressions)\n    * [sed vs perl subtle differences](#sed-vs-perl-subtle-differences)\n    * [Backslash sequences](#backslash-sequences)\n    * [Non-greedy quantifier](#non-greedy-quantifier)\n    * [Lookarounds](#lookarounds)\n    * [Ignoring specific matches](#ignoring-specific-matches)\n    * [Special capture groups](#special-capture-groups)\n    * [Modifiers](#modifiers)\n    * [Quoting metacharacters](#quoting-metacharacters)\n    * [Matching position](#matching-position)\n* [Using modules](#using-modules)\n* [Two file 
processing](#two-file-processing)\n    * [Comparing whole lines](#comparing-whole-lines)\n    * [Comparing specific fields](#comparing-specific-fields)\n    * [Line number matching](#line-number-matching)\n* [Creating new fields](#creating-new-fields)\n* [Multiple file input](#multiple-file-input)\n* [Dealing with duplicates](#dealing-with-duplicates)\n* [Lines between two REGEXPs](#lines-between-two-regexps)\n    * [All unbroken blocks](#all-unbroken-blocks)\n    * [Specific blocks](#specific-blocks)\n    * [Broken blocks](#broken-blocks)\n* [Array operations](#array-operations)\n    * [Iteration and filtering](#iteration-and-filtering)\n    * [Sorting](#sorting)\n    * [Transforming](#transforming)\n* [Miscellaneous](#miscellaneous)\n    * [split](#split)\n    * [Fixed width processing](#fixed-width-processing)\n    * [String and file replication](#string-and-file-replication)\n    * [transliteration](#transliteration)\n    * [Executing external commands](#executing-external-commands)\n* [Further Reading](#further-reading)\n\n<br>\n\n```bash\n$ perl -le 'print $^V'\nv5.22.1\n\n$ man perl\nPERL(1)                Perl Programmers Reference Guide                PERL(1)\n\nNAME\n       perl - The Perl 5 language interpreter\n\nSYNOPSIS\n       perl [ -sTtuUWX ]      [ -hv ] [ -V[:configvar] ]\n            [ -cw ] [ -d[t][:debugger] ] [ -D[number/list] ]\n            [ -pna ] [ -Fpattern ] [ -l[octal] ] [ -0[octal/hexadecimal] ]\n            [ -Idir ] [ -m[-]module ] [ -M[-]'module...' ] [ -f ]\n            [ -C [number/list] ]      [ -S ]      [ -x[dir] ]\n            [ -i[extension] ]\n            [ [-e|-E] 'command' ] [ -- ] [ programfile ] [ argument ]...\n\n       For more information on these options, you can run \"perldoc perlrun\".\n...\n```\n\n**Prerequisites and notes**\n\n* familiarity with programming concepts like variables, printing, control structures, arrays, etc\n* Perl borrows syntax/features from **C, shell scripting, awk, sed** etc. 
Prior experience working with them would help a lot\n* familiarity with regular expression basics\n    * if not, check out **ERE** portion of [GNU sed regular expressions](./gnu_sed.md#regular-expressions)\n    * examples for non-greedy, lookarounds, etc will be covered here\n* this tutorial is primarily focussed on short programs that are easily usable from the command line, similar to using `grep`, `sed`, `awk` etc\n    * do NOT use the style/syntax presented here when writing full-fledged Perl programs, which should use **strict, warnings** etc\n    * see [perldoc - perlintro](https://perldoc.perl.org/perlintro.html) and [learnxinyminutes - perl](https://learnxinyminutes.com/docs/perl/) for a quick intro to using Perl for full-fledged programs\n* links to Perl documentation will be added as necessary\n* unless otherwise specified, consider input as ASCII encoded text only\n    * see also [stackoverflow - why UTF-8 is not default](https://stackoverflow.com/questions/6162484/why-does-modern-perl-avoid-utf-8-by-default)\n\n<br>\n\n## <a name=\"executing-perl-code\"></a>Executing Perl code\n\n* One way is to put code in a file and use the `perl` command with the filename as argument\n* Another is to use a [shebang](https://en.wikipedia.org/wiki/Shebang_(Unix)) at the beginning of the script, make the file executable and run it directly\n\n```bash\n$ cat code.pl\nprint \"Hello Perl\\n\"\n$ perl code.pl\nHello Perl\n\n$ # similar to bash\n$ cat code.sh\necho 'Hello Bash'\n$ bash code.sh\nHello Bash\n```\n\n* For short programs, one can use the `-e` command line option to provide code on the command line itself\n    * Use `-E` option to use newer features like `say`. 
See [perldoc - new features](https://perldoc.perl.org/feature.html)\n* This entire chapter is about using `perl` this way from the command line\n\n```bash\n$ perl -e 'print \"Hello Perl\\n\"'\nHello Perl\n\n$ # say automatically adds newline character\n$ perl -E 'say \"Hello Perl\"'\nHello Perl\n\n$ # similar to\n$ bash -c 'echo \"Hello Bash\"'\nHello Bash\n\n$ # multiple commands can be issued separated by ;\n$ # -l will be covered later, here used to append newline to print\n$ perl -le '$x=25; $y=12; print $x**$y'\n59604644775390625\n```\n\n* Perl is (in)famous for being able to do things in more than one way\n* examples in this chapter will mostly try to use the syntax that avoids `(){}`\n\n```bash\n$ # shows different syntax usage of if/say/print\n$ perl -e 'if(2<3){print(\"2 is less than 3\\n\")}'\n2 is less than 3\n$ perl -E 'say \"2 is less than 3\" if 2<3'\n2 is less than 3\n\n$ # string comparison uses eq for ==, lt for < and so on\n$ perl -e 'if(\"a\" lt \"b\"){$x=5; $y=10} print \"x=$x; y=$y\\n\"'\nx=5; y=10\n$ # x/y assignment will happen only if condition evaluates to true\n$ perl -E 'say \"x=$x; y=$y\" if \"a\" lt \"b\" and $x=5,$y=10'\nx=5; y=10\n\n$ # variables will be interpolated within double quotes\n$ # so, use q operator if single quoting is needed\n$ # as single quote is already being used to group perl code for -e option\n$ perl -le 'print \"ab $x 123\"'\nab  123\n$ perl -le 'print q/ab $x 123/'\nab $x 123\n```\n\n**Further Reading**\n\n* `perl -h` for summary of options\n* [perldoc - Command Switches](https://perldoc.perl.org/perlrun.html#Command-Switches)\n* [perldoc - Perl operators and precedence](https://perldoc.perl.org/perlop.html)\n* [explainshell](https://explainshell.com/explain?cmd=perl+-F+-l+-anpeE+-i+-0+-M) - to quickly get information without having to traverse through the docs\n* See [Changing record separators](#changing-record-separators) section for more details on `-l` option\n\n<br>\n\n## <a 
name=\"simple-search-and-replace\"></a>Simple search and replace\n\n* **substitution** command syntax is very similar to `sed` for search and replace\n    * syntax is `variable =~ s/REGEXP/REPLACEMENT/FLAGS` and by default acts on `$_` if variable is not specified\n    * see [perldoc - SPECIAL VARIABLES](https://perldoc.perl.org/perlvar.html#SPECIAL-VARIABLES) for explanation on `$_` and other such special variables\n    * more detailed examples will be covered in later sections\n* Just like other text processing commands, `perl` will automatically loop over input line by line when `-n` or `-p` option is used\n    * like `sed`, the `-n` option won't print the record\n    * `-p` will print the record, including any changes made\n    * newline character being default record separator\n    * `$_` will contain the input record content, including the record separator (unlike `sed` and `awk`)\n    * any directory name appearing in file arguments passed will be automatically ignored\n* and similar to other commands, `perl` will work with both stdin and file input\n    * See other chapters for examples of [seq](./miscellaneous.md#seq), [paste](./restructure_text.md#paste), etc\n\n```bash\n$ # sample stdin data\n$ seq 10 | paste -sd,\n1,2,3,4,5,6,7,8,9,10\n\n$ # change only first ',' to ' : '\n$ # same as: sed 's/,/ : /'\n$ seq 10 | paste -sd, | perl -pe 's/,/ : /'\n1 : 2,3,4,5,6,7,8,9,10\n\n$ # change all ',' to ' : ' by using 'g' modifier\n$ # same as: sed 's/,/ : /g'\n$ seq 10 | paste -sd, | perl -pe 's/,/ : /g'\n1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : 10\n\n$ cat greeting.txt\nHi there\nHave a nice day\n$ # same as: sed 's/nice day/safe journey/' greeting.txt\n$ perl -pe 's/nice day/safe journey/' greeting.txt\nHi there\nHave a safe journey\n```\n\n<br>\n\n#### <a name=\"inplace-editing\"></a>inplace editing\n\n* similar to [GNU sed - using * with inplace option](./gnu_sed.md#prefix-backup-name), one can also use `*` to either prefix the backup name or place the backup 
files in another existing directory\n* See also [effectiveperlprogramming - caveats of using -i option](https://www.effectiveperlprogramming.com/2017/12/in-place-editing-gets-safer-in-v5-28/)\n\n```bash\n$ # same as: sed -i.bkp 's/Hi/Hello/' greeting.txt\n$ perl -i.bkp -pe 's/Hi/Hello/' greeting.txt\n$ # original file gets preserved in 'greeting.txt.bkp'\n$ cat greeting.txt\nHello there\nHave a nice day\n\n$ # using -i'bkp.*' will save backup file as 'bkp.greeting.txt'\n\n$ # use empty argument to -i with caution, changes made cannot be undone\n$ perl -i -pe 's/nice day/safe journey/' greeting.txt\n$ cat greeting.txt\nHello there\nHave a safe journey\n```\n\n* Multiple input files are treated individually and changes are written back to respective files\n\n```bash\n$ cat f1\nI ate 3 apples\n$ cat f2\nI bought two bananas and 3 mangoes\n\n$ perl -i.bkp -pe 's/3/three/' f1 f2\n$ cat f1\nI ate three apples\n$ cat f2\nI bought two bananas and three mangoes\n```\n\n<br>\n\n## <a name=\"line-filtering\"></a>Line filtering\n\n<br>\n\n#### <a name=\"regular-expressions-based-filtering\"></a>Regular expressions based filtering\n\n* syntax is `variable =~ m/REGEXP/FLAGS` to check for a match\n    * `variable !~ m/REGEXP/FLAGS` for negated match\n    * by default acts on `$_` if variable is not specified\n* as we need to print only selective lines, use `-n` option\n    * by default, contents of `$_` will be printed if no argument is passed to `print`\n\n```bash\n$ cat poem.txt\nRoses are red,\nViolets are blue,\nSugar is sweet,\nAnd so are you.\n\n$ # same as: grep '^[RS]' or sed -n '/^[RS]/p' or awk '/^[RS]/'\n$ # /^[RS]/ is shortcut for $_ =~ m/^[RS]/\n$ perl -ne 'print if /^[RS]/' poem.txt\nRoses are red,\nSugar is sweet,\n\n$ # same as: grep -i 'and' poem.txt\n$ perl -ne 'print if /and/i' poem.txt\nAnd so are you.\n\n$ # same as: grep -v 'are' poem.txt\n$ # !/are/ is shortcut for $_ !~ m/are/\n$ perl -ne 'print if !/are/' poem.txt\nSugar is sweet,\n\n$ # same as: awk 
'/are/ && !/so/' poem.txt\n$ perl -ne 'print if /are/ && !/so/' poem.txt\nRoses are red,\nViolets are blue,\n```\n\n* using different delimiter\n* quoting from [perldoc - Regexp Quote-Like Operators](https://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators)\n\n> With the m you can use any pair of non-alphanumeric, non-whitespace characters as delimiters\n\n```bash\n$ cat paths.txt\n/foo/a/report.log\n/foo/y/power.log\n/foo/abc/errors.log\n\n$ perl -ne 'print if /\\/foo\\/a\\//' paths.txt\n/foo/a/report.log\n\n$ perl -ne 'print if m#/foo/a/#' paths.txt\n/foo/a/report.log\n\n$ perl -ne 'print if !m#/foo/a/#' paths.txt\n/foo/y/power.log\n/foo/abc/errors.log\n```\n\n<br>\n\n#### <a name=\"fixed-string-matching\"></a>Fixed string matching\n\n* similar to `grep -F` and `awk index`\n* See also\n    * [perldoc - index function](https://perldoc.perl.org/functions/index.html)\n    * [perldoc - Quote and Quote-like Operators](https://perldoc.perl.org/5.8.8/perlop.html#Quote-and-Quote-like-Operators)\n    * [Quoting metacharacters](#quoting-metacharacters) section\n\n```bash\n$ # same as: grep -F 'a[5]' or awk 'index($0, \"a[5]\")'\n$ # index returns matching position(starts at 0) and -1 if not found\n$ echo 'int a[5]' | perl -ne 'print if index($_, \"a[5]\") != -1'\nint a[5]\n\n$ # however, string within double quotes gets interpolated, for ex\n$ x='123'; echo \"$x\"\n123\n$ perl -e '$x=123; print \"$x\\n\"'\n123\n\n$ # so, for commandline usage, better to pass string as environment variable\n$ # they are accessible via the %ENV hash variable\n$ perl -le 'print $ENV{PWD}'\n/home/learnbyexample\n$ perl -le 'print $ENV{SHELL}'\n/bin/bash\n\n$ echo 'a#$%d' | perl -ne 'print if index($_, \"#$%\") != -1'\n$ echo 'a#$%d' | s='#$%' perl -ne 'print if index($_, $ENV{s}) != -1'\na#$%d\n```\n\n* return value is useful to match at specific position\n* for ex: at start/end of line\n\n```bash\n$ cat eqns.txt\na=b,a-b=c,c*d\na+b,pi=3.14,5e12\ni*(t+9-g)/8,4-a+b\n\n$ # start of 
line\n$ # same as: s='a+b' awk 'index($0, ENVIRON[\"s\"])==1' eqns.txt\n$ s='a+b' perl -ne 'print if index($_, $ENV{s})==0' eqns.txt\na+b,pi=3.14,5e12\n\n$ # end of line\n$ # length function returns number of characters, by default acts on $_\n$ s='a+b' perl -ne '$pos = length() - length($ENV{s}) - 1;\n                    print if index($_, $ENV{s}) == $pos' eqns.txt\ni*(t+9-g)/8,4-a+b\n```\n\n<br>\n\n#### <a name=\"line-number-based-filtering\"></a>Line number based filtering\n\n* special variable `$.` contains total records read so far, similar to `NR` in `awk`\n    * But no equivalent of awk's `FNR`, [see this stackoverflow Q&A for workaround](https://stackoverflow.com/questions/12384692/line-number-of-a-file-in-perl)\n* See also [perldoc - eof](https://perldoc.perl.org/perlfunc.html#eof)\n\n```bash\n$ # same as: head -n2 poem.txt | tail -n1\n$ # or sed -n '2p' or awk 'NR==2'\n$ perl -ne 'print if $.==2' poem.txt\nViolets are blue,\n\n$ # print 2nd and 4th line\n$ # same as: sed -n '2p; 4p' or awk 'NR==2 || NR==4'\n$ perl -ne 'print if $.==2 || $.==4' poem.txt\nViolets are blue,\nAnd so are you.\n\n$ # same as: tail -n1 poem.txt\n$ # or sed -n '$p' or awk 'END{print}'\n$ perl -ne 'print if eof' poem.txt\nAnd so are you.\n```\n\n* for large input, use `exit` to avoid unnecessary record processing\n\n```bash\n$ # can also use: perl -ne 'print and exit if $.==234'\n$ seq 14323 14563435 | perl -ne 'if($.==234){print; exit}'\n14556\n\n$ # sample time comparison\n$ time seq 14323 14563435 | perl -ne 'if($.==234){print; exit}' > /dev/null\nreal    0m0.005s\n$ time seq 14323 14563435 | perl -ne 'print if $.==234' > /dev/null\nreal    0m2.439s\n\n$ # mimicking head command, same as: head -n3 or sed '3q'\n$ seq 14 25 | perl -pe 'exit if $.>3'\n14\n15\n16\n\n$ # same as: sed '3Q'\n$ seq 14 25 | perl -pe 'exit if $.==3'\n14\n15\n```\n\n* selecting range of lines\n* `..` is [perldoc - range operator](https://perldoc.perl.org/perlop.html#Range-Operators)\n\n```bash\n$ # same 
as: sed -n '3,5p' or awk 'NR>=3 && NR<=5'\n$ # in this context, the range is compared against $.\n$ seq 14 25 | perl -ne 'print if 3..5'\n16\n17\n18\n\n$ # selecting from particular line number to end of input\n$ # same as: sed -n '10,$p' or awk 'NR>=10'\n$ seq 14 25 | perl -ne 'print if $.>=10'\n23\n24\n25\n```\n\n<br>\n\n## <a name=\"field-processing\"></a>Field processing\n\n* `-a` option will auto-split each input record based on one or more continuous white-space, similar to default behavior in `awk`\n    * See also [split](#split) section\n* Special variable array `@F` will contain all the elements, indexing starts from 0\n    * negative indexing is also supported, `-1` gives last element, `-2` gives last-but-one and so on\n    * see [Array operations](#array-operations) section for examples on array usage\n\n```bash\n$ cat fruits.txt\nfruit   qty\napple   42\nbanana  31\nfig     90\nguava   6\n\n$ # print only first field, indexing starts from 0\n$ # same as: awk '{print $1}' fruits.txt\n$ perl -lane 'print $F[0]' fruits.txt\nfruit\napple\nbanana\nfig\nguava\n\n$ # print only second field\n$ # same as: awk '{print $2}' fruits.txt\n$ perl -lane 'print $F[1]' fruits.txt\nqty\n42\n31\n90\n6\n```\n\n* by default, leading and trailing whitespaces won't be considered when splitting the input record\n    * mimicking `awk`'s default behavior\n\n```bash\n$ printf ' a    ate b\\tc   \\n'\n a    ate b     c\n$ printf ' a    ate b\\tc   \\n' | perl -lane 'print $F[0]'\na\n$ printf ' a    ate b\\tc   \\n' | perl -lane 'print $F[-1]'\nc\n\n$ # number of fields, $#F gives index of last element - so add 1\n$ echo '1 a 7' | perl -lane 'print $#F+1'\n3\n$ printf ' a    ate b\\tc   \\n' | perl -lane 'print $#F+1'\n4\n$ # or use scalar context\n$ echo '1 a 7' | perl -lane 'print scalar @F'\n3\n```\n\n<br>\n\n#### <a name=\"field-comparison\"></a>Field comparison\n\n* for numeric context, Perl automatically tries to convert the string to number, ignoring white-space\n* for 
string comparison, use `eq` for `==`, `ne` for `!=` and so on\n\n```bash\n$ # if first field exactly matches the string 'apple'\n$ # same as: awk '$1==\"apple\"{print $2}' fruits.txt\n$ perl -lane 'print $F[1] if $F[0] eq \"apple\"' fruits.txt\n42\n\n$ # print first field if second field > 35 (excluding header)\n$ # same as: awk 'NR>1 && $2>35{print $1}' fruits.txt\n$ perl -lane 'print $F[0] if $F[1]>35 && $.>1' fruits.txt\napple\nfig\n\n$ # print header and lines with qty < 35\n$ # same as: awk 'NR==1 || $2<35' fruits.txt\n$ perl -ane 'print if $F[1]<35 || $.==1' fruits.txt\nfruit   qty\nbanana  31\nguava   6\n\n$ # if first field does NOT contain 'a'\n$ # same as: awk '$1 !~ /a/' fruits.txt\n$ perl -ane 'print if $F[0] !~ /a/' fruits.txt\nfruit   qty\nfig     90\n```\n\n<br>\n\n#### <a name=\"specifying-different-input-field-separator\"></a>Specifying different input field separator\n\n* by using `-F` command line option\n    * See also [split](#split) section, which covers details about trailing empty fields\n\n```bash\n$ # second field where input field separator is :\n$ # same as: awk -F: '{print $2}'\n$ echo 'foo:123:bar:789' | perl -F: -lane 'print $F[1]'\n123\n\n$ # last field, same as: awk -F: '{print $NF}'\n$ echo 'foo:123:bar:789' | perl -F: -lane 'print $F[-1]'\n789\n$ # second last field, same as: awk -F: '{print $(NF-1)}'\n$ echo 'foo:123:bar:789' | perl -F: -lane 'print $F[-2]'\nbar\n\n$ # second and last field\n$ # other ways to print more than 1 element will be covered later\n$ echo 'foo:123:bar:789' | perl -F: -lane 'print \"$F[1] $F[-1]\"'\n123 789\n\n$ # use quotes to avoid clashes with shell special characters\n$ echo 'one;two;three;four' | perl -F';' -lane 'print $F[2]'\nthree\n```\n\n* Regular expressions based input field separator\n\n```bash\n$ # same as: awk -F'[0-9]+' '{print $2}'\n$ echo 'Sample123string54with908numbers' | perl -F'\\d+' -lane 'print $F[1]'\nstring\n\n$ # first field will be empty as there is nothing before '{'\n$ # same 
as: awk -F'[{}= ]+' '{print $1}'\n$ # \\x20 is space character, can't use literal space within [] when using -F\n$ echo '{foo}   bar=baz' | perl -F'[{}=\\x20]+' -lane 'print $F[0]'\n\n$ echo '{foo}   bar=baz' | perl -F'[{}=\\x20]+' -lane 'print $F[1]'\nfoo\n$ echo '{foo}   bar=baz' | perl -F'[{}=\\x20]+' -lane 'print $F[2]'\nbar\n```\n\n* empty argument to `-F` will split the input record character wise\n\n```bash\n$ # same as: gawk -v FS= '{print $1}'\n$ echo 'apple' | perl -F -lane 'print $F[0]'\na\n$ echo 'apple' | perl -F -lane 'print $F[1]'\np\n$ echo 'apple' | perl -F -lane 'print $F[-1]'\ne\n\n$ # use -C option when dealing with unicode characters\n$ # S will turn on UTF-8 for stdin/stdout/stderr streams\n$ printf 'hi👍 how are you?' | perl -CS -F -lane 'print $F[2]'\n👍\n```\n\n<br>\n\n#### <a name=\"specifying-different-output-field-separator\"></a>Specifying different output field separator\n\n* Method 1: use `$,` to change separator between `print` arguments\n    * could be remembered easily by noting that `,` is used to separate `print` arguments\n\n```bash\n$ # by default, the various arguments are concatenated\n$ echo 'foo:123:bar:789' | perl -F: -lane 'print $F[1], $F[-1]'\n123789\n\n$ # change $, if different separator is needed\n$ echo 'foo:123:bar:789' | perl -F: -lane '$,=\" \"; print $F[1], $F[-1]'\n123 789\n$ echo 'foo:123:bar:789' | perl -F: -lane '$,=\"-\"; print $F[1], $F[-1]'\n123-789\n\n$ # argument can be array too\n$ echo 'foo:123:bar:789' | perl -F: -lane '$,=\"-\"; print @F[1,-1]'\n123-789\n$ echo 'foo:123:bar:789' | perl -F: -lane '$,=\" - \"; print @F'\nfoo - 123 - bar - 789\n```\n\n* Method 2: use `join`\n\n```bash\n$ echo 'foo:123:bar:789' | perl -F: -lane 'print join \"-\", $F[1], $F[-1]'\n123-789\n\n$ echo 'foo:123:bar:789' | perl -F: -lane 'print join \"-\", @F[1,-1]'\n123-789\n\n$ echo 'foo:123:bar:789' | perl -F: -lane 'print join \" - \", @F'\nfoo - 123 - bar - 789\n```\n\n* Method 3: use `$\"` to change separator when array is 
interpolated, default is space character\n    * could be remembered easily by noting that interpolation happens within double quotes\n\n```bash\n$ # default is space\n$ echo 'foo:123:bar:789' | perl -F: -lane 'print \"@F[1,-1]\"'\n123 789\n\n$ echo 'foo:123:bar:789' | perl -F: -lane '$\"=\"-\"; print \"@F[1,-1]\"'\n123-789\n\n$ echo 'foo:123:bar:789' | perl -F: -lane '$\"=\",\"; print \"@F\"'\nfoo,123,bar,789\n```\n\n* use `BEGIN` if same separator is to be used for all lines\n    * statements inside `BEGIN` are executed before processing any input text\n\n```bash\n$ # can also use: perl -lane 'BEGIN{$\"=\",\"} print \"@F\"' fruits.txt\n$ perl -lane 'BEGIN{$,=\",\"} print @F' fruits.txt\nfruit,qty\napple,42\nbanana,31\nfig,90\nguava,6\n```\n\n## <a name=\"changing-record-separators\"></a>Changing record separators\n\n* Before seeing examples for changing record separators, let's cover a detail about contents of input record and use of `-l` option\n* See also [perldoc - chomp](https://perldoc.perl.org/functions/chomp.html)\n\n```bash\n$ # input record includes the record separator as well\n$ # can also use: perl -pe 's/$/ 123/'\n$ echo 'foo' | perl -pe 's/\\n/ 123\\n/'\nfoo 123\n\n$ # this example shows better use case\n$ # similar to paste -sd but with ability to use multi-character delimiter\n$ seq 5 | perl -pe 's/\\n/ : / if !eof'\n1 : 2 : 3 : 4 : 5\n\n$ # -l option will chomp off the record separator (among other things)\n$ echo 'foo' | perl -l -pe 's/\\n/ 123\\n/'\nfoo\n\n$ # -l also sets output record separator which gets added to print statements\n$ # ORS gets input record separator value if no argument is passed to -l\n$ # hence the newline automatically getting added for print in this example\n$ perl -lane 'print $F[0] if $F[1]<35 && $.>1' fruits.txt\nbanana\nguava\n```\n\n<br>\n\n#### <a name=\"input-record-separator\"></a>Input record separator\n\n* by default, newline character is used as input record separator\n* use `$/` to specify a different input 
record separator\n    * unlike `awk`, only string can be used, no regular expressions\n* for single character separator, can also use `-0` command line option which accepts octal/hexadecimal value as argument\n* if `-l` option is also used\n    * input record separator will be chomped from input record\n    * in addition, if argument is not passed to `-l`, output record separator will get whatever is current value of input record separator\n    * so, order of `-l`, `-0` and/or `$/` usage becomes important\n\n```bash\n$ s='this is a sample string'\n\n$ # space as input record separator, printing all records\n$ # same as: awk -v RS=' ' '{print NR, $0}'\n$ # ORS is newline as -l is used before $/ gets changed\n$ printf \"$s\" | perl -lne 'BEGIN{$/=\" \"} print \"$. $_\"'\n1 this\n2 is\n3 a\n4 sample\n5 string\n\n$ # print all records containing 'a'\n$ # same as: awk -v RS=' ' '/a/'\n$ printf \"$s\" | perl -l -0040 -ne 'print if /a/'\na\nsample\n\n$ # if the order is changed, ORS will be space, not newline\n$ printf \"$s\" | perl -0040 -l -ne 'print if /a/'\na sample \n```\n\n* `-0` option used without argument will use the ASCII NUL character as input record separator\n\n```bash\n$ printf 'foo\\0bar\\0' | cat -A\nfoo^@bar^@$\n$ printf 'foo\\0bar\\0' | perl -l -0 -ne 'print'\nfoo\nbar\n\n$ # could be golfed to: perl -l -0pe ''\n$ # but don't use `-l0` as `0` will be treated as argument to `-l`\n```\n\n* values `-0400` to `-0777` will cause entire file to be slurped\n    * idiomatically, `-0777` is used\n\n```bash\n$ # s modifier allows . to match newline as well\n$ perl -0777 -pe 's/red.*are //s' poem.txt\nRoses are you.\n\n$ # replace first newline with '. '\n$ perl -0777 -pe 's/\\n/. /' greeting.txt\nHello there. 
Have a safe journey\n```\n\n* for paragraph mode (two or more consecutive newline characters), use `-00` or assign empty string to `$/`\n\nConsider the below sample file\n\n```bash\n$ cat sample.txt\nHello World\n\nGood day\nHow are you\n\nJust do-it\nBelieve it\n\nToday is sunny\nNot a bit funny\nNo doubt you like it too\n\nMuch ado about nothing\nHe he he\n```\n\n* again, input record will have the separator too and using `-l` will chomp it\n* however, if more than two consecutive newline characters separate the paragraphs, only two newlines will be preserved and the rest discarded\n    * use `$/=\"\\n\\n\"` to avoid this behavior\n\n```bash\n$ # print all paragraphs containing 'it'\n$ # same as: awk -v RS= -v ORS='\\n\\n' '/it/' sample.txt\n$ perl -00 -ne 'print if /it/' sample.txt\nJust do-it\nBelieve it\n\nToday is sunny\nNot a bit funny\nNo doubt you like it too\n\n$ # based on number of lines in each paragraph\n$ perl -F'\\n' -00 -ane 'print if $#F==0' sample.txt\nHello World\n\n$ # unlike awk -F'\\n' -v RS= -v ORS='\\n\\n' 'NF==2 && /do/' sample.txt\n$ # there won't be empty line at end because input file didn't have it\n$ perl -F'\\n' -00 -ane 'print if $#F==1 && /do/' sample.txt\nJust do-it\nBelieve it\n\nMuch ado about nothing\nHe he he\n```\n\n* Re-structuring paragraphs\n\n```bash\n$ # same as: awk 'BEGIN{FS=\"\\n\"; OFS=\". \"; RS=\"\"; ORS=\"\\n\\n\"} {$1=$1} 1'\n$ perl -F'\\n' -00 -ane 'print join \". \", @F; print \"\\n\\n\"' sample.txt\nHello World\n\nGood day. How are you\n\nJust do-it. Believe it\n\nToday is sunny. Not a bit funny. No doubt you like it too\n\nMuch ado about nothing. He he he\n\n```\n\n* multi-character separator\n\n```bash\n$ cat report.log\nblah blah\nError: something went wrong\nmore blah\nwhatever\nError: something surely went wrong\nsome text\nsome more text\nblah blah blah\n\n$ # number of records, same as: awk -v RS='Error:' 'END{print NR}'\n$ perl -lne 'BEGIN{$/=\"Error:\"} print $. 
if eof' report.log\n3\n$ # print first record\n$ perl -lne 'BEGIN{$/=\"Error:\"} print if $.==1' report.log\nblah blah\n\n$ # same as: awk -v RS='Error:' '/surely/{print RS $0}' report.log\n$ perl -lne 'BEGIN{$/=\"Error:\"} print \"$/$_\" if /surely/' report.log\nError: something surely went wrong\nsome text\nsome more text\nblah blah blah\n\n```\n\n* Joining lines based on specific end of line condition\n\n```bash\n$ cat msg.txt\nHello there.\nIt will rain to-\nday. Have a safe\nand pleasant jou-\nrney.\n\n$ # same as: awk -v RS='-\\n' -v ORS= '1' msg.txt\n$ # can also use: perl -pe 's/-\\n//' msg.txt\n$ perl -pe 'BEGIN{$/=\"-\\n\"} chomp' msg.txt\nHello there.\nIt will rain today. Have a safe\nand pleasant journey.\n```\n\n<br>\n\n#### <a name=\"output-record-separator\"></a>Output record separator\n\n* one way is to use `$\\` to specify a different output record separator\n    * by default it doesn't have a value\n\n```bash\n$ # note that despite $\\ not having a value, output has newlines\n$ # because the input record still has the input record separator\n$ seq 3 | perl -ne 'print'\n1\n2\n3\n$ # same as: awk -v ORS='\\n\\n' '{print $0}'\n$ seq 3 | perl -ne 'BEGIN{$\\=\"\\n\"} print'\n1\n\n2\n\n3\n\n$ seq 2 | perl -ne 'BEGIN{$\\=\"---\\n\"} print'\n1\n---\n2\n---\n```\n\n* dynamically changing output record separator\n\n```bash\n$ # same as: awk '{ORS = NR%2 ? \" \" : \"\\n\"} 1'\n$ # note the use of -l to chomp the input record separator\n$ seq 6 | perl -lpe '$\\ = $.%2 ? \" \" : \"\\n\"'\n1 2\n3 4\n5 6\n\n$ # -l also sets the output record separator\n$ # but gets overridden by $\\\n$ seq 6 | perl -lpe '$\\ = $.%3 ? 
\"-\" : \"\\n\"'\n1-2-3\n4-5-6\n```\n\n* passing argument to `-l` to set output record separator\n\n```bash\n$ seq 8 | perl -ne 'print if /[24]/'\n2\n4\n\n$ # null separator, note how -l also chomps input record separator\n$ seq 8 | perl -l0 -ne 'print if /[24]/' | cat -A\n2^@4^@\n\n$ # comma separator, won't have a newline at end\n$ seq 8 | perl -l054 -ne 'print if /[24]/'\n2,4,\n\n$ # to add a final newline to output, use END and printf\n$ seq 8 | perl -l054 -ne 'print if /[24]/; END{printf \"\\n\"}'\n2,4,\n```\n\n<br>\n\n## <a name=\"multiline-processing\"></a>Multiline processing\n\n* Processing consecutive lines\n\n```bash\n$ cat poem.txt\nRoses are red,\nViolets are blue,\nSugar is sweet,\nAnd so are you.\n\n$ # match two consecutive lines\n$ # same as: awk 'p~/are/ && /is/{print p ORS $0} {p=$0}' poem.txt\n$ perl -ne 'print $p,$_ if /is/ && $p=~/are/; $p=$_' poem.txt\nViolets are blue,\nSugar is sweet,\n$ # if only the second line is needed, same as: awk 'p~/are/ && /is/; {p=$0}'\n$ perl -ne 'print if /is/ && $p=~/are/; $p=$_' poem.txt\nSugar is sweet,\n\n$ # print if line matches a condition as well as condition for next 2 lines\n$ # same as: awk 'p2~/red/ && p1~/blue/ && /is/{print p2} {p2=p1; p1=$0}'\n$ perl -ne 'print $p2 if /is/ && $p1=~/blue/ && $p2=~/red/;\n            $p2=$p1; $p1=$_' poem.txt\nRoses are red,\n```\n\nConsider this sample input file\n\n```bash\n$ cat range.txt\nfoo\nBEGIN\n1234\n6789\nEND\nbar\nBEGIN\na\nb\nc\nEND\nbaz\n```\n\n* extracting lines around matching line\n* how `$n && $n--` works:\n    * need to note that right hand side of `&&` is processed only if left hand side is `true`\n    * so for example, if initially `$n=2`, then we get\n        * `2 && 2; $n=1` - evaluates to `true`\n        * `1 && 1; $n=0` - evaluates to `true`\n        * `0 && ` - evaluates to `false` ... 
no decrementing `$n` and hence will be `false` until `$n` is re-assigned non-zero value\n\n```bash\n$ # similar to: grep --no-group-separator -A1 'BEGIN' range.txt\n$ # same as: awk '/BEGIN/{n=2} n && n--' range.txt\n$ perl -ne '$n=2 if /BEGIN/; print if $n && $n--' range.txt\nBEGIN\n1234\nBEGIN\na\n\n$ # print only line after matching line, same as: awk 'n && n--; /BEGIN/{n=1}'\n$ perl -ne 'print if $n && $n--; $n=1 if /BEGIN/' range.txt\n1234\na\n\n$ # generic case: print nth line after match, awk 'n && !--n; /BEGIN/{n=3}'\n$ perl -ne 'print if $n && !--$n; $n=3 if /BEGIN/' range.txt\nEND\nc\n\n$ # print second line prior to matched line\n$ # same as: awk '/END/{print p2} {p2=p1; p1=$0}' range.txt\n$ perl -ne 'print $p2 if /END/; $p2=$p1; $p1=$_' range.txt\n1234\nb\n\n$ # use reversing trick for generic case of nth line before match\n$ # same as: tac range.txt | awk 'n && !--n; /END/{n=3}' | tac\n$ tac range.txt | perl -ne 'print if $n && !--$n; $n=3 if /END/' | tac\nBEGIN\na\n```\n\n**Further Reading**\n\n* [stackoverflow - multiline find and replace](https://stackoverflow.com/questions/39884112/perl-multiline-find-and-replace-with-regex)\n* [stackoverflow - delete line based on content of previous/next lines](https://stackoverflow.com/questions/49112877/delete-line-if-line-matches-foo-line-above-matches-bar-and-line-below-match)\n* [softwareengineering - FSM examples](https://softwareengineering.stackexchange.com/questions/47806/examples-of-finite-state-machines)\n* [wikipedia - FSM](https://en.wikipedia.org/wiki/Finite-state_machine)\n\n<br>\n\n## <a name=\"perl-regular-expressions\"></a>Perl regular expressions\n\n* examples to showcase some of the features not present in ERE and modifiers not available in `sed`'s substitute command\n* many features of Perl regular expressions will NOT be covered, but external links will be provided wherever relevant\n    * See [perldoc - perlre](https://perldoc.perl.org/perlre.html) for complete reference\n    * and [perldoc 
- regular expressions FAQ](https://perldoc.perl.org/perlfaq.html#the-perlfaq6-manpage%3a-Regular-Expressions)\n* examples/descriptions based only on ASCII encoding\n\n<br>\n\n#### <a name=\"sed-vs-perl-subtle-differences\"></a>sed vs perl subtle differences\n\n* input record separator being part of input record\n\n```bash\n$ echo 'foo:123:bar:789' | sed -E 's/[^:]+$/xyz/'\nfoo:123:bar:xyz\n$ # newline character gets replaced too as shown by shell prompt\n$ echo 'foo:123:bar:789' | perl -pe 's/[^:]+$/xyz/'\nfoo:123:bar:xyz$\n$ # simple workaround is to use -l option\n$ echo 'foo:123:bar:789' | perl -lpe 's/[^:]+$/xyz/'\nfoo:123:bar:xyz\n\n$ # of course it has uses too\n$ seq 10 | paste -sd, | sed 's/,/ : /g'\n1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : 10\n$ seq 10 | perl -pe 's/\\n/ : / if !eof'\n1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : 10\n```\n\n* how much does `*` match?\n\n```bash\n$ # sed will choose biggest match\n$ echo ',baz,,xyz,,,' | sed 's/[^,]*/A/g'\nA,A,A,A,A,A,A\n$ echo 'foo,baz,,xyz,,,123' | sed 's/[^,]*/A/g'\nA,A,A,A,A,A,A\n\n$ # but perl will match both empty and non-empty strings\n$ echo ',baz,,xyz,,,' | perl -lpe 's/[^,]*/A/g'\nA,AA,A,AA,A,A,A\n$ echo 'foo,baz,,xyz,,,123' | perl -lpe 's/[^,]*/A/g'\nAA,AA,A,AA,A,A,AA\n\n$ echo '42,789' | sed 's/[0-9]*/\"&\"/g'\n\"42\",\"789\"\n$ echo '42,789' | perl -lpe 's/\\d*/\"$&\"/g'\n\"42\"\"\",\"789\"\"\"\n$ echo '42,789' | perl -lpe 's/\\d+/\"$&\"/g'\n\"42\",\"789\"\n```\n\n* backslash sequences inside character classes\n\n```bash\n$ # \\w would simply match w\n$ echo 'w=y-x+9*3' | sed 's/[\\w=]//g'\ny-x+9*3\n\n$ # \\w would match any word character\n$ echo 'w=y-x+9*3' | perl -pe 's/[\\w=]//g'\n-+*\n```\n\n* replacing specific occurrence\n* See [stackoverflow - substitute the nth occurrence of a match in a Perl regex](https://stackoverflow.com/questions/2555662/how-can-i-substitute-the-nth-occurrence-of-a-match-in-a-perl-regex) for workarounds\n\n```bash\n$ echo 'foo:123:bar:baz' | sed 's/:/-/2'\nfoo:123-bar:baz\n\n$ 
echo 'foo:123:bar:baz' | perl -pe 's/:/-/2'\nUnknown regexp modifier \"/2\" at -e line 1, at end of line\nExecution of -e aborted due to compilation errors.\n$ # e modifier covered later, allows Perl code in replacement section\n$ echo 'foo:123:bar:baz' | perl -pe '$c=0; s/:/++$c==2 ? \"-\" : $&/ge'\nfoo:123-bar:baz\n$ # or use non-greedy and \\K(covered later), same as: sed 's/and/-/3'\n$ echo 'foo and bar and baz land good' | perl -pe 's/(and.*?){2}\\Kand/-/'\nfoo and bar and baz l- good\n\n$ # emulating GNU sed's number+g modifier\n$ a='456:foo:123:bar:789:baz\nx:y:z:a:v:xc:gf'\n$ echo \"$a\" | sed 's/:/-/3g'\n456:foo:123-bar-789-baz\nx:y:z-a-v-xc-gf\n$ echo \"$a\" | perl -pe '$c=0; s/:/++$c<3 ? $& : \"-\"/ge'\n456:foo:123-bar-789-baz\nx:y:z-a-v-xc-gf\n```\n\n* variable interpolation when `$` or `@` is used\n* See also [perldoc - Quote and Quote-like Operators](https://perldoc.perl.org/5.8.8/perlop.html#Quote-and-Quote-like-Operators)\n\n```bash\n$ seq 2 | sed 's/$x/xyz/'\n1\n2\n\n$ # uninitialized variable, same applies for: perl -pe 's/@a/xyz/'\n$ seq 2 | perl -pe 's/$x/xyz/'\nxyz1\nxyz2\n$ # initialized variable\n$ seq 2 | perl -pe '$x=2; s/$x/xyz/'\n1\nxyz\n\n$ # using single quotes as delimiter won't interpolate\n$ # not usable for one-liners given shell's own single/double quotes behavior\n$ cat sub_sq.pl\ns'$x'xyz'\n$ seq 2 | perl -p sub_sq.pl\n1\n2\n```\n\n* back reference\n* See also [perldoc - Warning on \\1 Instead of $1](https://perldoc.perl.org/perlre.html#Warning-on-%5c1-Instead-of-%241)\n\n```bash\n$ # use $& to refer entire matched string in replacement section\n$ echo 'hello world' | sed 's/.*/\"&\"/'\n\"hello world\"\n$ echo 'hello world' | perl -pe 's/.*/\"&\"/'\n\"&\"\n$ echo 'hello world' | perl -pe 's/.*/\"$&\"/'\n\"hello world\"\n\n$ # use \\1, \\2, etc or \\g1, \\g2 etc for back referencing in search section\n$ # use $1, $2, etc in replacement section\n$ echo 'a a a walking for for a cause' | perl -pe 's/\\b(\\w+)( \\1)+\\b/$1/g'\na 
walking for a cause\n```\n\n<br>\n\n#### <a name=\"backslash-sequences\"></a>Backslash sequences\n\n* `\\d` for `[0-9]`\n* `\\s` for `[ \\t\\r\\n\\f\\v]`\n* `\\h` for `[ \\t]`\n* `\\n` for newline character\n* `\\D`, `\\S`, `\\H`, `\\N` respectively for their opposites\n* See [perldoc - perlrecharclass](https://perldoc.perl.org/perlrecharclass.html#Backslash-sequences) for full list and details\n\n```bash\n$ # same as: sed -E 's/[0-9]+/xxx/g'\n$ echo 'like 42 and 37' | perl -pe 's/\\d+/xxx/g'\nlike xxx and xxx\n\n$ # same as: sed -E 's/[^0-9]+/xxx/g'\n$ # note again the use of -l because of newline in input record\n$ echo 'like 42 and 37' | perl -lpe 's/\\D+/xxx/g'\nxxx42xxx37\n\n$ # no need -l here as \\h won't match newline\n$ echo 'a b c  ' | perl -pe 's/\\h*$//'\na b c\n```\n\n<br>\n\n#### <a name=\"non-greedy-quantifier\"></a>Non-greedy quantifier\n\n* adding a `?` to `?` or `*` or `+` or `{}` quantifiers will change matching from greedy to non-greedy. In other words, to match as minimally as possible\n    * also known as lazy quantifier\n* See also [regular-expressions.info - Possessive Quantifiers](https://www.regular-expressions.info/possessive.html)\n\n```bash\n$ # greedy matching\n$ echo 'foo and bar and baz land good' | perl -pe 's/foo.*and//'\n good\n$ # non-greedy matching\n$ echo 'foo and bar and baz land good' | perl -pe 's/foo.*?and//'\n bar and baz land good\n\n$ echo '12342789' | perl -pe 's/\\d{2,5}//'\n789\n$ echo '12342789' | perl -pe 's/\\d{2,5}?//'\n342789\n\n$ # for single character, non-greedy is not always needed\n$ echo '123:42:789:good:5:bad' | perl -pe 's/:.*?:/:/'\n123:789:good:5:bad\n$ echo '123:42:789:good:5:bad' | perl -pe 's/:[^:]*:/:/'\n123:789:good:5:bad\n\n$ # just like greedy, overall matching is considered, as minimal as possible\n$ echo '123:42:789:good:5:bad' | perl -pe 's/:.*?:[a-z]/:/'\n123:ood:5:bad\n$ echo '123:42:789:good:5:bad' | perl -pe 's/:.*:[a-z]/:/'\n123:ad\n```\n\n<br>\n\n#### <a 
name=\"lookarounds\"></a>Lookarounds\n\n* Ability to add if conditions to match before/after required pattern\n* There are four types\n    * positive lookahead `(?=`\n    * negative lookahead `(?!`\n    * positive lookbehind `(?<=`\n    * negative lookbehind `(?<!`\n* One way to remember is that **behind** uses `<` and **negative** uses `!` instead of `=`\n\nLike word boundaries and anchors, the strings matched by lookarounds are not part of the matched string. They are termed **zero-width patterns**\n\n* positive lookbehind `(?<=`\n\n```bash\n$ s='foo=5, bar=3; x=83, y=120'\n\n$ # extract all digit sequences\n$ echo \"$s\" | perl -lne 'print join \" \", /\\d+/g'\n5 3 83 120\n\n$ # extract digits only if preceded by two lowercase alphabets and =\n$ # note how the characters matched by lookbehind aren't part of output\n$ echo \"$s\" | perl -lne 'print join \" \", /(?<=[a-z]{2}=)\\d+/g'\n5 3\n\n$ # this can be done without lookbehind too\n$ # taking advantage of behavior of //g when () is used\n$ echo \"$s\" | perl -lne 'print join \" \", /[a-z]{2}=(\\d+)/g'\n5 3\n\n$ # change all digits preceded by single lowercase alphabet and =\n$ echo \"$s\" | perl -pe 's/(?<=\\b[a-z]=)\\d+/42/g'\nfoo=5, bar=3; x=42, y=42\n$ # alternate, without lookbehind\n$ echo \"$s\" | perl -pe 's/(\\b[a-z]=)\\d+/${1}42/g'\nfoo=5, bar=3; x=42, y=42\n```\n\n* positive lookahead `(?=`\n\n```bash\n$ s='foo=5, bar=3; x=83, y=120'\n\n$ # extract digits that end with ,\n$ # can also use: perl -lne 'print join \":\", /(\\d+),/g'\n$ echo \"$s\" | perl -lne 'print join \":\", /\\d+(?=,)/g'\n5:83\n\n$ # change all digits ending with ,\n$ # can also use: perl -pe 's/\\d+,/42,/g'\n$ echo \"$s\" | perl -pe 's/\\d+(?=,)/42/g'\nfoo=42, bar=3; x=42, y=120\n\n$ # both lookbehind and lookahead\n$ echo 'foo,,baz,,,xyz' | perl -pe 's/,,/,NA,/g'\nfoo,NA,baz,NA,,xyz\n$ echo 'foo,,baz,,,xyz' | perl -pe 's/(?<=,)(?=,)/NA/g'\nfoo,NA,baz,NA,NA,xyz\n```\n\n* negative lookbehind `(?<!` and negative 
lookahead `(?!`\n\n```bash\n$ # change foo if not preceded by _\n$ # note how 'foo' at start of line is matched as well\n$ echo 'foo _foo 1foo' | perl -pe 's/(?<!_)foo/baz/g'\nbaz _foo 1baz\n\n$ # join each line in paragraph by replacing newline character\n$ # except the one at end of paragraph\n$ perl -00 -pe 's/\\n(?!$)/. /g' sample.txt\nHello World\n\nGood day. How are you\n\nJust do-it. Believe it\n\nToday is sunny. Not a bit funny. No doubt you like it too\n\nMuch ado about nothing. He he he\n```\n\n* `\\K` helps as a workaround for some of the variable-length lookbehind cases\n* See also [stackoverflow - Variable-length lookbehind-assertion alternatives](https://stackoverflow.com/questions/11640447/variable-length-lookbehind-assertion-alternatives-for-regular-expressions)\n\n```bash\n$ # lookbehind is checking start of line (0 characters) and comma(1 character)\n$ echo ',baz,,,xyz,,' | perl -pe 's/(?<=^|,)(?=,|$)/NA/g'\nVariable length lookbehind not implemented in regex m/(?<=^|,)(?=,|$)/ at -e line 1.\n\n$ # \\K helps in such cases\n$ echo ',baz,,,xyz,,' | perl -pe 's/(^|,)\\K(?=,|$)/NA/g'\nNA,baz,NA,NA,xyz,NA,NA\n```\n\n* some more examples\n\n```bash\n$ # helps to avoid , within fields for field splitting\n$ # note how the quotes are still part of field value\n$ echo '\"foo\",\"12,34\",\"good\"' | perl -F'/\"\\K,(?=\")/' -lane 'print $F[1]'\n\"12,34\"\n$ echo '\"foo\",\"12,34\",\"good\"' | perl -F'/\"\\K,(?=\")/' -lane 'print $F[2]'\n\"good\"\n\n$ # capture groups inside lookarounds\n$ echo 'a b c d e' | perl -pe 's/(\\H+\\h+)(?=(\\H+)\\h)/$1$2\\n/g'\na b\nb c\nc d\nd e\n$ # generic formula :)\n$ echo 'a b c d e' | perl -pe 's/(\\H+\\h+)(?=(\\H+(\\h+\\H+){1})\\h)/$1$2\\n/g'\na b c\nb c d\nc d e\n$ echo 'a b c d e' | perl -pe 's/(\\H+\\h+)(?=(\\H+(\\h+\\H+){2})\\h)/$1$2\\n/g'\na b c d\nb c d e\n```\n\n**Further Reading**\n\n* [stackoverflow - reverse four letter 
words](https://stackoverflow.com/questions/46870285/reverse-four-length-of-letters-with-sed-in-unix)\n* [stackoverflow - lookarounds and possessive quantifier](https://stackoverflow.com/questions/42437747/pcre-negative-lookahead-gives-unexpected-match)\n\n<br>\n\n#### <a name=\"ignoring-specific-matches\"></a>Ignoring specific matches\n\n* A useful construct is `(*SKIP)(*F)` which allows to discard matches not needed\n    * regular expression which should be discarded is written first, `(*SKIP)(*F)` is appended and then required regular expression is added after `|`\n\n```bash\n$ s='Car Bat cod12 Map foo_bar'\n$ # all words except those starting with 'c' or 'C'\n$ echo \"$s\" | perl -lne 'print join \"\\n\", /\\bc\\w+(*SKIP)(*F)|\\w+/gi'\nBat\nMap\nfoo_bar\n\n$ s='I like \"mango\" and \"guava\"'\n$ # all words except those surrounded by double quotes\n$ echo \"$s\" | perl -lne 'print join \"\\n\", /\"[^\"]+\"(*SKIP)(*F)|\\w+/g'\nI\nlike\nand\n$ # change words except those surrounded by double quotes\n$ echo \"$s\" | perl -pe 's/\"[^\"]+\"(*SKIP)(*F)|\\w+/\\U$&/g'\nI LIKE \"mango\" AND \"guava\"\n```\n\n* for line based decisions, simple if-else might help\n\n```bash\n$ cat nums.txt\n42\n-2\n10101\n-3.14\n-75\n\n$ # change +ve number to -ve and vice versa\n$ # note that empty regexp will reuse last successfully matched regexp\n$ perl -pe '/^-/ ? 
s/// : s/^/-/' nums.txt\n-42\n2\n-10101\n3.14\n75\n```\n\n**Further Reading**\n\n* [perldoc - Special Backtracking Control Verbs](https://perldoc.perl.org/perlre.html#Special-Backtracking-Control-Verbs)\n* [rexegg - Excluding Unwanted Matches](https://www.rexegg.com/backtracking-control-verbs.html#skipfail)\n\n<br>\n\n#### <a name=\"special-capture-groups\"></a>Special capture groups\n\n* `\\1`, `\\2` etc only matches exact string\n* `(?1)`, `(?2)` etc re-uses the regular expression itself\n\n```bash\n$ s='baz 2008-03-24 and 2012-08-12 foo 2016-03-25'\n$ # (?1) refers to first capture group (\\d{4}-\\d{2}-\\d{2})\n$ echo \"$s\" | perl -pe 's/(\\d{4}-\\d{2}-\\d{2}) and (?1)/XYZ/'\nbaz XYZ foo 2016-03-25\n\n$ # using \\1 won't work as the two dates are different\n$ echo \"$s\" | perl -pe 's/(\\d{4}-\\d{2}-\\d{2}) and \\1//'\nbaz 2008-03-24 and 2012-08-12 foo 2016-03-25\n```\n\n* use `(?:` to group regular expressions without capturing it, so this won't be counted for backreference\n* See also\n    * [stackoverflow - what is non-capturing group](https://stackoverflow.com/questions/3512471/what-is-a-non-capturing-group-what-does-do)\n    * [stackoverflow - extract specific fields and key-value pairs](https://stackoverflow.com/questions/46632397/parse-vcf-files-info-field)\n\n```bash\n$ s='Car Bat cod12 Map foo_bar'\n$ # check what happens if ?: is not used\n$ echo \"$s\" | perl -lne 'print join \"\\n\", /(?:Bat|Map)(*SKIP)(*F)|\\w+/gi'\nCar\ncod12\nfoo_bar\n\n$ # using ?: helps to focus only on required capture groups\n$ echo 'cod1 foo_bar' | perl -pe 's/(?:co|fo)\\K(\\w)(\\w)/$2$1/g'\nco1d fo_obar\n$ # without ?: you'd need to remember all the other groups as well\n$ echo 'cod1 foo_bar' | perl -pe 's/(co|fo)\\K(\\w)(\\w)/$3$2/g'\nco1d fo_obar\n```\n\n* named capture groups `(?<name>`\n    * for backreference, use `\\k<name>`\n    * accessible via `%+` hash in replacement section\n\n```bash\n$ s='baz 2008-03-24 and 2012-08-12 foo 2016-03-25'\n$ echo \"$s\" | perl -pe 
's/(\\d{4})-(\\d{2})-(\\d{2})/$3-$2-$1/g'\nbaz 24-03-2008 and 12-08-2012 foo 25-03-2016\n\n$ # naming the capture groups might offer clarity\n$ echo \"$s\" | perl -pe 's/(?<y>\\d{4})-(?<m>\\d{2})-(?<d>\\d{2})/$+{d}-$+{m}-$+{y}/g'\nbaz 24-03-2008 and 12-08-2012 foo 25-03-2016\n$ echo \"$s\" | perl -pe 's/(?<y>\\d{4})-(?<m>\\d{2})-(?<d>\\d{2})/$+{m}-$+{d}-$+{y}/g'\nbaz 03-24-2008 and 08-12-2012 foo 03-25-2016\n\n$ # and useful to transform different capture groups\n$ s='\"foo,bar\",123,\"x,y,z\",42'\n$ echo \"$s\" | perl -lpe 's/\"(?<a>[^\"]+)\",|(?<a>[^,]+),/$+{a}|/g'\nfoo,bar|123|x,y,z|42\n$ # can also use (?| branch reset\n$ echo \"$s\" | perl -lpe 's/(?|\"([^\"]+)\",|([^,]+),)/$1|/g'\nfoo,bar|123|x,y,z|42\n```\n\n**Further Reading**\n\n* [perldoc - Extended Patterns](https://perldoc.perl.org/perlre.html#Extended-Patterns)\n* [rexegg - all the (? usages](https://www.rexegg.com/regex-disambiguation.html)\n* [regular-expressions - recursion](https://www.regular-expressions.info/recurse.html#balanced)\n\n<br>\n\n#### <a name=\"modifiers\"></a>Modifiers\n\n* some are already seen, like the `g` (global match) and `i` (case insensitive matching)\n* first up, the `r` modifier which returns the substitution result instead of modifying the variable it is acting upon\n\n```bash\n$ perl -e '$x=\"feed\"; $y=$x=~s/e/E/gr; print \"x=$x\\ny=$y\\n\"'\nx=feed\ny=fEEd\n\n$ # the r modifier is available for transliteration operator too\n$ perl -e '$x=\"food\"; $y=$x=~tr/a-z/A-Z/r; print \"x=$x\\ny=$y\\n\"'\nx=food\ny=FOOD\n```\n\n* `e` modifier allows to use Perl code in replacement section instead of string\n* use `ee` if you need to construct a string and then apply evaluation\n\n```bash\n$ # replace numbers with their squares\n$ echo '4 and 10' | perl -pe 's/\\d+/$&*$&/ge'\n16 and 100\n\n$ # replace matched string with incremental value\n$ echo '4 and 10 foo 57' | perl -pe 's/\\d+/++$c/ge'\n1 and 2 foo 3\n$ # passing initial value\n$ echo '4 and 10 foo 57' | c=100 perl -pe 
's/\\d+/$ENV{c}++/ge'\n100 and 101 foo 102\n\n$ # formatting string\n$ echo 'a1-2-deed' | perl -lpe 's/[^-]+/sprintf \"%04s\", $&/ge'\n00a1-0002-deed\n\n$ # calling a function\n$ echo 'food:12:explain:789' | perl -pe 's/\\w+/length($&)/ge'\n4:2:7:3\n\n$ # applying another substitution to matched string\n$ echo '\"mango\" and \"guava\"' | perl -pe 's/\"[^\"]+\"/$&=~s|a|A|gr/ge'\n\"mAngo\" and \"guAvA\"\n```\n\n* multiline modifiers\n\n```bash\n$ # m modifier to match beginning/end of each line within multiline string\n$ perl -00 -ne 'print if /^Believe/' sample.txt\n$ perl -00 -ne 'print if /^Believe/m' sample.txt\nJust do-it\nBelieve it\n\n$ perl -00 -ne 'print if /funny$/' sample.txt\n$ perl -00 -ne 'print if /funny$/m' sample.txt\nToday is sunny\nNot a bit funny\nNo doubt you like it too\n\n$ # s modifier to allow . meta character to match newlines as well\n$ perl -00 -ne 'print if /do.*he/' sample.txt\n$ perl -00 -ne 'print if /do.*he/s' sample.txt\nMuch ado about nothing\nHe he he\n```\n\n**Further Reading**\n\n* [perldoc - perlre Modifiers](https://perldoc.perl.org/perlre.html#Modifiers)\n* [stackoverflow - replacement within matched string](https://stackoverflow.com/questions/40458639/replacement-within-the-matched-string-with-sed)\n\n<br>\n\n#### <a name=\"quoting-metacharacters\"></a>Quoting metacharacters\n\n* part of regular expression can be surrounded within `\\Q` and `\\E` to prevent matching meta characters within that portion\n    * however, `$` and `@` would still be interpolated as long as delimiter isn't single quotes\n    * `\\E` is optional if applying `\\Q` till end of search expression\n* typical use case is string to be protected is already present in a variable, for ex: user input or result of another command\n* quotemeta will add a backslash to all characters other than `\\w` characters\n* See also [perldoc - Quoting metacharacters](https://perldoc.perl.org/perlre.html#Quoting-metacharacters)\n\n```bash\n$ # quotemeta in action\n$ perl -le 
'$x=\"[a].b+c^\"; print quotemeta $x'\n\\[a\\]\\.b\\+c\\^\n\n$ # same as: s='a+b' perl -ne 'print if index($_, $ENV{s})==0' eqns.txt\n$ s='a+b' perl -ne 'print if /^\\Q$ENV{s}/' eqns.txt\na+b,pi=3.14,5e12\n\n$ s='a+b' perl -pe 's/^\\Q$ENV{s}/ABC/' eqns.txt\na=b,a-b=c,c*d\nABC,pi=3.14,5e12\ni*(t+9-g)/8,4-a+b\n\n$ s='a+b' perl -pe 's/\\Q$ENV{s}\\E.*,/ABC,/' eqns.txt\na=b,a-b=c,c*d\nABC,5e12\ni*(t+9-g)/8,4-a+b\n```\n\n* use `q` operator for replacement section\n* it would treat contents as if they were placed inside single quotes and hence no interpolation\n* See also [perldoc - Quote and Quote-like Operators](https://perldoc.perl.org/5.8.8/perlop.html#Quote-and-Quote-like-Operators)\n\n```bash\n$ # q in action\n$ perl -le '$x=\"[a].b+c^$@123\"; print $x'\n[a].b+c^123\n$ perl -le '$x=q([a].b+c^$@123); print $x'\n[a].b+c^$@123\n$ perl -le '$x=q([a].b+c^$@123); print quotemeta $x'\n\\[a\\]\\.b\\+c\\^\\$\\@123\n\n$ echo 'foo 123' | perl -pe 's/foo/$foo/'\n 123\n$ echo 'foo 123' | perl -pe 's/foo/q($foo)/e'\n$foo 123\n$ echo 'foo 123' | perl -pe 's/foo/q{$f)oo}/e'\n$f)oo 123\n\n$ # string saved in other variables do not need special attention\n$ echo 'foo 123' | s='a$b' perl -pe 's/foo/$ENV{s}/'\na$b 123\n$ echo 'foo 123' | perl -pe 's/foo/a$b/'\na 123\n```\n\n<br>\n\n#### <a name=\"matching-position\"></a>Matching position\n\n* From [perldoc - perlvar](https://perldoc.perl.org/perlvar.html#SPECIAL-VARIABLES)\n\n>$-[0] is the offset of the start of the last successful match\n\n>$+[0] is the offset into the string of the end of the entire match\n\n```bash\n$ cat poem.txt\nRoses are red,\nViolets are blue,\nSugar is sweet,\nAnd so are you.\n\n$ # starting position of match\n$ perl -lne 'print \"line: $., offset: $-[0]\" if /are/' poem.txt\nline: 1, offset: 6\nline: 2, offset: 8\nline: 4, offset: 7\n$ # if offset is needed starting from 1 instead of 0\n$ perl -lne 'print \"line: $., offset: \",$-[0]+1 if /are/' poem.txt\nline: 1, offset: 7\nline: 2, offset: 9\nline: 4, 
offset: 8\n\n$ # ending position of match\n$ perl -lne 'print \"line: $., offset: $+[0]\" if /are/' poem.txt\nline: 1, offset: 9\nline: 2, offset: 11\nline: 4, offset: 10\n```\n\n* for multiple matches, use `while` loop to go over all the matches\n\n```bash\n$ perl -lne 'print \"$.:$&:$-[0]\" while /is|so|are/g' poem.txt\n1:are:6\n2:are:8\n3:is:6\n4:so:4\n4:are:7\n```\n\n<br>\n\n## <a name=\"using-modules\"></a>Using modules\n\n* There are many standard modules available that come with Perl installation\n* and many more available from **Comprehensive Perl Archive Network** (CPAN)\n    * [stackoverflow - easiest way to install a missing module](https://stackoverflow.com/questions/65865/whats-the-easiest-way-to-install-a-missing-perl-module)\n\n```bash\n$ echo '34,17,6' | perl -F, -lane 'BEGIN{use List::Util qw(max)} print max @F'\n34\n$ # -M option provides a way to specify modules from command line\n$ echo '34,17,6' | perl -MList::Util=max -F, -lane 'print max @F'\n34\n$ echo '34,17,6' | perl -MList::Util=sum0 -F, -lane 'print sum0 @F'\n57\n$ echo '34,17,6' | perl -MList::Util=product -F, -lane 'print product @F'\n3468\n\n$ s='1,2,3,4,5'\n$ echo \"$s\" | perl -MList::Util=shuffle -F, -lane 'print join \",\",shuffle @F'\n5,3,4,1,2\n\n$ s='3,b,a,c,d,1,d,c,2,3,1,b'\n$ echo \"$s\" | perl -MList::MoreUtils=uniq -F, -lane 'print join \",\",uniq @F'\n3,b,a,c,d,1,2\n\n$ echo 'foo 123 baz' | base64\nZm9vIDEyMyBiYXoK\n$ echo 'foo 123 baz' | perl -MMIME::Base64 -ne 'print encode_base64 $_'\nZm9vIDEyMyBiYXoK\n$ echo 'Zm9vIDEyMyBiYXoK' | perl -MMIME::Base64 -ne 'print decode_base64 $_'\nfoo 123 baz\n```\n\n* a cool module [O](https://perldoc.perl.org/O.html) helps to convert one-liners to full fledged programs\n    * similar to `-o` option for GNU awk\n\n```bash\n$ # command being deparsed is discussed in a later section\n$ perl -MO=Deparse -ne 'if(!$#ARGV){$h{$_}=1; next}\n            print if $h{$_}' colors_1.txt colors_2.txt\nLINE: while (defined($_ = <ARGV>)) {\n    unless 
($#ARGV) {\n        $h{$_} = 1;\n        next;\n    }\n    print $_ if $h{$_};\n}\n-e syntax OK\n\n$ perl -MO=Deparse -00 -ne 'print if /it/' sample.txt\nBEGIN { $/ = \"\"; $\\ = undef; }\nLINE: while (defined($_ = <ARGV>)) {\n    print $_ if /it/;\n}\n-e syntax OK\n```\n\n**Further Reading**\n\n* [perldoc - perlmodlib](https://perldoc.perl.org/perlmodlib.html)\n* [perldoc - Core modules](https://perldoc.perl.org/index-modules-L.html)\n* [unix.stackexchange - example for Algorithm::Combinatorics](https://unix.stackexchange.com/questions/310840/better-solution-for-finding-id-groups-permutations-combinations)\n* [unix.stackexchange - example for Text::ParseWords](https://unix.stackexchange.com/questions/319301/excluding-enclosed-delimiters-with-cut)\n* [stackoverflow - regular expression modules](https://stackoverflow.com/questions/3258847/what-are-good-perl-pattern-matching-regex-modules)\n* [metacpan - String::Approx](https://metacpan.org/pod/String::Approx) - Perl extension for approximate matching (fuzzy matching)\n* [metacpan - Tie::IxHash](https://metacpan.org/pod/Tie::IxHash) - ordered associative arrays for Perl\n\n<br>\n\n## <a name=\"two-file-processing\"></a>Two file processing\n\nFirst, a bit about `$#ARGV` and hash variables\n\n```bash\n$ # $#ARGV can be used to know which file is being processed\n$ perl -lne 'print $#ARGV' <(seq 2) <(seq 3) <(seq 1)\n1\n1\n0\n0\n0\n-1\n\n$ # creating hash variable\n$ # checking if a key is present using exists\n$ # or if value is known to evaluate to true\n$ perl -le '$h{\"a\"}=5; $h{\"b\"}=0; $h{1}=\"abc\";\n            print \"key:a value=\", $h{\"a\"};\n            print \"key:b present\" if exists $h{\"b\"};\n            print \"key:1 present\" if $h{1}'\nkey:a value=5\nkey:b present\nkey:1 present\n```\n\n<br>\n\n#### <a name=\"comparing-whole-lines\"></a>Comparing whole lines\n\nConsider the following test files\n\n```bash\n$ cat colors_1.txt\nBlue\nBrown\nPurple\nRed\nTeal\nYellow\n\n$ cat 
colors_2.txt\nBlack\nBlue\nGreen\nRed\nWhite\n```\n\n* With two files as input, `$#ARGV` will be `0` only while the first file is being processed\n* Using `next` skips the rest of the code for that line\n* the entire line is used as the hash key\n\n```bash\n$ # common lines\n$ # note that all duplicates matching in second file would get printed\n$ # same as: grep -Fxf colors_1.txt colors_2.txt\n$ # same as: awk 'NR==FNR{a[$0]; next} $0 in a' colors_1.txt colors_2.txt\n$ perl -ne 'if(!$#ARGV){$h{$_}=1; next}\n            print if $h{$_}' colors_1.txt colors_2.txt\nBlue\nRed\n$ # can also use: perl -ne '!$#ARGV ? $h{$_}=1 : $h{$_} && print'\n\n$ # lines from colors_2.txt not present in colors_1.txt\n$ # same as: grep -vFxf colors_1.txt colors_2.txt\n$ # same as: awk 'NR==FNR{a[$0]; next} !($0 in a)' colors_1.txt colors_2.txt\n$ perl -ne 'if(!$#ARGV){$h{$_}=1; next}\n            print if !$h{$_}' colors_1.txt colors_2.txt\nBlack\nGreen\nWhite\n```\n\n* alternative constructs\n* `<FILEHANDLE>` reads line(s) from the specified file\n    * defaults to the current file argument (includes stdin as well), so `<>` can be used as a shortcut\n    * `<STDIN>` will read only from stdin; there are also predefined handles for stdout/stderr\n    * in list context, all the lines would be read\n    * See [perldoc - I/O Operators](https://perldoc.perl.org/perlop.html#I%2fO-Operators) for details\n\n```bash\n$ # using if-else instead of next\n$ perl -ne 'if(!$#ARGV){ $h{$_}=1 }\n            else{ print if $h{$_} }' colors_1.txt colors_2.txt\nBlue\nRed\n\n$ # read all lines of first file in BEGIN block\n$ # <> reads a line from current file argument\n$ # eof will ensure only first file is read\n$ perl -ne 'BEGIN{ $h{<>}=1 while !eof; }\n            print if $h{$_}' colors_1.txt colors_2.txt\nBlue\nRed\n$ # this method also makes it easy to reset the line number\n$ # close ARGV is similar to calling nextfile in GNU awk\n$ perl -ne 'BEGIN{ $h{<>}=1 while !eof; close ARGV}\n            print \"$.\\n\" if $h{$_}' colors_1.txt 
colors_2.txt\n2\n4\n\n$ # or pass 1st file content as STDIN, $. will be automatically reset as well\n$ perl -ne 'BEGIN{ $h{$_}=1 while <STDIN> }\n            print if $h{$_}' <colors_1.txt colors_2.txt\nBlue\nRed\n```\n\n<br>\n\n#### <a name=\"comparing-specific-fields\"></a>Comparing specific fields\n\nConsider the sample input file\n\n```bash\n$ cat marks.txt\nDept    Name    Marks\nECE     Raj     53\nECE     Joel    72\nEEE     Moi     68\nCSE     Surya   81\nEEE     Tia     59\nECE     Om      92\nCSE     Amy     67\n```\n\n* single field\n* For ex: only first field comparison instead of entire line as key\n\n```bash\n$ cat list1\nECE\nCSE\n\n$ # extract only lines matching first field specified in list1\n$ # same as: awk 'NR==FNR{a[$1]; next} $1 in a' list1 marks.txt\n$ perl -ane 'if(!$#ARGV){ $h{$F[0]}=1 }\n             else{ print if $h{$F[0]} }' list1 marks.txt\nECE     Raj     53\nECE     Joel    72\nCSE     Surya   81\nECE     Om      92\nCSE     Amy     67\n\n$ # if header is needed as well\n$ # same as: awk 'NR==FNR{a[$1]; next} FNR==1 || $1 in a' list1 marks.txt\n$ perl -ane 'if(!$#ARGV){ $h{$F[0]}=1; $.=0 }\n             else{ print if $h{$F[0]} || $.==1 }' list1 marks.txt\nDept    Name    Marks\nECE     Raj     53\nECE     Joel    72\nCSE     Surya   81\nECE     Om      92\nCSE     Amy     67\n```\n\n* multiple field comparison\n\n```bash\n$ cat list2\nEEE Moi\nCSE Amy\nECE Raj\n\n$ # extract only lines matching both fields specified in list2\n$ # same as: awk 'NR==FNR{a[$1,$2]; next} ($1,$2) in a' list2 marks.txt\n$ # default SUBSEP(stored in $;) is \\034, same as GNU awk\n$ perl -ane 'if(!$#ARGV){ $h{$F[0],$F[1]}=1 }\n             else{ print if $h{$F[0],$F[1]} }' list2 marks.txt\nECE     Raj     53\nEEE     Moi     68\nCSE     Amy     67\n\n$ # or use multidimensional hash\n$ perl -ane 'if(!$#ARGV){ $h{$F[0]}{$F[1]}=1 }\n             else{ print if $h{$F[0]}{$F[1]} }' list2 marks.txt\nECE     Raj     53\nEEE     Moi     68\nCSE     Amy     
67\n```\n\n* field and value comparison\n\n```bash\n$ cat list3\nECE 70\nEEE 65\nCSE 80\n\n$ # extract line matching Dept and minimum marks specified in list3\n$ # same as: awk 'NR==FNR{d[$1]; m[$1]=$2; next} $1 in d && $3 >= m[$1]'\n$ perl -ane 'if(!$#ARGV){ $d{$F[0]}=1; $m{$F[0]}=$F[1] }\n             else{ print if $d{$F[0]} && $F[2]>=$m{$F[0]} }' list3 marks.txt\nECE     Joel    72\nEEE     Moi     68\nCSE     Surya   81\nECE     Om      92\n```\n\n* See also [stackoverflow - Fastest way to find lines of a text file from another larger text file](https://stackoverflow.com/questions/42239179/fastest-way-to-find-lines-of-a-text-file-from-another-larger-text-file-in-bash)\n\n<br>\n\n#### <a name=\"line-number-matching\"></a>Line number matching\n\n```bash\n$ # replace mth line in poem.txt with nth line from nums.txt\n$ # assumes that there are at least n lines in nums.txt\n$ # same as: awk -v m=3 -v n=2 'BEGIN{while(n-- > 0) getline s < \"nums.txt\"}\n$ #                             FNR==m{$0=s} 1' poem.txt\n$ m=3 n=2 perl -pe 'BEGIN{ $s=<> while $ENV{n}-- > 0; close ARGV}\n                    $_=$s if $.==$ENV{m}' nums.txt poem.txt\nRoses are red,\nViolets are blue,\n-2\nAnd so are you.\n\n$ # print line from fruits.txt if corresponding line from nums.txt is +ve number\n$ # same as: awk -v file='nums.txt' '(getline num < file)==1 && num>0'\n$ <nums.txt perl -ne 'print if <STDIN> > 0' fruits.txt\nfruit   qty\nbanana  31\n```\n\n<br>\n\n## <a name=\"creating-new-fields\"></a>Creating new fields\n\n* Number of fields in input record can be changed by simply manipulating `$#F`\n\n```bash\n$ s='foo,bar,123,baz'\n\n$ # reducing fields\n$ # same as: awk -F, -v OFS=, '{NF=2} 1'\n$ echo \"$s\" | perl -F, -lane '$,=\",\"; $#F=1; print @F'\nfoo,bar\n\n$ # creating new empty field(s)\n$ # same as: awk -F, -v OFS=, '{NF=5} 1'\n$ echo \"$s\" | perl -F, -lane '$,=\",\"; $#F=4; print @F'\nfoo,bar,123,baz,\n\n$ # assigning to field greater than $#F will create empty fields as 
needed\n$ # same as: awk -F, -v OFS=, '{$7=42} 1'\n$ echo \"$s\" | perl -F, -lane '$,=\",\"; $F[6]=42; print @F'\nfoo,bar,123,baz,,,42\n```\n\n* adding a field based on existing fields\n    * See also [split](#split) and [Array operations](#array-operations) sections\n\n```bash\n$ # adding a new 'Grade' field\n$ # same as: awk 'BEGIN{OFS=\"\\t\"; split(\"DCBAS\",g,//)}\n$ #          {NF++; $NF = NR==1 ? \"Grade\" : g[int($(NF-1)/10)-4]} 1' marks.txt\n$ perl -lane 'BEGIN{$,=\"\\t\"; @g = split //, \"DCBAS\"} $#F++;\n              $F[-1] = $.==1 ? \"Grade\" : $g[$F[-2]/10 - 5]; print @F' marks.txt\nDept    Name    Marks   Grade\nECE     Raj     53      D\nECE     Joel    72      B\nEEE     Moi     68      C\nCSE     Surya   81      A\nEEE     Tia     59      D\nECE     Om      92      S\nCSE     Amy     67      C\n\n$ # alternate syntax: array initialization and appending array element\n$ perl -lane 'BEGIN{$,=\"\\t\"; @g = qw(D C B A S)}\n              push @F, $.==1 ? \"Grade\" : $g[$F[-1]/10 - 5]; print @F' marks.txt\n```\n\n* two file example\n\n```bash\n$ cat list4\nRaj class_rep\nAmy sports_rep\nTia placement_rep\n\n$ # same as: awk -v OFS='\\t' 'NR==FNR{r[$1]=$2; next}\n$ #          {NF++; $NF = FNR==1 ? \"Role\" : $NF=r[$2]} 1' list4 marks.txt\n$ perl -lane 'if(!$#ARGV){ $r{$F[0]}=$F[1]; $.=0 }\n              else{ push @F, $.==1 ? \"Role\" : $r{$F[1]};\n                    print join \"\\t\", @F }' list4 marks.txt\nDept    Name    Marks   Role\nECE     Raj     53      class_rep\nECE     Joel    72\nEEE     Moi     68\nCSE     Surya   81\nEEE     Tia     59      placement_rep\nECE     Om      92\nCSE     Amy     67      sports_rep\n```\n\n<br>\n\n## <a name=\"multiple-file-input\"></a>Multiple file input\n\n* there is no gawk's `FNR/BEGINFILE/ENDFILE` equivalent in perl, but it can be worked around\n\n```bash\n$ # same as: awk 'FNR==2' poem.txt greeting.txt\n$ # close ARGV will reset $. 
to 0\n$ perl -ne 'print if $.==2; close ARGV if eof' poem.txt greeting.txt\nViolets are blue,\nHave a safe journey\n\n$ # same as: awk 'BEGINFILE{print \"file: \"FILENAME} ENDFILE{print $0\"\\n------\"}'\n$ perl -lne 'print \"file: $ARGV\" if $.==1;\n             print \"$_\\n------\" and close ARGV if eof' poem.txt greeting.txt\nfile: poem.txt\nAnd so are you.\n------\nfile: greeting.txt\nHave a safe journey\n------\n```\n\n* workaround for gawk's `nextfile`\n* skips the remaining lines of the file being processed and moves on to the next file\n\n```bash\n$ # same as: head -q -n1 and awk 'FNR>1{nextfile} 1'\n$ perl -pe 'close ARGV if $.>=1' poem.txt greeting.txt fruits.txt\nRoses are red,\nHello there\nfruit   qty\n\n$ # same as: awk 'tolower($1) ~ /red/{print FILENAME; nextfile}' *\n$ perl -lane 'print $ARGV and close ARGV if $F[0] =~ /red/i' *\ncolors_1.txt\ncolors_2.txt\n```\n\n<br>\n\n## <a name=\"dealing-with-duplicates\"></a>Dealing with duplicates\n\n* retain only the first copy of duplicates\n\n```bash\n$ cat duplicates.txt\nabc  7   4\nfood toy ****\nabc  7   4\ntest toy 123\ngood toy ****\n\n$ # whole line, same as: awk '!seen[$0]++' duplicates.txt\n$ perl -ne 'print if !$seen{$_}++' duplicates.txt\nabc  7   4\nfood toy ****\ntest toy 123\ngood toy ****\n\n$ # particular column, same as: awk '!seen[$2]++' duplicates.txt\n$ perl -ane 'print if !$seen{$F[1]}++' duplicates.txt\nabc  7   4\nfood toy ****\n\n$ # total count, same as: awk '!seen[$2]++{c++} END{print +c}' duplicates.txt\n$ perl -lane '$c++ if !$seen{$F[1]}++; END{print $c+0}' duplicates.txt\n2\n```\n\n* if the input is so large that integer counts could overflow, avoid counting or use arbitrary precision arithmetic\n* See also [perldoc - bignum](https://perldoc.perl.org/bignum.html)\n\n```bash\n$ perl -le 'print \"equal\" if\n   102**33==1922231403943151831696327756255167543169267432774552016351387451392'\n$ # -M option here enables the use of bignum module\n$ perl -Mbignum -le 'print \"equal\" if\n   
102**33==1922231403943151831696327756255167543169267432774552016351387451392'\nequal\n\n$ # avoid unnecessary counting altogether\n$ # same as: awk '!($2 in seen); {seen[$2]}' duplicates.txt\n$ perl -ane 'print if !$seen{$F[1]}; $seen{$F[1]}=1' duplicates.txt\nabc  7   4\nfood toy ****\n\n$ # same as: awk -M '!($2 in seen){c++} {seen[$2]} END{print +c}' duplicates.txt\n$ perl -Mbignum -lane '$c++ if !$seen{$F[1]}; $seen{$F[1]}=1;\n                       END{print $c+0}' duplicates.txt\n2\n```\n\n* multiple fields\n* See also [unix.stackexchange - based on same fields that could be in different order](https://unix.stackexchange.com/questions/325619/delete-lines-that-contain-the-same-information-but-in-different-order)\n\n```bash\n$ # same as: awk '!seen[$2,$3]++' duplicates.txt\n$ # default SUBSEP(stored in $;) is \\034, same as GNU awk\n$ perl -ane 'print if !$seen{$F[1],$F[2]}++' duplicates.txt\nabc  7   4\nfood toy ****\ntest toy 123\n\n$ # or use multidimensional key\n$ perl -ane 'print if !$seen{$F[1]}{$F[2]}++' duplicates.txt\nabc  7   4\nfood toy ****\ntest toy 123\n```\n\n* retaining specific copy\n\n```bash\n$ # second occurrence of duplicate\n$ # same as: awk '++seen[$2]==2' duplicates.txt\n$ perl -ane 'print if ++$seen{$F[1]}==2' duplicates.txt\nabc  7   4\ntest toy 123\n\n$ # third occurrence of duplicate\n$ # same as: awk '++seen[$2]==3' duplicates.txt\n$ perl -ane 'print if ++$seen{$F[1]}==3' duplicates.txt\ngood toy ****\n\n$ # retaining only last copy of duplicate\n$ # reverse the input line-wise, retain first copy and then reverse again\n$ # same as: tac duplicates.txt | awk '!seen[$2]++' | tac\n$ tac duplicates.txt | perl -ane 'print if !$seen{$F[1]}++' | tac\nabc  7   4\ngood toy ****\n```\n\n* filtering based on duplicate count\n* allows to emulate [uniq](./sorting_stuff.md#uniq) command for specific fields\n\n```bash\n$ # all duplicates based on 1st column\n$ # same as: awk 'NR==FNR{a[$1]++; next} a[$1]>1' duplicates.txt duplicates.txt\n$ perl 
-ane 'if(!$#ARGV){ $x{$F[0]}++ }\n             else{ print if $x{$F[0]}>1 }' duplicates.txt duplicates.txt\nabc  7   4\nabc  7   4\n\n$ # more than 2 duplicates based on 2nd column\n$ # same as: awk 'NR==FNR{a[$2]++; next} a[$2]>2' duplicates.txt duplicates.txt\n$ perl -ane 'if(!$#ARGV){ $x{$F[1]}++ }\n             else{ print if $x{$F[1]}>2 }' duplicates.txt duplicates.txt\nfood toy ****\ntest toy 123\ngood toy ****\n\n$ # only unique lines based on 3rd column\n$ # same as: awk 'NR==FNR{a[$3]++; next} a[$3]==1' duplicates.txt duplicates.txt\n$ perl -ane 'if(!$#ARGV){ $x{$F[2]}++ }\n             else{ print if $x{$F[2]}==1 }' duplicates.txt duplicates.txt\ntest toy 123\n```\n\n<br>\n\n## <a name=\"lines-between-two-regexps\"></a>Lines between two REGEXPs\n\n* This section deals with filtering lines bound by two *REGEXP*s (referred to as blocks)\n* For simplicity, the two *REGEXP*s used in most of the examples below are the strings **BEGIN** and **END**\n\n<br>\n\n#### <a name=\"all-unbroken-blocks\"></a>All unbroken blocks\n\nConsider the sample input file below, which doesn't have any broken blocks (i.e. **BEGIN** and **END** are always present in pairs)\n\n```bash\n$ cat range.txt\nfoo\nBEGIN\n1234\n6789\nEND\nbar\nBEGIN\na\nb\nc\nEND\nbaz\n```\n\n* Extracting lines between starting and ending *REGEXP*\n\n```bash\n$ # include both starting/ending REGEXP\n$ # same as: awk '/BEGIN/{f=1} f; /END/{f=0}' range.txt\n$ perl -ne '$f=1 if /BEGIN/; print if $f; $f=0 if /END/' range.txt\nBEGIN\n1234\n6789\nEND\nBEGIN\na\nb\nc\nEND\n\n$ # can also use: perl -ne 'print if /BEGIN/../END/' range.txt\n$ # which is similar to sed -n '/BEGIN/,/END/p'\n$ # but not suitable to extend for other cases\n```\n\n* other variations\n\n```bash\n$ # same as: awk '/END/{f=0} f; /BEGIN/{f=1}' range.txt\n$ perl -ne '$f=0 if /END/; print if $f; $f=1 if /BEGIN/' range.txt\n1234\n6789\na\nb\nc\n\n$ # check out what these do:\n$ perl -ne '$f=1 if /BEGIN/; $f=0 if /END/; print if $f' range.txt\n$ perl 
-ne 'print if $f; $f=0 if /END/; $f=1 if /BEGIN/' range.txt\n```\n\n* Extracting lines other than lines between the two *REGEXP*s\n\n```bash\n$ # same as: awk '/BEGIN/{f=1} !f; /END/{f=0}' range.txt\n$ # can also use: perl -ne 'print if !(/BEGIN/../END/)' range.txt\n$ perl -ne '$f=1 if /BEGIN/; print if !$f; $f=0 if /END/' range.txt\nfoo\nbar\nbaz\n\n$ # the other three cases would be\n$ perl -ne '$f=0 if /END/; print if !$f; $f=1 if /BEGIN/' range.txt\n$ perl -ne 'print if !$f; $f=1 if /BEGIN/; $f=0 if /END/' range.txt\n$ perl -ne '$f=1 if /BEGIN/; $f=0 if /END/; print if !$f' range.txt\n```\n\n<br>\n\n#### <a name=\"specific-blocks\"></a>Specific blocks\n\n* Getting first block\n\n```bash\n$ # same as: awk '/BEGIN/{f=1} f; /END/{exit}' range.txt\n$ perl -ne '$f=1 if /BEGIN/; print if $f; exit if /END/' range.txt\nBEGIN\n1234\n6789\nEND\n\n$ # use other tricks discussed in previous section as needed\n$ # same as: awk '/END/{exit} f; /BEGIN/{f=1}' range.txt\n$ perl -ne 'exit if /END/; print if $f; $f=1 if /BEGIN/' range.txt\n1234\n6789\n```\n\n* Getting last block\n\n```bash\n$ # reverse input linewise, change the order of REGEXPs, finally reverse again\n$ # same as: tac range.txt | awk '/END/{f=1} f; /BEGIN/{exit}' | tac\n$ tac range.txt | perl -ne '$f=1 if /END/; print if $f; exit if /BEGIN/' | tac\nBEGIN\na\nb\nc\nEND\n\n$ # or, save the blocks in a buffer and print the last one alone\n$ # same as: awk '/4/{f=1; b=$0; next} f{b=b ORS $0} /6/{f=0} END{print b}'\n$ seq 30 | perl -ne 'if(/4/){$f=1; $b=$_; next}\n                     $b.=$_ if $f; $f=0 if /6/; END{print $b}'\n24\n25\n26\n```\n\n* Getting blocks based on a counter\n\n```bash\n$ # get only 2nd block\n$ # same as: seq 30 | awk -v b=2 '/4/{c++} c==b{print; if(/6/) exit}'\n$ seq 30 | b=2 perl -ne '$c++ if /4/; if($c==$ENV{b}){print; exit if /6/}'\n14\n15\n16\n\n$ # to get all blocks greater than 'b' blocks\n$ # same as: seq 30 | awk -v b=1 '/4/{f=1; c++} f && c>b; /6/{f=0}'\n$ seq 30 | b=1 perl -ne 
'$f=1, $c++ if /4/;\n                         print if $f && $c>$ENV{b}; $f=0 if /6/'\n14\n15\n16\n24\n25\n26\n```\n\n* excluding a particular block\n\n```bash\n$ # excludes 2nd block\n$ # same as: seq 30 | awk -v b=2 '/4/{f=1; c++} f && c!=b; /6/{f=0}'\n$ seq 30 | b=2 perl -ne '$f=1, $c++ if /4/;\n                         print if $f && $c!=$ENV{b}; $f=0 if /6/'\n4\n5\n6\n24\n25\n26\n```\n\n* extract block only if it matches another string as well\n\n```bash\n$ # string to match inside block: 23\n$ perl -ne 'if(/BEGIN/){$f=1; $m=0; $b=\"\"}; $m=1 if $f && /23/;\n            $b.=$_ if $f; if(/END/){print $b if $m; $f=0}' range.txt\nBEGIN\n1234\n6789\nEND\n\n$ # line to match inside block: 5 or 25\n$ seq 30 | perl -ne 'if(/4/){$f=1; $m=0; $b=\"\"}; $m=1 if $f && /^(5|25)$/;\n                     $b.=$_ if $f; if(/6/){print $b if $m; $f=0}'\n4\n5\n6\n24\n25\n26\n```\n\n<br>\n\n#### <a name=\"broken-blocks\"></a>Broken blocks\n\n* If there are blocks with ending *REGEXP* but without corresponding start, earlier techniques used will suffice\n* Consider the modified input file where starting *REGEXP* doesn't have corresponding ending\n\n```bash\n$ cat broken_range.txt\nfoo\nBEGIN\n1234\n6789\nEND\nbar\nBEGIN\na\nb\nc\nbaz\n\n$ # the file reversing trick comes in handy here as well\n$ # same as: tac broken_range.txt | awk '/END/{f=1} f; /BEGIN/{f=0}' | tac\n$ tac broken_range.txt | perl -ne '$f=1 if /END/;\n                         print if $f; $f=0 if /BEGIN/' | tac\nBEGIN\n1234\n6789\nEND\n```\n\n* But if both kinds of broken blocks are present, for ex:\n\n```bash\n$ cat multiple_broken.txt\nqqqqqqq\nBEGIN\nfoo\nBEGIN\n1234\n6789\nEND\nbar\nEND\n0-42-1\nBEGIN\na\nBEGIN\nb\nEND\nxyzabc\n```\n\nthen use buffers to accumulate the records and print accordingly\n\n```bash\n$ # same as: awk '/BEGIN/{f=1; buf=$0; next} f{buf=buf ORS $0}\n$ #          /END/{f=0; if(buf) print buf; buf=\"\"}' multiple_broken.txt\n$ perl -ne 'if(/BEGIN/){$f=1; $b=$_; next} $b.=$_ if $f;\n        
    if(/END/){$f=0; print $b if $b; $b=\"\"}' multiple_broken.txt\nBEGIN\n1234\n6789\nEND\nBEGIN\nb\nEND\n\n$ # note how buffer is initialized as well as cleared\n$ # on matching beginning/end REGEXPs respectively\n$ # 'undef $b' can also be used here instead of $b=\"\"\n```\n\n<br>\n\n## <a name=\"array-operations\"></a>Array operations\n\n* initialization\n\n```bash\n$ # list example, each value is separated by comma\n$ perl -e '($x, $y) = (4, 5); print \"$x:$y\\n\"'\n4:5\n\n$ # using list to initialize arrays, allows variable interpolation\n$ # ($x, $y) = ($y, $x) will swap variables :)\n$ perl -e '@nums = (4, 5, 84); print \"@nums\\n\"'\n4 5 84\n$ perl -e '@nums = (4, 5, 84, \"foo\"); print \"@nums\\n\"'\n4 5 84 foo\n$ perl -e '$x=5; @y=(3, 2); @nums = ($x, \"good\", @y); print \"@nums\\n\"'\n5 good 3 2\n\n$ # use qw to specify string elements separated by space, no interpolation\n$ perl -e '@nums = qw(4 5 84 \"foo\"); print \"@nums\\n\"'\n4 5 84 \"foo\"\n$ perl -e '@nums = qw(a $x @y); print \"@nums\\n\"'\na $x @y\n$ # use different delimiter as needed\n$ perl -e '@nums = qw/baz 1)foo/; print \"@nums\\n\"'\nbaz 1)foo\n```\n\n* accessing individual elements\n* See also [perldoc - functions for arrays](https://perldoc.perl.org/index-functions-by-cat.html#Functions-for-real-@ARRAYs) for push,pop,shift,unshift functions\n\n```bash\n$ # index starts from 0\n$ perl -le '@nums = (4, \"foo\", 2, \"x\"); print $nums[0]'\n4\n$ # note the use of $ when accessing individual element\n$ perl -le '@nums = (4, \"foo\", 2, \"x\"); print $nums[2]'\n2\n$ # to access elements from end, use -ve index from -1\n$ perl -le '@nums = (4, \"foo\", 2, \"x\"); print $nums[-1]'\nx\n\n$ # index of last element in array\n$ perl -le '@nums = (4, \"foo\", 2, \"x\"); print $#nums'\n3\n$ # size of array, i.e total number of elements\n$ perl -le '@nums = (4, \"foo\", 2, \"x\"); $s=@nums; print $s'\n4\n$ perl -le '@nums = (4, \"foo\", 2, \"x\"); print scalar @nums'\n4\n```\n\n* array slices\n* See 
also [perldoc - Range Operators](https://perldoc.perl.org/perlop.html#Range-Operators)\n\n```bash\n$ # note the use of @ when accessing more than one element\n$ echo 'a b c d' | perl -lane 'print \"@F[0,-1,2]\"'\na d c\n$ # range operator\n$ echo 'a b c d' | perl -lane 'print \"@F[1..2]\"'\nb c\n$ # rotating elements\n$ echo 'a b c d' | perl -lane 'print \"@F[1..$#F,0]\"'\nb c d a\n\n$ # index needed can be given from another array too\n$ echo 'a b c d' | perl -lane '@i=(3,1); print \"@F[@i]\"'\nd b\n\n$ # easy swapping of columns\n$ perl -lane 'print join \"\\t\", @F[1,0]' fruits.txt\nqty     fruit\n42      apple\n31      banana\n90      fig\n6       guava\n```\n\n* range operator also allows handy initialization\n\n```bash\n$ perl -le '@n = (12..17); print \"@n\"'\n12 13 14 15 16 17\n\n$ perl -le '@n = (l..ad); print \"@n\"'\nl m n o p q r s t u v w x y z aa ab ac ad\n```\n\n<br>\n\n#### <a name=\"iteration-and-filtering\"></a>Iteration and filtering\n\n* See also [stackoverflow - extracting multiline text and performing substitution](https://stackoverflow.com/questions/47653826/awk-extracting-a-data-which-is-on-several-lines/47654406#47654406)\n\n```bash\n$ # foreach will return each value one by one\n$ # can also use 'for' keyword instead of 'foreach'\n$ perl -le 'print $_*2 foreach (12..14)'\n24\n26\n28\n\n$ # iterate using index\n$ perl -le '@x = (a..e); foreach (0..$#x){print $x[$_]}'\na\nb\nc\nd\ne\n\n$ # C-style for loop can be used as well\n$ perl -le '@x = (a..c); for($i=0;$i<=$#x;$i++){print $x[$i]}'\na\nb\nc\n```\n\n* use `grep` for filtering array elements based on a condition\n* See also [unix.stackexchange - extract specific fields and use corresponding header text](https://unix.stackexchange.com/questions/397498/create-lists-of-words-according-to-binary-numbers/397504#397504)\n\n```bash\n$ # as usual, $_ will get the value each iteration\n$ perl -le '$,=\" \"; print grep { /[35]/ } 2..26'\n3 5 13 15 23 25\n$ # alternate syntax\n$ perl -le '$,=\" 
\"; print grep /[35]/, 2..26'\n3 5 13 15 23 25\n\n$ # to get index instead of matches\n$ perl -le '$,=\" \"; @n=(2..26); print grep {$n[$_]=~/[35]/} 0..$#n'\n1 3 11 13 21 23\n\n$ # compare values\n$ s='23 756 -983 5'\n$ echo \"$s\" | perl -lane 'print join \" \", grep $_<100, @F'\n23 -983 5\n\n$ # filters only those elements with successful substitution\n$ # note that it would modify array elements as well\n$ echo \"$s\" | perl -lane 'print join \" \", grep s/3/E/, @F'\n2E -98E\n```\n\n* more examples\n\n```bash\n$ # filtering column(s) based on header\n$ perl -lane '@i = grep {$F[$_] eq \"Name\"} 0..$#F if $.==1;\n              print @F[@i]' marks.txt\nName\nRaj\nJoel\nMoi\nSurya\nTia\nOm\nAmy\n\n$ cat split.txt\nfoo,1:2:5,baz\nwry,4,look\nfree,3:8,oh\n$ # print line if more than one column has a digit\n$ perl -F: -lane 'print if (grep /\\d/, @F) > 1' split.txt\nfoo,1:2:5,baz\nfree,3:8,oh\n```\n\n* to get random element from array\n\n```bash\n$ s='65 23 756 -983 5'\n$ echo \"$s\" | perl -lane 'print $F[rand @F]'\n5\n$ echo \"$s\" | perl -lane 'print $F[rand @F]'\n23\n$ echo \"$s\" | perl -lane 'print $F[rand @F]'\n-983\n\n$ # in scalar context, size of array gets passed to rand\n$ # rand actually returns a float\n$ # which then gets converted to int index\n```\n\n<br>\n\n#### <a name=\"sorting\"></a>Sorting\n\n* See [perldoc - sort](https://perldoc.perl.org/functions/sort.html) for details\n* `$a` and `$b` are special variables used for sorting, avoid using them as user defined variables\n\n```bash\n$ # by default, sort does string comparison\n$ s='foo baz v22 aimed'\n$ echo \"$s\" | perl -lane 'print join \" \", sort @F'\naimed baz foo v22\n\n$ # same as default sort\n$ echo \"$s\" | perl -lane 'print join \" \", sort {$a cmp $b} @F'\naimed baz foo v22\n$ # descending order, note how $a and $b are switched\n$ echo \"$s\" | perl -lane 'print join \" \", sort {$b cmp $a} @F'\nv22 foo baz aimed\n\n$ # functions can be used for custom sorting\n$ # lc lowercases 
string, so this sorts case insensitively\n$ perl -lane 'print join \" \", sort {lc $a cmp lc $b} @F' poem.txt\nare red, Roses\nare blue, Violets\nis Sugar sweet,\nAnd are so you.\n```\n\n* sorting characters within word\n\n```bash\n$ echo 'foobar' | perl -F -lane 'print sort @F'\nabfoor\n\n$ cat words.txt\nbot\nart\nare\nboat\ntoe\nflee\nreed\n\n$ # words with characters in ascending order\n$ perl -F -lane 'print if (join \"\", sort @F) eq $_' words.txt\nbot\nart\n\n$ # words with characters in descending order\n$ perl -F -lane 'print if (join \"\", sort {$b cmp $a} @F) eq $_' words.txt\ntoe\nreed\n```\n\n* for numeric comparison, use `<=>` instead of `cmp`\n\n```bash\n$ s='23 756 -983 5'\n$ echo \"$s\" | perl -lane 'print join \" \",sort {$a <=> $b} @F'\n-983 5 23 756\n$ echo \"$s\" | perl -lane 'print join \" \",sort {$b <=> $a} @F'\n756 23 5 -983\n\n$ # sorting strings based on their length\n$ s='floor bat to dubious four'\n$ echo \"$s\" | perl -lane 'print join \":\",sort {length $a <=> length $b} @F'\nto:bat:four:floor:dubious\n```\n\n* sorting columns based on header\n\n```bash\n$ # need to get indexes of order required for header, then use it for all lines\n$ perl -lane '@i = sort {$F[$a] cmp $F[$b]} 0..$#F if $.==1;\n              print join \"\\t\", @F[@i]' marks.txt\nDept    Marks   Name\nECE     53      Raj\nECE     72      Joel\nEEE     68      Moi\nCSE     81      Surya\nEEE     59      Tia\nECE     92      Om\nCSE     67      Amy\n\n$ perl -lane '@i = sort {$F[$b] cmp $F[$a]} 0..$#F if $.==1;\n              print join \"\\t\", @F[@i]' marks.txt\nName    Marks   Dept\nRaj     53      ECE\nJoel    72      ECE\nMoi     68      EEE\nSurya   81      CSE\nTia     59      EEE\nOm      92      ECE\nAmy     67      CSE\n```\n\n**Further Reading**\n\n* [perldoc - How do I sort a hash (optionally by value instead of key)?](https://perldoc.perl.org/perlfaq4.html#How-do-I-sort-a-hash-(optionally-by-value-instead-of-key)%3f)\n* [stackoverflow - sort the keys of a 
hash by value](https://stackoverflow.com/questions/10901084/how-to-sort-perl-hash-on-values-and-order-the-keys-correspondingly-in-two-array)\n* [stackoverflow - sort only from 2nd field, ignore header](https://stackoverflow.com/questions/48920626/sort-rows-in-csv-file-without-header-first-column)\n* [stackoverflow - sort based on group of lines](https://stackoverflow.com/questions/48925359/sorting-groups-of-lines)\n\n<br>\n\n#### <a name=\"transforming\"></a>Transforming\n\n* shuffling list elements\n\n```bash\n$ s='23 756 -983 5'\n$ # note that this doesn't change the input array\n$ echo \"$s\" | perl -MList::Util=shuffle -lane 'print join \" \", shuffle @F'\n756 23 -983 5\n$ echo \"$s\" | perl -MList::Util=shuffle -lane 'print join \" \", shuffle @F'\n5 756 23 -983\n\n$ # randomizing file contents\n$ perl -MList::Util=shuffle -e 'print shuffle <>' poem.txt\nSugar is sweet,\nAnd so are you.\nViolets are blue,\nRoses are red,\n\n$ # or if shuffle order is known\n$ seq 5 | perl -e '@lines=<>; print @lines[3,1,0,2,4]'\n4\n2\n1\n3\n5\n```\n\n* use `map` to transform every element\n\n```bash\n$ echo '23 756 -983 5' | perl -lane 'print join \" \", map {$_*$_} @F'\n529 571536 966289 25\n$ echo 'a b c' | perl -lane 'print join \",\", map {qq/\"$_\"/} @F'\n\"a\",\"b\",\"c\"\n$ echo 'a b c' | perl -lane 'print join \",\", map {uc qq/\"$_\"/} @F'\n\"A\",\"B\",\"C\"\n\n$ # changing the array itself\n$ perl -le '@s=(4, 245, 12); map {$_*$_} @s; print join \" \", @s'\n4 245 12\n$ perl -le '@s=(4, 245, 12); map {$_ = $_*$_} @s; print join \" \", @s'\n16 60025 144\n\n$ # ASCII int values for each character\n$ echo 'AaBbCc' | perl -F -lane 'print join \" \", map ord, @F'\n65 97 66 98 67 99\n\n$ s='this is a sample sentence'\n$ # shuffle each word, split here converts each element to character array\n$ # join the characters after shuffling with empty string\n$ # finally print each changed element with space as separator\n$ echo \"$s\" | perl -MList::Util=shuffle -lane '$,=\" \";\n  
                  print map {join \"\", shuffle split//} @F;'\ntshi si a mleasp ncstneee\n```\n\n* fun little unreadable script...\n\n```bash\n$ cat para.txt\nWhy cannot I go back to my ignorant days with wild imaginations and fantasies?\nPerhaps the answer lies in not being able to adapt to my freedom.\nThose little dreams, goal setting, anticipation of results, used to be my world.\nAll joy within the soul and less dependent on outside world.\nBut all these are absent for a long time now.\nHope I can wake those dreams all over again.\n\n$ perl -MList::Util=shuffle -F'/([^a-zA-Z]+)/' -lane '\n        print map {@c=split//; $#c<3 || /[^a-zA-Z]/? $_ :\n              join \"\",$c[0],(shuffle @c[1..$#c-1]),$c[-1]} @F;' para.txt\nWhy coannt I go back to my inoagrnt dyas wtih wild imiaintangos and fatenasis?\nPhearps the awsenr lies in not bieng albe to aadpt to my fedoerm.\nToshe llttie draems, goal stetnig, aaioiciptntn of rtuelss, uesd to be my wrlod.\nAll joy witihn the suol and less dnenepedt on oiduste world.\nBut all tsehe are abenst for a lnog tmie now.\nHpoe I can wkae toshe daemrs all over aiagn.\n```\n\n* reverse array\n* See also [stackoverflow - apply tr and reverse to particular column](https://stackoverflow.com/questions/45571828/execute-bash-command-inside-awk-and-print-command-output/45572038#45572038)\n\n```bash\n$ s='23 756 -983 5'\n$ echo \"$s\" | perl -lane 'print join \" \", reverse @F'\n5 -983 756 23\n\n$ echo 'foobar' | perl -lne 'print reverse split//'\nraboof\n$ # can also use scalar context instead of using split\n$ echo 'foobar' | perl -lne '$x=reverse; print $x'\nraboof\n$ echo 'foobar' | perl -lne 'print scalar reverse'\nraboof\n```\n\n<br>\n\n## <a name=\"miscellaneous\"></a>Miscellaneous\n\n<br>\n\n#### <a name=\"split\"></a>split\n\n* the `-a` command line option uses `split` and automatically saves the results in `@F` array\n* default separator is `\\s+`\n* by default acts on `$_`\n* and by default all splits are performed\n* See also 
[perldoc - split function](https://perldoc.perl.org/functions/split.html)\n\n```bash\n$ echo 'a 1 b 2 c' | perl -lane 'print $F[2]'\nb\n$ echo 'a 1 b 2 c' | perl -lne '@x=split; print $x[2]'\nb\n$ # temp variable can be avoided by using list context\n$ echo 'a 1 b 2 c' | perl -lne 'print join \":\", (split)[2,-1]'\nb:c\n\n$ # using digits as separator\n$ echo 'a 1 b 2 c' | perl -lne '@x=split /\\d+/; print \":$x[1]:\"'\n: b :\n\n$ # specifying maximum number of splits\n$ echo 'a 1 b 2 c' | perl -lne '@x=split /\\h+/,$_,2; print \"$x[0]:$x[1]:\"'\na:1 b 2 c:\n$ # specifying limit using -F option\n$ echo 'a 1 b 2 c' | perl -F'/\\h+/,$_,2' -lane 'print \"$F[0]:$F[1]:\"'\na:1 b 2 c:\n```\n\n* by default, trailing empty fields are stripped\n* specify a negative value to preserve trailing empty fields\n\n```bash\n$ echo ':123::' | perl -lne 'print scalar split /:/'\n2\n$ echo ':123::' | perl -lne 'print scalar split /:/,$_,-1'\n4\n\n$ echo ':123::' | perl -F: -lane 'print scalar @F'\n2\n$ echo ':123::' | perl -F'/:/,$_,-1' -lane 'print scalar @F'\n4\n```\n\n* to save the separators as well, use capture groups\n\n```bash\n$ echo 'a 1 b 2 c' | perl -lne '@x=split /(\\d+)/; print \"$x[1],$x[3]\"'\n1,2\n$ # or, without the temp variable\n$ echo 'a 1 b 2 c' | perl -lne 'print join \",\", (split /(\\d+)/)[1,3]'\n1,2\n\n$ # same can be done for -F option\n$ echo 'a 1 b 2 c' | perl -F'(\\d+)' -lane 'print \"$F[1],$F[3]\"'\n1,2\n```\n\n* single line to multiple line by splitting a column\n\n```bash\n$ cat split.txt\nfoo,1:2:5,baz\nwry,4,look\nfree,3:8,oh\n\n$ perl -F, -ane 'print join \",\", $F[0],$_,$F[2] for split /:/,$F[1]' split.txt\nfoo,1,baz\nfoo,2,baz\nfoo,5,baz\nwry,4,look\nfree,3,oh\nfree,8,oh\n```\n\n* weird behavior if literal space character is used with `-F` option\n\n```bash\n$ # only one element in @F array\n$ echo 'a 1 b 2 c' | perl -F'/b /' -lane 'print $F[1]'\n\n$ # space not being used by separator\n$ echo 'a 1 b 2 c' | perl -F'b ' -lane 'print $F[1]'\n 2 c\n$ 
# correct behavior\n$ echo 'a 1 b 2 c' | perl -F'b\\x20' -lane 'print $F[1]'\n2 c\n\n$ # errors out if space used inside character class\n$ echo 'a 1 b 2 c' | perl -F'/b[ ]/' -lane 'print $F[1]'\nUnmatched [ in regex; marked by <-- HERE in m//b[ <-- HERE /.\n$ echo 'a 1 b 2 c' | perl -lne '@x=split /b[ ]/; print $x[1]'\n2 c\n```\n\n<br>\n\n#### <a name=\"fixed-width-processing\"></a>Fixed width processing\n\n```bash\n$ # here 'a' indicates arbitrary binary data\n$ # the number that follows indicates length\n$ # the 'x' indicates characters to ignore, use length after 'x' if needed\n$ # and there are many other formats, see perldoc for details\n$ echo 'b 123 good' | perl -lne '@x = unpack(\"a1xa3xa4\", $_); print $x[0]'\nb\n$ echo 'b 123 good' | perl -lne '@x = unpack(\"a1xa3xa4\", $_); print $x[1]'\n123\n$ echo 'b 123 good' | perl -lne '@x = unpack(\"a1xa3xa4\", $_); print $x[2]'\ngood\n\n$ # unpack not always needed, can simply capture characters needed\n$ echo 'b 123 good' | perl -lne 'print /.{2}(.{3})/'\n123\n$ # or use substr to specify offset (starts from 0) and length\n$ echo 'b 123 good' | perl -lne 'print substr $_, 6, 4'\ngood\n\n$ # substr can also be used for replacing\n$ echo 'b 123 good' | perl -lpe 'substr $_, 2, 3, \"gleam\"'\nb gleam good\n```\n\n**Further Reading**\n\n* [perldoc - tutorial on pack and unpack](https://perldoc.perl.org/perlpacktut.html)\n* [perldoc - substr](https://perldoc.perl.org/functions/substr.html)\n* [stackoverflow - extract columns from a fixed-width format](https://stackoverflow.com/questions/1494611/how-can-i-extract-columns-from-a-fixed-width-format-in-perl)\n* [stackoverflow - build fixed-width template from header](https://stackoverflow.com/questions/4911044/parse-fixed-width-files)\n* [stackoverflow - convert fixed-width to delimited format](https://stackoverflow.com/questions/43734981/display-column-from-empty-column-delimited-space-in-bash)\n\n<br>\n\n#### <a name=\"string-and-file-replication\"></a>String and file 
replication\n\n```bash\n$ # replicate each line\n$ seq 2 | perl -ne 'print $_ x 2'\n1\n1\n2\n2\n\n$ # replicate a string\n$ perl -le 'print \"abc\" x 5'\nabcabcabcabcabc\n\n$ # works for lists too\n$ perl -le '@x = (3, 2, 1) x 2; print join \" \",@x'\n3 2 1 3 2 1\n\n$ # replicating file\n$ wc -c poem.txt\n65 poem.txt\n$ perl -0777 -ne 'print $_ x 100' poem.txt | wc -c\n6500\n```\n\n* the [perldoc - glob](https://perldoc.perl.org/functions/glob.html) function can be hacked to generate combinations of strings\n\n```bash\n$ # typical use case\n$ # same as: echo *.log\n$ perl -le 'print join \" \", glob q/*.log/'\nreport.log\n$ # same as: echo *.{log,pl}\n$ perl -le 'print join \" \", glob q/*.{log,pl}/'\nreport.log code.pl sub_sq.pl\n\n$ # hacking\n$ # same as: echo {1,3}{a,b}\n$ perl -le '@x=glob q/{1,3}{a,b}/; print \"@x\"'\n1a 1b 3a 3b\n$ # same as: echo {1,3}{1,3}{1,3}\n$ perl -le '@x=glob \"{1,3}\" x 3; print \"@x\"'\n111 113 131 133 311 313 331 333\n```\n\n<br>\n\n#### <a name=\"transliteration\"></a>transliteration\n\n* See `tr` under [perldoc - Quote-Like Operators](https://perldoc.perl.org/perlop.html#Quote-Like-Operators) section for details\n* similar to substitution, by default `tr` acts on `$_` variable and modifies it unless `r` modifier is specified\n* however, characters `$` and `@` are treated as literals - i.e no interpolation\n* similar to `sed`, one can also use `y` instead of `tr`\n\n```bash\n$ # one-to-one mapping of characters, all occurrences are translated\n$ echo 'foo bar cat baz' | perl -pe 'tr/abc/123/'\nfoo 21r 31t 21z\n\n$ # use - to represent a range in ascending order\n$ echo 'Hello World' | perl -pe 'tr/a-zA-Z/n-za-mN-ZA-M/'\nUryyb Jbeyq\n$ echo 'Uryyb Jbeyq' | perl -pe 'tr|a-zA-Z|n-za-mN-ZA-M|'\nHello World\n```\n\n* if arguments are of different lengths\n\n```bash\n$ # when second argument is longer, the extra characters are ignored\n$ echo 'foo bar cat baz' | perl -pe 'tr/abc/1-9/'\nfoo 21r 31t 21z\n\n$ # when first argument is 
longer\n$ # the last character of second argument gets padded to make it equal\n$ echo 'foo bar cat baz' | perl -pe 'tr/a-z/123/'\n333 213 313 213\n```\n\n* modifiers\n\n```bash\n$ # no padding, absent mappings are deleted\n$ echo 'fob bar cat baz' | perl -pe 'tr/a-z/123/d'\n2 21 31 21\n$ echo 'Hello:123:World' | perl -pe 'tr/a-z//d'\nH:123:W\n\n$ # c modifier complements first argument characters\n$ echo 'Hello:123:World' | perl -lpe 'tr/a-z//cd'\nelloorld\n\n$ # s modifier to keep only one copy of repeated characters\n$ echo 'FFoo seed 11233' | perl -pe 'tr/a-z//s'\nFFo sed 11233\n$ # when replacement is done as well, only replaced characters are squeezed\n$ # unlike 'tr -s' which squeezes characters specified by second argument\n$ echo 'FFoo seed 11233' | perl -pe 'tr/A-Z/a-z/s'\nfoo seed 11233\n\n$ perl -e '$x=\"food\"; $y=$x=~tr/a-z/A-Z/r; print \"x=$x\\ny=$y\\n\"'\nx=food\ny=FOOD\n```\n\n* since `-` is used for character ranges, place it at the start/end to represent it literally\n* similarly, to represent `\\` literally, use `\\\\`\n\n```bash\n$ echo '/foo-bar/baz/report' | perl -pe 'tr/-a-z/_A-Z/'\n/FOO_BAR/BAZ/REPORT\n\n$ echo '/foo-bar/baz/report' | perl -pe 'tr|/-|\\\\_|'\n\\foo_bar\\baz\\report\n```\n\n* return value is number of replacements made\n\n```bash\n$ echo 'Hello there. How are you?' | grep -o '[a-z]' | wc -l\n17\n\n$ echo 'Hello there. How are you?' | perl -lne 'print tr/a-z//'\n17\n```\n\n* unicode examples\n\n```bash\n$ echo 'hello!' | perl -CS -pe 'tr/a-z/\\x{1d5ee}-\\x{1d607}/'\n𝗵𝗲𝗹𝗹𝗼!\n\n$ echo 'How are you?' 
| perl -Mopen=locale -Mutf8 -pe 'tr/a-zA-Z/𝗮-𝘇𝗔-𝗭/'\n𝗛𝗼𝘄 𝗮𝗿𝗲 𝘆𝗼𝘂?\n```\n\n<br>\n\n#### <a name=\"executing-external-commands\"></a>Executing external commands\n\n* External commands can be issued using `system` function\n* Output would be as usual on `stdout` unless redirected while calling the command\n\n```bash\n$ perl -e 'system(\"echo Hello World\")'\nHello World\n$ # use q operator to avoid interpolation\n$ perl -e 'system q/echo $HOME/'\n/home/learnbyexample\n\n$ perl -e 'system q/wc poem.txt/'\n 4 13 65 poem.txt\n\n$ perl -e 'system q/seq 10 | paste -sd, > out.txt/'\n$ cat out.txt\n1,2,3,4,5,6,7,8,9,10\n\n$ cat f2\nI bought two bananas and three mangoes\n$ echo 'f1,f2,odd.txt' | perl -F, -lane 'system \"cat $F[1]\"'\nI bought two bananas and three mangoes\n```\n\n* return value of `system` will have exit status information or `$?` can be used\n* see [perldoc - system](https://perldoc.perl.org/functions/system.html) for details\n\n```bash\n$ perl -le '$es=system q/ls poem.txt/; print \"$es\"'\npoem.txt\n0\n$ perl -le 'system q/ls poem.txt/; print \"exit status: $?\"'\npoem.txt\nexit status: 0\n\n$ perl -le 'system q/ls xyz.txt/; print \"exit status: $?\"'\nls: cannot access 'xyz.txt': No such file or directory\nexit status: 512\n```\n\n* to save result of external command, use backticks or `qx` operator\n* newline gets saved too, use `chomp` if needed\n\n```bash\n$ perl -e '$lines = `wc -l < poem.txt`; print $lines'\n4\n$ perl -e '$nums = qx/seq 3/; print $nums'\n1\n2\n3\n```\n\n* See also [stackoverflow - difference between backticks, system, exec and open](https://stackoverflow.com/questions/799968/whats-the-difference-between-perls-backticks-system-and-exec)\n\n<br>\n\n## <a name=\"further-reading\"></a>Further Reading\n\n* Manual and related\n    * [perldoc - overview](https://perldoc.perl.org/index-overview.html)\n    * [perldoc - faqs](https://perldoc.perl.org/index-faq.html)\n    * [perldoc - tutorials](https://perldoc.perl.org/index-tutorials.html)\n  
  * [perldoc - functions](https://perldoc.perl.org/index-functions.html)\n    * [perldoc - special variables](https://perldoc.perl.org/perlvar.html)\n    * [perldoc - perlretut](https://perldoc.perl.org/perlretut.html)\n* Tutorials and Q&A\n    * [Perl one-liners explained](http://www.catonmat.net/series/perl-one-liners-explained)\n    * [perl Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/perl?sort=votes&pageSize=15)\n    * [regex FAQ on SO](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean)\n    * [regexone](https://regexone.com/) - interactive tutorial\n    * [regexcrossword](https://regexcrossword.com/) - practice by solving crosswords, read 'How to play' section before you start\n* Alternatives\n    * [bioperl](http://bioperl.org/howtos/index.html)\n    * [ruby](https://www.ruby-lang.org/en/)\n    * [unix.stackexchange - When to use grep, sed, awk, perl, etc](https://unix.stackexchange.com/questions/303044/when-to-use-grep-less-awk-sed)\n\n"
  },
  {
    "path": "restructure_text.md",
    "content": "# <a name=\"restructure-text\"></a>Restructure text\n\n**Table of Contents**\n\n* [paste](#paste)\n    * [Concatenating files column wise](#concatenating-files-column-wise)\n    * [Interleaving lines](#interleaving-lines)\n    * [Lines to multiple columns](#lines-to-multiple-columns)\n    * [Different delimiters between columns](#different-delimiters-between-columns)\n    * [Multiple lines to single row](#multiple-lines-to-single-row)\n    * [Further reading for paste](#further-reading-for-paste)\n* [column](#column)\n    * [Pretty printing tables](#pretty-printing-tables)\n    * [Specifying different input delimiter](#specifying-different-input-delimiter)\n    * [Further reading for column](#further-reading-for-column)\n* [pr](#pr)\n    * [Converting lines to columns](#converting-lines-to-columns)\n    * [Changing PAGE_WIDTH](#changing-page_width)\n    * [Combining multiple input files](#combining-multiple-input-files)\n    * [Transposing a table](#transposing-a-table)\n    * [Further reading for pr](#further-reading-for-pr)\n* [fold](#fold)\n    * [Examples](#examples)\n    * [Further reading for fold](#further-reading-for-fold)\n\n<br>\n\n## <a name=\"paste\"></a>paste\n\n```bash\n$ paste --version | head -n1\npaste (GNU coreutils) 8.25\n\n$ man paste\nPASTE(1)                         User Commands                        PASTE(1)\n\nNAME\n       paste - merge lines of files\n\nSYNOPSIS\n       paste [OPTION]... 
[FILE]...\n\nDESCRIPTION\n       Write  lines  consisting  of  the sequentially corresponding lines from\n       each FILE, separated by TABs, to standard output.\n\n       With no FILE, or when FILE is -, read standard input.\n...\n```\n\n<br>\n\n#### <a name=\"concatenating-files-column-wise\"></a>Concatenating files column wise\n\n* By default, `paste` adds a TAB between corresponding lines of input files\n\n```bash\n$ paste colors_1.txt colors_2.txt\nBlue    Black\nBrown   Blue\nPurple  Green\nRed     Red\nTeal    White\n```\n\n* Specifying a different delimiter using `-d`\n* The `<()` syntax is [Process Substitution](http://mywiki.wooledge.org/ProcessSubstitution)\n    * to put it simply - allows output of command to be passed as input file to another command without needing to manually create a temporary file\n\n```bash\n$ paste -d, <(seq 5) <(seq 6 10)\n1,6\n2,7\n3,8\n4,9\n5,10\n\n$ # empty cells if number of lines is not same for all input files\n$ # -d\\| can also be used\n$ paste -d'|' <(seq 3) <(seq 4 6) <(seq 7 10)\n1|4|7\n2|5|8\n3|6|9\n||10\n```\n\n* to paste without any character in between, use `\\0` as delimiter\n    * note that `\\0` here doesn't mean the ASCII NUL character\n    * can also use `-d ''` with `GNU paste`\n\n```bash\n$ paste -d'\\0' <(seq 3) <(seq 6 8)\n16\n27\n38\n```\n\n<br>\n\n#### <a name=\"interleaving-lines\"></a>Interleaving lines\n\n* Interleave lines by using newline as delimiter\n\n```bash\n$ paste -d'\\n' <(seq 11 13) <(seq 101 103)\n11\n101\n12\n102\n13\n103\n```\n\n<br>\n\n#### <a name=\"lines-to-multiple-columns\"></a>Lines to multiple columns\n\n* Number of `-` specified determines number of output columns\n* Input lines can be passed only as stdin\n\n```bash\n$ # single column to two columns\n$ seq 10 | paste -d, - -\n1,2\n3,4\n5,6\n7,8\n9,10\n\n$ # single column to five columns\n$ seq 10 | paste -d: - - - - -\n1:2:3:4:5\n6:7:8:9:10\n\n$ # input redirection for file input\n$ paste -d, - - < 
colors_1.txt\nBlue,Brown\nPurple,Red\nTeal,\n```\n\n* Use `printf` trick if number of columns to specify is too large\n\n```bash\n$ # prompt at end of line not shown for simplicity\n$ printf -- \"- %.s\" {1..5}\n- - - - - \n\n$ seq 10 | paste -d, $(printf -- \"- %.s\" {1..5})\n1,2,3,4,5\n6,7,8,9,10\n```\n\n<br>\n\n#### <a name=\"different-delimiters-between-columns\"></a>Different delimiters between columns\n\n* For more than 2 columns, different delimiter character can be specified - passed as list to `-d` option\n\n```bash\n$ # , is used between 1st and 2nd column\n$ # - is used between 2nd and 3rd column\n$ paste -d',-' <(seq 3) <(seq 4 6) <(seq 7 9)\n1,4-7\n2,5-8\n3,6-9\n\n$ # re-use list from beginning if not specified for all columns\n$ paste -d',-' <(seq 3) <(seq 4 6) <(seq 7 9) <(seq 10 12)\n1,4-7,10\n2,5-8,11\n3,6-9,12\n$ # another example\n$ seq 10 | paste -d':,' - - - - -\n1:2,3:4,5\n6:7,8:9,10\n\n$ # so, with single delimiter, it is just re-used for all columns\n$ paste -d, <(seq 3) <(seq 4 6) <(seq 7 9) <(seq 10 12)\n1,4,7,10\n2,5,8,11\n3,6,9,12\n```\n\n* combination of `-d` and `/dev/null` (empty file) can give multi-character separation between columns\n* If this is too confusing to use, consider [pr](#pr) instead\n\n```bash\n$ paste -d' : ' <(seq 3) /dev/null /dev/null <(seq 4 6) /dev/null /dev/null <(seq 7 9)\n1 : 4 : 7\n2 : 5 : 8\n3 : 6 : 9\n\n$ # or just use pr instead\n$ pr -mts' : ' <(seq 3) <(seq 4 6) <(seq 7 9)\n1 : 4 : 7\n2 : 5 : 8\n3 : 6 : 9\n\n$ # but paste would allow different delimiters ;)\n$ paste -d' :  - ' <(seq 3) /dev/null /dev/null <(seq 4 6) /dev/null /dev/null <(seq 7 9)\n1 : 4 - 7\n2 : 5 - 8\n3 : 6 - 9\n\n$ # pr would need two invocations\n$ pr -mts' : ' <(seq 3) <(seq 4 6) | pr -mts' - ' - <(seq 7 9)\n1 : 4 - 7\n2 : 5 - 8\n3 : 6 - 9\n```\n\n* example to show using empty file instead of `/dev/null`\n\n```bash\n$ # assuming file named e doesn't exist\n$ touch e\n$ # or use this, will empty contents even if file named e already 
exists :P\n$ > e\n\n$ paste -d' :  - ' <(seq 3) e e <(seq 4 6) e e <(seq 7 9)\n1 : 4 - 7\n2 : 5 - 8\n3 : 6 - 9\n```\n\n<br>\n\n#### <a name=\"multiple-lines-to-single-row\"></a>Multiple lines to single row\n\n```bash\n$ paste -sd, colors_1.txt\nBlue,Brown,Purple,Red,Teal\n\n$ # multiple files each gets a row\n$ paste -sd: colors_1.txt colors_2.txt\nBlue:Brown:Purple:Red:Teal\nBlack:Blue:Green:Red:White\n\n$ # multiple input files need not have same number of lines\n$ paste -sd, <(seq 3) <(seq 5 9)\n1,2,3\n5,6,7,8,9\n```\n\n* Often used to serialize multiple line output from another command\n\n```bash\n$ sort -u colors_1.txt colors_2.txt | paste -sd,\nBlack,Blue,Brown,Green,Purple,Red,Teal,White\n```\n\n* For multiple character delimiter, post-process if separator is unique or use another tool like `perl`\n\n```bash\n$ seq 10 | paste -sd,\n1,2,3,4,5,6,7,8,9,10\n\n$ # post-process\n$ seq 10 | paste -sd, | sed 's/,/ : /g'\n1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : 10\n\n$ # using perl alone\n$ seq 10 | perl -pe 's/\\n/ : / if(!eof)'\n1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : 10\n```\n\n<br>\n\n#### <a name=\"further-reading-for-paste\"></a>Further reading for paste\n\n* `man paste` and `info paste` for more options and detailed documentation\n* [paste Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/paste?sort=votes&pageSize=15)\n\n<br>\n\n## <a name=\"column\"></a>column\n\n```bash\nCOLUMN(1)                 BSD General Commands Manual                COLUMN(1)\n\nNAME\n     column — columnate lists\n\nSYNOPSIS\n     column [-entx] [-c columns] [-s sep] [file ...]\n\nDESCRIPTION\n     The column utility formats its input into multiple columns.  Rows are\n     filled before columns.  Input is taken from file operands, or, by\n     default, from the standard input.  
Empty lines are ignored unless the -e\n     option is used.\n...\n```\n\n<br>\n\n#### <a name=\"pretty-printing-tables\"></a>Pretty printing tables\n\n* by default whitespace is input delimiter\n\n```bash\n$ cat dishes.txt\nNorth alootikki baati khichdi makkiroti poha\nSouth appam bisibelebath dosa koottu sevai\nWest dhokla khakhra modak shiro vadapav\nEast handoguri litti momo rosgulla shondesh\n\n$ column -t dishes.txt\nNorth  alootikki  baati         khichdi  makkiroti  poha\nSouth  appam      bisibelebath  dosa     koottu     sevai\nWest   dhokla     khakhra       modak    shiro      vadapav\nEast   handoguri  litti         momo     rosgulla   shondesh\n```\n\n* often useful to get neatly aligned columns from output of another command\n\n```bash\n$ paste fruits.txt price.txt\nFruits  Price\napple   182\nguava   90\nwatermelon      35\nbanana  72\npomegranate     280\n\n$ paste fruits.txt price.txt | column -t\nFruits       Price\napple        182\nguava        90\nwatermelon   35\nbanana       72\npomegranate  280\n```\n\n<br>\n\n#### <a name=\"specifying-different-input-delimiter\"></a>Specifying different input delimiter\n\n* Use `-s` to specify input delimiter\n* Use `-n` to prevent merging empty cells\n    * From `man column` \"This option is a Debian GNU/Linux extension\"\n\n```bash\n$ paste -d, <(seq 3) <(seq 5 9) <(seq 11 13)\n1,5,11\n2,6,12\n3,7,13\n,8,\n,9,\n\n$ paste -d, <(seq 3) <(seq 5 9) <(seq 11 13) | column -s, -t\n1  5  11\n2  6  12\n3  7  13\n8\n9\n\n$ paste -d, <(seq 3) <(seq 5 9) <(seq 11 13) | column -s, -nt\n1  5  11\n2  6  12\n3  7  13\n   8  \n   9  \n```\n\n<br>\n\n#### <a name=\"further-reading-for-column\"></a>Further reading for column\n\n* `man column` for more options and detailed documentation\n* [column Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/columns?sort=votes&pageSize=15)\n* More examples [here](http://www.commandlinefu.com/commands/using/column/sort-by-votes)\n\n<br>\n\n## <a 
name=\"pr\"></a>pr\n\n```bash\n$ pr --version | head -n1\npr (GNU coreutils) 8.25\n\n$ man pr\nPR(1)                            User Commands                           PR(1)\n\nNAME\n       pr - convert text files for printing\n\nSYNOPSIS\n       pr [OPTION]... [FILE]...\n\nDESCRIPTION\n       Paginate or columnate FILE(s) for printing.\n\n       With no FILE, or when FILE is -, read standard input.\n...\n```\n\n* `Paginate` is not covered, examples related only to `columnate`\n* For example, default invocation on a file would add a header, etc\n\n```bash\n$ # truncated output shown\n$ pr fruits.txt\n\n\n2017-04-21 17:49                    fruits.txt                    Page 1\n\n\nFruits\napple\nguava\nwatermelon\nbanana\npomegranate\n\n```\n\n* Following sections will use `-t` to omit page headers and trailers\n\n<br>\n\n#### <a name=\"converting-lines-to-columns\"></a>Converting lines to columns\n\n* With [paste](#lines-to-multiple-columns), changing input file rows to column(s) is possible only with consecutive lines\n* `pr` can do that as well as split entire file itself according to number of columns needed\n* And `-s` option in `pr` allows multi-character output delimiter\n* As usual, examples to better show the functionalities\n\n```bash\n$ # note how the input got split into two and resulting splits joined by ,\n$ seq 6 | pr -2ts,\n1,4\n2,5\n3,6\n\n$ # note how two consecutive lines gets joined by ,\n$ seq 6 | paste -d, - -\n1,2\n3,4\n5,6\n```\n\n* Default **PAGE_WIDTH** is 72 characters, so each column gets 72 divided by number of columns unless `-s` is used\n\n```bash\n$ # 3 columns, so each column width is 24 characters\n$ seq 9 | pr -3t\n1                       4                       7\n2                       5                       8\n3                       6                       9\n\n$ # using -s, desired delimiter can be specified\n$ seq 9 | pr -3ts' '\n1 4 7\n2 5 8\n3 6 9\n\n$ seq 9 | pr -3ts' : '\n1 : 4 : 7\n2 : 5 : 8\n3 : 6 : 9\n\n$ # default 
is TAB when using -s option with no arguments\n$ seq 9 | pr -3ts\n1       4       7\n2       5       8\n3       6       9\n```\n\n* Using `-a` to change consecutive rows, similar to `paste`\n\n```bash\n$ seq 8 | pr -4ats:\n1:2:3:4\n5:6:7:8\n\n$ # no output delimiter for empty cells\n$ seq 22 | pr -5ats,\n1,2,3,4,5\n6,7,8,9,10\n11,12,13,14,15\n16,17,18,19,20\n21,22\n\n$ # note output delimiter even for empty cells\n$ seq 22 | paste -d, - - - - -\n1,2,3,4,5\n6,7,8,9,10\n11,12,13,14,15\n16,17,18,19,20\n21,22,,,\n```\n\n<br>\n\n#### <a name=\"changing-page_width\"></a>Changing PAGE_WIDTH\n\n* The default PAGE_WIDTH is 72\n* The formula `(col-1)*len(delimiter) + col` seems to work in determining minimum PAGE_WIDTH required for multiple column output\n    * `col` is number of columns required\n\n```bash\n$ # (36-1)*1 + 36 = 71, so within PAGE_WIDTH limit\n$ seq 74 | pr -36ats,\n1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36\n37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72\n73,74\n$ # (37-1)*1 + 37 = 73, more than default PAGE_WIDTH limit\n$ seq 74 | pr -37ats,\npr: page width too narrow\n```\n\n* Use `-w` to specify a different PAGE_WIDTH\n* The `-J` option turns off truncation\n\n```bash\n$ # (37-1)*1 + 37 = 73\n$ seq 74 | pr -J -w73 -37ats,\n1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37\n38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74\n\n$ # (3-1)*4 + 3 = 11\n$ seq 6 | pr -J -w10 -3ats'::::'\npr: page width too narrow\n$ seq 6 | pr -J -w11 -3ats'::::'\n1::::2::::3\n4::::5::::6\n\n$ # if calculating is difficult, simply use a large number\n$ seq 6 | pr -J -w500 -3ats'::::'\n1::::2::::3\n4::::5::::6\n```\n\n<br>\n\n#### <a name=\"combining-multiple-input-files\"></a>Combining multiple input files\n\n* Use `-m` option to combine multiple files 
in parallel, similar to `paste`\n\n```bash\n$ # 2 columns, so each column width is 36 characters\n$ pr -mt fruits.txt price.txt\nFruits                              Price\napple                               182\nguava                               90\nwatermelon                          35\nbanana                              72\npomegranate                         280\n\n$ # default is TAB when using -s option with no arguments\n$ pr -mts <(seq 3) <(seq 4 6) <(seq 7 10)\n1       4       7\n2       5       8\n3       6       9\n                10\n\n$ # double TAB as separator\n$ # shell expands $'\\t\\t' before command is executed\n$ pr -mts$'\\t\\t' colors_1.txt colors_2.txt\nBlue            Black\nBrown           Blue\nPurple          Green\nRed             Red\nTeal            White\n```\n\n* For interleaving, specify newline as separator\n\n```bash\n$ pr -mts$'\\n' fruits.txt price.txt\nFruits\nPrice\napple\n182\nguava\n90\nwatermelon\n35\nbanana\n72\npomegranate\n280\n```\n\n<br>\n\n#### <a name=\"transposing-a-table\"></a>Transposing a table\n\n```bash\n$ # delimiter is single character, so easy to use tr to change it to newline\n$ cat dishes.txt\nNorth alootikki baati khichdi makkiroti poha\nSouth appam bisibelebath dosa koottu sevai\nWest dhokla khakhra modak shiro vadapav\nEast handoguri litti momo rosgulla shondesh\n\n$ # 4 columns, so each column width is 18 characters\n$ # $(wc -l < dishes.txt) gives number of columns required\n$ tr ' ' '\\n' < dishes.txt | pr -$(wc -l < dishes.txt)t\nNorth             South             West              East\nalootikki         appam             dhokla            handoguri\nbaati             bisibelebath      khakhra           litti\nkhichdi           dosa              modak             momo\nmakkiroti         koottu            shiro             rosgulla\npoha              sevai             vadapav           shondesh\n```\n\n* Pipe the output to `column` if spacing is too much\n\n```bash\n$ tr ' ' '\\n' < dishes.txt | 
pr -$(wc -l < dishes.txt)t | column -t\nNorth      South         West     East\nalootikki  appam         dhokla   handoguri\nbaati      bisibelebath  khakhra  litti\nkhichdi    dosa          modak    momo\nmakkiroti  koottu        shiro    rosgulla\npoha       sevai         vadapav  shondesh\n```\n\n<br>\n\n#### <a name=\"further-reading-for-pr\"></a>Further reading for pr\n\n* `man pr` and `info pr` for more options and detailed documentation\n* More examples [here](http://docstore.mik.ua/orelly/unix3/upt/ch21_15.htm)\n\n<br>\n\n## <a name=\"fold\"></a>fold\n\n```bash\n$ fold --version | head -n1\nfold (GNU coreutils) 8.25\n\n$ man fold\nFOLD(1)                          User Commands                         FOLD(1)\n\nNAME\n       fold - wrap each input line to fit in specified width\n\nSYNOPSIS\n       fold [OPTION]... [FILE]...\n\nDESCRIPTION\n       Wrap input lines in each FILE, writing to standard output.\n\n       With no FILE, or when FILE is -, read standard input.\n...\n```\n\n<br>\n\n#### <a name=\"examples\"></a>Examples\n\n```bash\n$ nl story.txt\n     1\tThe princess of a far away land fought bravely to rescue a travelling group from bandits. And the happy story ends here. Have a nice day.\n     2\tStill here? okay, read on: The prince of Happalakkahuhu wished he could be as brave as his sister and vowed to train harder\n\n$ # default folding width is 80\n$ fold story.txt\nThe princess of a far away land fought bravely to rescue a travelling group from\n bandits. And the happy story ends here. Have a nice day.\nStill here? okay, read on: The prince of Happalakkahuhu wished he could be as br\nave as his sister and vowed to train harder\n\n$ fold story.txt | nl\n     1\tThe princess of a far away land fought bravely to rescue a travelling group from\n     2\t bandits. And the happy story ends here. Have a nice day.\n     3\tStill here? 
okay, read on: The prince of Happalakkahuhu wished he could be as br\n     4\tave as his sister and vowed to train harder\n```\n\n* `-s` option breaks at spaces to avoid word splitting\n\n```bash\n$ fold -s story.txt\nThe princess of a far away land fought bravely to rescue a travelling group \nfrom bandits. And the happy story ends here. Have a nice day.\nStill here? okay, read on: The prince of Happalakkahuhu wished he could be as \nbrave as his sister and vowed to train harder\n```\n\n* Use `-w` to change default width\n\n```bash\n$ fold -s -w60 story.txt\nThe princess of a far away land fought bravely to rescue a \ntravelling group from bandits. And the happy story ends \nhere. Have a nice day.\nStill here? okay, read on: The prince of Happalakkahuhu \nwished he could be as brave as his sister and vowed to \ntrain harder\n```\n\n<br>\n\n#### <a name=\"further-reading-for-fold\"></a>Further reading for fold\n\n* `man fold` and `info fold` for more options and detailed documentation\n\n"
  },
  {
    "path": "ruby_one_liners.md",
    "content": "<br> <br> <br>\n\n---\n\n:information_source: :information_source: This chapter has been converted into a better formatted ebook - https://learnbyexample.github.io/learn_ruby_oneliners/. The ebook also has content updated for newer version of `ruby`, extra chapter for parsing json/csv/xml, includes exercises, solutions, etc.\n\nFor markdown source and links to buy pdf/epub versions, see: https://github.com/learnbyexample/learn_ruby_oneliners\n\n---\n\n<br> <br> <br>\n\n# <a name=\"ruby-one-liners\"></a>Ruby one liners\n\n**Table of Contents**\n\n* [Executing Ruby code](#executing-ruby-code)\n* [Simple search and replace](#simple-search-and-replace)\n    * [inplace editing](#inplace-editing)\n* [Line filtering](#line-filtering)\n    * [Regular expressions based filtering](#regular-expressions-based-filtering)\n    * [Fixed string matching](#fixed-string-matching)\n    * [Line number based filtering](#line-number-based-filtering)\n* [Field processing](#field-processing)\n    * [Field comparison](#field-comparison)\n    * [Specifying different input field separator](#specifying-different-input-field-separator)\n    * [Specifying different output field separator](#specifying-different-output-field-separator)\n* [Changing record separators](#changing-record-separators)\n    * [Input record separator](#input-record-separator)\n    * [Output record separator](#output-record-separator)\n* [Multiline processing](#multiline-processing)\n* [Ruby regular expressions](#ruby-regular-expressions)\n    * [gotchas and tricks](#gotchas-and-tricks)\n    * [Backslash sequences](#backslash-sequences)\n    * [Non-greedy quantifier](#non-greedy-quantifier)\n    * [Lookarounds](#lookarounds)\n    * [Special capture groups](#special-capture-groups)\n    * [Modifiers](#modifiers)\n    * [Code in replacement section](#code-in-replacement-section)\n    * [Quoting metacharacters](#quoting-metacharacters)\n* [Two file processing](#two-file-processing)\n    * [Comparing whole 
lines](#comparing-whole-lines)\n    * [Comparing specific fields](#comparing-specific-fields)\n    * [Line number matching](#line-number-matching)\n* [Creating new fields](#creating-new-fields)\n* [Multiple file input](#multiple-file-input)\n* [Dealing with duplicates](#dealing-with-duplicates)\n    * [using uniq method](#using-uniq-method)\n* [Lines between two REGEXPs](#lines-between-two-regexps)\n    * [All unbroken blocks](#all-unbroken-blocks)\n    * [Specific blocks](#specific-blocks)\n    * [Broken blocks](#broken-blocks)\n* [Array operations](#array-operations)\n    * [Filtering](#filtering)\n    * [Sorting](#sorting)\n    * [Transforming](#transforming)\n* [Miscellaneous](#miscellaneous)\n    * [split](#split)\n    * [Fixed width processing](#fixed-width-processing)\n    * [String and file replication](#string-and-file-replication)\n    * [transliteration](#transliteration)\n    * [Executing external commands](#executing-external-commands)\n* [Further Reading](#further-reading)\n\n<br>\n\n```\n$ ruby --version\nruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-linux]\n\n$ man ruby\nRUBY(1)                Ruby Programmers Reference Guide                RUBY(1)\n\nNAME\n     ruby — Interpreted object-oriented scripting language\n\nSYNOPSIS\n     ruby [--copyright] [--version] [-SUacdlnpswvy] [-0[octal]] [-C directory]\n          [-E external[:internal]] [-F[pattern]] [-I directory] [-K[c]]\n          [-T[level]] [-W[level]] [-e command] [-i[extension]] [-r library]\n          [-x[directory]] [--{enable|disable}-FEATURE] [--dump=target]\n          [--verbose] [--] [program_file] [argument ...]\n\nDESCRIPTION\n     Ruby is an interpreted scripting language for quick and easy object-ori‐\n     ented programming.  It has many features to process text files and to do\n     system management tasks (like in Perl).  
It is simple, straight-forward,\n     and extensible.\n\n     If you want a language for easy object-oriented programming, or you don't\n     like the Perl ugliness, or you do like the concept of LISP, but don't\n     like too many parentheses, Ruby might be your language of choice.\n...\n```\n\n**Prerequisites and notes**\n\n* familiarity with programming concepts like variables, printing, control structures, arrays, etc\n* familiarity with regular expressions\n* this tutorial is primarily focussed on short programs that are easily usable from command line, similar to using `grep`, `sed`, `awk`, `perl` etc\n* unless otherwise specified, consider input as ASCII encoded text only\n* this is an attempt to translate [Perl chapter](./perl_the_swiss_knife.md) to `ruby`, I don't have prior experience of using `ruby`\n\n<br>\n\n## <a name=\"executing-ruby-code\"></a>Executing Ruby code\n\n* One way is to put code in a file and use `ruby` command with filename as argument\n    * another is to use [shebang](https://en.wikipedia.org/wiki/Shebang_(Unix)) at beginning of script, make the file executable and directly run it\n* For short programs, one can use `-e` commandline option to provide code from command line itself\n    * this entire chapter is about using `ruby` this way from commandline\n\n```bash\n$ cat code.rb\nprint \"Hello Ruby\\n\"\n$ ruby code.rb\nHello Ruby\n\n$ # same as: perl -e 'print \"Hello Perl\\n\"'\n$ ruby -e 'print \"Hello Ruby\\n\"'\nHello Ruby\n\n$ # multiple statements can be issued separated by ;\n$ # puts adds newline character if input doesn't end with a newline\n$ # similar to: perl -E '$x=25; $y=12; say $x**$y'\n$ ruby -e 'x=25; y=12; puts x**y'\n59604644775390625\n```\n\n**Further Reading**\n\n* `ruby -h` for summary of options\n    * [explainshell](https://explainshell.com/explain?cmd=ruby+-F+-l+-anpe+-i+-0) - to quickly get information without having to traverse through the docs\n* [ruby-lang 
documentation](https://www.ruby-lang.org/en/documentation/) - manuals, tutorials and references\n\n<br>\n\n## <a name=\"simple-search-and-replace\"></a>Simple search and replace\n\n* More detailed examples with regular expressions will be covered in later sections\n* Just like other text processing commands, `ruby` will automatically loop over input line by line when `-n` or `-p` option is used\n    * like `sed`, the `-n` option won't print the record\n    * `-p` will print the record, including any changes made\n    * default record separator is newline character\n    * `$_` will contain the input record content, including the record separator (like `perl` and unlike `sed/awk`)\n* and similar to other commands, `ruby` will work with both stdin and file input\n    * See other chapters for examples of [seq](./miscellaneous.md#seq), [paste](./restructure_text.md#paste), etc\n\n```bash\n$ # sample stdin data\n$ seq 10 | paste -sd,\n1,2,3,4,5,6,7,8,9,10\n\n$ # change only first ',' to ' : '\n$ # same as: perl -pe 's/,/ : /'\n$ seq 10 | paste -sd, | ruby -pe 'sub(/,/, \" : \")'\n1 : 2,3,4,5,6,7,8,9,10\n\n$ # change all ',' to ' : '\n$ # same as: perl -pe 's/,/ : /g'\n$ seq 10 | paste -sd, | ruby -pe 'gsub(/,/, \" : \")'\n1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : 10\n\n$ # sub(/,/, \" : \") is shortcut for $_.sub!(/,/, \" : \")\n$ # gsub(/,/, \" : \") is shortcut for $_.gsub!(/,/, \" : \")\n$ # sub! and gsub! 
do inplace changing\n$ # sub and gsub returns the result, similar to perl's s///r modifier\n$ # () is optional, sub /,/, \" : \" can be used instead of sub(/,/, \" : \")\n```\n\n<br>\n\n#### <a name=\"inplace-editing\"></a>inplace editing\n\n```bash\n$ cat greeting.txt\nHi there\nHave a nice day\n\n$ # original file gets preserved in 'greeting.txt.bkp'\n$ # same as: perl -i.bkp -pe 's/Hi/Hello/' greeting.txt\n$ ruby -i.bkp -pe 'sub(/Hi/, \"Hello\")' greeting.txt\n$ cat greeting.txt\nHello there\nHave a nice day\n\n$ # use empty argument to -i with caution, changes made cannot be undone\n$ ruby -i -pe 'sub(/nice day/, \"safe journey\")' greeting.txt\n$ cat greeting.txt\nHello there\nHave a safe journey\n```\n\n* Multiple input files are treated individually and changes are written back to respective files\n\n```bash\n$ cat f1\nI ate 3 apples\n$ cat f2\nI bought two bananas and 3 mangoes\n\n$ # same as: perl -i.bkp -pe 's/3/three/' f1 f2\n$ ruby -i.bkp -pe 'sub(/3/, \"three\")' f1 f2\n$ cat f1\nI ate three apples\n$ cat f2\nI bought two bananas and three mangoes\n```\n\n**Further Reading**\n\n* [ruby-doc: Pre-defined variables](https://ruby-doc.org/core-2.5.0/doc/globals_rdoc.html#label-Pre-defined+variables) for explanation on `$_` and other such special variables\n* [ruby-doc: gsub](https://ruby-doc.org/core-2.5.0/String.html#method-i-gsub) for `gsub` syntax details\n\n<br>\n\n## <a name=\"line-filtering\"></a>Line filtering\n\n<br>\n\n#### <a name=\"regular-expressions-based-filtering\"></a>Regular expressions based filtering\n\n* one way is to use `variable =~ /REGEXP/FLAGS` to check for a match\n    * use `variable !~ /REGEXP/FLAGS` for negated match\n    * by default acts on `$_` if variable is not specified\n    * see [ruby-doc: Regexp](https://ruby-doc.org/core-2.5.0/Regexp.html) for documentation\n* as we need to print only selective lines, use `-n` option\n    * by default, contents of `$_` will be printed if no argument is passed to `print`\n\n```bash\n$ 
cat poem.txt\nRoses are red,\nViolets are blue,\nSugar is sweet,\nAnd so are you.\n\n$ # same as: perl -ne 'print if /^[RS]/' poem.txt\n$ # /^[RS]/ is shortcut for $_ =~ /^[RS]/\n$ ruby -ne 'print if /^[RS]/' poem.txt\nRoses are red,\nSugar is sweet,\n\n$ # same as: perl -ne 'print if /and/i' poem.txt\n$ ruby -ne 'print if /and/i' poem.txt\nAnd so are you.\n\n$ # same as: perl -ne 'print if !/are/' poem.txt\n$ # !/are/ is shortcut for $_ !~ /are/\n$ ruby -ne 'print if !/are/' poem.txt\nSugar is sweet,\n\n$ # same as: perl -ne 'print if /are/ && !/so/' poem.txt\n$ ruby -ne 'print if /are/ && !/so/' poem.txt\nRoses are red,\nViolets are blue,\n```\n\n* using different delimiter\n* quoting from [ruby-doc: Percent Strings](https://ruby-doc.org/core-2.5.0/doc/syntax/literals_rdoc.html#label-Percent+Strings)\n\n> If you are using “(”, “[”, “{”, “<” you must close it with “)”, “]”, “}”, “>” respectively. You may use most other non-alphanumeric characters for percent string delimiters such as “%”, “|”, “^”, etc.\n\n```bash\n$ cat paths.txt\n/foo/a/report.log\n/foo/y/power.log\n/foo/abc/errors.log\n\n$ # same as: perl -ne 'print if /\\/foo\\/a\\//' paths.txt\n$ ruby -ne 'print if /\\/foo\\/a\\//' paths.txt\n/foo/a/report.log\n\n$ # same as: perl -ne 'print if m#/foo/a/#' paths.txt\n$ ruby -ne 'print if %r#/foo/a/#' paths.txt\n/foo/a/report.log\n\n$ # same as: perl -ne 'print if !m#/foo/a/#' paths.txt\n$ ruby -ne 'print if !%r#/foo/a/#' paths.txt\n/foo/y/power.log\n/foo/abc/errors.log\n```\n\n<br>\n\n#### <a name=\"fixed-string-matching\"></a>Fixed string matching\n\n* To match strings literally, use `include?` method\n\n```bash\n$ echo 'int a[5]' | ruby -ne 'print if /a[5]/'\n$ echo 'int a[5]' | ruby -ne 'print if $_.include?(\"a[5]\")'\nint a[5]\n\n$ # however, string within double quotes gets interpolated\n$ ruby -e 'a=5; puts \"value of a:\\t#{a}\"'\nvalue of a:     5\n$ # use %q (covered later) to specify single quoted string\n$ echo 'int #{a}' | ruby -ne 'print if 
$_.include?(%q/#{a}/)'\nint #{a}\n$ # or pass the string as environment variable\n$ echo 'int #{a}' | s='#{a}' ruby -ne 'print if $_.include?(ENV[\"s\"])'\nint #{a}\n```\n\n* restricting match to start/end of line\n\n```bash\n$ cat eqns.txt\na=b,a-b=c,c*d\na+b,pi=3.14,5e12\ni*(t+9-g)/8,4-a+b\n\n$ # start of line\n$ s='a+b' ruby -ne 'print if $_.start_with?(ENV[\"s\"])' eqns.txt\na+b,pi=3.14,5e12\n\n$ # end of line\n$ # -l option is needed to remove record separator (covered later)\n$ s='a+b' ruby -lne 'print if $_.end_with?(ENV[\"s\"])' eqns.txt\ni*(t+9-g)/8,4-a+b\n```\n\n* `index` method returns matching position (starts at 0) and nil if not found\n    * supports both string and regexp\n    * optional 2nd argument allows to specify offset to start searching\n* See [ruby-doc: index](https://ruby-doc.org/core-2.5.0/String.html#method-i-index) for details\n\n```bash\n$ # passing string\n$ ruby -ne 'print if $_.index(\"a+b\")' eqns.txt\na+b,pi=3.14,5e12\ni*(t+9-g)/8,4-a+b\n$ ruby -ne 'print if $_.index(\"a+b\")==0' eqns.txt\na+b,pi=3.14,5e12\n\n$ # passing regexp\n$ ruby -ne 'print if $_.index(/[+*]/)<5' eqns.txt\na+b,pi=3.14,5e12\ni*(t+9-g)/8,4-a+b\n\n$ s='a+b' ruby -ne 'print if $_.index(ENV[\"s\"], 1)' eqns.txt\ni*(t+9-g)/8,4-a+b\n```\n\n<br>\n\n#### <a name=\"line-number-based-filtering\"></a>Line number based filtering\n\n* special variable `$.` contains total records read so far, similar to `NR` in `awk`\n    * as far as I've checked the docs, there's no equivalent of awk's `FNR`\n* See also [ruby-doc: eof](https://ruby-doc.org/core-2.5.0/IO.html#method-i-eof)\n\n```bash\n$ # print 2nd line\n$ # same as: perl -ne 'print if $.==2' poem.txt\n$ ruby -ne 'print if $.==2' poem.txt\nViolets are blue,\n\n$ # print 2nd and 4th line\n$ # same as: perl -ne 'print if $.==2 || $.==4' poem.txt\n$ # can also use: ruby -ne 'print if [2, 4].include?($.)' poem.txt\n$ ruby -ne 'print if $.==2 || $.==4' poem.txt\nViolets are blue,\nAnd so are you.\n\n$ # print last line\n$ # same 
as: perl -ne 'print if eof' poem.txt\n$ # $< is like filehandle for input files/stdin given from commandline\n$ ruby -ne 'print if $<.eof' poem.txt\nAnd so are you.\n```\n\n* for large input, use `exit` to avoid unnecessary record processing\n* See [ruby-doc: Control Expressions](https://ruby-doc.org/core-2.5.0/doc/syntax/control_expressions_rdoc.html) for syntax details\n\n```bash\n$ # same as: perl -ne 'if($.==234){print; exit}'\n$ seq 14323 14563435 | ruby -ne 'if $.==234 then print; exit end'\n14556\n$ # can also group the statements in ()\n$ seq 14323 14563435 | ruby -ne '(print; exit) if $.==234'\n14556\n\n$ # mimicking head command\n$ # same as: head -n3 and sed '3q' or perl -pe 'exit if $.>3'\n$ seq 14 25 | ruby -pe 'exit if $.>3'\n14\n15\n16\n\n$ # same as: sed '3Q' and perl -pe 'exit if $.==3'\n$ seq 14 25 | ruby -pe 'exit if $.==3'\n14\n15\n```\n\n* selecting range of lines\n* See [ruby-doc: Range](https://ruby-doc.org/core-2.5.0/Range.html) for syntax details\n\n```bash\n$ # in this context, the range is compared against $.\n$ # same as: perl -ne 'print if 3..5'\n$ seq 14 25 | ruby -ne 'print if 3..5'\n16\n17\n18\n\n$ # selecting from particular line number to end of input\n$ # same as: perl -ne 'print if $.>=10'\n$ seq 14 25 | ruby -ne 'print if $.>=10'\n23\n24\n25\n```\n\n<br>\n\n## <a name=\"field-processing\"></a>Field processing\n\n* `-a` option will auto-split each input record based on one or more continuous white-space\n    * similar to default behavior in `awk` and same as `perl -a`\n    * See also [split](#split) section\n* Special variable array `$F` will contain all the elements, indexing starts from 0\n    * negative indexing is also supported, `-1` gives last element, `-2` gives last-but-one and so on\n    * see [Array operations](#array-operations) section for examples on array usage\n\n```bash\n$ cat fruits.txt\nfruit   qty\napple   42\nbanana  31\nfig     90\nguava   6\n\n$ # print only first field, indexing starts from 0\n$ # same as: 
perl -lane 'print $F[0]' fruits.txt\n$ ruby -ane 'puts $F[0]' fruits.txt\nfruit\napple\nbanana\nfig\nguava\n\n$ # print only second field\n$ # same as: perl -lane 'print $F[1]' fruits.txt\n$ ruby -ane 'puts $F[1]' fruits.txt\nqty\n42\n31\n90\n6\n```\n\n* by default, leading and trailing whitespace won't be considered when splitting the input record\n    * same as `awk`'s default behavior and `perl -a`\n\n```bash\n$ printf ' a    ate b\\tc   \\n'\n a    ate b     c\n$ printf ' a    ate b\\tc   \\n' | ruby -ane 'puts $F[0]'\na\n$ printf ' a    ate b\\tc   \\n' | ruby -ane 'puts $F[-1]'\nc\n\n$ # number of elements\n$ printf ' a    ate b\\tc   \\n' | ruby -ane 'puts $F.length'\n4\n```\n\n<br>\n\n#### <a name=\"field-comparison\"></a>Field comparison\n\n* operators `==`, `!=`, `<`, etc work for both string and numeric comparison\n* unlike `perl`, comparing text numerically requires converting it to an appropriate numeric type first\n    * See [ruby-doc: string methods](https://ruby-doc.org/core-2.5.0/String.html#method-i-to_c) for details\n\n```bash\n$ # if first field exactly matches the string 'apple'\n$ # same as: perl -lane 'print $F[1] if $F[0] eq \"apple\"' fruits.txt\n$ ruby -ane 'puts $F[1] if $F[0] == \"apple\"' fruits.txt\n42\n\n$ # print first field if second field > 35 (excluding header)\n$ # same as: perl -lane 'print $F[0] if $F[1]>35 && $.>1' fruits.txt\n$ ruby -ane 'puts $F[0] if $F[1].to_i > 35 && $.>1' fruits.txt\napple\nfig\n\n$ # print header and lines with qty < 35\n$ # same as: perl -ane 'print if $F[1]<35 || $.==1' fruits.txt\n$ ruby -ane 'print if $F[1].to_i < 35 || $.==1' fruits.txt\nfruit   qty\nbanana  31\nguava   6\n\n$ # if first field does NOT contain 'a'\n$ # same as: perl -ane 'print if $F[0] !~ /a/' fruits.txt\n$ ruby -ane 'print if $F[0] !~ /a/' fruits.txt\nfruit   qty\nfig     90\n```\n\n<br>\n\n#### <a name=\"specifying-different-input-field-separator\"></a>Specifying different input field separator\n\n* by using `-F` command line 
option\n\n```bash\n$ # second field where input field separator is :\n$ # same as: perl -F: -lane 'print $F[1]'\n$ echo 'foo:123:bar:789' | ruby -F: -ane 'puts $F[1]'\n123\n\n$ # last field, same as: perl -F: -lane 'print $F[-1]'\n$ echo 'foo:123:bar:789' | ruby -F: -ane 'puts $F[-1]'\n789\n$ # second last field, perl -F: -lane 'print $F[-2]'\n$ echo 'foo:123:bar:789' | ruby -F: -ane 'puts $F[-2]'\nbar\n\n$ # second and last field, same as: perl -F: -lane 'print \"$F[1] $F[-1]\"'\n$ echo 'foo:123:bar:789' | ruby -F: -ane 'puts \"#{$F[1]} #{$F[-1]}\"'\n123 789\n\n$ # use quotes to avoid clashes with shell special characters\n$ echo 'one;two;three;four' | ruby -F';' -ane 'puts $F[2]'\nthree\n```\n\n* last element of `$F` array will contain the record separator as well\n    * note that default `-a` option without `-F` won't have this issue as whitespaces at start/end are stripped\n* it doesn't make visual difference when `puts` is used as it adds newline only if not already present\n* if the record separator is not desired, use `-l` option to remove the record separator from input\n\n```bash\n$ echo 'foo 123' | ruby -ane 'puts \"#{$F[-1]}xyz\"'\n123xyz\n\n$ echo 'foo:123:bar:789' | ruby -F: -ane 'puts \"#{$F[-1]}a\"'\n789\na\n$ echo 'foo:123:bar:789' | ruby -F: -lane 'puts \"#{$F[-1]}a\"'\n789a\n```\n\n* Regular expressions based input field separator\n\n```bash\n$ # same as: perl -F'\\d+' -lane 'print $F[1]'\n$ echo 'Sample123string54with908numbers' | ruby -F'\\d+' -ane 'puts $F[1]'\nstring\n\n$ # first field will be empty as there is nothing before '{'\n$ echo '{foo}   bar=baz' | ruby -F'[{}= ]+' -ane 'puts $F[0]'\n\n$ echo '{foo}   bar=baz' | ruby -F'[{}= ]+' -ane 'puts $F[1]'\nfoo\n$ echo '{foo}   bar=baz' | ruby -F'[{}= ]+' -ane 'puts $F[2]'\nbar\n$ echo '{foo}   bar=baz' | ruby -F'[{}= ]+' -ane 'puts $F[-1]'\nbaz\n```\n\n* to process individual characters, simply use indexing on input string\n* See [ruby-doc: 
Encoding](https://ruby-doc.org/core-2.5.0/Encoding.html) for details on handling different string encodings\n\n```bash\n$ # same as: perl -F -lane 'print $F[0]'\n$ echo 'apple' | ruby -ne 'puts $_[0]'\na\n\n$ # if needed, chomp the record separator using -l\n$ # same as: perl -F -lane 'print $F[-1]'\n$ echo 'apple' | ruby -lne 'puts $_[-1]'\ne\n\n$ ruby -e 'puts Encoding.default_external'\nUTF-8\n$ printf 'hi👍 how are you?' | ruby -ne 'puts $_[2]'\n👍\n$ # use -E option to explicitly specify external/internal encodings\n$ printf 'hi👍 how are you?' | ruby -E UTF-8:UTF-8 -ne 'puts $_[2]'\n👍\n```\n\n<br>\n\n#### <a name=\"specifying-different-output-field-separator\"></a>Specifying different output field separator\n\n* use `$,` to change separator between `print` arguments\n    * could be remembered easily by noting that `,` is used to separate `print` arguments\n    * note that `$,` doesn't affect `puts` which always uses newline as separator\n* the `-l` option is useful here in more than one way\n    * it removes input record separator\n    * and appends the record separator to `print` output\n\n```bash\n$ # by default, the various arguments are concatenated\n$ echo 'foo:123:bar:789' | ruby -F: -lane 'print $F[1], $F[-1]'\n123789\n\n$ # change $, if different separator is needed\n$ # same as: perl -F: -lane '$,=\" \"; print $F[1], $F[-1]'\n$ echo 'foo:123:bar:789' | ruby -F: -lane '$,=\" \"; print $F[1], $F[-1]'\n123 789\n$ echo 'foo:123:bar:789' | ruby -F: -lane '$,=\"-\"; print $F[1], $F[-1]'\n123-789\n\n$ # array's join method also uses $,\n$ # same as: perl -F: -lane '$,=\" - \"; print @F'\n$ echo 'foo:123:bar:789' | ruby -F: -lane '$,=\" - \"; print $F.join'\nfoo - 123 - bar - 789\n$ # or pass the separator as argument to join method\n$ echo 'foo:123:bar:789' | ruby -F: -lane 'print $F.join(\" - \")'\nfoo - 123 - bar - 789\n$ # or the equivalent\n$ echo 'foo:123:bar:789' | ruby -F: -lane 'print $F * \" - \"'\nfoo - 123 - bar - 789\n```\n\n* use `BEGIN` if same 
separator is to be used for all lines\n    * statements inside `BEGIN` are executed before processing any input text\n\n```bash\n$ # same as: perl -lane 'BEGIN{$,=\",\"} print @F' fruits.txt\n$ ruby -lane 'BEGIN{$,=\",\"}; print $F.join' fruits.txt\nfruit,qty\napple,42\nbanana,31\nfig,90\nguava,6\n```\n\n<br>\n\n## <a name=\"changing-record-separators\"></a>Changing record separators\n\n<br>\n\n#### <a name=\"input-record-separator\"></a>Input record separator\n\n* by default, newline character is used as input record separator\n* use `$/` to specify a different input record separator\n    * unlike `gawk`, only string can be used, no regular expressions\n* for single character separator, can also use `-0` command line option which accepts octal value as argument\n* if `-l` option is also used\n    * input record separator will be chomped from input record\n        * earlier versions used `chop` instead of `chomp`. See [bugs.ruby-lang.org 12926](https://bugs.ruby-lang.org/issues/12926)\n    * in addition, output record separator(ORS) will get whatever is current value of input record separator\n    * so, order of `-l`, `-0` and/or `$/` usage becomes important\n\n```bash\n$ s='this is a sample string'\n\n$ # space as input record separator, printing all records\n$ # ORS is newline as -l is used before $/ gets changed\n$ # same as: perl -lne 'BEGIN{$/=\" \"} print \"$. 
$_\"'\n$ printf \"$s\" | ruby -lne 'BEGIN{$/=\" \"}; print \"#{$.} #{$_}\"'\n1 this\n2 is\n3 a\n4 sample\n5 string\n\n$ # print all records containing 'a'\n$ # same as: perl -l -0040 -ne 'print if /a/'\n$ printf \"$s\" | ruby -l -0040 -ne 'print if /a/'\na\nsample\n\n$ # if the order is changed, ORS will be space, not newline\n$ printf \"$s\" | ruby -0040 -l -ne 'print if /a/'\na sample \n```\n\n* `-0` option used without argument will use the ASCII NUL character as input record separator\n* `-0777` will cause entire file to be slurped\n\n```bash\n$ printf 'foo\\0bar\\0' | cat -A\nfoo^@bar^@$\n$ # same as: perl -l -0 -ne 'print'\n$ # could be golfed to: ruby -l0pe ''\n$ printf 'foo\\0bar\\0' | ruby -l -0 -ne 'print'\nfoo\nbar\n\n$ # replace first newline with '. '\n$ # same as: perl -0777 -pe 's/\\n/. /' greeting.txt\n$ ruby -0777 -pe 'sub(/\\n/, \". \")' greeting.txt\nHello there. Have a safe journey\n```\n\n* for paragraph mode (two or more consecutive newline characters), use `-00` or assign empty string to `$/`\n\nConsider the below sample file\n\n```bash\n$ cat sample.txt\nHello World\n\nGood day\nHow are you\n\nJust do-it\nBelieve it\n\nToday is sunny\nNot a bit funny\nNo doubt you like it too\n\nMuch ado about nothing\nHe he he\n```\n\n* again, input record will have the separator too and using `-l` will chomp it\n* however, if more than two consecutive newline characters separate the paragraphs, only two newlines will be preserved and the rest discarded\n    * use `$/=\"\\n\\n\"` to avoid this behavior\n\n```bash\n$ # print all paragraphs containing 'it'\n$ # same as: perl -00 -ne 'print if /it/' sample.txt\n$ ruby -00 -ne 'print if /it/' sample.txt\nJust do-it\nBelieve it\n\nToday is sunny\nNot a bit funny\nNo doubt you like it too\n\n$ # based on number of lines in each paragraph\n$ # same as: perl -F'\\n' -00 -ane 'print if $#F==0' sample.txt\n$ ruby -F'\\n' -00 -ane 'print if $F.length==1' sample.txt\nHello World\n\n```\n\n* Re-structuring 
paragraphs\n\n```bash\n$ # same as: perl -F'\\n' -l -00 -ane 'print join \". \", @F' sample.txt\n$ ruby -F'\\n' -l -00 -ane 'print $F.join(\". \")' sample.txt\nHello World\nGood day. How are you\nJust do-it. Believe it\nToday is sunny. Not a bit funny. No doubt you like it too\nMuch ado about nothing. He he he\n```\n\n* multi-character separator\n\n```bash\n$ cat report.log\nblah blah\nError: something went wrong\nmore blah\nwhatever\nError: something surely went wrong\nsome text\nsome more text\nblah blah blah\n\n$ # number of records, same as: perl -lne 'BEGIN{$/=\"Error:\"} print $. if eof'\n$ ruby -ne 'BEGIN{$/=\"Error:\"}; puts $. if $<.eof' report.log\n3\n$ # print first record, same as: perl -lne 'BEGIN{$/=\"Error:\"} print if $.==1'\n$ ruby -lne 'BEGIN{$/=\"Error:\"}; print if $.==1' report.log\nblah blah\n\n$ # print a record if it contains given string\n$ # same as: perl -lne 'BEGIN{$/=\"Error:\"} print \"$/$_\" if /surely/'\n$ ruby -lne 'BEGIN{$/=\"Error:\"}; print $/,$_ if /surely/' report.log\nError: something surely went wrong\nsome text\nsome more text\nblah blah blah\n\n```\n\n* Joining lines based on specific end of line condition\n\n```bash\n$ cat msg.txt\nHello there.\nIt will rain to-\nday. Have a safe\nand pleasant jou-\nrney.\n\n$ # same as: perl -pe 'BEGIN{$/=\"-\\n\"} chomp' msg.txt\n$ ruby -pe 'BEGIN{$/=\"-\\n\"}; chomp' msg.txt\nHello there.\nIt will rain today. 
Have a safe\nand pleasant journey.\n```\n\n<br>\n\n#### <a name=\"output-record-separator\"></a>Output record separator\n\n* use `$\\` to specify a different output record separator\n    * applies to `print` but not `puts`\n\n```bash\n$ # note that despite not setting $\\, output has newlines\n$ # because the input record still has the input record separator\n$ seq 3 | ruby -ne 'print'\n1\n2\n3\n$ # same as: perl -ne 'BEGIN{$\\=\"\\n\"} print'\n$ seq 3 | ruby -ne 'BEGIN{$\\=\"\\n\"}; print'\n1\n\n2\n\n3\n\n$ seq 2 | ruby -ne 'BEGIN{$\\=\"---\\n\"}; print'\n1\n---\n2\n---\n```\n\n* dynamically changing output record separator\n* **Note:** except `nil` and `false`, all other values evaluate to `true`\n    * `0`, empty string/array/etc evaluate to `true`\n\n```bash\n$ # note the use of -l to chomp the input record separator\n$ # same as: perl -lpe '$\\ = $.%2 ? \" \" : \"\\n\"'\n$ seq 6 | ruby -lpe '$\\ = $.%2!=0 ? \" \" : \"\\n\"'\n1 2\n3 4\n5 6\n\n$ # -l also sets the output record separator\n$ # but gets overridden by $\\\n$ # same as: perl -lpe '$\\ = $.%3 ? \"-\" : \"\\n\"'\n$ seq 6 | ruby -lpe '$\\ = $.%3!=0 ? 
\"-\" : \"\\n\"'\n1-2-3\n4-5-6\n```\n\n<br>\n\n## <a name=\"multiline-processing\"></a>Multiline processing\n\n* Processing consecutive lines\n* to keep the one-liner short, global variables(`$` prefix) are used here\n    * See [ruby-doc: Global variables](https://ruby-doc.org/core-2.5.0/doc/syntax/assignment_rdoc.html#label-Global+Variables) for syntax details\n\n```bash\n$ cat poem.txt\nRoses are red,\nViolets are blue,\nSugar is sweet,\nAnd so are you.\n\n$ # match two consecutive lines\n$ # same as: perl -ne 'print $p,$_ if /is/ && $p=~/are/; $p=$_' poem.txt\n$ ruby -ne 'print $p,$_ if /is/ && $p=~/are/; $p=$_' poem.txt\nViolets are blue,\nSugar is sweet,\n$ # if only the second line is needed\n$ ruby -ne 'print if /is/ && $p=~/are/; $p=$_' poem.txt\nSugar is sweet,\n\n$ # print if line matches a condition as well as condition for next 2 lines\n$ ruby -ne 'print $p2 if /is/ && $p1=~/blue/ && $p2=~/red/;\n            $p2=$p1; $p1=$_' poem.txt\nRoses are red,\n```\n\nConsider this sample input file\n\n```bash\n$ cat range.txt\nfoo\nBEGIN\n1234\n6789\nEND\nbar\nBEGIN\na\nb\nc\nEND\nbaz\n```\n\n* extracting lines around matching line\n* **Note**\n    * default uninitialized value is `nil`, has to be explicitly converted for comparison\n    * no auto increment/decrement operators, can use `+=1` and `-=1`\n\n\n```bash\n$ ruby -le 'print $a'\n\n$ ruby -le 'print $a.to_i'\n0\n\n$ # print matching line and n-1 lines following the matched line\n$ # same as: perl -ne '$n=2 if /BEGIN/; print if $n && $n--' range.txt\n$ # can also use: ruby -ne 'BEGIN{n=0}; n=2 if /BEGIN/; print if n>0 && n-=1'\n$ ruby -ne '$n=2 if /BEGIN/; print if $n.to_i>0 && $n-=1' range.txt\nBEGIN\n1234\nBEGIN\na\n\n$ # print nth line after match\n$ # same as: perl -ne 'print if $n && !--$n; $n=3 if /BEGIN/' range.txt\n$ ruby -ne '$n.to_i>0 && (print if $n==1; $n-=1); $n=3 if /BEGIN/' range.txt\nEND\nc\n\n$ # use reversing trick for nth line before match\n$ tac range.txt | ruby -ne '$n.to_i>0 && (print 
if $n==1; $n-=1); $n=3 if /END/' | tac\nBEGIN\na\n```\n\n**Further Reading**\n\n* [softwareengineering - FSM examples](https://softwareengineering.stackexchange.com/questions/47806/examples-of-finite-state-machines)\n* [wikipedia - FSM](https://en.wikipedia.org/wiki/Finite-state_machine)\n\n<br>\n\n## <a name=\"ruby-regular-expressions\"></a>Ruby regular expressions\n\n* assuming that you are already familiar with basics of regular expressions\n    * if not, check out [Ruby Regexp](https://leanpub.com/rubyregexp) ebook - step by step guide from beginner to advanced levels\n* examples/descriptions are for string containing ASCII characters only\n* See [ruby-doc: Regexp](https://ruby-doc.org/core-2.5.0/Regexp.html) for documentation\n* See [rexegg ruby](https://www.rexegg.com/regex-ruby.html) for a bit of ruby regexp history and differences with other regexp engines\n\n<br>\n\n#### <a name=\"gotchas-and-tricks\"></a>gotchas and tricks\n\n* input record separator being part of input record\n\n```bash\n$ # newline character gets replaced too as shown by shell prompt\n$ echo 'foo:123:bar:789' | ruby -pe 'sub(/[^:]+$/, \"xyz\")'\nfoo:123:bar:xyz$\n$ # simple workaround is to use -l option\n$ echo 'foo:123:bar:789' | ruby -lpe 'sub(/[^:]+$/, \"xyz\")'\nfoo:123:bar:xyz\n\n$ # of course it is useful too\n$ # same as: perl -pe 's/\\n/ : / if !eof'\n$ seq 10 | ruby -pe 'sub(/\\n/, \" : \") if !$<.eof'\n1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : 10\n```\n\n* how much does `*` match?\n\n```bash\n$ # both empty and non-empty strings are matched\n$ # even though * is a greedy quantifier\n$ echo ',baz,,xyz,,,' | ruby -lpe 'gsub(/[^,]*/, \"A\")'\nA,AA,A,AA,A,A,A\n$ echo 'foo,baz,,xyz,,,123' | ruby -lpe 'gsub(/[^,]*/, \"A\")'\nAA,AA,A,AA,A,A,AA\n\n$ # one workaround is to use lookarounds(covered later)\n$ echo ',baz,,xyz,,,' | ruby -lpe 'gsub(/(?<=^|,)[^,]*/, \"A\")'\nA,A,A,A,A,A,A\n$ echo 'foo,baz,,xyz,,,123' | ruby -lpe 'gsub(/(?<=^|,)[^,]*/, \"A\")'\nA,A,A,A,A,A,A\n```\n\n* difference 
between `^` and `\\A`\n\n```bash\n$ # ^ matches start of line, not start of string\n$ # same as: perl -00 -ne 'print if /^Believe/m' sample.txt\n$ ruby -00 -ne 'print if /^Believe/' sample.txt\nJust do-it\nBelieve it\n\n$ ruby -00 -ne 'print if /^he/i' sample.txt\nHello World\n\nMuch ado about nothing\nHe he he\n\n$ # \\A matches start of string\n$ # without m modifier, both ^ and \\A will match start of string in perl\n$ ruby -00 -ne 'print if /\\Ahe/i' sample.txt\nHello World\n\n$ # similarly, $ matches end of line\n$ ruby -00 -ne 'print if /funny$/' sample.txt\nToday is sunny\nNot a bit funny\nNo doubt you like it too\n```\n\n* difference between `\\z` and `\\Z`\n\n```bash\n$ # \\Z matches just before newline\n$ seq 14 | ruby -ne 'print if /2\\Z/'\n2\n12\n\n$ # \\z matches end of string\n$ seq 14 | ruby -ne 'print if /2\\z/'\n$ seq 14 | ruby -ne 'print if /2\\n\\z/'\n2\n12\n\n$ # without newline at end of line, both \\z and \\Z will behave same\n$ seq 14 | ruby -lne 'print if /2\\z/'\n2\n12\n```\n\n* delimiters and quoting\n* from [ruby-doc: Percent Strings](https://ruby-doc.org/core-2.5.0/doc/syntax/literals_rdoc.html#label-Percent+Strings)\n\n> If you are using “(”, “[”, “{”, “<” you must close it with “)”, “]”, “}”, “>” respectively. 
You may use most other non-alphanumeric characters for percent string delimiters such as “%”, “|”, “^”, etc.\n\n```bash\n$ # %r allows to use delimiter other than /\n$ echo 'a/b' | ruby -pe 'sub(/a\\/b/, \"foo\")'\nfoo\n$ echo 'a/b' | ruby -pe 'sub(%r{a/b}, \"foo\")'\nfoo\n\n$ # use %q (single quoting) to avoid variable interpolation\n$ echo 'foo123' | ruby -pe 'a=\"huh?\"; sub(/12/, \"#{a}\")'\nfoohuh?3\n$ echo 'foo123' | ruby -pe 'a=\"huh?\"; sub(/12/, %q/#{a}/)'\nfoo#{a}3\n\n$ # %q also useful for backreferences, as \\ is special inside double quotes\n$ echo 'a a a 2 be be' | ruby -pe 'gsub(/\\b(\\w+)( \\1)+\\b/, \"\\\\1\")'\na 2 be\n$ echo 'a a a 2 be be' | ruby -pe 'gsub(/\\b(\\w+)( \\1)+\\b/, %q/\\1/)'\na 2 be\n$ # and when double quotes is part of replacement string\n$ echo '42,789' | ruby -lpe 'gsub(/\\d+/, \"\\\"\\\\0\\\"\")'\n\"42\",\"789\"\n$ echo '42,789' | ruby -lpe 'gsub(/\\d+/, %q/\"\\0\"/)'\n\"42\",\"789\"\n$ # \\& can also be used instead of \\0\n```\n\n<br>\n\n#### <a name=\"backslash-sequences\"></a>Backslash sequences\n\n* `\\w` for `[A-Za-z0-9_]`\n* `\\d` for `[0-9]`\n* `\\s` for `[ \\t\\r\\n\\f\\v]`\n* `\\h` for `[0-9a-fA-F]` or `[[:xdigit:]]`\n* `\\W`, `\\D`, `\\S`, `\\H`, respectively for their opposites\n* See also [ruby-doc: scan](https://ruby-doc.org/core-2.5.0/String.html#method-i-scan)\n\n```bash\n$ # same as: perl -ne 'print if /^[[:xdigit:]]+$/'\n$ # can also use: ruby -lne 'print if !/\\H/'\n$ printf '128A\\n34\\nfe32\\nfoo1\\nbar\\n' | ruby -ne 'print if /^\\h+$/'\n128A\n34\nfe32\n\n$ # same as: perl -pe 's/\\d+/xxx/g'\n$ echo 'like 42 and 37' | ruby -pe 'gsub(/\\d+/, \"xxx\")'\nlike xxx and xxx\n\n$ # note again the use of -l because of newline in input record\n$ # same as: perl -lpe 's/\\D+/xxx/g'\n$ echo 'like 42 and 37' | ruby -lpe 'gsub(/\\D+/, \"xxx\")'\nxxx42xxx37\n\n$ # get all matches as an array\n$ echo 'tea sea-pit sit' | ruby -ne 'puts $_.scan(/[\\w\\s]+/)'\ntea sea\npit sit\n```\n\n<br>\n\n#### <a 
name=\"non-greedy-quantifier\"></a>Non-greedy quantifier\n\n* appending `?` to the `?`, `*`, `+` or `{}` quantifiers changes matching from greedy to non-greedy, i.e. matching as few characters as possible\n    * also known as lazy quantifier\n\n```bash\n$ # greedy matching\n$ echo 'foo and bar and baz land good' | ruby -lne 'print $_.scan(/.*and/)'\n[\"foo and bar and baz land\"]\n$ # non-greedy matching\n$ echo 'foo and bar and baz land good' | ruby -lne 'print $_.scan(/.*?and/)'\n[\"foo and\", \" bar and\", \" baz land\"]\n\n$ echo '12342789' | ruby -pe 'sub(/\\d{2,5}/, \"\")'\n789\n$ echo '12342789' | ruby -pe 'sub(/\\d{2,5}?/, \"\")'\n342789\n\n$ # for single character, non-greedy is not always needed\n$ echo '123:42:789:good:5:bad' | ruby -pe 'sub(/:.*?:/, \":\")'\n123:789:good:5:bad\n$ echo '123:42:789:good:5:bad' | ruby -pe 'sub(/:[^:]*:/, \":\")'\n123:789:good:5:bad\n\n$ # just like greedy, overall matching is considered, as minimal as possible\n$ echo '123:42:789:good:5:bad' | ruby -pe 'sub(/:.*?:[a-z]/, \":\")'\n123:ood:5:bad\n$ echo '123:42:789:good:5:bad' | ruby -pe 'sub(/:.*:[a-z]/, \":\")'\n123:ad\n```\n\n<br>\n\n#### <a name=\"lookarounds\"></a>Lookarounds\n\n* Lookarounds add conditions that must hold before/after the required pattern\n* There are four types\n    * positive lookahead `(?=`\n    * negative lookahead `(?!`\n    * positive lookbehind `(?<=`\n    * negative lookbehind `(?<!`\n* One way to remember is that **behind** uses `<` and **negative** uses `!` instead of `=`\n\nLookarounds, like word boundaries and anchors, do not consume characters: the text they match is not part of the matched string. 
They are termed as **zero-width patterns**\n\n* positive lookbehind `(?<=`\n\n```bash\n$ s='foo=5, bar=3; x=83, y=120'\n\n$ # extract all digit sequences, same as: perl -lne 'print join \" \", /\\d+/g'\n$ echo \"$s\" | ruby -lne 'print $_.scan(/\\d+/).join(\" \")'\n5 3 83 120\n\n$ # extract digits only if preceded by two lowercase alphabets and =\n$ # note how the characters matched by lookbehind isn't part of output\n$ # same as: perl -lne 'print join \" \", /(?<=[a-z]{2}=)\\d+/g'\n$ echo \"$s\" | ruby -lne 'print $_.scan(/(?<=[a-z]{2}=)\\d+/).join(\" \")'\n5 3\n$ # this can be done without lookbehind too\n$ echo \"$s\" | ruby -lne 'print $_.scan(/[a-z]{2}=(\\d+)/).join(\" \")'\n5 3\n\n$ # change all digits preceded by single lowercase alphabet and =\n$ # same as: perl -pe 's/(?<=\\b[a-z]=)\\d+/42/g'\n$ echo \"$s\" | ruby -pe 'gsub(/(?<=\\b[a-z]=)\\d+/, \"42\")'\nfoo=5, bar=3; x=42, y=42\n```\n\n* positive lookahead `(?=`\n\n```bash\n$ s='foo=5, bar=3; x=83, y=120'\n\n$ # extract digits that end with ,\n$ # same as: perl -lne 'print join \":\", /\\d+(?=,)/g'\n$ echo \"$s\" | ruby -lne 'print $_.scan(/\\d+(?=,)/).join(\":\")'\n5:83\n\n$ # change all digits ending with ,\n$ # same as: perl -pe 's/\\d+(?=,)/42/g'\n$ echo \"$s\" | ruby -pe 'gsub(/\\d+(?=,)/, \"42\")'\nfoo=42, bar=3; x=42, y=120\n\n$ # both lookbehind and lookahead\n$ echo 'foo,,baz,,,xyz' | ruby -pe 'gsub(/,,/, \",NA,\")'\nfoo,NA,baz,NA,,xyz\n$ echo 'foo,,baz,,,xyz' | ruby -pe 'gsub(/(?<=,)(?=,)/, \"NA\")'\nfoo,NA,baz,NA,NA,xyz\n```\n\n* negative lookbehind `(?<!` and negative lookahead `(?!`\n\n```bash\n$ # change foo if not preceded by _\n$ # note how 'foo' at start of line is matched as well\n$ # same as: perl -pe 's/(?<!_)foo/baz/g'\n$ echo 'foo _foo 1foo' | ruby -pe 'gsub(/(?<!_)foo/, \"baz\")'\nbaz _foo 1baz\n\n$ # join each line in paragraph by replacing newline character\n$ # except the one at end of paragraph\n$ # same as: perl -00 -pe 's/\\n(?!$)/. 
/g' sample.txt\n$ ruby -00 -pe 'gsub(/\\n(?!$)/, \". \")' sample.txt\nHello World\n\nGood day. How are you\n\nJust do-it. Believe it\n\nToday is sunny. Not a bit funny. No doubt you like it too\n\nMuch ado about nothing. He he he\n```\n\n* capture groups can also be used inside lookarounds\n\n```bash\n$ # same as: perl -pe 's/(\\H+\\h+)(?=(\\H+)\\h)/$1$2\\n/g'\n$ # %q cannot be used here as \\n is not meaningful inside single quotes\n$ echo 'a b c d e' | ruby -lpe 'gsub(/(\\S+\\s+)(?=(\\S+)\\s)/, \"\\\\1\\\\2\\n\")'\na b\nb c\nc d\nd e\n```\n\n* `\\K` helps as a workaround for some of the variable-length lookbehind cases\n* See also [stackoverflow - Variable-length lookbehind-assertion alternatives](https://stackoverflow.com/questions/11640447/variable-length-lookbehind-assertion-alternatives-for-regular-expressions)\n\n```bash\n$ echo '1 and 2 and 3 land 4' | ruby -pe 'sub(/(?<=(and.*?){2})and/, \"-\")'\n-e:1: invalid pattern in look-behind: /(?<=(and.*?){2})and/\n\n$ # \\K helps in such cases\n$ # same as: sed 's/and/-/3' and perl -pe 's/(and.*?){2}\\Kand/-/'\n$ echo '1 and 2 and 3 land 4' | ruby -pe 'sub(/(and.*?){2}\\Kand/, \"-\")'\n1 and 2 and 3 l- 4\n```\n\n* don't use `\\K` if there are consecutive matches\n* this is because of how the regexp engine has been implemented, `perl` or `vim`'s `\\zs` don't have this limitation\n\n```bash\n$ echo ',,' | perl -pe 's/,\\K/foo/g'\n,foo,foo\n$ echo ',,' | ruby -pe 'gsub(/,\\K/, \"foo\")'\n,foo,\n$ echo ',,' | ruby -pe 'gsub(/(?<=,)/, \"foo\")'\n,foo,foo\n\n$ # another example\n$ echo '\"foo\",\"12,34\",\"good\"' | perl -F'/\"\\K,(?=\")/' -lane 'print $F[1]'\n\"12,34\"\n$ echo '\"foo\",\"12,34\",\"good\"' | ruby -F'\"\\K,(?=\")' -lane 'print $F[1]'\n\"12,34\n$ echo '\"foo\",\"12,34\",\"good\"' | ruby -F'(?<=\"),(?=\")' -lane 'print $F[1]'\n\"12,34\"\n```\n\n<br>\n\n#### <a name=\"special-capture-groups\"></a>Special capture groups\n\n* `\\1`, `\\2` etc only matches exact string\n* `\\g<1>`, `\\g<2>` etc re-uses the 
regular expression itself\n\n```bash\n$ s='baz 2008-03-24 and 2012-08-12 foo 2016-03-25'\n$ # same as: perl -pe 's/(\\d{4}-\\d{2}-\\d{2}) and (?1)/XYZ/'\n$ echo \"$s\" | ruby -pe 'sub(/(\\d{4}-\\d{2}-\\d{2}) and \\g<1>/, \"XYZ\")'\nbaz XYZ foo 2016-03-25\n\n$ # using \\1 won't work as the two dates are different\n$ echo \"$s\" | ruby -pe 'sub(/(\\d{4}-\\d{2}-\\d{2}) and \\1/, \"\")'\nbaz 2008-03-24 and 2012-08-12 foo 2016-03-25\n```\n\n* use `(?:` to group a regular expression without capturing it, so that the group isn't counted for backreferences\n* See also [stackoverflow - what is non-capturing group](https://stackoverflow.com/questions/3512471/what-is-a-non-capturing-group-what-does-do)\n\n```bash\n$ # using ?: helps to focus only on required capture groups\n$ # same as: perl -pe 's/(?:co|fo)\\K(\\w)(\\w)/$2$1/g'\n$ echo 'cod1 foo_bar' | ruby -pe 'gsub(/(?:co|fo)\\K(\\w)(\\w)/, %q/\\2\\1/)'\nco1d fo_obar\n\n$ # without ?: you'd need to remember all the other groups as well\n$ echo 'cod1 foo_bar' | ruby -pe 'gsub(/(co|fo)\\K(\\w)(\\w)/, %q/\\3\\2/)'\nco1d fo_obar\n```\n\n* named capture groups `(?<name>` or `(?'name'`\n* for backreference, use `\\k<name>`\n* named capture groups and normal (numbered) capture groups cannot be used together in the same regexp\n\n```bash\n$ # same as: perl -pe 's/(?<fw>\\w+) (?<sw>\\w+)/$+{sw} $+{fw}/'\n$ echo 'foo 123' | ruby -pe 'sub(/(?<fw>\\w+) (?<sw>\\w+)/, %q/\\k<sw> \\k<fw>/)'\n123 foo\n\n$ # also useful to transform different capture groups\n$ s='\"foo,bar\",123,\"x,y,z\",42'\n$ # same as: perl -lpe 's/\"(?<a>[^\"]+)\",|(?<a>[^,]+),/$+{a}|/g'\n$ echo \"$s\" | ruby -lpe 'gsub(/\"(?<a>[^\"]+)\",|(?<a>[^,]+),/, %q/\\k<a>|/)'\nfoo,bar|123|x,y,z|42\n```\n\n**Further Reading**\n\n* [rexegg - all the (? 
usages](https://www.rexegg.com/regex-disambiguation.html)\n* [regular-expressions - recursion](https://www.regular-expressions.info/recurse.html#balanced)\n* [stackoverflow - Recursive nested matching pairs of curly braces](https://stackoverflow.com/questions/19486686/recursive-nested-matching-pairs-of-curly-braces-in-ruby-regex)\n\n<br>\n\n#### <a name=\"modifiers\"></a>Modifiers\n\n* use `i` modifier to ignore case while matching\n\n```bash\n$ ruby -ne 'print if /rose/i' poem.txt\nRoses are red,\n\n$ echo 'foo 123 FoO' | ruby -pe 'gsub(/foo/i, \"good\")'\ngood 123 good\n```\n\n* by default, `.` doesn't match the newline character\n* `m` modifier allows `.` metacharacter to match newline character as well\n\n```bash\n$ # searching for a match which can span across multiple lines\n\n$ # no output as . doesn't match newline\n$ ruby -00 -ne 'print if /do.*he/' sample.txt\n\n$ # same as: perl -00 -ne 'print if /do.*he/s' sample.txt\n$ ruby -00 -ne 'print if /do.*he/m' sample.txt\nMuch ado about nothing\nHe he he\n```\n\n<br>\n\n#### <a name=\"code-in-replacement-section\"></a>Code in replacement section\n\n* block form allows to use `ruby` code for replacement section\n\nquoting from [ruby-doc: gsub](https://ruby-doc.org/core-2.5.0/String.html#method-i-gsub)\n\n>In the block form, the current match string is passed in as a parameter, and variables such as $1, $2, $`, $&, and $' will be set appropriately. 
The value returned by the block will be substituted for the match on each call.\n\n* `$1`, `$2`, etc are equivalent of `\\1`, `\\2`, etc\n* `$&` is equivalent of `\\&`(or `\\0`) - i.e the entire matched string\n\n\n```bash\n$ # replace numbers with their squares, same as: perl -pe 's/\\d+/$&**2/ge'\n$ echo '4 and 10' | ruby -pe 'gsub(/\\d+/){$&.to_i ** 2}'\n16 and 100\n\n$ # replace matched string with incremental value\n$ # same as: perl -pe 's/\\d+/++$c/ge'\n$ echo '4 and 10 foo 57' | ruby -pe 'BEGIN{c=0}; gsub(/\\d+/){c+=1}'\n1 and 2 foo 3\n\n$ # replace with string length, same as: perl -pe 's/\\w+/length($&)/ge'\n$ echo 'food:12:explain:789' | ruby -pe 'gsub(/\\w+/){$&.length}'\n4:2:7:3\n\n$ # formatting string, same as: perl -lpe 's/[^-]+/sprintf \"%04s\", $&/ge'\n$ echo 'a1-2-deed' | ruby -lpe 'gsub(/[^-]+/){ $&.rjust(4, \"0\") }'\n00a1-0002-deed\n\n$ # applying another substitution to matched string\n$ # same as: perl -pe 's/\"[^\"]+\"/$&=~s|a|A|gr/ge'\n$ echo '\"mango\" and \"guava\"' | ruby -pe 'gsub(/\"[^\"]+\"/){$&.gsub(/a/, \"A\")}'\n\"mAngo\" and \"guAvA\"\n```\n\n* replacing specific occurrence\n\n```bash\n$ # replacing 2nd occurrence, same as: sed 's/:/-/2'\n$ # same as: perl -pe '$c=0; s/:/++$c==2 ? \"-\" : $&/ge'\n$ echo 'foo:123:bar:baz' | ruby -pe 'c=0; gsub(/:/){(c+=1)==2 ? \"-\" : $&}'\nfoo:123-bar:baz\n$ # or use non-greedy matching, same as: sed 's/and/-/3'\n$ echo 'foo and bar and baz land good' | ruby -pe 'sub(/(and.*?){2}\\Kand/, \"-\")'\nfoo and bar and baz l- good\n\n$ # emulating GNU sed's number+g modifier\n$ a='456:foo:123:bar:789:baz\nx:y:z:a:v:xc:gf'\n$ echo \"$a\" | sed 's/:/-/3g'\n456:foo:123-bar-789-baz\nx:y:z-a-v-xc-gf\n$ # same as: perl -pe '$c=0; s/:/++$c<3 ? $& : \"-\"/ge'\n$ echo \"$a\" | ruby -pe 'c=0; gsub(/:/){(c+=1)<3 ? 
$& : \"-\"}'\n456:foo:123-bar-789-baz\nx:y:z-a-v-xc-gf\n```\n\n<br>\n\n#### <a name=\"quoting-metacharacters\"></a>Quoting metacharacters\n\n* to match contents of string variable exactly, all metacharacters need to be escaped\n* See [ruby-doc: Regexp.escape](https://ruby-doc.org/core-2.5.0/Regexp.html#method-c-escape) for syntax details\n\n```bash\n$ cat eqns.txt\na=b,a-b=c,c*d\na+b,pi=3.14,5e12\ni*(t+9-g)/8,4-a+b\n\n$ # since + is a metacharacter, no match found\n$ # note that #{} allows interpolation\n$ s='a+b' ruby -ne 'print if /#{ENV[\"s\"]}/' eqns.txt\n\n$ # same as: s='a+b' perl -ne 'print if /\\Q$ENV{s}/' eqns.txt\n$ s='a+b' ruby -ne 'print if /#{Regexp.escape(ENV[\"s\"])}/' eqns.txt\na+b,pi=3.14,5e12\ni*(t+9-g)/8,4-a+b\n\n$ # use regexp as needed around variable content, for ex: end of line anchor\n$ ruby -pe 'BEGIN{s=\"a+b\"}; sub(/#{Regexp.escape(s)}$/, \"a**b\")' eqns.txt\na=b,a-b=c,c*d\na+b,pi=3.14,5e12\ni*(t+9-g)/8,4-a**b\n```\n\n<br>\n\n## <a name=\"two-file-processing\"></a>Two file processing\n\nFirst, a bit about `ARGV` which allows to keep track of which file is being processed\n\n```bash\n$ # similar to: perl -lne 'print $#ARGV' <(seq 2) <(seq 3) <(seq 1)\n$ ruby -ne 'puts ARGV.length' <(seq 2) <(seq 3) <(seq 1)\n2\n2\n1\n1\n1\n0\n```\n\n<br>\n\n#### <a name=\"comparing-whole-lines\"></a>Comparing whole lines\n\nConsider the following test files\n\n```bash\n$ cat colors_1.txt\nBlue\nBrown\nPurple\nRed\nTeal\nYellow\n\n$ cat colors_2.txt\nBlack\nBlue\nGreen\nRed\nWhite\n```\n\n* `-r` command line option allows to specify library required\n    * the `include?` method allows to check if `set` already contains the element\n    * See [ruby-doc: include?](https://ruby-doc.org/stdlib-2.5.0/libdoc/set/rdoc/Set.html#method-i-include-3F) for syntax details\n\n```bash\n$ # common lines\n$ # note that all duplicates matching in second file would get printed\n$ # same as: perl -ne 'if(!$#ARGV){$h{$_}=1; next}\n$ #            print if $h{$_}' colors_1.txt 
colors_2.txt\n$ ruby -rset -ne 'BEGIN{s=Set.new}; s.add($_) && next if ARGV.length==1;\n                  print if s.include?($_)' colors_1.txt colors_2.txt\nBlue\nRed\n\n$ # lines from colors_2.txt not present in colors_1.txt\n$ ruby -rset -ne 'BEGIN{s=Set.new}; s.add($_) && next if ARGV.length==1;\n                  print if !s.include?($_)' colors_1.txt colors_2.txt\nBlack\nGreen\nWhite\n\n$ # next - to skip rest of code and process next input line\n$ # here used to skip rest of code as long as first file is being processed\n$ # alternate: ARGV.length==1 ? s.add($_) : s.include?($_) && print\n```\n\nalternate solution by using set operations available for arrays\n\n* [ruby-doc: ARGF](https://ruby-doc.org/core-2.5.0/ARGF.html) filehandle allows to read from filename arguments supplied to script\n    * if filename arguments are not present, it would act upon stdin\n* `STDIN` filehandle allows to read from stdin\n* [ruby-doc: readlines](https://ruby-doc.org/core-2.5.0/IO.html#method-c-readlines) method allows to read all the lines as an array\n    * if filehandle is not specified, default is ARGF\n* some comparison notes\n    * both files will get saved as array in memory here, while previous solution would save only first file\n    * duplicates would get removed here\n    * likely to be faster compared to previous solution\n\n```bash\n$ # note that -n/-p options are not used\n$ # and puts is helpful here as record separator is newline character\n\n$ # common lines, output order is based on array to left of & operator\n$ ruby -e 'f1=STDIN.readlines; f2=readlines;\n           puts f1 & f2' <colors_1.txt colors_2.txt\nBlue\nRed\n\n$ # lines from colors_2.txt not present in colors_1.txt\n$ ruby -e 'f1=STDIN.readlines; f2=readlines;\n           puts f2 - f1' <colors_1.txt colors_2.txt\nBlack\nGreen\nWhite\n\n$ # for union, use either of these\n$ # ruby -e 'f1=STDIN.readlines; f2=readlines;\n$ #          puts f1 | f2' <colors_1.txt colors_2.txt\n$ # ruby -e 'puts 
readlines.uniq' colors_1.txt colors_2.txt\n```\n\n<br>\n\n#### <a name=\"comparing-specific-fields\"></a>Comparing specific fields\n\nConsider the sample input file\n\n```bash\n$ cat marks.txt\nDept    Name    Marks\nECE     Raj     53\nECE     Joel    72\nEEE     Moi     68\nCSE     Surya   81\nEEE     Tia     59\nECE     Om      92\nCSE     Amy     67\n```\n\n* single field\n* For ex: only first field comparison instead of entire line as key\n\n```bash\n$ cat list1\nECE\nCSE\n\n$ # extract only lines matching first field specified in list1\n$ ruby -rset -ane 'BEGIN{s=Set.new}; s.add($F[0]) && next if ARGV.length==1;\n                   print if s.include?($F[0])' list1 marks.txt\nECE     Raj     53\nECE     Joel    72\nCSE     Surya   81\nECE     Om      92\nCSE     Amy     67\n```\n\n* multiple field comparison\n\n```bash\n$ cat list2\nEEE Moi\nCSE Amy\nECE Raj\n\n$ # $F[0..1] will return array with elements specified by range (0 to 1 here)\n$ ruby -rset -ane 'BEGIN{s=Set.new}; s.add($F[0..1]) && next if ARGV.length==1;\n                   print if s.include?($F[0..1])' list2 marks.txt\nECE     Raj     53\nEEE     Moi     68\nCSE     Amy     67\n```\n\n* field and value comparison\n* here, we use [hash](https://ruby-doc.org/core-2.5.0/Hash.html) as well to save values based on a key\n\n```bash\n$ cat list3\nECE 70\nEEE 65\nCSE 80\n\n$ # extract line matching Dept and minimum marks specified in list3\n$ ruby -rset -ane 'BEGIN{d=Set.new; m={}};\n                   (d.add($F[0]); m[$F[0]]=$F[1]) && next if ARGV.length==1;\n                   print if d.include?($F[0]) && $F[2]>=m[$F[0]]' list3 marks.txt\nECE     Joel    72\nEEE     Moi     68\nCSE     Surya   81\nECE     Om      92\n```\n\n<br>\n\n#### <a name=\"line-number-matching\"></a>Line number matching\n\n```bash\n$ # replace mth line in poem.txt with nth line from list1\n$ # same as: m=3 n=2 perl -pe 'BEGIN{ $s=<> while $ENV{n}-- > 0; close ARGV}\n$ #                    $_=$s if $.==$ENV{m}' list1 
poem.txt\n$ m=3 n=2 ruby -pe 'BEGIN{ENV[\"n\"].to_i.times { $s=gets }; ARGF.close };\n                    $_=$s if $.==ENV[\"m\"].to_i' list1 poem.txt\nRoses are red,\nViolets are blue,\nCSE\nAnd so are you.\n\n$ # print line from fruits.txt if corresponding line from nums.txt is +ve number\n$ # same as: <nums.txt perl -ne 'print if <STDIN> > 0' fruits.txt\n$ # line from fruits.txt is saved first as STDIN.gets will also set $_\n$ <nums.txt ruby -ne 'ln=$_; print ln if STDIN.gets.to_i>0' fruits.txt\nfruit   qty\nbanana  31\n$ # can also use:\n$ # ruby -e 'STDIN.readlines.zip(readlines).each {|a| puts a[1] if a[0].to_i>0}'\n```\n\nFor syntax and implementation details, see\n\n* [ruby-doc: ARGF](https://ruby-doc.org/core-2.5.0/ARGF.html)\n* [ruby-doc: times](https://ruby-doc.org/core-2.5.0/Integer.html#method-i-times)\n* [ruby-doc: gets](https://ruby-doc.org/core-2.5.0/IO.html#method-i-gets)\n\n<br>\n\n## <a name=\"creating-new-fields\"></a>Creating new fields\n\n* See [ruby-doc: slice](https://ruby-doc.org/core-2.5.0/Array.html#method-i-slice) for syntax details\n\n```bash\n$ s='foo,bar,123,baz'\n\n$ # to reduce fields, use slice method\n$ # same as: echo \"$s\" | perl -F, -lane '$,=\",\"; $#F=1; print @F'\n$ # 1st arg - starting index, 2nd arg - number of elements\n$ echo \"$s\" | ruby -F, -lane '$F.slice!(-2,2); print $F * \",\"'\nfoo,bar\n\n$ # assigning to field greater than length will create empty fields as needed\n$ # same as: echo \"$s\" | perl -F, -lane '$,=\",\"; $F[6]=42; print @F'\n$ echo \"$s\" | ruby -F, -lane '$F[6]=42; print $F * \",\"'\nfoo,bar,123,baz,,,42\n```\n\n* adding a field based on existing fields\n* See [ruby-doc: Percent Strings](https://ruby-doc.org/core-2.5.0/doc/syntax/literals_rdoc.html#label-Percent+Strings) for details on `%w`\n\n```bash\n$ # adding a new 'Grade' field\n$ # same as: perl -lane 'BEGIN{$,=\"\\t\"; @g = qw(D C B A S)}\n$ #          push @F, $.==1 ? 
\"Grade\" : $g[$F[-1]/10 - 5]; print @F' marks.txt\n$ ruby -lane 'BEGIN{g = %w[D C B A S]};\n              $F.push($.==1 ? \"Grade\" : g[$F[-1].to_i/10 - 5]);\n              print $F * \"\\t\"' marks.txt\nDept    Name    Marks   Grade\nECE     Raj     53      D\nECE     Joel    72      B\nEEE     Moi     68      C\nCSE     Surya   81      A\nEEE     Tia     59      D\nECE     Om      92      S\nCSE     Amy     67      C\n```\n\n<br>\n\n## <a name=\"multiple-file-input\"></a>Multiple file input\n\n* processing based on line-number/begin/end of each input file\n\n```bash\n$ # same as: perl -ne 'print if $.==2; close ARGV if eof'\n$ # ARGF.close will reset $. to 0\n$ ruby -ne 'print if $.==2; ARGF.close if $<.eof' poem.txt greeting.txt\nViolets are blue,\nHave a safe journey\n\n$ # same as: perl -lne 'print \"file: $ARGV\" if $.==1;\n$ #            print \"$_\\n------\" and close ARGV if eof' poem.txt greeting.txt\n$ ruby -lne 'print \"file: #{ARGF.filename}\" if $.==1;\n             (print \"#{$_}\\n------\"; ARGF.close) if $<.eof' poem.txt greeting.txt\nfile: poem.txt\nAnd so are you.\n------\nfile: greeting.txt\nHave a safe journey\n------\n```\n\n* to skip remaining lines from current file being processed and move on to next file\n\n```bash\n$ # same as: perl -pe 'close ARGV if $.>=1' poem.txt greeting.txt fruits.txt\n$ ruby -pe 'ARGF.close if $.>=1' poem.txt greeting.txt fruits.txt\nRoses are red,\nHello there\nfruit   qty\n\n$ # same as: perl -lane 'print $ARGV and close ARGV if $F[0] =~ /red/i' *\n$ ruby -ane '(puts ARGF.filename; ARGF.close) if $F[0] =~ /red/i' *\ncolors_1.txt\ncolors_2.txt\n```\n\n<br>\n\n## <a name=\"dealing-with-duplicates\"></a>Dealing with duplicates\n\n* retain only first copy of duplicates\n* `-r` command line option allows to specify library required\n* here, `set` data type is used to keep track of unique values - be it whole line or a particular field\n    * the `add?` method will add element to `set` and returns `nil` if element 
already exists\n    * See [ruby-doc: add?](https://ruby-doc.org/stdlib-2.5.0/libdoc/set/rdoc/Set.html#method-i-add-3F) for syntax details\n\n```bash\n$ cat duplicates.txt\nabc  7   4\nfood toy ****\nabc  7   4\ntest toy 123\ngood toy ****\n\n$ # whole line, same as: perl -ne 'print if !$seen{$_}++' duplicates.txt\n$ ruby -rset -ne 'BEGIN{s=Set.new}; print if s.add?($_)' duplicates.txt\nabc  7   4\nfood toy ****\ntest toy 123\ngood toy ****\n\n$ # particular column, same as: perl -ane 'print if !$seen{$F[1]}++'\n$ ruby -rset -ane 'BEGIN{s=Set.new}; print if s.add?($F[1])' duplicates.txt\nabc  7   4\nfood toy ****\n\n$ # total count, same as: perl -lane '$c++ if !$seen{$F[1]}++; END{print $c}'\n$ ruby -rset -ane 'BEGIN{s=Set.new}; s.add($F[1]);\n                   END{puts s.length}' duplicates.txt\n2\n```\n\n* multiple fields\n\n```bash\n$ # same as: perl -ane 'print if !$seen{$F[1],$F[2]}++' duplicates.txt\n$ # $F[1..2] will return an array with fields 2 and 3 as elements\n$ ruby -rset -ane 'BEGIN{s=Set.new}; print if s.add?($F[1..2])' duplicates.txt\nabc  7   4\nfood toy ****\ntest toy 123\n```\n\n* retaining only last copy of duplicate\n\n```bash\n$ # reverse the input line-wise, retain first copy and then reverse again\n$ # same as: tac duplicates.txt | perl -ane 'print if !$seen{$F[1]}++' | tac\n$ tac duplicates.txt | ruby -rset -ane 'BEGIN{s=Set.new};\n                       print if s.add?($F[1])' | tac\nabc  7   4\ngood toy ****\n```\n\n* for count based filtering (other than first/last count), use a `hash`\n* `Hash.new(0)` will initialize value of new key to `0`\n\n```bash\n$ # second occurrence of duplicate\n$ # same as: perl -ane 'print if ++$h{$F[1]}==2' duplicates.txt\n$ ruby -ane 'BEGIN{h=Hash.new(0)}; print if (h[$F[1]]+=1)==2' duplicates.txt\nabc  7   4\ntest toy 123\n\n$ # third occurrence of duplicate\n$ # same as: perl -ane 'print if ++$h{$F[1]}==3' duplicates.txt\n$ ruby -ane 'BEGIN{h=Hash.new(0)}; print if (h[$F[1]]+=1)==3' duplicates.txt\ngood 
toy ****\n```\n\n* filtering based on duplicate count\n* helps to emulate [uniq](./sorting_stuff.md#uniq) command for specific fields\n\n```bash\n$ # all duplicates based on 1st column\n$ # same as: perl -ane '!$#ARGV ? $x{$F[0]}++ : $x{$F[0]}>1 && print'\n$ ruby -ane 'BEGIN{h=Hash.new(0)}; ARGV.length==1 ? h[$F[0]]+=1 :\n              h[$F[0]]>1 && print' duplicates.txt duplicates.txt\nabc  7   4\nabc  7   4\n\n$ # more than 2 duplicates based on 2nd column\n$ ruby -ane 'BEGIN{h=Hash.new(0)}; ARGV.length==1 ? h[$F[1]]+=1 :\n              h[$F[1]]>2 && print' duplicates.txt duplicates.txt\nfood toy ****\ntest toy 123\ngood toy ****\n\n$ # only unique lines based on 3rd column\n$ ruby -ane 'BEGIN{h=Hash.new(0)}; ARGV.length==1 ? h[$F[2]]+=1 :\n              h[$F[2]]==1 && print' duplicates.txt duplicates.txt\ntest toy 123\n```\n\n<br>\n\n#### <a name=\"using-uniq-method\"></a>using uniq method\n\n* [ruby-doc: uniq](https://ruby-doc.org/core-2.5.0/Array.html#method-i-uniq)\n* original order is maintained\n\n```bash\n$ # same as: ruby -rset -ne 'BEGIN{s=Set.new}; print if s.add?($_)'\n$ ruby -e 'puts readlines.uniq' duplicates.txt\nabc  7   4\nfood toy ****\ntest toy 123\ngood toy ****\n\n$ # same as: ruby -rset -ane 'BEGIN{s=Set.new}; print if s.add?($F[1])'\n$ ruby -e 'puts readlines.uniq {|s| s.split[1]}' duplicates.txt\nabc  7   4\nfood toy ****\n\n$ # same as: ruby -rset -ane 'BEGIN{s=Set.new}; print if s.add?($F[1..2])'\n$ ruby -e 'puts readlines.uniq {|s| s.split[1..2]}' duplicates.txt\nabc  7   4\nfood toy ****\ntest toy 123\n```\n\n<br>\n\n## <a name=\"lines-between-two-regexps\"></a>Lines between two REGEXPs\n\n* This section deals with filtering lines bounded by two *REGEXP*s (referred to as blocks)\n* For simplicity, the two *REGEXP*s used in most of the examples below are the strings **BEGIN** and **END**\n\n<br>\n\n#### <a name=\"all-unbroken-blocks\"></a>All unbroken blocks\n\nConsider the sample input file below, which doesn't have any broken blocks (i.e. 
**BEGIN** and **END** are always present in pairs)\n\n```bash\n$ cat range.txt\nfoo\nBEGIN\n1234\n6789\nEND\nbar\nBEGIN\na\nb\nc\nEND\nbaz\n```\n\n* Extracting lines between starting and ending *REGEXP*\n\n```bash\n$ # include both starting/ending REGEXP\n$ # same as: perl -ne '$f=1 if /BEGIN/; print if $f; $f=0 if /END/'\n$ ruby -ne '$f=1 if /BEGIN/; print if $f==1; $f=0 if /END/' range.txt\nBEGIN\n1234\n6789\nEND\nBEGIN\na\nb\nc\nEND\n\n$ # can also use: ruby -ne 'print if /BEGIN/../END/' range.txt\n$ # which is similar to sed -n '/BEGIN/,/END/p'\n$ # but not suitable to extend for other cases\n```\n\n* other variations\n\n```bash\n$ # exclude both starting/ending REGEXP\n$ # same as: perl -ne '$f=0 if /END/; print if $f; $f=1 if /BEGIN/'\n$ ruby -ne '$f=0 if /END/; print if $f==1; $f=1 if /BEGIN/' range.txt\n1234\n6789\na\nb\nc\n\n$ # check out what these do:\n$ ruby -ne '$f=1 if /BEGIN/; $f=0 if /END/; print if $f==1' range.txt\n$ ruby -ne 'print if $f==1; $f=0 if /END/; $f=1 if /BEGIN/' range.txt\n```\n\n* Extracting lines other than lines between the two *REGEXP*s\n\n```bash\n$ # same as: perl -ne '$f=1 if /BEGIN/; print if !$f; $f=0 if /END/'\n$ # can also use: ruby -ne 'print if !(/BEGIN/../END/)' range.txt\n$ ruby -ne '$f=1 if /BEGIN/; print if $f!=1; $f=0 if /END/' range.txt\nfoo\nbar\nbaz\n\n$ # the other three cases would be\n$ ruby -ne '$f=0 if /END/; print if $f!=1; $f=1 if /BEGIN/' range.txt\n$ ruby -ne 'print if $f!=1; $f=1 if /BEGIN/; $f=0 if /END/' range.txt\n$ ruby -ne '$f=1 if /BEGIN/; $f=0 if /END/; print if $f!=1' range.txt\n```\n\n<br>\n\n#### <a name=\"specific-blocks\"></a>Specific blocks\n\n* Getting first block\n\n```bash\n$ # same as: perl -ne '$f=1 if /BEGIN/; print if $f; exit if /END/'\n$ ruby -ne '$f=1 if /BEGIN/; print if $f==1; exit if /END/' range.txt\nBEGIN\n1234\n6789\nEND\n\n$ # use other tricks discussed in previous section as needed\n$ ruby -ne 'exit if /END/; print if $f==1; $f=1 if /BEGIN/' range.txt\n1234\n6789\n```\n\n* 
Getting last block\n\n```bash\n$ # reverse input linewise, change the order of REGEXPs, finally reverse again\n$ tac range.txt | ruby -ne '$f=1 if /END/; print if $f==1; exit if /BEGIN/' | tac\nBEGIN\na\nb\nc\nEND\n\n$ # or, save the blocks in a buffer and print the last one alone\n$ # same as: seq 30 | perl -ne 'if(/4/){$f=1; $b=$_; next}\n$ #                     $b.=$_ if $f; $f=0 if /6/; END{print $b}'\n$ # << operator concatenates given string to the variable in-place\n$ seq 30 | ruby -ne '($f=1; $b=$_) && next if /4/;\n                     $b << $_ if $f==1; $f=0 if /6/; END{print $b}'\n24\n25\n26\n```\n\n* Getting blocks based on a counter\n\n```bash\n$ # get only 2nd block\n$ # same as: b=2 perl -ne '$c++ if /4/; if($c==$ENV{b}){print; exit if /6/}'\n$ seq 30 | b=2 ruby -ne 'BEGIN{c=0}; c+=1 if /4/;\n                         c==ENV[\"b\"].to_i && (print; exit if /6/)'\n14\n15\n16\n\n$ # to get all blocks greater than 'b' blocks\n$ seq 30 | b=1 ruby -ne 'BEGIN{c=0}; ($f=1; c+=1) if /4/;\n                         print if $f==1 && c>ENV[\"b\"].to_i; $f=0 if /6/'\n14\n15\n16\n24\n25\n26\n```\n\n* excluding a particular block\n\n```bash\n$ # excludes 2nd block\n$ seq 30 | b=2 ruby -ne 'BEGIN{c=0}; ($f=1; c+=1) if /4/;\n                         print if $f==1 && c!=ENV[\"b\"].to_i; $f=0 if /6/'\n4\n5\n6\n24\n25\n26\n```\n\n* extract block only if it matches another string as well\n\n```bash\n$ # string to match inside block: 23\n$ # same as: perl -ne 'if(/BEGIN/){$f=1; $m=0; $b=\"\"}; $m=1 if $f && /23/;\n$ #            $b.=$_ if $f; if(/END/){print $b if $m; $f=0}' range.txt\n$ ruby -ne '($f=1; $m=0; $b=\"\") if /BEGIN/; $m=1 if $f==1 && /23/;\n            $b<<$_ if $f==1; (print $b if $m==1; $f=0) if /END/' range.txt\nBEGIN\n1234\n6789\nEND\n\n$ # line to match inside block: 5 or 25\n$ seq 30 | ruby -ne '($f=1; $m=0; $b=\"\") if /4/; $m=1 if $f==1 && /^2?5$/;\n                     $b<<$_ if $f==1; (print $b if $m==1; $f=0) if 
/6/'\n4\n5\n6\n24\n25\n26\n```\n\n<br>\n\n#### <a name=\"broken-blocks\"></a>Broken blocks\n\n* If there are blocks with ending *REGEXP* but without corresponding start, earlier techniques used will suffice\n* Consider the modified input file where starting *REGEXP* doesn't have corresponding ending\n\n```bash\n$ cat broken_range.txt\nfoo\nBEGIN\n1234\n6789\nEND\nbar\nBEGIN\na\nb\nc\nbaz\n\n$ # the file reversing trick comes in handy here as well\n$ tac broken_range.txt | ruby -ne '$f=1 if /END/;\n                         print if $f==1; $f=0 if /BEGIN/' | tac\nBEGIN\n1234\n6789\nEND\n```\n\n* But if both kinds of broken blocks are present, for ex:\n\n```bash\n$ cat multiple_broken.txt\nqqqqqqq\nBEGIN\nfoo\nBEGIN\n1234\n6789\nEND\nbar\nEND\n0-42-1\nBEGIN\na\nBEGIN\nb\nEND\nxyzabc\n```\n\nthen use buffers to accumulate the records and print accordingly\n\n```bash\n$ # same as: perl -ne 'if(/BEGIN/){$f=1; $b=$_; next} $b.=$_ if $f;\n$ #            if(/END/){$f=0; print $b if $b; $b=\"\"}' multiple_broken.txt\n$ ruby -ne '($f=1; $b=$_) && next if /BEGIN/; $b << $_ if $f==1;\n            ($f=0; print $b if $b!=\"\"; $b=\"\") if /END/' multiple_broken.txt\nBEGIN\n1234\n6789\nEND\nBEGIN\nb\nEND\n\n$ # note how buffer is initialized as well as cleared\n$ # on matching beginning/end REGEXPs respectively\n```\n\n<br>\n\n## <a name=\"array-operations\"></a>Array operations\n\nSee [ruby-doc: Array](https://ruby-doc.org/core-2.5.0/Array.html) for various ways to initialize and methods available\n\n* initialization\n\n```bash\n$ # as comma separated values, indexing starts at 0\n$ ruby -le 'sq = [1, 4, 9, 16]; print sq[2]'\n9\n$ ruby -le 'a = [123, \"foo\", \"baz789\"]; print a[1]'\nfoo\n$ # -ve indexing, -1 for last element, -2 for second last, etc\n$ ruby -le 'foo = [2, \"baz\", [\"a\", \"b\"]]; print foo[-1]'\n[\"a\", \"b\"]\n\n$ # variables can be used, double quoted string will interpolate\n$ ruby -le 'a=5; b=[\"a\", \"b\"]; c=[a, 789, b]; print c'\n[5, 789, [\"a\", 
\"b\"]]\n$ ruby -le 'c=[89, \"a\\nb\"]; print c[-1]'\na\nb\n\n$ # %w allows space separated string values, no interpolation\n$ ruby -le 'b = %w[123 foo baz789]; print b[1]'\nfoo\n$ ruby -le 's = %w[foo \"baz\" \"a\\nb\"]; print s[-1]'\n\"a\\nb\"\n```\n\n* array slices\n* See also [ruby-doc: Array to Arguments Conversion](https://ruby-doc.org/core-2.5.0/doc/syntax/calling_methods_rdoc.html#label-Array+to+Arguments+Conversion)\n\n```bash\n$ # accessing more than one element in random order\n$ echo 'a b c d' | ruby -lane 'print $F.values_at(0,-1,2) * \" \"'\na d c\n$ echo 'a b c d' | ruby -lane 'i=[0, -1, 2]; print $F.values_at(*i) * \" \"'\na d c\n\n$ # starting index and number of elements needed from that index\n$ echo 'a b c d' | ruby -lane 'print $F[0,3] * \" \"'\na b c\n$ # range operator, arguments are start/end indexes\n$ echo 'a b c d' | ruby -lane 'print $F[1..3] * \" \"'\nb c d\n\n$ # n elements from start, can also use 'first' method instead of 'take'\n$ echo 'a b c d' | ruby -lane 'print $F.take(2) * \" \"'\na b\n$ # remaining elements after ignoring n elements from start\n$ echo 'a b c d' | ruby -lane 'print $F.drop(3) * \" \"'\nd\n$ # n elements from end\n$ echo 'a b c d' | ruby -lane 'print $F.last(3) * \" \"'\nb c d\n```\n\n* looping\n\n```bash\n$ # by element value, use 'reverse_each' to iterate in reversed order\n$ # can also use range here: ruby -e '(1..4).each {|n| puts n*2}'\n$ ruby -e 'nums=[1, 2, 3, 4]; nums.each {|n| puts n*2}'\n2\n4\n6\n8\n\n$ # by index\n$ ruby -e 'books=%w[Elantris Martian Dune Alchemist]\n           books.each_index {|i| puts \"#{i+1}) #{books[i]}\"}'\n1) Elantris\n2) Martian\n3) Dune\n4) Alchemist\n```\n\n<br>\n\n#### <a name=\"filtering\"></a>Filtering\n\n* based on regexp\n\n```ruby\n$ s='foo:123:bar:baz'\n$ echo \"$s\" | ruby -F: -lane 'print $F.grep(/[a-z]/) * \":\"'\nfoo:bar:baz\n\n$ words='tryst fun glyph pity why'\n$ echo \"$words\" | ruby -lane 'puts $F.grep(/[a-g]/)'\nfun\nglyph\n\n$ # grep_v inverts the 
selection\n$ echo \"$words\" | ruby -lane 'puts $F.grep_v(/[aeiou]/)'\ntryst\nglyph\nwhy\n```\n\n* use `select` or `reject` for generic conditions\n\n```bash\n$ # to get index instead of matches\n$ s='foo:123:bar:baz'\n$ echo \"$s\" | ruby -F: -lane 'print $F.each_index.select{|i| $F[i] =~ /[a-z]/}'\n[0, 2, 3]\n\n$ # based on numeric value\n$ s='23 756 -983 5'\n$ echo \"$s\" | ruby -lane 'print $F.select { |s| s.to_i < 100 } * \" \"'\n23 -983 5\n\n$ # filters only those elements with successful substitution\n$ # for opposite, either use negated condition or use reject instead of select\n$ echo \"$s\" | ruby -lane 'print $F.select { |s| s.sub!(/3/, \"E\") } * \" \"'\n2E -98E\n```\n\n* random element(s)\n\n```bash\n$ s='65 23 756 -983 5'\n$ echo \"$s\" | ruby -lane 'print $F.sample'\n23\n$ echo \"$s\" | ruby -lane 'print $F.sample'\n5\n\n$ echo \"$s\" | ruby -lane 'print $F.sample(2)'\n[\"-983\", \"756\"]\n```\n\n<br>\n\n#### <a name=\"sorting\"></a>Sorting\n\n* [ruby-doc: sort](https://ruby-doc.org/core-2.5.0/Array.html#method-i-sort)\n* See also [stackoverflow What does map(&:name) mean in Ruby?](https://stackoverflow.com/questions/1217088/what-does-mapname-mean-in-ruby) for explanation on `&:`\n\n```bash\n$ s='foo baz v22 aimed'\n$ # same as: perl -lane 'print join \" \", sort @F'\n$ echo \"$s\" | ruby -lane 'print $F.sort * \" \"'\naimed baz foo v22\n\n$ # demonstrating the <=> operator\n$ ruby -e 'puts 4 <=> 2'\n1\n$ ruby -e 'puts 4 <=> 20'\n-1\n$ ruby -e 'puts 4 <=> 4'\n0\n\n$ # descending order\n$ # same as: perl -lane 'print join \" \", sort {$b cmp $a} @F'\n$ echo \"$s\" | ruby -lane 'print $F.sort { |a,b| b <=> a } * \" \"'\nv22 foo baz aimed\n$ # can also reverse the array after default sorting\n$ echo \"$s\" | ruby -lane 'print $F.sort.reverse * \" \"'\nv22 foo baz aimed\n```\n\n* using `sort_by` to sort based on a key\n\n```bash\n$ s='floor bat to dubious four'\n$ # can also use: ruby -lane 'print $F.sort_by(&:length) * \":\"'\n$ echo \"$s\" | ruby -lane 
'print $F.sort_by {|a| a.length} * \":\"'\nto:bat:four:floor:dubious\n\n$ # for descending order, simply negate the key\n$ echo \"$s\" | ruby -lane 'print $F.sort_by {|a| -a.length} * \":\"'\ndubious:floor:four:bat:to\n\n$ # need to explicitly convert from string to number for numeric input\n$ s='23 756 -983 5'\n$ echo \"$s\" | ruby -lane 'print $F.sort_by(&:to_i) * \" \"'\n-983 5 23 756\n$ s='5.33:2.2e3:42'\n$ echo \"$s\" | ruby -F: -lane 'print $F.sort_by{|n| -n.to_f} * \":\"'\n2.2e3:42:5.33\n```\n\n* sorting characters within word\n* `chars` method returns array with individual characters\n\n```bash\n$ echo 'foobar' | ruby -lne 'print $_.chars.sort * \"\"'\nabfoor\n\n$ cat words.txt\nbot\nart\nare\nboat\ntoe\nflee\nreed\n\n$ # words with characters in ascending order\n$ # can also use: ruby -lne 'print if $_.chars == $_.chars.sort' words.txt\n$ ruby -lne 'print if $_ == $_.chars.sort * \"\"' words.txt\nbot\nart\n\n$ # words with characters in descending order\n$ # can also use: ruby -lne 'print if $_.chars == $_.chars.sort.reverse'\n$ ruby -lne 'print if $_ == $_.chars.sort {|a,b| b <=> a} * \"\"' words.txt\ntoe\nreed\n```\n\n* sorting columns based on header\n\n```bash\n$ # need to get indexes of order required for header, then use it for all lines\n$ # same as: perl -lane '@i = sort {$F[$a] cmp $F[$b]} 0..$#F if $.==1;\n$ #              print join \"\\t\", @F[@i]' marks.txt\n$ ruby -lane 'idx = $F.each_index.sort {|i,j| $F[i] <=> $F[j]} if $.==1;\n              print $F.values_at(*idx) * \"\\t\"' marks.txt\nDept    Marks   Name\nECE     53      Raj\nECE     72      Joel\nEEE     68      Moi\nCSE     81      Surya\nEEE     59      Tia\nECE     92      Om\nCSE     67      Amy\n```\n\n* [ruby-doc: uniq](https://ruby-doc.org/core-2.5.0/Array.html#method-i-uniq)\n* order is preserved\n\n```bash\n$ s='3,b,a,c,d,1,d,c,2,3,1,b'\n$ # same as: perl -MList::MoreUtils=uniq -F, -lane 'print join \",\",uniq @F'\n$ echo \"$s\" | ruby -F, -lane 'print $F.uniq * 
\",\"'\n3,b,a,c,d,1,2\n\n$ # same as: ruby -rset -ane 'BEGIN{s=Set.new}; print if s.add?($F[1])'\n$ # note that -n/-p option is not used\n$ ruby -e 'puts readlines.uniq {|s| s.split[1]}' duplicates.txt\nabc  7   4\nfood toy ****\n```\n\n* max/min values\n\n```bash\n$ # if numeric array is constructed from string input\n$ echo '34,17,6' | ruby -F, -lane 'print $F.max {|a,b| a.to_i <=> b.to_i}'\n34\n$ # or convert numeric array first, 'map' is covered in next section\n$ echo '34,17,6' | ruby -F, -lane 'print $F.map(&:to_i).max'\n34\n$ echo '23.5,42,-36' | ruby -F, -lane 'puts $F.map(&:to_f).max'\n42.0\n\n$ # string comparison is default\n$ s='floor bat to dubious four'\n$ echo \"$s\" | ruby -lane 'print $F.min'\nbat\n\n$ # can also get max/min 'n' elements\n$ echo \"$s\" | ruby -lane 'print $F.max(2)'\n[\"to\", \"four\"]\n$ echo \"$s\" | ruby -lane 'print $F.min(3) {|a,b| a.size <=> b.size}'\n[\"to\", \"bat\", \"four\"]\n```\n\n<br>\n\n#### <a name=\"transforming\"></a>Transforming\n\n* shuffling elements\n\n```bash\n$ s='23 756 -983 5'\n$ echo \"$s\" | ruby -lane 'print $F.shuffle * \" \"'\n5 756 -983 23\n$ echo \"$s\" | ruby -lane 'print $F.shuffle * \" \"'\n756 5 23 -983\n\n$ # randomizing file contents\n$ # note that -n/-p option is not used\n$ ruby -e 'puts readlines.shuffle' poem.txt\nAnd so are you.\nViolets are blue,\nRoses are red,\nSugar is sweet,\n\n$ # or if shuffle order is known\n$ seq 5 | ruby -e 'puts readlines.values_at(3,1,0,2,4)'\n4\n2\n1\n3\n5\n```\n\n* use `map` to transform every element\n* See also [stackoverflow What does map(&:name) mean in Ruby?](https://stackoverflow.com/questions/1217088/what-does-mapname-mean-in-ruby) for explanation on `&:`\n\n```bash\n$ echo '23 756 -983 5' | ruby -lane 'print $F.map {|n| n.to_i ** 2} * \" \"'\n529 571536 966289 25\n$ echo 'a b c' | ruby -lane 'print $F.map {|s| %Q/\"#{s}\"/} * \",\"'\n\"a\",\"b\",\"c\"\n$ echo 'a b c' | ruby -lane 'print $F.map {|s| %Q/\"#{s}\"/.upcase} * \",\"'\n\"A\",\"B\",\"C\"\n\n$ 
# ASCII int values for each character\n$ echo 'AaBbCc' | ruby -lne 'print $_.chars.map(&:ord) * \" \"'\n65 97 66 98 67 99\n\n$ echo '34,17,6' | ruby -F, -lane 'puts $F.map(&:to_i).sum'\n57\n\n$ # shuffle each field character wise\n$ s='this is a sample sentence'\n$ echo \"$s\" | ruby -lane 'print $F.map {|s| s.chars.shuffle * \"\"} * \" \"'\nhsti si a mlepas esencnet\n```\n\n* reverse array/string\n\n```bash\n$ s='23 756 -983 5'\n$ echo \"$s\" | ruby -lane 'print $F.reverse * \" \"'\n5 -983 756 23\n\n$ echo 'foobar' | ruby -lne 'print $_.reverse'\nraboof\n$ # or inplace reverse\n$ echo 'foobar' | ruby -lpe '$_.reverse!'\nraboof\n```\n\n* See also [ruby-doc: Enumerable](https://ruby-doc.org/core-2.5.0/Enumerable.html) for more methods like `inject`\n\n<br>\n\n## <a name=\"miscellaneous\"></a>Miscellaneous\n\n<br>\n\n#### <a name=\"split\"></a>split\n\n* the `-a` command line option uses `split` and automatically saves the results in `$F` array\n* default separator is `\\s+` and also strips whitespace from start/end of string\n* See also [ruby-doc: split](https://ruby-doc.org/core-2.5.0/String.html#method-i-split)\n\n```bash\n$ # specifying maximum number of splits\n$ # same as: perl -lne 'print join \":\", split /\\s+/,$_,2'\n$ echo 'a 1 b 2 c' | ruby -lne 'print $_.split(/\\s+/, 2) * \":\"'\na:1 b 2 c\n\n$ # by default, trailing empty fields are stripped\n$ echo ':123::' | ruby -lne 'print $_.split(/:/) * \",\"'\n,123\n$ # specify a negative count to preserve trailing empty fields\n$ echo ':123::' | ruby -lne 'print $_.split(/:/, -1) * \",\"'\n,123,,\n\n$ # use string argument for fixed-string split instead of regexp\n$ echo 'foo**123**baz' | ruby -lne 'print $_.split(\"**\") * \":\"'\nfoo:123:baz\n\n$ # to save the separators as well, use capture groups\n$ s='Sample123string54with908numbers'\n$ echo \"$s\" | ruby -lne 'print $_.split(/(\\d+)/) * \":\"'\nSample:123:string:54:with:908:numbers\n```\n\n* single line to multiple line by splitting a column\n\n```bash\n$ 
cat split.txt\nfoo,1:2:5,baz\nwry,4,look\nfree,3:8,oh\n\n$ # same as: perl -F, -ane 'print join \",\", $F[0],$_,$F[2] for split /:/,$F[1]'\n$ ruby -F, -ane '$F[1].split(/:/).each {|x| print [$F[0],x,$F[2]]*\",\"}' split.txt\nfoo,1,baz\nfoo,2,baz\nfoo,5,baz\nwry,4,look\nfree,3,oh\nfree,8,oh\n$ # can also use scan here:\n$ # ruby -F, -ane '$F[1].scan(/[^:]+/) {|x| print [$F[0],x,$F[2]]*\",\"}'\n```\n\n<br>\n\n#### <a name=\"fixed-width-processing\"></a>Fixed width processing\n\n* [ruby-doc: unpack](https://ruby-doc.org/core-2.5.0/String.html#method-i-unpack)\n\n```bash\n$ # same as: perl -lne '@x = unpack(\"a1xa3xa4\", $_); print $x[0]'\n$ # here 'a' indicates arbitrary binary string\n$ # the number that follows indicates length\n$ # the 'x' indicates characters to ignore, use length after 'x' if needed\n$ # and there are many other formats, see ruby-doc for details\n$ echo 'b 123 good' | ruby -lne 'print $_.unpack(\"a1xa3xa4\")[0]'\nb\n$ echo 'b 123 good' | ruby -lne 'print $_.unpack(\"a1xa3xa4\")[1]'\n123\n$ echo 'b 123 good' | ruby -lne 'print $_.unpack(\"a1xa3xa4\")[2]'\ngood\n\n$ # unpack not always needed, simple slicing might help\n$ echo 'b 123 good' | ruby -ne 'puts $_[2,3]'\n123\n$ echo 'b 123 good' | ruby -ne 'puts $_[6,4]'\ngood\n\n$ # replacing arbitrary slice\n$ # same as: perl -lpe 'substr $_, 2, 3, \"gleam\"'\n$ echo 'b 123 good' | ruby -lpe '$_[2,3] = \"gleam\"'\nb gleam good\n```\n\n<br>\n\n#### <a name=\"string-and-file-replication\"></a>String and file replication\n\n```bash\n$ # replicate each line, same as: perl -ne 'print $_ x 2'\n$ seq 2 | ruby -ne 'print $_ * 2'\n1\n1\n2\n2\n\n$ # replicate a string, same as: perl -le 'print \"abc\" x 5'\n$ ruby -e 'puts \"abc\" * 5'\nabcabcabcabcabc\n\n$ # works for array too, but be careful with mutable elements\n$ ruby -le 'x = [3, 2, 1] * 2; print x'\n[3, 2, 1, 3, 2, 1]\n$ ruby -le 'x = [3, 2, [1, 7]] * 2; x[2][0]=\"a\"; print x'\n[3, 2, [\"a\", 7], 3, 2, [\"a\", 7]]\n\n$ # replicating file, same as: perl 
-0777 -ne 'print $_ x 100'\n$ wc -c poem.txt\n65 poem.txt\n$ ruby -0777 -ne 'print $_ * 100' poem.txt | wc -c\n6500\n```\n\n<br>\n\n#### <a name=\"transliteration\"></a>transliteration\n\n* [ruby-doc: tr](https://ruby-doc.org/core-2.5.0/String.html#method-i-tr)\n\n```bash\n$ echo 'Uryyb Jbeyq' | ruby -pe '$_.tr!(\"a-zA-Z\", \"n-za-mN-ZA-M\")'\nHello World\n$ echo 'hi there!' | ruby -pe '$_.tr!(\"a-z\", \"\\u{1d5ee}-\\u{1d607}\")'\n𝗵𝗶 𝘁𝗵𝗲𝗿𝗲!\n\n$ # when first argument is longer\n$ # the last character of second argument is padded\n$ echo 'foo bar cat baz' | ruby -pe '$_.tr!(\"a-z\", \"123\")'\n333 213 313 213\n\n$ # use ^ at start of first argument to complement specified characters\n$ echo 'foo:123:baz' | ruby -lpe '$_.tr!(\"^0-9\", \"-\")'\n----123----\n\n$ # use empty second argument to delete specified characters\n$ echo '\"Foo1!\", \"Bar.\", \":Baz:\"' | ruby -lpe '$_.tr!(\"^A-Za-z,\", \"\")'\nFoo,Bar,Baz\n\n$ # use - at start/end and ^ other than start to match themselves\n$ echo 'a^3-b*d' | ruby -lpe '$_.tr!(\"-^*\", \"*/+\")'\na/3*b+d\n```\n\n<br>\n\n#### <a name=\"executing-external-commands\"></a>Executing external commands\n\n* External commands can be issued using `system` function\n* Output would be as usual on `stdout` unless redirected while calling the command\n\n```bash\n$ # same as: perl -e 'system(\"echo Hello World\")'\n$ ruby -e 'system(\"echo Hello World\")'\nHello World\n\n$ ruby -e 'system(\"wc poem.txt\")'\n 4 13 65 poem.txt\n\n$ ruby -e 'system(\"seq 10 | paste -sd, > out.txt\")'\n$ cat out.txt\n1,2,3,4,5,6,7,8,9,10\n\n$ cat f2\nI bought two bananas and three mangoes\n$ # same as: perl -F, -lane 'system \"cat $F[1]\"'\n$ echo 'f1,f2,odd.txt' | ruby -F, -lane 'system(\"cat #{$F[1]}\")'\nI bought two bananas and three mangoes\n```\n\n* return value of `system` or global variable `$?` can be used to act upon exit status of command issued\n* see [ruby-doc: system](https://ruby-doc.org/core-2.5.0/Kernel.html#method-i-system) for 
details\n\n```bash\n$ ruby -e 'es=system(\"ls poem.txt\"); puts es'\npoem.txt\ntrue\n$ ruby -e 'system(\"ls poem.txt\"); puts $?'\npoem.txt\npid 17005 exit 0\n\n$ ruby -e 'system(\"ls xyz.txt\"); puts $?'\nls: cannot access 'xyz.txt': No such file or directory\npid 17059 exit 2\n```\n\n* to save result of external command, use backticks or `%x`\n\n```bash\n$ ruby -e 'lines = `wc -l < poem.txt`; print lines'\n4\n\n$ ruby -e 'nums = %x/seq 3/; print nums'\n1\n2\n3\n```\n\n* See also [stackoverflow - difference between exec, system and %x() or backticks](https://stackoverflow.com/questions/6338908/ruby-difference-between-exec-system-and-x-or-backticks)\n\n<br>\n\n## <a name=\"further-reading\"></a>Further Reading\n\n* Manual and related\n    * [ruby-lang documentation](https://www.ruby-lang.org/en/documentation/) - manuals, tutorials and references\n    * [ruby-lang - faqs](https://www.ruby-lang.org/en/documentation/faq/)\n    * [ruby-lang - quickstart](https://www.ruby-lang.org/en/documentation/quickstart/)\n    * [ruby-lang - To Ruby From Perl](https://www.ruby-lang.org/en/documentation/ruby-from-other-languages/to-ruby-from-perl/)\n    * [rubular - Ruby regular expression editor](http://rubular.com/)\n* Tutorials and Q&A\n    * [Smooth Ruby One-Liners](https://dev.to/rpalo/smooth-ruby-one-liners-154) - simple intro to ruby one-liners\n    * [Ruby one-liners](http://benoithamelin.tumblr.com/ruby1line) based on [awk one-liners](http://www.pement.org/awk/awk1line.txt)\n    * [Ruby Tricks, Idiomatic Ruby, Refactorings and Best Practices](https://franzejr.github.io/best-ruby/index.html)\n    * [freecodecamp - learning Ruby](https://medium.freecodecamp.org/learning-ruby-from-zero-to-hero-90ad4eecc82d)\n    * [Ruby Regexp](https://leanpub.com/rubyregexp) ebook - step by step guide from beginner to advanced levels\n    * [regex FAQ on SO](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean)\n* Alternatives\n    * 
[bioruby](https://github.com/bioruby/bioruby)\n    * [perl](https://perldoc.perl.org/)\n    * [unix.stackexchange - When to use grep, sed, awk, perl, etc](https://unix.stackexchange.com/questions/303044/when-to-use-grep-less-awk-sed)\n\n"
  },
  {
    "path": "sorting_stuff.md",
    "content": "# <a name=\"sorting-stuff\"></a>Sorting stuff\n\n**Table of Contents**\n\n* [sort](#sort)\n    * [Default sort](#default-sort)\n    * [Reverse sort](#reverse-sort)\n    * [Various number sorting](#various-number-sorting)\n    * [Random sort](#random-sort)\n    * [Specifying output file](#specifying-output-file)\n    * [Unique sort](#unique-sort)\n    * [Column based sorting](#column-based-sorting)\n    * [Further reading for sort](#further-reading-for-sort)\n* [uniq](#uniq)\n    * [Default uniq](#default-uniq)\n    * [Only duplicates](#only-duplicates)\n    * [Only unique](#only-unique)\n    * [Prefix count](#prefix-count)\n    * [Ignoring case](#ignoring-case)\n    * [Combining multiple files](#combining-multiple-files)\n    * [Column options](#column-options)\n    * [Further reading for uniq](#further-reading-for-uniq)\n* [comm](#comm)\n    * [Default three column output](#default-three-column-output)\n    * [Suppressing columns](#suppressing-columns)\n    * [Files with duplicates](#files-with-duplicates)\n    * [Further reading for comm](#further-reading-for-comm)\n* [shuf](#shuf)\n    * [Random lines](#random-lines)\n    * [Random integer numbers](#random-integer-numbers)\n    * [Further reading for shuf](#further-reading-for-shuf)\n\n<br>\n\n## <a name=\"sort\"></a>sort\n\n```bash\n$ sort --version | head -n1\nsort (GNU coreutils) 8.25\n\n$ man sort\nSORT(1)                          User Commands                         SORT(1)\n\nNAME\n       sort - sort lines of text files\n\nSYNOPSIS\n       sort [OPTION]... [FILE]...\n       sort [OPTION]... 
--files0-from=F\n\nDESCRIPTION\n       Write sorted concatenation of all FILE(s) to standard output.\n\n       With no FILE, or when FILE is -, read standard input.\n...\n```\n\n**Note**: All examples shown here assume ASCII encoded input file\n\n\n<br>\n\n#### <a name=\"default-sort\"></a>Default sort\n\n```bash\n$ cat poem.txt\nRoses are red,\nViolets are blue,\nSugar is sweet,\nAnd so are you.\n\n$ sort poem.txt\nAnd so are you.\nRoses are red,\nSugar is sweet,\nViolets are blue,\n```\n\n* Well, that was easy. The lines were sorted alphabetically (ascending order by default) and it so happened that first letter alone was enough to decide the order\n* For next example, let's extract all the words and sort them\n    * this also showcases `sort` accepting stdin\n    * See [GNU grep](./gnu_grep.md) chapter if the `grep` command used below looks alien\n\n```bash\n$ # output might differ depending on locale settings\n$ # note the case-insensitiveness of output\n$ grep -oi '[a-z]*' poem.txt | sort\nAnd\nare\nare\nare\nblue\nis\nred\nRoses\nso\nSugar\nsweet\nViolets\nyou\n```\n\n* heed hereunto\n* See also\n    * [arch wiki - locale](https://wiki.archlinux.org/index.php/locale)\n    * [Linux: Define Locale and Language Settings](https://www.shellhacks.com/linux-define-locale-language-settings/)\n\n```bash\n$ info sort | tail\n\n   (1) If you use a non-POSIX locale (e.g., by setting ‘LC_ALL’ to\n‘en_US’), then ‘sort’ may produce output that is sorted differently than\nyou’re accustomed to.  In that case, set the ‘LC_ALL’ environment\nvariable to ‘C’.  Note that setting only ‘LC_COLLATE’ has two problems.\nFirst, it is ineffective if ‘LC_ALL’ is also set.  Second, it has\nundefined behavior if ‘LC_CTYPE’ (or ‘LANG’, if ‘LC_CTYPE’ is unset) is\nset to an incompatible value.  
For example, you get undefined behavior\nif ‘LC_CTYPE’ is ‘ja_JP.PCK’ but ‘LC_COLLATE’ is ‘en_US.UTF-8’.\n```\n\n* Example to help show effect of locale setting\n\n```bash\n$ # note how uppercase is sorted before lowercase\n$ grep -oi '[a-z]*' poem.txt | LC_ALL=C sort\nAnd\nRoses\nSugar\nViolets\nare\nare\nare\nblue\nis\nred\nso\nsweet\nyou\n```\n\n<br>\n\n#### <a name=\"reverse-sort\"></a>Reverse sort\n\n* This is simply reversing from default ascending order to descending order\n\n```bash\n$ sort -r poem.txt\nViolets are blue,\nSugar is sweet,\nRoses are red,\nAnd so are you.\n```\n\n<br>\n\n#### <a name=\"various-number-sorting\"></a>Various number sorting\n\n```bash\n$ cat numbers.txt\n20\n53\n3\n101\n\n$ sort numbers.txt\n101\n20\n3\n53\n```\n\n* Whoops, what happened there? `sort` won't know to treat them as numbers unless specified\n* Depending on format of numbers, different options have to be used\n* First up is `-n` option, which sorts based on numerical value\n\n```bash\n$ sort -n numbers.txt\n3\n20\n53\n101\n\n$ sort -nr numbers.txt\n101\n53\n20\n3\n```\n\n* The `-n` option can handle negative numbers\n* As well as thousands separator and decimal point (depends on locale)\n* The `<()` syntax is [Process Substitution](http://mywiki.wooledge.org/ProcessSubstitution)\n    * to put it simply - allows output of command to be passed as input file to another command without needing to manually create a temporary file\n\n```bash\n$ # multiple files are merged as single input by default\n$ sort -n numbers.txt <(echo '-4')\n-4\n3\n20\n53\n101\n\n$ sort -n numbers.txt <(echo '1,234')\n3\n20\n53\n101\n1,234\n\n$ sort -n numbers.txt <(echo '31.24')\n3\n20\n31.24\n53\n101\n```\n\n* Use `-g` if input contains numbers prefixed by `+` or [E scientific notation](https://en.wikipedia.org/wiki/Scientific_notation#E_notation)\n\n```bash\n$ cat generic_numbers.txt\n+120\n-1.53\n3.14e+4\n42.1e-2\n\n$ sort -g generic_numbers.txt\n-1.53\n42.1e-2\n+120\n3.14e+4\n```\n\n* 
Commands like `du` have options to display numbers in human readable formats\n* `sort` supports sorting such numbers using the `-h` option\n\n```bash\n$ du -sh *\n104K    power.log\n746M    projects\n316K    report.log\n20K     sample.txt\n$ du -sh * | sort -h\n20K     sample.txt\n104K    power.log\n316K    report.log\n746M    projects\n\n$ # --si uses powers of 1000 instead of 1024\n$ du -s --si *\n107k    power.log\n782M    projects\n324k    report.log\n21k     sample.txt\n$ du -s --si * | sort -h\n21k     sample.txt\n107k    power.log\n324k    report.log\n782M    projects\n```\n\n* Version sort - dealing with numbers mixed with other characters\n* If this sorting is needed simply while displaying directory contents, use `ls -v` instead of piping to `sort -V`\n\n```bash\n$ cat versions.txt\nfoo_v1.2\nbar_v2.1.3\nfoobar_v2\nfoo_v1.2.1\nfoo_v1.3\n\n$ sort -V versions.txt\nbar_v2.1.3\nfoobar_v2\nfoo_v1.2\nfoo_v1.2.1\nfoo_v1.3\n```\n\n* Another common use case is when there are multiple filenames differentiated by numbers\n\n```bash\n$ cat files.txt\nfile0\nfile10\nfile3\nfile4\n\n$ sort -V files.txt\nfile0\nfile3\nfile4\nfile10\n```\n\n* Can be used when dealing with numbers reported by `time` command as well\n\n```bash\n$ # different solving durations\n$ cat rubik_time.txt\n5m35.363s\n3m20.058s\n4m5.099s\n4m1.130s\n3m42.833s\n4m33.083s\n\n$ # assuming consistent min/sec format\n$ sort -V rubik_time.txt\n3m20.058s\n3m42.833s\n4m1.130s\n4m5.099s\n4m33.083s\n5m35.363s\n```\n\n<br>\n\n#### <a name=\"random-sort\"></a>Random sort\n\n* Note that duplicate lines will always end up next to each other\n    * might be useful as a feature for some cases ;)\n    * Use `shuf` if this is not desirable\n* See also [How can I shuffle the lines of a text file on the Unix command line or in a shell script?](https://stackoverflow.com/questions/2153882/how-can-i-shuffle-the-lines-of-a-text-file-on-the-unix-command-line-or-in-a-shel)\n\n```bash\n$ cat 
nums.txt\n1\n10\n10\n12\n23\n563\n\n$ # the two 10s will always be next to each other\n$ sort -R nums.txt\n563\n12\n1\n10\n10\n23\n\n$ # duplicates can end up anywhere\n$ shuf nums.txt\n10\n23\n1\n10\n563\n12\n```\n\n<br>\n\n#### <a name=\"specifying-output-file\"></a>Specifying output file\n\n* The `-o` option can be used to specify output file\n* Useful for in place editing\n\n```bash\n$ sort -R nums.txt -o rand_nums.txt\n$ cat rand_nums.txt\n23\n1\n10\n10\n563\n12\n\n$ sort -R nums.txt -o nums.txt\n$ cat nums.txt\n563\n23\n10\n10\n1\n12\n```\n\n* Use shell script looping if there are multiple files to be sorted in place\n* Below snippet is for `bash` shell\n\n```bash\n$ for f in *.txt; do echo sort -V \"$f\" -o \"$f\"; done\nsort -V files.txt -o files.txt\nsort -V rubik_time.txt -o rubik_time.txt\nsort -V versions.txt -o versions.txt\n\n$ # remove echo once commands look fine\n$ for f in *.txt; do sort -V \"$f\" -o \"$f\"; done\n```\n\n<br>\n\n#### <a name=\"unique-sort\"></a>Unique sort\n\n* Keep only first copy of lines that are deemed to be same according to `sort` option used\n\n```bash\n$ cat duplicates.txt\nfoo\n12 carrots\nfoo\n12 apples\n5 guavas\n\n$ # only one copy of foo in output\n$ sort -u duplicates.txt\n12 apples\n12 carrots\n5 guavas\nfoo\n```\n\n* According to option used, definition of duplicate will vary\n* For example, when `-n` is used, matching numbers are deemed same even if rest of line differs\n    * Pipe the output to `uniq` if this is not desirable\n\n```bash\n$ # note how first copy of line starting with 12 is retained\n$ sort -nu duplicates.txt\nfoo\n5 guavas\n12 carrots\n\n$ # use uniq when entire line should be compared to find duplicates\n$ sort -n duplicates.txt | uniq\nfoo\n5 guavas\n12 apples\n12 carrots\n```\n\n* Use `-f` option to ignore case of alphabets while determining duplicates\n\n```bash\n$ cat words.txt\nCAR\nare\ncar\nAre\nfoot\nare\n\n$ # only the two 'are' were considered duplicates\n$ sort -u 
words.txt\nare\nAre\ncar\nCAR\nfoot\n\n$ # note again that first copy of duplicate is retained\n$ sort -fu words.txt\nare\nCAR\nfoot\n```\n\n<br>\n\n#### <a name=\"column-based-sorting\"></a>Column based sorting\n\nFrom `info sort`\n\n```\n‘-k POS1[,POS2]’\n‘--key=POS1[,POS2]’\n     Specify a sort field that consists of the part of the line between\n     POS1 and POS2 (or the end of the line, if POS2 is omitted),\n     _inclusive_.\n\n     Each POS has the form ‘F[.C][OPTS]’, where F is the number of the\n     field to use, and C is the number of the first character from the\n     beginning of the field.  Fields and character positions are\n     numbered starting with 1; a character position of zero in POS2\n     indicates the field’s last character.  If ‘.C’ is omitted from\n     POS1, it defaults to 1 (the beginning of the field); if omitted\n     from POS2, it defaults to 0 (the end of the field).  OPTS are\n     ordering options, allowing individual keys to be sorted according\n     to different rules; see below for details.  
Keys can span multiple\n     fields.\n```\n\n* By default, blank characters (space and tab) serve as field separators\n\n```bash\n$ cat fruits.txt\napple   42\nguava   6\nfig     90\nbanana  31\n\n$ sort fruits.txt\napple   42\nbanana  31\nfig     90\nguava   6\n\n$ # sort based on 2nd column numbers\n$ sort -k2,2n fruits.txt\nguava   6\nbanana  31\napple   42\nfig     90\n```\n\n* Using a different field separator\n* Consider the following sample input file having fields separated by `:`\n\n```bash\n$ # name:pet_name:no_of_pets\n$ cat pets.txt\nfoo:dog:2\nxyz:cat:1\nbaz:parrot:5\nabcd:cat:3\njoe:dog:1\nbar:fox:1\ntemp_var:squirrel:4\nboss:dog:10\n```\n\n* Sorting based on particular column or column to end of line\n* In case of multiple entries, by default `sort` would use content of remaining parts of line to resolve\n\n```bash\n$ # only 2nd column\n$ # -k2,4 would mean 2nd column to 4th column\n$ sort -t: -k2,2 pets.txt\nabcd:cat:3\nxyz:cat:1\nboss:dog:10\nfoo:dog:2\njoe:dog:1\nbar:fox:1\nbaz:parrot:5\ntemp_var:squirrel:4\n\n$ # from 2nd column to end of line\n$ sort -t: -k2 pets.txt\nxyz:cat:1\nabcd:cat:3\njoe:dog:1\nboss:dog:10\nfoo:dog:2\nbar:fox:1\nbaz:parrot:5\ntemp_var:squirrel:4\n```\n\n* Multiple keys can be specified to resolve ties\n* Note that if there are still multiple entries with specified keys, remaining parts of lines would be used\n\n```bash\n$ # default sort for 2nd column, numeric sort on 3rd column to resolve ties\n$ sort -t: -k2,2 -k3,3n pets.txt\nxyz:cat:1\nabcd:cat:3\njoe:dog:1\nfoo:dog:2\nboss:dog:10\nbar:fox:1\nbaz:parrot:5\ntemp_var:squirrel:4\n\n$ # numeric sort on 3rd column, default sort for 2nd column to resolve ties\n$ sort -t: -k3,3n -k2,2 pets.txt\nxyz:cat:1\njoe:dog:1\nbar:fox:1\nfoo:dog:2\nabcd:cat:3\ntemp_var:squirrel:4\nbaz:parrot:5\nboss:dog:10\n```\n\n* Use `-s` option to retain original order of lines in case of tie\n\n```bash\n$ sort -s -t: -k2,2 
pets.txt\nxyz:cat:1\nabcd:cat:3\nfoo:dog:2\njoe:dog:1\nboss:dog:10\nbar:fox:1\nbaz:parrot:5\ntemp_var:squirrel:4\n```\n\n* The `-u` option, as seen earlier, will retain only first match\n\n```bash\n$ sort -u -t: -k2,2 pets.txt\nxyz:cat:1\nfoo:dog:2\nbar:fox:1\nbaz:parrot:5\ntemp_var:squirrel:4\n\n$ sort -u -t: -k3,3n pets.txt\nxyz:cat:1\nfoo:dog:2\nabcd:cat:3\ntemp_var:squirrel:4\nbaz:parrot:5\nboss:dog:10\n```\n\n* Sometimes, the input has to be sorted first and then `-u` used on the sorted output\n* See also [remove duplicates based on the value of another column](https://unix.stackexchange.com/questions/379835/remove-duplicates-based-on-the-value-of-another-column)\n\n```bash\n$ # sort by number in 3rd column\n$ sort -t: -k3,3n pets.txt\nbar:fox:1\njoe:dog:1\nxyz:cat:1\nfoo:dog:2\nabcd:cat:3\ntemp_var:squirrel:4\nbaz:parrot:5\nboss:dog:10\n\n$ # then get unique entry based on 2nd column\n$ sort -t: -k3,3n pets.txt | sort -t: -u -k2,2\nxyz:cat:1\njoe:dog:1\nbar:fox:1\nbaz:parrot:5\ntemp_var:squirrel:4\n```\n\n* Specifying particular characters within fields\n* If character position is not specified, defaults to `1` for starting column and `0` (last character) for ending column\n\n```bash\n$ cat marks.txt\nfork,ap_12,54\nflat,up_342,1.2\nfold,tn_48,211\nmore,ap_93,7\nrest,up_5,63\n\n$ # for 2nd column, sort numerically only from 4th character to end\n$ sort -t, -k2.4,2n marks.txt\nrest,up_5,63\nfork,ap_12,54\nfold,tn_48,211\nmore,ap_93,7\nflat,up_342,1.2\n\n$ # sort uniquely based on first two characters of line\n$ sort -u -k1.1,1.2 marks.txt\nflat,up_342,1.2\nfork,ap_12,54\nmore,ap_93,7\nrest,up_5,63\n```\n\n* If there are headers\n\n```bash\n$ cat header.txt\nfruit   qty\napple   42\nguava   6\nfig     90\nbanana  31\n\n$ # separate and combine header and content to be sorted\n$ cat <(head -n1 header.txt) <(tail -n +2 header.txt | sort -k2nr)\nfruit   qty\nfig     90\napple   42\nbanana  31\nguava   6\n```\n\n* See also [sort by last field value when number of 
fields varies](https://stackoverflow.com/questions/3832068/bash-sort-text-file-by-last-field-value)\n\n<br>\n\n#### <a name=\"further-reading-for-sort\"></a>Further reading for sort\n\n* There are many other options apart from handful presented above. See `man sort` and `info sort` for detailed documentation and more examples\n* [sort like a master](http://www.skorks.com/2010/05/sort-files-like-a-master-with-the-linux-sort-command-bash/)\n* [When -b to ignore leading blanks is needed](https://unix.stackexchange.com/a/104527/109046)\n* [sort Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/sort?sort=votes&pageSize=15)\n* [sort on multiple columns using -k option](https://unix.stackexchange.com/questions/249452/unix-multiple-column-sort-issue)\n* [sort a string character wise](https://stackoverflow.com/questions/2373874/how-to-sort-characters-in-a-string)\n* [Scalability of 'sort -u' for gigantic files](https://unix.stackexchange.com/questions/279096/scalability-of-sort-u-for-gigantic-files)\n\n<br>\n\n## <a name=\"uniq\"></a>uniq\n\n```bash\n$ uniq --version | head -n1\nuniq (GNU coreutils) 8.25\n\n$ man uniq\nUNIQ(1)                          User Commands                         UNIQ(1)\n\nNAME\n       uniq - report or omit repeated lines\n\nSYNOPSIS\n       uniq [OPTION]... 
[INPUT [OUTPUT]]\n\nDESCRIPTION\n       Filter  adjacent matching lines from INPUT (or standard input), writing\n       to OUTPUT (or standard output).\n\n       With no options, matching lines are merged to the first occurrence.\n...\n```\n\n<br>\n\n#### <a name=\"default-uniq\"></a>Default uniq\n\n```bash\n$ cat word_list.txt\nare\nare\nto\ngood\nbad\nbad\nbad\ngood\nare\nbad\n\n$ # adjacent duplicate lines are removed, leaving one copy\n$ uniq word_list.txt\nare\nto\ngood\nbad\ngood\nare\nbad\n\n$ # To remove duplicates from entire file, input has to be sorted first\n$ # also showcases that uniq accepts stdin as input\n$ sort word_list.txt | uniq\nare\nbad\ngood\nto\n```\n\n<br>\n\n#### <a name=\"only-duplicates\"></a>Only duplicates\n\n```bash\n$ # duplicates adjacent to each other\n$ uniq -d word_list.txt\nare\nbad\n\n$ # duplicates in entire file\n$ sort word_list.txt | uniq -d\nare\nbad\ngood\n```\n\n* To get only duplicates as well as show all duplicates\n\n```bash\n$ uniq -D word_list.txt\nare\nare\nbad\nbad\nbad\n\n$ sort word_list.txt | uniq -D\nare\nare\nare\nbad\nbad\nbad\nbad\ngood\ngood\n```\n\n* To distinguish the different groups\n\n```bash\n$ # using --all-repeated=prepend will add a newline before the first group as well\n$ sort word_list.txt | uniq --all-repeated=separate\nare\nare\nare\n\nbad\nbad\nbad\nbad\n\ngood\ngood\n```\n\n<br>\n\n#### <a name=\"only-unique\"></a>Only unique\n\n```bash\n$ # lines with no adjacent duplicates\n$ uniq -u word_list.txt\nto\ngood\ngood\nare\nbad\n\n$ # unique lines in entire file\n$ sort word_list.txt | uniq -u\nto\n```\n\n<br>\n\n#### <a name=\"prefix-count\"></a>Prefix count\n\n```bash\n$ # adjacent lines\n$ uniq -c word_list.txt\n      2 are\n      1 to\n      1 good\n      3 bad\n      1 good\n      1 are\n      1 bad\n\n$ # entire file\n$ sort word_list.txt | uniq -c\n      3 are\n      4 bad\n      2 good\n      1 to\n\n$ # entire file, only duplicates\n$ sort word_list.txt | uniq -cd\n      3 are\n      
4 bad\n      2 good\n```\n\n* Sorting by count\n\n```bash\n$ # sort by count\n$ sort word_list.txt | uniq -c | sort -n\n      1 to\n      2 good\n      3 are\n      4 bad\n\n$ # reverse the order, highest count first\n$ sort word_list.txt | uniq -c | sort -nr\n      4 bad\n      3 are\n      2 good\n      1 to\n```\n\n* To get only entries with min/max count, bit of [awk](./gnu_awk.md) magic would help\n\n```bash\n$ # consider this result\n$ sort colors.txt | uniq -c | sort -nr\n      3 Red\n      3 Blue\n      2 Yellow\n      1 Green\n      1 Black\n\n$ # to get all max count\n$ # save 1st line 1st column value to c and then print if 1st column equals c\n$ sort colors.txt | uniq -c | sort -nr | awk 'NR==1{c=$1} $1==c'\n      3 Red\n      3 Blue\n$ # to get all min count\n$ sort colors.txt | uniq -c | sort -n | awk 'NR==1{c=$1} $1==c'\n      1 Black\n      1 Green\n```\n\n* Get rough count of most used commands from `history` file\n\n```bash\n$ # awk '{print $1}' will get the 1st column alone\n$ awk '{print $1}' \"$HISTFILE\" | sort | uniq -c | sort -nr | head\n   1465 echo\n   1180 grep\n    552 cd\n    531 awk\n    451 sed\n    423 vi\n    418 cat\n    392 perl\n    325 printf\n    320 sort\n\n$ # extract command name from start of line or preceded by 'spaces|spaces'\n$ # won't catch commands in other places like command substitution though\n$ grep -oP '(^| +\\| +)\\K[^ ]+' \"$HISTFILE\" | sort | uniq -c | sort -nr | head\n   2006 grep\n   1469 echo\n    933 sed\n    698 awk\n    552 cd\n    513 perl\n    510 cat\n    453 sort\n    423 vi\n    327 printf\n```\n\n<br>\n\n#### <a name=\"ignoring-case\"></a>Ignoring case\n\n```bash\n$ cat another_list.txt\nfood\nFood\ngood\nare\nbad\nAre\n\n$ # note how first copy is retained\n$ uniq -i another_list.txt\nfood\ngood\nare\nbad\nAre\n\n$ uniq -iD another_list.txt\nfood\nFood\n```\n\n<br>\n\n#### <a name=\"combining-multiple-files\"></a>Combining multiple files\n\n```bash\n$ sort -f word_list.txt another_list.txt | uniq 
-i\nare\nbad\nfood\ngood\nto\n\n$ sort -f word_list.txt another_list.txt | uniq -c\n      4 are\n      1 Are\n      5 bad\n      1 food\n      1 Food\n      3 good\n      1 to\n\n$ sort -f word_list.txt another_list.txt | uniq -ic\n      5 are\n      5 bad\n      2 food\n      3 good\n      1 to\n```\n\n* If only adjacent lines (not sorted) are required, need to concatenate files using another command\n\n```bash\n$ uniq -id word_list.txt\nare\nbad\n\n$ uniq -id another_list.txt\nfood\n\n$ cat word_list.txt another_list.txt | uniq -id\nare\nbad\nfood\n```\n\n<br>\n\n#### <a name=\"column-options\"></a>Column options\n\n* `uniq` has a few options dealing with column manipulations. Not as extensive as `sort -k` but handy for some cases\n* First up, skipping fields\n    * No option to specify different delimiter\n    * From `info uniq`: Fields are sequences of non-space non-tab characters that are separated from each other by at least one space or tab\n    * Number of spaces/tabs between fields should be the same\n\n```bash\n$ cat shopping.txt\nlemon 5\nmango 5\nbanana 8\nbread 1\norange 5\n\n$ # skips first field\n$ uniq -f1 shopping.txt\nlemon 5\nbanana 8\nbread 1\norange 5\n\n$ # use -f3 to skip first three fields and so on\n```\n\n* Skipping characters\n\n```bash\n$ cat text\nglue\nblue\nblack\nstack\nstuck\n\n$ # don't consider first 2 characters\n$ uniq -s2 text\nglue\nblack\nstuck\n\n$ # to visualize the above example\n$ # assume there are two fields and uniq is applied on 2nd column\n$ sed 's/^../& /' text\ngl ue\nbl ue\nbl ack\nst ack\nst uck\n```\n\n* Up to specified characters\n\n```bash\n$ # consider only first 2 characters\n$ uniq -w2 text\nglue\nblue\nstack\n\n$ # to visualize the above example\n$ # assume there are two fields and uniq is applied on 1st column\n$ sed 's/^../& /' text\ngl ue\nbl ue\nbl ack\nst ack\nst uck\n```\n\n* Combining `-s` and `-w`\n* Can be combined with `-f` as well\n\n```bash\n$ # skip first 3 characters and then use next 2 characters\n$ 
uniq -s3 -w2 text\nglue\nblack\n```\n\n\n<br>\n\n#### <a name=\"further-reading-for-uniq\"></a>Further reading for uniq\n\n* Do check out `man uniq` and `info uniq` for other options and more detailed documentation\n* [uniq Q&A on unix stackexchange](http://unix.stackexchange.com/questions/tagged/uniq?sort=votes&pageSize=15)\n* [process duplicate lines only based on certain fields](https://unix.stackexchange.com/questions/387590/print-the-duplicate-lines-only-on-fields-1-2-from-csv-file)\n\n<br>\n\n## <a name=\"comm\"></a>comm\n\n```bash\n$ comm --version | head -n1\ncomm (GNU coreutils) 8.25\n\n$ man comm\nCOMM(1)                          User Commands                         COMM(1)\n\nNAME\n       comm - compare two sorted files line by line\n\nSYNOPSIS\n       comm [OPTION]... FILE1 FILE2\n\nDESCRIPTION\n       Compare sorted files FILE1 and FILE2 line by line.\n\n       When FILE1 or FILE2 (not both) is -, read standard input.\n\n       With  no  options,  produce  three-column  output.  
Column one contains\n       lines unique to FILE1, column two contains lines unique to  FILE2,  and\n       column three contains lines common to both files.\n...\n```\n\n<br>\n\n#### <a name=\"default-three-column-output\"></a>Default three column output\n\nConsider below sample input files\n\n```bash\n$ # sorted input files viewed side by side\n$ paste colors_1.txt colors_2.txt\nBlue    Black\nBrown   Blue\nPurple  Green\nRed     Red\nTeal    White\nYellow\n```\n\n* Without any option, `comm` gives 3 column output\n    * lines unique to first file\n    * lines unique to second file\n    * lines common to both files\n\n```bash\n$ comm colors_1.txt colors_2.txt\n        Black\n                Blue\nBrown\n        Green\nPurple\n                Red\nTeal\n        White\nYellow\n```\n\n<br>\n\n#### <a name=\"suppressing-columns\"></a>Suppressing columns\n\n* `-1` suppress lines unique to first file\n* `-2` suppress lines unique to second file\n* `-3` suppress lines common to both files\n\n```bash\n$ # suppressing column 3\n$ comm -3 colors_1.txt colors_2.txt\n        Black\nBrown\n        Green\nPurple\nTeal\n        White\nYellow\n```\n\n* Combining options gives three distinct and useful constructs\n* First, getting only common lines to both files\n\n```bash\n$ comm -12 colors_1.txt colors_2.txt\nBlue\nRed\n```\n\n* Second, lines unique to first file\n\n```bash\n$ comm -23 colors_1.txt colors_2.txt\nBrown\nPurple\nTeal\nYellow\n```\n\n* And the third, lines unique to second file\n\n```bash\n$ comm -13 colors_1.txt colors_2.txt\nBlack\nGreen\nWhite\n```\n\n* See also how the above three cases can be done [using grep alone](./gnu_grep.md#search-strings-from-file)\n    * **Note** input files do not need to be sorted for `grep` solution\n\nIf different `sort` order than default is required, use `--nocheck-order` to ignore error message\n\n```bash\n$ comm -23 <(sort -n numbers.txt) <(sort -n nums.txt)\n3\ncomm: file 1 is not in sorted order\n20\n53\n101\n\n$ comm 
--nocheck-order -23 <(sort -n numbers.txt) <(sort -n nums.txt)\n3\n20\n53\n101\n```\n\n<br>\n\n#### <a name=\"files-with-duplicates\"></a>Files with duplicates\n\n* Only as many duplicate lines as match in both files are considered common\n* The rest are unique to their respective files\n* This is useful for cases like finding lines present in the first file but not in the second, taking into account the count of duplicates as well\n    * This solution won't be possible with `grep`\n\n```bash\n$ paste list1 list2\na       a\na       b\na       c\nb       c\nb       d\nc\n\n$ comm list1 list2\n                a\na\na\n                b\nb\n                c\n        c\n        d\n\n$ comm -23 list1 list2\na\na\nb\n```\n\n<br>\n\n#### <a name=\"further-reading-for-comm\"></a>Further reading for comm\n\n* `man comm` and `info comm` for more options and detailed documentation\n* [comm Q&A on unix stackexchange](http://unix.stackexchange.com/questions/tagged/comm?sort=votes&pageSize=15)\n\n<br>\n\n## <a name=\"shuf\"></a>shuf\n\n```bash\n$ shuf --version | head -n1\nshuf (GNU coreutils) 8.25\n\n$ man shuf\nSHUF(1)                          User Commands                         SHUF(1)\n\nNAME\n       shuf - generate random permutations\n\nSYNOPSIS\n       shuf [OPTION]... [FILE]\n       shuf -e [OPTION]... 
[ARG]...\n       shuf -i LO-HI [OPTION]...\n\nDESCRIPTION\n       Write a random permutation of the input lines to standard output.\n\n       With no FILE, or when FILE is -, read standard input.\n...\n```\n\n<br>\n\n#### <a name=\"random-lines\"></a>Random lines\n\n* Without repeating input lines\n\n```bash\n$ cat nums.txt\n1\n10\n10\n12\n23\n563\n\n$ # duplicates can end up anywhere\n$ # all lines are part of output\n$ shuf nums.txt\n10\n23\n1\n10\n563\n12\n\n$ # limit max number of output lines\n$ shuf -n2 nums.txt\n563\n23\n```\n\n* Use `-o` option to specify output file name instead of displaying on stdout\n* Helpful for inplace editing\n\n```bash\n$ shuf nums.txt -o nums.txt\n$ cat nums.txt\n10\n12\n23\n10\n563\n1\n```\n\n* With repeated input lines\n\n```bash\n$ # -n3 for max 3 lines, -r allows input lines to be repeated\n$ shuf -n3 -r nums.txt\n1\n1\n563\n\n$ seq 3 | shuf -n5 -r\n2\n1\n2\n1\n2\n\n$ # if a limit using -n is not specified, shuf will output lines indefinitely\n```\n\n* use `-e` option to specify multiple input lines from command line itself\n\n```bash\n$ shuf -e red blue green\ngreen\nblue\nred\n\n$ shuf -e 'hi there' 'hello world' foo bar\nbar\nhi there\nfoo\nhello world\n\n$ shuf -n2 -e 'hi there' 'hello world' foo bar\nfoo\nhi there\n\n$ shuf -r -n4 -e foo bar\nfoo\nfoo\nbar\nfoo\n```\n\n<br>\n\n#### <a name=\"random-integer-numbers\"></a>Random integer numbers\n\n* The `-i` option accepts integer range as input to be shuffled\n\n```bash\n$ shuf -i 3-8\n3\n7\n6\n4\n8\n5\n```\n\n* Combine with other options as needed\n\n```bash\n$ shuf -n3 -i 3-8\n5\n4\n7\n\n$ shuf -r -n4 -i 3-8\n5\n5\n7\n8\n\n$ shuf -r -n5 -i 0-1\n1\n0\n0\n1\n1\n```\n\n* Use [seq](./miscellaneous.md#seq) input if negative numbers, floating point, etc are needed\n\n```bash\n$ seq 2 -1 -2 | shuf\n2\n-1\n-2\n0\n1\n\n$ seq 0.3 0.1 0.7 | shuf -n3\n0.4\n0.5\n0.7\n```\n\n\n<br>\n\n#### <a name=\"further-reading-for-shuf\"></a>Further reading for shuf\n\n* `man shuf` and `info 
shuf` for more options and detailed documentation\n* [Generate random numbers in specific range](https://unix.stackexchange.com/questions/140750/generate-random-numbers-in-specific-range)\n* [Variable - randomly choose among three numbers](https://unix.stackexchange.com/questions/330689/variable-randomly-chosen-among-three-numbers-10-100-and-1000)\n* Related to 'random' stuff:\n    * [How to generate a random string?](https://unix.stackexchange.com/questions/230673/how-to-generate-a-random-string)\n    * [How can I populate a file with random data?](https://unix.stackexchange.com/questions/33629/how-can-i-populate-a-file-with-random-data)\n    * [Run commands at random](https://unix.stackexchange.com/questions/81566/run-commands-at-random)\n\n"
  },
  {
    "path": "tail_less_cat_head.md",
    "content": "# <a name=\"cat-less-tail-and-head\"></a>Cat, Less, Tail and Head\n\n**Table of Contents**\n\n* [cat](#cat)\n    * [Concatenate files](#concatenate-files)\n    * [Accepting input from stdin](#accepting-input-from-stdin)\n    * [Squeeze consecutive empty lines](#squeeze-consecutive-empty-lines)\n    * [Prefix line numbers](#prefix-line-numbers)\n    * [Viewing special characters](#viewing-special-characters)\n    * [Writing text to file](#writing-text-to-file)\n    * [tac](#tac)\n    * [Useless use of cat](#useless-use-of-cat)\n    * [Further Reading for cat](#further-reading-for-cat)\n* [less](#less)\n    * [Navigation commands](#navigation-commands)\n    * [Further Reading for less](#further-reading-for-less)\n* [tail](#tail)\n    * [linewise tail](#linewise-tail)\n    * [characterwise tail](#characterwise-tail)\n    * [multiple file input for tail](#multiple-file-input-for-tail)\n    * [Further Reading for tail](#further-reading-for-tail)\n* [head](#head)\n    * [linewise head](#linewise-head)\n    * [characterwise head](#characterwise-head)\n    * [multiple file input for head](#multiple-file-input-for-head)\n    * [combining head and tail](#combining-head-and-tail)\n    * [Further Reading for head](#further-reading-for-head)\n* [Text Editors](#text-editors)\n\n<br>\n\n## <a name=\"cat\"></a>cat\n\n```bash\n$ cat --version | head -n1\ncat (GNU coreutils) 8.25\n\n$ man cat\nCAT(1)                           User Commands                          CAT(1)\n\nNAME\n       cat - concatenate files and print on the standard output\n\nSYNOPSIS\n       cat [OPTION]... 
[FILE]...\n\nDESCRIPTION\n       Concatenate FILE(s) to standard output.\n\n       With no FILE, or when FILE is -, read standard input.\n...\n```\n\n* For below examples, `marks_201*` files contain 3 fields delimited by TAB\n* To avoid formatting issues, TAB has been converted to spaces using `col -x` while pasting the output here\n\n<br>\n\n#### <a name=\"concatenate-files\"></a>Concatenate files\n\n* One or more files can be given as input and hence a lot of times, `cat` is used to quickly see contents of small single file on terminal\n* To save the output of concatenation, just redirect stdout\n\n```bash\n$ ls\nmarks_2015.txt  marks_2016.txt  marks_2017.txt\n\n$ cat marks_201*\nName    Maths   Science\nfoo     67      78\nbar     87      85\nName    Maths   Science\nfoo     70      75\nbar     85      88\nName    Maths   Science\nfoo     68      76\nbar     90      90\n\n$ # save stdout to a file\n$ cat marks_201* > all_marks.txt\n```\n\n<br>\n\n#### <a name=\"accepting-input-from-stdin\"></a>Accepting input from stdin\n\n```bash\n$ # combining input from stdin and other files\n$ printf 'Name\\tMaths\\tScience \\nbaz\\t56\\t63\\nbak\\t71\\t65\\n' | cat - marks_2015.txt\nName    Maths   Science\nbaz     56      63\nbak     71      65\nName    Maths   Science\nfoo     67      78\nbar     87      85\n\n$ # - can be placed in whatever order is required\n$ printf 'Name\\tMaths\\tScience \\nbaz\\t56\\t63\\nbak\\t71\\t65\\n' | cat marks_2015.txt -\nName    Maths   Science\nfoo     67      78\nbar     87      85\nName    Maths   Science\nbaz     56      63\nbak     71      65\n```\n\n<br>\n\n#### <a name=\"squeeze-consecutive-empty-lines\"></a>Squeeze consecutive empty lines\n\n```bash\n$ printf 'hello\\n\\n\\nworld\\n\\nhave a nice day\\n'\nhello\n\n\nworld\n\nhave a nice day\n$ printf 'hello\\n\\n\\nworld\\n\\nhave a nice day\\n' | cat -s\nhello\n\nworld\n\nhave a nice day\n```\n\n<br>\n\n#### <a name=\"prefix-line-numbers\"></a>Prefix line numbers\n\n```bash\n$ # 
number all lines\n$ cat -n marks_201*\n     1  Name    Maths   Science\n     2  foo     67      78\n     3  bar     87      85\n     4  Name    Maths   Science\n     5  foo     70      75\n     6  bar     85      88\n     7  Name    Maths   Science\n     8  foo     68      76\n     9  bar     90      90\n\n$ # number only non-empty lines\n$ printf 'hello\\n\\n\\nworld\\n\\nhave a nice day\\n' | cat -sb\n     1  hello\n\n     2  world\n\n     3  have a nice day\n```\n\n* For more numbering options, check out the command `nl`\n\n```bash\n$ whatis nl\nnl (1)               - number lines of files\n```\n\n<br>\n\n#### <a name=\"viewing-special-characters\"></a>Viewing special characters\n\n* End of line identified by `$`\n* Useful for example to see trailing spaces\n\n```bash\n$ cat -E marks_2015.txt\nName    Maths   Science $\nfoo     67      78$\nbar     87      85$\n```\n\n* TAB identified by `^I`\n\n```bash\n$ cat -T marks_2015.txt\nName^IMaths^IScience \nfoo^I67^I78\nbar^I87^I85\n```\n\n* Non-printing characters\n* See [Show Non-Printing Characters](http://docstore.mik.ua/orelly/unix/upt/ch25_07.htm) for more detailed info\n\n```bash\n$ # NUL character\n$ printf 'foo\\0bar\\0baz\\n' | cat -v\nfoo^@bar^@baz\n\n$ # to check for dos-style line endings\n$ printf 'Hello World!\\r\\n' | cat -v\nHello World!^M\n\n$ printf 'Hello World!\\r\\n' | dos2unix | cat -v\nHello World!\n```\n\n* the `-A` option is equivalent to `-vET`\n* the `-e` option is equivalent to `-vE`\n* If `dos2unix` and `unix2dos` are not available, see [How to convert DOS/Windows newline (CRLF) to Unix newline (\\n)](https://stackoverflow.com/questions/2613800/how-to-convert-dos-windows-newline-crlf-to-unix-newline-n-in-a-bash-script)\n\n<br>\n\n#### <a name=\"writing-text-to-file\"></a>Writing text to file\n\n```bash\n$ cat > sample.txt\nThis is an example of adding text to a new file using cat command.\nPress Ctrl+d on a newline to save and quit.\n\n$ cat sample.txt\nThis is an example of adding text 
to a new file using cat command.\nPress Ctrl+d on a newline to save and quit.\n```\n\n* See also how to use [heredoc](http://mywiki.wooledge.org/HereDocument)\n    * [How can I write a here doc to a file](https://stackoverflow.com/questions/2953081/how-can-i-write-a-here-doc-to-a-file-in-bash-script)\n* See also [difference between Ctrl+c and Ctrl+d to signal end of stdin input in bash](https://unix.stackexchange.com/questions/16333/how-to-signal-the-end-of-stdin-input-in-bash)\n\n<br>\n\n#### <a name=\"tac\"></a>tac\n\n```bash\n$ whatis tac\ntac (1)              - concatenate and print files in reverse\n$ tac --version | head -n1\ntac (GNU coreutils) 8.25\n\n$ seq 3 | tac\n3\n2\n1\n\n$ tac marks_2015.txt\nbar     87      85\nfoo     67      78\nName    Maths   Science\n```\n\n* Useful in cases where logic is easier to write when working on reversed file\n* Consider this made up log file, many **Warning** lines but need to extract only from last such **Warning** upto **Error** line\n    * See [GNU sed chapter](./gnu_sed.md#lines-between-two-regexps) for details on the `sed` command used below\n\n```bash\n$ cat report.log\nblah blah\nWarning: something went wrong\nmore blah\nwhatever\nWarning: something else went wrong\nsome text\nsome more text\nError: something seriously went wrong\nblah blah blah\n\n$ tac report.log | sed -n '/Error:/,/Warning:/p' | tac\nWarning: something else went wrong\nsome text\nsome more text\nError: something seriously went wrong\n```\n\n* Similarly, if characters in lines have to be reversed, use the `rev` command\n\n```bash\n$ whatis rev\nrev (1)              - reverse lines characterwise\n```\n\n<br>\n\n#### <a name=\"useless-use-of-cat\"></a>Useless use of cat\n\n* `cat` is used so frequently to view contents of a file that somehow users think other commands cannot handle file input\n* [UUOC](https://en.wikipedia.org/wiki/Cat_(Unix)#Useless_use_of_cat)\n* [Useless Use of Cat Award](http://porkmail.org/era/unix/award.html)\n\n```bash\n$ 
cat report.log | grep -E 'Warning|Error'\nWarning: something went wrong\nWarning: something else went wrong\nError: something seriously went wrong\n$ grep -E 'Warning|Error' report.log\nWarning: something went wrong\nWarning: something else went wrong\nError: something seriously went wrong\n```\n\n* Use [input redirection](http://wiki.bash-hackers.org/howto/redirection_tutorial) if a command doesn't accept file input\n\n```bash\n$ cat marks_2015.txt | tr 'A-Z' 'a-z'\nname    maths   science\nfoo     67      78\nbar     87      85\n$ tr 'A-Z' 'a-z' < marks_2015.txt\nname    maths   science\nfoo     67      78\nbar     87      85\n```\n\n* However, `cat` should definitely be used where **concatenation** is needed\n\n```bash\n$ grep -c 'foo' marks_201*\nmarks_2015.txt:1\nmarks_2016.txt:1\nmarks_2017.txt:1\n\n$ # concatenation allows to get overall count in one-shot in this case\n$ cat marks_201* | grep -c 'foo'\n3\n```\n\n<br>\n\n#### <a name=\"further-reading-for-cat\"></a>Further Reading for cat\n\n* [cat Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/cat?sort=votes&pageSize=15)\n* [cat Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/cat?sort=votes&pageSize=15)\n\n<br>\n\n## <a name=\"less\"></a>less\n\n```bash\n$ less --version | head -n1\nless 481 (GNU regular expressions)\n\n$ # By default, pager is used to display the man pages\n$ # and usually, pager is linked to less command\n$ type pager less\npager is /usr/bin/pager\nless is /usr/bin/less\n\n$ realpath /usr/bin/pager\n/bin/less\n$ realpath /usr/bin/less\n/bin/less\n$ diff -s /usr/bin/pager /usr/bin/less\nFiles /usr/bin/pager and /usr/bin/less are identical\n```\n\n* `cat` command is NOT suitable for viewing contents of large files on the Terminal\n* `less` displays contents of a file, automatically fits to size of Terminal, allows scrolling in either direction and other options for effective viewing\n* Usually, `man` command uses `less` command to display the help 
page\n* The navigation commands are similar to `vi` editor\n\n<br>\n\n#### <a name=\"navigation-commands\"></a>Navigation commands\n\nCommonly used commands are given below, press `h` for summary of options\n\n* `g` go to start of file\n* `G` go to end of file\n* `q` quit\n* `/pattern` search for the given pattern in forward direction\n* `?pattern` search for the given pattern in backward direction\n* `n` go to next pattern\n* `N` go to previous pattern\n\n<br>\n\n#### <a name=\"further-reading-for-less\"></a>Further Reading for less\n\n* See `man less` for detailed info on commands and options. For example:\n    * `-s` option to squeeze consecutive blank lines\n    * `-N` option to prefix line number\n* `less` command is an [improved version](https://unix.stackexchange.com/questions/604/isnt-less-just-more) of `more` command\n* [differences between most, more and less](https://unix.stackexchange.com/questions/81129/what-are-the-differences-between-most-more-and-less)\n* [less Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/less?sort=votes&pageSize=15)\n\n<br>\n\n## <a name=\"tail\"></a>tail\n\n```bash\n$ tail --version | head -n1\ntail (GNU coreutils) 8.25\n\n$ man tail\nTAIL(1)                          User Commands                         TAIL(1)\n\nNAME\n       tail - output the last part of files\n\nSYNOPSIS\n       tail [OPTION]... [FILE]...\n\nDESCRIPTION\n       Print  the  last  10  lines of each FILE to standard output.  
With more\n       than one FILE, precede each with a header giving the file name.\n\n       With no FILE, or when FILE is -, read standard input.\n...\n```\n\n<br>\n\n#### <a name=\"linewise-tail\"></a>linewise tail\n\nConsider this sample file, with line numbers prefixed\n\n```bash\n$ cat sample.txt\n 1) Hello World\n 2) \n 3) Good day\n 4) How are you\n 5) \n 6) Just do-it\n 7) Believe it\n 8) \n 9) Today is sunny\n10) Not a bit funny\n11) No doubt you like it too\n12) \n13) Much ado about nothing\n14) He he he\n15) Adios amigo\n```\n\n* default behavior - display last 10 lines\n\n```bash\n$ tail sample.txt\n 6) Just do-it\n 7) Believe it\n 8) \n 9) Today is sunny\n10) Not a bit funny\n11) No doubt you like it too\n12) \n13) Much ado about nothing\n14) He he he\n15) Adios amigo\n```\n\n* Use `-n` option to control number of lines to filter\n\n```bash\n$ tail -n3 sample.txt\n13) Much ado about nothing\n14) He he he\n15) Adios amigo\n\n$ # some versions of tail allow to skip explicit n character\n$ tail -5 sample.txt\n11) No doubt you like it too\n12) \n13) Much ado about nothing\n14) He he he\n15) Adios amigo\n```\n\n* when number is prefixed with `+` sign, all lines are fetched from that particular line number to end of file\n\n```bash\n$ tail -n +10 sample.txt\n10) Not a bit funny\n11) No doubt you like it too\n12) \n13) Much ado about nothing\n14) He he he\n15) Adios amigo\n\n$ seq 13 17 | tail -n +3\n15\n16\n17\n```\n\n<br>\n\n#### <a name=\"characterwise-tail\"></a>characterwise tail\n\n* Note that this works byte wise and not suitable for multi-byte character encodings\n\n```bash\n$ # last three characters including the newline character\n$ echo 'Hi there!' | tail -c3\ne!\n\n$ # excluding the first character\n$ echo 'Hi there!' 
| tail -c +2\ni there!\n```\n\n<br>\n\n#### <a name=\"multiple-file-input-for-tail\"></a>multiple file input for tail\n\n```bash\n$ tail -n2 report.log sample.txt\n==> report.log <==\nError: something seriously went wrong\nblah blah blah\n\n==> sample.txt <==\n14) He he he\n15) Adios amigo\n\n$ # -q option to avoid filename in output\n$ tail -q -n2 report.log sample.txt\nError: something seriously went wrong\nblah blah blah\n14) He he he\n15) Adios amigo\n```\n\n<br>\n\n#### <a name=\"further-reading-for-tail\"></a>Further Reading for tail\n\n* `tail -f` and related options are beyond the scope of this tutorial. Below links might be useful\n    * [look out for buffering](http://mywiki.wooledge.org/BashFAQ/009)\n    * [Piping tail -f output though grep twice](https://stackoverflow.com/questions/13858912/piping-tail-output-though-grep-twice)\n    * [tail and less](https://unix.stackexchange.com/questions/196168/does-less-have-a-feature-like-tail-follow-name-f)\n* [tail Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/tail?sort=votes&pageSize=15)\n* [tail Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/tail?sort=votes&pageSize=15)\n\n<br>\n\n## <a name=\"head\"></a>head\n\n```bash\n$ head --version | head -n1\nhead (GNU coreutils) 8.25\n\n$ man head\nHEAD(1)                          User Commands                         HEAD(1)\n\nNAME\n       head - output the first part of files\n\nSYNOPSIS\n       head [OPTION]... [FILE]...\n\nDESCRIPTION\n       Print  the  first  10 lines of each FILE to standard output.  
With more\n       than one FILE, precede each with a header giving the file name.\n\n       With no FILE, or when FILE is -, read standard input.\n...\n```\n\n<br>\n\n#### <a name=\"linewise-head\"></a>linewise head\n\n* default behavior - display starting 10 lines\n\n```bash\n$ head sample.txt\n 1) Hello World\n 2) \n 3) Good day\n 4) How are you\n 5) \n 6) Just do-it\n 7) Believe it\n 8) \n 9) Today is sunny\n10) Not a bit funny\n```\n\n* Use `-n` option to control number of lines to filter\n\n```bash\n$ head -n3 sample.txt\n 1) Hello World\n 2) \n 3) Good day\n\n$ # some versions of head allow to skip explicit n character\n$ head -4 sample.txt\n 1) Hello World\n 2) \n 3) Good day\n 4) How are you\n```\n\n* when number is prefixed with `-` sign, all lines are fetched except those many lines to end of file\n\n```bash\n$ # except last 9 lines of file\n$ head -n -9 sample.txt\n 1) Hello World\n 2) \n 3) Good day\n 4) How are you\n 5) \n 6) Just do-it\n\n$ # except last 2 lines\n$ seq 13 17 | head -n -2\n13\n14\n15\n```\n\n<br>\n\n#### <a name=\"characterwise-head\"></a>characterwise head\n\n* Note that this works byte wise and not suitable for multi-byte character encodings\n\n```bash\n$ # if output of command doesn't end with newline, prompt will be on same line\n$ # to highlight working of command, the prompt for such cases is not shown here\n\n$ # first two characters\n$ echo 'Hi there!' | head -c2\nHi\n\n$ # excluding last four characters\n$ echo 'Hi there!' 
| head -c -4\nHi the\n```\n\n<br>\n\n#### <a name=\"multiple-file-input-for-head\"></a>multiple file input for head\n\n```bash\n$ head -n3 report.log sample.txt\n==> report.log <==\nblah blah\nWarning: something went wrong\nmore blah\n\n==> sample.txt <==\n 1) Hello World\n 2) \n 3) Good day\n\n$ # -q option to avoid filename in output\n$ head -q -n3 report.log sample.txt\nblah blah\nWarning: something went wrong\nmore blah\n 1) Hello World\n 2) \n 3) Good day\n```\n\n<br>\n\n#### <a name=\"combining-head-and-tail\"></a>combining head and tail\n\n* Despite involving two commands, often this combination is faster than equivalent sed/awk versions\n\n```bash\n$ head -n11 sample.txt | tail -n3\n 9) Today is sunny\n10) Not a bit funny\n11) No doubt you like it too\n\n$ tail sample.txt | head -n2\n 6) Just do-it\n 7) Believe it\n```\n\n<br>\n\n#### <a name=\"further-reading-for-head\"></a>Further Reading for head\n\n* [head Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/head?sort=votes&pageSize=15)\n\n<br>\n\n## <a name=\"text-editors\"></a>Text Editors\n\nFor editing text files, the following applications can be used. Of these, `gedit`, `nano`, `vi` and/or `vim` are available in most distros by default\n\nEasy to use\n\n* [gedit](https://wiki.gnome.org/Apps/Gedit)\n* [geany](http://www.geany.org/)\n* [nano](http://nano-editor.org/)\n\nPowerful text editors\n\n* [vim](https://github.com/vim/vim)\n    * [vim learning resources](https://github.com/learnbyexample/scripting_course/blob/master/Vim_curated_resources.md) and [vim reference](https://github.com/learnbyexample/vim_reference) for further info\n* [emacs](https://www.gnu.org/software/emacs/)\n* [atom](https://atom.io/)\n* [sublime](https://www.sublimetext.com/)\n\nCheck out [this analysis](https://github.com/jhallen/joes-sandbox/tree/master/editor-perf) for some performance/feature comparisons of various text editors\n"
  },
  {
    "path": "whats_the_difference.md",
    "content": "# <a name=\"whats-the-difference\"></a>What's the difference\n\n**Table of Contents**\n\n* [cmp](#cmp)\n* [diff](#diff)\n    * [Comparing Directories](#comparing-directories)\n    * [colordiff](#colordiff)\n\n<br>\n\n## <a name=\"cmp\"></a>cmp\n\n```bash\n$ cmp --version | head -n1\ncmp (GNU diffutils) 3.3\n\n$ man cmp\nCMP(1)                           User Commands                          CMP(1)\n\nNAME\n       cmp - compare two files byte by byte\n\nSYNOPSIS\n       cmp [OPTION]... FILE1 [FILE2 [SKIP1 [SKIP2]]]\n\nDESCRIPTION\n       Compare two files byte by byte.\n\n       The optional SKIP1 and SKIP2 specify the number of bytes to skip at the\n       beginning of each file (zero by default).\n...\n```\n\n* As the comparison is byte by byte, it doesn't matter if file is human readable or not\n* A typical use case is to check if two executables are same or not\n\n```bash\n$ echo 'foo 123' > f1; echo 'food 123' > f2\n$ cmp f1 f2\nf1 f2 differ: byte 4, line 1\n\n$ # print differing bytes\n$ cmp -b f1 f2\nf1 f2 differ: byte 4, line 1 is  40   144 d\n\n$ # skip given bytes from each file\n$ # if only one number is given, it is used for both inputs\n$ cmp -i 3:4 f1 f2\n$ echo $?\n0\n\n$ # compare only given number of bytes from start of inputs\n$ cmp -n 3 f1 f2\n$ echo $?\n0\n\n$ # suppress output\n$ cmp -s f1 f2\n$ echo $?\n1\n```\n\n* Comparison stops immediately at the first difference found\n* If verbose option `-l` is used, comparison would stop at whichever input reaches end of file first\n\n```bash\n$ # first column is byte number\n$ # second/third column is respective octal value of differing bytes\n$ cmp -l f1 f2\n4  40 144\n5  61  40\n6  62  61\n7  63  62\n8  12  63\ncmp: EOF on f1\n```\n\n**Further Reading**\n\n* `man cmp` and `info cmp` for more options and detailed documentation\n\n\n<br>\n\n## <a name=\"diff\"></a>diff\n\n```bash\n$ diff --version | head -n1\ndiff (GNU diffutils) 3.3\n\n$ man diff\nDIFF(1)                          User 
Commands                         DIFF(1)\n\nNAME\n       diff - compare files line by line\n\nSYNOPSIS\n       diff [OPTION]... FILES\n\nDESCRIPTION\n       Compare FILES line by line.\n...\n```\n\n* `diff` output shows lines from first file input starting with `<`\n* lines from second file input start with `>`\n* between the two file contents, `---` is used as separator\n* each difference is prefixed by a command that indicates the differences (see links at end of section for more details)\n\n```bash\n$ paste d1 d2\n1       1\n2       hello\n3       3\nworld   4\n\n$ diff d1 d2\n2c2\n< 2\n---\n> hello\n4c4\n< world\n---\n> 4\n\n$ diff <(seq 4) <(seq 5)\n4a5\n> 5\n```\n\n* use `-i` option to ignore case\n\n```bash\n$ echo 'Hello World!' > i1\n$ echo 'hello world!' > i2\n\n$ diff i1 i2\n1c1\n< Hello World!\n---\n> hello world!\n\n$ diff -i i1 i2\n$ echo $?\n0\n```\n\n* ignoring difference in white spaces\n\n```bash\n$ # -b option to ignore changes in the amount of white space\n$ diff -b <(echo 'good day') <(echo 'good    day')\n$ echo $?\n0\n\n$ # -w option to ignore all white spaces\n$ diff -w <(echo 'hi    there ') <(echo ' hi there')\n$ echo $?\n0\n$ diff -w <(echo 'hi    there ') <(echo 'hithere')\n$ echo $?\n0\n\n$ # use -B to ignore only blank lines\n$ # use -E to ignore changes due to tab expansion\n$ # use -Z to ignore trailing white spaces at end of line\n```\n\n* side-by-side output\n\n```bash\n$ diff -y d1 d2\n1                                                               1\n2                                                             | hello\n3                                                               3\nworld                                                         | 4\n\n$ # -y is usually used along with other options\n$ # default width is 130 print columns\n$ diff -W 60 --suppress-common-lines -y d1 d2\n2                            |  hello\nworld                        |  4\n\n$ diff -W 20 --left-column -y <(seq 4) <(seq 5)\n1     (\n2     (\n3     
(\n4     (\n      > 5\n```\n\n* by default, there is no output if input files are same. Use `-s` option to additionally indicate files are same\n* by default, all differences are shown. Use `-q` option to indicate only that files differ\n\n```bash\n$ cp i1 i1_copy\n$ diff -s i1 i1_copy\nFiles i1 and i1_copy are identical\n$ diff -s i1 i2\n1c1\n< Hello World!\n---\n> hello world!\n\n$ diff -q i1 i1_copy\n$ diff -q i1 i2\nFiles i1 and i2 differ\n\n$ # combine them to always get one line output\n$ diff -sq i1 i1_copy\nFiles i1 and i1_copy are identical\n$ diff -sq i1 i2\nFiles i1 and i2 differ\n```\n\n<br>\n\n#### <a name=\"comparing-directories\"></a>Comparing Directories\n\n* when comparing two files of same name from different directories, specifying the filename is optional for one of the directories\n\n```bash\n$ mkdir dir1 dir2\n$ echo 'Hello World!' > dir1/i1\n$ echo 'hello world!' > dir2/i1\n\n$ diff dir1/i1 dir2\n1c1\n< Hello World!\n---\n> hello world!\n\n$ diff -s i1 dir1/\nFiles i1 and dir1/i1 are identical\n$ diff -s . 
dir1/i1\nFiles ./i1 and dir1/i1 are identical\n```\n\n* if both arguments are directories, all files are compared\n\n```bash\n$ touch dir1/report.log dir1/lists dir2/power.log\n$ cp f1 dir1/\n$ cp f1 dir2/\n\n$ # by default, all differences are reported\n$ # as well as filenames which are unique to respective directories\n$ diff dir1 dir2\ndiff dir1/i1 dir2/i1\n1c1\n< Hello World!\n---\n> hello world!\nOnly in dir1: lists\nOnly in dir2: power.log\nOnly in dir1: report.log\n```\n\n* to report only filenames\n\n```bash\n$ diff -sq dir1 dir2\nFiles dir1/f1 and dir2/f1 are identical\nFiles dir1/i1 and dir2/i1 differ\nOnly in dir1: lists\nOnly in dir2: power.log\nOnly in dir1: report.log\n\n$ # list only differing files\n$ # also useful to copy-paste the command for GUI diffs like tkdiff/vimdiff\n$ diff dir1 dir2 | grep '^diff '\ndiff dir1/i1 dir2/i1\n```\n\n* to recursively compare sub-directories as well, use `-r`\n\n```bash\n$ mkdir dir1/subdir dir2/subdir\n$ echo 'good' > dir1/subdir/f1\n$ echo 'goad' > dir2/subdir/f1\n\n$ diff -srq dir1 dir2\nFiles dir1/f1 and dir2/f1 are identical\nFiles dir1/i1 and dir2/i1 differ\nOnly in dir1: lists\nOnly in dir2: power.log\nOnly in dir1: report.log\nFiles dir1/subdir/f1 and dir2/subdir/f1 differ\n\n$ diff -r dir1 dir2 | grep '^diff '\ndiff -r dir1/i1 dir2/i1\ndiff -r dir1/subdir/f1 dir2/subdir/f1\n```\n\n* See also [GNU diffutils manual - comparing directories](https://www.gnu.org/software/diffutils/manual/diffutils.html#Comparing-Directories) for further options and details like excluding files, ignoring filename case, etc and `dirdiff` command\n\n<br>\n\n#### <a name=\"colordiff\"></a>colordiff\n\n```bash\n$ whatis colordiff \ncolordiff (1)        - a tool to colorize diff output\n\n$ whatis wdiff\nwdiff (1)            - display word differences between text files\n```\n\n* simply replace `diff` with `colordiff`\n\n![colordiff](./images/colordiff.png)\n\n* or, pass output of a `diff` tool to `colordiff`\n\n![wdiff to 
colordiff](./images/wdiff_to_colordiff.png)\n\n* See also [stackoverflow - How to colorize diff on the command line?](https://stackoverflow.com/questions/8800578/how-to-colorize-diff-on-the-command-line) for other options\n\n<br>\n\n**Further Reading**\n\n* `man diff` and `info diff` for more options and detailed documentation\n    * [GNU diffutils manual](https://www.gnu.org/software/diffutils/manual/diffutils.html) for more comprehensive documentation\n* `man -k diff` to get list of all commands related to `diff`\n* [diff Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/diff?sort=votes&pageSize=15)\n* [unix.stackexchange - GUI diff and merge tools](https://unix.stackexchange.com/questions/4573/which-gui-diff-viewer-would-you-recommend-with-copy-to-left-right-functionality)\n* [unix.stackexchange - Understanding diff output](https://unix.stackexchange.com/questions/81998/understanding-of-diff-output)\n* [stackoverflow - Using output of diff to create patch](https://stackoverflow.com/questions/437219/using-the-output-of-diff-to-create-the-patch)\n\n"
  },
  {
    "path": "wheres_my_file.md",
    "content": "# <a name=\"where's-my-file\"></a>Where's my file\n\n**Table of Contents**\n\n* [find](#find)\n* [locate](#locate)\n\n<br>\n\n## <a name=\"find\"></a>find\n\n```bash\n$ find --version | head -n1\nfind (GNU findutils) 4.7.0-git\n\n$ man find\nFIND(1)                     General Commands Manual                    FIND(1)\n\nNAME\n       find - search for files in a directory hierarchy\n\nSYNOPSIS\n       find  [-H]  [-L]  [-P]  [-D  debugopts]  [-Olevel]  [starting-point...]\n       [expression]\n\nDESCRIPTION\n       This manual page documents the GNU version of find.  GNU find  searches\n       the  directory  tree  rooted at each given starting-point by evaluating\n       the given expression from left to right,  according  to  the  rules  of\n       precedence  (see  section  OPERATORS),  until the outcome is known (the\n       left hand side is false for and operations,  true  for  or),  at  which\n       point  find  moves  on  to the next file name.  If no starting-point is\n       specified, `.' is assumed.\n...\n```\n\n**Examples**\n\nFiltering based on file name\n\n* `find . 
-iname 'power.log'` search and print path of file named power.log (ignoring case) in current directory and its sub-directories\n* `find -name '*log'` search and print path of all files whose name ends with log in current directory and its sub-directories - using `.` is optional when searching in current directory\n* `find -not -name '*log'` print path of all files whose name does NOT end with log in current directory and its sub-directories\n* `find -regextype egrep -regex '.*/\\w+'` use extended regular expression to match filename containing only `[a-zA-Z0-9_]` characters\n    * `.*/` is needed to match initial part of file path\n\nFiltering based on file type\n\n* `find /home/guest1/proj -type f` print path of all regular files found in specified directory\n* `find /home/guest1/proj -type d` print path of all directories found in specified directory\n* `find /home/guest1/proj -type f -name '.*'` print path of all hidden files\n\nFiltering based on depth\n\nThe relative path `.` is considered to be at depth 0; files and folders immediately contained in it are at depth 1, and so on\n\n* `find -maxdepth 1 -type f` all regular files (including hidden ones) from current directory (without going to sub-directories)\n* `find -maxdepth 1 -type f -name '[!.]*'` all regular files (but not hidden ones) from current directory (without going to sub-directories)\n    * `-not -name '.*'` can also be used\n* `find -mindepth 1 -maxdepth 1 -type d` all directories (including hidden ones) in current directory (without going to sub-directories)\n\nFiltering based on file properties\n\n* `find -mtime -2` print files that were modified within last two days in current directory\n    * Note that day here means 24 hours\n* `find -mtime +7` print files that were modified more than seven days back in current directory\n* `find -daystart -type f -mtime -1` files that were modified from beginning of day (not past 24 hours)\n* `find -size +10k` print files with size greater than 10 kilobytes in current directory\n* `find -size 
-1M` print only empty files in current directory, since sizes are rounded up to the next unit before comparison\n    * use `find -size -1048576c` to print files smaller than 1 megabyte\n* `find -size 2G` print files whose size is 2 gigabytes after rounding up to the next gigabyte in current directory\n\nPassing filtered files as input to other commands\n\n* `find report -name '*log*' -exec rm {} \\;` delete all files whose names contain log in report folder and its sub-folders\n    * here `rm` command is called for every file matching the search conditions\n    * since `;` is a special character for shell, it needs to be escaped using `\\`\n* `find report -name '*log*' -delete` delete all files whose names contain log in report folder and its sub-folders\n* `find -name '*.txt' -exec wc {} +` files ending with .txt are all passed together as arguments to `wc` command instead of executing `wc` command for every file\n    * no need to escape the `+` character in this case\n    * also note that the specified command may be invoked more than once if the number of files found is too large\n* `find -name '*.log' -exec mv {} ../log/ \\;` move files ending with .log to log directory present one level above. 
`mv` is executed once for each filtered file\n* `find -name '*.log' -exec mv -t ../log/ {} +` the `-t` option allows specifying the target directory first and then providing multiple files to be moved as arguments\n    * Similarly, one can use `-t` for `cp` command\n\n**Further Reading**\n\n* [using find](http://mywiki.wooledge.org/UsingFind)\n* [find examples on SO](https://stackoverflow.com/documentation/bash/566/find#t=201612140534548263961)\n* [Collection of find examples](http://alvinalexander.com/unix/edu/examples/find.shtml)\n* [find Q&A on unix stackexchange](https://unix.stackexchange.com/questions/tagged/find?sort=votes&pageSize=15)\n* [find and tar example](https://unix.stackexchange.com/questions/282762/find-mtime-1-print-xargs-tar-archives-all-files-from-directory-ignoring-t/282885#282885)\n* [find Q&A on stackoverflow](https://stackoverflow.com/questions/tagged/find?sort=votes&pageSize=15)\n* [Why is looping over find's output bad practice?](https://unix.stackexchange.com/questions/321697/why-is-looping-over-finds-output-bad-practice)\n\n\n<br>\n\n## <a name=\"locate\"></a>locate\n\n```bash\n$ locate --version | head -n1\nmlocate 0.26\n\n$ man locate\nlocate(1)                   General Commands Manual                  locate(1)\n\nNAME\n       locate - find files by name\n\nSYNOPSIS\n       locate [OPTION]... PATTERN...\n\nDESCRIPTION\n       locate  reads  one or more databases prepared by updatedb(8) and writes\n       file names matching at least one of the PATTERNs  to  standard  output,\n       one per line.\n\n       If  --regex is not specified, PATTERNs can contain globbing characters.\n       If any PATTERN contains no globbing characters, locate  behaves  as  if\n       the pattern were *PATTERN*.\n...\n```\n\nFaster alternative to `find` command when searching for a file by its name. It is based on a database, which gets updated by a `cron` job. So, newer files may not be present in results. 
Use this command if it is available in your distro and you remember some part of the filename. It is very useful when the entire filesystem has to be searched, in which case `find` command might take a very long time compared to `locate`\n\n**Examples**\n\n* `locate 'power'` print path of files containing power in the whole filesystem\n    * matches anywhere in path, for example, '/home/learnbyexample/lowpower_adder/result.log' and '/home/learnbyexample/power.log' are both valid matches\n    * implicitly, `locate` would change the string to `*power*` as no globbing characters are present in the string specified\n* `locate -b '\\power.log'` print path matching the string power.log exactly at end of path\n    * '/home/learnbyexample/power.log' matches but not '/home/learnbyexample/lowpower.log'\n    * since globbing character '\\' is used while specifying search string, it doesn't get implicitly replaced by `*power.log*`\n* `locate -b '\\proj_adder'` the `-b` option also comes in handy to print only the path of the directory itself, otherwise every file under that folder would also be displayed\n* [find vs locate - pros and cons](https://unix.stackexchange.com/questions/60205/locate-vs-find-usage-pros-and-cons-of-each-other)\n\n\n"
  }
]