[
  {
    "path": ".travis.yml",
    "content": "sudo: required\nlanguage:\n- bash\nscript:\n- sudo apt-get install shellcheck\n- shellcheck *.sh\n- ( cd tests/ && openssl aes-256-cbc -K $encrypted_4d6c5775c90a_key -iv $encrypted_4d6c5775c90a_iv -in curl-options.txt.enc -out curl-options.txt -d ;)\n- ./tests/tests.sh\n"
  },
  {
    "path": "CHANGELOG.md",
    "content": "## v2.0.0\n\n* Using `curl` instead of `wget`\n* Fix #36 (unable to read cookie file)\n* Fix #34 (`413 Request Entity Too Large`)\n\n## v1.2.2\n\n* Loop detection: #24.\n* Add test cases.\n* Update documentation (Cookie issue.)\n* Minor code improvements.\n* Group with category support (#28, Thanks @LeeKevin)\n\n## v1.2.1\n\n* Fix bugs: #6 (compatibility issue),\n    #13 (so large group),\n    #16 (email exporting and third-party license issue)\n* Fix script shebang.\n* Google organization support.\n* Ensure group name is in lowercase.\n* Minor scripting improvements.\n\n## v1.2.0\n\n* Drop the use of `lynx` program. `wget` handles all download now.\n* Accept `_WGET_OPTIONS` environment to control `wget` commands.\n* Can work with private groups thanks to `_WGET_OPTIONS` environment.\n* Rename script (`craw.sh` becomes `crawler.sh`.)\n* Output important variables to the output script.\n* Update documentation (`README.md`.)\n\n## v1.0.1\n\n* Provide fancy agent to `wget` and `lynx` command.\n* Fix wrong URL of `rss` feed.\n* Use `set -u` to avoid unbound variable.\n* Fix display charset of `lynx` program. See #3.\n\n## v1.0.0\n\n* The first public version.\n"
  },
  {
    "path": "README.md",
    "content": "WARNING: This project no longer works and is deprecated.\n**Reason:** Google has completely removed the Ajax support this script relies on.\n  See also https://github.com/icy/google-group-crawler/issues/42#issuecomment-889013487\n\n[![Build Status](https://travis-ci.org/icy/google-group-crawler.svg?branch=master)](https://travis-ci.org/icy/google-group-crawler)\n\n## Download all messages from Google Group archive\n\n`google-group-crawler` is a `Bash-4` script to download all (original)\nmessages from a Google group archive.\nPrivate groups require a cookie string/file.\nGroups with adult content are not supported yet.\n\n* [Installation](#installation)\n* [Usage](#usage)\n  * [The first run](#the-first-run)\n  * [Update your local archive thanks to rss feed](#update-your-local-archive-thanks-to-rss-feed)\n  * [Private group or Group hosted by an organization](#private-group-or-group-hosted-by-an-organization)\n  * [The hook](#the-hook)\n  * [What to do with your local archive](#what-to-do-with-your-local-archive)\n  * [Rescan the whole local archive](#rescan-the-whole-local-archive)\n  * [Known problems](#known-problems)\n* [Contributions](#contributions)\n* [Similar projects](#similar-projects)\n* [License](#license)\n* [Author](#author)\n* [For script hackers](#for-script-hackers)\n\n## Installation\n\nThe script requires `bash-4`, `sort`, `curl`, `sed`, `awk`.\n\nMake the script executable with `chmod 755` and put it in your path\n(e.g., `/usr/local/bin/`.)\n\nThe script may not work in a `Windows` environment, as reported in\nhttps://github.com/icy/google-group-crawler/issues/26.\n\n## Usage\n\n### The first run\n\nFor a private group, please\n[prepare your cookie file](#private-group-or-group-hosted-by-an-organization).\n\n    # export _CURL_OPTIONS=\"-v\"       # use curl options to provide e.g., cookies\n    # export _HOOK_FILE=\"/some/path\"  # provide a hook file; see #the-hook\n\n    # export _ORG=\"your.company\"      # required, if you are using 
Gsuite\n    export _GROUP=\"mygroup\"           # specify your group\n    ./crawler.sh -sh                  # first run for testing\n    ./crawler.sh -sh > curl.sh        # save your script\n    bash curl.sh                      # downloading mbox files\n\nYou can execute the `curl.sh` script multiple times, as `curl` will\nquickly skip any fully downloaded files.\n\n### Update your local archive thanks to RSS feed\n\nAfter you have an archive from the first run, you only need to add the latest\nmessages as shown in the feed. You can do that with the `-rss` option and the\nadditional `_RSS_NUM` environment variable:\n\n    export _RSS_NUM=50                # (optional. See Tips & Tricks.)\n    ./crawler.sh -rss > update.sh     # using rss feed for updating\n    bash update.sh                    # download the latest posts\n\nRunning this frequently is an easy way to keep your local archive up to date.\n\n### Private group or Group hosted by an organization\n\nTo download messages from a private group or a group hosted by your\norganization, you need to provide some cookie information to the script.\nIn the past, the script used `wget` and the Netscape cookie file format;\nnow it uses `curl` with a cookie string and a configuration file.\n\n0. Open Firefox, press F12 to enable Debug mode and select the Network tab\n   in the Debug console of Firefox. (You may find a similar way in\n   your favorite browser.)\n1. Log in to your testing Google account and access your group.\n   For example\n     https://groups.google.com/forum/?_escaped_fragment_=categories/google-group-crawler-public\n   (replace `google-group-crawler-public` with your group name).\n   Make sure you can read some content at your own group URI.\n2. Now, from the Network tab in the Debug console, select the address\n   and select `Copy -> Copy Request Headers`. You will have a lot of\n   things in the result; paste them in your text editor\n   and keep only the `Cookie` part.\n3. 
Now prepare a file `curl-options.txt` as below\n\n        user-agent = \"Mozilla/5.0 (X11; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0\"\n        header = \"Cookie: <snip>\"\n\n   Of course, replace the `<snip>` part with your own cookie string.\n   See `man curl` for more details of the file format.\n\n4. Specify your cookie file via `_CURL_OPTIONS`:\n\n        export _CURL_OPTIONS=\"-K /path/to/curl-options.txt\"\n\n   Now every hidden group can be downloaded :)\n\n### The hook\n\nIf you want to execute a `hook` command after an `mbox` file is downloaded,\nyou can do as below.\n\n1. Prepare a Bash script file that contains a definition of the `__curl_hook`\n   function. The first argument specifies the output filename, and the\n   second argument specifies a URL. For example, here is a simple hook\n\n        # $1: output file\n        # $2: url (https://groups.google.com/forum/message/raw?msg=foobar/topicID/msgID)\n        __curl_hook() {\n          if [[ \"$(stat -c %b \"$1\")\" == 0 ]]; then\n            echo >&2 \":: Warning: empty output '$1'\"\n          fi\n        }\n\n    In this example, the `hook` checks whether the output file is empty\n    and sends a warning to standard error.\n\n2. Set the environment variable `_HOOK_FILE` to the path\n   of your file. 
For example,\n\n        export _GROUP=archlinuxvn\n        export _HOOK_FILE=$HOME/bin/curl.hook.sh\n\n   Now the hook file will be loaded by the future output of the commands\n   `crawler.sh -sh` or `crawler.sh -rss`.\n\n### What to do with your local archive\n\nThe downloaded messages are found under `$_GROUP/mbox/*`.\n\nThey are in `RFC 822` format (possibly with obfuscated email addresses)\nand they can easily be converted to `mbox` format before being imported\ninto your email client (`Thunderbird`, `claws-mail`, etc.)\n\nYou can also use the [mhonarc](https://www.mhonarc.org/) utility to convert\nthe downloaded messages to `HTML` files.\n\nSee also\n\n* https://github.com/icy/google-group-crawler/issues/15#issuecomment-221018338\n* https://github.com/icy/google-group-crawler/issues/35#issuecomment-580659966\n* My script https://github.com/icy/bashy/blob/master/libs/raw2mbox.sh\n\n### Rescan the whole local archive\n\nSometimes you may need to rescan/redownload all messages.\nThis can be done by removing all temporary files\n\n    rm -fv $_GROUP/threads/t.*    # this is a must\n    rm -fv $_GROUP/msgs/m.*       # see also Tips & Tricks\n\nor you can use the `_FORCE` option:\n\n    _FORCE=\"true\" ./crawler.sh -sh\n\nAnother option is to delete all files under the `$_GROUP/` directory.\nAs usual, remember to back up before you delete anything.\n\n### Known problems\n\n1. Fails on groups with adult content (https://github.com/icy/google-group-crawler/issues/14)\n2. This script may not recover emails from public groups.\n  When you use valid cookies, you may see the original emails\n  if you are a manager of the group. See also https://github.com/icy/google-group-crawler/issues/16.\n3. When cookies are used, the original emails may be recovered,\n  and you must filter them before making your archive public.\n4. The script can't fetch from a group whose name contains special characters (e.g., `+`).\n  See also https://github.com/icy/google-group-crawler/issues/30\n\n## Contributions\n\n1. 
`parallel` support: @Pikrass has a script to download messages in parallel.\n  It's discussed in the ticket https://github.com/icy/google-group-crawler/issues/32.\n  The script: https://gist.github.com/Pikrass/f8462ff8a9af18f97f08d2a90533af31\n2. `raw access denied`: @alexivkin mentioned he could use the `print` function\n  to work around the issue. See it here\n  https://github.com/icy/google-group-crawler/issues/29#issuecomment-468810786\n\n## Similar projects\n\n* (website) [Google Takeout - Download all info for any groups you own](https://takeout.google.com/)\n* (Shell/curl) [ggscrape - Download emails from a Google Group. Rescue your archives](https://git.scuttlebot.io/%25nkOkiGF0Dd321GmNqs6aW%2BWHaH9Uunq4m8dVfJuU%2Bps%3D.sha256)\n* (Python/Webdriver) [scrape_google_groups.py - A simple script to scrape a google group](https://gist.github.com/punchagan/7947337)\n* (Python/webscraping.webkit) [gg-scrape - Liberate you data from google groups](https://github.com/jrholliday/gg-scrape)\n* (Python/urllib) [gg_scraper](https://gitlab.com/mcepl/gg_scraper)\n* (PHP/libcurl) [scraping-google-groups](http://saturnboy.com/2010/03/scraping-google-groups/)\n\n## License\n\nThis work is released under the terms of an MIT license.\n\n## Author\n\nThis script was written by Anh K. Huynh.\n\nHe wrote this script because he couldn't resolve the problem by using\n`nodejs`, `phantomjs`, `Watir`.\n\nNew web technology just makes life harder, doesn't it?\n\n## For script hackers\n\nPlease skip this section unless you really know how to work with `Bash` and shells.\n\n0. If you clean your files _(as below)_, you may notice that it will be\n   very slow when re-downloading all files. You may consider using\n   the `-rss` option instead. This option will fetch data from an `rss` link.\n\n   It's recommended to use the `-rss` option for daily updates. By default,\n   the number of items is 50. 
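   As a sketch, this is the feed URL that the `-rss` mode fetches (using
   `archlinuxvn` purely as an example group name, per the notes inside
   `crawler.sh`; the optional `/a/$_ORG` part for hosted groups is omitted):

```shell
#!/usr/bin/env bash
# Build the RSS feed URL the same way crawler.sh's _rss() routine does,
# minus the optional ${_ORG:+/a/$_ORG} segment used for hosted groups.
_GROUP="archlinuxvn"   # example group name
_RSS_NUM=50            # default number of feed items
url="https://groups.google.com/forum/feed/${_GROUP}/msgs/rss.xml?num=${_RSS_NUM}"
echo "$url"
```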
You can change it by the `_RSS_NUM` variable.\n   However, don't use a very big number, because Google will ignore it.\n\n1. Because the topic list is FIFO, you only need to remove the last file.\n   The script will re-download the last item, and if there is a new page,\n   that page will be fetched.\n\n        ls $_GROUP/msgs/m.* \\\n        | sed -e 's#\\.[0-9]\\+$##g' \\\n        | sort -u \\\n        | while read f; do\n            last_item=\"$f.$( \\\n              ls $f.* \\\n              | sed -e 's#^.*\\.\\([0-9]\\+\\)#\\1#g' \\\n              | sort -n \\\n              | tail -1 \\\n            )\";\n            echo $last_item;\n          done\n\n2. The list of threads is a LIFO list. If you want to rescan your list,\n   you will need to delete all files under `$_D_OUTPUT/threads/`\n\n3. You can set the timestamps of the `mbox` output files, as below\n\n        ls $_GROUP/mbox/m.* \\\n        | while read FILE; do \\\n            date=\"$( \\\n              grep ^Date: $FILE\\\n              | head -1\\\n              | sed -e 's#^Date: ##g' \\\n            )\";\n            touch -d \"$date\" $FILE;\n          done\n\n    This will be very useful, for example, when you want to use the\n    `mbox` files with `mhonarc`.\n"
  },
  {
    "path": "contrib/README.md",
    "content": "\n## Fix dot in email addresses\n\nBy default, emails exported by the tool are not original, because\nGoogle's anti-spam mechanism removes some characters from them, e.g.,\n\n    this.is.my.email@example.net    --> this.....@example.net\n\nThe `discourse` project has a great script to fix this problem, as seen at\n\nhttps://github.com/discourse/discourse/blob/648bcb6432ee1fbca0fc9d45c25c3d114f2a0892/script/import_scripts/mbox.rb\n\nThis script was imported into the `google-group-crawler` project, but it\nwas removed on Apr 24th 2017 due to a license problem, as described here\n\nhttps://github.com/icy/google-group-crawler/issues/16#issuecomment-292509711\n\nRemoving it is the best way to avoid duplication and future confusion.\n"
  },
  {
    "path": "crawler.sh",
    "content": "#!/usr/bin/env bash\n#\n# Purpose: Make a backup of Google Group [Google Group Crawler]\n# Author : Anh K. Huynh\n# Date   : 2013 Sep 22nd\n# License: MIT license\n#\n# Copyright (c) 2013 - 2020 Ky-Anh Huynh <kyanh@viettug.org>\n#\n# Permission is hereby granted, free of charge, to any person obtaining a copy\n# of this software and associated documentation files (the \"Software\"), to deal\n# in the Software without restriction, including without limitation the rights\n# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n# copies of the Software, and to permit persons to whom the Software is\n# furnished to do so, subject to the following conditions:\n#\n# The above copyright notice and this permission notice shall be included in\n# all copies or substantial portions of the Software.\n#\n# THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN\n# THE SOFTWARE.\n\n# For your hack ;)\n#\n# Forum, list of all threads (topics), LIFO\n#   https://groups.google.com/forum/?_escaped_fragment_=forum/archlinuxvn\n#\n# Topic, list of all messages in a thread (topic), FIFO\n#   https://groups.google.com/forum/?_escaped_fragment_=topic/archlinuxvn/wXRTQFqBtlA\n#\n# Raw, a MH mail message:\n#   https://groups.google.com/forum/message/raw?msg=archlinuxvn/_atKwaIFVGw/rnwjMJsA4ZYJ\n#\n# Specification:\n#\n#   1. https://developers.google.com/search/docs/ajax-crawling/docs/specification\n#   2. 
(Deprecation notice) https://webmasters.googleblog.com/2015/10/deprecating-our-ajax-crawling-scheme.html\n#\n# Atom link\n#\n#   https://groups.google.com/forum/feed/archlinuxvn/msgs/atom.xml?num=100\n#   https://groups.google.com/forum/feed/archlinuxvn/topics/atom.xml?num=50\n#\n#   Don't use a very big `num`. Google knows that and changes to 16.\n#   The bad thing is that Google doesn't provide a link to a post.\n#   It only provides link to a topic. Hence for two links above you\n#   would get the same result: links to your topics...\n#\n# Rss link\n#\n#   https://groups.google.com/forum/feed/archlinuxvn/msgs/rss.xml?num=50\n#   https://groups.google.com/forum/feed/archlinuxvn/topics/rss.xml?num=50\n#\n#   Rss link contains link to topic. That's great.\n#\n\n_short_url() {\n  printf '%s\\n' \"${*//https:\\/\\/groups.google.com${_ORG:+\\/a\\/$_ORG}\\/forum\\/\\?_escaped_fragment_=/}\"\n}\n\n_links_dump() {\n  # shellcheck disable=2086\n  curl \\\n    --user-agent \"$_USER_AGENT\" \\\n    $_CURL_OPTIONS \\\n    -Lso- \"$@\" \\\n  | sed -e \"s#['\\\"]#\\\\\"$'\\n#g' \\\n  | grep -E '^https?://' \\\n  | sort -u\n}\n\n# $1: output file [/path/to/directory/prefix]\n# $2: url\n_download_page() {\n  local _f_output\n  local _url=\"$2\"\n  local _surl=\n  local __\n\n  _surl=\"$(_short_url \"$_url\")\"\n  __=0\n  while :; do\n    _f_output=\"$1.${__}\"\n    if [[ -f \"$_f_output\" ]]; then\n      if [[ -n \"${_FORCE:-}\" ]]; then\n        echo >&2 \":: Updating '$_f_output' with '${_surl}'\"\n      else\n        echo >&2 \":: Skipping '$_f_output' (downloaded with '${_surl}')\"\n        if ! 
_url=\"$(grep -E -- \"_escaped_fragment_=((forum)|(topic)|(categories))/$_GROUP\" \"$_f_output\")\"; then\n          break\n        fi\n        (( __ ++ ))\n        continue\n      fi\n    else\n      echo >&2 \":: Creating '$_f_output' with '${_surl}'\"\n    fi\n\n    {\n      echo >&2 \":: Fetching data from '$_url'...\"\n      _links_dump \"$_url\"\n    } \\\n    | grep \"https://\" \\\n    | grep \"/$_GROUP\" \\\n    | awk '{print $NF}' \\\n    > \"$_f_output\"\n\n    # Loop detection. See also\n    #   https://github.com/icy/google-group-crawler/issues/24\n    # FIXME: 2020/04: This isn't necessary after Google has changed something\n    if [[ $__ -ge 1 ]]; then\n      if diff \"$_f_output\" \"$1.$(( __ - 1 ))\" >/dev/null 2>&1; then\n        echo >&2 \":: ==================================================\"\n        echo >&2 \":: Loop detected. Your cookie may not work correctly.\"\n        echo >&2 \":: You may want to generate new cookie file\"\n        echo >&2 \":: and/or remove all '#HttpOnly_' strings from it.\"\n        echo >&2 \":: ==================================================\"\n        exit 125\n      fi\n    fi\n\n    if ! 
_url=\"$(grep -E -- \"_escaped_fragment_=((forum)|(topic)|(categories))/$_GROUP\" \"$_f_output\")\"; then\n      break\n    fi\n\n    (( __ ++ ))\n  done\n}\n\n# Main routine\n_main() {\n  mkdir -pv \"$_D_OUTPUT\"/{threads,msgs,mbox}/ 1>&2 || exit 1\n\n  echo >&2 \":: Downloading all topics (thread) pages...\"\n  # Each page contains a bunch of\n  # topics sorted by time (the latest updated topic comes first.)\n  #\n  #  t.0 the first page   (the latest update)\n  #  t.1 the second page\n  #  (and so on)\n  #\n  _download_page \"$_D_OUTPUT/threads/t\" \\\n    \"https://groups.google.com${_ORG:+/a/$_ORG}/forum/?_escaped_fragment_=categories/$_GROUP\"\n\n  echo >&2 \":: Downloading list of all messages...\"\n  #\n  # Each thread (topic) file (`t.<number>`) contains a list of messages\n  # sorted by time (the latest updated message comes first.)\n  #\n  #   t.0\n  #     msg/m.{topic_id}.0  (the latest update)\n  #     msg/m.{topic_id}.1\n  #     (and so on)\n  #\n  #   t.1\n  #     msg/m.{topic_id}.0  (the latest update [in this topic])\n  #     msg/m.{topic_id}.1\n  #     (and so on)\n  #\n  find \"$_D_OUTPUT\"/threads/ -type f -iname \"t.[0-9]*\" -exec cat {} \\; \\\n  | grep '^https://' \\\n  | grep \"/d/topic/$_GROUP\" \\\n  | sort -u \\\n  | sed -e 's#/d/topic/#/forum/?_escaped_fragment_=topic/#g' \\\n  | while read -r _url; do\n      _topic_id=\"${_url##*/}\"\n      _download_page \"$_D_OUTPUT/msgs/m.${_topic_id}\" \"$_url\"\n      #                                 <--+------->\n    done #                                 |\n  #                                       /\n  # FIXME: Sorting issue here -----------'\n\n  echo >&2 \":: Generating command to download raw messages...\"\n  find \"$_D_OUTPUT\"/msgs/ -type f -iname \"m.*\" -exec cat {} \\; \\\n  | grep '^https://' \\\n  | grep '/d/msg/' \\\n  | sort -u \\\n  | sed -e 's#/d/msg/#/forum/message/raw?msg=#g' \\\n  | while read -r _url; do\n      _id=\"$(echo \"$_url\"| sed -e \"s#.*=$_GROUP/##g\" -e 
's#/#.#g')\"\n      echo \"__curl__ \\\"$_D_OUTPUT/mbox/m.${_id}\\\" \\\"$_url\\\"\"\n    done\n}\n\n_rss() {\n  mkdir -pv \"$_D_OUTPUT\"/{threads,msgs,mbox}/ 1>&2 || exit 1\n\n  {\n    echo >&2 \":: Fetching RSS data...\"\n    # shellcheck disable=2086\n    curl \\\n      --user-agent \"$_USER_AGENT\" \\\n      $_CURL_OPTIONS \\\n      -Lso- \"https://groups.google.com${_ORG:+/a/$_ORG}/forum/feed/$_GROUP/msgs/rss.xml?num=${_RSS_NUM}\"\n  } \\\n  | grep '<link>' \\\n  | grep 'd/msg/' \\\n  | sort -u \\\n  | sed \\\n      -e 's#<link>##g' \\\n      -e 's#</link>##g' \\\n  | while read -r _url; do\n      # shellcheck disable=SC2001\n      _id_origin=\"$(sed -e \"s#.*$_GROUP/##g\" <<<\"$_url\")\"\n      _url=\"https://groups.google.com${_ORG:+/a/$_ORG}/forum/message/raw?msg=$_GROUP/$_id_origin\"\n      _id=\"${_id_origin//\\//.}\"\n      echo \"__curl__ \\\"$_D_OUTPUT/mbox/m.${_id}\\\" \\\"$_url\\\"\"\n    done\n}\n\n# $1: Output File\n# $2: The URL\n__curl__() {\n  if [[ ! -f \"$1\" ]]; then\n    >&2 echo \":: Downloading '$1'...\"\n    # shellcheck disable=2086\n    curl -Ls \\\n      -A \"$_USER_AGENT\" \\\n      $_CURL_OPTIONS \\\n      \"$2\" -o \"$1\"\n    __curl_hook \"$1\" \"$2\"\n  else\n    >&2 echo \":: Skipping '$1'...\"\n  fi\n}\n\n# $1: Output File\n# $2: The URL\n__curl_hook() {\n  :\n}\n\n__sourcing_hook() {\n  # shellcheck disable=1090\n  source \"$1\" \\\n  || {\n    echo >&2 \":: Error occurred when loading hook file '$1'\"\n    exit 1\n  }\n}\n\n_ship_hook() {\n  echo \"#!/usr/bin/env bash\"\n  echo \"\"\n  echo \"export _ORG=\\\"\\${_ORG:-$_ORG}\\\"\"\n  echo \"export _GROUP=\\\"\\${_GROUP:-$_GROUP}\\\"\"\n  echo \"export _D_OUTPUT=\\\"\\${_D_OUTPUT:-$_D_OUTPUT}\\\"\"\n  echo \"export _USER_AGENT=\\\"\\${_USER_AGENT:-$_USER_AGENT}\\\"\"\n  echo \"export _CURL_OPTIONS=\\\"\\${_CURL_OPTIONS:-$_CURL_OPTIONS}\\\"\"\n  echo \"\"\n  declare -f __curl_hook\n\n  if [[ -f \"${_HOOK_FILE:-}\" ]]; then\n    declare -f __sourcing_hook\n    echo 
\"__sourcing_hook $_HOOK_FILE\"\n  elif [[ -n \"${_HOOK_FILE:-}\" ]]; then\n    echo >&2 \":: ${FUNCNAME[0]}: _HOOK_FILE ($_HOOK_FILE) does not exist.\"\n    exit 1\n  fi\n\n  declare -f __curl__\n}\n\n_help() {\n  echo \"Please visit https://github.com/icy/google-group-crawler for details.\"\n}\n\n_has_command() {\n  # well, this is exactly `for cmd in \"$@\"; do`\n  for cmd do\n    command -v \"$cmd\" >/dev/null 2>&1 || return 1\n  done\n}\n\n_check() {\n  local _requirements=\n  _requirements=\"curl sort awk sed diff\"\n  # shellcheck disable=2086\n  _has_command $_requirements \\\n  || {\n    echo >&2 \":: Some program is missing. Please make sure you have $_requirements.\"\n    return 1\n  }\n\n  if [[ -z \"$_GROUP\" ]]; then\n    echo >&2 \":: Please use the _GROUP environment variable to specify your Google group\"\n    return 1\n  fi\n}\n\n# An empty function. Can you tell me why it's here?\n__main__() { :; }\n\nset -u\n\n_ORG=\"${_ORG:-}\"\n_GROUP=\"${_GROUP:-}\"\n_D_OUTPUT=\"${_D_OUTPUT:-./${_ORG:+${_ORG}-}${_GROUP}/}\"\n# _GROUP=\"${_GROUP//+/%2B}\"\n_USER_AGENT=\"${_USER_AGENT:-Mozilla/5.0 (X11; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0}\"\n_CURL_OPTIONS=\"${_CURL_OPTIONS:-}\"\n_RSS_NUM=\"${_RSS_NUM:-50}\"\n\nexport _ORG _GROUP _D_OUTPUT _USER_AGENT _CURL_OPTIONS _RSS_NUM\n\n_check || exit\n\ncase ${1:-} in\n\"-h\"|\"--help\")    _help;;\n\"-sh\"|\"--bash\")   _ship_hook; _main;;\n\"-rss\")           _ship_hook; _rss;;\n*)                echo >&2 \":: Use '-h' or '--help' for more details\";;\nesac\n"
  },
  {
    "path": "tests/tests.sh",
    "content": "#!/usr/bin/env bash\n\n_test_public_1() {\n  export _GROUP=\"${_GROUP:-google-group-crawler-public}\"\n  export _D_OUTPUT=\"${_D_OUTPUT:-./${_ORG:+${_ORG}-}${_GROUP}/}\"\n  export _F_OUTPUT=\"${_F_OUTPUT:-./${_ORG:+${_ORG}-}${_GROUP}.sh}\"\n  export _GREP_MESSAGE=\"${_GREP_MESSAGE:-CICD passed}\"\n\n  echo >&2 \"\"\n  echo >&2 \":: --> Testing Public Group $_GROUP (ORG: ${_ORG:-<empty>}) <--\"\n  echo >&2 \":: --> _CURL_OPTIONS: ${_CURL_OPTIONS:-<empty>}\"\n  echo >&2 \"\"\n  echo >&2 \":: Removing $PWD/$_D_OUTPUT\"\n  rm -rf \"$PWD/$_D_OUTPUT/\"\n  echo >&2 \":: Generating $_F_OUTPUT...\"\n  crawler.sh -sh > \"$_F_OUTPUT\" || return 1\n  bash -n \"$_F_OUTPUT\" || return 1\n  echo >&2 \":: Executing $_F_OUTPUT...\"\n  bash -x \"$_F_OUTPUT\" || return 1\n  crawler.sh -rss || return 1\n\n  grep -Ri \"Message-Id:\" \"$_D_OUTPUT/mbox/\" \\\n  || {\n    echo >&2 \":: Unable to find any mail messages from $_D_OUTPUT/mbox/\"\n    return 1\n  }\n\n  grep -Ri \"$_GREP_MESSAGE\" \"$_D_OUTPUT/mbox/\" \\\n  || {\n    echo >&2 \":: Unable to find string '$_GREP_MESSAGE' from $_D_OUTPUT/mbox/\"\n    return 1\n  }\n}\n\n_test_reset() {\n  unset _ORG\n  unset _D_OUTPUT\n  unset _F_OUTPUT\n  unset _GREP_MESSAGE\n  unset _CURL_OPTIONS\n}\n\n_test_public_1_with_cat() {\n  (\n    _test_reset\n    export _GROUP=\"google-group-crawler-public2\"\n    _test_public_1\n  )\n}\n\n_test_public_2_loop_detection() {\n  (\n    _test_reset\n    export _ORG=\"viettug.org\"\n    export _GROUP=\"google-group-crawler-public2\"\n    _test_public_1\n    [[ $? == 125 ]] \\\n    || {\n      echo >&2 \":: Unable to detect a loop.\"\n      return 1\n    }\n    echo >&2 \":: Loop detected when no cookie is provided. 
Test passed.\"\n  )\n}\n\n_test_public_2_with_cookie() {\n  (\n    _test_reset\n    export _ORG=\"viettug.org\"\n    export _GROUP=\"google-group-crawler-public2\"\n    export _CURL_OPTIONS=\"--config curl-options.txt\"\n    export _GREP_MESSAGE=\"This is a public group from a private organization\"\n    _test_public_1\n  )\n}\n\n_test_private_1() {\n  (\n    _test_reset\n    export _GROUP=\"google-group-crawler-private\"\n    export _CURL_OPTIONS=\"--config curl-options.txt\"\n    _test_public_1\n  )\n}\n\n_main() { :; }\n\nset -u\n\ncd \"$(dirname \"${BASH_SOURCE[0]:-.}\")/../tests/\" || exit 1\nexport PATH=\"$PATH:$(pwd -P)/../\"\n\n_test_public_1 || exit 1\n_test_public_1_with_cat || exit 1\n#_test_public_2_loop_detection || exit 1\n_test_public_2_with_cookie || exit 2\n_test_private_1 || exit 3\n"
  }
]