Repository: xnl-h4ck3r/urless
Branch: main
Commit: e9bfa484ea6e
Files: 7
Total size: 67.9 KB

Directory structure:
gitextract_gotbcstc/
├── .gitignore
├── CHANGELOG.md
├── README.md
├── config.yml
├── setup.py
└── urless/
    ├── __init__.py
    └── urless.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
build/
dist/
urless.egg-info
__pycache__
test.txt

================================================
FILE: CHANGELOG.md
================================================
## Changelog

- v2.7
  - New
    - If the `config.yml` file is not found in the expected config directory (e.g. `~/.config/urless/` on Linux or `%APPDATA%/urless/` on Windows), it will be automatically created with default values. This fixes the issue where installing with `pipx` did not create the `config.yml` file.
    - Suppresses the warning about `requests` not being able to import `urllib3`.
- v2.6
  - Changed
    - BUG FIX: Change the typo `js.ko` to `ja,ko` in `LANGUAGE` within `config.yml` and `DEFAULT_LANGUAGE` within `urless.py`
    - Set `DEFAULT_REMOVE_PARAMS` in `urless.py` and `REMOVE_PARAMS` in the `config.yml` file to `_,cachebuster,cacheBuster,utm_source,utm_medium,utm_campaign,utm_content,utm_term,utm_adgroup,utm_custom,utm_name`. There was a mismatch between the two files. Also, the Google Analytics parameters should be removed by default.
- v2.5
  - Changed
    - Fix the issue of it saying the version is outdated when it is the latest version.
    - Applied black code formatting to `__init__.py`, `setup.py`, and `urless.py` to ensure consistent code style.
- v2.4
  - Changed
    - Various optimizations to improve performance, e.g. Pre-compiled Regular Expressions, Optimized Extension Filtering and Memory-Efficient File Processing.
- v2.3
  - Fixed
    - Remove TTY-gating that silences output in non-TTY environments like Docker, CI, or cron jobs. The --no-banner flag and -o/--output already provide users control over output, so the extra TTY checks only broke non-interactive usage. Thanks to [@tavgar](https://github.com/tavgar) for the fix in [PR #15](https://github.com/xnl-h4ck3r/urless/pull/15).
- v2.2
  - New
    - Add argument `-c`/`--config` to specify a path to a custom `config.yml` file. This resolves [Issue 9](https://github.com/xnl-h4ck3r/urless/issues/9).
    - Add argument `-dp`/`--disregard-params`. There is certain filtering that is not done if the URLs have parameters, because by default we want to see all possible parameters. If this argument is passed, then the filtering will be done, regardless of the existence of any parameters. This resolves [Issue 11](https://github.com/xnl-h4ck3r/urless/issues/11) and [Issue 12](https://github.com/xnl-h4ck3r/urless/issues/12).
  - Changed
    - The description for argument `-khw`/`--keep-human-written` says `By default, any URL with a path part that contains 3 or more dashes (-) are removed` but this will be corrected to `contains more than 3 dashes`.
    - Correct the description for argument `-kym`/`--keep-yyyymm` on the `-h` output and `README.md`. It says `By default, any URL with a path containing 3 /YYYY/MM` but the `3` should be removed.
- v2.1
  - New
    - Add `long_description_content_type` to `setup.py` to upload to PyPi
    - Add `urless` to `PyPi` so it can be installed with `pip install urless`
- v2.0
  - New
    - Add `REMOVE_PARAMS` to `config.yml`. This will be a comma separated list of case sensitive parameter names that you want removed completely from URLs. This can be useful to remove cache buster parameters, so will default to `cachebuster,cacheBuster` to show examples.
    - Add arg `-rp`/`--remove-params` which can be used to pass a comma separated list of parameter names to remove from URLs. This will override the `REMOVE_PARAMS` list in `config.yml`.
    - Show the current version of the tool in the banner, and whether it is the latest, or outdated.
    - Add arg `--version` to show the current version of the tool.
    - When installing `urless`, if the `config.yml` already exists then it will keep that one and create `config.yml.NEW` in case you need to replace the old config.
  - Changed
    - Fix a bug that meant defaults were not set correctly if `config.yml` keys are missing.
- v1.3
  - New
    - Add argument `-fnp`/`--fragment-not-param`. If passed the URL fragments `#` will NOT be treated in the same way as parameters, e.g. if a link has a filter keyword and a fragment (or param) the link is usually kept, but if this argument is passed and a link has a filter word and fragment, the link will be removed. Also, if this arg is passed and `-iq` / `--ignore-querystring` is used, the fragment will NOT be removed from links if no query string is in the link.
- v1.2
  - Changed
    - Changes to prevent `SyntaxWarning: invalid escape sequence` errors when Python 3.12 is used.
- v1.1
  - Changed
    - Add support to automatically identify file encoding.
- v1.0
  - Changed
    - Add support for quick install using pip or pipx.
- v0.9
  - Changed
    - Add i18N language codes `gb-en,ca-en,au-en,fr-fr,ca-fr,es-es,mx-es,de-de,it-it,br-pt,pt-pt,jp-ja,cn-zh,tw-zh,kr-ko,sa-ar,in-hi,ru-ru`
- v0.8
  - New
    - Add `DEFAULT_LANGUAGE` constant and `LANGUAGE` key in `config.yml` with the most common language codes: `en,en-us,en-gb,fr,de,pl,nl,fi,sv,it,es,pt,ru,pt-br,es-mx,zh-tw,js.ko`
    - Add `-lang`/`--language` argument. If passed and there are multiple URLs with different language codes as a part of the path, only one version of the URL will be output. The codes are specified in the `LANGUAGE` key of `config.yml`
  - Changed
    - A URL can have a GUID, Integer, CustomID and Language Code in the same URL and be de-cluttered properly.
    - If the Custom Regex ID doesn't start with `^` and end in `$`, those will be added.
    - Fix bug where it added the last occurrence of a regex pattern instead of the first.
    - Simplify the code in `processUrl` and `createPattern` functions... I had some strange logic that was unnecessary!
    - Make sure case is ignored when any `FILTER_EXTENSIONS` in `config.yml` or passed with `-fe` are compared with input.
- v0.7
  - New
    - Add `-rcid` / `--regex-custom-id` argument to provide a regex expression for a Custom ID that your target uses.
    - Add `-nb` / `--no-banner` argument to hide the tool banner. This is only needed if you are not piping input to `urless`.
    - Add `-khw` / `--keep-human-written` argument to prevent URLs with a path part that contains 3 or more dashes (-) from being removed (e.g. blog post). These are normally removed by default.
    - Add `-kym` / `--keep-yyyymm` argument to prevent URLs with a path part that contains a year and month in the format `/YYYY/MM` (e.g. blog or news) from being removed. These are normally removed by default.
    - Add `-iq` / `--ignore-querystring` argument to remove the query string (including URL fragments `#`) so output is unique paths only.
  - Changed
    - Fix bug where `/blah/1337` was not being treated differently to `/1337` for example.
    - When a Custom ID, GUID or Integer ID is found in a URL, and only one URL from many in the same format are returned in the output, use the first ID found in the input for that ID type.
- v0.6
  - New
    - By default, a trailing `/` will be removed from the end of a URL.
    - Added new argument `-ks`/`--keep-slash` that will ensure any links that do have a trailing slash in the input will not have the slash removed in the output, and therefore there may be identical URLs output, one with and one without a trailing slash.
- v0.5
  - Changed
    - Fixed Github Issue #3 to remove port 80 and 443 correctly
- v0.4
  - Changed
    - Various bug fixes
- v0.3
  - New
    - Add an `__init__.py` file to store the version, and move the image to a separate folder to make it cleaner.
  - Changed
    - If a line in the input throws an error due to not being a valid URL when parsed, then skip it, but output an error showing the URL if the `-v` arg is passed.
- v0.2
  - Fixed the bug `ERROR matchesPatterns 1: missing ), unterminated subpattern at position 237` by escaping the regex string before searching
- v0.1
  - Initial release. Please see README.md

================================================
FILE: README.md
================================================
## About - v2.7

This is a tool used to de-clutter a list of URLs.
As a starting point, I took the amazing tool [uro](https://github.com/s0md3v/uro/) by Somdev Sangwan. But I wanted to change a few things, make some improvements (like deal with GUIDs) and make it more customizable.

## Installation

`urless` supports **Python 3**.

Install `urless` in default (global) python environment.

```bash
pip install urless
```

OR

```bash
pip install git+https://github.com/xnl-h4ck3r/urless.git -v
```

You can upgrade with

```bash
pip install --upgrade urless
```

### pipx

Quick setup in isolated python environment using [pipx](https://pypa.github.io/pipx/)

```bash
pipx install git+https://github.com/xnl-h4ck3r/urless.git
```

## Usage

| Argument | Long Argument | Description |
| -------- | -------------------- | ----------- |
| -i | --input | A file of URLs to de-clutter. |
| -o | --output | The output file that will contain the de-cluttered list of URLs (default: output.txt). If piped to another program, output will be written to STDOUT instead. |
| -fk | --filter-keywords | A comma separated list of keywords to exclude links (if there are no parameters). This will override the `FILTER_KEYWORDS` list specified in `config.yml` |
| -fe | --filter-extensions | A comma separated list of file extensions to exclude. This will override the `FILTER_EXTENSIONS` list specified in `config.yml` |
| -rp | --remove-params | A comma separated list of **case sensitive** parameters to remove from ALL URLs. This will override the `REMOVE_PARAMS` list specified in `config.yml`. This can be useful to remove cache buster parameters for example. |
| -ks | --keep-slash | A trailing slash at the end of a URL in input will not be removed. Therefore there may be identical URLs output, one with and one without a trailing slash. |
| -khw | --keep-human-written | By default, any URL with a path part that contains more than 3 dashes (-) is removed because it is assumed to be human written content (e.g. blog post), and not interesting. Passing this argument will keep them in the output. |
| -kym | --keep-yyyymm | By default, any URL with a path containing /YYYY/MM (where YYYY is a year and MM month) is removed because it is assumed to be blog/news content, and not interesting. Passing this argument will keep them in the output. |
| -rcid | --regex-custom-id | **USE WITH CAUTION!** Regex for a Custom ID that your target uses. Ensure the value is passed in quotes. See the section below for more details on this. |
| -iq | --ignore-querystring | Remove the query string (including URL fragments `#`) so output is unique paths only. |
| -fnp | --fragment-not-param | Don't treat URL fragments `#` in the same way as parameters, e.g. if a link has a filter keyword and a fragment (or param) the link is usually kept, but if this argument is passed and a link has a filter word and fragment, the link will be removed. Also, if this arg is passed and `-iq` / `--ignore-querystring` is used, the fragment will NOT be removed from links if no query string is in the link. |
| -lang | --language | If passed and there are multiple URLs with different language codes as a part of the path, only one version of the URL will be output. The codes are specified in the `LANGUAGE` section of `config.yml`. |
| -c | --config | Path to the YML config file. If not passed, it looks for file `config.yml` in the default config directory, e.g. `~/.config/urless/`. |
| -dp | --disregard-params | There is certain filtering that is not done if the URLs have parameters, because by default we want to see all possible parameters. If this argument is passed, then the filtering will be done, regardless of the existence of any parameters. |
| -nb | --no-banner | Hides the tool banner output (it is hidden by default if you pipe input to urless). |
| | --version | Show current version number. |
| -v | --verbose | Verbose output |

## What does it do exactly?

You basically pass a list of URLs in (from a file, or pipe from STDIN), and get a de-cluttered file of URLs out. But in what way are they de-cluttered?
I'll explain this below, but first here are some terms that will be used:

- **FILTER-EXTENSIONS**: This refers to the list of extensions that can either be passed with `-fe`, specified with `FILTER_EXTENSIONS` in the `config.yml`, or if neither of those exist, a default list of `.css,.ico,.jpg,.jpeg,.png,.bmp,.svg,.img,.gif,.mp4,.flv,.ogv,.webm,.webp,.mov,.mp3,.m4a,.m4p,.scss,.tif,.tiff,.ttf,.otf,.woff,.woff2,.bmp,.ico,.eot,.htc,.rtf,.swf,.image`.
- **FILTER-KEYWORDS**: This refers to the list of keywords that can either be passed with `-fk`, specified with `FILTER_KEYWORDS` in the `config.yml`, or if neither of those exist, a default list of `blog,article,news,bootstrap,jquery,captcha,node_modules`
- **LANGUAGE**: This refers to the list of language codes that can be specified with `LANGUAGE` in the `config.yml`, or if it doesn't exist, a default list of the most common codes `en,en-us,en-gb,fr,de,pl,nl,fi,sv,it,es,pt,ru,pt-br,es-mx,zh-tw,ja,ko,gb-en,ca-en,au-en,fr-fr,ca-fr,es-es,mx-es,de-de,it-it,br-pt,pt-pt,jp-ja,cn-zh,tw-zh,kr-ko,sa-ar,in-hi,ru-ru`
- **UNWANTED-CONTENT**:
  - A section of the URL path contains more than 3 dashes (`-`), BUT isn't a GUID. This implies human written content, e.g. `how-to-hack-the-planet`. If arg `-khw` is passed, then this won't be removed.
  - The URL contains `/YYYY/MM/`, i.e. a year and a month. This is usually static content such as a blog. If arg `-kym` is passed, then this won't be removed.
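
The **UNWANTED-CONTENT** rules above can be sketched in Python roughly as follows. This is a simplified illustration, not the code that `urless` actually runs: the function name `is_unwanted`, its keyword arguments and the `/YYYY/MM/` regex are assumptions made for this example, while the GUID pattern is adapted from the one in `urless.py`.

```python
import re

# GUID pattern for a single path part, adapted from the pattern in urless.py.
GUID_PART = re.compile(
    r"^[({]?[a-fA-F0-9]{8}[-]?([a-fA-F0-9]{4}[-]?){3}[a-fA-F0-9]{12}[})]?$"
)

# Simplified /YYYY/MM/ check (an assumption for this sketch, not the tool's regex).
YYYYMM = re.compile(r"/(19|20)\d{2}/(0[1-9]|1[0-2])/")


def is_unwanted(path, keep_human_written=False, keep_yyyymm=False):
    """Return True if the path looks like blog/news style content."""
    if not keep_human_written:
        for part in path.split("/"):
            # More than 3 dashes and not a GUID: assume human written content.
            if part.count("-") > 3 and not GUID_PART.search(part):
                return True
    # A year and month in the path: assume blog/news content.
    if not keep_yyyymm and YYYYMM.search(path):
        return True
    return False


if __name__ == "__main__":
    for example in (
        "/blog/how-to-hack-the-planet",
        "/api/123e4567-e89b-12d3-a456-426614174000/edit",
        "/2019/06/post",
    ):
        print(example, is_unwanted(example))
```

The `keep_human_written` and `keep_yyyymm` arguments stand in for the `-khw` and `-kym` flags, so passing `True` skips the corresponding check.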

Here's what happens:

- If a URL has port 80 or 443 explicitly given, then remove it from the URL (e.g. http://example.com:80/test -> http://example.com/test)
- If the URL has any **FILTER-EXTENSIONS**, it will be removed from the output.
- If the URL has NO parameters **OR** the `-dp`/`--disregard-params` argument was passed:
  - If the URL contains a **FILTER-KEYWORDS** or **UNWANTED-CONTENT**, it will be removed.
  - If the URL query string contains unwanted parameters specified in config `REMOVE_PARAMS` (or overridden with argument `-rp`/`--remove-params`), they will be removed from all URLs before processing.
  - If `-rcid`/`--regex-custom-id` is passed and the URL path contains a Custom ID, only one match to the Custom ID regex will be included if there are multiple URLs where that is the only difference.
  - If the URL path contains a GUID, only one of the GUIDs will be included if there are multiple URLs where the GUID is the only difference.
  - If the URL path contains an Integer ID, only one of the Integer IDs will be included if there are multiple URLs where the Integer ID is the only difference.
  - If the `-lang` argument is passed and the URL contains a language code (e.g. `en-gb`), only one of the language codes will be included if there are multiple URLs where the language code is different.
- Else the URL has Parameters (or a fragment `#`) **AND** the `-dp`/`--disregard-params` argument was NOT passed:
  - If there are multiple URLs with the same parameters, then only URLs with unique parameter values are included.
  - If there are URLs with a Parameter, but no value (or a fragment), then this will be included.
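
To make the port removal and the Integer ID / GUID steps above more concrete, here is a rough Python sketch for URLs without parameters. It is only an illustration under stated assumptions: `strip_default_port`, `path_key` and `declutter` are hypothetical names invented for this example, URLs with a query string or fragment are simply passed through, and keyword, extension, language and Custom ID handling are left out.

```python
import re
from urllib.parse import urlsplit, urlunsplit

INT_PART = re.compile(r"^\d+$")
# GUID pattern for a single path part, adapted from the pattern in urless.py.
GUID_PART = re.compile(
    r"^[({]?[a-fA-F0-9]{8}[-]?([a-fA-F0-9]{4}[-]?){3}[a-fA-F0-9]{12}[})]?$"
)


def strip_default_port(url):
    """Drop an explicit :80 or :443 from the host part of a URL."""
    parts = urlsplit(url)
    netloc = parts.netloc
    if parts.port in (80, 443):
        netloc = netloc.rsplit(":", 1)[0]
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def path_key(path):
    """Replace GUID and integer path parts with placeholders."""
    key_parts = []
    for part in path.split("/"):
        if GUID_PART.search(part):
            key_parts.append("{guid}")
        elif INT_PART.match(part):
            key_parts.append("{int}")
        else:
            key_parts.append(part)
    return "/".join(key_parts)


def declutter(urls):
    """Keep the first URL seen for each path shape (hypothetical helper)."""
    seen = set()
    kept = []
    for url in urls:
        url = strip_default_port(url)
        parts = urlsplit(url)
        if parts.query or parts.fragment:
            # Parameter handling is out of scope for this sketch.
            kept.append(url)
            continue
        key = (parts.scheme, parts.netloc, path_key(parts.path))
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept


if __name__ == "__main__":
    for line in declutter(
        [
            "http://example.com:80/users/1/profile",
            "http://example.com/users/2/profile",
            "https://example.com:443/test",
        ]
    ):
        print(line)
```

Keeping the first URL seen for each shape mirrors the note in the changelog that the first ID found in the input is the one used in the output.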

## Examples

### Basic use

```
cat target_urls.txt | urless
```

or

```
urless -i target_urls.txt
```

### Capture output

```
cat target_urls.txt | urless > output.txt
```

or

```
urless -i target_urls.txt -o output.txt
```

## config.yml

The `config.yml` file has the following keys, which can be updated to suit your needs:

- `FILTER_KEYWORDS` - A comma separated list of keywords (e.g. `blog,article,news` etc.) that URLs are checked against in certain circumstances.
- `FILTER_EXTENSIONS` - A comma separated list of file extensions (e.g. `.css,.jpg,.jpeg` etc.) that all URLs are checked against. If a URL includes any of the strings then it will be excluded from the output.
- `LANGUAGE` - A comma separated list of language codes (e.g. `en-gb,fr,nl` etc.) that all URLs are checked against when the `-lang` argument is passed. If there are multiple URLs with different language codes, only one version of the URL will be output.
- `REMOVE_PARAMS` - A comma separated list of **case sensitive** parameter names (e.g. `cachebuster,cacheBuster`) that will be removed from all URLs before processing.

## Custom Regex

There are currently automatic regex checks for a path part being a Globally Unique ID (GUID) and an Integer ID, but the `-rcid` / `--regex-custom-id` argument lets you provide a regular expression to identify a custom ID. For example, if a target has a specific ID format (that isn't a GUID or Integer) then you can specify a regex expression for it, and then only one of those will be returned in the output if the rest of the URL is the same.

For example:

- Assume the target has a user ID in a format like `U-65241X`
- And there are multiple URLs like the following:

```
https://target.com/blah/U-61723A/settings
https://target.com/blah/U-63352B/settings
https://target.com/blah/U-61351A/profile
https://target.com/blah/U-61723A/settings
https://target.com/blah/U-64135C/profile
```

- You can call `urless` and pass `-rcid 'U-[0-9]{5}[A-Z]'`, then the output would be:

```
https://target.com/blah/U-61723A/settings
https://target.com/blah/U-64135C/profile
```

**IMPORTANT REGEX NOTES:**

- Writing correct regex expressions can be difficult, and if it isn't correct, you could end up with unpredictable and incorrect output.
- Always enclose your regex expression in single quotes when passing to the `-rcid` argument.
- You don't need to add a custom regex for a GUID or Integer ID - these are dealt with already.
- The regex expression should highlight the whole part of the path. So, if your regex only identifies the start of the path, then add `[^(\?|\/|#|$)]*` to the end of your regex which will mean ALL other characters up until the end of the path part.
- You can add `^` at the start, and `$` at the end, of your regex to ensure it represents the whole part of a path between slashes. However, these will be added for you if they are left out.
- Make sure the regex only identifies the sections you are interested in, otherwise you may have unexpected results. To test your regex, you can take your input file and do `cat input.txt | grep -E 'U-[0-9]{5}[A-Z]'` for example, and see whether your expression looks correct (it should only highlight what you are interested in, and highlight the whole part of the path that is the custom ID).
- You can also test using [Regex101](https://regex101.com), entering sample URLs in the **TEST STRING** section to check if it is correct. Make sure the **REGEX FLAGS** **g**lobal and **m**ultiline are selected.
- There may be cases where you just can't supply a regex that is going to identify the Custom ID correctly without treating other values as the same. For example, if there are URLs like `https://target.com/blah/xnl/settings` where `xnl` is a User Name, you won't be able to create a regex for user name because it is not a unique enough format to distinguish it from other possible path values.

## Issues

If you come across any problems at all, or have ideas for improvements, please feel free to raise an issue on Github. If there is a problem, it will be useful if you can provide the exact command you ran and a detailed description of the problem. If possible, run with `-v` to reproduce the problem and let me know about any error messages that are given.

## TODO

None - feel free to raise a Github issue to suggest any enhancements.

## And finally...

Good luck and good hunting!
If you really love the tool (or any others), or they helped you find an awesome bounty, consider [BUYING ME A COFFEE!](https://ko-fi.com/xnlh4ck3r) ☕ (I could use the caffeine!)

🤘 /XNL-h4ck3r

================================================
FILE: config.yml
================================================
FILTER_KEYWORDS: blog,article,news,bootstrap,jquery,captcha,node_modules
FILTER_EXTENSIONS: .css,.ico,.jpg,.jpeg,.png,.bmp,.svg,.img,.gif,.mp4,.flv,.ogv,.webm,.webp,.mov,.mp3,.m4a,.m4p,.scss,.tif,.tiff,.ttf,.otf,.woff,.woff2,.bmp,.ico,.eot,.htc,.rtf,.swf,.image
LANGUAGE: en,en-us,en-gb,fr,de,pl,nl,fi,sv,it,es,pt,ru,pt-br,es-mx,zh-tw,ja,ko,gb-en,ca-en,au-en,fr-fr,ca-fr,es-es,mx-es,de-de,it-it,br-pt,pt-pt,jp-ja,cn-zh,tw-zh,kr-ko,sa-ar,in-hi,ru-ru
REMOVE_PARAMS: _,cachebuster,cacheBuster,utm_source,utm_medium,utm_campaign,utm_content,utm_term,utm_adgroup,utm_custom,utm_name

================================================
FILE: setup.py
================================================
#!/usr/bin/env python
import os
import shutil
from setuptools import setup, find_packages

# Define the target directory for the config.yml file
target_directory = (
    os.path.join(os.getenv("APPDATA", ""), "urless")
    if os.name == "nt"
    else (
        os.path.join(os.path.expanduser("~"), ".config", "urless")
        if os.name == "posix"
        else (
            os.path.join(
                os.path.expanduser("~"), "Library", "Application Support", "urless"
            )
            if os.name == "darwin"
            else None
        )
    )
)

# Copy the config.yml file to the target directory if it exists
configNew = False
if target_directory and os.path.isfile("config.yml"):
    os.makedirs(target_directory, exist_ok=True)
    # If file already exists, create a new one
    if os.path.isfile(target_directory + "/config.yml"):
        configNew = True
        os.rename(
            target_directory + "/config.yml", target_directory + "/config.yml.OLD"
        )
        shutil.copy("config.yml", target_directory)
        os.rename(
            target_directory + "/config.yml", target_directory + "/config.yml.NEW"
        )
        os.rename(
            target_directory + "/config.yml.OLD", target_directory + "/config.yml"
        )
    else:
        shutil.copy("config.yml", target_directory)

setup(
    name="urless",
    packages=find_packages(),
    version=__import__("urless").__version__,
    description="De-clutter a list of URLs",
    long_description=open("README.md").read(),
    long_description_content_type="text/markdown",
    author="@xnl-h4ck3r",
    url="https://github.com/xnl-h4ck3r/urless",
    zip_safe=False,
    install_requires=[
        "argparse",
        "pyyaml",
        "termcolor",
        "urlparse3",
        "chardet",
        "requests",
    ],
    entry_points={
        "console_scripts": [
            "urless = urless.urless:main",
        ],
    },
)

if configNew:
    print(
        "\n\033[33mIMPORTANT: The file "
        + target_directory
        + "/config.yml already exists.\nCreating config.yml.NEW but leaving existing config.\nIf you need the new file, then remove the current one and rename config.yml.NEW to config.yml\n\033[0m"
    )
else:
    print(
        "\n\033[92mThe file " + target_directory + "/config.yml has been created.\n\033[0m"
    )

================================================
FILE: urless/__init__.py
================================================
__version__ = "2.7"

================================================
FILE: urless/urless.py
================================================
#!/usr/bin/env python
# Python 3
# urless - by @Xnl-h4ck3r: De-clutter a list of URLs
# Full help here: https://github.com/xnl-h4ck3r/urless/blob/main/README.md
# Good luck and good hunting! If you really love the tool (or any others), or they helped you find an awesome bounty, consider BUYING ME A COFFEE! (https://ko-fi.com/xnlh4ck3r) ☕ (I could use the caffeine!)

import re
import os
import sys
from typing import Pattern
import yaml
import argparse
import chardet
from signal import SIGINT, signal
from urllib.parse import urlparse
from termcolor import colored
from pathlib import Path

try:
    from . import __version__
    import warnings

    # Import requests with warnings suppressed (e.g. the urllib3 import warning)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        import requests
except Exception:
    pass

# Default values if config.yml not found
DEFAULT_FILTER_EXTENSIONS = ".css,.ico,.jpg,.jpeg,.png,.bmp,.svg,.img,.gif,.mp4,.flv,.ogv,.webm,.webp,.mov,.mp3,.m4a,.m4p,.scss,.tif,.tiff,.ttf,.otf,.woff,.woff2,.bmp,.ico,.eot,.htc,.rtf,.swf,.image"
DEFAULT_FILTER_KEYWORDS = "blog,article,news,bootstrap,jquery,captcha,node_modules"
DEFAULT_LANGUAGE = "en,en-us,en-gb,fr,de,pl,nl,fi,sv,it,es,pt,ru,pt-br,es-mx,zh-tw,ja,ko,gb-en,ca-en,au-en,fr-fr,ca-fr,es-es,mx-es,de-de,it-it,br-pt,pt-pt,jp-ja,cn-zh,tw-zh,kr-ko,sa-ar,in-hi,ru-ru"
DEFAULT_REMOVE_PARAMS = "_,cachebuster,cacheBuster,utm_source,utm_medium,utm_campaign,utm_content,utm_term,utm_adgroup,utm_custom,utm_name"

# Variables to hold config.yml values
FILTER_EXTENSIONS = ""
FILTER_KEYWORDS = ""
LANGUAGE = ""
REMOVE_PARAMS = ""
reFilterKeywords = ""
badExtensions = ()

# Regex delimiters
REGEX_START = "^"
REGEX_END = "$"

# Regex for a path folder of integer
REGEX_INTEGER = REGEX_START + r"\d+" + REGEX_END
reIntPart = re.compile(REGEX_INTEGER)
patternsInt = {}

# Regex for a path folder of GUID
REGEX_GUID = (
    REGEX_START
    + "[({]?[a-fA-F0-9]{8}[-]?([a-fA-F0-9]{4}[-]?){3}[a-fA-F0-9]{12}[})]?"
    + REGEX_END
)
reGuidPart = re.compile(REGEX_GUID)
patternsGUID = {}

# Regex fields for Custom ID
reCustomIDPart = Pattern
patternsCustomID = {}

# Regex for path of YYYY/MM
# In a raw string "\\d" matches a literal backslash followed by "d", so use "\d" to match digits
REGEX_YYYYMM = r"\/[1|2][0|1|9]\d{2}/[0|1]\d{1}\/"
reYYYYMM = re.compile(REGEX_YYYYMM)

# Regex for path of language code
reLangPart = Pattern
patternsLang = {}

# Global variables
args = None
urlmap = {}
patternsSeen = []
outFile = None
linesOrigCount = 0
linesFinalCount = 0
usingConfigDefaults = False


def verbose():
    """
    Functions used when printing messages dependent on verbose option
    """
    return args.verbose


def write(text=""):
    """
    Always print one line to stdout.
    The --no-banner flag and -o/--output already give users control over noise
    and redirection, so extra TTY checks only break non-interactive usage
    (Docker, CI, cron).
    """
    sys.stdout.write(text + "\n")


def writerr(text=""):
    """
    Always print one line to stderr.
    """
    sys.stderr.write(text + "\n")


def showVersion():
    try:
        try:
            resp = requests.get(
                "https://raw.githubusercontent.com/xnl-h4ck3r/urless/main/urless/__init__.py",
                timeout=3,
            )
        except Exception:
            write(
                "Current urless version "
                + __version__
                + " (unable to check if latest)\n"
            )
            # Nothing to compare against, so stop here
            return
        if __version__ == resp.text.split("=")[1].replace('"', "").strip():
            write(
                "Current urless version "
                + __version__
                + " ("
                + colored("latest", "green")
                + ")\n"
            )
        else:
            write(
                "Current urless version "
                + __version__
                + " ("
                + colored("outdated", "red")
                + ")\n"
            )
    except Exception:
        pass


def showBanner():
    write("")
    write(colored(r" __ _ ____ _ ___ ___ ____ ", "red"))
    write(colored(r" | | | | _ \| | / _ \/ __/ __/ ", "yellow"))
    write(colored(r" | | | | |_) | || __/\__ \__ \ ", "green"))
    write(colored(r" | |_| | _ <| |_\___/\___/___/ ", "cyan"))
    write(colored(r" \___/|_| \_\___/", "magenta") + colored("by Xnl-h4ck3r", "white"))
    write("")
    showVersion()


def getConfig():
    """
    Try to get the values from the config file, otherwise use the defaults
    """
    global FILTER_EXTENSIONS, FILTER_KEYWORDS, LANGUAGE, REMOVE_PARAMS, reLangPart, usingConfigDefaults, reFilterKeywords, badExtensions
    try:
        # Try to get the config file values
        try:
            # Put config in global location based on the OS.
            urlessPath = (
                Path(os.path.join(os.getenv("APPDATA", ""), "urless"))
                if os.name == "nt"
                else (
                    Path(os.path.join(os.path.expanduser("~"), ".config", "urless"))
                    if os.name == "posix"
                    else (
                        Path(
                            os.path.join(
                                os.path.expanduser("~"),
                                "Library",
                                "Application Support",
                                "urless",
                            )
                        )
                        if os.name == "darwin"
                        else None
                    )
                )
            )
            urlessPath.absolute
            if args.config is None:
                if urlessPath == "":
                    configPath = "config.yml"
                else:
                    configPath = Path(urlessPath / "config.yml")
            else:
                configPath = Path(args.config)
            config = yaml.safe_load(open(configPath))

            # If the user provided the --filter-keywords argument then it overrides the config value
            if args.filter_keywords:
                FILTER_KEYWORDS = args.filter_keywords
            else:
                try:
                    FILTER_KEYWORDS = config.get("FILTER_KEYWORDS")
                    if str(FILTER_KEYWORDS) == "None":
                        writerr(
                            colored(
                                "No value for FILTER_KEYWORDS in config.yml - default set",
                                "yellow",
                            )
                        )
                        FILTER_KEYWORDS = DEFAULT_FILTER_KEYWORDS
                except Exception:
                    writerr(
                        colored(
                            "Unable to read FILTER_KEYWORDS from config.yml - default set",
                            "red",
                        )
                    )
                    FILTER_KEYWORDS = DEFAULT_FILTER_KEYWORDS
            reFilterKeywords = re.compile(
                FILTER_KEYWORDS.replace(",", "|"), re.IGNORECASE
            )

            # If the user provided the --filter-extensions argument then it overrides the config value
            if args.filter_extensions:
                FILTER_EXTENSIONS = args.filter_extensions
            else:
                try:
                    FILTER_EXTENSIONS = config.get("FILTER_EXTENSIONS")
                    if str(FILTER_EXTENSIONS) == "None":
                        writerr(
                            colored(
                                "No value for FILTER_EXTENSIONS in config.yml - default set",
                                "yellow",
                            )
                        )
                        FILTER_EXTENSIONS = DEFAULT_FILTER_EXTENSIONS
                except Exception:
                    writerr(
                        colored(
                            "Unable to read FILTER_EXTENSIONS from config.yml - default set",
                            "red",
                        )
                    )
                    FILTER_EXTENSIONS = DEFAULT_FILTER_EXTENSIONS
            badExtensions = tuple(ext.lower() for ext in FILTER_EXTENSIONS.split(","))

            # If the user provided the --language argument then create the regex for language codes
            if args.language:
                # Get the language codes
                try:
                    LANGUAGE = config.get("LANGUAGE")
                    if str(LANGUAGE) == "None":
                        writerr(
                            colored(
                                "No value for LANGUAGE in config.yml - default set",
                                "yellow",
                            )
                        )
                        LANGUAGE = DEFAULT_LANGUAGE
                except Exception:
                    writerr(
                        colored(
                            "Unable to read LANGUAGE from config.yml - default set",
                            "red",
                        )
                    )
                    LANGUAGE = DEFAULT_LANGUAGE

                # Set the language regex
                try:
                    reLangPart = re.compile(
                        REGEX_START + "(" + LANGUAGE.replace(",", "|") + ")" + REGEX_END
                    )
                except Exception as e:
                    writerr(colored("ERROR getConfig 2: " + str(e), "red"))

            # If the user provided the --remove-params argument then it overrides the config value
            if args.remove_params:
                REMOVE_PARAMS = args.remove_params
            else:
                try:
                    REMOVE_PARAMS = config.get("REMOVE_PARAMS")
                    if str(REMOVE_PARAMS) == "None":
                        if verbose():
                            writerr(
                                colored(
                                    "No value for REMOVE_PARAMS in config.yml - default set",
                                    "yellow",
                                )
                            )
                        REMOVE_PARAMS = DEFAULT_REMOVE_PARAMS
                except Exception:
                    if verbose():
                        writerr(
                            colored(
                                "Unable to read REMOVE_PARAMS from config.yml - default set",
                                "red",
                            )
                        )
                    REMOVE_PARAMS = DEFAULT_REMOVE_PARAMS

        except Exception:
            if args.config is None:
                writerr(
                    colored(
                        'WARNING: Cannot find file "config.yml", so using default values',
                        "yellow",
                    )
                )
            else:
                writerr(
                    colored(
                        'WARNING: Cannot find file "'
                        + args.config
                        + '", so using default values',
                        "yellow",
                    )
                )
            usingConfigDefaults = True
            FILTER_EXTENSIONS = DEFAULT_FILTER_EXTENSIONS
            FILTER_KEYWORDS = DEFAULT_FILTER_KEYWORDS
            LANGUAGE = DEFAULT_LANGUAGE
            REMOVE_PARAMS = DEFAULT_REMOVE_PARAMS
            reFilterKeywords = re.compile(
                FILTER_KEYWORDS.replace(",", "|"), re.IGNORECASE
            )
            badExtensions = tuple(ext.lower() for ext in FILTER_EXTENSIONS.split(","))

    except Exception as e:
        writerr(colored("ERROR getConfig 1: " + str(e), "red"))


def ensureConfig():
    """
    Ensure the config.yml file exists in the default config directory.
    If not, create the directory and write the default config.
    This is called before argument parsing so the file is created even when
    running 'urless' or 'urless -h'.
    """
    try:
        # Determine the config directory based on OS
        if os.name == "nt":
            urlessPath = Path(os.path.join(os.getenv("APPDATA", ""), "urless"))
        elif os.name == "posix":
            urlessPath = Path(
                os.path.join(os.path.expanduser("~"), ".config", "urless")
            )
        else:
            urlessPath = Path(
                os.path.join(
                    os.path.expanduser("~"),
                    "Library",
                    "Application Support",
                    "urless",
                )
            )

        configPath = urlessPath / "config.yml"

        # If the config file doesn't exist, create it with default values
        if not configPath.exists():
            try:
                urlessPath.mkdir(parents=True, exist_ok=True)
                with open(configPath, "w") as f:
                    f.write(f"FILTER_KEYWORDS: {DEFAULT_FILTER_KEYWORDS}\n")
                    f.write(f"FILTER_EXTENSIONS: {DEFAULT_FILTER_EXTENSIONS}\n")
                    f.write(f"LANGUAGE: {DEFAULT_LANGUAGE}\n")
                    f.write(f"REMOVE_PARAMS: {DEFAULT_REMOVE_PARAMS}\n")
            except Exception as e:
                writerr(
                    colored("WARNING: Could not create config.yml: " + str(e), "yellow")
                )
    except Exception as e:
        writerr(colored("ERROR ensureConfig: " + str(e), "red"))


def handler(signal_received, frame):
    """
    This function is called if Ctrl-C is called by the user
    An attempt will be made to try and clean up properly
    """
    writerr(colored('>>> "Oh my God, they killed Kenny... and urless!" - Kyle', "red"))
    sys.exit()


def paramsToDict(params: str) -> dict:
    """
    converts query string to dict
    """
    try:
        the_dict = {}
        if params:
            for pair in params.split("&"):
                # If there is a parameter but no = then add a value of {EMPTY}
                if pair.find("=") < 0:
                    key = pair + "{EMPTY}"
                    the_dict[key] = "{EMPTY}"
                else:
                    parts = pair.split("=")
                    try:
                        the_dict[parts[0]] = parts[1]
                    except IndexError:
                        pass
        return the_dict
    except Exception as e:
        writerr(colored("ERROR paramsToDict 1: " + str(e), "red"))


def dictToParams(params: dict) -> str:
    """
    converts dict of params to query string
    """
    try:
        # If a parameter has a value of {EMPTY} then just the name will be written and no =
        stringed = [
            name if value == "{EMPTY}" else name + "=" + value
            for name, value in params.items()
        ]

        # Only add a ?
at the start of parameters, unless the first starts with # if list(params.keys())[0][:1] == "#": paramString = "".join(stringed) else: paramString = "?" + "&".join(stringed) # If there are any parameters with {EMPTY} in the name then remove the string return paramString.replace("{EMPTY}", "") except Exception as e: writerr(colored("ERROR dictToParams 1: " + str(e), "red")) def compareParams(currentParams: list, newParams: dict) -> set: """ returns the set of param names in newParams that don't exist in currentParams """ try: ogSet = set([]) for each in currentParams: for key in each.keys(): ogSet.add(key) return set(newParams.keys()) - ogSet except Exception as e: writerr(colored("ERROR compareParams 1: " + str(e), "red")) def isUnwantedContent(path: str) -> bool: """ Checks any potentially unwanted patterns (unless specified otherwise) such as blog/news content """ try: unwanted = False if not args.keep_human_written: # If the path has more than 3 dashes '-' AND isn't a GUID AND (if specified) isn't a Custom ID, then assume it's human written content, e.g. blog for part in path.split("/"): if part.count("-") > 3: if str(reCustomIDPart.pattern) != "": if not reGuidPart.search(part) and not reCustomIDPart.search(part): unwanted = True else: if not reGuidPart.search(part): unwanted = True if not args.keep_yyyymm: # If it contains a year and month in the path then assume it is blog/news content, e.g. .../2019/06/... 
if reYYYYMM.search(path): unwanted = True return unwanted except Exception as e: writerr(colored("ERROR isUnwantedContent 1: " + str(e), "red")) def createPattern(path: str) -> str: """ creates patterns for urls with integers or GUIDs in them """ global patternsGUID, patternsInt, patternsCustomID, patternsLang try: newParts = [] regexInt = False regexGUID = False regexCustom = False regexLang = False for part in path.split("/"): if part == "": newParts.append(part) elif str(reCustomIDPart.pattern) != "" and reCustomIDPart.search(part): regexCustom = True newParts.append(reCustomIDPart.pattern) elif reGuidPart.search(part): regexGUID = True newParts.append(reGuidPart.pattern) elif reIntPart.match(part): regexInt = True newParts.append(reIntPart.pattern) elif args.language and reLangPart.match(part.lower()): regexLang = True newParts.append(reLangPart.pattern) else: newParts.append(part) createdPattern = "/".join(newParts) # Depending on the type of regex, add the found pattern to the dictionary if it hasn't been added already if regexCustom and createdPattern not in patternsCustomID: patternsCustomID[createdPattern] = path elif regexGUID and createdPattern not in patternsGUID: patternsGUID[createdPattern] = path elif regexInt and createdPattern not in patternsInt: patternsInt[createdPattern] = path elif regexLang and createdPattern not in patternsLang: patternsLang[createdPattern] = path return createdPattern except Exception as e: writerr(colored("ERROR createPattern 1: " + str(e), "red")) def patternExists(pattern: str) -> bool: """ Checks if a pattern exists """ try: for i, seen_pattern in enumerate(patternsSeen): if pattern == seen_pattern: patternsSeen[i] = pattern return True elif seen_pattern in pattern: return True return False except Exception as e: writerr(colored("ERROR patternExists 1: " + str(e), "red")) def matchesPatterns(path: str) -> bool: """ checks if the url matches any of the regex patterns """ try: for pattern in patternsSeen: if 
re.search(pattern, re.escape(path)) is not None: return True return False except Exception as e: writerr(colored("ERROR matchesPatterns 1: " + str(e), "red")) def hasFilterKeyword(path: str) -> bool: """ checks if the url matches the blacklist regex """ global reFilterKeywords try: return reFilterKeywords.search(path) except Exception as e: writerr(colored("ERROR hasFilterKeyword 1: " + str(e), "red")) def hasBadExtension(path: str) -> bool: """ checks if a url has a blacklisted extension """ global badExtensions try: return path.lower().endswith(badExtensions) except Exception as e: writerr(colored("ERROR hasBadExtension 1: " + str(e), "red")) def removeParameters(params) -> dict: """ Removes any parameters from the parameter dictionary """ global REMOVE_PARAMS try: # For every parameter name in the REMOVE_PARAMS list, remove from the dictionary passed for param in REMOVE_PARAMS.split(","): if param in params: del params[param] return params except Exception as e: writerr(colored("ERROR removeParameters 1: " + str(e), "red")) def processUrl(line): try: parsed = urlparse(line.strip()) # Set the host scheme = parsed.scheme if scheme == "": host = parsed.netloc else: host = scheme + "://" + parsed.netloc # If the link specifies port 80 or 443, e.g. http://example.com:80, then remove the port if str(parsed.port) == "80": host = host.replace(":80", "", 1) if str(parsed.port) == "443": host = host.replace(":443", "", 1) # Build the path and parameters path, params = parsed.path, paramsToDict(parsed.query) # Remove any necessary parameters params = removeParameters(params) # If there is a fragment... 
# if arg -fnp / --fragment-not-param was passed, change the path to include the hash, # else, add as the last parameter with a name but with value {EMPTY} that doesn't add an = afterwards if parsed.fragment: if args.fragment_not_param: path = path + "#" + parsed.fragment else: params["#" + parsed.fragment] = "{EMPTY}" # Add the host to the map if it hasn't already been seen if host not in urlmap: urlmap[host] = {} # If the path has an extension we want to exclude, then just return to continue with the next line if hasBadExtension(path): return # If there are no parameters (or the --disregard-params argument was passed) and path isn't empty if (not params or args.disregard_params) and path != "": # If it's unwanted content or has a keyword to be excluded, then just return to continue with the next line if isUnwantedContent(path) or hasFilterKeyword(path): return # If the current path already matches a previously saved pattern then just return to continue with the next line if matchesPatterns(path): return # If the path has ++ in it for any reason, then just output "as is" otherwise it will raise a regex Multiple Repeat Error if path.find("++") > 0: pattern = path else: # Create a pattern for the current path pattern = createPattern(path) # Update the url map if pattern not in urlmap[host]: urlmap[host][pattern] = [params] if params else [] elif params and compareParams(urlmap[host][pattern], params): urlmap[host][pattern].append(params) except ValueError: if verbose(): writerr( colored( "This URL caused a Value Error and was not included: " + line, "red" ) ) except Exception as e: writerr(colored("ERROR processUrl 1: " + str(e), "red")) def processLine(line): """ Process a line from the input based on whether the -ks / --keep-slash argument was passed """ # If the -ks / --keep-slash argument was passed, then just add all URLs, # else remove the trailing slash from any URLs (before any query string) if args.keep_slash: line = line.rstrip("\n") else: if line.find("/?") 
> 0: line = line.replace("/?", "?", 1) else: line = line.rstrip("\n").rstrip("/") # If the -iq / --ignore-querystring argument was passed, remove any querystring and fragment (unless -fnp is passed, in which case the fragment is only removed if a query string exists too) if args.ignore_querystring: if args.fragment_not_param: line = line.split("?")[0] else: line = line.split("?")[0].split("#")[0] return line def processInput(): global linesOrigCount try: if not sys.stdin.isatty(): for line in sys.stdin: processUrl(processLine(line)) else: with open(os.path.expanduser(args.input), "rb") as f: result = chardet.detect(f.read()) # or readline if the file is large try: linesOrigCount = 0 with open( os.path.expanduser(args.input), "r", encoding=result["encoding"] ) as inFile: for line in inFile: linesOrigCount += 1 processUrl(processLine(line)) except Exception as e: writerr(colored("ERROR processInput 2 " + str(e), "red")) except Exception as e: writerr(colored("ERROR processInput 1: " + str(e), "red")) def processOutput(): global linesFinalCount, linesOrigCount, patternsGUID, patternsInt, patternsCustomID, patternsLang try: # If an output file was specified, open it if args.output is not None: try: outFile = open(os.path.expanduser(args.output), "w") except Exception as e: writerr(colored("ERROR processOutput 2 " + str(e), "red")) # Output all URLs for host, value in urlmap.items(): for path, params in value.items(): # Replace the regex pattern in the path with the first occurrence of that pattern found try: customRegexFound = False if ( str(reCustomIDPart.pattern) != "" and path.find(str(reCustomIDPart.pattern)) > 0 ): for pattern in patternsCustomID: if pattern == path: path = patternsCustomID[pattern] customRegexFound = True if not customRegexFound: if path.find(REGEX_GUID) > 0: for pattern in patternsGUID: if pattern == path: path = patternsGUID[pattern] elif path.find(REGEX_INTEGER) > 0: for pattern in patternsInt: if pattern == path: path = patternsInt[pattern] 
elif path.find(str(reLangPart.pattern)) > 0: for pattern in patternsLang: if pattern == path: path = patternsLang[pattern] except Exception as e: writerr(colored("ERROR processOutput 4: " + str(e), "red")) if params: for param in params: linesFinalCount = linesFinalCount + 1 # If an output file was specified, write to the file if args.output is not None: outFile.write(host + path + dictToParams(param) + "\n") else: # If output is piped or the --output argument was not specified, output to STDOUT if not sys.stdin.isatty() or args.output is None: write(host + path + dictToParams(param)) else: linesFinalCount = linesFinalCount + 1 # If an output file was specified, write to the file if args.output is not None: outFile.write(host + path + "\n") else: # If output is piped or the --output argument was not specified, output to STDOUT if not sys.stdin.isatty() or args.output is None: write(host + path) if verbose() and sys.stdin.isatty(): writerr( colored( "\nInput reduced from " + str(linesOrigCount) + " to " + str(linesFinalCount) + " lines 🤘", "cyan", ) ) # Close the output file if it was opened try: if args.output is not None: write( colored("Output successfully written to file: ", "cyan") + colored(args.output, "white") ) write() outFile.close() except Exception as e: writerr(colored("ERROR processOutput 3: " + str(e), "red")) except Exception as e: writerr(colored("ERROR processOutput 1: " + str(e), "red")) def showOptionsAndConfig(): global FILTER_EXTENSIONS, FILTER_KEYWORDS, LANGUAGE, REMOVE_PARAMS, usingConfigDefaults try: write(colored("Selected options and config:", "cyan")) write( colored("-i: " + args.input, "magenta") + colored(" The input file of URLs to de-clutter.", "white") ) if args.output is not None: write( colored("-o: " + args.output, "magenta") + colored( " The output file that the de-cluttered URL list will be written to.", "white", ) ) else: write( colored("-o: ", "magenta") + colored( " An output file wasn't given, so output will be written to 
STDOUT.", "white", ) ) if args.disregard_params: write( colored("-dp: True", "magenta") + colored( " When filtering the URLs, they will not be treated differently just because they have parameters.", "white", ) ) if args.config: if usingConfigDefaults: write( colored("-config: " + args.config, "magenta") + colored(" The path of the YML config file.", "white") + colored(" WARNING: Not found, so using default values.", "yellow") ) else: write( colored("-config: " + args.config, "magenta") + colored(" The path of the YML config file.", "white") ) if args.filter_keywords: write( colored("-fk (Keywords to Filter): ", "magenta") + colored(args.filter_keywords, "white") ) else: write( colored("Filter Keywords (from Config.yml): ", "magenta") + colored(FILTER_KEYWORDS, "white") ) if args.filter_extensions: write( colored("-fe (Extensions to Filter): ", "magenta") + colored(args.filter_extensions, "white") ) else: write( colored("Filter Extensions (from Config.yml): ", "magenta") + colored(FILTER_EXTENSIONS, "white") ) if args.language: write( colored("Languages (from Config.yml): ", "magenta") + colored(LANGUAGE, "white") ) write( colored("-lang: True", "magenta") + colored( " If there are multiple URLs with different language codes as a part of the path, only one version of the URL will be output.", "white", ) ) if args.remove_params: write( colored("-rp (Params to Remove): ", "magenta") + colored(args.remove_params, "white") ) else: write( colored("Remove Params (from Config.yml): ", "magenta") + colored(REMOVE_PARAMS, "white") ) if args.keep_slash: write( colored("-ks: True", "magenta") + colored( " A trailing slash at the end of a URL in input will not be removed. Therefore there may be identical URLs output, one with and one without a trailing slash.", "white", ) ) if args.keep_human_written: write( colored("-khw: True", "magenta") + colored( " Prevent URLs with a path part that contains more than 3 dashes (-) from being removed (e.g. 
blog post)", "white", ) ) if args.keep_yyyymm: write( colored("-kym: True", "magenta") + colored( " Prevent URLs with a path that contains a year and month in the format `/YYYY/MM` from being removed (e.g. blog or news)", "white", ) ) if args.regex_custom_id: write( colored("-rcid: '" + str(reCustomIDPart.pattern) + "'", "magenta") + colored(" USE WITH CAUTION! ", "red") + colored( "Regex for a Custom ID that your target uses. Ensure the value is passed in quotes. See the README for more details on this.", "white", ) ) if args.ignore_querystring: write( colored("-iq: True", "magenta") + colored( " Remove the query string (including URL fragments `#`) so output is unique paths only.", "white", ) ) write("") except Exception as e: writerr(colored("ERROR showOptionsAndConfig 1: " + str(e), "red")) def argCheckRegexCustomID(value): global reCustomIDPart try: # If the Custom ID regex was passed, then prefix with ^ and suffix with $ if they are not there already if value != "": if value[0] != REGEX_START: value = REGEX_START + value if value[-1] != REGEX_END: value = value + REGEX_END # Try to compile the regex reCustomIDPart = re.compile(value) return value except Exception: raise argparse.ArgumentTypeError("Valid regex must be passed.") def main(): global args, urlmap, patternsSeen, patternsInt, patternsCustomID, patternsGUID, patternsLang # Ensure config.yml exists before anything else ensureConfig() # Tell Python to run the handler() function when SIGINT is received signal(SIGINT, handler) # Parse command line arguments parser = argparse.ArgumentParser( description="urless - by @Xnl-h4ck3r: De-clutter a list of URLs." ) parser.add_argument( "-i", "--input", action="store", help="A file of URLs to de-clutter." ) parser.add_argument( "-o", "--output", action="store", help="The output file that will contain the de-cluttered list of URLs (default: output.txt). 
If piped to another program, output will be written to STDOUT instead.", ) parser.add_argument( "-fk", "--filter_keywords", action="store", help="A comma separated list of keywords to exclude links (if there are no parameters). This will override the FILTER_KEYWORDS list specified in config.yml", metavar="", ) parser.add_argument( "-fe", "--filter-extensions", action="store", help="A comma separated list of file extensions to exclude. This will override the FILTER_EXTENSIONS list specified in config.yml", metavar="", ) parser.add_argument( "-rp", "--remove-params", action="store", help="A comma separated list of case sensitive parameters to remove from all URLs. This will override the REMOVE_PARAMS list specified in config.yml. This can be useful for cache buster parameters for example.", metavar="", ) parser.add_argument( "-ks", "--keep-slash", action="store_true", help="A trailing slash at the end of a URL in input will not be removed. Therefore there may be identical URLs output, one with and one without a trailing slash.", ) parser.add_argument( "-khw", "--keep-human-written", action="store_true", help="By default, any URL with a path part that contains more than 3 dashes (-) is removed because it is assumed to be human written content (e.g. blog post) and not interesting. Passing this argument will keep them in the output.", ) parser.add_argument( "-kym", "--keep-yyyymm", action="store_true", help="By default, any URL with a path containing /YYYY/MM (where YYYY is a year and MM month) is removed because it is assumed to be blog/news content, and not interesting. Passing this argument will keep them in the output.", ) parser.add_argument( "-rcid", "--regex-custom-id", action="store", help="USE WITH CAUTION! Regex for a Custom ID that your target uses. Ensure the value is passed in quotes. 
See the README for more details on this.", default="", metavar="REGEX", type=argCheckRegexCustomID, ) parser.add_argument( "-iq", "--ignore-querystring", action="store_true", help="Remove the query string (including URL fragments `#`) so output is unique paths only.", ) parser.add_argument( "-fnp", "--fragment-not-param", action="store_true", help="Don't treat URL fragments `#` in the same way as parameters, e.g. if a link has a filter keyword and a fragment (or param) it is usually kept, but if this argument is passed and a link has a filter word and fragment, it will be removed.", ) parser.add_argument( "-lang", "--language", action="store_true", help='If passed, and there are multiple URLs with different language codes as a part of the path, only one version of the URL will be output. The codes are specified in the "LANGUAGE" section of "config.yml".', ) parser.add_argument( "-c", "--config", action="store", help="Path to the YML config file. If not passed, it looks for file 'config.yml' in the default config directory, e.g. '~/.config/urless/'.", ) parser.add_argument( "-dp", "--disregard-params", action="store_true", help="There is certain filtering that is not done if the URLs have parameters, because by default we want to see all possible parameters. If this argument is passed, then the filtering will be done, regardless of the existence of any parameters.", ) parser.add_argument( "-nb", "--no-banner", action="store_true", help="Hides the tool banner." 
) parser.add_argument("--version", action="store_true", help="Show version number") parser.add_argument("-v", "--verbose", action="store_true", help="Verbose output.") args = parser.parse_args() # If --version was passed, display version and exit if args.version: write(colored("urless - v" + __version__, "cyan")) sys.exit() try: # If no input was given, raise an error if sys.stdin.isatty(): if args.input is None: writerr( colored( "You need to provide an input with -i argument or through <stdin>.", "red", ) ) sys.exit() # Get the config settings from the config.yml file getConfig() # If input is not piped, show the banner, and if --verbose option was chosen show options and config values if sys.stdin.isatty(): # Show banner unless requested to hide if not args.no_banner: showBanner() if verbose(): showOptionsAndConfig() # Process the input given on -i (--input), or <stdin> processInput() # Output the saved urls with parameters processOutput() except Exception as e: writerr(colored("ERROR main 1: " + str(e), "red")) # Show ko-fi link if verbose and not piped try: if verbose() and sys.stdin.isatty(): writerr( colored( "✅ Want to buy me a coffee? ☕ https://ko-fi.com/xnlh4ck3r 🤘", "green", ) ) except Exception: pass finally: # Clean up urlmap = None patternsSeen = None patternsCustomID = None patternsGUID = None patternsInt = None patternsLang = None if __name__ == "__main__": main()