[
  {
    "path": ".coveragerc",
    "content": "[paths]\nsource =\n    scourgify\n\n[run]\nsource =\n    scourgify\nomit =\n    *tox*\n    setup.py\n    *test*\n\n[report]\n;sort = Cover\nsort = Name\nskip_covered = True\nshow_missing = True"
  },
  {
    "path": ".gitignore",
    "content": "# Created by .ignore support plugin (hsz.mobi)\n### Python template\n# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packaging\n.Python\nbuild/\ndevelop-eggs/\ndist/\ndownloads/\neggs/\n.eggs/\nlib/\nlib64/\nparts/\nsdist/\nvar/\nwheels/\n*.egg-info/\n.installed.cfg\n*.egg\n\n# PyInstaller\n#  Usually these files are written by a python script from a template\n#  before PyInstaller builds the exe, so as to inject date/other infos into it.\n*.manifest\n*.spec\n\n# Installer logs\npip-log.txt\npip-delete-this-directory.txt\n\n# Unit test / coverage reports\nhtmlcov/\n.tox/\n.coverage\n.coverage.*\n.cache\nnosetests.xml\ncoverage.xml\n*.cover\n.hypothesis/\n\n# Translations\n*.mo\n*.pot\n\n# Django stuff:\n*.log\nlocal_settings.py\n\n# Flask stuff:\ninstance/\n.webassets-cache\n\n# Scrapy stuff:\n.scrapy\n\n# Sphinx documentation\ndocs/_build/\n\n# PyBuilder\ntarget/\n\n# Jupyter Notebook\n.ipynb_checkpoints\n\n# pyenv\n.python-version\n\n# celery beat schedule file\ncelerybeat-schedule\n\n# SageMath parsed files\n*.sage.py\n\n# Environments\n.env\n.venv\nenv/\nvenv/\nENV/\n\n# Spyder project settings\n.spyderproject\n.spyproject\n\n# Rope project settings\n.ropeproject\n\n# mkdocs documentation\n/site\n\n# mypy\n.mypy_cache/\n\n.pytest_cache/\n\n# pycharm\n.idea\n"
  },
  {
    "path": ".isort.cfg",
    "content": "[settings]\nline_length=79\nmulti_line_output=3\ninclude_trailing_comma=1\nknown_standard_library=typing\nknown_django=django\nknown_thirdparty=peewee\nimport_heading_stdlib=Imports from Standard Library\nimport_heading_thirdparty=Imports from Third Party Modules\nimport_heading_django=Imports from Django\nimport_heading_firstparty=Local Imports\nsections=FUTURE,STDLIB,DJANGO,THIRDPARTY,FIRSTPARTY,LOCALFOLDER\nnot_skip = __init__.py\n\n# for additional settings see:\n#    https://github.com/timothycrosley/isort/wiki/isort-Settings\n"
  },
  {
    "path": "CHANGELOG.rst",
    "content": "Changelog\n=========\n0.2.3 [2020-05-06]\n------------------\n* Valid OccupancyType bug fix for an OccupancyType that is already a valid abbreviation\n\n0.2.1 [2020-05-06]\n------------------\n* Corrected for late OccupancyType additions and allowed the # OccupancyType to pass through\n\n0.2.0 [2020-05-06]\n------------------\n* Potentially breaking change: non-standard unit types are now converted to a default.\n  This is based on a real-life incident; the original\n  behavior of allowing non-standard unit types to pass through resulted\n  in an address validation service also allowing the address to pass\n  through even though no unit should have existed on the home.\n\n0.1.3 [2018-09-09]\n------------------\n* Python 3.7.0 compatibility\n\n0.1.0 [2018-02-16]\n------------------\n* OpenSource release\n"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\r\n\r\nCopyright (c) 2018 Green Building Registry\r\n\r\nPermission is hereby granted, free of charge, to any person obtaining a copy\r\nof this software and associated documentation files (the \"Software\"), to deal\r\nin the Software without restriction, including without limitation the rights\r\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\r\ncopies of the Software, and to permit persons to whom the Software is\r\nfurnished to do so, subject to the following conditions:\r\n\r\nThe above copyright notice and this permission notice shall be included in all\r\ncopies or substantial portions of the Software.\r\n\r\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\r\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\r\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\r\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\r\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\r\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\r\nSOFTWARE."
  },
  {
    "path": "README.rst",
    "content": "usaddress-scourgify\n===================\n\nA Python 3.x library for cleaning/normalizing US addresses following USPS pub 28 and RESO guidelines.\n\n\n\nDocumentation\n-------------\nUse\n\n``normalize_address_record()``\n\nor\n\n``get_geocoder_normalized_addr()``\n\nor\n\n``NormalizeAddress().normalize()``\n\nto standardize your addresses. (Note: usaddress-scourgify does not make any attempts at address validation.)\n\nBoth functions, and the class init, take an address string, or a dict-like object, and return an address dict with all field values in uppercase format mapped to the keys address_line_1, address_line_2, city, state, postal_code.\n\n\n.. code-block:: python\n\n\n        from scourgify import normalize_address_record, NormalizeAddress\n\n        normalize_address_record('123 southwest Main street, Boring, or, 97203')\n\n        normalize_address_record({\n            'address_line_1': '123 southwest Main street',\n            'address_line_2': 'unit 2',\n            'city': 'Boring',\n            'state': 'or',\n            'postal_code': '97203'\n        })\n\n        NormalizeAddress('123 southwest Main street, Boring, or, 97203').normalize()\n\nExpected output for the dict input:\n\n\n.. code-block:: python\n\n        {\n            'address_line_1': '123 SW MAIN ST',\n            'address_line_2': 'UNIT 2',\n            'city': 'BORING',\n            'state': 'OR',\n            'postal_code': '97203'\n        }\n\n\nBy default, the output style abbreviates all pre- and post-directionals, street types, and occupancy types.\nAlternatively, if you would like to receive your output with full-word directionals and street types, you can use the `long_hand` parameter.\n\n.. 
code-block:: python\n\n\n        from scourgify import normalize_address_record, NormalizeAddress\n\n        normalize_address_record('123 southwest Main street, Boring, or, 97203', long_hand=True)\n\n        normalize_address_record({\n            'address_line_1': '123 southwest Main street',\n            'address_line_2': 'unit 2',\n            'city': 'Boring',\n            'state': 'or',\n            'postal_code': '97203'\n        }, long_hand=True)\n\n        NormalizeAddress('123 southwest Main street, Boring, or, 97203', long_hand=True).normalize()\n\nExpected output for the dict input:\n\n\n.. code-block:: python\n\n        {\n            'address_line_1': '123 SOUTHWEST MAIN STREET',\n            'address_line_2': 'UNIT 2',\n            'city': 'BORING',\n            'state': 'OR',\n            'postal_code': '97203'\n        }\n\nnormalize_address_record() uses the included processing functions to remove unacceptable special characters, extra spaces, and predictable abnormal character sub-strings and phrases. It also abbreviates directional indicators and street types according to the abbreviation mappings found in address_constants.  If applicable, line 2 address elements (e.g. Apt, Unit) are separated from line 1 inputs and standard occupancy type abbreviations are applied.\n\nYou may supply additional processing functions as a list of callables passed to the addtl_funcs parameter. Each additional function should take a string address and return a tuple of strings (line1, line2).\n\nPostal codes are normalized to US zip or zip+4 and zero padded as applicable, e.g. `2129 => 02129`, `02129-44 => 02129-0044`, `021290044 => 02129-0044`.\nHowever, postal codes that cannot be effectively normalized, such as those with an invalid length or invalid characters, will raise AddressValidationError. 
Examples: `12345678901`, `02129-`, `02129-0044-123`.\n\nAlternatively, you may extend the `NormalizeAddress` class to customize the normalization behavior by overriding any of the class's methods.\n\nIf your address is in the form of a dict that does not use the keys address_line_1, address_line_2, city, state, and postal_code, you must supply a key map to the addr_map parameter in the format {standard_key: custom_key}:\n\n\n.. code-block:: python\n\n        {\n            'address_line_1': 'Line1',\n            'address_line_2': 'Line2',\n            'city': 'City',\n            'state': 'State',\n            'postal_code': 'Zip'\n        }\n\n\nYou can also customize the address constants used by setting up an `address_constants.yaml` config file.\nAllowed keys are::\n\n            DIRECTIONAL_REPLACEMENTS\n            OCCUPANCY_TYPE_ABBREVIATIONS\n            STATE_ABBREVIATIONS\n            STREET_TYPE_ABBREVIATIONS\n            KNOWN_ODDITIES\n            PROBLEM_ST_TYPE_ABBRVS\n\nYou may also use the key `insertion_method` with a value of `update` or `replace` to indicate whether you would like to insert your values into the existing constants or replace them. If `insertion_method` is not present, `update` is assumed.\n\n\n.. code-block:: yaml\n\n        insertion_method: update\n        KNOWN_ODDITIES:\n            'developed by HOST': ''\n            ', UN ': ' UNIT '\n\n        OCCUPANCY_TYPE_ABBREVIATIONS:\n            'UN': 'UNIT'\n\n\nget_geocoder_normalized_addr() uses geocoder.google to parse your address into a standard dict.  
No additional cleaning is performed, so if your address contains any stray or non-conforming elements (e.g. 8888 NE KILLINGSWORTH ST, UN C, PORTLAND, OR 97008), no result will be returned.\nSince geocoder accepts an address string, an address in dict format whose keys do not match the standard key set (address_line_1, address_line_2, city, state, postal_code) requires you to supply a list of the address-related keys within your dict, in the order of address string composition.\n\nInstallation\n------------\nRequires Python 3.x.\n\n``pip install usaddress-scourgify``\n\nTo use a custom constants yaml, set the ADDRESS_CONFIG_DIR environment variable to the full path of the directory containing your address_constants.yaml file:\n\n``export ADDRESS_CONFIG_DIR=/path/to/your/config_dir``\n\nTo use get_geocoder_normalized_addr, set the GOOGLE_API_KEY environment variable:\n\n``export GOOGLE_API_KEY=your_google_api_key``\n\nContributing\n------------\nCreate a new branch to hold your change; no pull requests submitted directly to dev or master will be approved.  Please include a comment explaining the issue your pull request solves. Make sure all appropriate test and tox updates are included and that all tests are passing.\n\nLicense\n-------\nusaddress-scourgify is released under the terms of the MIT license. Full details are in the LICENSE file.\n\nChangelog\n---------\nusaddress-scourgify was developed for use in the greenbuildingregistry project.\nFor a full changelog see `CHANGELOG.rst <https://github.com/GreenBuildingRegistry/usaddress-scourgify/blob/master/CHANGELOG.rst>`_.\n"
  },
  {
    "path": "requirements/base.txt",
    "content": "usaddress>=0.5.9\ngeocoder>=1.22.6\nyaml-config>=0.1.2\ntyping>=3.6.1; python_version<'3.6'\n"
  },
  {
    "path": "requirements/dev.txt",
    "content": "-r base.txt\ncoverage>=6.2\nflake8>=3.0.4\nfrosted>=1.4.1\nisort>=4.2.5\npep8>=1.7.0\npylama>=7.3.3\npylint>=1.6.4\ntox>=2.7.0\n\n"
  },
  {
    "path": "scourgify/__init__.py",
    "content": "#!/usr/bin/env python\n# encoding: utf-8\n\"\"\"\ncopyright (c) 2016  Earth Advantage.\nAll rights reserved\n\"\"\"\n\n# Local Imports\nfrom scourgify.normalize import (\n    get_geocoder_normalized_addr,\n    normalize_address_record,\n    NormalizeAddress\n)\n"
  },
  {
    "path": "scourgify/address_constants.py",
    "content": "#!/usr/bin/env python\n# encoding: utf-8\n\"\"\"\ncopyright (c) 2016-2017 Earth Advantage.\nAll rights reserved\n..codeauthor::Fable Turas <fable@rainsoftware.tech>\n\n\"\"\"\n# Imports from Third Party Modules\nfrom yamlconf import Config, ConfigError\n\nKNOWN_ODDITIES = {}\nABNORMAL_OCCUPANCY_ABBRVS = {}\n\nPROBLEM_ST_TYPE_ABBRVS = {\n    'CT': 'COURT'\n}\n\nAMBIGUOUS_DIRECTIONALS = {\n    'NORTH-WEST': 'NW',\n    'NORTH-EAST': 'NE',\n    'SOUTH-WEST': 'SW',\n    'SOUTH-EAST': 'SE'\n}\n\nDIRECTIONAL_REPLACEMENTS = {\n    'EAST': 'E',\n    'WEST': 'W',\n    'NORTH': 'N',\n    'SOUTH': 'S',\n    'NORTHEAST': 'NE',\n    'NORTHWEST': 'NW',\n    'SOUTHEAST': 'SE',\n    'SOUTHWEST': 'SW'\n}\n\nLONGHAND_DIRECTIONALS = {v: k for k, v in DIRECTIONAL_REPLACEMENTS.items()}\n\nCITY_ABBREVIATIONS = LONGHAND_DIRECTIONALS.copy()\nCITY_ABBRS = {\n    'ST': 'SAINT',\n    'MT': 'MOUNT',\n    'FT': 'FORT',\n    'VA': 'VIRGINIA'\n}\nCITY_ABBREVIATIONS.update(CITY_ABBRS)\nSTREET_TYPE_ABBREVIATIONS = {\n    'ALLEE': 'ALY',\n    'ALLEY': 'ALY',\n    'ALLY': 'ALY',\n    'ALY': 'ALY',\n    'ANEX': 'ANX',\n    'ANNEX': 'ANX',\n    'ANNX': 'ANX',\n    'ANX': 'ANX',\n    'ARC': 'ARC',\n    'ARCADE': 'ARC',\n    'AV': 'AVE',\n    'AVE': 'AVE',\n    'AVEN': 'AVE',\n    'AVENU': 'AVE',\n    'AVENUE': 'AVE',\n    'AVN': 'AVE',\n    'AVNUE': 'AVE',\n    'BAYOO': 'BYU',\n    'BAYOU': 'BYU',\n    'BCH': 'BCH',\n    'BEACH': 'BCH',\n    'BEND': 'BND',\n    'BND': 'BND',\n    'BLF': 'BLF',\n    'BLUF': 'BLF',\n    'BLUFF': 'BLF',\n    'BLUFFS': 'BLFS',\n    'BOT': 'BTM',\n    'BOTTM': 'BTM',\n    'BOTTOM': 'BTM',\n    'BTM': 'BTM',\n    'BLVD': 'BLVD',\n    'BOUL': 'BLVD',\n    'BOULEVARD': 'BLVD',\n    'BOULV': 'BLVD',\n    'BR': 'BR',\n    'BRANCH': 'BR',\n    'BRNCH': 'BR',\n    'BRDGE': 'BRG',\n    'BRG': 'BRG',\n    'BRIDGE': 'BRG',\n    'BRK': 'BRK',\n    'BROOK': 'BRK',\n    'BROOKS': 'BRKS',\n    'BURG': 'BG',\n    'BURGS': 'BGS',\n    'BYP': 'BYP',\n    'BYPA': 'BYP',\n    
'BYPAS': 'BYP',\n    'BYPASS': 'BYP',\n    'BYPS': 'BYP',\n    'CAMP': 'CP',\n    'CMP': 'CP',\n    'CP': 'CP',\n    'CANYN': 'CYN',\n    'CANYON': 'CYN',\n    'CNYN': 'CYN',\n    'CYN': 'CYN',\n    'CAPE': 'CPE',\n    'CPE': 'CPE',\n    'CAUSEWAY': 'CSWY',\n    'CAUSWAY': 'CSWY',\n    'CSWY': 'CSWY',\n    'CEN': 'CTR',\n    'CENT': 'CTR',\n    'CENTER': 'CTR',\n    'CENTR': 'CTR',\n    'CENTRE': 'CTR',\n    'CNTER': 'CTR',\n    'CNTR': 'CTR',\n    'CTR': 'CTR',\n    'CENTERS': 'CTRS',\n    'CIR': 'CIR',\n    'CIRC': 'CIR',\n    'CIRCL': 'CIR',\n    'CIRCLE': 'CIR',\n    'CRCL': 'CIR',\n    'CRCLE': 'CIR',\n    'CIRCLES': 'CIRS',\n    'CLF': 'CLF',\n    'CLIFF': 'CLF',\n    'CLFS': 'CLFS',\n    'CLIFFS': 'CLFS',\n    'CLB': 'CLB',\n    'CLUB': 'CLB',\n    'COMMON': 'CMN',\n    'COR': 'COR',\n    'CORNER': 'COR',\n    'CORNERS': 'CORS',\n    'CORS': 'CORS',\n    'COURSE': 'CRSE',\n    'CRSE': 'CRSE',\n    'COURT': 'CT',\n    'CRT': 'CT',\n    'CT': 'CT',\n    'COURTS': 'CTS',\n    'COVE': 'CV',\n    'CV': 'CV',\n    'COVES': 'CVS',\n    'CK': 'CRK',\n    'CR': 'CRK',\n    'CREEK': 'CRK',\n    'CRK': 'CRK',\n    'CRECENT': 'CRES',\n    'CRES': 'CRES',\n    'CRESCENT': 'CRES',\n    'CRESENT': 'CRES',\n    'CRSCNT': 'CRES',\n    'CRSENT': 'CRES',\n    'CRSNT': 'CRES',\n    'CREST': 'CRST',\n    'CROSSING': 'XING',\n    'CRSSING': 'XING',\n    'CRSSNG': 'XING',\n    'XING': 'XING',\n    'CROSSROAD': 'XRD',\n    'CURVE': 'CURV',\n    'DALE': 'DL',\n    'DL': 'DL',\n    'DAM': 'DM',\n    'DM': 'DM',\n    'DIV': 'DV',\n    'DIVIDE': 'DV',\n    'DV': 'DV',\n    'DVD': 'DV',\n    'DR': 'DR',\n    'DRIV': 'DR',\n    'DRIVE': 'DR',\n    'DRV': 'DR',\n    'DRIVES': 'DRS',\n    'EST': 'EST',\n    'ESTATE': 'EST',\n    'ESTATES': 'ESTS',\n    'ESTS': 'ESTS',\n    'EXP': 'EXPY',\n    'EXPR': 'EXPY',\n    'EXPRESS': 'EXPY',\n    'EXPRESSWAY': 'EXPY',\n    'EXPW': 'EXPY',\n    'EXPY': 'EXPY',\n    'EXT': 'EXT',\n    'EXTENSION': 'EXT',\n    'EXTN': 'EXT',\n    'EXTNSN': 'EXT',\n    
'EXTENSIONS': 'EXTS',\n    'EXTS': 'EXTS',\n    'FALL': 'FALL',\n    'FALLS': 'FLS',\n    'FLS': 'FLS',\n    'FERRY': 'FRY',\n    'FRRY': 'FRY',\n    'FRY': 'FRY',\n    'FIELD': 'FLD',\n    'FLD': 'FLD',\n    'FIELDS': 'FLDS',\n    'FLDS': 'FLDS',\n    'FLAT': 'FLT',\n    'FLT': 'FLT',\n    'FLATS': 'FLTS',\n    'FLTS': 'FLTS',\n    'FORD': 'FRD',\n    'FRD': 'FRD',\n    'FORDS': 'FRDS',\n    'FOREST': 'FRST',\n    'FORESTS': 'FRST',\n    'FRST': 'FRST',\n    'FORG': 'FRG',\n    'FORGE': 'FRG',\n    'FRG': 'FRG',\n    'FORGES': 'FRGS',\n    'FORK': 'FRK',\n    'FRK': 'FRK',\n    'FORKS': 'FRKS',\n    'FRKS': 'FRKS',\n    'FORT': 'FT',\n    'FRT': 'FT',\n    'FT': 'FT',\n    'FREEWAY': 'FWY',\n    'FREEWY': 'FWY',\n    'FRWAY': 'FWY',\n    'FRWY': 'FWY',\n    'FWY': 'FWY',\n    'GARDEN': 'GDN',\n    'GARDN': 'GDN',\n    'GDN': 'GDN',\n    'GRDEN': 'GDN',\n    'GRDN': 'GDN',\n    'GARDENS': 'GDNS',\n    'GDNS': 'GDNS',\n    'GRDNS': 'GDNS',\n    'GATEWAY': 'GTWY',\n    'GATEWY': 'GTWY',\n    'GATWAY': 'GTWY',\n    'GTWAY': 'GTWY',\n    'GTWY': 'GTWY',\n    'GLEN': 'GLN',\n    'GLN': 'GLN',\n    'GLENS': 'GLNS',\n    'GREEN': 'GRN',\n    'GRN': 'GRN',\n    'GREENS': 'GRNS',\n    'GROV': 'GRV',\n    'GROVE': 'GRV',\n    'GRV': 'GRV',\n    'GROVES': 'GRVS',\n    'HARB': 'HBR',\n    'HARBOR': 'HBR',\n    'HARBR': 'HBR',\n    'HBR': 'HBR',\n    'HRBOR': 'HBR',\n    'HARBORS': 'HBRS',\n    'HAVEN': 'HVN',\n    'HAVN': 'HVN',\n    'HVN': 'HVN',\n    'HEIGHT': 'HTS',\n    'HEIGHTS': 'HTS',\n    'HGTS': 'HTS',\n    'HT': 'HTS',\n    'HTS': 'HTS',\n    'HIGHWAY': 'HWY',\n    'HIGHWY': 'HWY',\n    'HIWAY': 'HWY',\n    'HIWY': 'HWY',\n    'HWAY': 'HWY',\n    'HWY': 'HWY',\n    'HILL': 'HL',\n    'HL': 'HL',\n    'HILLS': 'HLS',\n    'HLS': 'HLS',\n    'HLLW': 'HOLW',\n    'HOLLOW': 'HOLW',\n    'HOLLOWS': 'HOLW',\n    'HOLW': 'HOLW',\n    'HOLWS': 'HOLW',\n    'INLET': 'INLT',\n    'INLT': 'INLT',\n    'IS': 'IS',\n    'ISLAND': 'IS',\n    'ISLND': 'IS',\n    'ISLANDS': 'ISS',\n 
   'ISLNDS': 'ISS',\n    'ISS': 'ISS',\n    'ISLE': 'ISLE',\n    'ISLES': 'ISLE',\n    'JCT': 'JCT',\n    'JCTION': 'JCT',\n    'JCTN': 'JCT',\n    'JUNCTION': 'JCT',\n    'JUNCTN': 'JCT',\n    'JUNCTON': 'JCT',\n    'JCTNS': 'JCTS',\n    'JCTS': 'JCTS',\n    'JUNCTIONS': 'JCTS',\n    'KEY': 'KY',\n    'KY': 'KY',\n    'KEYS': 'KYS',\n    'KYS': 'KYS',\n    'KNL': 'KNL',\n    'KNOL': 'KNL',\n    'KNOLL': 'KNL',\n    'KNLS': 'KNLS',\n    'KNOLLS': 'KNLS',\n    'LAKE': 'LK',\n    'LK': 'LK',\n    'LAKES': 'LKS',\n    'LKS': 'LKS',\n    'LAND': 'LAND',\n    'LANDING': 'LNDG',\n    'LNDG': 'LNDG',\n    'LNDNG': 'LNDG',\n    'LA': 'LN',\n    'LANE': 'LN',\n    'LANES': 'LN',\n    'LN': 'LN',\n    'LGT': 'LGT',\n    'LIGHT': 'LGT',\n    'LIGHTS': 'LGTS',\n    'LF': 'LF',\n    'LOAF': 'LF',\n    'LCK': 'LCK',\n    'LOCK': 'LCK',\n    'LCKS': 'LCKS',\n    'LOCKS': 'LCKS',\n    'LDG': 'LDG',\n    'LDGE': 'LDG',\n    'LODG': 'LDG',\n    'LODGE': 'LDG',\n    'LOOP': 'LOOP',\n    'LOOPS': 'LOOP',\n    'MALL': 'MALL',\n    'MANOR': 'MNR',\n    'MNR': 'MNR',\n    'MANORS': 'MNRS',\n    'MNRS': 'MNRS',\n    'MDW': 'MDW',\n    'MEADOW': 'MDW',\n    'MDWS': 'MDWS',\n    'MEADOWS': 'MDWS',\n    'MEDOWS': 'MDWS',\n    'MEWS': 'MEWS',\n    'MILL': 'ML',\n    'ML': 'ML',\n    'MILLS': 'MLS',\n    'MLS': 'MLS',\n    'MISSION': 'MSN',\n    'MISSN': 'MSN',\n    'MSN': 'MSN',\n    'MSSN': 'MSN',\n    'MOTORWAY': 'MTWY',\n    'MNT': 'MT',\n    'MOUNT': 'MT',\n    'MT': 'MT',\n    'MNTAIN': 'MTN',\n    'MNTN': 'MTN',\n    'MOUNTAIN': 'MTN',\n    'MOUNTIN': 'MTN',\n    'MTIN': 'MTN',\n    'MTN': 'MTN',\n    'MNTNS': 'MTNS',\n    'MOUNTAINS': 'MTNS',\n    'NCK': 'NCK',\n    'NECK': 'NCK',\n    'ORCH': 'ORCH',\n    'ORCHARD': 'ORCH',\n    'ORCHRD': 'ORCH',\n    'OVAL': 'OVAL',\n    'OVL': 'OVAL',\n    'OVERPASS': 'OPAS',\n    'PARK': 'PARK',\n    'PK': 'PARK',\n    'PRK': 'PARK',\n    'PARKS': 'PARK',\n    'PARKWAY': 'PKWY',\n    'PARKWY': 'PKWY',\n    'PKWAY': 'PKWY',\n    'PKWY': 'PKWY',\n    
'PKY': 'PKWY',\n    'PARKWAYS': 'PKWY',\n    'PKWYS': 'PKWY',\n    'PASS': 'PASS',\n    'PASSAGE': 'PSGE',\n    'PATH': 'PATH',\n    'PATHS': 'PATH',\n    'PIKE': 'PIKE',\n    'PIKES': 'PIKE',\n    'PINE': 'PNE',\n    'PINES': 'PNES',\n    'PNES': 'PNES',\n    'PL': 'PL',\n    'PLACE': 'PL',\n    'PLAIN': 'PLN',\n    'PLN': 'PLN',\n    'PLAINES': 'PLNS',\n    'PLAINS': 'PLNS',\n    'PLNS': 'PLNS',\n    'PLAZA': 'PLZ',\n    'PLZ': 'PLZ',\n    'PLZA': 'PLZ',\n    'POINT': 'PT',\n    'PT': 'PT',\n    'POINTS': 'PTS',\n    'PTS': 'PTS',\n    'PORT': 'PRT',\n    'PRT': 'PRT',\n    'PORTS': 'PRTS',\n    'PRTS': 'PRTS',\n    'PR': 'PR',\n    'PRAIRIE': 'PR',\n    'PRARIE': 'PR',\n    'PRR': 'PR',\n    'RAD': 'RADL',\n    'RADIAL': 'RADL',\n    'RADIEL': 'RADL',\n    'RADL': 'RADL',\n    'RAMP': 'RAMP',\n    'RANCH': 'RNCH',\n    'RANCHES': 'RNCH',\n    'RNCH': 'RNCH',\n    'RNCHS': 'RNCH',\n    'RAPID': 'RPD',\n    'RPD': 'RPD',\n    'RAPIDS': 'RPDS',\n    'RPDS': 'RPDS',\n    'REST': 'RST',\n    'RST': 'RST',\n    'RDG': 'RDG',\n    'RDGE': 'RDG',\n    'RIDGE': 'RDG',\n    'RDGS': 'RDGS',\n    'RIDGES': 'RDGS',\n    'RIV': 'RIV',\n    'RIVER': 'RIV',\n    'RIVR': 'RIV',\n    'RVR': 'RIV',\n    'RD': 'RD',\n    'ROAD': 'RD',\n    'RDS': 'RDS',\n    'ROADS': 'RDS',\n    'ROUTE': 'RTE',\n    'ROW': 'ROW',\n    'RUE': 'RUE',\n    'RUN': 'RUN',\n    'SHL': 'SHL',\n    'SHOAL': 'SHL',\n    'SHLS': 'SHLS',\n    'SHOALS': 'SHLS',\n    'SHOAR': 'SHR',\n    'SHORE': 'SHR',\n    'SHR': 'SHR',\n    'SHOARS': 'SHRS',\n    'SHORES': 'SHRS',\n    'SHRS': 'SHRS',\n    'SKYWAY': 'SKWY',\n    'SPG': 'SPG',\n    'SPNG': 'SPG',\n    'SPRING': 'SPG',\n    'SPRNG': 'SPG',\n    'SPGS': 'SPGS',\n    'SPNGS': 'SPGS',\n    'SPRINGS': 'SPGS',\n    'SPRNGS': 'SPGS',\n    'SPUR': 'SPUR',\n    'SPURS': 'SPUR',\n    'SQ': 'SQ',\n    'SQR': 'SQ',\n    'SQRE': 'SQ',\n    'SQU': 'SQ',\n    'SQUARE': 'SQ',\n    'SQRS': 'SQS',\n    'SQUARES': 'SQS',\n    'STA': 'STA',\n    'STATION': 'STA',\n    'STATN': 
'STA',\n    'STN': 'STA',\n    'STRA': 'STRA',\n    'STRAV': 'STRA',\n    'STRAVE': 'STRA',\n    'STRAVEN': 'STRA',\n    'STRAVENUE': 'STRA',\n    'STRAVN': 'STRA',\n    'STRVN': 'STRA',\n    'STRVNUE': 'STRA',\n    'STREAM': 'STRM',\n    'STREME': 'STRM',\n    'STRM': 'STRM',\n    'ST': 'ST',\n    'STR': 'ST',\n    'STREET': 'ST',\n    'STRT': 'ST',\n    'STREETS': 'STS',\n    'SMT': 'SMT',\n    'SUMIT': 'SMT',\n    'SUMITT': 'SMT',\n    'SUMMIT': 'SMT',\n    'TER': 'TER',\n    'TERR': 'TER',\n    'TERRACE': 'TER',\n    'THROUGHWAY': 'TRWY',\n    'TRACE': 'TRCE',\n    'TRACES': 'TRCE',\n    'TRCE': 'TRCE',\n    'TRACK': 'TRAK',\n    'TRACKS': 'TRAK',\n    'TRAK': 'TRAK',\n    'TRK': 'TRAK',\n    'TRKS': 'TRAK',\n    'TRAFFICWAY': 'TRFY',\n    'TRFY': 'TRFY',\n    'TR': 'TRL',\n    'TRAIL': 'TRL',\n    'TRAILS': 'TRL',\n    'TRL': 'TRL',\n    'TRLS': 'TRL',\n    'TUNEL': 'TUNL',\n    'TUNL': 'TUNL',\n    'TUNLS': 'TUNL',\n    'TUNNEL': 'TUNL',\n    'TUNNELS': 'TUNL',\n    'TUNNL': 'TUNL',\n    'TPK': 'TPKE',\n    'TPKE': 'TPKE',\n    'TRNPK': 'TPKE',\n    'TRPK': 'TPKE',\n    'TURNPIKE': 'TPKE',\n    'TURNPK': 'TPKE',\n    'UNDERPASS': 'UPAS',\n    'UN': 'UN',\n    'UNION': 'UN',\n    'UNIONS': 'UNS',\n    'VALLEY': 'VLY',\n    'VALLY': 'VLY',\n    'VLLY': 'VLY',\n    'VLY': 'VLY',\n    'VALLEYS': 'VLYS',\n    'VLYS': 'VLYS',\n    'VDCT': 'VIA',\n    'VIA': 'VIA',\n    'VIADCT': 'VIA',\n    'VIADUCT': 'VIA',\n    'VIEW': 'VW',\n    'VW': 'VW',\n    'VIEWS': 'VWS',\n    'VWS': 'VWS',\n    'VILL': 'VLG',\n    'VILLAG': 'VLG',\n    'VILLAGE': 'VLG',\n    'VILLG': 'VLG',\n    'VILLIAGE': 'VLG',\n    'VLG': 'VLG',\n    'VILLAGES': 'VLGS',\n    'VLGS': 'VLGS',\n    'VILLE': 'VL',\n    'VL': 'VL',\n    'VIS': 'VIS',\n    'VIST': 'VIS',\n    'VISTA': 'VIS',\n    'VST': 'VIS',\n    'VSTA': 'VIS',\n    'WALK': 'WALK',\n    'WALKS': 'WALK',\n    'WALL': 'WALL',\n    'WAY': 'WAY',\n    'WY': 'WAY',\n    'WAYS': 'WAYS',\n    'WELL': 'WL',\n    'WELLS': 'WLS',\n    'WLS': 
'WLS'\n}\n\nOCCUPANCY_TYPE_ABBREVIATIONS = {\n    'APARTMENT': 'APT',\n    'BUILDING': 'BLDG',\n    'BASEMENT': 'BSMT',\n    'DEPARTMENT': 'DEPT',\n    'FLOOR': 'FL',\n    'FRONT': 'FRNT',\n    'HANGER': 'HNGR',\n    'KEY': 'KEY',\n    'LOBBY': 'LBBY',\n    'LOT': 'LOT',\n    'LOWER': 'LOWR',\n    'OFFICE': 'OFC',\n    'PENTHOUSE': 'PH',\n    'PIER': 'PIER',\n    'REAR': 'REAR',\n    'ROOM': 'RM',\n    'SIDE': 'SIDE',\n    'SLIP': 'SLIP',\n    'SPACE': 'SPC',\n    'STOP': 'STOP',\n    'SUITE': 'STE',\n    'TRAILER': 'TRLR',\n    'UNIT': 'UNIT',\n    'UPPER': 'UPPER',\n    '#': '#'\n}\nLONGHAND_STREET_TYPES = {\n    'ALY': 'ALLEY',\n    'ANX': 'ANNEX',\n    'ARC': 'ARCADE',\n    'AVE': 'AVENUE',\n    'BYU': 'BAYOU',\n    'BCH': 'BEACH',\n    'BND': 'BEND',\n    'BLF': 'BLUFF',\n    'BLFS': 'BLUFFS',\n    'BTM': 'BOTTOM',\n    'BLVD': 'BOULEVARD',\n    'BR': 'BRANCH',\n    'BRG': 'BRIDGE',\n    'BRK': 'BROOK',\n    'BRKS': 'BROOKS',\n    'BGS': 'BURGS',\n    'BYP': 'BYPASS',\n    'CP': 'CAMP',\n    'CYN': 'CANYON',\n    'CPE': 'CAPE',\n    'CSWY': 'CAUSEWAY',\n    'CTR': 'CENTER',\n    'CTRS': 'CENTERS',\n    'CIR': 'CIRCLE',\n    'CIRS': 'CIRCLES',\n    'CLF': 'CLIFF',\n    'CLFS': 'CLIFFS',\n    'CMN': 'COMMON',\n    'COR': 'CORNER',\n    'CORS': 'CORNERS',\n    'CRSE': 'COURSE',\n    'CT': 'COURT',\n    'CTS': 'COURTS',\n    'CVS': 'COVES',\n    'CRK': 'CREEK',\n    'CRES': 'CRESCENT',\n    'CRST': 'CREST',\n    'XING': 'CROSSING',\n    'XRD': 'CROSSROAD',\n    'CURV': 'CURVE',\n    'DL': 'DALE',\n    'DM': 'DAM',\n    'DV': 'DIVIDE',\n    'DR': 'DRIVE',\n    'DRS': 'DRIVES',\n    'EST': 'ESTATE',\n    'ESTS': 'ESTATES',\n    'EXPY': 'EXPRESSWAY',\n    'EXT': 'EXTENSION',\n    'EXTS': 'EXTENSIONS',\n    'FALL': 'FALL',\n    'FLS': 'FALLS',\n    'FRY': 'FERRY',\n    'FLD': 'FIELD',\n    'FLDS': 'FIELDS',\n    'FLT': 'FLAT',\n    'FLTS': 'FLATS',\n    'FRD': 'FORD',\n    'FRDS': 'FORDS',\n    'FRST': 'FORESTS',\n    'FRG': 'FORGE',\n    'FRGS': 'FORGES',\n    'FRK': 
'FORK',\n    'FRKS': 'FORKS',\n    'FT': 'FORT',\n    'FWY': 'FREEWAY',\n    'GDN': 'GARDEN',\n    'GDNS': 'GARDENS',\n    'GTWY': 'GATEWAY',\n    'GLN': 'GLEN',\n    'GLNS': 'GLENS',\n    'GRNS': 'GREENS',\n    'GRV': 'GROVE',\n    'GRVS': 'GROVES',\n    'HBR': 'HARBOR',\n    'HBRS': 'HARBORS',\n    'HVN': 'HAVEN',\n    'HTS': 'HEIGHTS',\n    'HWY': 'HIGHWAY',\n    'HL': 'HILL',\n    'HLS': 'HILLS',\n    'HOLW': 'HOLLOW',\n    'INLT': 'INLET',\n    'IS': 'ISLAND',\n    'ISS': 'ISLANDS',\n    'ISLE': 'ISLE',\n    'JCT': 'JUNCTION',\n    'JCTS': 'JUNCTIONS',\n    'KY': 'KEY',\n    'KYS': 'KEYS',\n    'KNL': 'KNOLL',\n    'KNLS': 'KNOLLS',\n    'LK': 'LAKE',\n    'LKS': 'LAKES',\n    'LAND': 'LAND',\n    'LNDG': 'LANDING',\n    'LN': 'LANE',\n    'LGT': 'LIGHT',\n    'LGTS': 'LIGHTS',\n    'LF': 'LOAF',\n    'LCK': 'LOCK',\n    'LCKS': 'LOCKS',\n    'LDG': 'LODGE',\n    'LOOP': 'LOOP',\n    'MALL': 'MALL',\n    'MNR': 'MANOR',\n    'MNRS': 'MANORS',\n    'MDW': 'MEADOW',\n    'MDWS': 'MEADOWS',\n    'MEWS': 'MEWS',\n    'ML': 'MILL',\n    'MLS': 'MILLS',\n    'MSN': 'MISSION',\n    'MTWY': 'MOTORWAY',\n    'MT': 'MOUNT',\n    'MTN': 'MOUNTAIN',\n    'MTNS': 'MOUNTAINS',\n    'NCK': 'NECK',\n    'ORCH': 'ORCHARD',\n    'OVAL': 'OVAL',\n    'OPAS': 'OVERPASS',\n    'PARK': 'PARKS',\n    'PKWY': 'PARKWAY',\n    'PASS': 'PASS',\n    'PSGE': 'PASSAGE',\n    'PATH': 'PATHS',\n    'PIKE': 'PIKES',\n    'PNE': 'PINE',\n    'PNES': 'PINES',\n    'PL': 'PLACE',\n    'PLN': 'PLAIN',\n    'PLNS': 'PLAINS',\n    'PLZ': 'PLAZA',\n    'PT': 'POINT',\n    'PTS': 'POINTS',\n    'PRT': 'PORT',\n    'PRTS': 'PORTS',\n    'PR': 'PRAIRIE',\n    'RADL': 'RADIAL',\n    'RAMP': 'RAMP',\n    'RNCH': 'RANCH',\n    'RPD': 'RAPID',\n    'RPDS': 'RAPIDS',\n    'RST': 'REST',\n    'RDG': 'RIDGE',\n    'RDGS': 'RIDGES',\n    'RIV': 'RIVER',\n    'RD': 'ROAD',\n    'RDS': 'ROADS',\n    'RTE': 'ROUTE',\n    'ROW': 'ROW',\n    'RUE': 'RUE',\n    'RUN': 'RUN',\n    'SHL': 'SHOAL',\n    'SHLS': 
'SHOALS',\n    'SHR': 'SHORE',\n    'SHRS': 'SHORES',\n    'SKWY': 'SKYWAY',\n    'SPG': 'SPRING',\n    'SPGS': 'SPRINGS',\n    'SPUR': 'SPURS',\n    'SQ': 'SQUARE',\n    'SQS': 'SQUARES',\n    'STA': 'STATION',\n    'STRA': 'STRAVENUE',\n    'STRM': 'STREAM',\n    'ST': 'STREET',\n    'STS': 'STREETS',\n    'SMT': 'SUMMIT',\n    'TER': 'TERRACE',\n    'TRWY': 'THROUGHWAY',\n    'TRCE': 'TRACE',\n    'TRAK': 'TRACK',\n    'TRFY': 'TRAFFICWAY',\n    'TRL': 'TRAIL',\n    'TUNL': 'TUNNEL',\n    'TPKE': 'TURNPIKE',\n    'UPAS': 'UNDERPASS',\n    'UN': 'UNION',\n    'UNS': 'UNIONS',\n    'VLY': 'VALLEY',\n    'VLYS': 'VALLEYS',\n    'VIA': 'VIADUCT',\n    'VW': 'VIEW',\n    'VWS': 'VIEWS',\n    'VLG': 'VILLAGE',\n    'VLGS': 'VILLAGES',\n    'VL': 'VILLE',\n    'VIS': 'VISTA',\n    'WALK': 'WALK',\n    'WALL': 'WALL',\n    'WAY': 'WAY',\n    'WL': 'WELL',\n    'WLS': 'WELLS'\n}\nSTATE_ABBREVIATIONS = {\n    'ALABAMA': 'AL',\n    'ALA': 'AL',\n    'ALASKA': 'AK',\n    'ALAS': 'AK',\n    'ARIZONA': 'AZ',\n    'ARIZ': 'AZ',\n    'ARKANSAS': 'AR',\n    'ARK': 'AR',\n    'CALIFORNIA': 'CA',\n    'CALIF': 'CA',\n    'CAL': 'CA',\n    'COLORADO': 'CO',\n    'COLO': 'CO',\n    'COL': 'CO',\n    'CONNECTICUT': 'CT',\n    'CONN': 'CT',\n    'DELAWARE': 'DE',\n    'DEL': 'DE',\n    'DISTRICT OF COLUMBIA': 'DC',\n    'FLORIDA': 'FL',\n    'FLA': 'FL',\n    'FLOR': 'FL',\n    'GEORGIA': 'GA',\n    'GA': 'GA',\n    'HAWAII': 'HI',\n    'IDAHO': 'ID',\n    'IDA': 'ID',\n    'ILLINOIS': 'IL',\n    'ILL': 'IL',\n    'INDIANA': 'IN',\n    'IND': 'IN',\n    'IOWA': 'IA',\n    'KANSAS': 'KS',\n    'KANS': 'KS',\n    'KAN': 'KS',\n    'KENTUCKY': 'KY',\n    'KEN': 'KY',\n    'KENT': 'KY',\n    'LOUISIANA': 'LA',\n    'MAINE': 'ME',\n    'MARYLAND': 'MD',\n    'MASSACHUSETTS': 'MA',\n    'MASS': 'MA',\n    'MICHIGAN': 'MI',\n    'MICH': 'MI',\n    'MINNESOTA': 'MN',\n    'MINN': 'MN',\n    'MISSISSIPPI': 'MS',\n    'MISS': 'MS',\n    'MISSOURI': 'MO',\n    'MONTANA': 'MT',\n    'MONT': 
'MT',\n    'NEBRASKA': 'NE',\n    'NEBR': 'NE',\n    'NEB': 'NE',\n    'NEVADA': 'NV',\n    'NEV': 'NV',\n    'NEW HAMPSHIRE': 'NH',\n    'NEW JERSEY': 'NJ',\n    'NEW MEXICO': 'NM',\n    'N MEX': 'NM',\n    'NEW M': 'NM',\n    'NEW YORK': 'NY',\n    'NORTH CAROLINA': 'NC',\n    'NORTH DAKOTA': 'ND',\n    'N DAK': 'ND',\n    'OHIO': 'OH',\n    'OKLAHOMA': 'OK',\n    'OKLA': 'OK',\n    'OREGON': 'OR',\n    'OREG': 'OR',\n    'ORE': 'OR',\n    'PENNSYLVANIA': 'PA',\n    'PENN': 'PA',\n    'RHODE ISLAND': 'RI',\n    'SOUTH CAROLINA': 'SC',\n    'SOUTH DAKOTA': 'SD',\n    'S DAK': 'SD',\n    'TENNESSEE': 'TN',\n    'TENN': 'TN',\n    'TEXAS': 'TX',\n    'TEX': 'TX',\n    'UTAH': 'UT',\n    'VERMONT': 'VT',\n    'VIRGINIA': 'VA',\n    'WASHINGTON': 'WA',\n    'WASH': 'WA',\n    'WEST VIRGINIA': 'WV',\n    'W VA': 'WV',\n    'WISCONSIN': 'WI',\n    'WIS': 'WI',\n    'WISC': 'WI',\n    'WYOMING': 'WY',\n    'WYO': 'WY'\n}\n\nADDRESS_KEYS = (\n    'address_line_1', 'address_line_2', 'city', 'state', 'postal_code'\n)\n\n\nclass NormalizationConfig(Config):\n    \"\"\"Config class for GBR\"\"\"\n    # pylint: disable=too-few-public-methods\n    default_file = 'address_constants.yaml'\n\n    def __init__(self, config_file=None, config_dir=None, section=None):\n        super(NormalizationConfig, self).__init__(\n            config_file=config_file, config_dir=config_dir, section=section,\n            env_prefix='ADDRESS_CONFIG'\n        )\n\n\ndef set_address_constants():\n    config = NormalizationConfig()\n    if config:\n        addr_constants = (\n            'DIRECTIONAL_REPLACEMENTS',\n            'OCCUPANCY_TYPE_ABBREVIATIONS',\n            'STATE_ABBREVIATIONS',\n            'STREET_TYPE_ABBREVIATIONS',\n            'KNOWN_ODDITIES',\n            'PROBLEM_ST_TYPE_ABBRVS',\n            'LONGHAND_DIRECTIONALS',\n            'LONGHAND_STREET_TYPES',\n        )\n        insertion_method = config.get('insertion_method', default='update')\n        update = ('update', 
'insert')\n        replace = ('replace', 'overwrite')\n        if insertion_method not in update + replace:\n            msg = \"'{}' is not a valid option for 'insertion_method'\".format(\n                insertion_method\n            )\n            raise ConfigError(msg)\n        globals()['ADDRESS_KEYS'] = config.get(\n            'ADDRESS_KEYS', default=globals()['ADDRESS_KEYS']\n        )\n        for key in addr_constants:\n            new_vals = config.get(key, default={})\n            if key == 'OCCUPANCY_TYPE_ABBREVIATIONS' and new_vals:\n                org_keys = OCCUPANCY_TYPE_ABBREVIATIONS.keys()\n                new_keys = new_vals.keys()\n                globals()['ABNORMAL_OCCUPANCY_ABBRVS'] = (\n                    set(new_keys) - set(org_keys)\n                )\n            if new_vals and insertion_method in update:\n                globals()[key].update(**new_vals)\n            elif new_vals and insertion_method in replace:\n                globals()[key] = new_vals\n\n\nset_address_constants()\n"
  },
  {
    "path": "scourgify/cleaning.py",
    "content": "#!/usr/bin/env python\n# encoding: utf-8\n\"\"\"\ncopyright (c) 2016-2017 Earth Advantage.\nAll rights reserved\n..codeauthor::Fable Turas <fable@rainsoftware.tech>\n\n[ INSERT DOC STRING ]  # TODO\n\"\"\"\n\n# Imports from Standard Library\nimport re\nimport unicodedata\nfrom typing import Any, Optional, Sequence, Union\n\n# Imports from Third Party Modules\nimport usaddress\n\n# Local Imports\nfrom scourgify.address_constants import (\n    KNOWN_ODDITIES,\n    OCCUPANCY_TYPE_ABBREVIATIONS,\n    PROBLEM_ST_TYPE_ABBRVS,\n    AMBIGUOUS_DIRECTIONALS\n)\n\n# Setup\n\n# Constants\n# periods (in decimals), hyphens, / , and & are acceptable address components\n# ord('&') ord('#') ord('-'), ord('.') and ord('/')\nALLOWED_CHARS = [35, 38, 45, 46, 47]\n\n# Don't remove ',', '(' or ')' in PRE_CLEAN\nPRECLEAN_EXCLUDE = [40, 41, 44]\nEXCLUDE_ALL = ALLOWED_CHARS + PRECLEAN_EXCLUDE\n\nSTRIP_CHAR_CATS = (\n    'M', 'S', 'C', 'Nl', 'No', 'Pc', 'Ps', 'Pe', 'Pi', 'Pf', 'Po'\n)\nSTRIP_PUNC_CATS = ('Z', 'Pd')\nSTRIP_ALL_CATS = STRIP_CHAR_CATS + STRIP_PUNC_CATS\n\n# Data Structure Definitions\n\n# Private Functions\n\n\n# Public Classes and Functions\n\ndef pre_clean_addr_str(addr_str, state=None):\n    # type: (str, Optional[str]) -> str\n    \"\"\"Remove any known undesirable sub-strings and special characters.\n\n    Cleaning should be enacted on an addr_str to remove known characters\n    and phrases that might prevent usaddress from successfully parsing.\n    Follows USPS pub 28 guidelines for undesirable special characters.\n    Non-address phrases or character sets known to occur in raw addresses\n    should be added to address_constants.KNOWN_ODDITIES.\n\n    Some characters are left behind to potentially assist in second chance\n    processing of unparseable addresses and should be further cleaned\n    post_processing. 
(see post_clean_addr_str).\n\n    :param addr_str: raw address string\n    :type addr_str: str\n    :param state: optional string containing normalized state data.\n    :type state: str\n    :return: cleaned string\n    :rtype: str\n    \"\"\"\n    # replace any easily handled, undesirable sub-strings\n    if any(oddity in addr_str for oddity in KNOWN_ODDITIES.keys()):\n        for key, replacement in KNOWN_ODDITIES.items():      # pragma: no cover\n            addr_str = addr_str.replace(key, replacement)\n\n    # remove non-decimal point period chars.\n    if '.' in addr_str:                                      # pragma: no cover\n        addr_str = clean_period_char(addr_str)\n\n    addr_str = pre_clean_directionals(addr_str)\n\n    # remove special characters per USPS pub 28, except & which impacts\n    # intersection addresses, and - which impacts range addresses and zipcodes.\n    # ',', '(' and ')' are also left for potential use in additional line 2\n    # processing functions\n    addr_str = clean_upper(\n        addr_str, exclude=EXCLUDE_ALL, removal_cats=STRIP_CHAR_CATS\n    )\n\n    # to prevent any potential confusion between CT = COURT v CT = Connecticut,\n    # clean_ambiguous_street_types is not applied if state is CT.\n    if state and state not in PROBLEM_ST_TYPE_ABBRVS.keys():\n        addr_str = clean_ambiguous_street_types(addr_str)\n\n    return addr_str\n\n\ndef clean_ambiguous_street_types(addr_str):\n    # type: (str) -> str\n    \"\"\"Clean street type abbreviations treated ambiguously by usaddress.\n\n    Some two char street type abbreviations (ie. CT) are treated as StateName\n    by usaddress when address lines are parsed in isolation. 
To correct this,\n    known problem abbreviations are converted to their whole word equivalent.\n\n    :param addr_str: string containing address street and occupancy data\n        without city and state.\n    :type addr_str: str | None\n    :return: original or cleaned addr_str\n    :rtype: str | None\n    \"\"\"\n    if addr_str:\n        split_addr = addr_str.split()\n        for key in PROBLEM_ST_TYPE_ABBRVS:\n            if key in split_addr:\n                split_addr[split_addr.index(key)] = PROBLEM_ST_TYPE_ABBRVS[key]\n                addr_str = ' '.join(split_addr)\n                break\n    return addr_str\n\n\ndef post_clean_addr_str(addr_str):\n    # type: (Optional[str]) -> Optional[str]\n    \"\"\"Remove any special chars or extra white space remaining post-processing.\n\n    :param addr_str: post-processing address string.\n    :type addr_str: str | None\n    :return: str set to uppercase, extra white space and special chars removed,\n        or the original value if addr_str is empty or None.\n    :rtype: str | None\n    \"\"\"\n    if addr_str:\n        addr_str = clean_upper(\n            addr_str, exclude=ALLOWED_CHARS, removal_cats=STRIP_CHAR_CATS\n        )\n    return addr_str\n\n\ndef _parse_occupancy(addr_line_2):\n    occupancy = None\n    if addr_line_2:\n        parsed = None\n        # first try usaddress parsing labels\n        try:\n            parsed = usaddress.tag(addr_line_2)\n        except usaddress.RepeatedLabelError:\n            pass\n        if parsed:\n            occupancy = parsed[0].get('OccupancyIdentifier')\n    return occupancy\n\n\ndef strip_occupancy_type(addr_line_2):\n    # type: (str) -> str\n    \"\"\"Strip occupancy type (ie apt, unit, etc) from addr_line_2 string\n\n    :param addr_line_2: address line 2 string that may contain type\n    :type addr_line_2: str\n    :return: occupancy identifier with the occupancy type removed\n    :rtype: str\n    \"\"\"\n    occupancy = None\n    if addr_line_2:\n        addr_line_2 = 
addr_line_2.replace('#', '').strip().upper()\n        occupancy = _parse_occupancy(addr_line_2)\n\n        # if that doesn't work, clean abbrevs and try again\n        if not occupancy:\n            parts = str(addr_line_2).split()\n            for p in parts:\n                if p in OCCUPANCY_TYPE_ABBREVIATIONS:\n                    addr_line_2 = addr_line_2.replace(\n                        p, OCCUPANCY_TYPE_ABBREVIATIONS[p]\n                    )\n            occupancy = _parse_occupancy(addr_line_2)\n\n            # if that doesn't work, dissect it manually\n            if not occupancy:\n                occupancy = addr_line_2\n                types = (\n                    list(OCCUPANCY_TYPE_ABBREVIATIONS.keys())\n                    + list(OCCUPANCY_TYPE_ABBREVIATIONS.values())\n                )\n                if parts and len(parts) > 1:\n                    ids = [p for p in parts if p not in types]\n                    occupancy = ' '.join(ids)\n\n    return occupancy\n\n\ndef clean_upper(text,                           # type: Any\n                exclude=None,                   # type: Optional[Sequence[int]]\n                removal_cats=STRIP_CHAR_CATS,   # type: Optional[Sequence[str]]\n                strip_spaces=False              # type: Optional[bool]\n                ):\n    # type: (...) -> str\n    \"\"\"\n    Return text as upper case unicode string and remove unwanted characters.\n    removal_cats defaults to STRIP_CHAR_CATS, e.g. symbols, most punctuation etc\n    :param text: text to clean\n    :type text: str\n    :param exclude: sequence of char ordinals to exclude from text.translate\n    :type exclude: Sequence\n    :param removal_cats: sequence of strings identifying unicodedata categories\n        (or startswith) of characters to be removed from text\n    :type removal_cats: Sequence\n    :param strip_spaces: Bool to indicate whether to leave or remove all\n      
  spaces. Default is False (leaves single spaces)\n    :type strip_spaces: bool\n    :return: cleaned uppercase unicode string\n    :rtype: str\n    \"\"\"\n    exclude = exclude or []\n    # coerce ints etc to str\n    if not isinstance(text, str):  # pragma: no cover\n        text = str(text)\n    # catch and convert fractions\n    text = unicodedata.normalize('NFKD', text)\n    text = text.translate({8260: '/'})\n\n    # evaluate string without commas (,) or ampersand (&) to determine if\n    # further processing is necessary\n    alnum_text = text.translate({44: None, 38: None})\n\n    # remove unwanted non-alphanumeric characters and convert all dash type\n    # characters to hyphen\n    if not alnum_text.replace(' ', '').isalnum():\n        for char in text:\n            if (unicodedata.category(char).startswith(removal_cats)\n                    and ord(char) not in exclude):\n                text = text.translate({ord(char): None})\n            elif unicodedata.category(char).startswith('Pd'):\n                text = text.translate({ord(char): '-'})\n    join_char = ' '\n    if strip_spaces:\n        join_char = ''\n    # remove extra spaces and convert to uppercase\n    return join_char.join(text.split()).upper()\n\n\ndef clean_period_char(text):\n    \"\"\"Remove all period characters that are not decimal points.\n\n    :param text: string text to clean\n    :type text: str\n    :return: cleaned string\n    :rtype: str\n    \"\"\"\n    period_pattern = re.compile(r'\\.(?!\\d)')\n    return re.sub(period_pattern, '', text)\n\n\ndef pre_clean_directionals(text):\n    \"\"\"\n    Replaces any ambiguous directionals (ie south-west) with their\n    standard abbreviation (ie SW). 
This helps ensure the directionals are\n    correctly identified during the usaddress tagging, rather than being\n    identified as part of the street name.\n    Directionals misrepresented as two words (ie south west) are not cleaned\n    because directional named streets (ie West St) do exist with\n    pre-directionals (ie S West St).\n    \"\"\"\n    for direction, abbr in AMBIGUOUS_DIRECTIONALS.items():\n        text = text.upper().replace(direction, abbr)\n    return text\n"
  },
  {
    "path": "scourgify/exceptions.py",
    "content": "#!/usr/bin/env python\n# encoding: utf-8\n\"\"\"\ncopyright (c) 2016-2017 Earth Advantage.\nAll rights reserved\n..codeauthor::Fable Turas <fable@rainsoftware.tech>\n\nCustom errors pertaining to address normalization.\n\"\"\"\n\n\n# Private Functions\n\n\n# Public Classes and Functions\n\nclass AddressNormalizationError(Exception):\n    \"\"\"Indicates error during normalization\"\"\"\n    TITLE = None\n    MESSAGE = None\n\n    def __init__(self, error=None, title=None, *args):\n        self.error = error or self.MESSAGE\n        self.title = title or self.TITLE\n        args = (error, title) + args\n        super(AddressNormalizationError, self).__init__(*args)\n\n    def __str__(self):\n        msg = \"{}: {}\".format(self.title, self.error)\n        if len(self.args) > 2:\n            msg = \"{}, {}\".format(\n                msg, ', '.join(str(a) for a in self.args[2:])\n            )\n        return msg\n\n\nclass AmbiguousAddressError(AddressNormalizationError):\n    \"\"\"Indicates an error from ambiguous addresses or address parts.\"\"\"\n    MESSAGE = \"This address contains ambiguous elements.\"\n    TITLE = \"AMBIGUOUS ADDRESS\"\n\n\nclass UnParseableAddressError(AddressNormalizationError):\n    \"\"\"Indicates an error from addresses that cannot be parsed.\"\"\"\n    MESSAGE = \"Unable to break this address into its component parts\"\n    TITLE = \"UNPARSEABLE ADDRESS\"\n\n\nclass IncompleteAddressError(AddressNormalizationError):\n    \"\"\"Indicates error from addresses that don't have enough data to index.\"\"\"\n    MESSAGE = \"This address is missing one or more required elements\"\n    TITLE = \"INCOMPLETE ADDRESS\"\n\n\nclass AddressValidationError(AddressNormalizationError):\n    \"\"\"Indicates address elements that don't meet format standards.\"\"\"\n    MESSAGE = \"Address contains invalid formatting\"\n    TITLE = \"ADDRESS FORMAT VALIDATION\"\n"
  },
  {
    "path": "scourgify/normalize.py",
    "content": "#!/usr/bin/env python\n# encoding: utf-8\n\"\"\"\ncopyright (c) 2016  Earth Advantage.\nAll rights reserved\n..codeauthor::Fable Turas <fable@rainsoftware.tech>\n\nProvides functions to normalize address per USPS pub 28 and/or RESO standards.\n\"\"\"\nfrom __future__ import annotations\n\n# TODO: Find why # with no street gets through\n# form_normalization = {\n#     'jurisdiction_property_id': 'TST123',\n#     'address_line_1': '123',\n#     'city': 'Portland',\n#     'state': 'OR',\n#     'postal_code': '97212'\n# }\n\n# Imports from Standard Library\n\nfrom string import Template\nfrom collections import OrderedDict  # noqa # pylint: disable=unused-import\nfrom typing import (  # noqa # pylint: disable=unused-import\n    Callable,\n    Mapping,\n    MutableMapping,\n    Optional,\n    Sequence,\n    Union,\n)\n\n# Imports from Third Party Modules\nimport geocoder\nimport usaddress\n\n# Local Imports\nfrom scourgify.address_constants import (\n    ABNORMAL_OCCUPANCY_ABBRVS,\n    ADDRESS_KEYS,\n    CITY_ABBREVIATIONS,\n    DIRECTIONAL_REPLACEMENTS,\n    LONGHAND_DIRECTIONALS,\n    LONGHAND_STREET_TYPES,\n    OCCUPANCY_TYPE_ABBREVIATIONS,\n    STATE_ABBREVIATIONS,\n    STREET_TYPE_ABBREVIATIONS,\n)\nfrom scourgify.cleaning import (\n    clean_upper,\n    post_clean_addr_str,\n    pre_clean_addr_str,\n    strip_occupancy_type,\n)\nfrom scourgify.exceptions import (\n    AddressNormalizationError,\n    AmbiguousAddressError,\n    UnParseableAddressError,\n)\nfrom scourgify.validations import (\n    validate_address_components,\n    validate_parens_groups_parsed,\n    validate_us_postal_code_format,\n)\n\n# Setup\n\n# Constants\n\nLINE1_USADDRESS_LABELS = (\n    'AddressNumber',\n    'StreetName',\n    'AddressNumberPrefix',\n    'AddressNumberSuffix',\n    'StreetNamePreDirectional',\n    'StreetNamePostDirectional',\n    'StreetNamePreModifier',\n    'StreetNamePostType',\n    'StreetNamePreType',\n    'IntersectionSeparator',\n    
'SecondStreetNamePreDirectional',\n    'SecondStreetNamePostDirectional',\n    'SecondStreetNamePreModifier',\n    'SecondStreetNamePostType',\n    'SecondStreetNamePreType',\n    'LandmarkName',\n    'CornerOf',\n    'IntersectionSeparator',\n    'BuildingName',\n)\nLINE2_USADDRESS_LABELS = (\n    'OccupancyType',\n    'OccupancyIdentifier',\n    'SubaddressIdentifier',\n    'SubaddressType',\n)\n\nLAST_LINE_LABELS = (\n    'PlaceName',\n    'StateName',\n    'ZipCode',\n)\n\nAMBIGUOUS_LABELS = (\n    'Recipient',\n    'USPSBoxType',\n    'USPSBoxID',\n    'USPSBoxGroupType',\n    'USPSBoxGroupID',\n    'NotAddress'\n)\n\nSTRIP_CHAR_CATS = (\n    'M', 'S', 'C', 'Nl', 'No', 'Pc', 'Ps', 'Pe', 'Pi', 'Pf', 'Po'\n)\nSTRIP_PUNC_CATS = ('Z', 'Pd')\nSTRIP_ALL_CATS = STRIP_CHAR_CATS + STRIP_PUNC_CATS\n\n\n# Private Functions\n\n# Public Classes and Functions\n\ndef normalize_address_record(address: str | dict, addr_map: dict = None,\n                             addtl_funcs: [Callable] = None,\n                             strict: bool = True,\n                             long_hand: bool = False) -> dict:\n    \"\"\"Normalize an address according to USPS pub. 28 standards.\n\n    Takes an address string, or a dict-like with standard address fields\n    (address_line_1, address_line_2, city, state, postal_code), removes\n    unacceptable special characters, extra spaces, predictable abnormal\n    character sub-strings and phrases, abbreviates directional indicators\n    and street types.  
If applicable, line 2 address elements (ie: Apt, Unit)\n    are separated from line 1 inputs.\n\n    addr_map, if used, must be in the format {standard_key: custom_key} based\n    on standard address keys cited above.\n\n    Returns an address dict with all field values in uppercase format.\n\n    :param address: str or dict-like object containing details of a single\n        address.\n    :type address: str | Mapping[str, str]\n    :param addr_map: mapping of standard address fields to custom key names\n    :type addr_map: Mapping[str, str]\n    :param addtl_funcs: optional sequence of funcs that take string for further\n        processing and return line1 and line2 strings\n    :type addtl_funcs: Sequence[Callable[str, (str, str)]]\n    :param strict: bool indicating strict handling of address component parts\n        city, state and postal_code, vs city and state OR postal_code\n    :param long_hand: bool indicating whether to use long hand versions of\n        directionals and street types in the output.\n    :return: address dict containing parsed and normalized address values.\n    :rtype: Mapping[str, str]\n    \"\"\"\n    if isinstance(address, str):\n        return normalize_addr_str(\n            address, addtl_funcs=addtl_funcs, long_hand=long_hand\n        )\n    else:\n        return normalize_addr_dict(\n            address, addr_map=addr_map, addtl_funcs=addtl_funcs,\n            strict=strict, long_hand=long_hand\n        )\n\n\ndef normalize_addr_str(addr_str: str, line2: str = None, city: str = None,\n                       state: str = None, zipcode: str = None,\n                       addtl_funcs: [Callable] = None,\n                       long_hand: bool = False) -> dict:\n    \"\"\"Normalize a complete or partial address string.\n\n    :param addr_str: str containing address data.\n    :type addr_str: str\n    :param line2: optional str containing occupancy or sub-address data\n        (eg: Unit, Apt, Lot).\n    :type line2: str\n    :param 
city: optional str city name that does not need to be parsed from\n        addr_str.\n    :type city: str\n    :param state: optional str state name that does not need to be parsed from\n        addr_str.\n    :type state: str\n    :param zipcode: optional str postal code that does not need to be parsed\n        from addr_str.\n    :type zipcode: str\n    :param addtl_funcs: optional sequence of funcs that take string for further\n        processing and return line1 and line2 strings\n    :type addtl_funcs: Sequence[Callable[str, (str)]]\n    :param long_hand: bool indicating whether to use long hand versions of\n        directionals and street types in the output.\n    :return: address dict with uppercase parsed and normalized address values.\n    :rtype: Mapping[str, str]\n    \"\"\"\n    # get address parsed into usaddress components.\n    error = None\n    parsed_addr = None\n    addr_str = pre_clean_addr_str(addr_str, normalize_state(state))\n    try:\n        parsed_addr = parse_address_string(addr_str)\n    except (usaddress.RepeatedLabelError, AmbiguousAddressError) as err:\n        error = err\n        if not line2 and addtl_funcs:\n            for func in addtl_funcs:\n                try:\n                    line1, line2 = func(addr_str)\n                    error = False\n                    # send refactored line_1 and line_2 back through processing\n                    return normalize_addr_str(\n                        line1, line2=line2, city=city,\n                        state=state, zipcode=zipcode, long_hand=long_hand\n                    )\n                except ValueError:\n                    # try a different additional processing function\n                    pass\n\n    if parsed_addr and not parsed_addr.get('StreetName'):\n        addr_dict = dict(\n            address_line_1=addr_str, address_line_2=line2, city=city,\n            state=state, postal_code=zipcode\n        )\n        full_addr = format_address_record(addr_dict)\n        
try:\n            parsed_addr = parse_address_string(full_addr)\n        except (usaddress.RepeatedLabelError, AmbiguousAddressError) as err:\n            parsed_addr = None\n            error = err\n\n    if parsed_addr:\n        parsed_addr = normalize_address_components(\n            parsed_addr, long_hand=long_hand\n        )\n        zipcode = get_parsed_values(\n            parsed_addr, zipcode, 'ZipCode', addr_str\n        )\n        city = get_parsed_values(\n            parsed_addr, city, 'PlaceName', addr_str\n        )\n        state = get_parsed_values(\n            parsed_addr, state, 'StateName', addr_str\n        )\n        state = normalize_state(state)\n\n        # assumes if line2 is passed in that it need not be parsed from\n        # addr_str. Primarily used to allow advanced processing of otherwise\n        # unparsable addresses.\n        line2 = line2 if line2 else get_normalized_line_segment(\n            parsed_addr, LINE2_USADDRESS_LABELS\n        )\n        line2 = post_clean_addr_str(line2)\n        # line 1 is fully post cleaned in get_normalized_line_segment.\n        line1 = get_normalized_line_segment(\n            parsed_addr, LINE1_USADDRESS_LABELS\n        )\n        validate_parens_groups_parsed(line1)\n    else:\n        # line1 is set to addr_str so complete dict can be passed to error.\n        line1 = addr_str\n\n    addr_rec = OrderedDict(\n        address_line_1=line1, address_line_2=line2, city=city,\n        state=state, postal_code=zipcode\n    )\n    if error:\n        raise UnParseableAddressError(None, None, addr_rec)\n    else:\n        return addr_rec\n\n\ndef normalize_addr_dict(addr_dict: dict, addr_map: dict = None,\n                        addtl_funcs: [Callable] = None,\n                        strict: bool = True, long_hand: bool = False) -> dict:\n    \"\"\"Normalize an address from dict or dict-like object.\n\n    Assumes addr_dict will have standard address related keys (address_line_1,\n    address_line_2, 
city, state, postal_code), unless addr_map is provided.\n\n    addr_map, if used, must be in the format {standard_key: custom_key} based\n    on standard address keys cited above.\n\n    :param addr_dict: mapping containing address keys and values.\n    :type addr_dict: Mapping\n    :param addr_map: mapping of standard address fields to custom key names\n    :type addr_map: Mapping[str, str]\n    :param addtl_funcs: optional sequence of funcs that take string for further\n        processing and return line1 and line2 strings\n    :type addtl_funcs: Sequence[Callable[str, (str, str)]]\n    :param strict: bool indicating strict handling of address component parts\n        city, state and postal_code, vs city and state OR postal_code\n    :param long_hand: bool indicating whether to use long hand versions of\n        directionals and street types in the output.\n    :return: address dict with normalized, uppercase address values.\n    :rtype: Mapping[str, str]\n    \"\"\"\n    if addr_map:\n        addr_dict = {key: addr_dict.get(val) for key, val in addr_map.items()}\n    addr_dict = validate_address_components(addr_dict, strict=strict)\n\n    # line 1 and line 2 elements are combined to ensure consistent processing\n    # whether the line 2 elements are pre-parsed or included in line 1\n    addr_str = get_addr_line_str(addr_dict, comma_separate=True)\n    postal_code = addr_dict.get('postal_code')\n    zipcode = validate_us_postal_code_format(\n        postal_code, addr_dict\n    ) if postal_code else None\n    city = addr_dict.get('city')\n    state = addr_dict.get('state')\n    try:\n        address = normalize_addr_str(\n            addr_str, city=city, state=state, zipcode=zipcode,\n            addtl_funcs=addtl_funcs, long_hand=long_hand\n        )\n    except AddressNormalizationError:\n        addr_str = get_addr_line_str(\n            addr_dict, comma_separate=True, addr_parts=ADDRESS_KEYS\n        )\n        address = normalize_addr_str(\n            
addr_str, city=city, state=state, zipcode=zipcode,\n            addtl_funcs=addtl_funcs, long_hand=long_hand\n        )\n    return address\n\n\ndef parse_address_string(addr_str: str) -> dict:\n    \"\"\"Separate an address string into its component parts per usaddress.\n\n    Attempts to parse addr_str into its component parts, using usaddress.\n\n    If usaddress identifies the address type as Ambiguous or the resulting\n    OrderedDict includes any keys from AMBIGUOUS_LABELS that would constitute\n    an ambiguous address in the SEED/GBR use case (ie: Recipient) then\n    an AmbiguousAddressError is raised.\n\n    :param addr_str: str address to be processed.\n    :type addr_str: str\n    :return: usaddress OrderedDict\n    :rtype: MutableMapping\n    \"\"\"\n    parsed_results = usaddress.tag(addr_str)\n    parsed_addr = parsed_results[0]\n    # if the address is parseable but some form of ambiguity is found that\n    # may result in data corruption, AmbiguousAddressError is raised.\n    if (parsed_results[1] == 'Ambiguous' or\n            any(key in AMBIGUOUS_LABELS for key in parsed_addr.keys())):\n        raise AmbiguousAddressError()\n    parsed_addr = handle_abnormal_occupancy(parsed_addr, addr_str)\n    return parsed_addr\n\n\ndef handle_abnormal_occupancy(parsed_addr: OrderedDict,\n                              addr_str: str) -> OrderedDict:\n    \"\"\"Handle abnormal occupancy abbreviations that are parsed as street type.\n\n    Evaluates addresses with an Occupancy or Subaddress identifier whose type\n    may be parsed into StreetNamePostType and swaps the StreetNamePostType tag\n    for the OccupancyType tag if necessary.\n\n    For example: Portland Maps uses 'UN' as an abbreviation for 'Unit' which\n    usaddress parses to 'StreetNamePostType' since 'UN' is correctly an\n    abbreviation for 'Union' street type.\n        123 MAIN UN => 123 MAIN UN\n        123 MAIN UN A => 123 MAIN, UNIT A\n        123 MAIN UN UN A => 123 MAIN UN, UNIT A\n\n    :param 
parsed_addr: address parsed into usaddress components\n    :type parsed_addr: OrderedDict\n    :param addr_str: Original address string\n    :type addr_str: str\n    :return: parsed address\n    :rtype: OrderedDict\n    \"\"\"\n    occupancy_id_key = None\n    occupancy_type_key = 'OccupancyType'\n    street_type_key = 'StreetNamePostType'\n    occupancy_type_keys = (occupancy_type_key, 'SubaddressType')\n    occupancy_identifier_keys = ('OccupancyIdentifier', 'SubaddressIdentifier')\n    street_type = parsed_addr.get(street_type_key)\n    if street_type in ABNORMAL_OCCUPANCY_ABBRVS:\n        occupancy_type = None\n        occupancy = None\n        for key in occupancy_type_keys:\n            try:\n                occupancy_type = parsed_addr[key]\n                break\n            except KeyError:\n                pass\n        for key in occupancy_identifier_keys:\n            try:\n                occupancy = parsed_addr[key]\n                occupancy_id_key = key\n                break\n            except KeyError:\n                # fall through so SubaddressIdentifier is also checked\n                pass\n        if occupancy and not occupancy_type:\n            if street_type in occupancy:\n                occupancy = occupancy.replace(street_type, '').strip()\n                del parsed_addr[occupancy_id_key]\n            else:\n                line2 = \"{} {}\".format(street_type, occupancy)\n                addr_str = addr_str.replace(line2, '')\n                parsed_addr = parse_address_string(addr_str)\n            parsed_addr.update({occupancy_type_key: street_type})\n            parsed_addr.update({occupancy_id_key: occupancy})\n    return parsed_addr\n\n\ndef get_parsed_values(parsed_addr: OrderedDict, orig_val: str,\n                      val_label: str, orig_addr_str: str) -> str | None:\n    \"\"\"Get valid values from parsed_addr corresponding to val_label.\n\n    Retrieves values from parsed_addr corresponding to the label supplied in\n    val_label.\n    If a value for val_label is found in parsed_addr AND an 
orig_val is\n    supplied, a single string is returned if the two values match. If only\n    one of the two contains a non-null value, that value is returned.\n    If both values are empty, None is returned.\n    If both values are non-null but not equal, an AmbiguousAddressError is\n    raised. This provides a check against misidentified address\n    components when known values are available. (For example when a city is\n    supplied from the address dict or record being normalized, but usaddress\n    identifies extra information stored in address_line_1 as a PlaceName.)\n\n    :param parsed_addr: address parsed into ordereddict per usaddress.\n    :type parsed_addr: Mapping\n    :param orig_val: related value passed in from incoming data source.\n    :type orig_val: str\n    :param val_label: label to locate in parsed_addr\n    :type val_label: str\n    :param orig_addr_str: address string to pass to error, if applicable.\n    :type orig_addr_str: str\n    :return: str | None\n    \"\"\"\n    val_from_parse = parsed_addr.get(val_label)\n    orig_val = post_clean_addr_str(orig_val)\n    val_from_parse = post_clean_addr_str(val_from_parse)\n    non_null_val_set = {orig_val, val_from_parse} - {None}\n    if len(non_null_val_set) > 1:\n        msg = (\n            f'Parsed {val_label} does not align with submitted value: '\n            f'Parsed: {val_from_parse}. 
Original: {orig_val}'\n        )\n        raise AmbiguousAddressError(None, msg, orig_addr_str)\n    else:\n        return non_null_val_set.pop() if non_null_val_set else None\n\n\ndef normalize_address_components(parsed_addr: OrderedDict,\n                                 long_hand: bool = False) -> OrderedDict:\n    \"\"\"Normalize parsed sections of address as appropriate.\n\n    Processes parsed address through subsets of normalization rules.\n\n    :param parsed_addr: address parsed into ordereddict per usaddress.\n    :param long_hand: bool indicating whether to use long hand versions of\n        directional and street type in the output.\n    :return: parsed_addr with normalization processing applied to elements.\n    :rtype: OrderedDict\n    \"\"\"\n    parsed_addr = normalize_numbered_streets(parsed_addr)\n    parsed_addr = normalize_directionals(parsed_addr, long_hand=long_hand)\n    parsed_addr = normalize_street_types(parsed_addr, long_hand=long_hand)\n    parsed_addr = normalize_occupancy_type(parsed_addr)\n    return parsed_addr\n\n\ndef normalize_numbered_streets(parsed_addr: OrderedDict) -> OrderedDict:\n    \"\"\"Change numbered street names to include missing ordinal indicators.\n\n    :param parsed_addr: address parsed into ordereddict per usaddress.\n    :type parsed_addr: Mapping\n    :return: parsed_addr with ordinal indicators appended to numbered streets.\n    :rtype: dict\n    \"\"\"\n    street_tags = ['StreetName', 'SecondStreetName']\n    for tag in street_tags:\n        post_type_tag = '{}PostType'.format(tag)\n        # limits updates to numbered street names that include a post street\n        # type, since an ordinal indicator would be inappropriate for some\n        # numbered streets (ie. 
Country Road 97).\n        if tag in parsed_addr.keys() and post_type_tag in parsed_addr.keys():\n            try:\n                cardinal = int(parsed_addr[tag])\n                ord_indicator = get_ordinal_indicator(cardinal)\n                parsed_addr[tag] = '{}{}'.format(cardinal, ord_indicator)\n            except ValueError:\n                pass\n    return parsed_addr\n\n\ndef normalize_directionals(parsed_addr: OrderedDict,\n                           long_hand=False) -> OrderedDict:\n    \"\"\"Change directional notations to standard abbreviations.\n\n    :param parsed_addr: address parsed into ordereddict per usaddress.\n    :type parsed_addr: Mapping\n    :param long_hand: bool indicating whether to use long hand versions of\n        directionals in the output.\n    :return: parsed_addr with directionals updated to abbreviated format.\n    :rtype: dict\n    \"\"\"\n    # get the directional related keys from the current address.\n    found_directional_tags = (\n        tag for tag in parsed_addr.keys() if 'Directional' in tag\n    )\n    found_directional_tags = list(found_directional_tags)\n    for found in found_directional_tags:\n        # get the original directional related value per key.\n        dir_str = parsed_addr[found]\n        # remove spaces, punctuation, hyphens etc so two part directions\n        # conform to a single word standard. 
Convert to upper case\n        dir_str = clean_upper(\n            dir_str, exclude=[], removal_cats=STRIP_ALL_CATS, strip_spaces=True\n        )\n        if dir_str in DIRECTIONAL_REPLACEMENTS.keys():\n            dir_str = DIRECTIONAL_REPLACEMENTS[dir_str]\n        if long_hand:\n            dir_str = LONGHAND_DIRECTIONALS[dir_str]\n        parsed_addr[found] = dir_str\n\n    return parsed_addr\n\n\ndef normalize_street_types(parsed_addr: OrderedDict,\n                           long_hand=False) -> OrderedDict:\n    \"\"\"Change street types to accepted abbreviated format.\n\n    No change is made if street types do not conform to common usages per\n    USPS pub 28 appendix C.\n\n    :param parsed_addr: address parsed into ordereddict per usaddress.\n    :type parsed_addr: Mapping\n    :param long_hand: bool indicating whether to use long hand versions of\n        street types in the output.\n    :return: parsed_addr with street types updated to abbreviated format.\n    :rtype: dict\n    \"\"\"\n    # get the *Street*Type keys from the current parsed address.\n    found_type_tags = (\n        tag for tag in parsed_addr.keys() if 'Street' in tag and 'Type' in tag\n    )\n    for found in found_type_tags:\n        street_type = parsed_addr[found]\n        # lookup the appropriate abbrev for the street type found per key.\n        type_abbr = STREET_TYPE_ABBREVIATIONS.get(parsed_addr[found])\n        # update the street type only if a new abbreviation is found.\n        if type_abbr:\n            street_type = type_abbr\n        if long_hand:\n            street_type = LONGHAND_STREET_TYPES[street_type]\n        parsed_addr[found] = street_type\n    return parsed_addr\n\n\ndef normalize_occupancy_type(parsed_addr: OrderedDict,\n                             default=None) -> OrderedDict:\n    \"\"\"Change occupancy types to accepted abbreviated format.\n\n    If there is an occupancy and it does not conform to one of the\n    OCCUPANCY_TYPE_ABBREVIATIONS, occupancy is 
changed to the generic 'UNIT'.\n    OCCUPANCY_TYPE_ABBREVIATIONS contains common abbreviations per\n    USPS pub 28 appendix C, however, OCCUPANCY_TYPE_ABBREVIATIONS can be\n    customized to allow alternate abbreviations to pass through. (see README)\n\n    :param parsed_addr: address parsed into ordereddict per usaddress.\n    :type parsed_addr: Mapping\n    :param default: default abbreviation to use for types that fall outside the\n     standard abbreviations. Default is 'UNIT'\n    :return: parsed_addr with occupancy types updated to abbreviated format.\n    :rtype: dict\n    \"\"\"\n    default = default if default is not None else 'UNIT'\n    occupancy_type_label = 'OccupancyType'\n    occupancy_type = parsed_addr.pop(occupancy_type_label, None)\n    occupancy_type_abbr = (\n        occupancy_type\n        if occupancy_type in OCCUPANCY_TYPE_ABBREVIATIONS.values()\n        else OCCUPANCY_TYPE_ABBREVIATIONS.get(occupancy_type)\n    )\n    occupancy_id = parsed_addr.get('OccupancyIdentifier')\n    if ((occupancy_id and not occupancy_id.startswith('#'))\n            and not occupancy_type_abbr):\n        occupancy_type_abbr = default\n    if occupancy_type_abbr:\n        parsed_list = list(parsed_addr.items())\n        try:\n            index = parsed_list.index(('OccupancyIdentifier', occupancy_id))\n        except ValueError:\n            msg = (\n                'Address has an occupancy type (ie: Apt, Unit, etc) '\n                'but no occupancy identifier (ie: 101, A, etc)'\n            )\n            raise AddressNormalizationError(msg)\n        parsed_list.insert(index, (occupancy_type_label, occupancy_type_abbr))\n        parsed_addr = OrderedDict(parsed_list)\n    return parsed_addr\n\n\ndef normalize_state(state: str | None) -> str | None:\n    \"\"\"Change state string to accepted abbreviated format.\n\n    :param state: string containing state name or abbreviation.\n    :type state: str | None\n    :return: 2 char state abbreviation, or original 
state string if not found\n        in state names or standard long abbreviations.\n    :rtype: str | None\n    \"\"\"\n    if state:\n        state_abbrv = STATE_ABBREVIATIONS.get(state.upper())\n        if state_abbrv:\n            state = state_abbrv\n    return state\n\n\ndef normalize_city(city: str):\n    \"\"\"Replace city name parts found in CITY_ABBREVIATIONS.\"\"\"\n    city = city.split()\n    for i, part in enumerate(city):\n        replacement = CITY_ABBREVIATIONS.get(part.replace('.', ''))\n        if replacement:\n            city[i] = replacement\n    return ' '.join(city)\n\n\ndef get_normalized_line_segment(parsed_addr: OrderedDict,\n                                line_labels: [str]) -> str:\n    \"\"\"Get the normalized string for a single address line segment.\n\n    :param parsed_addr: address parsed into ordereddict per usaddress.\n    :param line_labels: tuple of str labels of all the potential keys related\n        to the desired address segment (ie address_line_1 or address_line_2).\n    :return: space joined values from parsed_addr corresponding to given\n        labels.\n    \"\"\"\n    line_elems = [\n        elem for key, elem in parsed_addr.items() if key in line_labels\n    ]\n    line_str = ' '.join(line_elems) if line_elems else None\n    return post_clean_addr_str(line_str)\n\n\ndef get_addr_line_str(addr_dict: dict, addr_parts: [str] = None,\n                      comma_separate: bool = False) -> str:\n    \"\"\"Get address 'line' elements as a single string.\n\n    Combines 'address_line_1' and 'address_line_2' elements as a single string\n    to ensure no data is lost and line_2 can be processed according to a\n    standard set of rules.\n\n    :param addr_dict: dict containing keys 'address_line_1', 'address_line_2'.\n    :type addr_dict: Mapping\n    :param addr_parts: optional sequence of address elements\n    :type addr_parts: list | tuple\n    :param comma_separate: optional boolean to separate dict values by comma\n        useful for dealing with potentially ambiguous line 2 segments\n    :type comma_separate: bool\n    :return: string combining 'address_line_1' & 'address_line_2' 
values.\n    :rtype: str\n    \"\"\"\n    if not addr_parts:\n        addr_parts = ['address_line_1', 'address_line_2']\n    if not isinstance(addr_parts, (list, tuple)):\n        raise TypeError('addr_parts must be a list or tuple')\n    separator = ', ' if comma_separate else ' '\n    addr_str = separator.join(\n        str(addr_dict[elem]) for elem in addr_parts if addr_dict.get(elem)\n    )\n    return addr_str\n\n\ndef format_address_record(address: dict) -> str:\n    # type AddressRecord -> str\n    \"\"\"Format AddressRecord as string.\"\"\"\n    address_template = Template('$address')\n    address = dict(address)\n    addr_parts = [\n        str(address[field]) for field in ADDRESS_KEYS if address.get(field)\n    ]\n    return address_template.safe_substitute(address=', '.join(addr_parts))\n\n\ndef get_geocoder_normalized_addr(address: dict | str,\n                                 addr_keys: [str] = ADDRESS_KEYS) -> dict:\n    \"\"\"Get geocoder normalized address parsed to dict with addr_keys.\n\n    :param address: string or dict-like containing address data\n    :param addr_keys: optional list of address keys. 
standard list of keys will\n        be used if not supplied\n    :return: dict containing geocoder address result\n    \"\"\"\n    address_line_2 = None\n    geo_addr_dict = {}\n    if not isinstance(address, str):\n        address_line_2 = address.get('address_line_2')\n        address = get_addr_line_str(address, addr_parts=addr_keys)\n    geo_resp = geocoder.google(address)\n    if geo_resp.ok and geo_resp.housenumber:\n        line2 = geo_resp.subpremise or address_line_2\n        geo_addr_dict = {\n            'address_line_1':\n                ' '.join([geo_resp.housenumber, geo_resp.street]),\n            'address_line_2': strip_occupancy_type(line2),\n            'city': geo_resp.city,\n            'state': geo_resp.state,\n            'postal_code': geo_resp.postal\n        }\n        for key, value in geo_addr_dict.items():\n            geo_addr_dict[key] = value.upper() if value else None\n    return geo_addr_dict\n\n\ndef get_ordinal_indicator(number: int) -> str:\n    \"\"\"Get the ordinal indicator suffix applicable to the supplied number.\n\n    Ordinal numbers are words representing position or rank in a sequential\n    order (1st, 2nd, 3rd, etc).\n    Ordinal indicators are the suffix characters (st, nd, rd, th) that, when\n    applied to a numeral (int), denote that it is an ordinal number.\n\n    :param number: numeral to find the ordinal indicator for.\n    :type number: int\n    :return: ordinal indicator appropriate to the number supplied.\n    :rtype: str\n    \"\"\"\n    str_num = str(number)\n    digits = len(str_num)\n    if str_num[-1] == '1' and not (digits >= 2 and str_num[-2:] == '11'):\n        return 'st'\n    elif str_num[-1] == '2' and not (digits >= 2 and str_num[-2:] == '12'):\n        return 'nd'\n    elif str_num[-1] == '3' and not (digits >= 2 and str_num[-2:] == '13'):\n        return 'rd'\n    else:\n        return 'th'\n\n\nclass NormalizeAddress(object):\n    \"\"\"Normalize an address according to USPS pub. 
28 standards.\n\n    Instantiates with an address string, or a dict-like with standard address\n    fields (address_line_1, address_line_2, city, state, postal_code), removes\n    unacceptable special characters, extra spaces, predictable abnormal\n    character sub-strings and phrases, abbreviates directional indicators\n    and street types.  If applicable, line 2 address elements (ie: Apt, Unit)\n    are separated from line 1 inputs.\n\n    addr_map, if used, must be in the format {standard_key: custom_key} based\n    on standard address keys cited above.\n\n    Returns an address dict with all field values in uppercase format.\n\n    :param address: str or dict-like object containing details of a single\n        address.\n    :param addr_map: mapping of standard address fields to custom key names\n    :param addtl_funcs: optional sequence of funcs that take string for further\n        processing and return line1 and line2 strings\n    :type addtl_funcs: Sequence[Callable[str, (str, str)]]\n    :param strict: bool indicating strict handling of address component parts\n        city, state and postal_code, vs city and state OR postal_code\n    :param long_hand: bool indicating whether to use long hand versions of\n        directionals and street types in the output.\n    :return: address dict containing parsed and normalized address values.\n    \"\"\"\n    parse_address_string = staticmethod(parse_address_string)\n    pre_clean_addr_str = staticmethod(pre_clean_addr_str)\n    post_clean_addr_str = staticmethod(post_clean_addr_str)\n    format_address_record = staticmethod(format_address_record)\n    normalize_address_components = staticmethod(normalize_address_components)\n    get_parsed_values = staticmethod(get_parsed_values)\n\n    def __init__(self, address, addr_map=None, addtl_funcs=None,\n                 strict=None, long_hand=False):\n        self.address = address\n        self.addtl_funcs = addtl_funcs\n        self.strict = True if strict is None 
else strict\n        self.long_hand = long_hand\n        if addr_map and not isinstance(self.address, str):\n            self.address = {\n                key: self.address.get(val) for key, val in addr_map.items()\n            }\n\n    @staticmethod\n    def get_normalized_line_1(parsed_addr, line_labels=LINE1_USADDRESS_LABELS):\n        return get_normalized_line_segment(parsed_addr, line_labels)\n\n    @staticmethod\n    def get_normalized_line_2(parsed_addr, line_labels=LINE2_USADDRESS_LABELS):\n        return get_normalized_line_segment(parsed_addr, line_labels)\n\n    def normalize(self):\n        if isinstance(self.address, str):\n            return self.normalize_addr_str(\n                self.address, long_hand=self.long_hand\n            )\n        else:\n            return self.normalize_addr_dict()\n\n    def normalize_addr_str(self, addr_str,  # type: str\n                           line2=None,  # type: Optional[str]\n                           city=None,  # type: Optional[str]\n                           state=None,  # type: Optional[str]\n                           zipcode=None,  # type: Optional[str]\n                           long_hand=False\n                           ):  # noqa\n        # get address parsed into usaddress components.\n        error = None\n        parsed_addr = None\n        addr_str = self.pre_clean_addr_str(addr_str, normalize_state(state))\n        try:\n            parsed_addr = self.parse_address_string(addr_str)\n        except (usaddress.RepeatedLabelError, AmbiguousAddressError) as err:\n            error = err\n            if not line2 and self.addtl_funcs:\n                for func in self.addtl_funcs:\n                    try:\n                        line1, line2 = func(addr_str)\n                        error = False\n                        # send refactored line_1 and line_2 back through\n                        # processing\n                        return self.normalize_addr_str(\n                            
line1, line2=line2,\n                            city=city, state=state, zipcode=zipcode,\n                            long_hand=long_hand\n                        )\n                    except ValueError:\n                        # try a different additional processing function\n                        pass\n\n        if parsed_addr and not parsed_addr.get('StreetName'):\n            addr_dict = dict(\n                address_line_1=addr_str, address_line_2=line2, city=city,\n                state=state, postal_code=zipcode\n            )\n            full_addr = self.format_address_record(addr_dict)\n            try:\n                parsed_addr = self.parse_address_string(full_addr)\n            except (usaddress.RepeatedLabelError,\n                    AmbiguousAddressError) as err:\n                parsed_addr = None\n                error = err\n\n        if parsed_addr:\n            parsed_addr = self.normalize_address_components(\n                parsed_addr, long_hand=long_hand\n            )\n            zipcode = self.get_parsed_values(\n                parsed_addr, zipcode, 'ZipCode', addr_str\n            )\n            city = self.normalize_city(parsed_addr, addr_str, city)\n            state = self.get_parsed_values(\n                parsed_addr, state, 'StateName', addr_str\n            )\n            state = normalize_state(state)\n\n            # assumes if line2 is passed in that it need not be parsed from\n            # addr_str. 
Primarily used to allow advanced processing of\n            # otherwise unparsable addresses.\n            line2 = line2 if line2 else self.get_normalized_line_2(parsed_addr)\n            line2 = self.post_clean_addr_str(line2)\n            # line 1 is fully post cleaned in get_normalized_line_segment.\n            line1 = self.get_normalized_line_1(parsed_addr)\n            validate_parens_groups_parsed(line1)\n        else:\n            # line1 is set to addr_str so complete dict can be passed to error.\n            line1 = addr_str\n\n        addr_rec = OrderedDict(\n            address_line_1=line1, address_line_2=line2, city=city,\n            state=state, postal_code=zipcode\n        )\n        if error:\n            raise UnParseableAddressError(None, None, addr_rec)\n        else:\n            return addr_rec\n\n    def normalize_addr_dict(self):\n        addr_dict = validate_address_components(\n            self.address, strict=self.strict\n        )\n\n        # line 1 and line 2 elements are combined to ensure consistent\n        # processing whether the line 2 elements are pre-parsed or\n        # included in line 1\n        addr_str = get_addr_line_str(addr_dict, comma_separate=True)\n        postal_code = addr_dict.get('postal_code')\n        zipcode = validate_us_postal_code_format(\n            postal_code, addr_dict\n        ) if postal_code else None\n        city = addr_dict.get('city')\n        state = addr_dict.get('state')\n        try:\n            address = self.normalize_addr_str(\n                addr_str, city=city, state=state,\n                zipcode=zipcode, long_hand=self.long_hand\n            )\n        except AddressNormalizationError:\n            addr_str = get_addr_line_str(\n                addr_dict, comma_separate=True, addr_parts=ADDRESS_KEYS\n            )\n            address = self.normalize_addr_str(\n                addr_str, city=city, state=state,\n                zipcode=zipcode, long_hand=self.long_hand\n           
 )\n        return address\n\n    def normalize_city(self, parsed_addr, addr_str, city=None):\n        return self.get_parsed_values(\n            parsed_addr, city, 'PlaceName', addr_str\n        )\n"
  },
  {
    "path": "scourgify/tests/__init__.py",
    "content": ""
  },
  {
    "path": "scourgify/tests/config/__init__.py",
    "content": ""
  },
  {
    "path": "scourgify/tests/config/address_constants.yaml",
    "content": "\nKNOWN_ODDITIES:\n    'developed by HOST': ''\n    ', UN ': ' UNIT '\n\nOCCUPANCY_TYPE_ABBREVIATIONS:\n    'UN': 'UNIT'"
  },
  {
    "path": "scourgify/tests/test_address_normalization.py",
    "content": "#!/usr/bin/env python\n# encoding: utf-8\n\"\"\"\ncopyright (c) 2016-2019 Earth Advantage.\nAll rights reserved\n\nUnit tests for scourgify.\n\"\"\"\n\n# Imports from Standard Library\nfrom collections import OrderedDict\nfrom unittest import TestCase, mock\n\n# Imports from Third Party Modules\nfrom yamlconf import ConfigError\n\n# Local Imports\nfrom scourgify import address_constants\nfrom scourgify.cleaning import (\n    clean_ambiguous_street_types,\n    clean_period_char,\n    post_clean_addr_str,\n    pre_clean_addr_str,\n)\nfrom scourgify.exceptions import (\n    AddressNormalizationError,\n    AddressValidationError,\n    AmbiguousAddressError,\n    IncompleteAddressError,\n    UnParseableAddressError,\n)\nfrom scourgify.normalize import (\n    get_addr_line_str,\n    get_geocoder_normalized_addr,\n    get_normalized_line_segment,\n    get_ordinal_indicator,\n    get_parsed_values,\n    normalize_addr_dict,\n    normalize_addr_str,\n    normalize_address_record,\n    normalize_directionals,\n    normalize_numbered_streets,\n    normalize_occupancy_type,\n    normalize_state,\n    normalize_street_types,\n    parse_address_string,\n    NormalizeAddress\n)\nfrom scourgify.validations import (\n    validate_address_components,\n    validate_parens_groups_parsed,\n    validate_us_postal_code_format,\n)\n\n# Constants\nSERVICE = 'GBR Test Normalization'\n# Helper Functions & Classes\n\n\n# Tests\nclass TestAddressNormalization(TestCase):\n    \"\"\"Unit tests for scourgify\"\"\"\n    # pylint:disable=too-many-arguments\n\n    def setUp(self):\n        \"\"\"setUp\"\"\"\n        self.expected = dict(\n            address_line_1='123 NOWHERE ST',\n            address_line_2='STE 0',\n            city='BORING',\n            state='OR',\n            postal_code='97009'\n        )\n        self.address_dict = dict(\n            address_line_1='123 Nowhere St',\n            address_line_2='Suite 0',\n            city='Boring',\n            
state='OR',\n            postal_code='97009'\n        )\n\n        self.ordinal_addr = dict(\n            address_line_1='4333 NE 113th',\n            city='Boring',\n            state='OR',\n            postal_code='97009'\n        )\n        self.ordinal_expected = dict(\n            address_line_1='4333 NE 113TH',\n            address_line_2=None,\n            city='BORING',\n            state='OR',\n            postal_code='97009'\n        )\n        self.parseable_addr_str = '123 Nowhere Street Suite 0 Boring OR 97009'\n        self.parsed_addr = OrderedDict([\n            ('AddressNumber', '123'),\n            ('StreetName', 'NOWHERE'),\n            ('StreetNamePostType', 'STREET'),\n            ('OccupancyType', 'SUITE'),\n            ('OccupancyIdentifier', '0'),\n            ('PlaceName', 'BORING'),\n            ('StateName', 'OR'),\n            ('ZipCode', '97009')\n        ])\n        self.hash_tag = '999 Nowhere Street # 12 Boring OR 97009'\n        self.hash_expected = dict(\n            address_line_1='999 NOWHERE ST',\n            address_line_2='# 12',\n            city='BORING',\n            state='OR',\n            postal_code='97009'\n        )\n        self.unparesable_addr_str = '6000 SW 1000TH AVE  (BLDG  A5 RIGHT)'\n\n        self.direction_expected = dict(\n            address_line_1='123 SW NOWHERE ST',\n            address_line_2='STE 0',\n            city='BORING',\n            state='OR',\n            postal_code='97009'\n        )\n        self.long_hand_expected = dict(\n            address_line_1='123 SOUTHWEST NOWHERE STREET',\n            address_line_2='STE 0',\n            city='BORING',\n            state='OR',\n            postal_code='97009'\n        )\n        self.abnormal_direction = dict(\n            address_line_1='123 South-West Nowhere St',\n            address_line_2='Suite 0',\n            city='Boring',\n            state='OR',\n            postal_code='97009'\n        )\n\n    def 
test_normalize_address_record(self):\n        \"\"\"Test normalize_address_record function.\"\"\"\n        result = normalize_address_record(self.parseable_addr_str)\n        self.assertDictEqual(self.expected, result)\n\n        result = normalize_address_record(self.address_dict)\n        self.assertDictEqual(self.expected, result)\n\n        result = normalize_address_record(self.ordinal_addr)\n        self.assertDictEqual(self.ordinal_expected, result)\n\n        result = normalize_address_record(self.hash_tag)\n        self.assertDictEqual(self.hash_expected, result)\n\n        result = normalize_address_record(self.abnormal_direction)\n        self.assertDictEqual(self.direction_expected, result)\n\n        result = normalize_address_record(\n            self.abnormal_direction, long_hand=True\n        )\n        self.assertDictEqual(self.long_hand_expected, result)\n\n    def test_normalize_class(self):\n        \"\"\"Test NormalizeAddress class normalize method.\"\"\"\n        result = NormalizeAddress(self.parseable_addr_str).normalize()\n        self.assertDictEqual(self.expected, result)\n\n        result = NormalizeAddress(self.address_dict).normalize()\n        self.assertDictEqual(self.expected, result)\n\n        result = NormalizeAddress(self.ordinal_addr).normalize()\n        self.assertDictEqual(self.ordinal_expected, result)\n\n        result = NormalizeAddress(self.hash_tag).normalize()\n        self.assertDictEqual(self.hash_expected, result)\n\n        result = NormalizeAddress(self.abnormal_direction).normalize()\n        self.assertDictEqual(self.direction_expected, result)\n\n        result = NormalizeAddress(\n            self.abnormal_direction, long_hand=True\n        ).normalize()\n        self.assertDictEqual(self.long_hand_expected, result)\n\n    def test_normalize_addr_str(self):\n        \"\"\"Test normalize_addr_str function.\"\"\"\n        result = normalize_addr_str(self.parseable_addr_str)\n        self.assertDictEqual(self.expected, 
result)\n\n        broken_line1 = '6000 SW 1000TH AVE '\n        broken_line2 = '(BLDG  A1 RIGHT)'\n        result = normalize_addr_str(\n            broken_line1, line2=broken_line2,\n            city='Portland', state='OR', zipcode='97203'\n        )\n        expected = {\n            'address_line_1': '6000 SW 1000TH AVE',\n            'address_line_2': 'BLDG A1 RIGHT',\n            'state': 'OR', 'city': 'PORTLAND',\n            'postal_code': '97203'\n        }\n        self.assertDictEqual(expected, result)\n\n        def addtl_test_func(addr_str):\n            if 'BLDG A1' in addr_str:\n                return '123 NOWHERE STREET', 'BLDG A1 RIGHT'\n            else:\n                raise ValueError\n\n        addtl_processing = '123 Nowhere Street (BLDG A1 RIGHT)'\n        expected = {\n            'address_line_1': '123 NOWHERE ST',\n            'address_line_2': 'BLDG A1 RIGHT',\n            'state': 'OR', 'city': 'PORTLAND',\n            'postal_code': '97203'\n        }\n        result = normalize_addr_str(\n            addtl_processing, city='Portland', state='OR', zipcode='97203',\n            addtl_funcs=[addtl_test_func]\n        )\n        self.assertDictEqual(expected, result)\n\n        self.assertRaises(\n            UnParseableAddressError,\n            normalize_addr_str,\n            self.unparesable_addr_str,\n            city='Portland', state='OR', zipcode='97203',\n            addtl_funcs=[addtl_test_func]\n\n        )\n\n    def test_normalize_addr_dict(self):\n        \"\"\"Test normalize_addr_dict function.\"\"\"\n        result = normalize_addr_dict(self.address_dict)\n        self.assertDictEqual(self.expected, result)\n\n        alternate_dict = dict(\n            address1='123 Nowhere St',\n            address2='Suite 0',\n            city='Boring',\n            state='OR',\n            zip='97009'\n        )\n        dict_map = {\n            'address_line_1': 'address1',\n            'address_line_2': 'address2',\n            
'city': 'city',\n            'state': 'state',\n            'postal_code': 'zip'\n        }\n        result = normalize_addr_dict(alternate_dict, addr_map=dict_map)\n        self.assertDictEqual(self.expected, result)\n\n    def test_parse_address_string(self):\n        \"\"\"Test parse_address_string function.\"\"\"\n        result = parse_address_string(self.parseable_addr_str)\n        self.assertIsInstance(result, OrderedDict)\n\n        ambig_addr_str = 'AWBREY VILLAGE'\n        with self.assertRaises(AmbiguousAddressError):\n            parse_address_string(ambig_addr_str)\n\n    def test_normalize_occupancies(self):\n        \"\"\"Test normalize_addr_dict function with handling for occupancy\n        type oddities.  This is based on a real life incident; the original\n        behavior to allow non-standard unit types to pass through resulted\n        in an address validation service also allowing the address to pass\n        through even though no unit should have existed on the home.\n        \"\"\"\n        dict_map = {\n            'address_line_1': 'address1',\n            'address_line_2': 'address2',\n            'city': 'city',\n            'state': 'state',\n            'postal_code': 'zip'\n        }\n\n        weird_unit = dict(\n            address1='123 Nowhere St',\n            address2='Ave 345',\n            city='Boring',\n            state='OR',\n            zip='97009'\n        )\n        expected = dict(\n            address_line_1='123 NOWHERE ST',\n            address_line_2='UNIT 345',\n            city='BORING',\n            state='OR',\n            postal_code='97009'\n        )\n        result = normalize_addr_dict(weird_unit, addr_map=dict_map)\n        self.assertDictEqual(expected, result)\n\n        late_unit_add = dict(\n            address1='123 Nowhere St',\n            address2='345',\n            city='Boring',\n            state='OR',\n            zip='97009'\n        )\n        result = normalize_addr_dict(late_unit_add, 
addr_map=dict_map)\n        self.assertDictEqual(expected, result)\n\n        expected = dict(\n            address_line_1='123 NOWHERE ST',\n            address_line_2='# 345',\n            city='BORING',\n            state='OR',\n            postal_code='97009'\n        )\n\n        hashtag_unit = dict(\n            address1='123 Nowhere St',\n            address2='# 345',\n            city='Boring',\n            state='OR',\n            zip='97009'\n        )\n        result = normalize_addr_dict(hashtag_unit, addr_map=dict_map)\n        self.assertDictEqual(expected, result)\n\n        hashtag_unit = dict(\n            address1='123 Nowhere St',\n            address2='#345',\n            city='Boring',\n            state='OR',\n            zip='97009'\n        )\n        result = normalize_addr_dict(hashtag_unit, addr_map=dict_map)\n        self.assertDictEqual(expected, result)\n\n        expected = dict(\n            address_line_1='123 NOWHERE ST',\n            address_line_2='APT 345',\n            city='BORING',\n            state='OR',\n            postal_code='97009'\n        )\n\n        abbreviation = dict(\n            address1='123 Nowhere St',\n            address2='Apt 345',\n            city='Boring',\n            state='OR',\n            zip='97009'\n        )\n        result = normalize_addr_dict(abbreviation, addr_map=dict_map)\n        self.assertDictEqual(expected, result)\n\n        full_name = dict(\n            address1='123 Nowhere St',\n            address2='Apartment 345',\n            city='Boring',\n            state='OR',\n            zip='97009'\n        )\n        result = normalize_addr_dict(full_name, addr_map=dict_map)\n        self.assertDictEqual(expected, result)\n\n\nclass TestAddressNormalizationUtils(TestCase):\n    \"\"\"Unit tests for scourgify utils\"\"\"\n\n    def setUp(self):\n\n        self.address_dict = dict(\n            address_line_1='123 Nowhere St',\n            address_line_2='Suite 0',\n            
city='Boring',\n            state='OR',\n            postal_code='97009'\n        )\n        self.parseable_addr = '123 Nowhere Street Suite 0 Boring OR 97009'\n        self.parsed_addr = OrderedDict([\n            ('AddressNumber', '123'),\n            ('StreetName', 'NOWHERE'),\n            ('StreetNamePostType', 'STREET'),\n            ('OccupancyType', 'SUITE'),\n            ('OccupancyIdentifier', '0'),\n            ('PlaceName', 'BORING'),\n            ('StateName', 'OR'),\n            ('ZipCode', '97009')\n        ])\n\n        self.unparesable_addr = '6000 SW 1000TH AVE  (BLDG  A1 RIGHT)'\n\n        self.unparesable_addr_dict = OrderedDict([\n            ('AddressNumber', '6000'),\n            ('StreetNamePreDirectional', 'SW'),\n            ('StreetName', '1000TH'),\n            ('StreetNamePostType', 'AVE'),\n            ('SubaddressType', 'BLDG'),\n            ('SubaddressIdentifier', 'A1'),\n            ('SubaddressType', 'RIGHT')\n        ])\n\n    def test_get_parsed_values(self):\n        \"\"\"Test get_parsed_values function.\"\"\"\n        expected = 'BORING'\n        result = get_parsed_values(self.parsed_addr, 'Boring',\n                                   'PlaceName', self.parseable_addr)\n        self.assertEqual(expected, result)\n\n        expected = 'ONE VALUE PRESENT'\n        result = get_parsed_values(self.parsed_addr, 'One Value Present',\n                                   'LandmarkName', self.parseable_addr)\n        self.assertEqual(expected, result)\n\n        result = get_parsed_values(self.parsed_addr, None,\n                                   'LandmarkName', self.parseable_addr)\n        self.assertIsNone(result)\n\n        with self.assertRaises(AmbiguousAddressError):\n            get_parsed_values(self.parsed_addr, 'UnMatched City',\n                              'PlaceName', self.parseable_addr)\n\n    def test_get_norm_line_segment(self):\n        \"\"\"Test get normalized_line_segment function.\"\"\"\n        result = 
get_normalized_line_segment(self.parsed_addr,\n                                             ['StreetName', 'AddressNumber'])\n        expected = '{} {}'.format(self.parsed_addr['AddressNumber'],\n                                  self.parsed_addr['StreetName'])\n        self.assertEqual(expected, result)\n\n        result = get_normalized_line_segment(\n            self.parsed_addr,\n            ['StreetName', 'StreetNamePostType', 'IntersectionSeparator']\n        )\n        expected = '{} {}'.format(self.parsed_addr['StreetName'],\n                                  self.parsed_addr['StreetNamePostType'])\n        self.assertEqual(expected, result)\n\n    def test_normalize_numbered_streets(self):\n        \"\"\"Test normalize_numbered_streets function.\"\"\"\n        numbered_addr = OrderedDict([\n            ('AddressNumber', '123'),\n            ('StreetName', '100'),\n            ('StreetNamePostType', 'STREET')\n        ])\n        county_road = OrderedDict([\n            ('AddressNumber', '123'),\n            ('StreetNamePreType', 'COUNTY ROAD'),\n            ('StreetName', '100')\n        ])\n        string_addr = OrderedDict([\n            ('AddressNumber', '123'),\n            ('StreetName', '91st'),\n            ('StreetNamePostType', 'STREET')\n        ])\n\n        expected = '{}{}'.format(\n            numbered_addr['StreetName'], 'th'\n        )\n        result = normalize_numbered_streets(numbered_addr)\n        self.assertEqual(expected, result['StreetName'])\n\n        result = normalize_numbered_streets(county_road)\n        self.assertDictEqual(county_road, result)\n\n        result = normalize_numbered_streets(string_addr)\n        self.assertDictEqual(string_addr, result)\n\n    def test_normalize_directionals(self):\n        \"\"\"Test normalize_directionals function.\"\"\"\n        unabbr_directional = OrderedDict([\n            ('AddressNumber', '123'),\n            ('StreetNamePreDirectional', 'South West', ),\n            ('StreetName', 
'100'),\n            ('StreetNamePostType', 'STREET')\n        ])\n        abbrev_directional = OrderedDict([\n            ('AddressNumber', '123'),\n            ('StreetNamePreDirectional', 'SW'),\n            ('StreetNamePreType', 'COUNTY ROAD'),\n            ('StreetName', '100')\n        ])\n        no_directional = OrderedDict([\n            ('AddressNumber', '123'),\n            ('StreetName', '91st'),\n            ('StreetNamePostType', 'STREET')\n        ])\n\n        expected = 'SW'\n        result = normalize_directionals(unabbr_directional)\n        self.assertEqual(expected, result['StreetNamePreDirectional'])\n\n        result = normalize_directionals(abbrev_directional)\n        self.assertDictEqual(abbrev_directional, result)\n\n        result = normalize_directionals(no_directional)\n        self.assertDictEqual(no_directional, result)\n\n        expected = 'SOUTHWEST'\n        result = normalize_directionals(abbrev_directional, long_hand=True)\n        self.assertEqual(expected, result['StreetNamePreDirectional'])\n\n    def test_normalize_street_types(self):\n        \"\"\"Test normalize_street_types function.\"\"\"\n        unabbr_type = OrderedDict([\n            ('AddressNumber', '123'),\n            ('StreetNamePreDirectional', 'SW', ),\n            ('StreetName', 'MAIN'),\n            ('StreetNamePostType', 'STREET')\n        ])\n        abbrev_type = OrderedDict([\n            ('AddressNumber', '123'),\n            ('StreetNamePreDirectional', 'SW', ),\n            ('StreetName', 'MAIN'),\n            ('StreetNamePostType', 'AVE')\n        ])\n        typo_type = OrderedDict([\n            ('AddressNumber', '123'),\n            ('StreetNamePreDirectional', 'SW', ),\n            ('StreetName', 'MAIN'),\n            ('StreetNamePostType', 'STROET')\n        ])\n        no_type = OrderedDict([\n            ('AddressNumber', '123'),\n            ('StreetNamePreDirectional', 'SW', ),\n            ('StreetName', 'MAIN'),\n        ])\n\n        
expected = 'ST'\n        result = normalize_street_types(unabbr_type)\n        self.assertEqual(expected, result['StreetNamePostType'])\n\n        result = normalize_street_types(abbrev_type)\n        self.assertDictEqual(abbrev_type, result)\n\n        result = normalize_street_types(typo_type)\n        self.assertDictEqual(typo_type, result)\n\n        result = normalize_street_types(no_type)\n        self.assertDictEqual(no_type, result)\n\n        expected = 'AVENUE'\n        result = normalize_street_types(abbrev_type, long_hand=True)\n        self.assertEqual(expected, result['StreetNamePostType'])\n\n    def test_normalize_occupancy_type(self):\n        \"\"\"Test normalize_occupancy_type function.\"\"\"\n        expected = 'STE'\n        result = normalize_occupancy_type(self.parsed_addr)\n        self.assertEqual(expected, result['OccupancyType'])\n\n    def test_normalize_state(self):\n        \"\"\"Test normalize_state function\"\"\"\n        state = 'ore'\n        expected = 'OR'\n        result = normalize_state(state)\n        self.assertEqual(expected, result)\n\n        state = 'oregano'\n        expected = state\n        result = normalize_state(state)\n        self.assertEqual(expected, result)\n\n        self.assertIsNone(normalize_state(None))\n\n    def test_pre_clean_addr_str(self):\n        \"\"\"Test pre_clean_addr_str function\"\"\"\n        odd_addr = '123 Nowhere    Street, Suite 0; @Boring OR 97009'\n        # we're leaving commas in the pre-clean until norm can be revisited\n        expected = '123 Nowhere Street, Suite 0 Boring OR 97009'.upper()\n        # expected = '123 Nowhere Street Suite 0 Boring OR 97009'.upper()\n        result = pre_clean_addr_str(odd_addr)\n        self.assertEqual(expected, result)\n\n    def test_post_clean_addr_str(self):\n        \"\"\"Test post_clean_addr_str function.\"\"\"\n        addr_str = '(100-104) SW NO   WHERE st'\n        expected = '100-104 SW NO WHERE ST'\n        result = 
post_clean_addr_str(addr_str)\n        self.assertEqual(expected, result)\n\n        self.assertIsNone(post_clean_addr_str(None))\n        self.assertEqual('', post_clean_addr_str(''))\n\n    def test_validate_address(self):\n        \"\"\"Test validate_address_components function.\"\"\"\n        expected = self.address_dict\n        result = validate_address_components(self.address_dict)\n        self.assertEqual(expected, result)\n\n        minus_line1 = dict(\n            address_line_1=None,\n            address_line_2='Suite 0',\n            city='Boring',\n            state='OR',\n            postal_code='97009'\n        )\n        with self.assertRaises(IncompleteAddressError):\n            validate_address_components(minus_line1)\n\n        minus_zip = dict(\n            address_line_1='123 NoWhere St',\n            address_line_2='Suite 0',\n            city='Boring',\n            state='OR',\n            postal_code=None\n        )\n        with self.assertRaises(IncompleteAddressError):\n            validate_address_components(minus_zip)\n\n        minus_city_state = dict(\n            address_line_1='123 NoWhere St',\n            address_line_2='Suite 0',\n            city=None,\n            state=None,\n            postal_code='97009'\n        )\n\n        with self.assertRaises(IncompleteAddressError):\n            validate_address_components(minus_city_state)\n\n        minus_city_state_zip = dict(\n            address_line_1='123 NoWhere St',\n            address_line_2='Suite 0',\n            city=None,\n            state=None,\n            postal_code=None\n        )\n        with self.assertRaises(IncompleteAddressError):\n            validate_address_components(minus_city_state_zip)\n\n    def test_validate_postal_code(self):\n        \"\"\"Test validate_us_postal_code_format\"\"\"\n\n        with self.assertRaises(AddressValidationError):\n            zip_five = 'AAAAA'\n            validate_us_postal_code_format(zip_five, 
self.address_dict)\n\n        with self.assertRaises(AddressValidationError):\n            zip_five = '97219-AAAA'\n            validate_us_postal_code_format(zip_five, self.address_dict)\n\n        with self.assertRaises(AddressValidationError):\n            zip_plus = '97219-000100'\n            validate_us_postal_code_format(zip_plus, self.address_dict)\n\n        with self.assertRaises(AddressValidationError):\n            zip_plus = '97219-0001-00'\n            validate_us_postal_code_format(zip_plus, self.address_dict)\n\n        with self.assertRaises(AddressValidationError):\n            zip_five = '9721900'\n            validate_us_postal_code_format(zip_five, self.address_dict)\n\n        zip_five = '972'\n        expected = '00972'\n        result = validate_us_postal_code_format(zip_five, self.address_dict)\n        self.assertEqual(expected, result)\n\n        zip_plus = '97219-00'\n        expected = '97219-0000'\n        result = validate_us_postal_code_format(zip_plus, self.address_dict)\n        self.assertEqual(expected, result)\n\n        zip_plus = '972-0001'\n        expected = '00972-0001'\n        result = validate_us_postal_code_format(zip_plus, self.address_dict)\n        self.assertEqual(expected, result)\n\n        zip_plus = '972190001'\n        expected = '97219-0001'\n        result = validate_us_postal_code_format(zip_plus, self.address_dict)\n        self.assertEqual(expected, result)\n\n        expected = '97219'\n        result = validate_us_postal_code_format(expected, self.address_dict)\n        self.assertEqual(expected, result)\n\n    def test_get_addr_line_str(self):\n        \"\"\"Test get_addr_line_str function.\"\"\"\n        expected = '{} {}'.format(\n            self.address_dict['address_line_1'],\n            self.address_dict['address_line_2']\n        )\n        result = get_addr_line_str(self.address_dict)\n        self.assertEqual(expected, result)\n\n        no_line_2 = {\n            'address_line_1': 'address 
line 1'\n        }\n        expected = no_line_2['address_line_1']\n        result = get_addr_line_str(no_line_2)\n        self.assertEqual(expected, result)\n\n        empty_line_2 = {\n            'address_line_1': 'address line 1',\n            'address_line_2': None\n        }\n        expected = no_line_2['address_line_1']\n        result = get_addr_line_str(empty_line_2)\n        self.assertEqual(expected, result)\n\n        with self.assertRaises(TypeError):\n            get_addr_line_str(self.address_dict, addr_parts='line1')\n\n    @mock.patch(\n        'scourgify.normalize.geocoder'\n    )\n    def test_get_geocoder_normalized_addr(self, mock_geocoder):\n        \"\"\"Test get_geocoder_normalized_addr\"\"\"\n        geo_addr = mock.MagicMock()\n        geo_addr.ok = True\n        geo_addr.housenumber = '1234'\n        geo_addr.street = \"Main\"\n        geo_addr.subpremise = ''\n        geo_addr.city = 'Boring'\n        geo_addr.state = 'OR'\n        geo_addr.postal = '97000'\n\n        mock_geocoder.google.return_value = geo_addr\n\n        address = {\n            'address_line_1': '1234 Main',\n            'address_line_2': None,\n            'city': 'Boring',\n            'state': 'OR',\n            'postal_code': '97000'\n        }\n        addr_str_return_value = \"1234 Main Boring OR 97000\"\n        get_geocoder_normalized_addr(address)\n        mock_geocoder.google.assert_called_with(addr_str_return_value)\n\n    def test_get_ordinal_indicator(self):\n        \"\"\"Test get_ordinal_indicator\"\"\"\n        result = get_ordinal_indicator(11)\n        expected = 'th'\n        self.assertEqual(expected, result)\n\n        result = get_ordinal_indicator(112)\n        self.assertEqual(expected, result)\n\n        result = get_ordinal_indicator(3113)\n        self.assertEqual(expected, result)\n\n        result = get_ordinal_indicator(1)\n        expected = 'st'\n        self.assertEqual(expected, result)\n\n        result = get_ordinal_indicator(22)\n 
       expected = 'nd'\n        self.assertEqual(expected, result)\n\n        result = get_ordinal_indicator(31243)\n        expected = 'rd'\n        self.assertEqual(expected, result)\n\n    def test_clean_period_char(self):\n        \"\"\"Test clean_period_char\"\"\"\n        text = \"49.5 blah.blah.\"\n        expected = \"49.5 blahblah\"\n        result = clean_period_char(text)\n        self.assertEqual(expected, result)\n\n    def test_validate_parens_group_parsed(self):\n        \"\"\"Test validate_parens_groups_parsed\"\"\"\n        broken_line1 = '6000 SW 1000TH AVE'\n        result = validate_parens_groups_parsed(broken_line1)\n        self.assertEqual(broken_line1, result)\n\n        bad_addr = '10000 NE 8TH (ROW HOUSE)'\n        with self.assertRaises(AmbiguousAddressError):\n            validate_parens_groups_parsed(bad_addr)\n\n    def test_clean_ambiguous_street_types(self):\n        \"\"\" Test clean_ambiguous_street_types\"\"\"\n        problem_addr = \"1234 BROKEN CT\"\n        expected = \"1234 BROKEN COURT\"\n        result = clean_ambiguous_street_types(problem_addr)\n        self.assertEqual(expected, result)\n\n        normal_addr = \"1234 NORMAL ST\"\n        result = clean_ambiguous_street_types(normal_addr)\n        self.assertEqual(normal_addr, result)\n\n    def test_address_normalization_error(self):\n        error_msg = 'Error Message'\n        error_title = 'ERROR TITLE'\n        addtl_args = 'Addition info'\n        expected = \"{}: {}, {}\".format(error_title, error_msg, addtl_args)\n        error = AddressNormalizationError(error_msg, error_title, addtl_args)\n        self.assertEqual(expected, str(error))\n\n    @mock.patch.object(address_constants.NormalizationConfig, 'get')\n    def test_set_constants(self, mock_config_get):\n        new_addr_keys = ['new keys']\n        new_problem_st = {\n            \"PS\": 'STREET'\n        }\n        mock_config_get.side_effect = (\n            'update', new_addr_keys,\n            None, 
None, None, None, None,\n            new_problem_st, None, None,\n        )\n        address_constants.set_address_constants()\n        self.assertEqual(address_constants.ADDRESS_KEYS, new_addr_keys)\n        self.assertIn(\"PS\", address_constants.PROBLEM_ST_TYPE_ABBRVS.keys())\n\n        mock_config_get.side_effect = (\n            'replace', new_addr_keys,\n            None, None, None, None, None,\n            new_problem_st, None, None,\n        )\n        address_constants.set_address_constants()\n        self.assertEqual(address_constants.ADDRESS_KEYS, new_addr_keys)\n        self.assertDictEqual(\n            new_problem_st, address_constants.PROBLEM_ST_TYPE_ABBRVS\n        )\n\n        mock_config_get.side_effect = (\n            'invalid', new_addr_keys,\n            None, None, None, None, None,\n            new_problem_st, None, None,\n        )\n        self.assertRaises(\n            ConfigError, address_constants.set_address_constants\n        )\n\n    def test_handle_abnormal_occupancy(self):\n        addr_str = '123 SW MAIN UN'\n        expected = OrderedDict([\n            ('AddressNumber', '123'),\n            ('StreetNamePreDirectional', 'SW'),\n            ('StreetName', 'MAIN'),\n            ('StreetNamePostType', 'UN'),\n        ])\n        result = parse_address_string(addr_str)\n        self.assertEqual(expected, result)\n\n        addr_str = '123 SW MAIN UN A'\n        expected = OrderedDict([\n            ('AddressNumber', '123'),\n            ('StreetNamePreDirectional', 'SW'),\n            ('StreetName', 'MAIN'),\n            ('OccupancyType', 'UN'),\n            ('OccupancyIdentifier', 'A')\n        ])\n        result = parse_address_string(addr_str)\n        self.assertEqual(expected, result)\n\n        addr_str = '123 SW MAIN UN, UN A'\n        expected = OrderedDict([\n            ('AddressNumber', '123'),\n            ('StreetNamePreDirectional', 'SW'),\n            ('StreetName', 'MAIN'),\n            ('StreetNamePostType', 
'UN'),\n            ('OccupancyType', 'UN'),\n            ('OccupancyIdentifier', 'A')\n        ])\n        result = parse_address_string(addr_str)\n        self.assertEqual(expected, result)\n"
  },
  {
    "path": "scourgify/tests/test_cleaning.py",
    "content": "#!/usr/bin/env python\r\n# encoding: utf-8\r\n\"\"\"\r\ncopyright (c) 2016-2019 Earth Advantage.\r\nAll rights reserved\r\n\"\"\"\r\n\r\n# Imports from Standard Library\r\nfrom unittest import TestCase\r\n\r\n# Local Imports\r\nfrom scourgify.cleaning import strip_occupancy_type\r\n\r\n\r\nclass CleaningTests(TestCase):\r\n\r\n    def test_strip_occupancy_type(self):\r\n        expected = '33'\r\n\r\n        line2 = 'Unit 33'\r\n        result = strip_occupancy_type(line2)\r\n        self.assertEqual(result, expected)\r\n\r\n        line2 = 'Apartment 33'\r\n        result = strip_occupancy_type(line2)\r\n        self.assertEqual(result, expected)\r\n\r\n        line2 = 'Unit #33'\r\n        result = strip_occupancy_type(line2)\r\n        self.assertEqual(result, expected)\r\n\r\n        line2 = 'Building 3 Unit 33'\r\n        result = strip_occupancy_type(line2)\r\n        self.assertEqual(result, expected)\r\n\r\n        line2 = 'Building 3 UN 33'\r\n        result = strip_occupancy_type(line2)\r\n        self.assertEqual(result, expected)\r\n\r\n        line2 = '33'\r\n        result = strip_occupancy_type(line2)\r\n        self.assertEqual(result, expected)\r\n"
  },
  {
    "path": "scourgify/validations.py",
    "content": "#!/usr/bin/env python\n# encoding: utf-8\n\"\"\"\ncopyright (c) 2016-2017 Earth Advantage.\nAll rights reserved\n..codeauthor::Fable Turas <fable@rainsoftware.tech>\n\n[ INSERT DOC STRING ]  # TODO\n\"\"\"\n# Imports from Standard Library\nimport re\nfrom typing import Mapping, Union\n\n# Local Imports\n# Public Classes and Functions\nfrom scourgify.cleaning import post_clean_addr_str\nfrom scourgify.exceptions import (\n    AddressValidationError,\n    AmbiguousAddressError,\n    IncompleteAddressError,\n)\n\n# Setup\n\n# Constants\n\n# Data Structure Definitions\n\n# Private Functions\n\n\ndef _get_substrings_with_regex(string, pattern=None):\n    # type: (str) -> list\n    \"\"\"Get substring matching regex rule.\n\n    :param string: string to search for substring\n    :type string: str\n    :param pattern: regex pattern\n    :type pattern: regex\n    :return: str matching pattern search or None\n    :rtype: list\n    \"\"\"\n    pattern = re.compile(pattern)\n    match = re.findall(pattern, string)\n    return match\n\n\n# Public Functions\ndef validate_address_components(address_dict, strict=True):\n    # type: (Mapping[str, str]) -> Mapping[str, str]\n    \"\"\"Validate non-null values for minimally viable address elements.\n\n    All addresses should have at least an address_line_1 and a postal_code\n    or a city and state.\n\n    :param address_dict: dict containing address components having keys\n        'address_line_1', 'postal_code', 'city', 'state'\n    :type address_dict: Mapping\n    :param strict: bool indicating strict handling of components address parts\n        city, state and postal_code, vs city and state OR postal_code\n    :return: address_dict if no errors are raised.\n    :rtype: Mapping\n    \"\"\"\n    locality = (\n        address_dict.get('postal_code') and\n        address_dict.get('city') and address_dict.get('state')\n        if strict else\n        address_dict.get('postal_code') or\n        
(address_dict.get('city') and address_dict.get('state'))\n    )\n    if not address_dict.get('address_line_1'):\n        msg = 'Address records must include Line 1 data.'\n        raise IncompleteAddressError(msg)\n    elif not locality:\n        msg = (\n            'Address records must contain a city, state, and postal_code.'\n            if strict else\n            'Address records must contain a city and state, or a postal_code'\n        )\n        raise IncompleteAddressError(msg)\n    return address_dict\n\n\ndef validate_us_postal_code_format(postal_code, address):\n    # type: (str, Union[str, Mapping]) -> str\n    \"\"\"Validate postal code conforms to US five-digit Zip or Zip+4 standards.\n\n    :param postal_code: string containing US postal code data.\n    :type postal_code: str\n    :param address: dict or string containing original address.\n    :type address: dict | str\n    :return: original postal code if no error is raised\n    :rtype: str\n    \"\"\"\n    error = None\n    msg = (\n        'US Postal Codes must conform to five-digit Zip or Zip+4 standards.'\n    )\n    postal_code = post_clean_addr_str(postal_code)\n    plus_four_code = postal_code.split('-')\n    for code in plus_four_code:\n        try:\n            int(code)\n        except ValueError:\n            error = True\n    if not error:\n        if '-' in postal_code:\n            if len(postal_code.replace('-', '')) > 9:\n                error = True\n            elif len(plus_four_code) != 2:\n                error = True\n            else:\n                postal_code = '-'.join([\n                    plus_four_code[0].zfill(5), plus_four_code[1].zfill(4)\n                ])\n        elif len(postal_code) == 9:\n            postal_code = '-'.join([postal_code[:5], postal_code[5:]])\n        elif len(postal_code) > 5:\n            error = True\n        else:\n            postal_code = postal_code.zfill(5)\n\n    if error:\n        raise AddressValidationError(msg, None, address)\n 
   else:\n        return postal_code\n\n\ndef validate_parens_groups_parsed(line1):\n    # type: (str) -> str\n    \"\"\"Validate any parenthesis segments have been successfully parsed.\n\n    Assumes any parenthesis segments in original address string are either\n    line 2 or ambiguous address elements.  If any parenthesis segment remains\n    in line1 after all other address processing has been applied,\n    AmbiguousAddressError is raised.\n\n    :param line1: processed line1 address string portion\n    :type line1: str\n    :return: line1 address string\n    :rtype: str\n    \"\"\"\n    parenthesis_groups = _get_substrings_with_regex(line1, r'\\((.+?)\\)')\n    if parenthesis_groups:\n        raise AmbiguousAddressError(None, None, line1)\n    else:\n        return line1\n"
  },
  {
    "path": "setup.cfg",
    "content": "[metadata]\nname=usaddress-scourgify\nversion=0.6.0\ndescription=Clean US addresses following USPS pub 28 and RESO guidelines\nauthor=Fable Turas\nauthor_email=fable@rainsoftware.tech\nmaintainer=GreenBuildingRegistry\nmaintainer_email=admin@greenbuildingregistry.com\nkeywords= usaddress, normalization, address\nurl=https://github.com/GreenBuildingRegistry/usaddress-scourgify\nclassifiers =\n\tDevelopment Status :: 5 - Production/Stable\n\tIntended Audience :: Developers\n\tOperating System :: OS Independent\n\tProgramming Language :: Python :: 3.5\n\tProgramming Language :: Python :: 3.6\n\tProgramming Language :: Python :: 3.7\n\tProgramming Language :: Python :: 3.8\npython_requires='>=3.5'\n[options]\npackages = find:\ninclude_package_data = True\nzip_safe = False\ninstall_requires =\n\tgeocoder>=1.22.6\n\tusaddress>=0.5.9\n\tyaml-config>=0.1.2\n[bdist_wheel]\npython-tag = py3\n"
  },
  {
    "path": "setup.py",
    "content": "#!/usr/bin/env python\n\nfrom setuptools import setup\n\n\nsetup()\n"
  },
  {
    "path": "tox.ini",
    "content": "[tox]\nenvlist = py35,py36,py37,py38\n\n[testenv]\nsetenv =\n    ADDRESS_CONFIG_DIR = {toxinidir}/scourgify/tests/config\ndeps=\n    -rrequirements/dev.txt\n\tpytest\n    pytest-cov\n    pytest-xdist\n    testfixtures>=5.1.1\n\ncommands =\n    pytest --cov=. --cov-report= --cov-append -s\n    flake8 scourgify\n\n[flake8]\nexclude=__init__.py"
  }
]