[
  {
    "path": ".gitignore",
    "content": "*.pyc\n\n# coverage files\n*.coverage\nhtmlcov\ncover\n\n# temporary files\n*~\n\n# virtual environments\nenv*/\nvenv/\n\n# python setup.py build artifacts\nbuild/\ndist/\n*egg*/\ndocs/_build\n"
  },
  {
    "path": "HISTORY.rst",
    "content": ".. :changelog:\n\nHistory\n-------\n\n1.1.1 (2017-05-15)\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n**New Features**:\n\n* ``compress_varchar`` parameter added to ETLAlchemySource.__init__() to allow for optional \"_minimizing of varchar() columns_\". Defaults to ``False``.\n\n**Bug Fixes**:\n\n* Handles huge Decimal values ( > 2^32 ) when determining whether or not to coerce column type to Integer, or leave Decimal.\n\n* Fixed bugs surrounding **Upserting** of rows (when ``drop_database=False``).\n\n* Nearest power of 2 now rounds properly\n\n\n1.0.7 (2016-08-04)\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n**New Features**:\n\n* Auto-determination of VARCHAR(n) column size. Size of VARCHAR(n) fields are auto-determined based on the # of chars in the largest string in the column, which is rounded up to the nearest power of 2. (i.e. 21 becomes 32).\n\n**Bug Fixes**:\n\n* Added more thorough UTF-8 support. Previous releases broke when unicode strings were decoded as if they were byte-strings.\n\n* Fixed bug which threw an exception when source is PostgreSQL, and table is capitalized.\n\n``engine.execute(SELECT COUNT(*) FROM Capitalizedtable)`` *replaced with*\n``T_with_capitalized_name..count().fetchone() for cross-db support``\n\n\n**Other Changes**:\n\n* Created HISTORY.rst\n"
  },
  {
    "path": "LICENSE.txt",
    "content": "The MIT License (MIT)\nCopyright (c) 2016 Sean Harrington\n\nPermission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.\n"
  },
  {
    "path": "MANIFEST",
    "content": "# file GENERATED by distutils, do NOT edit\nsetup.cfg\nsetup.py\netlalchemy/ETLAlchemySource.py\netlalchemy/ETLAlchemyTarget.py\netlalchemy/__init__.py\netlalchemy/etlalchemy_exceptions.py\netlalchemy/literal_value_generator.py\netlalchemy/schema_transformer.py\n"
  },
  {
    "path": "README.md",
    "content": "# etlalchemy\nExtract, Transform and Load...Migrate any SQL Database in 4 Lines of Code. *[Read more here...](http://thelaziestprogrammer.com/sharrington/databases/migrating-between-databases-with-etlalchemy)*\n\n[![Donate](https://img.shields.io/badge/donate-paypal-blue.svg)](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=DH544PY7RFSLA)\n[![Donate](https://img.shields.io/badge/donate-gratipay-green.svg)](https://gratipay.com/etlalchemy/)\n# Installation\n\n```bash\npip install etlalchemy\n# On El Capitan:\n### pip install --ignore-installed etlalchemy\n\n# Also install the necessary DBAPI modules and SQLAlchemy dialects\n# For example, for MySQL, you might use:\n# pip install pymsql\n```\n\n# Basic Usage\n```python\nfrom etlalchemy import ETLAlchemySource, ETLAlchemyTarget\n\nsource = ETLAlchemySource(\"mssql+pyodbc://username:password@DSN_NAME\")\ntarget = ETLAlchemyTarget(\"mysql://username:password@hostname/db_name\", drop_database=True)\ntarget.addSource(source)\ntarget.migrate()\n````\n\n# Examples\n\n**Provide a list of tables to include/exclude in migration**\n```python\nfrom etlalchemy import ETLAlchemySource, ETLAlchemyTarget\n\n# Load ONLY the 'salaries' table\nsource = ETLAlchemySource(\"mysql://etlalchemy:etlalchemy@localhost/employees\",\n                          included_tables=[\"salaries\"])\n# Conversely, you could load ALL tables EXCEPT 'salaries'\n# source = ETLAlchemySource(\"mysql://etlalchemy:etlalchemy@localhost/employees\",\\\n#                          excluded_tables=[\"salaries\"])\n\ntarget = ETLAlchemyTarget(\"postgresql://etlalchemy:etlalchemy@localhost/test\", drop_database=True)\ntarget.addSource(source)\ntarget.migrate()\n```\n**Only migrate schema, or only Data, or only FKs, or only Indexes (or any combination of the 4!)**\n```python\nfrom etlalchemy import ETLAlchemySource, ETLAlchemyTarget\n\nsource = ETLAlchemySource(\"mysql://etlalchemy:etlalchemy@localhost/employees\")\n\ntarget = ETLAlchemyTarget(\"postgresql://etlalchemy:etlalchemy@localhost/test\", drop_database=True)\ntarget.addSource(source)\n# Note that each phase (schema, data, index, fk) is independent of all others,\n# and can be run standalone, or in any combination. (Obviously you need a schema to send data, etc...)\ntarget.migrate(migrate_fks=False, migrate_indexes=False, migrate_data=False, migrate_schema=True)\n```\n**Skip columns and tables if they are empty**\n```python\nfrom etlalchemy import ETLAlchemySource, ETLAlchemyTarget\n# This will skip tables with no rows (or all empty rows), and ignore them during schema migration\n# This will skip columns if they have all NULL values, and ignore them during schema migration\nsource = ETLAlchemySource(\"mysql://etlalchemy:etlalchemy@localhost/employees\",\\\n                          skip_column_if_empty=True,\\\n                          skip_table_if_empty=True)\ntarget = ETLAlchemyTarget(\"postgresql://etlalchemy:etlalchemy@localhost/test\", drop_database=True)\ntarget.addSource(source)\ntarget.migrate()\n```\n**Enable 'upserting' of data**\n```python\nfrom etlalchemy import ETLAlchemySource, ETLAlchemyTarget\n\nsource = ETLAlchemySource(\"mysql://etlalchemy:etlalchemy@localhost/employees\")\n# This will leave the target DB as is, and if the tables being migrated from Source -> Target\n# already exist on the Target, then rows will be updated based on PKs if they exist, or \n# inserted if they DNE on the Target table.\ntarget = ETLAlchemyTarget(\"postgresql://etlalchemy:etlalchemy@localhost/test\", drop_database=False)\ntarget.addSource(source)\ntarget.migrate()\n```\n**Alter schema (change column names, column types, table names, and Drop tables/columns)**\n```python\nfrom etlalchemy import ETLAlchemySource, ETLAlchemyTarget\n# See below for the simple structure of the .csv's for schema changes\nsource = ETLAlchemySource(\"mysql://etlalchemy:etlalchemy@localhost/employees\",\\\n                          column_schema_transformation_file=os.getcwd() + \"/transformations/column_mappings.csv\",\\\n                          table_schema_transformation_file=os.getcwd() + \"/transformations/table_mappings.csv\")\ntarget = ETLAlchemyTarget(\"postgresql://SeanH:Pats15Ball@localhost/test\", drop_database=True)\ntarget.addSource(source)\ntarget.migrate()\n```\n| *column_mappings.csv* | *table_mappings.csv* |\n| :--- | :--- |\n|Column Name,Table Name,New Column Name,New Column Type,Delete|Table Name,New Table Name,Delete|\n|last_name,employees,,,True|table_to_rename,new_table_name,False|\n|fired,employees,,Boolean,False|table_to_delete,,True|\n|birth_date,employees,dob,,False|departments,dept,False|\n\n**Rename any column which ends in a given 'suffix' (or skip the column during migration)**\n```python\nfrom etlalchemy import ETLAlchemySource, ETLAlchemyTarget\n# global_renamed_col_suffixes is useful to standardize column names across tables (like the date example below)\nsource = ETLAlchemySource(\"mysql://etlalchemy:etlalchemy@localhost/employees\",\\\n                          global_ignored_col_suffixes=['drop_all_columns_that_end_in_this'],\\\n                          global_renamed_col_suffixes={'date': 'dt'},\\ #i.e. \"created_date -> created_dt\"\n                         )\ntarget = ETLAlchemyTarget(\"postgresql://SeanH:Pats15Ball@localhost/test\", drop_database=True)\ntarget.addSource(source)\ntarget.migrate()\n```\n\n# Known Limitations\n1. 'sqlalchemy_migrate' does not support MSSQL FK migrations.\n   *_(So, FK migrations will be skipped when Target is MSSQL)_\n2. Currently not compatible with Windows\n   * Several \"os.system()\" calls with UNIX-specific utilities\n   * One option for Windows users is installing through the [Windows Subsystem for Linux (WSL)](https://msdn.microsoft.com/en-us/commandline/wsl/install_guide)\n3. If Target DB is in the Azure Cloud (MSSQL), FreeTDS has some compatibility issues which are performance related. This may be noticed when migrating tables with 1,000,000+ rows into a Azure MSSQL Server.\n4. Though the MSSQL 'BULK INSERT' feature is supported in this tool, it is NOT supported on either Azure environments, or AWS MSSQL Server environments (no 'bulkadmin' role allowed). Feel free to test this out on a different MSSQL environment!\n5. Regression tests have not **(yet)** been created due to the unique **(and expensive)** way one must test all of the different database types.\n6. Migrations *to* MSSQL and Oracle are extremely slow due to the lack of 'fast' import capabilities. \n  * 'SQL Loader' can be used on Oracle, and the 'BULK INSERT' operation can be used on MSSQL, however the former is a PITA to install, and the latter is not supported in several MSSQL environments (see 'Known Limitations' below).\n  * 'BULK INSERT' *is supported* in etlalchemy (with limited testing), but \"SQL LOADER\" is not (yet).\n7. When sending data to PostgreSQL, if the data contains VARCHAR() or TEXT() columns with carriage returns ('^M' or '\\r'), these will be stripped.\n  * This is due to the lack of the \"ENCLOSED BY\" option of psycopg.copy_from() - these chars are interpreted as literals, and in turn tell the COPY FROM operation that \"the row ends here\"\n\n# Assumptions Made\n1. Default date formats for all Target DB's are assumed to be the 'out-of-the-box' defaults.\n2. Text fields to not contain the character \"|\", or the string \"|,\".\n   * On some Target DBs, if you have text fields containing \"|,\" (mssql) or \"|\" (sqlite), then the 'fast' import may fail, or insert bizarre values into your DB. This is due to the 'delimiter' which separates column values in the file that is sent to the Target DB.\n\n# On Testing \n1. The 'Performance' matrix has been put together using a simple script which tests every combination of Source (5) and Target (5) DB migration (25 total combinations).\n  * The script is not included (publicly), as it contains the connection strings of AWS RDS instances.\n2. A regression test suite is needed, as is funding to setup an environment for Oracle and MSSQL instances.. \n3. There are definitely some untested column types here amongst all 5 RDBMS's. Please create *pull requests* or open *issues* that describe the problem **in detail** as these arise!\n\n\n# Contributors\nWe are always looking for contributors! \n\nThis project has [its origins](http://thelaziestprogrammer.com/migrating-between-databases-with-etlalchemy) in solving the problem of migrating off of bulky, expensive enterprise-level databases. If the project has helped you to migrate off of these databases, and onto open-source RDBMS's, the best way to show your support is by opening Pull Requests and Issues.\n\n\n\n# Donations\n[Donations through Gratipay](https://gratipay.com/etlalchemy/) are welcome, but **Pull Requests** are better!\n\nYou can also support us [via PayPal here.](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=DH544PY7RFSLA)\n\n# Other\n\nFor help installing cx_Oracle on a Mac (El Capitan + cx_Oracle = Misery), [check out this blog post](http://thelaziestprogrammer.com/sharrington/databases/oracle/install-cx_oracle-mac) for help. \n\nRun this tool from the **same server that hosts your Target database** to get **maximum performance** out of it.\n"
  },
  {
    "path": "TODO.md",
    "content": "1. Add regression tests\n2. Add unit tests\n3. Add support for Python 3.5\n4. Check to see if FK exists between 2 tables - if it does, append an integer to the end of the constraint_name. Right now we just catch exceptions, and only on a subset of supported RDBMSs (OperationalError - MySQL, ProgrammingError - PostgreSQL)\n5. Replace column-type guessing process (iterating over every row)  with a GROUP BY query to improve performance.\n6. Add parameter for \"_quoted_strings_enclosed_by_\" to **ETLAlchemySource** to override default .csv quote-character.\n7. Add parameter for \"_cleanup_table_csv_files_\", with default value of True, allowing the user to override default and let files persist after they are loaded into Target DB.\n"
  },
  {
    "path": "etlalchemy/ETLAlchemySource.py",
    "content": "import codecs\nfrom itertools import islice\nfrom literal_value_generator import dump_to_sql_statement, dump_to_csv,\\\n    dump_to_oracle_insert_statements\nimport random\nfrom migrate.changeset.constraint import ForeignKeyConstraint\nfrom datetime import datetime\nimport time\nfrom copy import deepcopy\nimport pickle\nimport sqlalchemy\nimport logging\n# from clean import cleaners\nfrom sqlalchemy.sql import select\nfrom sqlalchemy.schema import CreateTable, Column\nfrom sqlalchemy.sql.schema import Table, Index\nfrom sqlalchemy.ext.automap import automap_base\nfrom sqlalchemy.orm import Session, sessionmaker\nfrom sqlalchemy import create_engine, MetaData, func, and_\nfrom sqlalchemy.engine import reflection\nfrom sqlalchemy.inspection import inspect\nfrom sqlalchemy.exc import NoSuchTableError\nfrom sqlalchemy.types import Text, Numeric, BigInteger, Integer, DateTime, Date, TIMESTAMP, String, BINARY, LargeBinary\nfrom sqlalchemy.dialects.postgresql import BYTEA\nimport inspect as ins\nimport re\nimport csv\nfrom schema_transformer import SchemaTransformer\nfrom etlalchemy_exceptions import DBApiNotFound\nimport os\n\n# Parse the connn_string to find relevant info for each db engine #\n\n\"\"\"\nAn instance of 'ETLAlchemySource' represents 1 DB. This DB can be sent to\nmultiple 'ETLAlchemyTargets' via calls to ETLAlchemySource.migrate().\nSee examples (on github) for info...\n\"\"\"\n\nclass ETLAlchemySource():\n\n    def __init__(self,\n                 conn_string,\n                 global_ignored_col_suffixes=[],\n                 global_renamed_col_suffixes={},\n                 column_schema_transformation_file=None,\n                 table_schema_transformation_file=None,\n                 included_tables=None,\n                 excluded_tables=None,\n                 skip_table_if_empty=False,\n                 skip_column_if_empty=False,\n                 compress_varchar=False,\n                 log_file=None):\n        # TODO: Store unique columns in here, and ADD the unique constraints\n        # after data has been migrated, rather than before\n        self.unique_columns = []\n        self.compress_varchar = compress_varchar\n        \n        self.logger = logging.getLogger(\"ETLAlchemySource\")\n        self.logger.propagate = False\n        \n        for h in list(self.logger.handlers):\n            # Clean up any old loggers...(useful during testing w/ multiple\n            # log_files)\n            self.logger.removeHandler(h)\n        handler = logging.StreamHandler()\n        if log_file is not None:\n            handler = logging.FileHandler(log_file)\n        formatter = logging.Formatter('%(name)s (%(levelname)s) - %(message)s')\n        handler.setFormatter(formatter)\n\n        self.logger.addHandler(handler)\n        self.logger.setLevel(logging.INFO)\n        # Load the json dict of cleaners...\n        # {'table': [cleaner1, cleaner2,...etc],\n        #  'table2': [cleaner1,...cleanerN]}\n\n        self.included_tables = included_tables\n        self.excluded_tables = excluded_tables\n        # Set this to 'False' if you are using either of the\n        # following MSSQL Environments:\n        #  1.) AWS SQL Server\n        #  ---> The 'bulkadmin' role required for BULK INSERT permissions\n        #  is not available in AWS\n        #  (see https://forums.aws.amazon.com/thread.jspa?threadID=122351)\n        #  2.) Azure SQL\n        #  ---> The 'BULK INSERT' feature is disabled in the Microsoft Azure\n        #  cloud.\n        # ** Otherwise, setting this to 'True' will vastly improve run-time...\n        self.enable_mssql_bulk_insert = False\n\n        self.current_ordered_table_columns = []\n        self.cleaners = {}\n\n        self.schema_transformer = SchemaTransformer(\n            column_transform_file=column_schema_transformation_file,\n            table_transform_file=table_schema_transformation_file,\n            global_renamed_col_suffixes=global_renamed_col_suffixes)\n\n        self.tgt_insp = None\n        self.src_insp = None\n        \n        self.dst_engine = None\n        self.constraints = {}\n        self.indexes = {}\n        self.fks = {}\n        self.engine = None\n        self.connection = None\n        self.orm = None\n        self.database_url = conn_string\n\n        self.total_rows = 0\n        self.column_count = 0\n        self.table_count = 0\n        self.empty_table_count = 0\n        self.empty_tables = []\n        self.deleted_table_count = 0\n        self.deleted_column_count = 0\n        self.deleted_columns = []\n        self.null_column_count = 0\n        self.null_columns = []\n        self.referential_integrity_violations = 0\n        self.unique_constraint_violations = []\n        self.unique_constraint_violation_count = 0\n\n        self.skip_column_if_empty = skip_column_if_empty\n        self.skip_table_if_empty = skip_table_if_empty\n\n        self.total_indexes = 0\n        self.index_count = 0\n        self.skipped_index_count = 0\n\n        self.total_fks = 0\n        self.fk_count = 0\n        self.skipped_fk_count = 0\n        # Config\n        self.check_referential_integrity = False\n        self.riv_arr = []\n        self.start = datetime.now()\n\n        self.global_ignored_col_suffixes = global_ignored_col_suffixes\n\n        self.times = {}  # Map Tables to Names...\n\n    def get_nearest_power_of_two(self, num):\n        # This is optimized for MySQL: we want to optimize\n        # cache hits by defining our column sizes as small\n        # as possible, to the nearest power of 2.\n        i = 2\n        if num < 256:\n            # Disk space is L + 1 byte for length (1 - 255)\n            while (i-1) < num:\n                i *= 2\n            return i - 1\n        else:\n            # Disk space is L + 2 bytes for length (256 - 65536)\n            while (i-2) < num:\n                i *= 2\n            return i - 2\n\n\n    def standardize_column_type(self, column, raw_rows):\n        old_column_class = column.type.__class__\n        column_copy = Column(column.name,\n                             column.type,\n                             nullable=column.nullable,\n                             unique=column.unique,\n                             primary_key=column.primary_key)\n        if column.unique:\n            self.unique_columns.append(column.name)\n        \"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n        \"\"\" *** STANDARDIZATION *** \"\"\"\n        \"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n        idx = self.current_ordered_table_columns.index(column.name)\n        ##############################\n        # Duck-typing to remove\n        # database-vendor specific column types\n        ##############################\n        base_classes = map(\n            lambda c: c.__name__.upper(),\n            column.type.__class__.__bases__)\n        self.logger.info(\"({0}) {1}\".format(column.name,\n            column.type.__class__.__name__))\n        self.logger.info(\"Bases: {0}\".format(str(base_classes))) \n\n        # Assume the column is empty, unless told otherwise\n        null = True\n\n        if \"ENUM\" in base_classes:\n            for r in raw_rows:\n                if r[idx] is not None:\n                    null = False\n            # Hack for error 'postgresql enum type requires a name'\n            if self.dst_engine.dialect.name.lower() == \"postgresql\":\n                column_copy.type = column.type\n                column_copy.type.__class__ = column.type.__class__.__bases__[0]\n                # Name the enumeration 'table_column'\n                column_copy.type.name = str(column).replace(\".\", \"_\")\n            else:\n                column_copy.type.__class__ = column.type.__class__.__bases__[0]\n        elif \"STRING\" in base_classes\\\n                or \"VARCHAR\" in base_classes\\\n                or \"TEXT\" in base_classes:\n            #########################################\n            # Get the VARCHAR size of the column...\n            ########################################\n            varchar_length = column.type.length\n            ##################################\n            # Strip collation here ...\n            ##################################\n            column_copy.type.collation = None\n            max_data_length = 0\n            for row in raw_rows:\n                data = row[idx]\n                if data is not None:\n                    null = False\n                    # Update varchar(size)\n                    if len(data) > max_data_length:\n                        max_data_length = len(data)\n                    if isinstance(row[idx], unicode):\n                        row[idx] = row[idx].encode('utf-8', 'ignore')\n                    else:\n                        row[idx] = row[idx].decode('utf-8', 'ignore').encode('utf-8')\n            if self.compress_varchar:\n                # Let's reduce the \"n\" in VARCHAR(n) to a power of 2\n                if max_data_length > 0:\n                    # The column is not empty...\n                    column_size = self.get_nearest_power_of_two(max_data_length)\n                    column_copy.type = String(column_size)\n                    self.logger.info(\"Converting to -> VARCHAR({0}) (max_data_length: {1})\".format(str(column_size), str(max_data_length)))\n                elif varchar_length > 0:\n                    # The column is empty BUT has a predfined size\n                    column_size = self.get_nearest_power_of_two(varchar_length)\n                    column_copy.type = String(column_size)\n                    self.logger.info(\"Converting to -> VARCHAR({0}) (prev varchar size: {1})\".format(str(column_size), str(varchar_length)))\n                else:\n                    # The column is empty and has NO predefined size\n                    column_copy.type = Text()\n                    self.logger.info(\"Converting to Text()\")\n            else:\n                if varchar_length > 0:\n                    column_copy.type = String(varchar_length)\n                else:\n                    # The column has NO predefined size\n                    column_copy.type = Text()\n                    self.logger.info(\"Converting to Text()\")\n        elif \"UNICODE\" in base_classes:\n            #########################################\n            # Get the VARCHAR size of the column...\n            ########################################\n            varchar_length = column.type.length\n            column_copy.type = String()\n            column_copy.type.length = varchar_length\n            ##################################\n            # Strip collation here ...\n            ##################################\n            column_copy.type.collation = None\n            for row in raw_rows:\n                data = row[idx]\n                if varchar_length and data and len(data) > varchar_length:\n                    self.logger.critical(\n                        \"Length of column '{0}' exceeds VARCHAR({1})\".format(\n                            column.name, str(varchar_length)))\n                if data is not None:\n                    null = False\n                    if isinstance(row[idx], unicode):\n                        row[idx] = row[idx].encode('utf-8', 'ignore')\n                #if row[idx]:\n                #    row[idx] = row[idx].decode('utf-8', 'ignore')\n\n        elif \"DATE\" in base_classes or \"DATETIME\" in base_classes:\n            ####################################\n            # Determine whether this is a Date\n            # or Datetime field\n            ###################################\n            type_count = {}\n            types = set([])\n            for row in raw_rows:\n                data = row[\n                    self.current_ordered_table_columns.index(\n                        column.name)]\n                types.add(data.__class__.__name__)\n                if type_count.get(data.__class__.__name__):\n                    type_count[data.__class__.__name__] += 1\n                else:\n                    type_count[data.__class__.__name__] = 1\n                if data is not None:\n                    null = False\n            self.logger.warning(str(type_count))\n            if type_count.get(\"datetime\"):\n                if self.dst_engine.dialect.name.lower() in [\"postgresql\"]:\n                    self.logger.info(\"Postgresql has no DATETIME - converting to TIMESTAMP\")\n                    column_copy.type = TIMESTAMP()\n                else:\n                    column_copy.type = DateTime()\n            else:\n                column_copy.type = Date()\n\n        elif \"NUMERIC\" in base_classes\\\n                or \"FLOAT\" in base_classes\\\n                or \"DECIMAL\" in base_classes:\n            ####################################\n            # Check all cleaned_rows to determine\n            # if column is decimal or integer\n            ####################################\n            mantissa_max_digits = 0\n            left_hand_max_digits = 0\n            mantissa_gt_zero = False\n            intCount = 0\n            maxDigit = 0\n            type_count = {}\n            types = set([])\n            for row in raw_rows:\n                data = row[\n                    self.current_ordered_table_columns.index(\n                        column.name)]\n                types.add(data.__class__.__name__)\n                if type_count.get(data.__class__.__name__):\n                    type_count[data.__class__.__name__] += 1\n                else:\n                    type_count[data.__class__.__name__] = 1\n                ######################\n                # Check for NULL data\n                # (We will drop column if all rows contain null)\n                ######################\n                if data is not None:\n                    null = False\n                if data.__class__.__name__ == 'Decimal' or\\\n                   data.__class__.__name__ == 'float':\n                    splt = str(data).split(\".\")\n                    if len(splt) == 1:\n                        intCount += 1\n                        maxDigit = max(data, maxDigit)\n                        continue\n\n                    left_hand_digits = splt[0]\n                    mantissa_digits = splt[1]\n\n                    # Store greatest mantissa to check for decimal cols that\n                    # should be integers...(i.e. if m = 3.000)\n                    mantissa_max_digits = max(mantissa_max_digits,\n                                              len(mantissa_digits))\n                    left_hand_max_digits = max(left_hand_max_digits,\n                                               len(left_hand_digits))\n                    # If we have a mantissa greater than zero, we can keep this column as a decimal\n                    if not mantissa_gt_zero and float(mantissa_digits) > 0:\n                        # Short circuit the above 'and' so we don't keep resetting mantissa_gt_zero\n                        mantissa_gt_zero = True\n\n                elif data.__class__.__name__ == 'int':\n                    intCount += 1\n                    maxDigit = max(data, maxDigit)\n            self.logger.info(\" --> \" + str(column.name) +\n                             \"...\" + str(type_count))\n            #self.logger.info(\"Max Digit Length: {0}\".format(str(len(str(maxDigit)))))\n            #self.logger.info(\"Max Mantissa Digits: {0}\".format(str(mantissa_max_digits)))\n            #self.logger.info(\"Max Left Hand Digit: {0}\".format(str(left_hand_max_digits)))\n            #self.logger.info(\"Total Left Max Digits: {0}\".format(str(max(len(str(maxDigit)), left_hand_max_digits))))\n            if mantissa_gt_zero:\n                cum_max_left_digits = max(\n                    len(str(maxDigit)), (left_hand_max_digits))\n                self.logger.info(\"Numeric({0}, {1})\".format(str(cum_max_left_digits + mantissa_max_digits), str(mantissa_max_digits)))\n                column_copy.type = Numeric(\n                    precision=cum_max_left_digits + mantissa_max_digits,\n                    scale=mantissa_max_digits)\n                if intCount > 0:\n                    self.logger.warning(\n                        \"Column '\" +\n                        column.name +\n                        \"' contains decimals and integers, \" +\n                        \"resorting to type 'Numeric'\")\n                if column.primary_key:\n                    self.logger.warning(\n                        \"Column '\" +\n                        column.name +\n                        \"' is a primary key, but is of type 'Decimal'\")\n            else:\n                self.logger.warning(\n                    \"Column '\" +\n                    column.name +\n                    \"' is of type 'Decimal', but contains no mantissas \" +\n                    \"> 0. (i.e. 3.00, 2.00, etc..)\\n \")\n                if maxDigit > 4294967295:\n                    self.logger.warning(\"Coercing to 'BigInteger'\")\n                    column_copy.type = BigInteger()\n                    # Do conversion...\n                    for r in raw_rows:\n                        if r[idx] is not None:\n                            r[idx] = long(r[idx])\n                else:\n                    column_copy.type = Integer()\n                    self.logger.warning(\"Coercing to 'Integer'\")\n                    for r in raw_rows:\n                        if r[idx] is not None:\n                            r[idx] = int(r[idx])\n        elif column.type.__class__.__name__ == \"BIT\":\n            for r in raw_rows:\n                if r[idx] is not None:\n                    null = False\n            self.logger.info(\"Found column of type 'BIT' -> \" +\n                \"coercing to Boolean'\")\n            column_copy.type.__class__ = sqlalchemy.types.Boolean\n        elif \"TYPEENGINE\" in base_classes:\n            for r in raw_rows:\n                if r[idx] is not None:\n                    null = False\n            self.logger.warning(\n                \"Type '{0}' has no base class!\".format(\n                    column.type.__class__.__name__))\n        elif \"VARBINARY\" in base_classes or \"LARGEBINARY\" in base_classes:\n            if self.dst_engine.dialect.name.lower() == \"postgresql\":\n                for r in raw_rows:\n                    if r[idx] is not None:\n                        null = False\n                        r[idx] = r[idx].encode('hex')\n            column_copy.type = LargeBinary()\n        elif \"_BINARY\" in base_classes:\n            for r in raw_rows:\n                if r[idx] is not None:\n                    null = False\n                    r[idx] = r[idx].encode('hex')\n            if self.dst_engine.dialect.name.lower() == \"postgresql\":\n                column_copy.type = BYTEA()\n            else:\n                column_copy.type = BINARY()\n        else:\n            #####################################################\n            # Column type is not dialect-specific, but...\n            # ... we need to check for null columns still b/c\n            # ... we default to True !\n            ######################################################\n            for r in raw_rows:\n                if r[idx] is not None:\n                    null = False\n            # Reset collations...\n            if hasattr(column.type, 'collation'):\n                column_copy.type.collation = None\n            self.logger.info(\"({0}) Class: \".format(\n                column_copy.name) + str(column.type.__class__.__name__))\n            self.logger.info(\n                \"({0}) ---> Bases: \".format(column_copy.name) +\n                str(column.type.__class__.__bases__))\n\n            column_copy.type.__class__ = column.type.__class__.__bases__[0]\n        #########################################\n        # If the entire column is null, and we specify\n        # the option below (skip_column_if_empty),\n        # schedule a 'column_transformer' to delete the\n        # column later ...\n        ########################################\n        if null and self.skip_column_if_empty:\n            # The column should be deleted due to it being empty\n            self.null_column_count += 1\n            self.null_columns.append(column.table.name + \".\" + column.name)\n            self.logger.warning(\n                \"Column '\" +\n                column.table.name +\n                \".\" +\n                column_copy.name +\n                \"' has all NULL entries, skipping...\")\n            self.schema_transformer.schedule_deletion_of_column(\n                    column.name,\n                    column.table.name\n                   )\n        \n        return column_copy\n\n    def add_or_eliminate_column(\n            self,\n            T,\n            T_dst_exists,\n            column,\n            column_copy,\n            raw_rows):\n        self.logger.info(\"Checking column for elimination status...\")\n        old_column_class = column.type.__class__\n        table_name = T.name\n        null = True\n        idx = self.current_ordered_table_columns.index(column.name)\n        \n        cname = column_copy.name\n        columnHasGloballyIgnoredSuffix = len(\n            filter(\n                lambda s: cname.find(s) > -1,\n                self.global_ignored_col_suffixes)) > 0\n\n        oldColumns = self.current_ordered_table_columns\n        oldColumnsLength = len(self.current_ordered_table_columns)\n        ##################################\n        # Transform the column schema below\n        ##################################\n        self.current_ordered_table_columns = \\\n            self.schema_transformer.transform_column(\n                column_copy, T.name, self.current_ordered_table_columns)\n        if oldColumnsLength != len(self.current_ordered_table_columns):\n            # A column is scheduled to be deleted in \"column_transformations_file\"\n            self.logger.warning(\n                \" ------> Column '\" +\n                cname +\n                \"' is scheduled to be deleted -- **NOT** migrating this col..\")\n            self.deleted_column_count += 1\n            self.deleted_columns.append(table_name + \".\" + cname)\n            if T_dst_exists:\n                pass\n                # TODO: Delete the column from T_dst\n            return False\n        elif oldColumns[idx] != self.current_ordered_table_columns[idx]:\n            # Column was renamed\n            if T_dst_exists:\n                pass\n                # TODO Add the column to the table...\n            else:\n                # column_copy has updated datatype...\n                T.append_column(column_copy)\n            self.logger.info(\"Column '{0}' renamed to '{1}'\".format(oldColumns[idx], self.current_ordered_table_columns[idx]))\n            return True\n        else:\n            if T_dst_exists:\n                pass\n                # TODO Add the column to the table...\n            else:\n                T.append_column(column_copy)\n            return True\n\n    def transform_table(self, T):\n        ################################\n        # Run Table Transformations\n        ################################\n        \"\"\" This will update the table 'T' in-place\n        (i.e. change the table's name)\n        \"\"\"\n        if not self.schema_transformer.transform_table(T):\n            self.logger.info(\n                \" ---> Table ({0}) is scheduled to be deleted \" +\n                \"according to table transformations...\".format(T.name))\n            # Clean up FKs and Indexes on this table...\n            del self.indexes[T.name]\n            del self.fks[T.name]\n            self.deleted_table_count += 1\n            self.deleted_columns += map(lambda c: T.name +\n                                       \".\" + c.name, T.columns)\n            self.deleted_column_count += len(T.columns)\n            return None\n        return True\n\n    def check_multiple_autoincrement_issue(self, auto_inc_count, pk_count, T):\n        if pk_count > 1:\n            # Sometimes we can't detect the 'autoincrement' attr on columns\n            # (For instance on SQL Server...)\n            for c in T.columns:\n                if c.primary_key:\n                    c.autoincrement = False\n            # and engine == MySQL.innoDB...\n            if auto_inc_count > 0:\n                # print the verbose warning\n                self.logger.warning(\"\"\"\n                ****************************************************************\n                **** Table '{0}' contains a composite primary key,\n                **** with an auto-increment attribute tagged on 1 of the columns.\n                *****************************************************************\n                ********* --We are dropping the auto-increment field-- **********\n                *****************************************************************\n                ** (why? MySQL -> InnoDB Engine does not support this.\n                ** Try MyISAM for support - understand that Oracle does not allow\n                ** auto-increment fields, but uses sequences to create unique\n                ** composite PKs\")\n                *****************************************************************\n                \"\"\".format(T.name))\n\n    def transform_data(self, T_src, raw_rows):\n        \"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n        \"\"\" *** TRANSFORMATION *** \"\"\"\n        \"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n        # Transform the data first\n        # if self.cleaners.get(T_src.name):\n        # TODO: Finish Implementing TableCleaner.clean(rows)\n        # TC = TableCleaner(T_src)\n        # TC.loadCleaners(self.cleaners[table_name])\n        # TC.clean(raw_rows)\n        # Transform the schema second (by updating the column names [keys of\n        # dict])\n        self.schema_transformer.transform_rows(\n            raw_rows, self.original_ordered_table_columns, T_src.name)\n\n    def create_table(self, T_dst_exists, T):\n        with self.dst_engine.connect() as conn:\n            if not T_dst_exists:\n                self.logger.info(\" --> Creating table '{0}'\".format(T.name))\n                try:\n                    T.create(conn)\n                    return True\n                except Exception as e:\n                    self.logger.error(\n                        \"Failed to create table '{0}'\\n\\n{1}\".format(\n                            T.name, e))\n                    raise\n            else:\n                self.logger.warning(\n                    \"Table '{0}' already exists - not creating table, \" +\n                    \"reflecting to get new changes instead..\".format(T.name))\n                self.tgt_insp.reflecttable(T, None)\n                return True\n                # We need to Upsert the data...\n\n    def send_data(self, table, columns):\n        Session = sessionmaker(bind=self.dst_engine)\n        session = Session()\n        data_file_path = os.getcwd() + \"/\" + table + \".sql\"\n\n        self.logger.info(\n            \"Transferring data from local file '{0}' to target DB\".format(\n                table + \".sql\"))\n        if self.dst_engine.dialect.name.lower() == \"mssql\":\n            username = self.dst_engine.url.username\n            password = self.dst_engine.url.password\n            dsn = self.dst_engine.url.host\n            db_name = list(self.dst_engine.execute(\n                \"SELECT DB_NAME()\").fetchall())[0][0]\n            if not self.enable_mssql_bulk_insert:\n                ######################################\n                # SQL Azure does not support BULK INSERT\n                # ... we resort to a Large INSERT statement\n                ######################################\n                self.logger.info(\n                    \"Sending data to target MSSQL instance...\" +\n                    \"(Slow - enable_mssql_bulk_insert = False)\")\n                os.system(\"cat {4} | isql {0} {1} {2} -d{3} -v\"\n                          .format(dsn, username, password,\n                                  db_name, data_file_path))\n                self.logger.info(\"Done.\")\n            else:\n                try:\n                    conn = session.connection()\n                    t1 = conn.begin()\n                    self.logger.info(\"Sending data to target MSSQL instance...\\\n                            (Fast [BULK INSERT])\")\n\n                    conn.execute(\"\"\"BULK INSERT {0} FROM '{1}' WITH (\n                                     fieldterminator = '|,',\n                                     rowterminator = '\\n'\n                                   );\"\"\".format(data_file_path, table))\n                    t1.commit()\n                except sqlalchemy.exc.ProgrammingError as e:\n                    self.logger.critical(\"\"\"\n                        *****************************************************\n                        ** BULK INSERT operation not supported on your target\n                        ** MSSQL server instance.\n                        ** ***********************************\n                        ** [It is likely that you are running on\n                        ** Azure SQL (no bulk insert feature), or AWS SQL\n                        ** Server (no bulkadmin role)].\n                        *****************************************************\n                        **** Re-run with\n                        **      'self.enable_mssql_bulk_insert = False'\n                        **   ...but expect slow data transfer.\n                        ******************************************************\n                        Original Exception:\n                        {0}\"\"\".format(str(e)))\n                    raise(e)\n                self.logger.info(\"Done.\")\n        elif self.dst_engine.dialect.name.lower() == \"mysql\":\n            username = self.dst_engine.url.username\n            password = self.dst_engine.url.password\n            db_name = self.dst_engine.url.database\n            host = self.dst_engine.url.host\n            self.logger.info(\n                \"Sending data to target MySQL instance...(Fast [mysqlimport])\")\n            columns = map(lambda c: \"\\`{0}\\`\".format(c), columns)\n            cmd = (\"mysqlimport -v -h{0} -u{1} -p{2} \"\n                       \"--compress \"\n                       \"--local \"\n                       \"--fields-terminated-by=\\\",\\\" \"\n                       \"--fields-enclosed-by='\\\"' \"\n                       \"--fields-escaped-by='\\\\' \"\n                       # \"--columns={3} \"\n                       \"--lines-terminated-by=\\\"\\n\\\" \"\n                       \"{3} {4}\"\n                      ).format(host, username, password,\n                                       #\",\".join(columns), db_name,\n                                       db_name,\n                                       data_file_path)\n            self.logger.info(cmd)\n            os.system(cmd)\n            self.logger.info(\"Done.\")\n        elif self.dst_engine.dialect.name.lower() == \"postgresql\":\n            # TODO: Take advantage of psql> COPY FROM <payload.sql> WITH\n            # DELIMITER AS \",\"\n            username = self.dst_engine.url.username\n            password = self.dst_engine.url.password\n            db_name = self.dst_engine.url.database\n            host = self.dst_engine.url.host\n            \n            import psycopg2\n            conn = psycopg2.connect(\n                \"\"\"\n                host='{0}'\n                port='5432'\n                dbname='{1}'\n                user='{2}'\n                password='{3}'\n                \"\"\".format(host, db_name, username, password))\n            cur = conn.cursor()\n            # Legacy method (doesn't work if not superuser, and if file is\n            # LOCAL\n            cmd = \"\"\"COPY {0} ({1}) FROM '{2}'\n                    WITH CSV QUOTE ''''\n                    ESCAPE '\\\\' \"\"\".format(\n                table, \",\".join(columns), data_file_path, \"'\")\n            self.logger.info(\n                \"Sending data to target Postgresql instance...\" +\n                \"(Fast [COPY ... FROM ... WITH CSV]):\" +\n                \"\\n ----> {0}\".format(cmd))\n            with open(data_file_path, 'r') as fp_psql:\n                # Most use command below, which loads data_file from STDIN to\n                # work-around permissions issues...\n                null_value = 'NULL'\n                delimiter = '|'\n                quote = \"\\'\"\n                #escape = '/'\n                copy_from_stmt = \"COPY \\\"{0}\\\" FROM STDIN WITH CSV NULL '{1}'\"\\\n                    .format(table, null_value, quote, delimiter)\n                cur.copy_expert(copy_from_stmt, fp_psql)\n                              #columns=tuple(map(lambda c: '\"'+str(c)+'\"', columns)))\n            conn.commit()\n            conn.close()\n            self.logger.info(\"Done.\")\n\n        elif self.dst_engine.dialect.name.lower() == \"sqlite\":\n            db_name = self.dst_engine.url.database\n            self.logger.info(\n                \"Sending data to target sqlite instance...(Fast [.import])\")\n            sqlite_cmd = \".separator \\'|\\'\\\\n.nullvalue NULL\\\\n.import {0} {1}\".format(data_file_path, table)\n            self.logger.info(sqlite_cmd)\n            os.system(\"echo \\\"{0}\\\" | sqlite3 {1}\"\n                    .format(sqlite_cmd, db_name))\n            # ** Note values will be inserted as 'NULL' if they are NULL.\n            \"\"\"\n           with open(\"{0}.sql\".format(table), \"r\") as fp:\n               for line in fp.readlines():\n                   self.dst_engine.execute(line)\n           \"\"\"\n            self.logger.info(\"Done.\")\n        elif self.dst_engine.dialect.name.lower() == \"oracle\":\n            with open(data_file_path, \"r\") as fp_orcl:\n                lines_inserted = 0\n                while True:\n                    next_n_lines = list(islice(fp_orcl, 1001))\n                    lines_inserted += 1000\n                    if not next_n_lines:\n                        break\n                    self.dst_engine.execute(\"\\n\".join(next_n_lines))\n                    self.logger.info(\n                        \"Inserted '{0}' rows\".format(\n                            str(lines_inserted)))\n        else:\n            raise Exception(\"Not Implemented!\")\n        # Cleanup...\n        self.logger.info(\"Cleaning up '{0}'.sql\".format(table))\n        os.remove(data_file_path)\n        self.logger.info(\"Done\")\n\n    \"\"\"\n      Dumps the data to a file called <table_name>.sql in the CWD.\n      Depending on the DB Target, either a CSV will be generated\n      for optimized BULK IMPORT, or an INSERT query will be generated\n      if BULK INSERTING a CSV is not supported (i.e. SQL Azure)\n   \"\"\"\n\n    def dump_data(self, T_dst_exists, T, raw_rows, pks, sessionMaker):\n        t_start_load = datetime.now()\n        conn = self.dst_engine.connect()\n        s = sessionMaker(bind=conn)\n        data_file_path = os.getcwd() + \"/{0}.sql\".format(T.name)\n\n        if not T_dst_exists:\n            # Table \"T\" DNE in the destination table prior to this entire\n            # migration process. We can naively INSERT all rows in the buffer\n            with open(data_file_path, \"a+\") as fp:\n                if not self.enable_mssql_bulk_insert and\\\n                   self.dst_engine.dialect.name.lower() == \"mssql\":\n                    dump_to_sql_statement(T.insert().values(\n                            map(lambda r:\n                                dict(zip(self.current_ordered_table_columns,\n                                         r)),\n                                raw_rows)\n                            ), fp, self.dst_engine, T.name)\n                elif self.dst_engine.dialect.name.lower() == \"oracle\":\n                    self.logger.warning(\n                        \"** BULK INSERT operation not supported by Oracle. \" +\n                        \"Expect slow run-time.\\nThis utilty should be \" +\n                        \"run on the target host to descrease network \" +\n                        \"latency for given this limitation...\")\n                    dump_to_oracle_insert_statements(\n                            fp, self.dst_engine,\n                            T.name, raw_rows,\n                            self.current_ordered_table_columns)\n                else:\n                    dump_to_csv(\n                        fp,\n                        T.name,\n                        self.current_ordered_table_columns,\n                        raw_rows,\n                        self.dst_engine.dialect)\n        else:\n            ########################################\n            # We need to upsert the data...prepare upsertDict...\n            ########################################\n            upsertDict = {}\n            self.logger.info(\"Gathering unique columns for upsert.\")\n            if len(pks) == 0:\n                s = \"There is no primary key defined on table '{0}'!\\n \" +\\\n                    \"We are unable to Upsert into this table without \" +\\\n                    \"identifying unique rows based on PKs!\".format(T.name)\n                raise Exception(s)\n            unique_columns = filter(lambda c: c.name.lower() in pks, T.columns)\n            self.logger.info(\n                \"Unique columns are '{0}'\".format(\n                    str(unique_columns)))\n            q = select(unique_columns)\n            rows = conn.execute(q).fetchall()\n            for r in rows:\n                uid = \"\"\n                for pk in pks:\n                    uid += str(getattr(r, pk))\n                upsertDict[uid] = True\n            ################################\n            # Now upsert each row...\n            ################################\n            self.logger.info(\"Creating 'upsert' statements for '\" +\n                             str(len(raw_rows)) +\n                             \"' rows, and dumping to '\" +\n                             str(T.name) +\n                             \".sql'.\")\n\n            init_len = len(raw_rows)\n            for r in range(init_len - 1, -1, -1):\n                uid = \"\"\n                row = raw_rows[r]\n                for pk in pks:\n                    uid += str(row[self.current_ordered_table_columns.index(pk)])\n                if upsertDict.get(uid):\n                    with open(data_file_path, \"a+\") as fp:\n                        stmt = T.update()\\\n                               .where(and_(*tuple(\n                                   map(lambda pk:\n                                       T.columns[pk] ==\n                                       row[self.current_ordered_table_columns\n                                           .index(pk)],\n                                       pks))))\\\n                               .values(dict(zip(\n                                   self.current_ordered_table_columns, row)))\n                        dump_to_sql_statement(stmt, fp, self.dst_engine, T.name)\n                    del raw_rows[r]\n            #################################\n            # Insert the remaining rows...\n            #################################\n            self.logger.info(\"Creating 'insert' stmts for (the remaining)\" +\n                             str(len(raw_rows)) +\n                             \" rows, and dumping to '\" +\n                             str(T.name) +\n                             \".sql' (because they DNE in the table!).\")\n            insertionCount = (len(raw_rows) / 1000) + 1\n            raw_row_len = len(raw_rows)\n            if len(raw_rows) > 0:\n                self.logger.info(\n                    \" ({0}) -- Inserting remaining '{0}' rows.\"\n                    .format(str(raw_row_len)))\n                with open(data_file_path, \"a+\") as fp:\n                    dump_to_sql_statement(\n                        T.insert().values(raw_rows), fp,\n                        self.dst_engine, T.name)\n        conn.close()\n    # TODO: Have a 'Create' option for each table...\n\n    def migrate(\n            self,\n            destination_database_url,\n            migrate_data=True,\n            migrate_schema=True):\n        \"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n        \"\"\" ** REFLECTION ** \"\"\"\n        \"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n       \n        buffer_size = 10000\n\n        if self.database_url.split(\":\")[0] == \"oracle+cx_oracle\":\n            try:\n                self.engine = create_engine(\n                    self.database_url, arraysize=buffer_size)\n            except ImportError as e:\n                raise DBApiNotFound(self.database_url)\n        else:\n            try:\n                self.engine = create_engine(self.database_url)\n            except ImportError as e:\n                raise DBApiNotFound(self.database_url)\n        # Create inspectors to gather schema info...\n        self.src_insp = reflection.Inspector.from_engine(self.engine)\n        self.table_names = self.src_insp.get_table_names()\n        try:\n            self.dst_engine = create_engine(destination_database_url)\n        except ImportError as e:\n            raise DBApiNotFound(destination_database_url)\n        dst_meta = MetaData()\n\n        Session = sessionmaker(bind=self.dst_engine)\n        dst_meta.bind = self.dst_engine\n\n        self.tgt_insp = reflection.Inspector.from_engine(self.dst_engine)\n\n        TablesIterator = self.table_names  # defaults to ALL tables\n\n        if self.included_tables and self.excluded_tables:\n            raise Exception(\"Can't provide 'included_tables'\" +\n                            \"'excluded_tables', choose 1...aborting...\")\n\n        if self.included_tables:\n            TablesIterator = self.included_tables\n        elif self.excluded_tables:\n            TablesIterator = list(set(TablesIterator) -\n                                  set(self.excluded_tables))\n       \n        t_idx = -1\n        t_total = len(TablesIterator)\n        self.logger.info(\"\"\"\n        *************************\n        *** Total Tables: {0} ***\n        *************************\n        \"\"\".format(str(t_total)))\n        for table_name in TablesIterator:\n            t_idx += 1\n            #######################\n            # Time each table...\n            #######################\n            self.times[table_name] = {}\n            self.table_count += 1\n            self.logger.info(\"Reading Table Schema '\" + table_name + \"'...\")\n            pk_count = 0\n            auto_inc_count = 0\n\n            t_start_extract = datetime.now()\n            T_src = Table(table_name, MetaData())\n            try:\n                self.src_insp.reflecttable(T_src, None)\n            except NoSuchTableError as table:\n                self.logger.error(\n                    \"Table '\" +\n                    table +\n                    \"' not found in DB: '\" +\n                    destination +\n                    \"'.\")\n                continue  # skip to next table...\n            except sqlalchemy.exc.DBAPIError as e:\n                self.logger.error(str(e))\n                # Let SQL Server sleep b/c of FreeTDS buffer clean up issues\n                time.sleep(10)\n                self.src_insp.reflecttable(T_src, None)\n            ###############################\n            # Gather indexes & FKs\n            ###############################\n            self.indexes[table_name] = self.src_insp.get_indexes(table_name)\n            self.fks[table_name] = self.src_insp.get_foreign_keys(table_name)\n            self.logger.info(\n                \"Loaded indexes and FKs for table '{0}'\".format(table_name))\n            if migrate_schema:\n                T = Table(table_name, dst_meta)\n                ###############################\n                # Check if DST table exists...\n                ###############################\n                T_dst_exists = True\n                try:\n                    self.tgt_insp.reflecttable(T, None)\n                except sqlalchemy.exc.NoSuchTableError as e:\n                    T_dst_exists = False\n                    self.logger.warning(\n                        \"Table '\" +\n                        T.name +\n                        \"' does not exist in the dst database \" +\n                        \"(we will create this later...)\")\n\n                \"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n                \"\"\" *** EXTRACTION *** \"\"\"\n                \"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n                #########################################################\n                # Generate the mapping of 'column_name' -> 'list index'\n                ########################################################\n                cols = map(lambda c: c.name, T_src.columns)\n                self.current_ordered_table_columns = [None] * len(cols)\n                self.original_ordered_table_columns = [None] * len(cols)\n                for i in range(0, len(cols)):\n                    self.original_ordered_table_columns[i] = cols[i]\n                    self.current_ordered_table_columns[i] = cols[i]\n                ###################################\n                # Grab raw rows for data type checking...\n                ##################################\n                self.logger.info(\n                    \"Building query to fetch all rows from {0}\".format(\n                        T_src.name))\n                \n\n                cnt = self.engine.execute(T_src.count()).fetchone()[0]\n                resultProxy = self.engine.execute(T_src.select())\n                self.logger.info(\"Done. ({0} total rows)\".format(str(cnt)))\n                j = 0\n                self.logger.info(\"Loading all rows into memory...\")\n                rows = []\n\n                for i in range(1, (cnt / buffer_size) + 1):\n                    self.logger.info(\n                        \"Fetched {0} rows\".format(str(i * buffer_size)))\n                    rows += resultProxy.fetchmany(buffer_size)\n                rows += resultProxy.fetchmany(cnt % buffer_size)\n                # Don't rely on Python garbage collection...\n                resultProxy.close()\n\n                assert(cnt == len(rows))\n\n                raw_rows = [list(row) for row in rows]\n                self.logger.info(\"Done\")\n                pks = []\n\n                t_start_transform = datetime.now()\n\n                # TODO: Use column/table mappers, would need to update foreign\n                # keys...\n            \n                for column in T_src.columns:\n                    self.column_count += 1\n                    ##############################\n                    # Check for multiple primary\n                    #  keys & auto-increment\n                    ##############################\n                    if column.primary_key:\n                        pks.append(column.name.lower())\n                        pk_count += 1\n                    \n                    if column.autoincrement:\n                        auto_inc_count += 1\n                    ##############################\n                    # Standardize Column Type\n                    ##############################\n                    column_copy = self.standardize_column_type(column, raw_rows)\n                    \"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n                    \"\"\" *** ELIMINATION I *** \"\"\"\n                    \"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n                    self.add_or_eliminate_column(\n                        T, T_dst_exists, column, column_copy, raw_rows)\n\n                if self.dst_engine.dialect.name.lower() == \"mysql\":\n                    #######################################\n                    # Remove auto-inc on composite PK's\n                    #######################################\n                    self.check_multiple_autoincrement_issue(\n                        auto_inc_count, pk_count, T)\n                if self.transform_table(T) is None:\n                    # Skip the table, it is scheduled to be deleted...\n                    continue\n                elif len(T.columns) == 0:\n                    # TODO: Delete table from T_dst\n                    self.logger.warning(\n                        \"Table '\" + T.name + \"' has all NULL columns, \" +\n                        \"skipping...\")\n                    self.empty_table_count += 1\n                    self.empty_tables.append(T.name)\n                    continue\n                elif len(raw_rows) == 0 and self.skip_table_if_empty:\n                    self.logger.warning(\n                        \"Table '\" + T.name + \"' has 0 rows, skipping...\")\n                    self.empty_table_count += 1\n                    self.empty_tables.append(T.name)\n                    continue\n                else:\n                    tableCreationSuccess = self.create_table(T_dst_exists, T)\n                    if not tableCreationSuccess:\n                        continue\n\n                \"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n                \"\"\"\" *** INSERT ROWS *** \"\"\"\"\"\n                \"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n                data_file_path = os.getcwd() + \"/{0}.sql\".format(T.name)\n                if os.path.isfile(data_file_path):\n                    os.remove(data_file_path)\n                # Delete the old file if it esists (i.e. if a previous run went\n                # bad and didn't clean up...)\n\n                dst_meta.reflect(self.dst_engine)\n\n                #self.tgt_insp.reflecttable(T, None)\n                t_start_dump = datetime.now()\n                t_start_load = datetime.now()\n                \n                row_buffer_size = 100000\n                if self.dst_engine.dialect.name.lower() == 'mssql' and \\\n                 not self.enable_mssql_bulk_insert:\n                    # MSSQL limits the amount of INSERTS per query\n                    row_buffer_size = 1000\n\n                if migrate_data:\n                    self.logger.info(\"Transforming & Dumping \" +\n                                     str(len(raw_rows)) +\n                                     \" total rows from table '\" +\n                                     str(T.name) +\n                                     \"' into '{0}'.\".format(data_file_path))\n                    # Create buffers of \"100000\" rows\n                    # TODO: Parameterize \"100000\" as 'buffer_size' (should be\n                    # configurable)\n                    insertionCount = (len(raw_rows) / row_buffer_size) + 1\n                    raw_row_len = len(raw_rows)\n                    self.total_rows += raw_row_len\n                    if len(raw_rows) > 0:\n                        for i in range(0, insertionCount):\n                            startRow = 0  # i * 1000\n                            endRow = row_buffer_size  # (i+1) * 1000\n                            virtualStartRow = i * row_buffer_size\n                            virtualEndRow = (i + 1) * row_buffer_size\n                            if virtualEndRow > raw_row_len:\n                                virtualEndRow = raw_row_len\n                                endRow = raw_row_len\n                            self.logger.info(\n                                \" ({0}) -- Transforming rows: \".format(\n                                    T.name) +\n                                str(virtualStartRow) +\n                                \" -> \" +\n                                str(virtualEndRow) +\n                                \"...({0} Total)\".format(\n                                    str(raw_row_len)))\n                            self.transform_data(\n                                T_src, raw_rows[startRow:endRow])\n                            self.logger.info(\n                                \" ({0}) -- Dumping rows: \"\n                                .format(T.name) +\n                                str(virtualStartRow) +\n                                \" -> \" +\n                                str(virtualEndRow) +\n                                \" to '{1}.sql'...({0} Total)\"\n                                .format(str(raw_row_len), T.name) +\n                                \"[Table {0}/{1}]\".format(str(t_idx), str(t_total)))\n                            self.dump_data(\n                                T_dst_exists, T, raw_rows[startRow:endRow],\n                                pks, Session)\n                            del raw_rows[startRow:endRow]\n\n                        #######################################################\n                        # Now *actually* load the data via fast-CLI utilities\n                        #######################################################\n                        t_start_load = datetime.now()\n                        # From <table_name>.sql\n                        self.send_data(\n                            T.name, self.current_ordered_table_columns)\n\n                t_stop_load = datetime.now()\n\n                ###################################\n                # Calculate operation time... ###\n                ###################################\n\n                extraction_dt = t_start_transform - t_start_extract\n                extraction_dt_str = str(\n                    extraction_dt.seconds / 60) + \"m:\" + \\\n                    str(extraction_dt.seconds % 60) + \"s\"\n\n                transform_dt = t_start_dump - t_start_transform\n                transform_dt_str = str(\n                    transform_dt.seconds / 60) + \"m:\" + \\\n                    str(transform_dt.seconds % 60) + \"s\"\n\n                dump_dt = t_start_load - t_start_dump\n                dump_dt_str = str(dump_dt.seconds / 60) + \\\n                    \"m:\" + str(dump_dt.seconds % 60) + \"s\"\n\n                load_dt = t_stop_load - t_start_load\n                load_dt_str = str(load_dt.seconds / 60) + \\\n                    \"m:\" + str(load_dt.seconds % 60) + \"s\"\n\n                self.times[table_name][\n                    'Extraction Time (From Source)'] = extraction_dt_str\n                self.times[table_name][\n                    'Transform Time (Schema)'] = transform_dt_str\n                self.times[table_name][\n                    'Data Dump Time (To File)'] = dump_dt_str\n                self.times[table_name]['Load Time (Into Target)'] = load_dt_str\n                # End first table loop...\n\n    def add_indexes(self, destination_database_url):\n        dst_meta = MetaData()\n        dst_meta.reflect(bind=self.dst_engine)\n        dst_meta.bind = self.dst_engine\n        Session = sessionmaker(bind=self.dst_engine)\n        \"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n        \"\"\" *** INDEX *** \"\"\"\n        \"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\n        ############################\n        # Add Indexes (Some db's require indexed references...\n        ############################\n        idx_count = 0\n        for table_name in self.indexes.keys():\n            t_start_index = datetime.now()\n            pre_transformed_table_name = table_name\n\n            indexes = self.indexes.get(table_name)\n            ####################################\n            # Check to see if table_name\n            # has been transformed...\n            ####################################\n            table_transform = self.schema_transformer.table_transformations\\\n                .get(table_name)\n            column_transformer = self.schema_transformer.column_transformations\\\n                .get(table_name)\n            if table_transform and table_transform.new_table not in [\"\", None]:\n                # Update the table_name\n                table_name = table_transform.new_table\n            this_idx_count = 0\n            self.logger.info(\"Creating indexes for '\" + table_name + \"'...\")\n            for i in indexes:\n                self.logger.info(str(i))\n                self.total_indexes += 1\n                session = Session()\n                col = i['column_names']\n                continueFlag = False\n                if len(col) == 0:\n                    self.logger.warning(\"Index has no columns! This may be an \" +\n                        \"issue with the metadata reflection function...\" +\n                        \"\\n** This issue is known on MSSQL Sources\")\n                    continueFlag = True\n                unique = i['unique']\n                # Name the index something compatible across all databases\n                # (i.e. can't create Idx w/ same name as column in Postgresql)\n                name = \"IDX_\" + table_name + \"__\" + \\\n                    \"_\".join(col) + \"__\" + str(this_idx_count)\n                # Max length of identifier is 63 characters in \n                # postgresql & mysql\n                if len(name) > 63:\n                    name = name[:60] + \"_\" + str(this_idx_count)\n                # number prefix guarentees uniqueness (i.e. if multiple idx's\n                # on one column)\n                cols = ()\n                self.logger.info(\n                    \"Checking validity of data indexed by: \" +\n                    \"'{0}' (column = '{1}' - table = '{2}')\"\n                    .format(\n                        name, str(col), table_name))\n                for c in col:\n                    #####################################\n                    # Check for Column Transformations...\n                    #####################################\n                    if column_transformer and\\\n                     column_transformer.get(c) and\\\n                     column_transformer[c].new_column not in [\"\", None]:\n                        c = column_transformer[c].new_column\n                    #####################################\n                    # Check to see if the table and colum nexist\n                    #####################################\n                    tableHolder = dst_meta.tables.get(table_name)\n                    if tableHolder is None:\n                        continueFlag = True\n                        self.logger.warning(\n                            \"Skipping index '\" + str(name) + \"' on column '\" +\n                            table_name + \".\" + c + \"' because the table DNE\" +\n                            \" in the destination DB schema.\")\n                    else:\n                        columnHolder = dst_meta.tables.get(\n                            table_name).columns.get(c)\n                        if str(columnHolder) == 'None':\n                            self.logger.warning(\n                                \"Skipping index '\" + str(name) + \"' on col\" +\n                                \"' \" + table_name + \".\" + c + \"' because the\" +\n                                \" column DNE in the destination DB schema.\")\n                            continueFlag = True  # Skip to the next table...\n                        cols += (dst_meta.tables.get(table_name).columns.\n                                 get(c),)\n                if continueFlag:\n                    self.skipped_index_count += 1\n                    continue\n                    # Don't create this Index - the table/column don't exist!\n\n                I = Index(name, *cols, unique=unique)\n\n                violationCount = 0\n                if unique:\n                    ############################################\n                    # Check for Unique Constraint Violations\n                    ############################################\n                    cols_tuple = tuple(cols)\n                    # We have a composite index, let's deal with it...\n                    if len(cols_tuple) > 1:\n                        uniqueGroups = session.query(\n                            *\n                            cols_tuple).group_by(\n                            *\n                            cols_tuple).count()\n                        totalEntries = session.query(*cols_tuple).count()\n                        # The difference represents repeated combinations of\n                        # 'cols_tuple'\n                        violationCount = totalEntries - uniqueGroups\n                    else:\n                        violationCount = session.query(\n                            *\n                            cols_tuple).group_by(\n                            *\n                            cols_tuple). having(\n                            func.count(\n                                *\n                                cols_tuple) > 1).count()\n                if violationCount > 0:\n                    self.logger.error(\n                        \"Duplicates found in column '\" +\n                        str(col) +\n                        \"' for unique index '\" +\n                        name)\n                    self.unique_constraint_violations.append(\n                        name + \" (\" + str(col) + \")\")\n                    self.unique_constraint_violation_count += violationCount\n                    self.skipped_index_count += 1\n                    # TODO: Gather bad rows...\n                else:\n                    self.logger.info(\"Adding Index: \" + str(I))\n                    session.close()\n                    try:\n                        I.create(self.dst_engine)\n                    except sqlalchemy.exc.OperationalError as e:\n                        self.logger.warning(str(e) + \"\\n -- it is likely \" +\n                                            \"that the Index already exists...\")\n                        self.skipped_index_count += 1\n                        continue\n                    idx_count += 1\n                    this_idx_count += 1\n            self.logger.info(\n                \"\"\" Done. (Added '{0}' indexes to '{1}')\"\"\"\n                .format(str(this_idx_count), table_name))\n\n            t_stop_index = datetime.now()\n            index_dt = t_stop_index - t_start_index\n            self.times[pre_transformed_table_name]['Indexing Time'] = \\\n                str(index_dt.seconds / 60) + \"m:\" + \\\n                str(index_dt.seconds % 60) + \"s\"\n\n        self.index_count = idx_count\n\n    def add_fks(self, destination_database_url):\n        ############################\n        # Add FKs\n        ############################\n        dst_meta = MetaData()\n        \n        if self.dst_engine.dialect.name.lower() == \"mssql\":\n            raise Exception(\n                \"Adding Constraints to MSSQL is not supported\" +\n                \" by sqlalchemy_migrate...\")\n        dst_meta.reflect(bind=self.dst_engine)\n        dst_meta.bind = self.dst_engine\n        Session = sessionmaker(bind=self.dst_engine)\n        ##########################\n        # HERE BE HACKS!!!!\n        ##########################\n        \"\"\"\n        Problem: often times when porting DBs, data is old, not properly\n        constrained and overall messy. FK constraints get violated without DBAs\n        knowing it (in engines that don't enforce or support FK constraints)\n\n        Hack: Turn off FK checks when porting FKs..\n\n        Better Solution: ...would be to insert data AFTER fks are inserted, row\n        by row, and ask the user to correct the row in question, or delete it,\n        this is more of a 'transform' operation than a 'Constraint' op...\n        \"\"\"\n\n        if self.dst_engine.dialect.name.upper() == \"MYSQL\":\n            self.dst_engine.execute(\"SET foreign_key_checks = 0\")\n        elif self.dst_engine.dialect.name.upper() == \"POSTGRESQL\":\n            self.logger.warning(\n                \"Can't disable foreign key checks on POSTGRSQL\")\n        else:\n            self.logger.warning(\"Can't disable foreign key checks...\")\n\n        inspector = self.tgt_insp\n        for table_name in self.fks.keys():\n            pre_transformed_table_name = table_name\n            t_start_constraint = datetime.now()\n            fks = self.fks[table_name]\n            ####################################\n            # Check to see if table_name\n            # has been transformed...\n            ####################################\n            table_transform = self.schema_transformer.table_transformations.get(\n                table_name)\n            if table_transform and table_transform.new_table not in [\"\", None]:\n                # Update the table_name\n                table_name = table_transform.new_table\n            self.logger.info(\n                \"Adding FKs to table '{0}' (previously {1})\".format(\n                    table_name, pre_transformed_table_name))\n            ########################\n            # Check that constrained table\n            # exists in destiation DB schema\n            ########################\n\n            T = Table(table_name, dst_meta)\n            try:\n                inspector.reflecttable(T, None)\n            except sqlalchemy.exc.NoSuchTableError as e:\n                self.logger.warning(\n                    \"Skipping FK constraints on table '\" +\n                    str(table_name) +\n                    \"' because the constrained table does not\" +\n                    \" exist in the destination DB schema.\")\n                self.skipped_fk_count += len(self.fks[table_name])\n                self.total_fks += len(self.fks[table_name])\n                continue  # on to the next table...\n\n            for fk in fks:\n                cons_column_transformer = \\\n                        self.schema_transformer.column_transformations.get(\n                         pre_transformed_table_name)\n                self.total_fks += 1\n                session = Session()\n                #####################################\n                # Check for Column Transformations...\n                #####################################\n                constrained_columns = []\n                for c in fk['constrained_columns']:\n                    if cons_column_transformer and \\\n                     cons_column_transformer.get(c) and \\\n                     cons_column_transformer[c].new_column not in [\"\", None]:\n                        c = cons_column_transformer[c].new_column\n                    constrained_columns.append(c)\n                constrained_cols = filter(lambda c: c is not None,\n                        map(lambda x: T.columns.get(x),\n                              constrained_columns))\n                         \n\n                ################################\n                # Check that the constrained columns\n                # exists in the destiation db schema\n                ################################\n                if len(constrained_cols) < len(fk['constrained_columns']):\n                    self.logger.warning(\"Skipping FK constraint '\" +\n                                        constraint_name +\n                                        \"' because constrained columns '\" +\n                                        str(fk['constrained_columns']) +\n                                        \"' on table '\" +\n                                        str(table_name) +\n                                        \"' don't exist in the destination \" +\n                                        \"DB schema.\")\n                    session.close()\n                    self.skipped_fk_count += 1\n                    continue\n                ref_table = fk['referred_table']\n\n                ####################################\n                # Check to see if table_name\n                # has been transformed...\n                ####################################\n                table_transform = \\\n                    self.schema_transformer.table_transformations.get(\n                                  ref_table)\n                ref_column_transformer = \\\n                    self.schema_transformer.column_transformations.get(\n                                  ref_table)\n                if table_transform and table_transform.new_table not in [\n                        \"\", None]:\n                    # Update the table_name\n                    ref_table = table_transform.new_table\n                T_ref = Table(ref_table, dst_meta)\n                ############################\n                # Check that referenced table\n                # exists in destination DB schema\n                ############################\n                constraint_name = \"FK__{0}__{1}\".format(\n                    table_name.upper(), T_ref.name.upper())\n                if len(constraint_name) > 63:\n                    constraint_name = constraint_name[:63]\n                \n                try:\n                    inspector.reflecttable(T_ref, None)\n                except sqlalchemy.exc.NoSuchTableError as e:\n                    self.logger.warning(\n                        \"Skipping FK constraint '\" +\n                        constraint_name +\n                        \"' because referenced table '\" +\n                        ref_table +\n                        \"' doesn't exist in the destination DB schema.\" +\n                        \" (FK Dependency not met)\")\n                    session.close()\n                    self.skipped_fk_count += 1\n                    continue\n                ############################\n                # Check that referenced columns\n                # Exist in destination DB schema\n                ############################\n                ref_columns = []\n                for c in fk['referred_columns']:\n                    if ref_column_transformer and \\\n                     ref_column_transformer.get(c) and \\\n                     ref_column_transformer[c].newColumns not in [\"\", None]:\n                        c = ref_column_transformer[c].newColumn\n                    ref_columns.append(c)\n                referred_columns = map(\n                    lambda x: T_ref.columns.get(x), ref_columns)\n                self.logger.info(\"Ref Columns: \" + str(ref_columns))\n                if len(referred_columns) < len(fk['referred_columns']):\n                    self.logger.warning(\"Skipping FK constraint '\" +\n                                        constraint_name +\n                                        \"' because referenced columns '\" +\n                                        str(fk['referred_columns']) +\n                                        \"' on table '\" +\n                                        str(ref_table) +\n                                        \"' don't exist in the destination \" +\n                                        \"DB schema.\")\n                    session.close()\n                    self.skipped_fk_count += 1\n                    continue\n\n                ##################################\n                # Check for referential integrity violations\n                ##################################\n                if self.check_referential_integrity:\n                    if self.dst_engine.dialect.name.upper(\n                    ) in [\"MYSQL\", \"POSTGRESQL\"]:  # HACKS\n                        self.logger.info(\n                            \"Checking referential integrity of '\" +\n                            str(table_name) +\n                            \".\" +\n                            str(constrained_columns) +\n                            \" -> '\" +\n                            str(\n                                T_ref.name) +\n                            \".\" +\n                            str(ref_columns) +\n                            \"'\")\n                        t = session.query(\n                            T_ref.columns.get(\n                                referred_columns[0].name))\n                        query2 = session.query(T)\n\n                        q = query2.filter(\n                            and_(\n                                ~T.columns.get(\n                                    constrained_cols[0].name).in_(t),\n                                T.columns.get(\n                                    constrained_cols[0].name) is not None))\n                        bad_rows = session.execute(q).fetchall()\n\n                        if len(bad_rows) > 0:\n                            self.logger.warning(\"FK from '\" +\n                                                T.name +\n                                                \".\" +\n                                                constrained_cols[0].name +\n                                                \" -> \" +\n                                                T_ref.name +\n                                                \".\" +\n                                                referred_columns[0].name +\n                                                \" was violated '\" +\n                                                str(len(bad_rows)) +\n                                                \"' times.\")\n                            self.referential_integrity_violations += len(\n                                bad_rows)\n                            for row in bad_rows:\n                                self.riv_arr.append(str(row.values()))\n\n                    else:\n                        self.logger.warning(\n                            \"Adding constraints only supported/tested for \" +\n                            \"MySQL\")\n                self.logger.info(\"Adding FK '\" + constraint_name + \"' to '\" +\n                                 table_name + \"'\")\n                session.close()\n                cons = ForeignKeyConstraint(\n                    name=constraint_name,\n                    columns=constrained_cols,\n                    refcolumns=referred_columns,\n                    table=T)\n                # Loop to handle tables that reference other tables w/ multiple\n                # columns & FKs\n                creation_succesful = False\n                max_fks = 15\n                cnt = 0\n                while not creation_succesful:\n                    try:\n                        cons.create(self.dst_engine)\n                        creation_succesful = True\n                    except sqlalchemy.exc.OperationalError as e:\n                        # MySQL Exception\n                        self.logger.warning(\n                            str(e) + \"\\n ---> an FK on this table already \" +\n                            \"references the ref_table...appending '{0}' to\" +\n                            \" FK's name and trying again...\".format(\n                                str(cnt)))\n                        cons = ForeignKeyConstraint(\n                            name=constraint_name +\n                            \"_{0}\".format(\n                                str(cnt)),\n                            columns=constrained_cols,\n                            refcolumns=referred_columns,\n                            table=T)\n                        cnt += 1\n                        if cnt == max_fks:\n                            self.logger.error(\n                                \"FK creation was unsuccesful \" +\n                                \"(surpassed max number of FKs on 1 table\" +\n                                \"which all reference another table)\")\n                            self.skipped_fk_count += 1\n                            break\n                    except sqlalchemy.exc.ProgrammingError as e:\n                        # PostgreSQL Exception\n                        self.logger.warning(\n                            str(e) +\n                            \"\\n ---> an FK on this table already references \" +\n                            \"the ref_table...appending '{0}' to FK's name \" +\n                            \"and trying again...\".format(\n                                str(cnt)))\n                        cons = ForeignKeyConstraint(\n                            name=constraint_name +\n                            \"_{0}\".format(\n                                str(cnt)),\n                            columns=constrained_cols,\n                            refcolumns=referred_columns,\n                            table=T)\n                        cnt += 1\n                        if cnt == max_fks:\n                            self.logger.error(\n                               \"FK creation was unsuccesful (surpassed max \" +\n                               \"number of FKs on 1 table which all reference\" +\n                               \" another table)\")\n                            self.skipped_fk_count += 1\n                            break\n\n                self.fk_count += 1\n            t_stop_constraint = datetime.now()\n            constraint_dt = t_stop_constraint - t_start_constraint\n            constraint_dt_str = str(constraint_dt.seconds / 60) + \"m:\" +\\\n                str(constraint_dt.seconds % 60) + \"s\"\n\n            self.times[pre_transformed_table_name][\n                'Constraint Time'] = constraint_dt_str\n\n    def print_timings(self):\n        stop = datetime.now()\n        dt = stop - self.start\n        timeString = \"\"\n        # if dt.seconds > 3600:\n        #    timeString += (str(int(dt.seconds / 3600)) + \":\")\n        timeString += str(dt.seconds / 60) + \"m:\" + str(dt.seconds % 60) + \"s\"\n        self.logger.info(\"\"\"\n       ========================\n       === * Sync Summary * ===\n       ========================\\n\n       Total Tables:                     {0}\n       -- Empty Tables   (skipped)       {1}\n       -- Deleted Tables (skipped)       {15}\n       -- Synced Tables                  {2}\\n\n       ========================\\n\n       Total Columns:                    {3}\n       -- Empty Columns   (skipped)      {4}\n       -- Deleted Columns (skipped)      {16}\n       -- Synced Columns                 {5}\\n\n       ========================\\n\n       Total Indexes                     {8}\n       -- Skipped Indexes                {11}\n       -- Synced Indexes                 {12}\\n\n       ========================\\n\n       Total FKs                         {9}\n       -- Skipped FKs                    {13}\n       -- Synced FKs                     {14}\\n\n       ========================\\n\n       Referential Integrity Violations: {6}\n       ========================\\n\n       Unique Constraint Violations:     {10}\n       ========================\\n\n       Total Time:                       {7}\n       Total Rows:                       {17}\n       Rows per Minute:                  {18}\\n\\n\"\"\".format(\n            str(self.table_count),\n            str(self.empty_table_count),\n            str(self.table_count - self.empty_table_count),\n            str(self.column_count),\n            str(self.null_column_count),\n            str(self.column_count - self.null_column_count),\n            str(self.referential_integrity_violations),\n            timeString,\n            str(self.total_indexes),\n            str(self.total_fks),\n            str(self.unique_constraint_violation_count),\n            str(self.skipped_index_count),\n            str(self.index_count),\n            str(self.skipped_fk_count),\n            str(self.fk_count),\n            str(self.deleted_table_count),\n            str(self.deleted_column_count),\n            str(self.total_rows),\n            str(self.total_rows / ((dt.seconds / 60) or 1))))\n        # self.logger.warning(\"Referential Integrity \" +\n        # \"Violations: \\n\" + \"\\n\".join(self.riv_arr))\n        self.logger.warning(\n            \"Unique Constraint Violations: \" +\n            \"\\n\".join(\n                self.unique_constraint_violations))\n\n        self.logger.info(\"\"\"\n       =========================\n       === ** TIMING INFO ** ===\n       =========================\n                _____\n             _.'_____`._\n           .'.-'  12 `-.`.\n          /,' 11      1 `.\\\\\n         // 10      /   2 \\\\\\\\\n        ;;         /       ::\n        || 9  ----O      3 ||\n        ::                 ;;\n         \\\\\\\\ 8           4 //\n          \\`. 7       5 ,'/\n           '.`-.__6__.-'.'\n            ((-._____.-))\n            _))       ((_\n           '--'       '--'\n       __________________________\n       \"\"\")\n        ordered_timings = [\n            \"Extraction Time (From Source)\",\n            \"Transform Time (Schema)\",\n            \"Data Dump Time (To File)\",\n            \"Load Time (Into Target)\",\n            \"Indexing Time\",\n            \"Constraint Time\"]\n        for (table_name, timings) in self.times.iteritems():\n            self.logger.info(table_name)\n            for key in ordered_timings:\n                self.logger.info(\"-- \" + str(key) + \": \" +\n                    str(timings.get(key) or 'N/A'))\n            self.logger.info(\"_________________________\")\n\n        self.schema_transformer.failed_transformations = list(\n            self.schema_transformer.failed_transformations)\n        if len(self.schema_transformer.failed_transformations) > 0:\n            self.logger.critical(\n                \"\\n\".join(self.schema_transformer.failed_transformations))\n            self.logger.critical(\"\"\"\n           !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n           !!!! * '{0}' Old Columns had failed transformations !!!!\n           !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!\n           \"\"\".format(str(len(self.schema_transformer.failed_transformations))))\n\n            self.logger.critical(\n                \"\\n\".join(self.schema_transformer.failed_transformations))\n\n        ###########################################\n        # Write 'Deleted' columns out to a file...\n        ###########################################\n        removedColumns = self.deleted_columns + self.null_columns\n        with open(\"deleted_columns.csv\", \"w\") as fp:\n            fp.write(\"\\n\".join(map(lambda c:\n                     c.replace(\".\", \",\"), removedColumns)))\n"
  },
  {
    "path": "etlalchemy/ETLAlchemyTarget.py",
    "content": "from etlalchemy_exceptions import DBApiNotFound\nfrom sqlalchemy_utils import database_exists, create_database, drop_database\nfrom sqlalchemy import create_engine, MetaData\n# import dill\nimport logging\n\n\nclass ETLAlchemyTarget():\n    def __init__(self, conn_string, drop_database=False):\n        self.drop_database = drop_database\n        self.conn_string = conn_string\n        self.dst_engine = None\n        ##########################\n        # Right now we only assume  sql database...\n        ##########################\n        self.sources = []\n        self.logger = logging.getLogger(\"ETLAlchemyTarget\")\n        for h in list(self.logger.handlers):\n            # Clean up any old loggers...\n            # (useful during testing w/ multiple log_files)\n            self.logger.removeHandler(h)\n        handler = logging.StreamHandler()\n        formatter = logging.Formatter('%(name)s (%levelname)s) - %(message)s')\n        self.logger.addHandler(handler)\n        self.logger.setLevel(logging.INFO)\n    # Add an ETLAlchemySource to the list of 'sources'\n    \"\"\" Each 'migrator' represents a source SQL DB \"\"\"\n    def addSource(self, source):\n        if not getattr(source, 'migrate'):\n            raise Exception(\"Source '\" + str(source) +\n                            \"' has no function 'migrate'...\")\n        self.sources.append(source)\n\n    def migrate(self, migrate_schema=True, migrate_data=True,\n                migrate_fks=True, migrate_indexes=True):\n        try:\n            self.dst_engine = create_engine(self.conn_string)\n        except ImportError as e:\n            raise DBApiNotFound(self.conn_string)\n        if self.drop_database:\n            self.logger.info(self.dst_engine.dialect.name)\n            ############################\n            # Hack for SQL Server using DSN's\n            # and not havine DB name in connection_string\n            ############################\n            if self.dst_engine.dialect.name.upper() == \"MSSQL\":\n                db_name = list(self.dst_engine.execute(\n                    \"SELECT DB_NAME()\").fetchall())[0][0]\n                self.logger.warning(\n                        \"Can't drop database {0} on MSSQL, \" +\n                        \"dropping tables instead...\".format(db_name))\n                m = MetaData()\n                m.bind = self.dst_engine\n                m.reflect()\n                m.drop_all()\n            elif self.dst_engine.dialect.name.upper() == \"ORACLE\":\n                db_name = list(self.dst_engine.execute(\n                    \"SELECT SYS_CONTEXT('userenv','db_name') \" +\n                    \"FROM DUAL\").fetchall())[0][0]\n                self.logger.warning(\n                        \"Can't drop database {0} on ORACLE, \" +\n                        \"dropping tables instead...\".format(db_name))\n                m = MetaData()\n                m.bind = self.dst_engine\n                m.reflect()\n                m.drop_all()\n            else:\n                if self.dst_engine.url and database_exists(self.dst_engine.url):\n                    self.logger.warning(self.dst_engine.url)\n                    self.logger.warning(\n                            \"Dropping database '{0}'\"\n                            .format(self.conn_string.split(\"/\")[-1]))\n                    drop_database(self.dst_engine.url)\n                    self.logger.info(\n                            \"Creating database '{0}'\"\n                            .format(self.conn_string.split(\"/\")[-1]))\n                    create_database(self.dst_engine.url)\n                else:\n                    self.logger.info(\"Database DNE...no need to nuke it.\")\n                    create_database(self.dst_engine.url)\n        for source in self.sources:\n            self.logger.info(\n                    \"Sending source '\" + str(source) + \"' to destination '\" +\n                    str(self.conn_string) + \"'\")\n            source.migrate(self.conn_string, migrate_schema=migrate_schema,\n                           migrate_data=migrate_data)\n            if migrate_indexes:\n                source.add_indexes(self.conn_string)\n            if migrate_fks:\n                if self.dst_engine.dialect.name.lower() == \"mssql\":\n                    self.logger.warning(\n                            \"** SKIPPING 'Add Foreign Key Constraints' \" +\n                            \"BECAUSE 'sqlalchemy_migrate' DOES NOT \" +\n                            \"SUPPORT fk.create() ON *MSSQL*\")\n                else:\n                    source.add_fks(self.conn_string)\n            source.print_timings()\n"
  },
  {
    "path": "etlalchemy/__init__.py",
    "content": "from ETLAlchemySource import ETLAlchemySource\nfrom ETLAlchemyTarget import ETLAlchemyTarget\n"
  },
  {
    "path": "etlalchemy/etlalchemy_exceptions.py",
    "content": "class DBApiNotFound(Exception):\n    def __init__(self, conn_string):\n        dialect_to_db_apis = {\n            'oracle+cx_oracle': 'cx_Oracle',\n            'mysql': 'MySQL-python',\n            'postgresql': 'psycopg2',\n            'mssql+pyodbc': 'pyodbc',\n            'sqlite': 'sqlite3'\n        }\n        dialect_to_walkthrough_urls = {\n            'oracle+cx_oracle': 'sharrington/databases/oracle/install-cx_oracle-mac',\n        }\n        dialect = conn_string.split(\":\")[0]\n        db_api = dialect_to_db_apis.get(dialect) or \\\n            \"No driver found for dialect '{0}'\".format(dialect)\n        self.msg = \"\"\"\n  ********************************************************\n  ** While creating the engine for '{0}', SQLAlchemy tried to\n  ** import the DB API module '{1}' but failed.\n  ********************************************************\n  **  + This is because 1 of 2 reasons:\n  **  1.) You forgot to install the DB API module '{1}'.\n  **  --> (Try: 'pip install {1}')\n  **  2.) If the above step fails, you most likely forgot to\n  **  --> install the actual database driver on your local\n  **  --> machine! The driver is needed in order to install\n  **  --> the Python DB API ('{1}').\n  **  --> (see the following link for instructions):\n  ** https://thelaziestprogrammer.com/{2}\n  **********************************************************\n        \"\"\".format(conn_string, db_api, dialect_to_walkthrough_urls.get(dialect) or \"\")\n\n    def __str__(self):\n        return self.msg\n"
  },
  {
    "path": "etlalchemy/literal_value_generator.py",
    "content": "import shutil\nimport decimal\nimport datetime\n# Find the best implementation available on this platform\ntry:\n    from cStringIO import StringIO\nexcept:\n    from StringIO import StringIO\n\ndef _generate_literal_value_for_csv(value, dialect):\n    dialect_name = dialect.name.lower()\n    \n    if isinstance(value, basestring):\n        if dialect_name in ['sqlite', 'mssql']:\n            # No support for 'quote' enclosed strings\n            return \"%s\" % value\n        else:\n            value = value.replace('\"', '\"\"')\n            return \"\\\"%s\\\"\" % value\n    elif value is None:\n        return \"NULL\"\n    elif isinstance(value, bool):\n        return \"%s\" % int(value)\n    elif isinstance(value, (float, int, long)):\n        return \"%s\" % value\n    elif isinstance(value, decimal.Decimal):\n        return str(value)\n    elif isinstance(value, datetime.datetime):\n        if dialect_name == \"mysql\":\n            return '%02d-%02d-%02d %02d:%02d:%02d' %\\\n                    (value.year,value.month,value.day,value.hour,value.minute,value.second)\n        elif dialect_name == \"oracle\":\n            return \"TO_DATE('%s','YYYY-MM-DD HH24:MI:SS')\" %\\\n                    ('%02d-%02d-%02d %02d:%02d:%02d' %\\\n                        (value.year,value.month,value.day,value.hour,value.minute,value.second))\n                #value.strftime(\"%Y-%m-%d %H:%M:%S\")\n        elif dialect_name == \"postgresql\":\n            return '%02d-%02d-%02d %02d:%02d:%02d' %\\\n                    (value.year,value.month,value.day,value.hour,value.minute,value.second)\n            #return '%Y-%m-%d %H:%M:%S'.format(value)\n            #return \"\\\"%s\\\"\" % value.strftime(\"%Y-%m-%d %H:%M:%S\")\n        elif dialect_name == \"mssql\":\n            #return \"'%s'\" % value.strftime(\"%m/%d/%Y %H:%M:%S.%p\")\n            return '%02d%02d%02d %02d:%02d:%02d.0' %\\\n                    (value.year,value.month,value.day,value.hour,value.minute,value.second)\n\n        elif dialect_name == \"sqlite\":\n            #return \"%s\" % value.strftime(\"%Y-%m-%d %H:%M:%S.%f\")\n            return '%02d-%02d-%02d %02d:%02d:%02d.0' %\\\n                    (value.year,value.month,value.day,value.hour,value.minute,value.second)\n        else:\n            raise NotImplementedError(\n                    \"No support for engine with dialect '%s'. \" +\n                    \"Implement it here!\" % dialect.name)\n    elif isinstance(value, datetime.date):\n        if dialect_name == \"mysql\":\n            return '%02d-%02d-%02d' %\\\n                    (value.year,value.month,value.day)\n        elif dialect_name == \"oracle\":\n            return \"TO_DATE('%s', 'YYYY-MM-DD')\" %\\\n                    ('%02d-%02d-%02d' % (value.year,value.month,value.day))\n        elif dialect_name == \"postgresql\":\n            return '%02d-%02d-%02d' %\\\n                    (value.year,value.month,value.day)\n        elif dialect_name == \"mssql\":\n            return \"'%02d/%02d/%02d'\" %\\\n                    (value.year,value.month,value.day)\n        elif dialect_name == \"sqlite\":\n            return \"%02d-%02d-%02d\" %\\\n                    (value.year,value.month,value.day)\n        else:\n            raise NotImplementedError(\n                    \"No support for engine with dialect '%s'.\" +\n                    \"Implement it here!\" % dialect.name)\n    \n    else:\n        raise NotImplementedError(\n                    \"Don't know how to literal-quote value %r\" % value)\n\n\ndef _generate_literal_value(value, dialect):\n    dialect_name = dialect.name.lower()\n    if isinstance(value, basestring):\n        value = value.replace(\"'\", \"''\")\n        return \"'%s'\" % value\n    elif value is None:\n        return \"NULL\"\n    elif isinstance(value, bool):\n        return \"%s\" % int(value)\n    elif isinstance(value, (float, int, long)):\n        return \"%s\" % value\n    elif isinstance(value, decimal.Decimal):\n        return str(value)\n    elif isinstance(value, datetime.datetime):\n        #if dialect_name == \"mysql\":\n        #    return \"STR_TO_DATE('%s','%%Y-%%m-%%d %%H:%%M:%%S')\" %\\\n        #        value.strftime(\"%Y-%m-%d %H:%M:%S\")\n        if dialect_name == \"oracle\":\n            return \"TO_DATE('%s','YYYY-MM-DD HH24:MI:SS')\" %\\\n                    ('%02d-%02d-%02d %02d:%02d:%02d' %\\\n                        (value.year,value.month,value.day,value.hour,value.minute,value.second))\n        #elif dialect_name == \"postgresql\":\n        #    return \"to_date('%s', 'YYYY-MM-DD HH24:MI:SS')\" %\\\n        #        value.strftime(\"%Y-%m-%d %H:%M:%S\")\n        elif dialect_name == \"mssql\":\n            #return \"'%s'\" % value.strftime(\"%Y%m%d %H:%M:%S %p\")\n            return \"'%02d%02d%02d %02d:%02d:%02d 0'\" %\\\n                    (value.year,value.month,value.day,value.hour,value.minute,value.second)\n        #elif dialect_name == \"sqlite\":\n        #    return \"'%s'\" % value.strftime(\"%Y-%m-%d %H:%M:%S.%f\")\n        else:\n            raise NotImplementedError(\n                    \"No support for engine with dialect '%s'. \" +\n                    \"Implement it here!\" % dialect.name)\n    elif isinstance(value, datetime.date):\n        #if dialect_name == \"mysql\":\n        #    return \"STR_TO_DATE('%s','%%Y-%%m-%%d')\" %\\\n        #        value.strftime(\"%Y-%m-%d\")\n        if dialect_name == \"oracle\":\n            return \"TO_DATE('%s', 'YYYY-MM-DD')\" %\\\n                ('%02d-%02d-%02d' % (value.year,value.month,value.day))\n        #elif dialect_name == \"postgresql\":\n        #    return \"to_date('%s', 'YYYY-MM-DD')\" %\\\n        #        value.strftime(\"%Y-%m-%d\")\n        elif dialect_name == \"mssql\":\n            return \"'%02d%02d%02d'\" % (value.year,value.month,value.day)\n        #elif dialect_name == \"sqlite\":\n        #    return \"'%s'\" % value.strftime(\"%Y-%m-%d\")\n        else:\n            raise NotImplementedError(\n                    \"No support for engine with dialect '%s'. \" +\n                    \"Implement it here!\" % dialect.name)\n\n    else:\n        raise NotImplementedError(\n            \"Don't know how to literal-quote value %r\" % value)\n\n\ndef dump_to_oracle_insert_statements(fp, engine, table, raw_rows, columns):\n    ##################################\n    # No Bulk Insert available in Oracle\n    ##################################\n    # TODO: Investigate \"sqlldr\" CLI utility to handle this load...\n    lines = []\n    lines.append(\"INSERT INTO {0} (\".format(table) +\n                 \",\".join(columns) +\n                 \")\\n\")\n    num_rows = len(raw_rows)\n    dialect = engine.dialect\n    for i in range(0, num_rows):\n        if i == num_rows-1:\n            # Last row...\n            lines.append(\"SELECT \" +\n                         \",\".join(map(lambda c: _generate_literal_value(\n                             c, dialect), raw_rows[i])) +\n                         \" FROM DUAL\\n\")\n        else:\n            lines.append(\"SELECT \" +\n                         \",\".join(map(lambda c: _generate_literal_value(\n                             c, dialect), raw_rows[i])) +\n                         \" FROM DUAL UNION ALL\\n\")\n    fp.write(''.join(lines))\n\n\n# Supported by [MySQL, Postgresql, sqlite, SQL server (non-Azure) ]\ndef dump_to_csv(fp, table_name, columns, raw_rows, dialect):\n    lines = []\n    separator = \",\"\n    # Determine the separator based on Target DB \n    if dialect.name.lower() in [\"sqlite\"]:\n        separator = \"|\"\n    elif dialect.name.lower() in [\"mssql\"]:\n        separator = \"|,\"\n        \n    num_cols = len(raw_rows[0])\n    num_rows = len(raw_rows)\n    out = StringIO()\n    for i in range(0, num_rows):\n        for j in range(0, num_cols - 1):\n            out.write(_generate_literal_value_for_csv(raw_rows[i][j], dialect))\n            out.write(separator)\n        # Print the last column w/o the separator\n        out.write(_generate_literal_value_for_csv(raw_rows[i][num_cols - 1], dialect) + \"\\n\")\n    out.seek(0)\n    fp.write(out.getvalue())\n            \n\ndef generate_literal_value(value, dialect, type_):\n    \"\"\"Render the value of a bind parameter as a quoted literal.\n\n    This is used for statement sections that do not accept bind paramters\n    on the target driver/database.\n\n    This should be implemented by subclasses using the quoting services\n    of the DBAPI.\n\n    \"\"\"\n    return _generate_literal_value(value, dialect)\n\n\ndef dump_to_sql_statement(statement, fp, bind=None, table_name=None):\n    \"\"\"\n    print a query, with values filled in\n    for debugging purposes *only*\n    for security, you should always separate queries from their values\n    please also note that this function is quite slow\n    \"\"\"\n    import sqlalchemy.orm\n    if isinstance(statement, sqlalchemy.orm.Query):\n        if bind is None:\n            bind = statement.session.get_bind(\n                    statement._mapper_zero_or_none()\n            )\n        statement = statement.statement\n    elif bind is None:\n        bind = statement.bind\n\n    dialect = bind.dialect\n    compiler = statement._compiler(dialect)\n\n    class LiteralCompiler(compiler.__class__):\n        def visit_bindparam(\n                self, bindparam, within_columns_clause=False,\n                literal_binds=False, **kwargs\n        ):\n            return super(LiteralCompiler, self).render_literal_bindparam(\n                    bindparam, within_columns_clause=within_columns_clause,\n                    literal_binds=literal_binds, **kwargs\n            )\n\n        def render_literal_value(self, value, type_):\n            return generate_literal_value(value, dialect, type_)\n\n    compiler = LiteralCompiler(dialect, statement)\n\n    stmt = compiler.process(statement) + \";\\n\"\n    if dialect.name.lower() == \"mssql\":\n        stmt = \"SET IDENTITY_INSERT {0} ON \".format(table_name) + stmt\n\n    fp.write(stmt)\n"
  },
  {
    "path": "etlalchemy/schema_transformer.py",
    "content": "import logging\nimport csv\nimport sqlalchemy\n\n\nclass SchemaTransformer():\n\n    class TableTransformation():\n        def __init__(self, stRow):\n            self.delete = stRow['Delete'].lower() in [\"true\", \"1\"]\n            self.old_table = stRow['Table Name']\n            self.new_table = stRow['New Table Name']\n\n        def __str__(self):\n            return \"({0} -> {1}...Delete = {2})\".\\\n                format(self.old_table, self.new_table, str(self.delete))\n\n    class ColumnTransformation():\n        def __init__(self, stRow):\n            self.delete = stRow['Delete'].lower() in [\"true\", \"1\"]\n            self.old_table = stRow['Table Name']\n            self.old_column = stRow['Column Name']\n            self.new_column = stRow['New Column Name']\n            self.new_type = stRow['New Column Type']\n\n        def _new_type(self):\n            return getattr(sqlalchemy.types, self.new_type)\n\n        def __str__(self):\n            return self.old_table + \".\" + self.old_column\n    def schedule_deletion_of_column(self, col, table):\n        st = self.ColumnTransformation({\n            'Delete': \"true\",\n            'Table Name': table,\n            'Column Name': col,\n            'New Column Name': '',\n            'New Column Type': ''\n        })\n        self.logger.info(\"Scheduling '{0}' to be deleted due to column being empty\".format(col))\n        if not self.column_transformations.get(st.old_table):\n            # No column transformations exist for the table\n            self.column_transformations[st.old_table] = {}\n            self.column_transformations[st.old_table][st.old_column] = st\n        elif self.column_transformations[st.old_table].get(st.old_column):\n            # There ALREADY EXISTS a transformation on this column, UPDATE IT\n            self.column_transformations[st.old_table][st.old_column].delete = True\n            self.column_transformations[st.old_table][st.old_column] = st\n        else:\n            # Transformations exist on the table, not nothing on the column\n            self.column_transformations[st.old_table][st.old_column] = st\n\n    def __init__(self, column_transform_file,\n                 table_transform_file, global_renamed_col_suffixes={}):\n        self.logger = logging.getLogger(\"schema-transformer\")\n        handler = logging.StreamHandler()\n        formatter = logging.Formatter('%(name)s (%(levelname)s) - %(message)s')\n        handler.setFormatter(formatter)\n        self.logger.addHandler(handler)\n        self.logger.setLevel(logging.INFO)\n        self.column_transformations = {}\n        self.table_transformations = {}\n        self.failed_transformations = set([])\n        self.logger.propagate = False\n        self.global_renamed_col_suffixes = global_renamed_col_suffixes\n        # Load column mappings\n        if column_transform_file:\n            with open(column_transform_file, \"rU\") as fp:\n                dr = csv.DictReader(fp)\n                for row in dr:\n                    st = self.ColumnTransformation(row)\n                    if not self.column_transformations.get(st.old_table):\n                        self.column_transformations[st.old_table] = {}\n                    self.column_transformations[st.old_table][st.old_column] = st\n        # Load table mappings\n        if table_transform_file:\n            with open(table_transform_file, \"rU\") as fp:\n                dr = csv.DictReader(fp)\n                for row in dr:\n                    st = self.TableTransformation(row)\n                    self.table_transformations[st.old_table] = st\n    \n    # Returns False if deleted...\n    def transform_table(self, table):\n        thisTableTT = self.table_transformations.get(table.name.lower())\n        # Update table name\n        if thisTableTT:\n            if thisTableTT.delete:\n                return False\n            if thisTableTT.new_table not in [\"\", None]:\n                self.logger.info(\n                    \" ----> Renaming table '{0}' to '{1}'\"\n                    .format(table.name, thisTableTT.new_table))\n                table.name = thisTableTT.new_table\n                return True\n        return True\n    # Returns 'True' if an action is defined for the column...\n\n    def transform_column(self, C, tablename, columns):\n        # Find Column...\n        this_table_st = self.column_transformations.get(tablename)\n        initial_column_name = C.name\n        action_applied = False\n        idx = columns.index(C.name)\n\n        if this_table_st:\n            st = this_table_st.get(C.name)\n            if st:\n                if st.delete:\n                    # Remove the column from the list of columns...\n                    del columns[idx]\n                    action_applied = True\n                else:\n                    # Rename the column if a \"New Column Name\" is specificed\n                    if st.new_column not in [\"\", None]:\n                        self.logger.info(\n                            \" ----> Renaming column '{0}' => '{1}'\"\n                            .format(C.name, st.new_column))\n                        C.name = st.new_column\n                        columns[idx] = C.name\n                        action_applied = True\n                    # Change the type of the column if a\n                    # \"New Column Type\" is specified\n                    if st.new_type not in [\"\", None]:\n                        old_type = C.type.__class__.__name__\n                        try:\n                            C.type = st._new_type()\n                        except Exception as e:\n                            self.logger.critical(\n                                \"** Couldn't change column type of \" +\n                                \"'{0}' to '{1}'**\".\n                                format(C.name, st.new_type))\n                            self.logger.critical(e)\n                            raise e\n                    else:\n                        self.logger.warning(\n                            \"Schema transformation defined for \" +\n                            \"column '{0}', but no action was \" +\n                            \"taken...\".format(C.name))\n\n        if not action_applied:\n            # Then the column had no 'action' applied to it...\n            for k in self.global_renamed_col_suffixes.keys():\n                # Check if column name ends with specfiic suffix\n                if initial_column_name.lower().endswith(k.lower()):\n                    self.logger.info(\n                        \" ---> Renaming column '{0}' to GLOBAL \" +\n                        \" default '{1}' because it contains '{2}'\"\n                        .format(initial_column_name.lower(),\n                                initial_column_name.replace(\n                                    k, self.global_renamed_col_suffixes[k]),\n                                k.lower()))\n                    C.name = initial_column_name.replace(\n                            k, self.global_renamed_col_suffixes[k])\n                    columns[idx] = C.name\n        return columns\n\n    def transform_rows(self, rows, columns, tablename):\n        this_table_st = self.column_transformations.get(tablename)\n        bool_dict = {\n                'Y': True,\n                'N': False,\n                1: True,\n                0: False,\n                '1': True,\n                '0': False,\n                'y': True,\n                'n': False,\n        }\n        if this_table_st is None:\n            return\n        column_transformers = []\n        for c in columns:\n            if this_table_st.get(c):\n                column_transformers.append(this_table_st.get(c))\n            else:\n                column_transformers.append(None)\n        number_columns = len(columns)\n        for r in rows:\n            for i in range(number_columns-1, -1 ,-1):\n                column_transformer = column_transformers[i]\n                if column_transformer:\n                    # Then there is a transformation defined for this column...\n                    if column_transformer.delete:\n                        del r[i]\n                    elif st.new_type in [None, \"\"]:\n                        continue\n                    # Handle type conversions here...\n                    elif st.new_type == \"Integer\":\n                            r[idx] = int(r[idx])\n                    elif st.new_type in [\"String\", \"Text\"]:\n                        r[idx] = str(r[idx])\n                    elif st.new_type in [\"Float\", \"Decimal\"]:\n                        r[idx] = float(r[idx])\n                    elif st.new_type == \"Boolean\":\n                        r[idx] = bool_dict[r[idx]]\n                \n  \n"
  },
  {
    "path": "requirements.txt",
    "content": "# These are the python libraries for all SQL drivers.\n# You must have the drivers installed in order to install these!\n# (They are commented out for a reason, uncomment them once drivers are installed)\n\n#cx-Oracle==5.2.1\n#MySQL-python==1.2.5\n#psycopg2==2.6.1\n#pyodbc==3.0.10\n\nsix>=1.9.0\nSQLAlchemy>=1.2.1,<1.3\nsqlalchemy-migrate>=0.9.7\nSQLAlchemy-Utils>=0.32.0\n"
  },
  {
    "path": "setup.cfg",
    "content": "[metadata]\ndescription-file = README.md\n"
  },
  {
    "path": "setup.py",
    "content": "import sys\n\nfrom setuptools import setup\nfrom setuptools.command.test import test as TestCommand\n\nclass PyTest(TestCommand):\n    user_options = [('pytest-args=', 'a', \"Arguments to pass into pytest\")]\n\n    def initialize_options(self):\n        TestCommand.initialize_options(self)\n        self.pytest_args = \"\"\n\n    def run_tests(self):\n        import pytest\n        import shlex\n\n        errno = pytest.main(shlex.split(self.pytest_args))\n        sys.exit(errno)\n\nsetup(\n        name = 'etlalchemy',\n        packages = ['etlalchemy'],\n        version = '1.0.6',\n        description = 'Extract, Transform, Load. Migrate any SQL Database in 4 lines of code',\n        author = 'Sean Harrington',\n        author_email='seanharr11@gmail.com',\n        url='https://github.com/seanharr11/etlalchemy',\n        download_url='https://github.com/seanharr11/etlalchemy/tarball/1.0.6',\n        keywords=['sql','migration','etl','database'],\n        install_requires = [\n            \"six>=1.9.0\",\n            \"SQLAlchemy>=1.2.1,<1.3\",\n            \"sqlalchemy-migrate>=0.9.7\",\n            \"SQLAlchemy-Utils>=0.32.0\"\n        ],\n        classifiers=[],\n        cmdclass={'test': PyTest},\n        tests_require = [\"pytest\"],\n)\n"
  },
  {
    "path": "tests/test_transformer.py",
    "content": "from etlalchemy.schema_transformer import SchemaTransformer\n\ncol_hdrs = ['Column Name','Table Name',\n            'New Column Name','New Column Type','Delete']\ncol_sample_data = [\n    col_hdrs,\n    ['middle_name','employees','','','True'],\n    ['fired','employees','','Boolean','False'],\n    ['birth_date','employees','dob','',''],\n    ['salary','jobs','payrate','','False'],\n        ]\ndef setup_column_transform_file(tmpdir, data=[]):\n    f = tmpdir.join(\"sample_column_mappings.csv\")\n    file_data = []\n    for row in data:\n        file_data.append(','.join(row))\n    file_data_str = '\\n'.join(file_data)\n    f.write(file_data_str)\n    # f.write_text?\n    assert f.read() == file_data_str\n    return str(f) # filename\n\ntbl_hdrs = ['Table Name','New Table Name','Delete']\ntbl_sample_data = [\n    tbl_hdrs,\n    ['table_to_rename','new_table_name','False'],\n    ['table_to_delete','','True'],\n    ['departments','dept','False'],\n        ]\ndef setup_table_transform_file(tmpdir, data=[]):\n    f = tmpdir.join(\"sample_table_mappings.csv\")\n    file_data = []\n    for row in data:\n        file_data.append(','.join(row))\n    file_data_str = '\\n'.join(file_data)\n    f.write(file_data_str)\n    # f.write_text?\n    assert f.read() == file_data_str\n    return str(f) # filename\n\ndef get_unique_tables(data):\n    \"\"\"Returns unique tables in data using 'Table Name' column in header row\"\"\"\n    hdrs = data[0]\n    tbl_idx = None\n    for idx, hdr in enumerate(hdrs):\n        if hdr == \"Table Name\":\n            tbl_idx = idx\n            break\n    assert tbl_idx is not None\n    return set([c[tbl_idx] for c in [row for row in data[1:]]])\n\ndef mock_dictreader(headers, data):\n    \"\"\"Simulate the behavior of csv dictreader so we don't need files\"\"\"\n    return dict(zip(headers, data))\n\n\ndef test_init_args_empty():\n    trans = SchemaTransformer(column_transform_file=None, table_transform_file=None)\n    assert trans is not None\n    assert trans.global_renamed_col_suffixes == {}\n\ndef test_init_global_only():\n    test_col_suffixes = {'org': 'chg'}\n    trans = SchemaTransformer(column_transform_file=None,\n            table_transform_file=None,\n            global_renamed_col_suffixes=test_col_suffixes)\n    assert trans is not None\n    assert trans.global_renamed_col_suffixes == test_col_suffixes\n\ndef test_column_transformation_delete():\n    \"\"\"Test the allowed values for delete in column transformation file\"\"\"\n    test_cases = {\n        # Delete Value: the expected result\n        'True': True, # The first 3 are the only ones true based on code\n        'true': True,\n        '1': True,\n        'Y': False,   # ! should this be true?\n        'yes': False, # ! should this be true?\n        'delete': False, # ! should this be true?\n        '': False,\n        '0': False,\n        'False': False,\n        'false': False,\n        'unknown': False,\n    }\n    row = mock_dictreader(col_hdrs, ['middle_name','employees','','','True'])\n    for k in test_cases:\n        row['Delete'] = k\n        c = SchemaTransformer.ColumnTransformation(row)\n        assert c\n        assert c.old_column == 'middle_name'\n        assert c.old_table == 'employees'\n        assert c.new_column == ''\n        assert c.new_type == ''\n        assert c.delete == test_cases[k]\n\ndef test_column_transformation_rename():\n    row = mock_dictreader(col_hdrs, ['birth_date','employees','dob','',''])\n    c = SchemaTransformer.ColumnTransformation(row)\n    assert c\n    assert c.old_column == 'birth_date'\n    assert c.old_table == 'employees'\n    assert c.new_column == 'dob' # <=== The actual test\n    assert c.new_type == ''\n    assert c.delete == False\n\n    row['New Column Name'] = '' # Make sure not renaming also works\n    c = SchemaTransformer.ColumnTransformation(row)\n    assert c\n    assert c.old_column == 'birth_date'\n    assert c.old_table == 'employees'\n    assert c.new_column == '' # <==== Should be blank\n    assert c.new_type == ''\n    assert c.delete == False\n\ndef test_column_transformation_tables():\n    row = mock_dictreader(col_hdrs, ['fired','employees','','Boolean','False'])\n    c = SchemaTransformer.ColumnTransformation(row)\n    assert c\n    assert c.old_table == 'employees'\n    assert str(c) == 'employees.fired'\n    row = mock_dictreader(col_hdrs, ['salary','jobs','payrate','','False'])\n    c = SchemaTransformer.ColumnTransformation(row)\n    assert c\n    assert c.old_table == 'jobs'\n    assert str(c) == 'jobs.salary'\n\ndef test_column_transformation_type():\n    row = mock_dictreader(col_hdrs, ['fired','employees','','Boolean','False'])\n    c = SchemaTransformer.ColumnTransformation(row)\n    assert c\n    assert c.new_type == 'Boolean'\n\ndef test_table_transformation_rename():\n    row = mock_dictreader(tbl_hdrs, ['departments','dept','False'])\n    t = SchemaTransformer.TableTransformation(row)\n    assert t.old_table == 'departments'\n    assert t.new_table == 'dept'\n    assert t.delete == False\n\ndef test_table_transformation_delete():\n    \"\"\"Test the allowed values for delete in table transformation file\"\"\"\n    test_cases = {\n        # Delete Value: the expected result\n        'True': True, # The first 3 are the only ones true based on code\n        'true': True,\n        '1': True,\n        'Y': False,   # ! should this be true?\n        'yes': False, # ! should this be true?\n        'delete': False, # ! should this be true?\n        '': False,\n        '0': False,\n        'False': False,\n        'false': False,\n        'unknown': False,\n    }\n    row = mock_dictreader(tbl_hdrs, ['table_to_delete','new_name','True'])\n    for k in test_cases:\n        row['Delete'] = k\n        t = SchemaTransformer.TableTransformation(row)\n        assert t\n        assert t.old_table == 'table_to_delete'\n        assert t.new_table == 'new_name' # ! should this be removed?\n        assert t.delete == test_cases[k]\n\ndef test_needsfiles(tmpdir):\n    \"\"\"Make sure we can create, save and remove temporary files\"\"\"\n    f = tmpdir.join(\"testfile.txt\")\n    f.write(\"can write\")\n    assert len(tmpdir.listdir()) == 1\n    assert f.read() == \"can write\"\n    f.remove()\n    assert len(tmpdir.listdir()) == 0\n\ndef test_init_column_transform_file_empty(tmpdir):\n    col_map = setup_column_transform_file(tmpdir)\n    trans = SchemaTransformer(column_transform_file=col_map,\n            table_transform_file=None)\n    assert trans is not None\n    assert len(trans.column_transformations) == 0\n\ndef test_init_column_transform_file(tmpdir):\n    col_map = setup_column_transform_file(tmpdir, data=col_sample_data)\n    unique_tables = get_unique_tables(col_sample_data)\n    trans = SchemaTransformer(column_transform_file=col_map,\n            table_transform_file=None)\n    assert trans is not None\n    assert len(trans.table_transformations) == 0\n    assert len(trans.column_transformations) > 0\n    assert len(trans.column_transformations) == len(unique_tables)\n    # Make sure the expected tables are in the list of transformations\n    assert set(unique_tables) == set(trans.column_transformations.keys())\n\ndef test_init_table_transform_file(tmpdir):\n    tbl_map = setup_table_transform_file(tmpdir, data=tbl_sample_data)\n    unique_tables = get_unique_tables(tbl_sample_data)\n    trans = SchemaTransformer(column_transform_file=None,\n            table_transform_file=tbl_map)\n    assert trans is not None\n    assert len(trans.column_transformations) == 0\n    assert len(trans.table_transformations) == len(unique_tables)\n    # Make sure the expected tables are in the list of transformations\n    assert set(unique_tables) == set(trans.table_transformations.keys())\n\ndef test_schedule_deletion_of_column(tmpdir):\n    col_map = setup_column_transform_file(tmpdir, data=col_sample_data)\n    trans = SchemaTransformer(column_transform_file=col_map,\n            table_transform_file=None)\n    unique_tables = get_unique_tables(col_sample_data)\n    total_tables = len(unique_tables)\n\n    ### Remove a column in new table (compared to sample data)\n    assert trans.column_transformations.get('dept') is None\n    trans.schedule_deletion_of_column('manager','dept')\n    assert trans.column_transformations.get('dept') is not None\n    assert trans.column_transformations['dept'].get('manager') is not None\n    assert trans.column_transformations['dept'].get('manager').delete\n    # Confirm list has been added to\n    total_tables += 1\n    assert len(trans.column_transformations) == total_tables\n    # Dept was added, so make sure the list of tables differs\n    assert set(unique_tables) != set(trans.column_transformations.keys())\n\n    ### Remove a column known in a different table\n    trans.schedule_deletion_of_column('birth_date', 'bosses')\n    # Make sure it didn't change employees.birth_date\n    assert trans.column_transformations['employees'].get('birth_date').delete == False\n    assert trans.column_transformations['bosses'].get('birth_date').delete\n    total_tables += 1\n    assert len(trans.column_transformations) == total_tables\n\n    ### Remove a known column in known table (in sample data)\n    # Birth_date already has transformation to dob, but isn't to be deleted\n    assert trans.column_transformations['employees'].get('birth_date') is not None\n    assert trans.column_transformations['employees'].get('birth_date').delete == False\n    num_cols = len(trans.column_transformations['employees'])\n    trans.schedule_deletion_of_column('birth_date','employees')\n    # Confirm it changed to deleting it\n    assert trans.column_transformations['employees'].get('birth_date').delete\n    # make sure it did not change the list of employees transformations\n    assert len(trans.column_transformations['employees']) == num_cols\n\n    ### Remove a new column in known table (in sample data)\n    num_cols = len(trans.column_transformations['employees'])\n    trans.schedule_deletion_of_column('title','employees')\n    assert trans.column_transformations['employees'].get('title').delete\n    # make sure it added to the employees transformations\n    assert len(trans.column_transformations['employees']) == num_cols + 1\n    # make sure it din't change how many tables\n    assert len(trans.column_transformations) == total_tables\n\ndef test_transform_table(tmpdir):\n    # TODO implement tests\n    # I think it would be preferable for transform_table to\n    # return the altered SQLAlchemy Table object instead of having a\n    # strange side effect of renaming. It could return None for delete\n    assert 0\n\ndef test_transform_column(tmpdir):\n    # TODO implement tests\n    assert 0\n\ndef test_transform_rows(tmpdir):\n    # TODO implement tests\n    assert 0\n"
  }
]