Full Code of seanharr11/etlalchemy for AI

Repository: seanharr11/etlalchemy
Branch: master
Commit: 9285c0fa0a6e
Files: 16
Total size: 126.1 KB (25.9k tokens, 64 symbols)

Directory structure:
gitextract_cuwgq9du/

├── .gitignore
├── HISTORY.rst
├── LICENSE.txt
├── MANIFEST
├── README.md
├── TODO.md
├── etlalchemy/
│   ├── ETLAlchemySource.py
│   ├── ETLAlchemyTarget.py
│   ├── __init__.py
│   ├── etlalchemy_exceptions.py
│   ├── literal_value_generator.py
│   └── schema_transformer.py
├── requirements.txt
├── setup.cfg
├── setup.py
└── tests/
    └── test_transformer.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
*.pyc

# coverage files
*.coverage
htmlcov
cover

# temporary files
*~

# virtual environments
env*/
venv/

# python setup.py build artifacts
build/
dist/
*egg*/
docs/_build


================================================
FILE: HISTORY.rst
================================================
.. :changelog:

History
-------

1.1.1 (2017-05-15)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
**New Features**:

* ``compress_varchar`` parameter added to ``ETLAlchemySource.__init__()`` to allow optional minimizing of ``varchar()`` column sizes. Defaults to ``False``.

**Bug Fixes**:

* Handles huge Decimal values ( > 2^32 ) when determining whether or not to coerce column type to Integer, or leave Decimal.

* Fixed bugs surrounding **Upserting** of rows (when ``drop_database=False``).

* Nearest power of 2 now rounds properly


1.0.7 (2016-08-04)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**New Features**:

* Auto-determination of VARCHAR(n) column size. The size of VARCHAR(n) fields is auto-determined based on the # of chars in the largest string in the column, rounded up to the nearest power of 2 (i.e. 21 becomes 32).

**Bug Fixes**:

* Added more thorough UTF-8 support. Previous releases broke when unicode strings were decoded as if they were byte-strings.

* Fixed a bug which threw an exception when the source is PostgreSQL and a table name is capitalized.

  ``engine.execute("SELECT COUNT(*) FROM Capitalizedtable")`` *was replaced with* ``T_with_capitalized_name.count().fetchone()`` for cross-db support.


**Other Changes**:

* Created HISTORY.rst


================================================
FILE: LICENSE.txt
================================================
The MIT License (MIT)
Copyright (c) 2016 Sean Harrington

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


================================================
FILE: MANIFEST
================================================
# file GENERATED by distutils, do NOT edit
setup.cfg
setup.py
etlalchemy/ETLAlchemySource.py
etlalchemy/ETLAlchemyTarget.py
etlalchemy/__init__.py
etlalchemy/etlalchemy_exceptions.py
etlalchemy/literal_value_generator.py
etlalchemy/schema_transformer.py


================================================
FILE: README.md
================================================
# etlalchemy
Extract, Transform and Load...Migrate any SQL Database in 4 Lines of Code. *[Read more here...](http://thelaziestprogrammer.com/sharrington/databases/migrating-between-databases-with-etlalchemy)*

[![Donate](https://img.shields.io/badge/donate-paypal-blue.svg)](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=DH544PY7RFSLA)
[![Donate](https://img.shields.io/badge/donate-gratipay-green.svg)](https://gratipay.com/etlalchemy/)
# Installation

```bash
pip install etlalchemy
# On El Capitan:
### pip install --ignore-installed etlalchemy

# Also install the necessary DBAPI modules and SQLAlchemy dialects
# For example, for MySQL, you might use:
# pip install pymysql
```

# Basic Usage
```python
from etlalchemy import ETLAlchemySource, ETLAlchemyTarget

source = ETLAlchemySource("mssql+pyodbc://username:password@DSN_NAME")
target = ETLAlchemyTarget("mysql://username:password@hostname/db_name", drop_database=True)
target.addSource(source)
target.migrate()
```

# Examples

**Provide a list of tables to include/exclude in migration**
```python
from etlalchemy import ETLAlchemySource, ETLAlchemyTarget

# Load ONLY the 'salaries' table
source = ETLAlchemySource("mysql://etlalchemy:etlalchemy@localhost/employees",
                          included_tables=["salaries"])
# Conversely, you could load ALL tables EXCEPT 'salaries'
# source = ETLAlchemySource("mysql://etlalchemy:etlalchemy@localhost/employees",\
#                          excluded_tables=["salaries"])

target = ETLAlchemyTarget("postgresql://etlalchemy:etlalchemy@localhost/test", drop_database=True)
target.addSource(source)
target.migrate()
```
**Only migrate schema, or only Data, or only FKs, or only Indexes (or any combination of the 4!)**
```python
from etlalchemy import ETLAlchemySource, ETLAlchemyTarget

source = ETLAlchemySource("mysql://etlalchemy:etlalchemy@localhost/employees")

target = ETLAlchemyTarget("postgresql://etlalchemy:etlalchemy@localhost/test", drop_database=True)
target.addSource(source)
# Note that each phase (schema, data, index, fk) is independent of all others,
# and can be run standalone, or in any combination. (Obviously you need a schema to send data, etc...)
target.migrate(migrate_fks=False, migrate_indexes=False, migrate_data=False, migrate_schema=True)
```
**Skip columns and tables if they are empty**
```python
from etlalchemy import ETLAlchemySource, ETLAlchemyTarget
# This will skip tables with no rows (or all empty rows), and ignore them during schema migration
# This will skip columns if they have all NULL values, and ignore them during schema migration
source = ETLAlchemySource("mysql://etlalchemy:etlalchemy@localhost/employees",\
                          skip_column_if_empty=True,\
                          skip_table_if_empty=True)
target = ETLAlchemyTarget("postgresql://etlalchemy:etlalchemy@localhost/test", drop_database=True)
target.addSource(source)
target.migrate()
```
**Enable 'upserting' of data**
```python
from etlalchemy import ETLAlchemySource, ETLAlchemyTarget

source = ETLAlchemySource("mysql://etlalchemy:etlalchemy@localhost/employees")
# This will leave the target DB as is. If the tables being migrated from Source -> Target
# already exist on the Target, then rows will be updated based on PKs if they exist, or
# inserted if they do not exist in the Target table.
target = ETLAlchemyTarget("postgresql://etlalchemy:etlalchemy@localhost/test", drop_database=False)
target.addSource(source)
target.migrate()
```
**Alter schema (change column names, column types, table names, and Drop tables/columns)**
```python
from etlalchemy import ETLAlchemySource, ETLAlchemyTarget
import os
# See below for the simple structure of the .csv's for schema changes
source = ETLAlchemySource("mysql://etlalchemy:etlalchemy@localhost/employees",\
                          column_schema_transformation_file=os.getcwd() + "/transformations/column_mappings.csv",\
                          table_schema_transformation_file=os.getcwd() + "/transformations/table_mappings.csv")
target = ETLAlchemyTarget("postgresql://SeanH:Pats15Ball@localhost/test", drop_database=True)
target.addSource(source)
target.migrate()
```
| *column_mappings.csv* | *table_mappings.csv* |
| :--- | :--- |
|Column Name,Table Name,New Column Name,New Column Type,Delete|Table Name,New Table Name,Delete|
|last_name,employees,,,True|table_to_rename,new_table_name,False|
|fired,employees,,Boolean,False|table_to_delete,,True|
|birth_date,employees,dob,,False|departments,dept,False|
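Both mapping files are plain CSVs with a header row, so they can be read with Python's standard `csv` module. A minimal sketch of interpreting `table_mappings.csv` rows (the `parse_table_mappings` helper is hypothetical, not part of etlalchemy's API):

```python
import csv
import io

def parse_table_mappings(fileobj):
    """Turn rows of 'Table Name,New Table Name,Delete' into actions.

    Hypothetical helper for illustration -- not etlalchemy's internal parser.
    """
    actions = {}
    for row in csv.DictReader(fileobj):
        name = row["Table Name"]
        if row["Delete"].strip().lower() == "true":
            actions[name] = ("drop", None)
        elif row["New Table Name"]:
            actions[name] = ("rename", row["New Table Name"])
        else:
            actions[name] = ("keep", None)
    return actions

sample = io.StringIO(
    "Table Name,New Table Name,Delete\n"
    "table_to_rename,new_table_name,False\n"
    "table_to_delete,,True\n"
    "departments,dept,False\n"
)
print(parse_table_mappings(sample))
```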

**Rename any column which ends in a given 'suffix' (or skip the column during migration)**
```python
from etlalchemy import ETLAlchemySource, ETLAlchemyTarget
# global_renamed_col_suffixes is useful to standardize column names across tables (like the date example below)
source = ETLAlchemySource("mysql://etlalchemy:etlalchemy@localhost/employees",
                          global_ignored_col_suffixes=['drop_all_columns_that_end_in_this'],
                          # i.e. "created_date" -> "created_dt"
                          global_renamed_col_suffixes={'date': 'dt'})
target = ETLAlchemyTarget("postgresql://SeanH:Pats15Ball@localhost/test", drop_database=True)
target.addSource(source)
target.migrate()
```
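The suffix rules above can be sketched as a plain function (an illustration of the behavior, not etlalchemy's internal implementation):

```python
def transform_column_name(name, renamed_suffixes, ignored_suffixes):
    """Return the new column name, or None if the column should be skipped.

    Hypothetical helper mirroring global_ignored_col_suffixes /
    global_renamed_col_suffixes -- for illustration only.
    """
    # Ignored suffixes drop the column entirely
    for suffix in ignored_suffixes:
        if name.endswith(suffix):
            return None
    # Renamed suffixes swap the trailing portion of the name
    for old, new in renamed_suffixes.items():
        if name.endswith(old):
            return name[: -len(old)] + new
    return name

print(transform_column_name("created_date", {"date": "dt"}, []))  # created_dt
```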

# Known Limitations
1. 'sqlalchemy_migrate' does not support MSSQL FK migrations.
   * *(So, FK migrations will be skipped when the Target is MSSQL.)*
2. Currently not compatible with Windows
   * Several "os.system()" calls with UNIX-specific utilities
   * One option for Windows users is installing through the [Windows Subsystem for Linux (WSL)](https://msdn.microsoft.com/en-us/commandline/wsl/install_guide)
3. If the Target DB is in the Azure Cloud (MSSQL), FreeTDS has some performance-related compatibility issues. This may be noticed when migrating tables with 1,000,000+ rows into an Azure MSSQL Server.
4. Though the MSSQL 'BULK INSERT' feature is supported in this tool, it is NOT supported on either Azure environments, or AWS MSSQL Server environments (no 'bulkadmin' role allowed). Feel free to test this out on a different MSSQL environment!
5. Regression tests have not **(yet)** been created due to the unique **(and expensive)** way one must test all of the different database types.
6. Migrations *to* MSSQL and Oracle are extremely slow due to the lack of 'fast' import capabilities.
   * 'SQL Loader' can be used on Oracle, and the 'BULK INSERT' operation can be used on MSSQL; however, the former is a PITA to install, and the latter is not supported in several MSSQL environments (see item 4 above).
   * 'BULK INSERT' *is supported* in etlalchemy (with limited testing), but 'SQL Loader' is not (yet).
7. When sending data to PostgreSQL, if the data contains VARCHAR() or TEXT() columns with carriage returns ('^M' or '\r'), these will be stripped.
   * This is due to the lack of an "ENCLOSED BY" option in psycopg's copy_from() - these chars are interpreted literally, and in turn tell the COPY FROM operation that "the row ends here".
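The stripping behavior described in item 7 can be reproduced as a tiny sanitizer (a hypothetical helper shown only to illustrate the caveat, not etlalchemy's internal function):

```python
def strip_carriage_returns(value):
    """Remove '\r' ('^M') characters that would otherwise terminate a row
    early during PostgreSQL's COPY FROM. Illustration only -- this mirrors
    what etlalchemy describes doing, not its actual code."""
    if value is None:
        return value
    return value.replace("\r", "")

print(strip_carriage_returns("line1\r\nline2"))  # prints "line1" then "line2"
```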

# Assumptions Made
1. Default date formats for all Target DB's are assumed to be the 'out-of-the-box' defaults.
2. Text fields do not contain the character "|", or the string "|,".
   * On some Target DBs, if you have text fields containing "|," (mssql) or "|" (sqlite), then the 'fast' import may fail, or insert bizarre values into your DB. This is due to the 'delimiter' which separates column values in the file that is sent to the Target DB.
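To verify this assumption ahead of a migration, a quick pre-flight scan (hypothetical helper, not part of etlalchemy) can flag text values containing the delimiter:

```python
def find_delimiter_collisions(rows, delimiter="|"):
    """Return (row_index, col_index) pairs whose text values contain the
    delimiter used by the intermediate dump file. Illustration only."""
    hits = []
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if isinstance(value, str) and delimiter in value:
                hits.append((i, j))
    return hits

rows = [("alice", "a|b"), ("bob", "plain")]
print(find_delimiter_collisions(rows))  # [(0, 1)]
```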

# On Testing 
1. The 'Performance' matrix has been put together using a simple script which tests every combination of Source (5) and Target (5) DB migration (25 total combinations).
   * The script is not included (publicly), as it contains the connection strings of AWS RDS instances.
2. A regression test suite is needed, as is funding to set up an environment for Oracle and MSSQL instances.
3. There are definitely some untested column types here amongst all 5 RDBMS's. Please create *pull requests* or open *issues* that describe the problem **in detail** as these arise!


# Contributors
We are always looking for contributors! 

This project has [its origins](http://thelaziestprogrammer.com/migrating-between-databases-with-etlalchemy) in solving the problem of migrating off of bulky, expensive enterprise-level databases. If the project has helped you to migrate off of these databases, and onto open-source RDBMS's, the best way to show your support is by opening Pull Requests and Issues.



# Donations
[Donations through Gratipay](https://gratipay.com/etlalchemy/) are welcome, but **Pull Requests** are better!

You can also support us [via PayPal here.](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=DH544PY7RFSLA)

# Other

For help installing cx_Oracle on a Mac (El Capitan + cx_Oracle = Misery), [check out this blog post](http://thelaziestprogrammer.com/sharrington/databases/oracle/install-cx_oracle-mac).

Run this tool from the **same server that hosts your Target database** to get **maximum performance** out of it.


================================================
FILE: TODO.md
================================================
1. Add regression tests
2. Add unit tests
3. Add support for Python 3.5
4. Check to see if FK exists between 2 tables - if it does, append an integer to the end of the constraint_name. Right now we just catch exceptions, and only on a subset of supported RDBMSs (OperationalError - MySQL, ProgrammingError - PostgreSQL)
5. Replace the column-type guessing process (iterating over every row) with a GROUP BY query to improve performance.
6. Add parameter for "_quoted_strings_enclosed_by_" to **ETLAlchemySource** to override default .csv quote-character.
7. Add parameter for "_cleanup_table_csv_files_", with a default value of True, allowing the user to override the default and let files persist after they are loaded into the Target DB.
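Item 5 could be sketched by pushing the work into a single aggregate query. Below is an illustration against an in-memory SQLite database (the table and query are examples only, not code from etlalchemy):

```python
import sqlite3

# Instead of scanning every row in Python to size a VARCHAR column,
# ask the database for the maximum width in one aggregate query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?)",
                 [("Ann",), ("Bartholomew",)])
(max_len,) = conn.execute(
    "SELECT MAX(LENGTH(first_name)) FROM employees").fetchone()
print(max_len)  # 11
```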


================================================
FILE: etlalchemy/ETLAlchemySource.py
================================================
import codecs
from itertools import islice
from literal_value_generator import dump_to_sql_statement, dump_to_csv,\
    dump_to_oracle_insert_statements
import random
from migrate.changeset.constraint import ForeignKeyConstraint
from datetime import datetime
import time
from copy import deepcopy
import pickle
import sqlalchemy
import logging
# from clean import cleaners
from sqlalchemy.sql import select
from sqlalchemy.schema import CreateTable, Column
from sqlalchemy.sql.schema import Table, Index
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session, sessionmaker
from sqlalchemy import create_engine, MetaData, func, and_
from sqlalchemy.engine import reflection
from sqlalchemy.inspection import inspect
from sqlalchemy.exc import NoSuchTableError
from sqlalchemy.types import Text, Numeric, BigInteger, Integer, DateTime, Date, TIMESTAMP, String, BINARY, LargeBinary
from sqlalchemy.dialects.postgresql import BYTEA
import inspect as ins
import re
import csv
from schema_transformer import SchemaTransformer
from etlalchemy_exceptions import DBApiNotFound
import os

# Parse the conn_string to find relevant info for each db engine #

"""
An instance of 'ETLAlchemySource' represents 1 DB. This DB can be sent to
multiple 'ETLAlchemyTargets' via calls to ETLAlchemySource.migrate().
See examples (on github) for info...
"""

class ETLAlchemySource():

    def __init__(self,
                 conn_string,
                 global_ignored_col_suffixes=[],
                 global_renamed_col_suffixes={},
                 column_schema_transformation_file=None,
                 table_schema_transformation_file=None,
                 included_tables=None,
                 excluded_tables=None,
                 skip_table_if_empty=False,
                 skip_column_if_empty=False,
                 compress_varchar=False,
                 log_file=None):
        # TODO: Store unique columns in here, and ADD the unique constraints
        # after data has been migrated, rather than before
        self.unique_columns = []
        self.compress_varchar = compress_varchar
        
        self.logger = logging.getLogger("ETLAlchemySource")
        self.logger.propagate = False
        
        for h in list(self.logger.handlers):
            # Clean up any old loggers...(useful during testing w/ multiple
            # log_files)
            self.logger.removeHandler(h)
        handler = logging.StreamHandler()
        if log_file is not None:
            handler = logging.FileHandler(log_file)
        formatter = logging.Formatter('%(name)s (%(levelname)s) - %(message)s')
        handler.setFormatter(formatter)

        self.logger.addHandler(handler)
        self.logger.setLevel(logging.INFO)
        # Load the json dict of cleaners...
        # {'table': [cleaner1, cleaner2,...etc],
        #  'table2': [cleaner1,...cleanerN]}

        self.included_tables = included_tables
        self.excluded_tables = excluded_tables
        # Set this to 'False' if you are using either of the
        # following MSSQL Environments:
        #  1.) AWS SQL Server
        #  ---> The 'bulkadmin' role required for BULK INSERT permissions
        #  is not available in AWS
        #  (see https://forums.aws.amazon.com/thread.jspa?threadID=122351)
        #  2.) Azure SQL
        #  ---> The 'BULK INSERT' feature is disabled in the Microsoft Azure
        #  cloud.
        # ** Otherwise, setting this to 'True' will vastly improve run-time...
        self.enable_mssql_bulk_insert = False

        self.current_ordered_table_columns = []
        self.cleaners = {}

        self.schema_transformer = SchemaTransformer(
            column_transform_file=column_schema_transformation_file,
            table_transform_file=table_schema_transformation_file,
            global_renamed_col_suffixes=global_renamed_col_suffixes)

        self.tgt_insp = None
        self.src_insp = None
        
        self.dst_engine = None
        self.constraints = {}
        self.indexes = {}
        self.fks = {}
        self.engine = None
        self.connection = None
        self.orm = None
        self.database_url = conn_string

        self.total_rows = 0
        self.column_count = 0
        self.table_count = 0
        self.empty_table_count = 0
        self.empty_tables = []
        self.deleted_table_count = 0
        self.deleted_column_count = 0
        self.deleted_columns = []
        self.null_column_count = 0
        self.null_columns = []
        self.referential_integrity_violations = 0
        self.unique_constraint_violations = []
        self.unique_constraint_violation_count = 0

        self.skip_column_if_empty = skip_column_if_empty
        self.skip_table_if_empty = skip_table_if_empty

        self.total_indexes = 0
        self.index_count = 0
        self.skipped_index_count = 0

        self.total_fks = 0
        self.fk_count = 0
        self.skipped_fk_count = 0
        # Config
        self.check_referential_integrity = False
        self.riv_arr = []
        self.start = datetime.now()

        self.global_ignored_col_suffixes = global_ignored_col_suffixes

        self.times = {}  # Map Tables to Names...

    def get_nearest_power_of_two(self, num):
        # This is optimized for MySQL: we want to optimize
        # cache hits by defining our column sizes as small
        # as possible, to the nearest power of 2.
        i = 2
        if num < 256:
            # Disk space is L + 1 byte for length (1 - 255)
            while (i-1) < num:
                i *= 2
            return i - 1
        else:
            # Disk space is L + 2 bytes for length (256 - 65536)
            while (i-2) < num:
                i *= 2
            return i - 2


    def standardize_column_type(self, column, raw_rows):
        old_column_class = column.type.__class__
        column_copy = Column(column.name,
                             column.type,
                             nullable=column.nullable,
                             unique=column.unique,
                             primary_key=column.primary_key)
        if column.unique:
            self.unique_columns.append(column.name)
        """"""""""""""""""""""""""""""""
        """ *** STANDARDIZATION *** """
        """"""""""""""""""""""""""""""""
        idx = self.current_ordered_table_columns.index(column.name)
        ##############################
        # Duck-typing to remove
        # database-vendor specific column types
        ##############################
        base_classes = map(
            lambda c: c.__name__.upper(),
            column.type.__class__.__bases__)
        self.logger.info("({0}) {1}".format(column.name,
            column.type.__class__.__name__))
        self.logger.info("Bases: {0}".format(str(base_classes))) 

        # Assume the column is empty, unless told otherwise
        null = True

        if "ENUM" in base_classes:
            for r in raw_rows:
                if r[idx] is not None:
                    null = False
            # Hack for error 'postgresql enum type requires a name'
            if self.dst_engine.dialect.name.lower() == "postgresql":
                column_copy.type = column.type
                column_copy.type.__class__ = column.type.__class__.__bases__[0]
                # Name the enumeration 'table_column'
                column_copy.type.name = str(column).replace(".", "_")
            else:
                column_copy.type.__class__ = column.type.__class__.__bases__[0]
        elif "STRING" in base_classes\
                or "VARCHAR" in base_classes\
                or "TEXT" in base_classes:
            #########################################
            # Get the VARCHAR size of the column...
            ########################################
            varchar_length = column.type.length
            ##################################
            # Strip collation here ...
            ##################################
            column_copy.type.collation = None
            max_data_length = 0
            for row in raw_rows:
                data = row[idx]
                if data is not None:
                    null = False
                    # Update varchar(size)
                    if len(data) > max_data_length:
                        max_data_length = len(data)
                    if isinstance(row[idx], unicode):
                        row[idx] = row[idx].encode('utf-8', 'ignore')
                    else:
                        row[idx] = row[idx].decode('utf-8', 'ignore').encode('utf-8')
            if self.compress_varchar:
                # Let's reduce the "n" in VARCHAR(n) to a power of 2
                if max_data_length > 0:
                    # The column is not empty...
                    column_size = self.get_nearest_power_of_two(max_data_length)
                    column_copy.type = String(column_size)
                    self.logger.info("Converting to -> VARCHAR({0}) (max_data_length: {1})".format(str(column_size), str(max_data_length)))
                elif varchar_length > 0:
                    # The column is empty BUT has a predefined size
                    column_size = self.get_nearest_power_of_two(varchar_length)
                    column_copy.type = String(column_size)
                    self.logger.info("Converting to -> VARCHAR({0}) (prev varchar size: {1})".format(str(column_size), str(varchar_length)))
                else:
                    # The column is empty and has NO predefined size
                    column_copy.type = Text()
                    self.logger.info("Converting to Text()")
            else:
                if varchar_length > 0:
                    column_copy.type = String(varchar_length)
                else:
                    # The column has NO predefined size
                    column_copy.type = Text()
                    self.logger.info("Converting to Text()")
        elif "UNICODE" in base_classes:
            #########################################
            # Get the VARCHAR size of the column...
            ########################################
            varchar_length = column.type.length
            column_copy.type = String()
            column_copy.type.length = varchar_length
            ##################################
            # Strip collation here ...
            ##################################
            column_copy.type.collation = None
            for row in raw_rows:
                data = row[idx]
                if varchar_length and data and len(data) > varchar_length:
                    self.logger.critical(
                        "Length of column '{0}' exceeds VARCHAR({1})".format(
                            column.name, str(varchar_length)))
                if data is not None:
                    null = False
                    if isinstance(row[idx], unicode):
                        row[idx] = row[idx].encode('utf-8', 'ignore')
                #if row[idx]:
                #    row[idx] = row[idx].decode('utf-8', 'ignore')

        elif "DATE" in base_classes or "DATETIME" in base_classes:
            ####################################
            # Determine whether this is a Date
            # or Datetime field
            ###################################
            type_count = {}
            types = set([])
            for row in raw_rows:
                data = row[
                    self.current_ordered_table_columns.index(
                        column.name)]
                types.add(data.__class__.__name__)
                if type_count.get(data.__class__.__name__):
                    type_count[data.__class__.__name__] += 1
                else:
                    type_count[data.__class__.__name__] = 1
                if data is not None:
                    null = False
            self.logger.warning(str(type_count))
            if type_count.get("datetime"):
                if self.dst_engine.dialect.name.lower() in ["postgresql"]:
                    self.logger.info("Postgresql has no DATETIME - converting to TIMESTAMP")
                    column_copy.type = TIMESTAMP()
                else:
                    column_copy.type = DateTime()
            else:
                column_copy.type = Date()

        elif "NUMERIC" in base_classes\
                or "FLOAT" in base_classes\
                or "DECIMAL" in base_classes:
            ####################################
            # Check all cleaned_rows to determine
            # if column is decimal or integer
            ####################################
            mantissa_max_digits = 0
            left_hand_max_digits = 0
            mantissa_gt_zero = False
            intCount = 0
            maxDigit = 0
            type_count = {}
            types = set([])
            for row in raw_rows:
                data = row[
                    self.current_ordered_table_columns.index(
                        column.name)]
                types.add(data.__class__.__name__)
                if type_count.get(data.__class__.__name__):
                    type_count[data.__class__.__name__] += 1
                else:
                    type_count[data.__class__.__name__] = 1
                ######################
                # Check for NULL data
                # (We will drop column if all rows contain null)
                ######################
                if data is not None:
                    null = False
                if data.__class__.__name__ == 'Decimal' or\
                   data.__class__.__name__ == 'float':
                    splt = str(data).split(".")
                    if len(splt) == 1:
                        intCount += 1
                        maxDigit = max(data, maxDigit)
                        continue

                    left_hand_digits = splt[0]
                    mantissa_digits = splt[1]

                    # Store greatest mantissa to check for decimal cols that
                    # should be integers...(i.e. if m = 3.000)
                    mantissa_max_digits = max(mantissa_max_digits,
                                              len(mantissa_digits))
                    left_hand_max_digits = max(left_hand_max_digits,
                                               len(left_hand_digits))
                    # If we have a mantissa greater than zero, we can keep this column as a decimal
                    if not mantissa_gt_zero and float(mantissa_digits) > 0:
                        # Short circuit the above 'and' so we don't keep resetting mantissa_gt_zero
                        mantissa_gt_zero = True

                elif data.__class__.__name__ == 'int':
                    intCount += 1
                    maxDigit = max(data, maxDigit)
            self.logger.info(" --> " + str(column.name) +
                             "..." + str(type_count))
            #self.logger.info("Max Digit Length: {0}".format(str(len(str(maxDigit)))))
            #self.logger.info("Max Mantissa Digits: {0}".format(str(mantissa_max_digits)))
            #self.logger.info("Max Left Hand Digit: {0}".format(str(left_hand_max_digits)))
            #self.logger.info("Total Left Max Digits: {0}".format(str(max(len(str(maxDigit)), left_hand_max_digits))))
            if mantissa_gt_zero:
                cum_max_left_digits = max(
                    len(str(maxDigit)), (left_hand_max_digits))
                self.logger.info("Numeric({0}, {1})".format(str(cum_max_left_digits + mantissa_max_digits), str(mantissa_max_digits)))
                column_copy.type = Numeric(
                    precision=cum_max_left_digits + mantissa_max_digits,
                    scale=mantissa_max_digits)
                if intCount > 0:
                    self.logger.warning(
                        "Column '" +
                        column.name +
                        "' contains decimals and integers, " +
                        "resorting to type 'Numeric'")
                if column.primary_key:
                    self.logger.warning(
                        "Column '" +
                        column.name +
                        "' is a primary key, but is of type 'Decimal'")
            else:
                self.logger.warning(
                    "Column '" +
                    column.name +
                    "' is of type 'Decimal', but contains no mantissas " +
                    "> 0. (i.e. 3.00, 2.00, etc..)\n ")
                if maxDigit > 4294967295:
                    self.logger.warning("Coercing to 'BigInteger'")
                    column_copy.type = BigInteger()
                    # Do conversion...
                    for r in raw_rows:
                        if r[idx] is not None:
                            r[idx] = long(r[idx])
                else:
                    column_copy.type = Integer()
                    self.logger.warning("Coercing to 'Integer'")
                    for r in raw_rows:
                        if r[idx] is not None:
                            r[idx] = int(r[idx])
        elif column.type.__class__.__name__ == "BIT":
            for r in raw_rows:
                if r[idx] is not None:
                    null = False
            self.logger.info("Found column of type 'BIT' -> " +
                "coercing to Boolean'")
            column_copy.type.__class__ = sqlalchemy.types.Boolean
        elif "TYPEENGINE" in base_classes:
            for r in raw_rows:
                if r[idx] is not None:
                    null = False
            self.logger.warning(
                "Type '{0}' has no base class!".format(
                    column.type.__class__.__name__))
        elif "VARBINARY" in base_classes or "LARGEBINARY" in base_classes:
            if self.dst_engine.dialect.name.lower() == "postgresql":
                for r in raw_rows:
                    if r[idx] is not None:
                        null = False
                        r[idx] = r[idx].encode('hex')
            column_copy.type = LargeBinary()
        elif "_BINARY" in base_classes:
            for r in raw_rows:
                if r[idx] is not None:
                    null = False
                    r[idx] = r[idx].encode('hex')
            if self.dst_engine.dialect.name.lower() == "postgresql":
                column_copy.type = BYTEA()
            else:
                column_copy.type = BINARY()
        else:
            #####################################################
            # Column type is not dialect-specific, but...
            # ... we need to check for null columns still b/c
            # ... we default to True !
            ######################################################
            for r in raw_rows:
                if r[idx] is not None:
                    null = False
            # Reset collations...
            if hasattr(column.type, 'collation'):
                column_copy.type.collation = None
            self.logger.info("({0}) Class: ".format(
                column_copy.name) + str(column.type.__class__.__name__))
            self.logger.info(
                "({0}) ---> Bases: ".format(column_copy.name) +
                str(column.type.__class__.__bases__))

            column_copy.type.__class__ = column.type.__class__.__bases__[0]
        #########################################
        # If the entire column is null, and we specify
        # the option below (skip_column_if_empty),
        # schedule a 'column_transformer' to delete the
        # column later ...
        ########################################
        if null and self.skip_column_if_empty:
            # The column should be deleted due to it being empty
            self.null_column_count += 1
            self.null_columns.append(column.table.name + "." + column.name)
            self.logger.warning(
                "Column '" +
                column.table.name +
                "." +
                column_copy.name +
                "' has all NULL entries, skipping...")
            self.schema_transformer.schedule_deletion_of_column(
                    column.name,
                    column.table.name
                   )
        
        return column_copy

    def add_or_eliminate_column(
            self,
            T,
            T_dst_exists,
            column,
            column_copy,
            raw_rows):
        self.logger.info("Checking column for elimination status...")
        old_column_class = column.type.__class__
        table_name = T.name
        null = True
        idx = self.current_ordered_table_columns.index(column.name)
        
        cname = column_copy.name
        columnHasGloballyIgnoredSuffix = len(
            filter(
                lambda s: cname.find(s) > -1,
                self.global_ignored_col_suffixes)) > 0

        oldColumns = self.current_ordered_table_columns
        oldColumnsLength = len(self.current_ordered_table_columns)
        ##################################
        # Transform the column schema below
        ##################################
        self.current_ordered_table_columns = \
            self.schema_transformer.transform_column(
                column_copy, T.name, self.current_ordered_table_columns)
        if oldColumnsLength != len(self.current_ordered_table_columns):
            # A column is scheduled to be deleted in "column_transformations_file"
            self.logger.warning(
                " ------> Column '" +
                cname +
                "' is scheduled to be deleted -- **NOT** migrating this col..")
            self.deleted_column_count += 1
            self.deleted_columns.append(table_name + "." + cname)
            if T_dst_exists:
                pass
                # TODO: Delete the column from T_dst
            return False
        elif oldColumns[idx] != self.current_ordered_table_columns[idx]:
            # Column was renamed
            if T_dst_exists:
                pass
                # TODO Add the column to the table...
            else:
                # column_copy has updated datatype...
                T.append_column(column_copy)
            self.logger.info("Column '{0}' renamed to '{1}'".format(oldColumns[idx], self.current_ordered_table_columns[idx]))
            return True
        else:
            if T_dst_exists:
                pass
                # TODO Add the column to the table...
            else:
                T.append_column(column_copy)
            return True

    def transform_table(self, T):
        ################################
        # Run Table Transformations
        ################################
        """ This will update the table 'T' in-place
        (i.e. change the table's name)
        """
        if not self.schema_transformer.transform_table(T):
            self.logger.info(
                (" ---> Table ({0}) is scheduled to be deleted " +
                 "according to table transformations...").format(T.name))
            # Clean up FKs and Indexes on this table...
            del self.indexes[T.name]
            del self.fks[T.name]
            self.deleted_table_count += 1
            self.deleted_columns += map(lambda c: T.name +
                                       "." + c.name, T.columns)
            self.deleted_column_count += len(T.columns)
            return None
        return True

    def check_multiple_autoincrement_issue(self, auto_inc_count, pk_count, T):
        if pk_count > 1:
            # Sometimes we can't detect the 'autoincrement' attr on columns
            # (For instance on SQL Server...)
            for c in T.columns:
                if c.primary_key:
                    c.autoincrement = False
            # and engine == MySQL.innoDB...
            if auto_inc_count > 0:
                # print the verbose warning
                self.logger.warning("""
                ****************************************************************
                **** Table '{0}' contains a composite primary key,
                **** with an auto-increment attribute tagged on 1 of the columns.
                *****************************************************************
                ********* --We are dropping the auto-increment field-- **********
                *****************************************************************
                ** (why? MySQL -> InnoDB Engine does not support this.
                ** Try MyISAM for support - understand that Oracle does not allow
                ** auto-increment fields, but uses sequences to create unique
                ** composite PKs)
                *****************************************************************
                """.format(T.name))
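
    # Illustrative sketch (hypothetical table/column names) of the case
    # handled above: a source table with PRIMARY KEY (order_id, line_no),
    # where one of the two PK columns carries an auto-increment flag, will
    # have 'autoincrement' set to False on every PK column before the
    # target CREATE TABLE is emitted, since MySQL/InnoDB cannot honor
    # auto-increment on such composite keys.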

    def transform_data(self, T_src, raw_rows):
        """"""""""""""""""""""""""""""
        """ *** TRANSFORMATION *** """
        """"""""""""""""""""""""""""""
        # Transform the data first
        # if self.cleaners.get(T_src.name):
        # TODO: Finish Implementing TableCleaner.clean(rows)
        # TC = TableCleaner(T_src)
        # TC.loadCleaners(self.cleaners[table_name])
        # TC.clean(raw_rows)
        # Transform the schema second (by updating the column names [keys of
        # dict])
        self.schema_transformer.transform_rows(
            raw_rows, self.original_ordered_table_columns, T_src.name)

    def create_table(self, T_dst_exists, T):
        with self.dst_engine.connect() as conn:
            if not T_dst_exists:
                self.logger.info(" --> Creating table '{0}'".format(T.name))
                try:
                    T.create(conn)
                    return True
                except Exception as e:
                    self.logger.error(
                        "Failed to create table '{0}'\n\n{1}".format(
                            T.name, e))
                    raise
            else:
                self.logger.warning(
                    ("Table '{0}' already exists - not creating table, " +
                     "reflecting to get new changes instead..").format(T.name))
                self.tgt_insp.reflecttable(T, None)
                return True
                # We need to Upsert the data...

    def send_data(self, table, columns):
        Session = sessionmaker(bind=self.dst_engine)
        session = Session()
        data_file_path = os.getcwd() + "/" + table + ".sql"

        self.logger.info(
            "Transferring data from local file '{0}' to target DB".format(
                table + ".sql"))
        if self.dst_engine.dialect.name.lower() == "mssql":
            username = self.dst_engine.url.username
            password = self.dst_engine.url.password
            dsn = self.dst_engine.url.host
            db_name = list(self.dst_engine.execute(
                "SELECT DB_NAME()").fetchall())[0][0]
            if not self.enable_mssql_bulk_insert:
                ######################################
                # SQL Azure does not support BULK INSERT
                # ... we resort to a Large INSERT statement
                ######################################
                self.logger.info(
                    "Sending data to target MSSQL instance..." +
                    "(Slow - enable_mssql_bulk_insert = False)")
                os.system("cat {4} | isql {0} {1} {2} -d{3} -v"
                          .format(dsn, username, password,
                                  db_name, data_file_path))
                self.logger.info("Done.")
            else:
                try:
                    conn = session.connection()
                    t1 = conn.begin()
                    self.logger.info("Sending data to target MSSQL instance...\
                            (Fast [BULK INSERT])")

                    conn.execute("""BULK INSERT {0} FROM '{1}' WITH (
                                     fieldterminator = '|,',
                                     rowterminator = '\n'
                                   );""".format(table, data_file_path))
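                    # The statement above is intended to render as, e.g.
                    # (table name and path hypothetical):
                    #   BULK INSERT my_table FROM '/cwd/my_table.sql'
                    #   WITH (fieldterminator = '|,', rowterminator = '\n');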
                    t1.commit()
                except sqlalchemy.exc.ProgrammingError as e:
                    self.logger.critical("""
                        *****************************************************
                        ** BULK INSERT operation not supported on your target
                        ** MSSQL server instance.
                        ** ***********************************
                        ** [It is likely that you are running on
                        ** Azure SQL (no bulk insert feature), or AWS SQL
                        ** Server (no bulkadmin role)].
                        *****************************************************
                        **** Re-run with
                        **      'self.enable_mssql_bulk_insert = False'
                        **   ...but expect slow data transfer.
                        ******************************************************
                        Original Exception:
                        {0}""".format(str(e)))
                    raise(e)
                self.logger.info("Done.")
        elif self.dst_engine.dialect.name.lower() == "mysql":
            username = self.dst_engine.url.username
            password = self.dst_engine.url.password
            db_name = self.dst_engine.url.database
            host = self.dst_engine.url.host
            self.logger.info(
                "Sending data to target MySQL instance...(Fast [mysqlimport])")
            columns = map(lambda c: "`{0}`".format(c), columns)
            cmd = ("mysqlimport -v -h{0} -u{1} -p{2} "
                       "--compress "
                       "--local "
                       "--fields-terminated-by=\",\" "
                       "--fields-enclosed-by='\"' "
                       "--fields-escaped-by='\\' "
                       # "--columns={3} "
                       "--lines-terminated-by=\"\n\" "
                       "{3} {4}"
                      ).format(host, username, password,
                                       #",".join(columns), db_name,
                                       db_name,
                                       data_file_path)
            self.logger.info(cmd)
            os.system(cmd)
            self.logger.info("Done.")
        elif self.dst_engine.dialect.name.lower() == "postgresql":
            # TODO: Take advantage of psql> COPY FROM <payload.sql> WITH
            # DELIMITER AS ","
            username = self.dst_engine.url.username
            password = self.dst_engine.url.password
            db_name = self.dst_engine.url.database
            host = self.dst_engine.url.host
            
            import psycopg2
            conn = psycopg2.connect(
                """
                host='{0}'
                port='5432'
                dbname='{1}'
                user='{2}'
                password='{3}'
                """.format(host, db_name, username, password))
            cur = conn.cursor()
            # Legacy method (doesn't work if not superuser, or if the file
            # is local to the client)
            cmd = """COPY {0} ({1}) FROM '{2}'
                    WITH CSV QUOTE ''''
                    ESCAPE '\\' """.format(
                table, ",".join(columns), data_file_path)
            self.logger.info(
                "Sending data to target Postgresql instance..." +
                "(Fast [COPY ... FROM ... WITH CSV]):" +
                "\n ----> {0}".format(cmd))
            with open(data_file_path, 'r') as fp_psql:
                # We use the command below, which loads the data file from
                # STDIN to work around permissions issues...
                null_value = 'NULL'
                copy_from_stmt = "COPY \"{0}\" FROM STDIN WITH CSV NULL '{1}'"\
                    .format(table, null_value)
                cur.copy_expert(copy_from_stmt, fp_psql)
                              #columns=tuple(map(lambda c: '"'+str(c)+'"', columns)))
            conn.commit()
            conn.close()
            self.logger.info("Done.")

        elif self.dst_engine.dialect.name.lower() == "sqlite":
            db_name = self.dst_engine.url.database
            self.logger.info(
                "Sending data to target sqlite instance...(Fast [.import])")
            sqlite_cmd = ".separator \'|\'\\n.nullvalue NULL\\n.import {0} {1}".format(data_file_path, table)
            self.logger.info(sqlite_cmd)
            os.system("echo \"{0}\" | sqlite3 {1}"
                    .format(sqlite_cmd, db_name))
            # ** Note values will be inserted as 'NULL' if they are NULL.
            """
           with open("{0}.sql".format(table), "r") as fp:
               for line in fp.readlines():
                   self.dst_engine.execute(line)
           """
            self.logger.info("Done.")
        elif self.dst_engine.dialect.name.lower() == "oracle":
            with open(data_file_path, "r") as fp_orcl:
                lines_inserted = 0
                while True:
                    # Read the next batch of (up to) 1000 statements
                    next_n_lines = list(islice(fp_orcl, 1000))
                    if not next_n_lines:
                        break
                    self.dst_engine.execute("\n".join(next_n_lines))
                    lines_inserted += len(next_n_lines)
                    self.logger.info(
                        "Inserted '{0}' rows".format(
                            str(lines_inserted)))
        else:
            raise Exception("Not Implemented!")
        # Cleanup...
        self.logger.info("Cleaning up '{0}'.sql".format(table))
        os.remove(data_file_path)
        self.logger.info("Done")

    def dump_data(self, T_dst_exists, T, raw_rows, pks, sessionMaker):
        """ Dumps the data to a file called <table_name>.sql in the CWD.
        Depending on the DB target, either a CSV will be generated
        for an optimized BULK IMPORT, or an INSERT query will be generated
        if BULK INSERTING a CSV is not supported (i.e. SQL Azure).
        """
        t_start_load = datetime.now()
        conn = self.dst_engine.connect()
        s = sessionMaker(bind=conn)
        data_file_path = os.getcwd() + "/{0}.sql".format(T.name)

        if not T_dst_exists:
            # Table "T" did not exist in the destination database prior to
            # this migration. We can naively INSERT all rows in the buffer.
            with open(data_file_path, "a+") as fp:
                if not self.enable_mssql_bulk_insert and\
                   self.dst_engine.dialect.name.lower() == "mssql":
                    dump_to_sql_statement(T.insert().values(
                            map(lambda r:
                                dict(zip(self.current_ordered_table_columns,
                                         r)),
                                raw_rows)
                            ), fp, self.dst_engine, T.name)
                elif self.dst_engine.dialect.name.lower() == "oracle":
                    self.logger.warning(
                        "** BULK INSERT operation not supported by Oracle. " +
                        "Expect slow run-time.\nThis utility should be " +
                        "run on the target host to decrease network " +
                        "latency, given this limitation...")
                    dump_to_oracle_insert_statements(
                            fp, self.dst_engine,
                            T.name, raw_rows,
                            self.current_ordered_table_columns)
                else:
                    dump_to_csv(
                        fp,
                        T.name,
                        self.current_ordered_table_columns,
                        raw_rows,
                        self.dst_engine.dialect)
        else:
            ########################################
            # We need to upsert the data...prepare upsertDict...
            ########################################
            upsertDict = {}
            self.logger.info("Gathering unique columns for upsert.")
            if len(pks) == 0:
                s = ("There is no primary key defined on table '{0}'!\n " +
                     "We are unable to Upsert into this table without " +
                     "identifying unique rows based on PKs!").format(T.name)
                raise Exception(s)
            unique_columns = filter(lambda c: c.name.lower() in pks, T.columns)
            self.logger.info(
                "Unique columns are '{0}'".format(
                    str(unique_columns)))
            q = select(unique_columns)
            rows = conn.execute(q).fetchall()
            for r in rows:
                uid = ""
                for pk in pks:
                    uid += str(getattr(r, pk))
                upsertDict[uid] = True
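            # e.g. with pks = ['id', 'rev'] and an existing row (7, 2, ...),
            # upsertDict["72"] = True. Note the uid is a plain string
            # concatenation of the PK values, so ('1', '23') and ('12', '3')
            # would collide -- acceptable here, but worth knowing.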
            ################################
            # Now upsert each row...
            ################################
            self.logger.info("Creating 'upsert' statements for '" +
                             str(len(raw_rows)) +
                             "' rows, and dumping to '" +
                             str(T.name) +
                             ".sql'.")

            init_len = len(raw_rows)
            for r in range(init_len - 1, -1, -1):
                uid = ""
                row = raw_rows[r]
                for pk in pks:
                    uid += str(row[self.current_ordered_table_columns.index(pk)])
                if upsertDict.get(uid):
                    with open(data_file_path, "a+") as fp:
                        stmt = T.update()\
                               .where(and_(*tuple(
                                   map(lambda pk:
                                       T.columns[pk] ==
                                       row[self.current_ordered_table_columns
                                           .index(pk)],
                                       pks))))\
                               .values(dict(zip(
                                   self.current_ordered_table_columns, row)))
                        dump_to_sql_statement(stmt, fp, self.dst_engine, T.name)
                    del raw_rows[r]
            #################################
            # Insert the remaining rows...
            #################################
            self.logger.info("Creating 'insert' stmts for (the remaining)" +
                             str(len(raw_rows)) +
                             " rows, and dumping to '" +
                             str(T.name) +
                             ".sql' (because they DNE in the table!).")
            raw_row_len = len(raw_rows)
            if len(raw_rows) > 0:
                self.logger.info(
                    " ({0}) -- Inserting remaining '{1}' rows."
                    .format(T.name, str(raw_row_len)))
                with open(data_file_path, "a+") as fp:
                    dump_to_sql_statement(
                        T.insert().values(raw_rows), fp,
                        self.dst_engine, T.name)
        conn.close()
    # TODO: Have a 'Create' option for each table...

    def migrate(
            self,
            destination_database_url,
            migrate_data=True,
            migrate_schema=True):
        """"""""""""""""""""""""
        """ ** REFLECTION ** """
        """"""""""""""""""""""""
       
        buffer_size = 10000

        if self.database_url.split(":")[0] == "oracle+cx_oracle":
            try:
                self.engine = create_engine(
                    self.database_url, arraysize=buffer_size)
            except ImportError as e:
                raise DBApiNotFound(self.database_url)
        else:
            try:
                self.engine = create_engine(self.database_url)
            except ImportError as e:
                raise DBApiNotFound(self.database_url)
        # Create inspectors to gather schema info...
        self.src_insp = reflection.Inspector.from_engine(self.engine)
        self.table_names = self.src_insp.get_table_names()
        try:
            self.dst_engine = create_engine(destination_database_url)
        except ImportError as e:
            raise DBApiNotFound(destination_database_url)
        dst_meta = MetaData()

        Session = sessionmaker(bind=self.dst_engine)
        dst_meta.bind = self.dst_engine

        self.tgt_insp = reflection.Inspector.from_engine(self.dst_engine)

        TablesIterator = self.table_names  # defaults to ALL tables

        if self.included_tables and self.excluded_tables:
            raise Exception("Can't provide both 'included_tables' and " +
                            "'excluded_tables', choose one...aborting...")

        if self.included_tables:
            TablesIterator = self.included_tables
        elif self.excluded_tables:
            TablesIterator = list(set(TablesIterator) -
                                  set(self.excluded_tables))
       
        t_idx = -1
        t_total = len(TablesIterator)
        self.logger.info("""
        *************************
        *** Total Tables: {0} ***
        *************************
        """.format(str(t_total)))
        for table_name in TablesIterator:
            t_idx += 1
            #######################
            # Time each table...
            #######################
            self.times[table_name] = {}
            self.table_count += 1
            self.logger.info("Reading Table Schema '" + table_name + "'...")
            pk_count = 0
            auto_inc_count = 0

            t_start_extract = datetime.now()
            T_src = Table(table_name, MetaData())
            try:
                self.src_insp.reflecttable(T_src, None)
            except NoSuchTableError:
                self.logger.error(
                    "Table '" +
                    table_name +
                    "' not found in source DB: '" +
                    str(self.database_url) +
                    "'.")
                continue  # skip to next table...
            except sqlalchemy.exc.DBAPIError as e:
                self.logger.error(str(e))
                # Let SQL Server sleep b/c of FreeTDS buffer clean up issues
                time.sleep(10)
                self.src_insp.reflecttable(T_src, None)
            ###############################
            # Gather indexes & FKs
            ###############################
            self.indexes[table_name] = self.src_insp.get_indexes(table_name)
            self.fks[table_name] = self.src_insp.get_foreign_keys(table_name)
            self.logger.info(
                "Loaded indexes and FKs for table '{0}'".format(table_name))
            if migrate_schema:
                T = Table(table_name, dst_meta)
                ###############################
                # Check if DST table exists...
                ###############################
                T_dst_exists = True
                try:
                    self.tgt_insp.reflecttable(T, None)
                except sqlalchemy.exc.NoSuchTableError as e:
                    T_dst_exists = False
                    self.logger.warning(
                        "Table '" +
                        T.name +
                        "' does not exist in the dst database " +
                        "(we will create this later...)")

                """"""""""""""""""""""""""
                """ *** EXTRACTION *** """
                """"""""""""""""""""""""""
                #########################################################
                # Generate the mapping of 'column_name' -> 'list index'
                ########################################################
                cols = map(lambda c: c.name, T_src.columns)
                self.current_ordered_table_columns = [None] * len(cols)
                self.original_ordered_table_columns = [None] * len(cols)
                for i in range(0, len(cols)):
                    self.original_ordered_table_columns[i] = cols[i]
                    self.current_ordered_table_columns[i] = cols[i]
                ###################################
                # Grab raw rows for data type checking...
                ##################################
                self.logger.info(
                    "Building query to fetch all rows from {0}".format(
                        T_src.name))
                

                cnt = self.engine.execute(T_src.count()).fetchone()[0]
                resultProxy = self.engine.execute(T_src.select())
                self.logger.info("Done. ({0} total rows)".format(str(cnt)))
                j = 0
                self.logger.info("Loading all rows into memory...")
                rows = []

                for i in range(1, (cnt / buffer_size) + 1):
                    self.logger.info(
                        "Fetched {0} rows".format(str(i * buffer_size)))
                    rows += resultProxy.fetchmany(buffer_size)
                rows += resultProxy.fetchmany(cnt % buffer_size)
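                # e.g. cnt = 25000, buffer_size = 10000 -> two
                # fetchmany(10000) calls in the loop above, plus
                # fetchmany(5000) here, covering all rows.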
                # Don't rely on Python garbage collection...
                resultProxy.close()

                assert(cnt == len(rows))

                raw_rows = [list(row) for row in rows]
                self.logger.info("Done")
                pks = []

                t_start_transform = datetime.now()

                # TODO: Use column/table mappers, would need to update foreign
                # keys...
            
                for column in T_src.columns:
                    self.column_count += 1
                    ##############################
                    # Check for multiple primary
                    #  keys & auto-increment
                    ##############################
                    if column.primary_key:
                        pks.append(column.name.lower())
                        pk_count += 1
                    
                    if column.autoincrement:
                        auto_inc_count += 1
                    ##############################
                    # Standardize Column Type
                    ##############################
                    column_copy = self.standardize_column_type(column, raw_rows)
                    """"""""""""""""""""""""""""""
                    """ *** ELIMINATION I *** """
                    """"""""""""""""""""""""""""""
                    self.add_or_eliminate_column(
                        T, T_dst_exists, column, column_copy, raw_rows)

                if self.dst_engine.dialect.name.lower() == "mysql":
                    #######################################
                    # Remove auto-inc on composite PK's
                    #######################################
                    self.check_multiple_autoincrement_issue(
                        auto_inc_count, pk_count, T)
                if self.transform_table(T) is None:
                    # Skip the table, it is scheduled to be deleted...
                    continue
                elif len(T.columns) == 0:
                    # TODO: Delete table from T_dst
                    self.logger.warning(
                        "Table '" + T.name + "' has all NULL columns, " +
                        "skipping...")
                    self.empty_table_count += 1
                    self.empty_tables.append(T.name)
                    continue
                elif len(raw_rows) == 0 and self.skip_table_if_empty:
                    self.logger.warning(
                        "Table '" + T.name + "' has 0 rows, skipping...")
                    self.empty_table_count += 1
                    self.empty_tables.append(T.name)
                    continue
                else:
                    tableCreationSuccess = self.create_table(T_dst_exists, T)
                    if not tableCreationSuccess:
                        continue

                """"""""""""""""""""""""""""""
                """" *** INSERT ROWS *** """""
                """"""""""""""""""""""""""""""
                data_file_path = os.getcwd() + "/{0}.sql".format(T.name)
                if os.path.isfile(data_file_path):
                    os.remove(data_file_path)
                # Delete the old file if it exists (i.e. if a previous run went
                # bad and didn't clean up...)

                dst_meta.reflect(self.dst_engine)

                #self.tgt_insp.reflecttable(T, None)
                t_start_dump = datetime.now()
                t_start_load = datetime.now()
                
                row_buffer_size = 100000
                if self.dst_engine.dialect.name.lower() == 'mssql' and \
                 not self.enable_mssql_bulk_insert:
                    # MSSQL limits the number of rows per INSERT statement
                    row_buffer_size = 1000

                if migrate_data:
                    self.logger.info("Transforming & Dumping " +
                                     str(len(raw_rows)) +
                                     " total rows from table '" +
                                     str(T.name) +
                                     "' into '{0}'.".format(data_file_path))
                    # Create buffers of "100000" rows
                    # TODO: Parameterize "100000" as 'buffer_size' (should be
                    # configurable)
                    insertionCount = (len(raw_rows) / row_buffer_size) + 1
                    raw_row_len = len(raw_rows)
                    self.total_rows += raw_row_len
                    if len(raw_rows) > 0:
                        for i in range(0, insertionCount):
                            startRow = 0  # i * 1000
                            endRow = row_buffer_size  # (i+1) * 1000
                            virtualStartRow = i * row_buffer_size
                            virtualEndRow = (i + 1) * row_buffer_size
                            if virtualEndRow > raw_row_len:
                                virtualEndRow = raw_row_len
                                endRow = raw_row_len
                            self.logger.info(
                                " ({0}) -- Transforming rows: ".format(
                                    T.name) +
                                str(virtualStartRow) +
                                " -> " +
                                str(virtualEndRow) +
                                "...({0} Total)".format(
                                    str(raw_row_len)))
                            self.transform_data(
                                T_src, raw_rows[startRow:endRow])
                            self.logger.info(
                                " ({0}) -- Dumping rows: "
                                .format(T.name) +
                                str(virtualStartRow) +
                                " -> " +
                                str(virtualEndRow) +
                                " to '{1}.sql'...({0} Total)"
                                .format(str(raw_row_len), T.name) +
                                "[Table {0}/{1}]".format(str(t_idx), str(t_total)))
                            self.dump_data(
                                T_dst_exists, T, raw_rows[startRow:endRow],
                                pks, Session)
                            del raw_rows[startRow:endRow]

                        #######################################################
                        # Now *actually* load the data via fast-CLI utilities
                        #######################################################
                        t_start_load = datetime.now()
                        # From <table_name>.sql
                        self.send_data(
                            T.name, self.current_ordered_table_columns)

                t_stop_load = datetime.now()

                ###################################
                # Calculate operation times...
                ###################################

                extraction_dt = t_start_transform - t_start_extract
                extraction_dt_str = str(
                    extraction_dt.seconds / 60) + "m:" + \
                    str(extraction_dt.seconds % 60) + "s"

                transform_dt = t_start_dump - t_start_transform
                transform_dt_str = str(
                    transform_dt.seconds / 60) + "m:" + \
                    str(transform_dt.seconds % 60) + "s"

                dump_dt = t_start_load - t_start_dump
                dump_dt_str = str(dump_dt.seconds / 60) + \
                    "m:" + str(dump_dt.seconds % 60) + "s"

                load_dt = t_stop_load - t_start_load
                load_dt_str = str(load_dt.seconds / 60) + \
                    "m:" + str(load_dt.seconds % 60) + "s"

                self.times[table_name][
                    'Extraction Time (From Source)'] = extraction_dt_str
                self.times[table_name][
                    'Transform Time (Schema)'] = transform_dt_str
                self.times[table_name][
                    'Data Dump Time (To File)'] = dump_dt_str
                self.times[table_name]['Load Time (Into Target)'] = load_dt_str
                # End first table loop...

    def add_indexes(self, destination_database_url):
        dst_meta = MetaData()
        dst_meta.reflect(bind=self.dst_engine)
        dst_meta.bind = self.dst_engine
        Session = sessionmaker(bind=self.dst_engine)
        """"""""""""""""""""
        """ *** INDEX *** """
        """"""""""""""""""""
        ############################
        # Add Indexes (some DBs require indexed references...)
        ############################
        idx_count = 0
        for table_name in self.indexes.keys():
            t_start_index = datetime.now()
            pre_transformed_table_name = table_name

            indexes = self.indexes.get(table_name)
            ####################################
            # Check to see if table_name
            # has been transformed...
            ####################################
            table_transform = self.schema_transformer.table_transformations\
                .get(table_name)
            column_transformer = self.schema_transformer.column_transformations\
                .get(table_name)
            if table_transform and table_transform.new_table not in ["", None]:
                # Update the table_name
                table_name = table_transform.new_table
            this_idx_count = 0
            self.logger.info("Creating indexes for '" + table_name + "'...")
            for i in indexes:
                self.logger.info(str(i))
                self.total_indexes += 1
                session = Session()
                col = i['column_names']
                continueFlag = False
                if len(col) == 0:
                    self.logger.warning("Index has no columns! This may be an " +
                        "issue with the metadata reflection function..." +
                        "\n** This issue is known on MSSQL Sources")
                    continueFlag = True
                unique = i['unique']
                # Name the index something compatible across all databases
                # (i.e. can't create Idx w/ same name as column in Postgresql)
                name = "IDX_" + table_name + "__" + \
                    "_".join(col) + "__" + str(this_idx_count)
                # Max length of identifier is 63 characters in
                # postgresql & mysql
                if len(name) > 63:
                    name = name[:60] + "_" + str(this_idx_count)
                # number suffix guarantees uniqueness (i.e. if multiple
                # idx's on one column)
                cols = ()
                self.logger.info(
                    "Checking validity of data indexed by: " +
                    "'{0}' (column = '{1}' - table = '{2}')"
                    .format(
                        name, str(col), table_name))
                for c in col:
                    #####################################
                    # Check for Column Transformations...
                    #####################################
                    if column_transformer and\
                     column_transformer.get(c) and\
                     column_transformer[c].new_column not in ["", None]:
                        c = column_transformer[c].new_column
                    #####################################
                    # Check to see if the table and column exist
                    #####################################
                    tableHolder = dst_meta.tables.get(table_name)
                    if tableHolder is None:
                        continueFlag = True
                        self.logger.warning(
                            "Skipping index '" + str(name) + "' on column '" +
                            table_name + "." + c + "' because the table DNE" +
                            " in the destination DB schema.")
                    else:
                        columnHolder = dst_meta.tables.get(
                            table_name).columns.get(c)
                        if columnHolder is None:
                            self.logger.warning(
                                "Skipping index '" + str(name) + "' on " +
                                "column '" + table_name + "." + c +
                                "' because the column DNE in the " +
                                "destination DB schema.")
                            continueFlag = True  # Skip this index...
                        else:
                            cols += (columnHolder,)
                if continueFlag:
                    self.skipped_index_count += 1
                    continue
                    # Don't create this Index - the table/column don't exist!

                I = Index(name, *cols, unique=unique)

                violationCount = 0
                if unique:
                    ############################################
                    # Check for Unique Constraint Violations
                    ############################################
                    cols_tuple = tuple(cols)
                    # We have a composite index, let's deal with it...
                    if len(cols_tuple) > 1:
                        uniqueGroups = session.query(
                            *cols_tuple).group_by(*cols_tuple).count()
                        totalEntries = session.query(*cols_tuple).count()
                        # The difference represents repeated combinations of
                        # 'cols_tuple'
                        violationCount = totalEntries - uniqueGroups
                    else:
                        violationCount = session.query(
                            *cols_tuple).group_by(*cols_tuple).having(
                            func.count(*cols_tuple) > 1).count()
                if violationCount > 0:
                    self.logger.error(
                        "Duplicates found in column '" +
                        str(col) +
                        "' for unique index '" +
                        name + "'")
                    self.unique_constraint_violations.append(
                        name + " (" + str(col) + ")")
                    self.unique_constraint_violation_count += violationCount
                    self.skipped_index_count += 1
                    # TODO: Gather bad rows...
                else:
                    self.logger.info("Adding Index: " + str(I))
                    session.close()
                    try:
                        I.create(self.dst_engine)
                    except sqlalchemy.exc.OperationalError as e:
                        self.logger.warning(str(e) + "\n -- it is likely " +
                                            "that the Index already exists...")
                        self.skipped_index_count += 1
                        continue
                    idx_count += 1
                    this_idx_count += 1
            self.logger.info(
                """ Done. (Added '{0}' indexes to '{1}')"""
                .format(str(this_idx_count), table_name))

            t_stop_index = datetime.now()
            index_dt = t_stop_index - t_start_index
            self.times[pre_transformed_table_name]['Indexing Time'] = \
                str(index_dt.seconds / 60) + "m:" + \
                str(index_dt.seconds % 60) + "s"

        self.index_count = idx_count

    def add_fks(self, destination_database_url):
        ############################
        # Add FKs
        ############################
        dst_meta = MetaData()
        
        if self.dst_engine.dialect.name.lower() == "mssql":
            raise Exception(
                "Adding Constraints to MSSQL is not supported" +
                " by sqlalchemy_migrate...")
        dst_meta.reflect(bind=self.dst_engine)
        dst_meta.bind = self.dst_engine
        Session = sessionmaker(bind=self.dst_engine)
        ##########################
        # HERE BE HACKS!!!!
        ##########################
        """
        Problem: often when porting DBs, the data is old, improperly
        constrained and generally messy. FK constraints get violated without
        DBAs knowing it (in engines that don't enforce or support FK
        constraints).

        Hack: Turn off FK checks when porting FKs...

        Better Solution: ...would be to insert the data AFTER the FKs are
        created, row by row, and ask the user to correct or delete each
        offending row. This is more of a 'transform' operation than a
        'Constraint' op...
        """

        if self.dst_engine.dialect.name.upper() == "MYSQL":
            self.dst_engine.execute("SET foreign_key_checks = 0")
        elif self.dst_engine.dialect.name.upper() == "POSTGRESQL":
            self.logger.warning(
                "Can't disable foreign key checks on POSTGRESQL")
        else:
            self.logger.warning("Can't disable foreign key checks...")

        inspector = self.tgt_insp
        for table_name in self.fks.keys():
            pre_transformed_table_name = table_name
            t_start_constraint = datetime.now()
            fks = self.fks[table_name]
            ####################################
            # Check to see if table_name
            # has been transformed...
            ####################################
            table_transform = self.schema_transformer.table_transformations.get(
                table_name)
            if table_transform and table_transform.new_table not in ["", None]:
                # Update the table_name
                table_name = table_transform.new_table
            self.logger.info(
                "Adding FKs to table '{0}' (previously {1})".format(
                    table_name, pre_transformed_table_name))
            ########################
            # Check that constrained table
            # exists in destination DB schema
            ########################

            T = Table(table_name, dst_meta)
            try:
                inspector.reflecttable(T, None)
            except sqlalchemy.exc.NoSuchTableError as e:
                self.logger.warning(
                    "Skipping FK constraints on table '" +
                    str(table_name) +
                    "' because the constrained table does not" +
                    " exist in the destination DB schema.")
                self.skipped_fk_count += len(self.fks[table_name])
                self.total_fks += len(self.fks[table_name])
                continue  # on to the next table...

            for fk in fks:
                cons_column_transformer = \
                        self.schema_transformer.column_transformations.get(
                         pre_transformed_table_name)
                self.total_fks += 1
                session = Session()
                #####################################
                # Check for Column Transformations...
                #####################################
                constrained_columns = []
                for c in fk['constrained_columns']:
                    if cons_column_transformer and \
                     cons_column_transformer.get(c) and \
                     cons_column_transformer[c].new_column not in ["", None]:
                        c = cons_column_transformer[c].new_column
                    constrained_columns.append(c)
                constrained_cols = filter(
                    lambda c: c is not None,
                    map(lambda x: T.columns.get(x), constrained_columns))

                ################################
                # Check that the constrained columns
                # exist in the destination db schema
                ################################
                if len(constrained_cols) < len(fk['constrained_columns']):
                    # Note: 'constraint_name' is not yet defined here, so
                    # identify the FK by its constrained columns...
                    self.logger.warning("Skipping FK constraint because " +
                                        "constrained columns '" +
                                        str(fk['constrained_columns']) +
                                        "' on table '" +
                                        str(table_name) +
                                        "' don't exist in the destination " +
                                        "DB schema.")
                    session.close()
                    self.skipped_fk_count += 1
                    continue
                ref_table = fk['referred_table']

                ####################################
                # Check to see if table_name
                # has been transformed...
                ####################################
                table_transform = \
                    self.schema_transformer.table_transformations.get(
                                  ref_table)
                ref_column_transformer = \
                    self.schema_transformer.column_transformations.get(
                                  ref_table)
                if table_transform and table_transform.new_table not in [
                        "", None]:
                    # Update the table_name
                    ref_table = table_transform.new_table
                T_ref = Table(ref_table, dst_meta)
                ############################
                # Check that referenced table
                # exists in destination DB schema
                ############################
                constraint_name = "FK__{0}__{1}".format(
                    table_name.upper(), T_ref.name.upper())
                if len(constraint_name) > 63:
                    constraint_name = constraint_name[:63]
                
                try:
                    inspector.reflecttable(T_ref, None)
                except sqlalchemy.exc.NoSuchTableError as e:
                    self.logger.warning(
                        "Skipping FK constraint '" +
                        constraint_name +
                        "' because referenced table '" +
                        ref_table +
                        "' doesn't exist in the destination DB schema." +
                        " (FK Dependency not met)")
                    session.close()
                    self.skipped_fk_count += 1
                    continue
                ############################
                # Check that referenced columns
                # Exist in destination DB schema
                ############################
                ref_columns = []
                for c in fk['referred_columns']:
                    if ref_column_transformer and \
                     ref_column_transformer.get(c) and \
                     ref_column_transformer[c].new_column not in ["", None]:
                        c = ref_column_transformer[c].new_column
                    ref_columns.append(c)
                referred_columns = map(
                    lambda x: T_ref.columns.get(x), ref_columns)
                self.logger.info("Ref Columns: " + str(ref_columns))
                if len(referred_columns) < len(fk['referred_columns']):
                    self.logger.warning("Skipping FK constraint '" +
                                        constraint_name +
                                        "' because referenced columns '" +
                                        str(fk['referred_columns']) +
                                        "' on table '" +
                                        str(ref_table) +
                                        "' don't exist in the destination " +
                                        "DB schema.")
                    session.close()
                    self.skipped_fk_count += 1
                    continue

                ##################################
                # Check for referential integrity violations
                ##################################
                if self.check_referential_integrity:
                    if self.dst_engine.dialect.name.upper() in \
                            ["MYSQL", "POSTGRESQL"]:  # HACKS
                        self.logger.info(
                            "Checking referential integrity of '" +
                            str(table_name) + "." +
                            str(constrained_columns) + "' -> '" +
                            str(T_ref.name) + "." +
                            str(ref_columns) + "'")
                        t = session.query(
                            T_ref.columns.get(
                                referred_columns[0].name))
                        query2 = session.query(T)

                        q = query2.filter(
                            and_(
                                ~T.columns.get(
                                    constrained_cols[0].name).in_(t),
                                T.columns.get(
                                    constrained_cols[0].name).isnot(None)))
                        bad_rows = session.execute(q).fetchall()

                        if len(bad_rows) > 0:
                            self.logger.warning("FK from '" +
                                                T.name +
                                                "." +
                                                constrained_cols[0].name +
                                                " -> " +
                                                T_ref.name +
                                                "." +
                                                referred_columns[0].name +
                                                "' was violated '" +
                                                str(len(bad_rows)) +
                                                "' times.")
                            self.referential_integrity_violations += len(
                                bad_rows)
                            for row in bad_rows:
                                self.riv_arr.append(str(row.values()))

                    else:
                        self.logger.warning(
                            "Adding constraints only supported/tested for " +
                            "MySQL")
                self.logger.info("Adding FK '" + constraint_name + "' to '" +
                                 table_name + "'")
                session.close()
                cons = ForeignKeyConstraint(
                    name=constraint_name,
                    columns=constrained_cols,
                    refcolumns=referred_columns,
                    table=T)
                # Loop to handle tables that reference other tables w/ multiple
                # columns & FKs
                creation_succesful = False
                max_fks = 15
                cnt = 0
                while not creation_succesful:
                    try:
                        cons.create(self.dst_engine)
                        creation_succesful = True
                    except sqlalchemy.exc.OperationalError as e:
                        # MySQL Exception
                        self.logger.warning(
                            str(e) + "\n ---> an FK on this table already " +
                            "references the ref_table...appending '{0}' to" +
                            " FK's name and trying again...".format(
                                str(cnt)))
                        cons = ForeignKeyConstraint(
                            name=constraint_name +
                            "_{0}".format(
                                str(cnt)),
                            columns=constrained_cols,
                            refcolumns=referred_columns,
                            table=T)
                        cnt += 1
                        if cnt == max_fks:
                            self.logger.error(
                                "FK creation was unsuccessful " +
                                "(surpassed max number of FKs on 1 table " +
                                "which all reference another table)")
                            self.skipped_fk_count += 1
                            break
                    except sqlalchemy.exc.ProgrammingError as e:
                        # PostgreSQL Exception
                        self.logger.warning(
                            str(e) +
                            "\n ---> an FK on this table already references " +
                            "the ref_table...appending '{0}' to FK's name " +
                            "and trying again...".format(
                                str(cnt)))
                        cons = ForeignKeyConstraint(
                            name=constraint_name +
                            "_{0}".format(
                                str(cnt)),
                            columns=constrained_cols,
                            refcolumns=referred_columns,
                            table=T)
                        cnt += 1
                        if cnt == max_fks:
                            self.logger.error(
                               "FK creation was unsuccessful (surpassed max " +
                               "number of FKs on 1 table which all reference" +
                               " another table)")
                            self.skipped_fk_count += 1
                            break

                if creation_succesful:
                    self.fk_count += 1
            t_stop_constraint = datetime.now()
            constraint_dt = t_stop_constraint - t_start_constraint
            constraint_dt_str = str(constraint_dt.seconds / 60) + "m:" +\
                str(constraint_dt.seconds % 60) + "s"

            self.times[pre_transformed_table_name][
                'Constraint Time'] = constraint_dt_str

    def print_timings(self):
        stop = datetime.now()
        dt = stop - self.start
        timeString = ""
        # if dt.seconds > 3600:
        #    timeString += (str(int(dt.seconds / 3600)) + ":")
        timeString += str(dt.seconds / 60) + "m:" + str(dt.seconds % 60) + "s"
        self.logger.info("""
       ========================
       === * Sync Summary * ===
       ========================\n
       Total Tables:                     {0}
       -- Empty Tables   (skipped)       {1}
       -- Deleted Tables (skipped)       {15}
       -- Synced Tables                  {2}\n
       ========================\n
       Total Columns:                    {3}
       -- Empty Columns   (skipped)      {4}
       -- Deleted Columns (skipped)      {16}
       -- Synced Columns                 {5}\n
       ========================\n
       Total Indexes                     {8}
       -- Skipped Indexes                {11}
       -- Synced Indexes                 {12}\n
       ========================\n
       Total FKs                         {9}
       -- Skipped FKs                    {13}
       -- Synced FKs                     {14}\n
       ========================\n
       Referential Integrity Violations: {6}
       ========================\n
       Unique Constraint Violations:     {10}
       ========================\n
       Total Time:                       {7}
       Total Rows:                       {17}
       Rows per Minute:                  {18}\n\n""".format(
            str(self.table_count),
            str(self.empty_table_count),
            str(self.table_count - self.empty_table_count),
            str(self.column_count),
            str(self.null_column_count),
            str(self.column_count - self.null_column_count),
            str(self.referential_integrity_violations),
            timeString,
            str(self.total_indexes),
            str(self.total_fks),
            str(self.unique_constraint_violation_count),
            str(self.skipped_index_count),
            str(self.index_count),
            str(self.skipped_fk_count),
            str(self.fk_count),
            str(self.deleted_table_count),
            str(self.deleted_column_count),
            str(self.total_rows),
            str(self.total_rows / ((dt.seconds / 60) or 1))))
        # self.logger.warning("Referential Integrity " +
        # "Violations: \n" + "\n".join(self.riv_arr))
        self.logger.warning(
            "Unique Constraint Violations: " +
            "\n".join(
                self.unique_constraint_violations))

        self.logger.info("""
       =========================
       === ** TIMING INFO ** ===
       =========================
                _____
             _.'_____`._
           .'.-'  12 `-.`.
          /,' 11      1 `.\\
         // 10      /   2 \\\\
        ;;         /       ::
        || 9  ----O      3 ||
        ::                 ;;
         \\\\ 8           4 //
          \`. 7       5 ,'/
           '.`-.__6__.-'.'
            ((-._____.-))
            _))       ((_
           '--'       '--'
       __________________________
       """)
        ordered_timings = [
            "Extraction Time (From Source)",
            "Transform Time (Schema)",
            "Data Dump Time (To File)",
            "Load Time (Into Target)",
            "Indexing Time",
            "Constraint Time"]
        for (table_name, timings) in self.times.iteritems():
            self.logger.info(table_name)
            for key in ordered_timings:
                self.logger.info("-- " + str(key) + ": " +
                    str(timings.get(key) or 'N/A'))
            self.logger.info("_________________________")

        self.schema_transformer.failed_transformations = list(
            self.schema_transformer.failed_transformations)
        if len(self.schema_transformer.failed_transformations) > 0:
            self.logger.critical(
                "\n".join(self.schema_transformer.failed_transformations))
            self.logger.critical("""
           !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
           !!!! * '{0}' Old Columns had failed transformations !!!!
           !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
           """.format(str(len(self.schema_transformer.failed_transformations))))


        ###########################################
        # Write 'Deleted' columns out to a file...
        ###########################################
        removedColumns = self.deleted_columns + self.null_columns
        with open("deleted_columns.csv", "w") as fp:
            fp.write("\n".join(map(lambda c:
                     c.replace(".", ","), removedColumns)))


================================================
FILE: etlalchemy/ETLAlchemyTarget.py
================================================
from etlalchemy_exceptions import DBApiNotFound
from sqlalchemy_utils import database_exists, create_database, drop_database
from sqlalchemy import create_engine, MetaData
# import dill
import logging


class ETLAlchemyTarget():
    def __init__(self, conn_string, drop_database=False):
        self.drop_database = drop_database
        self.conn_string = conn_string
        self.dst_engine = None
        ##########################
        # Right now we only assume a SQL database...
        ##########################
        self.sources = []
        self.logger = logging.getLogger("ETLAlchemyTarget")
        for h in list(self.logger.handlers):
            # Clean up any old loggers...
            # (useful during testing w/ multiple log_files)
            self.logger.removeHandler(h)
        handler = logging.StreamHandler()
        formatter = logging.Formatter('%(name)s (%(levelname)s) - %(message)s')
        handler.setFormatter(formatter)
        self.logger.addHandler(handler)
        self.logger.setLevel(logging.INFO)
    # Add an ETLAlchemySource to the list of 'sources'
    """ Each 'source' represents a source SQL DB """
    def addSource(self, source):
        if not getattr(source, 'migrate', None):
            raise Exception("Source '" + str(source) +
                            "' has no function 'migrate'...")
        self.sources.append(source)

    def migrate(self, migrate_schema=True, migrate_data=True,
                migrate_fks=True, migrate_indexes=True):
        try:
            self.dst_engine = create_engine(self.conn_string)
        except ImportError as e:
            raise DBApiNotFound(self.conn_string)
        if self.drop_database:
            self.logger.info(self.dst_engine.dialect.name)
            ############################
            # Hack for SQL Server using DSN's
            # and not having DB name in connection_string
            ############################
            if self.dst_engine.dialect.name.upper() == "MSSQL":
                db_name = list(self.dst_engine.execute(
                    "SELECT DB_NAME()").fetchall())[0][0]
                self.logger.warning(
                        "Can't drop database {0} on MSSQL, "
                        "dropping tables instead...".format(db_name))
                m = MetaData()
                m.bind = self.dst_engine
                m.reflect()
                m.drop_all()
            elif self.dst_engine.dialect.name.upper() == "ORACLE":
                db_name = list(self.dst_engine.execute(
                    "SELECT SYS_CONTEXT('userenv','db_name') " +
                    "FROM DUAL").fetchall())[0][0]
                self.logger.warning(
                        "Can't drop database {0} on ORACLE, "
                        "dropping tables instead...".format(db_name))
                m = MetaData()
                m.bind = self.dst_engine
                m.reflect()
                m.drop_all()
            else:
                if self.dst_engine.url and database_exists(self.dst_engine.url):
                    self.logger.warning(self.dst_engine.url)
                    self.logger.warning(
                            "Dropping database '{0}'"
                            .format(self.conn_string.split("/")[-1]))
                    drop_database(self.dst_engine.url)
                    self.logger.info(
                            "Creating database '{0}'"
                            .format(self.conn_string.split("/")[-1]))
                    create_database(self.dst_engine.url)
                else:
                    self.logger.info(
                            "Database does not exist...no need to drop it.")
                    create_database(self.dst_engine.url)
        for source in self.sources:
            self.logger.info(
                    "Sending source '" + str(source) + "' to destination '" +
                    str(self.conn_string) + "'")
            source.migrate(self.conn_string, migrate_schema=migrate_schema,
                           migrate_data=migrate_data)
            if migrate_indexes:
                source.add_indexes(self.conn_string)
            if migrate_fks:
                if self.dst_engine.dialect.name.lower() == "mssql":
                    self.logger.warning(
                            "** SKIPPING 'Add Foreign Key Constraints' " +
                            "BECAUSE 'sqlalchemy_migrate' DOES NOT " +
                            "SUPPORT fk.create() ON *MSSQL*")
                else:
                    source.add_fks(self.conn_string)
            source.print_timings()
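The migrate() method above picks a destination-reset strategy per dialect: MSSQL and Oracle cannot drop the database they are connected to, so all tables are dropped instead, while every other dialect gets a full drop-and-recreate. A minimal Python 3 sketch of that decision (the function name is illustrative, not part of the package):

```python
def drop_strategy(dialect_name):
    """Mirror ETLAlchemyTarget.migrate()'s drop_database branch:
    MSSQL and Oracle fall back to dropping all tables; other
    dialects drop and recreate the whole database."""
    if dialect_name.upper() in ("MSSQL", "ORACLE"):
        return "drop_all_tables"
    return "drop_and_recreate_database"

print(drop_strategy("mssql"))       # drop_all_tables
print(drop_strategy("postgresql"))  # drop_and_recreate_database
```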


================================================
FILE: etlalchemy/__init__.py
================================================
from ETLAlchemySource import ETLAlchemySource
from ETLAlchemyTarget import ETLAlchemyTarget


================================================
FILE: etlalchemy/etlalchemy_exceptions.py
================================================
class DBApiNotFound(Exception):
    def __init__(self, conn_string):
        dialect_to_db_apis = {
            'oracle+cx_oracle': 'cx_Oracle',
            'mysql': 'MySQL-python',
            'postgresql': 'psycopg2',
            'mssql+pyodbc': 'pyodbc',
            'sqlite': 'sqlite3'
        }
        dialect_to_walkthrough_urls = {
            'oracle+cx_oracle': 'sharrington/databases/oracle/install-cx_oracle-mac',
        }
        dialect = conn_string.split(":")[0]
        db_api = dialect_to_db_apis.get(dialect) or \
            "No driver found for dialect '{0}'".format(dialect)
        self.msg = """
  ********************************************************
  ** While creating the engine for '{0}', SQLAlchemy tried to
  ** import the DB API module '{1}' but failed.
  ********************************************************
  **  + This is because 1 of 2 reasons:
  **  1.) You forgot to install the DB API module '{1}'.
  **  --> (Try: 'pip install {1}')
  **  2.) If the above step fails, you most likely forgot to
  **  --> install the actual database driver on your local
  **  --> machine! The driver is needed in order to install
  **  --> the Python DB API ('{1}').
  **  --> (see the following link for instructions):
  ** https://thelaziestprogrammer.com/{2}
  **********************************************************
        """.format(conn_string, db_api, dialect_to_walkthrough_urls.get(dialect) or "")

    def __str__(self):
        return self.msg
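DBApiNotFound keys its lookup tables on the dialect prefix of the connection string, i.e. everything before the first ':'. A standalone Python 3 sketch of that parsing (the mapping is copied from the class above; the helper name is illustrative):

```python
dialect_to_db_apis = {
    'oracle+cx_oracle': 'cx_Oracle',
    'mysql': 'MySQL-python',
    'postgresql': 'psycopg2',
    'mssql+pyodbc': 'pyodbc',
    'sqlite': 'sqlite3',
}

def db_api_for(conn_string):
    # The dialect is everything before the first ':' in the URL.
    dialect = conn_string.split(":")[0]
    return dialect_to_db_apis.get(
        dialect, "No driver found for dialect '{0}'".format(dialect))

print(db_api_for("postgresql://user:pw@localhost/db"))  # psycopg2
```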


================================================
FILE: etlalchemy/literal_value_generator.py
================================================
import shutil
import decimal
import datetime
# Find the best implementation available on this platform
try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO

def _generate_literal_value_for_csv(value, dialect):
    dialect_name = dialect.name.lower()
    
    if isinstance(value, basestring):
        if dialect_name in ['sqlite', 'mssql']:
            # No support for 'quote' enclosed strings
            return "%s" % value
        else:
            value = value.replace('"', '""')
            return "\"%s\"" % value
    elif value is None:
        return "NULL"
    elif isinstance(value, bool):
        return "%s" % int(value)
    elif isinstance(value, (float, int, long)):
        return "%s" % value
    elif isinstance(value, decimal.Decimal):
        return str(value)
    elif isinstance(value, datetime.datetime):
        if dialect_name == "mysql":
            return '%02d-%02d-%02d %02d:%02d:%02d' %\
                    (value.year,value.month,value.day,value.hour,value.minute,value.second)
        elif dialect_name == "oracle":
            return "TO_DATE('%s','YYYY-MM-DD HH24:MI:SS')" %\
                    ('%02d-%02d-%02d %02d:%02d:%02d' %\
                        (value.year,value.month,value.day,value.hour,value.minute,value.second))
                #value.strftime("%Y-%m-%d %H:%M:%S")
        elif dialect_name == "postgresql":
            return '%02d-%02d-%02d %02d:%02d:%02d' %\
                    (value.year,value.month,value.day,value.hour,value.minute,value.second)
            #return '%Y-%m-%d %H:%M:%S'.format(value)
            #return "\"%s\"" % value.strftime("%Y-%m-%d %H:%M:%S")
        elif dialect_name == "mssql":
            #return "'%s'" % value.strftime("%m/%d/%Y %H:%M:%S.%p")
            return '%02d%02d%02d %02d:%02d:%02d.0' %\
                    (value.year,value.month,value.day,value.hour,value.minute,value.second)

        elif dialect_name == "sqlite":
            #return "%s" % value.strftime("%Y-%m-%d %H:%M:%S.%f")
            return '%02d-%02d-%02d %02d:%02d:%02d.0' %\
                    (value.year,value.month,value.day,value.hour,value.minute,value.second)
        else:
            raise NotImplementedError(
                    "No support for engine with dialect '%s'. "
                    "Implement it here!" % dialect.name)
    elif isinstance(value, datetime.date):
        if dialect_name == "mysql":
            return '%02d-%02d-%02d' %\
                    (value.year,value.month,value.day)
        elif dialect_name == "oracle":
            return "TO_DATE('%s', 'YYYY-MM-DD')" %\
                    ('%02d-%02d-%02d' % (value.year,value.month,value.day))
        elif dialect_name == "postgresql":
            return '%02d-%02d-%02d' %\
                    (value.year,value.month,value.day)
        elif dialect_name == "mssql":
            return "'%02d/%02d/%02d'" %\
                    (value.year,value.month,value.day)
        elif dialect_name == "sqlite":
            return "%02d-%02d-%02d" %\
                    (value.year,value.month,value.day)
        else:
            raise NotImplementedError(
                    "No support for engine with dialect '%s'. "
                    "Implement it here!" % dialect.name)
    
    else:
        raise NotImplementedError(
                    "Don't know how to literal-quote value %r" % value)
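The function above is Python 2 (basestring, long). A self-contained Python 3 sketch of its scalar branches for the CSV path (dates omitted for brevity; the helper name is illustrative):

```python
def csv_literal(value, dialect_name):
    """Sketch of _generate_literal_value_for_csv's scalar branches."""
    if isinstance(value, str):
        if dialect_name in ('sqlite', 'mssql'):
            # These bulk loaders don't support quote-enclosed strings.
            return value
        # Escape embedded double quotes by doubling them.
        return '"%s"' % value.replace('"', '""')
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        # Booleans load as 0/1 (checked before the numeric fallback,
        # since bool is a subclass of int).
        return "%s" % int(value)
    return "%s" % value

print(csv_literal('say "hi"', 'mysql'))
```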


def _generate_literal_value(value, dialect):
    dialect_name = dialect.name.lower()
    if isinstance(value, basestring):
        value = value.replace("'", "''")
        return "'%s'" % value
    elif value is None:
        return "NULL"
    elif isinstance(value, bool):
        return "%s" % int(value)
    elif isinstance(value, (float, int, long)):
        return "%s" % value
    elif isinstance(value, decimal.Decimal):
        return str(value)
    elif isinstance(value, datetime.datetime):
        #if dialect_name == "mysql":
        #    return "STR_TO_DATE('%s','%%Y-%%m-%%d %%H:%%M:%%S')" %\
        #        value.strftime("%Y-%m-%d %H:%M:%S")
        if dialect_name == "oracle":
            return "TO_DATE('%s','YYYY-MM-DD HH24:MI:SS')" %\
                    ('%02d-%02d-%02d %02d:%02d:%02d' %\
                        (value.year,value.month,value.day,value.hour,value.minute,value.second))
        #elif dialect_name == "postgresql":
        #    return "to_date('%s', 'YYYY-MM-DD HH24:MI:SS')" %\
        #        value.strftime("%Y-%m-%d %H:%M:%S")
        elif dialect_name == "mssql":
            #return "'%s'" % value.strftime("%Y%m%d %H:%M:%S %p")
            return "'%02d%02d%02d %02d:%02d:%02d 0'" %\
                    (value.year,value.month,value.day,value.hour,value.minute,value.second)
        #elif dialect_name == "sqlite":
        #    return "'%s'" % value.strftime("%Y-%m-%d %H:%M:%S.%f")
        else:
            raise NotImplementedError(
                    "No support for engine with dialect '%s'. "
                    "Implement it here!" % dialect.name)
    elif isinstance(value, datetime.date):
        #if dialect_name == "mysql":
        #    return "STR_TO_DATE('%s','%%Y-%%m-%%d')" %\
        #        value.strftime("%Y-%m-%d")
        if dialect_name == "oracle":
            return "TO_DATE('%s', 'YYYY-MM-DD')" %\
                ('%02d-%02d-%02d' % (value.year,value.month,value.day))
        #elif dialect_name == "postgresql":
        #    return "to_date('%s', 'YYYY-MM-DD')" %\
        #        value.strftime("%Y-%m-%d")
        elif dialect_name == "mssql":
            return "'%02d%02d%02d'" % (value.year,value.month,value.day)
        #elif dialect_name == "sqlite":
        #    return "'%s'" % value.strftime("%Y-%m-%d")
        else:
            raise NotImplementedError(
                    "No support for engine with dialect '%s'. "
                    "Implement it here!" % dialect.name)

    else:
        raise NotImplementedError(
            "Don't know how to literal-quote value %r" % value)


def dump_to_oracle_insert_statements(fp, engine, table, raw_rows, columns):
    ##################################
    # No Bulk Insert available in Oracle
    ##################################
    # TODO: Investigate "sqlldr" CLI utility to handle this load...
    lines = []
    lines.append("INSERT INTO {0} (".format(table) +
                 ",".join(columns) +
                 ")\n")
    num_rows = len(raw_rows)
    dialect = engine.dialect
    for i in range(0, num_rows):
        if i == num_rows-1:
            # Last row...
            lines.append("SELECT " +
                         ",".join(map(lambda c: _generate_literal_value(
                             c, dialect), raw_rows[i])) +
                         " FROM DUAL\n")
        else:
            lines.append("SELECT " +
                         ",".join(map(lambda c: _generate_literal_value(
                             c, dialect), raw_rows[i])) +
                         " FROM DUAL UNION ALL\n")
    fp.write(''.join(lines))
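Oracle has no multi-row VALUES list, so dump_to_oracle_insert_statements emulates a bulk insert with SELECT ... FROM DUAL chained by UNION ALL. A minimal Python 3 sketch of the statement shape (literal quoting reduced to repr for brevity; the function name is illustrative):

```python
def oracle_bulk_insert(table, columns, rows):
    """Build one INSERT fed by UNION ALL'd SELECTs, as Oracle requires."""
    lines = ["INSERT INTO {0} ({1})\n".format(table, ",".join(columns))]
    for i, row in enumerate(rows):
        select = "SELECT " + ",".join(repr(v) for v in row) + " FROM DUAL"
        # Every row but the last is chained with UNION ALL.
        suffix = "\n" if i == len(rows) - 1 else " UNION ALL\n"
        lines.append(select + suffix)
    return "".join(lines)

print(oracle_bulk_insert("emp", ["id", "name"], [(1, "a"), (2, "b")]))
```

This yields a single INSERT statement whose two source rows are joined with one UNION ALL.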


# Supported by [MySQL, Postgresql, sqlite, SQL server (non-Azure) ]
def dump_to_csv(fp, table_name, columns, raw_rows, dialect):
    lines = []
    separator = ","
    # Determine the separator based on Target DB 
    if dialect.name.lower() in ["sqlite"]:
        separator = "|"
    elif dialect.name.lower() in ["mssql"]:
        separator = "|,"
        
    num_cols = len(raw_rows[0])
    num_rows = len(raw_rows)
    out = StringIO()
    for i in range(0, num_rows):
        for j in range(0, num_cols - 1):
            out.write(_generate_literal_value_for_csv(raw_rows[i][j], dialect))
            out.write(separator)
        # Print the last column w/o the separator
        out.write(_generate_literal_value_for_csv(raw_rows[i][num_cols - 1], dialect) + "\n")
    out.seek(0)
    fp.write(out.getvalue())
            

def generate_literal_value(value, dialect, type_):
    """Render the value of a bind parameter as a quoted literal.

    This is used for statement sections that do not accept bind parameters
    on the target driver/database.

    This should be implemented by subclasses using the quoting services
    of the DBAPI.

    """
    return _generate_literal_value(value, dialect)


def dump_to_sql_statement(statement, fp, bind=None, table_name=None):
    """
    print a query, with values filled in
    for debugging purposes *only*
    for security, you should always separate queries from their values
    please also note that this function is quite slow
    """
    import sqlalchemy.orm
    if isinstance(statement, sqlalchemy.orm.Query):
        if bind is None:
            bind = statement.session.get_bind(
                    statement._mapper_zero_or_none()
            )
        statement = statement.statement
    elif bind is None:
        bind = statement.bind

    dialect = bind.dialect
    compiler = statement._compiler(dialect)

    class LiteralCompiler(compiler.__class__):
        def visit_bindparam(
                self, bindparam, within_columns_clause=False,
                literal_binds=False, **kwargs
        ):
            return super(LiteralCompiler, self).render_literal_bindparam(
                    bindparam, within_columns_clause=within_columns_clause,
                    literal_binds=literal_binds, **kwargs
            )

        def render_literal_value(self, value, type_):
            return generate_literal_value(value, dialect, type_)

    compiler = LiteralCompiler(dialect, statement)

    stmt = compiler.process(statement) + ";\n"
    if dialect.name.lower() == "mssql":
        stmt = "SET IDENTITY_INSERT {0} ON ".format(table_name) + stmt

    fp.write(stmt)


================================================
FILE: etlalchemy/schema_transformer.py
================================================
import logging
import csv
import sqlalchemy


class SchemaTransformer():

    class TableTransformation():
        def __init__(self, stRow):
            self.delete = stRow['Delete'].lower() in ["true", "1"]
            self.old_table = stRow['Table Name']
            self.new_table = stRow['New Table Name']

        def __str__(self):
            return "({0} -> {1}...Delete = {2})".\
                format(self.old_table, self.new_table, str(self.delete))

    class ColumnTransformation():
        def __init__(self, stRow):
            self.delete = stRow['Delete'].lower() in ["true", "1"]
            self.old_table = stRow['Table Name']
            self.old_column = stRow['Column Name']
            self.new_column = stRow['New Column Name']
            self.new_type = stRow['New Column Type']

        def _new_type(self):
            return getattr(sqlalchemy.types, self.new_type)

        def __str__(self):
            return self.old_table + "." + self.old_column

    def schedule_deletion_of_column(self, col, table):
        st = self.ColumnTransformation({
            'Delete': "true",
            'Table Name': table,
            'Column Name': col,
            'New Column Name': '',
            'New Column Type': ''
        })
        self.logger.info("Scheduling '{0}' to be deleted due to column being empty".format(col))
        if not self.column_transformations.get(st.old_table):
            # No column transformations exist for the table
            self.column_transformations[st.old_table] = {}
            self.column_transformations[st.old_table][st.old_column] = st
        elif self.column_transformations[st.old_table].get(st.old_column):
            # A transformation already exists on this column;
            # replace it with the deletion
            self.column_transformations[st.old_table][st.old_column] = st
        else:
            # Transformations exist on the table, but none on this column
            self.column_transformations[st.old_table][st.old_column] = st

    def __init__(self, column_transform_file,
                 table_transform_file, global_renamed_col_suffixes={}):
        self.logger = logging.getLogger("schema-transformer")
        handler = logging.StreamHandler()
        formatter = logging.Formatter('%(name)s (%(levelname)s) - %(message)s')
        handler.setFormatter(formatter)
        self.logger.addHandler(handler)
        self.logger.setLevel(logging.INFO)
        self.column_transformations = {}
        self.table_transformations = {}
        self.failed_transformations = set([])
        self.logger.propagate = False
        self.global_renamed_col_suffixes = global_renamed_col_suffixes
        # Load column mappings
        if column_transform_file:
            with open(column_transform_file, "rU") as fp:
                dr = csv.DictReader(fp)
                for row in dr:
                    st = self.ColumnTransformation(row)
                    if not self.column_transformations.get(st.old_table):
                        self.column_transformations[st.old_table] = {}
                    self.column_transformations[st.old_table][st.old_column] = st
        # Load table mappings
        if table_transform_file:
            with open(table_transform_file, "rU") as fp:
                dr = csv.DictReader(fp)
                for row in dr:
                    st = self.TableTransformation(row)
                    self.table_transformations[st.old_table] = st
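Both mapping files are plain CSVs consumed with csv.DictReader and bucketed by table name, then by original column name. A standalone Python 3 sketch of the column-file loading, using io.StringIO in place of a file (the header names match the ones the class expects; keeping raw dicts instead of ColumnTransformation objects for brevity):

```python
import csv
import io

csv_text = (
    "Column Name,Table Name,New Column Name,New Column Type,Delete\n"
    "birth_date,employees,dob,,False\n"
    "middle_name,employees,,,True\n"
)

# Bucket each row by table, then by original column name,
# mirroring SchemaTransformer.column_transformations.
transforms = {}
for row in csv.DictReader(io.StringIO(csv_text)):
    transforms.setdefault(row['Table Name'], {})[row['Column Name']] = row

print(transforms['employees']['birth_date']['New Column Name'])  # dob
```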
    
    # Returns False if the table was deleted, True otherwise
    def transform_table(self, table):
        thisTableTT = self.table_transformations.get(table.name.lower())
        # Update table name
        if thisTableTT:
            if thisTableTT.delete:
                return False
            if thisTableTT.new_table not in ["", None]:
                self.logger.info(
                    " ----> Renaming table '{0}' to '{1}'"
                    .format(table.name, thisTableTT.new_table))
                table.name = thisTableTT.new_table
                return True
        return True
    # Returns the (possibly modified) list of column names

    def transform_column(self, C, tablename, columns):
        # Find Column...
        this_table_st = self.column_transformations.get(tablename)
        initial_column_name = C.name
        action_applied = False
        idx = columns.index(C.name)

        if this_table_st:
            st = this_table_st.get(C.name)
            if st:
                if st.delete:
                    # Remove the column from the list of columns...
                    del columns[idx]
                    action_applied = True
                else:
                    # Rename the column if a "New Column Name" is specified
                    if st.new_column not in ["", None]:
                        self.logger.info(
                            " ----> Renaming column '{0}' => '{1}'"
                            .format(C.name, st.new_column))
                        C.name = st.new_column
                        columns[idx] = C.name
                        action_applied = True
                    # Change the type of the column if a
                    # "New Column Type" is specified
                    if st.new_type not in ["", None]:
                        old_type = C.type.__class__.__name__
                        try:
                            C.type = st._new_type()
                        except Exception as e:
                            self.logger.critical(
                                "** Couldn't change column type of " +
                                "'{0}' to '{1}'**".
                                format(C.name, st.new_type))
                            self.logger.critical(e)
                            raise e
                    else:
                        self.logger.warning(
                            "Schema transformation defined for " +
                            "column '{0}', but no action was " +
                            "taken...".format(C.name))

        if not action_applied:
            # Then the column had no 'action' applied to it...
            for k in self.global_renamed_col_suffixes.keys():
                # Check if column name ends with a specific suffix
                if initial_column_name.lower().endswith(k.lower()):
                    self.logger.info(
                        " ---> Renaming column '{0}' to GLOBAL "
                        "default '{1}' because it contains '{2}'"
                        .format(initial_column_name.lower(),
                                initial_column_name.replace(
                                    k, self.global_renamed_col_suffixes[k]),
                                k.lower()))
                    C.name = initial_column_name.replace(
                            k, self.global_renamed_col_suffixes[k])
                    columns[idx] = C.name
        return columns
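When no explicit transformation matched, the fallback above renames any column whose name ends with a globally registered suffix, using str.replace as the original does. A minimal Python 3 sketch of that check (the function name and suffix map are illustrative):

```python
def apply_suffix_rename(column_name, suffix_map):
    """Rename column_name if it ends with a registered suffix,
    as transform_column's global fallback does. Note that, like
    the original, replace() substitutes every occurrence of the
    suffix string, not just the trailing one."""
    for suffix, replacement in suffix_map.items():
        if column_name.lower().endswith(suffix.lower()):
            return column_name.replace(suffix, replacement)
    return column_name

print(apply_suffix_rename("start_dt", {"_dt": "_date"}))  # start_date
```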

    def transform_rows(self, rows, columns, tablename):
        this_table_st = self.column_transformations.get(tablename)
        bool_dict = {
                'Y': True,
                'N': False,
                1: True,
                0: False,
                '1': True,
                '0': False,
                'y': True,
                'n': False,
        }
        if this_table_st is None:
            return
        column_transformers = []
        for c in columns:
            if this_table_st.get(c):
                column_transformers.append(this_table_st.get(c))
            else:
                column_transformers.append(None)
        number_columns = len(columns)
        for r in rows:
            for i in range(number_columns - 1, -1, -1):
                column_transformer = column_transformers[i]
                if column_transformer:
                    # Then there is a transformation defined for this column...
                    if column_transformer.delete:
                        del r[i]
                    elif column_transformer.new_type in [None, ""]:
                        continue
                    # Handle type conversions here...
                    elif column_transformer.new_type == "Integer":
                        r[i] = int(r[i])
                    elif column_transformer.new_type in ["String", "Text"]:
                        r[i] = str(r[i])
                    elif column_transformer.new_type in ["Float", "Decimal"]:
                        r[i] = float(r[i])
                    elif column_transformer.new_type == "Boolean":
                        r[i] = bool_dict[r[i]]


================================================
FILE: requirements.txt
================================================
# These are the python libraries for all SQL drivers.
# You must have the drivers installed in order to install these!
# (They are commented out for a reason, uncomment them once drivers are installed)

#cx-Oracle==5.2.1
#MySQL-python==1.2.5
#psycopg2==2.6.1
#pyodbc==3.0.10

six>=1.9.0
SQLAlchemy>=1.2.1,<1.3
sqlalchemy-migrate>=0.9.7
SQLAlchemy-Utils>=0.32.0


================================================
FILE: setup.cfg
================================================
[metadata]
description-file = README.md


================================================
FILE: setup.py
================================================
import sys

from setuptools import setup
from setuptools.command.test import test as TestCommand

class PyTest(TestCommand):
    user_options = [('pytest-args=', 'a', "Arguments to pass into pytest")]

    def initialize_options(self):
        TestCommand.initialize_options(self)
        self.pytest_args = ""

    def run_tests(self):
        import pytest
        import shlex

        errno = pytest.main(shlex.split(self.pytest_args))
        sys.exit(errno)

setup(
        name = 'etlalchemy',
        packages = ['etlalchemy'],
        version = '1.0.6',
        description = 'Extract, Transform, Load. Migrate any SQL Database in 4 lines of code',
        author = 'Sean Harrington',
        author_email='seanharr11@gmail.com',
        url='https://github.com/seanharr11/etlalchemy',
        download_url='https://github.com/seanharr11/etlalchemy/tarball/1.0.6',
        keywords=['sql','migration','etl','database'],
        install_requires = [
            "six>=1.9.0",
            "SQLAlchemy>=1.2.1,<1.3",
            "sqlalchemy-migrate>=0.9.7",
            "SQLAlchemy-Utils>=0.32.0"
        ],
        classifiers=[],
        cmdclass={'test': PyTest},
        tests_require = ["pytest"],
)


================================================
FILE: tests/test_transformer.py
================================================
from etlalchemy.schema_transformer import SchemaTransformer

col_hdrs = ['Column Name','Table Name',
            'New Column Name','New Column Type','Delete']
col_sample_data = [
    col_hdrs,
    ['middle_name','employees','','','True'],
    ['fired','employees','','Boolean','False'],
    ['birth_date','employees','dob','',''],
    ['salary','jobs','payrate','','False'],
        ]
def setup_column_transform_file(tmpdir, data=[]):
    f = tmpdir.join("sample_column_mappings.csv")
    file_data = []
    for row in data:
        file_data.append(','.join(row))
    file_data_str = '\n'.join(file_data)
    f.write(file_data_str)
    # f.write_text?
    assert f.read() == file_data_str
    return str(f) # filename

tbl_hdrs = ['Table Name','New Table Name','Delete']
tbl_sample_data = [
    tbl_hdrs,
    ['table_to_rename','new_table_name','False'],
    ['table_to_delete','','True'],
    ['departments','dept','False'],
        ]
def setup_table_transform_file(tmpdir, data=[]):
    f = tmpdir.join("sample_table_mappings.csv")
    file_data = []
    for row in data:
        file_data.append(','.join(row))
    file_data_str = '\n'.join(file_data)
    f.write(file_data_str)
    # f.write_text?
    assert f.read() == file_data_str
    return str(f) # filename

def get_unique_tables(data):
    """Returns unique tables in data using 'Table Name' column in header row"""
    hdrs = data[0]
    tbl_idx = None
    for idx, hdr in enumerate(hdrs):
        if hdr == "Table Name":
            tbl_idx = idx
            break
    assert tbl_idx is not None
    return set(row[tbl_idx] for row in data[1:])

def mock_dictreader(headers, data):
    """Simulate the behavior of csv dictreader so we don't need files"""
    return dict(zip(headers, data))


def test_init_args_empty():
    trans = SchemaTransformer(column_transform_file=None, table_transform_file=None)
    assert trans is not None
    assert trans.global_renamed_col_suffixes == {}

def test_init_global_only():
    test_col_suffixes = {'org': 'chg'}
    trans = SchemaTransformer(column_transform_file=None,
            table_transform_file=None,
            global_renamed_col_suffixes=test_col_suffixes)
    assert trans is not None
    assert trans.global_renamed_col_suffixes == test_col_suffixes

def test_column_transformation_delete():
    """Test the allowed values for delete in column transformation file"""
    test_cases = {
        # Delete Value: the expected result
        'True': True, # The first 3 are the only ones true based on code
        'true': True,
        '1': True,
        'Y': False,   # ! should this be true?
        'yes': False, # ! should this be true?
        'delete': False, # ! should this be true?
        '': False,
        '0': False,
        'False': False,
        'false': False,
        'unknown': False,
    }
    row = mock_dictreader(col_hdrs, ['middle_name','employees','','','True'])
    for k in test_cases:
        row['Delete'] = k
        c = SchemaTransformer.ColumnTransformation(row)
        assert c
        assert c.old_column == 'middle_name'
        assert c.old_table == 'employees'
        assert c.new_column == ''
        assert c.new_type == ''
        assert c.delete == test_cases[k]

def test_column_transformation_rename():
    row = mock_dictreader(col_hdrs, ['birth_date','employees','dob','',''])
    c = SchemaTransformer.ColumnTransformation(row)
    assert c
    assert c.old_column == 'birth_date'
    assert c.old_table == 'employees'
    assert c.new_column == 'dob' # <=== The actual test
    assert c.new_type == ''
    assert c.delete == False

    row['New Column Name'] = '' # Make sure not renaming also works
    c = SchemaTransformer.ColumnTransformation(row)
    assert c
    assert c.old_column == 'birth_date'
    assert c.old_table == 'employees'
    assert c.new_column == '' # <==== Should be blank
    assert c.new_type == ''
    assert c.delete == False

def test_column_transformation_tables():
    row = mock_dictreader(col_hdrs, ['fired','employees','','Boolean','False'])
    c = SchemaTransformer.ColumnTransformation(row)
    assert c
    assert c.old_table == 'employees'
    assert str(c) == 'employees.fired'
    row = mock_dictreader(col_hdrs, ['salary','jobs','payrate','','False'])
    c = SchemaTransformer.ColumnTransformation(row)
    assert c
    assert c.old_table == 'jobs'
    assert str(c) == 'jobs.salary'

def test_column_transformation_type():
    row = mock_dictreader(col_hdrs, ['fired','employees','','Boolean','False'])
    c = SchemaTransformer.ColumnTransformation(row)
    assert c
    assert c.new_type == 'Boolean'

def test_table_transformation_rename():
    row = mock_dictreader(tbl_hdrs, ['departments','dept','False'])
    t = SchemaTransformer.TableTransformation(row)
    assert t.old_table == 'departments'
    assert t.new_table == 'dept'
    assert t.delete == False

def test_table_transformation_delete():
    """Test the allowed values for delete in table transformation file"""
    test_cases = {
        # Delete Value: the expected result
        'True': True, # The first 3 are the only ones true based on code
        'true': True,
        '1': True,
        'Y': False,   # ! should this be true?
        'yes': False, # ! should this be true?
        'delete': False, # ! should this be true?
        '': False,
        '0': False,
        'False': False,
        'false': False,
        'unknown': False,
    }
    row = mock_dictreader(tbl_hdrs, ['table_to_delete','new_name','True'])
    for k in test_cases:
        row['Delete'] = k
        t = SchemaTransformer.TableTransformation(row)
        assert t
        assert t.old_table == 'table_to_delete'
        assert t.new_table == 'new_name' # ! should this be removed?
        assert t.delete == test_cases[k]

def test_needsfiles(tmpdir):
    """Make sure we can create, save and remove temporary files"""
    f = tmpdir.join("testfile.txt")
    f.write("can write")
    assert len(tmpdir.listdir()) == 1
    assert f.read() == "can write"
    f.remove()
    assert len(tmpdir.listdir()) == 0

def test_init_column_transform_file_empty(tmpdir):
    col_map = setup_column_transform_file(tmpdir)
    trans = SchemaTransformer(column_transform_file=col_map,
            table_transform_file=None)
    assert trans is not None
    assert len(trans.column_transformations) == 0

def test_init_column_transform_file(tmpdir):
    col_map = setup_column_transform_file(tmpdir, data=col_sample_data)
    unique_tables = get_unique_tables(col_sample_data)
    trans = SchemaTransformer(column_transform_file=col_map,
            table_transform_file=None)
    assert trans is not None
    assert len(trans.table_transformations) == 0
    assert len(trans.column_transformations) > 0
    assert len(trans.column_transformations) == len(unique_tables)
    # Make sure the expected tables are in the list of transformations
    assert set(unique_tables) == set(trans.column_transformations.keys())

def test_init_table_transform_file(tmpdir):
    tbl_map = setup_table_transform_file(tmpdir, data=tbl_sample_data)
    unique_tables = get_unique_tables(tbl_sample_data)
    trans = SchemaTransformer(column_transform_file=None,
            table_transform_file=tbl_map)
    assert trans is not None
    assert len(trans.column_transformations) == 0
    assert len(trans.table_transformations) == len(unique_tables)
    # Make sure the expected tables are in the list of transformations
    assert set(unique_tables) == set(trans.table_transformations.keys())

def test_schedule_deletion_of_column(tmpdir):
    col_map = setup_column_transform_file(tmpdir, data=col_sample_data)
    trans = SchemaTransformer(column_transform_file=col_map,
            table_transform_file=None)
    unique_tables = get_unique_tables(col_sample_data)
    total_tables = len(unique_tables)

    ### Remove a column in new table (compared to sample data)
    assert trans.column_transformations.get('dept') is None
    trans.schedule_deletion_of_column('manager','dept')
    assert trans.column_transformations.get('dept') is not None
    assert trans.column_transformations['dept'].get('manager') is not None
    assert trans.column_transformations['dept'].get('manager').delete
    # Confirm list has been added to
    total_tables += 1
    assert len(trans.column_transformations) == total_tables
    # Dept was added, so make sure the list of tables differs
    assert set(unique_tables) != set(trans.column_transformations.keys())

    ### Remove a column known in a different table
    trans.schedule_deletion_of_column('birth_date', 'bosses')
    # Make sure it didn't change employees.birth_date
    assert trans.column_transformations['employees'].get('birth_date').delete is False
    assert trans.column_transformations['bosses'].get('birth_date').delete
    total_tables += 1
    assert len(trans.column_transformations) == total_tables

    ### Remove a known column in known table (in sample data)
    # birth_date already has a rename transformation (to dob), but isn't flagged for deletion
    assert trans.column_transformations['employees'].get('birth_date') is not None
    assert trans.column_transformations['employees'].get('birth_date').delete is False
    num_cols = len(trans.column_transformations['employees'])
    trans.schedule_deletion_of_column('birth_date','employees')
    # Confirm it changed to deleting it
    assert trans.column_transformations['employees'].get('birth_date').delete
    # make sure it did not change the list of employees transformations
    assert len(trans.column_transformations['employees']) == num_cols

    ### Remove a new column in known table (in sample data)
    num_cols = len(trans.column_transformations['employees'])
    trans.schedule_deletion_of_column('title','employees')
    assert trans.column_transformations['employees'].get('title').delete
    # make sure it added to the employees transformations
    assert len(trans.column_transformations['employees']) == num_cols + 1
    # make sure it didn't change the number of tables
    assert len(trans.column_transformations) == total_tables
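The assertions above imply that `schedule_deletion_of_column` maintains a nested dict keyed by table name, then column name, creating entries on demand and marking them deleted. A simplified standalone model of that behaviour (class and attribute names here are assumptions for illustration; the real implementation is `SchemaTransformer.schedule_deletion_of_column`):

```python
class FakeColumnTransformation:
    """Stand-in for SchemaTransformer.ColumnTransformation."""
    def __init__(self, delete=False):
        self.delete = delete

class FakeTransformer:
    """Models only the deletion-scheduling behaviour the test exercises."""
    def __init__(self):
        self.column_transformations = {}

    def schedule_deletion_of_column(self, col, table):
        # Create the per-table dict on demand, then mark the column deleted.
        tbl = self.column_transformations.setdefault(table, {})
        entry = tbl.setdefault(col, FakeColumnTransformation())
        entry.delete = True

t = FakeTransformer()
t.schedule_deletion_of_column('manager', 'dept')
assert t.column_transformations['dept']['manager'].delete
```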

def test_transform_table(tmpdir):
    # TODO implement tests
    # I think it would be preferable for transform_table to return the
    # altered SQLAlchemy Table object instead of having the strange side
    # effect of renaming it in place. It could return None for deleted tables.
    assert 0, "TODO: implement test_transform_table"
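The comment above proposes a return-value API instead of in-place mutation. A hedged sketch of that pattern on plain strings (purely illustrative; `transform_name` is a made-up helper, not part of SchemaTransformer):

```python
def transform_name(name, renames, deletes):
    """Illustrative return-value style: yield the (possibly renamed)
    name, or None when the table is scheduled for deletion."""
    if name in deletes:
        return None
    return renames.get(name, name)

assert transform_name('departments', {'departments': 'dept'}, set()) == 'dept'
assert transform_name('old_table', {}, {'old_table'}) is None
```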

def test_transform_column(tmpdir):
    # TODO implement tests
    assert 0, "TODO: implement test_transform_column"

def test_transform_rows(tmpdir):
    # TODO implement tests
    assert 0, "TODO: implement test_transform_rows"
SYMBOL INDEX (64 symbols across 7 files)

FILE: etlalchemy/ETLAlchemySource.py
  class ETLAlchemySource (line 40) | class ETLAlchemySource():
    method __init__ (line 42) | def __init__(self,
    method get_nearest_power_of_two (line 145) | def get_nearest_power_of_two(self, num):
    method standardize_column_type (line 162) | def standardize_column_type(self, column, raw_rows):
    method add_or_eliminate_column (line 468) | def add_or_eliminate_column(
    method transform_table (line 525) | def transform_table(self, T):
    method check_multiple_autoincrement_issue (line 546) | def check_multiple_autoincrement_issue(self, auto_inc_count, pk_count,...
    method transform_data (line 570) | def transform_data(self, T_src, raw_rows):
    method create_table (line 585) | def create_table(self, T_dst_exists, T):
    method send_data (line 605) | def send_data(self, table, columns):
    method dump_data (line 769) | def dump_data(self, T_dst_exists, T, raw_rows, pks, sessionMaker):
    method migrate (line 875) | def migrate(
    method add_indexes (line 1184) | def add_indexes(self, destination_database_url):
    method add_fks (line 1339) | def add_fks(self, destination_database_url):
    method print_timings (line 1636) | def print_timings(self):

FILE: etlalchemy/ETLAlchemyTarget.py
  class ETLAlchemyTarget (line 8) | class ETLAlchemyTarget():
    method __init__ (line 9) | def __init__(self, conn_string, drop_database=False):
    method addSource (line 28) | def addSource(self, source):
    method migrate (line 34) | def migrate(self, migrate_schema=True, migrate_data=True,

FILE: etlalchemy/etlalchemy_exceptions.py
  class DBApiNotFound (line 1) | class DBApiNotFound(Exception):
    method __init__ (line 2) | def __init__(self, conn_string):
    method __str__ (line 33) | def __str__(self):

FILE: etlalchemy/literal_value_generator.py
  function _generate_literal_value_for_csv (line 10) | def _generate_literal_value_for_csv(value, dialect):
  function _generate_literal_value (line 81) | def _generate_literal_value(value, dialect):
  function dump_to_oracle_insert_statements (line 139) | def dump_to_oracle_insert_statements(fp, engine, table, raw_rows, columns):
  function dump_to_csv (line 166) | def dump_to_csv(fp, table_name, columns, raw_rows, dialect):
  function generate_literal_value (line 188) | def generate_literal_value(value, dialect, type_):
  function dump_to_sql_statement (line 201) | def dump_to_sql_statement(statement, fp, bind=None, table_name=None):

FILE: etlalchemy/schema_transformer.py
  class SchemaTransformer (line 6) | class SchemaTransformer():
    class TableTransformation (line 8) | class TableTransformation():
      method __init__ (line 9) | def __init__(self, stRow):
      method __str__ (line 14) | def __str__(self):
    class ColumnTransformation (line 18) | class ColumnTransformation():
      method __init__ (line 19) | def __init__(self, stRow):
      method _new_type (line 26) | def _new_type(self):
      method __str__ (line 29) | def __str__(self):
    method schedule_deletion_of_column (line 31) | def schedule_deletion_of_column(self, col, table):
    method __init__ (line 52) | def __init__(self, column_transform_file,
    method transform_table (line 83) | def transform_table(self, table):
    method transform_column (line 98) | def transform_column(self, C, tablename, columns):
    method transform_rows (line 157) | def transform_rows(self, rows, columns, tablename):

FILE: setup.py
  class PyTest (line 6) | class PyTest(TestCommand):
    method initialize_options (line 9) | def initialize_options(self):
    method run_tests (line 13) | def run_tests(self):

FILE: tests/test_transformer.py
  function setup_column_transform_file (line 12) | def setup_column_transform_file(tmpdir, data=[]):
  function setup_table_transform_file (line 30) | def setup_table_transform_file(tmpdir, data=[]):
  function get_unique_tables (line 41) | def get_unique_tables(data):
  function mock_dictreader (line 52) | def mock_dictreader(headers, data):
  function test_init_args_empty (line 57) | def test_init_args_empty():
  function test_init_global_only (line 62) | def test_init_global_only():
  function test_column_transformation_delete (line 70) | def test_column_transformation_delete():
  function test_column_transformation_rename (line 97) | def test_column_transformation_rename():
  function test_column_transformation_tables (line 116) | def test_column_transformation_tables():
  function test_column_transformation_type (line 128) | def test_column_transformation_type():
  function test_table_transformation_rename (line 134) | def test_table_transformation_rename():
  function test_table_transformation_delete (line 141) | def test_table_transformation_delete():
  function test_needsfiles (line 166) | def test_needsfiles(tmpdir):
  function test_init_column_transform_file_empty (line 175) | def test_init_column_transform_file_empty(tmpdir):
  function test_init_column_transform_file (line 182) | def test_init_column_transform_file(tmpdir):
  function test_init_table_transform_file (line 194) | def test_init_table_transform_file(tmpdir):
  function test_schedule_deletion_of_column (line 205) | def test_schedule_deletion_of_column(tmpdir):
  function test_transform_table (line 252) | def test_transform_table(tmpdir):
  function test_transform_column (line 259) | def test_transform_column(tmpdir):
  function test_transform_rows (line 263) | def test_transform_rows(tmpdir):
