Repository: audienceproject/spark-dynamodb
Branch: master
Commit: 816c6e6d8a25
Files: 48
Total size: 194.6 KB

Directory structure:
gitextract_cc5q4yoa/

├── .editorconfig
├── .gitignore
├── LICENSE
├── README.md
├── build.sbt
├── project/
│   ├── build.properties
│   └── plugins.sbt
├── src/
│   ├── main/
│   │   ├── java/
│   │   │   └── com/
│   │   │       └── audienceproject/
│   │   │           └── shaded/
│   │   │               └── google/
│   │   │                   └── common/
│   │   │                       ├── base/
│   │   │                       │   ├── Preconditions.java
│   │   │                       │   └── Ticker.java
│   │   │                       └── util/
│   │   │                           └── concurrent/
│   │   │                               ├── RateLimiter.java
│   │   │                               └── Uninterruptibles.java
│   │   ├── resources/
│   │   │   └── META-INF/
│   │   │       └── services/
│   │   │           └── org.apache.spark.sql.sources.DataSourceRegister
│   │   └── scala/
│   │       └── com/
│   │           └── audienceproject/
│   │               └── spark/
│   │                   └── dynamodb/
│   │                       ├── attribute.scala
│   │                       ├── catalyst/
│   │                       │   └── JavaConverter.scala
│   │                       ├── connector/
│   │                       │   ├── ColumnSchema.scala
│   │                       │   ├── DynamoConnector.scala
│   │                       │   ├── DynamoWritable.scala
│   │                       │   ├── FilterPushdown.scala
│   │                       │   ├── KeySchema.scala
│   │                       │   ├── TableConnector.scala
│   │                       │   └── TableIndexConnector.scala
│   │                       ├── datasource/
│   │                       │   ├── DefaultSource.scala
│   │                       │   ├── DynamoBatchReader.scala
│   │                       │   ├── DynamoDataDeleteWriter.scala
│   │                       │   ├── DynamoDataUpdateWriter.scala
│   │                       │   ├── DynamoDataWriter.scala
│   │                       │   ├── DynamoReaderFactory.scala
│   │                       │   ├── DynamoScanBuilder.scala
│   │                       │   ├── DynamoTable.scala
│   │                       │   ├── DynamoWriteBuilder.scala
│   │                       │   ├── DynamoWriterFactory.scala
│   │                       │   ├── OutputPartitioning.scala
│   │                       │   ├── ScanPartition.scala
│   │                       │   └── TypeConversion.scala
│   │                       ├── implicits.scala
│   │                       └── reflect/
│   │                           └── SchemaAnalysis.scala
│   └── test/
│       ├── resources/
│       │   └── log4j2.xml
│       └── scala/
│           └── com/
│               └── audienceproject/
│                   └── spark/
│                       └── dynamodb/
│                           ├── AbstractInMemoryTest.scala
│                           ├── DefaultSourceTest.scala
│                           ├── FilterPushdownTest.scala
│                           ├── NestedDataStructuresTest.scala
│                           ├── NullBooleanTest.scala
│                           ├── NullValuesTest.scala
│                           ├── RegionTest.scala
│                           ├── WriteRelationTest.scala
│                           └── structs/
│                               ├── TestFruit.scala
│                               └── TestFruitWithProperties.scala
└── wercker.yml

================================================
FILE CONTENTS
================================================

================================================
FILE: .editorconfig
================================================
root = true
[*]
end_of_line = lf
insert_final_newline = true
charset = utf-8
indent_style = space
indent_size = 4


================================================
FILE: .gitignore
================================================
/target/
/bin/
*.class
*.log
.classpath
.idea
.wercker
project/target
project/project
lib_managed*/


================================================
FILE: LICENSE
================================================
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


================================================
FILE: README.md
================================================
# Spark+DynamoDB
Plug-and-play implementation of an Apache Spark custom data source for AWS DynamoDB.

We published a small article about the project, check it out here:
https://www.audienceproject.com/blog/tech/sparkdynamodb-using-aws-dynamodb-data-source-apache-spark/

## News

* 2021-01-28: Added option `inferSchema=false` which is useful when writing to a table with many columns
* 2020-07-23: Releasing version 1.1.0 which supports Spark 3.0.0 and Scala 2.12. Future releases will no longer be compatible with Scala 2.11 and Spark 2.x.x.
* 2020-04-28: Releasing version 1.0.4. Includes support for assuming AWS roles through custom STS endpoint (credits @jhulten).
* 2020-04-09: We are releasing version 1.0.3 of the Spark+DynamoDB connector. Added option to `delete` records (thank you @rhelmstetter). Fixes (thank you @juanyunism for #46).
* 2019-11-25: We are releasing version 1.0.0 of the Spark+DynamoDB connector, which is based on the Spark Data Source V2 API. Out-of-the-box throughput calculations, parallelism and partition planning should now be more reliable. We have also pulled out the external dependency on Guava, which was causing a lot of compatibility issues.

## Features

- Distributed, parallel scan with lazy evaluation
- Throughput control by rate limiting on target fraction of provisioned table/index capacity
- Schema discovery to suit your needs
  - Dynamic inference
  - Static analysis of case class
- Column and filter pushdown
- Global secondary index support
- Write support

## Getting The Dependency

The library is available from [Maven Central](https://mvnrepository.com/artifact/com.audienceproject/spark-dynamodb). Add the dependency in SBT as ```"com.audienceproject" %% "spark-dynamodb" % "latest"```

Spark is used in the library as a "provided" dependency, which means Spark has to be installed separately in the environment where the application runs, as is the case on AWS EMR.
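
For example, in `build.sbt`, pinning an explicit version rather than `latest` (1.1.3 is the version declared in this repository's own `build.sbt`):

```scala
libraryDependencies += "com.audienceproject" %% "spark-dynamodb" % "1.1.3"

// Spark must be provided by the runtime (e.g. EMR) rather than bundled:
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.0.0" % "provided"
```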

## Quick Start Guide

### Scala
```scala
import com.audienceproject.spark.dynamodb.implicits._
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Load a DataFrame from a Dynamo table. Only incurs the cost of a single scan for schema inference.
val dynamoDf = spark.read.dynamodb("SomeTableName") // <-- DataFrame of Row objects with inferred schema.

// Scan the table for the first 100 items (the order is arbitrary) and print them.
dynamoDf.show(100)

// Write to some other table, overwriting existing items with the same keys.
dynamoDf.write.dynamodb("SomeOtherTable")

// Case class representing the items in our table.
import com.audienceproject.spark.dynamodb.attribute
case class Vegetable (name: String, color: String, @attribute("weight_kg") weightKg: Double)

// Load a Dataset[Vegetable]. Notice the @attribute annotation on the case class - we imagine the weight attribute is named with an underscore in DynamoDB.
import org.apache.spark.sql.functions._
import spark.implicits._
val vegetableDs = spark.read.dynamodbAs[Vegetable]("VegeTable")
val avgWeightByColor = vegetableDs.groupBy($"color").agg(avg($"weightKg")) // The column is called 'weightKg' in the Dataset.
```

### Python
```python
# Load a DataFrame from a Dynamo table. Only incurs the cost of a single scan for schema inference.
dynamoDf = spark.read.option("tableName", "SomeTableName") \
                     .format("dynamodb") \
                     .load() # <-- DataFrame of Row objects with inferred schema.

# Scan the table for the first 100 items (the order is arbitrary) and print them.
dynamoDf.show(100)

# Write to some other table, overwriting existing items with the same keys.
dynamoDf.write.option("tableName", "SomeOtherTable") \
              .format("dynamodb") \
              .save()
```

*Note:* When running from `pyspark` shell, you can add the library as:
```bash
pyspark --packages com.audienceproject:spark-dynamodb_<spark-scala-version>:<version>
```

## Parameters
The following parameters can be set as options on the Spark reader and writer objects before loading/saving.
- `region` sets the AWS region of the DynamoDB table. Default is environment specific.
- `roleArn` sets an IAM role to assume. This allows access to a DynamoDB table in a different account than the Spark cluster. Defaults to the standard role configuration.

The following parameters can be set as options on the Spark reader object before loading.

- `readPartitions` number of partitions to split the initial RDD into when loading the data into Spark. Defaults to the size of the DynamoDB table divided into chunks of `maxPartitionBytes`.
- `maxPartitionBytes` the maximum size of a single input partition. Default 128 MB.
- `defaultParallelism` the number of input partitions that can be read from DynamoDB simultaneously. Defaults to `sparkContext.defaultParallelism`.
- `targetCapacity` fraction of provisioned read capacity on the table (or index) to consume for reading. Default 1 (i.e. 100% capacity).
- `stronglyConsistentReads` whether or not to use strongly consistent reads. Default false.
- `bytesPerRCU` number of bytes that can be read per second with a single Read Capacity Unit. Default 4000 (4 KB). This value is multiplied by two when `stronglyConsistentReads=false`.
- `filterPushdown` whether or not to use filter pushdown to DynamoDB on scan requests. Default true.
- `throughput` the desired read throughput to use. Overrides any throughput calculation made by the package; intended for tables that are on-demand. Defaults to 100 for on-demand.
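
A minimal sketch of a read with some of these options set (the table name, region, and values are illustrative, not recommendations):

```scala
import com.audienceproject.spark.dynamodb.implicits._
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

val dynamoDf = spark.read
    .option("region", "eu-west-1")              // illustrative region
    .option("targetCapacity", "0.5")            // consume at most 50% of provisioned read capacity
    .option("readPartitions", "32")             // illustrative partition count
    .option("stronglyConsistentReads", "true")
    .dynamodb("SomeTableName")
```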

The following parameters can be set as options on the Spark writer object before saving.

- `writeBatchSize` number of items to send per call to DynamoDB BatchWriteItem. Default 25.
- `targetCapacity` fraction of provisioned write capacity on the table to consume for writing or updating. Default 1 (i.e. 100% capacity).
- `update` if true, items will be written using UpdateItem on keys rather than BatchWriteItem. Default false.
- `throughput` the desired write throughput to use. Overrides any throughput calculation made by the package; intended for tables that are on-demand. Defaults to 100 for on-demand.
- `inferSchema` if false, the schema will not be inferred automatically, which is useful when writing to a table with many columns.
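
A corresponding sketch for the writer (values again illustrative):

```scala
import com.audienceproject.spark.dynamodb.implicits._
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
val df = spark.read.dynamodb("SomeTableName")

df.write
    .option("writeBatchSize", "25")      // items per BatchWriteItem call
    .option("targetCapacity", "0.8")     // consume at most 80% of provisioned write capacity
    .option("update", "true")            // use UpdateItem on keys instead of BatchWriteItem
    .dynamodb("SomeOtherTable")
```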

## System Properties
The following Java system properties are available for configuration.

- `aws.profile` IAM profile to use for default credentials provider.
- `aws.dynamodb.region` region in which to access the AWS APIs.
- `aws.dynamodb.endpoint` endpoint to use for accessing the DynamoDB API.
- `aws.sts.endpoint` endpoint to use for accessing the STS API when assuming the role indicated by the `roleArn` parameter.
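
These are ordinary JVM system properties, so they can be passed as `-D` flags or set programmatically before the connector is first used. A minimal sketch (all values illustrative; the endpoint shown is the one this repository's tests use for DynamoDB Local):

```scala
// Set before the first call to the connector; values are illustrative.
System.setProperty("aws.profile", "my-profile")                      // hypothetical profile name
System.setProperty("aws.dynamodb.region", "eu-west-1")               // assumed region
System.setProperty("aws.dynamodb.endpoint", "http://localhost:8000") // e.g. DynamoDB Local
```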

## Acknowledgements
Usage of parallel scan and rate limiter inspired by work in https://github.com/traviscrawford/spark-dynamodb


================================================
FILE: build.sbt
================================================
organization := "com.audienceproject"

name := "spark-dynamodb"

version := "1.1.3"

description := "Plug-and-play implementation of an Apache Spark custom data source for AWS DynamoDB."

scalaVersion := "2.12.12"

compileOrder := CompileOrder.JavaThenScala

resolvers += "DynamoDBLocal" at "https://s3-us-west-2.amazonaws.com/dynamodb-local/release"

libraryDependencies += "com.amazonaws" % "aws-java-sdk-sts" % "1.11.678"
libraryDependencies += "com.amazonaws" % "aws-java-sdk-dynamodb" % "1.11.678"
libraryDependencies += "com.amazonaws" % "DynamoDBLocal" % "[1.11,2.0)" % "test" exclude("com.google.guava", "guava")

libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.0.0" % "provided"

libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.5" % "test"

libraryDependencies += "org.slf4j" % "slf4j-api" % "1.7.25"

libraryDependencies ++= {
    val log4j2Version = "2.11.1"
    Seq(
        "org.apache.logging.log4j" % "log4j-api" % log4j2Version % "test",
        "org.apache.logging.log4j" % "log4j-core" % log4j2Version % "test",
        "org.apache.logging.log4j" % "log4j-slf4j-impl" % log4j2Version % "test"
    )
}

libraryDependencies += "com.almworks.sqlite4java" % "sqlite4java" % "1.0.392" % "test"

retrieveManaged := true

fork in Test := true

val libManaged = "lib_managed"
val libManagedSqlite = s"${libManaged}_sqlite4java"

javaOptions in Test ++= Seq(s"-Djava.library.path=./$libManagedSqlite", "-Daws.dynamodb.endpoint=http://localhost:8000")

/**
  * Put all sqlite4java dependencies in [[libManagedSqlite]] for easy reference when configuring java.library.path.
  */
Test / resourceGenerators += Def.task {
    import java.nio.file.{Files, Path}
    import java.util.function.Predicate
    import java.util.stream.Collectors
    import scala.collection.JavaConverters._

    def log(msg: Any): Unit = println(s"[℣₳ℒ𐎅] $msg") //stand out in the crowd

    val theOnesWeLookFor = Set(
        "libsqlite4java-linux-amd64-1.0.392.so",
        "libsqlite4java-linux-i386-1.0.392.so ",
        "libsqlite4java-osx-1.0.392.dylib     ",
        "sqlite4java-1.0.392.jar              ",
        "sqlite4java-win32-x64-1.0.392.dll    ",
        "sqlite4java-win32-x86-1.0.392.dll    "
    ).map(_.trim)

    val isOneOfTheOnes = new Predicate[Path] {
        override def test(p: Path) = theOnesWeLookFor exists (p endsWith _)
    }

    val theOnesWeCouldFind: Set[Path] = Files
        .walk(new File(libManaged).toPath)
        .filter(isOneOfTheOnes)
        .collect(Collectors.toSet[Path])
        .asScala.toSet

    theOnesWeCouldFind foreach { path =>
        log(s"found: ${path.toFile.getName}")
    }

    assert(theOnesWeCouldFind.size == theOnesWeLookFor.size)

    val libManagedSqliteDir = new File(s"$libManagedSqlite")
    sbt.IO delete libManagedSqliteDir
    sbt.IO createDirectory libManagedSqliteDir
    log(libManagedSqliteDir.getAbsolutePath)

    theOnesWeCouldFind
        .map { path =>
            val source: File = path.toFile
            val target: File = libManagedSqliteDir / source.getName
            log(s"copying from $source to $target")
            sbt.IO.copyFile(source, target)
            target
        }
        .toSeq
}.taskValue

/**
  * Maven specific settings for publishing to Maven central.
  */
publishMavenStyle := true
publishArtifact in Test := false
pomIncludeRepository := { _ => false }
publishTo := {
    val nexus = "https://oss.sonatype.org/"
    if (isSnapshot.value) Some("snapshots" at nexus + "content/repositories/snapshots")
    else Some("releases" at nexus + "service/local/staging/deploy/maven2")
}
pomExtra := <url>https://github.com/audienceproject/spark-dynamodb</url>
    <licenses>
        <license>
            <name>Apache License, Version 2.0</name>
            <url>https://opensource.org/licenses/apache-2.0</url>
        </license>
    </licenses>
    <scm>
        <url>git@github.com:audienceproject/spark-dynamodb.git</url>
        <connection>scm:git:git://github.com/audienceproject/spark-dynamodb.git</connection>
        <developerConnection>scm:git:ssh://github.com:audienceproject/spark-dynamodb.git</developerConnection>
    </scm>
    <developers>
        <developer>
            <id>jacobfi</id>
            <name>Jacob Fischer</name>
            <email>jacob.fischer@audienceproject.com</email>
            <organization>AudienceProject</organization>
            <organizationUrl>https://www.audienceproject.com</organizationUrl>
        </developer>
        <developer>
            <id>johsbk</id>
            <name>Johs Kristoffersen</name>
            <email>johs.kristoffersen@audienceproject.com</email>
            <organization>AudienceProject</organization>
            <organizationUrl>https://www.audienceproject.com</organizationUrl>
        </developer>
    </developers>


================================================
FILE: project/build.properties
================================================
sbt.version = 1.2.6


================================================
FILE: project/plugins.sbt
================================================
logLevel := Level.Warn

addSbtPlugin("com.jsuereth" % "sbt-pgp" % "1.1.0")
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "5.2.4")
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.9.2")


================================================
FILE: src/main/java/com/audienceproject/shaded/google/common/base/Preconditions.java
================================================
package com.audienceproject.shaded.google.common.base;

/*
 * Notice:
 * This file was modified at AudienceProject ApS by Cosmin Catalin Sanda (cosmin@audienceproject.com)
 */

/*
 * Copyright (C) 2007 The Guava Authors
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.util.NoSuchElementException;

import javax.annotation.Nullable;

/**
 * Simple static methods to be called at the start of your own methods to verify
 * correct arguments and state. This allows constructs such as
 * <pre>
 *     if (count <= 0) {
 *       throw new IllegalArgumentException("must be positive: " + count);
 *     }</pre>
 *
 * to be replaced with the more compact
 * <pre>
 *     checkArgument(count > 0, "must be positive: %s", count);</pre>
 *
 * Note that the sense of the expression is inverted; with {@code Preconditions}
 * you declare what you expect to be <i>true</i>, just as you do with an
 * <a href="http://java.sun.com/j2se/1.5.0/docs/guide/language/assert.html">
 * {@code assert}</a> or a JUnit {@code assertTrue} call.
 *
 * <p><b>Warning:</b> only the {@code "%s"} specifier is recognized as a
 * placeholder in these messages, not the full range of {@link
 * String#format(String, Object[])} specifiers.
 *
 * <p>Take care not to confuse precondition checking with other similar types
 * of checks! Precondition exceptions -- including those provided here, but also
 * {@link IndexOutOfBoundsException}, {@link NoSuchElementException}, {@link
 * UnsupportedOperationException} and others -- are used to signal that the
 * <i>calling method</i> has made an error. This tells the caller that it should
 * not have invoked the method when it did, with the arguments it did, or
 * perhaps ever. Postcondition or other invariant failures should not throw
 * these types of exceptions.
 *
 * <p>See the Guava User Guide on <a href=
 * "http://code.google.com/p/guava-libraries/wiki/PreconditionsExplained">
 * using {@code Preconditions}</a>.
 *
 * @author Kevin Bourrillion
 * @since 2.0 (imported from Google Collections Library)
 */
public final class Preconditions {
    private Preconditions() {}

    /**
     * Ensures the truth of an expression involving one or more parameters to the
     * calling method.
     *
     * @param expression a boolean expression
     * @throws IllegalArgumentException if {@code expression} is false
     */
    public static void checkArgument(boolean expression) {
        if (!expression) {
            throw new IllegalArgumentException();
        }
    }

    /**
     * Ensures the truth of an expression involving one or more parameters to the
     * calling method.
     *
     * @param expression a boolean expression
     * @param errorMessage the exception message to use if the check fails; will
     *     be converted to a string using {@link String#valueOf(Object)}
     * @throws IllegalArgumentException if {@code expression} is false
     */
    public static void checkArgument(
        boolean expression, @Nullable Object errorMessage) {
        if (!expression) {
            throw new IllegalArgumentException(String.valueOf(errorMessage));
        }
    }

    /**
     * Ensures the truth of an expression involving one or more parameters to the
     * calling method.
     *
     * @param expression a boolean expression
     * @param errorMessageTemplate a template for the exception message should the
     *     check fail. The message is formed by replacing each {@code %s}
     *     placeholder in the template with an argument. These are matched by
     *     position - the first {@code %s} gets {@code errorMessageArgs[0]}, etc.
     *     Unmatched arguments will be appended to the formatted message in square
     *     braces. Unmatched placeholders will be left as-is.
     * @param errorMessageArgs the arguments to be substituted into the message
     *     template. Arguments are converted to strings using
     *     {@link String#valueOf(Object)}.
     * @throws IllegalArgumentException if {@code expression} is false
     * @throws NullPointerException if the check fails and either {@code
     *     errorMessageTemplate} or {@code errorMessageArgs} is null (don't let
     *     this happen)
     */
    public static void checkArgument(boolean expression,
                                     @Nullable String errorMessageTemplate,
                                     @Nullable Object... errorMessageArgs) {
        if (!expression) {
            throw new IllegalArgumentException(
                format(errorMessageTemplate, errorMessageArgs));
        }
    }

    /**
     * Ensures the truth of an expression involving the state of the calling
     * instance, but not involving any parameters to the calling method.
     *
     * @param expression a boolean expression
     * @throws IllegalStateException if {@code expression} is false
     */
    public static void checkState(boolean expression) {
        if (!expression) {
            throw new IllegalStateException();
        }
    }

    /**
     * Ensures the truth of an expression involving the state of the calling
     * instance, but not involving any parameters to the calling method.
     *
     * @param expression a boolean expression
     * @param errorMessage the exception message to use if the check fails; will
     *     be converted to a string using {@link String#valueOf(Object)}
     * @throws IllegalStateException if {@code expression} is false
     */
    public static void checkState(
        boolean expression, @Nullable Object errorMessage) {
        if (!expression) {
            throw new IllegalStateException(String.valueOf(errorMessage));
        }
    }

    /**
     * Ensures the truth of an expression involving the state of the calling
     * instance, but not involving any parameters to the calling method.
     *
     * @param expression a boolean expression
     * @param errorMessageTemplate a template for the exception message should the
     *     check fail. The message is formed by replacing each {@code %s}
     *     placeholder in the template with an argument. These are matched by
     *     position - the first {@code %s} gets {@code errorMessageArgs[0]}, etc.
     *     Unmatched arguments will be appended to the formatted message in square
     *     braces. Unmatched placeholders will be left as-is.
     * @param errorMessageArgs the arguments to be substituted into the message
     *     template. Arguments are converted to strings using
     *     {@link String#valueOf(Object)}.
     * @throws IllegalStateException if {@code expression} is false
     * @throws NullPointerException if the check fails and either {@code
     *     errorMessageTemplate} or {@code errorMessageArgs} is null (don't let
     *     this happen)
     */
    public static void checkState(boolean expression,
                                  @Nullable String errorMessageTemplate,
                                  @Nullable Object... errorMessageArgs) {
        if (!expression) {
            throw new IllegalStateException(
                format(errorMessageTemplate, errorMessageArgs));
        }
    }

    /**
     * Ensures that an object reference passed as a parameter to the calling
     * method is not null.
     *
     * @param reference an object reference
     * @return the non-null reference that was validated
     * @throws NullPointerException if {@code reference} is null
     */
    public static <T> T checkNotNull(T reference) {
        if (reference == null) {
            throw new NullPointerException();
        }
        return reference;
    }

    /**
     * Ensures that an object reference passed as a parameter to the calling
     * method is not null.
     *
     * @param reference an object reference
     * @param errorMessage the exception message to use if the check fails; will
     *     be converted to a string using {@link String#valueOf(Object)}
     * @return the non-null reference that was validated
     * @throws NullPointerException if {@code reference} is null
     */
    public static <T> T checkNotNull(T reference, @Nullable Object errorMessage) {
        if (reference == null) {
            throw new NullPointerException(String.valueOf(errorMessage));
        }
        return reference;
    }

    /**
     * Ensures that an object reference passed as a parameter to the calling
     * method is not null.
     *
     * @param reference an object reference
     * @param errorMessageTemplate a template for the exception message should the
     *     check fail. The message is formed by replacing each {@code %s}
     *     placeholder in the template with an argument. These are matched by
     *     position - the first {@code %s} gets {@code errorMessageArgs[0]}, etc.
     *     Unmatched arguments will be appended to the formatted message in square
     *     braces. Unmatched placeholders will be left as-is.
     * @param errorMessageArgs the arguments to be substituted into the message
     *     template. Arguments are converted to strings using
     *     {@link String#valueOf(Object)}.
     * @return the non-null reference that was validated
     * @throws NullPointerException if {@code reference} is null
     */
    public static <T> T checkNotNull(T reference,
                                     @Nullable String errorMessageTemplate,
                                     @Nullable Object... errorMessageArgs) {
        if (reference == null) {
            // If either of these parameters is null, the right thing happens anyway
            throw new NullPointerException(
                format(errorMessageTemplate, errorMessageArgs));
        }
        return reference;
    }

    /*
     * All recent hotspots (as of 2009) *really* like to have the natural code
     *
     * if (guardExpression) {
     *    throw new BadException(messageExpression);
     * }
     *
     * refactored so that messageExpression is moved to a separate
     * String-returning method.
     *
     * if (guardExpression) {
     *    throw new BadException(badMsg(...));
     * }
     *
     * The alternative natural refactorings into void or Exception-returning
     * methods are much slower.  This is a big deal - we're talking factors of
     * 2-8 in microbenchmarks, not just 10-20%.  (This is a hotspot optimizer
     * bug, which should be fixed, but that's a separate, big project).
     *
     * The coding pattern above is heavily used in java.util, e.g. in ArrayList.
     * There is a RangeCheckMicroBenchmark in the JDK that was used to test this.
     *
     * But the methods in this class want to throw different exceptions,
     * depending on the args, so it appears that this pattern is not directly
     * applicable.  But we can use the ridiculous, devious trick of throwing an
     * exception in the middle of the construction of another exception.
     * Hotspot is fine with that.
     */

    /**
     * Ensures that {@code index} specifies a valid <i>element</i> in an array,
     * list or string of size {@code size}. An element index may range from zero,
     * inclusive, to {@code size}, exclusive.
     *
     * @param index a user-supplied index identifying an element of an array, list
     *     or string
     * @param size the size of that array, list or string
     * @return the value of {@code index}
     * @throws IndexOutOfBoundsException if {@code index} is negative or is not
     *     less than {@code size}
     * @throws IllegalArgumentException if {@code size} is negative
     */
    public static int checkElementIndex(int index, int size) {
        return checkElementIndex(index, size, "index");
    }

    /**
     * Ensures that {@code index} specifies a valid <i>element</i> in an array,
     * list or string of size {@code size}. An element index may range from zero,
     * inclusive, to {@code size}, exclusive.
     *
     * @param index a user-supplied index identifying an element of an array, list
     *     or string
     * @param size the size of that array, list or string
     * @param desc the text to use to describe this index in an error message
     * @return the value of {@code index}
     * @throws IndexOutOfBoundsException if {@code index} is negative or is not
     *     less than {@code size}
     * @throws IllegalArgumentException if {@code size} is negative
     */
    public static int checkElementIndex(
        int index, int size, @Nullable String desc) {
        // Carefully optimized for execution by hotspot (explanatory comment above)
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException(badElementIndex(index, size, desc));
        }
        return index;
    }

    private static String badElementIndex(int index, int size, String desc) {
        if (index < 0) {
            return format("%s (%s) must not be negative", desc, index);
        } else if (size < 0) {
            throw new IllegalArgumentException("negative size: " + size);
        } else { // index >= size
            return format("%s (%s) must be less than size (%s)", desc, index, size);
        }
    }

    /**
     * Ensures that {@code index} specifies a valid <i>position</i> in an array,
     * list or string of size {@code size}. A position index may range from zero
     * to {@code size}, inclusive.
     *
     * @param index a user-supplied index identifying a position in an array, list
     *     or string
     * @param size the size of that array, list or string
     * @return the value of {@code index}
     * @throws IndexOutOfBoundsException if {@code index} is negative or is
     *     greater than {@code size}
     * @throws IllegalArgumentException if {@code size} is negative
     */
    public static int checkPositionIndex(int index, int size) {
        return checkPositionIndex(index, size, "index");
    }

    /**
     * Ensures that {@code index} specifies a valid <i>position</i> in an array,
     * list or string of size {@code size}. A position index may range from zero
     * to {@code size}, inclusive.
     *
     * @param index a user-supplied index identifying a position in an array, list
     *     or string
     * @param size the size of that array, list or string
     * @param desc the text to use to describe this index in an error message
     * @return the value of {@code index}
     * @throws IndexOutOfBoundsException if {@code index} is negative or is
     *     greater than {@code size}
     * @throws IllegalArgumentException if {@code size} is negative
     */
    public static int checkPositionIndex(
        int index, int size, @Nullable String desc) {
        // Carefully optimized for execution by hotspot (explanatory comment above)
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException(badPositionIndex(index, size, desc));
        }
        return index;
    }

    private static String badPositionIndex(int index, int size, String desc) {
        if (index < 0) {
            return format("%s (%s) must not be negative", desc, index);
        } else if (size < 0) {
            throw new IllegalArgumentException("negative size: " + size);
        } else { // index > size
            return format("%s (%s) must not be greater than size (%s)",
                desc, index, size);
        }
    }

    /**
     * Ensures that {@code start} and {@code end} specify valid <i>positions</i>
     * in an array, list or string of size {@code size}, and are in order. A
     * position index may range from zero to {@code size}, inclusive.
     *
     * @param start a user-supplied index identifying a starting position in an
     *     array, list or string
     * @param end a user-supplied index identifying an ending position in an array,
     *     list or string
     * @param size the size of that array, list or string
     * @throws IndexOutOfBoundsException if either index is negative or is
     *     greater than {@code size}, or if {@code end} is less than {@code start}
     * @throws IllegalArgumentException if {@code size} is negative
     */
    public static void checkPositionIndexes(int start, int end, int size) {
        // Carefully optimized for execution by hotspot (explanatory comment above)
        if (start < 0 || end < start || end > size) {
            throw new IndexOutOfBoundsException(badPositionIndexes(start, end, size));
        }
    }

    private static String badPositionIndexes(int start, int end, int size) {
        if (start < 0 || start > size) {
            return badPositionIndex(start, size, "start index");
        }
        if (end < 0 || end > size) {
            return badPositionIndex(end, size, "end index");
        }
        // end < start
        return format("end index (%s) must not be less than start index (%s)",
            end, start);
    }

    /**
     * Substitutes each {@code %s} in {@code template} with an argument. These
     * are matched by position - the first {@code %s} gets {@code args[0]}, etc.
     * If there are more arguments than placeholders, the unmatched arguments will
     * be appended to the end of the formatted message in square braces.
     *
     * @param template a non-null string containing 0 or more {@code %s}
     *     placeholders.
     * @param args the arguments to be substituted into the message
     *     template. Arguments are converted to strings using
     *     {@link String#valueOf(Object)}. Arguments can be null.
     */
    static String format(String template,
                         @Nullable Object... args) {
        template = String.valueOf(template); // null -> "null"

        // start substituting the arguments into the '%s' placeholders
        StringBuilder builder = new StringBuilder(
            template.length() + 16 * args.length);
        int templateStart = 0;
        int i = 0;
        while (i < args.length) {
            int placeholderStart = template.indexOf("%s", templateStart);
            if (placeholderStart == -1) {
                break;
            }
            builder.append(template.substring(templateStart, placeholderStart));
            builder.append(args[i++]);
            templateStart = placeholderStart + 2;
        }
        builder.append(template.substring(templateStart));

        // if we run out of placeholders, append the extra args in square braces
        if (i < args.length) {
            builder.append(" [");
            builder.append(args[i++]);
            while (i < args.length) {
                builder.append(", ");
                builder.append(args[i++]);
            }
            builder.append(']');
        }

        return builder.toString();
    }
}



================================================
FILE: src/main/java/com/audienceproject/shaded/google/common/base/Ticker.java
================================================
package com.audienceproject.shaded.google.common.base;

/*
 * Notice:
 * This file was modified at AudienceProject ApS by Cosmin Catalin Sanda (cosmin@audienceproject.com)
 */

/*
 * Copyright (C) 2011 The Guava Authors
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/**
 * A time source; returns a time value representing the number of nanoseconds elapsed since some
 * fixed but arbitrary point in time.
 *
 * <p><b>Warning:</b> this interface can only be used to measure elapsed time, not wall time.
 *
 * @author Kevin Bourrillion
 * @since 10.0
 *     (<a href="http://code.google.com/p/guava-libraries/wiki/Compatibility"
 *     >mostly source-compatible</a> since 9.0)
 */
public abstract class Ticker {
    /**
     * Constructor for use by subclasses.
     */
    protected Ticker() {}

    /**
     * Returns the number of nanoseconds elapsed since this ticker's fixed
     * point of reference.
     */
    public abstract long read();

    /**
     * A ticker that reads the current time using {@link System#nanoTime}.
     *
     * @since 10.0
     */
    public static Ticker systemTicker() {
        return SYSTEM_TICKER;
    }

    private static final Ticker SYSTEM_TICKER = new Ticker() {
        @Override
        public long read() {
            return System.nanoTime();
        }
    };
}



================================================
FILE: src/main/java/com/audienceproject/shaded/google/common/util/concurrent/RateLimiter.java
================================================
package com.audienceproject.shaded.google.common.util.concurrent;

/*
 * Notice:
 * This file was modified at AudienceProject ApS by Cosmin Catalin Sanda (cosmin@audienceproject.com)
 */

/*
 * Copyright (C) 2012 The Guava Authors
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import com.audienceproject.shaded.google.common.base.Preconditions;
import com.audienceproject.shaded.google.common.base.Ticker;

import javax.annotation.concurrent.ThreadSafe;
import java.util.concurrent.TimeUnit;

/**
  * A rate limiter. Conceptually, a rate limiter distributes permits at a
  * configurable rate. Each {@link #acquire()} blocks if necessary until a permit is
  * available, and then takes it. Once acquired, permits need not be released.
  *
  * <p>Rate limiters are often used to restrict the rate at which some
  * physical or logical resource is accessed. This is in contrast to {@link
  * java.util.concurrent.Semaphore} which restricts the number of concurrent
  * accesses instead of the rate (note though that concurrency and rate are closely related,
  * e.g. see <a href="http://en.wikipedia.org/wiki/Little's_law">Little's Law</a>).
  *
  * <p>A {@code RateLimiter} is defined primarily by the rate at which permits
  * are issued. Absent additional configuration, permits will be distributed at a
  * fixed rate, defined in terms of permits per second. Permits will be distributed
  * smoothly, with the delay between individual permits being adjusted to ensure
  * that the configured rate is maintained.
  *
  * <p>It is possible to configure a {@code RateLimiter} to have a warmup
  * period during which time the permits issued each second steadily increases until
  * it hits the stable rate.
  *
  * <p>As an example, imagine that we have a list of tasks to execute, but we don't want to
  * submit more than 2 per second:
  *<pre>  {@code
  *  final RateLimiter rateLimiter = RateLimiter.create(2.0); // rate is "2 permits per second"
  *  void submitTasks(List<Runnable> tasks, Executor executor) {
  *    for (Runnable task : tasks) {
  *      rateLimiter.acquire(); // may wait
  *      executor.execute(task);
  *    }
  *  }
  *}</pre>
  *
  * <p>As another example, imagine that we produce a stream of data, and we want to cap it
  * at 5kb per second. This could be accomplished by requiring a permit per byte, and specifying
  * a rate of 5000 permits per second:
  *<pre>  {@code
  *  final RateLimiter rateLimiter = RateLimiter.create(5000.0); // rate = 5000 permits per second
  *  void submitPacket(byte[] packet) {
  *    rateLimiter.acquire(packet.length);
  *    networkService.send(packet);
  *  }
  *}</pre>
  *
  * <p>It is important to note that the number of permits requested <i>never</i>
  * affects the throttling of the request itself (an invocation to {@code acquire(1)}
  * and an invocation to {@code acquire(1000)} will result in exactly the same throttling, if any),
  * but it affects the throttling of the <i>next</i> request. I.e., if an expensive task
  * arrives at an idle RateLimiter, it will be granted immediately, but it is the <i>next</i>
  * request that will experience extra throttling, thus paying for the cost of the expensive
  * task.
  *
  * <p>Note: {@code RateLimiter} does not provide fairness guarantees.
  *
  * @author Dimitris Andreou
  * @since 13.0
  */
// TODO(user): switch to nano precision. A natural unit of cost is "bytes", and a micro precision
//     would mean a maximum rate of "1MB/s", which might be small in some cases.
@ThreadSafe
public abstract class RateLimiter {
    /*
     * How is the RateLimiter designed, and why?
     *
     * The primary feature of a RateLimiter is its "stable rate", the maximum rate that
     * it should allow under normal conditions. This is enforced by "throttling" incoming
     * requests as needed, i.e. compute, for an incoming request, the appropriate throttle time,
     * and make the calling thread wait that long.
     *
     * The simplest way to maintain a rate of QPS is to keep the timestamp of the last
     * granted request, and ensure that (1/QPS) seconds have elapsed since then. For example,
     * for a rate of QPS=5 (5 tokens per second), if we ensure that a request isn't granted
     * earlier than 200ms after the last one, then we achieve the intended rate.
     * If a request comes and the last request was granted only 100ms ago, then we wait for
     * another 100ms. At this rate, serving 15 fresh permits (i.e. for an acquire(15) request)
     * naturally takes 3 seconds.
     *
     * It is important to realize that such a RateLimiter has a very superficial memory
     * of the past: it only remembers the last request. What if the RateLimiter was unused for
     * a long period of time, then a request arrived and was immediately granted?
     * This RateLimiter would immediately forget about that past underutilization. This may
     * result in either underutilization or overflow, depending on the real world consequences
     * of not using the expected rate.
     *
     * Past underutilization could mean that excess resources are available. Then, the RateLimiter
     * should speed up for a while, to take advantage of these resources. This is important
     * when the rate is applied to networking (limiting bandwidth), where past underutilization
     * typically translates to "almost empty buffers", which can be filled immediately.
     *
     * On the other hand, past underutilization could mean that "the server responsible for
     * handling the request has become less ready for future requests", i.e. its caches become
     * stale, and requests become more likely to trigger expensive operations (a more extreme
     * case of this example is when a server has just booted, and it is mostly busy with getting
     * itself up to speed).
     *
     * To deal with such scenarios, we add an extra dimension, that of "past underutilization",
     * modeled by "storedPermits" variable. This variable is zero when there is no
     * underutilization, and it can grow up to maxStoredPermits, for sufficiently large
     * underutilization. So, the requested permits, by an invocation acquire(permits),
     * are served from:
     * - stored permits (if available)
     * - fresh permits (for any remaining permits)
     *
     * How this works is best explained with an example:
     *
     * For a RateLimiter that produces 1 token per second, every second
     * that goes by with the RateLimiter being unused, we increase storedPermits by 1.
     * Say we leave the RateLimiter unused for 10 seconds (i.e., we expected a request at time
     * X, but we are at time X + 10 seconds before a request actually arrives; this is
     * also related to the point made in the last paragraph), thus storedPermits
     * becomes 10.0 (assuming maxStoredPermits >= 10.0). At that point, a request of acquire(3)
     * arrives. We serve this request out of storedPermits, and reduce that to 7.0 (how this is
     * translated to throttling time is discussed later). Immediately after, assume that an
     * acquire(10) request arrives. We serve the request partly from storedPermits,
     * using all the remaining 7.0 permits, and the remaining 3.0, we serve them by fresh permits
     * produced by the rate limiter.
     *
     * We already know how much time it takes to serve 3 fresh permits: if the rate is
     * "1 token per second", then this will take 3 seconds. But what does it mean to serve 7
     * stored permits? As explained above, there is no unique answer. If we are primarily
     * interested in dealing with underutilization, then we want stored permits to be given out
     * /faster/ than fresh ones, because underutilization = free resources for the taking.
     * If we are primarily interested in dealing with overflow, then stored permits could
     * be given out /slower/ than fresh ones. Thus, we require a (different in each case)
     * function that translates storedPermits to throttling time.
     *
     * This role is played by storedPermitsToWaitTime(double storedPermits, double permitsToTake).
     * The underlying model is a continuous function mapping storedPermits
     * (from 0.0 to maxStoredPermits) onto the 1/rate (i.e. intervals) that is effective at the given
     * storedPermits. "storedPermits" essentially measures unused time; we spend unused time
     * buying/storing permits. Rate is "permits / time", thus "1 / rate = time / permits".
     * Thus, "1/rate" (time / permits) times "permits" gives time, i.e., integrals on this
     * function (which is what storedPermitsToWaitTime() computes) correspond to minimum intervals
     * between subsequent requests, for the specified number of requested permits.
     *
     * Here is an example of storedPermitsToWaitTime:
     * If storedPermits == 10.0, and we want 3 permits, we take them from storedPermits,
     * reducing them to 7.0, and compute the throttling for these as a call to
     * storedPermitsToWaitTime(storedPermits = 10.0, permitsToTake = 3.0), which will
     * evaluate the integral of the function from 7.0 to 10.0.
     *
     * Using integrals guarantees that the effect of a single acquire(3) is equivalent
     * to { acquire(1); acquire(1); acquire(1); }, or { acquire(2); acquire(1); }, etc,
     * since the integral of the function in [7.0, 10.0] is equivalent to the sum of the
     * integrals of [7.0, 8.0], [8.0, 9.0], [9.0, 10.0] (and so on), no matter
     * what the function is. This guarantees that we handle correctly requests of varying weight
     * (permits), /no matter/ what the actual function is - so we can tweak the latter freely.
     * (The only requirement, obviously, is that we can compute its integrals).
     *
     * Note well that if, for this function, we chose a horizontal line, at height of exactly
     * (1/QPS), then the effect of the function is non-existent: we serve storedPermits at
     * exactly the same cost as fresh ones (1/QPS is the cost for each). We use this trick later.
     *
     * If we pick a function that goes /below/ that horizontal line, it means that we reduce
     * the area of the function, thus time. Thus, the RateLimiter becomes /faster/ after a
     * period of underutilization. If, on the other hand, we pick a function that
     * goes /above/ that horizontal line, then it means that the area (time) is increased,
     * thus storedPermits are more costly than fresh permits, thus the RateLimiter becomes
     * /slower/ after a period of underutilization.
     *
     * Last, but not least: consider a RateLimiter with rate of 1 permit per second, currently
     * completely unused, and an expensive acquire(100) request comes. It would be nonsensical
     * to just wait for 100 seconds, and /then/ start the actual task. Why wait without doing
     * anything? A much better approach is to /allow/ the request right away (as if it was an
     * acquire(1) request instead), and postpone /subsequent/ requests as needed. In this version,
     * we allow the task to start immediately, and postpone future requests by 100 seconds,
     * thus allowing work to get done in the meantime instead of waiting idly.
     *
     * This has important consequences: it means that the RateLimiter doesn't remember the time
     * of the _last_ request, but it remembers the (expected) time of the _next_ request. This
     * also enables us to tell immediately (see tryAcquire(timeout)) whether a particular
     * timeout is enough to get us to the point of the next scheduling time, since we always
     * maintain that. And what we mean by "an unused RateLimiter" is also defined by that
     * notion: when we observe that the "expected arrival time of the next request" is actually
     * in the past, then the difference (now - past) is the amount of time that the RateLimiter
     * was formally unused, and it is that amount of time which we translate to storedPermits.
     * (We increase storedPermits with the amount of permits that would have been produced
     * in that idle time). So, if rate == 1 permit per second, and arrivals come exactly
     * one second after the previous, then storedPermits is _never_ increased -- we would only
     * increase it for arrivals _later_ than the expected one second.
     */
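
    /*
     * Worked example of the accounting above (an illustrative sketch, not part of the
     * original source), assuming rate = 1 permit/second and the Bursty subclass below,
     * where stored permits cost zero wait time and maxPermits == 1.0:
     *
     *   RateLimiter limiter = RateLimiter.create(1.0);
     *   // limiter left unused for 10 seconds; storedPermits saturates at maxPermits (1.0)
     *   limiter.acquire(3); // returns immediately: 1 stored + 2 fresh permits, and the
     *                       // fresh ones push nextFreeTicketMicros ~2 seconds ahead
     *   limiter.acquire(1); // pays for the previous request: waits ~2 seconds
     */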

    /**
      * Creates a {@code RateLimiter} with the specified stable throughput, given as
      * "permits per second" (commonly referred to as <i>QPS</i>, queries per second).
      *
      * <p>The returned {@code RateLimiter} ensures that on average no more than {@code
      * permitsPerSecond} are issued during any given second, with sustained requests
      * being smoothly spread over each second. When the incoming request rate exceeds
      * {@code permitsPerSecond} the rate limiter will release one permit every {@code
      * (1.0 / permitsPerSecond)} seconds. When the rate limiter is unused,
      * bursts of up to {@code permitsPerSecond} permits will be allowed, with subsequent
      * requests being smoothly limited at the stable rate of {@code permitsPerSecond}.
      *
      * @param permitsPerSecond the rate of the returned {@code RateLimiter}, measured in
      *        how many permits become available per second.
      */
    public static RateLimiter create(double permitsPerSecond) {
        return create(SleepingTicker.SYSTEM_TICKER, permitsPerSecond);
    }
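
    // Usage sketch (illustrative, not part of the original source):
    //
    //   RateLimiter limiter = RateLimiter.create(5.0); // 5 permits per second
    //   for (Runnable task : tasks) {  // "tasks" is assumed defined by the caller
    //       limiter.acquire();         // may block; smooths to ~one task per 200ms
    //       task.run();
    //   }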

    static RateLimiter create(SleepingTicker ticker, double permitsPerSecond) {
        RateLimiter rateLimiter = new Bursty(ticker);
        rateLimiter.setRate(permitsPerSecond);
        return rateLimiter;
    }

    /**
      * Creates a {@code RateLimiter} with the specified stable throughput, given as
      * "permits per second" (commonly referred to as <i>QPS</i>, queries per second), and a
      * <i>warmup period</i>, during which the {@code RateLimiter} smoothly ramps up its rate,
      * until it reaches its maximum rate at the end of the period (as long as there are enough
      * requests to saturate it). Similarly, if the {@code RateLimiter} is left <i>unused</i> for
      * a duration of {@code warmupPeriod}, it will gradually return to its "cold" state,
      * i.e. it will go through the same warming up process as when it was first created.
      *
      * <p>The returned {@code RateLimiter} is intended for cases where the resource that actually
      * fulfils the requests (e.g., a remote server) needs "warmup" time, rather than
      * being immediately accessed at the stable (maximum) rate.
      *
      * <p>The returned {@code RateLimiter} starts in a "cold" state (i.e. the warmup period
      * will follow), and if it is left unused for long enough, it will return to that state.
      *
      * @param permitsPerSecond the rate of the returned {@code RateLimiter}, measured in
      *        how many permits become available per second
      * @param warmupPeriod the duration of the period where the {@code RateLimiter} ramps up its
      *        rate, before reaching its stable (maximum) rate
      * @param unit the time unit of the warmupPeriod argument
      */
    // TODO(user): add a burst size of 1-second-worth of permits, as in the metronome?
    public static RateLimiter create(double permitsPerSecond, long warmupPeriod, TimeUnit unit) {
        return create(SleepingTicker.SYSTEM_TICKER, permitsPerSecond, warmupPeriod, unit);
    }

    static RateLimiter create(
        SleepingTicker ticker, double permitsPerSecond, long warmupPeriod, TimeUnit timeUnit) {
        RateLimiter rateLimiter = new WarmingUp(ticker, warmupPeriod, timeUnit);
        rateLimiter.setRate(permitsPerSecond);
        return rateLimiter;
    }

    static RateLimiter createBursty(
        SleepingTicker ticker, double permitsPerSecond, int maxBurstSize) {
        Bursty rateLimiter = new Bursty(ticker);
        rateLimiter.setRate(permitsPerSecond);
        rateLimiter.maxPermits = maxBurstSize;
        return rateLimiter;
    }

    /**
      * The underlying timer; used both to measure elapsed time and sleep as necessary. A separate
      * object to facilitate testing.
      */
    private final SleepingTicker ticker;

    /**
      * The timestamp when the RateLimiter was created; used to avoid possible overflow/time-wrapping
      * errors.
      */
    private final long offsetNanos;

    /**
      * The currently stored permits.
      */
    double storedPermits;

    /**
      * The maximum number of stored permits.
      */
    double maxPermits;

    /**
      * The interval between two unit requests, at our stable rate. E.g., a stable rate of 5 permits
      * per second has a stable interval of 200ms.
      */
    volatile double stableIntervalMicros;

    private final Object mutex = new Object();

    /**
      * The time when the next request (no matter its size) will be granted. After granting a request,
      * this is pushed further in the future. Large requests push this further than small requests.
      */
    private long nextFreeTicketMicros = 0L; // could be either in the past or future

    private RateLimiter(SleepingTicker ticker) {
        this.ticker = ticker;
        this.offsetNanos = ticker.read();
    }

    /**
      * Updates the stable rate of this {@code RateLimiter}, that is, the
      * {@code permitsPerSecond} argument provided in the factory method that
      * constructed the {@code RateLimiter}. Currently throttled threads will <b>not</b>
      * be awakened as a result of this invocation, thus they do not observe the new rate;
      * only subsequent requests will.
      *
      * <p>Note though that, since each request repays (by waiting, if necessary) the cost
      * of the <i>previous</i> request, this means that the very next request
      * after an invocation to {@code setRate} will not be affected by the new rate;
      * it will pay the cost of the previous request, which is in terms of the previous rate.
      *
      * <p>The behavior of the {@code RateLimiter} is not modified in any other way,
      * e.g. if the {@code RateLimiter} was configured with a warmup period of 20 seconds,
      * it still has a warmup period of 20 seconds after this method invocation.
      *
      * @param permitsPerSecond the new stable rate of this {@code RateLimiter}.
      */
    public final void setRate(double permitsPerSecond) {
        Preconditions.checkArgument(permitsPerSecond > 0.0
            && !Double.isNaN(permitsPerSecond), "rate must be positive");
        synchronized (mutex) {
            resync(readSafeMicros());
            double stableIntervalMicros = TimeUnit.SECONDS.toMicros(1L) / permitsPerSecond;
            this.stableIntervalMicros = stableIntervalMicros;
            doSetRate(permitsPerSecond, stableIntervalMicros);
        }
    }

    abstract void doSetRate(double permitsPerSecond, double stableIntervalMicros);

    /**
      * Returns the stable rate (as {@code permits per second}) with which this
      * {@code RateLimiter} is configured. The initial value of this is the same as
      * the {@code permitsPerSecond} argument passed in the factory method that produced
      * this {@code RateLimiter}, and it is only updated after invocations
      * to {@linkplain #setRate}.
      */
    public final double getRate() {
        return TimeUnit.SECONDS.toMicros(1L) / stableIntervalMicros;
    }

    /**
      * Acquires a permit from this {@code RateLimiter}, blocking until the request can be granted.
      *
      * <p>This method is equivalent to {@code acquire(1)}.
      */
    public void acquire() {
        acquire(1);
    }

    /**
      * Acquires the given number of permits from this {@code RateLimiter}, blocking until the
      * request can be granted.
      *
      * @param permits the number of permits to acquire
      */
    public void acquire(int permits) {
        checkPermits(permits);
        long microsToWait;
        synchronized (mutex) {
            microsToWait = reserveNextTicket(permits, readSafeMicros());
        }
        ticker.sleepMicrosUninterruptibly(microsToWait);
    }

    /**
      * Acquires a permit from this {@code RateLimiter} if it can be obtained
      * without exceeding the specified {@code timeout}, or returns {@code false}
      * immediately (without waiting) if the permit would not have been granted
      * before the timeout expired.
      *
      * <p>This method is equivalent to {@code tryAcquire(1, timeout, unit)}.
      *
      * @param timeout the maximum time to wait for the permit
      * @param unit the time unit of the timeout argument
      * @return {@code true} if the permit was acquired, {@code false} otherwise
      */
    public boolean tryAcquire(long timeout, TimeUnit unit) {
        return tryAcquire(1, timeout, unit);
    }

    /**
      * Acquires permits from this {@link RateLimiter} if they can be acquired immediately without delay.
      *
      * <p>
      * This method is equivalent to {@code tryAcquire(permits, 0, anyUnit)}.
      *
      * @param permits the number of permits to acquire
      * @return {@code true} if the permits were acquired, {@code false} otherwise
      * @since 14.0
      */
    public boolean tryAcquire(int permits) {
        return tryAcquire(permits, 0, TimeUnit.MICROSECONDS);
    }

    /**
      * Acquires a permit from this {@link RateLimiter} if it can be acquired immediately without
      * delay.
      *
      * <p>
      * This method is equivalent to {@code tryAcquire(1)}.
      *
      * @return {@code true} if the permit was acquired, {@code false} otherwise
      * @since 14.0
      */
    public boolean tryAcquire() {
        return tryAcquire(1, 0, TimeUnit.MICROSECONDS);
    }

    /**
      * Acquires the given number of permits from this {@code RateLimiter} if it can be obtained
      * without exceeding the specified {@code timeout}, or returns {@code false}
      * immediately (without waiting) if the permits would not have been granted
      * before the timeout expired.
      *
      * @param permits the number of permits to acquire
      * @param timeout the maximum time to wait for the permits
      * @param unit the time unit of the timeout argument
      * @return {@code true} if the permits were acquired, {@code false} otherwise
      */
    public boolean tryAcquire(int permits, long timeout, TimeUnit unit) {
        long timeoutMicros = unit.toMicros(timeout);
        checkPermits(permits);
        long microsToWait;
        synchronized (mutex) {
            long nowMicros = readSafeMicros();
            if (nextFreeTicketMicros > nowMicros + timeoutMicros) {
                return false;
            } else {
                microsToWait = reserveNextTicket(permits, nowMicros);
            }
        }
        ticker.sleepMicrosUninterruptibly(microsToWait);
        return true;
    }
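
    // Usage sketch (illustrative, not part of the original source): graceful
    // degradation instead of blocking when the limiter is saturated.
    //
    //   if (limiter.tryAcquire(1, 10, TimeUnit.MILLISECONDS)) {
    //       handleRequest(request); // hypothetical handler; permit granted within 10ms
    //   } else {
    //       rejectRequest(request); // hypothetical fallback; no permit in time
    //   }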

    private static void checkPermits(int permits) {
        Preconditions.checkArgument(permits > 0, "Requested permits must be positive");
    }

    /**
      * Reserves the next ticket and returns the time that the caller must wait for it.
      */
    private long reserveNextTicket(double requiredPermits, long nowMicros) {
        resync(nowMicros);
        long microsToNextFreeTicket = nextFreeTicketMicros - nowMicros;
        double storedPermitsToSpend = Math.min(requiredPermits, this.storedPermits);
        double freshPermits = requiredPermits - storedPermitsToSpend;

        long waitMicros = storedPermitsToWaitTime(this.storedPermits, storedPermitsToSpend)
                + (long) (freshPermits * stableIntervalMicros);

        this.nextFreeTicketMicros = nextFreeTicketMicros + waitMicros;
        this.storedPermits -= storedPermitsToSpend;
        return microsToNextFreeTicket;
    }
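
    // Worked example (illustrative, not part of the original source): with
    // stableIntervalMicros = 1_000_000 (1 QPS), storedPermits = 7.0 and
    // nextFreeTicketMicros == nowMicros, reserveNextTicket(10, nowMicros) spends the
    // 7 stored permits plus 3 fresh ones. In the Bursty subclass stored permits cost
    // zero wait, so waitMicros = 3 * 1_000_000: nextFreeTicketMicros moves 3 seconds
    // into the future, while the wait time returned to *this* caller is 0.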

    /**
      * Translates a specified portion of our currently stored permits which we want to
      * spend/acquire, into a throttling time. Conceptually, this evaluates the integral
      * of the underlying function we use, for the range of
      * [(storedPermits - permitsToTake), storedPermits].
      *
      * This always holds: {@code 0 <= permitsToTake <= storedPermits}
      */
    abstract long storedPermitsToWaitTime(double storedPermits, double permitsToTake);

    private void resync(long nowMicros) {
        // if nextFreeTicket is in the past, resync to now
        if (nowMicros > nextFreeTicketMicros) {
            storedPermits = Math.min(maxPermits,
                storedPermits + (nowMicros - nextFreeTicketMicros) / stableIntervalMicros);
            nextFreeTicketMicros = nowMicros;
        }
    }

    private long readSafeMicros() {
        return TimeUnit.NANOSECONDS.toMicros(ticker.read() - offsetNanos);
    }

    @Override
    public String toString() {
        return String.format("RateLimiter[stableRate=%3.1fqps]", 1000000.0 / stableIntervalMicros);
    }

    /**
      * This implements the following function:
      *
      *          ^ throttling
      *          |
      * 3*stable +                  /
      * interval |                 /.
      *  (cold)  |                / .
      *          |               /  .   <-- "warmup period" is the area of the trapezoid between
      * 2*stable +              /   .       halfPermits and maxPermits
      * interval |             /    .
      *          |            /     .
      *          |           /      .
      *   stable +----------/  WARM . }
      * interval |          .   UP  . } <-- this rectangle (from 0 to maxPermits, and
      *          |          . PERIOD. }     height == stableInterval) defines the cooldown period,
      *          |          .       . }     and we want cooldownPeriod == warmupPeriod
      *          |---------------------------------> storedPermits
      *              (halfPermits) (maxPermits)
      *
      * Before going into the details of this particular function, let's keep in mind the basics:
      * 1) The state of the RateLimiter (storedPermits) is a vertical line in this figure.
      * 2) When the RateLimiter is not used, this goes right (up to maxPermits)
      * 3) When the RateLimiter is used, this goes left (down to zero), since if we have storedPermits,
      *    we serve from those first
      * 4) When _unused_, we go right at the same speed (rate)! I.e., if our rate is
      *    2 permits per second, and 3 unused seconds pass, we will always save 6 permits
      *    (no matter what our initial position was), up to maxPermits.
      *    If we invert the rate, we get the "stableInterval" (interval between two requests
      *    in a perfectly spaced out sequence of requests of the given rate). Thus, if you
      *    want to see "how much time it will take to go from X storedPermits to X+K storedPermits?",
      *    the answer is always stableInterval * K. In the same example, for 2 permits per second,
      *    stableInterval is 500ms. Thus to go from X storedPermits to X+6 storedPermits, we
      *    require 6 * 500ms = 3 seconds.
      *
      *    In short, the time it takes to move to the right (save K permits) is equal to the
      *    rectangle of width == K and height == stableInterval.
      * 5) When _used_, the time it takes, as explained in the introductory class note, is
      *    equal to the integral of our function, between X permits and X-K permits, assuming
      *    we want to spend K saved permits.
      *
      *    In summary, the time it takes to move to the left (spend K permits), is equal to the
      *    area of the function of width == K.
      *
      * Let's dive into this function now:
      *
      * When we have storedPermits <= halfPermits (the left portion of the function), then
      * we spend them at the exact same rate that
      * fresh permits would be generated anyway (that rate is 1/stableInterval). We size
      * this area to be equal to _half_ the specified warmup period. Why do we need this?
      * And why half? We'll explain shortly below (after explaining the second part).
      *
      * Stored permits beyond halfPermits are mapped to an ascending line that goes
      * from stableInterval to 3 * stableInterval. The average height for that part is
      * 2 * stableInterval, and is sized appropriately to have an area _equal_ to the
      * specified warmup period. Thus, by point (4) above, it takes "warmupPeriod" amount of time
      * to go from maxPermits to halfPermits.
      *
      * BUT, by point (4) above, it only takes "warmupPeriod / 2" amount of time to return back
      * to maxPermits, from halfPermits! (Because the trapezoid has double the area of the rectangle
      * of height stableInterval and equivalent width). We decided that the "cooldown period"
      * time should be equivalent to the "warmup period", thus a fully saturated RateLimiter
      * (with zero stored permits, serving only fresh ones) can go to a fully unsaturated state
      * (with storedPermits == maxPermits) in the same amount of time it takes for a fully
      * unsaturated RateLimiter to return to the stableInterval -- which happens at halfPermits,
      * since beyond that point, we use a horizontal line of "stableInterval" height, simulating
      * the regular rate.
      *
      * Thus, we have figured all dimensions of this shape, to give all the desired
      * properties:
      * - the width is warmupPeriod / stableInterval, to make cooldownPeriod == warmupPeriod
      * - the slope starts at the middle, and goes from stableInterval to 3*stableInterval so
      *   that halfPermits are spent in double the usual time (half the rate), while their
      *   respective rate is steadily ramping up
      */
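    // Numeric sketch of the trapezoid integral (illustrative, not part of the original
    // source): with stableInterval = 1s and warmupPeriod = 10s, doSetRate below yields
    // maxPermits = 10, halfPermits = 5 and slope = (3s - 1s) / 5 = 0.4 s/permit. Taking
    // 2 permits from a full store (storedPermits = 10) then costs
    //   2 * (permitsToTime(5.0) + permitsToTime(3.0)) / 2
    //     = 2 * ((1 + 5 * 0.4) + (1 + 3 * 0.4)) / 2 = 5.2 seconds,
    // versus 2 seconds for two fresh permits at the stable rate.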
    private static class WarmingUp extends RateLimiter {

        final long warmupPeriodMicros;
        /**
          * The slope of the line from the stable interval (when permits == 0), to the cold interval
          * (when permits == maxPermits)
          */
        private double slope;
        private double halfPermits;

        WarmingUp(SleepingTicker ticker, long warmupPeriod, TimeUnit timeUnit) {
            super(ticker);
            this.warmupPeriodMicros = timeUnit.toMicros(warmupPeriod);
        }

        @Override
        void doSetRate(double permitsPerSecond, double stableIntervalMicros) {
            double oldMaxPermits = maxPermits;
            maxPermits = warmupPeriodMicros / stableIntervalMicros;
            halfPermits = maxPermits / 2.0;
            // Stable interval is x, cold is 3x, so on average it's 2x. Double the time -> halve the rate
            double coldIntervalMicros = stableIntervalMicros * 3.0;
            slope = (coldIntervalMicros - stableIntervalMicros) / halfPermits;
            if (oldMaxPermits == Double.POSITIVE_INFINITY) {
                // if we don't special-case this, we would get storedPermits == NaN, below
                storedPermits = 0.0;
            } else {
                storedPermits = (oldMaxPermits == 0.0)
                        ? maxPermits // initial state is cold
                        : storedPermits * maxPermits / oldMaxPermits;
            }
        }

        @Override
        long storedPermitsToWaitTime(double storedPermits, double permitsToTake) {
            double availablePermitsAboveHalf = storedPermits - halfPermits;
            long micros = 0;
            // measuring the integral on the right part of the function (the climbing line)
            if (availablePermitsAboveHalf > 0.0) {
                double permitsAboveHalfToTake = Math.min(availablePermitsAboveHalf, permitsToTake);
                micros = (long) (permitsAboveHalfToTake * (permitsToTime(availablePermitsAboveHalf)
                    + permitsToTime(availablePermitsAboveHalf - permitsAboveHalfToTake)) / 2.0);
                permitsToTake -= permitsAboveHalfToTake;
            }
            // measuring the integral on the left part of the function (the horizontal line)
            micros += (stableIntervalMicros * permitsToTake);
            return micros;
        }

        private double permitsToTime(double permits) {
            return stableIntervalMicros + permits * slope;
        }
    }

    /**
      * This implements a trivial function, where storedPermits are translated to
      * zero throttling - thus, a client gets an infinite speedup for permits acquired out
      * of the storedPermits pool. This is also used for the special case of the "metronome",
      * where the width of the function is also zero; maxStoredPermits is zero, thus
      * storedPermits and permitsToTake are always zero as well. Such a RateLimiter cannot
      * save permits when unused, thus all permits it serves are fresh, using the
      * designated rate.
      */
    private static class Bursty extends RateLimiter {
        Bursty(SleepingTicker ticker) {
            super(ticker);
        }

        @Override
        void doSetRate(double permitsPerSecond, double stableIntervalMicros) {
            double oldMaxPermits = this.maxPermits;
            /*
             * We allow the equivalent work of up to one second to be granted with zero waiting, if the
             * rate limiter has been unused for as much. This is to avoid potentially producing tiny
             * wait interval between subsequent requests for sufficiently large rates, which would
             * unnecessarily overconstrain the thread scheduler.
             */
            maxPermits = permitsPerSecond; // one second worth of permits
            storedPermits = (oldMaxPermits == 0.0)
                    ? 0.0 // initial state
                    : storedPermits * maxPermits / oldMaxPermits;
        }

        @Override
        long storedPermitsToWaitTime(double storedPermits, double permitsToTake) {
            return 0L;
        }
    }
}

abstract class SleepingTicker extends Ticker {
    abstract void sleepMicrosUninterruptibly(long micros);

    static final SleepingTicker SYSTEM_TICKER = new SleepingTicker() {
        @Override
        public long read() {
            return systemTicker().read();
        }

        @Override
        public void sleepMicrosUninterruptibly(long micros) {
            if (micros > 0) {
                Uninterruptibles.sleepUninterruptibly(micros, TimeUnit.MICROSECONDS);
            }
        }
    };
}


================================================
FILE: src/main/java/com/audienceproject/shaded/google/common/util/concurrent/Uninterruptibles.java
================================================
package com.audienceproject.shaded.google.common.util.concurrent;

/*
 * Notice:
 * This file was modified at AudienceProject ApS by Cosmin Catalin Sanda (cosmin@audienceproject.com)
 */

/*
 * Copyright (C) 2011 The Guava Authors
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import com.audienceproject.shaded.google.common.base.Preconditions;

import java.util.concurrent.*;

import static java.util.concurrent.TimeUnit.NANOSECONDS;

/**
 * Utilities for treating interruptible operations as uninterruptible.
 * In all cases, if a thread is interrupted during such a call, the call
 * continues to block until the result is available or the timeout elapses,
 * and only then re-interrupts the thread.
 *
 * @author Anthony Zana
 * @since 10.0
 */
public final class Uninterruptibles {

    // Implementation Note: As of 3-7-11, the logic for each blocking/timeout
    // method is identical, save for the method being invoked.

    /**
     * Invokes {@code latch.}{@link CountDownLatch#await() await()}
     * uninterruptibly.
     */
    public static void awaitUninterruptibly(CountDownLatch latch) {
        boolean interrupted = false;
        try {
            while (true) {
                try {
                    latch.await();
                    return;
                } catch (InterruptedException e) {
                    interrupted = true;
                }
            }
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
        }
    }
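
    // Usage sketch (illustrative, not part of the original source): the caller never
    // sees InterruptedException, but the interrupt status is restored on exit and can
    // be observed afterwards.
    //
    //   CountDownLatch done = new CountDownLatch(1);
    //   // ... hand "done" to a worker thread ...
    //   Uninterruptibles.awaitUninterruptibly(done);
    //   if (Thread.currentThread().isInterrupted()) {
    //       // react to an interrupt that arrived during the wait
    //   }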

    /**
     * Invokes
     * {@code latch.}{@link CountDownLatch#await(long, TimeUnit)
     * await(timeout, unit)} uninterruptibly.
     */
    public static boolean awaitUninterruptibly(CountDownLatch latch,
                                               long timeout, TimeUnit unit) {
        boolean interrupted = false;
        try {
            long remainingNanos = unit.toNanos(timeout);
            long end = System.nanoTime() + remainingNanos;

            while (true) {
                try {
                    // CountDownLatch treats negative timeouts just like zero.
                    return latch.await(remainingNanos, NANOSECONDS);
                } catch (InterruptedException e) {
                    interrupted = true;
                    remainingNanos = end - System.nanoTime();
                }
            }
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /**
     * Invokes {@code toJoin.}{@link Thread#join() join()} uninterruptibly.
     */
    public static void joinUninterruptibly(Thread toJoin) {
        boolean interrupted = false;
        try {
            while (true) {
                try {
                    toJoin.join();
                    return;
                } catch (InterruptedException e) {
                    interrupted = true;
                }
            }
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /**
     * Invokes {@code future.}{@link Future#get() get()} uninterruptibly.
     *
     * @throws ExecutionException if the computation threw an exception
     * @throws CancellationException if the computation was cancelled
     */
    public static <V> V getUninterruptibly(Future<V> future)
        throws ExecutionException {
        boolean interrupted = false;
        try {
            while (true) {
                try {
                    return future.get();
                } catch (InterruptedException e) {
                    interrupted = true;
                }
            }
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /**
     * Invokes
     * {@code future.}{@link Future#get(long, TimeUnit) get(timeout, unit)}
     * uninterruptibly.
     *
     * @throws ExecutionException if the computation threw an exception
     * @throws CancellationException if the computation was cancelled
     * @throws TimeoutException if the wait timed out
     */
    public static <V> V getUninterruptibly(
        Future<V> future, long timeout,  TimeUnit unit)
        throws ExecutionException, TimeoutException {
        boolean interrupted = false;
        try {
            long remainingNanos = unit.toNanos(timeout);
            long end = System.nanoTime() + remainingNanos;

            while (true) {
                try {
                    // Future treats negative timeouts just like zero.
                    return future.get(remainingNanos, NANOSECONDS);
                } catch (InterruptedException e) {
                    interrupted = true;
                    remainingNanos = end - System.nanoTime();
                }
            }
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /**
     * Invokes
     * {@code unit.}{@link TimeUnit#timedJoin(Thread, long)
     * timedJoin(toJoin, timeout)} uninterruptibly.
     */
    public static void joinUninterruptibly(Thread toJoin,
                                           long timeout, TimeUnit unit) {
        Preconditions.checkNotNull(toJoin);
        boolean interrupted = false;
        try {
            long remainingNanos = unit.toNanos(timeout);
            long end = System.nanoTime() + remainingNanos;
            while (true) {
                try {
                    // TimeUnit.timedJoin() treats negative timeouts just like zero.
                    NANOSECONDS.timedJoin(toJoin, remainingNanos);
                    return;
                } catch (InterruptedException e) {
                    interrupted = true;
                    remainingNanos = end - System.nanoTime();
                }
            }
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /**
     * Invokes {@code queue.}{@link BlockingQueue#take() take()} uninterruptibly.
     */
    public static <E> E takeUninterruptibly(BlockingQueue<E> queue) {
        boolean interrupted = false;
        try {
            while (true) {
                try {
                    return queue.take();
                } catch (InterruptedException e) {
                    interrupted = true;
                }
            }
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /**
     * Invokes {@code queue.}{@link BlockingQueue#put(Object) put(element)}
     * uninterruptibly.
     *
     * @throws ClassCastException if the class of the specified element prevents
     *     it from being added to the given queue
     * @throws IllegalArgumentException if some property of the specified element
     *     prevents it from being added to the given queue
     */
    public static <E> void putUninterruptibly(BlockingQueue<E> queue, E element) {
        boolean interrupted = false;
        try {
            while (true) {
                try {
                    queue.put(element);
                    return;
                } catch (InterruptedException e) {
                    interrupted = true;
                }
            }
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // TODO(user): Support Sleeper somehow (wrapper or interface method)?
    /**
     * Invokes {@code unit.}{@link TimeUnit#sleep(long) sleep(sleepFor)}
     * uninterruptibly.
     */
    public static void sleepUninterruptibly(long sleepFor, TimeUnit unit) {
        boolean interrupted = false;
        try {
            long remainingNanos = unit.toNanos(sleepFor);
            long end = System.nanoTime() + remainingNanos;
            while (true) {
                try {
                    // TimeUnit.sleep() treats negative timeouts just like zero.
                    NANOSECONDS.sleep(remainingNanos);
                    return;
                } catch (InterruptedException e) {
                    interrupted = true;
                    remainingNanos = end - System.nanoTime();
                }
            }
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // TODO(user): Add support for waitUninterruptibly.

    private Uninterruptibles() {}
}



================================================
FILE: src/main/resources/META-INF/services/org.apache.spark.sql.sources.DataSourceRegister
================================================
com.audienceproject.spark.dynamodb.datasource.DefaultSource


================================================
FILE: src/main/scala/com/audienceproject/spark/dynamodb/attribute.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2018 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb

import scala.annotation.StaticAnnotation

final case class attribute(name: String) extends StaticAnnotation


================================================
FILE: src/main/scala/com/audienceproject/spark/dynamodb/catalyst/JavaConverter.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2019 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb.catalyst

import java.util

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.util.{ArrayData, MapData}
import org.apache.spark.sql.types._
import org.apache.spark.unsafe.types.UTF8String

import scala.collection.JavaConverters._

object JavaConverter {

    def convertRowValue(row: InternalRow, index: Int, elementType: DataType): Any = {
        elementType match {
            case ArrayType(innerType, _) => convertArray(row.getArray(index), innerType)
            case MapType(keyType, valueType, _) => convertMap(row.getMap(index), keyType, valueType)
            case StructType(fields) => convertStruct(row.getStruct(index, fields.length), fields)
            case StringType => row.getString(index)
            case LongType => row.getLong(index)
            case t: DecimalType => row.getDecimal(index, t.precision, t.scale).toBigDecimal
            case _ => row.get(index, elementType)
        }
    }

    def convertArray(array: ArrayData, elementType: DataType): Any = {
        elementType match {
            case ArrayType(innerType, _) => array.toSeq[ArrayData](elementType).map(convertArray(_, innerType)).asJava
            case MapType(keyType, valueType, _) => array.toSeq[MapData](elementType).map(convertMap(_, keyType, valueType)).asJava
            case structType: StructType => array.toSeq[InternalRow](structType).map(convertStruct(_, structType.fields)).asJava
            case StringType => convertStringArray(array).asJava
            case _ => array.toSeq[Any](elementType).asJava
        }
    }

    def convertMap(map: MapData, keyType: DataType, valueType: DataType): util.Map[String, Any] = {
        if (keyType != StringType) throw new IllegalArgumentException(
            s"Invalid Map key type '${keyType.typeName}'. DynamoDB only supports String as Map key type.")
        val keys = convertStringArray(map.keyArray())
        val values = valueType match {
            case ArrayType(innerType, _) => map.valueArray().toSeq[ArrayData](valueType).map(convertArray(_, innerType))
            case MapType(innerKeyType, innerValueType, _) => map.valueArray().toSeq[MapData](valueType).map(convertMap(_, innerKeyType, innerValueType))
            case structType: StructType => map.valueArray().toSeq[InternalRow](structType).map(convertStruct(_, structType.fields))
            case StringType => convertStringArray(map.valueArray())
            case _ => map.valueArray().toSeq[Any](valueType)
        }
        val kvPairs = for (i <- 0 until map.numElements()) yield keys(i) -> values(i)
        Map(kvPairs: _*).asJava
    }

    def convertStruct(row: InternalRow, fields: Seq[StructField]): util.Map[String, Any] = {
        val kvPairs = for (i <- 0 until row.numFields) yield
            if (row.isNullAt(i)) fields(i).name -> null
            else fields(i).name -> convertRowValue(row, i, fields(i).dataType)
        Map(kvPairs: _*).asJava
    }

    def convertStringArray(array: ArrayData): Seq[String] =
        array.toSeq[UTF8String](StringType).map(_.toString)

}
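
// Usage sketch (illustrative, not part of the original source): given an InternalRow
// `row` whose field 0 is an ArrayType(StringType) column,
//
//   JavaConverter.convertRowValue(row, 0, ArrayType(StringType))
//
// yields a java.util.List of String, usable directly as a DynamoDB attribute value.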


================================================
FILE: src/main/scala/com/audienceproject/spark/dynamodb/connector/ColumnSchema.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2019 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb.connector

import org.apache.spark.sql.types.{DataType, StructType}

private[dynamodb] class ColumnSchema(keySchema: KeySchema,
                                     sparkSchema: StructType) {

    type Attr = (String, Int, DataType)

    private val columnNames = sparkSchema.map(_.name)

    private val keyIndices = keySchema match {
        case KeySchema(hashKey, None) =>
            val hashKeyIndex = columnNames.indexOf(hashKey)
            val hashKeyType = sparkSchema(hashKey).dataType
            Left(hashKey, hashKeyIndex, hashKeyType)
        case KeySchema(hashKey, Some(rangeKey)) =>
            val hashKeyIndex = columnNames.indexOf(hashKey)
            val rangeKeyIndex = columnNames.indexOf(rangeKey)
            val hashKeyType = sparkSchema(hashKey).dataType
            val rangeKeyType = sparkSchema(rangeKey).dataType
            Right((hashKey, hashKeyIndex, hashKeyType), (rangeKey, rangeKeyIndex, rangeKeyType))
    }

    private val attributeIndices = columnNames.zipWithIndex.filterNot({
        case (name, _) => keySchema match {
            case KeySchema(hashKey, None) => name == hashKey
            case KeySchema(hashKey, Some(rangeKey)) => name == hashKey || name == rangeKey
        }
    }).map({
        case (name, index) => (name, index, sparkSchema(name).dataType)
    })

    def keys(): Either[Attr, (Attr, Attr)] = keyIndices

    def attributes(): Seq[Attr] = attributeIndices

}
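
// Usage sketch (illustrative, not part of the original source): the Either returned
// by keys() distinguishes hash-only tables from hash+range tables.
//
//   val columnSchema = new ColumnSchema(KeySchema("id", None), sparkSchema)
//   columnSchema.keys() match {
//       case Left((name, index, dataType)) => // hash key only
//       case Right((hashKey, rangeKey))    => // composite key: two Attr tuples
//   }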


================================================
FILE: src/main/scala/com/audienceproject/spark/dynamodb/connector/DynamoConnector.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2018 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb.connector

import com.amazonaws.auth.profile.ProfileCredentialsProvider
import com.amazonaws.auth.{AWSCredentialsProvider, AWSStaticCredentialsProvider, BasicSessionCredentials, DefaultAWSCredentialsProviderChain}
import com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration
import com.amazonaws.services.dynamodbv2.document.{DynamoDB, ItemCollection, ScanOutcome}
import com.amazonaws.services.dynamodbv2.{AmazonDynamoDB, AmazonDynamoDBAsync, AmazonDynamoDBAsyncClientBuilder, AmazonDynamoDBClientBuilder}
import com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClientBuilder
import com.amazonaws.services.securitytoken.model.AssumeRoleRequest
import org.apache.spark.sql.sources.Filter

private[dynamodb] trait DynamoConnector {

    @transient private lazy val properties = sys.props

    def getDynamoDB(region: Option[String] = None, roleArn: Option[String] = None, providerClassName: Option[String] = None): DynamoDB = {
        val client: AmazonDynamoDB = getDynamoDBClient(region, roleArn, providerClassName)
        new DynamoDB(client)
    }
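
    // Usage sketch (illustrative, not part of the original source): a connector
    // implementation obtaining a document-API client for a specific region while
    // assuming a role (the ARN below is a placeholder).
    //
    //   val dynamoDB = getDynamoDB(
    //       region = Some("eu-west-1"),
    //       roleArn = Some("arn:aws:iam::111122223333:role/ExampleRole"))
    //   val table = dynamoDB.getTable("someTable") // "someTable" is a placeholder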

    private def getDynamoDBClient(region: Option[String] = None,
                                  roleArn: Option[String] = None,
                                  providerClassName: Option[String]): AmazonDynamoDB = {
        val chosenRegion = region.getOrElse(properties.getOrElse("aws.dynamodb.region", "us-east-1"))
        val credentials = getCredentials(chosenRegion, roleArn, providerClassName)

        properties.get("aws.dynamodb.endpoint").map(endpoint => {
            AmazonDynamoDBClientBuilder.standard()
                .withCredentials(credentials)
                .withEndpointConfiguration(new EndpointConfiguration(endpoint, chosenRegion))
                .build()
        }).getOrElse(
            AmazonDynamoDBClientBuilder.standard()
                .withCredentials(credentials)
                .withRegion(chosenRegion)
                .build()
        )
    }

    def getDynamoDBAsyncClient(region: Option[String] = None,
                               roleArn: Option[String] = None,
                               providerClassName: Option[String] = None): AmazonDynamoDBAsync = {
        val chosenRegion = region.getOrElse(properties.getOrElse("aws.dynamodb.region", "us-east-1"))
        val credentials = getCredentials(chosenRegion, roleArn, providerClassName)

        properties.get("aws.dynamodb.endpoint").map(endpoint => {
            AmazonDynamoDBAsyncClientBuilder.standard()
                .withCredentials(credentials)
                .withEndpointConfiguration(new EndpointConfiguration(endpoint, chosenRegion))
                .build()
        }).getOrElse(
            AmazonDynamoDBAsyncClientBuilder.standard()
                .withCredentials(credentials)
                .withRegion(chosenRegion)
                .build()
        )
    }

    /**
     * Gets credentials from an instance of the given provider class name,
     * or by assuming the role given by the ARN,
     * or from the configured AWS profile,
     * or falls back to the default credential provider chain.
     **/
    private def getCredentials(chosenRegion: String, roleArn: Option[String], providerClassName: Option[String]) = {
        providerClassName.map(providerClass => {
            Class.forName(providerClass).newInstance.asInstanceOf[AWSCredentialsProvider]
        }).orElse(roleArn.map(arn => {
            val stsClient = properties.get("aws.sts.endpoint").map(endpoint => {
                AWSSecurityTokenServiceClientBuilder
                    .standard()
                    .withCredentials(new DefaultAWSCredentialsProviderChain)
                    .withEndpointConfiguration(new EndpointConfiguration(endpoint, chosenRegion))
                    .build()
            }).getOrElse(
                // STS without an endpoint will sign from the region, but use the global endpoint
                AWSSecurityTokenServiceClientBuilder
                    .standard()
                    .withCredentials(new DefaultAWSCredentialsProviderChain)
                    .withRegion(chosenRegion)
                    .build()
            )
            val assumeRoleResult = stsClient.assumeRole(
                new AssumeRoleRequest()
                    .withRoleSessionName("DynamoDBAssumed")
                    .withRoleArn(arn)
            )
            val stsCredentials = assumeRoleResult.getCredentials
            val assumeCreds = new BasicSessionCredentials(
                stsCredentials.getAccessKeyId,
                stsCredentials.getSecretAccessKey,
                stsCredentials.getSessionToken
            )
            new AWSStaticCredentialsProvider(assumeCreds)
        })).orElse(properties.get("aws.profile").map(new ProfileCredentialsProvider(_)))
            .getOrElse(new DefaultAWSCredentialsProviderChain)
    }

    val keySchema: KeySchema

    val readLimit: Double

    val itemLimit: Int

    val totalSegments: Int

    val filterPushdownEnabled: Boolean

    def scan(segmentNum: Int, columns: Seq[String], filters: Seq[Filter]): ItemCollection[ScanOutcome]

    def isEmpty: Boolean = itemLimit == 0

    def nonEmpty: Boolean = !isEmpty

}


================================================
FILE: src/main/scala/com/audienceproject/spark/dynamodb/connector/DynamoWritable.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2018 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb.connector

import com.amazonaws.services.dynamodbv2.document.DynamoDB
import com.audienceproject.shaded.google.common.util.concurrent.RateLimiter
import org.apache.spark.sql.catalyst.InternalRow

private[dynamodb] trait DynamoWritable {

    val writeLimit: Double

    def putItems(columnSchema: ColumnSchema, items: Seq[InternalRow])
                (client: DynamoDB, rateLimiter: RateLimiter): Unit

    def updateItem(columnSchema: ColumnSchema, item: InternalRow)
                  (client: DynamoDB, rateLimiter: RateLimiter): Unit

    def deleteItems(columnSchema: ColumnSchema, items: Seq[InternalRow])
                   (client: DynamoDB, rateLimiter: RateLimiter): Unit

}


================================================
FILE: src/main/scala/com/audienceproject/spark/dynamodb/connector/FilterPushdown.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2018 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb.connector

import com.amazonaws.services.dynamodbv2.xspec.ExpressionSpecBuilder.{BOOL => newBOOL, N => newN, S => newS, _}
import com.amazonaws.services.dynamodbv2.xspec._
import org.apache.spark.sql.sources._

private[dynamodb] object FilterPushdown {

    def apply(filters: Seq[Filter]): Condition =
        filters.map(buildCondition).map(parenthesize).reduce[Condition](_ and _)

    /**
      * Accepts only filters that would be considered valid input to FilterPushdown.apply()
      *
      * @param filters input list which may contain both valid and invalid filters
      * @return a (valid, invalid) partitioning of the input filters
      */
    def acceptFilters(filters: Array[Filter]): (Array[Filter], Array[Filter]) =
        filters.partition(checkFilter)

    private def checkFilter(filter: Filter): Boolean = filter match {
        case _: StringEndsWith => false
        case And(left, right) => checkFilter(left) && checkFilter(right)
        case Or(left, right) => checkFilter(left) && checkFilter(right)
        case Not(f) => checkFilter(f)
        case _ => true
    }
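
    // Usage sketch (illustrative, not part of the original source): partition Spark
    // filters before translating the supported ones into a DynamoDB condition.
    //
    //   val filters: Array[Filter] = Array(EqualTo("color", "blue"), StringEndsWith("name", "x"))
    //   val (pushed, rejected) = FilterPushdown.acceptFilters(filters)
    //   // pushed == Array(EqualTo("color", "blue")); rejected == Array(StringEndsWith("name", "x"))
    //   val condition = FilterPushdown(pushed) // "color = 'blue'" as an xspec Condition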

    private def buildCondition(filter: Filter): Condition = filter match {
        case EqualTo(path, value: Boolean) => newBOOL(path).eq(value)
        case EqualTo(path, value) => coerceAndApply(_ eq _, _ eq _)(path, value)

        case GreaterThan(path, value) => coerceAndApply(_ gt _, _ gt _)(path, value)
        case GreaterThanOrEqual(path, value) => coerceAndApply(_ ge _, _ ge _)(path, value)

        case LessThan(path, value) => coerceAndApply(_ lt _, _ lt _)(path, value)
        case LessThanOrEqual(path, value) => coerceAndApply(_ le _, _ le _)(path, value)

        case In(path, values) =>
            val valueList = values.toList
            valueList match {
                case (_: String) :: _ => newS(path).in(valueList.asInstanceOf[List[String]]: _*)
                case (_: Boolean) :: _ => newBOOL(path).in(valueList.asInstanceOf[List[Boolean]]: _*)
                case (_: Int) :: _ => newN(path).in(valueList.map(_.asInstanceOf[Number]): _*)
                case (_: Long) :: _ => newN(path).in(valueList.map(_.asInstanceOf[Number]): _*)
                case (_: Short) :: _ => newN(path).in(valueList.map(_.asInstanceOf[Number]): _*)
                case (_: Float) :: _ => newN(path).in(valueList.map(_.asInstanceOf[Number]): _*)
                case (_: Double) :: _ => newN(path).in(valueList.map(_.asInstanceOf[Number]): _*)
                case Nil => throw new IllegalArgumentException("Unable to apply `In` filter with empty value list")
                case _ => throw new IllegalArgumentException(s"Type of values supplied to `In` filter on attribute $path not supported by filter pushdown")
            }

        case IsNull(path) => attribute_not_exists(path)
        case IsNotNull(path) => attribute_exists(path)

        case StringStartsWith(path, value) => newS(path).beginsWith(value)
        case StringContains(path, value) => newS(path).contains(value)
        case StringEndsWith(_, _) => throw new UnsupportedOperationException("Filter `StringEndsWith` is not supported by DynamoDB")

        case And(left, right) => parenthesize(buildCondition(left)) and parenthesize(buildCondition(right))
        case Or(left, right) => parenthesize(buildCondition(left)) or parenthesize(buildCondition(right))
        case Not(f) => parenthesize(buildCondition(f)).negate()
    }

    private def coerceAndApply(stringOp: (S, String) => Condition, numOp: (N, Number) => Condition)
                              (path: String, value: Any): Condition = value match {
        case string: String => stringOp(newS(path), string)
        case number: Int => numOp(newN(path), number)
        case number: Long => numOp(newN(path), number)
        case number: Short => numOp(newN(path), number)
        case number: Float => numOp(newN(path), number)
        case number: Double => numOp(newN(path), number)
        case _ => throw new IllegalArgumentException(s"Type of operand given to filter on attribute $path not supported by filter pushdown")
    }

}
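
// A minimal usage sketch, assuming two pushable filters on hypothetical attributes `color` and `size`:
//
//   import org.apache.spark.sql.sources.{EqualTo, GreaterThan}
//   import com.amazonaws.services.dynamodbv2.xspec.ExpressionSpecBuilder
//
//   val condition = FilterPushdown(Seq(EqualTo("color", "blue"), GreaterThan("size", 42)))
//   val xspec = new ExpressionSpecBuilder().withCondition(condition).buildForScan()
//   // xspec.getFilterExpression is a condition expression of the form "((#0 = :0) AND (#1 > :1))",
//   // with attribute names and values substituted by the SDK at request time.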


================================================
FILE: src/main/scala/com/audienceproject/spark/dynamodb/connector/KeySchema.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2019 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb.connector

import com.amazonaws.services.dynamodbv2.model.{KeySchemaElement, KeyType}

private[dynamodb] case class KeySchema(hashKeyName: String, rangeKeyName: Option[String])

private[dynamodb] object KeySchema {

    def fromDescription(keySchemaElements: Seq[KeySchemaElement]): KeySchema = {
        val hashKeyName = keySchemaElements.find(_.getKeyType == KeyType.HASH.toString).get.getAttributeName
        val rangeKeyName = keySchemaElements.find(_.getKeyType == KeyType.RANGE.toString).map(_.getAttributeName)
        KeySchema(hashKeyName, rangeKeyName)
    }
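
    // e.g. fromDescription(Seq(new KeySchemaElement("userId", KeyType.HASH))) == KeySchema("userId", None)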

}


================================================
FILE: src/main/scala/com/audienceproject/spark/dynamodb/connector/TableConnector.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2018 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb.connector

import com.amazonaws.services.dynamodbv2.document._
import com.amazonaws.services.dynamodbv2.document.spec.{BatchWriteItemSpec, ScanSpec, UpdateItemSpec}
import com.amazonaws.services.dynamodbv2.model.ReturnConsumedCapacity
import com.amazonaws.services.dynamodbv2.xspec.ExpressionSpecBuilder
import com.audienceproject.shaded.google.common.util.concurrent.RateLimiter
import com.audienceproject.spark.dynamodb.catalyst.JavaConverter
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.sources.Filter

import scala.annotation.tailrec
import scala.collection.JavaConverters._

private[dynamodb] class TableConnector(tableName: String, parallelism: Int, parameters: Map[String, String])
    extends DynamoConnector with DynamoWritable with Serializable {

    private val consistentRead = parameters.getOrElse("stronglyconsistentreads", "false").toBoolean
    private val filterPushdown = parameters.getOrElse("filterpushdown", "true").toBoolean
    private val region = parameters.get("region")
    private val roleArn = parameters.get("rolearn")
    private val providerClassName = parameters.get("providerclassname")

    override val filterPushdownEnabled: Boolean = filterPushdown

    override val (keySchema, readLimit, writeLimit, itemLimit, totalSegments) = {
        val table = getDynamoDB(region, roleArn, providerClassName).getTable(tableName)
        val desc = table.describe()

        // Key schema.
        val keySchema = KeySchema.fromDescription(desc.getKeySchema.asScala)

        // User parameters.
        val bytesPerRCU = parameters.getOrElse("bytesperrcu", "4000").toInt
        val maxPartitionBytes = parameters.getOrElse("maxpartitionbytes", "128000000").toInt
        val targetCapacity = parameters.getOrElse("targetcapacity", "1").toDouble
        val readFactor = if (consistentRead) 1 else 2

        // Table parameters.
        val tableSize = desc.getTableSizeBytes
        val itemCount = desc.getItemCount

        // Partitioning calculation.
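        // At least one partition per `maxPartitionBytes` of table data, rounded up to a
        // multiple of `parallelism` so that scan segments divide evenly among Spark tasks.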
        val numPartitions = parameters.get("readpartitions").map(_.toInt).getOrElse({
            val sizeBased = (tableSize / maxPartitionBytes).toInt max 1
            val remainder = sizeBased % parallelism
            if (remainder > 0) sizeBased + (parallelism - remainder)
            else sizeBased
        })

        // Provisioned or on-demand throughput.
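        // The `throughput` option takes precedence; otherwise use the table's provisioned
        // capacity, falling back to 100 units for on-demand tables (which report 0).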
        val readThroughput = parameters.getOrElse("throughput", Option(desc.getProvisionedThroughput.getReadCapacityUnits)
            .filter(_ > 0).map(_.longValue().toString)
            .getOrElse("100")).toLong
        val writeThroughput = parameters.getOrElse("throughput", Option(desc.getProvisionedThroughput.getWriteCapacityUnits)
            .filter(_ > 0).map(_.longValue().toString)
            .getOrElse("100")).toLong

        // Rate limit calculation.
        val avgItemSize = tableSize.toDouble / itemCount
        val readCapacity = readThroughput * targetCapacity
        val writeCapacity = writeThroughput * targetCapacity

        val readLimit = readCapacity / parallelism
        val itemLimit = ((bytesPerRCU / avgItemSize * readLimit).toInt * readFactor) max 1
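        // e.g. with avgItemSize = 1000 bytes, bytesPerRCU = 4000, readLimit = 25 and eventually
        // consistent reads (readFactor = 2): itemLimit = (4000 / 1000 * 25).toInt * 2 = 200 items per page.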

        val writeLimit = writeCapacity / parallelism

        (keySchema, readLimit, writeLimit, itemLimit, numPartitions)
    }

    override def scan(segmentNum: Int, columns: Seq[String], filters: Seq[Filter]): ItemCollection[ScanOutcome] = {
        val scanSpec = new ScanSpec()
            .withSegment(segmentNum)
            .withTotalSegments(totalSegments)
            .withMaxPageSize(itemLimit)
            .withReturnConsumedCapacity(ReturnConsumedCapacity.TOTAL)
            .withConsistentRead(consistentRead)

        if (columns.nonEmpty) {
            val xspec = new ExpressionSpecBuilder().addProjections(columns: _*)

            if (filters.nonEmpty && filterPushdown) {
                xspec.withCondition(FilterPushdown(filters))
            }

            scanSpec.withExpressionSpec(xspec.buildForScan())
        }

        getDynamoDB(region, roleArn, providerClassName).getTable(tableName).scan(scanSpec)
    }

    override def putItems(columnSchema: ColumnSchema, items: Seq[InternalRow])
                         (client: DynamoDB, rateLimiter: RateLimiter): Unit = {
        // For each batch.
        val batchWriteItemSpec = new BatchWriteItemSpec().withReturnConsumedCapacity(ReturnConsumedCapacity.TOTAL)
        batchWriteItemSpec.withTableWriteItems(new TableWriteItems(tableName).withItemsToPut(
            // Map the items.
            items.map(row => {
                val item = new Item()

                // Map primary key.
                columnSchema.keys() match {
                    case Left((hashKey, hashKeyIndex, hashKeyType)) =>
                        item.withPrimaryKey(hashKey, JavaConverter.convertRowValue(row, hashKeyIndex, hashKeyType))
                    case Right(((hashKey, hashKeyIndex, hashKeyType), (rangeKey, rangeKeyIndex, rangeKeyType))) =>
                        val hashKeyValue = JavaConverter.convertRowValue(row, hashKeyIndex, hashKeyType)
                        val rangeKeyValue = JavaConverter.convertRowValue(row, rangeKeyIndex, rangeKeyType)
                        item.withPrimaryKey(hashKey, hashKeyValue, rangeKey, rangeKeyValue)
                }

                // Map remaining columns.
                columnSchema.attributes().foreach({
                    case (name, index, dataType) if !row.isNullAt(index) =>
                        item.`with`(name, JavaConverter.convertRowValue(row, index, dataType))
                    case _ =>
                })

                item
            }): _*
        ))

        val response = client.batchWriteItem(batchWriteItemSpec)
        handleBatchWriteResponse(client, rateLimiter)(response)
    }

    override def updateItem(columnSchema: ColumnSchema, row: InternalRow)
                           (client: DynamoDB, rateLimiter: RateLimiter): Unit = {
        val updateItemSpec = new UpdateItemSpec().withReturnConsumedCapacity(ReturnConsumedCapacity.TOTAL)

        // Map primary key.
        columnSchema.keys() match {
            case Left((hashKey, hashKeyIndex, hashKeyType)) =>
                updateItemSpec.withPrimaryKey(hashKey, JavaConverter.convertRowValue(row, hashKeyIndex, hashKeyType))
            case Right(((hashKey, hashKeyIndex, hashKeyType), (rangeKey, rangeKeyIndex, rangeKeyType))) =>
                val hashKeyValue = JavaConverter.convertRowValue(row, hashKeyIndex, hashKeyType)
                val rangeKeyValue = JavaConverter.convertRowValue(row, rangeKeyIndex, rangeKeyType)
                updateItemSpec.withPrimaryKey(hashKey, hashKeyValue, rangeKey, rangeKeyValue)
        }

        // Map remaining columns.
        val attributeUpdates = columnSchema.attributes().collect({
            case (name, index, dataType) if !row.isNullAt(index) =>
                new AttributeUpdate(name).put(JavaConverter.convertRowValue(row, index, dataType))
        })

        updateItemSpec.withAttributeUpdate(attributeUpdates: _*)

        // Update item and rate limit on write capacity.
        val response = client.getTable(tableName).updateItem(updateItemSpec)
        Option(response.getUpdateItemResult.getConsumedCapacity)
            .foreach(cap => rateLimiter.acquire(cap.getCapacityUnits.toInt max 1))
    }

    override def deleteItems(columnSchema: ColumnSchema, items: Seq[InternalRow])
                            (client: DynamoDB, rateLimiter: RateLimiter): Unit = {
        // For each batch.
        val batchWriteItemSpec = new BatchWriteItemSpec().withReturnConsumedCapacity(ReturnConsumedCapacity.TOTAL)

        val tableWriteItems = new TableWriteItems(tableName)
        // Check if the key schema is hash-only or hash-and-range.
        val tableWriteItemsWithItems: TableWriteItems =
            columnSchema.keys() match {
                case Left((hashKey, hashKeyIndex, hashKeyType)) =>
                    val hashKeys = items.map(row =>
                        JavaConverter.convertRowValue(row, hashKeyIndex, hashKeyType).asInstanceOf[AnyRef])
                    tableWriteItems.withHashOnlyKeysToDelete(hashKey, hashKeys: _*)
                case Right(((hashKey, hashKeyIndex, hashKeyType), (rangeKey, rangeKeyIndex, rangeKeyType))) =>
                    val alternatingHashAndRangeKeys = items.flatMap { row =>
                        val hashKeyValue = JavaConverter.convertRowValue(row, hashKeyIndex, hashKeyType)
                        val rangeKeyValue = JavaConverter.convertRowValue(row, rangeKeyIndex, rangeKeyType)
                        Seq(hashKeyValue.asInstanceOf[AnyRef], rangeKeyValue.asInstanceOf[AnyRef])
                    }
                    tableWriteItems.withHashAndRangeKeysToDelete(hashKey, rangeKey, alternatingHashAndRangeKeys: _*)
            }

        batchWriteItemSpec.withTableWriteItems(tableWriteItemsWithItems)

        val response = client.batchWriteItem(batchWriteItemSpec)
        handleBatchWriteResponse(client, rateLimiter)(response)
    }

    @tailrec
    private def handleBatchWriteResponse(client: DynamoDB, rateLimiter: RateLimiter)
                                        (response: BatchWriteItemOutcome): Unit = {
        // Rate limit on write capacity.
        if (response.getBatchWriteItemResult.getConsumedCapacity != null) {
            response.getBatchWriteItemResult.getConsumedCapacity.asScala.map(cap => {
                cap.getTableName -> cap.getCapacityUnits.toInt
            }).toMap.get(tableName).foreach(units => rateLimiter.acquire(units max 1))
        }
        // Retry unprocessed items.
        if (response.getUnprocessedItems != null && !response.getUnprocessedItems.isEmpty) {
            val newResponse = client.batchWriteItemUnprocessed(response.getUnprocessedItems)
            handleBatchWriteResponse(client, rateLimiter)(newResponse)
        }
    }

}


================================================
FILE: src/main/scala/com/audienceproject/spark/dynamodb/connector/TableIndexConnector.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2018 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb.connector

import com.amazonaws.services.dynamodbv2.document.spec.ScanSpec
import com.amazonaws.services.dynamodbv2.document.{ItemCollection, ScanOutcome}
import com.amazonaws.services.dynamodbv2.model.ReturnConsumedCapacity
import com.amazonaws.services.dynamodbv2.xspec.ExpressionSpecBuilder
import org.apache.spark.sql.sources.Filter

import scala.collection.JavaConverters._

private[dynamodb] class TableIndexConnector(tableName: String, indexName: String, parallelism: Int, parameters: Map[String, String])
    extends DynamoConnector with Serializable {

    // Option keys are lower-cased, as they originate from Spark's CaseInsensitiveStringMap.
    private val consistentRead = parameters.getOrElse("stronglyconsistentreads", "false").toBoolean
    private val filterPushdown = parameters.getOrElse("filterpushdown", "true").toBoolean
    private val region = parameters.get("region")
    private val roleArn = parameters.get("rolearn")
    private val providerClassName = parameters.get("providerclassname")

    override val filterPushdownEnabled: Boolean = filterPushdown

    override val (keySchema, readLimit, itemLimit, totalSegments) = {
        val table = getDynamoDB(region, roleArn, providerClassName).getTable(tableName)
        val indexDesc = table.describe().getGlobalSecondaryIndexes.asScala.find(_.getIndexName == indexName).get

        // Key schema.
        val keySchema = KeySchema.fromDescription(indexDesc.getKeySchema.asScala)

        // User parameters.
        val bytesPerRCU = parameters.getOrElse("bytesperrcu", "4000").toInt
        val maxPartitionBytes = parameters.getOrElse("maxpartitionbytes", "128000000").toInt
        val targetCapacity = parameters.getOrElse("targetcapacity", "1").toDouble
        val readFactor = if (consistentRead) 1 else 2

        // Table parameters.
        val indexSize = indexDesc.getIndexSizeBytes
        val itemCount = indexDesc.getItemCount

        // Partitioning calculation.
        val numPartitions = parameters.get("readpartitions").map(_.toInt).getOrElse({
            val sizeBased = (indexSize / maxPartitionBytes).toInt max 1
            val remainder = sizeBased % parallelism
            if (remainder > 0) sizeBased + (parallelism - remainder)
            else sizeBased
        })

        // Provisioned or on-demand throughput.
        val readThroughput = parameters.getOrElse("throughput", Option(indexDesc.getProvisionedThroughput.getReadCapacityUnits)
            .filter(_ > 0).map(_.longValue().toString)
            .getOrElse("100")).toLong

        // Rate limit calculation.
        val avgItemSize = indexSize.toDouble / itemCount
        val readCapacity = readThroughput * targetCapacity

        val readLimit = readCapacity / parallelism
        val itemLimit = ((bytesPerRCU / avgItemSize * readLimit).toInt * readFactor) max 1

        (keySchema, readLimit, itemLimit, numPartitions)
    }

    override def scan(segmentNum: Int, columns: Seq[String], filters: Seq[Filter]): ItemCollection[ScanOutcome] = {
        val scanSpec = new ScanSpec()
            .withSegment(segmentNum)
            .withTotalSegments(totalSegments)
            .withMaxPageSize(itemLimit)
            .withReturnConsumedCapacity(ReturnConsumedCapacity.TOTAL)
            .withConsistentRead(consistentRead)

        if (columns.nonEmpty) {
            val xspec = new ExpressionSpecBuilder().addProjections(columns: _*)

            if (filters.nonEmpty && filterPushdown) {
                xspec.withCondition(FilterPushdown(filters))
            }

            scanSpec.withExpressionSpec(xspec.buildForScan())
        }

        getDynamoDB(region, roleArn, providerClassName).getTable(tableName).getIndex(indexName).scan(scanSpec)
    }

}


================================================
FILE: src/main/scala/com/audienceproject/spark/dynamodb/datasource/DefaultSource.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2019 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb.datasource

import java.util

import org.apache.spark.sql.connector.catalog.{Table, TableProvider}
import org.apache.spark.sql.connector.expressions.Transform
import org.apache.spark.sql.sources.DataSourceRegister
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap

class DefaultSource extends TableProvider with DataSourceRegister {

    override def getTable(schema: StructType,
                          partitioning: Array[Transform],
                          properties: util.Map[String, String]): Table = {
        new DynamoTable(new CaseInsensitiveStringMap(properties), Some(schema))
    }

    override def inferSchema(options: CaseInsensitiveStringMap): StructType = {
        new DynamoTable(options).schema()
    }

    override def supportsExternalMetadata(): Boolean = true

    override def shortName(): String = "dynamodb"

}


================================================
FILE: src/main/scala/com/audienceproject/spark/dynamodb/datasource/DynamoBatchReader.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2019 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb.datasource

import com.audienceproject.spark.dynamodb.connector.DynamoConnector
import org.apache.spark.sql.connector.read._
import org.apache.spark.sql.connector.read.partitioning.Partitioning
import org.apache.spark.sql.sources.Filter
import org.apache.spark.sql.types.StructType

class DynamoBatchReader(connector: DynamoConnector,
                        filters: Array[Filter],
                        schema: StructType)
    extends Scan with Batch with SupportsReportPartitioning {

    override def readSchema(): StructType = schema

    override def toBatch: Batch = this

    override def planInputPartitions(): Array[InputPartition] = {
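        // One partition per parallel scan segment; columns and filters are forwarded for server-side evaluation.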
        val requiredColumns = schema.map(_.name)
        Array.tabulate(connector.totalSegments)(new ScanPartition(_, requiredColumns, filters))
    }

    override def createReaderFactory(): PartitionReaderFactory =
        new DynamoReaderFactory(connector, schema)

    override val outputPartitioning: Partitioning = new OutputPartitioning(connector.totalSegments)

}


================================================
FILE: src/main/scala/com/audienceproject/spark/dynamodb/datasource/DynamoDataDeleteWriter.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2020 AudienceProject. All rights reserved.
  */

package com.audienceproject.spark.dynamodb.datasource

import com.amazonaws.services.dynamodbv2.document.DynamoDB
import com.audienceproject.spark.dynamodb.connector.{ColumnSchema, TableConnector}

class DynamoDataDeleteWriter(batchSize: Int,
                             columnSchema: ColumnSchema,
                             connector: TableConnector,
                             client: DynamoDB)
    extends DynamoDataWriter(batchSize, columnSchema, connector, client) {

    protected override def flush(): Unit = {
        if (buffer.nonEmpty) {
            connector.deleteItems(columnSchema, buffer)(client, rateLimiter)
            buffer.clear()
        }
    }

}


================================================
FILE: src/main/scala/com/audienceproject/spark/dynamodb/datasource/DynamoDataUpdateWriter.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2019 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb.datasource

import com.amazonaws.services.dynamodbv2.document.DynamoDB
import com.audienceproject.shaded.google.common.util.concurrent.RateLimiter
import com.audienceproject.spark.dynamodb.connector.{ColumnSchema, TableConnector}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.write.{DataWriter, WriterCommitMessage}

class DynamoDataUpdateWriter(columnSchema: ColumnSchema,
                             connector: TableConnector,
                             client: DynamoDB)
    extends DataWriter[InternalRow] {

    private val rateLimiter = RateLimiter.create(connector.writeLimit)

    override def write(record: InternalRow): Unit = {
        connector.updateItem(columnSchema, record)(client, rateLimiter)
    }

    override def commit(): WriterCommitMessage = new WriterCommitMessage {}

    override def abort(): Unit = {}

    override def close(): Unit = client.shutdown()

}


================================================
FILE: src/main/scala/com/audienceproject/spark/dynamodb/datasource/DynamoDataWriter.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2019 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb.datasource

import com.amazonaws.services.dynamodbv2.document.DynamoDB
import com.audienceproject.shaded.google.common.util.concurrent.RateLimiter
import com.audienceproject.spark.dynamodb.connector.{ColumnSchema, TableConnector}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.write.{DataWriter, WriterCommitMessage}

import scala.collection.mutable.ArrayBuffer

class DynamoDataWriter(batchSize: Int,
                       columnSchema: ColumnSchema,
                       connector: TableConnector,
                       client: DynamoDB)
    extends DataWriter[InternalRow] {

    protected val buffer: ArrayBuffer[InternalRow] = new ArrayBuffer[InternalRow](batchSize)
    protected val rateLimiter: RateLimiter = RateLimiter.create(connector.writeLimit)

    override def write(record: InternalRow): Unit = {
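        // Spark reuses InternalRow instances between calls, so the record must be copied before buffering.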
        buffer += record.copy()
        if (buffer.size == batchSize) {
            flush()
        }
    }

    override def commit(): WriterCommitMessage = {
        flush()
        new WriterCommitMessage {}
    }

    override def abort(): Unit = {}

    override def close(): Unit = client.shutdown()

    protected def flush(): Unit = {
        if (buffer.nonEmpty) {
            connector.putItems(columnSchema, buffer)(client, rateLimiter)
            buffer.clear()
        }
    }

}


================================================
FILE: src/main/scala/com/audienceproject/spark/dynamodb/datasource/DynamoReaderFactory.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2019 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb.datasource

import com.amazonaws.services.dynamodbv2.document.Item
import com.audienceproject.shaded.google.common.util.concurrent.RateLimiter
import com.audienceproject.spark.dynamodb.connector.DynamoConnector
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.read.{InputPartition, PartitionReader, PartitionReaderFactory}
import org.apache.spark.sql.types.{StructField, StructType}

import scala.collection.JavaConverters._

class DynamoReaderFactory(connector: DynamoConnector,
                          schema: StructType)
    extends PartitionReaderFactory {

    override def createReader(partition: InputPartition): PartitionReader[InternalRow] = {
        if (connector.isEmpty) new EmptyReader
        else new ScanPartitionReader(partition.asInstanceOf[ScanPartition])
    }

    private class EmptyReader extends PartitionReader[InternalRow] {
        override def next(): Boolean = false

        override def get(): InternalRow = throw new IllegalStateException("Unable to call get() on empty iterator")

        override def close(): Unit = {}
    }

    private class ScanPartitionReader(scanPartition: ScanPartition) extends PartitionReader[InternalRow] {

        import scanPartition._

        private val pageIterator = connector.scan(partitionIndex, requiredColumns, filters).pages().iterator().asScala
        private val rateLimiter = RateLimiter.create(connector.readLimit)

        private var innerIterator: Iterator[InternalRow] = Iterator.empty

        private var currentRow: InternalRow = _
        private var proceed = false
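
        // `next()` reports availability and sets `proceed`; `get()` advances the inner iterator
        // only once per `next()`, so repeated `get()` calls return the same row.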

        private val typeConversions = schema.collect({
            case StructField(name, dataType, _, _) => name -> TypeConversion(name, dataType)
        }).toMap

        override def next(): Boolean = {
            proceed = true
            innerIterator.hasNext || {
                if (pageIterator.hasNext) {
                    nextPage()
                    next()
                }
                else false
            }
        }

        override def get(): InternalRow = {
            if (proceed) {
                currentRow = innerIterator.next()
                proceed = false
            }
            currentRow
        }

        override def close(): Unit = {}

        private def nextPage(): Unit = {
            val page = pageIterator.next()
            val result = page.getLowLevelResult
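            // Throttle on the read capacity consumed by this page before converting its items.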
            Option(result.getScanResult.getConsumedCapacity).foreach(cap => rateLimiter.acquire(cap.getCapacityUnits.toInt max 1))
            innerIterator = result.getItems.iterator().asScala.map(itemToRow(requiredColumns))
        }

        private def itemToRow(requiredColumns: Seq[String])(item: Item): InternalRow =
            if (requiredColumns.nonEmpty) InternalRow.fromSeq(requiredColumns.map(columnName => typeConversions(columnName)(item)))
            else InternalRow.fromSeq(item.asMap().asScala.values.toSeq.map(_.toString))

    }

}


================================================
FILE: src/main/scala/com/audienceproject/spark/dynamodb/datasource/DynamoScanBuilder.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2019 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb.datasource

import com.audienceproject.spark.dynamodb.connector.{DynamoConnector, FilterPushdown}
import org.apache.spark.sql.connector.read._
import org.apache.spark.sql.sources.Filter
import org.apache.spark.sql.types._

class DynamoScanBuilder(connector: DynamoConnector, schema: StructType)
    extends ScanBuilder
        with SupportsPushDownRequiredColumns
        with SupportsPushDownFilters {

    private var acceptedFilters: Array[Filter] = Array.empty
    private var currentSchema: StructType = schema

    override def build(): Scan = new DynamoBatchReader(connector, pushedFilters(), currentSchema)

    override def pruneColumns(requiredSchema: StructType): Unit = {
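        // The table's key columns are always retained in the pruned schema, even if Spark did not request them.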
        val keyFields = Seq(Some(connector.keySchema.hashKeyName), connector.keySchema.rangeKeyName).flatten
            .flatMap(keyName => currentSchema.fields.find(_.name == keyName))
        val requiredFields = keyFields ++ requiredSchema.fields
        val newFields = currentSchema.fields.filter(requiredFields.contains)
        currentSchema = StructType(newFields)
    }

    override def pushFilters(filters: Array[Filter]): Array[Filter] = {
        if (connector.filterPushdownEnabled) {
            val (acceptedFilters, postScanFilters) = FilterPushdown.acceptFilters(filters)
            this.acceptedFilters = acceptedFilters
            postScanFilters // Return filters that need to be evaluated after scanning.
        } else filters
    }

    override def pushedFilters(): Array[Filter] = acceptedFilters

}


================================================
FILE: src/main/scala/com/audienceproject/spark/dynamodb/datasource/DynamoTable.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2019 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb.datasource

import java.util

import com.audienceproject.spark.dynamodb.connector.{TableConnector, TableIndexConnector}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.connector.catalog._
import org.apache.spark.sql.connector.read.ScanBuilder
import org.apache.spark.sql.connector.write.{LogicalWriteInfo, WriteBuilder}
import org.apache.spark.sql.types._
import org.apache.spark.sql.util.CaseInsensitiveStringMap
import org.slf4j.LoggerFactory

import scala.collection.JavaConverters._

class DynamoTable(options: CaseInsensitiveStringMap,
                  userSchema: Option[StructType] = None)
    extends Table
        with SupportsRead
        with SupportsWrite {

    private val logger = LoggerFactory.getLogger(this.getClass)

    private val dynamoConnector = {
        val indexName = Option(options.get("indexname"))
        val defaultParallelism = Option(options.get("defaultparallelism")).map(_.toInt).getOrElse(getDefaultParallelism)
        val optionsMap = Map(options.asScala.toSeq: _*)

        if (indexName.isDefined) new TableIndexConnector(name(), indexName.get, defaultParallelism, optionsMap)
        else new TableConnector(name(), defaultParallelism, optionsMap)
    }

    override def name(): String = options.get("tablename")

    override def schema(): StructType = userSchema.getOrElse(inferSchema())

    override def capabilities(): util.Set[TableCapability] =
        Set(TableCapability.BATCH_READ, TableCapability.BATCH_WRITE, TableCapability.ACCEPT_ANY_SCHEMA).asJava

    override def newScanBuilder(options: CaseInsensitiveStringMap): ScanBuilder = {
        new DynamoScanBuilder(dynamoConnector, schema())
    }

    override def newWriteBuilder(info: LogicalWriteInfo): WriteBuilder = {
        val parameters = Map(info.options().asScala.toSeq: _*)
        dynamoConnector match {
            case tableConnector: TableConnector => new DynamoWriteBuilder(tableConnector, parameters, info.schema())
            case _ => throw new RuntimeException("Unable to write to a GSI, please omit `indexName` option.")
        }
    }

    private def getDefaultParallelism: Int =
        SparkSession.getActiveSession match {
            case Some(spark) => spark.sparkContext.defaultParallelism
            case None =>
                logger.warn("Unable to read defaultParallelism from SparkSession." +
                    " Parallelism will be 1 unless overridden with the option `defaultParallelism`.")
                1
        }

    private def inferSchema(): StructType = {
        val inferenceItems =
            if (dynamoConnector.nonEmpty && options.getBoolean("inferSchema", true))
                dynamoConnector.scan(0, Seq.empty, Seq.empty).firstPage().getLowLevelResult.getItems.asScala
            else Seq.empty

        val typeMapping = inferenceItems.foldLeft(Map[String, DataType]())({
            case (map, item) => map ++ item.asMap().asScala.mapValues(inferType)
        })
        val typeSeq = typeMapping.map({ case (name, sparkType) => StructField(name, sparkType) }).toSeq

        if (typeSeq.size > 100) throw new RuntimeException("Schema inference not possible, too many attributes in table.")

        StructType(typeSeq)
    }

    private def inferType(value: Any): DataType = value match {
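        // DynamoDB numbers arrive as BigDecimal; pick the narrowest Spark numeric type that fits.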
        case number: java.math.BigDecimal =>
            if (number.scale() == 0) {
                if (number.precision() < 10) IntegerType
                else if (number.precision() < 19) LongType
                else DataTypes.createDecimalType(number.precision(), number.scale())
            }
            else DoubleType
        case list: java.util.ArrayList[_] =>
            if (list.isEmpty) ArrayType(StringType)
            else ArrayType(inferType(list.get(0)))
        case set: java.util.Set[_] =>
            if (set.isEmpty) ArrayType(StringType)
            else ArrayType(inferType(set.iterator().next()))
        case map: java.util.Map[String, _] =>
            val mapFields = (for ((fieldName, fieldValue) <- map.asScala) yield {
                StructField(fieldName, inferType(fieldValue))
            }).toSeq
            StructType(mapFields)
        case _: java.lang.Boolean => BooleanType
        case _: Array[Byte] => BinaryType
        case _ => StringType
    }

}


================================================
FILE: src/main/scala/com/audienceproject/spark/dynamodb/datasource/DynamoWriteBuilder.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2019 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb.datasource

import com.audienceproject.spark.dynamodb.connector.TableConnector
import org.apache.spark.sql.connector.write._
import org.apache.spark.sql.types.StructType

class DynamoWriteBuilder(connector: TableConnector, parameters: Map[String, String], schema: StructType)
    extends WriteBuilder {

    override def buildForBatch(): BatchWrite = new BatchWrite {
        override def createBatchWriterFactory(info: PhysicalWriteInfo): DataWriterFactory =
            new DynamoWriterFactory(connector, parameters, schema)

        override def commit(messages: Array[WriterCommitMessage]): Unit = {}

        override def abort(messages: Array[WriterCommitMessage]): Unit = {}
    }

}


================================================
FILE: src/main/scala/com/audienceproject/spark/dynamodb/datasource/DynamoWriterFactory.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2019 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb.datasource

import com.audienceproject.spark.dynamodb.connector.{ColumnSchema, TableConnector}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.write.{DataWriter, DataWriterFactory}
import org.apache.spark.sql.types.StructType

class DynamoWriterFactory(connector: TableConnector,
                          parameters: Map[String, String],
                          schema: StructType)
    extends DataWriterFactory {

    private val batchSize = parameters.getOrElse("writebatchsize", "25").toInt
    private val update = parameters.getOrElse("update", "false").toBoolean
    private val delete = parameters.getOrElse("delete", "false").toBoolean

    private val region = parameters.get("region")
    private val roleArn = parameters.get("rolearn")
    private val providerClassName = parameters.get("providerclassname")

    override def createWriter(partitionId: Int, taskId: Long): DataWriter[InternalRow] = {
        val columnSchema = new ColumnSchema(connector.keySchema, schema)
        val client = connector.getDynamoDB(region, roleArn, providerClassName)
        if (update) {
            assert(!delete, "The 'update' and 'delete' options are mutually exclusive; please provide at most one.")
            new DynamoDataUpdateWriter(columnSchema, connector, client)
        } else if (delete) {
            new DynamoDataDeleteWriter(batchSize, columnSchema, connector, client)
        } else {
            new DynamoDataWriter(batchSize, columnSchema, connector, client)
        }
    }

}


================================================
FILE: src/main/scala/com/audienceproject/spark/dynamodb/datasource/OutputPartitioning.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2019 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb.datasource

import org.apache.spark.sql.connector.read.partitioning.{Distribution, Partitioning}

class OutputPartitioning(override val numPartitions: Int) extends Partitioning {
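
    // Reports only the number of scan segments; no particular data distribution is guaranteed.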

    override def satisfy(distribution: Distribution): Boolean = false

}


================================================
FILE: src/main/scala/com/audienceproject/spark/dynamodb/datasource/ScanPartition.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2019 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb.datasource

import org.apache.spark.sql.connector.read.InputPartition
import org.apache.spark.sql.sources.Filter

class ScanPartition(val partitionIndex: Int,
                    val requiredColumns: Seq[String],
                    val filters: Array[Filter])
    extends InputPartition


================================================
FILE: src/main/scala/com/audienceproject/spark/dynamodb/datasource/TypeConversion.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2019 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb.datasource

import com.amazonaws.services.dynamodbv2.document.{IncompatibleTypeException, Item}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, GenericArrayData}
import org.apache.spark.sql.types._
import org.apache.spark.unsafe.types.UTF8String

import scala.collection.JavaConverters._

private[dynamodb] object TypeConversion {

    def apply(attrName: String, sparkType: DataType): Item => Any =

        sparkType match {
            case BooleanType => nullableGet(_.getBOOL)(attrName)
            case StringType => nullableGet(item => attrName => UTF8String.fromString(item.getString(attrName)))(attrName)
            case IntegerType => nullableGet(_.getInt)(attrName)
            case LongType => nullableGet(_.getLong)(attrName)
            case DoubleType => nullableGet(_.getDouble)(attrName)
            case FloatType => nullableGet(_.getFloat)(attrName)
            case BinaryType => nullableGet(_.getBinary)(attrName)
            case DecimalType() => nullableGet(_.getNumber)(attrName)
            case ArrayType(innerType, _) =>
                nullableGet(_.getList)(attrName).andThen(extractArray(convertValue(innerType)))
            case MapType(keyType, valueType, _) =>
                if (keyType != StringType) throw new IllegalArgumentException(s"Invalid Map key type '${keyType.typeName}'. DynamoDB only supports String as Map key type.")
                nullableGet(_.getRawMap)(attrName).andThen(extractMap(convertValue(valueType)))
            case StructType(fields) =>
                val nestedConversions = fields.collect({ case StructField(name, dataType, _, _) => name -> convertValue(dataType) })
                nullableGet(_.getRawMap)(attrName).andThen(extractStruct(nestedConversions))
            case _ => throw new IllegalArgumentException(s"Spark DataType '${sparkType.typeName}' could not be mapped to a corresponding DynamoDB data type.")
        }

    private val stringConverter = (value: Any) => UTF8String.fromString(value.asInstanceOf[String])

    private def convertValue(sparkType: DataType): Any => Any =

        sparkType match {
            case IntegerType => nullableConvert(_.intValue())
            case LongType => nullableConvert(_.longValue())
            case DoubleType => nullableConvert(_.doubleValue())
            case FloatType => nullableConvert(_.floatValue())
            case DecimalType() => nullableConvert(identity)
            case ArrayType(innerType, _) => extractArray(convertValue(innerType))
            case MapType(keyType, valueType, _) =>
                if (keyType != StringType) throw new IllegalArgumentException(s"Invalid Map key type '${keyType.typeName}'. DynamoDB only supports String as Map key type.")
                extractMap(convertValue(valueType))
            case StructType(fields) =>
                val nestedConversions = fields.collect({ case StructField(name, dataType, _, _) => name -> convertValue(dataType) })
                extractStruct(nestedConversions)
            case BooleanType => {
                case boolean: Boolean => boolean
                case _ => null
            }
            case StringType => {
                case string: String => UTF8String.fromString(string)
                case _ => null
            }
            case BinaryType => {
                case byteArray: Array[Byte] => byteArray
                case _ => null
            }
            case _ => throw new IllegalArgumentException(s"Spark DataType '${sparkType.typeName}' could not be mapped to a corresponding DynamoDB data type.")
        }

    private def nullableGet(getter: Item => String => Any)(attrName: String): Item => Any = {
        case item if item.hasAttribute(attrName) => try getter(item)(attrName) catch {
            case _: NumberFormatException => null
            case _: IncompatibleTypeException => null
        }
        case _ => null
    }

    private def nullableConvert(converter: java.math.BigDecimal => Any): Any => Any = {
        case item: java.math.BigDecimal => converter(item)
        case _ => null
    }

    private def extractArray(converter: Any => Any): Any => Any = {
        case list: java.util.List[_] => new GenericArrayData(list.asScala.map(converter))
        case set: java.util.Set[_] => new GenericArrayData(set.asScala.map(converter).toSeq)
        case _ => null
    }

    private def extractMap(converter: Any => Any): Any => Any = {
        case map: java.util.Map[_, _] => ArrayBasedMapData(map, stringConverter, converter)
        case _ => null
    }

    private def extractStruct(conversions: Seq[(String, Any => Any)]): Any => Any = {
        case map: java.util.Map[_, _] => InternalRow.fromSeq(conversions.map({
            case (name, conv) => conv(map.get(name))
        }))
        case _ => null
    }

}
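
// e.g. TypeConversion("name", StringType) yields an Item => Any function which reads the
// attribute "name" as a UTF8String, or returns null if the attribute is absent.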


================================================
FILE: src/main/scala/com/audienceproject/spark/dynamodb/implicits.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2018 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb

import com.audienceproject.spark.dynamodb.reflect.SchemaAnalysis
import org.apache.spark.sql._
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.functions.col

import scala.reflect.ClassTag
import scala.reflect.runtime.universe.TypeTag

object implicits {

    implicit class DynamoDBDataFrameReader(reader: DataFrameReader) {

        def dynamodb(tableName: String): DataFrame =
            getDynamoDBSource(tableName).load()

        def dynamodb(tableName: String, indexName: String): DataFrame =
            getDynamoDBSource(tableName).option("indexName", indexName).load()

        def dynamodbAs[T <: Product : ClassTag : TypeTag](tableName: String): Dataset[T] = {
            implicit val encoder: Encoder[T] = ExpressionEncoder()
            val (schema, aliasMap) = SchemaAnalysis[T]
            getColumnsAlias(getDynamoDBSource(tableName).schema(schema).load(), aliasMap).as
        }

        def dynamodbAs[T <: Product : ClassTag : TypeTag](tableName: String, indexName: String): Dataset[T] = {
            implicit val encoder: Encoder[T] = ExpressionEncoder()
            val (schema, aliasMap) = SchemaAnalysis[T]
            getColumnsAlias(
                getDynamoDBSource(tableName).option("indexName", indexName).schema(schema).load(), aliasMap).as
        }

        private def getDynamoDBSource(tableName: String): DataFrameReader =
            reader.format("com.audienceproject.spark.dynamodb.datasource").option("tableName", tableName)

        private def getColumnsAlias(dataFrame: DataFrame, aliasMap: Map[String, String]): DataFrame = {
            if (aliasMap.isEmpty) dataFrame
            else {
                val columnsAlias = dataFrame.columns.map({
                    case name if aliasMap.isDefinedAt(name) => col(name) as aliasMap(name)
                    case name => col(name)
                })
                dataFrame.select(columnsAlias: _*)
            }
        }

    }

    implicit class DynamoDBDataFrameWriter[T](writer: DataFrameWriter[T]) {

        def dynamodb(tableName: String): Unit =
            writer.format("com.audienceproject.spark.dynamodb.datasource")
                .mode(SaveMode.Append)
                .option("tableName", tableName)
                .save()

    }

}
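
A minimal usage sketch of the implicits above; the table name, index name, region, and record class are hypothetical stand-ins:

// Hedged sketch, not part of implicits.scala.
import com.audienceproject.spark.dynamodb.implicits._
import org.apache.spark.sql.SparkSession

case class SomeRow(id: String, count: Long) // hypothetical record shape

object UsageSketch extends App {
    val spark = SparkSession.builder.master("local").getOrCreate()

    // Untyped reads, optionally against a secondary index.
    val df = spark.read.option("region", "eu-west-1").dynamodb("SomeTable")
    val dfIndexed = spark.read.dynamodb("SomeTable", "SomeIndex")

    // Typed read; schema and alias map are derived by SchemaAnalysis.
    val ds = spark.read.dynamodbAs[SomeRow]("SomeTable")

    // Writes always use SaveMode.Append, as defined above.
    df.write.dynamodb("SomeTable")
}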


================================================
FILE: src/main/scala/com/audienceproject/spark/dynamodb/reflect/SchemaAnalysis.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2018 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb.reflect

import com.audienceproject.spark.dynamodb.attribute
import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.types.{StructField, StructType}

import scala.reflect.ClassTag
import scala.reflect.runtime.{universe => ru}

/**
  * Uses reflection to perform a static analysis that can derive a Spark schema from a case class of type `T`.
  */
private[dynamodb] object SchemaAnalysis {

    def apply[T <: Product : ClassTag : ru.TypeTag]: (StructType, Map[String, String]) = {

        val runtimeMirror = ru.runtimeMirror(getClass.getClassLoader)

        val classObj = scala.reflect.classTag[T].runtimeClass
        val classSymbol = runtimeMirror.classSymbol(classObj)

        val params = classSymbol.primaryConstructor.typeSignature.paramLists.head
        val (sparkFields, aliasMap) = params.foldLeft((List.empty[StructField], Map.empty[String, String]))({
            case ((list, map), field) =>
                val sparkType = ScalaReflection.schemaFor(field.typeSignature).dataType

                // Black magic from here:
                // https://stackoverflow.com/questions/23046958/accessing-an-annotation-value-in-scala
                val attrName = field.annotations.collectFirst({
                    case ann: ru.AnnotationApi if ann.tree.tpe =:= ru.typeOf[attribute] =>
                        ann.tree.children.tail.collectFirst({
                            case ru.Literal(ru.Constant(name: String)) => name
                        })
                }).flatten

                if (attrName.isDefined) {
                    val sparkField = StructField(attrName.get, sparkType, nullable = true)
                    (list :+ sparkField, map + (attrName.get -> field.name.toString))
                } else {
                    val sparkField = StructField(field.name.toString, sparkType, nullable = true)
                    (list :+ sparkField, map)
                }
        })

        (StructType(sparkFields), aliasMap)
    }

}
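
A hedged illustration of what this analysis yields for a case class that uses the `attribute` annotation; the class and its fields are hypothetical, chosen to mirror the test data elsewhere in this repository:

// Hedged sketch, not part of SchemaAnalysis.scala.
import com.audienceproject.spark.dynamodb.attribute

case class Fruit(@attribute("name") primaryKey: String, color: String, weightKg: Double)

// Following the fold above, SchemaAnalysis[Fruit] would produce:
//   StructType(Seq(
//       StructField("name", StringType, nullable = true),
//       StructField("color", StringType, nullable = true),
//       StructField("weightKg", DoubleType, nullable = true)))
// plus the alias map Map("name" -> "primaryKey"), which getColumnsAlias in
// implicits.scala later uses to rename the DynamoDB attribute back to the field.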


================================================
FILE: src/test/resources/log4j2.xml
================================================
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN" name="Log4j2 configuration">
    <Appenders>
        <Console target="SYSTEM_OUT" name="console">
            <PatternLayout pattern="%highlight{[%-5level][%d{HH:mm:ss.SSS}][%logger{36}]} %msg%n" />
        </Console>
        <Console target="SYSTEM_OUT" name="simple-console">
            <PatternLayout pattern="%msg%n" />
        </Console>
    </Appenders>
    <Loggers>
        <Root level="INFO">
            <AppenderRef ref="console" />
        </Root>
        <logger name="org.apache.spark" level="WARN">
            <AppenderRef ref="simple-console"/>
        </logger>
        <logger name="org.spark_project.jetty" level="WARN">
            <AppenderRef ref="simple-console"/>
        </logger>
        <logger name="com.amazonaws.services.dynamodbv2.local" level="WARN">
            <AppenderRef ref="simple-console"/>
        </logger>
        <logger name="com.amazonaws.auth.profile.internal.BasicProfileConfigLoader" level="ERROR">
            <AppenderRef ref="simple-console"/>
        </logger>
        <Logger name="MessageOnly" level="INFO" additivity="false">
            <AppenderRef ref="simple-console"/>
        </Logger>
    </Loggers>
</Configuration>


================================================
FILE: src/test/scala/com/audienceproject/spark/dynamodb/AbstractInMemoryTest.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2018 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb

import com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration
import com.amazonaws.services.dynamodbv2.document.{DynamoDB, Item}
import com.amazonaws.services.dynamodbv2.local.main.ServerRunner
import com.amazonaws.services.dynamodbv2.local.server.DynamoDBProxyServer
import com.amazonaws.services.dynamodbv2.model.{AttributeDefinition, CreateTableRequest, KeySchemaElement, ProvisionedThroughput}
import com.amazonaws.services.dynamodbv2.{AmazonDynamoDB, AmazonDynamoDBClientBuilder}
import org.apache.spark.sql.SparkSession
import org.scalatest.{BeforeAndAfterAll, FunSuite}

class AbstractInMemoryTest extends FunSuite with BeforeAndAfterAll {

    val server: DynamoDBProxyServer = ServerRunner.createServerFromCommandLineArgs(Array("-inMemory"))

    val client: AmazonDynamoDB = AmazonDynamoDBClientBuilder.standard()
        .withEndpointConfiguration(new EndpointConfiguration(System.getProperty("aws.dynamodb.endpoint"), "us-east-1"))
        .build()
    val dynamoDB: DynamoDB = new DynamoDB(client)

    val spark: SparkSession = SparkSession.builder
        .master("local")
        .appName(this.getClass.getName)
        .getOrCreate()

    spark.sparkContext.setLogLevel("ERROR")

    override def beforeAll(): Unit = {
        server.start()

        // Create a test table.
        dynamoDB.createTable(new CreateTableRequest()
            .withTableName("TestFruit")
            .withAttributeDefinitions(new AttributeDefinition("name", "S"))
            .withKeySchema(new KeySchemaElement("name", "HASH"))
            .withProvisionedThroughput(new ProvisionedThroughput(5L, 5L)))

        // Populate with test data.
        val table = dynamoDB.getTable("TestFruit")
        for ((name, color, weight) <- Seq(
            ("apple", "red", 0.2), ("banana", "yellow", 0.15), ("watermelon", "red", 0.5),
            ("grape", "green", 0.01), ("pear", "green", 0.2), ("kiwi", "green", 0.05),
            ("blackberry", "purple", 0.01), ("blueberry", "purple", 0.01), ("plum", "purple", 0.1)
        )) {
            table.putItem(new Item()
                .withString("name", name)
                .withString("color", color)
                .withDouble("weightKg", weight))
        }
    }

    override def afterAll(): Unit = {
        client.deleteTable("TestFruit")
        server.stop()
    }

}
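
The client above resolves its endpoint from the `aws.dynamodb.endpoint` system property, so that property must reach the test JVM for the suite to connect to the embedded server. One way to supply it, sketched as a build fragment and assuming DynamoDB Local's default port 8000 (the project's actual wiring lives in build.sbt, not shown here):

// Hedged build.sbt sketch.
Test / fork := true
Test / javaOptions += "-Daws.dynamodb.endpoint=http://localhost:8000"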


================================================
FILE: src/test/scala/com/audienceproject/spark/dynamodb/DefaultSourceTest.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2018 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb

import com.audienceproject.spark.dynamodb.implicits._
import com.audienceproject.spark.dynamodb.structs.TestFruit
import org.apache.spark.sql.functions._

import scala.collection.JavaConverters._

class DefaultSourceTest extends AbstractInMemoryTest {

    test("Table count is 9") {
        val count = spark.read.dynamodb("TestFruit")
        count.show()
        assert(count.count() === 9)
    }

    test("Column sum is 27") {
        val result = spark.read.dynamodb("TestFruit").collectAsList().asScala
        val numCols = result.map(_.length).sum
        assert(numCols === 27)
    }

    test("Select only first two columns") {
        val result = spark.read.dynamodb("TestFruit").select("name", "color").collectAsList().asScala
        val numCols = result.map(_.length).sum
        assert(numCols === 18)
    }

    test("The least occurring color is yellow") {
        import spark.implicits._
        val itemWithLeastOccurringColor = spark.read.dynamodb("TestFruit")
            .groupBy($"color").agg(count($"color").as("countColor"))
            .orderBy($"countColor")
            .takeAsList(1).get(0)
        assert(itemWithLeastOccurringColor.getAs[String]("color") === "yellow")
    }

    test("Test of attribute name alias") {
        import spark.implicits._
        val itemApple = spark.read.dynamodbAs[TestFruit]("TestFruit")
            .filter($"primaryKey" === "apple")
            .takeAsList(1).get(0)
        assert(itemApple.primaryKey === "apple")
    }

}


================================================
FILE: src/test/scala/com/audienceproject/spark/dynamodb/FilterPushdownTest.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2018 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb

import com.audienceproject.spark.dynamodb.implicits._

class FilterPushdownTest extends AbstractInMemoryTest {

    test("Count of red fruit is 2 (`EqualTo` filter)") {
        import spark.implicits._
        val fruitCount = spark.read.dynamodb("TestFruit").where($"color" === "red").count()
        assert(fruitCount === 2)
    }

    test("Count of yellow and green fruit is 4 (`In` filter)") {
        import spark.implicits._
        val fruitCount = spark.read.dynamodb("TestFruit")
            .where($"color" isin("yellow", "green"))
            .count()
        assert(fruitCount === 4)
    }

    test("Count of 0.01 weight fruit is 4 (`In` filter)") {
        import spark.implicits._
        val fruitCount = spark.read.dynamodb("TestFruit")
            .where($"weightKg" isin 0.01)
            .count()
        assert(fruitCount === 3)
    }

    test("Only 'banana' starts with a 'b' and is >0.01 kg (`StringStartsWith`, `GreaterThan`, `And` filters)") {
        import spark.implicits._
        val fruit = spark.read.dynamodb("TestFruit")
            .where(($"name" startsWith "b") && ($"weightKg" > 0.01))
            .collectAsList().get(0)
        assert(fruit.getAs[String]("name") === "banana")
    }

}
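
The tests above assert on result counts. A complementary, hedged way to check that a predicate actually reached the source is to print the physical plan, where DataSource V2 scans typically report the filters they accepted (exact output varies by Spark version):

// Hedged sketch, in the same style as the tests above.
import spark.implicits._

spark.read.dynamodb("TestFruit")
    .where($"color" === "red")
    .explain() // inspect the scan node for the pushed filters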


================================================
FILE: src/test/scala/com/audienceproject/spark/dynamodb/NestedDataStructuresTest.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2018 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb

import com.amazonaws.services.dynamodbv2.model.{AttributeDefinition, CreateTableRequest, KeySchemaElement, ProvisionedThroughput}
import com.audienceproject.spark.dynamodb.implicits._
import com.audienceproject.spark.dynamodb.structs.{TestFruitProperties, TestFruitWithProperties}
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.struct
import org.apache.spark.sql.types._

class NestedDataStructuresTest extends AbstractInMemoryTest {

    test("Insert ArrayType") {
        dynamoDB.createTable(new CreateTableRequest()
            .withTableName("InsertTestList")
            .withAttributeDefinitions(new AttributeDefinition("name", "S"))
            .withKeySchema(new KeySchemaElement("name", "HASH"))
            .withProvisionedThroughput(new ProvisionedThroughput(5L, 5L)))

        import spark.implicits._

        val fruitSchema = StructType(
            Seq(
                StructField("name", StringType, nullable = false),
                StructField("color", StringType, nullable = false),
                StructField("weight", DoubleType, nullable = false),
                StructField("properties", ArrayType(StringType, containsNull = false), nullable = false)
            ))

        val rows = spark.sparkContext.parallelize(Seq(
            Row("lemon", "yellow", 0.1, Seq("fresh", "2 dkk")),
            Row("orange", "orange", 0.2, Seq("too ripe", "1 dkk")),
            Row("pomegranate", "red", 0.2, Seq("freshness", "4 dkk"))
        ))

        val newItemsDs = spark.createDataFrame(rows, fruitSchema)

        newItemsDs.printSchema()
        newItemsDs.show(false)

        newItemsDs.write.dynamodb("InsertTestList")

        println("Writing successful.")

        val validationDs = spark.read.dynamodb("InsertTestList")
        assert(validationDs.count() === 3)
        assert(validationDs.select($"properties".as[Seq[String]]).collect().forall(Seq(
            Seq("fresh", "2 dkk"),
            Seq("too ripe", "1 dkk"),
            Seq("freshness", "4 dkk")
        ) contains _))
    }

    test("Insert MapType") {
        dynamoDB.createTable(new CreateTableRequest()
            .withTableName("InsertTestMap")
            .withAttributeDefinitions(new AttributeDefinition("name", "S"))
            .withKeySchema(new KeySchemaElement("name", "HASH"))
            .withProvisionedThroughput(new ProvisionedThroughput(5L, 5L)))

        import spark.implicits._

        val fruitSchema = StructType(
            Seq(
                StructField("name", StringType, nullable = false),
                StructField("color", StringType, nullable = false),
                StructField("weight", DoubleType, nullable = false),
                StructField("properties", MapType(StringType, StringType, valueContainsNull = false))
            ))

        val rows = spark.sparkContext.parallelize(Seq(
            Row("lemon", "yellow", 0.1, Map("freshness" -> "fresh", "eco" -> "yes", "price" -> "2 dkk")),
            Row("orange", "orange", 0.2, Map("freshness" -> "too ripe", "eco" -> "no", "price" -> "1 dkk")),
            Row("pomegranate", "red", 0.2, Map("freshness" -> "green", "eco" -> "yes", "price" -> "4 dkk"))
        ))

        val newItemsDs = spark.createDataFrame(rows, fruitSchema)

        newItemsDs.printSchema()
        newItemsDs.show(false)

        newItemsDs.write.dynamodb("InsertTestMap")

        println("Writing successful.")

        val validationDs = spark.read.schema(fruitSchema).dynamodb("InsertTestMap")
        validationDs.show(false)
        assert(validationDs.count() === 3)
        assert(validationDs.select($"properties".as[Map[String, String]]).collect().forall(Seq(
            Map("freshness" -> "fresh", "eco" -> "yes", "price" -> "2 dkk"),
            Map("freshness" -> "too ripe", "eco" -> "no", "price" -> "1 dkk"),
            Map("freshness" -> "green", "eco" -> "yes", "price" -> "4 dkk")
        ) contains _))
    }

    test("Insert ArrayType with nested MapType") {
        dynamoDB.createTable(new CreateTableRequest()
            .withTableName("InsertTestListMap")
            .withAttributeDefinitions(new AttributeDefinition("name", "S"))
            .withKeySchema(new KeySchemaElement("name", "HASH"))
            .withProvisionedThroughput(new ProvisionedThroughput(5L, 5L)))

        import spark.implicits._

        val fruitSchema = StructType(
            Seq(
                StructField("name", StringType, nullable = false),
                StructField("color", StringType, nullable = false),
                StructField("weight", DoubleType, nullable = false),
                StructField("properties", ArrayType(MapType(StringType, StringType, valueContainsNull = false), containsNull = false), nullable = false)
            ))

        val rows = spark.sparkContext.parallelize(Seq(
            Row("lemon", "yellow", 0.1, Seq(Map("freshness" -> "fresh", "eco" -> "yes", "price" -> "2 dkk"))),
            Row("orange", "orange", 0.2, Seq(Map("freshness" -> "too ripe", "eco" -> "no", "price" -> "1 dkk"))),
            Row("pomegranate", "red", 0.2, Seq(Map("freshness" -> "green", "eco" -> "yes", "price" -> "4 dkk")))
        ))

        val newItemsDs = spark.createDataFrame(rows, fruitSchema)

        newItemsDs.printSchema()
        newItemsDs.show(false)

        newItemsDs.write.dynamodb("InsertTestListMap")

        println("Writing successful.")

        val validationDs = spark.read.schema(fruitSchema).dynamodb("InsertTestListMap")
        validationDs.show(false)
        assert(validationDs.count() === 3)
        assert(validationDs.select($"properties".as[Seq[Map[String, String]]]).collect().forall(Seq(
            Seq(Map("freshness" -> "fresh", "eco" -> "yes", "price" -> "2 dkk")),
            Seq(Map("freshness" -> "too ripe", "eco" -> "no", "price" -> "1 dkk")),
            Seq(Map("freshness" -> "green", "eco" -> "yes", "price" -> "4 dkk"))
        ) contains _))
    }

    test("Insert StructType") {
        dynamoDB.createTable(new CreateTableRequest()
            .withTableName("InsertTestStruct")
            .withAttributeDefinitions(new AttributeDefinition("name", "S"))
            .withKeySchema(new KeySchemaElement("name", "HASH"))
            .withProvisionedThroughput(new ProvisionedThroughput(5L, 5L)))

        import spark.implicits._

        val fruitSchema = StructType(
            Seq(
                StructField("name", StringType, nullable = false),
                StructField("color", StringType, nullable = false),
                StructField("weight", DoubleType, nullable = false),
                StructField("freshness", StringType, nullable = false),
                StructField("eco", BooleanType, nullable = false),
                StructField("price", DoubleType, nullable = false)
            ))

        val rows = spark.sparkContext.parallelize(Seq(
            Row("lemon", "yellow", 0.1, "fresh", true, 2.0),
            Row("pomegranate", "red", 0.2, "green", true, 4.0)
        ))

        val newItemsDs = spark.createDataFrame(rows, fruitSchema).select(
            $"name",
            $"color",
            $"weight",
            struct($"freshness", $"eco", $"price") as "properties"
        )

        newItemsDs.printSchema()
        newItemsDs.show(false)

        newItemsDs.write.dynamodb("InsertTestStruct")

        println("Writing successful.")

        val validationDs = spark.read.dynamodbAs[TestFruitWithProperties]("InsertTestStruct")
        assert(validationDs.count() === 2)
        assert(validationDs.select($"properties".as[TestFruitProperties]).collect().forall(Seq(
            TestFruitProperties("fresh", eco = true, 2.0),
            TestFruitProperties("green", eco = true, 4.0)
        ) contains _))
    }

}


================================================
FILE: src/test/scala/com/audienceproject/spark/dynamodb/NullBooleanTest.scala
================================================
package com.audienceproject.spark.dynamodb

import com.amazonaws.services.dynamodbv2.document.Item
import com.amazonaws.services.dynamodbv2.model.{
    AttributeDefinition,
    CreateTableRequest,
    KeySchemaElement,
    ProvisionedThroughput
}
import com.audienceproject.spark.dynamodb.implicits._

class NullBooleanTest extends AbstractInMemoryTest {
    test("Test Null") {
        dynamoDB.createTable(
            new CreateTableRequest()
                .withTableName("TestNullBoolean")
                .withAttributeDefinitions(new AttributeDefinition("Pk", "S"))
                .withKeySchema(new KeySchemaElement("Pk", "HASH"))
                .withProvisionedThroughput(new ProvisionedThroughput(5L, 5L))
        )

        val table = dynamoDB.getTable("TestNullBoolean")

        for ((pk, tpe, value) <- Seq(
            ("id1", "type1", Some(true)),
            ("id2", "type2", None)
        )) {
            val item = new Item()
                .withString("Pk", pk)
                .withString("Type", tpe)
            // Branch on the value's presence rather than on the unrelated Type
            // string, which also removes the asInstanceOf cast.
            value match {
                case Some(b) => table.putItem(item.withBoolean("Value", b))
                case None => table.putItem(item.withNull("Value"))
            }
        }

        val df = spark.read.dynamodbAs[BooleanClass]("TestNullBoolean")

        import spark.implicits._
        df.where($"Type" === "type2").show()
        client.deleteTable("TestNullBoolean")
    }
}

case class BooleanClass(Pk: String, Type: String, Value: Boolean)


================================================
FILE: src/test/scala/com/audienceproject/spark/dynamodb/NullValuesTest.scala
================================================
package com.audienceproject.spark.dynamodb

import com.amazonaws.services.dynamodbv2.model.{AttributeDefinition, CreateTableRequest, KeySchemaElement, ProvisionedThroughput}
import com.audienceproject.spark.dynamodb.implicits._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

class NullValuesTest extends AbstractInMemoryTest {

    test("Insert nested StructType with null values") {
        dynamoDB.createTable(new CreateTableRequest()
            .withTableName("NullTest")
            .withAttributeDefinitions(new AttributeDefinition("name", "S"))
            .withKeySchema(new KeySchemaElement("name", "HASH"))
            .withProvisionedThroughput(new ProvisionedThroughput(5L, 5L)))

        val schema = StructType(
            Seq(
                StructField("name", StringType, nullable = false),
                StructField("info", StructType(
                    Seq(
                        StructField("age", IntegerType, nullable = true),
                        StructField("address", StringType, nullable = true)
                    )
                ), nullable = true)
            )
        )

        val rows = spark.sparkContext.parallelize(Seq(
            Row("one", Row(30, "Somewhere")),
            Row("two", null),
            Row("three", Row(null, null))
        ))

        val newItemsDs = spark.createDataFrame(rows, schema)

        newItemsDs.write.dynamodb("NullTest")

        val validationDs = spark.read.dynamodb("NullTest")

        validationDs.show(false)
    }

}


================================================
FILE: src/test/scala/com/audienceproject/spark/dynamodb/RegionTest.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2018 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb

import com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration
import com.amazonaws.services.dynamodbv2.{AmazonDynamoDB, AmazonDynamoDBClientBuilder}
import com.amazonaws.services.dynamodbv2.document.DynamoDB
import com.amazonaws.services.dynamodbv2.model.{AttributeDefinition, CreateTableRequest, KeySchemaElement, ProvisionedThroughput}
import com.audienceproject.spark.dynamodb.implicits._

class RegionTest extends AbstractInMemoryTest {

    test("Inserting from a local Dataset") {
        val tableName = "RegionTest1"
        dynamoDB.createTable(new CreateTableRequest()
            .withTableName(tableName)
            .withAttributeDefinitions(new AttributeDefinition("name", "S"))
            .withKeySchema(new KeySchemaElement("name", "HASH"))
            .withProvisionedThroughput(new ProvisionedThroughput(5L, 5L)))
        val client: AmazonDynamoDB = AmazonDynamoDBClientBuilder.standard()
            .withEndpointConfiguration(new EndpointConfiguration(System.getProperty("aws.dynamodb.endpoint"), "eu-central-1"))
            .build()
        val dynamoDBEU: DynamoDB = new DynamoDB(client)
        dynamoDBEU.createTable(new CreateTableRequest()
            .withTableName(tableName)
            .withAttributeDefinitions(new AttributeDefinition("name", "S"))
            .withKeySchema(new KeySchemaElement("name", "HASH"))
            .withProvisionedThroughput(new ProvisionedThroughput(5L, 5L)))

        import spark.implicits._

        val newItemsDs = spark.createDataset(Seq(
            ("lemon", "yellow", 0.1),
            ("orange", "orange", 0.2),
            ("pomegranate", "red", 0.2)
        ))
            .withColumnRenamed("_1", "name")
            .withColumnRenamed("_2", "color")
            .withColumnRenamed("_3", "weight")
        newItemsDs.write.option("region","eu-central-1").dynamodb(tableName)

        val validationDs = spark.read.dynamodb(tableName)
        assert(validationDs.count() === 0)
        val validationDsEU = spark.read.option("region","eu-central-1").dynamodb(tableName)
        assert(validationDsEU.count() === 3)
    }

}


================================================
FILE: src/test/scala/com/audienceproject/spark/dynamodb/WriteRelationTest.scala
================================================
/**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
  *
  * Copyright © 2018 AudienceProject. All rights reserved.
  */
package com.audienceproject.spark.dynamodb

import java.util

import collection.JavaConverters._
import com.amazonaws.services.dynamodbv2.model.{AttributeDefinition, CreateTableRequest, KeySchemaElement, KeyType, ProvisionedThroughput}
import com.audienceproject.spark.dynamodb.implicits._
import org.apache.spark.sql.functions.{lit, when, length => sqlLength}
import org.scalatest.Matchers

class WriteRelationTest extends AbstractInMemoryTest with Matchers {

    test("Inserting from a local Dataset") {
        dynamoDB.createTable(new CreateTableRequest()
            .withTableName("InsertTest1")
            .withAttributeDefinitions(new AttributeDefinition("name", "S"))
            .withKeySchema(new KeySchemaElement("name", "HASH"))
            .withProvisionedThroughput(new ProvisionedThroughput(5L, 5L)))

        import spark.implicits._

        val newItemsDs = spark.createDataset(Seq(
            ("lemon", "yellow", 0.1),
            ("orange", "orange", 0.2),
            ("pomegranate", "red", 0.2)
        ))
            .withColumnRenamed("_1", "name")
            .withColumnRenamed("_2", "color")
            .withColumnRenamed("_3", "weight")
        newItemsDs.write.dynamodb("InsertTest1")

        val validationDs = spark.read.dynamodb("InsertTest1")
        assert(validationDs.count() === 3)
        assert(validationDs.select("name").as[String].collect().forall(Seq("lemon", "orange", "pomegranate") contains _))
        assert(validationDs.select("color").as[String].collect().forall(Seq("yellow", "orange", "red") contains _))
        assert(validationDs.select("weight").as[Double].collect().forall(Seq(0.1, 0.2, 0.2) contains _))
    }

    test("Deleting from a local Dataset with a HashKey only") {
        val tablename = "DeleteTest1"
        dynamoDB.createTable(new CreateTableRequest()
            .withTableName(tablename)
            .withAttributeDefinitions(new AttributeDefinition("name", "S"))
            .withKeySchema(new KeySchemaElement("name", "HASH"
