[
  {
    "path": "LICENSE.txt",
    "content": "MIT No Attribution\n\nCopyright 2015 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n\nPermission is hereby granted, free of charge, to any person obtaining a copy of this\nsoftware and associated documentation files (the \"Software\"), to deal in the Software\nwithout restriction, including without limitation the rights to use, copy, modify,\nmerge, publish, distribute, sublicense, and/or sell copies of the Software, and to\npermit persons to whom the Software is furnished to do so.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,\nINCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A\nPARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT\nHOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION\nOF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE\nSOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "# Streaming Data to Amazon Elasticsearch Service\n## Using AWS Lambda: Sample Node.js Code\n\n### Introduction\nIt is often useful to stream data, as it gets generated, for indexing in an\nAmazon Elasticsearch Service domain.  This helps fresh data to be available for\nsearch or analytics.  To do this requires:\n\n1. Knowing when new data is available\n2. Code to pick up and parse the data into JSON documents, and add them to an\n   Amazon Elasticsearch (henceforth, ES for short) domain.\n3. Scalable and fully managed infrastructure to host this code\n\n*Lambda* is an AWS service that takes care of these requirements.  Put simply,\nit is an \"event handling\" service in the cloud.  Lambda lets us implement\nthe event handler (in Node.js or Java), which it hosts and invokes in response\nto an event.\n\nThe handler can be triggered by a \"push\" or a \"pull\" approach.\nCertain event sources (such as S3) push an event notification to Lambda.\nOthers (such as Kinesis) require Lambda to poll for events and pull them\nwhen available.\n\nFor more details on AWS Lambda, please see\n[the documentation](http://aws.amazon.com/documentation/lambda/).\n\nThis package contains sample Lambda code (in Node.js) to stream data to ES\nfrom two common AWS data sources: S3 and Kinesis.  The S3 sample takes apache\nlog files, parses them into JSON documents and adds them to ES.  The Kinesis\nsample reads JSON data from the stream and adds them to ES.\n\nNote that the sample code has been kept simple for reasons for clarity.  It\ndoes not handle ES document batching, or eventual consistency issues for\nS3 updates, etc.\n\n### Setup Overview\n\nWhile some detailed instructions are covered later in this file and elsewhere\n(in the Lambda documentation), this section aims to show the larger picture\nthat the individual steps work to accomplish.  We assume that the data source\n(an S3 bucket or a Kinesis stream, in this case) and an ES domain are already\nset up.\n\n1. **Deployment Package**: The \"Deployment Package\" is the event handler code files\n   and its dependencies packaged as a zip file.  The first step in creating\n   a new Lambda function is to prepare and upload this zip file.\n\n2. **Lambda Configuration**:\n\n   1. Handler: The name of the main code file in the deployment package,\n      with the file extension replaced with a `.handler` suffix.\n   2. Memory: The memory limit, based on which the EC2 instance type to use\n      is determined.  For now, the default should do.\n   3. Timeout: The default timeout value (3 seconds) is quite low for our\n      use-case.  10 seconds might work better, but please adjust based on\n      your testing.\n\n3. **Authorization**: Since there is a need here for various AWS services making\n   calls to each other, appropriate authorization is required.  This takes\n   the form of configuring an IAM role, to which various authorization policies\n   are attached.  This role will be assumed by the Lambda function when running.\n\nNote:\n\n* The AWS Console is simpler to use for configuration than other methods.\n* Lambda is currently available only in a few regions (us-east-1, us-west-2,\n  eu-west-1, ap-northeast-1).\n* Once the setup is complete and tested, enable the data source in the Lambda\n  console, so that data may start streaming in.\n* The code is kept simple for purposes of illustration.  
Note:\n\n* The AWS Console is simpler to use for configuration than other methods.\n* Lambda is currently available only in a few regions (us-east-1, us-west-2,\n  eu-west-1, ap-northeast-1).\n* Once the setup is complete and tested, enable the data source in the Lambda\n  console, so that data may start streaming in.\n\n#### Deployment Package Creation\n1. On your development machine, download and install [Node.js](https://nodejs.org/en/).\n2. Anywhere, create a directory structure similar to the following:\n\n       eslambda (place sample code here)\n       |\n       +-- node_modules (dependencies will go here)\n\n3. Modify the sample code with the correct ES endpoint, region, index\n   and document type.\n4. Install each dependency imported by the sample code\n   (with the `require()` call), as follows (see the concrete example after\n   this list):\n\n       npm install <dependency>\n\n   Verify that these are installed within the `node_modules` subdirectory.\n5. Create a zip file to package the code and the `node_modules` subdirectory:\n\n       zip -r eslambda.zip *\n\nThe zip file thus created is the Lambda Deployment Package.\n\n
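For example, the S3 sample in this package imports `byline` and `clf-parser`; the `aws-sdk` module is already included in Lambda's Node.js runtime, so only the other two need to be installed:\n\n    npm install byline clf-parser\n\n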
## S3-Lambda-ES\n\nSet up the Lambda function and the S3 bucket as described in the\n[Lambda-S3 Walkthrough](http://docs.aws.amazon.com/lambda/latest/dg/walkthrough-s3-events-adminuser.html).\nPlease keep in mind the following notes and configuration overrides:\n\n* The walkthrough uses the AWS CLI for configuration, but it's probably more\n  convenient to use the AWS Console (web UI).\n\n* The S3 bucket must be created in the same region as the Lambda function,\n  so that it can push events to Lambda.\n\n* When registering the S3 bucket as the data source in Lambda, add a filter\n  for files with the `.log` suffix, so that Lambda picks up only Apache log files.\n\n* The following authorizations are required:\n\n  1. Lambda permits S3 to push event notifications to it\n  2. S3 permits Lambda to fetch the created objects from a given bucket\n  3. ES permits Lambda to add documents to the given domain\n\n  The Lambda console provides a simple way to create an IAM role with policies\n  for (1).  For (2), when creating the IAM role, choose the \"S3 execution role\"\n  option; this will load the role with permissions to read from the S3\n  bucket.  For (3), add the following access policy to the role to permit\n  ES operations:\n\n      {\n          \"Version\": \"2012-10-17\",\n          \"Statement\": [\n              {\n                  \"Action\": [\n                      \"es:*\"\n                  ],\n                  \"Effect\": \"Allow\",\n                  \"Resource\": \"*\"\n              }\n          ]\n      }\n\n\n## Kinesis-Lambda-ES\n\nSet up the Lambda function and the Kinesis stream as described in the\n[Lambda-Kinesis Walkthrough](http://docs.aws.amazon.com/lambda/latest/dg/walkthrough-kinesis-events-adminuser.html).\nPlease keep in mind the following notes and configuration overrides:\n\n* The walkthrough uses the AWS CLI, but it's probably more convenient to use\n  the AWS Console (web UI) for Lambda configuration.\n\n* To the IAM role assigned to the Lambda function, add the following\n  access policy to permit ES operations:\n\n        {\n            \"Version\": \"2012-10-17\",\n            \"Statement\": [\n                {\n                    \"Action\": [\n                        \"es:*\"\n                    ],\n                    \"Effect\": \"Allow\",\n                    \"Resource\": \"*\"\n                }\n            ]\n        }\n\n* For testing: If you have a Kinesis client, use it to put a record into the\n  stream.  If not, you can use the AWS CLI to push a JSON document into the\n  stream:\n\n      aws kinesis put-record --stream-name <stream name> --data \"<JSON document>\" --region <region> --partition-key shardId-000000000000\n\n## Copyright\n\nCopyright 2015 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n\nSPDX-License-Identifier: MIT-0\n"
  },
  {
    "path": "src/kinesis_lambda_es.js",
    "content": "/*\n * Sample node.js code for AWS Lambda to upload the JSON documents\n * pushed from Kinesis to Amazon Elasticsearch.\n *\n * Copyright 2015 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n * SPDX-License-Identifier: MIT-0\n */\n\n/* == Imports == */\nvar AWS = require('aws-sdk');\nvar path = require('path');\n\n/* == Globals == */\nvar esDomain = {\n    region: 'us-east-1',\n    endpoint: 'my-domain-search-endpoint',\n    index: 'myindex',\n    doctype: 'mytype'\n};\nvar endpoint = new AWS.Endpoint(esDomain.endpoint);\n/*\n * The AWS credentials are picked up from the environment.\n * They belong to the IAM role assigned to the Lambda function.\n * Since the ES requests are signed using these credentials,\n * make sure to apply a policy that allows ES domain operations\n * to the role.\n */\nvar creds = new AWS.EnvironmentCredentials('AWS');\n\n\n/* Lambda \"main\": Execution begins here */\nexports.handler = function(event, context) {\n    console.log(JSON.stringify(event, null, '  '));\n    event.Records.forEach(function(record) {\n        var jsonDoc = new Buffer(record.kinesis.data, 'base64');\n        postToES(jsonDoc.toString(), context);\n    });\n}\n\n\n/*\n * Post the given document to Elasticsearch\n */\nfunction postToES(doc, context) {\n    var req = new AWS.HttpRequest(endpoint);\n\n    req.method = 'POST';\n    req.path = path.join('/', esDomain.index, esDomain.doctype);\n    req.region = esDomain.region;\n    req.headers['presigned-expires'] = false;\n    req.headers['Host'] = endpoint.host;\n    req.body = doc;\n\n    var signer = new AWS.Signers.V4(req , 'es');  // es: service code\n    signer.addAuthorization(creds, new Date());\n\n    var send = new AWS.NodeHttpClient();\n    send.handleRequest(req, null, function(httpResp) {\n        var respBody = '';\n        httpResp.on('data', function (chunk) {\n            respBody += chunk;\n        });\n        httpResp.on('end', function (chunk) {\n            console.log('Response: ' + respBody);\n            context.succeed('Lambda added document ' + doc);\n        });\n    }, function(err) {\n        console.log('Error: ' + err);\n        context.fail('Lambda failed with error ' + err);\n    });\n}\n"
  },
  {
    "path": "src/s3_lambda_es.js",
    "content": "/*\n * Sample node.js code for AWS Lambda to get Apache log files from S3, parse\n * and add them to an Amazon Elasticsearch Service domain.\n *\n * Copyright 2015 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n * SPDX-License-Identifier: MIT-0\n */\n\n/* Imports */\nvar AWS = require('aws-sdk');\nvar LineStream = require('byline').LineStream;\nvar parse = require('clf-parser');  // Apache Common Log Format\nvar path = require('path');\nvar stream = require('stream');\n\n/* Globals */\nvar esDomain = {\n    endpoint: 'my-search-endpoint.amazonaws.com',\n    region: 'my-region',\n    index: 'logs',\n    doctype: 'apache'\n};\nvar endpoint =  new AWS.Endpoint(esDomain.endpoint);\nvar s3 = new AWS.S3();\nvar totLogLines = 0;    // Total number of log lines in the file\nvar numDocsAdded = 0;   // Number of log lines added to ES so far\n\n/*\n * The AWS credentials are picked up from the environment.\n * They belong to the IAM role assigned to the Lambda function.\n * Since the ES requests are signed using these credentials,\n * make sure to apply a policy that permits ES domain operations\n * to the role.\n */\nvar creds = new AWS.EnvironmentCredentials('AWS');\n\n/*\n * Get the log file from the given S3 bucket and key.  Parse it and add\n * each log record to the ES domain.\n */\nfunction s3LogsToES(bucket, key, context, lineStream, recordStream) {\n    // Note: The Lambda function should be configured to filter for .log files\n    // (as part of the Event Source \"suffix\" setting).\n\n    var s3Stream = s3.getObject({Bucket: bucket, Key: key}).createReadStream();\n\n    // Flow: S3 file stream -> Log Line stream -> Log Record stream -> ES\n    s3Stream\n      .pipe(lineStream)\n      .pipe(recordStream)\n      .on('data', function(parsedEntry) {\n          postDocumentToES(parsedEntry, context);\n      });\n\n    s3Stream.on('error', function() {\n        console.log(\n            'Error getting object \"' + key + '\" from bucket \"' + bucket + '\".  ' +\n            'Make sure they exist and your bucket is in the same region as this function.');\n        context.fail();\n    });\n}\n\n/*\n * Add the given document to the ES domain.\n * If all records are successfully added, indicate success to lambda\n * (using the \"context\" parameter).\n */\nfunction postDocumentToES(doc, context) {\n    var req = new AWS.HttpRequest(endpoint);\n\n    req.method = 'POST';\n    req.path = path.join('/', esDomain.index, esDomain.doctype);\n    req.region = esDomain.region;\n    req.body = doc;\n    req.headers['presigned-expires'] = false;\n    req.headers['Host'] = endpoint.host;\n\n    // Sign the request (Sigv4)\n    var signer = new AWS.Signers.V4(req, 'es');\n    signer.addAuthorization(creds, new Date());\n\n    // Post document to ES\n    var send = new AWS.NodeHttpClient();\n    send.handleRequest(req, null, function(httpResp) {\n        var body = '';\n        httpResp.on('data', function (chunk) {\n            body += chunk;\n        });\n        httpResp.on('end', function (chunk) {\n            numDocsAdded ++;\n            if (numDocsAdded === totLogLines) {\n                // Mark lambda success.  
If not done so, it will be retried.\n                console.log('All ' + numDocsAdded + ' log records added to ES.');\n                context.succeed();\n            }\n        });\n    }, function(err) {\n        console.log('Error: ' + err);\n        console.log(numDocsAdded + 'of ' + totLogLines + ' log records added to ES.');\n        context.fail();\n    });\n}\n\n/* Lambda \"main\": Execution starts here */\nexports.handler = function(event, context) {\n    console.log('Received event: ', JSON.stringify(event, null, 2));\n    \n    /* == Streams ==\n    * To avoid loading an entire (typically large) log file into memory,\n    * this is implemented as a pipeline of filters, streaming log data\n    * from S3 to ES.\n    * Flow: S3 file stream -> Log Line stream -> Log Record stream -> ES\n    */\n    var lineStream = new LineStream();\n    // A stream of log records, from parsing each log line\n    var recordStream = new stream.Transform({objectMode: true})\n    recordStream._transform = function(line, encoding, done) {\n        var logRecord = parse(line.toString());\n        var serializedRecord = JSON.stringify(logRecord);\n        this.push(serializedRecord);\n        totLogLines ++;\n        done();\n    }\n\n    event.Records.forEach(function(record) {\n        var bucket = record.s3.bucket.name;\n        var objKey = decodeURIComponent(record.s3.object.key.replace(/\\+/g, ' '));\n        s3LogsToES(bucket, objKey, context, lineStream, recordStream);\n    });\n}\n"
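\n/*\n * For reference, a minimal sketch of the S3 event shape this handler reads;\n * real events carry more fields, and the bucket/key values below are just\n * placeholders:\n *\n * {\n *   \"Records\": [\n *     { \"s3\": { \"bucket\": { \"name\": \"<bucket name>\" },\n *               \"object\": { \"key\": \"<object key>.log\" } } }\n *   ]\n * }\n */\n"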
  }
]