Repository: awslabs/amazon-elasticsearch-lambda-samples
Branch: master
Commit: 390017daa370
Files: 4
Total size: 14.2 KB

Directory structure:
gitextract_oa40gt6d/

├── LICENSE.txt
├── README.md
└── src/
    ├── kinesis_lambda_es.js
    └── s3_lambda_es.js

================================================
FILE CONTENTS
================================================

================================================
FILE: LICENSE.txt
================================================
MIT No Attribution

Copyright 2015 Amazon.com, Inc. or its affiliates. All Rights Reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this
software and associated documentation files (the "Software"), to deal in the Software
without restriction, including without limitation the rights to use, copy, modify,
merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


================================================
FILE: README.md
================================================
# Streaming Data to Amazon Elasticsearch Service
## Using AWS Lambda: Sample Node.js Code

### Introduction
It is often useful to stream data, as it is generated, for indexing in an
Amazon Elasticsearch Service domain.  This keeps fresh data available for
search or analytics.  Doing this requires:

1. Knowing when new data is available
2. Code to pick up and parse the data into JSON documents, and add them to an
   Amazon Elasticsearch Service (henceforth, ES for short) domain
3. Scalable and fully managed infrastructure to host this code

*Lambda* is an AWS service that takes care of these requirements.  Put simply,
it is an "event handling" service in the cloud.  Lambda lets us implement
the event handler (in Node.js or Java), which it hosts and invokes in response
to an event.

The handler can be triggered by a "push" or a "pull" approach.
Certain event sources (such as S3) push an event notification to Lambda.
Others (such as Kinesis) require Lambda to poll for events and pull them
when available.

For more details on AWS Lambda, please see
[the documentation](http://aws.amazon.com/documentation/lambda/).

This package contains sample Lambda code (in Node.js) to stream data to ES
from two common AWS data sources: S3 and Kinesis.  The S3 sample takes Apache
log files, parses them into JSON documents and adds them to ES.  The Kinesis
sample reads JSON data from the stream and adds it to ES.

Note that the sample code has been kept simple for the sake of clarity.  It
does not handle ES document batching, eventual-consistency issues for S3
updates, etc.

### Setup Overview

While some detailed instructions are covered later in this file and elsewhere
(in the Lambda documentation), this section aims to show the larger picture
that the individual steps work to accomplish.  We assume that the data source
(an S3 bucket or a Kinesis stream, in this case) and an ES domain are already
set up.

1. **Deployment Package**: The "Deployment Package" is the event handler code
   and its dependencies, packaged as a zip file.  The first step in creating
   a new Lambda function is to prepare and upload this zip file.

2. **Lambda Configuration**:

   1. Handler: The name of the main code file in the deployment package,
      without its extension, followed by `.handler` (e.g. `s3_lambda_es.handler`
      for `s3_lambda_es.js`).
   2. Memory: The memory limit, which also determines the compute resources
      allocated to the function.  For now, the default should do.
   3. Timeout: The default timeout value (3 seconds) is quite low for our
      use-case.  10 seconds might work better, but please adjust based on
      your testing.

3. **Authorization**: Since various AWS services need to make calls to each
   other here, appropriate authorization is required.  This takes the form of
   an IAM role to which various authorization policies are attached.  The
   Lambda function assumes this role when running.

Note:

* The AWS Console is simpler to use for configuration than other methods.
* Lambda is currently available only in a few regions (at the time of writing:
  us-east-1, us-west-2, eu-west-1, ap-northeast-1).
* Once the setup is complete and tested, enable the data source in the Lambda
  console, so that data may start streaming in.
* The code is kept simple for purposes of illustration.  It doesn't batch
  documents when loading the ES domain, or (for S3 updates) handle
  eventual consistency cases.
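
One improvement the last note alludes to is batching documents with the ES
bulk API instead of issuing one HTTP request per document.  A minimal sketch
of building a `_bulk` request body (the helper name and structure here are
illustrative, not part of the samples):

```javascript
// Sketch: build an Elasticsearch _bulk request body (newline-delimited JSON).
// Each document is preceded by an "index" action line, and the body must end
// with a trailing newline. Helper name is illustrative.
function buildBulkBody(index, doctype, docs) {
    return docs.map(function (doc) {
        return JSON.stringify({ index: { _index: index, _type: doctype } }) +
            '\n' + JSON.stringify(doc);
    }).join('\n') + '\n';
}

// POST the result to the domain's /_bulk path instead of posting each
// document individually.
var body = buildBulkBody('logs', 'apache', [{ status: 200 }, { status: 404 }]);
```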

#### Deployment Package Creation
1. On your development machine, download and install [Node.js](https://nodejs.org/en/).
2. Anywhere, create a directory structure similar to the following:

       eslambda (place sample code here)
       |
       +-- node_modules (dependencies will go here)

3. Modify the sample code with the correct ES endpoint, region, index
   and document type.
4. Install each dependency imported by the sample code
   (with the `require()` call), as follows:

       npm install <dependency>

   Verify that these are installed within the `node_modules` subdirectory.
5. Create a zip file to package the code and the `node_modules` subdirectory:

       zip -r eslambda.zip *

The zip file thus created is the Lambda Deployment Package.

## S3-Lambda-ES

Set up the Lambda function and the S3 bucket as described in the
[Lambda-S3 Walkthrough](http://docs.aws.amazon.com/lambda/latest/dg/walkthrough-s3-events-adminuser.html).
Please keep in mind the following notes and configuration overrides:

* The walkthrough uses the AWS CLI for configuration, but it's probably more
  convenient to use the AWS Console (web UI).

* The S3 bucket must be created in the same region as the Lambda function, so
  that it can push events to Lambda.

* When registering the S3 bucket as the data source in Lambda, add a filter
  for files with a `.log` suffix, so that Lambda picks up only Apache log files.

* The following authorizations are required:

  1. Lambda permits S3 to push event notifications to it
  2. S3 permits Lambda to fetch the created objects from a given bucket
  3. ES permits Lambda to add documents to the given domain

  The Lambda console provides a simple way to create an IAM role with policies
  for (1).  For (2), when creating the IAM role, choose the "S3 execution role"
  option; this will load the role with permissions to read from the S3
  bucket.  For (3), add the following access policy to the role to permit
  ES operations.

      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Action": [
                      "es:*"
                  ],
                  "Effect": "Allow",
                  "Resource": "*"
              }
          ]
      }
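
The `es:*` action on `Resource: *` above is broad.  If you prefer least
privilege, a tighter variant scopes the policy to HTTP POSTs against the
specific domain (the account ID and domain name below are placeholders):

      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Action": [
                      "es:ESHttpPost"
                  ],
                  "Effect": "Allow",
                  "Resource": "arn:aws:es:us-east-1:123456789012:domain/my-domain/*"
              }
          ]
      }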


## Kinesis-Lambda-ES

Set up the Lambda function and the Kinesis stream as described in the
[Lambda-Kinesis Walkthrough](http://docs.aws.amazon.com/lambda/latest/dg/walkthrough-kinesis-events-adminuser.html).
Please keep in mind the following notes and configuration overrides:

* The walkthrough uses the AWS CLI, but it's probably more convenient to use
  the AWS Console (web UI) for Lambda configuration.

* To the IAM role assigned to the Lambda function, add the following
  access policy to permit ES operations.

        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": [
                        "es:*"
                    ],
                    "Effect": "Allow",
                    "Resource": "*"
                }
            ]
        }

* For testing: If you have a Kinesis client, use it to put a record on the
  stream.  If not, you can use the AWS CLI to push a JSON document into the
  stream:

      aws kinesis put-record --stream-name <stream name> --data "<JSON document>" --region <region> --partition-key shardId-000000000000
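
  Records arrive at the handler base64-encoded.  A quick local sketch of the
  decoding step the Kinesis sample performs (the event below is a hand-built
  stand-in, not a real Lambda event):

```javascript
// Sketch: decode the base64 payload of a Kinesis event record, as the
// sample handler does. The event object is a hand-built stand-in.
var sampleEvent = {
    Records: [
        { kinesis: { data: Buffer.from('{"msg":"hello"}').toString('base64') } }
    ]
};

sampleEvent.Records.forEach(function (record) {
    // Kinesis delivers each record payload base64-encoded
    var jsonDoc = Buffer.from(record.kinesis.data, 'base64').toString();
    console.log(jsonDoc); // the original JSON document, ready to post to ES
});
```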

## Copyright

Copyright 2015 Amazon.com, Inc. or its affiliates. All Rights Reserved.

SPDX-License-Identifier: MIT-0


================================================
FILE: src/kinesis_lambda_es.js
================================================
/*
 * Sample node.js code for AWS Lambda to upload the JSON documents
 * pushed from Kinesis to Amazon Elasticsearch.
 *
 * Copyright 2015 Amazon.com, Inc. or its affiliates. All Rights Reserved.
 * SPDX-License-Identifier: MIT-0
 */

/* == Imports == */
var AWS = require('aws-sdk');
var path = require('path');

/* == Globals == */
var esDomain = {
    region: 'us-east-1',
    endpoint: 'my-domain-search-endpoint',
    index: 'myindex',
    doctype: 'mytype'
};
var endpoint = new AWS.Endpoint(esDomain.endpoint);
/*
 * The AWS credentials are picked up from the environment.
 * They belong to the IAM role assigned to the Lambda function.
 * Since the ES requests are signed using these credentials,
 * make sure to apply a policy that allows ES domain operations
 * to the role.
 */
var creds = new AWS.EnvironmentCredentials('AWS');


/* Lambda "main": Execution begins here */
exports.handler = function(event, context) {
    console.log(JSON.stringify(event, null, '  '));
    event.Records.forEach(function(record) {
        var jsonDoc = Buffer.from(record.kinesis.data, 'base64');  // base64-decode the Kinesis payload
        postToES(jsonDoc.toString(), context);
    });
}


/*
 * Post the given document to Elasticsearch
 */
function postToES(doc, context) {
    var req = new AWS.HttpRequest(endpoint);

    req.method = 'POST';
    req.path = path.join('/', esDomain.index, esDomain.doctype);
    req.region = esDomain.region;
    req.headers['presigned-expires'] = false;
    req.headers['Host'] = endpoint.host;
    req.body = doc;

    var signer = new AWS.Signers.V4(req, 'es');  // es: service code
    signer.addAuthorization(creds, new Date());

    var send = new AWS.NodeHttpClient();
    send.handleRequest(req, null, function(httpResp) {
        var respBody = '';
        httpResp.on('data', function (chunk) {
            respBody += chunk;
        });
        httpResp.on('end', function (chunk) {
            console.log('Response: ' + respBody);
            context.succeed('Lambda added document ' + doc);
        });
    }, function(err) {
        console.log('Error: ' + err);
        context.fail('Lambda failed with error ' + err);
    });
}


================================================
FILE: src/s3_lambda_es.js
================================================
/*
 * Sample node.js code for AWS Lambda to get Apache log files from S3, parse
 * and add them to an Amazon Elasticsearch Service domain.
 *
 * Copyright 2015 Amazon.com, Inc. or its affiliates. All Rights Reserved.
 * SPDX-License-Identifier: MIT-0
 */

/* Imports */
var AWS = require('aws-sdk');
var LineStream = require('byline').LineStream;
var parse = require('clf-parser');  // Apache Common Log Format
var path = require('path');
var stream = require('stream');

/* Globals */
var esDomain = {
    endpoint: 'my-search-endpoint.amazonaws.com',
    region: 'my-region',
    index: 'logs',
    doctype: 'apache'
};
var endpoint =  new AWS.Endpoint(esDomain.endpoint);
var s3 = new AWS.S3();
var totLogLines = 0;    // Total number of log lines in the file
var numDocsAdded = 0;   // Number of log lines added to ES so far

/*
 * The AWS credentials are picked up from the environment.
 * They belong to the IAM role assigned to the Lambda function.
 * Since the ES requests are signed using these credentials,
 * make sure to apply a policy that permits ES domain operations
 * to the role.
 */
var creds = new AWS.EnvironmentCredentials('AWS');

/*
 * Get the log file from the given S3 bucket and key.  Parse it and add
 * each log record to the ES domain.
 */
function s3LogsToES(bucket, key, context, lineStream, recordStream) {
    // Note: The Lambda function should be configured to filter for .log files
    // (as part of the Event Source "suffix" setting).

    var s3Stream = s3.getObject({Bucket: bucket, Key: key}).createReadStream();

    // Flow: S3 file stream -> Log Line stream -> Log Record stream -> ES
    s3Stream
      .pipe(lineStream)
      .pipe(recordStream)
      .on('data', function(parsedEntry) {
          postDocumentToES(parsedEntry, context);
      });

    s3Stream.on('error', function() {
        console.log(
            'Error getting object "' + key + '" from bucket "' + bucket + '".  ' +
            'Make sure they exist and your bucket is in the same region as this function.');
        context.fail();
    });
}

/*
 * Add the given document to the ES domain.
 * If all records are successfully added, indicate success to lambda
 * (using the "context" parameter).
 */
function postDocumentToES(doc, context) {
    var req = new AWS.HttpRequest(endpoint);

    req.method = 'POST';
    req.path = path.join('/', esDomain.index, esDomain.doctype);
    req.region = esDomain.region;
    req.body = doc;
    req.headers['presigned-expires'] = false;
    req.headers['Host'] = endpoint.host;

    // Sign the request (Sigv4)
    var signer = new AWS.Signers.V4(req, 'es');
    signer.addAuthorization(creds, new Date());

    // Post document to ES
    var send = new AWS.NodeHttpClient();
    send.handleRequest(req, null, function(httpResp) {
        var body = '';
        httpResp.on('data', function (chunk) {
            body += chunk;
        });
        httpResp.on('end', function (chunk) {
            numDocsAdded++;
            if (numDocsAdded === totLogLines) {
                // Mark lambda success.  If not done so, it will be retried.
                console.log('All ' + numDocsAdded + ' log records added to ES.');
                context.succeed();
            }
        });
    }, function(err) {
        console.log('Error: ' + err);
        console.log(numDocsAdded + ' of ' + totLogLines + ' log records added to ES.');
        context.fail();
    });
}

/* Lambda "main": Execution starts here */
exports.handler = function(event, context) {
    console.log('Received event: ', JSON.stringify(event, null, 2));
    
    /* == Streams ==
    * To avoid loading an entire (typically large) log file into memory,
    * this is implemented as a pipeline of filters, streaming log data
    * from S3 to ES.
    * Flow: S3 file stream -> Log Line stream -> Log Record stream -> ES
    */
    var lineStream = new LineStream();
    // A stream of log records, from parsing each log line
    var recordStream = new stream.Transform({objectMode: true});
    recordStream._transform = function(line, encoding, done) {
        var logRecord = parse(line.toString());
        var serializedRecord = JSON.stringify(logRecord);
        this.push(serializedRecord);
        totLogLines++;
        done();
    };

    event.Records.forEach(function(record) {
        var bucket = record.s3.bucket.name;
        var objKey = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
        s3LogsToES(bucket, objKey, context, lineStream, recordStream);
    });
}