[
  {
    "path": "Dockerfile",
    "content": "FROM    alpine:3.3\n\n# Here we use several hacks collected from https://github.com/gliderlabs/docker-alpine/issues/11:\n# 1. install glibc (which is not the cleanest solution at all)\n\n\n# Build variables\nENV     FILEBEAT_VERSION 1.1.1\nENV     FILEBEAT_URL https://download.elastic.co/beats/filebeat/filebeat-${FILEBEAT_VERSION}-x86_64.tar.gz\n\n# Environment variables\nENV     FILEBEAT_HOME /opt/filebeat-${FILEBEAT_VERSION}-x86_64\nENV     PATH $PATH:${FILEBEAT_HOME}\n\nWORKDIR /opt/\n\nRUN     apk add --update python curl && \\\n        wget \"https://circle-artifacts.com/gh/andyshinn/alpine-pkg-glibc/6/artifacts/0/home/ubuntu/alpine-pkg-glibc/packages/x86_64/glibc-2.21-r2.apk\" \\\n             \"https://circle-artifacts.com/gh/andyshinn/alpine-pkg-glibc/6/artifacts/0/home/ubuntu/alpine-pkg-glibc/packages/x86_64/glibc-bin-2.21-r2.apk\" && \\\n        apk add --allow-untrusted glibc-2.21-r2.apk glibc-bin-2.21-r2.apk && \\\n        /usr/glibc/usr/bin/ldconfig /lib /usr/glibc/usr/lib\n\nRUN     curl -sL ${FILEBEAT_URL} | tar xz -C .\nCOPY    filebeat.yml ${FILEBEAT_HOME}/\nCOPY    docker-entrypoint.sh    /entrypoint.sh\nRUN     chmod +x /entrypoint.sh\n\nENTRYPOINT  [\"/entrypoint.sh\"]\nCMD         [\"start\"]\n"
  },
  {
    "path": "README.md",
    "content": "# What is Filebeat?\nFilebeat is a lightweight, open source shipper for log file data. As the next-generation Logstash Forwarder, Filebeat tails logs and quickly sends this information to Logstash for further parsing and enrichment.\n\n![Filebeat logo](https://static-www.elastic.co/assets/blta28996a125bb8b42/packetbeat-fish-nodes-bkgd.png?q=755 \"Filebeat logo\")\n\n> https://www.elastic.co/products/beats/filebeat\n\n\n# Why this image?\n\nThis image uses the Docker API to collect the logs of all the running containers on the same machine and ship them to Logstash. There is no need to install Filebeat manually on your host or inside your images: just use this image to create a container that handles everything for you :-)\n\n\n# How to use this image\nStart Filebeat as follows:\n\n```\n$ docker run -d \\\n   -v /var/run/docker.sock:/tmp/docker.sock \\\n   -e LOGSTASH_HOST=monitoring.xyz -e LOGSTASH_PORT=5044 -e SHIPPER_NAME=$(hostname) \\\n   bargenson/filebeat\n```\n\nThe image reads three environment variables:\n* `LOGSTASH_HOST` (required): the host on which your Logstash instance runs\n* `LOGSTASH_PORT` (required): the port on which your Logstash instance listens for Beats input\n* `SHIPPER_NAME` (optional): the Filebeat shipper name (default: the container's hostname, which Docker sets to the short container ID)\n\nA docker-compose service definition looks as follows:\n```\nfilebeat:\n  image: bargenson/filebeat\n  restart: unless-stopped\n  volumes:\n   - /var/run/docker.sock:/tmp/docker.sock\n  environment:\n   - LOGSTASH_HOST=monitoring.xyz\n   - LOGSTASH_PORT=5044\n   - SHIPPER_NAME=aWonderfulName\n```\n\n\n# Logstash configuration\n\nConfigure the Beats input plugin as follows:\n\n```\ninput {\n  beats {\n    port => 5044\n  }\n}\n```\n\nTo extract a `containerName` field and strip the `[containerName] ` prefix from the `message` field, declare the following filter:\n\n```\nfilter {\n\n  if [type] == \"filebeat-docker-logs\" {\n\n    grok {\n      match => {\n        \"message\" => \"\\[%{DATA:containerName}\\] 
%{GREEDYDATA:message_remainder}\"\n      }\n    }\n\n    mutate {\n      replace => { \"message\" => \"%{message_remainder}\" }\n    }\n\n    mutate {\n      remove_field => [ \"message_remainder\" ]\n    }\n\n  }\n\n}\n```\n\n\n# User Feedback\n## Issues\nIf you have any problems with or questions about this image, please open a [GitHub issue](https://github.com/bargenson/docker-filebeat/issues).\n\n## Contributing\nYou are welcome to contribute new features, fixes, or updates, large or small, through the [GitHub repo](https://github.com/bargenson/docker-filebeat)."
  },
  {
    "path": "docker-compose.yml",
    "content": "filebeat:\n  image: bargenson/filebeat:latest\n  restart: unless-stopped\n  volumes:\n   - /var/run/docker.sock:/tmp/docker.sock\n  environment:\n   - LOGSTASH_HOST=logstash.localdomain\n   - LOGSTASH_PORT=5044\n   - SHIPPER_NAME=aWonderfulName\n"
  },
  {
    "path": "docker-entrypoint.sh",
    "content": "#!/bin/sh\nset -e\n\nif [ \"$1\" = 'start' ]; then\n\n  CONTAINERS_FOLDER=/tmp/containers\n  NAMED_PIPE=/tmp/pipe\n\n  # Replace the {{KEY}} placeholders in filebeat.yml with the given value\n  setConfiguration() {\n    KEY=$1\n    VALUE=$2\n    sed -i \"s/{{$KEY}}/$VALUE/g\" \"${FILEBEAT_HOME}/filebeat.yml\"\n  }\n\n  # Print the IDs of all running containers, one per line\n  getRunningContainers() {\n    curl --no-buffer -s -XGET --unix-socket /tmp/docker.sock http:/containers/json | python -c \"\nimport json, sys\ncontainers=json.load(sys.stdin)\nfor container in containers:\n  print(container['Id'])\n\"\n  }\n\n  # Print a container's name without its leading slash\n  getContainerName() {\n    curl --no-buffer -s -XGET --unix-socket /tmp/docker.sock http:/containers/$1/json | python -c \"\nimport json, sys\ncontainer=json.load(sys.stdin)\nprint(container['Name'])\n\" | sed 's;/;;'\n  }\n\n  createContainerFile() {\n    touch \"$CONTAINERS_FOLDER/$1\"\n  }\n\n  removeContainerFile() {\n    rm -f \"$CONTAINERS_FOLDER/$1\"\n  }\n\n  # Follow a container's logs (starting from its last line), prefix each line\n  # with [containerName] and write it to the named pipe read by Filebeat\n  collectContainerLogs() {\n    local CONTAINER=$1\n    echo \"Processing $CONTAINER...\"\n    createContainerFile \"$CONTAINER\"\n    CONTAINER_NAME=$(getContainerName \"$CONTAINER\")\n    curl -s --no-buffer -XGET --unix-socket /tmp/docker.sock \"http:/containers/$CONTAINER/logs?stderr=1&stdout=1&tail=1&follow=1\" | sed \"s;^;[$CONTAINER_NAME] ;\" > \"$NAMED_PIPE\"\n    echo \"Disconnected from $CONTAINER.\"\n    removeContainerFile \"$CONTAINER\"\n  }\n\n  if [ -n \"$LOGSTASH_HOST\" ]; then\n    setConfiguration \"LOGSTASH_HOST\" \"$LOGSTASH_HOST\"\n  else\n    echo \"LOGSTASH_HOST is required\" >&2\n    exit 1\n  fi\n\n  if [ -n \"$LOGSTASH_PORT\" ]; then\n    setConfiguration \"LOGSTASH_PORT\" \"$LOGSTASH_PORT\"\n  else\n    echo \"LOGSTASH_PORT is required\" >&2\n    exit 1\n  fi\n\n  if [ -n \"$SHIPPER_NAME\" ]; then\n    setConfiguration \"SHIPPER_NAME\" \"$SHIPPER_NAME\"\n  else\n    setConfiguration \"SHIPPER_NAME\" \"$(hostname)\"\n  fi\n\n  rm -rf \"$CONTAINERS_FOLDER\"\n  rm -rf \"$NAMED_PIPE\"\n  mkdir \"$CONTAINERS_FOLDER\"\n  mkfifo -m a=rw \"$NAMED_PIPE\"\n\n  # Keep a read-write handle on the pipe so that Filebeat's input does not reach\n  # EOF when all container log streams have disconnected\n  exec 3<>\"$NAMED_PIPE\"\n\n  echo \"Initializing 
Filebeat...\"\n  cat \"$NAMED_PIPE\" | ${FILEBEAT_HOME}/filebeat -e -v &\n\n  # Every 5 seconds, start following any running container that is not followed yet\n  while true; do\n    CONTAINERS=$(getRunningContainers)\n    for CONTAINER in $CONTAINERS; do\n      if [ ! -e \"$CONTAINERS_FOLDER/$CONTAINER\" ]; then\n        collectContainerLogs \"$CONTAINER\" &\n      fi\n    done\n    sleep 5\n  done\n\nelse\n  exec \"$@\"\nfi\n\n"
  },
  {
    "path": "filebeat.yml",
    "content": "################### Filebeat Configuration Example #########################\n\n############################# Filebeat ######################################\nfilebeat:\n  # List of prospectors to fetch data.\n  prospectors:\n    # Each - is a prospector. Below are the prospector-specific configurations\n    -\n      # Paths that should be crawled and fetched. Glob-based paths.\n      # To fetch all \".log\" files from a specific level of subdirectories,\n      # /var/log/*/*.log can be used.\n      # For each file found under this path, a harvester is started.\n      # Make sure no file is defined twice, as this can lead to unexpected behaviour.\n      #paths:\n        #- /var/log/*.log\n        #- c:\\programdata\\elasticsearch\\logs\\*\n\n      # Configure the file encoding for reading files with international characters\n      # following the W3C recommendation for HTML5 (http://www.w3.org/TR/encoding).\n      # Some sample encodings:\n      #   plain, utf-8, utf-16be-bom, utf-16be, utf-16le, big5, gb18030, gbk,\n      #    hz-gb-2312, euc-kr, euc-jp, iso-2022-jp, shift-jis, ...\n      #encoding: plain\n\n      # Type of the files. This decides how the file is read.\n      # The different types cannot be mixed in one prospector.\n      #\n      # Possible options are:\n      # * log: Reads every line of the log file (default)\n      # * stdin: Reads the standard input\n      input_type: stdin\n\n      # Exclude lines. A list of regular expressions to match. It drops the lines that\n      # match any regular expression from the list. include_lines is applied before\n      # exclude_lines. By default, no lines are dropped.\n      # exclude_lines: [\"^DBG\"]\n\n      # Include lines. A list of regular expressions to match. It exports the lines that\n      # match any regular expression from the list. include_lines is applied before\n      # exclude_lines. 
By default, all the lines are exported.\n      # include_lines: [\"^ERR\", \"^WARN\"]\n\n      # Exclude files. A list of regular expressions to match. Filebeat drops the files that\n      # match any regular expression from the list. By default, no files are dropped.\n      # exclude_files: [\".gz$\"]\n\n      # Optional additional fields. These fields can be freely picked\n      # to add additional information to the crawled log files for filtering.\n      #fields:\n      #  level: debug\n      #  review: 1\n\n      # Set to true to store the additional fields as top-level fields instead\n      # of under the \"fields\" sub-dictionary. In case of name conflicts with the\n      # fields added by Filebeat itself, the custom fields overwrite the default\n      # fields.\n      #fields_under_root: false\n\n      # Ignore files which were modified more than the defined timespan in the past.\n      # Time strings like 2h (2 hours), 5m (5 minutes) can be used.\n      #ignore_older: 24h\n\n      # Type to be published in the 'type' field. For Elasticsearch output,\n      # the type defines the document type these entries should be stored\n      # in. Default: log\n      document_type: filebeat-docker-logs\n\n      # Scan frequency in seconds.\n      # How often these files should be checked for changes. If it is set\n      # to 0s, it is done as often as possible. Default: 10s\n      #scan_frequency: 10s\n\n      # Defines the buffer size every harvester uses when fetching the file\n      #harvester_buffer_size: 16384\n\n      # Maximum number of bytes a single log event can have.\n      # All bytes after max_bytes are discarded and not sent. The default is 10MB.\n      # This is especially useful for multiline log messages which can get large.\n      #max_bytes: 10485760\n\n      # Multiline can be used for log messages spanning multiple lines. 
This is common\n      # for Java stack traces or C line continuations.\n      #multiline:\n\n        # The regexp pattern that has to be matched. The example pattern matches all lines starting with [\n        #pattern: ^\\[\n\n        # Defines if the pattern set under pattern should be negated or not. Default is false.\n        #negate: false\n\n        # Match can be set to \"after\" or \"before\". It is used to define if lines should be appended to a pattern\n        # that was (not) matched before or after or as long as a pattern is not matched based on negate.\n        # Note: After is the equivalent to previous and before is the equivalent to next in Logstash\n        #match: after\n\n        # The maximum number of lines that are combined into one event.\n        # If there are more than max_lines, the additional lines are discarded.\n        # Default is 500.\n        #max_lines: 500\n\n        # After the defined timeout, a multiline event is sent even if no new pattern was found to start a new event.\n        # Default is 5s.\n        #timeout: 5s\n\n      # Setting tail_files to true means filebeat starts reading new files at the end\n      # instead of the beginning. If this is used in combination with log rotation\n      # this can mean that the first entries of a new file are skipped.\n      #tail_files: false\n\n      # Backoff values define how aggressively filebeat crawls new files for updates.\n      # The default values can be used in most cases. Backoff defines how long to wait\n      # before checking a file again after EOF is reached. Default is 1s, which means the file\n      # is checked every second if new lines were added. This leads to near-real-time crawling.\n      # Every time a new line appears, backoff is reset to the initial value.\n      #backoff: 1s\n\n      # Max backoff defines what the maximum backoff time is. 
After having backed off multiple times\n      # from checking the files, the waiting time will never exceed max_backoff, independent of the\n      # backoff factor. With it set to 10s, in the worst case, if a new line is added to a log\n      # file after having backed off multiple times, it takes a maximum of 10s to read the new line.\n      #max_backoff: 10s\n\n      # The backoff factor defines how fast the algorithm backs off. The bigger the backoff factor,\n      # the faster the max_backoff value is reached. If this value is set to 1, no backoff will happen.\n      # The backoff value will be multiplied each time with the backoff_factor until max_backoff is reached.\n      #backoff_factor: 2\n\n      # This option closes a file as soon as the file name changes.\n      # This config option is recommended on Windows only. Filebeat keeps the files it's reading open. This can cause\n      # issues when the file is removed, as the file will not be fully removed until Filebeat also closes\n      # it. Filebeat closes the file handle after ignore_older. During this time no new file with the\n      # same name can be created. Turning this feature on, on the other hand, can lead to loss of data\n      # on rotated files. It can happen that after file rotation the beginning of the new\n      # file is skipped, as the reading starts at the end. We recommend leaving this option set to false\n      # but lowering the ignore_older value to release files faster.\n      #force_close_files: false\n\n    # Additional prospector\n    #-\n      # Configuration to use stdin input\n      #input_type: stdin\n\n  # General filebeat configuration options\n  #\n  # Event count spool threshold - forces network flush if exceeded\n  #spool_size: 2048\n\n  # Defines how often the spooler is flushed. After idle_timeout the spooler is\n  # flushed even though spool_size is not reached.\n  #idle_timeout: 5s\n\n  # Name of the registry file. 
By default it is put in the current working\n  # directory. If the working directory changes between\n  # filebeat runs, indexing starts from the beginning again.\n  #registry_file: .filebeat\n\n  # Full path to a directory with additional prospector configuration files. Each file must end with .yml\n  # These config files must have the full filebeat config part inside, but only\n  # the prospector part is processed. All global options like spool_size are ignored.\n  # The config_dir MUST point to a different directory than the one containing the main filebeat config file.\n  #config_dir:\n\n###############################################################################\n############################# Libbeat Config ##################################\n# Base config file used by all other beats for using libbeat features\n\n############################# Output ##########################################\n\n# Configure what outputs to use when sending the data collected by the beat.\n# Multiple outputs may be used.\noutput:\n\n  ### Elasticsearch as output\n  #elasticsearch:\n    # Array of hosts to connect to.\n    # Scheme and port can be left out and will be set to the default (http and 9200).\n    # If you specify an additional path, the scheme is required: http://localhost:9200/path\n    # IPv6 addresses should always be defined as: https://[2001:db8::1]:9200\n    #hosts: [\"localhost:9200\"]\n\n    # Optional protocol and basic auth credentials.\n    #protocol: \"https\"\n    #username: \"admin\"\n    #password: \"s3cr3t\"\n\n    # Number of workers per Elasticsearch host.\n    #worker: 1\n\n    # Optional index name. The default is \"filebeat\" and generates\n    # [filebeat-]YYYY.MM.DD keys.\n    #index: \"filebeat\"\n\n    # Optional HTTP path\n    #path: \"/elasticsearch\"\n\n    # Proxy server URL\n    #proxy_url: http://proxy:3128\n\n    # The number of times a particular Elasticsearch index operation is attempted. 
If\n    # the indexing operation doesn't succeed after this many retries, the events are\n    # dropped. The default is 3.\n    #max_retries: 3\n\n    # The maximum number of events to bulk in a single Elasticsearch bulk API index request.\n    # The default is 50.\n    #bulk_max_size: 50\n\n    # Configure the HTTP request timeout before failing a request to Elasticsearch.\n    #timeout: 90\n\n    # The number of seconds to wait for new events between two bulk API index requests.\n    # If `bulk_max_size` is reached before this interval expires, additional bulk index\n    # requests are made.\n    #flush_interval: 1\n\n    # Boolean that sets if the topology is kept in Elasticsearch. The default is\n    # false. This option makes sense only for Packetbeat.\n    #save_topology: false\n\n    # The time to live in seconds for the topology information that is stored in\n    # Elasticsearch. The default is 15 seconds.\n    #topology_expire: 15\n\n    # TLS configuration. By default it is off.\n    #tls:\n      # List of root certificates for HTTPS server verifications\n      #certificate_authorities: [\"/etc/pki/root/ca.pem\"]\n\n      # Certificate for TLS client authentication\n      #certificate: \"/etc/pki/client/cert.pem\"\n\n      # Client certificate key\n      #certificate_key: \"/etc/pki/client/cert.key\"\n\n      # Controls whether the client verifies server certificates and host name.\n      # If insecure is set to true, all server host names and certificates will be\n      # accepted. In this mode TLS-based connections are susceptible to\n      # man-in-the-middle attacks. 
Use only for testing.\n      #insecure: true\n\n      # Configure cipher suites to be used for TLS connections\n      #cipher_suites: []\n\n      # Configure curve types for ECDHE-based cipher suites\n      #curve_types: []\n\n      # Configure minimum TLS version allowed for connection to Elasticsearch\n      #min_version: 1.0\n\n      # Configure maximum TLS version allowed for connection to Elasticsearch\n      #max_version: 1.2\n\n\n  ### Logstash as output\n  logstash:\n    # The Logstash hosts\n    hosts: [\"{{LOGSTASH_HOST}}:{{LOGSTASH_PORT}}\"]\n\n    # Number of workers per Logstash host.\n    #worker: 1\n\n    # Set gzip compression level.\n    #compression_level: 3\n\n    # Optionally load balance the events between the Logstash hosts\n    #loadbalance: true\n\n    # Optional index name. The default index name depends on the beat.\n    # For Packetbeat, the default is set to packetbeat, for Topbeat\n    # to topbeat and for Filebeat to filebeat.\n    #index: filebeat\n\n    # Optional TLS. By default it is off.\n    #tls:\n      # List of root certificates for HTTPS server verifications\n      #certificate_authorities: [\"/etc/pki/root/ca.pem\"]\n\n      # Certificate for TLS client authentication\n      #certificate: \"/etc/pki/client/cert.pem\"\n\n      # Client certificate key\n      #certificate_key: \"/etc/pki/client/cert.key\"\n\n      # Controls whether the client verifies server certificates and host name.\n      # If insecure is set to true, all server host names and certificates will be\n      # accepted. In this mode TLS-based connections are susceptible to\n      # man-in-the-middle attacks. Use only for testing.\n      #insecure: true\n\n      # Configure cipher suites to be used for TLS connections\n      #cipher_suites: []\n\n      # Configure curve types for ECDHE-based cipher suites\n      #curve_types: []\n\n\n  ### File as output\n  #file:\n    # Path to the directory where the generated files are saved. 
The option is mandatory.\n    #path: \"/tmp/filebeat\"\n\n    # Name of the generated files. The default is `filebeat` and it generates files: `filebeat`, `filebeat.1`, `filebeat.2`, etc.\n    #filename: filebeat\n\n    # Maximum size in kilobytes of each file. When this size is reached, the files are\n    # rotated. The default value is 10 MB.\n    #rotate_every_kb: 10000\n\n    # Maximum number of files under path. When this number of files is reached, the\n    # oldest file is deleted and the rest are shifted from last to first. The default\n    # is 7 files.\n    #number_of_files: 7\n\n\n  ### Console output\n  # console:\n    # Pretty-print JSON events\n    #pretty: false\n\n\n############################# Shipper #########################################\n\nshipper:\n  # The name of the shipper that publishes the network data. It can be used to group\n  # all the transactions sent by a single shipper in the web interface.\n  # If this option is not defined, the hostname is used.\n  name: {{SHIPPER_NAME}}\n\n  # The tags of the shipper are included in their own field with each\n  # transaction published. Tags make it easy to group servers by different\n  # logical properties.\n  #tags: [\"service-X\", \"web-tier\"]\n\n  # Uncomment the following if you want to ignore transactions created\n  # by the server on which the shipper is installed. This option is useful\n  # to remove duplicates if shippers are installed on multiple servers.\n  #ignore_outgoing: true\n\n  # How often (in seconds) shippers are publishing their IPs to the topology map.\n  # The default is 10 seconds.\n  #refresh_topology_freq: 10\n\n  # Expiration time (in seconds) of the IPs published by a shipper to the topology map.\n  # All the IPs will be deleted afterwards. Note that the value must be higher than\n  # refresh_topology_freq. 
The default is 15 seconds.\n  #topology_expire: 15\n\n  # Internal queue size for single events in the processing pipeline\n  #queue_size: 1000\n\n  # Configure local GeoIP database support.\n  # If no paths are configured, GeoIP is disabled.\n  #geoip:\n    #paths:\n    #  - \"/usr/share/GeoIP/GeoLiteCity.dat\"\n    #  - \"/usr/local/var/GeoIP/GeoLiteCity.dat\"\n\n\n############################# Logging #########################################\n\n# There are three options for the log output: syslog, file, stderr.\n# On Windows systems, logs are sent to the file output by default;\n# on all other systems, they are sent to syslog by default.\nlogging:\n\n  # Send all logging output to syslog. On Windows default is false, otherwise\n  # default is true.\n  #to_syslog: true\n\n  # Write all logging output to files. Beats automatically rotate files if the rotateeverybytes\n  # limit is reached.\n  #to_files: false\n\n  # To enable logging to files, the to_files option has to be set to true\n  files:\n    # The directory the log files will be written to.\n    #path: /var/log/mybeat\n\n    # The name of the files the logs are written to.\n    #name: mybeat\n\n    # Configure log file size limit. If the limit is reached, the log file will be\n    # automatically rotated.\n    rotateeverybytes: 10485760 # = 10MB\n\n    # Number of rotated log files to keep. Oldest files will be deleted first.\n    #keepfiles: 7\n\n  # Enable debug output for selected components. To enable all selectors use [\"*\"].\n  # Other available selectors are beat, publish, service.\n  # Multiple selectors can be chained.\n  #selectors: [ ]\n\n  # Sets log level. The default log level is error.\n  # Available log levels are: critical, error, warning, info, debug\n  #level: error\n\n\n"
  }
]