logstash

grok

Parse arbitrary text and structure it. Grok is currently the best way in logstash to parse crappy unstructured log data (like syslog or apache logs) into something structured and queryable.

This filter requires libgrok to be installed.

You can find libgrok here: http://code.google.com/p/semicomplete/wiki/Grok

Compile/install notes can be found in the INSTALL file of the grok tarball, or here: https://github.com/jordansissel/grok/blob/master/INSTALL

Key dependencies:

  • libtokyocabinet > 1.4.6
  • libpcre >= 7.6
  • libevent >= 1.3 (though older versions may work)

Feature requirements:

  • Int/float coercion requires grok >= 1.20110223.*
  • In-line pattern definitions require grok >= 1.20110630.*
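
As an illustration of int/float coercion, the sketch below assumes a three-part capture form, %{PATTERN:name:type}. That form is not spelled out on this page, so check it against your grok version before relying on it. The field names are hypothetical.

```
filter {
  grok {
    # Assumed syntax: a third, colon-separated part requests coercion.
    # "bytes" and "duration" are hypothetical field names.
    pattern => [ "%{NUMBER:bytes:int} %{NUMBER:duration:float}" ]
  }
}
```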

Note: CentOS 5 ships with an ancient version of pcre that does not work with grok.

Flags

This plugin provides the following flags:
--grok-patterns-path PATH
Colon-delimited path of patterns to load

Synopsis

This is what it might look like in your config file:

filter {
  grok {
    /[A-Za-z0-9_-]+/ => ... # string
    add_field => ... # hash, default: {}
    add_tag => ... # array, default: []
    break_on_match => ... # boolean, default: true
    drop_if_match => ... # boolean
    match => ... # hash, default: {}
    named_captures_only => ... # boolean
    pattern => ... # array
    patterns_dir => ... # array, default: []
    type => ... # string
  }
}

Details

/[A-Za-z0-9_-]+/

  • The configuration attribute name here is anything that matches the above regular expression.
  • Value type is string
  • There is no default value for this setting.

Any existing field name can be used as a setting name here; its value is the pattern to match against that field.

# this config:
foo => "some pattern"

# same as:
match => [ "foo", "some pattern" ]

add_field

  • Value type is hash
  • Default value is {}

If this filter is successful, add any arbitrary fields to this event. Example:

filter {
  myfilter {
    add_field => [ "sample", "Hello world, from %{@source}" ]
  }
}

On success, myfilter adds the field 'sample' with the value above, with %{@source} replaced by the value of that field from the event.

add_tag

  • Value type is array
  • Default value is []

If this filter is successful, add arbitrary tags to the event. Tags can be dynamic and include parts of the event using the %{field} syntax. Example:

filter {
  myfilter {
    add_tag => [ "foo_%{somefield}" ]
  }
}

If the event has a field "somefield" with the value "hello", this filter, on success, adds the tag "foo_hello".

break_on_match

  • Value type is boolean
  • Default value is true

Break on first match. The first successful match by grok will result in the filter being finished. If you want grok to try all patterns (maybe you are parsing different things), then set this to false.
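
A minimal sketch of trying every configured match instead of stopping at the first one. It assumes the 'match' hash can hold several field/pattern pairs in one array. The field 'request_line' is hypothetical, and COMBINEDAPACHELOG and WORD are assumed to be among the bundled patterns.

```
filter {
  grok {
    # Two independent matches; with the default (true), grok would stop
    # after the first one that succeeds.
    match => [
      "@message", "%{COMBINEDAPACHELOG}",
      "request_line", "%{WORD:method}"
    ]
    break_on_match => false
  }
}
```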

drop_if_match

  • Value type is boolean
  • There is no default value for this setting.

Drop if matched. Note, this feature may not stay. It is preferable to combine grok + grep filters to do parsing + dropping.

requested in: googlecode/issue/26
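
For illustration only, since this feature may not stay: a sketch that drops events matching a pattern. The literal 'healthcheck' text and the capture name 'client' are hypothetical.

```
filter {
  grok {
    # Events whose @message matches this pattern are dropped.
    pattern => [ "%{IP:client} healthcheck" ]
    drop_if_match => true
  }
}
```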

match

  • Value type is hash
  • Default value is {}

A hash of matches of field => value.
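
A short sketch of 'match', written in the same array form as the other examples on this page. The field name 'user_agent' and the capture name 'browser' are hypothetical.

```
filter {
  grok {
    # Match the pattern against the 'user_agent' field instead of @message.
    match => [ "user_agent", "%{WORD:browser}" ]
  }
}
```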

named_captures_only

  • Value type is boolean
  • There is no default value for this setting.

If true, only store named captures from grok.
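
A sketch of the effect, assuming a "named capture" means a pattern reference that carries an explicit name, as in %{IP:client}. The names used here are hypothetical.

```
filter {
  grok {
    # %{IP:client} is named, so it would be stored as 'client'.
    # %{NUMBER} has no name, so it would not be stored.
    pattern => [ "%{IP:client} %{NUMBER}" ]
    named_captures_only => true
  }
}
```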

pattern

  • Value type is array
  • There is no default value for this setting.

Specify a pattern to parse with. This will match the '@message' field.

To match fields other than @message, use the 'match' setting. Multiple patterns are fine.
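
For example, a sketch with two patterns for @message. COMBINEDAPACHELOG and COMMONAPACHELOG are assumed to be among the patterns logstash ships with, and the type name 'apache' is hypothetical.

```
filter {
  grok {
    type => "apache"
    # Both patterns are applied to the @message field.
    pattern => [ "%{COMBINEDAPACHELOG}", "%{COMMONAPACHELOG}" ]
  }
}
```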

patterns_dir

  • Value type is array
  • Default value is []

Specify a path to a directory with grok pattern files in it. logstash ships by default with a bunch of patterns, so you don't necessarily need to define this yourself unless you are adding additional patterns.

Pattern files are plain text with format:

NAME PATTERN

For example:

NUMBER \d+
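
Putting this together, a sketch with a custom pattern file. The directory './patterns', the file name 'extra', and the pattern name POSTFIX_QUEUEID are all hypothetical. The file './patterns/extra' would contain one line in the NAME PATTERN format:

```
POSTFIX_QUEUEID [0-9A-F]{10,11}
```

The filter then points patterns_dir at that directory and refers to the new pattern by name:

```
filter {
  grok {
    patterns_dir => [ "./patterns" ]
    pattern => [ "%{POSTFIX_QUEUEID:queue_id}" ]
  }
}
```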

type

  • Value type is string
  • There is no default value for this setting.

The type to act on. If a type is given, then this filter will only act on messages with the same type. See any input plugin's "type" attribute for more.
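
A sketch of how the type ties an input to this filter. It assumes the file input and its 'path' setting, which are documented with the input plugins. The path and the type name 'syslog' are placeholders, and SYSLOGLINE is assumed to be among the bundled patterns.

```
input {
  file {
    path => [ "/var/log/messages" ]
    type => "syslog"
  }
}

filter {
  grok {
    # Only events of type "syslog" are processed by this filter.
    type => "syslog"
    pattern => [ "%{SYSLOGLINE}" ]
  }
}
```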


This is documentation from lib/logstash/filters/grok.rb