Parse arbitrary text and structure it. Grok is currently the best way in logstash to parse crappy unstructured log data (like syslog or apache logs) into something structured and queryable.
This filter requires that you have libgrok installed.
You can find libgrok here: http://code.google.com/p/semicomplete/wiki/Grok
Compile/install notes can be found in the INSTALL file of the grok tarball, or here: https://github.com/jordansissel/grok/blob/master/INSTALL
Key dependencies:
Feature requirements:
Note: CentOS 5 ships with an ancient version of pcre that does not work with grok.
filter {
  grok {
    /[A-Za-z0-9_-]+/ => ... # string
    add_field => ... # hash, default: {}
    add_tag => ... # array, default: []
    break_on_match => ... # boolean, default: true
    drop_if_match => ... # boolean
    match => ... # hash, default: {}
    named_captures_only => ... # boolean
    pattern => ... # array
    patterns_dir => ... # array, default: []
    type => ... # string
  }
}
Any existing field name can be used as a config name here; its value is the pattern to match against that field.
# this config:
foo => "some pattern"
# same as:
match => [ "foo", "some pattern" ]
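The shorthand can be pictured as a normalization step that folds every unrecognized key into `match`. The Python sketch below is a hypothetical model of that equivalence, assuming that any key which is not one of the filter's own settings names a field; `normalize_grok_config` and `KNOWN_SETTINGS` are illustrative names, not part of logstash.

```python
# Settings the grok filter defines itself, as listed in the synopsis.
KNOWN_SETTINGS = {
    "add_field", "add_tag", "break_on_match", "drop_if_match", "match",
    "named_captures_only", "pattern", "patterns_dir", "type",
}


def normalize_grok_config(config: dict) -> dict:
    """Fold shorthand `field => pattern` entries into the `match` hash.

    Hypothetical helper for illustration; not logstash's own code.
    """
    # Keep the real settings as they are.
    normalized = {k: v for k, v in config.items() if k in KNOWN_SETTINGS}
    match = dict(normalized.get("match", {}))
    for key, value in config.items():
        if key not in KNOWN_SETTINGS:
            # Any other name is treated as a field to match against.
            match[key] = value
    normalized["match"] = match
    return normalized


print(normalize_grok_config({"foo": "some pattern"}))
# {'match': {'foo': 'some pattern'}}
```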
If this filter is successful, add any arbitrary fields to this event. Example:
filter {
  myfilter {
    add_field => [ "sample", "Hello world, from %{@source}" ]
  }
}
On success, myfilter will add the field 'sample' with the value above, with the %{@source} piece replaced by that field's value from the event.
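The %{field} substitution can be sketched in a few lines of Python. This is a minimal model, not logstash's implementation: `sprintf` and `apply_add_field` are hypothetical names, the `["sample", "..."]` pair is represented as a dict, the @source value is made up for the example, and leaving an unknown %{name} untouched is an assumption.

```python
import re

# Matches %{name} references; names may start with '@', as in %{@source}.
FIELD_REF = re.compile(r"%\{([^}]+)\}")


def sprintf(template: str, event: dict) -> str:
    """Replace each %{name} with the event's value for that field."""
    def lookup(m):
        value = event.get(m.group(1))
        # Assumption: a missing field leaves the %{name} text untouched.
        return m.group(0) if value is None else str(value)
    return FIELD_REF.sub(lookup, template)


def apply_add_field(event: dict, add_field: dict) -> None:
    """On success, add each configured field with its expanded value."""
    for name, template in add_field.items():
        event[name] = sprintf(template, event)


event = {"@source": "file://host/var/log/messages"}
apply_add_field(event, {"sample": "Hello world, from %{@source}"})
print(event["sample"])  # Hello world, from file://host/var/log/messages
```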
If this filter is successful, add arbitrary tags to the event. Tags can be dynamic and include parts of the event using the %{field} syntax. Example:
filter {
  myfilter {
    add_tag => [ "foo_%{somefield}" ]
  }
}
If the event has field "somefield" == "hello", this filter, on success, would add the tag "foo_hello".
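A minimal Python sketch of the add_tag behaviour, assuming tags are kept as a list under an "@tags" key and that a tag is not added twice; both are assumptions, and `tag_on_success` is a hypothetical name. The point it shows is that tags are expanded per event and applied only when the match succeeded.

```python
import re

FIELD_REF = re.compile(r"%\{([^}]+)\}")


def expand(template: str, event: dict) -> str:
    # Substitute %{field} references; unknown fields are left as written.
    return FIELD_REF.sub(
        lambda m: str(event[m.group(1)]) if m.group(1) in event else m.group(0),
        template,
    )


def tag_on_success(event: dict, matched: bool, add_tag: list) -> None:
    if not matched:
        return  # tags are only added when the filter succeeds
    # Assumption: tags live in a list under "@tags".
    tags = event.setdefault("@tags", [])
    for template in add_tag:
        tag = expand(template, event)
        if tag not in tags:  # assumption: a tag is not added twice
            tags.append(tag)


event = {"somefield": "hello"}
tag_on_success(event, True, ["foo_%{somefield}"])
print(event["@tags"])  # ['foo_hello']
```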
Break on first match. The first successful match by grok ends the filter. If you want grok to try all patterns (maybe you are parsing different things), set this to false.
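The effect of break_on_match can be modelled as a loop that stops at the first successful pattern. This Python sketch assumes the grok patterns have already been expanded into regexes with named groups; `grok_match` is a hypothetical name, and overwriting same-named captures is a simplification.

```python
import re


def grok_match(regexes: list, text: str, break_on_match: bool = True):
    """Try each regex in order, collecting named captures from the matches."""
    captures = {}
    matched = False
    for regex in regexes:
        m = re.search(regex, text)
        if m is None:
            continue
        matched = True
        # Simplification: a later match overwrites a same-named capture.
        captures.update(m.groupdict())
        if break_on_match:
            break  # the first successful match finishes the filter
    return matched, captures


regexes = [r"(?P<num>\d+)", r"(?P<word>[a-z]+)"]
print(grok_match(regexes, "abc 42"))         # (True, {'num': '42'})
print(grok_match(regexes, "abc 42", False))  # (True, {'num': '42', 'word': 'abc'})
```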
Drop if matched. Note: this feature may not stay. It is preferable to combine the grok and grep filters to do parsing and dropping.
requested in: googlecode/issue/26
A hash of matches of field => value, where each value is a grok pattern to match against that field. Directories containing grok pattern files are specified separately, with patterns_dir.
If true, only store named captures from grok.
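One way to picture named_captures_only is as a filter over raw capture keys. The sketch below assumes grok reports a capture as "PATTERN:name" for %{PATTERN:name} and as plain "PATTERN" for an unnamed %{PATTERN} reference; that key format and the `filter_captures` name are assumptions for illustration.

```python
def filter_captures(captures: dict, named_captures_only: bool) -> dict:
    """Turn raw grok capture keys into event field names.

    Assumption: a key is "PATTERN:name" for %{PATTERN:name} and plain
    "PATTERN" for an unnamed %{PATTERN} reference.
    """
    fields = {}
    for key, value in captures.items():
        pattern_name, sep, name = key.partition(":")
        if sep:
            fields[name] = value  # named capture: keep under its name
        elif not named_captures_only:
            fields[pattern_name] = value  # unnamed: keep only if allowed
    return fields


raw = {"WORD": "GET", "NUMBER:bytes": "512"}
print(filter_captures(raw, True))   # {'bytes': '512'}
print(filter_captures(raw, False))  # {'WORD': 'GET', 'bytes': '512'}
```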
Specify a pattern to parse with. This will match the '@message' field.
To match fields other than @message, use the 'match' setting. Multiple patterns are fine.
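A grok pattern is essentially a regex with %{NAME} and %{NAME:field} references that get expanded from a pattern library. The Python sketch below models that expansion under stated assumptions: the two-entry `PATTERNS` library and the `expand` helper are hypothetical, and unnamed references are made non-capturing as a simplification.

```python
import re

# Hypothetical two-entry library; logstash ships a much larger set.
PATTERNS = {
    "NUMBER": r"\d+",
    "WORD": r"\w+",
}

# %{NAME} or %{NAME:field}
REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(pattern: str, library: dict) -> str:
    """Recursively rewrite grok references into a plain regex."""
    def replace(m):
        name, field = m.group(1), m.group(2)
        body = expand(library[name], library)  # patterns may reference others
        if field:
            return f"(?P<{field}>{body})"  # named capture
        return f"(?:{body})"  # simplification: unnamed refs don't capture
    return REF.sub(replace, pattern)


event = {"@message": "GET 512"}
regex = re.compile(expand("%{WORD:verb} %{NUMBER:bytes}", PATTERNS))
m = regex.search(event["@message"])
print(m.groupdict())  # {'verb': 'GET', 'bytes': '512'}
```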
logstash ships by default with a bunch of patterns, so you don't necessarily need to define this yourself unless you are adding additional patterns.
Pattern files are plain text with format:
NAME PATTERN
For example:
NUMBER \d+
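Reading the NAME PATTERN format is a matter of splitting each line on its first run of whitespace. This Python sketch is a hypothetical loader (`load_patterns`, `load_patterns_dir`, the file name "extra"), assuming blank lines and lines starting with # are skipped and that later files override earlier definitions.

```python
import tempfile
from pathlib import Path


def load_patterns(text: str) -> dict:
    """Parse 'NAME PATTERN' lines into a dict."""
    patterns = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' comment lines are ignored.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            patterns[parts[0]] = parts[1]
    return patterns


def load_patterns_dir(dirs: list) -> dict:
    """Load every file in each directory; later definitions win."""
    patterns = {}
    for d in dirs:
        for path in sorted(Path(d).iterdir()):
            if path.is_file():
                patterns.update(load_patterns(path.read_text()))
    return patterns


with tempfile.TemporaryDirectory() as d:
    Path(d, "extra").write_text("NUMBER \\d+\n")
    print(load_patterns_dir([d]))  # {'NUMBER': '\\d+'}
```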
The type to act on. If a type is given, then this filter will only act on messages with the same type. See any input plugin's "type" attribute for more.
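The type check reduces to a simple guard in front of the filter. In this Python sketch, storing the event's type under "@type" is an assumption, as is reading "if a type is given" to mean that a filter with no type acts on every event; `acts_on` is a hypothetical name.

```python
from typing import Optional


def acts_on(event: dict, filter_type: Optional[str]) -> bool:
    """Decide whether a filter configured with `type` should see this event."""
    if filter_type is None:
        return True  # no type given: act on every event
    # Assumption: the event's type is stored under "@type".
    return event.get("@type") == filter_type


print(acts_on({"@type": "syslog"}, "syslog"))  # True
print(acts_on({"@type": "apache"}, "syslog"))  # False
print(acts_on({"@type": "apache"}, None))      # True
```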