Within one of our projects we installed a graylog2 log analytics server.

After playing around with the different shipping methods for nginx access logs, we were pretty confident that we disliked all of the existing solutions we had seen.

Please don’t get me wrong: I am not saying that these solutions won’t work, but they just didn’t meet our expectations in terms of:

  • flexible adding and removing of fields in graylog
  • ease of use and operation
  • easy implementation in nginx
  • and of course, easy (automated) rollouts to the nginx clusters
  • managing different nginx server types such as “caching_proxy”, “upstream server” or plain “webserver”

After evaluating the pros and cons (our requirements against the existing solutions), we decided to create our very own way to ship nginx access logs to graylog.

 

So what’s the basic idea?

Relying on Lennart Koopmann’s idea of using the nginx syslog integration (which we really love!), we created a new JSON-based log format and a new graylog2 input to handle the above requirements.

Nginx sends the message in JSON format via the syslog protocol (UDP) to our graylog2 server (Syslog UDP input). Graylog2 extracts the JSON and automatically adds the extracted fields to the message, without the need to manually define an extractor for each field or field name.

 

What you’ll need

Nginx version 1.7.1 or newer (this is where syslog shipping was added)

Graylog2 Server up and running

 

Example nginx log_format line for one of our caching servers

log_format graylog2_json '{ "nginx_server_type": "caching_proxy", "time": "$time_iso8601", "remote_addr": "$remote_addr", "remote_user": "$remote_user", "body_bytes_sent": "$body_bytes_sent", "request_time": "$request_time", "status": "$status", "request": "$request", "request_method": "$request_method", "http_referrer": "$http_referer", "http_user_agent": "$http_user_agent", "upstream_cache_status": "$upstream_cache_status", "upstream_addr": "$upstream_addr", "request_scheme": "$scheme", "request_body": "$request_body"}';
access_log syslog:server=graylog2.mynet.local:12301 graylog2_json;

This line gets fired directly against our input on the graylog server (with the variables replaced, of course).
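To illustrate what “with the variables replaced” means, here is a minimal Python sketch that mimics the substitution on a shortened copy of the log_format. It only relies on the fact that Python’s `string.Template` happens to use the same `$name` syntax; it is not how nginx works internally, and the request values are made up for the example.

```python
import json
from string import Template

# Shortened copy of the log_format above; in production nginx fills in the $variables.
LOG_FORMAT = Template(
    '{ "nginx_server_type": "caching_proxy", "time": "$time_iso8601", '
    '"remote_addr": "$remote_addr", "status": "$status", '
    '"request": "$request", "upstream_cache_status": "$upstream_cache_status"}'
)

# Hypothetical values for a single request (illustration only).
sample_request = {
    "time_iso8601": "2017-02-23T19:55:08+01:00",
    "remote_addr": "4.3.2.1",
    "status": "200",
    "request": "GET /loading.gif HTTP/1.1",
    "upstream_cache_status": "HIT",
}

# Replace the $variables and check that the result parses as JSON.
rendered = LOG_FORMAT.substitute(sample_request)
fields = json.loads(rendered)
print(fields)
```

Note that every value is quoted in the log_format, so all fields, including numeric ones like the status, arrive as strings.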

 

Creating the input on the graylog2 side was a little bit more tricky, since I first had to regex the incoming message: the syslog protocol puts the hostname and a tag in front of the JSON, so the raw message looks like this (original, without any modification):

nginx-cache01.mynet.local nginx: { "time": "2017-02-23T19:55:08+01:00", "remote_addr": "4.3.2.1", "remote_user": "-", "body_bytes_sent": "1917", "request_time": "0.000", "status": "200", "request": "GET /loading.gif HTTP/1.1", "request_method": "GET", "http_referrer": "http://www.mywebsite.com/test.html", "http_user_agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0", "upstream_cache_status": "HIT", "upstream_addr": "-"}

With a little regex we wiped out everything before the { and put the result into a new field called “json”.
The “json” field got a JSON extractor, which extracts all the fields defined inside the JSON string that nginx builds.
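The two extraction steps can be sketched in Python roughly like this. This is not the actual Graylog extractor configuration (that lives in the repo); the regex and the helper name `extract_fields` are assumptions made for the illustration, and the raw message is a shortened version of the one shown above.

```python
import json
import re

# Shortened raw message: hostname and "nginx:" tag precede the JSON payload.
RAW = (
    'nginx-cache01.mynet.local nginx: { "time": "2017-02-23T19:55:08+01:00", '
    '"remote_addr": "4.3.2.1", "status": "200", '
    '"request": "GET /loading.gif HTTP/1.1", "upstream_cache_status": "HIT"}'
)

# Assumed pattern: skip everything before the first "{", capture up to the last "}".
JSON_PART = re.compile(r"^[^{]*(\{.*\})\s*$", re.DOTALL)


def extract_fields(raw_message: str) -> dict:
    """Step 1: cut off the syslog prefix. Step 2: parse the rest as JSON."""
    match = JSON_PART.match(raw_message)
    if match is None:
        raise ValueError("no JSON object found in message")
    return json.loads(match.group(1))


fields = extract_fields(RAW)
for name, value in fields.items():
    print(name, "=", value)
```

Since the function never names a single field, a key added to the nginx log_format simply shows up as one more entry in the result.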

So if we want to add more fields or values in the future, we just have to modify the nginx log_format; we do not even have to touch the extractors of our input. Sounds quite flexible? This is what we wanted to achieve!

 

For easier implementation, we put all the graylog2-related stuff into a separate config file and included it in the nginx config, which makes it quite easy to handle the log shipping config with a configuration tool like Puppet or Ansible.

To give you some more insight, I put our working Graylog2 input extractors into the repo as well. Maybe I will create a content pack someday, but not for now.

 

Here’s the link to our GitHub repository

 

Further reading

nginx_syslog (access_log)

graylog2_inputs