My take on the ELK stack: best open source dashboard

A dashboard with pretty plots and numbers about your organization… It shouldn’t be that difficult, right?

Reality: Things don’t always go according to the plan pic.twitter.com/tD5W5Eqcbd via @ToddWhitaker & @LarysaDiDio
— Michael Carton (@michaeltcarton) March 2, 2014

Luckily, ElasticSearch+Logstash+Kibana make a nice stack for solving this problem. Here’s my result and how I solved some integration hiccups. The resulting beta dashboard, as it is now, looks like this:

Many many many other blog posts have addressed this issue, but I would like to share a couple of tweaks I came up with while working on it.

Logstash and iRODS custom log parsing

Broadly speaking, I found the following issues when setting up the dashboard:

Index design: Logstash’s default daily index naming (logstash-YYYY.MM.DD) brought down a machine with 30GB of RAM (line 63).
Missing year in the timestamps inside the logs: I recover it from the log filenames instead (lines 31 to 42 below).
GeoIP coordinates not being parsed by Kibana’s bettermap (lines 44 to 58).
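The last two items can be sketched outside Logstash in a few lines of Python. This is only an illustration of the idea, not the Logstash configuration itself: the `rodsLog.YYYY.MM.DD` filename layout, the syslog-style `Feb 10 12:34:56` timestamp and both function names are assumptions made for the example. The coordinate order follows Kibana 3's bettermap documentation, which asks for GeoJSON-style `[longitude, latitude]` arrays.

```python
import re
from datetime import datetime

# Assumed filename layout: rodsLog.YYYY.MM.DD (adjust the regex to your files).
FILENAME_YEAR = re.compile(r"rodsLog\.(\d{4})\.\d{1,2}\.\d{1,2}$")


def timestamp_with_year(log_path, stamp):
    """Combine the year found in the log filename with a year-less timestamp."""
    match = FILENAME_YEAR.search(log_path)
    if match is None:
        raise ValueError(f"no year found in {log_path!r}")
    # Parse year and stamp together so that Feb 29 is validated against the
    # right year. Note that %b depends on the current locale.
    return datetime.strptime(f"{match.group(1)} {stamp}", "%Y %b %d %H:%M:%S")


def bettermap_coordinates(latitude, longitude):
    """Return a GeoJSON-style [longitude, latitude] pair of floats."""
    return [float(longitude), float(latitude)]


event_time = timestamp_with_year("/var/log/irods/rodsLog.2014.2.10", "Feb 10 12:34:56")
coords = bettermap_coordinates("59.33", "18.06")
```

The two details that matter are parsing the year and the rest of the timestamp in one go, and emitting numbers (not strings) with longitude first.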

ElasticSearch tweaks

As mentioned, the default “daily” indexing scheme in Logstash did not work for my purposes. When monitoring ElasticSearch with the head plugin, the cluster status went red while ingesting events from the logs. Thanks to feedback from @zackarytong and others, I managed to address the issue:

Stress test #elasticsearch+#logstash with 3 years (800MB) worth of logs to parse and index. If the machine survives, we'll call it a day :)
— Roman Valls (@braincode) February 10, 2014

Then, Kibana could not connect to the ElasticSearch backend when large time ranges were selected. After some digging with Chrome’s developer tools and curl, I reported the issue: too many index names were being URL-encoded into a single request, which made the request line too long for ElasticSearch to accept. The workaround was to set the following ElasticSearch directive:

http.max_initial_line_length: 64kb
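To get a feel for the numbers, here is a rough back-of-the-envelope sketch. It assumes three full years of logs (2011 to 2013, picked only for illustration), index names of the form `logstash-YYYY.MM.DD`, and that all index names are sent comma-separated in one request; the helper name `index_names` is made up for the example. ElasticSearch's documented default for `http.max_initial_line_length` is 4kb.

```python
from datetime import date, timedelta


def index_names(start, end, pattern):
    """List the distinct index names covering start..end (inclusive)."""
    names = []
    day = start
    while day <= end:
        name = day.strftime(pattern)
        if not names or names[-1] != name:
            names.append(name)
        day += timedelta(days=1)
    return names


start, end = date(2011, 1, 1), date(2013, 12, 31)
daily = index_names(start, end, "logstash-%Y.%m.%d")
monthly = index_names(start, end, "logstash-%Y.%m")

# Length of the comma-separated index list alone, before any URL-encoding.
daily_len = len(",".join(daily))
monthly_len = len(",".join(monthly))

print(len(daily), "daily indexes ->", daily_len, "characters")
print(len(monthly), "monthly indexes ->", monthly_len, "characters")
print("fits in 4kb:", daily_len <= 4 * 1024, "| fits in 64kb:", daily_len <= 64 * 1024)
```

Under these assumptions the daily scheme blows well past 4kb but stays under 64kb, while a monthly scheme would have fit in the default limit to begin with.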

In the future, I might consider further indexing optimizations by using the ElasticSearch index curator tool to optimize and cut down index usage, but for now I want to keep all data accessible from the dashboard.

Kibana

The presentation side of this hipster dashboard stack also had some quirks. My original idea of showing three separate Kibana bettermaps, one per AWS region, had to wait until another issue was addressed very recently. Both the Chrome developer console and AngularJS Batarang were very useful for finding issues in Kibana and in its interactions with the ElasticSearch backend.

Speaking of Amazon and their regions, while AWS has exposed a great amount of functionality through their developer APIs, at the time of writing these lines there are missing API endpoints, such as billing information in all regions (only available in us-east-1), and fetching remaining credits from your EDU Amazon grant, should you have one. The only viable option left today is scraping it:

If you want to keep track of AWS expenses in ES, the index type mapping needs a specific change on the ElasticSearch side:

curl -XPUT 'https://ids-panel.incf.net:9200/aws-2014.02/credits/_mapping' -d '{
  "aws_credit_balance": {
    "properties": {
      "aws_credit_balance": {
        "type": "integer"
      }
    }
  }
}'

In a larger-scale or business setting, Netflix’s Ice could definitely be a good fit.

Conclusions and future improvements

It has been a fun project: collecting data about your organization’s services and exposing it as clearly as possible, in realtime. It feels like something I would like to do for a living as a consultant some day. Some new insights from the dashboard have already allowed us to decide to downscale resources, ultimately saving money.

The feedback from INCF people has been invaluable for rethinking how some data is presented and what it all means. Visualization is hard to get right, so always bring in users and third parties, ask for their opinions and consider their feedback.

My next iteration on this project is to get finer detail on which activities users are performing (data being shared, copied, downloaded). This could be done with some custom iRODS microservices for ElasticSearch, or by evaluating other EU-funded projects on the topic such as Cheshire3.

When it comes to who can access the dashboard, there’s a recent blog post on multi-faceted authentication, a.k.a. showing different views of the dashboard to different audiences. I’ve already tried Kibana’s authentication proxy, which supports OAuth among other auth systems, but there are a few rough edges left to polish.

On the Logstash backend, it might be worth grepping the iRODS codebase for log stanzas to find the important log events worth parsing and to get good semantic tokens out of them. Luckily, ElasticSearch is backed by Lucene’s full-text engine, which helps a lot in not having to do this tedious task. Kibana/ElasticSearch search and filtering are excellent.

Last but not least, some remaining issues leading to total world domination include:

Instrumenting all your organization’s Python code with Logbook sinking to a Redis exchange.
Easily adding other types of panels to Kibana, perhaps allowing better or more explicit integration possibilities for D3, mpld3 or BokehJS with Kibana.
Getting UNIQ counts for records in ElasticSearch (e.g., counting the unique number of IPs, users, etc…), which are on the roadmap under aggregations, so they are coming soon :)
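For the unique counts, here is a small sketch of what that could look like. The first helper counts distinct values client-side over events that were already fetched. The second only builds a request body, and it assumes the `cardinality` aggregation that shipped in later ElasticSearch releases (1.1 onwards), which returns an approximate count. The field name `clientip` and both function names are hypothetical.

```python
import json


def count_unique(events, field):
    """Client-side fallback: count distinct values of `field` in fetched events."""
    return len({event[field] for event in events if field in event})


def unique_count_query(field):
    """Build a search body asking ElasticSearch for an approximate distinct count.

    Assumes the `cardinality` aggregation is available on the server.
    """
    return {
        "size": 0,  # only the aggregation result is wanted, not the hits
        "aggs": {"unique_values": {"cardinality": {"field": field}}},
    }


events = [
    {"clientip": "10.0.0.1", "user": "alice"},
    {"clientip": "10.0.0.2", "user": "bob"},
    {"clientip": "10.0.0.1", "user": "carol"},
    {"user": "dave"},  # events without the field are skipped
]
unique_ips = count_unique(events, "clientip")
body = json.dumps(unique_count_query("clientip"))
```

The client-side version only scales to whatever fits in memory, which is exactly why a server-side aggregation is worth waiting for.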