Problem scaling Logstash cluster for single Azure Event Hub #41

Open
@kristianvld

We have a single Azure Event Hub from which we want to read and process event logs. Around 200k+ events are fed into the Hub every 30 seconds. We are currently hosting everything in Azure.

With a single Logstash VM, after some optimisation and tinkering with settings, we are able to read around 180k messages every 30 seconds (±10k). The machine then runs at 95-100% CPU usage on average and uses 9 of its 16 GB of RAM (stats pulled from htop).

As soon as I add the storage_connection option to the config, the single machine drops to around 100k messages per 30 seconds. After some tweaking, I'm able to get it up to around 120k. The machine now runs at 30-50% CPU usage with about 7 GB of RAM used.

If I add another machine with identical specs and the same config, the total number of messages fed into ES is around 140k per 30 seconds; adding a third machine raises the number to around 150k.

Does anyone know what could be causing this? Just adding the blob storage to a single machine almost halves the throughput, although this can be partly mitigated by adding more threads and larger batch sizes. All VMs, the Storage Account and the Azure Event Hub are in the same Azure subscription and the same region. I noticed that upgrading to a premium Storage Account raised the number by about 5k messages per 30 seconds.
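To put the figures above on a common scale, here is a small Python sketch that converts them to events per second and estimates how far the multi-VM setups fall short of linear scaling. The numbers are the approximate ones quoted in this post (200k+ is treated as 200k, and the tuned 120k figure is used as the single-VM baseline); the helper names are only illustrative.

```python
# Approximate throughput figures from this post, in events per 30-second window.
WINDOW_SECONDS = 30

observed = {
    "ingest into Event Hub": 200_000,             # "200k+", rounded down
    "1 VM, no storage_connection": 180_000,
    "1 VM, storage_connection (tuned)": 120_000,
    "2 VMs, storage_connection": 140_000,
    "3 VMs, storage_connection": 150_000,
}


def per_second(events_per_window, window=WINDOW_SECONDS):
    """Convert a per-window event count to events per second."""
    return events_per_window / window


def scaling_efficiency(total, baseline, n_machines):
    """Fraction of ideal linear scaling achieved by n_machines."""
    return total / (baseline * n_machines)


for label, count in observed.items():
    print(f"{label}: {per_second(count):,.0f} events/s")

baseline = observed["1 VM, storage_connection (tuned)"]
eff2 = scaling_efficiency(observed["2 VMs, storage_connection"], baseline, 2)
eff3 = scaling_efficiency(observed["3 VMs, storage_connection"], baseline, 3)
print(f"scaling efficiency: 2 VMs {eff2:.0%}, 3 VMs {eff3:.0%}")
```

Under these assumptions, each additional machine contributes far less than the first one does, which is the behaviour this issue is asking about.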

Input config:

input {
  azure_event_hubs {
     event_hub_connections => ["Endpoint=sb://.....servicebus.windows.net/;SharedAccessKeyName=....;SharedAccessKey=....;EntityPath=...."]
     threads => 32
     codec => plain {
       charset => "ISO-8859-1"
     }
     max_batch_size => 1000
     storage_connection => "DefaultEndpointsProtocol=https;AccountName=....;AccountKey=....;EndpointSuffix=core.windows.net"
     storage_container => "logstash-proxy"
     decorate_events => false
  }
}

pipelines.yml:

- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"
  pipeline.workers: 16
  pipeline.batch.size: 500

Our cluster was initially deployed using the Azure Marketplace Elasticsearch template. I do not believe ES is the bottleneck: we were able to feed 180k messages per 30 seconds into it from a single machine, and at that point ES only peaked at around 50-70% CPU usage.

Any tips or help with improving our performance would be much appreciated. If this is the wrong place to post such a problem, I apologise; however, this seems to be a problem either in the azure_event_hubs input plugin itself or in my configuration of it.
