Description
We have a single Azure Event Hub from which we want to read and process event logs. Around 200k+ events are fed into the Hub every 30 seconds. We are currently hosting everything in Azure. If we configure a single Logstash VM, after some optimisation and tuning of settings, we are able to read around 180k messages every 30 seconds (±10k). The machine then runs on average between 95-100% CPU usage and uses 9 out of 16 GB of RAM (stats pulled from htop). As soon as I add the storage_connection
option to the config, the single machine drops down to around 100k messages per 30 seconds. After some tweaking, I'm able to get it up to around 120k. The machine now runs between 30-50% CPU usage and uses about 7 GB of RAM. If I add another machine with identical specs and the same config, the total number of messages processed and fed into ES is around 140k; adding a third machine raises the number to around 150k.
Does anyone know what could be the cause of the problem? Just adding the blob storage to a single machine almost halves its throughput, though this can be partly mitigated by adding more threads and using higher batch sizes. All VMs, the Storage Account and the Azure Event Hub are located under the same Azure subscription and in the same region. I noticed that upgrading to a premium Storage Account raised the number by about 5k messages per 30 seconds.
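For context, my understanding (an assumption on my part, based on the plugin documentation, not something I have confirmed) is that storage_connection enables multi-instance coordination via blob leases, where each Event Hub partition is owned by exactly one reader at a time. If that is right, total throughput would be capped by the hub's partition count no matter how many machines are added. A sketch of how I would expect a multi-machine setup to be configured (placeholder values, settings per my reading of the docs):

```
input {
  azure_event_hubs {
    event_hub_connections => ["<connection string including EntityPath>"]
    # All instances share the same consumer group, storage account and
    # container, so they can divide the partition leases between them.
    consumer_group => "logstash"
    storage_connection => "<storage account connection string>"
    storage_container => "logstash-proxy"
    # If one thread can only hold one partition lease at a time, then
    # threads well beyond the per-machine share of partitions would
    # mostly sit idle.
    threads => 16
  }
}
```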
Input config:
input {
  azure_event_hubs {
    event_hub_connections => ["Endpoint=sb://.....servicebus.windows.net/;SharedAccessKeyName=....;SharedAccessKey=....;EntityPath=...."]
    threads => 32
    codec => plain {
      charset => "ISO-8859-1"
    }
    max_batch_size => 1000
    storage_connection => "DefaultEndpointsProtocol=https;AccountName=....;AccountKey=....;EndpointSuffix=core.windows.net"
    storage_container => "logstash-proxy"
    decorate_events => false
  }
}
pipelines.yml:
- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"
  pipeline.workers: 16
  pipeline.batch.size: 500
Our cluster was initially deployed using the Azure Marketplace Elasticsearch template. I do not believe ES to be the bottleneck, given that we were able to feed 180k messages into it from a single machine, and at that point it only maxed out at around 50-70% CPU usage.
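To double-check that ES isn't pushing back, one diagnostic I know of is to watch the write thread pool for bulk rejections (hostname here is a placeholder for one of our ES nodes):

```
curl -s 'http://localhost:9200/_cat/thread_pool/write?v&h=node_name,active,queue,rejected'
```

A growing rejected count during ingestion would indicate ES back-pressure; a steady zero would support the view that the bottleneck is on the Logstash/input side.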
Any tips or help in improving our performance would be much appreciated. If this is the incorrect place to post such a problem, then I apologise; however, this seems to be a problem either in the azure_event_hubs input plugin itself or in my configuration of it.