Understanding and Managing Collector Overhead¶
Epoch's collectors are designed to minimize overhead while collecting realtime metrics. This guide explains the techniques and tradeoffs for managing that overhead.
Functional View of Overhead¶
The following is the functional view of overhead in Epoch Collectors:
- Packet capture is achieved with the sslsplit component. It has minimal CPU overhead and must be co-located with the host. Typical overhead is ~1% of CPU runtime on average and 5-10 MB of RAM.
- Stream processing has the maximum CPU and memory overhead (when doing layer7 analysis) but can be offloaded to another host. For layer7 protocol analysis, typical overhead is 3-8% of CPU runtime on average and 200-600 MB of RAM. For layer4 protocol analysis, typical overhead is less than 0.25% of CPU and less than 50 MB of RAM.
- Infrastructure metrics collection has a predictable and recurring overhead of about 1% of CPU runtime and ~120MB of RAM. This has to run co-located with the host.
With the default collector settings (layer4 protocol analysis mode), the typical overhead is 1-2% of CPU time and 150-200 MB of RAM.
With the layer7 protocol analysis settings, all the functional overhead is on the host, without any sampling or resource limits. Typical workloads incur on average 5-10% of CPU time for the collectors. The actual overhead depends on the throughput of network transactions being processed. The typical memory overhead is 300-700 MB.
The outgoing network bandwidth is between 5 and 20 KBps. The exact number depends on the number of unique series.
The following techniques for controlling overhead can be used alone or in conjunction with one another.
The sampling rate parameter controls the sampling of network traffic at the TCP flow level for the layer7 analysis. It does not affect the layer4 analysis. The sampling rate is given as a percentage of total traffic (1-100), so a sampling rate of 50 samples 50% of the flows. By default, sampling is turned off. The resulting metrics are normalized based on the sampling rate. Sampling can significantly reduce CPU overhead, but the tradeoff is lower fidelity of the protocol metrics. For more information, see Sampling Rate.
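As a rough illustration of what normalizing by the sampling rate can mean, the sketch below scales a count observed on sampled flows back up to an estimate for all flows. The function name `normalize_count` and the simple linear scaling are assumptions made for illustration; the guide does not describe how Epoch performs the normalization internally.

```python
def normalize_count(sampled_count: float, sampling_rate: int) -> float:
    """Estimate the total count from a count observed on sampled flows.

    sampling_rate is a percentage of total traffic (1-100), as in the
    collector setting. Linear scaling is an assumption, not Epoch's
    documented method.
    """
    if not 1 <= sampling_rate <= 100:
        raise ValueError("sampling_rate must be between 1 and 100")
    return sampled_count * 100 / sampling_rate


# 1200 requests seen while sampling 50% of flows -> estimate for all flows
print(normalize_count(1200, 50))  # 2400.0
```

The lower the sampling rate, the more each observed transaction is scaled up, which is why fidelity drops as the rate decreases.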
Run the collector under the standard Unix/Linux nice command with the value specified. Nice ensures that processes with higher priority get a larger chunk of the CPU time than a lower priority process. For more information, see Nice Value.
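The following minimal sketch shows how a launch command could be wrapped with `nice`. The helper `with_nice` and the `./collector` path are hypothetical placeholders, not part of Epoch; the actual collector start command depends on your installation.

```python
import shlex
from typing import List


def with_nice(command: List[str], niceness: int) -> List[str]:
    """Prefix a command with `nice -n <niceness>`.

    On Linux, niceness ranges from -20 (highest priority) to 19 (lowest).
    """
    if not -20 <= niceness <= 19:
        raise ValueError("niceness must be between -20 and 19")
    return ["nice", "-n", str(niceness), *command]


# "./collector" is a hypothetical placeholder for the real start command.
cmd = with_nice(["./collector"], 10)
print(shlex.join(cmd))  # nice -n 10 ./collector
```

The resulting list can be passed to `subprocess.run` to start the process at the reduced priority.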
OS Resource Limits¶
Overhead can also be limited using OS resource limits. The following are examples for limiting resources in some of the environments. All the examples limit the collector’s CPU to a maximum of 50% of a single CPU and to a maximum of 1024 MB of memory.
For Docker, set the following parameters before running your collector container.
--cpu-period=100000 --cpu-quota=50000 --memory=1024m
If you use Docker 1.13 or higher, you can use --cpus=0.5 instead of --cpu-period and --cpu-quota.
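The relationship between the two forms is simple arithmetic: the CPU share is the quota divided by the period. The sketch below makes that explicit and builds both flag sets. The helper names `cpu_fraction` and `docker_limit_flags` are made up for this illustration; only the Docker flags themselves come from the text above.

```python
from typing import List


def cpu_fraction(cpu_quota_us: int, cpu_period_us: int) -> float:
    """Fraction of one CPU allowed by a CFS quota/period pair (microseconds)."""
    return cpu_quota_us / cpu_period_us


def docker_limit_flags(cpus: float, memory_mb: int, legacy: bool = False) -> List[str]:
    """Build docker run flags for a CPU and memory limit.

    legacy=True emits --cpu-period/--cpu-quota for Docker versions
    before 1.13; otherwise the shorter --cpus form is used.
    """
    if legacy:
        period = 100000  # same period as in the example above
        quota = int(cpus * period)
        cpu_flags = [f"--cpu-period={period}", f"--cpu-quota={quota}"]
    else:
        cpu_flags = [f"--cpus={cpus}"]
    return cpu_flags + [f"--memory={memory_mb}m"]


print(cpu_fraction(50000, 100000))                   # 0.5
print(docker_limit_flags(0.5, 1024))                 # ['--cpus=0.5', '--memory=1024m']
print(docker_limit_flags(0.5, 1024, legacy=True))
```

Both flag sets cap the container at half of one CPU and 1024 MB of memory.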
For Kubernetes, set the following parameters in your collector manifest before deploying your collector container.
resources:
  requests:
    memory: "512Mi"
    cpu: "0.5"
  limits:
    memory: "1024Mi"
    cpu: "1.0"
Remote Stream Processing¶
The CPU and memory overhead is much smaller if stream processing is done outside the host where the collector is running. The outgoing bandwidth is higher in this mode, but this does not incur any overhead if the stream processors run in the same VPC as the collectors. For more information, see standalone stream processor.