Context
Since 1.18, the Monitoring part of Conduktor Console has been externalized in an image called conduktor-platform-cortex.
This image contains 3 components:
- Prometheus, to scrape the metrics from Conduktor Console
- Cortex, to store these metrics in your S3, or volumes
- Alert Manager, to set up alerts
These 3 components are external to Conduktor, and not maintained by us, but we use them in order to make our Monitoring work.
Issue
If you have many clusters with a lot of resources (topics, consumer groups, partitions), you might hit some of the Cortex thresholds. You can spot this when the logs of the Cortex container contain errors such as the following:
ingestion rate limit (25000) exceeded while adding 2000 samples and 0 metadata
per-metric series limit of 50000 exceeded
This means that some of your series won't be stored (you'll miss some monitoring information), and that your storage will grow, because this error is written to the logs repeatedly.
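To get a rough idea of how often these limits are hit, the Cortex logs can be filtered for the two messages above. The following is only a sketch: the helper name count_limit_errors is hypothetical, and the usage line assumes a Docker deployment with a container named conduktor-platform-cortex. Adapt it to however you collect logs.

```shell
# Hypothetical helper: count log lines (read from stdin) that mention either
# of the two Cortex limit errors quoted above.
count_limit_errors() {
  # grep -c prints the number of matching lines; "|| true" keeps the exit
  # status at 0 when nothing matches (grep still prints 0 in that case).
  grep -c -E 'ingestion rate limit \([0-9]+\) exceeded|per-metric series limit of [0-9]+ exceeded' || true
}

# Example usage (assumption: Docker, container named conduktor-platform-cortex):
#   docker logs conduktor-platform-cortex 2>&1 | count_limit_errors
```

A count that keeps growing between two runs suggests the limits are still being exceeded.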
Solution
To get rid of this error, you can override the Cortex configuration to increase these limits. But first, let's check how many time-series you are pulling from Conduktor Console, so you know how high to set the limits.
Step 1: Check how many time-series you have
To check how many time-series you are pulling, open a bash shell in either of the two containers and run the following command:
curl http://conduktor-platform:8080/monitoring/metrics | wc -l
In my case, the name of the container is conduktor-platform and the port is 8080. Please make sure you change them to match your deployment.
The output should look like this:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 36898  100 36898    0     0  5659k      0 --:--:-- --:--:-- --:--:-- 6005k
279
The number of time-series is the last line of the output, in this case "279". This is really low, but it can easily increase if you have many clusters, topics, partitions, or consumer groups.
If you're hitting the ingestion rate limit, you should override it so it's higher than the number of time-series you're getting.
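Note that wc -l counts every line of the response. Assuming the endpoint returns the Prometheus text exposition format (an assumption, not something stated in this article), lines starting with # are HELP/TYPE comments rather than samples, so the raw line count may be somewhat higher than the real number of series. A minimal sketch of a stricter count, using a hypothetical helper named count_series:

```shell
# Hypothetical helper: count sample lines in metrics output read from stdin.
# Lines starting with '#' (comments) and blank lines are not counted.
count_series() {
  grep -c -v -E '^(#|[[:space:]]*$)' || true
}

# Example usage (container name and port as in the command above;
# -s hides the curl progress meter):
#   curl -s http://conduktor-platform:8080/monitoring/metrics | count_series
```

Using curl -s also removes the progress table from the output, so only the number is printed.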
Step 2: Override the configuration
To override your Cortex configuration, you have to create a YAML file that contains what you want to override, based on the Cortex configuration documentation. Then, you have to mount this YAML file into /opt/override-configs/cortex.yaml.
If you want to mount it in another directory, you must set the new path in the environment variable CORTEX_OVERRIDE_CONFIG_FILE.
For example, you can create a file cortex.yaml like the following:
limits:
  ingestion_rate: 50000
  max_series_per_metric: 100000
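As an illustration, the file can be created from a shell and then mounted into the container. This is a sketch under assumptions: OVERRIDE_DIR is a hypothetical host directory, and the commented docker run line only shows the mount itself. Your image name, tag and other settings will differ, and a Kubernetes or Compose deployment would mount the file differently.

```shell
# Write the example override file into a local directory.
# OVERRIDE_DIR is a hypothetical host location; any path works.
OVERRIDE_DIR="${OVERRIDE_DIR:-./override-configs}"
mkdir -p "$OVERRIDE_DIR"

# The two limit keys are nested under "limits:" (indentation matters in YAML).
cat > "$OVERRIDE_DIR/cortex.yaml" <<'EOF'
limits:
  ingestion_rate: 50000
  max_series_per_metric: 100000
EOF

# Example mount with Docker (assumption: only the target path below comes
# from this article; replace the placeholder with your own image and options):
#   docker run \
#     -v "$(pwd)/override-configs/cortex.yaml:/opt/override-configs/cortex.yaml:ro" \
#     <your conduktor-platform-cortex image>
```

If the file is mounted somewhere other than /opt/override-configs/cortex.yaml, CORTEX_OVERRIDE_CONFIG_FILE has to point to that path, as described above.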
Verification
If you want to verify that this config has been taken into account, open a bash shell in the conduktor-platform-cortex container and run the following command:
cat /var/conduktor/configs/monitoring-cortex.yaml
You should see the overridden properties in there.
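Since the full file can be long, it may be easier to print only the limit settings. A small sketch, with a hypothetical helper named show_cortex_limits that defaults to the path used above:

```shell
# Hypothetical helper: print the two limit settings found in a Cortex config
# file. Defaults to the effective config path shown above.
show_cortex_limits() {
  config_file="${1:-/var/conduktor/configs/monitoring-cortex.yaml}"
  grep -E '^[[:space:]]*(ingestion_rate|max_series_per_metric):' "$config_file"
}

# Example usage, from inside the conduktor-platform-cortex container:
#   show_cortex_limits
```

The printed values should match the ones in your override file.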