Context

Since version 1.18, the Monitoring part of Conduktor Console has been externalized into a separate image called conduktor-platform-cortex.

This image contains 3 components:

Prometheus, to scrape the metrics from Conduktor Console
Cortex, to store these metrics in S3 or on volumes
Alertmanager, to set up alerts

These 3 components are external to Conduktor and are not maintained by us, but our Monitoring relies on them.

Issue

If you have many clusters with a lot of resources (topics, consumer groups, partitions), you might hit some of the Cortex limits. When that happens, the logs of the Cortex container contain errors such as:

ingestion rate limit (25000) exceeded while adding 2000 samples and 0 metadata

per-metric series limit of 50000 exceeded

This means that some of your series won't be stored, so you'll miss some monitoring information. Your storage usage will also grow, because the error is logged repeatedly.
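To gauge how often the limits are hit, you can count the matching log lines. The following is a minimal sketch, assuming a Docker deployment; the helper name count_limit_errors and the container name conduktor-platform-cortex are assumptions, so adapt them to your setup.

```shell
# Count Cortex limit errors in log text read from stdin.
# The patterns are taken from the two error messages shown above.
count_limit_errors() {
  # -c prints the number of matching lines; -E enables the "|" alternation.
  grep -cE 'ingestion rate limit \([0-9]+\) exceeded|per-metric series limit of [0-9]+ exceeded'
}

# Example with Docker (the container name is an assumption):
# docker logs --since 1h conduktor-platform-cortex 2>&1 | count_limit_errors
```

Note that grep exits with a non-zero status when nothing matches, although it still prints 0.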

Solution

To get rid of these errors, you can override the Cortex configuration to raise the limits. But first, check how many time-series you are pulling from Conduktor Console, so you know how high to set them.

Step 1: Check how many time-series you have

To check how many time-series you are pulling, open a bash shell in either of the two containers and run the following command:

curl http://conduktor-platform:8080/monitoring/metrics | wc -l

In my case, the container is named conduktor-platform and the port is 8080. Make sure you change both to match your deployment.

The output should look like this:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 36898  100 36898    0     0  5659k      0 --:--:-- --:--:-- --:--:-- 6005k
279

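The number of time-series is the last line of the output, in this case "279". Because wc -l counts every line, any # HELP or # TYPE comment lines in the response are included, so treat this number as an upper bound. This is really low, but it can easily increase if you have many clusters, topics, partitions, or consumer groups.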

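If you're hitting the ingestion rate limit, override it so it's higher than the number of time-series you're getting. The limit is expressed in samples per second, and each scrape produces one sample per time-series, so leave some headroom.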
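For a closer look, the sketch below counts only sample lines and breaks the total down per metric name, which is what the per-metric series limit applies to. It assumes the endpoint returns the Prometheus text exposition format; the helper names count_series and series_per_metric are hypothetical.

```shell
# Count sample lines (one per time-series) in Prometheus text read from stdin,
# skipping "# HELP" / "# TYPE" comment lines and blank lines.
count_series() {
  grep -v '^#' | grep -c '[^[:space:]]'
}

# Count series per metric name, highest first, to compare against
# max_series_per_metric. The metric name is everything before "{" or a space.
series_per_metric() {
  grep -v '^#' | grep '[^[:space:]]' | sed -E 's/[{ ].*$//' | sort | uniq -c | sort -rn
}

# Example (host and port as in the command above; -s hides curl's progress meter):
# curl -s http://conduktor-platform:8080/monitoring/metrics | count_series
# curl -s http://conduktor-platform:8080/monitoring/metrics | series_per_metric | head
```

If a single metric name is close to 50000 series, that is the one triggering the per-metric series limit.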

Step 2: Override the configuration

To override your Cortex configuration, create a YAML file containing the settings you want to override, following the Cortex configuration documentation. Then mount this file at /opt/override-configs/cortex.yaml.

If you want to mount it at a different path, set the environment variable CORTEX_OVERRIDE_CONFIG_FILE to that path.

For example, you can create a file cortex.yaml like the following:

limits:
  ingestion_rate: 50000
  max_series_per_metric: 100000
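As a rough sketch of the mounting step, assuming a plain Docker deployment: the snippet below writes the example file and defines a helper that starts the container with the file mounted read-only. The helper name run_cortex_with_override and the CORTEX_IMAGE variable are placeholders, not part of Conduktor; a real deployment will also need whatever other options it already passes to the container.

```shell
# Write the override file from the example above into the current directory.
cat > cortex.yaml <<'EOF'
limits:
  ingestion_rate: 50000
  max_series_per_metric: 100000
EOF

# Start the monitoring container with the override mounted read-only.
# CORTEX_IMAGE is a placeholder: set it to the image reference you already deploy.
run_cortex_with_override() {
  docker run -d \
    -v "$(pwd)/cortex.yaml:/opt/override-configs/cortex.yaml:ro" \
    "${CORTEX_IMAGE:?set CORTEX_IMAGE to your conduktor-platform-cortex image}"
}
```

If you mount the file somewhere else, add -e CORTEX_OVERRIDE_CONFIG_FILE=<path> to the docker run options so the container knows where to find it.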

Verification

To verify that this configuration has been taken into account, open a bash shell in the conduktor-platform-cortex container and run the following command:

cat /var/conduktor/configs/monitoring-cortex.yaml

You should see the overridden properties in there.
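The generated file can be long, so it may help to print only the two limits. This is a minimal sketch; the helper name show_limits is hypothetical, and the docker exec line assumes a container named conduktor-platform-cortex.

```shell
# Print only the two limit keys from a Cortex YAML config read from stdin.
# The trailing ":" keeps similarly named keys (e.g. ingestion_rate_strategy) out.
show_limits() {
  grep -E '^[[:space:]]*(ingestion_rate|max_series_per_metric):'
}

# Example with Docker (the container name is an assumption):
# docker exec conduktor-platform-cortex cat /var/conduktor/configs/monitoring-cortex.yaml | show_limits
```

The printed values should match the ones in your cortex.yaml.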

How to remove Cortex limit reached logs?
