Context
This is a guide for troubleshooting Conduktor Gateway SNI Routing.
The goal is to work through each layer of the deployment to validate the configuration in a systematic manner.
We recommend deploying Gateway in stages for clarity when troubleshooting issues:
- Start with a single Gateway instance and a single Kafka cluster
- Confirm Gateway can start up successfully
- Confirm Gateway can connect to the Kafka cluster
- Validate clients can connect to Kafka via Gateway through SNI routing
We recommend not setting up client authorisation until you have validated that SNI routing works, so this guide does not cover that topic.
This guide is broken down into four sections:
- Infrastructure Checklist
- Gateway Checklist
- Client Connectivity
- TLS Errors
Infrastructure Checklist
Load Balancer in Passthrough Mode
- Ensure that the load balancer is set to passthrough mode.
  - This allows the TLS handshake to occur directly between the client and the Gateway.
TLS Termination
- Ensure TLS terminates at the Gateway, not at the load balancer.
  - Check that the load balancer does not interfere with the TLS handshake. How to check this depends on the type of load balancer in use.
Port Consistency
- Ensure the ports between the load balancer and the Gateway are consistent.
  - The load balancer should listen on port 6969, the Gateway's default port, so that clients are routed to the Gateway correctly.
  - Alternatively, to use a different port, set the following property on Gateway:
GATEWAY_ADVERTISED_SNI_PORT=<your desired port number>
DNS Entries for Kafka Brokers
- Confirm that the DNS entries for the Kafka brokers follow this format:
<host-prefix><cluster-id><broker-id>.<advertised-host>
  - On version 3.5.0 or later (November 2024 release), the default format is:
<host-prefix><cluster-id><broker-id>-<advertised-host>
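The hostname format above can be sketched as a small shell helper, which may be handy when checking DNS records broker by broker. This is only an illustration: the function names, the cluster ID `main`, and the host `gw.example.com` are hypothetical, and the separator logic simply encodes the default described above ("." before 3.5.0, "-" from 3.5.0 onwards). It does not account for a custom sniHostSeparator.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Gateway.

# Default SNI separator for a Gateway version:
# "." before 3.5.0, "-" from 3.5.0 onwards.
sni_separator_for_version() {
  local version="$1"
  # sort -V orders version strings; if 3.5.0 sorts first (or ties),
  # the given version is 3.5.0 or later.
  if [ "$(printf '%s\n' "3.5.0" "$version" | sort -V | head -n 1)" = "3.5.0" ]; then
    printf '%s\n' "-"
  else
    printf '%s\n' "."
  fi
}

# Build <host-prefix><cluster-id><broker-id><separator><advertised-host>.
expected_broker_host() {
  local prefix="$1" cluster_id="$2" broker_id="$3" separator="$4" advertised_host="$5"
  printf '%s%s%s%s%s\n' "$prefix" "$cluster_id" "$broker_id" "$separator" "$advertised_host"
}
```

For example, `getent hosts "$(expected_broker_host broker main 0 "$(sni_separator_for_version 3.5.0)" gw.example.com)"` would show whether the expected name for broker 0 resolves, assuming those placeholder values match your deployment.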
Gateway Checklist
Validate Gateway has Started Successfully
- Validate Gateway has started successfully by searching for the following in the log file:
Gateway started successfully with SNI routing
Verify Gateway Configuration
- If Gateway fails to start, validate your configuration by comparing it to the following:
gatewayClusterId: "gateway"
gatewayRackId: null
gatewayGroupId: null
kafkaSelector: !<env>
prefix: "KAFKA_"
hostPortConfiguration:
bindHost: "0.0.0.0"
advertisedHost: "conduktor-gateway.staging.company.com"
hostPrefix: "broker"
portCount: 1
portStart: 6969
minBrokerId: 0
tenantInHostname: false
advertisedSniPort: 6969
sniHostSeparator: "."
routing: "host"
authenticationConfig:
securityProtocol: "SSL"
sslConfig:
keyStore:
keyStorePath: "/etc/gateway/tls/keystore.jks"
keyStorePassword: "***"
keyPassword: "***"
keyStoreType: "jks"
incomplete: false
trustStore:
trustStorePath: null
trustStorePassword: null
trustStoreType: "jks"
clientAuth: "NONE"
sslPrincipalMappingRules: null
incomplete: true
updateContextIntervalMinutes: 5
connectionsMaxReauthMs: 0
oauth: null
authorizationCacheMs: 5000
mandatoryVCluster: false
- Ensure that:
  - The advertisedHost is set correctly.
  - The keyStore and trustStore paths and passwords are properly configured.
  - The SSL configuration and security protocols match the environment.
  - Note: as of Gateway 3.5.0 (November 2024 release), the default sniHostSeparator has changed from "." to "-".
- If you notice any discrepancies, update the configuration, restart the Gateway, and search for the startup message again.
Validate Gateway can connect to Kafka
- Validate that Gateway can connect to the Kafka cluster by searching the logs for the following:
successfully authenticated with <list of kafka broker ids>
  - If Gateway is not able to connect to Kafka, enable the following debug logging:
LOG4J2_ORG_APACHE_KAFKA_LEVEL=debug
Enable Debug Logging
- If Gateway starts successfully and clients are still unable to connect, enable debug logging by setting the following environment variables and restarting Gateway:
LOG4J2_IO_CONDUKTOR_PROXY_NETWORK_LEVEL=debug
LOG4J2_IO_CONDUKTOR_PROXY_SERVICE_LEVEL=debug
LOG4J2_IO_CONDUKTOR_UPSTREAM_THREAD_LEVEL=debug
LOG4J2_IO_CONDUKTOR_PROXY_AUTHORIZATION_LEVEL=debug
- After enabling logging, check the logs for any errors related to:
  - network
  - upstream thread handling
  - service-level issues
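The two log checks in this checklist can be scripted roughly as follows. This is a minimal sketch that assumes the Gateway log has been captured to a file (for example from your container runtime); the function names and the file path are hypothetical, and only the two message strings come from this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking a captured Gateway log file.

# Return 0 if the log contains the given fixed string, non-zero otherwise.
log_contains() {
  local log_file="$1" needle="$2"
  grep -q -F -- "$needle" "$log_file"
}

# Report the two checkpoints described in the Gateway Checklist.
check_gateway_log() {
  local log_file="$1" status=0
  if log_contains "$log_file" "Gateway started successfully with SNI routing"; then
    echo "OK: Gateway started with SNI routing"
  else
    echo "MISSING: startup message"
    status=1
  fi
  if log_contains "$log_file" "successfully authenticated with"; then
    echo "OK: Gateway authenticated with Kafka"
  else
    echo "MISSING: Kafka authentication message"
    status=1
  fi
  return "$status"
}
```

Running `check_gateway_log gateway.log` (with `gateway.log` standing in for wherever you saved the log) prints one line per checkpoint and returns non-zero if either message is absent.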
Client Connectivity
1. Validate Connectivity via the Kafka Console Producer
We recommend using the Kafka Console Producer/Consumer to validate client connectivity, as it provides detailed logging for both TLS errors and authentication issues.
- Install the JVM if not already present on the host (note: some of these commands may differ slightly depending on the version of the Linux distribution you are running):
  - Arch Linux:
sudo pacman -S jdk-openjdk
  - Debian/Ubuntu:
sudo apt update
sudo apt install openjdk-17-jdk
  - RHEL:
sudo yum install java-17-openjdk-devel
  - Fedora:
sudo dnf install java
- Download and decompress the Apache Kafka binaries:
wget -q --show-progress https://downloads.apache.org/kafka/3.8.0/kafka_2.13-3.8.0.tgz -O kafka_latest.tgz && tar -xzf kafka_latest.tgz
- Create a client configuration file called client-ssl.properties:
security.protocol=SSL
ssl.truststore.location=/path/to/client.truststore.jks
ssl.truststore.password=your_truststore_password
# If mutual TLS is required, also add the keystore settings:
ssl.keystore.location=/path/to/client.keystore.jks
ssl.keystore.password=your_keystore_password
ssl.key.password=your_key_password
- Run the Console Producer from the bin directory of the extracted Kafka archive, using client-ssl.properties:
$ ./kafka-console-producer.sh \
--bootstrap-server <BROKER_HOST:PORT> \
--topic <TOPIC_NAME> \
--producer.config client-ssl.properties 2>&1 | tee producer.log
- You should be able to produce a message to the topic you have selected. If you are not able to, check the following:
  - You have the correct broker URL and port number
  - The topic exists, or the Kafka cluster allows topic creation from the client
  - Check for TLS errors in producer.log
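Before running the producer, it can help to sanity-check client-ssl.properties itself. The sketch below assumes the file follows the example above (security.protocol=SSL with a custom truststore); the function names are hypothetical, and the check is deliberately shallow: it does not open the truststore or validate passwords.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for a Kafka client properties file.

# Print the value of a key from a Java-style properties file (last match wins).
prop_value() {
  local file="$1" key="$2"
  awk -v key="$key" '
    index($0, "#") == 1 { next }             # skip comment lines
    {
      pos = index($0, "=")
      if (pos == 0) next
      k = substr($0, 1, pos - 1)
      v = substr($0, pos + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)         # trim whitespace around key
      gsub(/^[ \t]+|[ \t]+$/, "", v)         # trim whitespace around value
      if (k == key) value = v
    }
    END { if (value != "") print value }
  ' "$file"
}

# Check that the protocol is SSL and the truststore file is readable.
check_client_props() {
  local file="$1" protocol truststore
  protocol="$(prop_value "$file" security.protocol)"
  truststore="$(prop_value "$file" ssl.truststore.location)"
  if [ "$protocol" != "SSL" ]; then
    echo "security.protocol is '${protocol}', expected 'SSL'"
    return 1
  fi
  if [ ! -r "$truststore" ]; then
    echo "truststore not readable: ${truststore}"
    return 1
  fi
  echo "client properties look consistent"
}
```

`check_client_props client-ssl.properties` returns non-zero and prints the first problem it finds, which rules out simple path mistakes before you start reading TLS errors in producer.log.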
2. Validate connectivity via kcat
If you are unable to deploy the Kafka Console Producer/Consumer to test with, you can use kcat. Note that the logging in kcat is less detailed than in the Kafka Console Producer/Consumer, so the cause of an issue may not be as clear.
- Download kcat if not already present in the environment:
- Run kcat as follows:
kcat -b <BROKER_HOST:PORT> -C -t <TOPIC_NAME> -X security.protocol=ssl \
-X ssl.ca.location=/etc/ssl/certs/ca-certificates.crt \
-X ssl.certificate.location=/path/to/client-cert.pem \
-X ssl.key.location=/path/to/client-key.pem 2>&1 | tee kcat.log
- You should be able to consume messages from the Kafka topic <TOPIC_NAME>. If you are unable to, check the following:
  - You have the correct broker URL and port number
  - The topic exists and has messages in it
  - Check for TLS errors in kcat.log
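kcat can also list cluster metadata with its `-L` flag, which shows the broker addresses returned to the client. With SNI routing, those addresses would be expected to follow the hostname format from the Infrastructure Checklist. The filter below is a sketch that assumes kcat prints broker lines in the form `broker <id> at <host>:<port>`; the function name is hypothetical, and the output format may vary between kcat versions.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read `kcat -L` output on stdin and print
# one advertised <host>:<port> per broker line.
advertised_brokers() {
  sed -n -E 's/^[[:space:]]*broker [0-9]+ at ([^[:space:]]+).*$/\1/p'
}
```

A possible use is `kcat -b <BROKER_HOST:PORT> -L -X security.protocol=ssl -X ssl.ca.location=/etc/ssl/certs/ca-certificates.crt | advertised_brokers`, then confirming that each printed hostname resolves to the load balancer.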
3. Validate connectivity using OpenSSL
You can also debug TLS issues in isolation by using OpenSSL as a client to connect to Kafka. While OpenSSL does not "speak Kafka", it can still pass or fail the TLS handshake and produce logging that shows where a TLS issue occurs.
- Validate you have OpenSSL installed:
openssl version
- Run the openssl client as follows:
openssl s_client -connect <BROKER_HOST>:<BROKER_PORT> -tls1_2 -servername <BROKER_HOST> -debug 2>&1 | tee tls_logs.txt
- You should be able to complete the TLS connection. If you are unable to, check the following:
  - You have the correct broker URL and port number
  - Check for TLS errors in tls_logs.txt
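Because `-debug` makes tls_logs.txt quite verbose, a quick way to find the outcome is to pull out the `Verify return code` line that `openssl s_client` prints in its session summary. The helpers below are a sketch with hypothetical function names. Note that s_client can complete a handshake even when verification fails, so a non-zero code here points at a certificate trust problem rather than a connectivity one.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading an `openssl s_client` log.

# Print the first "Verify return code" line found in the log.
tls_verify_result() {
  local log_file="$1"
  grep -m 1 -o 'Verify return code: .*' "$log_file"
}

# Return 0 only if the certificate chain verified cleanly.
tls_chain_verified() {
  local log_file="$1"
  grep -q 'Verify return code: 0 (ok)' "$log_file"
}
```

For example, `tls_verify_result tls_logs.txt` prints the verification line, and `tls_chain_verified tls_logs.txt && echo "chain OK"` can be used in a script.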
TLS Errors
The following is a list of TLS errors and their meanings, to aid in troubleshooting.
SSL handshake failure
- Meaning: The client and server couldn't agree on a TLS version or cipher suite during the handshake.
- Possible Causes:
  - Incompatible TLS versions between client and server.
  - Cipher suites are not mutually supported.
  - Server misconfiguration or client misconfiguration.
Certificate verify failed
- Meaning: The server's certificate could not be verified.
- Possible Causes:
  - The certificate is not signed by a trusted Certificate Authority (CA).
  - The certificate chain is incomplete or missing.
  - The client doesn't have the CA certificate needed to verify the server's certificate.
  - Expired or revoked certificate.
Unable to get local issuer certificate
- Meaning: The client cannot find the issuer certificate for the server’s certificate.
- Possible Causes:
  - The server’s certificate chain is missing intermediate certificates.
  - The client’s trust store does not contain the necessary CA certificate.
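Wrong version number
- Meaning: The client received a record whose protocol version it could not accept, so the client and server are using different or incompatible protocols.
- Possible Causes:
  - TLS/SSL version mismatch (e.g., client using TLS 1.2, but the server only supports TLS 1.3 or vice versa).
  - The server does not support the version of TLS specified by the client.
  - The port is not speaking TLS at all (e.g., a plaintext listener), so the response is not a valid TLS record.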
No shared cipher
- Meaning: The client and server could not agree on a mutually supported cipher suite.
- Possible Causes:
  - Misconfigured cipher suites on either client or server.
  - Server may be configured to only use strong ciphers, while the client does not support them.
SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
- Meaning: The certificate presented by the server could not be verified during the handshake.
- Possible Causes:
  - The certificate is expired or invalid.
  - The certificate chain is incomplete.
  - The CA is not trusted or known by the client.
TLSv1 alert unknown CA
- Meaning: The server does not recognize the certificate authority that issued the client’s certificate.
- Possible Causes:
  - The server does not trust the client's CA.
  - The client is using a self-signed certificate or a certificate from a CA that is not trusted by the server.
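Peer did not return certificate
- Meaning: The peer did not provide a certificate during the handshake. This is most often reported by a server that requires client authentication (mTLS) when the client sends no certificate.
- Possible Causes:
  - Mutual TLS is required, but the client has no keystore or client certificate configured.
  - The server may not be configured correctly to provide its certificate.
  - Client may have requested a certificate from a non-TLS or incorrectly configured server.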
Handshake failure alert
- Meaning: The server sent a handshake failure alert in response to the client.
- Possible Causes:
  - Incorrect TLS version or cipher suite configuration.
  - Server policy may require mutual TLS (mTLS), but the client did not provide a certificate.
TLSv1 alert decrypt error
- Meaning: An error occurred during the decryption process in the TLS handshake.
- Possible Causes:
  - Incorrect or corrupted certificate or private key.
  - An error in the underlying encryption algorithm, possibly due to a cipher mismatch or bad data.
TLSv1 alert protocol version
- Meaning: The server does not support the TLS version requested by the client.
- Possible Causes:
  - The client is using a deprecated or unsupported version of TLS (e.g., TLS 1.0, 1.1).
  - The server is configured to accept only certain versions of TLS (e.g., only TLS 1.2 or TLS 1.3).
TLSv1 bad certificate
- Meaning: The certificate provided by the client or server is bad or invalid.
- Possible Causes:
  - The certificate might be expired, revoked, or incorrectly formatted.
  - The certificate does not match the hostname.
error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number
- Meaning: The client and server are trying to communicate using different protocols (e.g., the client is using TLS but the server is expecting plaintext).
- Possible Causes:
  - Attempting to connect to a non-TLS port or an improperly configured service.
  - Protocol mismatch (e.g., Kafka broker might not be set up for SSL).
unable to get issuer certificate
- Meaning: The client could not find the issuer certificate of the server's certificate in its trust store.
- Possible Causes:
  - Missing intermediate certificates in the certificate chain.
  - Incorrect certificate chain configuration on the server.
tlsv1 alert internal error
- Meaning: A general internal error occurred on the server during the handshake process.
- Possible Causes:
  - Server misconfiguration.
  - Resource limitations or security policies preventing successful connection.
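To find these errors quickly in producer.log, kcat.log, or tls_logs.txt, a case-insensitive search for the message fragments listed above is usually enough. The function below is a sketch with a hypothetical name. The pattern list mirrors the errors in this section and is not an exhaustive catalogue of TLS failures.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (with line numbers) every log line that
# contains one of the TLS error fragments described in this section.
# Returns non-zero when no known error is found.
scan_tls_errors() {
  local log_file="$1"
  grep -i -n -F \
    -e 'handshake failure' \
    -e 'certificate verify failed' \
    -e 'unable to get local issuer certificate' \
    -e 'unable to get issuer certificate' \
    -e 'wrong version number' \
    -e 'no shared cipher' \
    -e 'unknown ca' \
    -e 'peer did not return' \
    -e 'decrypt error' \
    -e 'protocol version' \
    -e 'bad certificate' \
    -e 'internal error' \
    -- "$log_file"
}
```

For example, `scan_tls_errors tls_logs.txt` prints each matching line prefixed by its line number, which you can then look up in the list above.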