Job Title: Confluent Kafka Infrastructure Administrator
Location: Remote (Pak) / Hybrid (Riyadh, KSA)
Employment Type: Full-Time / Contract-to-Hire
Experience Level: 5+ years

Role Summary
As a Confluent Kafka Infrastructure Administrator, you'll be the operational backbone of our Kafka ecosystem, handling the full lifecycle of clusters from installation to ongoing maintenance. You'll deploy and manage Confluent Platform in multi-environment setups (dev/test/UAT/preprod/prod/DR), ensuring high availability, security, and performance. This hands-on role involves automation, troubleshooting, and collaboration with engineering teams to support real-time data pipelines in production-scale environments.

Key Responsibilities
- Installation & Setup: Lead the deployment and initial configuration of Confluent Platform (Enterprise edition), Confluent Cloud, and Apache Kafka clusters across multiple environments (dev, test, UAT, preprod, prod, DR), including:
  - Installing brokers, ZooKeeper or KRaft controllers, Schema Registry, Kafka Connect, and ksqlDB using automation tools such as Ansible, Terraform, Helm charts, or the Confluent CLI, on Kubernetes, bare metal, or cloud (GCP).
  - Configuring topics, partitions, replication factors, and security (SSL/TLS encryption, SASL authentication, ACLs stored in ZooKeeper or KRaft metadata, and RBAC) without downtime.
  - Integrating auxiliary services such as MirrorMaker 2 or Confluent Replicator for cross-environment replication, and running initial load tests and health checks with Kafka's performance tools (kafka-producer-perf-test, kafka-consumer-perf-test).
  - Automating environment provisioning for reproducibility, including VPC peering, load balancers, and integration with existing infrastructure.
- Maintenance & Operations: Oversee day-to-day cluster management, upgrades, and patching in multi-environment landscapes, ensuring >99.9% uptime and compliance with SLAs.
- Monitoring & Troubleshooting: Implement and maintain observability with Prometheus, Grafana, and Confluent Control Center; set up alerts for metrics such as consumer lag, throughput, and broker health; perform root-cause analysis for incidents (e.g., replication failures, disk exhaustion).
- Optimization & Scaling: Conduct capacity planning, performance tuning (e.g., JVM tuning, partition balancing), and horizontal/vertical scaling; implement Tiered Storage for cost efficiency in large-scale setups.
- Security & Compliance: Enforce security best practices (OAuth/Kerberos, auditing); manage access controls and ensure GDPR/HIPAA compliance across environments.
- Automation & Documentation: Develop scripts (Python/Bash) for routine tasks such as backups, rollouts, and failover; create runbooks and contribute to IaC pipelines (CI/CD with Jenkins/GitHub Actions).
- Collaboration: Partner with data engineers, DevOps, and architects to support POCs, integrations (e.g., Flink/Spark), and environment migrations; provide 24/7 on-call support for prod incidents.

Required Qualifications & Skills
- Bachelor's degree in Computer Science or Engineering; Confluent Certified Administrator for Apache Kafka (CCAAK) or an equivalent certification is a plus.
- 5+ years in infrastructure operations; 3+ years hands-on with Confluent Platform and Apache Kafka (KRaft mode preferred).
- Proven track record deploying and managing Kafka clusters across multiple environments (dev/test/UAT/preprod/prod/DR); expertise in Ansible, Terraform, and Helm for automated installs.
- Proficiency in Bash/Python scripting; Kafka admin APIs, Connect, Streams, ksqlDB; Confluent CLI, REST Proxy.
- Containerization (Docker/Kubernetes); CI/CD (Jenkins/GitHub Actions); multi-cloud (AWS/GCP/Azure), including managed services such as Amazon MSK.
- Monitoring tools such as Prometheus, Grafana, and ELK; experience with upgrades, backups, DR (MirrorMaker/Replicator), and performance tuning.
- SSL/TLS, SASL, ACLs/RBAC; auditing and compliance in distributed systems.
- Strong troubleshooting, documentation, and communication skills; ability to take part in an on-call rotation.