Qureos

Confluent Kafka Infrastructure Administrator

Job Title: Confluent Kafka Infrastructure Administrator
Location: Remote (Pakistan) / Hybrid (Riyadh, KSA)
Employment Type: Full-Time / Contract-to-Hire
Experience Level: 5+ years

Role Summary

As a Confluent Kafka Infrastructure Administrator, you'll be the operational backbone of our Kafka ecosystem, handling the full lifecycle of clusters from installation to ongoing maintenance. You'll deploy and manage Confluent Platform across multiple environments (dev/test/UAT/preprod/prod/DR), ensuring high availability, security, and performance. This hands-on role involves automation, troubleshooting, and collaboration with engineering teams to support real-time data pipelines in production-scale environments.

Key Responsibilities
  • Installation & Setup: Lead the deployment and initial configuration of Confluent Platform (Enterprise/Cloud editions) and Apache Kafka clusters across multiple environments (dev, test, UAT, preprod, prod, DR), including:
    • Installing brokers, ZooKeeper/KRaft controllers, Schema Registry, Kafka Connect, and ksqlDB using automated tools like Ansible, Terraform, or Confluent CLI/Helm charts on Kubernetes, bare-metal, or cloud (GCP).
    • Configuring topics, partitions, replication factors, and security (SSL/TLS encryption, SASL authentication, ACLs/RBAC via ZooKeeper or KRaft) without incurring downtime.
    • Integrating auxiliary services like MirrorMaker 2 or Confluent Replicator for cross-environment replication, and performing initial load testing and health checks with Kafka's built-in performance-test tools.
    • Automating environment provisioning for reproducibility, including VPC peering, load balancers, and integration with existing infrastructure.
  • Maintenance & Operations: Oversee day-to-day cluster management, upgrades, and patching in multi-environment landscapes, ensuring >99.9% uptime and compliance with SLAs.
  • Monitoring & Troubleshooting: Implement and maintain observability with Prometheus, Grafana, and Confluent Control Center; set up alerts for metrics like consumer lag, throughput, and broker health; perform root-cause analysis for incidents (e.g., replication failures, disk exhaustion).
  • Optimization & Scaling: Conduct capacity planning, performance tuning (e.g., JVM tweaks, partition balancing), and horizontal/vertical scaling; implement Tiered Storage for cost efficiency in large-scale setups.
  • Security & Compliance: Enforce security best practices (OAuth/Kerberos, auditing); manage access controls and ensure GDPR/HIPAA compliance across environments.
  • Automation & Documentation: Develop scripts (Python/Bash) for routine tasks like backups, rollouts, and failover; create runbooks and contribute to IaC pipelines (CI/CD with Jenkins/GitHub Actions).
  • Collaboration: Partner with data engineers, DevOps, and architects to support POCs, integrations (e.g., Flink/Spark), and environment migrations; provide 24/7 on-call support for prod incidents.
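For a sense of scale, the 99.9% uptime target under Maintenance & Operations leaves only a small downtime budget. The sketch below works through that arithmetic, assuming a 30-day month and a 365-day year; the function name is illustrative and not part of any Kafka or Confluent tooling. Because the target is stated as ">99.9%", the figures are upper bounds.

```python
# Illustrative only: downtime budget implied by an availability target.
# The 30-day month and 365-day year are assumptions made for this example.
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(availability_pct: float, period_days: int) -> float:
    """Return the maximum downtime, in minutes, allowed over period_days."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    unavailable_fraction = (100 - availability_pct) / 100
    return unavailable_fraction * period_days * MINUTES_PER_DAY


if __name__ == "__main__":
    for label, days in (("30-day month", 30), ("365-day year", 365)):
        budget = downtime_budget_minutes(99.9, days)
        print(f"{label}: {budget:.1f} minutes of allowed downtime")
```

At 99.9% this works out to roughly 43 minutes per 30-day month, or under 9 hours per year, which is why automated failover and tested runbooks matter for this role.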
Required Qualifications & Skills
  • Bachelor's degree in Computer Science/Engineering; Confluent Certified Administrator for Apache Kafka (CCAAK) or equivalent is a plus.
  • 5+ years in infrastructure ops; 3+ years hands-on with Confluent Platform & Apache Kafka (KRaft mode preferred).
  • Proven track record in deploying/managing Kafka clusters across multiple environments (dev/test/UAT/preprod/prod/DR); expertise in Ansible/Terraform/Helm for automated installs.
  • Proficiency in Bash/Python scripting; Kafka admin APIs, Connect, Streams, ksqlDB; Confluent CLI, REST Proxy.
  • Containerization (Docker/K8s); CI/CD (Jenkins/GitHub Actions); multi-cloud (AWS/GCP/Azure) with managed services like MSK.
  • Tools like Prometheus/Grafana/ELK; experience with upgrades, backups, DR (MirrorMaker/Replicator), and performance tuning.
  • SSL/TLS, SASL, ACLs/RBAC; auditing and compliance in distributed systems.
  • Strong troubleshooting, documentation, and communication; ability to handle on-call rotation.

© 2025 Qureos. All rights reserved.