Data Infrastructure & Operations
Offer flexible and secure data ingestion, streaming, transformation, analytics, and data lake storage, paired with self-service compute and ML workspaces, so that in-house data teams can spin up services and create pipelines to meet their business requirements
Job description
- Looking for 4-6 years of hands-on experience with production-level development and operations on AWS or Azure Cloud
- Develop and maintain infrastructure-as-code using Terraform to deploy and manage Kubernetes clusters (AKS) and Databricks environments
- Hands-on experience with data pipeline orchestration tools like Azure Data Factory, AWS Data Pipeline, Apache Spark, and Databricks
- Hands-on experience with one or more stream and batch processing systems: Kafka (Confluent Cloud, open source), Apache Storm, Spark Streaming, Apache Flink, Kappa architecture
- Experience architecting the right storage strategy for each use case, taking into account data processing, data accessibility, data availability, and cloud cost
- Proficiency in data transformation using Kafka Streams applications, KSQL, or processor libraries
- Data ingestion and data distribution integration experience using managed connectors such as Event Hubs, Kafka topics, ADLS Gen2, and REST APIs
- Proficiency in setting up and managing an open-source stack, including Airflow, Druid, Kafka (open source), OpenSearch, and Superset
- Proficiency in Python scripting for automation and integration tasks
- Utilize FastAPI for building and deploying high-performance APIs
- Handle requirements for managed services, IAM, auto-scaling, high availability, elasticity, and networking options
- Handle federated access to a cloud computing resource (or set of resources) based on a user's role within the organization
- Proficiency with Git, including branching/merging strategies, Pull Requests, and basic command line functions
- Proficiency in DevSecOps practices throughout the product lifecycle, including fully managed Day 2 operations leveraging Datadog
- Implement shared access controls to support multi-tenancy and self-service tooling for customers
- Manage the data catalog by topic or domain, based on the services and use cases offered
- Research, investigate, and introduce new technologies to continually evolve data platform capabilities
- Experience working with Agile Scrum methodologies