- Design and implement large-scale, fault-tolerant data pipelines on OCI, using services like OCI Data Integration, OCI Data Flow (Apache Spark), Object Storage, and Autonomous Database.
- Build and manage streaming data architectures using tools such as OCI GoldenGate, Apache Kafka, and Spark/Flink Streaming.
- Enforce standards and automation across the entire data lifecycle, including schema evolution, dataset migration, and deprecation strategies.
- Improve platform resilience, data quality, and observability with advanced monitoring, alerting, and automated data governance.
- Serve as a technical leader, mentoring junior engineers, reviewing designs and code, and promoting engineering best practices.
- Collaborate cross-functionally with ML engineers, platform teams, and data scientists to integrate data services with AI/ML workloads.
- Partner in AI pipeline enablement, ensuring Lakehouse services efficiently support model training, feature engineering, and real-time inference.
Minimum Qualifications
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field.
- 4–6 years of experience designing and building cloud-based data pipelines and distributed systems.
- Proficiency in at least one core language: Python, Java, or Scala.
- Familiarity with lakehouse formats (Iceberg, Delta, Hudi), file formats (Parquet, ORC, Avro), and streaming platforms (Kafka, Kinesis).
- Strong understanding of distributed systems fundamentals: partitioning, replication, idempotency, and consensus protocols.
Engineering & Infrastructure
- 5+ years building distributed systems or production-grade data platforms in the cloud.
- Strong coding proficiency in Python, Java, or Scala, with an emphasis on performance and reliability.
- Expertise in SQL and PL/SQL, data modeling, and query optimization.
- Proven experience with cloud-native architectures, especially on OCI, AWS, Azure, or GCP.
Lakehouse & Streaming Mastery
- Deep knowledge of modern lakehouse/table formats: Apache Iceberg, Delta Lake, or Apache Hudi.
- Production experience with big data compute engines: Spark, Flink, or Trino.
- Skilled in real-time streaming and event-driven architectures using Kafka, Flink, GoldenGate, or OCI Streaming.
- Experience managing data lakes, catalogs, and metadata governance in large-scale environments.
AI/ML Integration
- Hands-on experience enabling ML pipelines, from data ingestion to model training and deployment.
- Familiarity with ML frameworks (e.g., PyTorch, XGBoost, scikit-learn).
- Understanding of modern ML architectures, including RAG, prompt chaining, and agent-based workflows.
- Awareness of MLOps practices, including model versioning, feature stores, and integration with AI pipelines.
DevOps & Operational Excellence
- Deep understanding of CI/CD, infrastructure-as-code (IaC), and release automation using tools like Terraform, GitHub Actions, or CloudFormation.
- Experience with Docker, Kubernetes, and cloud-native container orchestration.
- Strong focus on testing, documentation, and system observability (Prometheus, Grafana, ELK stack).
- Comfortable with cost/performance tuning, incident response, and data security standards (IAM, encryption, auditing).
Preferred Qualifications
- Experience with Oracle’s cloud-native tools: OCI Data Integration, Data Flow, Autonomous Database, GoldenGate, OCI Streaming.
- Experience with query engines like Trino or Presto, and tools like dbt or Apache Airflow.
- Familiarity with data cataloging, RBAC/ABAC, and enterprise data governance frameworks.
- Exposure to vector databases and LLM tooling (embeddings, vector search, prompt orchestration).
- Solid understanding of data warehouse design principles, star/snowflake schemas, and ETL optimization.
Soft Skills & Team Expectations
- Proven ability to lead technical initiatives independently, end to end.
- Comfortable working in cross-functional teams and mentoring junior engineers.
- Excellent problem-solving skills, design thinking, and attention to operational excellence.
- Passion for learning emerging data and AI technologies and sharing knowledge across teams.