Key Responsibilities
- Data Pipeline Development: Design and implement robust ETL/ELT pipelines using Databricks, PySpark, and Delta Lake to process structured and unstructured data efficiently.
- Workflow Orchestration: Manage job orchestration, scheduling, and workflow automation through Databricks Workflows or Airflow.
- Performance Optimization: Tune and optimize Databricks clusters and notebooks for performance, scalability, and cost-efficiency.
- Data Governance: Implement data governance and lineage using Unity Catalog and other platform-native features.
- Collaboration: Work closely with data scientists, analysts, and business stakeholders to understand data requirements and deliver solutions that meet business needs.
- Cloud Integration: Leverage AWS cloud services to build and deploy data solutions, ensuring seamless integration with existing infrastructure.
- Data Modeling: Develop and maintain data models that support analytics and machine learning workflows.
- Automation & Monitoring: Implement automated testing, monitoring, and alerting mechanisms to ensure data pipeline reliability and data quality.
- Documentation & Best Practices: Maintain comprehensive documentation of data workflows and adhere to best practices in coding, version control, and data governance.
Required Qualifications
- Experience: 5+ years in data engineering, with hands-on experience using Databricks and Apache Spark.
- Programming Skills: Proficiency in Python and SQL.
- Cloud Platforms: Strong experience with cloud services such as AWS (e.g., S3, Glue, Redshift).
- Data Engineering Tools: Familiarity with tools like Airflow, Kafka, and dbt.
- Data Modeling: Experience in designing data models for analytics and machine learning applications.
- Collaboration: Proven ability to work in cross-functional teams and communicate effectively with non-technical stakeholders.
Job Type: Full-time
Pay: ₹1,000,000.00 - ₹1,500,000.00 per year
Work Location: In person