Job Summary:
We are seeking a skilled and proactive Data Engineer to join our team. The ideal candidate will have strong coding abilities, hands-on experience with data warehousing concepts, and a solid foundation in DevOps practices. This role requires flexibility to work in shifts and a collaborative mindset to support data-driven initiatives across the organization.
Must-Have Skills:
- Proficient in Python programming
- Experience with Git for version control
- Strong understanding of DevOps principles
- Advanced SQL skills
- Solid grasp of Data Warehousing (DWH) concepts with hands-on experience
- Willingness to work in shift-based schedules
Good-to-Have Skills:
- Experience with Snowflake
- Familiarity with Power BI or Looker for data visualization
Key Responsibilities:
- Infrastructure and Operations: Ensure the reliability and scalability of critical systems by designing and managing robust infrastructure solutions.
- System Monitoring: Proactively monitor system health, using performance metrics and automated tools to detect potential issues before they impact users.
- Incident Management: Lead response efforts during service disruptions, ensuring swift resolution and minimal downtime.
- Problem Solving: Analyze root causes of system failures and implement long-term fixes to enhance system reliability.
- Automation: Develop scripts and tools to automate repetitive tasks, improving operational efficiency and reducing manual interventions.
- Collaboration: Partner with development teams to align on reliability goals and incorporate best practices into software design and deployment.
- Documentation: Maintain comprehensive system documentation to support consistent and efficient troubleshooting and knowledge sharing.
- Continuous Improvement: Drive innovation by identifying areas for enhancement and applying cutting-edge technologies and operational practices.
Qualifications:
- Service Reliability: Experience managing and maintaining highly available systems, including cloud-based infrastructure.
- Programming: Proficiency in programming to automate repetitive tasks ("toil"), reducing manual effort and human error.
- Monitoring & Observability: Solid understanding of monitoring tools, incident management platforms, and metrics analysis.
- Technical Depth: Deep knowledge of system performance optimization and troubleshooting methodologies. Experience with cloud platforms, databases, CI/CD, distributed systems, and security best practices.
- Communication & Collaboration: Strong written and verbal communication skills to collaborate effectively with cross-functional teams.
- Problem Solving: Ability to thrive in high-pressure situations with a calm, methodical approach to problem-solving. Analytical mindset for interpreting data, metrics, and patterns to make informed decisions and predict future issues.
- Systemic Thinking: Ability to view interconnected systems holistically, anticipating the broader impact of changes and designing for resilience.
- Ownership and Proactiveness: Take responsibility for the reliability and performance of services. Proactively identify potential problems, performance bottlenecks, and areas for improvement before they impact users.
Education & Experience:
- Education: Bachelor’s or Master’s degree in Information Systems, Computer Science, Computer Engineering, or equivalent.
- Experience: 5-8 years of experience.