JOB_REQUIREMENTS
Employment Type
Not specified
Company Location
Not specified
The Role
We are seeking a skilled and detail-oriented Data Engineer to join our team and support a transformative Generative AI (GenAI) initiative. In this role, you will be responsible for designing, developing, and maintaining the data infrastructure that powers advanced AI models. You will work closely with data scientists, analysts, and other stakeholders to ensure efficient data workflows, reliable pipelines, and scalable systems that enable high-performance AI applications.

Requirements:
• Proficiency in Python is a must, along with other programming languages relevant to data engineering (e.g., Scala or Java) and familiarity with Big Data processing frameworks (e.g., Apache Spark, Hadoop).
• Ability to build and maintain data lakes, warehouses, and feature stores optimized for GenAI workloads.
• Experience with relational SQL and NoSQL databases, including MySQL, PostgreSQL, MongoDB, and Cassandra.
• Knowledge of data modeling, ETL (Extract, Transform, Load) processes, and data warehousing solutions.
• Understanding of machine learning model requirements for data preparation, feature engineering, and data augmentation.
• Skills in implementing data pipelines and workflows using tools like Apache Airflow, Luigi, or similar technologies.
• Understanding of big data technologies and practices, including data lakes, data streams, and real-time data processing.
• Demonstrated ability to design and implement effective data architecture that supports large-scale, high-performance data processing.
• Ability to work closely with data scientists, machine learning engineers, and software developers to understand data needs and deliver solutions that support AI model development.
• Eagerness to explore new tools and technologies that can enhance the data infrastructure and support the evolving needs of generative AI product development.
• Experience with version control systems (e.g., Git) and collaboration tools (e.g., GitLab) for managing codebases and documentation.
Requirements
Skills & Experience: • Proven experience in data engineering, preferably in AI/ML or GenAI environments. • Strong technical background in AI technologies, including machine learning, natural language processing, and deep learning. • Knowledge of MLOps tools and practices (e.g., MLflow, DVC). • Excellent communication and presentation skills, with the ability to convey complex technical concepts to non-technical stakeholders. • Strong problem-solving skills and a creative approach to developing innovative solutions. • Strong ability to work collaboratively with cross-functional teams. • Agile methodology experience is preferred. • Bachelor degree in computer science, Data Science, Information Technology, or a related field with a strong emphasis on data engineering, database design, or big data technologies.
About the company
Capgemini is a global leader in partnering with companies to transform and manage their business by harnessing the power of technology. The Group is guided every day by its purpose of unleashing human energy through technology for an inclusive and sustainable future. It is a responsible and diverse organization of 350,000 team members in more than 50 countries. With its strong 55-year heritage and deep industry expertise, Capgemini is trusted by its clients to address the entire breadth of their business needs, from strategy and design to operations, fueled by the fast-evolving and innovative world of cloud, data, AI, connectivity, software, digital engineering and platforms. The Group reported 2022 global revenues of €22 billion.