We are seeking Senior Data Engineers to play a pivotal role in the development of our Large Driving Model (LDM). In this role, you will take ownership of the data pipelines and curation strategies that fuel our models. You will be responsible for building miners that surface high-value examples, refining evaluation benchmarks, and collaborating directly with modeling teams to accelerate performance.
If you are a strong individual contributor with a deep background in Python, a passion for autonomous systems, and the ability to navigate complex data challenges independently, we want to hear from you.
- Develop and maintain scalable data mining pipelines to extract rare and critical driving scenarios from massive fleet datasets.
- Lead the curation and analysis of large-scale training sets to improve the robustness and generalization of the LDM.
- Design and implement rigorous evaluation frameworks to measure model performance against real-world baselines.
- Provide technical guidance and code reviews for junior and mid-level engineers, ensuring high standards for code quality and system design.
- Partner closely with modeling engineers to define data requirements and iterate on rapid experimental loops.
- Identify trends, anomalies, and data distribution shifts that impact model training and validation.
- Contribute to the design of data infrastructure, making informed decisions on trade-offs between data quality, scale, and compute efficiency.
- Transition experimental data strategies into production-ready pipelines with a focus on Python-based automation.

- B.S. or M.S. in Computer Science, Data Science, Robotics, Engineering, or a related field.
- 5+ years of experience as a Data Engineer, Machine Learning Engineer, or Motion Planning Engineer, specifically within the autonomous driving, robotics, or computer vision space.
- Advanced proficiency in Python and SQL is required, with a proven track record of building production-grade data products.
- Hands-on experience with distributed data processing frameworks (e.g., Spark, Ray) and cloud infrastructure.
- Solid understanding of data-centric AI, dataset curation strategies, and model evaluation methodologies.
- Experience handling large-scale sensor data or complex robotics telemetry.
- Demonstrated ability to work autonomously, taking projects from a vague concept to a verified, high-performing product.
- Strong ability to communicate technical trade-offs and collaborate effectively within a fast-paced team environment.