Find The Right Job.
Seeking experienced researchers and technical experts to support a frontier-model evaluation project focused on agentic workflows. You will design and validate challenging benchmark tasks in data science, machine learning, finance, and coding to help identify reasoning and problem-solving gaps in advanced STEM models. The role involves building real-world tasks with executable tests and analyzing model or agent behavior.
Design challenging, real-world STEM problems
Implement each task within an agentic development environment using Python
© 2026 Qureos. All rights reserved.