Seeking experienced researchers and technical experts to support a frontier-model evaluation project focused on agentic workflows. You will design and validate challenging benchmark tasks in data science, machine learning, finance, and coding to help identify reasoning and problem-solving gaps in advanced STEM models. The role involves building real-world tasks with executable tests and analyzing model or agent behavior.
Design challenging, real-world STEM problems
Implement each task within an agentic development environment using Python