Seeking experienced researchers and technical experts to support a frontier-model evaluation project focused on agentic workflows. You will design and validate challenging benchmark tasks in data science, machine learning, finance, and coding to help identify reasoning and problem-solving gaps in advanced STEM models. The role involves building real-world tasks with executable tests and analyzing model or agent behavior.
Design challenging, real-world STEM problems
Implement each task within an agentic development environment using Python
© 2026 Qureos. All rights reserved.