Seeking experienced researchers and technical experts to support a frontier-model evaluation project focused on agentic workflows. You will design and validate challenging benchmark tasks in data science, machine learning, finance, and coding to help identify reasoning and problem-solving gaps in advanced STEM models. The role involves building real-world tasks with executable tests and analyzing model or agent behavior.
Design challenging, real-world STEM problems
Implement each task within an agentic development environment using Python
© 2026 Qureos. All rights reserved.