Seeking experienced researchers and technical experts to support a frontier-model evaluation project focused on agentic workflows. You will design and validate challenging benchmark tasks in data science, machine learning, finance, and coding to help identify reasoning and problem-solving gaps in advanced STEM models. The role involves building real-world tasks with executable tests and analyzing model or agent behavior.
Design challenging, real-world STEM problems
Implement each task within an agentic development environment using Python
© 2026 Qureos. All rights reserved.