Qureos

PhD Rater - Remote

Seeking experienced researchers and technical experts to support a frontier-model evaluation project focused on agentic workflows. You will design and validate challenging benchmark tasks in data science, machine learning, finance, and coding to help identify reasoning and problem-solving gaps in advanced STEM models. The role involves building real-world tasks with executable tests and analyzing model or agent behavior.

Key Responsibilities

  • Design challenging, real-world STEM problems

  • Implement each task within an agentic development environment using Python

Contract and Payment Terms

  • You will be engaged as an independent contractor.
  • This is a fully remote role that can be completed on your own schedule.
  • Projects can be extended, shortened, or concluded early depending on project needs and your performance.
  • Payments are made weekly via Stripe or Wise, based on services rendered.

© 2026 Qureos. All rights reserved.