About The Job
Mercor connects elite creative and technical talent with leading AI research labs. Headquartered in San Francisco, our investors include Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey.
Position: Language Model Evaluator
Type: Full-time or Part-time Contract Work
Compensation: $23/hour
Location: Geography restricted to Egypt, Saudi Arabia, UAE, USA
Role Responsibilities
- Evaluate LLM-generated responses on their ability to effectively answer user queries.
- Conduct fact-checking using trusted public sources and external tools.
- Generate high-quality human evaluation data by annotating response strengths, areas for improvement, and factual inaccuracies.
- Assess reasoning quality, clarity, tone, and completeness of responses.
- Ensure model responses align with expected conversational behavior and system guidelines.
- Apply consistent annotations by following clear taxonomies, benchmarks, and detailed evaluation guidelines.
Qualifications
Must-Have
- Bachelor’s degree
- Native speaker or ILR 5/primary fluency (C2 on the CEFR scale) in Arabic
- Significant experience using large language models (LLMs)
- Excellent writing skills
- Strong attention to detail
- Adaptable and comfortable moving across topics, domains, and customer requirements
- Background or experience in domains requiring structured analytical thinking
- Excellent college-level mathematics skills
Preferred
- Prior experience with RLHF, model evaluation, or data annotation work
- Experience writing or editing high-quality written content
- Experience comparing multiple outputs and making fine-grained qualitative judgments
- Familiarity with evaluation rubrics, benchmarks, or quality scoring systems
Application Process (Takes 20–30 mins to complete)
- Upload resume
- AI interview based on your resume
- Submit form
Resources & Support
PS: Our team reviews applications daily. Please complete your AI interview and application steps to be considered for this opportunity.