Job: Contract Data/ML Engineer — Scoring Reliability & Candidate Archetypes - ASAP
Job Type: Part-time (100-hour project)
Job Presence: Remote (onsite or hybrid in Vietnam optional)
Candidate Location: Vietnam, India
Joining Date: ASAP; we're hiring this role urgently
Summary
Own the end-to-end implementation of two analytics features in Qode’s multi-agent assessment stack: (1) bootstrap confidence intervals (CIs) for per-question scores to communicate stability/disagreement across evaluators, and (2) candidate archetype discovery via clustering to surface talent patterns beyond raw scores. You’ll ship data plumbing, models, integrations, and lightweight reporting.
What you’ll do
- Data foundations: ensure per-candidate, per-question, per-agent criterion scores are structured and queryable; add/modify tables and JSON schemas as needed.
- Bootstrap CIs: implement agent-level resampling, compute CI-90/CI-95, derive stability labels (high/medium/low), and persist alongside normalized scores; batch backfill existing records.
- Archetypes: build standardized candidate feature vectors (per-question and/or per-criterion), run clustering (K-means/GMM/hierarchical), evaluate (e.g., silhouette), and generate human-readable labels from centroids and summaries.
- Integrations: expose CI fields and cluster IDs/labels via API and internal dashboards; add basic charts/UX to surface stability and “candidate type.”
- Reliability & performance: write unit/integration tests, guardrails (min N agents), and ensure pipeline runtime stays within agreed budgets.
- Docs & handoff: clear README/runbooks covering data contracts, thresholds, and ops.
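To make the bootstrap CI task concrete, here is a minimal sketch of agent-level resampling with the percentile method. The function names, `n_boot` default, and the CI-width thresholds behind the stability labels are illustrative assumptions, not Qode's actual implementation; the real thresholds would come from the agreed data contracts.

```python
import numpy as np

def bootstrap_ci(agent_scores, n_boot=2000, level=0.95, seed=0):
    """Resample agent-level scores with replacement and return
    (mean, ci_low, ci_high) via the percentile method."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(agent_scores, dtype=float)
    # Each bootstrap replicate draws len(scores) agents with replacement.
    idx = rng.integers(0, len(scores), size=(n_boot, len(scores)))
    boot_means = scores[idx].mean(axis=1)
    alpha = (1 - level) / 2
    lo, hi = np.quantile(boot_means, [alpha, 1 - alpha])
    return scores.mean(), float(lo), float(hi)

def stability_label(ci_low, ci_high, tight=0.5, loose=1.5):
    """Map CI width to a coarse label; thresholds here are placeholders."""
    width = ci_high - ci_low
    if width <= tight:
        return "high"
    if width <= loose:
        return "medium"
    return "low"
```

In a backfill, this would run once per (candidate, question) row that meets the minimum-agents guardrail, persisting `mean`, `ci_low`, `ci_high`, and `stability_label` alongside the normalized score.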
Must-have skills and qualifications
- 3–5 years of experience in a relevant data/ML engineering role.
- Python (pandas, NumPy, scikit-learn), SQL, DB migrations (e.g., Postgres).
- Statistical resampling (bootstrap), clustering, model selection/validation.
- Data engineering for batch jobs/backfills; API integration.
- Pragmatic product sense for labeling clusters and communicating uncertainty.
Nice-to-haves
- Airflow/dbt/Prefect; Grafana/Metabase; experience with multi-agent/LLM evaluation pipelines; cloud (GCP/AWS/Azure); Docker/Kubernetes.
Deliverables & acceptance criteria
- CI service/module + persisted mean, ci_low, ci_high, stability_label for 100% of scored candidate-question rows with ≥N agents; reproducible backfill completed.
- Clustering job that assigns cluster_id and cluster_label to each candidate; labels documented with centroid profiles and example candidates.
- API fields and minimal dashboard tiles (score±CI, stability badge; “Candidate Type” with top strengths/weaknesses).
- Tests (unit + E2E), monitoring hooks, and runbooks.
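The archetype deliverable could look roughly like the sketch below: standardize candidate feature vectors, fit K-means, evaluate with silhouette, and derive a readable label from each centroid. Function names and the top-strengths labeling heuristic are assumptions for illustration; it relies on scikit-learn, which the role already requires.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

def cluster_candidates(feature_matrix, k=3, seed=0):
    """Standardize per-candidate features, fit K-means, and return
    (cluster ids, silhouette score, centroids in original units)."""
    scaler = StandardScaler()
    X = scaler.fit_transform(feature_matrix)
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    sil = silhouette_score(X, km.labels_)
    # Map centroids back to raw-score units for human-readable profiles.
    centroids = scaler.inverse_transform(km.cluster_centers_)
    return km.labels_, sil, centroids

def label_cluster(centroid, feature_names, top=2):
    """Illustrative labeling heuristic: name the strongest criteria."""
    order = np.argsort(centroid)[::-1][:top]
    return "strong: " + ", ".join(feature_names[i] for i in order)
```

The silhouette score (and k itself) would feed model selection; centroid profiles plus a few example candidates per cluster would back the documented `cluster_label` values.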