Job: Contract Data/ML Engineer — Scoring Reliability & Candidate Archetypes - ASAP
Job Type: Part-time (100-hour project)
Job Presence: Remote (onsite or hybrid in Vietnam optional)
Candidate Location: Vietnam, India
Joining Date: ASAP; we're hiring this role urgently
Summary
Own the end-to-end implementation of two analytics features in Qode’s multi-agent assessment stack: (1) bootstrap confidence intervals (CIs) for per-question scores to communicate stability/disagreement across evaluators, and (2) candidate archetype discovery via clustering to surface talent patterns beyond raw scores. You’ll ship data plumbing, models, integrations, and lightweight reporting.
What you’ll do
- Data foundations: ensure per-candidate, per-question, per-agent criterion scores are structured and queryable; add/modify tables and JSON schemas as needed.
- Bootstrap CIs: implement agent-level resampling, compute CI-90/CI-95, derive stability labels (high/medium/low), and persist alongside normalized scores; batch backfill existing records.
- Archetypes: build standardized candidate feature vectors (per-question and/or per-criterion), run clustering (K-means/GMM/hierarchical), evaluate (e.g., silhouette), and generate human-readable labels from centroids and summaries.
- Integrations: expose CI fields and cluster IDs/labels via API and internal dashboards; add basic charts/UX to surface stability and “candidate type.”
- Reliability & performance: write unit/integration tests, guardrails (min N agents), and ensure pipeline runtime stays within agreed budgets.
- Docs & handoff: clear README/runbooks covering data contracts, thresholds, and ops.
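To make the bootstrap CI task concrete, here is a minimal sketch of agent-level resampling with the percentile method. The function names, `n_boot` default, and the CI-width thresholds behind the stability labels are illustrative assumptions, not Qode's actual implementation; the real thresholds would come from the agreed data contracts.

```python
import numpy as np

def bootstrap_ci(agent_scores, n_boot=2000, level=0.95, seed=0):
    """Resample agent-level scores with replacement and return
    (mean, ci_low, ci_high) via the percentile method."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(agent_scores, dtype=float)
    # Each bootstrap replicate draws len(scores) agents with replacement.
    idx = rng.integers(0, len(scores), size=(n_boot, len(scores)))
    boot_means = scores[idx].mean(axis=1)
    alpha = (1 - level) / 2
    lo, hi = np.quantile(boot_means, [alpha, 1 - alpha])
    return scores.mean(), float(lo), float(hi)

def stability_label(ci_low, ci_high, tight=0.5, loose=1.5):
    """Map CI width to a coarse label; thresholds here are placeholders."""
    width = ci_high - ci_low
    if width <= tight:
        return "high"
    if width <= loose:
        return "medium"
    return "low"
```

In a backfill, this would run once per (candidate, question) row that meets the minimum-agents guardrail, persisting `mean`, `ci_low`, `ci_high`, and `stability_label` alongside the normalized score.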
Must-have skills and qualifications
- 3–5 years of experience in a relevant data/ML engineering role.
- Python (pandas, NumPy, scikit-learn), SQL, DB migrations (e.g., Postgres).
- Statistical resampling (bootstrap), clustering, model selection/validation.
- Data engineering for batch jobs/backfills; API integration.
- Pragmatic product sense for labeling clusters and communicating uncertainty.
Nice-to-haves
- Airflow/dbt/Prefect; Grafana/Metabase; experience with multi-agent/LLM evaluation pipelines; cloud (GCP/AWS/Azure); Docker/Kubernetes.
Deliverables & acceptance criteria
- CI service/module + persisted mean, ci_low, ci_high, stability_label for 100% of scored candidate-question rows with ≥N agents; reproducible backfill completed.
- Clustering job that assigns cluster_id and cluster_label to each candidate; labels documented with centroid profiles and example candidates.
- API fields and minimal dashboard tiles (score±CI, stability badge; “Candidate Type” with top strengths/weaknesses).
- Tests (unit + E2E), monitoring hooks, and runbooks.
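The archetype deliverable could look roughly like the sketch below: standardize candidate feature vectors, fit K-means, evaluate with silhouette, and derive a readable label from each centroid. Function names and the top-strengths labeling heuristic are assumptions for illustration; it relies on scikit-learn, which the role already requires.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

def cluster_candidates(feature_matrix, k=3, seed=0):
    """Standardize per-candidate features, fit K-means, and return
    (cluster ids, silhouette score, centroids in original units)."""
    scaler = StandardScaler()
    X = scaler.fit_transform(feature_matrix)
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    sil = silhouette_score(X, km.labels_)
    # Map centroids back to raw-score units for human-readable profiles.
    centroids = scaler.inverse_transform(km.cluster_centers_)
    return km.labels_, sil, centroids

def label_cluster(centroid, feature_names, top=2):
    """Illustrative labeling heuristic: name the strongest criteria."""
    order = np.argsort(centroid)[::-1][:top]
    return "strong: " + ", ".join(feature_names[i] for i in order)
```

The silhouette score (and k itself) would feed model selection; centroid profiles plus a few example candidates per cluster would back the documented `cluster_label` values.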