Caseproof is the company behind MemberPress, the most widely used WordPress membership platform. We're privately held, profitable, and bootstrapped. We're building a new AI-powered product, and we're hiring the engineer who will own the AI substrate that powers it.
The Role
You'll own the AI core of a new product that will be used weekly by thousands of paying customers. Inference pipeline, retrieval, prompt versioning, eval suite, cost discipline, model-improvement loop... all of it.
This is a shipping role, not a research role. We're not training models or publishing papers. We're building a production system that has to be accurate, cheap to run, and steadily better month over month. You'll report to our senior engineering lead and work alongside a small, focused team.
What You'll Do
- Design and run the inference pipeline. Retrieval-augmented generation with structured tool calls, citation-grounded responses, tier-aware model routing.
- Own prompt versioning and the eval suite. Real evals, with adversarial cases and release-blocking gates. Vibes-based evals don't ship here.
- Own cost telemetry and cost discipline. Per-user caps, model routing enforcement, caching, abuse detection. The product has a free tier; you're accountable for keeping it profitable at scale.
- Build the feedback loop that makes the system improve over time.
- Iterate on quality continuously based on real-customer signals.
What We Need
- You have personally shipped at least one production LLM-powered product or feature that real users rely on. You can describe its prompts, evals, and cost telemetry in detail because you built them.
- You've experienced and recovered from at least one of: prompt drift, model version regression, retrieval quality degradation, cost overrun, hallucination incident, eval-suite failure that blocked a release.
- Strong full-stack engineering. Comfortable in Python or TypeScript, comfortable with Postgres and SQL, comfortable owning integrations end-to-end.
- Vector store experience (pgvector, Pinecone, or equivalent).
- Hands-on Anthropic API experience preferred; OpenAI API also fine.
- You think about prompt drift, eval coverage, cost discipline, and feedback loops as engineering surfaces. Not afterthoughts.
What We Don't Need
- Research scientists. We're not training models.
- LangChain wrappers. We own our orchestration in-house.
- Resumes heavy on AI buzzwords and light on shipped systems.
Compensation
Generous compensation & bonus structure. Health, dental, and vision for US employees. Remote-first with US-Mountain Time overlap preferred.
How to Apply
Send the following:
- A short cover note (under 300 words) describing the most interesting LLM-powered system you've shipped, what was hard about it, and what you'd do differently next time.
- A link to a portfolio piece, GitHub repo, or write-up that shows your work.
- Your expected compensation range and your earliest start date.
Process
We move fast — about 2–3 weeks end to end.
- 30-minute screen with the CEO.
- 60–90 minute technical conversation with the engineering lead. Walk us through one of your shipped LLM systems in depth.
- Paid week-long trial project on a scoped problem.
- Final conversation. Offer typically within 48 hours.