Software Engineer (Applied AI)

About Centralize

Enterprise sales runs on relationships, and every tool built to manage them is a database from 2005. Reps lose seven-figure deals because they can't see who actually matters inside an account. We're building the system of intelligence that replaces the CRM.

Centralize is the relationship intelligence platform for enterprise revenue teams. Webflow, Intercom, Brex, Cognition, LangChain, and Cresta use us to close their largest deals. We've grown 5x since last year, and customers are pulling us forward faster than we can ship.

We just raised a Series A led by NEA, bringing our total funding to more than $17M. Our other backers include Salesforce Ventures, Y Combinator, and operators such as Cal Henderson (Co-Founder, Slack), Noah Weiss (former CPO, Slack), and sales leaders from Figma, Box, Dropbox, Anthropic, and Notion.

Centralize was founded by Rachit Kataria, a founding engineer on Facebook Shops who helped scale it to 250M MAUs, and Will Wang, who led the launch of Slack Huddles, the fastest-growing product in Slack's history.

Our Bar

We stay small on purpose. No passengers, no politics, no waiting for permission or for someone else to fix what's broken.

We own the unglamorous work alongside the exciting work. The 9pm customer request, the integration buried three layers deep, the bug nobody wants to touch. You go after it because it needs to get done, and because the next thing you build is better for it.

You won't have the answers handed to you. The roadmap, the architecture, the right call on a customer request: you'll be the one figuring it out. We hire people who are energized by ambiguity, not slowed down by it.

Your work shapes what Centralize becomes.

The Role

We are hiring an applied AI engineer to own the intelligence inside Centralize. The product's value depends on AI systems that map stakeholders, analyze deal health, and turn unstructured customer conversations into actions that drive revenue. You will own those systems end to end across the full AI stack: the multi-agent architectures and LLM pipelines, the classical ML and data science work that powers ranking, scoring, and entity resolution, and the eval and data infrastructure that makes all of it better over time.

This is a production engineering role with both an LLM lens and an ML/DS lens. Some problems at Centralize are best solved with a frontier model and a well-designed agent loop. Others are best solved with a classifier, an embedding model, a custom retriever, or a feature pipeline. You'll know which is which, and you'll build whichever one moves the metric.

This role is well-suited to engineers who have shipped LLM-powered products and trained or fine-tuned models that run in production, who think about evals and reliability before model selection, and who can move fluidly between prompt engineering, fine-tuning, and traditional ML when the problem demands it.

What You Will Do

Design and ship multi-agent systems that handle the hardest reasoning problems in the product: stakeholder mapping, account research, deal health analysis, conversation intelligence.
Own the LLM pipelines end to end: prompt engineering, retrieval, tool use, structured outputs, guardrails, and the orchestration glue that ties it all together.
Build and maintain the ML and DS work that LLMs aren't the right tool for: ranking models, classifiers, embedding models, entity resolution across messy CRM data, signal extraction from sales conversations.
Fine-tune models when frontier APIs aren't enough. Curate training data, design eval sets, run experiments, and ship the results to production.
Build the eval infrastructure that lets us ship AI features without breaking them. LLM-as-judge, human-in-the-loop, classical metrics for ML systems, regression suites. We grade on what works in production.
Own the data flywheel. The product generates rich signal from customer conversations, deal outcomes, and stakeholder interactions. Turn that into training data, eval data, and the feedback loops that compound over time.
Stay on the frontier. New models drop monthly. You'll know which ones move the needle for our use cases, when to switch, and when to wait.
Talk to customers. Sit on calls, see what's actually broken, and translate that into the AI capabilities that matter.

What Success Looks Like

Day 7: First eval suite shipped for an existing AI feature, with measurable accuracy improvement.
Day 14: Owning a major AI surface end to end, including the customer conversations that scoped it.
Day 30: A multi-agent system you architected is in production at customer scale, with the eval and observability infrastructure to keep improving it.

What We Are Looking For

Demonstrated experience shipping LLM-powered products to production with real customers and real evals. We can tell the difference between someone who's built demos and someone who's lived through the operational reality.
Demonstrated experience training, fine-tuning, or shipping classical ML models in production. Ranking, classification, embeddings, retrieval. You know when a 50ms classifier beats a $0.10 LLM call, and you know when it doesn't.
Strong fluency with multi-agent systems, tool use, function calling, RAG, and the orchestration patterns that make them reliable. Frameworks are tools, not religion.
Real expertise in evaluation across both LLM and ML systems. You think about evals before you think about prompts or features, because you've learned the hard way that you can't improve what you can't measure.
Strong backend engineering fundamentals. Most of this work lives in production services, not notebooks. Python is required; familiarity with TypeScript, Postgres, queues, and AWS is a major plus.
Sharp instinct for cost, latency, and reliability tradeoffs across the AI stack. You know when to reach for a frontier model, when to fine-tune a smaller one, and when to write a regex.
Excellent written and verbal English communication. You can write a doc that explains a model's behavior to a non-technical PM, and you can give a customer demo that closes a deal.
Demonstrated ability to operate independently. We give you the goal, not the steps.

This Role Is Not For You If

You want to do AI research. We are an applied team. We use frontier models; we don't build them.
You only want to work on LLMs. Some of the most important work at Centralize is classical ML, ranking, and entity resolution. The right tool for the job, every time.
You think evals are someone else's problem. They are the most important thing you'll own.
You've only built demos or hackathon projects. We are looking for production scars.
You want a slower pace. We work hard and move quickly. Please only apply if that excites you.

Preferred Qualifications

Background as an MLE who has flexed into LLM application work, or as an LLM engineer with deep MLE foundations. The best candidates for this role are fluent in both worlds.
Experience fine-tuning open or closed models for specific tasks, including data curation, training infrastructure, and post-training evaluation.
Experience with multi-agent orchestration frameworks (LangGraph, Mastra, custom orchestrators) at production scale.
Experience with classical ML systems in production: ranking models, embedding models, entity resolution, recommendation systems.
Open-source contributions, technical blog posts, or papers on applied AI or ML work.
Direct exposure to enterprise sales cycles or B2B SaaS products.

The Team You'll Join

You'll work directly with Rachit and Will, alongside former founders and engineers from Coinbase, Gusto, Modern Treasury, and C3 AI.

Compensation and Logistics

Location: This role is open to remote candidates in the US, with a strong preference for candidates based in or willing to relocate to San Francisco or New York City. Most of the team works in person, and remote hires should expect regular travel to one of the two hubs.
Work Authorization: We are unable to sponsor visas. Candidates must have existing US work authorization.
Compensation: $170,000 to $220,000 base salary depending on level, plus 0.40% to 0.70% equity. Final offer calibrated to seniority and experience.