About the Role
We are building a new class of clinical-grade AI agents powered by a foundation model that unifies 3D volumetric medical data and textual reasoning. These agents will assist with radiology workflows, treatment planning, case summarization, longitudinal patient analysis, and human-in-the-loop clinical decision support.
In this role, you will design the agent layer that transforms a powerful multimodal foundation model into a safe, reliable, adaptive medical assistant capable of analyzing 3D scans, retrieving past cases, explaining findings, and coordinating multi-step reasoning chains.
This position is ideal for researchers from multimodal/VLM/CLIP ecosystems who want to move up the stack into autonomous medical reasoning, multimodal alignment, and real-world clinical impact.
What You Will Work On
- Architect LLM-driven medical agents capable of:
  - Reading and interpreting 3D medical volumes (MRI, CT, PET).
  - Answering radiology questions, generating report drafts, and performing differential reasoning.
  - Retrieving similar prior cases using contrastive embeddings (see the first sketch after this list).
  - Synthesizing multi-instance evidence across scans, labs, and notes.
  - Executing multi-step clinical workflows: image analysis → summary → recommendation (see the second sketch after this list).
- Integrate clinical agents with the 3D foundation model's representations, enabling structured medical reasoning and contextual memory across longitudinal data.
- Develop evaluation frameworks for trustworthiness, interpretability, calibration, and clinical safety.
- Collaborate with clinicians and scientists to co-design tasks, datasets, and validation protocols.
- Benchmark models against open and proprietary datasets: MIMIC, CheXpert, MosMed, LiTS, BraTS, and India-specific clinical corpora.
- Publish findings at top-tier venues (MICCAI, NeurIPS, AAAI, EMNLP).
- Work with regulatory and product teams to translate research into medically viable systems.
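
To make the case-retrieval bullet above concrete, here is a minimal, illustrative sketch of retrieval over contrastive embeddings. Everything in it is a hypothetical stand-in: `VolumeEncoder` is a toy placeholder for a real contrastively trained 3D encoder, and the case bank is random dummy data, not any component of our actual system.

```python
import torch
import torch.nn.functional as F

class VolumeEncoder(torch.nn.Module):
    """Toy stand-in for a contrastively trained 3D volume encoder."""
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv3d(1, 8, kernel_size=3, stride=2, padding=1),
            torch.nn.ReLU(),
            torch.nn.AdaptiveAvgPool3d(1),
            torch.nn.Flatten(),
            torch.nn.Linear(8, embed_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalize embeddings so dot products equal cosine similarity.
        return F.normalize(self.net(x), dim=-1)

def retrieve_similar_cases(encoder, query_volume, case_embeddings, k=5):
    """Return indices and similarity scores of the k nearest prior cases."""
    with torch.no_grad():
        q = encoder(query_volume.unsqueeze(0))   # (1, D)
    scores = q @ case_embeddings.T               # (1, num_cases) cosine scores
    top = scores.topk(k, dim=-1)
    return top.indices.squeeze(0), top.values.squeeze(0)

# Dummy usage: one single-channel 32^3 query against 100 stored case embeddings.
encoder = VolumeEncoder()
query = torch.randn(1, 32, 32, 32)               # (channels, D, H, W)
case_bank = F.normalize(torch.randn(100, 256), dim=-1)
indices, similarities = retrieve_similar_cases(encoder, query, case_bank)
```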
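Likewise, a minimal sketch of the multi-step workflow bullet (image analysis → summary → recommendation), assuming a simple sequential orchestrator. Every function here is a hypothetical placeholder; a real agent would call the foundation model at each step and keep a human in the loop.

```python
from dataclasses import dataclass, field

@dataclass
class CaseContext:
    """Shared state passed through each workflow step."""
    scan_id: str
    findings: list[str] = field(default_factory=list)
    summary: str = ""
    recommendation: str = ""

def analyze_image(ctx: CaseContext) -> CaseContext:
    # Placeholder: a real step would run the 3D model on the volume.
    ctx.findings = [f"finding extracted from scan {ctx.scan_id}"]
    return ctx

def summarize(ctx: CaseContext) -> CaseContext:
    # Placeholder: a real step would draft a report with the LLM.
    ctx.summary = "; ".join(ctx.findings)
    return ctx

def recommend(ctx: CaseContext) -> CaseContext:
    # Placeholder: recommendations stay human-in-the-loop by design.
    ctx.recommendation = f"Flag for radiologist review: {ctx.summary}"
    return ctx

WORKFLOW = [analyze_image, summarize, recommend]

def run_workflow(scan_id: str) -> CaseContext:
    ctx = CaseContext(scan_id=scan_id)
    for step in WORKFLOW:
        ctx = step(ctx)   # each step reads and enriches the shared context
    return ctx

result = run_workflow("CT-001")
```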
Why This Role Appeals to CLIP / VLM / Chitrarth / Patram Authors
- A rare opportunity to work at the intersection of agentic LLMs, multimodal 3D vision, and clinical intelligence.
- Ability to bring contrastive learning, cross-lingual modeling, multimodal alignment, and document understanding into the medical domain.
- Greenfield space: medical agent design is still largely unexplored compared to 2D VLMs.
- Freedom to publish, open source, and propose new clinical reasoning benchmarks.
- Attractive to researchers who want impact beyond academic settings—directly improving physician workflows and patient outcomes.
- A chance to create India's first medical foundation-model agent system, setting global standards.
What We're Looking For
- Experience developing LLM-based agents, tool-using models, chain-of-thought systems, or orchestrated pipelines.
- Strong background in multimodal learning, vision–language transformers, contrastive models, or clinical NLP.
- Familiarity with medical imaging, radiology workflows, DICOM pipelines, or 3D data representations.
- Expertise in PyTorch/JAX and large-scale training and inference.
- Ability to design safe, interpretable, and auditable reasoning systems.
- Strong publication record in multimodal, medical AI, agentic systems, or LLMs.
- Comfort collaborating with clinicians and biomedical researchers.
Nice to Have
- Experience with agent frameworks (LangChain, LlamaIndex, Haystack, custom pipelines).
- Background in clinical ontologies, SNOMED, RadLex, or UMLS.
- Contributions to medical AI challenges (BraTS, RSNA, MedVQA, MedIC).
- Prior work in multilingual or code-mixed medical text.
- Proven ability to take research concepts into production-grade systems.
What We Offer
- Competitive compensation.
- Access to high-compute clusters + large proprietary datasets.
- Full freedom to publish and define development direction.
- Cross-functional collaboration with clinicians, regulatory experts, and product teams.
- A clear path to building systems that meaningfully improve healthcare outcomes.