Introduction
Zapyan is a fast-growing startup, founded in December 2021, dedicated to transforming digital operations through data-driven automation and cloud technologies.
We are embarking on a flagship initiative to build a Next-Generation Health Claims Fraud Audit Platform. This mission-critical system will leverage advanced Machine Learning for anomaly detection, Computer Vision for digitizing paper claims, and Generative AI for medical necessity reviews.
We are seeking a rigorous, hands-on Data Scientist to join our "Dream Team" of engineers and domain experts. In this role, you will be the "Brain" of the operation, designing the mathematical models and AI logic that detect fraudulent patterns in financial and healthcare data.
Location: Islamabad (Hybrid – 2 days in office)
Contract: 6 Months (with strong potential to extend based on project success)
Experience: Minimum 2 years of post-graduation experience
The Project: Health Claims Fraud Audit Platform
You will be working on a greenfield project to build an intelligent audit engine from the ground up. The goal is to move beyond simple rule-based checks and implement:
- Anomaly Detection: Statistical models to flag outlier costs and billing patterns (see the sketch after this list).
- Unstructured Data Analysis: Using RAG (Retrieval-Augmented Generation) and OCR to extract insights from medical notes and PDFs.
- Predictive Risk Scoring: Assigning risk scores to claims in real-time.
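For a flavor of what the anomaly-detection piece involves, here is a minimal, illustrative sketch using scikit-learn's IsolationForest on a few hypothetical claim features. The column names, values, and contamination rate are assumptions chosen for demonstration, not the platform's actual schema or configuration.

```python
# Illustrative anomaly-detection sketch (not the project's real schema).
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical claim-level features.
claims = pd.DataFrame({
    "billed_amount": [120.0, 95.5, 110.0, 4800.0, 130.0, 102.5],
    "claims_per_provider_30d": [12, 9, 14, 210, 11, 10],
    "patient_age": [34, 51, 42, 29, 63, 47],
})

features = ["billed_amount", "claims_per_provider_30d", "patient_age"]

# Unsupervised outlier model: flags claims whose feature combination
# looks unlike the bulk of the data.
model = IsolationForest(contamination=0.2, random_state=42)
claims["anomaly_flag"] = model.fit_predict(claims[features])    # -1 = outlier
claims["anomaly_score"] = -model.score_samples(claims[features])

print(claims.sort_values("anomaly_score", ascending=False))
```

In practice the model would be trained on far richer, engineered features and validated against investigator-confirmed outcomes rather than toy values like these.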
Key Responsibilities
Model Development & Fraud Logic
- Design and train Machine Learning models (Supervised and Unsupervised) to detect anomalies in healthcare claims data.
- Develop logic for "Medical Necessity" audits using NLP techniques to analyze unstructured text against policy documents.
- Perform rigorous statistical analysis to validate model accuracy and reduce false positives (an evaluation sketch follows this list).
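As a rough illustration of the validation work, the sketch below trains a stand-in classifier on synthetic data and shows how raising the decision threshold trades recall for precision, i.e. fewer false positives. The synthetic data, model choice, and 0.7 threshold are all assumptions for demonstration.

```python
# Sketch of validating a fraud classifier with an eye on false positives.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, confusion_matrix

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                   # stand-in claim features
y = (X[:, 0] + 0.5 * X[:, 1] > 1.5).astype(int)  # stand-in "fraud" label

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)

# Raising the decision threshold reduces false positives at the
# cost of missing some true fraud.
probs = clf.predict_proba(X_test)[:, 1]
preds = (probs >= 0.7).astype(int)

tn, fp, fn, tp = confusion_matrix(y_test, preds).ravel()
print("precision:", precision_score(y_test, preds, zero_division=0))
print("recall:   ", recall_score(y_test, preds))
print("false positives:", fp)
```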
Data Pipeline & Engineering Collaboration
- Work closely with the Data Engineering team to ensure data is clean, structured, and ready for modeling.
- Collaborate on feature engineering to transform raw claims data into meaningful predictors of fraud (a pandas sketch follows this list).
- Assist in deploying models into a production environment (specifically Microsoft Fabric).
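To make the feature-engineering collaboration concrete, here is a small pandas sketch that rolls raw claim rows up into per-provider predictors. The column names and the ratio feature are hypothetical examples, not the project's real feature set.

```python
# Feature-engineering sketch: raw claim rows -> per-provider predictors.
import pandas as pd

# Hypothetical raw claims extract.
raw = pd.DataFrame({
    "provider_id": ["P1", "P1", "P2", "P2", "P2", "P3"],
    "billed_amount": [100.0, 150.0, 90.0, 5000.0, 120.0, 80.0],
    "procedure_code": ["A1", "A1", "B2", "B2", "B2", "A1"],
})

features = (
    raw.groupby("provider_id")
       .agg(claim_count=("billed_amount", "size"),
            mean_billed=("billed_amount", "mean"),
            max_billed=("billed_amount", "max"),
            distinct_procedures=("procedure_code", "nunique"))
       .reset_index()
)

# Ratio features often make better fraud predictors than raw totals.
features["max_to_mean_ratio"] = features["max_billed"] / features["mean_billed"]
print(features)
```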
Insights & Communication
- Translate complex model outputs into clear risk scores that can be visualized by our BI team.
- Document model architecture, assumptions, and performance metrics for technical and non-technical stakeholders.
- Continuously monitor model performance and retrain as new fraud patterns emerge (see the drift-check sketch below).
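One common way to monitor for emerging fraud patterns is to track drift in the model's score distribution, for example with a Population Stability Index (PSI). The sketch below uses conventional assumptions: 10 bins and a 0.2 alert threshold.

```python
# Drift-monitoring sketch: PSI between training-time scores and a recent window.
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between two score samples."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Avoid log(0) / division by zero for empty bins.
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(1)
train_scores = rng.beta(2, 5, size=5000)     # stand-in training-time scores
recent_scores = rng.beta(2.6, 5, size=5000)  # stand-in recent scores (shifted)

value = psi(train_scores, recent_scores)
print(f"PSI = {value:.3f} ->",
      "investigate / retrain" if value > 0.2 else "stable")
```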
Skills & Experience
Must-Have (Required)
- Education: Bachelor’s degree in Data Science, Computer Science, Statistics, Mathematics, or a related field.
- Experience: Minimum 2 years of post-graduation experience working in a professional organization.
- Cloud Proficiency: Hands-on experience with at least one major cloud platform (AWS, GCP, or Azure). You must be comfortable working in cloud-based environments rather than just local machines.
- Machine Learning Mastery:
  - Deep expertise in Python ML libraries: Scikit-learn (for classical ML), plus familiarity with TensorFlow or PyTorch (for deep learning/NLP).
  - Proven experience developing Classification and Anomaly Detection models.
- Development Workflow: Proficiency with Jupyter Notebooks (or Databricks/Fabric Notebooks) for experimentation and data exploration.
- NLP & Unstructured Data: Experience using LLM frameworks (e.g., LangChain, Hugging Face) to process text data.
- SQL: Strong ability to write complex queries (joins, window functions) to independently extract training data.
Beneficial (Nice-to-Have)
- Microsoft Stack: Specific experience with Microsoft Fabric, Azure Machine Learning, or Synapse.
- Data Engineering: Familiarity with ETL pipelines and data warehousing concepts.
- Visualization: Ability to use Power BI to visualize model results.
- Domain: Experience in Healthcare, Banking, Fintech, or Fraud Detection.
Why Join Zapyan?
- Impact: You aren't just running queries; you are building the core IP of a product that solves a massive financial problem in the healthcare sector.
- Tech Stack: Get hands-on experience with the latest enterprise tech stack, including Microsoft Fabric, Vector Databases, and LLM Agents.
- Collaboration: Work alongside a highly specialized team of Cloud Architects, BI Leads, and Domain Experts.
- Flexibility: Hybrid work model (2 days in office, 3 days remote) to support work-life balance.
- Growth: $200 annual professional development budget and exposure to cutting-edge AI practices.
- Leave: 25 days of paid leave (pro-rated for the contract duration).
Final Notes
This role is ideal for a Data Scientist who wants to move beyond "theoretical" modeling and build a system that goes into production. If you have a strong statistical foundation and are curious about how Data Science integrates with modern Cloud Engineering (Azure/Fabric), we want to hear from you.
Job Type: Full-time
Application Question(s):
- What year did you graduate, and how many years of professional experience have you gained since then?
- What specific part of the job description attracted you to apply for this role?
Education: Bachelor's (Required)
Work Location: Hybrid remote in Islamabad G-8/Markaz