DS (Vector Search + GCP) - Bangalore
Bangalore
JOB DESCRIPTION
Data/Applied Scientist (Search)
- Strong Python skills, with experience using Jupyter notebooks and Python packages such as
polars, pandas, NumPy, scikit-learn, Matplotlib, etc.
- Must have: Experience with the machine learning lifecycle, including data
preparation, training, evaluation, and deployment
- Must have: Hands-on experience with GCP services for ML & data science
- Must have: Experience with vector search, hybrid search techniques, and query preprocessing
- Must have: Experience with embedding generation using models such as BERT, Sentence
Transformers, or custom models
- Must have: Experience with embedding indexing and retrieval (e.g.,
Elasticsearch, FAISS, ScaNN, Annoy)
- Must have: Experience with LLMs and use cases such as RAG (Retrieval-Augmented Generation)
- Must have: Understanding of semantic vs lexical search paradigms
- Must have: Experience with Learning to Rank (LTR) techniques and libraries (e.g., XGBoost,
LightGBM with LTR support)
- Should be proficient in SQL and BigQuery for analytics and feature generation
- Should have experience with Dataproc clusters for distributed data processing using Apache
Spark or PySpark
- Should have experience deploying models and services using Vertex AI, Cloud Run, or Cloud
Functions
- Should be comfortable working with BM25 ranking (via Elasticsearch or OpenSearch) and
blending with vector-based approaches
- Good to have: Familiarity with Vertex AI Matching Engine for scalable vector retrieval
- Good to have: Familiarity with TensorFlow Hub, Hugging Face, or other model repositories
- Good to have: Experience with prompt engineering, context windowing, and embedding
optimization for LLM-based systems
- Should understand how to build end-to-end ML pipelines for search and ranking applications
- Must have: Awareness of evaluation metrics for search relevance
(e.g., precision@k, recall, nDCG, MRR)
- Should have exposure to CI/CD pipelines and model versioning practices
GCP Tools Experience:
ML & AI: Vertex AI, Vertex AI Matching Engine, AutoML, AI Platform
Storage: BigQuery, Cloud Storage, Firestore
Ingestion: Pub/Sub, Cloud Functions, Cloud Run
Search: Vector Databases (e.g., Matching Engine, Qdrant on GKE), Elasticsearch/OpenSearch
Compute: Cloud Run, Cloud Functions, Vertex Pipelines, Cloud Dataproc (Spark/PySpark)
CI/CD & IaC: GitLab CI/CD, GitHub Actions
Job Type: Full-time
Pay: Up to ₹1,700,000.00 per year
Work Location: In person