This role is for one of Weekday's clients.
Salary range: Rs 24,00,000 - Rs 35,00,000 (i.e., INR 24-35 LPA)
Min Experience: 6 years
Location: Bangalore
Job Type: Full-time
We are seeking a skilled Speech Data Scientist to design, develop, and optimize advanced speech analytics and automatic speech recognition (ASR) solutions. The ideal candidate will work on end-to-end speech pipelines, multilingual audio processing, and model deployment in production environments. You will also drive research and innovation in speech processing, contributing to model enhancement and high-impact technical solutions.
Requirements
Key Responsibilities
Core Development & Implementation
- Design and implement end-to-end speech analytics pipelines for production.
- Develop ASR engines using frameworks such as Wav2vec, Whisper, and Deep Speech with PyTorch or TensorFlow.
- Build and optimize speaker diarization, language identification (LID), and text post-processing systems.
- Focus on multilingual audio processing and domain adaptation strategies.
- Lead data selection and preprocessing for improved model performance.
Model Development & Enhancement
- Develop and analyze objective measures for speech quality evaluation and enhancement.
- Implement speaker-conditioned personalization techniques to improve ASR accuracy in noisy environments.
- Optimize on-device ASR models, emphasizing multi-language scenarios.
- Guide teams on best practices for model accuracy and performance optimization.
Research & Innovation
- Conduct research on advanced speech processing and neural speech enhancement techniques.
- Develop novel solutions for multi-speaker and complex audio scenarios.
- Contribute to patents, publications, and technical thought leadership in speech technology.
- Stay updated on transformer models, attention mechanisms, and foundation models.
Technical Integration & Deployment
- Design integration architectures for speech-to-text services and related technologies.
- Implement MLOps processes and CI/CD pipelines for speech model deployment.
- Deploy and scale speech solutions on cloud platforms (AWS, GCP).
- Develop production-ready applications using Python, C++, and Java.
Required Qualifications
Education
- Ph.D./M.S./M.Tech in Computer Science, Signal Processing, or a related field preferred.
- B.Tech/B.E in ECE, CSE, or a related technical field required.
Technical Expertise
- Speech Processing: 3-6 years of hands-on experience in ASR and speech analytics. Strong knowledge of HMMs, GMMs, ANNs, language modeling, CNNs, RNNs, LSTMs, CTC, and attention mechanisms.
- Machine Learning / Deep Learning: Proficiency in PyTorch and TensorFlow; experience with transformer models (BERT, Wav2vec 2.0, Whisper) and end-to-end ASR implementation.
- Programming & Tools: Strong Python skills (NumPy, pandas, scikit-learn), experience with C++/Java for production, Bash scripting, and Git.
- Cloud & Deployment: Hands-on experience with AWS/GCP, containerization (Docker, Kubernetes), MLOps, CI/CD pipelines, and scalable model serving.
Skills
ASR, Speech Recognition, Speech Analytics, Multilingual Audio Processing, Python, PyTorch, TensorFlow, Deep Learning