Job Description:
Extract text data from a variety of sources (documents, logs, databases, scraped web content) to support development of NLP/LLM solutions.
Develop, test, and maintain robust tools, frameworks, and libraries that standardize and streamline the data & machine learning lifecycle.
Collaborate with cross-functional teams across Data Science, Data Engineering, business units, and IT.
Bachelor’s or Master’s degree in Computer Science, Data Science, Engineering, or a related field, with 8+ years of experience.
5+ years of experience working with Python, SQL, PySpark, and Bash scripts. Proficient in the software development lifecycle and software engineering practices.
2+ years of hands-on experience using the Databricks platform for data engineering and MLOps, including MLflow, Model Registry, Databricks Workflows, Job Clusters, Databricks CLI, and Workspace.
Experience with machine learning frameworks (scikit-learn, XGBoost, Keras, PyTorch) and operationalizing models in production.
Hands-on experience with CI/CD tools (e.g., Jenkins or equivalent), version control (GitHub, Bitbucket), and orchestration (Airflow, Prefect, or equivalent).
Location:
This position can be based in any of the following locations:
Chennai
Current Guardian Colleagues: Please apply through the internal Jobs Hub in Workday