Data Scientist

Internally the position is known as Senior Data Scientist

About JiBe

JiBe is a cloud-based, fully integrated ERP system for the shipping industry. Our goal is to allow shipping companies to improve productivity, efficiency and safety levels, while reducing costs. JiBe ERP enables increased automation and streamlining of processes, creating pre-defined workflows and reducing the usage of email and paper.

Job Responsibilities:

Agentic AI System Development

Design and develop sophisticated agentic AI systems that orchestrate multi-step reasoning, decision-making, and tool use to solve complex real-world problems
Integrate diverse model types (including large language models accessed via OpenRouter, fine-tuned transformer models, and classical ML models) as tools within agentic workflows
Continuously evaluate and incorporate emerging agentic frameworks, patterns, and best practices into the team's solutions

Optical Character Recognition (OCR)

Develop and improve advanced OCR solutions addressing highly complex and varied document types
Work on document and page classification, determining document types, layouts, and structures as a foundation for downstream processing
Design and implement sophisticated information extraction pipelines that identify, parse, and structure data from unstructured or semi-structured documents with high accuracy and reliability

Retrieval Augmented Generation (RAG)

Contribute to the development of RAG solutions that leverage data extracted through the team's OCR and information extraction pipelines
Collaborate closely with the data engineering team on the design of retrieval pipelines, vector stores, and data preparation workflows that underpin RAG systems

Collaboration & Integration

Work as an integrated member of a strong, established data science team, contributing expertise while aligning with shared architectural and methodological standards
Collaborate closely with data engineers to define data requirements, provide feedback on pipeline outputs, and ensure data consumed from Databricks and MongoDB meets the needs of model development
Participate in code reviews, knowledge sharing, and the continuous elevation of the team's technical standards

Qualifications:

Experience:

5+ years of hands-on experience in applied data science or machine learning engineering, with a strong track record of delivering production-grade solutions
Demonstrable experience developing agentic AI systems, including tool use, multi-agent orchestration, or LLM-driven workflows
Deep expertise in OCR and document understanding, including experience with complex, real-world documents exhibiting high layout and content variability
Strong experience with information extraction techniques, including named entity recognition, structured data extraction, and document parsing

Technical Skills

Deep expertise in Python, including advanced concepts such as async/await concurrency, decorators, context managers, metaprogramming, type hinting, and design patterns. Proven ability to write clean, maintainable, and performant code following best practices
Strong hands-on experience building scalable, production-grade microservices using FastAPI. Proficiency in API design, dependency injection, middleware integration, async endpoint development, OpenAPI/Swagger documentation, and performance optimization.
Solid experience with the ML/AI ecosystem: Hugging Face Transformers, scikit-learn, and PyTorch or TensorFlow.
Familiarity with MLflow, model serving, containerization (Docker), and orchestration (Kubernetes) for ML workloads.
Experience training, fine-tuning, and evaluating transformer-based models as well as classical supervised and unsupervised models.
Solid understanding of prompt engineering strategies including few-shot learning, chain-of-thought reasoning, system prompts, and prompt templating.
Experience optimizing prompts for accuracy, latency, and cost efficiency, including LLM evaluation, and familiarity with best practices for integrating LLMs into production systems.
Hands-on experience with LangChain, LangGraph, and other popular LLM orchestration frameworks (e.g., LlamaIndex, Haystack) for building agentic workflows, RAG pipelines, and complex multi-step LLM applications.
Familiarity with OpenRouter or equivalent LLM gateway/routing platforms is a serious advantage.
Experience with vector databases (e.g., Milvus, Qdrant, Chroma) for semantic search, retrieval-augmented generation (RAG), and efficient similarity search at scale.
Experience with Retrieval Augmented Generation (RAG) — including chunking strategies, embedding models, vector search, and retrieval evaluation — is a serious advantage
Familiarity with MongoDB or any other NoSQL database is an advantage
Experience working with Databricks or similar large-scale data platforms is an advantage

Soft Skills

Strong analytical thinking and the ability to break down complex, ambiguous problems into tractable solutions
A collaborative, team-first mindset — comfortable integrating into an existing high-performing team and contributing without the need for top-down direction
Clear communication skills, with the ability to discuss technical approaches with both technical peers and non-technical stakeholders
A proactive attitude toward learning, with genuine curiosity about the rapidly evolving AI landscape
