Internally the position is known as Senior Data Scientist
About JiBe
JiBe is a cloud-based, fully integrated ERP system for the shipping industry. Our goal is to allow shipping companies to improve productivity, efficiency, and safety levels while reducing costs. JiBe ERP enables increased automation and streamlining of processes, creating predefined workflows and reducing the use of email and paper.
Job Responsibilities:
Agentic AI System Development
- Design and develop sophisticated agentic AI systems that orchestrate multi-step reasoning, decision-making, and tool use to solve complex real-world problems
- Integrate diverse model types (including large language models accessed via OpenRouter, fine-tuned transformer models, and classical ML models) as tools within agentic workflows
- Continuously evaluate and incorporate emerging agentic frameworks, patterns, and best practices into the team's solutions
Optical Character Recognition (OCR)
- Develop and improve advanced OCR solutions addressing highly complex and varied document types
- Work on document and page classification, determining document types, layouts, and structures as a foundation for downstream processing
- Design and implement sophisticated information extraction pipelines that identify, parse, and structure data from unstructured or semi-structured documents with high accuracy and reliability
Retrieval Augmented Generation (RAG)
- Contribute to the development of RAG solutions that leverage data extracted through the team's OCR and information extraction pipelines
- Collaborate closely with the data engineering team on the design of retrieval pipelines, vector stores, and data preparation workflows that underpin RAG systems
Collaboration & Integration
- Work as an integrated member of a strong, established data science team, contributing expertise while aligning with shared architectural and methodological standards
- Collaborate closely with data engineers to define data requirements, provide feedback on pipeline outputs, and ensure data consumed from Databricks and MongoDB meets the needs of model development
- Participate in code reviews, knowledge sharing, and the continuous elevation of the team's technical standards
Qualifications:
Experience:
- 5+ years of hands-on experience in applied data science or machine learning engineering, with a strong track record of delivering production-grade solutions
- Demonstrable experience developing agentic AI systems, including tool use, multi-agent orchestration, or LLM-driven workflows
- Deep expertise in OCR and document understanding, including experience with complex, real-world documents exhibiting high layout and content variability
- Strong experience with information extraction techniques, including named entity recognition, structured data extraction, and document parsing
Technical Skills
- Deep expertise in Python, including advanced concepts such as async/await concurrency, decorators, context managers, metaprogramming, type hinting, and design patterns. Proven ability to write clean, maintainable, and performant code following best practices.
- Strong hands-on experience building scalable, production-grade microservices using FastAPI. Proficiency in API design, dependency injection, middleware integration, async endpoint development, OpenAPI/Swagger documentation, and performance optimization.
- Solid experience with the ML/AI ecosystem: Hugging Face Transformers, scikit-learn, and PyTorch or TensorFlow.
- Familiarity with MLflow, model serving, containerization (Docker), and orchestration (Kubernetes) for ML workloads.
- Experience training, fine-tuning, and evaluating transformer-based models as well as classical supervised and unsupervised models.
- Solid understanding of prompt engineering strategies, including few-shot learning, chain-of-thought reasoning, system prompts, and prompt templating.
- Experience optimizing prompts for accuracy, latency, and cost efficiency, including LLM evaluation, and knowledge of best practices for integrating LLMs into production systems.
- Hands-on experience with LangChain, LangGraph, and other popular LLM orchestration frameworks (e.g., LlamaIndex, Haystack) for building agentic workflows, RAG pipelines, and complex multi-step LLM applications.
- Familiarity with OpenRouter or equivalent LLM gateway/routing platforms is a strong advantage.
- Experience with vector databases (e.g., Milvus, Qdrant, Chroma) for semantic search, retrieval-augmented generation (RAG), and efficient similarity search at scale.
- Experience with Retrieval Augmented Generation (RAG), including chunking strategies, embedding models, vector search, and retrieval evaluation, is a strong advantage.
- Familiarity with MongoDB or any other NoSQL database is an advantage.
- Experience working with Databricks or similar large-scale data platforms is an advantage.
Soft Skills
- Strong analytical thinking and the ability to break down complex, ambiguous problems into tractable solutions
- A collaborative, team-first mindset: comfortable integrating into an existing high-performing team and contributing without the need for top-down direction
- Clear communication skills, with the ability to discuss technical approaches with both technical peers and non-technical stakeholders
- A proactive attitude toward learning, with genuine curiosity about the rapidly evolving AI landscape