Citrin Cooperman offers a dynamic work environment, fostering professional growth and collaboration. We’re continuously seeking talented individuals who bring a problem-solving mindset, fresh perspectives, and sharp technical expertise. We know you have choices, so our team of collaborative, innovative professionals is ready to support your professional development. At Citrin Cooperman, we offer competitive compensation and benefits and, most importantly, the flexibility to manage your personal and professional life so you can focus on what matters most to you!
We are seeking a Senior – Data Engineer, Development, to join our Development team within the Information Technology department. We’re establishing a pioneering AI Solutions team responsible for taking successful AI and agentic pilots and industrializing them for enterprise-scale production. This is a high-impact engineering position focused on bridging the gap between rapid AI innovation and rigorous enterprise operations. You’ll be the critical link ensuring our AI initiatives have the data foundations required to operate securely, safely, and accurately at scale.
You’ll pivot between two distinct modes: rapidly generating synthetic, sample, or anonymized data products to support pilot workstreams, and designing hardened, production-grade deployment architectures within our Microsoft Fabric ecosystem. With our AI landscape spanning multiple frontier models (Anthropic, Google, OpenAI) and custom agentic platforms like LangGraph, you’ll help set firm-wide standards for how these applications consume, process, and govern data. The ideal candidate is a builder at heart who thrives on enterprise complexity, stays current with AI engineering trends, and has the technical authority to transform prototypes into resilient, highly available data systems.
Responsibilities include, but are not limited to:
- AI Data Architecture & Fabric Integration: Design and build production-grade data pipelines within Microsoft Fabric (OneLake) to feed diverse AI models and agentic workflows, ensuring low-latency and high-reliability data retrieval.
- Synthetic & Sample Data Provisioning: Rapidly engineer synthetic, anonymized, or sample datasets to unblock innovation teams and support secure pilot development without exposing sensitive enterprise data.
- Multi-Model Data Strategy: Establish data formatting, chunking, and embedding standards that are interoperable across multiple LLM providers (Anthropic, OpenAI, Google) and vector stores.
- Evaluation & Telemetry Data Management: Design the data architecture to support continuous LLM evaluation tools. This includes managing ground-truth datasets, capturing prompt/response telemetry, logging agentic reasoning traces (e.g., LangGraph state transitions), and storing evaluation metrics for drift and accuracy monitoring.
- Enterprise Data Security & Governance: Implement row-level security, PII masking, and data access guardrails within the data layer before information ever reaches external AI APIs or internal sandboxes.
- Transition to Operations: Create comprehensive documentation, data quality checks, and operational playbooks to successfully transition mature AI data pipelines to the Data Operations team.
The ideal candidate must:
- Have a bachelor’s degree in computer science, data engineering, mathematics, or equivalent practical experience.
- Be Microsoft Certified: Fabric Data Engineer Associate (DP-700).
- Be Microsoft Certified: Fabric Analytics Engineer Associate (DP-600).
- Be Microsoft Certified: Power BI Data Analyst Associate (PL-300).
- Have 5+ years of advanced data engineering experience in enterprise environments.
- Have deep expertise in modern data platforms and cloud ecosystems (e.g., Microsoft Fabric, AWS, Snowflake) and strong proficiency in Python and SQL.
- Have experience with the specific data requirements of Generative AI, including Vector Databases, RAG (Retrieval-Augmented Generation) architectures, and data chunking strategies.
- Be familiar with building data backends for multi-agent or complex stateful AI frameworks (such as LangChain or LangGraph).
- Have a demonstrated ability to design data models that support complex logging, telemetry, and evaluation for non-deterministic systems.
- Have proven experience taking data products from initial proof-of-concept to highly available production environments.
- Have a Pioneering Mindset: Be eager to define new standards in an emerging technical domain rather than just following existing playbooks.
- Be Adaptable & Agile: Be comfortable context-switching between rapid prototyping support and rigorous, methodical production engineering.
- Be Quality-Obsessed: View pilot pre-launch review and deployment readiness as critical architectural exercises, and refuse to compromise on enterprise data integrity and security.