This position supports the REFRAME project, a multi-year, externally funded research initiative led by NC State University. The project aims to deliver a modular, AI-enabled, open-source platform for feedstock-agnostic evaluation of future biomass. It will integrate legacy open-source models, novel surrogates, shared metadata schemas, and a large language model (LLM) to enhance accessibility for varied stakeholders.
Leveraging well-characterized agricultural and food processing residues, REFRAME will generate insights into historically underfunded circular biomass streams and provide deployment-ready tools for end-to-end scenario analysis, counterfactual modeling, and decision support across the value chain. The Platform Architect will play a central role in designing and operationalizing the technical backbone of this platform in close collaboration with faculty, research staff, and external partners.
The Platform Architect is a senior technical leadership role that will direct software and infrastructure development for the REFRAME project's digital backbone: a solution-specific Machine Learning Operations (MLOps) platform that serves as the central nervous system for developing, deploying, monitoring, and governing all machine learning models related to the project. The Platform Architect will define the platform's strategy, design, and implementation, with responsibility for strategic roadmapping of infrastructure, system design, tech-stack selection, DevOps, security and compliance, workflow integration, and process optimization, ensuring the platform’s scalability, reliability, and efficiency.
The Platform Architect will also work closely with the project manager and software development personnel, bridging gaps between project and development personnel and between data science experimentation and production engineering rigor, so that ML models can transition efficiently from research workflows to scalable, secure, and reliable production environments.
Platform Vision and Architecture (30%)
- Strategic Roadmapping: Define the technical vision and multi-year roadmap for the end-to-end ML platform, aligning it with project and research objectives and data science needs (including GenAI/LLM capabilities).
- System Design: Architect, design, and document a robust, scalable, and cost-efficient platform covering the entire ML lifecycle: Data Ingestion, Feature Store, Model Training, Model Registry, Model Serving/Inference, and Monitoring.
- Technology Selection: Evaluate, select, and integrate appropriate cloud-native services (AWS, Azure, or GCP), hybrid and on-prem computing resources, and open-source MLOps tools (e.g., Kubeflow, MLflow, Airflow) to build a cohesive ecosystem.
MLOps and Delivery Excellence (20%)
- Automation: Lead the implementation of MLOps pipelines using CI/CD practices to automate model training, testing, validation, deployment, and retraining workflows.
- Infrastructure as Code (IaC): Design and enforce IaC standards (Terraform/CloudFormation) for provisioning and managing all underlying compute, networking, and storage resources (e.g., Kubernetes clusters, GPU instances).
- Feature Engineering: Define shared data and feature management patterns to ensure consistency and reuse across model training and inference workflows.
Governance, Security, and Compliance (30%)
- Model Governance: Implement standards for model versioning, lineage tracking, and compliance to ensure models are traceable, reproducible, and meet ethical AI principles.
- Security: Architect security measures, including network segmentation, access control (IAM/RBAC), and data encryption across the entire ML pipeline.
- Performance & Cost Optimization: Establish comprehensive monitoring and alerting for model performance (drift, bias) and infrastructure metrics, driving continuous optimization for performance and cloud costs (FinOps).
Technical Leadership and Collaboration (20%)
- Cross-Functional Partnership: Act as the primary technical point of contact, collaborating closely with Data Scientists, Data Engineers, Software Engineers, Product Managers, domain scientists, industry partners, and process engineering and systems modeling experts.
- Documentation: Create and maintain high-quality architectural diagrams, reference implementations, and technical documentation.