We build frontier foundation models that power intelligent experiences at Apple. Our team works across the full training lifecycle, from pre-training foundation models to developing mid-training approaches that bridge general capability and task-specific performance. What makes our work distinct is that we engineer models specifically for Apple silicon and optimize them for experiences that are private, personal, and deeply integrated into the OS. We're solving frontier problems: building reward models that resist reward hacking, handling sparse and delayed rewards in agentic settings, and aligning models reliably across the spectrum from open-ended creative tasks to precise, action-taking workflows. If you're drawn to hard problems where the research and the product are inseparable, this is the team.
Description
This position operates at the convergence of Software Engineering and Machine Learning Research. Unlike traditional backend roles, this position requires you to design systems where the outcome is the statistical distribution and quality of data itself. You will work alongside Research Scientists to transform theoretical observations into concrete, scalable engineering solutions. Your core focus will be the architecture of our Data Acquisition, Processing, and Repository Management systems for Large Model training. You will lead technical efforts to enable active, quality-driven data curation, including filtering, deduplication, synthetic data generation, and data mixing, ensuring our models are trained on the highest-quality information available.
Responsibilities
Architect Scalable Ingestion Systems: Design and implement high-throughput distributed systems to ingest petabytes of text and multimodal data from diverse sources, including web crawls and third-party partnerships.
Repository Optimization: Manage the lifecycle of large-scale datasets across data storage and high-performance file systems. Optimize data formats for efficient random access and sequential scanning during model training.
Data Governance & Privacy: Engineer robust data governance and privacy solutions for the training data, in collaboration with compliance and legal teams, to ensure adherence to stringent regulatory standards.
High-Performance Processing Pipelines: Build and maintain distributed data processing workflows using advanced frameworks on cloud infrastructure (e.g., GCP, AWS).
Algorithmic Data Curation: Implement sophisticated data filtering and selection logic to remove low-quality content. Develop semantic deduplication at scale to prevent model memorization and improve training efficiency.
Decontamination: Design automated systems to detect and remove benchmark leakage, ensuring that evaluation datasets remain strictly isolated from training corpora.
Infrastructure for Scaling Laws: Collaborate with researchers to enable data ablations and scaling experiments. Build tools to support systematic data mixture optimization and empirical data studies.
Preferred Qualifications
Research Collaboration: Experience working within or closely with ML research organizations (e.g., as a Research Engineer), with an ability to translate research results into engineering implementations.
Domain Knowledge: Familiarity with the lifecycle of modern LLM training, end-to-end workflows, and the underlying system architecture.
Complex Data Types: Experience processing complex data modalities beyond plain text, such as source code repositories, images, video, and audio.
Minimum Qualifications
Education: Bachelor’s degree in Computer Science, Electrical Engineering, or Mathematics.
Technical Expertise: 4+ years of software engineering experience with a specific focus on Data Infrastructure, Distributed Systems, or AI/ML Engineering.
Language Proficiency: Expert fluency in Python, and strong competence in system languages such as C++.
Cloud Architecture: Extensive experience architecting solutions on major public cloud platforms (e.g., GCP) to build scalable data systems (e.g., with Apache Beam, GCS).
Performance Engineering: Deep experience profiling and optimizing high-throughput data systems. Demonstrated ability to debug distributed bottlenecks (e.g., stragglers, I/O saturation), optimize data formats, and provide efficient data storage solutions.
Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.
Pay & Benefits
At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $181,100 and $318,400, and your base pay will depend on your skills, qualifications, experience, and location.
Apple employees also have the opportunity to become an Apple shareholder through participation in Apple's discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple's Employee Stock Purchase Plan. You'll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation. Learn more about Apple Benefits.
Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.