AI Model Optimization & Fine-Tuning Engineer
(On-Device, Fully Offline)
About the Role
We are seeking a hands-on AI Model Optimization Engineer with proven experience taking large base models and fine-tuning, distilling, and quantizing them for fully offline mobile deployment. The role requires real-world experience with model compression, dataset preparation, and mobile inference optimization on Android and iOS devices.
Responsibilities
- Own the end-to-end pipeline: data prep → fine-tuning → distillation → quantization → mobile packaging → benchmarking.
- Apply PTQ/QAT and distillation to deploy LLMs and multimodal models on devices with tight memory and thermal budgets.
- Format and prepare datasets for fine-tuning (tokenization, tagging, deduplication, versioning).
- Optimize models for battery efficiency, low latency, and minimal RAM usage.
- Benchmark and debug inference performance with Perfetto, Battery Historian, Instruments, etc.
- Collaborate with app teams to integrate optimized models.
Mandatory Skills Checklist (Applicants must demonstrate experience in ALL of the following)
✅ Quantization & Distillation
- Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT).
- Methods like AWQ, GPTQ, SmoothQuant, RPTQ.
- Knowledge of 4-bit/8-bit schemes (INT4, INT8, FP4, NF4).
- Distillation methods: teacher–student training, logit matching, feature distillation (see the sketch after this list).
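For context, here is a minimal sketch of the logit-matching distillation named above, assuming a standard Hinton-style KD loss; all shapes and hyperparameters are illustrative, not a prescribed recipe:

```python
# Illustrative teacher-student logit matching (Hinton-style KD).
# Temperature and alpha are assumed values, not a recommendation.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend soft-target KL (scaled by T^2) with hard-label CE."""
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean")
    kd = kd * temperature ** 2  # rescale gradients to match CE magnitude
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Quick check with random tensors (batch of 4, vocab of 100):
s = torch.randn(4, 100)
t = torch.randn(4, 100)
y = torch.randint(0, 100, (4,))
print(distillation_loss(s, t, y))
```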
✅ Fine-Tuning & Data Handling
- LoRA/QLoRA/DoRA/AdaLoRA fine-tuning (see the LoRA sketch after this list).
- Instruction-tuning pipelines with PyTorch + Hugging Face.
- Dataset formatting: JSONL, multi-turn dialogs, tagging, tokenization (SentencePiece/BPE).
- Deduplication, stratified sampling, and eval set creation.
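For context, a minimal sketch of attaching LoRA adapters with Hugging Face PEFT; the base model name, target modules, and hyperparameters are assumptions for illustration only:

```python
# Illustrative LoRA setup with Hugging Face PEFT.
# Model name and hyperparameters are placeholders, not a prescribed config.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")  # assumed base model

lora_config = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of base weights
```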
✅ On-Device Deployment
- Hands-on experience with at least two of these runtimes: llama.cpp / GGUF, MLC LLM, ExecuTorch, ONNX Runtime Mobile, TensorFlow Lite, Core ML (see the throughput sketch after this list).
- Experience with hardware acceleration: Metal (iOS), NNAPI (Android), GPU/Vulkan, Qualcomm DSP/NPU, XNNPACK.
- Real-world deployment: must provide examples of models running fully offline on mobile (tokens/s, RAM usage, device specs).
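For context, a rough sketch of the kind of tokens/s measurement we expect in case studies, using llama-cpp-python against a GGUF file; the model path is a placeholder, and real numbers depend entirely on the device:

```python
# Rough tokens/s harness with llama-cpp-python and a GGUF model.
import time
from llama_cpp import Llama

llm = Llama(model_path="model-q4_k_m.gguf", n_ctx=2048)  # hypothetical path

prompt = "Explain quantization in one sentence."
start = time.perf_counter()
out = llm(prompt, max_tokens=128)
elapsed = time.perf_counter() - start

# The completion dict mirrors the OpenAI format, including token usage.
n_generated = out["usage"]["completion_tokens"]
print(f"{n_generated} tokens in {elapsed:.2f}s "
      f"({n_generated / elapsed:.1f} tok/s)")
```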
✅ Performance & Benchmarking
- Tools: Perfetto, systrace, Battery Historian, and adb/dumpsys stats (Android); Xcode Instruments, including the Energy Log template (iOS).
- Profiling decode speed, cold-start vs. warm-start latency, RAM usage, and energy consumption (a minimal RAM-sampling sketch follows).
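For context, a minimal sketch of sampling an Android app's memory footprint over adb; it assumes adb is on PATH with a device attached, the package name is hypothetical, and the exact dumpsys TOTAL-line format varies across Android versions:

```python
# Minimal PSS sampler via adb; package name and regex are assumptions.
import re
import subprocess

def total_pss_kb(package: str) -> int:
    """Return TOTAL PSS in kB as reported by `adb shell dumpsys meminfo`."""
    out = subprocess.run(
        ["adb", "shell", "dumpsys", "meminfo", package],
        capture_output=True, text=True, check=True,
    ).stdout
    # Newer Android prints "TOTAL PSS:", older prints a bare "TOTAL" row.
    match = re.search(r"TOTAL(?:\s+PSS:)?\s+(\d+)", out)
    if not match:
        raise RuntimeError("could not parse dumpsys meminfo output")
    return int(match.group(1))

print(total_pss_kb("com.example.offlinellm"), "kB PSS")  # hypothetical package
```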
✅ General
- Strong PyTorch and Hugging Face experience.
- Clear documentation and ability to explain optimization trade-offs.
Nice to Have
- Open-source contributions to LLM quantization/edge-AI frameworks.
- Prior deployment of Qwen, LLaMA, Gemma, or Mistral families onto mobile devices.
- Multilingual or low-resource dataset experience (Urdu, Arabic, Hindi, etc.), including tokenization, script handling, and fine-tuning.
- Familiarity with multimodal (ASR/TTS/VAD) integration on device.
Application Requirements
Applications must include:
- A short case study of a model they have fine-tuned (dataset + method + results).
- A short case study of a model they have quantized/distilled for mobile (framework + bit-depth + device + performance metrics).
- Links to GitHub repos, papers, or APK/TestFlight builds if available.
Job Type: Full-time
Pay: Rs250,000.00 - Rs400,000.00 per month
Work Location: In person