Qureos

Software Engineer (Web Crawling)

Stanford, United States

About the Role

As a Crawling Engineer at Abaka AI, you will design, develop, and maintain automated systems that collect and process data from a wide range of web sources. Your crawlers will directly power our multimodal dataset pipelines, ensuring clients have access to accurate and diverse data. You will work closely with internal data engineering and AI teams to optimize crawling strategies, manage large-scale data extraction, and uphold ethical and secure crawling practices.

Location: Palo Alto, CA.


Responsibilities:

  • Collaborate closely with clients to understand their data requirements; coordinate internal teams to develop tailored delivery plans and ensure on-time, high-quality data delivery (e.g., meeting format, precision, and volume expectations).
  • Lead the development of mid- to long-term plans for the data engineering function. Build scalable, end-to-end pipelines for multimodal data (text, image, audio, video, 3D point cloud, etc.), covering data sourcing, cleaning, annotation, QA, storage, and iterative optimization for training, fine-tuning, and evaluation.
  • Drive solutions to core technical challenges in multimodal data processing, such as cross-modal alignment (e.g., image-text semantic matching), large-scale data cleaning (e.g., deduplication, denoising, format normalization), annotation efficiency, and data encryption/security.
  • Work cross-functionally with algorithm, product, and business teams: for example, providing feedback to model teams on data bottlenecks, helping refine internal tooling and services, or supporting client-facing teams with technical documentation and pre-sales support.
  • Assess and optimize the cost structure of data processing operations, including headcount, infrastructure, and tooling, striking a balance between quality, efficiency, and scalability.


Qualifications:

  • Strong background in computer science, data engineering, artificial intelligence, or related fields, with hands-on experience in large-scale data systems.
  • 3+ years of experience in data engineering or data operations; leadership experience is highly valued. Prior involvement in LLM or multimodal dataset preparation is a strong plus.
  • Deep understanding of end-to-end multimodal data workflows, with hands-on experience in at least two modalities (e.g., text, images, audio, video).
  • Proficient in designing technical architectures for large-scale data pipelines (e.g., distributed processing, automation frameworks). Familiarity with data privacy and security best practices (e.g., access control, data anonymization).
  • Strong execution and team management skills, able to translate high-level objectives into actionable plans and drive team outcomes.
  • Excellent communication and cross-functional collaboration skills, able to clearly convey technical and operational requirements, resolve conflicts, and manage stakeholder expectations.
  • Strong sense of ownership and resilience; comfortable working in a fast-paced, evolving AI landscape and capable of navigating urgent delivery timelines.


About Us

Abaka AI is a leading AI company committed to becoming a data partner to the artificial intelligence industry.

Abaka AI provides accurate and efficient services, covering data collection, data cleaning, data annotation and datasets. The self-developed intelligent data engineering platform (MooreData Platform) can process multimodal data such as image, video, text, audio, and point clouds. With the built-in AI Power of the platform, the efficiency of data engineering can be accelerated by 500%-1000%.

Abaka AI has established cooperative relationships with more than 1,000 top technology companies and research institutions in the fields of Automobile AI, Generative AI, and Embodied AI. The company has opened offices in Silicon Valley, Paris, Singapore and Tokyo, providing world-class AI data services and intelligent data engineering platforms to global partners.