Key Responsibilities
- Develop, maintain, and optimize web scraping scripts, crawlers, and data pipelines, primarily in Python.
- Implement scraping solutions using frameworks and libraries such as Scrapy, BeautifulSoup, Selenium, Playwright, or similar tools.
- Ensure compliance with each website's terms of use and robots.txt rules, ethical guidelines, and applicable data-privacy regulations.
- Clean, transform, and validate scraped data to ensure accuracy and quality.
- Monitor scraper performance, troubleshoot issues, and enhance reliability.
- Work with APIs and integrate data from multiple sources.
- Collaborate with data engineers, analysts, and product teams to understand data requirements.
- Maintain documentation of processes, tools, and workflows.
- Implement anti-bot handling strategies (e.g., proxies, rotating user agents, and, where legally permitted, CAPTCHA bypass mechanisms).
- Optimize scraping for speed, efficiency, and scalability.

Required Skills & Qualifications
- 4–5 years of professional experience in web scraping, data extraction, or related fields.
- Strong proficiency in Python and relevant libraries (Requests, BeautifulSoup, lxml, Pandas, etc.).
- Experience with scraping and browser-automation frameworks such as Scrapy, Playwright, or Selenium.
- Hands-on experience working with RESTful APIs, JSON, XML, and other data formats.
- Familiarity with headless browsers, automation techniques, and proxy management.
- Strong understanding of HTML, CSS, JavaScript, and website structures.
- Experience with databases (SQL or NoSQL) for storing scraped data.
- Knowledge of ETL processes and data-pipeline design.
- Good problem-solving skills and attention to detail.
- Ability to manage multiple scraping tasks with minimal supervision.

Job Type: Full-time
Work Location: In person