Web Crawling & Data Extraction Engineer (WFH)
Experience: 1–7 Years
Location: Remote (Work from Home)
Mode of Engagement: Full-time
No. of Positions: 3 to 8
Educational Qualification: Bachelor’s degree in Computer Science, IT, or related field
Industry: IT / Software Services / Data & AI
Notice Period: Immediate Joiners
What We Are Looking For
- Strong hands-on experience in web crawling, scraping, and automated data extraction.
- Experience working with Playwright, Puppeteer, Selenium, and Python Requests for scraping JS-heavy and dynamic websites.
- Good understanding of the HTML DOM, XPath, and JSON APIs, and of handling pagination, infinite scroll, and dynamic page loads.
- Ability to manage sessions, cookies, headers, and token-based flows, and to bypass basic anti-bot protections.
- Experience building automated extraction scripts that run on schedules or pipelines.
- Ability to write clean, structured, and optimized Python code for crawling and automation.
Responsibilities
- Build and maintain web crawlers & automated extraction scripts using Playwright/Puppeteer/Selenium/Requests.
- Extract data from complex, JavaScript-rendered, or protected websites.
- Handle dynamic elements, redirects, headers, authentication, and anti-bot mitigations.
- Clean, validate, and structure extracted data (JSON, CSV, DB).
- Use SQL/NoSQL databases to store extracted content.
- Maintain automation scripts, scheduling, retries, logging, and monitoring for stable crawling.
- Work closely with internal teams to deliver accurate and timely data.
Qualifications
- 1–4 years of hands-on experience in Python-based web crawling and data extraction.
- Experience with Playwright, Puppeteer, Selenium, and Python Requests/BeautifulSoup.
- Understanding of proxies, user-agents, sessions, and anti-bot workarounds.
- Experience with SQL or MongoDB.
- Strong debugging, analytical, and logical reasoning skills.
- Good English communication skills.
Job Type: Full-time
Pay: ₹30,000.00 - ₹90,000.00 per month
Benefits:
- Health insurance
- Provident Fund
- Work from home
Application Question(s):
- How many of your projects have involved anti-bot handling (proxies, sessions, headers, fingerprinting, CAPTCHA solving, logins)?
- How many projects have you independently built end-to-end (requirement → crawling → extraction → automation → deployment)?
- How many automated data pipelines have you built (scheduling, retries, data cleaning, storing into DB)?
- How many JavaScript-rendered websites have you handled using Playwright/Puppeteer/Selenium?
- How many years of hands-on experience do you have in web crawling, scraping, and data extraction using Python?
- Current CTC (monthly)
- Notice period (days)
- How many years of experience do you have using Python Requests for crawling static websites?
Work Location: Remote