Job Type: Contract
Job Category: IT
Job Description
Hiring: Unstructured.io Developer
Location: Remote (Boston, MA)
Contract: 6–12 Months (Extendable)
Job Summary: We are seeking an experienced Unstructured.io Developer to work on enterprise-grade data ingestion and document processing solutions. The ideal candidate will have strong hands-on experience with the Unstructured.io framework, data transformation pipelines, and integration with LLM, vector database, and search platforms. In this role, you will develop and optimize workflows for parsing, cleaning, and indexing complex enterprise documents.
Key Responsibilities
Develop and enhance data processing pipelines using Unstructured.io to convert unstructured data (PDF, DOCX, HTML, emails, scans) into structured formats.
Integrate extracted data with Vector Databases or Search Indexing workflows for LLM/RAG applications.
Optimize parsing performance, accuracy, and consistency across various document formats.
Work with Python-based microservices, APIs, and orchestration frameworks.
Collaborate with Data Engineering, ML, and Product teams to design scalable ingestion architectures.
Implement best practices for scalable, reusable pipeline components.
Monitor, debug, and resolve pipeline issues across staging and production environments.
Required Skills & Experience
Overall IT Experience: 8+ Years
3+ years of hands-on experience implementing Unstructured.io in production environments.
Strong experience with Python, including parsing, data transformation, and API development.
Experience building RAG (Retrieval-Augmented Generation) or Document AI workflows.
Hands-on experience with vector databases (Pinecone, Weaviate, Chroma, FAISS, Milvus, etc.).
Familiarity with Cloud Platforms (AWS preferred).
Experience with Docker, Git, and CI/CD pipelines.
Nice to Have
Experience with frameworks like LangChain / LlamaIndex.
Knowledge of NLP, embeddings, and tokenization.
Experience integrating with LLM providers (OpenAI, Anthropic, Azure OpenAI, etc.).
Familiarity with document OCR tools (Tesseract, Azure Form Recognizer, AWS Textract).
Required Skills
CLOUD DEVELOPER
SQL APPLICATION DEVELOPER