About Alphamatician
Alphamatician is an alternative data platform that collects, cleans, and structures public web data into investment-ready datasets for institutional investors. Founded in 2012, we provide point-in-time datasets covering more than 40,000 companies globally, with up to 15 years of historical data. Our clients use our data for systematic trading strategies, investment research, and quantitative analysis.
The Role
You will be our first dedicated engineering hire, taking ownership of the data collection pipeline that powers the entire business. The core platform is built in PHP (now on CodeIgniter 4) and has been running in production for over 14 years. This is a hands-on role where you will monitor, maintain, and improve a production system that collects data daily from dozens of web sources, cleans and maps it to company identifiers, and delivers it to institutional clients via API and SFTP. You will spend the majority of your time working in PHP, with Python and Node.js used for supporting tools and sub-projects.
This is not a role where you build dashboards or run queries. You will work directly with scrapers, parsers, databases, and server infrastructure. When something breaks at a data source, you diagnose and fix it. You will own multiple collection processes end to end and progressively take on more as you ramp up.
What You Will Do Day to Day
- Monitor the data pipeline and error logs across multiple collection processes
- Diagnose and resolve data quality issues by working across the codebase, the database, and the underlying infrastructure
- Own the full lifecycle of individual data collection processes, from source scraping through cleaning, mapping, and loading
- Progressively take ownership of additional datasets as you learn the system
- Maintain and improve scraping and parsing logic as source websites change
- Work with a production MySQL database (RDS) and manage data integrity across large-scale datasets
- Collaborate directly with the founder to troubleshoot complex or novel issues
- Contribute to documentation and build institutional knowledge of the pipeline
Tech Stack
- Core application: PHP (CodeIgniter 4) — this is the primary language you will work in daily
- Supporting tools and sub-projects: Python and Node.js
- Database: MySQL on AWS RDS
- Infrastructure: Local servers and AWS (EC2, RDS)
- Delivery: API and SFTP
What We Are Looking For
- 3–6 years of experience working with data pipelines, ETL processes, or web scraping infrastructure in a production environment
- Strong working knowledge of PHP is required; you will work in the PHP core platform daily. Experience with Python or Node.js is a plus, as these are used for supporting tools
- Solid MySQL skills, including the ability to write and optimize queries, troubleshoot performance issues, and manage data at scale
- Experience with AWS services, particularly RDS and EC2
- Comfort working with web scraping or web data collection, including handling the unpredictability of external data sources
- Ability to move between code, a database console, and server logs to diagnose problems
- Self-directed work style suited to a small, remote team where you manage your own priorities
Nice to Have
- Experience in financial data, alternative data, or fintech
- Experience with CodeIgniter or MVC PHP frameworks
- Experience working at a small company or startup where you wore multiple hats
- Understanding of data mapping, entity resolution, or securities identifiers (tickers, ISINs)
What This Role Is Not
This is not a data science or analytics role. You will not be building models, writing reports, or working directly with clients. This is an engineering and operations role focused on keeping a complex data collection system running reliably and improving it over time. If you enjoy the satisfaction of making systems work and the challenge of debugging real-world data problems, this is the role for you.
Compensation and Structure
- Salary range: $120,000–$140,000 depending on experience
- Initial engagement structured as a 6-month W-2 contract-to-hire with a clear path to full-time conversion
- Fully remote, US-based
- Direct collaboration with the founder from day one
How to Apply
Submit your resume and a brief note about your experience with data pipelines or web scraping at alphamatician.com/careers. If you have links to relevant work, open source contributions, or a portfolio, please include them. We value practical experience over credentials.
Application Question(s):
- How many years of experience do you have working with PHP in a production environment?
- Do you have experience with data pipelines, ETL, or web scraping infrastructure?
- Are you authorized to work in the United States?