Data Analyst Intern (Advanced SQL / Python / Data Engineering Focus)
Location: On-site (Chicago preferred)
Duration: 6 months
Type: Internship (High-Impact Technical Track)
Role Overview
We are seeking a highly technical Data Analyst Intern who can operate at the intersection of analytics and data engineering. This role involves working directly with production databases, optimizing queries at scale, contributing to data modeling decisions, and building automated analytical pipelines.
This is not a dashboard-only role: we are looking for someone who understands how data flows from source systems to analytical layers and who can reason about performance, schema design, and statistical validity.
Core Responsibilities
1. Advanced SQL & Query Optimization
- Write complex analytical queries using:
  - CTE chains
  - Window functions (ROW_NUMBER, RANK, LAG, LEAD)
  - Recursive queries
  - Complex aggregations
  - Correlated subqueries
- Analyze and optimize slow-running queries using:
  - Execution plans
  - Indexing strategies
  - Partitioning logic
  - Query refactoring
- Design and maintain:
  - Materialized views
  - Incremental aggregation tables
- Work with large datasets (multi-million row tables)
- Contribute to schema design discussions
Expected databases:
- PostgreSQL / MySQL / SQL Server
- Familiarity with columnar systems (Redshift / BigQuery / Snowflake) is a plus
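To give a sense of the level expected, here is a minimal sketch of the CTE-plus-window-function style described above, run through Python's built-in sqlite3 (the table and column names are illustrative, not a real schema):

```python
import sqlite3

# In-memory database with a small illustrative orders table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer_id INTEGER, order_date TEXT, amount REAL);
INSERT INTO orders VALUES
  (1, '2024-01-05', 120.0),
  (1, '2024-02-10', 80.0),
  (2, '2024-01-20', 200.0),
  (2, '2024-03-02', 50.0);
""")

# CTE plus window functions: sequence each customer's orders and
# compare each order's amount to the previous one via LAG.
query = """
WITH ranked AS (
    SELECT
        customer_id,
        order_date,
        amount,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS order_seq,
        LAG(amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_amount
    FROM orders
)
SELECT customer_id, order_seq, amount, prev_amount
FROM ranked
ORDER BY customer_id, order_seq;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The same pattern (partitioned ranking, lag comparisons) carries over directly to PostgreSQL, MySQL 8+, and SQL Server.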
2. Data Modeling & Warehousing
- Understand and apply:
  - Star schema
  - Snowflake schema
  - Fact vs Dimension tables
  - Slowly Changing Dimensions (SCD)
- Help design data marts optimized for analytics use cases
- Define and document KPI logic and metric definitions
- Ensure data consistency across reporting layers
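As one concrete example of the warehousing concepts above, a Type 2 Slowly Changing Dimension keeps history by closing out the old row and appending a new current one. A minimal sketch, with purely illustrative field names:

```python
from datetime import date

def scd2_update(dim_rows, key, new_attrs, effective):
    """Type 2 SCD: close the current row for `key`, append a new current row."""
    for row in dim_rows:
        if row["customer_id"] == key and row["is_current"]:
            row["valid_to"] = effective   # close out the old version
            row["is_current"] = False
    dim_rows.append({
        "customer_id": key,
        **new_attrs,
        "valid_from": effective,
        "valid_to": None,                 # open-ended current row
        "is_current": True,
    })

dim = [{"customer_id": 1, "city": "Chicago",
        "valid_from": date(2023, 1, 1), "valid_to": None, "is_current": True}]
scd2_update(dim, key=1, new_attrs={"city": "Denver"}, effective=date(2024, 6, 1))
```

After the update, both versions of the customer exist, so historical facts can still join to the attributes that were current at the time.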
3. Python & Data Pipelines
- Build scalable data processing scripts using:
  - Pandas
  - NumPy
  - SQLAlchemy
  - PyArrow (optional)
- Design automated reporting pipelines
- Implement ETL/ELT workflows
- Perform data validation checks and anomaly detection
- Work with:
  - JSON APIs
  - Web scraping (if applicable)
  - Streaming or batch ingestion processes
- Use logging and exception handling for production-ready scripts
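The validation, anomaly detection, logging, and exception-handling responsibilities above can be sketched in one small stdlib-only pipeline step (the feed and column names are made up; the outlier rule is one reasonable choice, not a prescribed method):

```python
import io
import csv
import logging
from statistics import median

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

# Illustrative raw feed; in practice this would come from an API or file.
RAW = "day,revenue\n1,100\n2,105\n3,98\n4,thirty\n5,500\n"

def load_and_validate(text):
    """Parse rows, log bad records instead of crashing, flag outliers."""
    values = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            values.append(float(row["revenue"]))
        except ValueError:
            # Validation failure: record it and keep the pipeline running.
            log.warning("skipping malformed row: %r", row)
    med = median(values)
    mad = median(abs(v - med) for v in values)
    # Median-based rule: robust on small, skewed samples.
    anomalies = [v for v in values if abs(v - med) > 10 * mad]
    return values, anomalies

values, anomalies = load_and_validate(RAW)
```

Here the malformed `thirty` row is logged and skipped, while the 500 spike is surfaced as an anomaly for review rather than silently flowing into reports.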
4. Statistical & Experimental Analysis
- Design and analyze A/B tests:
  - Power analysis
  - Statistical significance testing
  - Confidence intervals
- Apply regression models for:
  - Trend forecasting
  - Cohort analysis
  - Retention modeling
- Evaluate metric bias and confounding variables
- Clearly communicate limitations of statistical conclusions
5. Dev & Infrastructure Exposure
- Use Git for version control
- Write modular, reusable code
- Document pipelines and data logic
- (Preferred) Exposure to:
  - Docker
  - Airflow
  - dbt
  - Cloud environments (AWS/GCP/Azure)
  - CI/CD concepts
Required Qualifications
- Strong SQL proficiency (must understand query execution and optimization)
- Advanced Python for data analysis
- Deep understanding of relational database design
- Strong grasp of statistics and hypothesis testing
- Ability to reason about data integrity and system-level data flow
- Comfortable working with messy, incomplete, real-world data
Preferred Qualifications
- Experience building ETL pipelines
- Knowledge of indexing strategies and database internals
- Experience with analytics engineering tools (dbt)
- Exposure to distributed systems or big data frameworks
- Experience handling datasets >10M rows
Compensation
- Paid internship (competitive hourly rate)
- High ownership and impact
- Mentorship from senior analytics / engineering leadership
- Potential conversion to full-time role
Pay: From $20.00 per hour
Work Location: In person