- Work with stakeholders throughout the organization to identify opportunities for leveraging company data to drive business solutions
- Mine and analyze data from client databases to drive optimization and improvement of product development, marketing techniques and business strategies
- Build, optimize and maintain machine learning models
- Assess “health” of new data sources and soundness of data gathering techniques employed by the client
- Process, clean, and verify the integrity of data used for analysis
- Perform ad-hoc analysis and present results in a clear manner
- Experience in one or more of the following programming languages: Python, R, MATLAB, Julia
- Experience in data wrangling of (messy) datasets using pandas or dplyr
- Experience in exploratory data analysis
- Experience in data visualization using one or more of the following packages/tools: seaborn, matplotlib, plotly, ggplot, Tableau
- Knowledge of machine learning techniques and algorithms such as k-nearest neighbors, naive Bayes, support vector machines, random forest, and logistic regression
- Awareness of machine learning concepts such as over-fitting and under-fitting, the difference between bias and variance, a model's ability to generalize to unseen data, and feature engineering
- Excellent written and verbal communication skills for coordinating across teams
- A drive to learn and master new technologies and techniques
Bonus Points
- A Master's or PhD degree in a relevant discipline
- Data engineering experience; e.g., SQL, Hadoop, Spark, cloud computing
- Competitive programming experience (e.g., ACM, Topcoder, Codeforces)
- Experience participating in machine learning competitions (e.g., Kaggle, HackerEarth)
- Strong statistics background
- An up-to-date portfolio (on GitHub?) showing your experience in all of the above!