Responsibilities:
1. Architecture Design
   - Develop and maintain scalable data architectures using Databricks, including data lakes, data warehouses, and real-time processing systems.
   - Create detailed blueprints for data processes and flows, ensuring alignment with business objectives.
2. Pipeline Development
   - Design and implement ETL/ELT pipelines using Databricks and Apache Spark to process large-scale datasets efficiently.
   - Optimize pipelines for performance, scalability, and reliability.
3. Data Governance
   - Implement and enforce data governance policies, security measures, and compliance standards within the Databricks environment.
4. Collaboration
   - Partner with data scientists, business analysts, and business stakeholders to understand data needs and deliver solutions that drive business value.
   - Communicate complex technical concepts clearly to diverse audiences, fostering alignment and collaboration.
5. Integration
   - Integrate Databricks with cloud services (e.g., AWS S3, Azure Data Lake Storage, Google Cloud Storage) and CRM/marketing automation platforms (e.g., Salesforce, Veeva) for seamless data flow.
   - Ensure interoperability with existing systems to create a cohesive data ecosystem.
6. Innovation
   - Stay updated on advancements in Databricks, Delta Lake, Databricks SQL, and related technologies, applying best practices to enhance data capabilities.
   - Drive continuous improvement in data processes and tools.
7. Leadership and Mentoring
   - Lead the implementation and optimization of Databricks for commercial pharma analytics, tailoring solutions for sales, marketing, and patient outcomes.
   - Train and mentor team members on Databricks and analytics platforms, fostering a culture of data literacy and innovation.
8. Analytics Collaboration
   - Collaborate with analytics teams to develop reporting frameworks for monitoring KPIs, such as engagement rates and campaign ROI.
   - Ensure seamless integration of Databricks with CRM and marketing automation systems to support analytics workflows.
Desired Profile:
- 12+ years of experience in data engineering, with a strong focus on Databricks, Apache Spark, and cloud platforms (AWS, Azure, or GCP).
- Proficiency in Databricks, Python, Scala, and SQL, plus experience with ETL/ELT tools and pipeline orchestration (e.g., Apache Airflow, Azure Data Factory).
- Pharma domain knowledge, particularly around HCP (healthcare professional) and HCO (healthcare organization) entities and their integration.
- Experience with data modeling, including the Medallion (bronze/silver/gold) architecture.
- Strong grounding in ETL/ELT pipeline design and in data privacy and security practices.
- Deep knowledge of data modeling, schema design, and database management.
- Proven leadership in managing data engineering teams and projects, with strong project management skills.
- Excellent communication skills, with the ability to convey complex technical concepts to non-technical stakeholders.
- Strong understanding of syndicated data in pharmaceuticals.
- Experience in commercial pharma analytics is highly preferred.
- Bachelor’s degree in Computer Science, Data Science, or a related field; an advanced degree (e.g., a Master’s) is a plus.
- Certifications in Databricks, AWS, Azure, or related technologies are required.