Wilson Sonsini is the premier legal advisor to technology, life sciences, and other growth enterprises worldwide. We represent companies at every stage of development, from entrepreneurial start-ups to multibillion-dollar global corporations, as well as the venture firms, private equity firms, and investment banks that finance and advise them. The firm has approximately 1,100 attorneys in 17 offices: 13 in the U.S., two in China, and two in Europe. Our broad spectrum of practices and entrepreneurial spirit allow our staff exceptional opportunities for professional achievement and career growth.
Wilson Sonsini is actively seeking an experienced Principal Data Engineer to join our Data Science and Operations team. The Principal Data Engineer (PDE) will serve as a senior technical authority for designing and delivering an Azure-centric data ecosystem that powers the firm’s data management system and its application to business processes, workflows, AI/ML solutions, enterprise search, analytics, automation and reporting.
The PDE will serve as a key driver in developing, guiding, and executing the technology strategy for the firm’s central data management function, helping to advance data- and cloud-driven strategies and data security, spearhead data automation and AI/ML initiatives, and deliver high-impact data solutions and application development, all anchored in modern engineering practices and DevOps methodology.
The PDE will collaborate closely with internal business units, vendors and business stakeholders to further the firm’s data foundation and accelerate digital transformation and next-generation legal services while maintaining rigorous governance, security and client-data protections.
By helping to architect a secure, governed Azure data backbone that unifies authoritative information and powers search, analytics, and AI firm-wide, the Principal Data Engineer will help lead a team of data professionals committed to developing key, actionable insights and business workflows, and create a decisive competitive advantage for the firm and its clients.
Essential Duties and Responsibilities:
Azure-Based Data Warehouse & Lakehouse Development
- Architect, develop, and optimize a scalable Azure-centric Lakehouse that ingests data from firm databases, such as SQL Server, and firm-managed data platforms, such as NetDocuments, SharePoint, Aderant, Workday and SaaS practice and transactional platforms.
- Establish robust ELT/CDC pipelines with optimized Azure workflows for near–real-time ingestion and transformation.
- Implement semantic models that support enterprise search, knowledge-graph entities, AI stores and applications, and BI/dashboard datasets.
Data Governance & Security
- Embed firm-wide data-classification, retention, and ethical-wall rules into pipelines using Azure/Microsoft-based data controls, such as Purview, Defender for Cloud, and attribute-based access control (ABAC).
- Ensure compliance with firm data security and governance policies and guidelines.
- Champion data-governance reviews; define lineage, cataloging, and stewardship workflows.
Master Data Management (MDM)
- Lead the selection and rollout of an MDM hub (such as Profisee, Informatica 360, or equivalent) and its integration with Microsoft Master Data Services to unify firm data taxonomies and hierarchies.
- Define master record selection, match/merge logic, and data-quality SLAs.
- Integrate MDM outputs into downstream search indexes and analytics models.
Enterprise API Management
- Work on a team to deploy API Management controls and workflows as the firm’s secure gateway for internal micro-services and external client/data-provider integrations.
- Help enforce OAuth 2.0/OpenID Connect, policy-based throttling, and schema versioning.
- Establish data pipelines that integrate with the firm’s DevOps CI/CD workflows for automated API lifecycle management.
Data Pipeline and Workflow Automation
- Extend existing and new platforms (such as Power BI or Litera Foundation) with event-driven Azure Functions, Logic Apps, and Power Automate flows that push authoritative data to and from the appropriate data pipelines and transaction-management platforms.
- Automate document-metadata enrichment, classifications and clause-library updates to firm knowledge repositories, RAG databases, taxonomies and knowledge graphs.
AI & Advanced Analytics Enablement
- Provision vector stores, databases, and embeddings pipelines for generative-AI knowledge assistants/agents; co-develop retrieval-augmented generation (RAG) patterns with legal-AI teams.
- Help develop ML-Ops infrastructure and DevOps pipelines to support ML and AI-related initiatives.
- Partner with data scientists and analysts to help with problem framing and scoping, data discovery and access, feature engineering and experimentation, code methodology and review, and model development and deployment.
Strategic Leadership & Mentoring
- Define the data-engineering roadmap, standards, and reference architectures; advocate cloud-native (Terraform, Kubernetes, container management) and DevSecOps best practices.
- Mentor data professionals and analysts; assist with code reviews, participate in pair programming/system development, and conduct knowledge-sharing sessions.
General Data Engineering Duties
- Participate in defining and standardizing firm data to develop workable and practical data dictionaries, controlled vocabularies, and business taxonomies that align with practice-area and client-matter needs.
- Assist with the buildout of knowledge-graph capabilities – design schemas and ontologies that surface relationships with key data objects to power advanced search and generative-AI solutions.
- Prepare authoritative datasets via robust ETL/ELT pipelines – orchestrate Azure-based ingestion, transformation, and data-quality checks to ensure data is analysis-ready and trusted.
- Establish practices and procedures toward a “single source of truth” that help reduce redundant or less-authoritative repositories, and enforce governance policies to prevent data silos and duplication.
- Engineer scalable data pipelines – implement automated, version-controlled DataOps workflows that support continuous delivery, monitoring, and lineage tracking across the firm’s analytics and AI technology stacks.
- Lead efforts to integrate various data systems and platforms to create a unified data ecosystem where information can be found using an AI-based guided enterprise search.
- Serve as a technical data owner for all the firm’s data assets and coordinate with responsible parties to establish a one-firm data approach.
- Develop and execute the data strategy, aligning it with the firm’s overall business objectives.
- Build and scale the firm’s data infrastructure into a modern, robust, best-in-industry capability.
- Ensure that outputs and solutions (models, dashboards, insights, reports, subscriptions) are actionable and integrate seamlessly into daily business operations.
- Stay current with industry trends and advancements in technology, including AI, data science, and engineering.
Qualifications:
- Ability to communicate clearly and effectively with people from both technical and non-technical backgrounds. Excellent writing and oral presentation skills.
- Experience performing root cause analysis on internal and external data, data integrations and processes to solve specific business problems and identify opportunities for improvement.
- Strong analytic skills related to working with structured and unstructured datasets and data models.
- Experience developing processes that support data transformation, integrations, data structures, metadata, dependency, and workflow management.
- A successful history of manipulating, processing, and extracting value from large, disconnected datasets.
- Working knowledge of creating and maintaining large data stores in SQL and cloud platforms, such as Azure or AWS.
- Experience supporting and working with cross-functional teams in a dynamic environment.
- Ability to collaborate with team members and interact with others throughout the Firm.
- Ability to deal responsibly with sensitive and confidential information in a discreet and secure manner.
- Law firm experience a plus.
Required Qualifications
- Proven experience in leading data-driven projects and teams to successful completion.
- 10+ years of experience in senior data management and engineering positions, including 3+ years as a lead/architect in Azure; prior work in highly regulated industries (legal, finance, healthcare) strongly preferred. This position is expected to remain technical in scope and daily practice.
- BA/BS or graduate degree (MA/MS) in Computer Science, Data Science, Data Analytics, Information Systems or equivalent discipline.
- Experience with advanced data technologies, Microsoft SQL Server and related Microsoft data management and integration technologies.
- Excellent verbal and written communication and interpersonal skills.
- Technical Expertise:
- Data Warehouse experience, such as Azure Synapse/Fabric, Data Factory, Databricks, Delta Lake, Snowflake, Cosmos DB, and Event Hub/Kafka
- SQL Server, T-SQL, SSIS & SSRS, Stored Procedures
- Python or Scala, Spark, and modern ELT patterns
- Programming and scripting languages: Python, R, C++, Julia, JavaScript, SQL
- Power Platform, Power BI, Data Analysis Expressions (DAX)
- Excel / Power Query
- Azure Purview/Defender, RBAC/ABAC, encryption-at-rest/in-transit, key-vault management
- API design (REST/GraphQL), Swagger/OpenAPI, Azure APIM or Kong, OAuth 2.0
- CI/CD with Azure DevOps or GitHub Actions; Infrastructure-as-Code (Bicep/Terraform)
- MDM and Master Data Services implementations and data-quality frameworks
- Agile/Scrum methodology
- Extensive experience working with a variety of data file formats, such as JSON, XML, SQL
- Additional skills that would be highly advantageous include:
- PowerShell
- Regular expressions (regex)
- VBA, MS Access & Excel
- Documentation & Process Mapping
- Dynamic visualization tools, such as Microsoft Power BI, Tableau, Domo, etc.
- Experience developing and applying machine learning models using Python, R, SQL and Azure Machine Learning
- Experience integrating legal-industry line-of-business applications, such as Litera Foundation, Intapp Open, SharePoint/OneDrive, Aderant, Salesforce.com/CRM and NetDocs/DMS
The primary location for this job posting is Palo Alto, but other locations may be listed. The actual base pay offered will depend upon a variety of factors, including but not limited to the selected candidate’s qualifications, years of relevant experience, level of education, professional certifications and licenses, and work location. The anticipated pay range for this position is as follows:
Palo Alto, New York, San Francisco: $163,200 – $220,800 per year.
Austin, Boston, Boulder, Century City, Delaware, Los Angeles, Salt Lake City, San Diego, Seattle, Washington, D.C.: $147,050 – $198,950 per year.
The compensation for this position may include a discretionary year-end merit bonus based on performance. We offer a highly competitive salary and benefits package.
Equal Opportunity Employer (EOE).