Must Have Technical/Functional Skills
- Has successfully executed a data migration or modernization to Databricks, preferably from IBM DataStage to Databricks on AWS.
- Should have experience handling large migrations to Databricks.
- Should have strong analytical skills to compare the legacy and modern data platforms end to end, from source to target.
- Good understanding of the Databricks implementation of the Medallion (Bronze/Silver/Gold) architecture.
- Has independently led and managed large Databricks migrations.
- CI/CD Integration: Implement version control (e.g., Git) and automated deployment processes for Databricks assets.
The required technical and architectural skills are listed below.
Core Data Engineering Languages
- Experience with advanced SQL for building modular analytics workflows, using Common Table Expressions (CTEs), and writing high-performance queries in Databricks SQL.
- Experience with Python or Scala to build, optimize, and debug complex data transformation scripts, custom functions, and machine learning pipelines.
Big Data & Architecture Core
- Experience with the Apache Spark ecosystem, including cluster execution flow, memory allocation, driver and worker nodes, and DataFrame handling.
- Experience with Delta Lake architecture, including ACID transactions on object storage, data skipping, partitioning strategies, and automated data compaction.
Databricks Platform Expertise
- Experience with Delta Live Tables (DLT) and Workflows for building and orchestrating production-ready, declarative streaming and batch ETL pipelines.
- Experience with Unity Catalog for setting up data governance, column- and row-level access control, and end-to-end data lineage tracking across workspaces.
- Experience with Auto Loader for implementing modern, incremental data ingestion patterns from cloud object storage into the lakehouse.
Code Translation & Refactoring
- Pipeline Conversion: Translate visual DataStage Parallel Jobs and Sequences into Python/PySpark scripts or Databricks notebooks.
- Legacy Refactoring: Modernize legacy logic rather than applying "lift-and-shift" anti-patterns; redesign workflows around distributed DataFrames rather than DataStage stages.
- Logic Mapping: Map DataStage components, such as Aggregator, Join, Transformer, and Sort stages, to equivalent Spark operations.
Testing & Reconciliation
- Validation & Reconciliation: Build automated reconciliation frameworks that compare row counts, checksums, and aggregate sums between legacy DataStage outputs and the new Databricks outputs.
- Data Cleansing: Identify and resolve data type discrepancies, null-handling differences, and encoding issues during the extraction and loading phases.
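The reconciliation checks above can be sketched in plain Python. This sketch assumes both the legacy DataStage output and the new Databricks output have been exported as lists of dictionaries with the same column names. The function names (`normalize`, `row_hash`, `profile`, `reconcile`) and the cleansing rules inside `normalize` (trim trailing spaces, treat empty strings as NULL, compare numbers as normalized decimals) are hypothetical choices for illustration, not documented DataStage or Databricks behavior. On Databricks the same checks would more likely run as DataFrame aggregations than as Python loops.

```python
import hashlib
from decimal import Decimal
from typing import Any, Iterable, Mapping

Row = Mapping[str, Any]


def normalize(value: Any) -> Any:
    """Hypothetical cleansing rules applied to both sides before comparing."""
    if isinstance(value, str):
        value = value.rstrip()      # assume fixed-width padding is not significant
        return value or None        # assume empty string and NULL mean the same thing
    if isinstance(value, (int, float, Decimal)) and not isinstance(value, bool):
        # 10.50, 10.5 and Decimal("10.5") all become Decimal("10.5")
        return Decimal(str(value)).normalize()
    return value


def row_hash(row: Row, columns: list[str]) -> int:
    """64-bit hash of one row's normalized values in a fixed column order."""
    payload = "\x1f".join(repr(normalize(row.get(col))) for col in columns)
    return int.from_bytes(hashlib.sha256(payload.encode("utf-8")).digest()[:8], "big")


def profile(rows: Iterable[Row], columns: list[str], sum_columns: list[str]) -> dict:
    """Row count, order-independent checksum, and per-column sums for one output."""
    count, checksum = 0, 0
    sums = {col: Decimal(0) for col in sum_columns}
    for row in rows:
        count += 1
        checksum = (checksum + row_hash(row, columns)) % (1 << 64)
        for col in sum_columns:
            value = normalize(row.get(col))
            if value is not None:   # NULLs are skipped, as in SQL SUM
                sums[col] += value
    return {"row_count": count, "checksum": checksum, "sums": sums}


def reconcile(legacy, modern, columns, sum_columns) -> dict:
    """Return {check: (legacy_value, modern_value)} for every check that differs."""
    a = profile(legacy, columns, sum_columns)
    b = profile(modern, columns, sum_columns)
    mismatches = {}
    for check in ("row_count", "checksum"):
        if a[check] != b[check]:
            mismatches[check] = (a[check], b[check])
    for col in sum_columns:
        if a["sums"][col] != b["sums"][col]:
            mismatches[f"sum:{col}"] = (a["sums"][col], b["sums"][col])
    return mismatches


if __name__ == "__main__":
    legacy = [{"id": 1, "name": "ACME  ", "amount": Decimal("10.50")},
              {"id": 2, "name": "", "amount": Decimal("5.00")}]
    modern = [{"id": 2, "name": None, "amount": 5.0},
              {"id": 1, "name": "ACME", "amount": 10.5}]
    print(reconcile(legacy, modern, ["id", "name", "amount"], ["amount"]))  # {}
```

The checksum is a sum of per-row hashes modulo 2^64, so it ignores row order (which rarely survives a migration) while still flagging changed or missing rows; unlike an XOR-based checksum, a duplicated row does not cancel itself out.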
Platform Orchestration & Governance
- Orchestration: Replace DataStage sequence jobs with Databricks Workflows (or external orchestrators such as Azure Data Factory or Airflow) to schedule jobs and manage dependencies.
- Data Governance: Enforce data lineage, security, and cataloging using Unity Catalog to ensure compliance in the new Lakehouse environment.
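One way to replace a DataStage sequence with a Databricks Workflows job is to turn the sequence's dependency graph into a job definition. The sketch below assumes the sequence has already been reduced to a hypothetical mapping of job names to upstream jobs, and that each legacy job was converted into one notebook under an assumed folder. The field names (`name`, `tasks`, `task_key`, `depends_on`, `notebook_task`, `notebook_path`) follow the Databricks Jobs API 2.1 create-job request body; compute settings are deliberately omitted, and the payload should be checked against the current API reference before use.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical DataStage sequence: each job maps to the upstream jobs it waits on.
SEQUENCE = {
    "extract_customers": [],
    "extract_orders": [],
    "transform_orders": ["extract_customers", "extract_orders"],
    "load_gold_sales": ["transform_orders"],
}


def to_jobs_payload(name: str, sequence: dict[str, list[str]], notebook_root: str) -> dict:
    """Build a Jobs API 2.1-style job definition, one notebook task per legacy job."""
    # static_order() lists upstream jobs first and raises CycleError (a ValueError) on cycles.
    order = list(TopologicalSorter(sequence).static_order())
    tasks = []
    for job in order:
        task = {
            "task_key": job,
            # Assumed layout: one converted notebook per legacy job under notebook_root.
            "notebook_task": {"notebook_path": f"{notebook_root}/{job}"},
        }
        upstream = sequence.get(job, [])
        if upstream:
            task["depends_on"] = [{"task_key": dep} for dep in upstream]
        tasks.append(task)
    # Compute (job clusters or serverless) is intentionally left out of this sketch.
    return {"name": name, "tasks": tasks}


if __name__ == "__main__":
    payload = to_jobs_payload("sales_daily", SEQUENCE, "/Workspace/migration/sales")
    print(json.dumps(payload, indent=2))
```

Sorting the graph up front means a sequence with circular dependencies is rejected before anything is deployed, and the resulting task list could be submitted through the Jobs API or adapted into a Databricks Asset Bundle definition.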
Good to Have: Cloud Infrastructure & CI/CD
- Cloud Providers (AWS): Understanding of the underlying cloud object storage, identity and access management (IAM), and network security configurations.
- DevOps & Bundles: Familiarity with Databricks Asset Bundles (DABs) and CI/CD tools to automate the deployment of workspaces and pipeline assets.
Legacy Assessment & Migration Mechanics
- Code Conversion & Translation: The ability to parse legacy code structures from ETL pipelines (Informatica, with DataStage preferred) and refactor them into Databricks-native code.
- AI-Assisted Migration: Skills in using AI coding assistants and open-framework agent tools to analyze application interdependencies, automate schema mapping, and accelerate lift-and-shift workloads.
Experience working in Agile teams and an understanding of data governance frameworks.
Responsibilities
Support the post-migration environment following the move from IBM DataStage to Databricks.
Incident & Lifecycle Management
- CI/CD Deployment: Support code deployments across Development, Test, and Production environments using Databricks Repos and REST APIs
- Monitoring & Alerting: Set up monitoring via Databricks System Tables and observability tools to catch job failures, data anomalies, or latency spikes early
Pipeline Maintenance & Orchestration
- Workflow Management: Transition from DataStage job sequences to native Databricks Workflows for scheduling, dependency tracking, and alerts.
- ETL Refactoring: Troubleshoot and fix issues in the generated PySpark or Spark SQL code that replaced legacy DataStage Transformer or Lookup stages.
- Streaming & Batch Integration: Support ongoing data ingestion using Databricks Auto Loader to process files continuously from cloud storage.
Performance Tuning & Cost Optimization
- Compute Management: Monitor and configure serverless or classic clusters to prevent over-provisioning.
- Query Optimization: Analyze Spark execution plans and replace inefficient row-by-row processing logic (a common DataStage carryover) with vectorized operations and native Spark functions.
- Storage Optimization: Maintain Delta Lake tables by enforcing layout optimization (ZORDER).
Data Governance & Security
- Access Control: Implement granular permissions, column masking, and row-level filters using Databricks Unity Catalog to replace DataStage's legacy security policies.
- Data Quality: Utilize Delta Live Tables (DLT) to build pipelines with built-in, declarative data quality expectations and monitoring
Additional Skills
- Excellent communication skills
- Ability to collaborate with legacy and modernized application teams and stakeholders
Base Salary Range: $120,000 to $140,000 per annum
TCS Employee Benefits Summary:
Discretionary Annual Incentive.
Comprehensive Medical Coverage: Medical & Health, Dental & Vision, Disability Planning & Insurance, Pet Insurance Plans.
Family Support: Maternal & Parental Leaves.
Insurance Options: Auto & Home Insurance, Identity Theft Protection.
Convenience & Professional Growth: Commuter Benefits & Certification & Training Reimbursement.
Time Off: Vacation, Time Off, Sick Leave & Holidays.
Legal & Financial Assistance: Legal Assistance, 401K Plan, Performance Bonus, College Fund, Student Loan Refinancing.