Tata Consultancy Services (TCS)

Databricks Migration and Support Engineer

Must Have Technical/Functional Skills

Has successfully executed a data migration or modernization to Databricks, preferably from IBM DataStage to Databricks on AWS.

Should have experience handling large migrations to Databricks.

Should have strong analytical skills to compare the legacy and modern data platforms end to end, from source to target.

Good understanding of the Databricks implementation of the Medallion architecture.

Has independently led and managed large Databricks migrations.

CI/CD Integration: Implement version control (e.g., Git) and automated deployment processes for Databricks assets

The required technical and architectural skills are listed below.

Core Data Engineering Languages

Experience in advanced SQL for building modular analytics workflows, using advanced Common Table Expressions (CTEs), and writing high-performance queries in Databricks SQL Analytics.

Experience in Python or Scala to build, optimize, and debug complex data transformation scripts, custom functions, and machine learning pipelines.

Big Data & Architecture Core

Experience in the Apache Spark ecosystem, including understanding cluster execution flow, memory allocation, and driver/worker nodes, and working with DataFrames.

Experience in Delta Lake Architecture to understand ACID transactions on object storage, data skipping, partition strategies, and automated data compaction.

Databricks Platform Expertise

Experience in Delta Live Tables (DLT) & Workflows for constructing and orchestrating production-ready, declarative streaming and batch ETL pipelines.

Experience in Unity Catalog for setting up data governance, column/row-level access control, and tracking end-to-end data lineage across workspaces.

Experience in Auto Loader for implementing modern, incremental data ingestion patterns from cloud blob storage into the lakehouse.

Code Translation & Refactoring

Pipeline Conversion: Translate visual DataStage Parallel Jobs and Sequences into Python/PySpark scripts or Databricks notebooks

Legacy Refactoring: Modernize legacy logic rather than applying "lift and shift" anti-patterns; adapt workflows to think in distributed DataFrames rather than DataStage stages.

Logic Mapping: Map DataStage components (such as Aggregator, Join, Transformer, and Sort stages) to equivalent Spark operations

Testing & Reconciliation

Validation & Reconciliation: Build automated reconciliation frameworks to compare row counts, checksums, and aggregate sums between legacy DataStage outputs and new Databricks outputs

Data Cleansing: Identify and resolve data type discrepancies, null-handling differences, and encoding issues during the extraction and loading phases
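
As a rough illustration of the reconciliation checks described above, the following pure-Python sketch compares row counts, an order-independent checksum, and a column sum between two extracts. The function names (row_digest, summarize, reconcile), the sample rows, and the amount column are hypothetical and chosen only for this example; a real framework would read actual DataStage and Databricks outputs and would more likely be written in PySpark or SQL.

```python
import hashlib
from decimal import Decimal


def row_digest(row, columns):
    """Hash one row's values in a fixed column order."""
    # Map None to an explicit token so null-handling differences show up
    # as checksum mismatches instead of being silently hidden.
    parts = ["<NULL>" if row.get(c) is None else str(row[c]) for c in columns]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def summarize(rows, columns, sum_column):
    """Return row count, order-independent checksum, and a column sum."""
    # Sorting the per-row digests makes the checksum independent of row order.
    digests = sorted(row_digest(r, columns) for r in rows)
    checksum = hashlib.sha256("".join(digests).encode("ascii")).hexdigest()
    total = sum(
        (Decimal(str(r[sum_column])) for r in rows if r[sum_column] is not None),
        Decimal("0"),
    )
    return {"row_count": len(rows), "checksum": checksum, "sum": total}


def reconcile(legacy_rows, modern_rows, columns, sum_column):
    """Compare the two summaries and report which checks match."""
    legacy = summarize(legacy_rows, columns, sum_column)
    modern = summarize(modern_rows, columns, sum_column)
    return {check: legacy[check] == modern[check] for check in legacy}


# Hypothetical sample data: same rows, different order.
legacy = [{"id": 1, "amount": "10.50"}, {"id": 2, "amount": None}]
modern = [{"id": 2, "amount": None}, {"id": 1, "amount": "10.50"}]

result = reconcile(legacy, modern, ["id", "amount"], "amount")
# Dropping a row from the target should fail every check.
partial = reconcile(legacy, modern[:1], ["id", "amount"], "amount")
print(result)
print(partial)
```

Because the checksum hashes string representations, a value stored as "10.50" on one side and "10.5" on the other would fail the checksum check while still passing the sum check, which is one way type and formatting discrepancies can be surfaced separately from numeric drift.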

Platform Orchestration & Governance

Orchestration: Replace DataStage sequence jobs with Databricks Workflows (or external orchestrators such as Azure Data Factory or Airflow) to schedule and manage dependencies

Data Governance: Enforce data lineage, security, and cataloging using Unity Catalog to ensure compliance in the new Lakehouse environment.

Good to Have

Cloud Infrastructure & CI/CD

Cloud Providers (AWS): Understanding of underlying cloud object storage, identity and access management (IAM), and network security configurations.

DevOps & Bundles: Familiarity with Databricks Asset Bundles (DABs) and CI/CD tools to automate the deployment of workspaces and pipeline assets.

Legacy Assessment & Migration Mechanics

Code Conversion & Translation: The ability to parse legacy code structures and refactor them into Databricks-native code.

AI-Assisted Migration: Skills in using AI coding assistants and open framework agent tools to analyze application interdependencies, automate schema mapping, and accelerate lift-and-shift workloads

Code Conversion & Translation: The ability to parse legacy code structures from ETL pipelines; Informatica or DataStage experience preferred.

Experience working in Agile teams and understanding of data governance frameworks.

Responsibilities

Support post-migration environment from IBM DataStage to Databricks

Incident & Lifecycle Management

CI/CD Deployment: Support code deployments across Development, Test, and Production environments using Databricks Repos and REST APIs

Monitoring & Alerting: Set up monitoring via Databricks System Tables and observability tools to catch job failures, data anomalies, or latency spikes early

Pipeline Maintenance & Orchestration

Workflow Management: Transition from DataStage job sequences to native Databricks Workflows for scheduling, dependency tracking, and alerts

ETL Refactoring: Troubleshoot and fix issues in generated PySpark or Spark SQL code that replaced legacy DataStage Transformer or Lookup stages

Streaming & Batch Integration: Support ongoing data ingestion using Databricks Auto Loader to process files continuously from cloud storage

Performance Tuning & Cost Optimization

Compute Management: Monitor and configure serverless or classic clusters to prevent over-provisioning

Query Optimization: Analyze Spark execution plans. Replace inefficient row-by-row processing logic (a common DataStage carryover) with vectorized operations and native Spark functions

Storage Optimization: Maintain Delta Lake tables by enforcing layout optimization (ZORDER)

Data Governance & Security

Access Control: Implement granular permissions, column masking, and row-level filters using Databricks Unity Catalog to replace DataStage's legacy security policies

Data Quality: Utilize Delta Live Tables (DLT) to build pipelines with built-in, declarative data quality expectations and monitoring

Additional Skills

Excellent communication skills

Ability to collaborate with legacy and modernized application teams and stakeholders

Base Salary Range: $120,000 to $140,000 per annum

TCS Employee Benefits Summary:

Discretionary Annual Incentive.

Comprehensive Medical Coverage: Medical & Health, Dental & Vision, Disability Planning & Insurance, Pet Insurance Plans.

Family Support: Maternal & Parental Leaves.

Insurance Options: Auto & Home Insurance, Identity Theft Protection.

Convenience & Professional Growth: Commuter Benefits, Certification & Training Reimbursement.

Time Off: Vacation, Time Off, Sick Leave & Holidays.

Legal & Financial Assistance: Legal Assistance, 401K Plan, Performance Bonus, College Fund, Student Loan Refinancing.

Location

Seattle, WA

Job Function

TECHNOLOGY

Role

Senior Engineer

Job Id

416291

Desired Skills

Data Migration

Salary Range

$120,000-$140,000 a year

Desired Candidate Profile

Qualifications: Bachelor of Computer Science
