Qureos

Senior PySpark Data Engineer

Roles & Responsibilities
Job Title: Data Engineer


Job Description:


We are seeking a highly skilled and motivated Data Engineer to design, build, and optimize our next-generation scalable data pipelines. This position requires expertise in processing massive datasets with Apache Spark, PySpark, and Hive in a cloud environment. Your primary objective will be to ensure data reliability, speed, and efficiency, providing a robust foundation for downstream business intelligence and advanced analytics initiatives.


Roles & Responsibilities:
  • Data Pipeline Development & Maintenance: Design, build, and maintain highly scalable and efficient ETL/ELT data pipelines utilizing PySpark and Spark SQL for complex data transformations.
  • Cloud Data Infrastructure Management: Deploy, manage, and scale critical data infrastructure components on leading cloud platforms such as Amazon Web Services (AWS) (e.g., EMR, Glue), Microsoft Azure (e.g., Databricks, Synapse), or Google Cloud Platform (GCP).
  • Data Warehousing & Storage Optimization: Strategically manage data layout, partitioning, and indexing within Apache Hive and various cloud data lake solutions to optimize performance and accessibility.
  • Performance Tuning & Optimization: Proactively identify and resolve performance bottlenecks in Spark jobs, using the Spark UI for in-depth analysis, managing data skew, and optimizing memory utilization.
  • Diverse Data Integration: Develop robust solutions for ingesting high-volume and diverse datasets from both structured relational databases and unstructured flat files into our data ecosystem.
  • Automated Workflow Orchestration: Implement and manage automated data workflows using industry-standard scheduling tools like Apache Airflow or platform-native schedulers, ensuring timely and reliable data delivery.
  • Strategic Collaboration: Partner closely with data scientists, business analysts, and cross-functional enterprise teams to translate complex business requirements into technically sound and efficient data solutions.


Qualifications:


  • Big Data Frameworks Expertise: Demonstrated high proficiency in Apache Spark architecture, including a deep understanding of drivers, executors, and Directed Acyclic Graphs (DAGs).
  • Advanced Programming: Exceptional coding skills in Python and extensive experience with the PySpark API for developing intricate data transformations and processing logic.
  • Querying & Schema Management: Strong command of HiveQL and ANSI SQL, coupled with expertise in data partitioning techniques and effective schema definition.
  • Optimized Storage Formats: In-depth understanding and practical experience with optimized big data storage file formats such as Parquet, ORC, and Avro.
  • Cloud Ecosystem Development: Hands-on development experience utilizing cloud-native big data utilities (e.g., AWS EMR, Azure Databricks) within major cloud platforms.
  • Data Warehousing Fundamentals: Solid foundation in Dimensional Data Modeling, including Star and Snowflake schemas, and practical experience with Data Lakes concepts and implementation.
Preferred Qualifications:
  • CI/CD & DevOps Automation: Experience with Continuous Integration/Continuous Deployment (CI/CD) practices and automation tools like Git, Jenkins, or Ansible.
  • NoSQL Database Integration: Exposure to and experience with NoSQL databases such as HBase, Cassandra, or MongoDB.
  • Professional Cloud Certifications: Relevant professional cloud certifications (e.g., AWS Certified Data Engineer, Microsoft Certified: Azure Data Engineer Associate) are highly valued.




Salary Range: $125,000 to $140,000 per year


Location: Irving, TX
Job Function: TECHNOLOGY
Role: Engineer
Job Id: 416953
Desired Skills: Hadoop
Salary Range: $125,000-$140,000 a year

Desired Candidate Profile

Qualifications: Bachelor of Computer Science

© 2026 Qureos. All rights reserved.