We are looking for a Senior AWS HPC Engineer to strengthen the high-performance computing solutions our team delivers. In this role, you will deploy and manage Open OnDemand portals integrated with AWS services to run HPC workloads. Join our cloud and HPC initiatives to improve computing performance for our clients.
Responsibilities
- Deploy, configure, and tailor the Open OnDemand web portal for HPC workloads
- Integrate Open OnDemand with AWS services such as EC2, S3, FSx, ParallelCluster, and IAM
- Collaborate with AWS and HPC teams to ensure smooth user access and workload operation
- Uphold high standards for performance, scalability, and cost efficiency
- Develop automation and infrastructure-as-code solutions using Terraform, CloudFormation, or Ansible
- Create and maintain comprehensive documentation for architecture, configurations, and processes
- Apply security measures, including Identity and Access Management (IAM) integration
- Monitor and troubleshoot performance issues in the HPC environment
- Optimize Linux system configurations to support HPC workloads
- Coordinate with various teams to meet client demands
- Assess and recommend new technologies to improve HPC infrastructure
Requirements
- 3+ years of experience managing HPC environments
- Proven hands-on experience deploying and maintaining Open OnDemand
- Knowledge of Linux systems administration
- Background in AWS HPC services and cloud architecture
- Proficiency in scripting languages such as Bash, Python, or Ruby
- Familiarity with authentication and user management systems like LDAP, SSO, or Keycloak
- Ability to work with Identity and Access Management (IAM) policies
- Competency in creating infrastructure as code with Terraform, CloudFormation, or Ansible
- Strong analytical and communication skills
- English proficiency at B2 level or higher
Nice to have
- AWS certification such as Solutions Architect, SysOps, or DevOps
- Experience with AWS ParallelCluster or similar HPC orchestration tools
- Knowledge of SLURM or other workload managers
We offer
- CONTINUOUS UPSKILLING, LEARNING & DEVELOPMENT
  - Diversity of tasks and projects
  - Assessment center for objective review of competency level
  - Personal development plan
  - Mentoring programs and leadership development
  - Certification and professional development support
  - Access to learning platforms including more than 2,500 internal courses and the LinkedIn Learning library with 20,000+ courses
  - English courses taught by certified teachers
- CORPORATE BENEFITS
  - Extra leave days
  - Referral bonuses
- COMPENSATION PACKAGE
  - Competitive compensation paid in USD
  - Regular salary and performance reviews
- MEDICAL & HEALTHCARE
  - Private health insurance
  - Well-being events
- WORKING ENVIRONMENT
  - Recreation areas and kitchens
  - Tea, coffee, and snacks
  - Well-being events
  - Sports equipment and game consoles
  - IT Equipment
  - Microsoft's Software Assurance Home Use Program (HUP)
EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.