The CTS Enterprise Analytics Services (EAS) organization is actively recruiting for a strong Platform Engineer to work on a broad spectrum of engineering initiatives. The EAS organization drives the enterprise-wide strategy for engineering and managing best-in-class data and analytics services, including Big Data platforms, Spark services, and AI & ML services. The role requires a thought leader who can perform hands-on work in partnership with key stakeholders, architects, engineers, data scientists, and DevOps teams to engineer and deliver highly resilient solutions.
Responsibilities:
- Assess the current landscape and book of work, and partner with various teams to identify key areas for infrastructure automation, configuration management, monitoring, alerting, etc.
- Continuously design and improve processes for detecting and responding to production service outages, and build preventive solutions.
- Act as the subject matter expert in Site Reliability Engineering to help drive the engineering vision set by EAS stakeholders.
- Produce availability and performance metrics for services and deliver processes to improve on major KPIs.
- Operationalize highly available services deployed across multi-region and multi-data center environments.
- Handle outages, perform root cause analysis, and provide architectural and engineering recommendations.
- Build an internal knowledge base to educate partners and support teams.
Skills:
- Proven track record in system design for highly available platforms and services supporting various types of workloads.
- Experience in designing fail-over processes and solutions.
- Strong scripting skills – shell scripts, Python, Perl, etc.
- Experience with virtualization, containerization, and cloud technologies – Docker, Kubernetes, and cloud service providers, e.g. GCP, AWS, etc.
- Analytical thinker able to assess the various aspects of a problem and methodically arrive at a solution.
- Hands-on experience in gathering performance metrics, troubleshooting, tuning, monitoring, etc.
- Experience with monitoring and logging solutions and frameworks, e.g. OTEL, Grafana, Prometheus, Kibana, Splunk, etc.
- Hands-on experience installing, configuring, and troubleshooting Linux-based environments.
- Experience in IaC and CI/CD tooling, e.g. Terraform, Jenkins, Harness, etc.
- Strong knowledge of configuration management tools, e.g. Ansible and/or Chef.
- Familiarity with GPU management in virtualized enterprise environments.
- Good understanding of security concepts and best practices.
- Excellent written and verbal communication skills.
- Good team player who is interested in sharing knowledge, cross-training other team members, and learning new technologies and products.
- Ability to work in a matrixed environment and follow procedures, processes, and policies.
- Experience managing vendor interactions for troubleshooting sessions, enhancement requests, and guiding vendor roadmaps to meet Citi standards and functional requirements.
- Self-starter who works with minimal supervision and can work in a team with diverse skills and geographies.
Skills:
- Proven track record of designing and supporting highly available platforms and services for various types of workloads.
- Experience in designing fail-over processes and solutions.
- Strong scripting skills – shell scripts, Python, Perl, etc.
- Experience with virtualization, containerization, and cloud service providers – VMware, Docker, Kubernetes, AWS, GCP, etc.
- Analytical thinker able to assess various aspects of a work item and methodically arrive at a solution.
- Hands-on experience in gathering performance metrics, troubleshooting, tuning, monitoring, etc.
- Hands-on experience installing, configuring, and troubleshooting Linux-based environments.
- Expertise with infrastructure automation, build, and deployment technologies for IaC and CI/CD, e.g. Terraform, Ansible, Harness, ArgoCD, etc.
- Good understanding of security concepts and best practices.
- Strong experience with logging, monitoring, tracing, and visualization stacks, e.g. ELK, Splunk, Grafana, etc.
- Excellent written and verbal communication skills.
- Good team player who is interested in sharing knowledge, cross-training other team members, and learning new technologies and products.
- Experience managing vendor interactions for troubleshooting sessions, enhancement requests, and guiding vendor roadmaps to meet Citi standards and functional requirements.
- Self-starter who works with minimal supervision and is able to work in a team with diverse skills and geographies.
Job Family Group: Technology
Job Family: Systems & Engineering
Time Type: Full time
Most Relevant Skills: Please see the requirements listed above.
Other Relevant Skills: For complementary skills, please see above and/or contact the recruiter.
Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law.
If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.
View Citi’s EEO Policy Statement and the Know Your Rights poster.