- Design, develop, and maintain secure and scalable cloud infrastructure platforms using the latest DevSecOps and Platform Engineering methodologies
- Create and implement best practices and processes for code quality, security, performance, and scalability using tools such as SonarQube, Cycode, and FOSSA, along with SAST and DAST scanning
- Strong experience with GCP-specific services such as Compute Engine, Cloud Run, GKE, Cloud Operations suite, Service Mesh, Anthos, Pub/Sub, Dataflow, Cloud Scheduler, Bigtable, AlloyDB, Vertex AI, BigQuery, Cloud SQL, and other managed services
- Experience provisioning Google Cloud infrastructure, including VPCs, subnets, gateways, firewall rules, managed services, and Kubernetes clusters
- Expertise in automating Infrastructure as Code using Terraform, Packer, Ansible, shell scripting, and Argo CD
- Experience implementing autoscaling, Disaster Recovery, High Availability, and multi-region Active/Active and Active/Passive configurations and best practices is an added advantage
- Evaluate and select appropriate technologies and tools to support the development and deployment of products on the eCommerce foundation layer
- Experience with Internal Developer Platforms (IDPs) such as Backstage to improve developer productivity
- Expertise in patch management, and in monitoring and alerting with APM tools such as Dynatrace or AppDynamics as well as Prometheus, Grafana, and the ELK stack
- Experience with Elasticsearch service offerings on Kubernetes (K8s)
- Experience applying Cloud FinOps practices to optimize cloud infrastructure consumption costs
- Excellent communication and interpersonal skills
- Ability to work effectively with team members across global locations
- Proven facilitation skills: able to drive productive discussion among people with diverse perspectives
- Responsible for the overall Infrastructure Architecture and the evolution of next-generation platforms. The ideal candidate will research existing products and recommend solutions for running workloads on a forward-looking Infrastructure Architecture
- Conduct Infrastructure as Code reviews, and automate the provisioning and deployment of cloud infrastructure
- Experience implementing AIOps in the Platform Engineering space to improve Developer Experience
- Identify code vulnerabilities and performance bottlenecks at the infrastructure layer, and recommend solutions to improve the overall quality and performance of the subsystems
- Create and maintain technical documentation, including architecture diagrams, design documents, and operational procedures for High Availability and Disaster Recovery scenarios
- Analyze kernel logs, network statistics, APM metrics, and application logs to troubleshoot CPU, memory, and other resource hot spots, API latency, and application and platform health
- Identify the root causes of, and fix, complex scaling and performance problems in GCP that span multiple teams, networks, and software components
- Build automation for repeatable DevSecOps tasks and help improve software engineers’ productivity