Infrastructure Tier 2 Engineer

Noida, India

Dear Candidate,

Please find the job description (JD) below.

Location - Chennai, Bangalore, Hyderabad, Pune and Noida

Experience - 7+ years

Primary Monitoring & Incident Response

Provide 24×7 monitoring of Azure infrastructure (compute, network, storage) using tools such as Azure Monitor, Splunk, DynaTrace, and custom dashboards.
Respond to alerts and triage P1/P2 escalations via ServiceNow war rooms, performing initial diagnosis and remediation where possible.
Adhere to incident, change, and exception management processes.

Capacity & Availability Management

Identify scaling opportunities for virtual machines or services as required, and identify zone-redundancy patterns for performance.
Track capacity forecasts and proactively identify performance bottlenecks.

Backup & Restore Operations

Execute frequent backups (Azure Backup, NetApp Snapshots) and perform basic restore tasks to ensure business continuity.
Conduct routine backup verifications/tests to confirm data integrity.

Access & Permissions Management

Maintain Azure/NetApp file shares, setting up and adjusting access controls and AD group permissions according to organizational policy.
Perform periodic identity and access reviews to enforce the principle of least privilege.

Logging & Metrics Oversight

Oversee monitoring agents (e.g., Splunk, DynaTrace, Azure Alerts, SystemPulse), ensuring they are up to date and generating the right alerts and metrics for L2 to act upon.
Collaborate with L3 to fine-tune alert thresholds and logging when chronic issues emerge.

Basic Performance Testing

Execute routine performance checks (e.g., load or stress tests) in coordination with L3 teams when potential service degradation is suspected.
Document and escalate consistent performance anomalies.

SKILL SET & STAFFING CONSIDERATIONS

Comfortable reading and troubleshooting logs/metrics (Splunk, DynaTrace, Azure Monitor).
Familiar with Azure Backup services, basic restore procedures, and file share permissions.
Proficient in ticketing systems (ServiceNow) and in collaborating with other technical teams on escalations.
Sufficient knowledge to follow runbooks and standard operating procedures (SOPs).
Able to keep documentation of SOPs and IaC changes continuously updated in a central repository (e.g., Git repos).
Familiar with Epic implementations (on-premises or cloud).
