Support Engineer

Equifax is seeking creative, high-energy, and driven software engineers with hands-on development skills to work on a variety of meaningful projects. Our software engineering positions give you the opportunity to join a team of talented engineers working with leading-edge technology. You are ideal for this position if you are a forward-thinking, committed, and enthusiastic software engineer who is passionate about technology.

What you’ll do

Monitoring & Observability (Datadog-Focused)

Own Observability: Design, implement, and maintain a comprehensive monitoring strategy using Datadog (Metrics, APM, Logs, Synthetics, and RUM).
Proactive Detection: Build and refine sophisticated dashboards, SLOs/SLIs, and alerts to identify performance bottlenecks and potential failures before they become customer-facing incidents.
Analyze: Use Datadog's full suite to trace complex issues across distributed microservices, from the front end to the database.

Production Support & Incident Management

Incident Command: Act as the technical lead during high-priority production incidents, coordinating cross-functional teams (Development, DevOps, Product) to drive rapid resolution.
Root Cause Analysis (RCA): Conduct thorough, blameless post-mortems to identify the true root cause of incidents, documenting findings and tracking remedial actions.
On-Call: Participate in a rotating on-call schedule, serving as the primary escalation point for all production service issues.
War Room Leadership: Confidently manage "war room" scenarios, clearly communicating status, impact, and needs to both technical and business stakeholders.

Engineering & Automation (The "Dev" Component)

Code-Level Troubleshooting: Utilize your development background (e.g., Python, Go, Java, .NET) to read and understand application code, enabling you to pinpoint bugs and collaborate effectively with development teams on fixes.
Build Tools, Not Toil: Identify and automate repetitive manual tasks (toil) by building scripts, internal tools, and runbooks.
Influence Design: Partner with software engineers to champion "design for production," providing feedback on logging, metrics, and application reliability from the support perspective.

What experience you need

Bachelor's degree or equivalent experience
5+ years in a Production Support, Site Reliability Engineering (SRE), or high-stakes DevOps role.
Datadog Expertise: Extensive, hands-on experience with the Datadog platform. You must be comfortable building complex dashboards, setting up monitors, and using APM and log analytics for deep-dive troubleshooting.
Production Incident Management: Proven track record of leading the response to and resolution of critical incidents in a 24/7, high-availability environment.
Development/Scripting: Strong prior development or scripting knowledge. Must be proficient in at least one language such as Python, Go, Bash, or PowerShell. The ability to read and debug code written in Java or Node.js is a major plus.
Core Tech: Deep understanding of:
- Cloud Platforms (AWS, Azure, or GCP)
- Containerization (Kubernetes, Docker)
- CI/CD Pipelines (Jenkins, GitLab CI, etc.)
Mindset: A calm, methodical, and detail-oriented approach to problem-solving, especially under pressure.

What could set you apart

Datadog Certification(s).
Experience with Infrastructure as Code (Terraform, Ansible).
Knowledge of other observability tools (e.g., Prometheus, Grafana, ELK Stack).
Experience in database performance tuning (SQL or NoSQL).
