Role Summary
Own, operate, and continuously improve the company's IT estate with a Linux-first mindset. This hands-on manager leads Linux/Unix systems administration, cloud server setup and operations, data security, enterprise Internet management for staff, and network reliability, while running a small helpdesk/ops team. You’ll keep operations steady under load, resolve ongoing issues quickly, and drive automation and hardening.
Key Responsibilities
Linux/Unix Systems
- Install, harden, patch, and administer Linux servers (RHEL/Ubuntu/SLES) across on-prem and cloud.
- Manage users, storage, filesystems, services, SELinux/AppArmor, and configuration baselines.
- Standardize golden images and build repeatable provisioning with IaC.
Cloud (Setup & Operations)
- Design and stand up cloud-based servers and services (AWS/Azure/GCP): VPC/VNet, subnets, routing, security groups/NSGs, VPNs, load balancers, autoscaling, snapshots/backups.
- Implement identity & access (IAM roles/policies), secret management, logging/monitoring (CloudWatch/Cloud Monitoring), and cost controls.
- Plan migrations and hybrid networking (site-to-site VPN/Direct Connect/ExpressRoute).
Internet & Network Management for Staff
- Own business Internet: primary/backup ISP links, SD-WAN/failover, bandwidth/QoS for critical apps (ERP, Zoom/Teams), and proactive capacity planning.
- Run secure web access: DNS security, web proxies/content filtering, WAF/CDN where needed; manage guest/BYOD and corporate VLANs with NAC.
- Troubleshoot L2/L3 issues (DHCP/DNS, VLANs, routing, Wi-Fi, VPNs, firewalls, proxies, load balancers) with rapid response during business hours and on-call.
Reliability, Monitoring & Troubleshooting
- Operate observability stacks (syslog, SNMP, Prometheus/Zabbix/Nagios, ELK/Graylog); define SLOs/alerts and tune performance.
- Lead incident response (P1/P2), maintain runbooks, perform RCAs, and eliminate recurring incidents.
Data Security & Compliance
- Own data security end-to-end: encryption at rest/in transit, backup/restore testing, DLP controls, endpoint protection/EDR, patch/vulnerability management, SIEM logging, least privilege.
- Manage SSO (SAML/OIDC), MFA, and key management (e.g., KMS).
- Support audits and policies (ISO 27001/SOC 2 as applicable) and ensure vendor/SaaS security reviews are carried out.
Automation & Tooling
- Automate ops with Bash/Python and config management (Ansible/Puppet/Chef); maintain Infrastructure as Code (e.g., Terraform) for servers, networks, and cloud resources.
- Apply basic CI/CD practices to infrastructure changes, under change control.
Team, Service & Vendor Management
- Lead 2–6 IT staff/contractors; manage ticket queues, SLAs, coaching, and performance.
- Run ITIL-lite processes (incident/change/problem), maintain asset inventory/CMDB and SOPs.
- Own vendor relationships, annual maintenance contracts (AMCs), licensing, and procurement.
Qualifications (Must-Have)
- Education/Experience: Bachelor’s in CS/IT (or equivalent experience). 5–8+ years in IT infrastructure with 2+ years in a lead/manager role.
- Linux/Unix: Strong administration skills at scale (systemd, networking, package management, shell scripting).
- Cloud Setup/Operations: Hands-on experience setting up and operating servers/services in AWS/Azure/GCP (networking, IAM, monitoring, HA/backup).
- Networking/Internet: Solid TCP/IP fundamentals; proven L2/L3 troubleshooting; Internet link management, QoS, SD-WAN/failover.
- Data Security: Practical hardening, patch orchestration, EDR/SIEM, backups/DR, identity/SSO, secrets, encryption, and DLP basics.
- Automation: Bash and at least one of Python/Ansible; Git; IaC (Terraform preferred).
- Operations: Comfortable owning P1/P2 incidents, writing RCAs, and driving permanent fixes.
Nice-to-Have
- Certifications: RHCSA/RHCE, Linux+, CCNA, Network+, ITIL Foundation, AWS/Azure Associate/SysOps.
- Experience with: HAProxy/Nginx, pfSense/Fortinet/Cisco/MikroTik/Meraki, FreeIPA/AD/SSO, PostgreSQL/MySQL ops, ZFS, NFS/SMB, MDM (Intune/Jamf), Kubernetes.
- Exposure to ISO 27001/SOC 2 and multi-site (plant/warehouse) networks.
Tools & Stack (adjust to your environment)
Linux (RHEL/Ubuntu), VMware/KVM/Proxmox, Cisco/MikroTik/Meraki/Aruba, pfSense/Fortinet/ASA, HAProxy/Nginx, Bind/Unbound, Cloudflare/WAF/CDN, Prometheus/Zabbix/Nagios, ELK/Graylog, Git, Ansible/Terraform, Bash/Python, AWS/Azure (core compute, storage, IAM, networking).
KPIs / Success Metrics
- Core service & network uptime; Internet link availability and failover success
- MTTR/MTBF; % recurring incidents eliminated
- Patch & backup compliance; successful restore tests
- Helpdesk SLAs (first response/resolution); CSAT
- Security findings closed on time; audit readiness
- Cloud & connectivity cost efficiency vs. budget
Job Type: Full-time
Language:
Work Location: In person