Arivazhagan Pandiyan

Site Reliability Engineer Lead specializing in fault-tolerant, scalable infrastructure on Microsoft Azure, with a focus on observability and automation.

arivu.p@live.in
+1 346-599-0347
Houston, Texas, USA

About Me

I am a seasoned Site Reliability Engineer with over 10 years of experience in designing and maintaining high-availability systems for mission-critical applications in the financial and healthcare sectors.

My expertise lies in the Microsoft Azure ecosystem (AZ-104 certified), where I architect scalable cloud solutions, automate complex workflows using PowerShell and Python, and implement observability strategies using tools like Datadog, Dynatrace, and Zabbix. I am passionate about reducing toil through automation and improving system reliability.

Reliability
Automation
Cloud Native
Leadership

Professional Experience

Site Reliability Engineer Lead

Aug 2023 – Jul 2024
New American Funding
  • Defined SLIs/SLOs and led incident response, post-mortems, and Root Cause Analysis (RCA) to drive continuous improvement.
  • Designed scalable, highly available systems on Microsoft Azure, ensuring 99.99% uptime for critical services.
  • Drove automation of repetitive tasks and deployment tools using CI/CD pipelines, reducing manual effort by 40%.

Technology Operations Associate

Oct 2017 – Oct 2022
Wells Fargo India Solutions
  • Managed and monitored internal servers, applications, and hardware inventory using ServiceNow.
  • Developed PowerShell scripts for automated server troubleshooting (CPU/Disk/Memory analysis) and reporting.
  • Coordinated extensively with vendors (IBM, DELL, HDS, Brocade) to resolve critical hardware failures and production issues.

System Administrator

Sept 2015 – Oct 2017
NTT Data Global Delivery
  • Monitored and managed live Windows and Linux production servers in a 24/7 environment.
  • Rolled out IT Service Delivery tools, focusing on change management processes and automation strategies.

Support Engineer

Nov 2014 – Sept 2015
First Source Solution
  • Managed background applications and monitored server health; addressed alerts for Disk, CPU, and Memory utilization.
  • Handled customer inquiries and incident tracking using the Kayako ticketing system.

Technical Assistant Engineer

June 2013 – Aug 2014
Cliptos Technologies
  • Installed and maintained software packages and provided technical support for desktop systems.
  • Assisted users with application installation and configuration for the healthcare domain (MEDITOS).

Technical Proficiency

Core Competencies

Microsoft Azure, SRE Leadership, Cloud Architecture, Uptime Optimization, Incident Management, System Monitoring

Tools & Languages

PowerShell, Python, Shell Scripting, KQL, Datadog, Dynatrace, Zabbix, PagerDuty, Terraform

Infrastructure

Windows Server, Linux/Unix, VMware, Cisco Networking, NetApp, Pure Storage

Certifications

AZ-104 Azure Administrator Associate
Datadog Fundamentals Certification