Senior AWS Site Reliability Engineer (SRE)
  • Posted On: 25/07/2025

  • Makati | Remote
  • Full-Time

We are seeking an experienced Senior AWS Site Reliability Engineer to join our cross-functional cloud platform team. Working alongside a diverse group of DevOps and Site Reliability Engineers, you will combine deep technical expertise in AWS cloud infrastructure with strong leadership in incident response and system reliability. In this role, you will lead incident response and maintain, optimise, and scale our cloud infrastructure while ensuring exceptional system reliability and performance.

KEY RESPONSIBILITIES:

  • Lead incident response end to end: initial detection, real-time mitigation, root cause analysis, post-mortem documentation (using incident.io), and implementation of lessons learned, with a focus on continuous improvement
  • Develop and execute comprehensive incident response strategies to minimise downtime and business impact
  • Participate in a 24/7 on-call rotation to ensure continuous system availability
  • Implement and maintain comprehensive observability solutions using CloudWatch, Datadog, or similar monitoring platforms
  • Maintain, improve, and optimise AWS infrastructure using Terraform while ensuring scalability, reliability, and cost efficiency
  • Continuously assess and enhance AWS infrastructure to optimise performance and cost-effectiveness
  • Monitor and optimise serverless technologies including AWS Lambda and API Gateway for peak performance and cost efficiency
  • Monitor and maintain ECS Fargate deployments for containerised applications, ensuring optimal resource utilisation
  • Collect and analyse metrics to identify resource consumption, abnormal behaviour, and potential performance bottlenecks
  • Configure and manage alerting, dashboards, and automated monitoring across distributed systems
  • Foster improved collaboration between development and operations teams by implementing SRE practices

 

REQUIRED QUALIFICATIONS:

  • Previous experience in a DevOps or SRE role
  • Exceptional written and verbal communication skills
  • Proven experience in incident response and 24/7 on-call responsibilities
  • Expert-level knowledge of Infrastructure as Code, primarily Terraform (demonstrated experience with other IaC tools will be highly regarded)
  • Expert-level knowledge of AWS compute infrastructure
  • Proficiency in automation tools and scripting languages
  • Strong understanding of monitoring, metrics collection, and performance analysis
  • Expert knowledge of observability and monitoring platforms such as Datadog, New Relic, Prometheus, or similar tools
  • Experience with log aggregation, APM (Application Performance Monitoring), and distributed tracing
  • Excellent collaboration abilities and capacity to work effectively in cross-functional teams
  • Strong analytical and problem-solving skills
  • Demonstrated ability to work autonomously and take ownership

 

PREFERRED QUALIFICATIONS:

  • Experience with incident.io (highly desirable).
  • Background in payments and PCI compliance environments (highly desirable).
  • AWS certifications.
  • Experience with container orchestration and microservices architecture.
  • Knowledge of security best practices in cloud environments.

 

WORK DETAILS:

  • Schedule: Monday to Friday, 6:00am to 3:00pm or 7:00am to 4:00pm (PH Time), depending on business needs
  • Location: Makati | Work from Home Until Further Notice
