Site Reliability Engineer
  • Posted On: 20/04/2026

  • Makati
  • Site Reliability Engineer
  • Full-Time

About the role

As a Site Reliability Engineer, you are responsible for designing, building and maintaining reliable, secure and scalable systems that support the delivery and operation of high-quality software. You will play a key role in improving platform resilience, observability, automation and incident response, ensuring our products perform reliably for customers now and into the future.

 

 

Key Responsibilities

Reliability Engineering and Platform Operations

  • Design, build and maintain reliable, secure and scalable infrastructure and platform capabilities that support our engineering teams and customer-facing products.
  • Improve system availability, performance and resilience through proactive monitoring, alerting, automation and continuous improvement practices.
  • Implement and maintain observability standards across systems, including logging, metrics, tracing, dashboards and alerting, so that teams have strong visibility into service health and performance.
  • Support production systems across the software delivery lifecycle, including deployments, infrastructure changes, incident response, root cause analysis and post-incident follow-up actions.
  • Drive infrastructure as code and automation practices to reduce manual effort, improve consistency and strengthen operational maturity.
  • Partner with Engineering teams to design systems that are resilient, supportable and scalable, with consideration for failure modes, recovery, security, compliance and long-term maintainability.
  • Contribute to capacity planning, disaster recovery readiness, environment management and platform standardisation.
  • Support and improve CI/CD pipelines, release processes and deployment reliability to enable fast, safe and repeatable delivery.
  • Use AI tools and agents responsibly to improve efficiency in infrastructure management, scripting, troubleshooting and operational workflows, while applying sound engineering judgement and review.
  • Conduct regular reviews of infrastructure, automation and operational processes to ensure alignment with architecture principles, security standards, performance expectations and reliability goals.
  • When presented with a reliability, infrastructure or operational challenge that is not straightforward, take the time to think through multiple approaches. Weigh up the pros and cons of each and document this thought process so it can be peer reviewed before proceeding. Where useful, support this with a proof of concept or prototype.

 

Incident Management and Continuous Improvement

  • Participate in on-call rotations to provide 24/7 incident response, helping to diagnose issues, restore service quickly and communicate clearly with stakeholders during and after incidents.
  • Lead or contribute to root cause analysis and post-incident reviews, ensuring actions are identified, prioritised and followed through to reduce recurrence.
  • Establish and improve service level indicators, service level objectives and reliability standards that help teams make informed trade-offs between speed, stability and operational effort.
  • Identify recurring operational pain points and drive long-term improvements through automation, architecture changes or process refinement.

 

Stakeholder Collaboration

  • Partner closely with Product, Engineering, Security and Delivery to ensure shared understanding of operational risks, platform needs, service reliability and technical trade-offs.
  • Ensure that progress, risks and operational considerations for your current workload are clearly articulated to your manager and relevant stakeholders. If you are stuck, communicate this early so help can be provided. Timebox your work if you are concerned something may blow out.
  • Work with developers and technical leaders to embed reliability, observability and operational excellence into day-to-day engineering practices.
  • Showcase your work to your peers and the wider Engineering team. We encourage sharing what you have created in our Showcases, or you can record a quick Loom video and share it in Slack.
  • Work with technical leadership to help shape the reliability, platform and operational direction of the business, including solving existing latent issues as well as maximising the use of emerging technologies and approaches.

 

 

Requirements

Skills & Experience

  • Strong experience in Site Reliability Engineering, DevOps, Platform Engineering or Infrastructure Engineering within a SaaS or product-led environment, using an Agile delivery model, and with a strong sense of end-to-end ownership and pride in your work.
  • Experience with cloud infrastructure, containerisation, CI/CD pipelines, infrastructure as code and modern monitoring and observability tooling.
  • Experience supporting production systems, including incident management, troubleshooting, root cause analysis and operational improvement.
  • Experience with scripting or software engineering to build automation, improve platform capability and reduce manual effort.
  • Familiarity with technology stacks such as Docker, Git, SQL-based databases and modern application environments. Experience supporting PHP/Laravel, React/TypeScript or comparable product stacks will be highly regarded.
  • Experience managing, tuning, and optimising databases, schemas, and queries to ensure performance and reliability.
  • A healthy attitude towards, and demonstrated experience with, AI-assisted engineering and operational practices, using AI to increase efficiency and quality while maintaining strong engineering oversight.
  • Demonstrated experience influencing technical decision-making and improving reliability, operational standards and platform practices as part of a team.
  • Strong collaboration skills across cross-functional teams.
  • Ability to balance reliability, security and technical excellence with business priorities and delivery needs.
  • Clear and effective communication skills.
  • A curious mindset and a willingness to keep learning new things.

 

Behavioural / mindset attributes

  • Demonstrated curiosity and experimentation mindset around AI tooling, infrastructure automation and operational improvement, staying current with new capabilities and actively running small trials to validate value.
  • Pragmatic approach to AI adoption, able to balance enthusiasm with engineering discipline, ensuring AI tools augment rather than replace critical thinking, design, review and operational accountability.
  • Strong change leadership skills, able to influence engineers with varying levels of comfort across reliability practices, automation and AI usage, address scepticism with data, and create a culture of responsible, accountable usage.
  • Calm and methodical under pressure, with the ability to respond effectively to incidents and operational challenges.
  • Strong systems thinking, with an ability to identify patterns, anticipate risks and design for resilience.
  • Have an opinion on what we are doing well and where we could improve. Constructive feedback is always welcome, in particular when you can walk the walk and take it upon yourself to make things better.

 

 

Work Details

  • Shift: Monday to Friday, 6:00am to 3:00pm or 7:00am to 4:00pm PH Time, with 24/7 on-call participation, depending on business needs.
  • Location: Makati (work from home until further notice)
  • Status: Full-time or Contractor Set-up.
