Responsibilities
Ensure site reliability by managing the deployment, scaling, and maintenance of new and existing online services that connect over a billion users around the world
Apply your engineering skills while working directly with developers to test and diagnose issues with newly deployed services, infrastructure resources, or code before and after they reach the production environment
Manage high-severity incidents and incidents impacting end users by focusing on service monitoring, alerts, and rapid recovery
Use stress testing to measure, tune, and optimize system performance and reliability for a wide variety of services
Develop and maintain automation tools and systems to eliminate repetitive manual operations and improve site reliability
Produce and maintain documentation and standard operating procedures (SOPs), in conjunction with colleagues around the world, so that regular operations are handled more efficiently and reliably
Work Location: Singapore-CapitaSky
Requirements
Engineering, or related fields
Prior work experience in Cloud Engineering, Site Reliability Engineering (SRE), or DevOps for a major, public-facing internet service
Hands-on experience with at least one of the following programming languages: Bash, Go, Python
Good command of the Linux environment, with a deep understanding of the Linux operating system, including the kernel, memory, processes, threads, static and shared libraries, IPC, RPCs, and signals
Understanding of standard networking protocols such as HTTP, DNS, SSL, TCP/IP, and ICMP
Experience in large-scale distributed environments
Familiarity with distributed systems, including the CAP theorem and microservices
Experience with container technologies such as Docker and Kubernetes
Experience with monitoring tools such as Prometheus and Zabbix
Demonstrated strong sense of ownership, reliability, and integrity
Passion for eliminating repetitive manual processes via automation
Fast learner and a good team player
Fluency in both English and Mandarin to work with international stakeholders and stakeholders based in HQ