Responsible for the operation and maintenance of cloud resources such as network, computing, storage, databases, and security for Huawei Cloud NA customers, including monitoring and inspection, problem handling, changes, emergency response to failures, risk management, continuous delivery, and automation of operations.
Responsible for service availability architecture design at Huawei Cloud sites: supporting customers with performance stress testing, fault drills, monitoring system optimization, security hardening, and cloud-native migration and transformation, and leading the optimization and remediation of key risk points.
Responsible for building operational capabilities for Huawei Cloud NA customers, developing operational platform capabilities, and continuously improving the quality and efficiency of cloud operation and maintenance services.
Implement or recommend optimizations and transformations to meet customer business needs, such as performance and capacity assessment, rate limiting and service degradation plans, complex SQL optimization, and operational process improvement.
Provide service assurance for NA customers during major holidays, ensuring the overall stability and continuity of their business.
Engage in daily technical exchanges with users, summarize on-site work, and report regularly.
Job Requirements:
Bachelor's degree or above in a science or engineering field; computer science, communications, software engineering, and related majors preferred.
More than 5 years of cloud operation and maintenance experience; proficient in Linux, with an understanding of TCP/HTTP protocols, the ability to analyze common network issues, and familiarity with common services such as DNS, NTP, and DHCP.
Strong experience in troubleshooting, performance analysis, and optimization; capable of diagnosing and resolving a wide range of issues at the cloud platform level.
Familiar with Huawei Cloud IaaS/PaaS/SaaS products, with Huawei Cloud HCCDP/HCIE certification preferred.
Familiarity with open-source monitoring/log analysis systems (such as Prometheus, Zabbix, ELK) is preferred.
Familiarity with container and Kubernetes principles and daily operations is preferred.
Good team collaboration and communication skills, willingness to share, and a strong customer service mindset.
Strong logical thinking and problem analysis skills, with the ability to troubleshoot complex network problems, isolate the root cause, and provide a suitable resolution.
Excellent English oral and written communication skills.