IntelliPro

Site Reliability Engineer

Sunnyvale, CA, US

11 days ago

Summary

Responsibilities:

● Design and develop solutions that automate the technical operations of large-scale systems, and work closely with other teams to improve stability across the software development lifecycle.

● Lead technical efforts to strengthen payment system stability, including but not limited to monitoring, logging, dashboards, and diagnostic tools; conduct regular drills and develop remediation plans to achieve fast service restoration; and take on-call shifts to respond to production issues across regions.

● Define metrics to evaluate system performance and runtime behavior, improving observability and supporting development and troubleshooting; plan system capacity according to business growth and scheduled promotions.

● Analyze production cases, such as performance bottlenecks, to derive technical best practices and help build a highly available payment architecture.

● Design and set up new IDCs (Internet data centers); design and implement data protection plans that meet standard requirements.
