Senior Site Reliability Engineer with AI/ML in San Leandro, CA - Only locals
San Leandro, CA, US
about 1 month ago
Summary
Job Title: Senior Site Reliability Engineer with AI/ML
Location: San Leandro, CA - Onsite
Note: Local candidates only; an in-person interview is required for this role.
10+ years of Software Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
10+ years of experience in Production support/Site Reliability Engineering teams with continued focus on improving Platform health
Familiar with Agile or other rapid application development practices
Hands-on expertise with automated testing, process automation, and building dashboards using APM tools.
Experience with distributed (multi-tiered) systems and algorithms; hands-on experience with Oracle and MongoDB databases.
Knowledge of and exposure to caching tools (Redis, Memcached) or messaging tools such as MQ and Kafka.
Must have working knowledge of APM tools such as Splunk, GCL, ELK, Grafana, Prometheus, etc.
Able to create dashboards using GCL/Splunk/ELK and set up alerts.
Working knowledge of CI/CD is a plus: source control (Git), continuous integration and release (Jenkins, UCD), etc.
Ability to work with engineering teams across the ecosystem, such as Security, Networking, and Infrastructure, on challenges that can impact platform health and resiliency.
Shell scripting and DevOps tools like Ansible, with good knowledge of YAML for writing playbooks.
Experience with distributed storage technologies like NFS, as well as dynamic resource management frameworks and platforms such as PCF, Kubernetes/OpenShift, AWS, or Azure.