Senior DevOps Engineer role at Rakuten
Language Requirements:
- English: Business level
- Japanese: Not required
Position details:
We are looking for a senior DevOps engineer to support and scale our AI product offerings in the AI For Business Department by building robust infrastructure, automation pipelines, and monitoring and observability services.
Responsibilities:
- Design, implement, and maintain scalable and secure CI/CD pipelines using GitHub Actions for microservices powering LLM-based AI APIs
- Build and manage cloud infrastructure (primarily Azure and GCP) using Infrastructure-as-Code (IaC) tools such as Terraform
- Establish observability standards (logging, monitoring, alerting) across services using tools such as Prometheus, Grafana, and Cloud Logging
- Implement automated testing, release workflows, and deployment strategies (blue-green, canary) to ensure high availability and reliability
- Collaborate closely with backend engineers, ML engineers, and product teams to align DevOps practices with product goals
- Lead incident response, troubleshoot production issues, and implement long-term reliability improvements
- Improve build performance, security posture, and developer experience through better tooling and automation
Required Skills:
- Bachelor’s degree in Computer Science, Computer Engineering, or related technical discipline
- 3–5+ years of experience in a DevOps, SRE, or platform engineering role
- Strong experience with CI/CD pipelines using GitHub Actions (including workflows, matrix builds, secrets, and caching)
- Proficient in cloud infrastructure provisioning and management (especially in Azure and/or GCP)
- Deep understanding of Linux systems, containerization (Docker), and orchestration (e.g., Kubernetes)
- Solid scripting/programming skills in Python, Shell, or Go for automation and tooling
- Experience in API development (e.g., FastAPI)
- Experience managing infrastructure security: network policies, firewalls, IAM, secret management
- Strong English communication skills and a collaborative mindset to work with cross-functional teams
Desired Skills:
- Experience operating AI/ML pipelines or APIs in production and managing multiple application environments
- Experience performing zero-downtime deployments, including database schema changes
- Familiarity with Infrastructure-as-Code tools like Terraform, Pulumi, or Bicep
- Knowledge of monitoring/observability stacks (Prometheus, Grafana, ELK, OpenTelemetry)
- Exposure to multi-cloud environments and hybrid cloud strategies
- Interest in Generative AI and LLM infrastructure (e.g., model deployment, inference scalability, vector databases)
- Contributions to developer experience: local dev environments, automated onboarding, lint/test automation