EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
We are seeking an experienced DevOps/AIOps Architect to design, architect, and implement an AI-driven operations solution that integrates various cloud-native services across AWS, Azure, and cloud-agnostic environments. The AIOps platform will be used for end-to-end machine learning lifecycle management, automated incident detection, and root cause analysis (RCA). The architect will lead efforts in developing a scalable solution utilizing data lakes, event streaming pipelines, ChatOps integration, and model deployment services. This platform will enable real-time intelligent operations in hybrid cloud and multi-cloud setups.
Responsibilities
Assist in the implementation and maintenance of cloud infrastructure and services
Contribute to the development and deployment of automation tools for cloud operations
Participate in monitoring and optimizing cloud resources using AIOps and MLOps techniques
Collaborate with cross-functional teams to troubleshoot and resolve cloud infrastructure issues
Support the design and implementation of scalable and reliable cloud architectures
Conduct research and evaluation of new cloud technologies and tools
Work on continuous improvement initiatives to enhance cloud operations efficiency and performance
Document cloud infrastructure configurations, processes, and procedures
Adhere to security best practices and compliance requirements in cloud operations
Requirements
Bachelor’s Degree in Computer Science, Engineering, or related field
12+ years of experience in DevOps, AIOps, or Cloud Architecture roles
Hands-on experience with AWS services such as SageMaker, S3, Glue, Kinesis, ECS, EKS
Strong experience with Azure services such as Azure Machine Learning, Blob Storage, Azure Event Hubs, Azure Kubernetes Service (AKS)
Strong experience with Infrastructure as Code (IaC) tools such as Terraform and AWS CloudFormation
Proficiency in container orchestration (e.g., Kubernetes) and experience with multi-cloud environments
Experience with machine learning model training, deployment, and data management across cloud-native and cloud-agnostic environments
Expertise in implementing ChatOps solutions using platforms like Microsoft Teams and Slack, and integrating them with AIOps automation
Familiarity with data lake architectures, data pipelines, and inference pipelines using event-driven architectures
Strong programming skills in Python for rule management, automation, and integration with cloud services
Nice to have
Any certifications in the AI/ML/GenAI space
We offer
Opportunity to work on technical challenges that may have an impact across geographies
Vast opportunities for self-development: online university, knowledge sharing opportunities globally, learning opportunities through external certifications
Opportunity to share your ideas on international platforms
Sponsored Tech Talks & Hackathons
Unlimited access to LinkedIn Learning solutions
Possibility to relocate to any EPAM office for short and long-term projects
Focused individual development
Benefit package:
Health benefits
Retirement benefits
Paid time off
Flexible benefits
Forums to explore passions beyond work (CSR, photography, painting, sports, etc.)