We are seeking an experienced DevOps/AIOps Architect to design, architect, and implement an AI-driven operations solution that integrates various cloud-native services across AWS, Azure, and cloud-agnostic environments. The AIOps platform will be used for end-to-end machine learning lifecycle management, automated incident detection, and root cause analysis (RCA). The architect will lead efforts in developing a scalable solution utilizing data lakes, event streaming pipelines, ChatOps integration, and model deployment services. This platform will enable real-time intelligent operations in hybrid cloud and multi-cloud setups.
Responsibilities
Assist in the implementation and maintenance of cloud infrastructure and services
Contribute to the development and deployment of automation tools for cloud operations
Participate in monitoring and optimizing cloud resources using AIOps and MLOps techniques
Collaborate with cross-functional teams to troubleshoot and resolve cloud infrastructure issues
Support the design and implementation of scalable and reliable cloud architectures
Conduct research and evaluation of new cloud technologies and tools
Work on continuous improvement initiatives to enhance cloud operations efficiency and performance
Document cloud infrastructure configurations, processes, and procedures
Adhere to security best practices and compliance requirements in cloud operations
Requirements
Bachelor’s Degree in Computer Science, Engineering, or related field
12+ years of experience in DevOps, AIOps, or cloud architecture roles
Hands-on experience with AWS services such as SageMaker, S3, Glue, Kinesis, ECS, and EKS
Strong experience with Azure services such as Azure Machine Learning, Blob Storage, Azure Event Hubs, and Azure Kubernetes Service (AKS)
Strong experience with Infrastructure as Code (IaC) tools such as Terraform and CloudFormation
Proficiency in container orchestration (e.g., Kubernetes) and experience with multi-cloud environments
Experience with machine learning model training, deployment, and data management across cloud-native and cloud-agnostic environments
Expertise in implementing ChatOps solutions on platforms like Microsoft Teams and Slack, and in integrating them with AIOps automation
Familiarity with data lake architectures, data pipelines, and inference pipelines using event-driven architectures
Strong programming skills in Python for rule management, automation, and integration with cloud services
Nice to have
Any certifications in the AI/ML/Gen AI space