JOB OBJECTIVE
The DevOps/Site Reliability Engineer (SRE) is responsible for ensuring the reliability, scalability, and performance of the company's infrastructure and applications. This role involves automating processes, managing CI/CD pipelines, monitoring systems, and collaborating with development teams to enhance deployment processes and incident response. The objective is to streamline operations, improve system resilience, and ensure seamless integration between development and operations.
KEY ACCOUNTABILITIES (including but not limited to)
Infrastructure Management
- Design, implement, and manage scalable, resilient infrastructure.
- Ensure high availability, performance, and reliability of production and non-production environments.
CI/CD Pipeline Development
- Develop, maintain, and enhance CI/CD pipelines to automate application deployment and infrastructure provisioning.
- Implement testing, integration, and deployment automation to ensure rapid, reliable releases.
Monitoring & Incident Response
- Set up and maintain monitoring and alerting systems to track application and infrastructure performance.
- Respond to incidents, troubleshoot issues, and conduct root cause analysis to prevent recurrence.
- Implement and refine incident management processes to minimize downtime and ensure swift recovery.
Automation & Scripting
- Automate routine operational tasks using scripting languages like Python, Bash, or PowerShell.
Collaboration & Communication
- Work closely with development teams to ensure seamless integration of new features and services.
- Collaborate with security teams to implement security best practices and ensure compliance with security standards.
- Communicate effectively with stakeholders, providing updates on infrastructure health, performance, and incident resolution.
Performance Optimization
- Continuously monitor and optimize system performance, identifying and resolving bottlenecks and inefficiencies.
- Implement load testing and performance tuning to ensure applications can scale to meet user demand.
Documentation
- Create and maintain detailed documentation for infrastructure, deployment processes, and operational procedures.
- Ensure that troubleshooting guides are up-to-date and accessible to the team.
Security & Compliance
- Implement and maintain security best practices in infrastructure and deployment processes.
QUALIFICATIONS, EXPERIENCE, SKILLS
Education: Bachelor's degree in computer science or computer engineering.
Skills
Very good command of English is a must.
Experience in Linux administration
Knowledge of database administration: Elasticsearch / MongoDB / PostgreSQL
Profound knowledge of scripting: Python / Bash
Ability to balance simultaneous projects, evaluate workload and prioritize tasks based on criticality
Experience with Django and Python is a plus
Communication skills and the ability to work with different teams and functions are a must
Strong attention to detail
Highly developed communication skills, including the ability to present ideas and share knowledge with others.