Senior/Lead Technical Support Engineer

Hyderabad, TS, IN

Summary

Experience: 5-10 years

Role Overview: This role is responsible for troubleshooting and resolving production incidents. It acts as a bridge between the support and development teams by handling technical investigations, applying quick fixes, and escalating critical issues. Resolving incidents effectively at this level frees the development team to focus on R&D and feature development.

Key Responsibilities:

Incident Management and Troubleshooting

Take ownership of production incidents, perform deep-dive investigations, and provide immediate resolutions or workarounds
Monitor production alerts, logs, and error notifications in real time to ensure rapid incident response
Escalate unresolved issues to the development team only when necessary, minimizing their involvement in routine incidents
Document all production issues, resolutions, and lessons learned to improve troubleshooting efficiency
Develop and maintain incident response plans to ensure a structured troubleshooting approach

Collaboration and Support Enablement

Work closely with the support team to assist with technical escalations and ensure customer issues are addressed quickly
Coordinate with the development team to report recurring issues that need long-term fixes while reducing their direct involvement in incident handling
Communicate incident status, impact, and resolution progress to key stakeholders and leadership

System Monitoring and Performance Optimization

Monitor support emails, process failure notification emails, and Prometheus alerts to detect emerging problems early and prevent incidents before they occur
Work with DevOps to improve observability, logging, and alerting strategies

Workarounds and Quick Fixes

Understand the product and customer use cases to provide workaround solutions when needed
Execute minor SQL queries and data fixes to resolve customer issues without requiring development team intervention

Leadership and Team Management

Lead and mentor a team of junior support engineers, ensuring they follow best practices in incident handling
Train the support team on troubleshooting common production issues
Establish clear ownership of incident response to reduce ad-hoc escalations to the development team

Required Qualifications:

Technical Skills:

5+ years of experience in production support, incident management, or site reliability engineering
Strong working knowledge of Linux/Unix systems and troubleshooting
Experience with monitoring tools such as ELK Stack, Grafana, Prometheus, and CloudWatch
Proficiency in SQL (MySQL, PostgreSQL, or Oracle) for running queries and applying minor data fixes
Hands-on experience with log analysis and debugging using ELK Stack
Knowledge of scripting languages such as Shell, Python, or Groovy to automate incident handling
Familiarity with microservices, REST APIs, and message queues like RabbitMQ and Kafka

Soft Skills and Leadership:

Strong problem-solving and troubleshooting skills under pressure
Ability to mentor junior engineers and effectively lead small teams
Excellent communication skills for collaboration with engineering, CS, and DevOps teams
Proactive mindset to reduce developer involvement in incident handling and improve overall system reliability
