Softworld, a Kelly Company

IT Operations Manager - Incident & Problem Management

Portland, OR, US

10 days ago

Hybrid – 3 days onsite and 2 days remote

Summary:

Oversee daily incident and problem management across IT Operations to ensure reliable, secure, and cost-effective technology services for critical systems, including transit operations and customer-facing platforms. Lead 24/7 ITIL-compliant processes, ensuring adherence to industry best practices. Guide technical teams through incidents to resolution while delivering timely, clear communication to stakeholders. Drive the formal Problem Management function, ensuring issues are thoroughly addressed and resolved. Develop and implement operational plans for routine, degraded, and emergency service scenarios. Collaborate on policies and procedures that support effective IT operations. Serve as an on-call manager for continuous 24/7 coverage.

Primary Responsibilities:

  • Provide effective IT-wide leadership of all teams involved in incident identification, impact scoping, and resolution, and in following problem management processes.
  • Lead the organization in increasing resilience by way of reducing incident duration and frequency.
  • Develop a work environment that is inclusive, respectful and in full partnership with members and officers of IT. Ensure consistent application of policies, procedures and labor agreement throughout the workforce. Apply provisions of the Working and Wage Agreement, District policies, rules, and standard operating procedures while addressing employee concerns over work times and conditions.
  • Collaborate on, develop, recommend, communicate, and evaluate new or revised policies, programs, and procedures that affect operations and support agency goals, including IT policies, Standard Operating Procedures (SOPs), communication protocols, and business processes. Keep current with industry best practices, including consistent SOP application and usage, and the development of and reporting on Key Performance Indicators (KPIs).
  • Assist in development of programs for OCC technical systems, including bus dispatch, rail control, CCTV, and mobile computing systems. Assist in managing logistics implementation sequence and schedules, testing, training, and operations for replacement or modification projects for OCC technical systems.
  • Develop, coordinate, and implement operational plans related to normal, degraded, and emergency service. Support the Director in overall incident command during special events, service interruptions, and emergencies. Collaborate with the OCC incident manager and, as needed, the overall incident commander, and be an active member of an Emergency Operations Command Center.
  • Help establish clear goals, targets, performance standards, policies, strategic actions and scorecards. Provide leadership to the operational team and provide employees clear communication and direction regarding goals and objectives. Measure performance goals and recognize achievements.
  • Serve as an ITIL expert in Incident and Problem Management; contribute to overall ITIL maturity organization-wide.

Required Education:

  • Bachelor’s degree in computer science or IT is required.
  • Master’s degree in computer science or IT is preferred.

Required Work Experience:

  • 10 years of experience in IT operations, managing large, complex systems. Experience with a range of IT infrastructure components such as servers, storage, network, cloud, firewalls, routers, load balancers, Oracle/SQL, virtualization, customer-facing and business-critical applications, and email and phone systems.
  • 3 years of experience in Incident and/or Problem Management in a lead or participating role within a large enterprise.
  • 3 years of experience in ITSM in a technology operations environment.
  • 3 years of experience in providing services as a lead or participant in achieving SLA/KPI targets.
