As a Systems Analyst for the Systems Operations Centre (SOC), you are a member of a team which manages IT operations on behalf of customers to reduce the impact of operational incidents and perform system management tasks as prescribed by our customer and HG SOC processes. This team provides systems monitoring, event investigation and analysis, and security monitoring.
As part of the team that delivers the 24×7 systems Event Management Service, you will be responsible for the following:
24×7 Systems Event Management:
Managed Technologies and responsibilities
- Datacenter/Point of Presence/Hub Room Infrastructure
- Data/Voice Networks and Telecom
- Servers, Storage and Operating Systems
- Information Security
- Databases and Database Servers
- Corporate Applications and Middleware
- Batch Processing Jobs
- Incident Management Process
- Following an established, documented process for event detection including but not limited to:
- Receipt of alerts (including operational health alerts from devices, applications, databases and servers) from monitored devices and associated technology
- Acknowledgement of receipt of the event
- Opening new service desk tickets, or updating existing tickets, in order to track event handling through its lifecycle to resolution and closure.
- Assignment of the event ticket to the appropriate owner.
- Follow the established process for identifying events that require filtering
- Documenting requests for event filtering in the service desk ticket
- Assignment of the event ticket to the appropriately defined resource.
- In the event of client request for filtering of certain event types, follow the established process for completing that request.
- Follow an established process for collecting relevant data and performing the necessary level of analysis on that data.
- Determine relationships between the event, client services, technologies and previous tickets.
- Determine whether the relationships warrant an increase in severity, and subsequent reprioritization in escalation.
- Document your findings in the service desk ticket as they are discovered.
- Follow an established process for transmitting event investigation data to the appropriate point of contact, whether that is an external client or an internal resource.
- Report on recurring problems and issues discovered during the course of your duties.
- Provide action plans detailing specifics of:
- What the event indicates (Event Description)
- Why it is important for the client (what the potential risks to the business are)
- What actions the client can take to remediate the current event, and prevent future instances of this event type.
- Follow the established process to ensure that resolution criteria are met before closing tickets.
Proactive Health Monitoring:
Manual Health Checks
- In the event that proactive, automated health monitoring of critical devices is not possible, you will:
- Follow established and approved processes for performing scheduled health checks on applicable devices.
Must have demonstrated knowledge and experience with the following:
- APC StruxureWare or similar power and HVAC monitoring systems
- SolarWinds Orion for network monitoring
- Cisco UCS
- Networking gear including: Cisco equipment, Cisco VoIP, Citrix Load Balancers, Alcatel DWDM
- HP Servers
- EMC Brocade and IBM Storage SAN
- Brocade SAN Switches
- VMware monitoring using vCenter
- SIEM Monitoring for security technologies such as IPS/IDS, Firewalls, Malware, DLP and Web filtering
- Databases: Oracle and MS-SQL, with monitoring using Oracle Enterprise Manager and Dell Spotlight
- Middleware monitoring – WebSphere using Compuware Dynatrace
- Experience with ERP and web portals, as well as monitoring using Automic Appworx, is a strong plus
- BMC Remedy
- Critical thinking and analytical skills
- Excellent written and verbal communication skills
- Strong troubleshooting and problem-solving skills
- Team player with ability to work autonomously
- Ability to prioritize and reprioritize work as required