Site Reliability Engineer (Lac-Brome)
Lac-Brome, Canada
Last edited: less than a week ago
Description
Title: SRE Operations Engineer (Canada)
Location: 100% Remote

Role Summary
- L1 Site Reliability Engineer responsible for monitoring, triaging, and executing standard operational tasks across enterprise applications
- Supports Kubernetes, APIs, WAF, databases, API gateways (Gloo, Apigee), Kafka, and multi-cloud environments (AWS/Azure/GCP)
- First line of defense for incident detection, troubleshooting, and escalation using runbooks and automation

Key Responsibilities
- Monitoring & Infrastructure
  - Monitor systems using Grafana, Datadog, Splunk, Prometheus, and AIOps tools
  - Detect anomalies and follow alert workflows for resolution or escalation
  - Validate Kubernetes issues using monitoring dashboards and logs
- Runbook Execution
  - Follow predefined runbooks for incident resolution
  - Restart services, validate system health, and escalate when procedures fail
  - Ensure adherence to operational standards
- Incident Triage & Communication
  - Perform initial incident triage and severity classification
  - Collect logs, metrics, and system data for analysis
  - Communicate clearly with stakeholders and escalation teams
- Kubernetes Operations
  - Use kubectl to inspect pods, deployments, and services
  - Validate service health and troubleshoot cluster-level issues
- Scripting & Automation
  - Read and modify scripts in Python, Bash, or PowerShell
  - Support automation of repetitive operational tasks
- Networking & Security Troubleshooting
  - Use tools like ping, curl, netstat, and traceroute
  - Identify DNS, firewall, WAF, or proxy-related issues
- Documentation & Knowledge Management
  - Document incident resolution steps and system issues
  - Identify gaps in runbooks and suggest improvements

Preferred Skills
- Familiarity with AWS, Azure, or GCP cloud platforms
- Basic SQL/NoSQL knowledge (e.g., simple query validation like SELECT 1)
- Experience with ITSM tools such as ServiceNow, Jira, or xMatters
- Exposure to observability tools (ELK, Prometheus, Grafana, Splunk)
- Understanding of AI-assisted operational support tools
- Solid automation mindset and process optimization awareness

Qualifications
- 2–5 years (or more) in IT operations, NOC, or SRE/DevOps roles
- Strong understanding of Linux, networking, and Kubernetes fundamentals
- Knowledge of cloud-ready applications and observability tools
- Strong troubleshooting skills using structured methods (5 Whys, Fishbone analysis)

Deliverables
- Continuous monitoring of infrastructure, applications, dashboards, and logs
- Execution of standardized runbooks for incidents and routine tasks
- First-level incident triage and escalation to L2/L3 teams
- Documentation of incidents, gaps, and automation opportunities
- Clear communication during operational incidents
- Support onboarding of applications into operations framework

Apply on Kit Job: kitjob.ca/job/2o5kli
Highlights
- Company name: Net2Source (N2S)
- Job position: Site Reliability Engineer (Lac-Brome)
More info about this ad
Site Reliability Engineer (Lac-Brome) has been posted in the Lac-Brome Engineering category on Locanto.