Site Reliability Engineer (Joliette)
- Joliette, Canada
- Last edited: less than a week ago
Description
Title: SRE Operations Engineer (Canada)
Location: 100% Remote

Role Summary
- L1 Site Reliability Engineer responsible for monitoring, triaging, and executing standard operational tasks across enterprise applications
- Supports Kubernetes, APIs, WAF, databases, API gateways (Gloo, Apigee), Kafka, and multi-cloud environments (AWS/Azure/GCP)
- First line of defense for incident detection, troubleshooting, and escalation using runbooks and automation

Key Responsibilities

Monitoring & Infrastructure
- Monitor systems using Grafana, Datadog, Splunk, Prometheus, and AIOps tools
- Detect anomalies and follow alert workflows for resolution or escalation
- Validate Kubernetes issues using monitoring dashboards and logs

Runbook Execution
- Follow predefined runbooks for incident resolution
- Restart services, validate system health, and escalate when procedures fail
- Ensure adherence to operational standards

Incident Triage & Communication
- Perform initial incident triage and severity classification
- Collect logs, metrics, and system data for analysis
- Communicate clearly with stakeholders and escalation teams

Kubernetes Operations
- Use kubectl to inspect pods, deployments, and services
- Validate service health and troubleshoot cluster-level issues

Scripting & Automation
- Read and modify scripts in Python, Bash, or PowerShell
- Support automation of repetitive operational tasks

Networking & Security Troubleshooting
- Use tools like ping, curl, netstat, and traceroute
- Identify DNS, firewall, WAF, or proxy-related issues

Documentation & Knowledge Management
- Document incident resolution steps and system issues
- Identify gaps in runbooks and suggest improvements

Preferred Skills
- Familiarity with AWS, Azure, or GCP cloud platforms
- Basic SQL/NoSQL knowledge (e.g., simple query validation like SELECT 1)
- Experience with ITSM tools such as ServiceNow, Jira, or xMatters
- Exposure to observability tools (ELK, Prometheus, Grafana, Splunk)
- Understanding of AI-assisted operational support tools
- Strong automation mindset and process optimization awareness

Qualifications
- 2–5 years (or more) in IT operations, NOC, or SRE/DevOps roles
- Strong understanding of Linux, networking, and Kubernetes fundamentals
- Knowledge of cloud-ready applications and observability tools
- Strong troubleshooting skills using structured methods (5 Whys, Fishbone analysis)

Deliverables
- Continuous monitoring of infrastructure, applications, dashboards, and logs
- Execution of standardized runbooks for incidents and routine tasks
- First-level incident triage and escalation to L2/L3 teams
- Documentation of incidents, gaps, and automation opportunities
- Transparent communication during operational incidents
- Support onboarding of applications into operations framework

Apply on Kit Job: kitjob.ca/job/2o85pq
Highlights
- Company name: Net2Source (N2S)
- Job position: Site Reliability Engineer (Joliette)
More info about this ad
Site Reliability Engineer (Joliette) has been posted in the L'Assomption Engineering category on Locanto.