
AI-Assisted SRE / AIOps Lead Engineer (Banff)

Description
Job Title: AI-Assisted SRE / AIOps Lead Engineer
Location: Remote
Employment Type: Contract

Role Overview

We are seeking a highly skilled and hands-on AI-Assisted SRE / AIOps Lead Engineer to lead the operationalization and scaling of an SRE agent-driven operations model. This role combines Site Reliability Engineering, automation, production operations, and AI-assisted workflow enablement to modernize operational practices and improve system reliability.

This is not a traditional support or coordination role. The ideal candidate will be a technical builder and operator who can independently assess risk, validate AI-driven recommendations, and apply sound operational judgment in high-impact production environments. You should be comfortable leading a small team while actively contributing at the technical level.

Key Responsibilities

- Lead the adoption, onboarding, and operationalization of SRE agent-driven workflows across reliability and support functions
- Translate existing scripts, runbooks, SOPs, and operational procedures into scalable, agent-compatible workflows
- Evaluate and determine which operational activities should remain manual, become semi-automated, or be fully automated
- Validate AI-generated recommendations, remediation actions, and workflow outputs before production implementation
- Support production releases, release validation, smoke testing, and post-deployment system health checks
- Drive troubleshooting efforts during production incidents and ensure timely resolution with thorough root cause analysis
- Improve alert management, event correlation, and incident response effectiveness
- Partner with engineering, platform, and operations teams to onboard new workflows and drive process improvements
- Develop and maintain operational documentation, standards, and reusable runbooks
- Mentor junior engineers and provide technical guidance on workflow design, operational execution, and validation practices
- Continuously identify opportunities to modernize legacy operational processes and improve efficiency

Required Experience

- 5–10 years of hands-on experience in Site Reliability Engineering, cloud operations, production engineering, platform operations, or IT operations
- Strong experience supporting and troubleshooting production environments
- Demonstrated experience with automation, incident management, and operational process improvement
- Experience working with release support processes and production validation activities
- Exposure to AI-assisted operations, AIOps platforms, or automation-led support models is highly preferred
- Experience leading initiatives while remaining deeply involved in hands-on execution

Required Technical Skills

- Strong scripting expertise in:
  - Python
  - PowerShell
  - Shell/Bash
- Hands-on experience with:
  - Monitoring and observability platforms
  - Logging systems and dashboards
  - Alerting and incident workflows
  - Production support and release validation processes
  - Cloud platforms, preferably Azure
  - ITSM/ticketing platforms such as ServiceNow, Jira, or equivalent
  - APIs, integrations, and automation pipelines
- Working knowledge or exposure to:
  - Kubernetes / AKS
  - AI productivity and operational tools such as ChatGPT and Copilot
  - Modern automation and orchestration practices

Critical Soft Skills

- Robust analytical and structured problem-solving skills
- Ability to operate effectively in ambiguous environments with incomplete documentation
- Strong ownership mindset with the ability to independently drive outcomes
- Excellent judgment during high-pressure production incidents
- Ability to challenge assumptions and validate AI-assisted recommendations rather than relying on them blindly
- Creative approach toward transforming and modernizing legacy operational workflows
- Strong communication and collaboration skills across technical and non-technical teams

Ideal Candidate Profile

- Hands-on builder/operator rather than a pure coordinator or process manager
- Comfortable balancing automation with operational governance and control
- Able to independently assess:
  - Risk impact
  - Blast radius
  - Rollback strategies
  - Secure execution practices
- Capable of leading a small team while continuing to contribute technically on a day-to-day basis
- Practical mindset with a strong focus on operational excellence and reliability engineering

This role is ideal for someone who enjoys combining AI-assisted operations, automation, and modern SRE practices to build scalable and reliable operational systems.

Apply on Kit Job: kitjob.ca/job/2oyhrw

AI-Assisted SRE / AIOps Lead Engineer (Banff) has been posted in the Cochrane Engineering category on Locanto.
