In one of my recent automation projects, I came across a real-world scenario:
An Azure Alert fires when a condition is met (for example, a VM's C:\ drive crosses 80% usage), and boom: the alert triggers, gets routed to Moogsoft, and creates an incident in ServiceNow.
But before it even hits the L1 queue, I wanted to give automation a shot.
Why assign a disk cleanup alert/incident to an engineer when a bot can do it better, and faster?
Real Scenario
Here’s what happens traditionally:
- Azure Monitor Alert fires when the condition is met (C:\ > 80%)
- Moogsoft (or any other tool) receives the alert through a webhook
- Moogsoft (or any other tool) creates an incident in ServiceNow
- The L1 team gets it assigned (and starts with "clearing temp files")
Let’s not wait till that point.
I wanted to trigger an automated workflow before the incident reaches L1. So we created an Automation Assignment Group in SNOW — all disk space alerts land here first.
What I Did – The Flow
Azure Alert (C:\ > 80%)
↓
Action Group → Logic App (or Webhook)
↓
Moogsoft receives and correlates
↓
Incident Created in ServiceNow → Assigned to "Automation-Bot" group
↓
Azure Automation Runbook kicks in
↓
1. Cleanup temp & log files
2. Recheck space
↓
→ If resolved: Close incident in SNOW
→ Else: Escalate to L1 with full logs & diagnostics
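A note on the "Runbook kicks in" step: one way to wire it up is to expose the runbook through an Azure Automation webhook and have the Logic App (or the Moogsoft/SNOW side) POST to it. Here's a minimal sketch of the runbook's entry point. $WebhookData is what Azure Automation passes in automatically, but the field names inside the body (vmName, incidentSysId) are my own placeholders, so match them to whatever your payload actually sends:

param (
    # Populated automatically by Azure Automation when the runbook
    # is started via its webhook URL
    [Parameter(Mandatory = $false)]
    [object]$WebhookData
)

if ($WebhookData) {
    # RequestBody is the raw JSON string the caller POSTed
    $payload = $WebhookData.RequestBody | ConvertFrom-Json

    # Placeholder field names - align these with your actual payload
    $vmName        = $payload.vmName
    $incidentSysId = $payload.incidentSysId

    Write-Output "Triggered for VM '$vmName' (incident sys_id: $incidentSysId)"
} else {
    Write-Output "No webhook data - assuming a manual/test run."
}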
What the Runbook Does
Here’s the high-level logic in my Azure Automation Runbook (PowerShell-based):
# NOTE: Get-WmiObject queries the local machine, so this runbook should run
# on the VM itself (e.g., via a Hybrid Runbook Worker).
$drive = Get-WmiObject Win32_LogicalDisk -Filter "DeviceID='C:'"
$percentFree = ($drive.FreeSpace / $drive.Size) * 100

if ($percentFree -lt 20) {
    Write-Output "Low space: $([math]::Round($percentFree, 1))%. Starting cleanup..."

    # Clear temp files (system-wide and per-user)
    Remove-Item -Path "C:\Windows\Temp\*" -Recurse -Force -ErrorAction SilentlyContinue
    Remove-Item -Path "C:\Users\*\AppData\Local\Temp\*" -Recurse -Force -ErrorAction SilentlyContinue

    Start-Sleep -Seconds 30

    # Re-check free space after cleanup
    $drive = Get-WmiObject Win32_LogicalDisk -Filter "DeviceID='C:'"
    $percentFree = ($drive.FreeSpace / $drive.Size) * 100

    if ($percentFree -ge 20) {
        Write-Output "Cleanup successful: $([math]::Round($percentFree, 1))%. Closing incident."
        # Call SNOW API → Resolve incident
    } else {
        Write-Output "Still low on space. Escalating to L1."
        # Call SNOW API → Update work notes and reassign
    }
} else {
    Write-Output "Disk space looks fine. No action taken."
}
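Those two "Call SNOW API" stubs are plain REST calls against the ServiceNow Table API. Here's a rough sketch of the resolve path, assuming basic auth, a credential stored in the Automation account, and that the incident's sys_id came in via the webhook payload (the instance name, credential name, and close codes are placeholders; your SNOW instance may use different state values):

$snowInstance = "yourinstance.service-now.com"           # placeholder
$cred = Get-AutomationPSCredential -Name "SnowApiUser"   # stored Automation credential
$uri  = "https://$snowInstance/api/now/table/incident/$incidentSysId"

# State 6 = Resolved in the out-of-box incident model
$body = @{
    state       = "6"
    close_code  = "Solved (Permanently)"
    close_notes = "Automated cleanup freed space on C:\ ($([math]::Round($percentFree, 1))% free)."
} | ConvertTo-Json

Invoke-RestMethod -Method Patch -Uri $uri -Credential $cred `
    -ContentType "application/json" -Body $body

# Escalation path: same PATCH call, but send work_notes and assignment_group
# in the body instead (SNOW typically expects the group's sys_id, not its name).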
Integration Points
Here’s where everything ties together:
| Component | Role |
| --- | --- |
| Azure Monitor | Detects the disk threshold breach |
| Action Group | Triggers the Logic App or webhook |
| Moogsoft | Event correlation and dedupe |
| ServiceNow | Creates the incident |
| Azure Runbook | Performs cleanup + resolution |
| SNOW REST API | Auto-close or reassign with logs |
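For the Action Group → webhook hop, the receiver can be created in the portal or via the Az.Monitor module. A small sketch (resource group, action group name, and the Moogsoft URL are placeholders; cmdlet names vary a bit across Az.Monitor versions, this follows the classic Set-AzActionGroup pattern):

# Requires Az.Monitor and an authenticated Az session (Connect-AzAccount)
$receiver = New-AzActionGroupReceiver -Name "moogsoft-webhook" `
    -WebhookReceiver -ServiceUri "https://moogsoft.example.com/events/webhook"

Set-AzActionGroup -ResourceGroupName "rg-monitoring" `
    -Name "ag-disk-alerts" -ShortName "diskag" -Receiver $receiver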
Optional Add-Ons (Next Steps)
- Runbook Scheduling: Run daily checks on all servers (see the sketch after this list).
- AI Insights: Use historical patterns to decide cleanup thresholds.
- Self-Healing Summary Dashboard: How many incidents were fixed without L1?
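For the scheduling idea, Azure Automation has schedules built in. A quick sketch (the account, resource group, and runbook names are placeholders):

# Daily schedule, starting tomorrow at 06:00
New-AzAutomationSchedule -ResourceGroupName "rg-automation" `
    -AutomationAccountName "aa-selfheal" -Name "daily-disk-check" `
    -StartTime (Get-Date).Date.AddDays(1).AddHours(6) -DayInterval 1

# Link the cleanup runbook to that schedule
Register-AzAutomationScheduledRunbook -ResourceGroupName "rg-automation" `
    -AutomationAccountName "aa-selfheal" -RunbookName "Invoke-DiskCleanup" `
    -ScheduleName "daily-disk-check"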
Final Thoughts
- This is how I turned a simple disk space alert into a self-healing workflow.
- It saved our L1 team 15–20 incidents per week (no kidding), and gave me full control to inject intelligence and automation right at the start of the alert pipeline.
- Let me know if you’re building similar automation — I’d love to share ideas or show you how I did the SNOW integration in detail.
Until next time — keep automating – Kasi Suresh | @KasdevTech