🤖 Dan Huynh

Apr 07, 2026

Monitoring Software

We monitor to produce:

Alerts: A human needs to be paged to take action now
Tickets: A human needs to act eventually, but not immediately
Logging: Recorded for diagnostic purposes; nobody needs to look at it otherwise
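
One way to make the three outputs concrete is a small routing function. This is only a sketch: the `Event` fields `user_impact` and `needs_human` are hypothetical names chosen for illustration, and a real system would likely use richer criteria.

```python
from dataclasses import dataclass
from enum import Enum


class Output(Enum):
    ALERT = "alert"    # page a human now
    TICKET = "ticket"  # a human should act eventually
    LOG = "log"        # keep for diagnostics only


@dataclass
class Event:
    message: str
    user_impact: bool  # hypothetical: are users affected right now?
    needs_human: bool  # hypothetical: does fixing it require a person?


def route(event: Event) -> Output:
    """Decide which monitoring output an event should produce."""
    if event.needs_human and event.user_impact:
        return Output.ALERT
    if event.needs_human:
        return Output.TICKET
    return Output.LOG


print(route(Event("checkout requests failing", user_impact=True, needs_human=True)))
print(route(Event("disk 80% full", user_impact=False, needs_human=True)))
print(route(Event("cache miss", user_impact=False, needs_human=False)))
```

Keeping the decision in one place makes it easier to audit which events are allowed to page someone.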

For Performance

Includes:

Performance Profiling
Traces

Challenges

Observing a system perturbs it: monitoring itself incurs performance overhead
More monitoring increases the volume and complexity of the data you have to store and interpret
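
A common way to limit both the overhead and the data volume is to sample: measure only a fraction of calls. The decorator below is a minimal sketch of that idea, not a profiler; `sampled_timer`, `handle_request`, and the `durations` list are hypothetical names for illustration.

```python
import functools
import time


def sampled_timer(every_n: int, sink: list):
    """Time only every `every_n`-th call and append the duration to `sink`."""
    def decorator(func):
        calls = 0

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal calls
            calls += 1
            if calls % every_n != 0:
                return func(*args, **kwargs)  # unmeasured fast path
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the duration even if the call raised.
                sink.append(time.perf_counter() - start)
        return wrapper
    return decorator


durations = []


@sampled_timer(every_n=10, sink=durations)
def handle_request(x):
    return x * 2


results = [handle_request(i) for i in range(100)]
print(len(durations))  # 10 samples recorded for 100 calls
```

The trade-off is accuracy for cost: rare slow calls can be missed when they fall outside the sample.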

For DevOps

Health checks
Can our services do useful work?
Ideally, check this in a way that also reveals performance problems
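
A health check along these lines can run a probe that exercises real work and time it, so a slow but technically "up" service is reported differently from a healthy one. This is a sketch under assumptions: `health_check`, the `slow_after_s` threshold, and `example_probe` are hypothetical, and the probe here is a stand-in for something like a test query or a synthetic request.

```python
import time
from typing import Callable


def health_check(probe: Callable[[], bool], slow_after_s: float = 0.5) -> dict:
    """Run a probe that exercises real work and report status plus latency."""
    start = time.perf_counter()
    try:
        ok = bool(probe())
    except Exception:
        ok = False  # a probe that raises counts as a failed check
    elapsed = time.perf_counter() - start

    if not ok:
        status = "unhealthy"
    elif elapsed > slow_after_s:
        status = "degraded"  # works, but slowly: a performance problem
    else:
        status = "healthy"
    return {"status": status, "latency_s": elapsed}


def example_probe() -> bool:
    # Hypothetical stand-in for real work, such as a test query.
    return sum(range(1000)) == 499500


print(health_check(example_probe)["status"])
```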

Things to Monitor

CPU Load
Memory Utilization
Disk Space
Disk I/O
Network Traffic
Clock Skew
Queue lengths
Application Response Times
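
A couple of these can be read with the Python standard library alone. The sketch below covers only disk space and CPU load; the `snapshot` function and its key names are hypothetical, `os.getloadavg` is not available on every platform, and the remaining items (memory, disk I/O, network, clock skew) usually need OS-specific sources or a third-party library.

```python
import os
import shutil
import time


def snapshot(path: str = "/") -> dict:
    """Collect a small set of host metrics using only the standard library."""
    usage = shutil.disk_usage(path)
    metrics = {
        "timestamp": time.time(),
        "disk_used_fraction": usage.used / usage.total if usage.total else 0.0,
    }
    # Load average is Unix-only and can fail, so treat it as optional.
    if hasattr(os, "getloadavg"):
        try:
            metrics["load_1m"] = os.getloadavg()[0]
        except OSError:
            pass
    return metrics


print(snapshot())
```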

Alert Conditions

CPU usage exceeding threshold for a certain period of time
Increased rate of error logs over a period of time
A service has restarted many times recently
Queue length is very long
A workflow is taking too long to complete
Setting minimum thresholds might also help identify errors (things are too quiet…)
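
Several of these conditions share one shape: a value stays on the wrong side of a limit for some period. The class below sketches that with a fixed window of samples; `SustainedThreshold` and its parameters are hypothetical names, and the same check could be fed CPU usage, error rate, queue length, or request counts.

```python
from collections import deque


class SustainedThreshold:
    """Fire when every sample in the last `window` samples breaches the limit."""

    def __init__(self, limit: float, window: int, mode: str = "above"):
        self.limit = limit
        self.mode = mode  # "above" for maximums, "below" for minimums
        self.samples = deque(maxlen=window)

    def observe(self, value: float) -> bool:
        self.samples.append(value)
        if len(self.samples) < self.samples.maxlen:
            return False  # not enough history yet
        if self.mode == "above":
            return all(s > self.limit for s in self.samples)
        return all(s < self.limit for s in self.samples)


# CPU above 90% for three consecutive samples.
cpu_high = SustainedThreshold(limit=0.90, window=3)
cpu_fired = [cpu_high.observe(v) for v in [0.95, 0.50, 0.95, 0.97, 0.99]]
print(cpu_fired)  # [False, False, False, False, True]

# Minimum threshold: fewer than 1 request per interval, twice in a row.
too_quiet = SustainedThreshold(limit=1.0, window=2, mode="below")
quiet_fired = [too_quiet.observe(v) for v in [40, 0, 0]]
print(quiet_fired)  # [False, False, True]
```

Requiring the whole window to breach the limit avoids paging on a single spike, at the cost of reacting a little later.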

Note

Monitoring software is especially important if you offer a free service, since you want to make sure that you aren't being taken advantage of.
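
As a rough illustration of what that could look like, the sketch below counts requests per client over some period and reports the ones above a limit. Everything here is an assumption for the example: the `heavy_users` function, the client identifiers, and the idea that request count is the right measure of being taken advantage of.

```python
from collections import Counter


def heavy_users(requests: list[str], limit: int) -> dict[str, int]:
    """Return clients whose request count in this batch exceeds `limit`."""
    counts = Counter(requests)
    return {client: n for client, n in counts.items() if n > limit}


# Hypothetical client IDs taken from one period of request logs.
seen = ["a", "b", "a", "a", "c", "a"]
print(heavy_users(seen, limit=3))  # {'a': 4}
```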
