Monit: lightweight monitoring solution

Monit is a simple, lightweight monitoring tool that is still powerful enough to be useful on your servers.

Monit can monitor:

  • OS processes (presence, resources)
  • files, directories and file systems (mtime, size and checksum changes)
  • network hosts (ping, TCP connections)

Monit can notify administrators via configurable e-mail messages. It can also automatically restart a failed service.

Monit has an embedded web server that lets you view the state of monitored objects and disable or enable monitoring for them.

Of course, enterprise-class monitoring systems have many more features, but they are also considerably more complex. By the way, there is a product named M/Monit that can control multiple Monit instances. Unfortunately, M/Monit is only available under a commercial license.

Let's install Monit (on Gentoo, via Portage) and configure it:

emerge -av monit

And here are some config examples:


set daemon  120 # check every 2 minutes
set logfile syslog facility log_daemon

set mailserver localhost
set eventqueue # use the event queue in case the mail server is unreachable
    basedir /var/monit
    slots 10
set mail-format { from: monit@ }
set alert admin1 admin2 # list of alert recipients

# internal httpd configuration
set httpd port 2812 and
    use address
    allow admin:password

include /etc/monit.d/*
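
The mail-format statement above sets only the sender. As a rough sketch of what "configurable e-mail messages" can look like, the block below also overrides the subject and body. The address monit@example.com is a placeholder of mine, not a value from the configuration above, and the set of available variables may differ between Monit versions, so check the manual for your release.

```
# sketch only: monit@example.com is a placeholder sender address
set mail-format {
    from: monit@example.com
    subject: $SERVICE $EVENT at $DATE
    message: Monit $ACTION $SERVICE at $DATE on $HOST: $DESCRIPTION
}
```

Here $SERVICE, $EVENT, $DATE, $ACTION, $HOST and $DESCRIPTION are expanded by Monit when the alert is sent.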


# overall OS resources checking
check system myserver
    if loadavg (1min) > 30 then alert
    if loadavg (5min) > 20 then alert
    if memory usage > 75% then alert
    if cpu usage (user) > 70% then alert


check process apache with pidfile /var/run/
    start program = "/etc/init.d/apache2 start"
    stop program  = "/etc/init.d/apache2 stop"
    if totalmem > 500.0 MB for 5 cycles then restart
    if children > 250 then restart
    if loadavg(5min) greater than 30 for 8 cycles then stop
    if failed host port 80 protocol http
       and request "/index.html"
       then restart
    if failed port 443 type tcpssl protocol http
       with timeout 15 seconds
       then restart
    if 3 restarts within 5 cycles then timeout


# file system:
check device data with path /dev/sdb1
    start program  = "/bin/mount /data"
    stop program  = "/bin/umount /data"
    if space usage > 80% for 5 times within 15 cycles then alert
    if inode usage > 80% then alert
    group server
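
The examples above cover the system, a process and a file system, but not the file, directory and network host checks mentioned at the beginning. Here is a minimal sketch of those, assuming illustrative service names (passwd, etc, gateway), illustrative paths and a documentation-range IP address (192.0.2.1); none of these come from a real setup, and the exact test syntax may vary between Monit versions.

```
# a single file: alert when its content or size changes
check file passwd with path /etc/passwd
    if changed checksum then alert
    if changed size then alert

# a directory: alert when its timestamp changes
check directory etc with path /etc
    if changed timestamp then alert

# a remote host: ping test plus a plain TCP connection test
# (192.0.2.1 is a placeholder address)
check host gateway with address 192.0.2.1
    if failed icmp type echo count 3 with timeout 5 seconds then alert
    if failed port 22 type tcp with timeout 10 seconds then alert
```

These fragments could go into a file under /etc/monit.d/, which the include line in the main configuration picks up.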