I commented before reading the great writeup Tor has put up. Seems I
jumped on his bandwagon, and I apologize. His holistic walk-through of how
to tie into salt's returners, runners, and peer system is just what I
needed, and it is something the default documentation really lacks.
I also have the flu right now. :) The following ramblings may be
medicated.
My post was a synthesis of some of the things I've seen on this list and
what I've already done.
What I have running is icinga configured and maintained by check_mk, which
runs over passwordless ssh to invoke check-mk-agent on minions. Salt
configures all the glue. I'm still manually adding new hosts to check_mk,
but that is just one line. Salt's reactor might be just what I need to
automate this.
The rest of this post will sound like evangelizing check_mk. It does a lot
for you, which makes it hard to pin down with a single definition. It is a
pile o' python that can completely abstract away nagios's objects and
templates for you. I've used nagios for a decade now and will never go back
to vim'ing 16 files per host in a complex environment. I guess its author
has gone to war applying DRY to nagios.
It also consolidates all remote checks (formerly NRPE) to one fork and
connection per host with a simple, single file agent. Here is the output
of a check_mk_agent run: https://gist.github.com/4476003
It also allows you to maintain normal checks (it calls them legacy
checks) in a single place, and apply these checks via tags (roles).
Here is an example of how check_mk can mesh with a pillar of roles. Say we
put in a pillar that app03 has roles 'mongodb' and 'wsgi_app1'. You can
then use salt to install a LAMP stack + mongodb, and deploy your Paste app.
In check_mk one line needs to be added for a new server, app03:
"app03|ssh|mongodb|python",
It uses the pipe character to add arbitrary tags. All of my hosts have the
ssh tag, which tells check_mk to connect via ssh. (This is where it
could be improved to use salt's zeromq connections.) Then it has
mongodb and python tags. Adding these via a jinja snippet should be easy
from the roles in pillar.
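
As a rough sketch of that roles-to-tags step, here it is in plain Python rather than jinja. The ROLE_TO_TAG mapping and the host_entry helper are made up for illustration and are not part of salt or check_mk; in practice the roles list would come from pillar.

```python
# Hypothetical mapping from pillar roles to check_mk tags. Neither the
# mapping nor host_entry() exists in salt or check_mk; they only
# illustrate the idea.
ROLE_TO_TAG = {
    "mongodb": "mongodb",
    "wsgi_app1": "python",
}


def host_entry(hostname, roles, transport_tag="ssh"):
    """Build a 'host|tag|tag' string from a host name and its roles."""
    tags = [transport_tag]
    for role in roles:
        tag = ROLE_TO_TAG.get(role)
        # Skip roles with no monitoring tag and avoid duplicate tags.
        if tag and tag not in tags:
            tags.append(tag)
    return "|".join([hostname] + tags)


print(host_entry("app03", ["mongodb", "wsgi_app1"]))
# app03|ssh|mongodb|python
```

The same loop should translate to a few lines of jinja in a salt-managed check_mk config file.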
The mongodb tag incorporates more checks for this host to remotely verify
the tcp port AND internal checks from the host (status, memory size, number
of connections, etc).
Here is the "legacy check" that is applied to all hosts with the 'mongodb'
tag/role:
legacy_checks += [
    ( ("check-tcp!-p 31867", "MongoDB", True), ['mongodb'], ALL_HOSTS ),
]
Again, check_mk creates all the hosts, commands, and templates for you with
that one line.
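
To make the tag matching concrete, here is a small sketch of the selection logic as I understand it: a check listed with tags applies to every host that carries all of those tags. This is plain Python for illustration, not check_mk's own code; split_host, hosts_for_check, and the web01 host are invented for the example.

```python
def split_host(entry):
    """Split 'app03|ssh|mongodb' into ('app03', {'ssh', 'mongodb'})."""
    name, *tags = entry.split("|")
    return name, set(tags)


def hosts_for_check(required_tags, all_hosts):
    """Return the hosts that carry every tag in required_tags."""
    required = set(required_tags)
    return [
        name
        for name, tags in map(split_host, all_hosts)
        if required <= tags  # subset test: host has all required tags
    ]


# web01 is a made-up second host without the mongodb tag.
all_hosts = ["app03|ssh|mongodb|python", "web01|ssh|python"]

print(hosts_for_check(["mongodb"], all_hosts))  # ['app03']
print(hosts_for_check(["python"], all_hosts))   # ['app03', 'web01']
```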
The python tag is magic: I've configured check_mk to inventory only
certain python processes on hosts tagged 'python' and keep track of them.
It auto generates checks for all python processes matching predefined
strings. I've configured 'wsgi', 'beaver', 'salt-minion', and
'supervisor'.
Each of these is a single line in the configuration. For example,
teaching it to watch 'Paste' pids:
( ['python'], ALL_HOSTS, "Paste", "~.*python .*paster", GRAB_USER, 1, 1, 2, 2 ),
To translate: for all hosts with the 'python' tag, when inventorying,
search for the regex ".*python .*paster"; if it is present, create a new
passive check called Paste, make sure the UID doesn't change, and set the
Warn and Crit levels
... That is a hell of a lot of file edits all over several Nagios objects
and templates. And there is nothing additionally run on the client. The
process list is already delivered via the check_mk_agent.
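
The matching step of that inventory rule can be sketched roughly like this. It is plain Python for illustration, not check_mk's implementation; the assumption is that a leading "~" marks the pattern as a regex tested from the start of the command line. The command lines and the find_processes name are invented for the example.

```python
import re


def find_processes(pattern, command_lines):
    """Return the command lines that match an inventory pattern."""
    # Assumption: "~" marks a regex; strip it before compiling.
    regex = re.compile(pattern[1:] if pattern.startswith("~") else pattern)
    # re.match anchors at the start of each command line.
    return [cmd for cmd in command_lines if regex.match(cmd)]


# Made-up process list, standing in for what the agent reports.
process_list = [
    "/usr/bin/python /usr/bin/paster serve /srv/app1/production.ini",
    "/usr/bin/python /usr/bin/salt-minion",
    "/usr/sbin/sshd -D",
]

matches = find_processes("~.*python .*paster", process_list)
print(len(matches))  # 1
```

Only the paster process matches, so the inventory would create one Paste check for this host.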
It even sets up graphing the memory usage, uptime, and number of processes
in pnp4nagios for you.
It also creates pre-compiled python checks for each host that combine
all the checks for that host. Less forking.
I'm going to stop rambling now. I feel like at this point I should say:
"But wait! If you act now, we'll give you TWO check_mk's for the price of
One!"