Description
check_mysql_health is a plugin to check various parameters of a MySQL database.
Command line parameters
- --hostname <hostname>
The database server which should be monitored. In case of "localhost" this parameter can be omitted.
- --username <username>
The database user.
- --password <password>
Password of the database user.
- --mode <mode>
The mode parameter tells the plugin what it should check. See the list of possible values further down.
- --name <objectname>
Limits the check to a single object. (Currently this parameter is only used for mode=sql.)
- --name2 <string>
With --mode=sql, the SQL statement itself appears in the output and in the performance data. Use --name2 to specify a string that is shown in its place.
- --warning <range>
Determined values outside of this range trigger a WARNING.
- --critical <range>
Determined values outside of this range trigger a CRITICAL.
- --environment <variable>=<value>
Passes environment variables to the script. This option can be given multiple times.
- --method <connectmethod>
Tells the plugin how it should connect to the database: dbi to use DBD::mysql (default), or mysql to use the mysql command-line tool.
- --units <%|KB|MB|GB>
Adds a unit to the output of mode=sql to make it more readable.
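The <range> values taken by --warning and --critical follow the range notation of the Nagios plug-in development guidelines. The following is a minimal sketch of how such a range could be evaluated; the function names `outside_range` and `status` are hypothetical and are not part of check_mysql_health.

```python
import math

def outside_range(value: float, spec: str) -> bool:
    """Return True if the value should raise an alert for this range spec."""
    invert = spec.startswith("@")          # "@10:20" alerts INSIDE the range
    if invert:
        spec = spec[1:]
    if ":" in spec:
        low_s, high_s = spec.split(":", 1)
    else:
        low_s, high_s = "0", spec          # "10" is shorthand for "0:10"
    low = -math.inf if low_s == "~" else float(low_s or 0)
    high = math.inf if high_s == "" else float(high_s)
    inside = low <= value <= high
    return inside if invert else not inside

def status(value: float, warning: str, critical: str) -> str:
    """Critical is checked first, then warning."""
    if outside_range(value, critical):
        return "CRITICAL"
    if outside_range(value, warning):
        return "WARNING"
    return "OK"

print(status(70.97, "90:", "80:"))   # hitrate below 80
print(status(0.03, "1", "5"))        # connection time within 0..1
```

With thresholds of the form "90:" a value is acceptable from 90 upwards, which suits hitrates; thresholds of the form "10" accept values from 0 to 10, which suits rates and durations.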
Use the option --mode with various keywords to tell the Plugin which values it should determine and check.
Keyword | Description | Range (default warning, critical thresholds) |
connection-time | Determines how long connection establishment and login take | 0..n Seconds (1, 5) |
uptime | Time since start of the database server (recognizes DB-Crash+Restart) | 0..n Seconds (10:, 5: Minutes) |
threads-connected | Number of open connections | 1..n (10, 20) |
threadcache-hitrate | Hitrate in the Thread-Cache | 0%..100% (90:, 80:) |
q[uery]cache-hitrate | Hitrate in the Query Cache | 0%..100% (90:, 80:) |
q[uery]cache-lowmem-prunes | Rate of queries removed from the Query Cache because of low memory | n/sec (1, 10) |
[myisam-]keycache-hitrate | Hitrate in the Myisam Key Cache | 0%..100% (99:, 95:) |
[innodb-]bufferpool-hitrate | Hitrate in the InnoDB Buffer Pool | 0%..100% (99:, 95:) |
[innodb-]bufferpool-wait-free | Rate of the InnoDB Buffer Pool Waits | 0..n/sec (1, 10) |
[innodb-]log-waits | Rate of the InnoDB Log Waits | 0..n/sec (1, 10) |
tablecache-hitrate | Hitrate in the Table-Cache | 0%..100% (99:, 95:) |
table-lock-contention | Rate of failed table locks | 0%..100% (1, 2) |
index-usage | Sum of the Index-Utilization (in contrast to Full Table Scans) | 0%..100% (90:, 80:) |
tmp-disk-tables | Percentage of temporary tables that were created on disk instead of in memory | 0%..100% (25, 50) |
slow-queries | Rate of queries that were detected as "slow" | 0..n/sec (0.1, 1) |
long-running-procs | Number of processes that have been running longer than 1 minute | 0..n (10, 20) |
slave-lag | Delay between Master and Slave | 0..n Seconds |
slave-io-running | Checks if the IO-Thread of the Slave-DB is running | |
slave-sql-running | Checks if the SQL-Thread of the Slave-DB is running | |
sql | Result of any SQL statement that returns a number. The statement itself is passed with the parameter --name. A label for the performance data output can be passed with the parameter --name2. The parameter --units adds a unit to the output (%, c, s, MB, GB,..). If the SQL statement includes special characters or spaces, it can first be encoded with the mode encode. | 0..n |
open-files | Number of open files as a percentage of the upper limit | 0%..100% (80, 95) |
encode | Reads standard input (STDIN) and outputs an encoded string. | |
cluster-ndb-running | Checks if all cluster nodes are running. | |
Depending on the chosen mode, two labels can appear in the performance data output:
<label>= and <label_now>=
The first value applies to the complete runtime of the database, the second to the time since the last run of check_mysql_health.
Example: qcache_hitrate=71.63%;90:;80: qcache_hitrate_now=8.25%
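To make the label=value;warning;critical layout of the performance data concrete, here is a small sketch that splits a plugin output line into its metrics. The helper `parse_perfdata` is a hypothetical name used only for illustration, and the sample line is taken from the examples on this page.

```python
import re
import shlex

def parse_perfdata(plugin_output: str) -> dict:
    """Split the part after '|' into {label: {value, unit, warning, critical}}."""
    _, _, perfdata = plugin_output.partition("|")
    metrics = {}
    # shlex keeps quoted labels such as 'select 111 from dual' in one piece.
    for item in shlex.split(perfdata):
        label, _, rest = item.rpartition("=")
        fields = rest.split(";")
        match = re.match(r"(-?\d+(?:\.\d+)?)(.*)", fields[0])
        if not label or not match:
            continue  # skip anything that is not label=value
        metrics[label] = {
            "value": float(match.group(1)),
            "unit": match.group(2),
            "warning": fields[1] if len(fields) > 1 and fields[1] else None,
            "critical": fields[2] if len(fields) > 2 and fields[2] else None,
        }
    return metrics

line = ("CRITICAL - query cache hitrate 70.97% | qcache_hitrate=70.97%;90:;80: "
        "qcache_hitrate_now=72.25% selects_per_sec=270.00")
for label, metric in parse_perfdata(line).items():
    print(label, metric)
```

Note that only the first label carries thresholds; the _now value is printed without them.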
The hitrate of the Query Cache is calculated as Qcache_hits / ( Qcache_hits + Com_select ). These counters increase continuously, so a serious change in access behaviour affects the hitrate only slowly. To make temporary fluctuations in the hitrate visible and, for example, attribute them to an application update, the value qcache_hitrate_now is printed as well. It is calculated from the deltas of Qcache_hits and Com_select (the current value of each variable minus its value at the last run of check_mysql_health).
The command line parameter --lookback applies here.
It's recommended to use --lookback with a window of at least half an hour (--lookback 1800), because the now-value fluctuates heavily, which would otherwise lead to frequent alarms.
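The difference between the two hitrates can be worked through with a few numbers. This is a sketch under stated assumptions: the counter values are invented, the function names are hypothetical, and returning 0% for an empty interval is a choice made here, not necessarily what the plugin does.

```python
def hitrate(hits: int, selects: int) -> float:
    """Qcache_hits / (Qcache_hits + Com_select), as a percentage."""
    total = hits + selects
    # Assumption: report 0% when nothing has been counted yet.
    return 100.0 * hits / total if total else 0.0

def hitrate_now(current: dict, previous: dict) -> float:
    """Hitrate over the interval between two samples of the counters."""
    delta_hits = current["Qcache_hits"] - previous["Qcache_hits"]
    delta_selects = current["Com_select"] - previous["Com_select"]
    return hitrate(delta_hits, delta_selects)

# Invented samples: the cache worked well for a long time, then stopped helping.
previous = {"Qcache_hits": 7000, "Com_select": 3000}
current = {"Qcache_hits": 7100, "Com_select": 3900}

overall = hitrate(current["Qcache_hits"], current["Com_select"])
recent = hitrate_now(current, previous)
print(f"qcache_hitrate={overall:.2f}% qcache_hitrate_now={recent:.2f}%")
```

The overall value moves only a few points while the interval value collapses, which is why the _now label is useful for spotting sudden changes. The sketch does not handle counters that reset after a server restart.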
Please note that the thresholds must be specified according to the Nagios plug-in development guidelines.
"10" means "Alarm, if > 10" and
"90:" means "Alarm, if < 90"
Connect to the database
Creating a database user
In order to be able to collect the needed information from the database a database user with specific privileges is required:
grant usage on *.* to 'nagios'@'nagiosserver' identified by 'nagiospassword'
Connection string
To connect to the database, use the parameters --username and --password. The database server to connect to can be specified more precisely with --hostname and --socket or --port.
Use of environment variables
It's possible to omit --hostname, --username and --password as well as --socket and --port completely if you provide the corresponding values in environment variables. Since Nagios 3.x, service definitions can be extended with attributes of your own (custom object variables). These appear in the environment while the check command is executed.
The environment variables are:
- NAGIOS__SERVICEMYSQL_HOST (_mysql_host in the service definition)
- NAGIOS__SERVICEMYSQL_USER (_mysql_user in the service definition)
- NAGIOS__SERVICEMYSQL_PASS (_mysql_pass in the service definition)
- NAGIOS__SERVICEMYSQL_PORT (_mysql_port in the service definition)
- NAGIOS__SERVICEMYSQL_SOCK (_mysql_sock in the service definition)
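The naming pattern in the list above can be sketched as follows: the leading underscore of the custom variable is dropped, the rest is upper-cased and prefixed with NAGIOS__SERVICE. The two functions are hypothetical helpers for illustration only and say nothing about how the plugin reads its environment internally.

```python
import os

def custom_var_env_name(custom_var: str) -> str:
    """Map a service custom variable such as '_mysql_host' to its environment name."""
    # '_mysql_host' -> 'NAGIOS__SERVICEMYSQL_HOST'
    return "NAGIOS__SERVICE" + custom_var.lstrip("_").upper()

def connection_from_environment(environ=os.environ) -> dict:
    """Collect whichever of the five connection values are present."""
    names = ["_mysql_host", "_mysql_user", "_mysql_pass", "_mysql_port", "_mysql_sock"]
    found = {}
    for name in names:
        value = environ.get(custom_var_env_name(name))
        if value is not None:
            found[name.lstrip("_")] = value
    return found

# Hypothetical environment, as it might look during a check run.
example_env = {
    "NAGIOS__SERVICEMYSQL_HOST": "mydb3",
    "NAGIOS__SERVICEMYSQL_USER": "nagios",
}
print(connection_from_environment(example_env))
```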
Examples
nagios$ check_mysql_health --hostname mydb3 --username nagios --password nagios \
  --mode connection-time
OK - 0.03 seconds to connect as nagios | connection_time=0.0337s;1;5
nagios$ check_mysql_health --mode=connection-time
OK - 0.17 seconds to connect | connection_time=0.1740;1;5
nagios$ check_mysql_health --mode querycache-hitrate
CRITICAL - query cache hitrate 70.97% | qcache_hitrate=70.97%;90:;80: qcache_hitrate_now=72.25% selects_per_sec=270.00
nagios$ check_mysql_health --mode querycache-hitrate \
  --warning 80: --critical 70:
WARNING - query cache hitrate 70.82% | qcache_hitrate=70.82%;80:;70: qcache_hitrate_now=62.82% selects_per_sec=420.17
nagios$ check_mysql_health --mode sql \
  --name 'select 111 from dual'
CRITICAL - select 111 from dual: 111 | 'select 111 from dual'=111;1;5
nagios$ echo 'select 111 from dual' |
  check_mysql_health --mode encode
select%20111%20from%20dual
nagios$ check_mysql_health --mode sql \
  --name select%20111%20from%20dual
CRITICAL - select 111 from dual: 111 | 'select 111 from dual'=111;1;5
nagios$ check_mysql_health --mode sql \
  --name select%20111%20from%20dual --name2 myval
CRITICAL - myval: 111 | 'myval'=111;1;5
nagios$ check_mysql_health --mode sql \
  --name select%20111%20from%20dual --name2 myval --units GB
CRITICAL - myval: 111GB | 'myval'=111GB;1;5
nagios$ check_mysql_health --mode sql \
  --name select%20111%20from%20dual --name2 myval --units GB \
  --warning 100 --critical 110
CRITICAL - myval: 111GB | 'myval'=111GB;100;110
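The encoded string in the examples above looks like ordinary percent-encoding. As a sketch, and assuming that this is what mode encode produces (the exact set of characters the plugin escapes is not documented here), the same string can be produced and reversed with Python's standard library:

```python
from urllib.parse import quote, unquote

statement = "select 111 from dual"

# safe="" escapes every reserved character, including "/".
encoded = quote(statement, safe="")
print(encoded)

# Decoding restores the original statement.
print(unquote(encoded))
```

Encoding the statement this way avoids quoting problems when it is passed through command definitions with several layers of shell and macro expansion.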
Installation
The plugin requires a mysql-client package to be installed. Installing the Perl modules DBI and DBD::mysql is desirable, but not mandatory.
After unpacking the archive, run ./configure. Running ./configure --help prints the available options together with their default values for building the plugin.
- --prefix=BASEDIRECTORY
Specify a directory in which check_mysql_health should be stored. (default: /usr/local/nagios)
- --with-nagios-user=SOMEUSER
This user will be the owner of the check_mysql_health file. (default: nagios)
- --with-nagios-group=SOMEGROUP
The group of the check_mysql_health plugin. (default: nagios)
- --with-perl=PATHTOPERL
Specify the path to the perl interpreter you wish to use. (default: perl in PATH)
Download
check_mysql_health-2.2.1.tar.gz
Changelog
- 2.1.1 - 2015-07-29
bugfix in password-expiration
bugfix in rfc-password-encoding and method sqlplus
- 2.1 - 2015-07-25
optimized sql for tablespace-free with --notemp (Thanks Frank)
- 2.0 - 2015-04-23
add rfc3986-encoded passwords
- 1.9.4.9 - 2015-04-23
bugfix in asm-diskgroup-usage perfdata, add perfdata for disk group max size (Thanks Bernhard Keppel)
- 1.9.4.8 - 2015-03-19
Convert ':/' to '_' in --uniquelabels (Thanks Simon Meggle)
- 1.9.4.7 - 2015-03-09
add --uniquelabels for datafiles with identical names (used in io-traffic)
- 1.9.4.6 - 2015-02-20
bugfix in --ident
- 1.9.4.5 - 2014-12-01
bugfix in method sqlplus for special characters in passwords
- 1.9.4.4 - 2014-11-17
switch off -epn again (observed problems in even rarer cases)
- 1.9.4.3 - 2014-11-13
make the plugin more epn-safe
- 1.9.4.2 - 2014-11-11
bugfix in initial handshake, remove an undef split
- 1.9.4.1 - 2014-10-27
switch off -epn again (observed problems in rare cases)
- 1.9.4 - 2014-05-18
enable wallets for the dbi method (Thanks Tommi)
bugfix in initial handshake and 9.x
bugfix in invalid/dba_ind_subpartitions and 9.x
--noperfdata suppresses perfdata
- 1.9.3.7 - 2014-04-06
switch on +epn
- 1.9.3.6 - 2014-04-04
bugfix in method sqlplus. handle expired passwords
- 1.9.3.5 - 2014-04-01
remove leftover sqlnet.log files
implement --negate old_level=new_level
output also ok-messages for my-modes
- 1.9.3.4 - 2014-03-18
allow floating point numbers in thresholds
- 1.9.3.3 - 2014-03-17
bugfix in diskgroups
bugfix in ora-error-handling
- 1.9.3.2 - 2014-01-27
bugfix in a tmp-file-cleanup-routine (only --method sqlplus and many concurrent checks)
- 1.9.3.1 - 2014-01-14
show ORA-errors in stderr (results in warning) coming from sqlplus
- 1.9.3 - 2014-01-13
added subpartitions to invalid-objects
- 1.9.2.1 - 2014-01-09
bugfix in sga-library-cache-pinhit-ratio (Thanks Michel van der Voort)
- 1.9.2 - 2013-12-18
show detailed output for mode corrupted-blocks and report html
protect sga-data-buffer-hit-ratio against non-plausible values
add modes list-asm-diskgroups and asm-diskgroup-usage/free (Thanks Oliver Skibbe!)
- 1.9.1.1 - 2013-12-17
bugfix in connection destructor
- 1.9.1 - 2013-12-13
bugfix in tablespace-free (datafile in status recovery leads to undef error)
- 1.9 - 2013-12-09
rewrite of the timeout handling
bugfix for the windows version
- 1.8.4.3 - 2013-10-16
bugfix in sqlplus fetchrow_array, handles empty result set
- 1.8.4.2 - 2013-10-16
bugfix in invalid objects for oracle < 10.x
- 1.8.4.1 - 2013-10-09
show a subset of perfdata with --name2 and invalid-objects
- 1.8.4 - 2013-10-04
invalid-objects can be selected with --name/name2
- 1.8.3 - 2013-09-30
show detailed output for mode invalid-objects and report html
- 1.8.2.1 - 2013-
bugfix in --name :…
- 1.8.2 - 2013-08-20
enable easy connect syntax with --method sqlplus
- 1.8.1.2 - 2013-07-26
--mode sql --name2 ":<label>" suppresses the output of the numerical result
- 1.8.1.1 - 2013-07-04
invalid-objects takes refresh delay (default: 2 days) into account. (Thanks @chtyo)
- 1.8.1 - 2013-07-02
mode sqlplus can execute stored procedures
- 1.8.0.1 - 2013-06-10
bugfix in sysdba-connect
- 1.8 - 2013-06-04
Don Seiler implemented modes dataguard-lag and dataguard-mrp-status
A big thanks to Don! Monitoring of Oracle Data Guard is now possible
- 1.7.8.2 - 2013-04-26
fix a bug in the online help
- 1.7.8.1 - 2013-04-09
cleanup leftover temp-files (written by method sqlplus)
- 1.7.8 - 2013-03-27
added mode decode
- 1.7.7.3 - 2013-02-21
bugfix in add_nagios, raising a deprecated-message in perl. (Thanks Philip Griesbacher)
- 1.7.7.2 - 2013-01-22
optimized tablespace-can-allocate-next (Thanks Thomas Koerblein)
- 1.7.7.1 - 2013-01-14
bugfix in sqlplus connect with a sys user
- 1.7.7 - 2012-11-29
add parameter --mitigation and --notemp and --noreadonly
- 1.7.6.1 - 2012-11-19
fix a bug with --extra-opts and --environ
- 1.7.6 - 2012-11-10
implemented all sorts of thresholds (Thanks Simon Meggle)
- 1.7.5.1
fix a bug in sga-library-cache-reloads (thresholds compared against pinhits) (Thanks claney)
fix a bug in calcmeth which only is visible with --environ (Thanks Pullistricker)
- 1.7.5
restrict rman problems to backup-operations (not list, report..) (Thanks marekel)
- 1.7.4 - 2012-03-15
bugfix in timeout-alarm handling under windows (Thanks Marian Jamrich)
bugfix in invalid-objects. No longer counts subpartitions (Thanks Teijo Lallukka)
bugfix in session-usage (Thanks Bauchi)
add mode sql-runtime
- 1.7.3 - 2011-09-29
mode sql now correctly handles dml sql errors like missing tables etc.
single ticks around the --name argument under Windows CMD will be removed automatically
- 1.7.2 - 2011-09-21
add mode sga-library-cache-pinhit-ratio
sga-library-cache-hit-ratio becomes sga-library-cache-gethit-ratio
add mode sga-library-cache-reloads
- 1.7.1 - 2011-08-17
add option --commit (Thanks Ovidiu)
- 1.7.0 - 2011-08-16
add error handling for unwritable status files
fix a bug with statefilesdir and capital letters
enhance stale statistics
enhance invalid objects (Thanks Yannick Charton)
fix a bug in open cmdcmd (only affects method sqlplus)
- 1.6.9 - 2011-06-16
sites in an OMD (http://omdistro.org) environment have now private statefile directories
add mode session-usage, process-usage, rman-backup-problems, corrupted-blocks (Thanks Ovidiu Marcu)
add mode datafiles-created (Thanks Ovidiu Marcu)
- 1.6.8.1 - 2011-01-08
Workaround for the windows version which cleans up leftover spool files which cannot be deleted.
- 1.6.8 - 2010-01-03
massive speedup in modes seg-top10-* (Thanks Michael Nieberg http://kenntwas.de)
bugfix in --mode sql (numeric vs. regexp result) (Thanks Michel Meelker)
- 1.6.7 - 2010-12-18
mode sql can now have a non-numerical output which is compared to a string/regexp
new mode report can be used to output only the bad news (short,long,html)
- 1.6.6.2 - 2010-11-11
better error message with method sqlplus when db is down
- 1.6.6.1 - 2010-10-01
--dbthresholds can have an argument
workaround for an oracle-bug in shared-pool-free (Thanks Yannik)
- 1.6.6 - 2010-08-12
new parameter --dbthresholds. thresholds can now also be deposited in the table check_oracle_health_thresholds
bugfix in connection-time. dbuser was uninitialized in rare cases
- 1.6.5 - 2010-08-09
plugin can now run on windows
--with-mymodules-dyn-dir on the commandline overrides the configure-option of the same name
added mode flash-recovery-area-[usage|free]
- 1.6.4
added checking of dba_registry to mode invalid-objects. Thanks Ovidiu Marcu
speedup of tablespace-remaining-time. Thanks Steffen Poulsen
switch-interval detects redo log timestamps in the future and reports critical
method sqlplus now works with "(DESCRIPTION =(ADDRESS = (PROTOCOL = TCP"-like connectstrings
new parameter --ident to show instance and database names in the output
bugfix in tablespace-usage (temp tbs with multiple datafiles). Thanks Philipp Lemke
- 1.6.3 - 2009-09-09
optimized tablespace-can-allocate-next
added more tracing.
fixed a bug which caused invalid statefile names. Thanks Franky van Liedekerke
fixed a bug in tablespace-usage for Oracle 8.1.x
switch-interval now tries to predict the next interval to avoid false alerts. Thanks Naquada.
passwords no longer show up in error messages. Thanks Jens Seiffert.
fixed a bug in mode sql (numbers of the form .5 were rejected). Thanks Shane Jordan.
fixed a bug in sga-latches-hitratio (thresholds were ignored). Thanks Yannik Charton.
login parameter --user is now --username (--user still works)
- 1.6.2 - 2009-04-04
fixed a bug in tablespace-usage and german localization
fixed a bug with --method sqlplus and tablespacenames. Thanks "contact_name"
fixed a bug in tablespace-usage/free with non-autoextensible TEMP-tablespaces. Thanks Daniel Graef.
- 1.6.1 - 2009-03-27
NAGIOS__HOSTMYSQL_HOST is now possible
added detection for offline/damaged tablespaces to --mode=tablespace-usage|free. Thanks Daniel Graef.
- 1.6 - 2009-03-12
support for DBD::SQLRelay (use it. it spares your database the extra load caused by permanent logins)
added support for mode=sql and an array of results. Thanks Juergen Lesny from Matrix.
added support for login as "sys". Thanks Joerg Horchler.
fixed a bug where warning/critical=0 was ignored. Thanks Danijel Tasov.
- 1.5.1 - 2008-12-10
the plugin can be used with the embedded perl interpreter.
fixed some typos. Thanks Oliver Riesen.
- 1.5.0.1 - 2008-10-16
fixed a bug with , instead of . in decimal numbers. Thanks Birk Bohne.
mode=sql numerical results are rounded to two decimal places.
fixed a bug in sga-shared-pool-free. You'll see more free space now. Thanks Birk Bohne.
- 1.5.0 - 2008-10-15
added authentication with password store
added authentication as sysdba
new parameter --units for mode=sql
new parameter tablespace-free which allows thresholds in combination with --units
mode=switch-interval now separates redologs of rac nodes. Thanks Harald Zahn.
it is now possible to integrate self-written code
- 1.4.2.1 - 2008-09-19
bugfix in tablespace-usage. resized datafiles caused usage 100%
- 1.4.2 - 2008-09-16
new mode --regexp which extends --name
bugfix in datafile-io-balance and case sensitive tbs (Thanks Wiltmut Gerdes)
- 1.4.1 - 2008-09-07
new mode tablespace-can-allocate-next
bugfix to handle locked accounts
rewrote seg-top10… sql to avoid overloading
bugfix in timeout
bugfix in mode sql and zero return value. (Thanks Viktor Kaefer)
new mode encode
undo tablespace usage takes into account expired extents
- 1.4.0.1 - 2008-07-07
bugfix when --name=0
bugfix in --method=sqlplus
bugfix in thresholds of invalid-objects (Thanks Konrad Barck)
- 1.4 - 2008-07-03
statesdir is now /var/tmp/check_oracle_health by default (autom. migration if it was /tmp)
bugfix in latch-contention
bugfix in sysstats (thresholds were ignored)
bugfix in roll-extends and roll-wraps
performance enhancements
tablespace-usage can be output as a bargraph (Thanks Allan Peda)
- 1.3.1.2 - 2008-07-02
fixed a bug in disconnect. there were leftover sessions.
- 1.3.1.1 - 2008-07-01
fixed a bug in method=sqlplus and os$user
objects in the recyclebin are no longer treated as invalid
better performance data for pga-in-memory-sort-ratio
fixed a bug in tablespace-usage and temp tbs (Thanks Franky van Liedekerke)
- 1.3.1 - 2008-06-26
typos removed, code cleanup
fixed a bug in connected-users thresholds (Thanks Frank Brehm)
- 1.3 - 2008-06-23
--method=sqlplus using sqlplus instead of DBD::Oracle is possible but NOT supported
!! NOT !! supported. If you use it and it doesn't work, then don't whine about it
tablespace-usage now takes autoextent into account (Thanks Wiltmut Gerdes)
data-buffer/library/dictionary-cache-hitratio is now more accurate
--method=sqlplus does not work for you? I'm not listening, lalalalala
- 1.2.7.1 - 2008-06-20
fixed a bug in windows datafile handling
- 1.2.7 - 2008-06-20
removed unrecoverable datafiles from invalid-objects
added mode sql
bugfixes in top10-x and pga-in-memory-sort-ratio
- 1.2.6.1 - 2008-06-16
added sysstat-rate and list-sysstats
- 1.2.6 - 2008-06-14
added event-waiting
added event-waits
added list-events
- 1.2.5.1 - 2008-06-11
added an abstraction layer so that perl-dbi may be replaced
- 1.2.5 - 2008-06-03
added latch-contention
added enqueue-contention
added enqueue-waiting
added connected-users
added roll-avgactivesize (forget it)
added --list-latches --list-enqueues
- 1.2.4.2 - 2008-05-27
windows pathnames of datafiles are now handled correctly
- 1.2.4.1 - 2008-05-27
added --list-tablespaces --list-datafiles
- 1.2.4 - 2008-05-27
added datafile-io-traffic
added redo-io-traffic
better handling of temp tablespaces
- 1.2.3.1 - 2008-05-25
stale-statistics now works for < 10.x
- 1.2.3 - 2008-05-25
added roll-block-contention
added roll-hit-ratio
fixed a bug in switch-interval
- 1.2.2.1 - 2008-05-23
disabled modes which require minimum 10.x
- 1.2.2 - 2008-05-21
fixed a bug in --environment
- 1.2.1 - 2008-05-19
support for externally authenticated users
new parameters --runas and --environment
sga-buffer-cache-hit-ratio now shows percent (Thanks Maik Ihde)
fixed a bug in tablespace-remaining-time
- 1.2 - 2008-05-06
stale-statistics
connection timeout handling
- 1.1 - 2008-05-02
tablespace-remaining-time predicts when a tablespace will be full
tablespace-io-balance uses standard deviation
- 1.0 - 2008-04-16
Initial release
Copyright
Gerhard Laußer
check_mysql_health is published under the GNU General Public License (GPL).
Author
Gerhard Laußer (gerhard.lausser@consol.de) gladly answers questions about this plugin.
Translation
Thanks to Christian Lauf there is finally an English translation of this page :-)
Official website: https://labs.consol.de/nagios/check_mysql_health/index.html