Common server host monitoring procedures
What problems you can prevent
Well-known day-to-day monitoring activities go a long way towards ensuring smooth operation of production systems. With mission critical systems, monitoring becomes even more important. As with all systems, running out of memory, disk space or CPU capacity will bring down the entire portion of the TASSTA framework that the server is responsible for, so it is essential to watch these three parameters. On a smaller scale, downtime of specific services means the corresponding TASSTA functionality becomes unavailable, so the states of these services should also be monitored.
What you need
To implement monitoring of the health of your TASSTA servers, you need to know the basics of shell scripting, setup of cron schedules and operation of the relevant commands that provide information about the use of resources on your server hosts.
How to proceed
This topic only suggests the areas where monitoring is desirable and gives examples of common tools customarily used by Linux administrators for similar purposes. The exact procedures are not described. The article outlines the goals but does not propose any particular workflow for achieving them; such proposals may contradict your administrative experience anyway.
Consider an SLA: Instead of setting up your own monitoring configuration, you might prefer TASSTA to do all the monitoring for you. In that case, contact the Support team about a service-level agreement.
Hardware resource monitoring
The following table shows the critical levels for CPU, memory and disk space usage. If any of these levels is reached on your server, urgent administrative action is needed to reduce the usage of that resource.
Resource | Threshold value |
---|---|
CPU | 90% used |
Memory | 95% used |
Disk space | 95% used |
A popular method to monitor CPU and memory usage is the top command. Importantly, it is available out of the box in the Linux server configuration recommended for TASSTA deployment. To run the program in a non-interactive way (for use in a shell script), you can include the -bn 1 parameters, which select batch mode and a single iteration.
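As an illustration, the summary lines printed by top could be parsed as in the sketch below. It assumes the procps-ng output layout (a line starting with %Cpu(s) and a line such as "MiB Mem : ... total, ... used"); other top versions may print these lines differently. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes procps-ng style summary lines, for example
#   %Cpu(s):  3.1 us,  1.0 sy,  0.0 ni, 95.5 id,  0.2 wa, ...
#   MiB Mem :   8000.0 total,   1000.0 free,   2000.0 used,   5000.0 buff/cache

# Read `top -bn 1` output on stdin and print CPU usage as 100 minus idle.
cpu_used_pct() {
    awk -F'[:,]' '/^%Cpu/ {
        for (i = 2; i <= NF; i++) {
            split($i, f, " ")
            if (f[2] == "id") { printf "%.1f\n", 100 - f[1]; exit }
        }
    }'
}

# Read `top -bn 1` output on stdin and print used memory as a percentage of total.
mem_used_pct() {
    awk -F'[:,]' '/Mem *:/ {
        for (i = 2; i <= NF; i++) {
            split($i, f, " ")
            if (f[2] == "total") total = f[1]
            if (f[2] == "used")  used = f[1]
        }
        if (total + 0 > 0) printf "%.1f\n", 100 * used / total
        exit
    }'
}
```

A possible invocation is `snapshot=$(LC_ALL=C top -bn 1)` followed by `printf '%s\n' "$snapshot" | cpu_used_pct`. Setting LC_ALL=C keeps the decimal separator a dot, which the arithmetic relies on.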
To check disk space usage, the df command is commonly used.
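A minimal sketch of such a check follows. It assumes the POSIX output format requested with df -P, where the fifth column is the capacity and the sixth is the mount point, and it does not handle mount points that contain spaces. The function name is hypothetical.

```shell
#!/bin/sh
# Read `df -P` output on stdin and print "mountpoint percent" for every
# filesystem whose usage is at or above the limit given as the first argument.
disks_over_limit() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)          # "96%" becomes "96"
        if (pct + 0 >= limit + 0) print $6, pct
    }'
}
```

With the threshold from the table above, this could be run as `df -P | disks_over_limit 95`. Empty output means that no filesystem has reached the limit.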
You may want to set up automatic alerts in case the threshold levels are reached. For that, wrap your commands and the parsing of their output in a shell script and schedule the script with cron to run frequently. For the alerting part, you can use sendmail or any other notification method that is convenient to you.
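One way to structure such a script is sketched below. The recipient address, the script path in the cron comment and all function names are placeholders, not part of TASSTA. How the three usage values are measured is left to your own commands.

```shell
#!/bin/sh
# Hypothetical recipient; override through the ALERT_TO environment variable.
ALERT_TO="${ALERT_TO:-admin@example.com}"

# Print a warning line if the value is at or above the limit.
# awk does the comparison so that decimal values such as 93.2 work.
check_limit() {
    awk -v name="$1" -v value="$2" -v limit="$3" 'BEGIN {
        if (value + 0 >= limit + 0)
            printf "%s usage is %s%% (threshold %s%%)\n", name, value, limit
    }'
}

# Mail the given text; with -t, sendmail takes the recipient from the headers.
send_alert() {
    {
        printf 'To: %s\n' "$ALERT_TO"
        printf 'Subject: Resource alert on %s\n\n' "$(uname -n)"
        printf '%s\n' "$1"
    } | sendmail -t
}

# Arguments: CPU, memory and disk usage in percent.
# Sends one mail if at least one threshold is reached, otherwise does nothing.
report() {
    warnings=$(check_limit CPU "$1" 90
               check_limit Memory "$2" 95
               check_limit Disk "$3" 95)
    [ -z "$warnings" ] || send_alert "$warnings"
}

# Example crontab entry (hypothetical path), running the script every 5 minutes:
# */5 * * * * /usr/local/bin/tassta-health-check.sh
```

The thresholds 90, 95 and 95 are taken from the table above. Collecting all warnings before sending keeps the result to one mail per run instead of one per resource.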
Service state monitoring
The following services should be running at all times on a fully functioning T.Lion server. Some of them are TASSTA services, and others are their dependencies.
- amgwservice
- authservice
- billingservice
- configservice
- connectionservice
- crqueue-service
- emcservice
- emergency-service
- eventservice
- fileservice
- mailservice
- maptoolservice
- mongod
- mysqld
- nginx
- recorder
- tasstad
- updates-service
- userjournalservice
- watchdog
The best-known tool for checking the running services is the ps command.
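The following sketch compares a list of expected names with the process list. It assumes that each service runs as a process whose name equals the service name, which this article does not confirm, so verify the actual process names on your server first. The function name is hypothetical.

```shell
#!/bin/sh
# Read `ps -e -o comm=` output on stdin and print every name given as an
# argument for which no process with that name is found.
missing_services() {
    procs=$(cat)
    for svc in "$@"; do
        # Linux truncates the comm field to 15 characters,
        # so compare against the truncated service name.
        short=$(printf '%.15s' "$svc")
        printf '%s\n' "$procs" | grep -qxF -- "$short" || printf '%s\n' "$svc"
    done
}
```

For example, `ps -e -o comm= | missing_services tasstad mongod mysqld nginx` prints nothing when all four processes are present. Because of the truncation, two long names that share their first 15 characters cannot be told apart this way.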
As with hardware resource monitoring, it is a good idea to set up alerts for when any of these services are not running. You can use the toolset mentioned above: shell scripts for invoking commands and parsing the output, cron for scheduling and sendmail for notification.
In addition, watch Syslog for any messages saying the following:
- "License status changed" (in tasstad 5.5.43 and later)
  This can mean TASSTA license invalidation
- "Started TASSTA (TL) Server Daemon"
  This means the tasstad service was restarted
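A simple filter for these two messages might look like the sketch below. Where Syslog is stored depends on the distribution, so the log path in the usage note is an assumption. The function name is hypothetical.

```shell
#!/bin/sh
# Read log lines on stdin and print those that contain either message.
# -F makes grep treat the patterns as fixed strings, so the parentheses
# in the second message are matched literally.
tassta_log_events() {
    grep -F -e 'License status changed' -e 'Started TASSTA (TL) Server Daemon'
}
```

On a host that writes Syslog to /var/log/syslog, this could be run as `tassta_log_events < /var/log/syslog`. Scanning the whole file on every run reports old events again, so a scheduled check would need to remember how far it has already read.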
T.Lion–T.Brother configuration specifics
In a T.Lion–T.Brother failover configuration, additional services should be monitored.
On T.Lion:
- Monitor the keepalived service
- Watch Syslog for messages saying, "Galera status is not OK"
On T.Brother, monitor the following services:
- keepalived
- mongod
- mysqld
- nginx
On T.Arbitrator:
- Monitor the tassta_garb.service service