MASAKARI-MONITORS(1) | masakari-monitors | MASAKARI-MONITORS(1)
masakari-monitors - masakari-monitors 17.0.1
Contents:
Monitors for Masakari provides a Virtual Machine High Availability (VMHA) service for OpenStack clouds by automatically detecting failure events such as a VM process going down, the provisioning process going down, and nova-compute host failure. If it detects such events, it sends notifications to the masakari-api.
Original version of Masakari: https://github.com/ntt-sic/masakari
Tokyo Summit Session: https://www.youtube.com/watch?v=BmjNKceW_9A
Monitors for Masakari is distributed under the terms of the Apache License, Version 2.0. The full terms and conditions of this license are detailed in the LICENSE file.
$ git clone https://github.com/openstack/masakari-monitors.git
$ sudo python setup.py install
$ tox -egenconfig
$ masakari-processmonitor
$ masakari-hostmonitor
$ masakari-instancemonitor
At the command line:
$ pip install masakari-monitors
Or, if you have virtualenvwrapper installed:
$ mkvirtualenv masakari-monitors
$ pip install masakari-monitors
Monitors for Masakari:
The masakari-hostmonitor provides compute node High Availability for OpenStack clouds by automatically detecting compute node failures via a monitoring driver.
Here is an example to show how to make up one consul cluster.
Consul is a service mesh solution providing a full featured control plane with service discovery, configuration, and segmentation functionality. Each of these features can be used individually as needed, or they can be used together to build a full service mesh.
The Consul agent is the core process of Consul. The Consul agent maintains membership information, registers services, runs checks, responds to queries, and more.
Consul clients can provide any number of health checks, either associated with a given service or with the local node. This information can be used by an operator to monitor cluster health.
Please refer to Consul Agent Overview.
There are three controller nodes and two compute nodes in the test environment. Every node has three network interfaces. The first interface is used for management, with an IP such as ‘192.168.101.*’. The second interface is used to connect to storage, with an IP such as ‘192.168.102.*’. The third interface is used for tenant traffic, with an IP such as ‘192.168.103.*’.
Download the Consul package for CentOS. For other operating systems, please refer to Download Consul.
sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://rpm.releases.hashicorp.com/RHEL/hashicorp.repo
sudo yum -y install consul
A Consul agent must run on every node. The Consul server agent runs on the controller nodes, while the Consul client agent runs on the compute nodes; together they make up one Consul cluster.
The following is an example of a config file for Consul server agent which binds to management interface of the host.
management.json
{
  "bind_addr": "192.168.101.1",
  "datacenter": "management",
  "data_dir": "/tmp/consul_m",
  "log_level": "INFO",
  "server": true,
  "bootstrap_expect": 3,
  "node_name": "node01",
  "addresses": {
    "http": "192.168.101.1"
  },
  "ports": {
    "http": 8500,
    "serf_lan": 8501
  },
  "retry_join": ["192.168.101.1:8501", "192.168.101.2:8501", "192.168.101.3:8501"]
}
The following is an example of a config file for Consul client agent which binds to management interface of the host.
management.json
{
  "bind_addr": "192.168.101.4",
  "datacenter": "management",
  "data_dir": "/tmp/consul_m",
  "log_level": "INFO",
  "node_name": "node04",
  "addresses": {
    "http": "192.168.101.4"
  },
  "ports": {
    "http": 8500,
    "serf_lan": 8501
  },
  "retry_join": ["192.168.101.1:8501", "192.168.101.2:8501", "192.168.101.3:8501"]
}
Use the tenant or storage interface IP address and ports when configuring agents in the tenant or storage datacenter.
Please refer to Consul Agent Configuration.
The Consul agent is started by the following command.
# consul agent -config-file management.json
After all Consul agents are installed and started, you can see all nodes in the cluster with the following command.
# consul members -http-addr=192.168.101.1:8500
Node    Address             Status  Type    Build   Protocol  DC
node01  192.168.101.1:8501  alive   server  1.10.2  2         management
node02  192.168.101.2:8501  alive   server  1.10.2  2         management
node03  192.168.101.3:8501  alive   server  1.10.2  2         management
node04  192.168.101.4:8501  alive   client  1.10.2  2         management
node05  192.168.101.5:8501  alive   client  1.10.2  2         management
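The members table above can also be checked mechanically. The following is a minimal sketch (not part of masakari-monitors; the function name is illustrative) that parses `consul members` output and reports nodes that are not in the “alive” state:

```python
# Hypothetical helper: scan `consul members` output and collect the names
# of nodes whose Status column is not "alive". Column order follows the
# sample output above: Node, Address, Status, Type, Build, Protocol, DC.
def failed_members(output: str):
    failed = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 3 and fields[2] != "alive":
            failed.append(fields[0])
    return failed

sample = """\
Node    Address             Status  Type    Build   Protocol  DC
node01  192.168.101.1:8501  alive   server  1.10.2  2         management
node04  192.168.101.4:8501  failed  client  1.10.2  2         management
"""
print(failed_members(sample))  # a node that left the cluster shows up here
```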
This section in masakarimonitors.conf shows an example of how to configure the hostmonitor if you choose the monitoring driver based on pacemaker.
[host]
# Driver that hostmonitor uses for monitoring hosts.
monitoring_driver = default
# Monitoring interval (in seconds) of node status.
monitoring_interval = 60
# Do not check whether the host is completely down.
# Possible values:
# * True: Do not check whether the host is completely down.
# * False: Do check whether the host is completely down.
# If the ipmi RA is not set in pacemaker, this value should be set to True.
disable_ipmi_check = False
# Timeout value (in seconds) of the ipmitool command.
ipmi_timeout = 5
# Number of ipmitool command retries.
ipmi_retry_max = 3
# Retry interval (in seconds) of the ipmitool command.
ipmi_retry_interval = 10
# Only monitor pacemaker-remotes, ignore the status of full cluster
# members.
restrict_to_remotes = False
# Standby time (in seconds) until STONITH is activated.
stonith_wait = 30
# Timeout value (in seconds) of the tcpdump command when monitoring
# the corosync communication.
tcpdump_timeout = 5
# The name of the interface that corosync is using for mutual communication
# between hosts.
# If there are multiple interfaces, specify them comma-separated,
# like 'enp0s3,enp0s8'.
# The number of interfaces you specify must be equal to the number of
# corosync_multicast_ports values and must be in the correct order with
# the relevant ports in corosync_multicast_ports.
corosync_multicast_interfaces = enp0s3,enp0s8
# The port numbers that corosync is using for mutual communication
# between hosts.
# If there are multiple port numbers, specify them comma-separated,
# like '5405,5406'.
# The number of port numbers you specify must be equal to the number of
# corosync_multicast_interfaces values and must be in the correct order with
# the relevant interfaces in corosync_multicast_interfaces.
corosync_multicast_ports = 5405,5406
If you want to use or test the monitoring driver based on consul, please modify the following configuration.
[host]
# Driver that hostmonitor uses for monitoring hosts.
monitoring_driver = consul

[consul]
# Address of the local consul agent in the management datacenter.
# The address is made up of the agent's bind_addr and http port,
# such as '192.168.101.1:8500'.
agent_manage = $(CONSUL_MANAGEMENT_ADDR)
# Address of the local consul agent in the tenant datacenter.
agent_tenant = $(CONSUL_TENANT_ADDR)
# Address of the local consul agent in the storage datacenter.
agent_storage = $(CONSUL_STORAGE_ADDR)
# Config file for the consul health action matrix.
matrix_config_file = /etc/masakarimonitors/matrix.yaml
The matrix_config_file defines the HA strategy. The matrix combines host health and actions: ‘health: [x, x, x]’ represents the combined status of the datacenters listed in SEQUENCE, and ‘action’ lists the actions triggered when the host health changes to that combination, where ‘recovery’ triggers one host-failure recovery workflow. Users can define the HA strategy according to their physical environment. For example, if there is just one cluster monitoring management network connectivity, the user only needs to configure $(CONSUL_MANAGEMENT_ADDR) in the consul section of the hostmonitor’s configuration file and change the HA strategy in /etc/masakarimonitors/matrix.yaml as follows:
sequence: ['manage']
matrix:
  - health: ['up']
    action: []
  - health: ['down']
    action: ['recovery']
Then the hostmonitor based on consul works the same as the hostmonitor based on pacemaker.
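The matrix lookup described above can be sketched in a few lines; the function name and data layout below are illustrative and do not reproduce masakari’s actual implementation:

```python
# Hypothetical sketch: given the parsed matrix from matrix.yaml and the
# observed health states (in SEQUENCE order), return the actions to run.
def lookup_actions(matrix, health):
    for entry in matrix:
        if entry["health"] == health:
            return entry["action"]
    return []  # unknown combination: no action

# The single-datacenter strategy from the example above.
matrix = [
    {"health": ["up"], "action": []},
    {"health": ["down"], "action": ["recovery"]},
]
print(lookup_actions(matrix, ["down"]))  # a down host triggers recovery
```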
The masakari-instancemonitor provides Virtual Machine High Availability for OpenStack clouds by automatically detecting VM domain events via libvirt. If it detects specific libvirt events, it sends notifications to the masakari-api.
This section in masakarimonitors.conf shows an example of how to configure the monitor.
[libvirt]
# Override the default libvirt URI.
connection_uri = qemu:///system
The masakari-introspectiveinstancemonitor provides Virtual Machine HA for OpenStack clouds by automatically detecting system-level failure events via the QEMU Guest Agent. If it detects VM heartbeat failure events, it sends notifications to the masakari-api.
This section in masakarimonitors.conf shows an example of how to configure the monitor.
[libvirt]
# Override the default libvirt URI.
connection_uri = qemu:///system

[introspectiveinstancemonitor]
# Guest monitoring interval of VM status (in seconds).
# * The value should not be too low, as there should be no false
#   negatives when reporting QEMU_GUEST_AGENT failures.
# * The VM needs time to power off, so guest_monitoring_interval should
#   be greater than the time needed to SHUTDOWN the VM gracefully.
guest_monitoring_interval = 10
# Guest monitoring timeout (in seconds).
guest_monitoring_timeout = 2
# Failure threshold before sending notification.
guest_monitoring_failure_threshold = 3
# The file path of the qemu guest agent sock.
qemu_guest_agent_sock_path = \
/var/lib/libvirt/qemu/org\.qemu\.guest_agent\..*\.instance-.*\.sock
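Note that qemu_guest_agent_sock_path is a regular expression rather than a literal path. A small sketch showing how such a pattern matches a concrete socket file name (the sample file name is illustrative):

```python
import re

# The configured sock path is a regex with escaped dots; the two '.*'
# groups match the agent channel number and the instance name.
pattern = r"/var/lib/libvirt/qemu/org\.qemu\.guest_agent\..*\.instance-.*\.sock"

# A hypothetical socket file name of the shape libvirt creates.
sock = "/var/lib/libvirt/qemu/org.qemu.guest_agent.0.instance-00000001.sock"

print(bool(re.match(pattern, sock)))  # the pattern matches this sock file
```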
The masakari-processmonitor provides key-process High Availability for OpenStack clouds by automatically detecting process failures. If it detects a process failure, it sends notifications to the masakari-api.
If your OpenStack service runs in a container (pod), the processmonitor will not work as expected. In that case it is recommended not to deploy the processmonitor.
Define one process to be monitored as follows:
process_name: [Name of the process as it appears in 'ps -ef'.]
start_command: [Start command of the process.]
pre_start_command: [Command which is executed before start_command.]
post_start_command: [Command which is executed after start_command.]
restart_command: [Restart command of the process.]
pre_restart_command: [Command which is executed before restart_command.]
post_restart_command: [Command which is executed after restart_command.]
run_as_root: [Bool value indicating whether to execute commands with root privileges.]
Sample of definitions is shown as follows:
# nova-compute
process_name: /usr/local/bin/nova-compute
start_command: systemctl start nova-compute
pre_start_command:
post_start_command:
restart_command: systemctl restart nova-compute
pre_restart_command:
post_restart_command:
run_as_root: True
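A hypothetical sketch of the check the processmonitor performs with such a definition: the process is considered alive if its process_name appears in `ps -ef` output. The function name and sample output below are illustrative, not masakari’s actual code:

```python
# Hypothetical helper: decide whether the configured process_name is
# present in `ps -ef` output. A missing process would trigger the
# restart_command and, on repeated failure, a masakari notification.
def process_running(ps_output: str, process_name: str) -> bool:
    return any(process_name in line for line in ps_output.splitlines())

# Illustrative `ps -ef` output containing the nova-compute process.
ps_sample = (
    "root  1012  1  0 10:00 ?  00:00:03 /usr/local/bin/nova-compute\n"
    "root  1100  1  0 10:00 ?  00:00:00 /usr/sbin/sshd"
)
print(process_running(ps_sample, "/usr/local/bin/nova-compute"))
```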
This section in masakarimonitors.conf shows an example of how to configure the monitor.
[process]
# Interval in seconds for checking a process.
check_interval = 5
# Number of retries upon failure to restart a process.
restart_retries = 3
# Interval in seconds for restarting a process.
restart_interval = 5
# The file path of the process list.
process_list_path = /etc/masakarimonitors/process_list.yaml
The following is an overview of all available configuration options in masakari-monitors. To see a sample configuration file, see Masakari Monitors Sample Configuration File.
Determine if monkey patching should be applied.
Related options:
* monkey_patch_modules: This option must have values set for monkey patching to have any effect.
List of modules/decorators to monkey patch.
This option allows you to patch a decorator for all functions in specified modules.
Related options:
* monkey_patch: This option must be set to True for monkey_patch_modules to have any effect.
Group | Name |
DEFAULT | host |
Full class name for the Manager for instancemonitor.
Full class name for introspectiveinstancemonitor.
Full class name for the Manager for processmonitor.
Full class name for the Manager for hostmonitor.
If set to true, the logging level will be set to DEBUG instead of the default INFO level.
The name of a logging configuration file. This file is appended to any existing logging configuration files. For details about logging configuration files, see the Python logging module documentation. Note that when logging configuration files are used then all logging configuration is set in the configuration file and other logging configuration options are ignored (for example, log-date-format).
Group | Name |
DEFAULT | log-config |
DEFAULT | log_config |
Defines the format string for %(asctime)s in log records. Default: the value above. This option is ignored if log_config_append is set.
Group | Name |
DEFAULT | logfile |
Group | Name |
DEFAULT | logdir |
Uses logging handler designed to watch file system. When log file is moved or removed this handler will open a new log file with specified path instantaneously. It makes sense only if log_file option is specified and Linux platform is used. This option is ignored if log_config_append is set.
Use syslog for logging. Existing syslog format is DEPRECATED and will be changed later to honor RFC5424. This option is ignored if log_config_append is set.
Enable journald for logging. If running in a systemd environment you may wish to enable journal support. Doing so will use the journal native protocol which includes structured metadata in addition to log messages. This option is ignored if log_config_append is set.
Syslog facility to receive log lines. This option is ignored if log_config_append is set.
Use JSON formatting for logging. This option is ignored if log_config_append is set.
Log output to standard error. This option is ignored if log_config_append is set.
Log output to Windows Event Log.
The amount of time before the log files are rotated. This option is ignored unless log_rotation_type is set to “interval”.
Rotation interval type. The time of the last file change (or the time when the service was started) is used when scheduling the next rotation.
Log file maximum size in MB. This option is ignored if “log_rotation_type” is not set to “size”.
Format string to use for log messages with context. Used by oslo_log.formatters.ContextFormatter
Format string to use for log messages when context is undefined. Used by oslo_log.formatters.ContextFormatter
Additional data to append to log message when logging level for the message is DEBUG. Used by oslo_log.formatters.ContextFormatter
Prefix each line of exception output with this format. Used by oslo_log.formatters.ContextFormatter
Defines the format string for %(user_identity)s that is used in logging_context_format_string. Used by oslo_log.formatters.ContextFormatter
List of package logging levels in logger=LEVEL pairs. This option is ignored if log_config_append is set.
The format for an instance that is passed with the log message.
The format for an instance UUID that is passed with the log message.
Log level name used by rate limiting: CRITICAL, ERROR, INFO, WARNING, DEBUG or empty string. Logs with level greater or equal to rate_limit_except_level are not filtered. An empty string means that all levels are filtered.
Configuration options for sending notifications.
Group | Name |
api | tenant-id |
api | tenant_id |
Group | Name |
api | tenant-name |
api | tenant_name |
Optional domain ID to use with v3 and v2 parameters. It will be used for both the user and project domain in v3 and ignored in v2 authentication.
Optional domain name to use with v3 API and v2 parameters. It will be used for both the user and project domain in v3 and ignored in v2 authentication.
Group | Name |
api | user-name |
api | user_name |
Retry interval (in seconds) when notification processing results in an error.
Indicate whether this resource may be shared with the domain received in the requests “origin” header. Format: “<protocol>://<host>[:<port>]”, no trailing slash. Example: https://horizon.example.com
Indicate that the actual request can include user credentials
Indicate which headers are safe to expose to the API. Defaults to HTTP Simple Headers.
Indicate which methods can be used during the actual request.
Indicate which header field names may be used during the actual request.
The path to respond to healthcheck requests on.
Show more detailed information as part of the response. Security note: Enabling this option may expose sensitive details about the service being monitored. Be sure to verify that it will not violate your security policies.
Additional backends that can perform health checks and report that information back as part of a request.
A list of network addresses to limit the source IPs allowed to access healthcheck information. Any request from an IP outside of these network addresses is ignored.
Check the presence of a file to determine if an application is running on a port. Used by DisableByFileHealthcheck plugin.
Check the presence of a file based on a port to determine if an application is running on a port. Expects a “port:path” list of strings. Used by DisableByFilesPortsHealthcheck plugin.
Monitoring probes to collect before making the decision to send a Masakari notification about the node status. If and only if monitoring_samples consecutive reports have the same status will the Masakari notification be sent.
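The consecutive-sample behavior described above can be sketched as a small helper; the class name and return convention are hypothetical, not masakari’s actual code:

```python
from collections import deque

class StatusMonitor:
    """Hypothetical sketch: report a status as stable only after
    monitoring_samples consecutive identical reports."""

    def __init__(self, monitoring_samples: int):
        # A bounded deque keeps only the most recent N reports.
        self.samples = deque(maxlen=monitoring_samples)

    def report(self, status: str):
        self.samples.append(status)
        if (len(self.samples) == self.samples.maxlen
                and len(set(self.samples)) == 1):
            return status  # stable status: a notification would be sent
        return None  # not enough identical consecutive reports yet

m = StatusMonitor(3)
m.report("down"); m.report("down")
print(m.report("down"))  # third identical report makes the status stable
```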
Retry interval (in seconds) when notification processing results in an error.
Do not check whether the host is completely down.
Possible values:
* True: Do not check whether the host is completely down.
* False: Do check whether the host is completely down.
If the ipmi RA is not set in pacemaker, this value should be set to True.
Only monitor pacemaker-remotes, ignore the status of full cluster members.
Timeout value (in seconds) of the tcpdump command when monitoring the corosync communication.
The name of the interface that corosync is using for mutual communication between hosts. If there are multiple interfaces, specify them comma-separated, like ‘enp0s3,enp0s8’. The number of interfaces you specify must be equal to the number of corosync_multicast_ports values and must be in the correct order with the relevant ports in corosync_multicast_ports.
The port numbers that corosync is using for mutual communication between hosts. If there are multiple port numbers, specify them comma-separated, like ‘5405,5406’. The number of port numbers you specify must be equal to the number of corosync_multicast_interfaces values and must be in the correct order with the relevant interfaces in corosync_multicast_interfaces.
Using this option, one can avoid systemd checks that would establish whether this hostmonitor is running alongside Corosync and Pacemaker (the cluster stack) or Pacemaker Remote (the remote stack).
The default (autodetect) ensures backward compatibility and means systemd is used to check the stack.
Guest monitoring interval of VM status (in seconds).
* The value should not be too low, as there should be no false negatives when reporting QEMU_GUEST_AGENT failures.
* The VM needs time to power off, so guest_monitoring_interval should be greater than the time needed to SHUTDOWN the VM gracefully.
* e.g. | 565da9ba-3c0c-4087-83ca | iim1 | ACTIVE | powering-off | Running
Failure threshold before sending notification.
e.g. r’/var/lib/libvirt/qemu/org.qemu.guest_agent..*.instance-.*.sock’
Group | Name |
DEFAULT | osapi_max_request_body_size |
DEFAULT | max_request_body_size |
Whether the application is behind a proxy or not. This determines if the middleware should parse the headers or not.
Interval between re-sending a notification in processmonitor (in seconds).
The file path of process list.
Configure Masakari Monitors by editing /etc/masakarimonitors/masakarimonitors.conf.
No config file is provided with the source code; it will be created during installation. If no configuration file was installed, one can easily be created by running:
tox -e genconfig
To see configuration options available, please refer to Masakari Monitors Configuration Options.
2024, OpenStack Foundation
April 5, 2024 | 17.0.1