rebootmgr 1.3

Name

rebootmgrd, rebootmgr.service, org.opensuse.RebootMgr.conf — Reboot the machine during a maintenance window.

Synopsis

/usr/sbin/rebootmgrd [ --debug | --help | --version ]

/usr/lib/systemd/system/rebootmgr.service

/etc/dbus-1/system.d/org.opensuse.RebootMgr.conf

Description

To prevent a whole cluster, or a set of machines with the same task, from rebooting at the same time, rebootmgrd reboots the machine according to configured policies.

Reboot Strategies

rebootmgr supports several strategies that determine when a reboot is performed:

instantly

When the signal arrives, other services are informed that a reboot is planned, and the reboot is performed without acquiring any locks or waiting for a maintenance window.

maint-window

Reboot only during a specified maintenance window. If no window is specified, reboot immediately.

etcd-lock

Acquire a lock at etcd for the specified lock-group before rebooting. If a maintenance window is specified, acquire the lock only during this window. If acquiring the lock takes longer than the duration of the maintenance window, the reboot is canceled and an error is logged. This option is only available if rebootmgrd was compiled with etcd support.

best-effort

This is the default. If etcd is running, use etcd-lock. If no etcd is running, but a maintenance window is specified, use maint-window. If no maintenance window is specified, reboot immediately (instantly).

off

rebootmgr continues to run, but ignores all signals to reboot. Setting the strategy to off does not clear the maintenance window. If rebootmgr is enabled again, it continues to use the previously specified maintenance window.

The reboot strategy can be configured via rebootmgr.conf(5) and adjusted at runtime via rebootmgrctl(1). These changes are written to the configuration file and survive the next reboot.
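
The fallback rules described for best-effort and maint-window can be sketched as a small decision function. This is only an illustration of the rules stated above, not the code used by rebootmgrd; the function name effective_strategy and its two boolean parameters are hypothetical.

```python
def effective_strategy(strategy, etcd_running, window_set):
    """Return the strategy that is actually applied to a reboot request.

    Hypothetical helper: 'etcd_running' and 'window_set' stand for
    "etcd is reachable" and "a maintenance window is configured".
    """
    if strategy == "best-effort":
        # best-effort prefers etcd-lock, then maint-window, then instantly.
        if etcd_running:
            strategy = "etcd-lock"
        elif window_set:
            strategy = "maint-window"
        else:
            strategy = "instantly"
    if strategy == "maint-window" and not window_set:
        # Without a configured window, maint-window reboots immediately.
        return "instantly"
    # "instantly", "etcd-lock" and "off" are returned unchanged.
    return strategy
```

For example, effective_strategy("best-effort", False, True) yields "maint-window", while "off" is always returned as "off", meaning the request is ignored.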

Locking via etcd

To make sure that not all machines reboot at the same time, the machines can be sorted into groups. The number of machines in a group that are allowed to reboot at the same time can be configured and controlled via etcd. For example, you can create a group "etcd_server", which contains all machines running etcd, and specify that only one etcd server is allowed to reboot at a time, and a second group "worker", in which a higher number of machines are allowed to reboot at the same time.

The etcd path to the directory containing data for a group is: "/opensuse.org/rebootmgr/locks/<group>/". This directory contains two variables: "mutex", which is "0" by default and can be set to "1" via atomic_compare_and_swap to make sure that only one machine has write access, and "data", which contains the following JSON structure:

	{
	  "max":1,
	  "holders":[]
	}
      

"holders" contains the unique IDs of the machines holding a lock. The machine ID from /etc/machine-id is used as the unique ID.

So a record containing two locks out of 10 possible would look like:

	{
	  "max":10,
	  "holders":[
	    "3cb8c701b4d3474d99a7e88b31dd3439",
	    "71c8efe539b280af2fe09b3b5771345e"
	  ]
	}
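
As an illustration of how such a record can be read, the following sketch computes the number of free locks from the "data" value and builds the group path given above. The function names free_locks and group_path are hypothetical and not part of rebootmgr.

```python
import json


def group_path(group):
    """Build the etcd directory path for a lock group."""
    return "/opensuse.org/rebootmgr/locks/" + group + "/"


def free_locks(data):
    """Return how many locks are still available in a "data" record.

    'data' is the JSON text stored in the group's "data" variable.
    """
    record = json.loads(data)
    return record["max"] - len(record["holders"])
```

For the record shown above, with two holders out of a maximum of 10, free_locks returns 8.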
      

A typical workflow of a client that tries to reboot looks like this:

  • check that there are free locks; otherwise watch the data variable until it changes

  • get the mutex

  • add our machine ID to the list of machines holding a lock

  • release the mutex

  • reboot

  • on boot, check if we hold a lock. If yes:

    • get the mutex

    • remove the machine ID from the list

    • release the mutex
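
The steps above can be sketched as follows. To keep the example self-contained, it uses an in-memory stand-in for etcd; the class FakeStore and the functions acquire and release are hypothetical and do not represent a real etcd client or the rebootmgrd implementation. A real client would watch the "data" variable or back off instead of spinning on the mutex.

```python
import json


class FakeStore:
    """In-memory stand-in for etcd (assumption, not a real etcd client)."""

    def __init__(self):
        self.values = {}

    def compare_and_swap(self, key, expected, new):
        # Set 'key' to 'new' only if it currently holds 'expected'.
        if self.values.get(key) != expected:
            return False
        self.values[key] = new
        return True


def _update_holders(store, base, change):
    # Get the mutex: flip "0" to "1"; retry until it succeeds.
    while not store.compare_and_swap(base + "mutex", "0", "1"):
        pass
    try:
        record = json.loads(store.values[base + "data"])
        changed = change(record)
        if changed:
            store.values[base + "data"] = json.dumps(record)
        return changed
    finally:
        # Release the mutex again, even if an error occurred.
        store.compare_and_swap(base + "mutex", "1", "0")


def acquire(store, base, machine_id):
    """Add machine_id to "holders" if a lock is free; return success."""
    def change(record):
        if machine_id in record["holders"]:
            return True
        if len(record["holders"]) >= record["max"]:
            return False
        record["holders"].append(machine_id)
        return True
    return _update_holders(store, base, change)


def release(store, base, machine_id):
    """Remove machine_id from "holders", as done on boot."""
    def change(record):
        if machine_id not in record["holders"]:
            return False
        record["holders"].remove(machine_id)
        return True
    return _update_holders(store, base, change)
```

Here base is the group directory, for example "/opensuse.org/rebootmgr/locks/worker/", and the store is expected to already contain the "mutex" and "data" variables for that group.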

Options

--debug

Log additional information at runtime. No real reboot is performed in debug mode.

--help

Display help text and exit.

--version

Output version information and exit.

Environment

ETCD_SERVERS

This environment variable contains a list of URLs of etcd servers. If this variable is not set, "http://127.0.0.1:2379" is used.
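
The default described above can be illustrated with a short lookup. The function name etcd_servers is hypothetical. Because the separator used within the list is not specified here, the sketch returns the value unparsed.

```python
import os

DEFAULT_ETCD_SERVERS = "http://127.0.0.1:2379"


def etcd_servers(environ=None):
    """Return the raw ETCD_SERVERS value, or the default if it is not set."""
    env = os.environ if environ is None else environ
    return env.get("ETCD_SERVERS", DEFAULT_ETCD_SERVERS)
```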

See Also

rebootmgrctl(1), rebootmgr.conf(5), systemd.time(7)