boothd - The Booth Cluster Ticket Manager.
[-SD] [-c config] [-l lockfile]
[-c config]
[-c config]
[-c config]
[-c config]
[-D] [-c config]
Booth manages tickets, each of which authorizes one of the cluster sites,
located in geographically dispersed locations, to run certain resources. It is
designed to extend Pacemaker to support geographically distributed clustering.
It is based on the Raft protocol, see e.g.
# boothd daemon -D
# booth list
# booth grant ticket-nfs
# booth revoke ticket-nfs
Configuration to use.
Can be a full path to a configuration file, or a short name; in the latter case,
the directory /etc/booth
and suffix .conf
are added. By default
is used, which results in the path /etc/booth/booth.conf
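A minimal Python sketch of the name-to-path rule above. The helper resolve_config is hypothetical. Treating any argument that contains a slash as a full path is an assumption of this sketch, not documented booth behavior.

```python
# Directory and suffix taken from the description of -c above.
CONF_DIR = "/etc/booth"
CONF_SUFFIX = ".conf"


def resolve_config(name: str) -> str:
    """Map a -c argument to a configuration file path (hypothetical helper).

    Assumption: an argument containing '/' is a full path and is used as-is;
    anything else is a short name that gets the directory and suffix added.
    """
    if "/" in name:
        return name
    return f"{CONF_DIR}/{name}{CONF_SUFFIX}"


print(resolve_config("booth"))           # /etc/booth/booth.conf
print(resolve_config("/tmp/test.conf"))  # /tmp/test.conf
```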
The configuration name also determines the name of the PID file - for the
Site address or name.
The special value 'other' can be used to specify the other
site. Obviously, in that case, the booth configuration must
have exactly two sites defined.
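The following Python sketch shows one way to resolve the special value 'other'. It assumes that 'other' is relative to the site the client itself belongs to. The function resolve_site and the addresses in the usage line are hypothetical.

```python
from typing import List


def resolve_site(value: str, sites: List[str], local: str) -> str:
    """Resolve a site argument (hypothetical helper, sketch only).

    'other' is only meaningful when exactly two sites are configured;
    it then names the site that is not the local one.
    """
    if value != "other":
        return value  # an explicit site address or name is used unchanged
    if len(sites) != 2:
        raise ValueError("'other' requires exactly two sites")
    if local not in sites:
        raise ValueError(f"{local} is not a configured site")
    return sites[1] if sites[0] == local else sites[0]


print(resolve_site("other", ["192.168.55.200", "10.1.1.200"], "192.168.55.200"))
```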
Don’t wait for unreachable sites to relinquish the ticket. See the Booth ticket
section below for more details.
This option may be DANGEROUS. It makes booth grant the ticket
even though it cannot ascertain that unreachable sites don't
hold the same ticket. It is up to the user to make sure that
unreachable sites don't have this ticket as granted.
Wait for the request outcome: the client waits for the final outcome of the
grant or revoke request.
Wait for the ticket commit to the CIB: the client waits for the ticket commit
to the CIB (only for grant requests). If one or more sites are unreachable,
this takes the ticket expire time (plus, if defined, the acquire-after time).
Print a short usage summary.
Report version information.
systemd mode: don’t fork. This
is like -D but without the debug output.
Debug output/don’t daemonize. Increases the debug output level; the booth
daemon remains in the foreground.
Use another lock file. By default, the lock file name is inferred from the
configuration file name. Normally not needed.
Whether the binary is called as boothd
matter; the first argument determines the mode of operation.
Tells boothd to serve a site. The locally configured interfaces are searched
for an IP address that is defined in the configuration. booth then runs in
either arbitrator or site mode.
Booth clients can list the ticket information (see also crm_ticket -L), and
revoke or grant tickets to a site.
The grant and, under certain circumstances, revoke operations may take a while
to return a definite outcome. The client waits up to the network timeout value
(5 seconds by default) for the result, unless the option was set, in which case
the client waits indefinitely.
In this mode the configuration file is searched for an IP address that is
locally reachable, i.e., that matches a configured subnet. This makes it
possible to run the client commands on another node in the same cluster, as
long as the config file and the service IP are locally reachable.
For instance, if the booth service IP is 192.168.55.200, and the local node has
192.168.55.15 configured on one of its network interfaces, it knows which site
it belongs to.
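A Python sketch of this lookup, using the standard ipaddress module. The helper find_local_site is hypothetical. The /24 prefix length on the local address and the second site address are assumptions made for the example; the text only says that the address has to match a configured subnet.

```python
import ipaddress
from typing import Iterable, Optional


def find_local_site(sites: Iterable[str], local_ifaces: Iterable[str]) -> Optional[str]:
    """Return the first configured site address that lies in a local subnet.

    local_ifaces holds interface addresses in CIDR form, e.g. "192.168.55.15/24".
    IPv4 and IPv6 may be mixed; an address never matches a network of the
    other IP version.
    """
    networks = [ipaddress.ip_interface(i).network for i in local_ifaces]
    for site in sites:
        addr = ipaddress.ip_address(site)
        if any(addr in net for net in networks):
            return site
    return None  # this node does not belong to any configured site


print(find_local_site(["192.168.55.200", "10.1.1.200"], ["192.168.55.15/24"]))
```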
to direct the client to connect to a different site.
boothd looks for the (locked) PID file and the UDP socket, prints some output
to stdout (for use in shell scripts), and returns an OCF-compatible return
code. With -D, a human-readable message is printed to stderr as well.
List the other boothd servers we know about.
In addition to the type, name (IP address), and the last time the server was
heard from, network statistics are also printed. The statistics are split into
two rows: the first consists of counters for the sent packets and the second
of counters for the received packets. The first counter is the total number of
packets; descriptions of the other counters follow:
Packets which had to be resent because the recipient didn’t acknowledge a
message. This usually means that either the message or the acknowledgement got
lost. The number of resends usually reflects the network reliability.
Packets which either couldn’t be sent,
got truncated, or were badly formed. Should be zero.
These packets contain either an invalid or a non-existing ticket name, or refer
to a non-existing ticket leader. Should be zero.
Packets which couldn’t be
authenticated. Should be zero.
The configuration file must be identical on all sites and arbitrators.
A minimal file may look like this:
Comments start with a hash sign ('#'). Whitespace at the start and end of the
line, and around the '=', is ignored.
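A Python sketch of these syntax rules. parse_booth_config is a hypothetical helper, not booth's own parser. It assumes that '#' only starts a comment at the beginning of a line, and it keeps repeated keys in order, since some keys may appear more than once.

```python
from typing import List, Tuple


def parse_booth_config(text: str) -> List[Tuple[str, str]]:
    """Split booth-style configuration text into (key, value) pairs (sketch)."""
    pairs = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()  # whitespace at the start and end is ignored
        if not line or line.startswith("#"):
            continue  # blank line or comment
        key, sep, value = line.partition("=")  # split at the first '=' only
        if not sep:
            raise ValueError(f"line {lineno}: expected 'key = value'")
        # whitespace around the '=' is ignored
        pairs.append((key.strip(), value.strip()))
    return pairs


sample = """# example
  transport = udp
port=9930
before-acquire-handler = /usr/share/booth/service-runnable db8
"""
print(parse_booth_config(sample))
```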
The following key/value pairs are defined:
The UDP/TCP port to use. Default is
The transport protocol to use for Raft exchanges. Currently only UDP is
supported.
Clients use TCP to communicate with a daemon; Booth always binds to and listens
on both the UDP and the TCP port.
File containing the authentication key. The
key can be either binary or text. If the latter, both leading and
trailing whitespace, including newlines, is ignored. This key is a shared
secret and used to authenticate both clients and servers. The key must be
between 8 and 64 characters long and be readable only by the file owner.
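A Python sketch of the key rules. Both helpers are hypothetical. The caller passes in whether a key is text or binary, because the text does not say how booth decides this. The length is counted in bytes here.

```python
import stat


def normalize_key(data: bytes, is_text: bool) -> bytes:
    """Apply the documented key rules to raw key file contents (sketch)."""
    if is_text:
        data = data.strip()  # drop leading/trailing whitespace, including newlines
    if not 8 <= len(data) <= 64:
        raise ValueError(f"key length {len(data)} is outside 8..64")
    return data


def owner_only_readable(mode: int) -> bool:
    """True if neither group nor others may read the file (mode as from os.stat)."""
    return (mode & (stat.S_IRGRP | stat.S_IROTH)) == 0


print(normalize_key(b"  secretkey1\n", True))  # b'secretkey1'
print(owner_only_readable(0o600), owner_only_readable(0o644))  # True False
```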
As protection against replay attacks, packets
contain generation timestamps. Such a timestamp is not allowed to be too old.
Just how old can be specified with this parameter. The value is in seconds and
the default is 600 (10 minutes). If clocks vary more than this default between
sites and nodes (which is definitely something you should fix) then set this
parameter to a higher value. The time skew test is performed only in concert
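A Python sketch of the age check. The name max_skew stands in for the configuration parameter described above and is hypothetical. Only the "too old" test from the text is modeled, not the handling of timestamps that lie in the future.

```python
import time
from typing import Optional

DEFAULT_MAX_SKEW = 600  # seconds; the documented default (10 minutes)


def timestamp_fresh(sent: float, now: Optional[float] = None,
                    max_skew: float = DEFAULT_MAX_SKEW) -> bool:
    """Return False if a packet's generation timestamp is too old (sketch)."""
    if now is None:
        now = time.time()
    return now - sent <= max_skew


print(timestamp_fresh(1000, now=1500))  # True: 500 s old
print(timestamp_fresh(1000, now=1700))  # False: 700 s old
```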
Defines a site Raft member with the given IP.
Sites can acquire tickets. The sites' IP should be managed by the
Defines an arbitrator Raft member with the given IP. Arbitrators help reach
consensus in elections and cannot hold tickets.
Booth needs at least three members for normal operation. An odd number of
members provides more redundancy.
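The majority arithmetic behind this advice, as a short Python sketch. A fourth member raises the majority from 2 to 3 without tolerating more unreachable members; a fifth member does tolerate one more. The helper names are hypothetical.

```python
def majority(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members may be unreachable while a majority remains."""
    return members - majority(members)


for n in (2, 3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
# 2 2 0
# 3 2 1
# 4 3 1
# 5 3 2
```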
These define the credentials boothd
will be running with.
On a (Pacemaker) site, the booth process has to call crm_ticket, so
the default is to use 'hacluster':'haclient'; for an arbitrator, this user
and group might not exist, so there we default to
Registers a ticket. Multiple tickets can be handled by a single Booth instance.
Use the special ticket name defaults
to modify the defaults. The
stanza must precede all the other ticket
All times are in seconds.
The lease time for a ticket. After that time
the ticket can be acquired by another site if the ticket holder is not
The default is 600
Once a ticket is lost, wait this time in
addition before acquiring the ticket.
This is to allow for the site that lost the ticket to relinquish the resources,
by either stopping them or fencing a node.
A typical delay might be 60 seconds, but ultimately it depends on the protected
resources and the fencing configuration.
The default is 0
Sets the ticket renewal period.
If the network reliability is often reduced over prolonged periods, it is
advisable to try to renew more often.
Before every renewal, if defined, the command specified in
is run. In that case the renewal-freq
parameter is effectively also the local cluster monitoring interval.
After that time booth
packets if there was an insufficient number of replies. This should be long
enough to allow packets to reach other members.
The default is 5
Defines how many times to retry sending
packets before giving up waiting for acks from other members.
Default is 10. Values lower than 3 are illegal.
Ticket renewals should allow for this number of retries. Hence, the total retry
time must be shorter than the renewal time (either half the expire time or the
explicitly configured renewal period):

    timeout * (retries + 1) < renewal
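A Python sketch of this constraint. It assumes the renewal time is renewal-freq when that is set and half the expire time otherwise. The helper names are hypothetical.

```python
from typing import Optional


def renewal_time(expire: float, renewal_freq: Optional[float] = None) -> float:
    """Renewal period: renewal-freq if set, otherwise half the expire time."""
    return renewal_freq if renewal_freq is not None else expire / 2


def retry_budget_ok(timeout: float, retries: int, renewal: float) -> bool:
    """Check timeout * (retries + 1) < renewal, as stated above."""
    if retries < 3:
        raise ValueError("values lower than 3 are illegal for retries")
    return timeout * (retries + 1) < renewal


print(retry_budget_ok(5, 10, renewal_time(600)))      # True: 55 < 300
print(retry_budget_ok(10, 5, renewal_time(600, 60)))  # False: 60 is not < 60
```

With the documented defaults (expire 600, timeout 5, retries 10), the retries take at most 5 * 11 = 55 seconds, against a renewal time of 300 seconds. Taken literally, the example configuration further down (timeout = 10, retries = 5, renewal-freq = 60) sits exactly at the limit, because 10 * 6 = 60 is not less than 60.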
A comma-separated list of integers that define
the weight of individual Raft members, in the same order as the site
Default is 0 for all; this means that the order in the configuration file
defines priority for conflicting requests.
If set, this command is called before the booth daemon tries to acquire or
renew a ticket. On an exit code other than 0, the booth daemon relinquishes the
ticket.
Thus it is possible to check whether the services protected by the ticket, and
their dependencies, are in good shape at this site. For instance, if a service
in the dependency chain has a failcount of INFINITY on all available nodes, the
service will be unable to run. In that case, it is of no use to claim the
ticket.
See below for details about booth-specific environment variables. The
script is an example which may be used to
test whether a Pacemaker resource can be started.
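A Python sketch of the exit-code rule for the handler. handler_allows_ticket is hypothetical. It neither exports the booth-specific environment variables nor applies a timeout, and the demo uses the running Python interpreter as a stand-in handler. A configured value such as /usr/share/booth/service-runnable db8 could be turned into a command list with shlex.split.

```python
import subprocess
import sys
from typing import Sequence


def handler_allows_ticket(command: Sequence[str]) -> bool:
    """Run the handler; exit code 0 means the ticket may be acquired or kept.

    Any other exit code means the ticket should be relinquished.
    """
    result = subprocess.run(command, check=False)
    return result.returncode == 0


# Demo: the current Python interpreter as a stand-in handler.
ok = handler_allows_ticket([sys.executable, "-c", "raise SystemExit(0)"])
bad = handler_allows_ticket([sys.executable, "-c", "raise SystemExit(1)"])
print(ok, bad)  # True False
```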
Sites can have GEO attributes managed with the
program. Attributes are within the ticket’s scope and may be tested by boothd
for additional control of ticket failover (automatic) or ticket acquisition
(manual).
Attributes are typically used to convey extra information about resources, for
instance database replication status. The attributes are commonly updated by
Attribute values are referenced in expressions and may be tested for equality
with the eq binary operator or for inequality with the ne operator.
The usage is as follows:
attr-prereq = <grant_type> <name> <op> <value>
<grant_type>: "auto" | "manual"
<name>: attribute name
<op>: "eq" | "ne"
<value>: attribute value
The two grant types are auto, for ticket failover, and manual, for grants using
the booth client. The ticket can be granted only if the expression evaluates to
true.
It is not clear whether the manual grant type has any practical use, because
this operation is controlled by a human anyway.
Note that there is no guarantee that an attribute value is up to date, i.e.,
that it actually reflects the current state.
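A Python sketch that evaluates one attr-prereq line, following the grammar above, against a dictionary of attribute values. prereq_satisfied is hypothetical. Two behaviors are assumptions: a missing attribute counts as a failed prerequisite, and a prerequisite for the other grant type is ignored. The sketch does not allow values that contain spaces.

```python
from typing import Dict


def prereq_satisfied(prereq: str, grant_type: str, attrs: Dict[str, str]) -> bool:
    """Evaluate '<grant_type> <name> <op> <value>' for a grant of grant_type."""
    gtype, name, op, value = prereq.split()
    if gtype not in ("auto", "manual") or op not in ("eq", "ne"):
        raise ValueError(f"malformed attr-prereq: {prereq!r}")
    if gtype != grant_type:
        return True  # assumption: prerequisite does not apply to this grant type
    if name not in attrs:
        return False  # assumption: unknown attribute means the test fails
    current = attrs[name]
    return current == value if op == "eq" else current != value


print(prereq_satisfied("auto repl_state eq ACTIVE", "auto", {"repl_state": "ACTIVE"}))
```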
One example of a booth configuration file:
transport = udp
port = 9930
expire = 600
acquire-after = 60
timeout = 10
retries = 5
renewal-freq = 60
before-acquire-handler = /usr/share/booth/service-runnable db8
attr-prereq = auto repl_state eq ACTIVE
The booth cluster guarantees that every ticket is owned by only one site at the
same time.
Tickets must initially be granted with the booth client grant command. Once
granted, the ticket is managed by the booth cluster. Hence, only granted
tickets are managed by booth.
If the ticket gets lost, i.e., the other members of the booth cluster do not
hear from the ticket owner for a sufficiently long time, one of the remaining
sites will acquire the ticket. This is called ticket failover.
If the remaining members cannot form a majority, the ticket cannot fail over.
A ticket may be revoked at any time with the booth client revoke command.
For revoke to succeed, the site holding the ticket must be reachable.
Once the ticket is administratively revoked, it is not managed by the booth
cluster anymore. For the booth cluster to start managing the ticket again, it
must be again granted to a site.
If not all sites are reachable, the grant operation may be delayed by the
ticket expire time (and, if defined, the acquire-after time). The reason is
that the other booth members may not know whether the ticket is currently
granted at the unreachable site.
This delay may be disabled with the -F option. In that case, it is up to
the administrator to make sure that the unreachable site is not holding the
ticket.
When the ticket is managed by booth, it is dangerous to modify it manually
using either the crm_ticket command or crm site ticket. Neither of these tools
is aware of booth and, consequently, booth itself may not be aware of any
ticket status changes. A notable exception is setting the ticket to standby,
which is typically done before a planned failover.
Tickets are not meant to be moved around quickly; the default expire time is
600 seconds (10 minutes).
Booth works with both IPv4 and IPv6 addresses.
Booth renews a ticket before it expires, to account for possible transmission
delays. The renewal time, unless explicitly set, is set to half the expire
time.
Currently, there’s only one external handler defined (see the
configuration item above).
The following environment variables are exported to the handler:
The ticket name, as given in the configuration
file. (See ticket item above.)
The local site name, as defined in
The path to the active configuration file.
The configuration name, as used by the
-c commandline argument.
When the ticket expires (in seconds since
1.1.1970), or 0.
The handler is invoked with positional arguments specified after it.
The default configuration file name. See also
the -c argument.
There is no default, but this is a typical
location for the shared secret (authentication key).
Directory that holds PID/lock files. See also
the status command.
In essence, every ticket corresponds to a separate Raft cluster.
A ticket is granted to the Raft leader, which then owns (or keeps) the ticket.
The booth daemon for an arbitrator, which typically doesn’t run the
cluster stack, may be started through systemd or with
, depending on which init system the
The SysV init script starts a booth arbitrator for every configuration file
found in /etc/booth.
Platforms running systemd can enable and start every configuration separately
# systemctl enable booth@<configurationname>
# systemctl start booth@<configurationname>
requires the configuration name, even for the default name
Success. For the status command: Daemon
General error code.
No daemon process for that configuration
Booth is tested regularly. See the README-testing file for more information.
Please report any bugs either at GitHub:
Or, if you prefer bugzilla, at openSUSE bugzilla (component "High
was originally written (mostly) by Jiaju Zhang.
In 2013 and 2014 Philipp Marek took over maintainership.
Since April 2014 it has been mainly developed by Dejan Muhamedagic.
Many people contributed (see the AUTHORS file).
Copyright © 2011 Jiaju Zhang <firstname.lastname@example.org>
Copyright © 2013-2014 Philipp Marek <email@example.com>
Copyright © 2014 Dejan Muhamedagic <firstname.lastname@example.org>
Free use of this software is granted under the terms of the GNU General Public