Supervised failover

Enterprise Edition

Supervised failover is supported by the Enterprise Edition only.

Example on GitHub: supervised_failover

Tarantool provides the ability to control leadership in a replica set using an external failover coordinator. A failover coordinator reads a cluster configuration from a file or an etcd-based configuration storage, polls instances for their statuses, and appoints a leader for each replica set depending on the availability and health of instances.

To increase fault tolerance, you can run two or more failover coordinators. In this case, an etcd cluster provides synchronization between coordinators.

Overview

The main steps of using an external failover coordinator for a newly configured cluster might look as follows:

Configure a cluster to work with an external coordinator. The main step is setting the replication.failover option to supervised for all replica sets that should be managed by the external coordinator.
Start a configured cluster. When an external coordinator is still not running, instances in a replica set start in the following modes:
- If a replica set is already bootstrapped, all instances are started in read-only mode.
- If a replica set is not bootstrapped, one instance is started in read-write mode.
Start a failover coordinator. You can start two or more failover coordinators to increase fault tolerance. In this case, one coordinator is active and others are passive.

Once a cluster and failover coordinators are up and running, a failover coordinator appoints one instance to be a master if there is no master instance in a replica set. Then, the following events may occur:

If a master instance fails, a failover coordinator performs an automated failover.
If an active failover coordinator fails, another coordinator becomes active and performs an automated failover.

Note

Note that a failover coordinator doesn’t work with replica sets with two or more read-write instances. In this case, a coordinator logs a warning to stdout and doesn’t perform any appointments.

Appointing a new master instance

After a master instance has been appointed, a failover coordinator monitors the statuses of all instances in a replica set by sending requests each probe_interval seconds. For the master instance, the coordinator maintains a read-write mode deadline, which is renewed periodically each renew_interval seconds. If all attempts to renew the deadline fail during the specified time interval (lease_interval), the master switches to read-only mode. Then, the coordinator appoints a new instance as the master.

Note

Anonymous replicas are not considered as candidates to be a master.

If a remote etcd-based storage is used to maintain the state of failover coordinators, you can also perform a manual failover.

Active and passive coordinators

To increase fault tolerance, you can run two or more failover coordinators. In this case, only one coordinator is active and used to control leadership in a replica set. Other coordinators are passive and don’t perform any read-write appointments.

To maintain the state of coordinators, Tarantool uses a stateboard – a remote etcd-based storage. This storage uses the same connection settings as a centralized etcd-based configuration storage. If a cluster configuration is stored in the <prefix>/config/* keys in etcd, the failover coordinator looks into <prefix>/failover/* for its state. Here are a few examples of keys used for different purposes:

<prefix>/failover/info/by-uuid/<uuid>: contains a state of a failover coordinator identified by the specified uuid.
<prefix>/failover/active/lock: a unique identifier (UUID) of an active failover coordinator.
<prefix>/failover/active/term: a kind of fencing token allowing to have an order in which coordinators become active (took the lock) over time.
<prefix>/failover/command/<id>: a key used to perform a manual failover.

Configuring a cluster

To configure a cluster to work with an external failover coordinator, follow the steps below:

(Optional) If you need to run several failover coordinators to increase fault tolerance, set up an etcd-based configuration storage, as described in Centralized configuration storages.
Set the replication.failover option to supervised:
```
replication:
  failover: supervised
```

Grant a user used for replication permissions to execute the failover.execute function:

credentials:
  users:
    replicator:
      password: 'topsecret'
      roles: [ replication ]
      privileges:
      - permissions: [ execute ]
        functions: [ 'failover.execute' ]

Create the failover.execute function in the application code. For example, you can create a custom role for this purpose:

-- supervised_instance.lua --
return {
    validate = function()
    end,
    apply = function()
        if box.info.ro then
            return
        end
        local func_name = 'failover.execute'
        local opts = { if_not_exists = true }
        box.schema.func.create(func_name, opts)
    end,
    stop = function()
        if box.info.ro then
            return
        end
        local func_name = 'failover.execute'
        if not box.schema.func.exists(func_name) then
            return
        end
        box.schema.func.drop(func_name)
    end,
}

Then, you need to enable this role for all storage instances:

roles: [ 'supervised_instance' ]

(Optional) Configure options that control how a failover coordinator operates in the failover section:

failover:
  probe_interval: 5
  lease_interval: 15
  renew_interval: 5
  stateboard:
    keepalive_interval: 5
    renew_interval: 1

You can find the full example on GitHub: supervised_failover.

Starting a failover coordinator

To start a failover coordinator, you need to execute the tarantool command with the failover option. This command accepts the path to a cluster configuration file:

tarantool --failover --config instances.enabled/supervised_failover/config.yaml

If a cluster’s configuration is stored in etcd, the config.yaml file contains connection options for the etcd storage.

You can run two or more failover coordinators to increase fault tolerance. In this case, only one coordinator is active and used to control leadership in a replica set. Learn more from Active and passive coordinators.

Performing manual failover

If an etcd-based storage is used to maintain the state of failover coordinators, you can perform a manual failover. External tools can use the <prefix>/failover/command/<id> key to choose a new master. For example, the tt utility provides the tt cluster failover command for managing a supervised failover.

Version:

Supervised failover

Overview

Appointing a new master instance

Active and passive coordinators

Configuring a cluster

Starting a failover coordinator

Performing manual failover