Developer’s guide
For a quick start, skip the details below and jump right away to the Cartridge getting started guide.
For a deep dive into what you can develop with Tarantool Cartridge, go on with the Cartridge developer’s guide.
To develop and start an application, you need to go through the following steps:
- Install Tarantool Cartridge and other components of the development environment.
- Create a project.
- Develop the application. In case it is a cluster-aware application, implement its logic in a custom (user-defined) cluster role to initialize the database in a cluster environment.
- Deploy the application to target server(s). This includes configuring and starting the instance(s).
- In case it is a cluster-aware application, deploy the cluster.
The following sections provide details for each of these steps.
Install the following tools:

- Install cartridge-cli, a command-line tool for developing, deploying, and managing Tarantool applications.

  Important

  cartridge-cli is deprecated in favor of the tt CLI utility. This guide uses cartridge-cli as a native tool for Cartridge application development. However, we encourage you to switch to tt in order to simplify the migration to Tarantool 3.0 and newer versions.

- Install git, a version control system.
- Install npm, a package manager for node.js.
- Install the unzip utility.
To set up your development environment, create a project using the Tarantool Cartridge project template. In any directory, run:
$ cartridge create --name <app_name> /path/to/
This will automatically set up a Git repository in a new /path/to/<app_name>/
directory, tag it with version 0.1.0
,
and put the necessary files into it.
In this Git repository, you can develop the application (by simply editing the default files provided by the template), plug the necessary modules, and then easily pack everything to deploy on your server(s).
The project template creates the <app_name>/ directory with the following contents:

- <app_name>-scm-1.rockspec file where you can specify the application dependencies.
- deps.sh script that resolves dependencies from the .rockspec file.
- init.lua file which is the entry point for your application.
- .git directory necessary for a Git repository.
- .gitignore file to ignore the unnecessary files.
- env.lua file that sets common rock paths so that the application can be started from any directory.
- custom-role.lua file that is a placeholder for a custom (user-defined) cluster role.
The entry point file (init.lua
), among other things, loads the cartridge
module and calls its initialization function:
...
local cartridge = require('cartridge')
...
cartridge.cfg({
-- cartridge options example
workdir = '/var/lib/tarantool/app',
advertise_uri = 'localhost:3301',
cluster_cookie = 'super-cluster-cookie',
...
}, {
-- box options example
memtx_memory = 1000000000,
... })
...
The cartridge.cfg()
call renders the instance operable via the administrative
console but does not call box.cfg()
to configure instances.
Warning

Calling the box.cfg() function is forbidden. The cluster itself will do it for you when it is time to:

- bootstrap the current instance once you:
  - run cartridge.bootstrap() via the administrative console, or
  - click Create in the web interface;
- join the instance to an existing cluster once you:
  - run cartridge.join_server({uri = 'other_instance_uri'}) via the console, or
  - click Join (an existing replica set) or Create (a new replica set) in the web interface.
Notice that you can specify a cookie for the cluster (cluster_cookie
parameter)
if you need to run several clusters in the same network. The cookie can be any
string value.
Now you can develop an application that will run on a single or multiple independent Tarantool instances (e.g. acting as a proxy to third-party databases) – or will run in a cluster.
If you plan to develop a cluster-aware application, first familiarize yourself with the notion of cluster roles.
Cluster roles are Lua modules that implement some specific functions and/or logic. In other words, a Tarantool Cartridge cluster segregates instance functionality in a role-based way.
Since all instances running cluster applications use the same source code and are aware of all the defined roles (and plugged modules), you can dynamically enable and disable multiple different roles without restarts, even during cluster operation.
Note that every instance in a replica set performs the same roles and you cannot enable/disable roles individually on some instances. In other words, configuration of enabled roles is set up per replica set. See a step-by-step configuration example in this guide.
The cartridge module comes with two built-in roles that implement automatic sharding:

- vshard-router, which handles vshard's compute-intensive workload: it routes requests to storage nodes.
- vshard-storage, which handles vshard's transaction-intensive workload: it stores and manages a subset of a dataset.

Note

For more information on sharding, see the vshard module documentation.
With the built-in and custom roles, you can develop applications with separated compute and transaction handling – and enable relevant workload-specific roles on different instances running on physical servers with workload-dedicated hardware.
You can implement custom roles for any purposes, for example:
- define stored procedures;
- implement extra features on top of vshard;
- go without vshard at all;
- implement one or multiple supplementary services such as e-mail notifier, replicator, etc.
To implement a custom cluster role, do the following:
1. Take the app/roles/custom.lua file in your project as a sample. Rename this file as you wish, e.g. app/roles/custom-role.lua, and implement the role's logic. For example:

       -- Implement a custom role in app/roles/custom-role.lua
       local role_name = 'custom-role'

       local function init()
           ...
       end

       local function stop()
           ...
       end

       return {
           role_name = role_name,
           init = init,
           stop = stop,
       }

   Here the role_name value may differ from the module name passed to the cartridge.cfg() function. If the role_name variable is not specified, the module name is the default value.

   Note

   Role names must be unique as it is impossible to register multiple roles with the same name.

2. Register the new role in the cluster by modifying the cartridge.cfg() call in the init.lua entry point file:

       -- Register a custom role in init.lua
       ...
       local cartridge = require('cartridge')
       ...
       cartridge.cfg({
           workdir = ...,
           advertise_uri = ...,
           roles = {'custom-role'},
       })
       ...

   where custom-role is the name of the Lua module to be loaded.
The role module does not have any required functions, but the cluster may execute the following ones during the role's life cycle:

- init() is the role's initialization function.

  Inside the function's body you can call any box functions: create spaces, indexes, grant permissions, etc. Here is what the initialization function may look like:

      local function init(opts)
          -- The cluster passes an 'opts' Lua table containing an 'is_master' flag.
          if opts.is_master then
              local customer = box.schema.space.create('customer',
                  { if_not_exists = true }
              )
              customer:format({
                  {'customer_id', 'unsigned'},
                  {'bucket_id', 'unsigned'},
                  {'name', 'string'},
              })
              customer:create_index('customer_id', {
                  parts = {'customer_id'},
                  if_not_exists = true,
              })
          end
      end

  Note

  - Neither vshard-router nor vshard-storage manage spaces, indexes, or formats. You should do it within a custom role: add a box.schema.space.create() call to your first cluster role, as shown in the example above.
  - The function's body is wrapped in a conditional statement that lets you call box functions on masters only. This protects against replication collisions as data propagates to replicas automatically.

- stop() is the role's termination function. Implement it if initialization starts a fiber that has to be stopped or does any job that needs to be undone on termination.
- validate_config() and apply_config() are functions that validate and apply the role's configuration. Implement them if some configuration data needs to be stored cluster-wide.
Next, get a grip on the role’s life cycle to implement the functions you need.
You can instruct the cluster to apply some other roles if your custom role is enabled.
For example:
-- Role dependencies defined in app/roles/custom-role.lua
local role_name = 'custom-role'
...
return {
role_name = role_name,
dependencies = {'cartridge.roles.vshard-router'},
...
}
Here the vshard-router
role will be initialized automatically for every
instance with custom-role
enabled.
Replica sets with vshard-storage
roles can belong to different groups.
For example, hot and cold groups meant to process hot and cold data independently.
Groups are specified in the cluster’s configuration:
-- Specify groups in init.lua
cartridge.cfg({
vshard_groups = {'hot', 'cold'},
...
})
If no groups are specified, the cluster assumes that all replica sets belong
to the default
group.
With multiple groups enabled, every replica set with a vshard-storage
role
enabled must be assigned to a particular group.
The assignment can never be changed.
Another limitation is that you cannot add groups dynamically (this will become available in a future version).
Finally, mind the syntax for router access.
Every instance with a vshard-router
role enabled initializes multiple
routers. All of them are accessible through the role:
local router_role = cartridge.service_get('vshard-router')
router_role.get('hot'):call(...)
If you have no roles specified, you can access a static router as before (when Tarantool Cartridge was unaware of groups):
local vshard = require('vshard')
vshard.router.call(...)
However, when using the current group-aware API, you must call a static router with a colon:
local router_role = cartridge.service_get('vshard-router')
local default_router = router_role.get() -- or router_role.get('default')
default_router:call(...)
The cluster displays the names of all custom roles along with the built-in vshard-*
roles in the web interface.
Cluster administrators can enable and disable them for particular instances –
either via the web interface or via the cluster
public API.
For example:
cartridge.admin.edit_replicaset('replicaset-uuid', {roles = {'vshard-router', 'custom-role'}})
If you enable multiple roles on an instance at the same time, the cluster first
initializes the built-in roles (if any) and then the custom ones (if any) in the
order the latter were listed in cartridge.cfg()
.
If a custom role has dependent roles, the dependencies are registered and validated first, prior to the role itself.
The cluster calls the role’s functions in the following circumstances:
- The init() function, typically, once: either when the role is enabled by the administrator or at the instance restart. Enabling a role once is normally enough.
- The stop() function – only when the administrator disables the role, not on instance termination.
- The validate_config() function, first, before the automatic box.cfg() call (database initialization), then – upon every configuration update.
- The apply_config() function upon every configuration update.
As a tryout, let's task the cluster with some actions and see the order of executing the role's functions (a logging sketch follows the list):

- Join an instance or create a replica set, both with an enabled role:
  1. validate_config()
  2. init()
  3. apply_config()
- Restart an instance with an enabled role:
  1. validate_config()
  2. init()
  3. apply_config()
- Disable role: stop().
- Upon the cartridge.confapplier.patch_clusterwide() call:
  1. validate_config()
  2. apply_config()
- Upon a triggered failover:
  1. validate_config()
  2. apply_config()
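To observe this order yourself, here is a minimal sketch of a role that logs every life cycle call (the role name and log messages are hypothetical):

    -- app/roles/lifecycle-probe.lua: log every life cycle callback
    local log = require('log')

    local role_name = 'lifecycle-probe'

    local function validate_config(conf_new, conf_old)
        log.info('%s: validate_config()', role_name)
        return true
    end

    local function apply_config(conf, opts)
        log.info('%s: apply_config(), is_master = %s', role_name, opts.is_master)
    end

    local function init(opts)
        log.info('%s: init(), is_master = %s', role_name, opts.is_master)
    end

    local function stop()
        log.info('%s: stop()', role_name)
    end

    return {
        role_name = role_name,
        validate_config = validate_config,
        apply_config = apply_config,
        init = init,
        stop = stop,
    }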
Considering the described behavior:

- The init() function may:
  - Call box functions.
  - Start a fiber and, in this case, the stop() function should take care of the fiber's termination (see the sketch after this list).
  - Configure the built-in HTTP server.
  - Execute any code related to the role's initialization.
- The stop() function must undo any job that needs to be undone on the role's termination.
- The validate_config() function must validate any configuration change.
- The apply_config() function may execute any code related to a configuration change, e.g., take care of an expirationd fiber.
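For the fiber case in particular, here is a minimal sketch of an init()/stop() pair where stop() undoes what init() started (the fiber's job is hypothetical):

    -- A role fragment: init() starts a background fiber, stop() terminates it
    local fiber = require('fiber')

    local role_name = 'custom-role'
    local worker -- keeps the fiber object between init() and stop()

    local function init(opts)
        worker = fiber.new(function()
            while true do
                -- do some periodic job here (hypothetical)
                fiber.sleep(1)
            end
        end)
        worker:name(role_name .. '.worker')
    end

    local function stop()
        if worker ~= nil and worker:status() ~= 'dead' then
            worker:cancel() -- undo what init() started
        end
        worker = nil
    end

    return {
        role_name = role_name,
        init = init,
        stop = stop,
    }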
The validation and application functions together allow you to change the cluster-wide configuration as described in the next section.
You can:

1. Store configurations for your custom roles as sections in cluster-wide configuration, for example:

       # in a YAML configuration file
       my_role:
         notify_url: "https://localhost:8080"

       -- in the init.lua file
       local notify_url = 'http://localhost'
       function my_role.apply_config(conf, opts)
           local conf = conf['my_role'] or {}
           notify_url = conf.notify_url or 'default'
       end

2. Download and upload cluster-wide configuration using the web interface or API (via GET/PUT queries to the admin/config endpoint like curl localhost:8081/admin/config and curl -X PUT -d "{'my_parameter': 'value'}" localhost:8081/admin/config).

3. Utilize it in your role's apply_config() function.
Every instance in the cluster stores a copy of the configuration file in its
working directory (configured by cartridge.cfg({workdir = ...})
):
- /var/lib/tarantool/<instance_name>/config.yml for instances deployed from RPM packages and managed by systemd.
- /home/<username>/tarantool_state/var/lib/tarantool/config.yml for instances deployed from tar+gz archives.
The cluster’s configuration is a Lua table, downloaded and uploaded as YAML. If some application-specific configuration data, e.g. a database schema as defined by DDL (data definition language), needs to be stored on every instance in the cluster, you can implement your own API by adding a custom section to the table. The cluster will help you spread it safely across all instances.
Such a section goes into the same file as the topology-specific and vshard-specific sections that the cluster generates automatically. Unlike the generated sections, the custom section's modification, validation, and application logic has to be defined.
The common way is to define two functions:
- validate_config(conf_new, conf_old) to validate changes made in the new configuration (conf_new) versus the old configuration (conf_old).
- apply_config(conf, opts) to execute any code related to a configuration change. As input, this function takes the configuration to apply (conf, which is actually the new configuration that you validated earlier with validate_config()) and options (the opts argument that includes is_master, a Boolean flag described later).
Important
The validate_config()
function must detect all configuration
problems that may lead to apply_config()
errors. For more information,
see the next section.
When implementing validation and application functions that call box
ones for some reason, mind the following precautions:
- Due to the role's life cycle, the cluster does not guarantee an automatic box.cfg() call prior to calling validate_config().

  If the validation function calls any box functions (e.g., to check a format), make sure the calls are wrapped in a protective conditional statement that checks if box.cfg() has already happened:

      -- Inside the validate_config() function:
      if type(box.cfg) == 'table' then
          -- Here you can call box functions
      end

- Unlike the validation function, apply_config() can call box functions freely as the cluster applies custom configuration after the automatic box.cfg() call.

  However, creating spaces, users, etc., can cause replication collisions when performed on both master and replica instances simultaneously. The appropriate way is to call such box functions on masters only and let the changes propagate to replicas automatically.

  Upon the apply_config(conf, opts) execution, the cluster passes an is_master flag in the opts table which you can use to wrap collision-inducing box functions in a protective conditional statement:

      -- Inside the apply_config() function:
      if opts.is_master then
          -- Here you can call box functions
      end
Consider the following code as part of the role’s module (custom-role.lua
)
implementation:
-- Custom role implementation
local cartridge = require('cartridge')
local role_name = 'custom-role'
-- Modify the config by implementing some setter (an alternative to HTTP PUT)
local function set_secret(secret)
local custom_role_cfg = cartridge.confapplier.get_deepcopy(role_name) or {}
custom_role_cfg.secret = secret
cartridge.confapplier.patch_clusterwide({
[role_name] = custom_role_cfg,
})
end
-- Validate
local function validate_config(cfg)
local custom_role_cfg = cfg[role_name] or {}
if custom_role_cfg.secret ~= nil then
assert(type(custom_role_cfg.secret) == 'string', 'custom-role.secret must be a string')
end
return true
end
-- Apply
local function apply_config(cfg)
local custom_role_cfg = cfg[role_name] or {}
local secret = custom_role_cfg.secret or 'default-secret'
-- Make use of it
end
return {
role_name = role_name,
set_secret = set_secret,
validate_config = validate_config,
apply_config = apply_config,
}
Once the configuration is customized, do one of the following:

- continue developing your application and pay attention to its versioning;
- (optional) enable authorization in the web interface;
- in case the cluster is already deployed, apply the configuration cluster-wide.
With the implementation shown in the example,
you can call the set_secret()
function to apply the new configuration via
the administrative console – or an HTTP endpoint if the role exports one.
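For instance, a quick sketch of such a call from the administrative console (the require path is an assumption based on the default project layout):

    -- In the administrative console of any instance:
    local custom_role = require('app.roles.custom-role')

    -- Triggers the two-phase commit described below:
    custom_role.set_secret('new-secret')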
The set_secret()
function calls cartridge.confapplier.patch_clusterwide()
which performs a two-phase commit:
1. It patches the active configuration in memory: copies the table and replaces the "custom-role" section in the copy with the one given by the set_secret() function.
2. The cluster checks if the new configuration can be applied on all instances except disabled and expelled. All instances subject to update must be healthy and alive according to the membership module.
3. (Preparation phase) The cluster propagates the patched configuration. Every instance validates it with the validate_config() function of every registered role. Depending on the validation's result:
   - If successful (i.e., returns true), the instance saves the new configuration to a temporary file named config.prepare.yml within the working directory.
   - (Abort phase) Otherwise, the instance reports an error and all the other instances roll back the update: remove the file they may have already prepared.
4. (Commit phase) Upon successful preparation of all instances, the cluster commits the changes. Every instance:
   1. Creates a hard link of the active configuration.
   2. Atomically replaces the active configuration file with the prepared one. The atomic replacement is indivisible – it can either succeed or fail entirely, never partially.
   3. Calls the apply_config() function of every registered role.
If any of these steps fail, an error pops up in the web interface next to the corresponding instance. The cluster does not handle such errors automatically, they require manual repair.
You will avoid the repair if the validate_config()
function can detect all
configuration problems that may lead to apply_config()
errors.
The cluster launches an httpd
server instance during initialization
(cartridge.cfg()
). You can bind a port to the instance via an environmental
variable:
-- Get the port from an environmental variable or the default one:
local http_port = os.getenv('HTTP_PORT') or '8080'
local ok, err = cartridge.cfg({
...
-- Pass the port to the cluster:
http_port = http_port,
...
})
To make use of the httpd
instance, access it and configure routes inside
the init()
function of some role, e.g. a role that exposes API over HTTP:
local function init(opts)
    ...
    -- Get the httpd instance:
    local httpd = cartridge.service_get('httpd')
    if httpd ~= nil then
        -- Configure a route to, for example, metrics:
        httpd:route({
                method = 'GET',
                path = '/metrics',
                public = true,
            },
            function(req)
                -- 'stat' here is an application-specific module that collects metrics
                return req:render({json = stat.stat()})
            end
        )
    end
end
For more information on using Tarantool’s HTTP server, see its documentation.
To implement authorization in the web interface of every instance in a Tarantool cluster:
1. Implement a new, say, auth module with a check_password function. It should check the credentials of any user trying to log in to the web interface.

   The check_password function accepts a username and password and returns an authentication success or failure.

       -- auth.lua
       -- Add a function to check the credentials
       local function check_password(username, password)
           -- Check the credentials any way you like;
           -- 'ok' here stands for the result of your check.
           -- Return an authentication success or failure:
           if not ok then
               return false
           end
           return true
       end
       ...

2. Pass the implemented auth module name as a parameter to cartridge.cfg(), so the cluster can use it:

       -- init.lua
       local ok, err = cartridge.cfg({
           auth_backend_name = 'auth',
           -- The cluster will automatically call 'require()' on the 'auth' module.
           ...
       })
This adds a Log in button to the upper right corner of the web interface but still lets unauthenticated users interact with the interface. This is convenient for testing.
Note
Also, to authorize requests to cluster API, you can use the HTTP basic authorization header.
To require the authorization of every user in the web interface even before the cluster bootstrap, add the following line:
-- init.lua
local ok, err = cartridge.cfg({
    auth_backend_name = 'auth',
    auth_enabled = true,
    ...
})
With the authentication enabled and the auth module implemented, the user will not be able to even bootstrap the cluster without logging in. After a successful login and bootstrap, authentication can be enabled and disabled cluster-wide in the web interface, and the auth_enabled parameter is ignored.
Tarantool Cartridge understands semantic versioning as described at semver.org. When developing an application, create new Git branches and tag them appropriately. These tags are used to calculate version increments for subsequent packing.
For example, if your application has version 1.2.1, tag your current branch with
1.2.1
(annotated or not).
To retrieve the current version from Git, run:
$ git describe --long --tags
1.2.1-12-g74864f2
This output shows that we are 12 commits after the version 1.2.1. If we are
to package the application at this point, it will have a full version of
1.2.1-12
and its package will be named <app_name>-1.2.1-12.rpm
.
Non-semantic tags are prohibited. You will not be able to create a package from a branch with the latest tag being non-semantic.
Once you package your application, the version
is saved in a VERSION
file in the package root.
You can add a .cartridge.ignore
file to your application repository to
exclude particular files and/or directories from package builds.
For the most part, the logic is similar to that of .gitignore
files.
The major difference is that in .cartridge.ignore
files the order of
exceptions relative to the rest of the templates does not matter, while in
.gitignore
files the order does matter.
.cartridge.ignore entry | ignores every… |
---|---|
target/ | folder (due to the trailing /) named target, recursively |
target | file or folder named target, recursively |
/target | file or folder named target in the top-most directory (due to the leading /) |
/target/ | folder named target in the top-most directory (leading and trailing /) |
*.class | every file or folder ending with .class, recursively |
#comment | nothing, this is a comment (the first character is a #) |
\#comment | every file or folder with name #comment (\ for escaping) |
target/logs/ | every folder named logs which is a subdirectory of a folder named target |
target/*/logs/ | every folder named logs two levels under a folder named target (* doesn't include /) |
target/**/logs/ | every folder named logs somewhere under a folder named target (** includes /) |
*.py[co] | every file or folder ending in .pyc or .pyo; however, it doesn't match .py! |
*.py[!co] | every file or folder ending in anything other than c or o |
*.file[0-9] | every file or folder ending in a digit |
*.file[!0-9] | every file or folder ending in anything other than a digit |
* | everything |
/* | everything in the top-most directory (due to the leading /) |
**/*.tar.gz | every *.tar.gz file which is one or more levels under the starting folder |
!file | nothing: ! negates the pattern, so a matching file or folder is not ignored even if it matches other patterns |
An important concept in cluster topology is appointing a leader. A leader is an instance responsible for performing key operations. To keep things simple, you can think of a leader as the only writable master. Every replica set has its own leader, and there's usually no more than one.
Which instance will become a leader depends on topology settings and failover configuration.
An important topology parameter is the failover priority within a replica set. This is an ordered list of instances. By default, the first instance in the list becomes a leader, but with the failover enabled it may be changed automatically if the first one is malfunctioning.
When Cartridge configures roles, it takes into account the leadership map
(consolidated in the failover.lua
module). The leadership map is composed when
the instance enters the ConfiguringRoles
state for the first time. Later
the map is updated according to the failover mode.
Every change in the leadership map is accompanied by instance
re-configuration. When the map changes, Cartridge updates the read_only
setting and calls the apply_config
callback for every role. It also
specifies the is_master
flag (which actually means is_leader
, but hasn’t
been renamed yet due to historical reasons).
It’s important to say that we discuss a distributed system where every instance has its own opinion. Even if all opinions coincide, there still may be races between instances, and you (as an application developer) should take them into account when designing roles and their interaction.
The logic behind leader election depends on the failover mode: disabled, eventual, or stateful.
This is the simplest case. The leader is always the first instance in the failover priority. No automatic switching is performed. When it’s dead, it’s dead.
In the eventual
mode, the leader isn’t elected consistently. Instead, every
instance in the cluster thinks that the leader is the first healthy instance
in the failover priority list, while instance health is determined according to
the membership status (the SWIM protocol).
This mode is not recommended for large production clusters. If you have a high-load production cluster, use stateful failover with etcd instead.
The member is considered healthy if both are true:
- It reports either the ConfiguringRoles or the RolesConfigured state;
- Its SWIM status is either alive or suspect.
A suspect member becomes dead after the failover_timeout expires.
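You can inspect the SWIM statuses an instance currently sees from its administrative console; a small sketch using the membership module that Cartridge is built upon:

    -- Print the URI and SWIM status of every known member:
    local membership = require('membership')

    for uri, member in pairs(membership.members()) do
        print(uri, member.status) -- 'alive', 'suspect', 'dead', or 'left'
    end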
Leader election is done as follows. Suppose there are two replica sets in the cluster:
- a single router “R”,
- two storages, “S1” and “S2”.
Then we can say: all the three instances (R, S1, S2) agree that S1 is the leader.
The SWIM protocol guarantees that eventually all instances will find a common ground, but it’s not guaranteed for every intermediate moment of time. So we may get a conflict.
For example, soon after S1 goes down, R is already informed and thinks that S2 is the leader, but S2 hasn't received the gossip yet and still doesn't consider itself the leader. This is a conflict.

Similarly, when S1 recovers and takes the leadership, S2 may be unaware of that yet. So, both S1 and S2 consider themselves leaders.
Moreover, the SWIM protocol isn't perfect and can still produce false-negative gossips (announcing an instance dead when it's not). This may cause "failover storms", when failover triggers too many times per minute under a high load. You can pause failover at runtime using the Lua API (require('cartridge.lua-api.failover').pause()) or a GraphQL mutation (mutation { cluster { failover_pause } }). Those functions will pause failover on every instance they can reach. To see if failover is paused, check the logs or use the function require('cartridge.failover').is_paused().
Don't forget to resume failover using the Lua API
(require('cartridge.lua-api.failover').resume()
) or GraphQL mutation
(mutation { cluster { failover_resume } }
).
You can also enable failover suppression via the cartridge.cfg parameter enable_failover_suppressing. It automatically pauses failover at runtime if failover triggers too many times per minute. It can be configured by the argparse parameters failover_suppress_threshold (the number of failover triggers per failover_suppress_timeout that causes suppression) and failover_suppress_timeout (time in seconds: if failover triggers more than failover_suppress_threshold times within this period, it is suppressed and released after failover_suppress_timeout seconds).
Similarly to the eventual mode, every instance composes its own leadership map, but now the map is fetched from an external state provider (that's why this failover mode is called "stateful"). Two state providers are currently supported: etcd and the stateboard (a standalone Tarantool instance). The state provider serves as a domain-specific key-value storage (simply replicaset_uuid -> leader_uuid) and a locking mechanism.
Changes in the leadership map are obtained from the state provider with the long polling technique.
All decisions are made by the coordinator – the one that holds the lock. The coordinator is implemented as a built-in Cartridge role. There may be many instances with the coordinator role enabled, but only one of them can acquire the lock at the same time. We call this coordinator the “active” one.
The lock is released automatically when the TCP connection is closed, or it
may expire if the coordinator becomes unresponsive (in stateboard
it’s set
by the stateboard’s --lock_delay
option, for etcd
it’s a part of
clusterwide configuration), so the coordinator renews the lock from
time to time in order to be considered alive.
The coordinator makes a decision based on the SWIM data, but the decision algorithm is slightly different from that in the case of eventual failover:

- Right after acquiring the lock from the state provider, the coordinator fetches the leadership map.
- If there is no leader appointed for the replica set, the coordinator appoints the first leader according to the failover priority, regardless of the SWIM status.
- If a leader becomes dead, the coordinator makes a decision. A new leader is the first healthy instance from the failover priority list. If an old leader recovers, no leader change is made until the current leader goes down. Changing the failover priority doesn't affect this.
- Every appointment (self-made or fetched) is immune for a while (controlled by the IMMUNITY_TIMEOUT option).
You can also enable leader_autoreturn to return leadership to the first leader in the failover_priority list after failover was triggered. It might be useful when you have active and passive data centers. The time before failover tries to return the leader is configured by the autoreturn_delay option in the failover configuration. Note that leader_autoreturn won't work if the prime leader is unhealthy.
Stateful failover automatically checks if there is a registered cluster in the state provider. The check is performed on the first stateful failover configuration and every time the cluster is restarted. You can disable it by setting check_cookie_hash = false in the failover configuration.
Stateful failover may call box.ctl.promote on the leader instance. It doesn't work with ALL_RW replica sets and replica sets with a single existing or enabled node. It works on any Tarantool version where box.ctl.promote is available. If you face any issue with promoting, you can try calling it manually on the leader. If you want to enable this functionality, enable it in your init.lua file:
cartridge.cfg({
...
enable_synchro_mode = true,
})
If the external state provider is unavailable, instances do nothing: the leader remains a leader, read-only instances remain read-only. If any instance restarts during an external state provider outage, it composes an empty leadership map: it doesn't know who actually is a leader and thinks there is none.
An active coordinator may be absent in a cluster either because of a failure or due to disabling the role on all instances. Just like in the previous case, instances do nothing about it: they keep fetching the leadership map from the state provider. But it will remain the same until a coordinator appears.
Raft failover in Cartridge is based on the built-in Tarantool Raft failover, namely the box.ctl.on_election trigger introduced in Tarantool 2.10.0, and on the eventual failover mechanisms. The replica set leader is chosen by built-in Raft, then the other replica sets learn about the leader change from membership (this is needed to use Cartridge RPC calls). The user can control an instance's election mode using the argparse option TARANTOOL_ELECTION_MODE or --election-mode, or use the box.cfg{election_mode = ...} API at runtime.
Raft failover can be enabled only on replica sets of 3 or more instances (you can change this behavior by using the cartridge.cfg option disable_raft_on_small_clusters) and can't be enabled with ALL_RW replica sets.
Important
Raft failover in Cartridge is in beta. Don’t use it in production.
Leader promotion differs a lot depending on the failover mode.
In the disabled and eventual modes, you can only promote a leader by changing the failover priority (and applying a new clusterwide configuration).
In the stateful mode, the failover priority doesn’t make much sense (except for
the first appointment). Instead, you should use the promotion API
(the Lua cartridge.failover_promote or
the GraphQL mutation {cluster{failover_promote()}}
)
which pushes manual appointments to the state provider.
The stateful failover mode implies consistent promotion: before becoming
writable, each instance performs the wait_lsn
operation to sync up with the
previous one.
Information about the previous leader (we call it a vclockkeeper) is also stored on the external storage. Even when the old leader is demoted, it remains the vclockkeeper until the new leader successfully awaits and persists its vclock on the external storage.
If replication is stuck and consistent promotion isn't possible, a user has two options: to revert the promotion (to re-promote the old leader) or to force it inconsistently (every kind of failover_promote API has a force_inconsistency flag).
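A sketch of such a manual promotion from Lua (the UUIDs are placeholders; check the exact cartridge.failover_promote() signature against the API reference):

    local cartridge = require('cartridge')

    -- Map of replicaset_uuid -> the instance_uuid to promote:
    local ok, err = cartridge.failover_promote({
        ['bbbbbbbb-0000-0000-0000-000000000000'] = 'bbbbbbbb-0000-0000-0000-000000000002',
    }, {
        -- Force promotion even if wait_lsn can't complete (use with care):
        force_inconsistency = false,
    })
    assert(ok, tostring(err))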
Consistent promotion doesn't work for replica sets with the all_rw flag enabled and for single-instance replica sets. In these two cases an instance doesn't even try to query the vclockkeeper or to perform wait_lsn. But the coordinator still appoints a new leader if the current one dies.
In the Raft failover mode, the user can also use the promotion API:
cartridge.failover_promote in Lua or
mutation {cluster{failover_promote()}}
in GraphQL,
which calls box.ctl.promote
on the specified instances.
Note that box.ctl.promote
starts fair elections, so some other instance may
become the leader in the replicaset.
You can restrict the election of a particular node in the stateful failover mode by GraphQL or Lua API. An "unelectable" node can't become a leader in a replica set. It can be useful for nodes that should only take part in the election process, or for routers that shouldn't store data.
In edit_topology
:
{
"replicasets": [
{
"alias": "storage",
"uuid": "aaaaaaaa-aaaa-0000-0000-000000000000",
"join_servers": [
{
"uri": "localhost:3301",
"uuid": "aaaaaaaa-aaaa-0000-0000-000000000001",
"electable": false
}
],
"roles": []
}
]
}
In Lua API:
-- to make nodes unelectable:
require('cartridge.lua-api.topology').api_topology.set_unelectable_servers(uuids)
-- to make nodes electable:
require('cartridge.lua-api.topology').api_topology.set_electable_servers(uuids)
You can also make a node unelectable in the WebUI. If everything is OK, you will see a crossed-out crown to the left of the instance name.
Neither eventual
nor stateful
failover mode protects a replicaset
from the presence of multiple leaders when the network is partitioned.
But fencing does. It enforces at-most-one leader policy in a replicaset.
Fencing operates as a fiber that occasionally checks connectivity with the state provider and with replicas. The fencing fiber runs on vclockkeepers; it starts right after a consistent promotion succeeds. Replica sets that don't need consistency (single-instance and all_rw) aren't protected, though.
The condition for fencing actuation is the loss of both the state provider quorum and at least one replica. Otherwise, if either state provider is healthy or all replicas are alive, the fencing fiber waits and doesn’t intervene.
When fencing is actuated, it generates a fake appointment locally and
sets the leader to nil
. Consequently, the instance becomes
read-only. Subsequent recovery is only possible when the quorum
reestablishes; replica connection isn’t a must for recovery. Recovery is
performed according to the rules of consistent switchover unless some
other instance has already been promoted to a new leader.
Raft failover supports fencing too. Check the election_fencing_mode parameter of box.cfg{}.
These are clusterwide parameters:
- mode: "disabled" / "eventual" / "stateful" / "raft".
- state_provider: "tarantool" / "etcd".
- failover_timeout – time (in seconds) to mark suspect members as dead and trigger failover (default: 20).
- tarantool_params: {uri = "...", password = "..."}.
- etcd2_params: {endpoints = {...}, prefix = "/", lock_delay = 10, username = "", password = ""}.
- fencing_enabled: true / false (default: false).
- fencing_timeout – time to actuate fencing after the check fails (default: 10).
- fencing_pause – the period of performing the check (default: 2).
- leader_autoreturn: true / false (default: false).
- autoreturn_delay – the time before failover tries to return the leader in a replica set to the first instance in the failover_priority list (default: 300).
- check_cookie_hash – enable the check that nobody else uses this stateboard.
It’s required that failover_timeout > fencing_timeout >= fencing_pause
.
Use your favorite GraphQL client (e.g. Altair) for request introspection:

- query {cluster{failover_params{}}},
- mutation {cluster{failover_params(){}}},
- mutation {cluster{failover_promote()}}.
Here is an example of how to set up stateful failover:
mutation {
cluster { failover_params(
mode: "stateful"
failover_timeout: 20
state_provider: "etcd2"
etcd2_params: {
endpoints: ["http://127.0.0.1:4001"]
prefix: "etcd-prefix"
}) {
mode
}
}
}
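The same setup can also be done from Lua. A sketch assuming the cartridge.failover_set_params() helper (check the API reference for the exact option names):

    local cartridge = require('cartridge')

    local ok, err = cartridge.failover_set_params({
        mode = 'stateful',
        failover_timeout = 20,
        state_provider = 'etcd2',
        etcd2_params = {
            endpoints = {'http://127.0.0.1:4001'},
            prefix = 'etcd-prefix',
        },
    })
    assert(ok, tostring(err))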
Like other Cartridge instances, the stateboard supports cartridge.argparse options:

- listen
- workdir
- password
- lock_delay
Similarly to other argparse
options, they can be passed via
command-line arguments or via environment variables, e.g.:
.rocks/bin/stateboard --workdir ./dev/stateboard --listen 4401 --password qwerty
Besides failover priority and mode, there are some other private options that influence failover operation:
- LONGPOLL_TIMEOUT (failover) – the long polling timeout (in seconds) to fetch new appointments (default: 30);
- NETBOX_CALL_TIMEOUT (failover/coordinator) – the stateboard client's connection timeout (in seconds) applied to all communications (default: 1);
- RECONNECT_PERIOD (coordinator) – time (in seconds) to reconnect to the state provider if it's unreachable (default: 5);
- IMMUNITY_TIMEOUT (coordinator) – the minimal amount of time (in seconds) to wait before overriding an appointment (default: 15).
Cartridge orchestrates a distributed system of Tarantool instances – a cluster. One of the core concepts is clusterwide configuration. Every instance in a cluster stores a copy of it.
Clusterwide configuration contains options that must be identical on every cluster node, such as the topology of the cluster, failover and vshard configuration, authentication parameters and ACLs, and user-defined configuration.
Clusterwide configuration doesn’t provide instance-specific parameters: ports, workdirs, memory settings, etc.
Instance configuration includes two sets of parameters:

- cartridge.cfg() parameters;
- box.cfg() parameters.
You can set any of these parameters in:
- command-line arguments;
- environment variables;
- a YAML configuration file;
- the init.lua file.
The order here indicates the priority: command-line arguments override environment variables, and so forth.
No matter how you start the instances, you need to set
the following cartridge.cfg()
parameters for each instance:
- advertise_uri – either <HOST>:<PORT>, or <HOST>:, or <PORT>. Used by other instances to connect to the current one. DO NOT specify 0.0.0.0 – this must be an external IP address, not a socket bind.
- http_port – port to open the administrative web interface and API on. Defaults to 8081. To disable it, specify "http_enabled": False.
- workdir – a directory where all data will be stored: snapshots, wal logs, and the cartridge configuration file. Defaults to . (the current directory).
If you start instances using cartridge
CLI or systemctl
,
save the configuration as a YAML file, for example:
my_app.router: {"advertise_uri": "localhost:3301", "http_port": 8080}
my_app.storage_A: {"advertise_uri": "localhost:3302", "http_enabled": False}
my_app.storage_B: {"advertise_uri": "localhost:3303", "http_enabled": False}
With cartridge
CLI, you can pass the path to this file as the --cfg
command-line argument to the cartridge start
command – or specify the path
in cartridge
CLI configuration (in ./.cartridge.yml
or ~/.cartridge.yml
):
cfg: cartridge.yml
run-dir: tmp/run
With systemctl
, save the YAML file to /etc/tarantool/conf.d/
(the default systemd
path) or to a location set in the TARANTOOL_CFG
environment variable.
If you start instances with tarantool init.lua
,
you need to pass other configuration options as command-line parameters and
environment variables, for example:
$ tarantool init.lua --alias router --memtx-memory 100 --workdir "~/db/3301" --advertise_uri "localhost:3301" --http_port "8080"
In the file system, clusterwide configuration is represented by a file tree.
Inside workdir
of any configured instance you can find the following
directory:
config/
├── auth.yml
├── topology.yml
└── vshard_groups.yml
This is the clusterwide configuration with three default config sections –
auth
, topology
, and vshard_groups
.
Due to historical reasons clusterwide configuration has two appearances:
- the old-style single-file config.yml with all sections combined, and
- the modern multi-file representation mentioned above.
Before Cartridge v2.0, it used to look as follows. This representation is still used in the HTTP API and luatest helpers.
# config.yml
---
auth: {...}
topology: {...}
vshard_groups: {...}
...
Beyond these essential sections, clusterwide configuration may be used for storing some other role-specific data. Clusterwide configuration supports YAML as well as plain text sections. It can also be organized in nested subdirectories.
In Lua it’s represented by the ClusterwideConfig
object (a table with
metamethods). Refer to the cartridge.clusterwide-config
module
documentation for more details.
Cartridge keeps clusterwide configuration identical everywhere using the two-phase commit algorithm implemented in the cartridge.twophase module. Changes in clusterwide configuration imply applying it on every instance in the cluster.
Almost every change in cluster parameters triggers a two-phase commit: joining/expelling a server, editing replica set roles, managing users, setting failover and vshard configuration.
Two-phase commit requires all instances to be alive and healthy, otherwise it returns an error.
For more details, please, refer to the
cartridge.config_patch_clusterwide
API reference.
Role-specific sections are used by some third-party roles, e.g. sharded-queue and cartridge-extensions.
A user can influence clusterwide configuration in various ways: you can alter configuration using the Lua, HTTP, or GraphQL API. There are also luatest helpers available.

The HTTP API works with the old-style single-file representation only. It's useful when only a few sections are needed.
Example:
cat > config.yml << CONFIG
---
custom_section: {}
...
CONFIG
Upload new config:
curl -v "localhost:8081/admin/config" -X PUT --data-binary @config.yml
Download it:
curl -v "localhost:8081/admin/config" -o config.yml
It's suitable for role-specific sections only. System sections (topology, auth, vshard_groups, users_acl) can be neither uploaded nor downloaded.
If authorization is enabled, use the curl
option --user username:password
.
GraphQL API, by contrast, is only suitable for managing plain-text sections in the modern multi-file appearance. It is mostly used by WebUI, but sometimes it’s also helpful in tests:
g.cluster.main_server:graphql({query = [[
mutation($sections: [ConfigSectionInput!]) {
cluster {
config(sections: $sections) {
filename
content
}
}
}]],
variables = {sections = {
{
filename = 'custom_section.yml',
content = '---\n{}\n...',
}
}}
})
Unlike HTTP API, GraphQL affects only the sections mentioned in the query. All the other sections remain unchanged.
Similarly to HTTP API, GraphQL cluster {config}
query isn’t suitable for
managing system sections.
The Lua API is not the most convenient way to configure a third-party role, but it may be useful for role development. Please refer to the corresponding API reference:
- cartridge.config_patch_clusterwide
- cartridge.config_get_deepcopy
- cartridge.config_get_readonly
Example (from sharded-queue
, simplified):
function create_tube(tube_name, tube_opts)
    local tubes = cartridge.config_get_deepcopy('tubes') or {}
    tubes[tube_name] = tube_opts or {}
    return cartridge.config_patch_clusterwide({tubes = tubes})
end

local function validate_config(conf)
    local tubes = conf.tubes or {}
    for tube_name, tube_opts in pairs(tubes) do
        -- validate tube_opts
    end
    return true
end

local function apply_config(conf, opts)
    if opts.is_master then
        local tubes = conf.tubes or {}
        -- create tubes according to the configuration
    end
    return true
end
Cartridge test helpers provide methods for configuration management:
cartridge.test-helpers.cluster:upload_config
,cartridge.test-helpers.cluster:download_config
.
Internally they wrap the HTTP API.
Example:
g.before_all(function()
g.cluster = helpers.Cluster.new(...)
g.cluster:upload_config({some_section = 'some_value'})
t.assert_equals(
g.cluster:download_config(),
{some_section = 'some_value'}
)
end)
After you’ve developed your Tarantool Cartridge application locally, you can deploy it to a test or production environment.
Deploying includes:
- packing the application into a specific distribution format
- installing it to the target server
- running the application.
You have four options to deploy a Tarantool Cartridge application:
- as an RPM package (for production)
- as a DEB package (for production)
- as a tar+gz archive (for testing or as a workaround for production if root access is unavailable)
- from sources (for local testing only).
The choice between DEB and RPM depends on the package manager of the target OS. DEB is used for Debian Linux and its derivatives, and RPM for CentOS/RHEL and other RPM-based Linux distributions.
Important
If you use the Tarantool Community Edition while packing the application, the package will have a dependency on this version of Tarantool.
In this case, on a target server, add the Tarantool repository for the version equal or later than the one used for packing the application. This lets a package manager install the dependency correctly. See details for your OS on the Download page.
For a production environment, it is recommended to use the systemd
subsystem
for managing the application instances and accessing log entries.
To deploy your Tarantool Cartridge application:
1. Pack the application into a deliverable:

       $ cartridge pack rpm [APP_PATH] [--use-docker]
       $ # -- OR --
       $ cartridge pack deb [APP_PATH] [--use-docker]

   where:

   - APP_PATH – a path to the application directory. Defaults to . (the current directory).
   - --use-docker – the flag to use if packing the application on a different Linux distribution or on macOS. It ensures the resulting artifact contains the Linux-compatible external modules and executables.

   This creates an RPM or DEB package with the following naming: <APP_NAME>-<VERSION>.{rpm,deb}. For example, ./my_app-0.1.0-1-g8c57dcb.rpm or ./my_app-0.1.0-1-g8c57dcb.deb. For more details on the format and usage of the cartridge pack command, refer to the command description.

2. Upload the generated package to a target server.
3. Install the application:

       $ sudo yum install <APP_NAME>-<VERSION>.rpm
       $ # -- OR --
       $ sudo dpkg -i <APP_NAME>-<VERSION>.deb

4. Configure the application instances.

   The configuration is stored in the /etc/tarantool/conf.d/instances.yml file. Create the file and specify parameters of the instances. For details, refer to Configuring instances. For example:

       my_app:
         cluster_cookie: secret-cookie

       my_app.router:
         advertise_uri: localhost:3301
         http_port: 8081

       my_app.storage-master:
         advertise_uri: localhost:3302
         http_port: 8082

       my_app.storage-replica:
         advertise_uri: localhost:3303
         http_port: 8083

   Note

   Do not specify working directories of the instances in this configuration. They are defined via the TARANTOOL_WORKDIR environment variable in the instantiated unit file (/etc/systemd/system/<APP_NAME>@.service).

5. Start the application instances by using systemctl. For more details, see Start/stop using systemctl.

       $ sudo systemctl start my_app@router
       $ sudo systemctl start my_app@storage-master
       $ sudo systemctl start my_app@storage-replica

6. In case of a cluster-aware application, proceed to deploying the cluster.
Note
If you’re migrating your application from local test environment to production, you can re-use your test configuration at this step:
- In the cluster web interface of the test environment, click Configuration files > Download to save the test configuration.
- In the cluster web interface of the production environment, click Configuration files > Upload to upload the saved configuration.
You can further manage the running instances by using the standard operations of the systemd utilities:

- systemctl for stopping, restarting, and checking the status of the instances, and so on;
- journalctl for collecting logs of the instances.
During the installation of a Tarantool Cartridge application, the following entities are additionally created:
- The tarantool user group.
- The tarantool system user. All the application instances start under this user. The tarantool user group is the main group for the tarantool user. The user is created with the option -s /sbin/nologin.
- Directories and files listed in the table below (<APP_NAME> is the application name, %i is the instance name):
Path | Access Rights | Owner:Group | Description |
---|---|---|---|
/etc/systemd/system/<APP_NAME>.service | -rw-r--r-- | root:root | systemd unit file for the <APP_NAME> service |
/etc/systemd/system/<APP_NAME>@.service | -rw-r--r-- | root:root | systemd instantiated unit file for the <APP_NAME> service |
/usr/share/tarantool/<APP_NAME>/ | drwxr-xr-x | root:root | Directory. Contains executable files of the application. |
/etc/tarantool/conf.d/ | drwxr-xr-x | root:root | Directory for YAML files with the configuration of the application instances, such as instances.yml. |
/var/lib/tarantool/<APP_NAME>.%i/ | drwxr-xr-x | tarantool:tarantool | Working directories of the application instances. Each directory contains the instance data, namely, the WAL and snapshot files, and also the application configuration YAML files. |
/var/run/tarantool/ | drwxr-xr-x | tarantool:tarantool | Directory. Contains the following files for each instance: <APP_NAME>.%i.pid and <APP_NAME>.%i.control. |
/var/run/tarantool/<APP_NAME>.%i.pid | -rw-r--r-- | tarantool:tarantool | Contains the process ID. |
/var/run/tarantool/<APP_NAME>.%i.control | srwxr-xr-x | tarantool:tarantool | Unix socket to connect to the instance via the tt CLI utility. |
1. Pack the application into a distributable:

       $ cartridge pack tgz APP_NAME

   This will create a tar+gz archive (e.g. ./my_app-0.1.0-1.tgz).

2. Upload the archive to target servers, with tarantool and (optionally) cartridge-cli installed.

3. Extract the archive:

       $ tar -xzvf APP_NAME-VERSION.tgz

4. Configure the instance(s). Create a file called /etc/tarantool/conf.d/instances.yml. For example:

       my_app:
         cluster_cookie: secret-cookie

       my_app.instance-1:
         http_port: 8081
         advertise_uri: localhost:3301

       my_app.instance-2:
         http_port: 8082
         advertise_uri: localhost:3302

   See details here.

5. Start Tarantool instance(s). You can do it using tarantool, for example:

       $ tarantool init.lua # starts a single instance

   or cartridge, for example:

       $ # in application directory
       $ cartridge start # starts all instances
       $ cartridge start .router_1 # starts a single instance

       $ # in multi-application environment
       $ cartridge start my_app # starts all instances of my_app
       $ cartridge start my_app.router # starts a single instance

6. In case it is a cluster-aware application, proceed to deploying the cluster.
Note
If you’re migrating your application from local test environment to production, you can re-use your test configuration at this step:
- In the cluster web interface of the test environment, click Configuration files > Download to save the test configuration.
- In the cluster web interface of the production environment, click Configuration files > Upload to upload the saved configuration.
This deployment method is intended for local testing only.

1. Pull all dependencies to the .rocks directory:

       $ tt rocks make

2. Configure the instance(s). Create a file called /etc/tarantool/conf.d/instances.yml. For example:

       my_app:
         cluster_cookie: secret-cookie

       my_app.instance-1:
         http_port: 8081
         advertise_uri: localhost:3301

       my_app.instance-2:
         http_port: 8082
         advertise_uri: localhost:3302

   See details here.

3. Start Tarantool instance(s). You can do it using tarantool, for example:

       $ tarantool init.lua # starts a single instance

   or cartridge, for example:

       $ # in application directory
       $ cartridge start # starts all instances
       $ cartridge start .router_1 # starts a single instance

       $ # in multi-application environment
       $ cartridge start my_app # starts all instances of my_app
       $ cartridge start my_app.router # starts a single instance

4. In case it is a cluster-aware application, proceed to deploying the cluster.
Note
If you’re migrating your application from local test environment to production, you can re-use your test configuration at this step:
- In the cluster web interface of the test environment, click Configuration files > Download to save the test configuration.
- In the cluster web interface of the production environment, click Configuration files > Upload to upload the saved configuration.
Depending on your deployment method, you can start/stop the instances using tarantool, cartridge CLI, or systemctl.
With tarantool
, you can start only a single instance:
# the simplest command
$ tarantool init.lua
You can also specify more options on the command line or in environment variables.
To stop the instance, use Ctrl+C.
With cartridge
CLI, you can start one or multiple instances:
$ cartridge start [APP_NAME[.INSTANCE_NAME]] [options]
The options are listed in the cartridge start reference.
Here are some commonly used options:
- --script FILE – the application's entry point. Defaults to TARANTOOL_SCRIPT, or ./init.lua when running from the app's directory, or app_name/init.lua in a multi-app environment.
- --run-dir DIR – the directory with pid and sock files. Defaults to TARANTOOL_RUN_DIR or /var/run/tarantool.
- --cfg FILE – Cartridge instances YAML configuration file. Defaults to TARANTOOL_CFG or ./instances.yml. The instances.yml file contains cartridge.cfg() parameters described in the configuration section of this guide.
For example:
$ cartridge start my_app --cfg demo.yml --run-dir ./tmp/run
It starts all tarantool instances specified in the cfg file, in the foreground, with enforced environment variables.
When APP_NAME
is not provided, cartridge
parses it from ./*.rockspec
filename.
When INSTANCE_NAME
is not provided, cartridge
reads cfg
file and
starts all defined instances:
$ # in application directory
$ cartridge start # starts all instances
$ cartridge start .router_1 # starts a single instance
$ # in multi-application environment
$ cartridge start my_app # starts all instances of my_app
$ cartridge start my_app.router # starts a single instance
To stop the instances, run:
$ cartridge stop [APP_NAME[.INSTANCE_NAME]] [options]
These options from the cartridge start command are supported:

- --run-dir DIR
- --cfg FILE
1. To run a single instance:

       $ systemctl start APP_NAME

   This will start a systemd service that will listen to the port specified in the instance configuration (http_port parameter).

2. To run multiple instances on one or multiple servers:

       $ systemctl start APP_NAME@INSTANCE_1
       $ systemctl start APP_NAME@INSTANCE_2
       ...
       $ systemctl start APP_NAME@INSTANCE_N

   where APP_NAME@INSTANCE_N is the instantiated service name for systemd with an incremental N – a number, unique for every instance, added to the port the instance will listen to (e.g., 3301, 3302, etc.).

3. To stop all services on a server, use the systemctl stop command and specify instance names one by one. For example:

       $ systemctl stop APP_NAME@INSTANCE_1 APP_NAME@INSTANCE_2 ... APP_NAME@INSTANCE_<N>
When running instances with `systemctl`, keep these practices in mind:
You can specify the instance configuration in a YAML file. This file can contain these options; see an example here.
Save this file to `/etc/tarantool/conf.d/` (the default `systemd` path) or to a location set in the `TARANTOOL_CFG` environment variable (if you’ve edited the application’s `systemd` unit file). The file name doesn’t matter: it can be `instances.yml` or anything else you like.
or anything else you like.Here’s what
systemd
is doing further:- obtains
app_name
(andinstance_name
, if specified) from the name of the application’ssystemd
unit file (e.g.APP_NAME@default
orAPP_NAME@INSTANCE_1
); - sets default console socket (e.g.
/var/run/tarantool/APP_NAME@INSTANCE_1.control
), PID file (e.g./var/run/tarantool/APP_NAME@INSTANCE_1.pid
) andworkdir
(e.g./var/lib/tarantool/<APP_NAME>.<INSTANCE_NAME>
).Environment=TARANTOOL_WORKDIR=${workdir}.%i
Finally, `cartridge` looks across all YAML files in `/etc/tarantool/conf.d` for a section with the appropriate name (e.g. `app_name`, which contains common configuration for all instances, and `app_name.instance_1`, which contains instance-specific configuration). As a result, the Cartridge options `workdir`, `console_sock`, and `pid_file` in the YAML file passed to `cartridge.cfg` have no effect, because `systemd` overrides them.
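A minimal sketch of such a file (the file name and values are illustrative; the `workdir` line is included only to show that `systemd` would override it):

my_app:                     # common configuration for all instances
  cluster_cookie: secret-cookie

my_app.instance_1:          # instance-specific configuration
  http_port: 8081
  advertise_uri: localhost:3301
  workdir: ./tmp/db         # no effect under systemd: TARANTOOL_WORKDIR wins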
The default tool for querying logs is `journalctl`. For example:

$ # show log messages for a systemd unit named APP_NAME.INSTANCE_1
$ journalctl -u APP_NAME.INSTANCE_1

$ # show only the most recent messages and continuously print new ones
$ journalctl -f -u APP_NAME.INSTANCE_1
If really needed, you can change the logging-related `box.cfg` options in the YAML configuration file: see the log and other related options.
Almost all errors in Cartridge follow the `return nil, err` style, where `err` is an error object produced by Tarantool’s errors module. Cartridge doesn’t raise errors except for bugs and function contract mismatches. Developing new roles should follow these guidelines as well.
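For example, a role function might follow this convention. This is a minimal, hypothetical sketch: `MyRoleError`, `put_value`, and the `kv_store` space are illustrative, not part of Cartridge.

local errors = require('errors')
local MyRoleError = errors.new_class('MyRoleError')

-- Returns a value on success, or nil plus an error object on failure,
-- so the caller decides whether to handle or propagate the error.
local function put_value(key, value)
    if type(key) ~= 'string' then
        return nil, MyRoleError:new('key must be a string, got %s', type(key))
    end
    box.space.kv_store:replace({key, value})
    return true
end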
Note that in triggers (`cartridge.graphql.on_resolve` and `cartridge.twophase.on_patch`) return values are ignored, so if you want to raise an error from a trigger function, you need to call `error()` explicitly.
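A minimal sketch, assuming an `on_patch` trigger that validates the new cluster-wide config (the `myconfig` section name is illustrative):

local errors = require('errors')
local ValidationError = errors.new_class('ValidationError')

local function validate_on_patch(conf_new, conf_old)
    if conf_new:get_readonly('myconfig') == nil then
        -- Returning nil, err from a trigger would be silently ignored,
        -- so the trigger raises the error explicitly instead.
        error(ValidationError:new('myconfig section is required'))
    end
end

require('cartridge.twophase').on_patch(validate_on_patch)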
Error classes help to locate the problem’s source. For this purpose, an error object contains its class, stack traceback, and a message.
local errors = require('errors')
local DangerousError = errors.new_class("DangerousError")

local function some_fancy_function()
    local something_bad_happens = true
    if something_bad_happens then
        return nil, DangerousError:new("Oh boy")
    end
    return "success" -- not reachable due to the error
end

print(some_fancy_function())
nil DangerousError: Oh boy
stack traceback:
test.lua:9: in function 'some_fancy_function'
test.lua:15: in main chunk
For uniform error handling, `errors` provides the `:pcall` API:
local ret, err = DangerousError:pcall(some_fancy_function)
print(ret, err)
nil DangerousError: Oh boy
stack traceback:
test.lua:9: in function <test.lua:4>
[C]: in function 'xpcall'
.rocks/share/tarantool/errors.lua:139: in function 'pcall'
test.lua:15: in main chunk
print(DangerousError:pcall(error, 'what could possibly go wrong?'))
nil DangerousError: what could possibly go wrong?
stack traceback:
[C]: in function 'xpcall'
.rocks/share/tarantool/errors.lua:139: in function 'pcall'
test.lua:15: in main chunk
For `errors.pcall` there is no difference between the `return nil, err` and `error()` approaches.
Note that the `errors.pcall` API differs from the vanilla Lua pcall. Instead of `true`, the former returns the values returned from the call. If there is an error, it returns `nil` instead of `false`, plus an error message.
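A minimal sketch of the difference, reusing the `DangerousError` class from above (the return values in the comments follow the description, not verbatim output):

local function fail()
    error('boom')
end

local function succeed()
    return 1, 2
end

print(pcall(fail))                   -- false, "...: boom"
print(DangerousError:pcall(fail))    -- nil, DangerousError: ...

print(pcall(succeed))                -- true, 1, 2
print(DangerousError:pcall(succeed)) -- 1, 2 (no leading true)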
Remote `net.box` calls keep no stack trace from the remote. In that case, `errors.netbox_eval` comes to the rescue. It will find a stack trace from local and remote hosts and restore metatables.
> conn = require('net.box').connect('localhost:3301')
> print( errors.netbox_eval(conn, 'return nil, DoSomethingError:new("oops")') )
nil DoSomethingError: oops
stack traceback:
eval:1: in main chunk
during net.box eval on localhost:3301
stack traceback:
[string "return print( errors.netbox_eval("]:1: in main chunk
[C]: in function 'pcall'
However, `vshard`, as implemented in Tarantool, doesn’t utilize the `errors` module. Instead it uses its own errors. Keep this in mind when working with `vshard` functions.
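A hedged sketch of what this means in practice (the function name `my_func` and the error fields shown are illustrative; consult the vshard documentation for the exact error format):

local vshard = require('vshard')

-- vshard returns its own error tables rather than errors-module
-- objects, so inspect fields instead of relying on error classes.
local bucket_id = vshard.router.bucket_id_strcrc32('some_key')
local res, err = vshard.router.callrw(bucket_id, 'my_func', {'some_key', 42})
if err ~= nil then
    print(('call failed: %s'):format(err.message or tostring(err)))
end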
Data included in an error object (class name, message, traceback) may be easily converted to string using the `tostring()` function.
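For example, reusing `some_fancy_function` from above:

local _, err = some_fancy_function()
print(tostring(err))
-- DangerousError: Oh boy
-- stack traceback: ...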
The GraphQL implementation in Cartridge wraps the `errors` module, so a typical error response looks as follows:
{
    "errors":[{
        "message":"what could possibly go wrong?",
        "extensions":{
            "io.tarantool.errors.stack":"stack traceback: ...",
            "io.tarantool.errors.class_name":"DangerousError"
        }
    }]
}
Read more about errors in the GraphQL specification.
If you’re going to implement a GraphQL handler, you can add your own extension like this:
local err = DangerousError:new('I have extension')
err.graphql_extensions = {code = 403}
It will lead to the following response:
{
    "errors":[{
        "message":"I have extension",
        "extensions":{
            "io.tarantool.errors.stack":"stack traceback: ...",
            "io.tarantool.errors.class_name":"DangerousError",
            "code":403
        }
    }]
}
In a nutshell, an `errors` object is a table. This means that it can be swiftly represented in JSON. This approach is used by Cartridge to handle errors via HTTP:
local json = require('json')

local err = DangerousError:new('Who would have thought?')

local resp = req:render({
    status = 500,
    headers = {
        ['content-type'] = "application/json; charset=utf-8"
    },
    json = json.encode(err),
})
{
    "line":27,
    "class_name":"DangerousError",
    "err":"Who would have thought?",
    "file":".../app/roles/api.lua",
    "stack":"stack traceback:..."
}
Every instance in the cluster has an internal state machine. It helps manage cluster operation and makes describing the distributed system simpler.
The instance lifecycle starts with a `cartridge.cfg` call. During initialization, the Cartridge instance binds TCP (iproto) and UDP (SWIM) sockets and checks the working directory. Depending on the result, it enters one of the following states:
If the working directory is clean and neither snapshots nor cluster-wide configuration files exist, the instance enters the `Unconfigured` state.

The instance starts to accept iproto requests (Tarantool binary protocol) and remains in this state until the user decides to join it to a cluster (to create a replica set or join an existing one). After that, the instance moves to the `BootstrappingBox` state.
If the instance finds all configuration files and snapshots, it enters the `ConfigFound` state. The instance does not load the files and snapshots yet, because it will download and validate the config first. On success, the instance enters the `ConfigLoaded` state. On failure, it will move to the `InitError` state.
In the `ConfigLoaded` state, the config is found, loaded, and validated. The next step is configuring the instance. If there are any snapshots, the instance will change its state to `RecoveringSnapshot`. Otherwise, it will move to the `BootstrappingBox` state. By default, all instances start in read-only mode and don’t start listening until the bootstrap/recovery finishes.
The following events can cause an instance initialization error (`InitError`):

- An error occurred during `cartridge.remote-control`’s connection to the binary port
- The `config.yml` file is missing from the workdir (`tmp/`), while snapshots are present
- An error occurred while loading the configuration from disk
- The config is invalid: the server is not present in the cluster configuration
In the `BootstrappingBox` state, the instance configures the arguments for `box.cfg` if snapshots or config files are not present, executes `box.cfg`, sets up users, and stops `remote-control`. The instance then tries to start listening to the full-featured iproto protocol. If the attempt fails, the instance will change its state to `BootError`. On success, the instance enters the `ConnectingFullmesh` state.
If there is no replica set in the cluster-wide config, the instance will set its state to `BootError`.
In the `RecoveringSnapshot` state, if snapshots are present, `box.cfg` will start a recovery process. After that, the process is similar to `BootstrappingBox`.
The `BootError` state can be caused by the following events:

- The instance failed to bind to the binary port for iproto usage
- The server is missing in the cluster-wide config
- The replica set is missing in the cluster-wide config
- The replication configuration failed
During the `ConnectingFullmesh` state, the configuration of servers and replica sets is performed. Eventually, the cluster topology described in the config is implemented. In case of an error, the instance state moves to `BootError`. Otherwise, it proceeds to configuring roles.
This state follows the successful configuration of replica sets and cluster topology. The next step is role configuration.
This is the state of role configuration. The instance enters this state during the initial setup, after a failover trigger (`failover.lua`), or after altering the cluster-wide config (`twophase.lua`).