Module metrics
Since: 2.11.1
The metrics module provides the ability to collect and expose Tarantool metrics.
Note
If you use a Tarantool version below 2.11.1, install the latest version of the external metrics module first.
For Tarantool 2.11.1 and above, you can also use the external metrics module.
In this case, the external metrics module takes priority over the built-in one.
A collector is a representation of one or more observations that change over time.
Tarantool provides the following metric collectors:
counter
A counter is a cumulative metric that denotes a single monotonically increasing counter. Its value can only
increase or be reset to zero on restart. For example, you can use a counter to represent the number of requests
served, tasks completed, or errors.
The design is based on the Prometheus counter.
gauge
A gauge is a metric that denotes a single numerical value that can arbitrarily increase and decrease.
The gauge type is typically used for measured values like temperature or current memory usage.
It can also be used for counts that go up and down, such as the number of concurrent requests.
The design is based on the Prometheus gauge.
histogram
A histogram metric is used to collect and analyze
statistical data about the distribution of values within the application.
Unlike metrics that track the average value or quantity of events, a histogram provides detailed visibility into the distribution of values and can uncover hidden dependencies.
The design is based on the Prometheus histogram.
summary
A summary metric is used to collect statistical data
about the distribution of values within the application.
Each summary provides several measurements:
- total count of measurements
- sum of measured values
- values at specific quantiles
Like a histogram, a summary operates with value ranges. However, unlike a histogram,
it defines them with quantiles (numbers between 0 and 1), so you don't need to define
fixed bucket boundaries. For a summary, the ranges depend
on the measured values and the number of measurements.
The design is based on the Prometheus summary.
A label is a piece of metainfo that you associate with a metric in the key-value format.
For details, see labels in Prometheus and tags in Graphite.
Labels are used to differentiate between the characteristics of a thing being
measured. For example, in a metric associated with the total number of HTTP
requests, you can represent methods and statuses as label pairs:
http_requests_total_counter:inc(1, { method = 'POST', status = '200' })
The example above allows extracting the following time series:
- The total number of requests over time with method = "POST" (and any status).
- The total number of requests over time with status = "200" (and any method).
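The following minimal sketch shows how label pairs turn one counter into several time series. It assumes Tarantool 2.11.1 or later with the built-in metrics module. The metric name `http_requests_total`, the simulated traffic, and the helper `total_by_label` are hypothetical. Each distinct label combination is returned as a separate observation by `counter_obj:collect()`. Summing the observations that share one label value gives a per-label series, such as "all POST requests, any status".

```lua
local metrics = require('metrics')

-- Hypothetical counter matching the example above.
local http_requests_total_counter = metrics.counter(
    'http_requests_total', 'Total number of HTTP requests')

-- Simulated traffic: three distinct (method, status) combinations.
http_requests_total_counter:inc(1, { method = 'POST', status = '200' })
http_requests_total_counter:inc(1, { method = 'POST', status = '500' })
http_requests_total_counter:inc(1, { method = 'GET', status = '200' })
http_requests_total_counter:inc(1, { method = 'GET', status = '200' })

-- Hypothetical helper: sum the observations whose label `name` equals `value`,
-- ignoring all other labels.
local function total_by_label(counter, name, value)
    local total = 0
    for _, obs in ipairs(counter:collect()) do
        if obs.label_pairs[name] == value then
            total = total + obs.value
        end
    end
    return total
end

-- One observation per label combination.
print(#http_requests_total_counter:collect())
-- POST requests with any status, and status 200 with any method.
print(total_by_label(http_requests_total_counter, 'method', 'POST'))
print(total_by_label(http_requests_total_counter, 'status', '200'))
```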
To configure metrics, use metrics.cfg().
This function can be used to turn the specified default metrics on or off or to configure labels applied to all collectors.
You can also use the shortcut functions metrics.enable_default_metrics() and metrics.set_global_labels() to set up default metrics or global labels.
Note
Starting from version 3.0, metrics can be configured using a configuration file in the metrics section.
To create a custom metric, follow the steps below:
Create a metric
To create a new metric, you need to call a function corresponding to the desired collector type. For example, call metrics.counter() or metrics.gauge() to create a new counter or gauge, respectively.
In the example below, a new counter is created:
local metrics = require('metrics')
local bands_replace_count = metrics.counter('bands_replace_count', 'The number of data operations')
This counter is intended to collect the number of data operations performed on the specified space.
In the next example, a gauge is created:
local metrics = require('metrics')
local bands_waste_size = metrics.gauge('bands_waste_size', 'The size of memory wasted due to internal fragmentation')
Observe a value
You can observe a value in two ways:
At the appropriate place, for example, in an API request handler or trigger.
In the example below, the counter value is increased every time a data operation is performed on the bands
space.
To increase a counter value, counter_obj:inc() is called.
local metrics = require('metrics')
local bands_replace_count = metrics.counter('bands_replace_count', 'The number of data operations')
local trigger = require('trigger')
trigger.set(
    'box.space.bands.on_replace',
    'update_bands_replace_count_metric',
    function(_, _, _, request_type)
        bands_replace_count:inc(1, { request_type = request_type })
    end
)
At the time of requesting the data collected by metrics.
In this case, you need to collect the required metric inside metrics.register_callback().
The example below shows how to use a gauge collector to measure the size of memory wasted due to internal fragmentation:
local metrics = require('metrics')
local bands_waste_size = metrics.gauge('bands_waste_size', 'The size of memory wasted due to internal fragmentation')
metrics.register_callback(function()
    bands_waste_size:set(box.space.bands:stat()['tuple']['memtx']['waste_size'])
end)
To set a gauge value, gauge_obj:set() is called.
You can find the full example on GitHub: metrics_collect_custom.
The module allows you to add your own metrics, but there are some subtleties when working with specific tools.
When adding a custom metric, keep the number of label value combinations to a minimum.
Otherwise, a combinatorial explosion of time series may occur in the time series database that stores the metric values.
Examples of data labels:
For example, if your company uses InfluxDB for metric collection, a label with too many values can disrupt the entire
monitoring setup, both for your application and for all other systems within the company. As a result,
monitoring data is likely to be lost.
Example:
local metrics = require('metrics')
local some_metric = metrics.counter('some', 'Some metric')
-- THIS IS POSSIBLE: the number of instance aliases is small and bounded
local function on_value_update(instance_alias)
    some_metric:inc(1, { alias = instance_alias })
end
-- THIS IS NOT ALLOWED: every customer ID creates a new time series
local function on_value_update(customer_id)
    some_metric:inc(1, { customer_id = customer_id })
end
The example shows two versions of the on_value_update function. The first version labels
the data with the alias of the cluster instance. Since a cluster has a relatively small number of nodes, using
their aliases as label values is feasible. The second version uses a record identifier (customer_id) as a label value.
If there are many records, each of them creates a separate time series, so avoid such labels.
The same principle applies to URLs. Using the entire URL with parameters is not recommended.
Use a URL template or the name of the command instead.
In essence, when designing custom metrics and selecting labels or tags, it’s crucial to opt for a minimal
set of values that can uniquely identify the data without introducing unnecessary complexity or potential
conflicts with existing metrics and systems.
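To follow the URL advice above, you can normalize each request path into a template before using it as a label value. The sketch below is plain Lua and does not depend on the metrics module. The helper name `path_template` is hypothetical, and so is the rule that treats all-digit path segments as identifiers. Neither is part of the metrics API.

```lua
-- Hypothetical helper: turn a concrete request path into a low-cardinality
-- template suitable for a label value.
local function path_template(path)
    -- Keep only the part before the first '?' (the query string is dropped).
    local without_query = path:match('^[^?]*')
    local segments = {}
    -- Walk over the non-empty segments between slashes.
    for segment in without_query:gmatch('[^/]+') do
        -- Assumption: segments made only of digits are record identifiers.
        local normalized = segment:match('^%d+$') and ':id' or segment
        table.insert(segments, normalized)
    end
    return '/' .. table.concat(segments, '/')
end

print(path_template('/users/42/orders?page=3'))
print(path_template('/users/7/orders'))
```

Both calls produce the same template, so they share one label value, for example `some_metric:inc(1, { path = path_template(request_path) })`, where `request_path` is whatever path your handler receives.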
The metrics module provides middleware for monitoring HTTP latency statistics for endpoints that are created using the http module.
The latency collector observes both latency information and the number of invocations.
The metrics collected by HTTP middleware are separated by a set of labels:
- a route (path)
- a method (method)
- an HTTP status code (status)
For each route that you want to track, you must specify the middleware explicitly.
The example below shows how to collect statistics for requests made to the /metrics/hello endpoint.
httpd = require('http.server').new('127.0.0.1', 8080)
local metrics = require('metrics')
-- Use a summary as the default collector for the middleware
metrics.http_middleware.configure_default_collector('summary')
httpd:route({
    method = 'GET',
    path = '/metrics/hello'
}, metrics.http_middleware.v1(
    -- The wrapped handler: its latency and status are recorded by the middleware
    function()
        return { status = 200,
                 headers = { ['content-type'] = 'text/plain' },
                 body = 'Hello from http_middleware!' }
    end))
httpd:start()
Note
The middleware does not cover 404 errors.
The metrics module provides a set of plugins that let you collect metrics through a unified interface.
For example, you can obtain an HTTP response object containing metrics in the Prometheus format by calling the metrics.plugins.prometheus.collect_http() function:
local prometheus_plugin = require('metrics.plugins.prometheus')
local prometheus_metrics = prometheus_plugin.collect_http()
To expose the collected metrics, you can use the http module:
httpd = require('http.server').new('127.0.0.1', 8080)
local prometheus_plugin = require('metrics.plugins.prometheus')
httpd:route({
    method = 'GET',
    path = '/metrics/prometheus'
}, function()
    -- Build an HTTP response with all metrics in the Prometheus format
    return prometheus_plugin.collect_http()
end)
httpd:start()
Example on GitHub: metrics_plugins
To create a custom plugin, use metrics.invoke_callbacks(), metrics.collectors(), and the collect() method of each collector.
Include the following in your main export function:
-- Invoke all callbacks registered via `metrics.register_callback(<callback-function>)`
metrics.invoke_callbacks()
-- Loop over collectors
for _, c in pairs(metrics.collectors()) do
    ...
    -- Loop over instant observations in the collector
    for _, obs in pairs(c:collect()) do
        -- Export observation `obs`
        ...
    end
end
See the source code of built-in plugins in the metrics GitHub repository.
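Filling in the template above, a minimal custom plugin might render every observation as a `name{label="value",...} value` text line. This is a sketch that assumes Tarantool 2.11.1 or later with the built-in metrics module. The functions `format_labels` and `export_text` and the counter name are hypothetical. The sketch relies only on the `metric_name`, `label_pairs`, and `value` fields of each observation. A real exporter would also need escaping and handling of special values such as infinity.

```lua
local metrics = require('metrics')

-- Render label pairs in a stable (sorted) order, e.g. {method="GET",status="200"}.
local function format_labels(label_pairs)
    local keys = {}
    for key in pairs(label_pairs or {}) do
        table.insert(keys, key)
    end
    table.sort(keys)
    local parts = {}
    for _, key in ipairs(keys) do
        table.insert(parts, string.format('%s="%s"', key, tostring(label_pairs[key])))
    end
    return '{' .. table.concat(parts, ',') .. '}'
end

-- Hypothetical plugin entry point: returns all observations as text lines.
local function export_text()
    -- Invoke all callbacks registered via metrics.register_callback()
    metrics.invoke_callbacks()
    local lines = {}
    -- Loop over collectors
    for _, c in pairs(metrics.collectors()) do
        -- Loop over instant observations in the collector
        for _, obs in pairs(c:collect()) do
            table.insert(lines, obs.metric_name .. format_labels(obs.label_pairs)
                .. ' ' .. tostring(obs.value))
        end
    end
    return table.concat(lines, '\n')
end

local jobs_done = metrics.counter('plugin_example_jobs_done', 'Jobs completed')
jobs_done:inc(3, { queue = 'email' })
print(export_text())
```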
metrics API
- metrics.cfg() – Entrypoint to set up the module
- metrics.collect() – Collect observations from each collector
- metrics.collectors() – List all collectors in the registry
- metrics.counter() – Register a new counter
- metrics.enable_default_metrics() – Same as metrics.cfg{ include = include, exclude = exclude }
- metrics.gauge() – Register a new gauge
- metrics.histogram() – Register a new histogram
- metrics.invoke_callbacks() – Invoke all registered callbacks
- metrics.register_callback() – Register a callback function
- metrics.set_global_labels() – Same as metrics.cfg{ labels = label_pairs }
- metrics.summary() – Register a new summary
- metrics.unregister_callback() – Unregister a callback function
metrics.http_middleware API
- metrics.http_middleware.build_default_collector() – Register and return a collector for the middleware
- metrics.http_middleware.configure_default_collector() – Register a collector for the middleware and set it as default
- metrics.http_middleware.get_default_collector() – Get the default collector
- metrics.http_middleware.set_default_collector() – Set the default collector
- metrics.http_middleware.v1() – Latency-measuring wrapper
Related objects
- collector_object – A collector object
- counter_obj – A counter object
- gauge_obj – A gauge object
- histogram_obj – A histogram object
- registry – A metrics registry
- summary_obj – A summary object
metrics.cfg([config])
Entrypoint to set up the module.
Parameters:
- config (table) – module configuration options:
cfg.include (string/table, default all): all to enable all
supported default metrics, none to disable all default metrics,
or a table with the names of the default metrics to enable a specific set of metrics.
cfg.exclude (table, default {}): a table containing the names of
the default metrics that you want to disable. Has higher priority
than cfg.include.
cfg.labels (table, default {}): a table containing label names as
string keys and label values as values. See also: Labels.
You can work with metrics.cfg as a table to read values, but you must call
metrics.cfg{} as a function to update them.
Supported default metric names (for cfg.include and cfg.exclude tables):
- all (metasection including all metrics)
- network
- operations
- system
- replicas
- info
- slab
- runtime
- memory
- spaces
- fibers
- cpu
- vinyl
- memtx
- luajit
- clock
- event_loop
- config
See metrics reference for details.
All metric collectors from the collection have metainfo.default = true.
cfg.labels are the global labels to be added to every observation.
Global labels are applied only to metric collection. They have no effect
on how observations are stored.
Global labels can be changed on the fly.
label_pairs from observation objects have priority over global labels.
If you pass label_pairs to an observation method with the same key as
some global label, the method argument value will be used.
Note that both label names and values in label_pairs are treated as strings.
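The following sketch of metrics.cfg() usage assumes Tarantool 2.11.1 or later with the built-in metrics module. The alias value and the counter name are hypothetical. The sketch does four things:
- enables all default metrics except vinyl;
- sets one global label;
- reads the configuration back as a table;
- shows that a label passed to an observation method overrides the global label with the same key.

```lua
local metrics = require('metrics')

-- Enable all default metrics except the vinyl ones and add a global label.
metrics.cfg{
    include = 'all',
    exclude = { 'vinyl' },
    labels = { alias = 'storage-001' },  -- hypothetical instance alias
}

-- metrics.cfg can be read like a table (but must be called to change it).
print(metrics.cfg.include)

local requests = metrics.counter('cfg_example_requests', 'Requests handled')
requests:inc(1)                          -- gets the global alias label on collection
requests:inc(1, { alias = 'override' })  -- the observation's own label wins

-- Map each collected alias to its counter value.
local by_alias = {}
for _, obs in ipairs(requests:collect()) do
    by_alias[obs.label_pairs.alias] = obs.value
end
print(by_alias['storage-001'], by_alias['override'])
```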
metrics.collect([opts])
Collect observations from each collector.
Parameters:
- opts (table) – table of collect options:
invoke_callbacks – if true, invoke_callbacks() is triggered before the actual collect.
default_only – if true, observations contain only default metrics (metainfo.default = true).
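The following sketch uses metrics.collect() with invoke_callbacks = true. It assumes Tarantool 2.11.1 or later with the built-in metrics module and that collect() returns a flat list of observations with `metric_name`, `label_pairs`, and `value` fields. The gauge name, the `pending` queue, and the helper `find_value` are hypothetical. The callback refreshes the gauge only when callbacks are invoked. A plain collect() afterwards returns the last stored value.

```lua
local metrics = require('metrics')

local queue_length = metrics.gauge('collect_example_queue_length',
    'Jobs waiting in a hypothetical queue')
local pending = { 'job-1', 'job-2', 'job-3' }  -- hypothetical in-memory queue

-- Refresh the gauge only when metrics are requested.
metrics.register_callback(function()
    queue_length:set(#pending)
end)

-- Hypothetical helper: value of the first observation with the given metric name.
local function find_value(observations, metric_name)
    for _, obs in ipairs(observations) do
        if obs.metric_name == metric_name then
            return obs.value
        end
    end
    return nil
end

-- Callbacks run first, so the gauge reflects the current queue length.
local observations = metrics.collect{ invoke_callbacks = true }
print(find_value(observations, 'collect_example_queue_length'))

-- Without invoke_callbacks, the gauge keeps its last value.
table.insert(pending, 'job-4')
print(find_value(metrics.collect(), 'collect_example_queue_length'))
```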
metrics.collectors()
List all collectors in the registry. Designed to be used in exporters.
Return: A list of created collectors (see collector_object).
See also: Creating custom plugins
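metrics.counter(name[, help, metainfo])
Register a new counter.
Parameters:
- name (string) – collector name. Must be unique.
- help (string) – collector description.
- metainfo (table) – collector metainfo.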
Return: A counter object (see counter_obj).
Rtype: counter_obj
See also: Creating custom metrics
metrics.enable_default_metrics([include, exclude])
Same as metrics.cfg{include=include, exclude=exclude}, but include={} is
treated as include='all' for backward compatibility.
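metrics.gauge(name[, help, metainfo])
Register a new gauge.
Parameters:
- name (string) – collector name. Must be unique.
- help (string) – collector description.
- metainfo (table) – collector metainfo.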
Return: A gauge object (see gauge_obj).
Rtype: gauge_obj
See also: Creating custom metrics
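A gauge suits values that go up and down, such as the number of concurrent requests mentioned earlier. This sketch assumes Tarantool 2.11.1 or later with the built-in metrics module, where gauge_obj provides inc() and dec(). The gauge name and the wrapper `track_in_flight` are hypothetical. The wrapper raises the gauge while a handler runs and lowers it afterwards, even if the handler raises an error.

```lua
local metrics = require('metrics')

local in_flight = metrics.gauge('gauge_example_in_flight',
    'Requests currently being processed')

-- Hypothetical wrapper around a request handler.
local function track_in_flight(handler)
    in_flight:inc(1)
    local ok, result = pcall(handler)
    in_flight:dec(1)  -- always executed, even if the handler failed
    if not ok then
        error(result, 0)
    end
    return result
end

-- Read the current gauge value (a gauge without labels has one observation).
local function current_value()
    local observations = in_flight:collect()
    return observations[1] and observations[1].value
end

-- The handler sees the gauge raised; afterwards it drops back.
local seen_inside = track_in_flight(function()
    return current_value()
end)
print(seen_inside, current_value())
```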
metrics.histogram(name[, help, buckets, metainfo])
Register a new histogram.
Parameters:
- name (string) – collector name. Must be unique.
- help (string) – collector description.
- buckets (table) – histogram buckets (an array of sorted positive numbers).
The infinity bucket (INF) is appended automatically.
Default: {.005, .01, .025, .05, .075, .1, .25, .5, .75, 1.0, 2.5, 5.0, 7.5, 10.0, INF}.
- metainfo (table) – collector metainfo.
Return: A histogram object (see histogram_obj).
Rtype: histogram_obj
See also: Creating custom metrics
Note
A histogram is basically a set of collectors:
- name .. "_sum" – a counter holding the sum of added observations.
- name .. "_count" – a counter holding the number of added observations.
- name .. "_bucket" – a counter holding all bucket sizes under the label le (less or equal).
To access a specific bucket x (where x is a number), specify the value x for the label le.
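The following sketch shows how the `_count` and `_bucket` collectors described above fill up. It assumes Tarantool 2.11.1 or later with the built-in metrics module and that histogram_obj:observe(num, label_pairs) is used to add values. The metric name, bucket bounds, observed values, and the helper `read` are hypothetical. Buckets are cumulative: each `le` bucket counts every observation less than or equal to its bound.

```lua
local metrics = require('metrics')

-- Custom buckets (in seconds); INF is appended automatically.
local latency = metrics.histogram('histogram_example_latency',
    'Request latency in seconds', { 0.1, 0.5, 1 })

for _, seconds in ipairs({ 0.05, 0.3, 0.7, 2 }) do
    latency:observe(seconds, { route = '/hello' })
end

-- Hypothetical helper: read one observation of the histogram.
-- `suffix` is '_count', '_sum', or '_bucket'; `le` selects a bucket.
local function read(suffix, le)
    for _, obs in ipairs(latency:collect()) do
        if obs.metric_name == 'histogram_example_latency' .. suffix
                and (le == nil or tonumber(obs.label_pairs.le) == le) then
            return obs.value
        end
    end
    return nil
end

print(read('_bucket', 0.1), read('_bucket', 0.5), read('_bucket', 1),
    read('_bucket', math.huge))
print(read('_count'))
```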
metrics.invoke_callbacks()
Invoke all registered callbacks. Must be called before each collect().
You can also use collect{invoke_callbacks = true} instead.
If you’re using one of the default exporters, invoke_callbacks() is called by the exporter.
See also: Creating custom plugins
metrics.register_callback(callback)
Register a function named callback, which will be called right before metric
collection on plugin export.
Parameters:
- callback (function) – a function that takes no parameters.
This method is most often used for gauge metrics updates.
Example:
local metrics = require('metrics')
local bands_waste_size = metrics.gauge('bands_waste_size', 'The size of memory wasted due to internal fragmentation')
metrics.register_callback(function()
    bands_waste_size:set(box.space.bands:stat()['tuple']['memtx']['waste_size'])
end)
See also: Custom metrics
metrics.set_global_labels(label_pairs)
Same as metrics.cfg{ labels = label_pairs }.
Learn more in metrics.cfg().
metrics.summary(name[, help, objectives, params, metainfo])
Register a new summary. Quantile computation is based on the
“Effective computation of biased quantiles over data streams”
algorithm.
Parameters:
- name (string) – collector name. Must be unique.
- help (string) – collector description.
- objectives (table) – a list of “targeted” φ-quantiles in the {quantile = error, ... } form.
Example: {[0.5]=0.01, [0.9]=0.01, [0.99]=0.01}.
The targeted φ-quantile is specified in the form of a φ-quantile and the tolerated
error. For example, {[0.5] = 0.1} means that the median (= 50th
percentile) is to be returned with a 10-percent error. Note that
percentiles and quantiles are the same concept, except that percentiles are
expressed as percentages. The φ-quantile must be in the interval [0, 1].
A lower tolerated error for a φ-quantile results in higher memory and CPU
usage during summary calculation.
- params (table) – table of the summary parameters used to configure the sliding
time window. This window consists of several buckets to store observations.
New observations are added to each bucket. After a time period, the head bucket
(from which observations are collected) is reset, and the next bucket becomes the
new head. This way, each bucket stores observations for
max_age_time * age_buckets_count seconds before it is reset.
max_age_time sets the duration of each bucket’s lifetime, that is, how
many seconds the observations are kept before they are discarded.
age_buckets_count sets the number of buckets in the sliding time window.
This variable determines the number of buckets used to exclude observations
older than max_age_time from the summary. The value is
a trade-off between resources (memory and CPU for maintaining the buckets)
and how smoothly the time window moves.
Default value: {max_age_time = math.huge, age_buckets_count = 1}.
- metainfo (table) – collector metainfo.
Return: A summary object (see summary_obj).
Rtype: summary_obj
See also: Creating custom metrics
Note
A summary represents a set of collectors:
- name .. "_sum" – a counter holding the sum of added observations.
- name .. "_count" – a counter holding the number of added observations.
- name – holds the values of all the quantiles under observation, under the label quantile.
To access the value of quantile x (where x is a number), specify the value x for the label quantile.
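The following sketch registers a summary with explicit objectives and sliding-window params. It assumes Tarantool 2.11.1 or later with the built-in metrics module, where summary_obj:observe(num, label_pairs) adds a value. The metric name, the window settings, and the workload of 100 evenly spaced response times are hypothetical. Because each quantile is estimated with a tolerated error, the reported median is an approximation, not an exact value.

```lua
local metrics = require('metrics')

-- Median, 90th and 99th percentiles with a 1% tolerated error each,
-- over a sliding window described by the params table.
local response_time = metrics.summary('summary_example_response_time',
    'Response time in seconds',
    { [0.5] = 0.01, [0.9] = 0.01, [0.99] = 0.01 },
    { max_age_time = 60, age_buckets_count = 5 })

-- Hypothetical workload: 100 response times from 0.01 to 1.00 seconds.
for i = 1, 100 do
    response_time:observe(i / 100)
end

-- Pick out the _count observation and the median (quantile = 0.5).
local count, median
for _, obs in ipairs(response_time:collect()) do
    if obs.metric_name == 'summary_example_response_time_count' then
        count = obs.value
    elseif obs.metric_name == 'summary_example_response_time'
            and tonumber(obs.label_pairs.quantile) == 0.5 then
        median = obs.value
    end
end
print(count, median)
```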
metrics.unregister_callback(callback)
Unregister a function named callback that is called right before metric
collection on plugin export.
Parameters:
- callback (function) – a function that takes no parameters.
Example:
local metrics = require('metrics')
local cpu_callback = function()
    local cpu_metrics = require('metrics.psutils.cpu')
    cpu_metrics.update()
end
metrics.register_callback(cpu_callback)
-- after a while, we don't need that callback function anymore
metrics.unregister_callback(cpu_callback)
metrics.http_middleware.build_default_collector(type_name, name[, help])
Register and return a collector for the middleware.
Parameters:
Return: A collector object
Possible errors:
- A collector with the same type and name already exists in the registry.
metrics.http_middleware.configure_default_collector(type_name, name, help)
Register a collector for the middleware and set it as default.
Parameters:
Possible errors:
- A collector with the same type and name already exists in the registry.
metrics.http_middleware.get_default_collector()
Return the default collector.
If the default collector hasn’t been set yet, register it
(with default http_middleware.build_default_collector() parameters)
and set it as default.
Return: A collector object
metrics.http_middleware.set_default_collector(collector)
Set the default collector.
Parameters:
- collector – middleware collector object
metrics.http_middleware.v1(handler, collector)
Latency-measuring wrapper for the HTTP ver. 1.x.x handler. Returns a wrapped handler.
Learn more in Collecting HTTP metrics.
Parameters:
- handler (function) – handler function.
- collector – middleware collector object.
If not set, the default collector is used
(like in http_middleware.get_default_collector()).
Usage:
httpd:route(route, http_middleware.v1(request_handler, collector))
See also: Collecting HTTP metrics
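The following sketch passes an explicit collector to http_middleware.v1() instead of relying on the default one. It assumes Tarantool 2.11.1 or later with the built-in metrics module. It also assumes that 'histogram' is an accepted type_name for build_default_collector(), alongside the 'summary' type used in Collecting HTTP metrics. The collector name, help text, and handler are hypothetical. The route registration is left as comments because it needs the separately installed http module.

```lua
local metrics = require('metrics')

-- A dedicated histogram collector for one endpoint (hypothetical name and help).
local hello_latency = metrics.http_middleware.build_default_collector(
    'histogram', 'hello_request_latency', 'Latency of GET /metrics/hello')

local function hello_handler()
    return {
        status = 200,
        headers = { ['content-type'] = 'text/plain' },
        body = 'Hello from http_middleware!',
    }
end

-- The wrapped handler records latency into hello_latency, not the default collector.
local wrapped_hello = metrics.http_middleware.v1(hello_handler, hello_latency)

-- With the http module installed, register it like any other handler:
-- local httpd = require('http.server').new('127.0.0.1', 8080)
-- httpd:route({ method = 'GET', path = '/metrics/hello' }, wrapped_hello)
-- httpd:start()
```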
A counter is a cumulative metric that denotes a single monotonically increasing counter. Its value might only increase or be reset to zero on restart. For example, you can use the counter to represent the number of requests served, tasks completed, or errors.
The design is based on the Prometheus counter.
gauge
A gauge is a metric that denotes a single numerical value that can arbitrarily increase and decrease.
The gauge type is typically used for measured values like temperature or current memory usage.
It could also be used for values that can increase or decrease, such as the number of concurrent requests.
The design is based on the Prometheus gauge.
histogram
A histogram metric is used to collect and analyze
statistical data about the distribution of values within the application.
Unlike metrics that track the average value or quantity of events, a histogram provides detailed visibility into the distribution of values and can uncover hidden dependencies.
The design is based on the Prometheus histogram.
summary
A summary metric is used to collect statistical data
about the distribution of values within the application.
Each summary provides several measurements:
- total count of measurements
- sum of measured values
- values at specific quantiles
Similar to histograms, the summary also operates with value ranges. However, unlike histograms,
it uses quantiles (defined by a number between 0 and 1) for this purpose. In this case,
it is not required to define fixed boundaries. For summary type, the ranges depend
on the measured values and the number of measurements.
The design is based on the Prometheus summary.
A label is a piece of metainfo that you associate with a metric in the key-value format.
For details, see labels in Prometheus and tags in Graphite.
Labels are used to differentiate between the characteristics of a thing being
measured. For example, in a metric associated with the total number of HTTP
requests, you can represent methods and statuses as label pairs:
http_requests_total_counter:inc(1, { method = 'POST', status = '200' })
The example above allows extracting the following time series:
- The total number of requests over time with
method = "POST"
(and any status).
- The total number of requests over time with
status = 500
(and any method).
To configure metrics, use metrics.cfg().
This function can be used to turn on or off the specified metrics or to configure labels applied to all collectors.
Moreover, you can use the following shortcut functions to set-up metrics or labels:
Note
Starting from version 3.0, metrics can be configured using a configuration file in the metrics section.
To create a custom metric, follow the steps below:
Create a metric
To create a new metric, you need to call a function corresponding to the desired collector type. For example, call metrics.counter() or metrics.gauge() to create a new counter or gauge, respectively.
In the example below, a new counter is created:
local metrics = require('metrics')
local bands_replace_count = metrics.counter('bands_replace_count', 'The number of data operations')
This counter is intended to collect the number of data operations performed on the specified space.
In the next example, a gauge is created:
local metrics = require('metrics')
local bands_waste_size = metrics.gauge('bands_waste_size', 'The size of memory wasted due to internal fragmentation')
Observe a value
You can observe a value in two ways:
At the appropriate place, for example, in an API request handler or trigger.
In this example below, the counter value is increased any time a data operation is performed on the bands
space.
To increase a counter value, counter_obj:inc() is called.
local metrics = require('metrics')
local bands_replace_count = metrics.counter('bands_replace_count', 'The number of data operations')
local trigger = require('trigger')
trigger.set(
'box.space.bands.on_replace',
'update_bands_replace_count_metric',
function(_, _, _, request_type)
bands_replace_count:inc(1, { request_type = request_type })
end
)
At the time of requesting the data collected by metrics.
In this case, you need to collect the required metric inside metrics.register_callback().
The example below shows how to use a gauge collector to measure the size of memory wasted due to internal fragmentation:
local metrics = require('metrics')
local bands_waste_size = metrics.gauge('bands_waste_size', 'The size of memory wasted due to internal fragmentation')
metrics.register_callback(function()
bands_waste_size:set(box.space.bands:stat()['tuple']['memtx']['waste_size'])
end)
To set a gauge value, gauge_obj:set() is called.
You can find the full example on GitHub: metrics_collect_custom.
The module allows to add your own metrics, but there are some subtleties when working with specific tools.
When adding your custom metric, it’s important to ensure that the number of label value combinations is kept to a minimum.
Otherwise, combinatorial explosion may happen in the timeseries database with metrics values stored.
Examples of data labels:
For example, if your company uses InfluxDB for metric collection, you can potentially disrupt the entire
monitoring setup, both for your application and for all other systems within the company. As a result,
monitoring data is likely to be lost.
Example:
local some_metric = metrics.counter('some', 'Some metric')
-- THIS IS POSSIBLE
local function on_value_update(instance_alias)
some_metric:inc(1, { alias = instance_alias })
end
-- THIS IS NOT ALLOWED
local function on_value_update(customer_id)
some_metric:inc(1, { customer_id = customer_id })
end
In the example, there are two versions of the function on_value_update
. The top version labels
the data with the cluster instance’s alias. Since there’s a relatively small number of nodes, using
them as labels is feasible. In the second case, an identifier of a record is used. If there are many
records, it’s recommended to avoid such situations.
The same principle applies to URLs. Using the entire URL with parameters is not recommended.
Use a URL template or the name of the command instead.
In essence, when designing custom metrics and selecting labels or tags, it’s crucial to opt for a minimal
set of values that can uniquely identify the data without introducing unnecessary complexity or potential
conflicts with existing metrics and systems.
The metrics
module provides middleware for monitoring HTTP latency statistics for endpoints that are created using the http module.
The latency collector observes both latency information and the number of invocations.
The metrics collected by HTTP middleware are separated by a set of labels:
- a route (
path
)
- a method (
method
)
- an HTTP status code (
status
)
For each route that you want to track, you must specify the middleware explicitly.
The example below shows how to collect statistics for requests made to the /metrics/hello
endpoint.
httpd = require('http.server').new('127.0.0.1', 8080)
local metrics = require('metrics')
metrics.http_middleware.configure_default_collector('summary')
httpd:route({
method = 'GET',
path = '/metrics/hello'
}, metrics.http_middleware.v1(
function()
return { status = 200,
headers = { ['content-type'] = 'text/plain' },
body = 'Hello from http_middleware!' }
end))
httpd:start()
Note
The middleware does not cover the 404 errors.
The metrics
module provides a set of plugins that let you collect metrics through a unified interface:
For example, you can obtain an HTTP response object containing metrics in the Prometheus format by calling the metrics.plugins.prometheus.collect_http()
function:
local prometheus_plugin = require('metrics.plugins.prometheus')
local prometheus_metrics = prometheus_plugin.collect_http()
To expose the collected metrics, you can use the http module:
httpd = require('http.server').new('127.0.0.1', 8080)
httpd:route({
method = 'GET',
path = '/metrics/prometheus'
}, function()
local prometheus_plugin = require('metrics.plugins.prometheus')
local prometheus_metrics = prometheus_plugin.collect_http()
return prometheus_metrics
end)
httpd:start()
Example on GitHub: metrics_plugins
Use the following API to create custom plugins:
To create a plugin, you need to include the following in your main export function:
-- Invoke all callbacks registered via `metrics.register_callback(<callback-function>)`
metrics.invoke_callbacks()
-- Loop over collectors
for _, c in pairs(metrics.collectors()) do
...
-- Loop over instant observations in the collector
for _, obs in pairs(c:collect()) do
-- Export observation `obs`
...
end
end
See the source code of built-in plugins in the metrics GitHub repository.
metrics API
metrics.cfg()
Entrypoint to setup the module
metrics.collect()
Collect observations from each collector
metrics.collectors()
List all collectors in the registry
metrics.counter()
Register a new counter
metrics.enable_default_metrics()
Same as metrics.cfg{ include = include, exclude = exclude }
metrics.gauge()
Register a new gauge
metrics.histogram()
Register a new histogram
metrics.invoke_callbacks()
Invoke all registered callbacks
metrics.register_callback()
Register a function named callback
metrics.set_global_labels()
Same as metrics.cfg{ labels = label_pairs }
metrics.summary()
Register a new summary
metrics.unregister_callback()
Unregister a function named callback
metrics.http_middleware API
metrics.http_middleware.build_default_collector()
Register and return a collector for the middleware
metrics.http_middleware.configure_default_collector()
Register a collector for the middleware and set it as default
metrics.http_middleware.get_default_collector()
Get the default collector
metrics.http_middleware.set_default_collector()
Set the default collector
metrics.http_middleware.v1()
Latency measuring wrap-up
Related objects
collector_object
A collector object
counter_obj
A counter object
gauge_obj
A gauge object
histogram_obj
A histogram object
registry
A metrics registry
summary_obj
A summary object
-
metrics.
cfg
([config])¶
Entrypoint to setup the module.
Parameters:
- config (
table
) – module configuration options:
cfg.include
(string/table, default all
): all
to enable all
supported default metrics, none
to disable all default metrics,
table with names of the default metrics to enable a specific set of metrics.
cfg.exclude
(table, default {}
): a table containing the names of
the default metrics that you want to disable. Has higher priority
than cfg.include
.
cfg.labels
(table, default {}
): a table containing label names as
string keys, label values as values. See also: Labels.
You can work with metrics.cfg
as a table to read values, but you must call
metrics.cfg{}
as a function to update them.
Supported default metric names (for cfg.include
and cfg.exclude
tables):
all
(metasection including all metrics)
network
operations
system
replicas
info
slab
runtime
memory
spaces
fibers
cpu
vinyl
memtx
luajit
clock
event_loop
config
See metrics reference for details.
All metric collectors from the collection have metainfo.default = true
.
cfg.labels
are the global labels to be added to every observation.
Global labels are applied only to metric collection. They have no effect
on how observations are stored.
Global labels can be changed on the fly.
label_pairs
from observation objects have priority over global labels.
If you pass label_pairs
to an observation method with the same key as
some global label, the method argument value will be used.
Note that both label names and values in label_pairs
are treated as strings.
-
metrics.
collect
([opts])¶
Collect observations from each collector.
Parameters:
- opts (
table
) – table of collect options:
invoke_callbacks
– if true
, invoke_callbacks() is triggered before actual collect.
default_only
– if true
, observations contain only default metrics (metainfo.default = true
).
-
metrics.
collectors
()¶
List all collectors in the registry. Designed to be used in exporters.
Return: A list of created collectors (see collector_object).
See also: Creating custom plugins
-
metrics.
counter
(name[, help, metainfo])¶
Register a new counter.
Parameters:
Return: A counter object (see counter_obj).
Rtype: counter_obj
See also: Creating custom metrics
-
metrics.
enable_default_metrics
([include, exclude])¶
Same as metrics.cfg{include=include, exclude=exclude}
, but include={}
is
treated as include='all'
for backward compatibility.
-
metrics.
gauge
(name[, help, metainfo])¶
Register a new gauge.
Parameters:
Return: A gauge object (see gauge_obj).
Rtype: gauge_obj
See also: Creating custom metrics
metrics.histogram(name[, help, buckets, metainfo])
Register a new histogram.
Parameters:
- name (string) – collector name. Must be unique.
- help (string) – collector description.
- buckets (table) – histogram buckets (an array of sorted positive numbers). The infinity bucket (INF) is appended automatically. Default: {.005, .01, .025, .05, .075, .1, .25, .5, .75, 1.0, 2.5, 5.0, 7.5, 10.0, INF}.
- metainfo (table) – collector metainfo.
Return: A histogram object (see histogram_obj).
Rtype: histogram_obj
See also: Creating custom metrics
Note
A histogram is basically a set of collectors:
- name .. "_sum" – a counter holding the sum of added observations.
- name .. "_count" – a counter holding the number of added observations.
- name .. "_bucket" – a counter holding all bucket sizes under the label le (less or equal). To access a specific bucket x (where x is a number), specify the value x for the label le.
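The following sketch uses custom buckets to show how the `_count`, `_sum`, and `_bucket` series relate. It assumes a standalone Tarantool 2.11.1+ script; the histogram name and bucket bounds are hypothetical. Bucket values are cumulative, so each `le` bucket counts every observation less than or equal to its bound. The code converts the `le` label with `tostring()`, so the automatically appended infinity bucket can be looked up as `'inf'` under LuaJIT.

```lua
local metrics = require('metrics')

-- Hypothetical latency histogram with custom bucket bounds (seconds);
-- the INF bucket is appended automatically
local latency = metrics.histogram('example_latency_seconds', 'Example request latency',
    { 0.1, 0.5, 1 })

for _, seconds in ipairs({ 0.05, 0.3, 0.7, 2 }) do
    latency:observe(seconds)
end

-- Split the collected observations into the three underlying series
local count, sum, buckets = nil, nil, {}
for _, obs in ipairs(latency:collect()) do
    if obs.metric_name == 'example_latency_seconds_count' then
        count = obs.value
    elseif obs.metric_name == 'example_latency_seconds_sum' then
        sum = obs.value
    elseif obs.metric_name == 'example_latency_seconds_bucket' then
        -- Buckets are cumulative: each counts observations <= its `le` bound
        buckets[tostring(obs.label_pairs.le)] = obs.value
    end
end
```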
metrics.invoke_callbacks()
Invoke all registered callbacks. Call it before each collect() so that callback-driven metrics are up to date.
You can also use collect{invoke_callbacks = true} instead.
If you’re using one of the default exporters, invoke_callbacks() is called by the exporter.
See also: Creating custom plugins
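The sketch below shows both ways to refresh callback-driven metrics before collecting them. It assumes a standalone Tarantool 2.11.1+ script; the gauge `example_lua_memory_kb` and the `update_lua_memory` callback are hypothetical. The callback uses the standard Lua `collectgarbage('count')` call to report the Lua heap size.

```lua
local metrics = require('metrics')

-- Hypothetical gauge reporting the Lua heap size in kilobytes
local lua_memory = metrics.gauge('example_lua_memory_kb', 'Lua memory in use, KB')

-- Count how many times the callback has run
local callback_runs = 0
local function update_lua_memory()
    callback_runs = callback_runs + 1
    lua_memory:set(collectgarbage('count'))
end
metrics.register_callback(update_lua_memory)

-- Option 1: invoke the callbacks explicitly, then collect
metrics.invoke_callbacks()
local first = metrics.collect()

-- Option 2: let collect() invoke the callbacks itself
local second = metrics.collect({ invoke_callbacks = true })

-- Clean up when the metric is no longer needed
metrics.unregister_callback(update_lua_memory)
```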
metrics.register_callback(callback)
Register a function named callback, which is called right before metric collection on plugin export.
Parameters:
- callback (function) – a function that takes no parameters.
This method is most often used for gauge metrics updates.
Example:
local metrics = require('metrics')
local bands_waste_size = metrics.gauge('bands_waste_size', 'The size of memory wasted due to internal fragmentation')
metrics.register_callback(function()
bands_waste_size:set(box.space.bands:stat()['tuple']['memtx']['waste_size'])
end)
See also: Custom metrics
metrics.set_global_labels(label_pairs)
Same as metrics.cfg{ labels = label_pairs }.
Learn more in metrics.cfg().
metrics.summary(name[, help, objectives, params, metainfo])
Register a new summary. Quantile computation is based on the
“Effective computation of biased quantiles over data streams”
algorithm.
Parameters:
- name (string) – collector name. Must be unique.
- help (string) – collector description.
- objectives (table) – a list of “targeted” φ-quantiles in the {quantile = error, ... } form.
Example: {[0.5]=0.01, [0.9]=0.01, [0.99]=0.01}.
The targeted φ-quantile is specified in the form of a φ-quantile and the tolerated
error. For example, {[0.5] = 0.1} means that the median (= 50th
percentile) is to be returned with a 10-percent error. Note that
percentiles and quantiles are the same concept, except that percentiles are
expressed as percentages. The φ-quantile must be in the interval [0, 1].
A lower tolerated error for a φ-quantile results in higher memory and CPU
usage during summary calculation.
- params (table) – table of the summary parameters used to configure the sliding
time window. This window consists of several buckets to store observations.
New observations are added to each bucket. After a time period, the head bucket
(from which observations are collected) is reset, and the next bucket becomes the
new head. This way, each bucket stores observations for
max_age_time * age_buckets_count seconds before it is reset.
max_age_time sets the duration of each bucket’s lifetime – that is, how
many seconds the observations are kept before they are discarded.
age_buckets_count sets the number of buckets in the sliding time window.
This variable determines the number of buckets used to exclude observations
older than max_age_time from the summary. The value is
a trade-off between resources (memory and CPU for maintaining the buckets)
and how smoothly the time window moves.
Default value: {max_age_time = math.huge, age_buckets_count = 1}.
- metainfo (table) – collector metainfo.
Return: A summary object (see summary_obj).
Rtype: summary_obj
See also: Creating custom metrics
Note
A summary represents a set of collectors:
- name .. "_sum" – a counter holding the sum of added observations.
- name .. "_count" – a counter holding the number of added observations.
- name – holds the values of all observed quantiles under the label quantile. To access the value for quantile x (where x is a number), specify the value x for the label quantile.
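The sketch below creates a summary with two targeted quantiles and a sliding time window. It assumes a standalone Tarantool 2.11.1+ script; the metric name, objectives, and window parameters are hypothetical. Quantile values are estimates within the tolerated error, while the `_count` and `_sum` series are exact.

```lua
local metrics = require('metrics')

-- Hypothetical summary: median and 90th percentile, each with a 1% tolerated error,
-- plus sliding-window parameters (both window parameters are set together here)
local sizes = metrics.summary('example_response_size_bytes', 'Example response sizes',
    { [0.5] = 0.01, [0.9] = 0.01 },
    { max_age_time = 60, age_buckets_count = 5 })

for size = 1, 100 do
    sizes:observe(size)
end

local count, sum, quantiles = nil, nil, {}
for _, obs in ipairs(sizes:collect()) do
    if obs.metric_name == 'example_response_size_bytes_count' then
        count = obs.value
    elseif obs.metric_name == 'example_response_size_bytes_sum' then
        sum = obs.value
    elseif obs.metric_name == 'example_response_size_bytes' then
        -- The `quantile` label tells which φ-quantile this estimate belongs to
        quantiles[tostring(obs.label_pairs.quantile)] = obs.value
    end
end
```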
metrics.unregister_callback(callback)
Unregister a function named callback that is called right before metric
collection on plugin export.
Parameters:
- callback (function) – a function that takes no parameters.
Example:
local metrics = require('metrics')
local cpu_callback = function()
    local cpu_metrics = require('metrics.psutils.cpu')
    cpu_metrics.update()
end
metrics.register_callback(cpu_callback)
-- after a while, we don't need that callback function anymore
metrics.unregister_callback(cpu_callback)
metrics.http_middleware.build_default_collector(type_name, name[, help])
Register and return a collector for the middleware.
Parameters:
Return: A collector object
Possible errors:
- A collector with the same type and name already exists in the registry.
metrics.http_middleware.configure_default_collector(type_name, name, help)
Register a collector for the middleware and set it as default.
Parameters:
Possible errors:
- A collector with the same type and name already exists in the registry.
metrics.http_middleware.get_default_collector()
Return the default collector.
If the default collector hasn’t been set yet, register it
(with default http_middleware.build_default_collector() parameters)
and set it as default.
Return: A collector object
metrics.http_middleware.set_default_collector(collector)
Set the default collector.
Parameters:
- collector – middleware collector object
metrics.http_middleware.v1(handler, collector)
Latency-measuring wrapper for an HTTP ver. 1.x.x handler. Returns a wrapped handler.
Learn more in Collecting HTTP metrics.
Parameters:
- handler (function) – handler function.
- collector – middleware collector object. If not set, the default collector is used (like in http_middleware.get_default_collector()).
Usage:
httpd:route(route, http_middleware.v1(request_handler, collector))
See also: Collecting HTTP metrics
A label is a piece of metainfo that you associate with a metric in the key-value format.
For details, see labels in Prometheus and tags in Graphite.
Labels are used to differentiate between the characteristics of a thing being
measured. For example, in a metric associated with the total number of HTTP
requests, you can represent methods and statuses as label pairs:
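local metrics = require('metrics')
-- hypothetical counter for this example
local http_requests_total_counter = metrics.counter('http_requests_total', 'Total number of HTTP requests')
http_requests_total_counter:inc(1, { method = 'POST', status = '200' })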
The example above allows extracting the following time series:
- The total number of requests over time with method = "POST" (and any status).
- The total number of requests over time with status = 500 (and any method).
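The sketch below shows how label pairs turn into separate time series. It records a few requests and then sums them over one label, which roughly matches what a time series database does when you query by `method` or `status`. It assumes a standalone Tarantool 2.11.1+ script; the counter and the `total_where()` helper are hypothetical.

```lua
local metrics = require('metrics')

-- Hypothetical counter; each distinct label combination becomes its own series
local http_requests_total_counter = metrics.counter('http_requests_total', 'Total number of HTTP requests')

http_requests_total_counter:inc(1, { method = 'POST', status = '200' })
http_requests_total_counter:inc(1, { method = 'POST', status = '500' })
http_requests_total_counter:inc(1, { method = 'GET', status = '500' })
http_requests_total_counter:inc(1, { method = 'GET', status = '200' })
http_requests_total_counter:inc(1, { method = 'GET', status = '200' })

-- Hypothetical helper: sum all series whose `label` equals `value`
local function total_where(label, value)
    local total = 0
    for _, obs in ipairs(http_requests_total_counter:collect()) do
        if obs.label_pairs[label] == value then
            total = total + obs.value
        end
    end
    return total
end

local series = #http_requests_total_counter:collect()  -- one per label combination
local post_total = total_where('method', 'POST')       -- any status
local error_total = total_where('status', '500')       -- any method
```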
To configure metrics, use metrics.cfg().
This function can be used to turn the specified metrics on or off or to configure labels applied to all collectors.
Moreover, you can use the following shortcut functions to set up metrics or labels: metrics.enable_default_metrics() and metrics.set_global_labels().
Note
Starting from version 3.0, metrics can be configured using a configuration file in the metrics section.
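Here is a minimal `metrics.cfg()` call, assuming a standalone Tarantool 2.11.1+ script. With the 3.0 configuration file, the metrics section plays this role instead. The `alias` label value is hypothetical, and `include = 'none'` is used here only to keep the example small.

```lua
local metrics = require('metrics')

-- Disable all default metrics and set a global label in one call;
-- the alias value is hypothetical
metrics.cfg{
    include = 'none',
    labels = { alias = 'storage-a-001' },
}

-- metrics.cfg can be read as a table; changing values requires calling it again
local alias = metrics.cfg.labels.alias
```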
To create a custom metric, follow the steps below:
Create a metric
To create a new metric, you need to call a function corresponding to the desired collector type. For example, call metrics.counter() or metrics.gauge() to create a new counter or gauge, respectively.
In the example below, a new counter is created:
local metrics = require('metrics')
local bands_replace_count = metrics.counter('bands_replace_count', 'The number of data operations')
This counter is intended to collect the number of data operations performed on the specified space.
In the next example, a gauge is created:
local metrics = require('metrics')
local bands_waste_size = metrics.gauge('bands_waste_size', 'The size of memory wasted due to internal fragmentation')
Observe a value
You can observe a value in two ways:
At the appropriate place, for example, in an API request handler or trigger.
In the example below, the counter value is increased any time a data operation is performed on the bands space.
To increase a counter value, counter_obj:inc() is called.
local metrics = require('metrics')
local bands_replace_count = metrics.counter('bands_replace_count', 'The number of data operations')
local trigger = require('trigger')
trigger.set(
'box.space.bands.on_replace',
'update_bands_replace_count_metric',
function(_, _, _, request_type)
bands_replace_count:inc(1, { request_type = request_type })
end
)
At the time of requesting the data collected by metrics.
In this case, you need to collect the required metric inside metrics.register_callback().
The example below shows how to use a gauge collector to measure the size of memory wasted due to internal fragmentation:
local metrics = require('metrics')
local bands_waste_size = metrics.gauge('bands_waste_size', 'The size of memory wasted due to internal fragmentation')
metrics.register_callback(function()
bands_waste_size:set(box.space.bands:stat()['tuple']['memtx']['waste_size'])
end)
To set a gauge value, gauge_obj:set() is called.
You can find the full example on GitHub: metrics_collect_custom.
The module allows you to add your own metrics, but there are some subtleties when working with specific tools.
When adding your custom metric, it’s important to keep the number of label value combinations to a minimum.
Otherwise, a combinatorial explosion may occur in the time series database that stores the metric values.
Examples of data labels:
For example, if your company uses InfluxDB for metric collection, you can potentially disrupt the entire
monitoring setup, both for your application and for all other systems within the company. As a result,
monitoring data is likely to be lost.
Example:
local metrics = require('metrics')
local some_metric = metrics.counter('some', 'Some metric')
-- THIS IS POSSIBLE
local function on_value_update(instance_alias)
some_metric:inc(1, { alias = instance_alias })
end
-- THIS IS NOT ALLOWED
local function on_value_update(customer_id)
some_metric:inc(1, { customer_id = customer_id })
end
In the example, there are two versions of the function on_value_update. The first version labels
the data with the cluster instance’s alias. Since there’s a relatively small number of nodes, using
them as labels is feasible. The second version uses a record identifier (customer_id). If there are many
records, avoid labels like this.
The same principle applies to URLs. Using the entire URL with parameters is not recommended.
Use a URL template or the name of the command instead.
In essence, when designing custom metrics and selecting labels or tags, it’s crucial to opt for a minimal
set of values that can uniquely identify the data without introducing unnecessary complexity or potential
conflicts with existing metrics and systems.
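The cardinality problem becomes concrete if you label the same traffic in two ways, as in the URL advice above. This sketch assumes a standalone Tarantool 2.11.1+ script; the counter names and the `/users/:id` route template are hypothetical.

```lua
local metrics = require('metrics')

-- Two hypothetical counters: one labeled by the full URL, one by the route template
local by_url = metrics.counter('example_requests_by_url', 'Requests labeled by full URL')
local by_route = metrics.counter('example_requests_by_route', 'Requests labeled by route template')

for user_id = 1, 1000 do
    local url = '/users/' .. user_id .. '?tab=profile'
    -- Every distinct URL creates a new series (high cardinality)
    by_url:inc(1, { path = url })
    -- The template is the same for all requests (a single series)
    by_route:inc(1, { path = '/users/:id' })
end

local url_series = #by_url:collect()
local route_series = #by_route:collect()
```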
The metrics module provides middleware for monitoring HTTP latency statistics for endpoints that are created using the http module.
The latency collector observes both latency information and the number of invocations.
The metrics collected by HTTP middleware are separated by a set of labels:
- a route (path)
- a method (method)
- an HTTP status code (status)
For each route that you want to track, you must specify the middleware explicitly.
The example below shows how to collect statistics for requests made to the /metrics/hello endpoint.
httpd = require('http.server').new('127.0.0.1', 8080)
local metrics = require('metrics')
metrics.http_middleware.configure_default_collector('summary')
httpd:route({
method = 'GET',
path = '/metrics/hello'
}, metrics.http_middleware.v1(
function()
return { status = 200,
headers = { ['content-type'] = 'text/plain' },
body = 'Hello from http_middleware!' }
end))
httpd:start()
Note
The middleware does not cover 404 errors.
The metrics module provides a set of plugins that let you collect metrics through a unified interface:
For example, you can obtain an HTTP response object containing metrics in the Prometheus format by calling the metrics.plugins.prometheus.collect_http() function:
local prometheus_plugin = require('metrics.plugins.prometheus')
local prometheus_metrics = prometheus_plugin.collect_http()
To expose the collected metrics, you can use the http module:
httpd = require('http.server').new('127.0.0.1', 8080)
httpd:route({
method = 'GET',
path = '/metrics/prometheus'
}, function()
local prometheus_plugin = require('metrics.plugins.prometheus')
local prometheus_metrics = prometheus_plugin.collect_http()
return prometheus_metrics
end)
httpd:start()
Example on GitHub: metrics_plugins
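To check what the Prometheus plugin returns, the sketch below records a counter and inspects the response object. It assumes a standalone Tarantool 2.11.1+ script. The counter `example_jobs_total` is hypothetical, and the exact text layout of the body is left to the plugin.

```lua
local metrics = require('metrics')
local prometheus_plugin = require('metrics.plugins.prometheus')

-- Hypothetical counter so that the exported body contains a known metric
local jobs = metrics.counter('example_jobs_total', 'Example job counter')
jobs:inc(3, { queue = 'default' })

-- collect_http() returns a response object that an http route handler can return as is
local response = prometheus_plugin.collect_http()

-- The body is plain text in the Prometheus exposition format
print(response.body)
```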
Use the following API to create custom plugins:
To create a plugin, you need to include the following in your main export function:
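local metrics = require('metrics')
-- Hypothetical export function that renders each observation as "<name> <value>"
local function export()
    -- Invoke all callbacks registered via `metrics.register_callback(<callback-function>)`
    metrics.invoke_callbacks()
    local lines = {}
    -- Loop over collectors
    for _, c in pairs(metrics.collectors()) do
        -- Loop over instant observations in the collector
        for _, obs in pairs(c:collect()) do
            -- Export observation `obs`: here, format its name and value as text
            table.insert(lines, obs.metric_name .. ' ' .. tostring(obs.value))
        end
    end
    return table.concat(lines, '\n')
end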
See the source code of built-in plugins in the metrics GitHub repository.
metrics API | |
metrics.cfg() | Entry point to set up the module |
metrics.collect() | Collect observations from each collector |
metrics.collectors() | List all collectors in the registry |
metrics.counter() | Register a new counter |
metrics.enable_default_metrics() | Same as metrics.cfg{ include = include, exclude = exclude } |
metrics.gauge() | Register a new gauge |
metrics.histogram() | Register a new histogram |
metrics.invoke_callbacks() | Invoke all registered callbacks |
metrics.register_callback() | Register a function named callback |
metrics.set_global_labels() | Same as metrics.cfg{ labels = label_pairs } |
metrics.summary() | Register a new summary |
metrics.unregister_callback() | Unregister a function named callback |
metrics.http_middleware API | |
metrics.http_middleware.build_default_collector() | Register and return a collector for the middleware |
metrics.http_middleware.configure_default_collector() | Register a collector for the middleware and set it as default |
metrics.http_middleware.get_default_collector() | Get the default collector |
metrics.http_middleware.set_default_collector() | Set the default collector |
metrics.http_middleware.v1() | Latency-measuring wrapper |
Related objects | |
collector_object | A collector object |
counter_obj | A counter object |
gauge_obj | A gauge object |
histogram_obj | A histogram object |
registry | A metrics registry |
summary_obj | A summary object |
A histogram metric is used to collect and analyze statistical data about the distribution of values within the application. Unlike metrics that track the average value or quantity of events, a histogram provides detailed visibility into the distribution of values and can uncover hidden dependencies.
The design is based on the Prometheus histogram.
summary
A summary metric is used to collect statistical data
about the distribution of values within the application.
Each summary provides several measurements:
- total count of measurements
- sum of measured values
- values at specific quantiles
Similar to histograms, the summary also operates with value ranges. However, unlike histograms,
it uses quantiles (defined by a number between 0 and 1) for this purpose. In this case,
it is not required to define fixed boundaries. For summary type, the ranges depend
on the measured values and the number of measurements.
The design is based on the Prometheus summary.
A label is a piece of metainfo that you associate with a metric in the key-value format.
For details, see labels in Prometheus and tags in Graphite.
Labels are used to differentiate between the characteristics of a thing being
measured. For example, in a metric associated with the total number of HTTP
requests, you can represent methods and statuses as label pairs:
http_requests_total_counter:inc(1, { method = 'POST', status = '200' })
The example above allows extracting the following time series:
- The total number of requests over time with
method = "POST"
(and any status).
- The total number of requests over time with
status = 500
(and any method).
A summary metric is used to collect statistical data about the distribution of values within the application.
Each summary provides several measurements:
- total count of measurements
- sum of measured values
- values at specific quantiles
Similar to histograms, the summary also operates with value ranges. However, unlike histograms, it uses quantiles (defined by a number between 0 and 1) for this purpose. In this case, it is not required to define fixed boundaries. For summary type, the ranges depend on the measured values and the number of measurements.
The design is based on the Prometheus summary.
A label is a piece of metainfo that you associate with a metric in the key-value format. For details, see labels in Prometheus and tags in Graphite.
Labels are used to differentiate between the characteristics of a thing being measured. For example, in a metric associated with the total number of HTTP requests, you can represent methods and statuses as label pairs:
http_requests_total_counter:inc(1, { method = 'POST', status = '200' })
The example above allows extracting the following time series:
- The total number of requests over time with
method = "POST"
(and any status). - The total number of requests over time with
status = 500
(and any method).
To configure metrics, use metrics.cfg(). This function can be used to turn on or off the specified metrics or to configure labels applied to all collectors. Moreover, you can use the following shortcut functions to set-up metrics or labels:
Note
Starting from version 3.0, metrics can be configured using a configuration file in the metrics section.
To create a custom metric, follow the steps below:
Create a metric
To create a new metric, you need to call a function corresponding to the desired collector type. For example, call metrics.counter() or metrics.gauge() to create a new counter or gauge, respectively. In the example below, a new counter is created:
local metrics = require('metrics') local bands_replace_count = metrics.counter('bands_replace_count', 'The number of data operations')
This counter is intended to collect the number of data operations performed on the specified space.
In the next example, a gauge is created:
local metrics = require('metrics') local bands_waste_size = metrics.gauge('bands_waste_size', 'The size of memory wasted due to internal fragmentation')
Observe a value
You can observe a value in two ways:
At the appropriate place, for example, in an API request handler or trigger. In this example below, the counter value is increased any time a data operation is performed on the
bands
space. To increase a counter value, counter_obj:inc() is called.local metrics = require('metrics') local bands_replace_count = metrics.counter('bands_replace_count', 'The number of data operations') local trigger = require('trigger') trigger.set( 'box.space.bands.on_replace', 'update_bands_replace_count_metric', function(_, _, _, request_type) bands_replace_count:inc(1, { request_type = request_type }) end )
At the time of requesting the data collected by metrics. In this case, you need to collect the required metric inside metrics.register_callback(). The example below shows how to use a gauge collector to measure the size of memory wasted due to internal fragmentation:
local metrics = require('metrics') local bands_waste_size = metrics.gauge('bands_waste_size', 'The size of memory wasted due to internal fragmentation') metrics.register_callback(function() bands_waste_size:set(box.space.bands:stat()['tuple']['memtx']['waste_size']) end)
To set a gauge value, gauge_obj:set() is called.
You can find the full example on GitHub: metrics_collect_custom.
The module allows to add your own metrics, but there are some subtleties when working with specific tools.
When adding your custom metric, it’s important to ensure that the number of label value combinations is kept to a minimum. Otherwise, combinatorial explosion may happen in the timeseries database with metrics values stored. Examples of data labels:
For example, if your company uses InfluxDB for metric collection, you can potentially disrupt the entire monitoring setup, both for your application and for all other systems within the company. As a result, monitoring data is likely to be lost.
Example:
local some_metric = metrics.counter('some', 'Some metric')
-- THIS IS POSSIBLE
local function on_value_update(instance_alias)
some_metric:inc(1, { alias = instance_alias })
end
-- THIS IS NOT ALLOWED
local function on_value_update(customer_id)
some_metric:inc(1, { customer_id = customer_id })
end
In the example, there are two versions of the function on_value_update
. The top version labels
the data with the cluster instance’s alias. Since there’s a relatively small number of nodes, using
them as labels is feasible. In the second case, an identifier of a record is used. If there are many
records, it’s recommended to avoid such situations.
The same principle applies to URLs. Using the entire URL with parameters is not recommended. Use a URL template or the name of the command instead.
In essence, when designing custom metrics and selecting labels or tags, it’s crucial to opt for a minimal set of values that can uniquely identify the data without introducing unnecessary complexity or potential conflicts with existing metrics and systems.
The metrics
module provides middleware for monitoring HTTP latency statistics for endpoints that are created using the http module.
The latency collector observes both latency information and the number of invocations.
The metrics collected by HTTP middleware are separated by a set of labels:
- a route (
path
) - a method (
method
) - an HTTP status code (
status
)
For each route that you want to track, you must specify the middleware explicitly.
The example below shows how to collect statistics for requests made to the /metrics/hello
endpoint.
local httpd = require('http.server').new('127.0.0.1', 8080)
local metrics = require('metrics')
-- Use a summary as the default collector for the middleware
metrics.http_middleware.configure_default_collector('summary')
-- Wrap the route handler so that its latency is measured
httpd:route({
    method = 'GET',
    path = '/metrics/hello'
}, metrics.http_middleware.v1(
    function()
        return { status = 200,
                 headers = { ['content-type'] = 'text/plain' },
                 body = 'Hello from http_middleware!' }
    end))
httpd:start()
Note
The middleware does not cover 404 errors.
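Conceptually, the middleware wraps a route handler. It calls the handler, measures how long the call took, and records the observation labeled with path, method, and status. The following plain-Lua model illustrates that idea under simplifying assumptions and is not the module's actual code. The wrap_handler function, the record callback, and the request fields are hypothetical. Note that os.clock() measures CPU time rather than wall-clock time.

```lua
-- Simplified model of a latency-measuring wrapper (hypothetical, not the
-- metrics module's implementation).
local function wrap_handler(handler, record)
    return function(req)
        local start = os.clock() -- CPU time; a real collector would use a wall clock
        local resp = handler(req)
        local elapsed = os.clock() - start
        -- Label the observation with the route, the method, and the status code
        record(elapsed, { path = req.path, method = req.method, status = resp.status })
        return resp
    end
end

-- Usage with a fake request and an in-memory list of observations
local observations = {}
local wrapped = wrap_handler(
    function() return { status = 200, body = 'Hello' } end,
    function(latency, labels)
        table.insert(observations, { latency = latency, labels = labels })
    end)

local resp = wrapped({ method = 'GET', path = '/metrics/hello' })
print(resp.status, observations[1].labels.path)
```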
The metrics module provides a set of plugins that let you collect metrics through a unified interface.
For example, you can obtain an HTTP response object containing metrics in the Prometheus format by calling the metrics.plugins.prometheus.collect_http() function:
local prometheus_plugin = require('metrics.plugins.prometheus')
local prometheus_metrics = prometheus_plugin.collect_http()
To expose the collected metrics, you can use the http module:
local httpd = require('http.server').new('127.0.0.1', 8080)
local prometheus_plugin = require('metrics.plugins.prometheus')
-- Expose metrics in the Prometheus format at /metrics/prometheus
httpd:route({
    method = 'GET',
    path = '/metrics/prometheus'
}, function()
    -- collect_http() returns an HTTP response object with the collected metrics
    return prometheus_plugin.collect_http()
end)
httpd:start()
Example on GitHub: metrics_plugins
To create a custom plugin, make your main export function invoke the registered callbacks and then loop over the collectors and the observations that each of them returns:
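local metrics = require('metrics')
-- A minimal export function of a custom plugin (the name `export` and the
-- "<metric_name> <value>" output format are illustrative)
local function export()
    -- Invoke all callbacks registered via `metrics.register_callback(<callback-function>)`
    metrics.invoke_callbacks()
    local lines = {}
    -- Loop over collectors
    for _, c in pairs(metrics.collectors()) do
        -- Loop over instant observations in the collector
        for _, obs in pairs(c:collect()) do
            -- Export observation `obs`
            table.insert(lines, ('%s %s'):format(obs.metric_name, tostring(obs.value)))
        end
    end
    return table.concat(lines, '\n')
end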
See the source code of built-in plugins in the metrics GitHub repository.
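To experiment with the export loop without a running Tarantool instance, you can feed it a stubbed registry. This sketch makes several assumptions for illustration: the stub metrics table, the collector and observation field names (name, help, kind, metric_name, label_pairs, value), and the Prometheus-like output lines. Check the built-in plugins for the exact fields and output format.

```lua
-- Stub registry standing in for require('metrics') (an assumption for testing)
local metrics = {
    invoke_callbacks = function() end,
    collectors = function()
        return {
            {
                name = 'http_requests_total',
                help = 'Total number of HTTP requests',
                kind = 'counter',
                collect = function()
                    return {
                        { metric_name = 'http_requests_total',
                          label_pairs = { method = 'GET', status = '200' }, value = 3 },
                    }
                end,
            },
        }
    end,
}

-- Render label pairs as {k="v",...}, sorting keys for a stable output
local function format_labels(label_pairs)
    local keys = {}
    for k in pairs(label_pairs) do table.insert(keys, k) end
    table.sort(keys)
    local parts = {}
    for _, k in ipairs(keys) do
        table.insert(parts, ('%s="%s"'):format(k, tostring(label_pairs[k])))
    end
    if #parts == 0 then return '' end
    return '{' .. table.concat(parts, ',') .. '}'
end

-- Export function following the same loop structure as the plugin template
local function export()
    metrics.invoke_callbacks()
    local lines = {}
    for _, c in pairs(metrics.collectors()) do
        table.insert(lines, ('# HELP %s %s'):format(c.name, c.help))
        table.insert(lines, ('# TYPE %s %s'):format(c.name, c.kind))
        for _, obs in pairs(c:collect()) do
            table.insert(lines, obs.metric_name .. format_labels(obs.label_pairs)
                .. ' ' .. tostring(obs.value))
        end
    end
    return table.concat(lines, '\n')
end

print(export())
```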
metrics API | |
metrics.cfg() | Entrypoint to set up the module |
metrics.collect() | Collect observations from each collector |
metrics.collectors() | List all collectors in the registry |
metrics.counter() | Register a new counter |
metrics.enable_default_metrics() | Same as metrics.cfg{ include = include, exclude = exclude } |
metrics.gauge() | Register a new gauge |
metrics.histogram() | Register a new histogram |
metrics.invoke_callbacks() | Invoke all registered callbacks |
metrics.register_callback() | Register a callback function |
metrics.set_global_labels() | Same as metrics.cfg{ labels = label_pairs } |
metrics.summary() | Register a new summary |
metrics.unregister_callback() | Unregister a callback function |
metrics.http_middleware API | |
metrics.http_middleware.build_default_collector() | Register and return a collector for the middleware |
metrics.http_middleware.configure_default_collector() | Register a collector for the middleware and set it as default |
metrics.http_middleware.get_default_collector() | Get the default collector |
metrics.http_middleware.set_default_collector() | Set the default collector |
metrics.http_middleware.v1() | Latency-measuring wrapper |
Related objects | |
collector_object | A collector object |
counter_obj | A counter object |
gauge_obj | A gauge object |
histogram_obj | A histogram object |
registry | A metrics registry |
summary_obj | A summary object |
metrics.cfg([config])
Entrypoint to set up the module.
Parameters:
- config (table) – module configuration options:
  - cfg.include (string/table, default all): all to enable all supported default metrics, none to disable all default metrics, or a table with names of the default metrics to enable a specific set of metrics.
  - cfg.exclude (table, default {}): a table containing the names of the default metrics that you want to disable. Has higher priority than cfg.include.
  - cfg.labels (table, default {}): a table containing label names as string keys, label values as values. See also: Labels.
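You can model the interaction between cfg.include and cfg.exclude as set arithmetic. The include option selects a set of default metric sections, and exclude then removes sections from that set, because it has higher priority. The following plain-Lua sketch models that rule. The resolve_default_metrics name is hypothetical, and the list of sections is shortened for brevity.

```lua
-- Hypothetical model of how cfg.include and cfg.exclude combine.
-- `available` lists default metric section names (shortened here).
local function resolve_default_metrics(include, exclude, available)
    local enabled = {}
    local function enable_all()
        for _, name in ipairs(available) do enabled[name] = true end
    end
    if include == 'all' then
        enable_all()
    elseif type(include) == 'table' then
        for _, name in ipairs(include) do
            -- 'all' inside the table acts as a metasection
            if name == 'all' then enable_all() else enabled[name] = true end
        end
    end -- include == 'none' leaves the set empty
    -- exclude has higher priority than include
    for _, name in ipairs(exclude or {}) do enabled[name] = nil end
    return enabled
end

local available = { 'network', 'memory', 'fibers', 'cpu' }
local enabled = resolve_default_metrics('all', { 'cpu' }, available)
print(enabled.network, enabled.cpu)
```

With the module itself, the equivalent call would be metrics.cfg{ include = 'all', exclude = { 'cpu' } }.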
You can work with metrics.cfg as a table to read values, but you must call metrics.cfg{} as a function to update them.
Supported default metric names (for cfg.include and cfg.exclude tables):
- all (metasection including all metrics)
- network
- operations
- system
- replicas
- info
- slab
- runtime
- memory
- spaces
- fibers
- cpu
- vinyl
- memtx
- luajit
- clock
- event_loop
- config
See metrics reference for details. All metric collectors from the collection have metainfo.default = true.
cfg.labels are the global labels to be added to every observation.
Global labels are applied only to metric collection. They have no effect on how observations are stored.
Global labels can be changed on the fly.
label_pairs from observation objects have priority over global labels. If you pass label_pairs to an observation method with the same key as some global label, the method argument value will be used.
Note that both label names and values in label_pairs are treated as strings.
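You can sketch the precedence rule for labels as a table merge. Observation label_pairs overwrite global labels with the same key, and every name and value is converted to a string. The merge_labels helper is hypothetical. It models the documented behavior and is not part of the metrics API.

```lua
-- Hypothetical model of how global labels and observation label_pairs combine
local function merge_labels(global_labels, label_pairs)
    local result = {}
    for k, v in pairs(global_labels or {}) do
        result[tostring(k)] = tostring(v)
    end
    -- label_pairs passed to an observation method take priority
    for k, v in pairs(label_pairs or {}) do
        result[tostring(k)] = tostring(v)
    end
    return result
end

local labels = merge_labels({ alias = 'router-1', zone = 'eu' }, { zone = 'us', shard = 2 })
-- labels: alias = 'router-1', zone = 'us', shard = '2'
```

In the module itself, the global labels would be set with metrics.cfg{ labels = { alias = 'router-1', zone = 'eu' } }. The override would come from an observation call such as some_counter:inc(1, { zone = 'us' }).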
metrics.collect([opts])
Collect observations from each collector.
Parameters:
- opts (table) – table of collect options:
  - invoke_callbacks – if true, invoke_callbacks() is triggered before the actual collection.
  - default_only – if true, observations contain only default metrics (metainfo.default = true).
metrics.collectors()
List all collectors in the registry. Designed to be used in exporters.
Return: A list of created collectors (see collector_object).
See also: Creating custom plugins
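metrics.counter(name[, help, metainfo])
Register a new counter.
Parameters:
- name (string) – collector name. Must be unique.
- help (string) – collector description.
- metainfo (table) – collector metainfo.
Return: A counter object (see counter_obj).
Rtype: counter_obj
See also: Creating custom metrics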
metrics.enable_default_metrics([include, exclude])
Same as metrics.cfg{include=include, exclude=exclude}, but include={} is treated as include='all' for backward compatibility.
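metrics.gauge(name[, help, metainfo])
Register a new gauge.
Parameters:
- name (string) – collector name. Must be unique.
- help (string) – collector description.
- metainfo (table) – collector metainfo.
Return: A gauge object (see gauge_obj).
Rtype: gauge_obj
See also: Creating custom metrics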
metrics.histogram(name[, help, buckets, metainfo])
Register a new histogram.
Parameters:
- name (string) – collector name. Must be unique.
- help (string) – collector description.
- buckets (table) – histogram buckets (an array of sorted positive numbers). The infinity bucket (INF) is appended automatically. Default: {.005, .01, .025, .05, .075, .1, .25, .5, .75, 1.0, 2.5, 5.0, 7.5, 10.0, INF}.
- metainfo (table) – collector metainfo.
Return: A histogram object (see histogram_obj).
Rtype: histogram_obj
See also: Creating custom metrics
Note
A histogram is basically a set of collectors:
- name .. "_sum" – a counter holding the sum of added observations.
- name .. "_count" – a counter holding the number of added observations.
- name .. "_bucket" – a counter holding all bucket sizes under the label le (less or equal). To access a specific bucket x (where x is a number), specify the value x for the label le.
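To illustrate the note above, here is a plain-Lua model of what a histogram records for a few observations. Each _bucket value with the label le = x counts the observations that are less than or equal to x. The counts are therefore cumulative, and the INF bucket always equals _count. The histogram_series function and the sample values are hypothetical. The real collector keeps these values in its own counters.

```lua
-- Plain-Lua model of the _sum, _count, and _bucket series of a histogram
local INF = math.huge

local function histogram_series(buckets, values)
    local bounds = {}
    for i, b in ipairs(buckets) do bounds[i] = b end
    table.insert(bounds, INF) -- the infinity bucket is appended automatically
    local series = { sum = 0, count = 0, bucket = {} }
    for _, le in ipairs(bounds) do series.bucket[le] = 0 end
    for _, v in ipairs(values) do
        series.sum = series.sum + v
        series.count = series.count + 1
        -- every bucket whose upper bound is >= v is incremented
        for _, le in ipairs(bounds) do
            if v <= le then series.bucket[le] = series.bucket[le] + 1 end
        end
    end
    return series
end

local s = histogram_series({ 0.1, 0.5, 1.0 }, { 0.05, 0.2, 0.3, 2.0 })
-- s.count == 4, s.bucket[0.1] == 1, s.bucket[0.5] == 3,
-- s.bucket[1.0] == 3, s.bucket[INF] == 4
```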
metrics.invoke_callbacks()
Invoke all registered callbacks. Must be called before each collect(). You can also use collect{invoke_callbacks = true} instead. If you’re using one of the default exporters, invoke_callbacks() will be called by the exporter.
See also: Creating custom plugins
metrics.register_callback(callback)
Register a callback function that is called right before metric collection on plugin export.
Parameters:
- callback (function) – a function that takes no parameters.
This method is most often used for gauge metrics updates.
Example:
local metrics = require('metrics')
local bands_waste_size = metrics.gauge('bands_waste_size',
    'The size of memory wasted due to internal fragmentation')
-- Update the gauge right before each metric collection
metrics.register_callback(function()
    bands_waste_size:set(box.space.bands:stat()['tuple']['memtx']['waste_size'])
end)
See also: Custom metrics
metrics.set_global_labels(label_pairs)
Same as metrics.cfg{ labels = label_pairs }. Learn more in metrics.cfg().
metrics.summary(name[, help, objectives, params, metainfo])
Register a new summary. Quantile computation is based on the “Effective computation of biased quantiles over data streams” algorithm.
Parameters:
- name (string) – collector name. Must be unique.
- help (string) – collector description.
- objectives (table) – a list of “targeted” φ-quantiles in the {quantile = error, ... } form. Example: {[0.5]=0.01, [0.9]=0.01, [0.99]=0.01}. The targeted φ-quantile is specified in the form of a φ-quantile and the tolerated error. For example, {[0.5] = 0.1} means that the median (= 50th percentile) is to be returned with a tolerated error of 0.1, that is, a value between the 40th and 60th percentiles. Note that percentiles and quantiles are the same concept, except that percentiles are expressed as percentages. The φ-quantile must be in the interval [0, 1]. A lower tolerated error for a φ-quantile results in higher memory and CPU usage during summary calculation.
- params (table) – table of the summary parameters used to configure the sliding time window. This window consists of several buckets to store observations. New observations are added to each bucket. After a time period, the head bucket (from which observations are collected) is reset, and the next bucket becomes the new head. This way, each bucket stores observations for max_age_time * age_buckets_count seconds before it is reset.
  - max_age_time sets the duration of each bucket’s lifetime, that is, how many seconds the observations are kept before they are discarded.
  - age_buckets_count sets the number of buckets in the sliding time window. This variable determines the number of buckets used to exclude observations older than max_age_time from the summary. The value is a trade-off between resources (memory and CPU for maintaining the buckets) and how smoothly the time window moves.
  Default value: {max_age_time = math.huge, age_buckets_count = 1}.
- metainfo (table) – collector metainfo.
Return: A summary object (see summary_obj).
Rtype: summary_obj
See also: Creating custom metrics
Note
A summary represents a set of collectors:
- name .. "_sum" – a counter holding the sum of added observations.
- name .. "_count" – a counter holding the number of added observations.
- name holds the values of all quantiles under observation, each under the label quantile. To access quantile x (where x is a number), specify the value x for the label quantile.
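The sliding time window described for the params argument can be modeled in plain Lua. This sketch makes several simplifications. Each bucket only counts observations instead of maintaining a quantile stream. The current time is passed in explicitly instead of being read from a clock. The Window type is hypothetical and illustrates the rotation rule, not the module's implementation.

```lua
-- Simplified model of the summary's sliding time window
local Window = {}
Window.__index = Window

function Window.new(max_age_time, age_buckets_count, now)
    local self = setmetatable({}, Window)
    self.max_age_time = max_age_time
    self.buckets = {}
    for i = 1, age_buckets_count do self.buckets[i] = 0 end
    self.head = 1
    self.last_rotate = now
    return self
end

-- For every full max_age_time period that has passed, reset the head
-- bucket and make the next bucket the new head.
function Window:rotate(now)
    while now - self.last_rotate >= self.max_age_time do
        self.buckets[self.head] = 0
        self.head = self.head % #self.buckets + 1
        self.last_rotate = self.last_rotate + self.max_age_time
    end
end

-- New observations are added to each bucket
function Window:observe(now)
    self:rotate(now)
    for i = 1, #self.buckets do
        self.buckets[i] = self.buckets[i] + 1
    end
end

-- Observations are reported from the head bucket
function Window:count(now)
    self:rotate(now)
    return self.buckets[self.head]
end

-- max_age_time = 10, age_buckets_count = 3
local w = Window.new(10, 3, 0)
w:observe(0)
w:observe(5)
local at_20 = w:count(20) -- 2: both observations are still in the head bucket
local at_30 = w:count(30) -- 0: the new head bucket was last reset at t = 10
```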
metrics.unregister_callback(callback)
Unregister a callback function that is called right before metric collection on plugin export.
Parameters:
- callback (function) – a function that takes no parameters.
Example:
local metrics = require('metrics')
local cpu_callback = function()
    local cpu_metrics = require('metrics.psutils.cpu')
    cpu_metrics.update()
end
metrics.register_callback(cpu_callback)
-- after a while, we don't need that callback function anymore
metrics.unregister_callback(cpu_callback)
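The unregister example keeps the callback in the cpu_callback variable because Lua compares functions by reference. Unregistering therefore needs the same function value that was registered. The following plain-Lua model of a callback registry illustrates this. The registry table and its functions are hypothetical stand-ins, not the metrics module's code.

```lua
-- Hypothetical callback registry: callbacks are stored as table keys,
-- so a callback is identified by its function reference.
local registry = { callbacks = {} }

function registry.register_callback(callback)
    registry.callbacks[callback] = true
end

function registry.unregister_callback(callback)
    registry.callbacks[callback] = nil
end

function registry.invoke_callbacks()
    for callback in pairs(registry.callbacks) do
        callback()
    end
end

local updates = 0
local cpu_callback = function() updates = updates + 1 end

registry.register_callback(cpu_callback)
registry.invoke_callbacks()          -- updates == 1

-- An identical-looking but different function does not match
registry.unregister_callback(function() updates = updates + 1 end)
registry.invoke_callbacks()          -- updates == 2

registry.unregister_callback(cpu_callback)
registry.invoke_callbacks()          -- still 2
```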
metrics.http_middleware.build_default_collector(type_name, name[, help])
Register and return a collector for the middleware.
Parameters:
Return: A collector object
Possible errors:
- A collector with the same type and name already exists in the registry.
metrics.http_middleware.configure_default_collector(type_name, name, help)
Register a collector for the middleware and set it as default.
Parameters:
Possible errors:
- A collector with the same type and name already exists in the registry.
metrics.http_middleware.get_default_collector()
Return the default collector. If the default collector hasn’t been set yet, register it (with default http_middleware.build_default_collector() parameters) and set it as default.
Return: A collector object
metrics.http_middleware.set_default_collector(collector)
Set the default collector.
Parameters:
- collector – middleware collector object
metrics.http_middleware.v1(handler, collector)
Latency-measuring wrapper for the HTTP ver. 1.x.x handler. Returns a wrapped handler. Learn more in Collecting HTTP metrics.
Parameters:
- handler (function) – handler function.
- collector – middleware collector object. If not set, the default collector is used (like in http_middleware.get_default_collector()).
Usage:
httpd:route(route, http_middleware.v1(request_handler, collector))
See also: Collecting HTTP metrics