Tuple compression
Enterprise Edition
Tuple compression is available in the Enterprise Edition only.
Tuple compression, introduced in Tarantool Enterprise Edition 2.10.0, aims to save memory space. Typically, it decreases the volume of stored data by 15%. However, the exact volume saved depends on the type of data.
The following compression algorithms are supported: zstd, lz4, and zlib.
To learn about the performance costs of each algorithm, check Tuple compression performance.
Tarantool doesn’t compress tuples themselves, just the fields inside these tuples. You can only compress non-indexed fields. Compression works best when JSON is stored in the field.
Note
The compress module provides the API for compressing and decompressing data.
First, create a space:
box.schema.space.create('bands')
Then, create an index for this space, for example:
box.space.bands:create_index('primary', {parts = {{1, 'unsigned'}}})
Create a format to declare field names and types.
In the example below, the band_name and year fields have the zstd and lz4 compression formats, respectively.
The first field (id) is indexed, so it cannot be compressed.
box.space.bands:format({
{name = 'id', type = 'unsigned'},
{name = 'band_name', type = 'string', compression = 'zstd'},
{name = 'year', type = 'unsigned', compression = 'lz4'}
})
Now, the new tuples that you add to the bands space will be compressed.
When you read a compressed tuple, you do not need to decompress it yourself.
To check which fields in a space are compressed, run space_object:format() on the space. If a field is compressed, the format includes the compression algorithm, for example:
tarantool> box.space.bands:format()
---
- [{'name': 'id', 'type': 'unsigned'},
{'type': 'string', 'compression': 'zstd', 'name': 'band_name'},
{'type': 'unsigned', 'compression': 'lz4', 'name': 'year'}]
...
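The same check can be scripted. The sketch below is a minimal illustration, assuming the format is a list of field tables like the one returned above; compressed_fields is a hypothetical helper name, not part of the Tarantool API.

```lua
-- Hypothetical helper: returns a map of field name -> compression algorithm
-- for every field in `format` that declares one.
local function compressed_fields(format)
    local result = {}
    for _, field in ipairs(format) do
        if field.compression ~= nil then
            result[field.name] = field.compression
        end
    end
    return result
end

-- In a Tarantool console, pass it the real format:
-- compressed_fields(box.space.bands:format())
```

For the bands space above, the result would contain band_name and year, but not id.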
You can enable compression for existing fields. All tuples added after that will have these fields compressed. However, this doesn’t affect the tuples already stored in the space. To compress the existing tuples, you need to create a snapshot and restart Tarantool.
Here’s an example of how to compress existing fields:
Create a space without compression and add several tuples:
box.schema.space.create('bands')
box.space.bands:format({
    { name = 'id', type = 'unsigned' },
    { name = 'band_name', type = 'string' },
    { name = 'year', type = 'unsigned' }
})
box.space.bands:create_index('primary', { parts = { 'id' } })
box.space.bands:insert { 1, 'Roxette', 1986 }
box.space.bands:insert { 2, 'Scorpions', 1965 }
box.space.bands:insert { 3, 'Ace of Base', 1987 }
box.space.bands:insert { 4, 'The Beatles', 1960 }
Suppose that you want fields 2 and 3 to be compressed from now on. To enable compression, change the format as follows:
local new_format = box.space.bands:format()
new_format[2].compression = 'zstd'
new_format[3].compression = 'lz4'
box.space.bands:format(new_format)
From now on, all the tuples that you add to the space have fields 2 and 3 compressed.
To finalize the change, create a snapshot by running box.snapshot() and restart Tarantool. As a result, all old tuples will also be compressed in memory during recovery.
Note
space:upgrade() provides the ability to enable compression and update the existing tuples in the background.
To achieve this, you need to pass a new space format in the format argument of space:upgrade().
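The approach from the note can be sketched as follows. This is an illustration only: upgrade_with_compression is a hypothetical helper, and the call uses nothing but the format argument named above. Any other options that space:upgrade() accepts are not shown and should be checked in its reference.

```lua
-- Hypothetical helper: `algorithms` maps a field number to an algorithm name,
-- for example {[2] = 'zstd', [3] = 'lz4'}.
local function upgrade_with_compression(space, algorithms)
    -- Start from the current format and mark the chosen fields as compressed.
    local new_format = space:format()
    for field_no, algorithm in pairs(algorithms) do
        new_format[field_no].compression = algorithm
    end
    -- Assumption: only the `format` argument is passed; other options are omitted.
    return space:upgrade({ format = new_format })
end

-- In a Tarantool Enterprise Edition console:
-- upgrade_with_compression(box.space.bands, {[2] = 'zstd', [3] = 'lz4'})
```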
Below are the results of a synthetic test that illustrate how tuple compression affects performance.
The test was carried out on a simple Tarantool space containing 100,000 tuples, each having a field with a sample JSON document of roughly 600 bytes.
The test compared the speed of select and replace operations on uncompressed and compressed data, as well as the overall data size of the space.
Performance is measured in requests per second (RPS).
Compression type | select, RPS | replace, RPS | Space size, bytes |
---|---|---|---|
None | 4,486k | 1,109k | 41,168,548 |
zstd | 308k | 26k | 21,368,548 |
lz4 | 1,765k | 672k | 25,268,548 |
zlib | 325k | 107k | 20,768,548 |
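To make the trade-off easier to read, the figures in the table can be converted into relative values. The sketch below copies the numbers from the table, with RPS kept in thousands as shown there. The compare helper and its field names are introduced here for illustration only.

```lua
-- Figures copied from the table above; RPS values are in thousands.
local baseline = { select_krps = 4486, replace_krps = 1109, size = 41168548 }
local compressed = {
    { name = 'zstd', select_krps = 308,  replace_krps = 26,  size = 21368548 },
    { name = 'lz4',  select_krps = 1765, replace_krps = 672, size = 25268548 },
    { name = 'zlib', select_krps = 325,  replace_krps = 107, size = 20768548 },
}

-- Hypothetical helper: derives relative figures for one row of the table.
local function compare(base, row)
    return {
        saved_percent = (1 - row.size / base.size) * 100,
        select_slowdown = base.select_krps / row.select_krps,
        replace_slowdown = base.replace_krps / row.replace_krps,
    }
end

for _, row in ipairs(compressed) do
    local c = compare(baseline, row)
    print(string.format('%-4s saved %.1f%%, select %.1fx slower, replace %.1fx slower',
        row.name, c.saved_percent, c.select_slowdown, c.replace_slowdown))
end
```

In this particular test, zstd and zlib roughly halve the space size (about 48% and 50% saved), while lz4 saves about 39% at a much smaller cost in throughput.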