Understanding the binary protocol
To communicate with each other, Tarantool instances use a binary protocol called iproto.
In this set of examples, the user will be looking at binary code transferred via iproto.
The code is intercepted with tcpdump
, a monitoring utility.
To follow the examples in this section, get a single Linux computer and start three command-line shells (“terminals”).
– On terminal #1, Start monitoring port 3302 with tcpdump:
sudo tcpdump -i lo 'port 3302' -X
On terminal #2, start a server with:
box.cfg{listen=3302}
box.schema.space.create('tspace')
box.space.tspace:create_index('I')
box.space.tspace:insert{280}
box.schema.user.grant('guest','read,write,execute,create,drop','universe')
On terminal #3, start another server, which will act as a client, with:
box.cfg{}
net_box = require('net.box')
conn = net_box.connect('localhost:3302')
On terminal #3, run the following:
conn.space.tspace:select(280)
Now look at what tcpdump shows for the job connecting to 3302 – the “request”. After the words “length 32” is a packet that ends with these 32 bytes (we have added indented comments):
ce 00 00 00 1b MP_UINT = decimal 27 = number of bytes after this
82 MP_MAP, size 2 (we'll call this "Main-Map")
01 IPROTO_SYNC (Main-Map Item#1)
04 MP_INT = 4 = number that gets incremented with each request
00 IPROTO_REQUEST_TYPE (Main-Map Item#2)
01 IPROTO_SELECT
86 MP_MAP, size 6 (we'll call this "Select-Map")
10 IPROTO_SPACE_ID (Select-Map Item#1)
cd 02 00 MP_UINT = decimal 512 = id of tspace (could be larger)
11 IPROTO_INDEX_ID (Select-Map Item#2)
00 MP_INT = 0 = id of index within tspace
14 IPROTO_ITERATOR (Select-Map Item#3)
00 MP_INT = 0 = Tarantool iterator_type.h constant ITER_EQ
13 IPROTO_OFFSET (Select-Map Item#4)
00 MP_INT = 0 = amount to offset
12 IPROTO_LIMIT (Select-Map Item#5)
ce ff ff ff ff MP_UINT = 4294967295 = biggest possible limit
20 IPROTO_KEY (Select-Map Item#6)
91 MP_ARRAY, size 1 (we'll call this "Key-Array")
cd 01 18 MP_UINT = 280 (Select-Map Item#6, Key-Array Item#1)
-- 280 is the key value that we are searching for
Now read the source code file
net_box.c
and skip to the line netbox_encode_select(lua_State *L)
.
From the comments and from simple function calls like
mpstream_encode_uint(&stream, IPROTO_SPACE_ID);
you will be able to see how net_box put together the packet contents that you
have just observed with tcpdump.
There are libraries for reading and writing MessagePack objects. C programmers sometimes include msgpuck.h.
Now you know how Tarantool itself makes requests with the binary protocol.
When in doubt about a detail, consult net_box.c
– it has routines for each
request. Some connectors have similar code.
For an IPROTO_UPDATE example, suppose a user changes field #2 in tuple #2
in space #256 to 'BBBB'
. The body will look like this:
(notice that in this case there is an extra map item
IPROTO_INDEX_BASE, to emphasize that field numbers
start with 1, which is optional and can be omitted):
04 IPROTO_UPDATE
85 IPROTO_MAP, size 5
10 IPROTO_SPACE_ID, Map Item#1
cd 02 00 MP_UINT 256
11 IPROTO_INDEX_ID, Map Item#2
00 MP_INT 0 = primary-key index number
15 IPROTO_INDEX_BASE, Map Item#3
01 MP_INT = 1 i.e. field numbers start at 1
21 IPROTO_TUPLE, Map Item#4
91 MP_ARRAY, size 1, for array of operations
93 MP_ARRAY, size 3
a1 3d MP_STR = OPERATOR = '='
02 MP_INT = FIELD_NO = 2
a5 42 42 42 42 42 MP_STR = VALUE = 'BBBB'
20 IPROTO_KEY, Map Item#5
91 MP_ARRAY, size 1, for array of key values
02 MP_UINT = primary-key value = 2
Byte codes for the IPROTO_EXECUTE example:
0b IPROTO_EXECUTE
83 MP_MAP, size 3
43 IPROTO_STMT_ID Map Item#1
ce d7 aa 74 1b MP_UINT value of n.stmt_id
41 IPROTO_SQL_BIND Map Item#2
92 MP_ARRAY, size 2
01 MP_INT = 1 = value for first parameter
a1 61 MP_STR = 'a' = value for second parameter
2b IPROTO_OPTIONS Map Item#3
90 MP_ARRAY, size 0 (there are no options)
Byte codes for the response to the box.space.space-name:insert{6} example:
ce 00 00 00 20 MP_UINT = HEADER AND BODY SIZE
83 MP_MAP, size 3
00 IPROTO_REQUEST_TYPE
ce 00 00 00 00 MP_UINT = IPROTO_OK
01 IPROTO_SYNC
cf 00 00 00 00 00 00 00 53 MP_UINT = sync value
05 IPROTO_SCHEMA_VERSION
ce 00 00 00 68 MP_UINT = schema version
81 MP_MAP, size 1
30 IPROTO_DATA
dd 00 00 00 01 MP_ARRAY, size 1 (row count)
91 MP_ARRAY, size 1 (field count)
06 MP_INT = 6 = the value that was inserted
Byte codes for the response to the conn:eval([[box.schema.space.create('_space');]])
example:
ce 00 00 00 3b MP_UINT = HEADER AND BODY SIZE
83 MP_MAP, size 3 (i.e. 3 items in header)
00 IPROTO_REQUEST_TYPE
ce 00 00 80 0a MP_UINT = hexadecimal 800a
01 IPROTO_SYNC
cf 00 00 00 00 00 00 00 26 MP_UINT = sync value
05 IPROTO_SCHEMA_VERSION
ce 00 00 00 78 MP_UINT = schema version value
81 MP_MAP, size 1
31 IPROTO_ERROR_24
db 00 00 00 1d 53 70 61 63 etc. MP_STR = "Space '_space' already exists"
Byte codes, if we use the same net.box connection that
we used in the beginning
and we say
conn:execute([[CREATE TABLE t1 (dd INT PRIMARY KEY AUTOINCREMENT, дд STRING COLLATE "unicode");]])
conn:execute([[INSERT INTO t1 VALUES (NULL, 'a'), (NULL, 'b');]])
and we watch what tcpdump displays, we will see two noticeable things:
(1) the CREATE statement caused a schema change so the response has
a new IPROTO_SCHEMA_VERSION value and the body includes
the new contents of some system tables (caused by requests from net.box which users will not see);
(2) the final bytes of the response to the INSERT will be:
81 MP_MAP, size 1
42 IPROTO_SQL_INFO
82 MP_MAP, size 2
00 Tarantool constant (not in iproto_constants.h) = SQL_INFO_ROW_COUNT
02 1 = row count
01 Tarantool constant (not in iproto_constants.h) = SQL_INFO_AUTOINCREMENT_ID
92 MP_ARRAY, size 2
01 first autoincrement number
02 second autoincrement number
Byte codes for the SQL SELECT example,
if we ask for full metadata by saying
conn.space._session_settings:update('sql_full_metadata', {{'=', 'value', true}})
and we select the two rows from the table that we just created
conn:execute([[SELECT dd, дд AS д FROM t1;]])
then tcpdump will show this response, after the header:
82 MP_MAP, size 2 (i.e. metadata and rows)
32 IPROTO_METADATA
92 MP_ARRAY, size 2 (i.e. 2 columns)
85 MP_MAP, size 5 (i.e. 5 items for column#1)
00 a2 44 44 IPROTO_FIELD_NAME and 'DD'
01 a7 69 6e 74 65 67 65 72 IPROTO_FIELD_TYPE and 'integer'
03 c2 IPROTO_FIELD_IS_NULLABLE and false
04 c3 IPROTO_FIELD_IS_AUTOINCREMENT and true
05 c0 PROTO_FIELD_SPAN and nil
85 MP_MAP, size 5 (i.e. 5 items for column#2)
00 a2 d0 94 IPROTO_FIELD_NAME and 'Д' upper case
01 a6 73 74 72 69 6e 67 IPROTO_FIELD_TYPE and 'string'
02 a7 75 6e 69 63 6f 64 65 IPROTO_FIELD_COLL and 'unicode'
03 c3 IPROTO_FIELD_IS_NULLABLE and true
05 a4 d0 b4 d0 b4 IPROTO_FIELD_SPAN and 'дд' lower case
30 IPROTO_DATA
92 MP_ARRAY, size 2
92 MP_ARRAY, size 2
01 MP_INT = 1 i.e. contents of row#1 column#1
a1 61 MP_STR = 'a' i.e. contents of row#1 column#2
92 MP_ARRAY, size 2
02 MP_INT = 2 i.e. contents of row#2 column#1
a1 62 MP_STR = 'b' i.e. contents of row#2 column#2
Byte code for the SQL PREPARE example. If we said
conn:prepare([[SELECT dd, дд AS д FROM t1;]])
then tcpdump would show almost the same response, but there would
be no IPROTO_DATA. Instead, additional items will appear:
34 IPROTO_BIND_COUNT
00 MP_UINT = 0
33 IPROTO_BIND_METADATA
90 MP_ARRAY, size 0
MP_UINT = 0
and MP_ARRAY
has size 0 because there are no parameters to bind.
Full output:
84 MP_MAP, size 4
43 IPROTO_STMT_ID
ce c2 3c 2c 1e MP_UINT = statement id
34 IPROTO_BIND_COUNT
00 MP_INT = 0 = number of parameters to bind
33 IPROTO_BIND_METADATA
90 MP_ARRAY, size 0 = there are no parameters to bind
32 IPROTO_METADATA
92 MP_ARRAY, size 2 (i.e. 2 columns)
85 MP_MAP, size 5 (i.e. 5 items for column#1)
00 a2 44 44 IPROTO_FIELD_NAME and 'DD'
01 a7 69 6e 74 65 67 65 72 IPROTO_FIELD_TYPE and 'integer'
03 c2 IPROTO_FIELD_IS_NULLABLE and false
04 c3 IPROTO_FIELD_IS_AUTOINCREMENT and true
05 c0 PROTO_FIELD_SPAN and nil
85 MP_MAP, size 5 (i.e. 5 items for column#2)
00 a2 d0 94 IPROTO_FIELD_NAME and 'Д' upper case
01 a6 73 74 72 69 6e 67 IPROTO_FIELD_TYPE and 'string'
02 a7 75 6e 69 63 6f 64 65 IPROTO_FIELD_COLL and 'unicode'
03 c3 IPROTO_FIELD_IS_NULLABLE and true
05 a4 d0 b4 d0 b4 IPROTO_FIELD_SPAN and 'дд' lower case
Byte code for the heartbeat example. The master might send this body:
83 MP_MAP, size 3
00 Main-Map Item #1 IPROTO_REQUEST_TYPE
00 MP_UINT = 0
02 Main-Map Item #2 IPROTO_REPLICA_ID
02 MP_UINT = 2 = id
04 Main-Map Item #3 IPROTO_TIMESTAMP
cb MP_DOUBLE (MessagePack "Float 64")
41 d7 ba 06 7b 3a 03 21 8-byte timestamp
81 MP_MAP (body), size 1
5a Body Map Item #1 IPROTO_VCLOCK_SYNC
14 MP_UINT = 20 (vclock sync value)
Byte code for the heartbeat example. The replica might send back this body:
81 MP_MAP, size 1
00 Main-Map Item #1 IPROTO_REQUEST_TYPE
00 MP_UINT = 0 = IPROTO_OK
83 MP_MAP (body), size 3
26 Body Map Item #1 IPROTO_VCLOCK
81 MP_MAP, size 1 (vclock of 1 component)
01 MP_UINT = 1 = id (part 1 of vclock)
06 MP_UINT = 6 = lsn (part 2 of vclock)
5a Body Map Item #2 IPROTO_VCLOCK_SYNC
14 MP_UINT = 20 (vclock sync value)
53 Body Map Item #3 IPROTO_TERM
31 MP_UINT = 49 (term value)