Understanding the binary protocol | Tarantool
Reference Internals Binary protocol Understanding the binary protocol

Understanding the binary protocol

To communicate with each other, Tarantool instances use a binary protocol called iproto.

In this set of examples, the user will be looking at binary code transferred via iproto. The code is intercepted with tcpdump, a monitoring utility.

To follow the examples in this section, get a single Linux computer and start three command-line shells (“terminals”).

– On terminal #1, Start monitoring port 3302 with tcpdump:

sudo tcpdump -i lo 'port 3302' -X

On terminal #2, start a server with:

box.cfg{listen=3302}
box.schema.space.create('tspace')
box.space.tspace:create_index('I')
box.space.tspace:insert{280}
box.schema.user.grant('guest','read,write,execute,create,drop','universe')

On terminal #3, start another server, which will act as a client, with:

box.cfg{}
net_box = require('net.box')
conn = net_box.connect('localhost:3302')

On terminal #3, run the following:

conn.space.tspace:select(280)

Now look at what tcpdump shows for the job connecting to 3302 – the “request”. After the words “length 32” is a packet that ends with these 32 bytes (we have added indented comments):

ce 00 00 00 1b   MP_UINT = decimal 27 = number of bytes after this
82               MP_MAP, size 2 (we'll call this "Main-Map")
01                 IPROTO_SYNC (Main-Map Item#1)
04                 MP_INT = 4 = number that gets incremented with each request
00                 IPROTO_REQUEST_TYPE (Main-Map Item#2)
01                 IPROTO_SELECT
86                 MP_MAP, size 6 (we'll call this "Select-Map")
10                   IPROTO_SPACE_ID (Select-Map Item#1)
cd 02 00             MP_UINT = decimal 512 = id of tspace (could be larger)
11                   IPROTO_INDEX_ID (Select-Map Item#2)
00                   MP_INT = 0 = id of index within tspace
14                   IPROTO_ITERATOR (Select-Map Item#3)
00                   MP_INT = 0 = Tarantool iterator_type.h constant ITER_EQ
13                   IPROTO_OFFSET (Select-Map Item#4)
00                   MP_INT = 0 = amount to offset
12                   IPROTO_LIMIT (Select-Map Item#5)
ce ff ff ff ff       MP_UINT = 4294967295 = biggest possible limit
20                   IPROTO_KEY (Select-Map Item#6)
91                   MP_ARRAY, size 1 (we'll call this "Key-Array")
cd 01 18               MP_UINT = 280 (Select-Map Item#6, Key-Array Item#1)
                       -- 280 is the key value that we are searching for

Now read the source code file net_box.c and skip to the line netbox_encode_select(lua_State *L). From the comments and from simple function calls like mpstream_encode_uint(&stream, IPROTO_SPACE_ID); you will be able to see how net_box put together the packet contents that you have just observed with tcpdump.

There are libraries for reading and writing MessagePack objects. C programmers sometimes include msgpuck.h.

Now you know how Tarantool itself makes requests with the binary protocol. When in doubt about a detail, consult net_box.c – it has routines for each request. Some connectors have similar code.

For an IPROTO_UPDATE example, suppose a user changes field #2 in tuple #2 in space #256 to 'BBBB'. The body will look like this: (notice that in this case there is an extra map item IPROTO_INDEX_BASE, to emphasize that field numbers start with 1, which is optional and can be omitted):

04               IPROTO_UPDATE
85               IPROTO_MAP, size 5
10                 IPROTO_SPACE_ID, Map Item#1
cd 02 00           MP_UINT 256
11                 IPROTO_INDEX_ID, Map Item#2
00                 MP_INT 0 = primary-key index number
15                 IPROTO_INDEX_BASE, Map Item#3
01                 MP_INT = 1 i.e. field numbers start at 1
21                 IPROTO_TUPLE, Map Item#4
91                 MP_ARRAY, size 1, for array of operations
93                   MP_ARRAY, size 3
a1 3d                   MP_STR = OPERATOR = '='
02                      MP_INT = FIELD_NO = 2
a5 42 42 42 42 42       MP_STR = VALUE = 'BBBB'
20                 IPROTO_KEY, Map Item#5
91                 MP_ARRAY, size 1, for array of key values
02                   MP_UINT = primary-key value = 2

Byte codes for the IPROTO_EXECUTE example:

0b               IPROTO_EXECUTE
83               MP_MAP, size 3
43                 IPROTO_STMT_ID Map Item#1
ce d7 aa 74 1b     MP_UINT value of n.stmt_id
41                 IPROTO_SQL_BIND Map Item#2
92                 MP_ARRAY, size 2
01                   MP_INT = 1 = value for first parameter
a1 61                MP_STR = 'a' = value for second parameter
2b                 IPROTO_OPTIONS Map Item#3
90                 MP_ARRAY, size 0 (there are no options)

Byte codes for the response to the box.space.space-name:insert{6} example:

ce 00 00 00 20                MP_UINT = HEADER AND BODY SIZE
83                            MP_MAP, size 3
00                              IPROTO_REQUEST_TYPE
ce 00 00 00 00                  MP_UINT = IPROTO_OK
01                              IPROTO_SYNC
cf 00 00 00 00 00 00 00 53      MP_UINT = sync value
05                              IPROTO_SCHEMA_VERSION
ce 00 00 00 68                  MP_UINT = schema version
81                            MP_MAP, size 1
30                              IPROTO_DATA
dd 00 00 00 01                  MP_ARRAY, size 1 (row count)
91                              MP_ARRAY, size 1 (field count)
06                              MP_INT = 6 = the value that was inserted

Byte codes for the response to the conn:eval([[box.schema.space.create('_space');]]) example:

ce 00 00 00 3b                  MP_UINT = HEADER AND BODY SIZE
83                              MP_MAP, size 3 (i.e. 3 items in header)
   00                              IPROTO_REQUEST_TYPE
   ce 00 00 80 0a                  MP_UINT = hexadecimal 800a
   01                              IPROTO_SYNC
   cf 00 00 00 00 00 00 00 26      MP_UINT = sync value
   05                              IPROTO_SCHEMA_VERSION
   ce 00 00 00 78                  MP_UINT = schema version value
   81                              MP_MAP, size 1
     31                              IPROTO_ERROR_24
     db 00 00 00 1d 53 70 61 63 etc. MP_STR = "Space '_space' already exists"

Byte codes, if we use the same net.box connection that we used in the beginning and we say
conn:execute([[CREATE TABLE t1 (dd INT PRIMARY KEY AUTOINCREMENT, дд STRING COLLATE "unicode");]])
conn:execute([[INSERT INTO t1 VALUES (NULL, 'a'), (NULL, 'b');]])
and we watch what tcpdump displays, we will see two noticeable things: (1) the CREATE statement caused a schema change so the response has a new IPROTO_SCHEMA_VERSION value and the body includes the new contents of some system tables (caused by requests from net.box which users will not see); (2) the final bytes of the response to the INSERT will be:

81   MP_MAP, size 1
42     IPROTO_SQL_INFO
82     MP_MAP, size 2
00       Tarantool constant (not in iproto_constants.h) = SQL_INFO_ROW_COUNT
02       1 = row count
01       Tarantool constant (not in iproto_constants.h) = SQL_INFO_AUTOINCREMENT_ID
92       MP_ARRAY, size 2
01         first autoincrement number
02         second autoincrement number

Byte codes for the SQL SELECT example, if we ask for full metadata by saying
conn.space._session_settings:update('sql_full_metadata', {{'=', 'value', true}})
and we select the two rows from the table that we just created
conn:execute([[SELECT dd, дд AS д FROM t1;]])
then tcpdump will show this response, after the header:

82                       MP_MAP, size 2 (i.e. metadata and rows)
32                         IPROTO_METADATA
92                         MP_ARRAY, size 2 (i.e. 2 columns)
85                           MP_MAP, size 5 (i.e. 5 items for column#1)
00 a2 44 44                    IPROTO_FIELD_NAME and 'DD'
01 a7 69 6e 74 65 67 65 72     IPROTO_FIELD_TYPE and 'integer'
03 c2                          IPROTO_FIELD_IS_NULLABLE and false
04 c3                          IPROTO_FIELD_IS_AUTOINCREMENT and true
05 c0                          PROTO_FIELD_SPAN and nil
85                           MP_MAP, size 5 (i.e. 5 items for column#2)
00 a2 d0 94                    IPROTO_FIELD_NAME and 'Д' upper case
01 a6 73 74 72 69 6e 67        IPROTO_FIELD_TYPE and 'string'
02 a7 75 6e 69 63 6f 64 65     IPROTO_FIELD_COLL and 'unicode'
03 c3                          IPROTO_FIELD_IS_NULLABLE and true
05 a4 d0 b4 d0 b4              IPROTO_FIELD_SPAN and 'дд' lower case
30                         IPROTO_DATA
92                         MP_ARRAY, size 2
92                           MP_ARRAY, size 2
01                             MP_INT = 1 i.e. contents of row#1 column#1
a1 61                          MP_STR = 'a' i.e. contents of row#1 column#2
92                           MP_ARRAY, size 2
02                             MP_INT = 2 i.e. contents of row#2 column#1
a1 62                          MP_STR = 'b' i.e. contents of row#2 column#2

Byte code for the SQL PREPARE example. If we said
conn:prepare([[SELECT dd, дд AS д FROM t1;]])
then tcpdump would show almost the same response, but there would be no IPROTO_DATA. Instead, additional items will appear:

34                       IPROTO_BIND_COUNT
00                       MP_UINT = 0

33                       IPROTO_BIND_METADATA
90                       MP_ARRAY, size 0

MP_UINT = 0 and MP_ARRAY has size 0 because there are no parameters to bind. Full output:

84                       MP_MAP, size 4
43                         IPROTO_STMT_ID
ce c2 3c 2c 1e             MP_UINT = statement id
34                         IPROTO_BIND_COUNT
00                         MP_INT = 0 = number of parameters to bind
33                         IPROTO_BIND_METADATA
90                         MP_ARRAY, size 0 = there are no parameters to bind
32                         IPROTO_METADATA
92                         MP_ARRAY, size 2 (i.e. 2 columns)
85                           MP_MAP, size 5 (i.e. 5 items for column#1)
00 a2 44 44                    IPROTO_FIELD_NAME and 'DD'
01 a7 69 6e 74 65 67 65 72     IPROTO_FIELD_TYPE and 'integer'
03 c2                          IPROTO_FIELD_IS_NULLABLE and false
04 c3                          IPROTO_FIELD_IS_AUTOINCREMENT and true
05 c0                          PROTO_FIELD_SPAN and nil
85                           MP_MAP, size 5 (i.e. 5 items for column#2)
00 a2 d0 94                    IPROTO_FIELD_NAME and 'Д' upper case
01 a6 73 74 72 69 6e 67        IPROTO_FIELD_TYPE and 'string'
02 a7 75 6e 69 63 6f 64 65     IPROTO_FIELD_COLL and 'unicode'
03 c3                          IPROTO_FIELD_IS_NULLABLE and true
05 a4 d0 b4 d0 b4              IPROTO_FIELD_SPAN and 'дд' lower case

Byte code for the heartbeat example. The master might send this body:

83                      MP_MAP, size 3
00                        Main-Map Item #1 IPROTO_REQUEST_TYPE
00                          MP_UINT = 0
02                        Main-Map Item #2 IPROTO_REPLICA_ID
02                          MP_UINT = 2 = id
04                        Main-Map Item #3 IPROTO_TIMESTAMP
cb                          MP_DOUBLE (MessagePack "Float 64")
41 d7 ba 06 7b 3a 03 21     8-byte timestamp
81                      MP_MAP (body), size 1
5a                      Body Map Item #1 IPROTO_VCLOCK_SYNC
14                        MP_UINT = 20 (vclock sync value)

Byte code for the heartbeat example. The replica might send back this body:

81                       MP_MAP, size 1
00                         Main-Map Item #1 IPROTO_REQUEST_TYPE
00                         MP_UINT = 0 = IPROTO_OK
83                         MP_MAP (body), size 3
26                           Body Map Item #1 IPROTO_VCLOCK
81                             MP_MAP, size 1 (vclock of 1 component)
01                               MP_UINT = 1 = id (part 1 of vclock)
06                               MP_UINT = 6 = lsn (part 2 of vclock)
5a                           Body Map Item #2 IPROTO_VCLOCK_SYNC
14                             MP_UINT = 20 (vclock sync value)
53                           Body Map Item #3 IPROTO_TERM
31                             MP_UINT = 49 (term value)
Found what you were looking for?
Feedback