
User's Guide / Replication / Preventing duplicate actions

Preventing duplicate actions

Tarantool guarantees that every update is applied only once on every replica. However, due to the asynchronous nature of replication, the order of updates is not guaranteed. Below we analyze this problem in more detail, provide examples of replication going out of sync, and suggest solutions.

Replication stops

In a replica set of two masters, suppose master #1 tries to do something that master #2 has already done, for example, insert a tuple with the same unique key:

tarantool> box.space.tester:insert{1, 'data'}

This causes the error Duplicate key exists in unique index 'primary' in space 'tester', and replication stops. (This is the behavior when the replication_skip_conflict configuration parameter has its default recommended value, false.)
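If losing one of the conflicting rows is acceptable, such conflicts can instead be skipped automatically by setting the parameter mentioned above to true. A sketch (weigh the data-loss implications before enabling this in production):

tarantool> box.cfg{replication_skip_conflict = true}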

$ # error messages from master #1
          2017-06-26 21:17:03.233 [30444] main/104/applier/rep_user@100.96.166.1 I> can't read row
          2017-06-26 21:17:03.233 [30444] main/104/applier/rep_user@100.96.166.1 memtx_hash.cc:226 E> ER_TUPLE_FOUND:
          Duplicate key exists in unique index 'primary' in space 'tester'
          2017-06-26 21:17:03.233 [30444] relay/[::ffff:100.96.166.178]/101/main I> the replica has closed its socket, exiting
          2017-06-26 21:17:03.233 [30444] relay/[::ffff:100.96.166.178]/101/main C> exiting the relay loop

          $ # error messages from master #2
          2017-06-26 21:17:03.233 [30445] main/104/applier/rep_user@100.96.166.1 I> can't read row
          2017-06-26 21:17:03.233 [30445] main/104/applier/rep_user@100.96.166.1 memtx_hash.cc:226 E> ER_TUPLE_FOUND:
          Duplicate key exists in unique index 'primary' in space 'tester'
          2017-06-26 21:17:03.234 [30445] relay/[::ffff:100.96.166.178]/101/main I> the replica has closed its socket, exiting
          2017-06-26 21:17:03.234 [30445] relay/[::ffff:100.96.166.178]/101/main C> exiting the relay loop

If we check replication statuses with box.info, we will see that replication at master #1 is stopped (1.upstream.status = stopped). Additionally, no data is replicated from that master (section 1.downstream is missing in the report), because the downstream has encountered the same error:

# replication statuses (report from master #3)
          tarantool> box.info
          ---
          - version: 1.7.4-52-g980d30092
            id: 3
            ro: false
            vclock: {1: 9, 2: 1000000, 3: 3}
            uptime: 557
            lsn: 3
            vinyl: []
            cluster:
              uuid: 34d13b1a-f851-45bb-8f57-57489d3b3c8b
            pid: 30445
            status: running
            signature: 1000012
            replication:
              1:
                id: 1
                uuid: 7ab6dee7-dc0f-4477-af2b-0e63452573cf
                lsn: 9
                upstream:
                  peer: replicator@192.168.0.101:3301
                  lag: 0.00050592422485352
                  status: stopped
                  idle: 445.8626639843
                  message: Duplicate key exists in unique index 'primary' in space 'tester'
              2:
                id: 2
                uuid: 9afbe2d9-db84-4d05-9a7b-e0cbbf861e28
                lsn: 1000000
                upstream:
                  status: follow
                  idle: 201.99915885925
                  peer: replicator@192.168.0.102:3301
                  lag: 0.0015020370483398
                downstream:
                  vclock: {1: 8, 2: 1000000, 3: 3}
              3:
                id: 3
                uuid: e826a667-eed7-48d5-a290-64299b159571
                lsn: 3
            uuid: e826a667-eed7-48d5-a290-64299b159571
          ...

When replication is later manually resumed:

# resuming stopped replication (at all masters)
          tarantool> original_value = box.cfg.replication
          tarantool> box.cfg{replication={}}
          tarantool> box.cfg{replication=original_value}

… the faulty row in the write-ahead-log files is skipped.
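After resuming, it may be worth confirming that the upstream has recovered. A hedged sketch (the exact output will vary; field names are the same as in the box.info report above):

# check that the previously stopped upstream is following again
tarantool> box.info.replication[1].upstream.status
---
- follow
...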

Replication out of sync

Suppose we run the following operation in a two-instance, master-master replica set:

tarantool> box.space.tester:upsert({1}, {{'=', 2, box.info.uuid}})

When this operation is applied on both instances in the replica set:

# on master #1
            tarantool> box.space.tester:upsert({1}, {{'=', 2, box.info.uuid}})
            # on master #2
            tarantool> box.space.tester:upsert({1}, {{'=', 2, box.info.uuid}})

… we can get the following results, depending on the execution order:

  • each master’s row contains the UUID from master #1,
  • each master’s row contains the UUID from master #2,
  • master #1 has the UUID of master #2, and vice versa.

Commutative changes

The cases described in the previous paragraphs are examples of non-commutative operations, i.e. operations whose result depends on the execution order. Conversely, for commutative operations, the execution order does not matter.

Consider, for example, the following command:

tarantool> box.space.tester:upsert({1, 0}, {{'+', 2, 1}})

This operation is commutative: we get the same result regardless of the order in which the update is applied on the other masters.
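The difference between the two upserts can also be sketched outside Tarantool. The following standalone Python model (names and values are illustrative, not Tarantool APIs) shows that the '=' update is order-dependent while the '+' update is not:

```python
def assign(row, value):
    """Model the {'=', 2, value} update: set field 2 to a fixed value."""
    new = row.copy()
    new[1] = value
    return new

def increment(row, delta):
    """Model the {'+', 2, delta} update: add delta to field 2."""
    new = row.copy()
    new[1] += delta
    return new

row = [1, 0]
uuid1, uuid2 = "uuid-of-master-1", "uuid-of-master-2"

# Non-commutative: the surviving value depends on which assignment ran last.
print(assign(assign(row, uuid1), uuid2))  # [1, 'uuid-of-master-2']
print(assign(assign(row, uuid2), uuid1))  # [1, 'uuid-of-master-1']

# Commutative: increments give the same result in either order.
print(increment(increment(row, 1), 1))    # [1, 2]
```

Whichever master applies the increment first, every replica converges to the same row, which is why such updates are safe under master-master replication.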