Manual remote bootstrap of failed peer
When a Raft peer fails, YugabyteDB executes an automatic remote bootstrap to create a new peer from the remaining ones.
If a majority of Raft peers fail for a given tablet, you need to manually execute a remote bootstrap. A list of affected tablets is available in the yb-master Admin UI at yb-master-ip:7000/tablet-replication.
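For example, if the master Admin UI is reachable on its default port 7000, you can pull that page from the command line with curl (a quick spot check only; the UI page itself is the authoritative view):

```sh
# Fetch the tablet-replication page from the master Admin UI (default port 7000)
# and inspect the tablets reported there.
curl -s http://yb-master-ip:7000/tablet-replication
```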
Assume you have a cluster where the following applies:
- Replication factor is 3.
- A tablet with UUID `TABLET1`.
- Three tablet peers, with one in good working order, referred to as `NODE_GOOD`, and two broken peers referred to as `NODE_BAD1` and `NODE_BAD2`.
- Some of the tablet-related data is to be copied from the good peer to each of the bad peers until a majority of them are restored.
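In the commands that follow, `NODE_GOOD`, `NODE_BAD1`, `NODE_BAD2`, and `TABLET1` are placeholders. Purely as an illustration, they might correspond to values like the following (the host addresses and the default 9100 tserver RPC port are assumptions; substitute your own tablet server addresses and tablet UUID):

```sh
# Illustrative placeholder values only; use your own tserver RPC addresses and tablet UUID.
NODE_GOOD=10.0.0.1:9100     # healthy tablet peer
NODE_BAD1=10.0.0.2:9100     # first broken peer
NODE_BAD2=10.0.0.3:9100     # second broken peer
TABLET1=c08596d5820a4683a96893e092088c39
```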
These are the steps to follow:
- Delete the tablet from the broken peers if necessary, by running:

    ```sh
    yb-ts-cli --server_address=NODE_BAD1 delete_tablet TABLET1
    yb-ts-cli --server_address=NODE_BAD2 delete_tablet TABLET1
    ```

- Trigger a remote bootstrap of `TABLET1` from `NODE_GOOD` to `NODE_BAD1`:

    ```sh
    yb-ts-cli --server_address=NODE_BAD1 remote_bootstrap NODE_GOOD TABLET1
    ```
After the remote bootstrap finishes, `NODE_BAD2` should be automatically removed from the quorum and `TABLET1` fixed, as it now has a majority of healthy peers.
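One way to spot-check the repaired peer (a sketch; the exact output columns vary by version) is to list the tablets hosted on `NODE_BAD1` and confirm `TABLET1` appears among them:

```sh
# List the tablets hosted on NODE_BAD1 and check that TABLET1 is present and running.
yb-ts-cli --server_address=NODE_BAD1 list_tablets | grep TABLET1
```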
If you can't perform the preceding steps, you can do the following to manually execute the equivalent of a remote bootstrap:
- On `NODE_GOOD`, create an archive of the WAL (Raft data), RocksDB (regular data), intents (transactions data), and snapshots directories for `TABLET1`. (A shell sketch of these steps follows this list.)
- Copy these archives over to `NODE_BAD1`, onto the same drive where `TABLET1` currently has its Raft and RocksDB data.
- Stop `NODE_BAD1`, as the file system data underneath will change.
- Remove the old WAL, RocksDB, intents, and snapshots data for `TABLET1` from `NODE_BAD1`.
- Unpack the data copied from `NODE_GOOD` into the corresponding (now empty) directories on `NODE_BAD1`.
- Restart `NODE_BAD1` so it can bootstrap `TABLET1` using this new data.
- Restart `NODE_GOOD` so it can properly observe the changed state and data on `NODE_BAD1`.
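The following is a minimal shell sketch of these manual steps, assuming the example layout shown later on this page (data under `/mnt/d0`, table UUID `2fa481734909462385e005ba23664537`, tablet UUID `c08596d5820a4683a96893e092088c39`), working `scp` between the nodes, and that you stop and start the yb-tserver process however your deployment manages it. `NODE_BAD1_HOST` is a hypothetical placeholder for that node's hostname. Treat it as an outline rather than a drop-in script.

```sh
# --- On NODE_GOOD: archive the tablet's WAL, RocksDB, intents, and snapshots directories ---
TABLE=2fa481734909462385e005ba23664537     # example table UUID
TABLET=c08596d5820a4683a96893e092088c39    # example tablet UUID
cd /mnt/d0/yb-data/tserver
tar czf /tmp/tablet-$TABLET.tgz \
    wals/table-$TABLE/tablet-$TABLET \
    data/rocksdb/table-$TABLE/tablet-$TABLET \
    data/rocksdb/table-$TABLE/tablet-$TABLET.intents \
    data/rocksdb/table-$TABLE/tablet-$TABLET.snapshots   # may be absent if there are no snapshots

# Copy the archive to NODE_BAD1 (onto the drive that holds the tablet's data there).
scp /tmp/tablet-$TABLET.tgz NODE_BAD1_HOST:/tmp/

# --- On NODE_BAD1: stop yb-tserver first, as the data underneath it is about to change ---
# (stop yb-tserver here, using your deployment's service manager)
cd /mnt/d0/yb-data/tserver
rm -rf wals/table-$TABLE/tablet-$TABLET \
       data/rocksdb/table-$TABLE/tablet-$TABLET \
       data/rocksdb/table-$TABLE/tablet-$TABLET.intents \
       data/rocksdb/table-$TABLE/tablet-$TABLET.snapshots
tar xzf /tmp/tablet-$TABLET.tgz     # unpacks into the same relative directories
# (restart yb-tserver on NODE_BAD1, then restart it on NODE_GOOD)
```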
At this point, `NODE_BAD2` should be automatically removed from the quorum and `TABLET1` fixed, as it now has a majority of healthy peers.
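As with the yb-ts-cli flow above, you can re-check the master Admin UI's tablet-replication page to confirm the tablet is no longer reported there; for example (assuming the default port 7000):

```sh
# No output from grep means TABLET1 is no longer listed on the tablet-replication page.
curl -s http://yb-master-ip:7000/tablet-replication | grep TABLET1
```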
Note that typically, when you try to find tablet data, you would use a find command across the `--fs_data_dirs` paths.
In the following example, assume that flag is set to `/mnt/d0` and your tablet UUID is `c08596d5820a4683a96893e092088c39`:
```sh
find /mnt/d0/ -name '*c08596d5820a4683a96893e092088c39*'
```

```
/mnt/d0/yb-data/tserver/wals/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39
/mnt/d0/yb-data/tserver/tablet-meta/c08596d5820a4683a96893e092088c39
/mnt/d0/yb-data/tserver/consensus-meta/c08596d5820a4683a96893e092088c39
/mnt/d0/yb-data/tserver/data/rocksdb/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39
/mnt/d0/yb-data/tserver/data/rocksdb/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39.intents
/mnt/d0/yb-data/tserver/data/rocksdb/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39.snapshots
```
The data you would be interested in is the following:

- For the Raft WALs: `/mnt/d0/yb-data/tserver/wals/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39`
- For the regular RocksDB database: `/mnt/d0/yb-data/tserver/data/rocksdb/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39`
- For the intents files: `/mnt/d0/yb-data/tserver/data/rocksdb/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39.intents`
- For the snapshot files: `/mnt/d0/yb-data/tserver/data/rocksdb/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39.snapshots`
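Before archiving, it can be useful to check how much data these directories hold so you can size the copy; a small sketch using `du` over the paths above:

```sh
# Show the on-disk size of the tablet's WAL and RocksDB directories
# (the trailing wildcard also matches the .intents and .snapshots directories).
du -sh /mnt/d0/yb-data/tserver/wals/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39 \
       /mnt/d0/yb-data/tserver/data/rocksdb/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39*
```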