Syncing Files among 3 Nodes with Ansible
Imagine a scenario:
There are three nodes, and each of them runs a daemon. The daemon generates a token file named after its hostname under a specific location, say `/tmp/node-01`. To form a fully functional cluster, the daemon on each node needs to know the tokens of the other two nodes. The only way is to sync those token files across the three nodes so that every node has all three token files, including the one it generated itself.
The Environment
Let’s say the hostnames of the three nodes are `node-01`, `node-02`, and `node-03`. For demonstration, I use Harvester to create the VMs; you can use whatever you have at hand. If you already have a similar setup, just skip this section. Note that there’s one more VM called `tower`: it is the place where we run Ansible.
The token files generated by the daemon will be `/tmp/node-01`, `/tmp/node-02`, and `/tmp/node-03` respectively.
Prerequisites
First things first, install Ansible on the `tower` machine:
```shell
$ sudo apt update
$ sudo apt install -y ansible
$ ansible --version
ansible 2.9.6
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/ubuntu/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3/dist-packages/ansible
  executable location = /usr/bin/ansible
  python version = 3.8.10 (default, Mar 15 2022, 12:22:08) [GCC 9.4.0]
```
Create the inventory file `hosts.yml`:
```yaml
---
all:
  hosts:
    node-01:
      ansible_host: 10.52.0.124
      ansible_user: ubuntu
    node-02:
      ansible_host: 10.52.0.125
      ansible_user: ubuntu
    node-03:
      ansible_host: 10.52.0.126
      ansible_user: ubuntu
  vars:
    ansible_ssh_common_args: '-o StrictHostKeyChecking=no'
```
Now validate the inventory file we just created and also test the connectivity between the target nodes and the Ansible host.
```shell
$ ansible -i hosts.yml -m ping all
node-01 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
node-03 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
node-02 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
```
We’re good to go!
Setting Up Demo Scenario
Since there’s no such daemon (it exists only in our imagination), we have to generate the token files ourselves. Let’s do this with Ansible, too! Create the playbook `setup.yml`:
```yaml
---
- name: Set up demo environment
  hosts: all
  tasks:
    - name: Generate the token
      ansible.builtin.set_fact:
        token: "{{ lookup('ansible.builtin.password', '/dev/null') }}"
    - name: Print the token
      ansible.builtin.debug:
        msg: "{{ token }}"
    - name: Create the token file
      ansible.builtin.copy:
        dest: "/tmp/{{ inventory_hostname }}"
        content: "{{ token }}"
```
Then run the playbook:
```shell
$ ansible-playbook -i hosts.yml setup.yml

PLAY [Set up demo environment] *****************************************************************************************

TASK [Gathering Facts] *************************************************************************************************
ok: [node-03]
ok: [node-02]
ok: [node-01]

TASK [Generate the token] **********************************************************************************************
ok: [node-01]
ok: [node-02]
ok: [node-03]

TASK [Print the token] *************************************************************************************************
ok: [node-01] => {
    "changed": false,
    "msg": "Awp5JzuQeAyvsSDu6l2a"
}
ok: [node-02] => {
    "changed": false,
    "msg": ",ktiL0H0:YSYBNmiCREy"
}
ok: [node-03] => {
    "changed": false,
    "msg": "bLn7A87q4aqLLzSVlC02"
}

TASK [Create the token file] *******************************************************************************************
changed: [node-02]
changed: [node-01]
changed: [node-03]

PLAY RECAP *************************************************************************************************************
node-01                    : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
node-02                    : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
node-03                    : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
```
In case you want to verify that the tokens are actually deployed on the right node and at the right path:

```shell
$ ssh -o StrictHostKeyChecking=no 10.52.0.124 cat /tmp/node-01; echo
Awp5JzuQeAyvsSDu6l2a
$ ssh -o StrictHostKeyChecking=no 10.52.0.125 cat /tmp/node-02; echo
,ktiL0H0:YSYBNmiCREy
$ ssh -o StrictHostKeyChecking=no 10.52.0.126 cat /tmp/node-03; echo
bLn7A87q4aqLLzSVlC02
```
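A caveat before moving on: the `password` lookup produces a fresh random string on every run, so re-running `setup.yml` silently replaces all the tokens. If you want the setup playbook to be idempotent, here is a minimal sketch using `copy`'s `force` parameter (everything else as in the task above):

```yaml
# Sketch: keep an existing token file instead of overwriting it on re-runs.
- name: Create the token file only if it does not exist yet
  ansible.builtin.copy:
    dest: "/tmp/{{ inventory_hostname }}"
    content: "{{ token }}"
    force: no   # leave the destination untouched if it already exists
```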
It’s time for the real work: file synchronization.
Syncing Files with Ansible
Then how do we achieve our goal with Ansible? Basically, we utilize Ansible's synchronize module, and there are two ways of doing that:
- The centralized way
- The ad-hoc way

Each way has its pros and cons; I'll explain them respectively in the following sections.
The Centralized Way
I believe it’s much easier to understand how this works with a diagram.
First, we pull the token files from all the nodes into a local directory on the Ansible host. Then we push the fetched token files back to the nodes. In the end, all the nodes have all the token files. We’ll deliberately skip the “push to itself” part.
Create the playbook `centralized-sync.yml`:
```yaml
---
- name: Centralized sync
  hosts: all
  tasks:
    - name: Create buffer directory
      local_action:
        module: ansible.builtin.file
        path: buffer
        state: directory
      run_once: yes
    - name: Pull the token file to localhost
      synchronize:
        src: "/tmp/{{ inventory_hostname }}"
        dest: "buffer/{{ inventory_hostname }}"
        mode: pull
    - name: Push back token files to each node
      synchronize:
        src: "buffer/{{ item }}"
        dest: "/tmp/{{ item }}"
        mode: push
      loop: "{{ groups['all'] }}"
      when: item != inventory_hostname
```
As you can see, we create a `buffer` directory on the Ansible host for collecting the token files. After that, we initiate connections and pull the token file from each node into the `buffer` directory with the synchronize module. Finally, we push all the token files back to each node. Note the last line: when the name of the token file and the name of the node are the same, the item is simply skipped and we move straight to the next iteration.
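As an aside, the same pull-then-push pattern does not strictly require `rsync` to be installed on the nodes. A sketch of the two sync tasks rewritten with the agentless core modules `fetch` and `copy` (note `flat: yes`, which stops `fetch` from nesting the file under a per-host directory):

```yaml
# Sketch: the centralized pattern with fetch/copy instead of synchronize.
- name: Pull the token file to localhost
  ansible.builtin.fetch:
    src: "/tmp/{{ inventory_hostname }}"
    dest: "buffer/{{ inventory_hostname }}"
    flat: yes    # store as buffer/<host>, not buffer/<host>/tmp/<host>

- name: Push back token files to each node
  ansible.builtin.copy:
    src: "buffer/{{ item }}"
    dest: "/tmp/{{ item }}"
  loop: "{{ groups['all'] }}"
  when: item != inventory_hostname
```

The trade-off is that `fetch`/`copy` always move the whole file over the Ansible connection, while `synchronize` delegates to `rsync` and only transfers deltas.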
To execute the playbook we’ve just created:
```shell
$ ansible-playbook -i hosts.yml centralized-sync.yml

PLAY [Centralized sync] ************************************************************************************************

TASK [Gathering Facts] *************************************************************************************************
ok: [node-02]
ok: [node-03]
ok: [node-01]

TASK [Create buffer directory] *****************************************************************************************
ok: [node-01 -> localhost]

TASK [Pull the token file to localhost] ********************************************************************************
changed: [node-02]
changed: [node-03]
changed: [node-01]

TASK [Push back token files to each node] ******************************************************************************
skipping: [node-01] => (item=node-01)
changed: [node-01] => (item=node-02)
changed: [node-02] => (item=node-01)
skipping: [node-02] => (item=node-02)
changed: [node-03] => (item=node-01)
changed: [node-02] => (item=node-03)
changed: [node-01] => (item=node-03)
changed: [node-03] => (item=node-02)
skipping: [node-03] => (item=node-03)

PLAY RECAP *************************************************************************************************************
node-01                    : ok=4    changed=2    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
node-02                    : ok=3    changed=2    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
node-03                    : ok=3    changed=2    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
```
To verify that each node has all the token files, you can `ssh` to each node and execute a `for` loop to show the file contents.

```shell
$ ssh -o StrictHostKeyChecking=no 10.52.0.124 'for i in {1..3}; do cat /tmp/node-0$i; echo; done'
Awp5JzuQeAyvsSDu6l2a
,ktiL0H0:YSYBNmiCREy
bLn7A87q4aqLLzSVlC02
$ ssh -o StrictHostKeyChecking=no 10.52.0.125 'for i in {1..3}; do cat /tmp/node-0$i; echo; done'
Awp5JzuQeAyvsSDu6l2a
,ktiL0H0:YSYBNmiCREy
bLn7A87q4aqLLzSVlC02
$ ssh -o StrictHostKeyChecking=no 10.52.0.126 'for i in {1..3}; do cat /tmp/node-0$i; echo; done'
Awp5JzuQeAyvsSDu6l2a
,ktiL0H0:YSYBNmiCREy
bLn7A87q4aqLLzSVlC02
```
This is exactly what we want. Now let’s take a look at the other method.
The Ad-hoc Way
As before, the diagram tells it all.
This time, we rely heavily on the `delegate_to` keyword, though it appears only once in the playbook shown below. The point is, we don’t want to install Ansible and execute the playbook on every node; if we did, what would be the difference between copying files manually with `scp` and using Ansible? What the `delegate_to` keyword does is delegate a task to the specified host while keeping the context of the current host, so the task can still reference the other hosts. In our scenario, as `node-01`, we want to fetch the token files on `node-02` and `node-03`. Then we go to `node-02` to fetch the token files on `node-01` and `node-03`. Finally, we fetch `node-01`’s and `node-02`’s token files from `node-03`’s perspective. This is exactly the kind of job `delegate_to` is suited for.
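If `delegate_to` is new to you, a quick way to build intuition is a task like the following (purely illustrative, not part of the sync playbook): every host iterates over all hosts, but the command executes on the delegated host, while variables such as `inventory_hostname` still refer to the host owning the loop iteration:

```yaml
# Illustration only: the command runs on the delegated host ({{ item }}),
# while the task context still belongs to the looping host.
- name: Show where the task actually runs
  ansible.builtin.command: hostname   # returns item's hostname, not inventory_hostname's
  delegate_to: "{{ item }}"
  loop: "{{ groups['all'] }}"
  when: item != inventory_hostname
```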
Create the playbook `adhoc-sync.yml`:
```yaml
---
- name: Ad-hoc sync
  hosts: all
  tasks:
    - name: Create ssh public key buffer directory
      local_action:
        module: ansible.builtin.file
        path: buffer/keys
        state: directory
      run_once: yes
    - name: Generate ssh keypair on each node
      ansible.builtin.user:
        name: "{{ ansible_user }}"
        generate_ssh_key: yes
    - name: Fetch all public keys from each node
      ansible.builtin.fetch:
        src: "/home/{{ ansible_user }}/.ssh/id_rsa.pub"
        dest: "buffer/keys/{{ inventory_hostname }}-id_rsa.pub"
        flat: yes
    - name: Assemble authorized keys from buffer
      local_action:
        module: ansible.builtin.assemble
        src: buffer/keys
        dest: buffer/keys/authorized_keys
      run_once: yes
    - name: Update authorized keys on each node
      ansible.builtin.blockinfile:
        block: "{{ lookup('file', 'buffer/keys/authorized_keys') }}"
        path: "/home/{{ ansible_user }}/.ssh/authorized_keys"
        backup: yes
        create: yes
        mode: 0600
        state: present
    - name: Synchronize files from each other
      ansible.builtin.synchronize:
        src: "/tmp/{{ inventory_hostname }}"
        dest: "/tmp/{{ inventory_hostname }}"
        mode: pull
      delegate_to: "{{ item }}"
      loop: "{{ groups['all'] }}"
      when: item != inventory_hostname
```
This time the playbook is quite lengthy, but the real work, the file synchronization itself, happens in the last task. For `rsync` to move the token files successfully, each node must be able to `ssh` to the other two nodes using SSH keys. So the first step is to generate an SSH key pair on every node and then distribute the public keys. We achieve this by collecting the public keys from each node, assembling them into a temporary file in the `buffer` directory, and updating `authorized_keys` on each node with the content of that temporary file.
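As an aside, if the `ansible.posix` collection is installed, the assemble-plus-blockinfile combination could be replaced with the `authorized_key` module, which manages individual keys idempotently. A sketch, assuming the per-host key files fetched by the previous task:

```yaml
# Sketch: authorize every node's public key on every node,
# one key at a time, instead of assembling a combined block.
- name: Authorize all collected public keys
  ansible.posix.authorized_key:
    user: "{{ ansible_user }}"
    key: "{{ lookup('file', 'buffer/keys/' + item + '-id_rsa.pub') }}"
    state: present
  loop: "{{ groups['all'] }}"
```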
To execute the playbook:
```shell
$ ansible-playbook -i hosts.yml adhoc-sync.yml

PLAY [Ad-hoc sync] *****************************************************************************************************

TASK [Gathering Facts] *************************************************************************************************
ok: [node-01]
ok: [node-02]
ok: [node-03]

TASK [Create ssh public key buffer directory] **************************************************************************
changed: [node-01 -> localhost]

TASK [Generate ssh keypair on each node] *******************************************************************************
changed: [node-01]
changed: [node-03]
changed: [node-02]

TASK [Fetch all public keys from each node] ****************************************************************************
changed: [node-03]
changed: [node-01]
changed: [node-02]

TASK [Assemble authorized keys from buffer] ****************************************************************************
changed: [node-01 -> localhost]

TASK [Update authorized keys on each node] *****************************************************************************
changed: [node-01]
changed: [node-02]
changed: [node-03]

TASK [Synchronize files from each other] *******************************************************************************
skipping: [node-01] => (item=node-01)
changed: [node-01 -> 10.52.0.125] => (item=node-02)
changed: [node-03 -> 10.52.0.124] => (item=node-01)
changed: [node-02 -> 10.52.0.124] => (item=node-01)
skipping: [node-02] => (item=node-02)
changed: [node-01 -> 10.52.0.126] => (item=node-03)
changed: [node-03 -> 10.52.0.125] => (item=node-02)
skipping: [node-03] => (item=node-03)
changed: [node-02 -> 10.52.0.126] => (item=node-03)

PLAY RECAP *************************************************************************************************************
node-01                    : ok=7    changed=6    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
node-02                    : ok=5    changed=4    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
node-03                    : ok=5    changed=4    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
```
Now all the nodes have all the token files. You can verify that with the `ssh` commands from the last section; I’ll skip the details for simplicity.
Pros and Cons
- The centralized way
  - Advantages
    - Simple logic (pull, then push)
    - No prerequisites
  - Drawbacks
    - Wastes traffic (outbound bandwidth can be expensive in a public cloud environment)
    - Security concerns (the traffic leaves the cluster network)
- The ad-hoc way
  - Advantages
    - Simple in architecture (direct sync among the nodes)
    - Performant for large file synchronization, not just small files like our tokens
    - More secure in terms of data leaks (the traffic remains inside the cluster network)
  - Drawbacks
    - Rather complicated logic (the `delegate_to` keyword can be hard to understand at the beginning)
    - Things to do beforehand (SSH public key distribution)

To put the traffic difference in numbers: with three nodes, the centralized way moves nine file copies through `tower` (three pulls plus six pushes), while the ad-hoc way needs only six direct transfers between the nodes.
Tearing Down Demo Scenario
Things have to be cleaned up after we’ve finished the demonstration. Create the playbook `teardown.yml`:
```yaml
---
- name: Tear down demo environment
  hosts: all
  tasks:
    - name: Remove buffered token files
      local_action:
        module: ansible.builtin.file
        path: "buffer/{{ item }}"
        state: absent
      loop: "{{ groups['all'] }}"
      run_once: yes
    - name: Remove combined authorized keys on each node
      ansible.builtin.blockinfile:
        block: "{{ lookup('file', 'buffer/keys/authorized_keys', errors='ignore') }}"
        path: "/home/{{ ansible_user }}/.ssh/authorized_keys"
        backup: yes
        state: absent
    - name: Remove buffered public keys
      local_action:
        module: ansible.builtin.file
        path: "buffer/keys/"
        state: absent
      run_once: yes
    - name: Remove the token file
      ansible.builtin.file:
        path: "/tmp/{{ item }}"
        state: absent
      loop: "{{ groups['all'] }}"
```
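Note that this teardown leaves the SSH key pairs generated by `adhoc-sync.yml` in place on the nodes. If you want a truly clean slate, an extra task along these lines could remove them (a sketch, assuming the default `id_rsa` file names produced by the `user` module):

```yaml
# Sketch: also remove the key pairs generated during the ad-hoc demo.
- name: Remove the generated ssh keypair
  ansible.builtin.file:
    path: "/home/{{ ansible_user }}/.ssh/{{ item }}"
    state: absent
  loop:
    - id_rsa
    - id_rsa.pub
```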
Run the playbook to clean up the intermediate files and the synced token files if you want to redo the demo; otherwise, you can wipe out the VMs directly.
```shell
$ ansible-playbook -i hosts.yml teardown.yml

PLAY [Tear down demo environment] **************************************************************************************

TASK [Gathering Facts] *************************************************************************************************
ok: [node-02]
ok: [node-01]
ok: [node-03]

TASK [Remove buffered token files] *************************************************************************************
ok: [node-01 -> localhost] => (item=node-01)
ok: [node-01 -> localhost] => (item=node-02)
ok: [node-01 -> localhost] => (item=node-03)

TASK [Remove combined authorized keys on each node] ********************************************************************
changed: [node-02]
changed: [node-03]
changed: [node-01]

TASK [Remove buffered public keys] *************************************************************************************
changed: [node-01 -> localhost]

TASK [Remove the token file] *******************************************************************************************
changed: [node-01] => (item=node-01)
changed: [node-02] => (item=node-01)
changed: [node-03] => (item=node-01)
changed: [node-01] => (item=node-02)
changed: [node-02] => (item=node-02)
changed: [node-03] => (item=node-02)
changed: [node-01] => (item=node-03)
changed: [node-02] => (item=node-03)
changed: [node-03] => (item=node-03)

PLAY RECAP *************************************************************************************************************
node-01                    : ok=5    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
node-02                    : ok=3    changed=2    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
node-03                    : ok=3    changed=2    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
```
References
- How to generate single reusable random password with ansible
- Controlling where tasks run: delegation and local actions - Ansible Documentation
- ansible synchronize: syncing folders (in Chinese)
- How to synchronize a file between two remote servers in Ansible?
- The best way to authorize ssh key of each node to all nodes in the cluster