ALICE native xrootd
Decommissioning ALICE native xrootd servers and dealing with data loss
Aim of this documentation
The objective of this document is to detail the procedure system administrators should follow when they want to remove an xrootd server (decommissioning) or when they have lost a filesystem on an xrootd server.
This document is based on the mails exchanged on the alice-lcg-task-force@cern.ch list and on the real cases encountered at the GRIF-IPNO site. Costin Grigoras is the author of the different recommendations and tips successfully applied at IPNO.
About the examples
The examples are taken from the IPNO site, where the redirector is ipngridxrd0.in2p3.fr and the xrootd servers are ipngridxrd1, ipngridxrd2, ... On all the xrootd servers the mount points of the data partitions follow the same naming convention: /grid/xrddataX {X=1..8}.
A quick presentation of the xrootd files tree
On each xrootd server there are one or more disk partitions where the data files are stored. There is also a namespace, which is a directory containing the names of the data files: these names are the ones the redirector uses. Each name (or file name) is itself a symlink to the real xrootd data file. The namespace can be on a separate partition or in a subdirectory of a data partition.
In the case of IPNO, the namespace is always a subdirectory of the first data partition. Here is an example from one xrootd server:
# df -h|grep xrddata
/dev/sdb1 9.1T 5.6T 3.6T 62% /grid/xrddata1
/dev/sdb2 9.1T 5.6T 3.6T 62% /grid/xrddata2
/dev/sdb3 9.1T 5.6T 3.6T 62% /grid/xrddata3
/dev/sdb4 9.1T 5.6T 3.6T 62% /grid/xrddata4
/dev/sdc1 9.1T 5.6T 3.6T 62% /grid/xrddata5
/dev/sdc2 9.1T 5.6T 3.6T 62% /grid/xrddata6
/dev/sdc3 9.1T 5.6T 3.6T 62% /grid/xrddata7
/dev/sdc4 9.1T 5.6T 3.6T 62% /grid/xrddata8
#
# ls -ld /grid/xrddata1/namespace
drwxr-xr-x 18 xrootd xrootd 4096 Mar 30 2015 /grid/xrddata1/namespace
The data file %grid%xrddata1%namespace%00%65278%b8f9f574-dd42-11e4-a4e6-63e8b3f6492f in the partition /grid/xrddata6 is recorded in the namespace as /grid/xrddata1/namespace/00/65278/b8f9f574-dd42-11e4-a4e6-63e8b3f6492f:
# ls -lh /grid/xrddata1/namespace/00/65278/b8f9f574-dd42-11e4-a4e6-63e8b3f6492f
lrwxrwxrwx 1 xrootd xrootd 85 Apr 7 2015 /grid/xrddata1/namespace/00/65278/b8f9f574-dd42-11e4-a4e6-63e8b3f6492f -> /grid/xrddata6/%grid%xrddata1%namespace%00%65278%b8f9f574-dd42-11e4-a4e6-63e8b3f6492f
# ls -lLh /grid/xrddata1/namespace/00/65278/b8f9f574-dd42-11e4-a4e6-63e8b3f6492f
-rw-rw-r-- 1 xrootd xrootd 3.6M Apr 7 2015 /grid/xrddata1/namespace/00/65278/b8f9f574-dd42-11e4-a4e6-63e8b3f6492f
To access the file in this example, the URL is root://ipngridxrd0.in2p3.fr:1094//00/65278/b8f9f574-dd42-11e4-a4e6-63e8b3f6492f, where ipngridxrd0 is the redirector. For example, to copy the file from a worker node (WN):
# xrdcp root://ipngridxrd0.in2p3.fr:1094//00/65278/b8f9f574-dd42-11e4-a4e6-63e8b3f6492f /tmp/xrd_test.dat
[3.594MB/3.594MB][100%][==================================================][3.594MB/s]
[root@ipngrid90 ~]# ls -lh /tmp/xrd_test.dat
-rw-r--r-- 1 root root 3.6M Nov 18 10:54 /tmp/xrd_test.dat
Some observations:
- the file name in the namespace contains the GUID of the xrootd data file (ex: b8f9f574-dd42-11e4-a4e6-63e8b3f6492f in the example above)
# basename /grid/xrddata1/namespace/00/65278/b8f9f574-dd42-11e4-a4e6-63e8b3f6492f
b8f9f574-dd42-11e4-a4e6-63e8b3f6492f
- the xrootd data file name is built from the name in the namespace. In the example above, the xrootd data file name %grid%xrddata1%namespace%00%65278%b8f9f574-dd42-11e4-a4e6-63e8b3f6492f in the directory /grid/xrddata6/ is built from the name 00/65278/b8f9f574-dd42-11e4-a4e6-63e8b3f6492f of the namespace.
- the xrootd data file name can be an arbitrary one as long as the symlink in the namespace continues to point to it. So the following is possible (but should be avoided in practice, because there is no reason to do it):
# service xrdservices stop
# mv /grid/xrddata6/%grid%xrddata1%namespace%00%65278%b8f9f574-dd42-11e4-a4e6-63e8b3f6492f /grid/xrddata6/testfile.dat
# ln -fs /grid/xrddata6/testfile.dat /grid/xrddata1/namespace/00/65278/b8f9f574-dd42-11e4-a4e6-63e8b3f6492f
# service xrdservices start
Even though the xrootd data file is renamed as /grid/xrddata6/testfile.dat, xrootd will continue to see it as /grid/xrddata1/namespace/00/65278/b8f9f574-dd42-11e4-a4e6-63e8b3f6492f because we updated the symlink. This flexibility will allow the transfer of the data from one xrootd server to another even if the directory tree is not exactly the same on both servers.
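The naming convention described in this section can be illustrated with two small helper functions. This is only a sketch: the function names ns_to_data_name and ns_to_url are hypothetical (they are not part of xrootd), and the namespace base directory, redirector host and port are simply the IPNO values used in the examples.

```sh
#!/bin/sh
# Hypothetical helpers illustrating the naming convention; not part of xrootd.

# Build the data file name from the full namespace path:
# every "/" of the namespace path is replaced by "%".
ns_to_data_name() {
    printf '%s\n' "$1" | tr '/' '%'
}

# Build the xrootd URL of a namespace entry.
# $1 = full namespace path, $2 = namespace base directory,
# $3 = redirector host:port
ns_to_url() {
    rel=${1#"$2"}    # strip the namespace base directory, keep the leading "/"
    printf 'root://%s/%s\n' "$3" "$rel"
}
```

For instance, ns_to_url /grid/xrddata1/namespace/00/65278/b8f9f574-dd42-11e4-a4e6-63e8b3f6492f /grid/xrddata1/namespace ipngridxrd0.in2p3.fr:1094 gives the URL used with xrdcp above.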
Decommissioning an xrootd server
You may need to remove an xrootd server for many reasons (old hardware, frequent failures, ...). Before stopping the server and disconnecting it from the network, ALICE should be informed by sending an e-mail to alice-lcg-task-force@cern.ch. The ALICE experts will tell you what to do to transfer the data elsewhere. There are three possibilities:
- you have enough space on another xrootd server at your site to transfer the data to
  - in this case you must copy the data with rsync to this xrootd server
- you have enough space available on your SE but no single xrootd server can receive all the data
  - in this case, ALICE will ask you to send the list of the GUIDs and sizes of the files you need to transfer and will manage the transfer to your SE
- your SE doesn't have enough space to store the copies of the files
  - in this case, ALICE will ask you to send the list of the GUIDs and sizes of the files you need to transfer and will manage the transfer to the SE of another ALICE site
The procedures in the different cases are detailed below.
Transfer preparation
Put the xrootd server in read-only mode
In any case, before starting the copy of the files, you must stop the xrootd services on the server being decommissioned, remount the partitions in read-only mode and restart the xrootd services.
For example, with 8 xrootd data partitions:
service xrdservices stop
for i in $(seq 1 8); do mount -o remount,ro /grid/xrddata$i; done
service xrdservices start
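Before restarting the services, it may be worth checking that every data partition really is read-only. The following is a minimal sketch, assuming a Linux host where /proc/mounts lists the current mount options; the function name list_not_readonly is hypothetical.

```sh
#!/bin/sh
# Print the mount points starting with a given prefix that are NOT mounted
# read-only. An empty output means all matching partitions are read-only.
# $1 = mount point prefix (e.g. /grid/xrddata)
# $2 = mounts table to read (assumption: /proc/mounts format; this is the default)
list_not_readonly() {
    awk -v prefix="$1" '
        index($2, prefix) == 1 {
            # $4 holds the comma-separated mount options
            n = split($4, opts, ",")
            ro = 0
            for (i = 1; i <= n; i++) if (opts[i] == "ro") ro = 1
            if (!ro) print $2
        }' "${2:-/proc/mounts}"
}
```

Calling list_not_readonly /grid/xrddata after the remount loop should print nothing; any mount point it prints is still writable.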
Collecting files GUIDs
If you can copy the files to another xrootd server with rsync, you don't need to collect the GUIDs of the files. But if the transfer is to be done through xrootd itself (from SE to SE), you need to collect the GUIDs of the files and provide them to the ALICE experts.
The script collect_xrootd_GUIDs.sh will produce 4 files in the subdir /var/tmp/GUIDS_$(hostname)_PID.
- file_names : contains the names to be used in the xrootd URL of a file, when using xrdcp for example
- guids_and_sizes : contains the GUIDs of the data files plus their sizes
- missing_files_names : contains the list of missing files (broken symlinks or missing data files)
- missing_files_guids : contains the GUIDs of the files in missing_files_names
Example:
# date ; sh collect_xrootd_GUIDs.sh /grid/xrddata1/namespace; date
Wed Nov 18 17:13:14 CET 2015

The result of this script will be stored in files in /var/tmp/GUIDS_ipngridxrd16.in2p3.fr_26511 ...

Wed Nov 18 17:24:06 CET 2015
#
# cd /var/tmp/GUIDS_ipngridxrd16.in2p3.fr_26511
# ls -lh
total 94M
-rw-r--r-- 1 root root 49M Nov 18 17:16 file_names
-rw-r--r-- 1 root root 46M Nov 18 17:24 guids_and_sizes
-rw-r--r-- 1 root root 0 Nov 18 17:24 missing_files_guids
-rw-r--r-- 1 root root 0 Nov 18 17:16 missing_files_names
#
# wc -l *
1052450 file_names
1052450 guids_and_sizes
0 missing_files_guids
0 missing_files_names
# head -2 file_names
./04/45205/7cb82112-3c56-11e5-9516-23d68bd9df8f
./04/45205/816a4938-2e30-11e2-9cd8-db9bc21ad468
# head -2 guids_and_sizes
0cc502f4-1c43-11e5-b7b2-5f940fb22164 18961963
2444240a-de72-11e4-9879-079c5762f860 936798
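To judge whether another server or SE has enough room, the total volume can be computed from the guids_and_sizes file. The sketch below assumes the two-column "GUID size_in_bytes" layout shown in the example; the function name summarize_guids_and_sizes is hypothetical.

```sh
#!/bin/sh
# Print "<number of files> <total size in bytes>" for a guids_and_sizes file
# (one "GUID size_in_bytes" pair per line).
# $1 = path to the guids_and_sizes file
summarize_guids_and_sizes() {
    # %.0f is used for the total because some awk implementations
    # clamp large values printed with %d
    awk '{ n++; total += $2 }
         END { printf "%d %.0f\n", n, total }' "$1"
}
```

For example, summarize_guids_and_sizes /var/tmp/GUIDS_ipngridxrd16.in2p3.fr_26511/guids_and_sizes would report the file count and the number of bytes to be transferred.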
Here is the collect_xrootd_GUIDs.sh script:
# cat collect_xrootd_GUIDs.sh
#!/bin/sh
# Collect the GUIDs + file sizes on an xrootd disk server. Also collect the
# GUIDs of missing files.
# NB: !!! Before launching this script, make sure to first mount
# xrootd data partitions in read-only mode

if [ "$#" != "1" ]; then
    echo ""
    echo "Usage: $0 namespace_base_dir"
    echo ""
    echo "Example: $0 /grid/xrddata1/namespace"
    echo ""
    exit 1
fi

NAMESPACE=$1
[ ! -d "${NAMESPACE}" ] && echo "${NAMESPACE}: is not a directory" && exit 1

OUTDIR=/var/tmp/GUIDS_$(hostname)_$$
mkdir -p "${OUTDIR}"
echo ""
echo "The result of this script will be stored in files in ${OUTDIR} ..."
echo ""

# Example of entry : ./04/11522/4107d468-a7f6-11df-b283-001e0bd3f44c
FILE_NAMES="${OUTDIR}/file_names"
# Example of entry: 05151ae6-76d7-11e5-aad5-8b87ecfb2d4e 19676060
# (built from the details, ls -l, of the files)
GUIDS_AND_SIZES="${OUTDIR}/guids_and_sizes"
# Broken links: entries (symlinks) in the namespace without a data file
MISSING_FILES_NAMES="${OUTDIR}/missing_files_names"
# GUIDs of missing files
MISSING_FILES_GUIDS="${OUTDIR}/missing_files_guids"

cd "${NAMESPACE}" || exit 1

# File names from the namespace, including missing files (broken links)
find . -type l -print > "${FILE_NAMES}"

# Details on data files (ls -l following the symlinks); the error messages
# about broken links go to the missing files list
cat "${FILE_NAMES}" | xargs ls -lL > "${GUIDS_AND_SIZES}" 2> "${MISSING_FILES_NAMES}"

# Keep only GUIDs and file sizes: split the path on "/" so that the GUID
# becomes the last field, the size being the 5th field of ls -l
sed -i 's/\// /g' "${GUIDS_AND_SIZES}"
cat "${GUIDS_AND_SIZES}" | awk '{print $NF " " $5}' > "${GUIDS_AND_SIZES}_tmp"
/bin/mv "${GUIDS_AND_SIZES}_tmp" "${GUIDS_AND_SIZES}"

# Save missing files GUIDs
# NB: field 5 assumes error lines of the form
# "ls: ./04/45205/<GUID>: No such file or directory"; adjust the field
# number if your version of ls words the message differently
cp "${MISSING_FILES_NAMES}" "${MISSING_FILES_GUIDS}"
sed -i -e 's/\// /g' -e 's/\://g' "${MISSING_FILES_GUIDS}"
cat "${MISSING_FILES_GUIDS}" | awk '{print $5}' > "${MISSING_FILES_GUIDS}_tmp"
/bin/mv "${MISSING_FILES_GUIDS}_tmp" "${MISSING_FILES_GUIDS}"
Prepare for merging the namespaces
After the xrootd data files are copied from server A to server B, the namespace on B must be merged with the namespace of A. If all the data copied from A land in the same directory name on B, the namespace merging can be done by just copying the files from the namespace of A to the namespace of B.
For example, let's suppose that 4 partitions A:/grid/xrddata{1,2,3,4} are copied respectively to B:/grid/xrddata{1,2,3,4}. If the namespaces are A:/grid/xrddata1/namespace and B:/grid/xrddata1/namespace, then the merge can be done by copying the files from A:/grid/xrddata1/namespace/ to B:/grid/xrddata1/namespace/.
There may be cases where the source and destination directory names differ. For example, let's suppose you have to do the following copies:
- A:/grid/xrddata1 to B:/grid/xrddata1
- A:/grid/xrddata2 to B:/grid/xrddata7
- A:/grid/xrddata3 to B:/grid/xrddata8
In this case, before starting the copy, you must collect the file names for each partition, because you will need to create these files in the namespace of B as symlinks pointing to the new location of the copied files. You can create one file per partition. Each file will have two columns: the file name in the namespace and the real xrootd data file name.
For example, to create this file for A:/grid/xrddata1, one can do something like:
# cd /grid/xrddata1/namespace/
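One possible way to produce such a two-column list is sketched here. It is an illustration only, not necessarily the command used at IPNO: the function name list_partition_entries is hypothetical, and GNU find is assumed (-lname and -printf are GNU extensions).

```sh
#!/bin/sh
# Print "namespace_entry data_file" pairs for the namespace entries whose
# symlink points into a given data partition.
# $1 = namespace base directory (e.g. /grid/xrddata1/namespace)
# $2 = data partition mount point (e.g. /grid/xrddata1)
# Assumption: GNU find is available.
list_partition_entries() {
    # %p = path of the symlink relative to the namespace, %l = its target
    ( cd "$1" && find . -type l -lname "$2/*" -printf '%p %l\n' )
}
```

A call such as list_partition_entries /grid/xrddata1/namespace /grid/xrddata1 > /var/tmp/xrddata1.list (the output path is arbitrary) would then be repeated for each partition to be copied.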
One should avoid splitting the data of one partition of A across more than one partition on B, because the transfer and the namespace merging become more complicated and this can also be a source of problems. It is still feasible, however.
Move xrootd data with rsync to another local xrootd server
Suppose that A is the xrootd server being decommissioned and B a xrootd server having enough disk space to receive data from A. In this case you can use rsync to transfer the data from A to B.
If you have N filesystems (mounted partitions) to transfer from A to B, B should have at least N partitions with sufficient space available. Even though it is possible to split a partition of A across more than one partition of B, this should be avoided. It is much easier and safer to copy each partition of A entirely to a single partition of B.
Before starting the copy, you must stop the xrootd services on A, remount the xrootd data partitions in read-only mode and restart the xrootd services (see the section on putting the server in read-only mode above).
After the data copy, the namespace on the server B must be updated to reflect its new content. The namespace on B can be updated for each copied partition or only after the last partition is copied.
After the namespace on B is up-to-date, the server A can be stopped and disconnected from the network to avoid accidental reboot.
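Before stopping A, one may want to confirm that the updated namespace on B contains no dangling entries. A minimal sketch, assuming GNU find (-xtype is a GNU extension); the function name list_broken_entries is hypothetical.

```sh
#!/bin/sh
# List the namespace entries whose symlink target does not exist.
# An empty output means every entry resolves to a data file.
# $1 = namespace base directory (e.g. /grid/xrddata1/namespace)
# Assumption: GNU find is available.
list_broken_entries() {
    # With the default (no-follow) mode, -xtype l matches symlinks
    # whose target cannot be resolved
    ( cd "$1" && find . -xtype l -print )
}
```

Any path printed by list_broken_entries /grid/xrddata1/namespace on B points to a data file that was not copied or to a symlink that was not updated.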
Important notes about the namespace
Once the data are copied from A to B, the namespace on B must be updated (merging the two namespaces). There are many possible cases.
- Identical source and destination partition names
  - If, for each partition copied from A to B, the partition name is identical on A and B, then the only thing to do after the copy of the data files is to copy the namespace from A to B
- Some source and destination partition names are different
  - In this case, after the copy of each partition or after the last partition is copied, you must update the namespace on B by a simple copy or by creating symlinks.
    - when a data partition name is unchanged during the copy, just copy the corresponding file names from the namespace of A to B
    - when a data partition name is changed during the copy, you must recreate the symlinks in the namespace on B and point them to the new paths of the data files.
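When a partition name changes, the symlinks can be recreated on B from a two-column list (namespace entry, data file path on A) collected before the copy. The following is a sketch under those assumptions; the function name relink_entries and the list file are hypothetical, and entry names are assumed to contain no spaces.

```sh
#!/bin/sh
# Recreate namespace symlinks on B from a two-column list
# ("namespace_entry data_file_path_on_A", one pair per line).
# $1 = list file
# $2 = namespace base directory on B
# $3 = partition path on A (e.g. /grid/xrddata2)
# $4 = partition path on B (e.g. /grid/xrddata7)
relink_entries() {
    while read -r name target; do
        # Same data file name, new partition directory
        new_target="$4${target#"$3"}"
        mkdir -p "$2/$(dirname "$name")"
        # ln -s (without -f) refuses to overwrite an existing entry
        ln -s "$new_target" "$2/$name"
    done < "$1"
}
```

For the example above, relink_entries xrddata2.list /grid/xrddata1/namespace /grid/xrddata2 /grid/xrddata7 would be run on B once the data of A:/grid/xrddata2 has been copied. Using ln -s rather than ln -fs is a deliberate choice here, so that an entry already present in the namespace of B is reported instead of being silently replaced.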
Copy with preserving file names
If you have the same mount point convention on A and B, and if you can copy data from A to a partition having the sam