ALICE native xrootd
Revision as of 12:36, 18 November 2015
Decommissioning ALICE native xrootd servers and dealing with data loss
Aim of this document
The objective of this document is to detail the procedure system administrators should follow when they want to remove an xrootd server (decommissioning) or when they have lost a filesystem on an xrootd server.
This document is based on the mails exchanged on the alice-lcg-task-force@cern.ch list and on the real cases encountered at the GRIF-IPNO site. Costin Grigoras is the author of the recommendations and tips that were successfully applied at IPNO.
About the examples
The examples are taken from the IPNO site, where the redirector is ipngridxrd0.in2p3.fr and the xrootd servers are ipngridxrd1, ipngridxrd2, ... On all the xrootd servers, the data partition mount points follow the same naming convention: the data partitions are /grid/xrddataX (X = 1..8).
A quick presentation of the xrootd file tree
On each xrootd server there are one or more disk partitions where the data files are stored. There is also a namespace, which is a directory tree containing the names of the data files: these names are the ones the redirector uses. Each name is a symlink to the real data file. The namespace can be on a separate partition or in a subdirectory of a data partition.
In the case of IPNO, the namespace is always a subdirectory of the first data partition. Here are some examples from one xrootd server:
<pre>
# df -h|grep xrddata
/dev/sdb1       9.1T  5.6T  3.6T  62% /grid/xrddata1
/dev/sdb2       9.1T  5.6T  3.6T  62% /grid/xrddata2
/dev/sdb3       9.1T  5.6T  3.6T  62% /grid/xrddata3
/dev/sdb4       9.1T  5.6T  3.6T  62% /grid/xrddata4
/dev/sdc1       9.1T  5.6T  3.6T  62% /grid/xrddata5
/dev/sdc2       9.1T  5.6T  3.6T  62% /grid/xrddata6
/dev/sdc3       9.1T  5.6T  3.6T  62% /grid/xrddata7
/dev/sdc4       9.1T  5.6T  3.6T  62% /grid/xrddata8
#
# ls -ld /grid/xrddata1/namespace
drwxr-xr-x 18 xrootd xrootd 4096 Mar 30  2015 /grid/xrddata1/namespace
</pre>
The data file %grid%xrddata1%namespace%00%65278%b8f9f574-dd42-11e4-a4e6-63e8b3f6492f in the partition /grid/xrddata6 is recorded in the namespace as /grid/xrddata1/namespace/00/65278/b8f9f574-dd42-11e4-a4e6-63e8b3f6492f. The symlink shows where the data file lives, and ls -lL follows the link to show the size of the data file itself:
<pre>
# ls -lh /grid/xrddata1/namespace/00/65278/b8f9f574-dd42-11e4-a4e6-63e8b3f6492f
lrwxrwxrwx 1 xrootd xrootd 85 Apr  7  2015 /grid/xrddata1/namespace/00/65278/b8f9f574-dd42-11e4-a4e6-63e8b3f6492f -> /grid/xrddata6/%grid%xrddata1%namespace%00%65278%b8f9f574-dd42-11e4-a4e6-63e8b3f6492f
# ls -lLh /grid/xrddata1/namespace/00/65278/b8f9f574-dd42-11e4-a4e6-63e8b3f6492f
-rw-rw-r-- 1 xrootd xrootd 3.6M Apr  7  2015 /grid/xrddata1/namespace/00/65278/b8f9f574-dd42-11e4-a4e6-63e8b3f6492f
</pre>
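The naming convention visible in this example can be expressed as a small shell sketch. It assumes, as on the IPNO servers, that the data file name is the full namespace path with every / replaced by %; the function names ns_to_datafile and check_ns_entry are hypothetical and not part of xrootd.

```shell
#!/bin/sh
# Hypothetical helper: derive the data file name from a namespace path,
# assuming the IPNO convention where every '/' becomes '%'.
ns_to_datafile() {
    printf '%s\n' "$1" | tr '/' '%'
}

# Hypothetical consistency check: succeed if the namespace entry $1 is a
# symlink whose target file name follows the convention above.
check_ns_entry() {
    target=$(readlink "$1") || return 1
    [ "$(basename "$target")" = "$(ns_to_datafile "$1")" ]
}
```

For instance, ns_to_datafile /grid/xrddata1/namespace/00/65278/b8f9f574-dd42-11e4-a4e6-63e8b3f6492f prints %grid%xrddata1%namespace%00%65278%b8f9f574-dd42-11e4-a4e6-63e8b3f6492f. The partition holding the file (here /grid/xrddata6) is not encoded in the name: only the symlink target tells where the data actually is.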
To access the file in this example, the URL is root://ipngridxrd0.in2p3.fr:1094//00/65278/b8f9f574-dd42-11e4-a4e6-63e8b3f6492f, where ipngridxrd0 is the redirector; the path in the URL is the namespace path without the namespace root /grid/xrddata1/namespace. For example, to copy the file from a worker node (WN):
<pre>
# xrdcp root://ipngridxrd0.in2p3.fr:1094//00/65278/b8f9f574-dd42-11e4-a4e6-63e8b3f6492f /tmp/xrd_test.dat
[3.594MB/3.594MB][100%][==================================================][3.594MB/s]
[root@ipngrid90 ~]# ls -lh /tmp/xrd_test.dat
-rw-r--r-- 1 root root 3.6M Nov 18 10:54 /tmp/xrd_test.dat
[root@ipngrid90 ~]# rm /tmp/xrd_test.dat
</pre>
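The mapping from namespace path to URL used in this example can be sketched as follows. This is a minimal sketch assuming the IPNO values shown in this section (namespace root /grid/xrddata1/namespace, redirector ipngridxrd0.in2p3.fr on port 1094); ns_to_url is a hypothetical helper name.

```shell
#!/bin/sh
# Assumed IPNO layout; adapt both values to your site.
NS_ROOT=/grid/xrddata1/namespace
REDIRECTOR=ipngridxrd0.in2p3.fr:1094

# Hypothetical helper: print the root:// URL of a file given its namespace
# path. Stripping the namespace root leaves the path known to the
# redirector (e.g. /00/65278/<GUID>), hence the double slash in the URL.
ns_to_url() {
    case $1 in
        "$NS_ROOT"/*) ;;
        *) echo "not under $NS_ROOT: $1" >&2; return 1 ;;
    esac
    printf 'root://%s/%s\n' "$REDIRECTOR" "${1#"$NS_ROOT"}"
}
```

With it, xrdcp "$(ns_to_url /grid/xrddata1/namespace/00/65278/b8f9f574-dd42-11e4-a4e6-63e8b3f6492f)" /tmp/xrd_test.dat issues the same copy as the command in the example.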
Decommissioning an xrootd server
You may need to remove an xrootd server for many reasons (old hardware, frequent failures, ...). Before stopping the server and disconnecting it from the network, inform ALICE by sending an e-mail to alice-lcg-task-force@cern.ch. The ALICE experts will tell you how to transfer the data elsewhere. There are three possibilities:
- you have enough space on another xrootd server on your site to transfer the data to:
  - in this case, you must copy the data with rsync to this xrootd server;
- you have enough space available on your SE, but no single xrootd server can receive all the data:
  - in this case, ALICE will ask you to send the list of the GUIDs and sizes of the files you need to transfer, and will manage the transfer to your SE;
- your SE does not have enough space to store the copies of the files:
  - in this case, ALICE will ask you to send the list of the GUIDs and sizes of the files you need to transfer, and will manage the transfer to the SE of another ALICE site.
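Choosing between these cases requires knowing how much data has to move. A minimal sketch, assuming the IPNO mount points /grid/xrddataN and the POSIX df -P output format (sizes in 1024-byte blocks, mount point in the last column); used_kib and fits are hypothetical helper names. The space used on the data partitions is only an approximation of the volume to transfer.

```shell
#!/bin/sh
# Hypothetical helper: read `df -P` output on stdin and print the total
# space used (in KiB) by the /grid/xrddataN partitions.
used_kib() {
    awk '$6 ~ /^\/grid\/xrddata[0-9]+$/ { s += $3 } END { print s + 0 }'
}

# Hypothetical helper: succeed if $1 KiB fit into $2 KiB of free space.
fits() {
    [ "$1" -le "$2" ]
}
```

On the server being decommissioned, needed=$(df -P | used_kib) gives the volume; on a candidate target, df -P /grid/xrddata1 | awk 'NR == 2 { print $4 }' gives the free space of one partition. Comparing needed with the summed free space of the target partitions is only a first check, since each file must still fit on a single partition.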
The procedures in the different cases are detailed below.
Transfer preparation: read-only mode and collecting file GUIDs
In any case, before starting the copy of the files, you must stop the xrootd services on the server being decommissioned, remount the partitions in read-only mode and restart the xrootd services.
For example, with 8 xrootd data partitions:
<pre>
service xrdservices stop
for i in $(seq 1 8); do mount -o remount,ro /grid/xrddata$i; done
service xrdservices start
</pre>
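Before starting the copy, it is worth checking that every data partition really is read-only. A minimal sketch, assuming the IPNO mount points /grid/xrddata1 to /grid/xrddata8 and a Linux /proc/mounts (fields: device, mount point, type, options, ...); check_ro is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical check: report every /grid/xrddataN partition (N = 1..n)
# that is not mounted read-only. Arguments: mounts file (default
# /proc/mounts) and number of partitions (default 8, as at IPNO).
check_ro() {
    mounts=${1:-/proc/mounts}
    n=${2:-8}
    status=0
    for i in $(seq 1 "$n"); do
        mp=/grid/xrddata$i
        # Options (4th field) of the last entry for this mount point.
        opts=$(awk -v mp="$mp" '$2 == mp { o = $4 } END { print o }' "$mounts")
        case ",$opts," in
            *,ro,*) ;;
            *) echo "$mp is not mounted read-only (options: ${opts:-none})"; status=1 ;;
        esac
    done
    return $status
}
```

Running check_ro with no arguments after the remount should print nothing and return 0; any line it prints names a partition that still needs attention.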
If you can copy the files to another xrootd server with rsync, you don't need to collect the GUIDs of the files. If the transfer is done through xrootd itself (from SE to SE), you need to collect the file GUIDs and provide them to the ALICE experts.
The script collect_server_GUIDs.sh (attached below) will produce two files: full_GUIDs.list and GUIDs_and_size.list.
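As an illustration only, the following hypothetical sketch (not collect_server_GUIDs.sh itself) shows one way to produce lists of this kind. It assumes the namespace root /grid/xrddata1/namespace, that the GUID is the last component of each namespace entry, GNU stat, and output formats of our own choosing: one namespace path per line in full_GUIDs.list and one "GUID size-in-bytes" pair per line in GUIDs_and_size.list. The format the ALICE experts expect is the one produced by the attached script.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the attached collect_server_GUIDs.sh.
# Assumed namespace root and output directory; override before calling.
NS_ROOT=${NS_ROOT:-/grid/xrddata1/namespace}
OUT_DIR=${OUT_DIR:-.}

collect_guids() {
    full="$OUT_DIR/full_GUIDs.list"
    sizes="$OUT_DIR/GUIDs_and_size.list"
    : > "$full"
    : > "$sizes"
    # Every namespace entry is a symlink to the real data file.
    find "$NS_ROOT" -type l | sort | while read -r entry; do
        # stat -L follows the symlink, so this is the size of the data file;
        # dangling links (e.g. into a lost partition) are skipped.
        size=$(stat -L -c %s "$entry") || continue
        printf '%s\n' "$entry" >> "$full"
        printf '%s %s\n' "$(basename "$entry")" "$size" >> "$sizes"
    done
}
```

Run on the server being decommissioned, the total volume to announce can then be read with awk '{ s += $2 } END { print s }' GUIDs_and_size.list.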
Locally copying