Support
Logged in... now what?
After logging in, the user is presented with a shell on the master node of the cluster. User john will land in the directory
/local/home/john
also known as the home directory of user john.
A user who wants to run a job on the cluster typically follows these steps:
- copy the input files into the home folder, transferring them over the network from another computer
- set up and submit the SLURM script to place the job in the queue
- at the end of the calculation, transfer the files from the home folder on the cluster back to another computer
- free the space in the home folder
Important: no job is allowed to run on the master node. Jobs must be submitted to the queue manager, which handles the allocation and use of the resources required by each calculation.
A number of software packages are already available on the cluster, located in
/usr/local/share/
Alternatively, users can compile the software they need locally, in their home folder. Before doing so, check whether your favourite program is already present in the dedicated folder above, so as not to waste time and/or disk space.
NEW: soon, software, compilers, and libraries will be handled by the module service. Users will be notified as soon as the modules are ready for use.
Code development
Users working on the development of new programs will find these resources:
- GNU compiler
- MPI (openmpi)
- CUDA v7.5
Disk quota
Accounts will have 1 TB of disk quota available.
Useful commands
The operating system is Linux, and working on the cluster requires some familiarity with the command line. Here is a very short list of the most useful commands, both for general use in a Linux terminal and specifically for working on the cluster.
Basic Linux commands
This list is not intended to explain the commands, but only to collect the most useful ones.
- Directories
- pwd : prints the current working directory
- cd : change directory
- ls : list the content of a directory
- mkdir : create a directory
- rmdir : remove a directory
- Files
- cp : copy a file
- mv : move a file
- rm : delete a file
- file : provides information on the type of file
- File contents
- head : print the first lines of a file
- tail : print the last lines of a file
- more : show the content of a file
- less : same as more, but allows scrolling backwards
- Editing files
- vi : open a file editor
- emacs : another file editor
- cat : concatenate files
- paste : concatenate files by columns
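A short session illustrating some of these commands (the directory and file names are arbitrary examples):

```shell
# Create a scratch directory and move into it
mkdir demo_dir
cd demo_dir
pwd                       # prints the current directory path

# Create a small file and inspect it
printf 'line 1\nline 2\nline 3\n' > notes.txt
ls                        # lists notes.txt
head -n 1 notes.txt       # prints "line 1"
tail -n 1 notes.txt       # prints "line 3"

# Copy, rename, and clean up
cp notes.txt backup.txt
mv backup.txt notes.bak
rm notes.txt notes.bak
cd ..
rmdir demo_dir
```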
Copying files via scp
Transfers of files to and from the cluster can be handled from the terminal of the computer where the files are to be moved from or to. Assuming that an ssh tunnel has been opened on port 2000, to copy from a local machine to the cluster
scp -P 2000 local-path/file user@localhost:cluster-path/
where local-path is the path on the local computer containing the file file to be copied into the path cluster-path on the cluster. Check the Clusters section to find the IPs needed to open the tunnel.
Similarly, to transfer from the cluster to a local computer, run in a local machine terminal
scp -P 2000 user@localhost:cluster-path/file local-path/
To copy directories, just use the "-r" option of the scp command.
Alternatively, file transfers can be handled by programs with a graphical interface.
Running jobs
To submit a job, given that run.slurm is a SLURM script, just call
sbatch run.slurm
To check the status of the job in queue
squeue
To delete a job with id "ID", which is obtained from the squeue command
scancel ID
When a job is completed, the command
seff ID
prints the resources (RAM, CPU time, GPU time, etc.) actually used by job ID. This is a very important piece of information: requesting no more than the resources a job needs helps SLURM handle the queue better, which in turn lowers the waiting time of the jobs in the queue.
On the clusters, the queue management system is SLURM. Its function is to collect the resources required by a job (number of nodes, number of processors per node, walltime, etc.) and to run the job over those resources until the job completes or the resources expire (e.g., the scheduled walltime is reached).
The SLURM service, however, is orchestrated by a second service: the scheduler. The latter calculates the priority of jobs which are waiting in queue and directs SLURM to run jobs according to the acquired priority.
The priority of a job queued to SLURM is a property that monotonically increases in time, but starts from a different initial value based on the type of queue and on the past usage of that queue by the user with respect to the other users of the cluster (within a time window selected by the administrator). The C3P clusters have been set up to calculate the priority of idle jobs as the sum of three contributions:
- The first addend provides a linear increase of priority, with A an integer and t the time in minutes. In the present setup, A = 1. This contribution gives the queue a FIFO (first in, first out) character.
- The second addend is the so-called fairshare and influences both the initial value of the priority and how it increases in time. The fairshare contribution provides a long-term balancing of the usage of the cluster based on the share quota of each user. If all users possess the same quota, then over the year they will have shared the cluster resources evenly. If an account has a share quota larger than the others, then over the year it will have access (in percentage) to more resources than the other accounts. At the time a job is queued to SLURM, this contribution can be positive or negative, depending on how much share quota the account has consumed, with respect to the other accounts, in the past. The parameters are an integer multiplier, B = 53 in the present setup, the share target s0, which is the share quota that an account should reach in the long run, and the share consumed by the account, s(t). These parameters are described in detail in what follows.
- The third addend is a constant contribution to the priority. It is set to 0 for the normal production queues, to 100000 for the test queues (allowing short test jobs to immediately acquire priority over the other jobs in queue), and to -1440 for the exceptional "long" queues (i.e., a one-day penalty).
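As a rough numerical sketch, assuming the three addends simply add up as A*t + B*(s0 - s(t)) + C (an interpretation of the description above, not the documented formula; the share values s0 and s below are invented for illustration), one can compare a normal-queue job with a test-queue job:

```shell
# Hypothetical priority estimate: linear term + fairshare term + queue constant.
# A = 1 and B = 53 are taken from the text; s0 and s are invented for illustration.
priority() {
  t=$1; s=$2; C=$3
  awk -v t="$t" -v s="$s" -v C="$C" \
      'BEGIN { A = 1; B = 53; s0 = 0.25; print A * t + B * (s0 - s) + C }'
}
priority 60 0.10 0        # normal queue job, 60 minutes in queue
priority 0  0.10 100000   # test queue job, just submitted
```

Even a freshly submitted test-queue job outranks a normal-queue job that has been waiting for an hour, which is exactly the behaviour the constant addend is meant to produce.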
The share target s0 differs, in general, per account referent and per cluster. It is determined by the following quantities:
- q is the fraction contributed by the account referent, over the 5 years preceding the account request/renewal, to the purchase of computation nodes and/or software licenses, relative to the total contributions of the other account referents on the same cluster
- naccount is the number of accounts owned by the account referent
- sproject is a parameter used to respect the constraints imposed by research projects that require proof that a computational resource acquired with project funds is used by the PI for a given percentage of time over the duration of the project; it is evaluated case by case.
Finally, the share consumed by an account, s(t), is calculated as a weighted sum over the "past", discretized in N time windows. For the n-th window, t(n) is the overall time of cluster usage, while x(n) is the cluster usage of the account. The parameter a, which can range from 0 to 1, controls the weight of the time windows: if a = 0, there is no share consumption; if a = 1, all the time windows contribute evenly to the share consumption; for 0 < a < 1, the older the time window, the smaller its contribution to the share consumption.
In the present parametrization, the share consumption is calculated over N = 14 time windows of 24 h each, with a = 1.
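The effect of the weighting parameter a can be sketched numerically. The summation form used here, s = sum over n of a^n * x(n)/t(n), is an interpretation of the description above, not the documented formula, and the usage numbers are invented:

```shell
# Share consumption over N = 3 windows (hypothetical usage numbers).
# x(n): account usage in window n; t(n): total cluster usage in window n.
share() {
  awk -v a="$1" 'BEGIN {
    split("10 20 30", x); split("100 100 100", t)
    s = 0
    for (n = 1; n <= 3; n++) s += a^n * x[n] / t[n]
    print s
  }'
}
share 0     # a = 0: no share consumption
share 1     # a = 1: all windows contribute evenly
share 0.5   # 0 < a < 1: older windows weigh less
```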
SLURM setup
A job must be placed in a queue by passing to the SLURM service:
- a request collecting the resources needed by the job
- the set of commands to execute
The simplest way to proceed is to write a bash script containing all this information and pass it to the sbatch program as shown above. The structure of a SLURM script file looks like
#!/bin/bash
#SBATCH --job-name=name
#SBATCH --nodes=2
#SBATCH --ntasks=16
#SBATCH --cpus-per-task=1
#SBATCH --partition=avogadro
#SBATCH --account=avogadro
#SBATCH --time=100:00:00

commands to execute
where the values (job name, resource counts, partition, account, time limit) should be changed as appropriate.
The first row selects the shell in which the script is run; in the example, the bash shell is used.
The rows starting with #SBATCH are seen as comments by the bash shell, but are interpreted as directives by SLURM. The minimal set of SLURM options shown in the example above is
- --job-name : assigns a name to the job; this is the name shown by the squeue command (unless --output/--error are specified, SLURM writes the job's standard output and error to a file named slurm-<jobid>.out)
- --nodes : the number of nodes on which to run the (parallel) job
- --ntasks : the number of tasks (typically, MPI processes)
- --cpus-per-task : the number of CPUs assigned to each task; useful for OpenMP or hybrid MPI/OpenMP jobs
- --partition : the queue to use
- --account : the cluster to use
- --time : the maximum execution time required in hh:mm:ss format
Below these SLURM directives, the user adds the instructions to run the calculation, just as if typing them in an interactive terminal.
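Putting the pieces together, a complete script might be prepared as in the following sketch, which writes a hypothetical run.slurm for an MPI program (the executable name my_prog is a placeholder, not a real program on the cluster):

```shell
# Write a sketch of a SLURM batch script to run.slurm.
# my_prog is a placeholder executable; adjust resources and partition as needed.
cat > run.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=test-run
#SBATCH --nodes=2
#SBATCH --ntasks=16
#SBATCH --cpus-per-task=1
#SBATCH --partition=avogadro
#SBATCH --account=avogadro
#SBATCH --time=100:00:00

cd "$SLURM_SUBMIT_DIR"
mpirun -np "$SLURM_NTASKS" ./my_prog > my_prog.out
EOF
# The job would then be submitted with: sbatch run.slurm
```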
Home Folder and Storage
The home folder of each of the accounts is limited in disk space by a quota of 1 TB.
Users are warned that C3P does not offer a backup system. Important data should be downloaded and stored on a personal disk as soon as a job completes.
On the Curie, Dalton, and Pierre clusters, users will find in their home folder a link named netapp. It points to a backup storage area, which should be used if, for some reason, data cannot be removed immediately at the completion of a job.
IMPORTANT: it is recalled that the C3P facility is not intended to work as a data centre. Backup data is neither replicated nor kept in a room different from that of the clusters, so the integrity of the data over time is not guaranteed. For this reason, users are encouraged to download their data (and free space) from the cluster as soon as their jobs finish.