Support

From C3P

Latest revision as of 15:06, 7 August 2021


Logged in... now what?

After logging in, the user is presented with a shell on the master node of the cluster. User john will land in the directory

/local/home/john

also known as the home of the user john.


A user willing to run a job on the cluster will typically follow these steps

  1. copy files into the home folder by network transfer from another computer
  2. set up and submit the SLURM script to place the job in the queue
  3. at the end of the calculation, transfer the files from the cluster home to another computer
  4. free the space in the home folder

Important: no job is allowed to run on the master node. Jobs must be submitted to the queue manager, which will handle the collection and usage of the resources for the required calculation.
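As a sketch, the four steps above might look like the following session. The file names, the directory myjob, and the tunnel port 2000 are placeholders; see the sections below for details on scp tunnels and SLURM scripts.

```shell
# 1. copy the input files from the local machine to the cluster home
#    (hypothetical file names; assumes an ssh tunnel on port 2000)
scp -P 2000 input.dat user@localhost:~/myjob/

# 2. on the cluster: submit the SLURM script to place the job in queue
sbatch run.slurm

# 3. when the job has finished, transfer the results back (run locally)
scp -P 2000 user@localhost:~/myjob/output.dat ./

# 4. on the cluster: free the space in the home folder
rm -r ~/myjob
```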


A number of software packages are already available on the cluster, located in

/usr/local/share/

Alternatively, users can compile the software package they want to use locally, in their home folder. Before doing so, to avoid wasting time and/or disk space, please check whether your favourite program is already available in the dedicated folder.

NEW: software, compilers and libraries will shortly be handled with the module service. Users will be updated as soon as the modules are ready to be used.
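As a preview of what the module service will look like, the typical Environment Modules commands are shown below. The module names are placeholders, since the actual modules available on the cluster have not yet been published.

```shell
module avail              # list the software made available as modules
module load openmpi       # load a module (hypothetical name)
module list               # show the currently loaded modules
module unload openmpi     # unload a module
```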


Code development

Users working on the development of new programs will find these resources:

  • GNU compiler
  • MPI (openmpi)
  • CUDA v7.5


Disk quota

Accounts will have 1 TB of disk quota available.
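To see how much of the quota is in use, standard Linux tools can be used, for example:

```shell
# disk space currently occupied by the home folder (human-readable total)
du -sh ~/

# if user quotas are enforced at the filesystem level, this may also work
# (assumption: quota reporting may not be enabled on every cluster)
quota -s
```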


Useful commands

The operating system is Linux and working on the cluster requires a minimum ability with the command prompt. Here, a very short list of the most useful commands is provided, both for general use on a Linux terminal and specifically for working on the cluster.

Basic Linux commands

This list is not intended to explain the commands, but just to list the most useful ones.

  • Directories
    • pwd : shows the directory in which the prompt is positioned
    • cd : change directory
    • ls : list the content of a directory
    • mkdir : create a directory
    • rmdir : remove a directory
  • Files
    • cp : copy a file
    • mv : move a file
    • rm : delete a file
    • file : provides information on the type of file
  • File contents
    • head : print the first lines of a file (10 by default)
    • tail : print the last lines of a file (10 by default)
    • more : show the content of a file
    • less : same as more, but it also allows scrolling backwards
  • Editing files
    • vi : open a file editor
    • emacs : another file editor
    • cat : concatenate files
    • paste : concatenate files side by side, by columns
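A short example session combining some of these commands (the file and directory names are arbitrary):

```shell
mkdir results            # create a directory
cd results               # enter it
pwd                      # confirm the current position
cp ~/data.txt .          # copy a file into it (assumes ~/data.txt exists)
head -n 5 data.txt       # print the first 5 lines of the file
mv data.txt data.bak     # rename the file
ls                       # list the directory content
cd ..                    # go back up
rm -r results            # remove the directory and its content
```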


Copying files via scp

Transfer of files to and from the cluster can be handled from the terminal of the computer the files have to be moved from or to. Assuming that an ssh tunnel has been opened on port 2000, to copy from a local machine to the cluster

scp -P 2000 local-path/file user@localhost:cluster-path/

where local-path is the path on the local computer where the file file to be copied is located, and cluster-path is the destination path in the cluster. Check the Clusters section to find the IP's needed to open the tunnel.

Similarly, to transfer from the cluster to a local computer, run in a local machine terminal

scp -P 2000 user@localhost:cluster-path/file local-path/

To copy directories, just use the "-r" option of the scp command.
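For example, to download a whole directory from the cluster (again assuming a tunnel on port 2000; the directory name results is a placeholder):

```shell
# -r copies the directory and its whole content recursively
scp -P 2000 -r user@localhost:cluster-path/results/ local-path/
```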


Alternatively, file transfer can be handled by programs with a graphical interface.


Running jobs

To submit a job, given that run.slurm is a SLURM script, just call

sbatch run.slurm

To check the status of the job in queue

squeue

To delete a job with id "ID", which is obtained from the squeue command

scancel ID


When a job is completed, the command

seff ID

prints the resources (RAM, CPU time, GPU time, etc.) effectively used by job ID. This is a very important piece of information: requesting no more than the resources a job actually needs helps SLURM to better handle the queue, which in turn lowers the waiting time of the jobs in the queue.
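Putting these commands together, a typical job lifecycle might look like the following session (the job id 12345 is a made-up example):

```shell
sbatch run.slurm          # submit; SLURM replies with the job id
squeue -u $USER           # show only this user's jobs in queue
# ... wait for the job to complete ...
seff 12345                # report the resources actually used
# or, to abort the job while it is still queued or running:
scancel 12345
```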


Priority and Fairshare

In the clusters, the queue management system is SLURM. Its function is to collect the resources required by a job (number of nodes, number of processors per node, walltime, etc.) and to run the job over such resources until completion or until the resources expire (e.g., when the scheduled walltime is reached).

The SLURM service, however, is orchestrated by a second service: the scheduler. The latter calculates the priority of jobs which are waiting in queue and directs SLURM to run jobs according to the acquired priority.

The priority of a job queued to SLURM is a property that monotonically increases in time, but starting from a different initial value based on the type of queue and the past usage of that queue by the user with respect to the other users of the cluster (within a time window selected by the administrator). The C3P clusters have been set up to calculate the priority of idle jobs following


Formula priority.png


  • The first addend is a linear increase of priority with A an integer, and t the time in minutes. In the actual setup, A = 1. Such a contribution provides a FIFO (first in first out) character to the queue.
  • The second addend is the so-called fairshare and influences both the initial value of the priority and how it increases in time. The fairshare contribution provides a long-term balancing of the usage of the cluster based on the share quota of each of the users. If all the users possess the same quota, then over the year they will have shared the cluster resources evenly. If an account has a share quota larger than the others, then over the year it will have access (in percent) to more resources than the other accounts. At the time a job is queued to SLURM, this contribution can be positive or negative, depending on how much share quota the account has consumed, with respect to the other accounts, in the past. The parameters are an integer multiplier, B = 53 in the actual setup, the share target s0 which is the share quota that an account should reach in the long period, and the share consumed by the account, s(t). These parameters are described in detail in what follows.
  • The third addend is a constant contribution to the priority. It is set to 0 for the normal production queues, to 100000 for the test queues (allowing short, test jobs to immediately acquire priority over the other jobs in queue), and to -1440 for the "long", exceptional, queues (i.e., a 1 day penalty).
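The priority formula itself is only available as an image (Formula priority.png); from the description of its three addends, a hedged reconstruction consistent with the text, not the original image, would be

```latex
\mathrm{priority}(t)
  \;=\; \underbrace{A\,t}_{\text{FIFO}}
  \;+\; \underbrace{B\,\bigl(s_0 - s(t)\bigr)}_{\text{fairshare}}
  \;+\; \underbrace{C}_{\text{queue constant}}
```

with A = 1, B = 53, and C equal to 0, 100000, or -1440 depending on the queue.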


The share target is in general different per account referent and per cluster. It is calculated as


Formula share.png


with

  • q is the contribution provided by the account referent in the 5 years preceding the account request/renewal to buy computation nodes and/or software licenses, relative to the total contributions of the other account referents on the same cluster
  • naccount is the number of accounts owned by the account referent
  • sproject is a parameter used to respect the constraints imposed by those research projects for which it must be proven that a computational resource acquired using the project funds is used by the PI for a given percentage of time with respect to the duration of the project; it will be evaluated case by case.


Finally, the share consumed by an account, s(t), is calculated as


Formula share 2.png


where the "past" is discretized in N time windows. For the n-th window, t(n) is the overall time of cluster usage, while x(n) is the cluster usage of the account. The parameter a, which can range from 0 to 1, controls the weight of the time windows. To clarify, if a = 0 then there is no share consumption. If a = 1, then all the time windows contribute evenly to the share consumption. In the case 0 < a < 1, the older the time window, the smaller the associated share consumption. In the present parametrization, the share consumption is calculated over N = 14 time windows of 24 h each, with a = 1.
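The share-consumption formula is likewise only available as an image (Formula share 2.png). One possible form consistent with the three limiting behaviours described above (s = 0 for a = 0; even weighting of the windows for a = 1; older windows weighted less for 0 < a < 1, taking n = 1 as the most recent window) would be the following hedged reconstruction, not the original formula:

```latex
s(t) \;=\; \frac{\sum_{n=1}^{N} a^{n}\, x(n)}{\sum_{n=1}^{N} t(n)}
```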


SLURM setup

A job must be placed in a queue by passing to the SLURM service:

  1. a request collecting the resources needed by the job
  2. the set of commands to execute

The simplest way to proceed is to write a bash script with all the information, to be passed to the sbatch program as specified above. The structure of a SLURM script file looks like

#!/bin/bash
#SBATCH --job-name=name
#SBATCH --nodes=2
#SBATCH --ntasks=16
#SBATCH --cpus-per-task=1
#SBATCH --partition=avogadro
#SBATCH --account=avogadro
#SBATCH --time=100:00:00

commands to execute

where the values (job name, node and task counts, partition, account, and time) should be changed as appropriate.

The first row just selects in which type of shell the script should be run. In the example, the bash shell is used.

The rows starting with #SBATCH are seen as comments by the bash shell, but are interpreted as commands by the SLURM scheduler. The minimal set of SLURM commands shown in the example above are

  • --job-name : assigns a name to the job, which will be the name shown by the squeue command (note that, unlike PBS, SLURM writes the standard output and error by default to a single file named slurm-<jobid>.out)
  • --nodes : the number of nodes on which to run the (parallel) job
  • --ntasks : the total number of tasks (e.g., MPI processes)
  • --cpus-per-task : the number of CPU's assigned to each task; this is useful for OpenMP or hybrid MPI/OpenMP jobs
  • --partition : the queue to use
  • --account : the cluster to use
  • --time : the maximum execution time required in hh:mm:ss format

Below these SLURM commands, the user should add the instructions to run the calculation just as if he/she were typing them on an interactive terminal.
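A complete, minimal example is shown below. The job name and the program name are placeholders, and the module line assumes the module service described above (the actual module names depend on what is installed on the cluster).

```shell
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --nodes=2
#SBATCH --ntasks=16
#SBATCH --cpus-per-task=1
#SBATCH --partition=avogadro
#SBATCH --account=avogadro
#SBATCH --time=100:00:00

# go to the directory from which the script was submitted
cd "$SLURM_SUBMIT_DIR"

# hypothetical module name; adapt to what the cluster provides
module load openmpi

# run the (hypothetical) MPI program on the allocated resources
srun ./my_program input.dat > output.log
```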


Home Folder and Storage

The home folder of each of the accounts is limited in disk space by a quota of 1 TB.

Users are warned that C3P does not offer a backup system. Important data should therefore be downloaded and stored on personal disk space as soon as the job is completed.


In the Curie, Dalton, and Pierre clusters, users will find in their home folder a link to a folder named netapp. It points to a backup storage, whose use is suggested if, for some reason, the data cannot be removed immediately at the completion of a job.


IMPORTANT It is here recalled that the C3P facility is not planned to work as a data centre. This means that backup data is not replicated or conserved in a room different from that of the clusters. Thus, the integrity of the data is not assured in time. For this reason, users are encouraged to download (and free space) data from the cluster as soon as their jobs finish.
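A simple way to follow this advice is to pack the results on the cluster and pull the archive from the local machine. The directory myjob, the archive name, and the tunnel port 2000 are placeholders.

```shell
# on the cluster: pack the job directory into a single compressed archive
tar -czf myjob_results.tar.gz myjob/

# on the local machine: download the archive through the ssh tunnel
scp -P 2000 user@localhost:~/myjob_results.tar.gz ./

# back on the cluster: free the space once the copy has been verified
rm -r myjob/ myjob_results.tar.gz
```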
