Curie

== Description ==

'''Curie''' is a cluster featuring hybrid CPU/GPU nodes. It is equipped with the following hardware/software:
  
* '''3 nodes''' with 2 x CPU AMD Opteron 6238 (''12 cores''), 64 GB of RAM, 1 x 500 GB HD SATA, 3 x GPU Tesla K20Xm, Infiniband, OS CentOS 6.5.
* '''3 nodes''' with 2 x CPU Intel Xeon Gold 5218 (''16 cores''), 192 GB of RAM, 2 x 480 GB HD SATA, 2 x GPU Tesla V100 (16 GB of RAM), Infiniband, OS CentOS 6.5.
for a total of 6 nodes, 84 cores, 12 GPUs.
Note: the oldest nodes, with the Tesla M20xx GPUs, are down.
  
  
 
== Access ==
 
* Linux and Mac OS users can log in using a terminal. Windows 10 users can use PowerShell.
  
The first step is to open an SSH tunnel. From within the DiSC network this can be done as:

  ssh -L 2000:192.168.20.253:22 account@192.168.9.18 -p 7000
  
 
:or from outside the Department using
 
  
  ssh -L 2000:192.168.20.253:22 account@147.162.63.10 -p 7000
  
 
:where "account" is the user's account.
 
  
Then, in a different shell, log in as:
  ssh -p 2000 account@localhost
Through port 2000 users can also transfer files directly via the tunnel using the command scp.
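With the tunnel from the first step open, scp reaches Curie as localhost on port 2000. A minimal sketch (the filenames are hypothetical):

```shell
# With the tunnel open, Curie is reachable as localhost on port 2000.
# Copy a local file to the home directory on Curie
# (note: unlike ssh, scp takes the port with an uppercase -P):
scp -P 2000 data.tar.gz account@localhost:

# Copy a file from Curie back to the current local directory:
scp -P 2000 account@localhost:results.txt .
```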
  
  
 
== Queues ==
 
  
The queue manager is [[SLURM]] and the following queues are available:
  
* '''curie2''': max nodes 2, max walltime 336:00:00 (2 weeks), to use the Tesla K20Xm
* '''curie3''': max nodes 2, max walltime 336:00:00 (2 weeks), to use the Tesla V100
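Partition limits and node availability can be checked from the login node with the standard SLURM commands (a sketch, assuming a stock SLURM installation):

```shell
# List partitions with their time limit, node count, and node state
sinfo -o "%P %l %D %t"

# Show the jobs currently queued or running in the curie2 partition
squeue -p curie2
```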
  
  
== Example SLURM file ==
  
A typical [[SLURM]] script is as follows (see the [[Support]] page for more help):
  
  <nowiki>#</nowiki>!/bin/sh --login
  <nowiki>#</nowiki>SBATCH --job-name=''name''
  <nowiki>#</nowiki>SBATCH --ntasks=''1''
  <nowiki>#</nowiki>SBATCH --cpus-per-task=''1''
  <nowiki>#</nowiki>SBATCH --partition=''curie2''
  <nowiki>#</nowiki>SBATCH --account=''curie2''
  <nowiki>#</nowiki>SBATCH --time=''100:00:00''
 
   
 
   
 
  ''commands to execute''
 
  
 
where the parts in ''italic'' should be changed as appropriate.
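Assuming the script above is saved as "job.sh" (a hypothetical filename), it can be submitted and monitored with the standard SLURM commands; depending on how the GPUs are configured on the cluster, a GRES request line may also be needed in the script.

```shell
# Submit the job script; SLURM prints the assigned job ID
sbatch job.sh

# Check the status of your own jobs
squeue -u $USER

# Cancel a job by its numeric ID (hypothetical ID shown)
scancel 12345
```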
 

Revision as of 14:35, 7 August 2021
