Curie

From C3P

Latest revision as of 14:48, 5 February 2026


Description

Curie is a cluster featuring hybrid CPU/GPU nodes. It is equipped with the following hardware/software:

  • 3 nodes with 2 x CPU AMD Opteron 6238 (12 cores), 64 GB of RAM, 1 x 500 GB HD SATA, 3 x GPU Tesla K20Xm, Infiniband, OS CentOS 6.5.
  • 3 nodes with 2 x CPU Intel Xeon Gold 5218 (16 cores), 192 GB of RAM, 2 x 480 GB HD SATA, 2 x GPU Tesla V100 (16 GB of memory each), Infiniband, OS CentOS 6.5.

for a total of 6 nodes, 84 cores and 15 GPUs.


Note: the oldest nodes, equipped with Tesla M20xx GPUs, are out of service.
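Since the list above describes one revision of this page, it can be useful to check what SLURM itself currently reports for each node. A minimal sketch, assuming it is run on the Curie login node where the SLURM client tools are installed; the function name curie_nodes is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print one line per node with its partition,
# CPU count, memory (MB) and generic resources (e.g. GPUs).
curie_nodes() {
    if command -v sinfo >/dev/null 2>&1; then
        # %N node name, %P partition, %c CPUs, %m memory in MB, %G GRES
        sinfo -N -o "%N %P %c %m %G"
    else
        echo "sinfo not found: run this on the Curie login node" >&2
        return 1
    fi
}
```

Calling curie_nodes prints the inventory. Whether the GPUs appear in the last column depends on whether they are configured as SLURM generic resources, which is not stated on this page.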


Access

  • Linux and Mac OS users can log in from a terminal; Windows 10 users can use PowerShell.

The first step is to open an SSH tunnel. Within the DiSC network this can be done simply with:

ssh -L 2000:192.168.20.253:22 account@147.162.63.10
where "account" is the user's account.
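Instead of keeping a terminal busy with the tunnel, ssh can be sent to the background once the password has been accepted. A minimal sketch, assuming the addresses given above; ACCOUNT and the helper name curie_tunnel are hypothetical and should be adapted:

```shell
#!/bin/sh
# Hypothetical values: set ACCOUNT to your own user name.
ACCOUNT="account"
LOCAL_PORT=2000

# Open the tunnel in the background:
#   -f  go to background after authentication
#   -N  do not execute a remote command (port forwarding only)
#   ExitOnForwardFailure=yes  exit with an error if port 2000 cannot be forwarded
curie_tunnel() {
    ssh -f -N -o ExitOnForwardFailure=yes \
        -L "${LOCAL_PORT}:192.168.20.253:22" "${ACCOUNT}@147.162.63.10"
}
```

After calling curie_tunnel, the login command can be run from the same shell. The background ssh process keeps the tunnel open until it is terminated, e.g. with kill.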


Then, in a different shell, log in with:

ssh -p 2000 account@localhost
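Both steps can also be stored in the OpenSSH client configuration, so that the tunnel is opened with "ssh curie-gw" and the login is done with "ssh curie". A sketch, assuming an OpenSSH client; the aliases curie-gw and curie and the user name account are hypothetical:

```shell
#!/bin/sh
SSH_CONFIG="${SSH_CONFIG:-$HOME/.ssh/config}"
mkdir -p "$(dirname "$SSH_CONFIG")"

# Append the two host aliases only if they are not already defined.
if ! grep -q '^Host curie-gw$' "$SSH_CONFIG" 2>/dev/null; then
    cat >> "$SSH_CONFIG" <<'EOF'

Host curie-gw
    HostName 147.162.63.10
    User account
    LocalForward 2000 192.168.20.253:22

Host curie
    HostName localhost
    Port 2000
    User account
EOF
fi

# OpenSSH requires the file not to be writable by other users.
chmod 600 "$SSH_CONFIG"
```

"ssh -N curie-gw" keeps the tunnel open without starting a remote shell. Note that Host lines match the name typed on the command line, so a "Host 147.162.63.10" block such as the one in the Troubleshooting section does not apply to "ssh curie-gw"; with the alias, a KexAlgorithms line would have to go under "Host curie-gw".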


Through port 2000, users can also transfer files directly via the tunnel using the scp command.
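Unlike ssh, scp selects the port with an uppercase -P. A minimal sketch of copying in both directions while the tunnel is open; ACCOUNT and the helper names to_curie and from_curie are hypothetical:

```shell
#!/bin/sh
ACCOUNT="account"   # hypothetical: replace with your own account
PORT=2000           # local end of the tunnel

# Copy a local file or directory into the remote home directory
# (an empty remote path after ":" means the home directory).
to_curie() {
    scp -P "$PORT" -r "$1" "${ACCOUNT}@localhost:"
}

# Copy a remote path (relative to the remote home) into the current local directory.
from_curie() {
    scp -P "$PORT" -r "${ACCOUNT}@localhost:$1" .
}
```

For example, "to_curie input.dat" uploads a file and "from_curie results" downloads a directory.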


Troubleshooting

In some Linux distributions, the first attempt to open the ssh tunnel may fail with the following message:

Unable to negotiate with 147.162.63.10 port 7000: no matching key exchange method found. Their offer: diffie-hellman-group1-sha1

To solve this issue, open (or create) the file

~/.ssh/config

and add the following lines to it:

Host 147.162.63.10
      KexAlgorithms +diffie-hellman-group1-sha1

Please consult http://www.openssh.com/legacy.html for more information about this issue.
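The same entry can be added from the command line; the sketch below appends it only if a block for the gateway is not already present. It assumes the installed OpenSSH client still includes this legacy algorithm ("ssh -Q kex" lists the supported ones). For a single connection, the option can also be passed directly, e.g. "ssh -o KexAlgorithms=+diffie-hellman-group1-sha1" followed by the tunnel arguments.

```shell
#!/bin/sh
SSH_CONFIG="${SSH_CONFIG:-$HOME/.ssh/config}"
mkdir -p "$(dirname "$SSH_CONFIG")"

# Add the legacy key-exchange entry for the gateway only once.
if ! grep -q '^Host 147\.162\.63\.10$' "$SSH_CONFIG" 2>/dev/null; then
    printf '\nHost 147.162.63.10\n    KexAlgorithms +diffie-hellman-group1-sha1\n' >> "$SSH_CONFIG"
fi

# OpenSSH requires the file not to be writable by other users.
chmod 600 "$SSH_CONFIG"
```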


Queues

The workload manager is SLURM, and the following queues (SLURM partitions) are available:

  • curie2: max nodes 2, max walltime 336:00:00 (2 weeks), for jobs on the Tesla K20Xm nodes
  • curie3: max nodes 2, max walltime 336:00:00 (2 weeks), for jobs on the Tesla V100 nodes
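The limits above can be compared with what SLURM currently reports, together with the user's own jobs. A sketch, assuming it runs on the Curie login node; the function name curie_queues is hypothetical:

```shell
#!/bin/sh
curie_queues() {
    if ! command -v sinfo >/dev/null 2>&1; then
        echo "SLURM tools not found: run this on the Curie login node" >&2
        return 1
    fi
    # %P partition, %a availability, %l max walltime, %D node count, %G GRES
    sinfo -p curie2,curie3 -o "%P %a %l %D %G"
    # Jobs belonging to the current user, in any partition.
    squeue -u "$(id -un)"
}
```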


Example SLURM file

A typical SLURM script looks as follows (see the Support page for more help):

#!/bin/sh --login
#SBATCH --job-name=name
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --partition=curie2
#SBATCH --account=curie2
#SBATCH --time=100:00:00

commands to execute

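where the placeholder values (job name, number of tasks and CPUs per task, partition, account, walltime and the commands to execute) should be changed as appropriate.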
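A complete script for the V100 queue might look like the sketch below. Several parts are assumptions: the --gres=gpu:1 line presumes the GPUs are configured as SLURM generic resources on Curie, the account value simply mirrors the partition name as in the example above, and test-gpu and ./my_program are placeholders.

```shell
#!/bin/sh
# Write a job script for the curie3 (Tesla V100) queue.
cat > job.slurm <<'EOF'
#!/bin/sh --login
#SBATCH --job-name=test-gpu
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --partition=curie3
#SBATCH --account=curie3
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00

# Show which GPU(s) the job can see.
nvidia-smi
# Replace with the actual program to run.
./my_program
EOF

# Submit only where SLURM is available (i.e. on the login node).
if command -v sbatch >/dev/null 2>&1; then
    sbatch job.slurm
fi
```

After submission, "squeue -u $USER" shows the state of the job, and "scancel" followed by the job ID cancels it.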
