Curie
Revision as of 14:35, 7 August 2021
Description
Curie is a cluster featuring hybrid CPU/GPU nodes. It is equipped with the following hardware/software:
- 3 nodes with 2 x CPU AMD Opteron 6238 (12 cores), 64 GB of RAM, 1 x 500 GB SATA HD, 3 x GPU Tesla K20Xm, Infiniband, OS CentOS 6.5.
- 3 nodes with 2 x CPU Intel Xeon Gold 5218 (16 cores), 192 GB of RAM, 2 x 480 GB SATA HD, 2 x GPU Tesla V100 with 16 GB of RAM, Infiniband, OS CentOS 6.5.
for a total of 6 nodes, 84 cores, and 12 GPUs.
Note: the oldest nodes, with the Tesla M20xx GPUs, are down.
Access
- Linux and Mac OS users can log in using a terminal; Windows 10 users can use PowerShell.
The first step is to open an SSH tunnel. From within the DiSC network this can be done as:
ssh -L 2000:192.168.20.253:22 account@192.168.9.18 -p 7000
- or from outside the Department using
ssh -L 2000:192.168.20.253:22 account@147.162.63.10 -p 7000
- where "account" is the user's account.
Then, in a different shell, log in as:
ssh -p 2000 account@localhost
Through port 2000, users can also transfer files directly via the tunnel using the scp command.
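Putting the steps above together, a complete session might look like this. The addresses, ports, and "account" are the ones given above; the file name `results.dat` is a placeholder:

```shell
# Shell 1: open the tunnel (from inside the DiSC network).
# Local port 2000 is forwarded to the cluster's SSH port (22).
ssh -L 2000:192.168.20.253:22 account@192.168.9.18 -p 7000

# Shell 2: log in through the tunnel.
ssh -p 2000 account@localhost

# Shell 2 (alternative): copy a file to the cluster through the tunnel.
# Note that scp uses a capital -P for the port option.
scp -P 2000 results.dat account@localhost:
```

The tunnel must stay open in the first shell for the login and transfer commands to work.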
Queues
The queue manager is SLURM and the following queues are available:
- curie2: max nodes 2, max walltime 336:00:00 (2 weeks), to use the Tesla K20Xm
- curie3: max nodes 2, max walltime 336:00:00 (2 weeks), to use the Tesla V100
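Assuming the standard SLURM client tools are available on the login node, jobs are submitted to these queues and monitored along these lines (`job.slurm` and the job ID 12345 are placeholders):

```shell
# Submit a job script (see the example SLURM file below):
sbatch job.slurm

# Show your own pending and running jobs:
squeue -u $USER

# Cancel a job by its numeric ID:
scancel 12345
```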
Example SLURM file
A typical SLURM script is as follows (see the Support page for more help):
#SBATCH --job-name=name
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --partition=curie2
#SBATCH --account=curie2
#SBATCH --time=100:00:00
commands to execute
where the placeholder values (job name, task and CPU counts, partition, account, and time) and the commands to execute should be changed as appropriate.
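For concreteness, the fragment above assembled into a complete, submittable script might look like this. The job name, resource values, and final command are placeholders to adapt; this is a sketch, not a prescribed configuration:

```shell
#!/bin/bash
#SBATCH --job-name=myjob        # placeholder job name
#SBATCH --ntasks=1              # number of tasks
#SBATCH --cpus-per-task=1       # CPU cores per task
#SBATCH --partition=curie2      # curie2 (Tesla K20Xm) or curie3 (Tesla V100)
#SBATCH --account=curie2
#SBATCH --time=100:00:00        # walltime, within the 336:00:00 limit

# Placeholder: replace with the actual program to run.
./my_program
```

Saved as, e.g., `job.slurm`, it is submitted with `sbatch job.slurm`.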