Compiling & Linking Code
Expanse provides the GNU, Intel, AOCC (AMD), and PGI compilers along with multiple MPI implementations (OpenMPI, MVAPICH2, and IntelMPI). Most applications will achieve the best performance on Expanse using the Intel compilers with MVAPICH2, and the majority of libraries installed on Expanse have been built using this combination. Having such a diverse set of compilers available allows users to customize the software stack needed for their application. However, there can be some complexity in sorting out the module dependencies your application needs: the set of modules to load depends on the application you are running and on the compiler and libraries it requires. In many cases you will need to use the module spider command to work out which modules your application needs, and the list may change if some of the dependent software changes.
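For example, here is a minimal sketch of using module spider to discover an MPI library and its prerequisites (the version shown is simply one that appears later in this tutorial):
# list available OpenMPI versions
module spider openmpi
# show which modules must be loaded before openmpi/4.0.4
module spider openmpi/4.0.4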
Other compilers and versions can be installed by Expanse staff on request. For more information, see the [Expanse User Guide](https://www.sdsc.edu/support/user_guides/expanse.html#compiling).
Supported Compilers
Expanse CPU and GPU nodes have different compilers and libraries available.
CPU Nodes
- GNU, Intel, AOCC (AMD) compilers
- Multiple MPI implementations (OpenMPI, MVAPICH2, and IntelMPI).
- A majority of applications have been built using gcc/10.2.0, which supports AMD Rome-specific optimization flags (-march=znver2).
- The GNU, Intel, and AOCC compilers all have flags to support Advanced Vector Extensions 2 (AVX2).
Users should evaluate their application for the best compiler and library selection. Using AVX2, up to eight floating-point operations can be executed per cycle per core, potentially doubling performance relative to non-AVX2 processors running at the same clock speed. Note that AVX2 support is not enabled by default; the compiler flags must be set explicitly, as illustrated below.
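As a rough sketch (the source file name mycode.c is hypothetical, and exact flag choices may vary by compiler version), the following command lines enable AVX2 or the broader AMD Rome target with each CPU-node compiler family:
# GNU: -march=znver2 targets AMD Rome and enables AVX2
gcc -O3 -march=znver2 -o mycode mycode.c
# AOCC: clang also accepts the znver2 target
clang -O3 -march=znver2 -o mycode mycode.c
# Intel: -march=core-avx2 enables AVX2 (see also -xHOST in the Intel section below)
icc -O3 -march=core-avx2 -o mycode mycode.c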
GPU Nodes
Expanse GPU nodes have GNU, Intel, and PGI compilers available along with multiple MPI implementations (OpenMPI, IntelMPI, and MVAPICH2). The gcc/10.2.0, Intel, and PGI compilers have specific flags for the Cascade Lake architecture. Users should evaluate their application for best compiler and library selections.
Note that the login nodes are not the same as the GPU nodes; therefore, all GPU codes must be compiled by requesting an interactive session on the GPU nodes.
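For reference, here is a minimal sketch of requesting an interactive GPU session with srun; the partition, account, and resource values are placeholders to adjust for your allocation:
# request one GPU for 30 minutes (partition and account are placeholders)
srun --partition=gpu-debug --account=sds173 --nodes=1 --ntasks-per-node=10 --gpus=1 --mem=96G -t 00:30:00 --pty --wait=0 --export=ALL /bin/bash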
In this tutorial, we include several hands-on examples that cover many of these cases:
- MPI
- OpenMP
- HYBRID
- GPU
- Local scratch
AMD Optimizing C/C++ Compiler (AOCC)
The AMD Optimizing C/C++ Compiler (AOCC) is only available on CPU nodes. The AMD compilers can be loaded by executing the following command at the Linux prompt:
module load aocc
For more information on the AMD compilers run:
[flang | clang] -help
Suggested compilers to use based on programming model and language:
Language | Serial | MPI | OpenMP | MPI + OpenMP |
---|---|---|---|---|
Fortran | flang | mpif90 | flang -fopenmp | mpif90 -fopenmp |
C | clang | mpicc | clang -fopenmp | mpicc -fopenmp |
C++ | clang++ | mpicxx | clang++ -fopenmp | mpicxx -fopenmp |
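For instance, a minimal sketch of building a hybrid MPI + OpenMP code with the AOCC toolchain (hybrid_example.c is a hypothetical source file; load aocc and an AOCC-built MPI first, as in the session below):
mpicc -fopenmp -O3 -march=znver2 -o hybrid_example hybrid_example.c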
Using the AOCC Compilers
- If you have modified your environment, you can reset it by executing the module purge and module load commands at the Linux prompt, or by placing the load commands in your startup file (~/.cshrc or ~/.bashrc).
In this example, we show how to reload your environment and how to use the module swap command.
[user@login02 ~]$ module list
Currently Loaded Modules:
1) shared 2) cpu/1.0 3) DefaultModules 4) hdf5/1.10.1 5) intel/19.1.1.217
# need to change multiple modules
[user@login02 ~]$ module purge
[user@login02 ~]$ module list
No modules loaded
[user@login02 ~]$ module load slurm
[user@login02 ~]$ module load cpu
[user@login02 ~]$ module load gcc
[user@login02 ~]$ module load openmpi/4.0.4
[user@login02 ~]$ module list
Currently Loaded Modules:
1) slurm/expanse/20.02.3 2) cpu/1.0 3) gcc/10.2.0 4) openmpi/4.0.4
[user@login02 MPI]$ module swap gcc aocc
Due to MODULEPATH changes, the following have been reloaded:
1) openmpi/4.0.4
[user@login02 ~]$ module list
Currently Loaded Modules:
1) slurm/expanse/20.02.3 2) cpu/1.0 3) aocc/2.2.0 4) openmpi/4.0.4
[user@login02 ~]$
Intel Compilers
The Intel compilers and the MVAPICH2 MPI implementation will be loaded by default. The MKL and related libraries may require several modules. If you have modified your environment, you can reset it by executing commands such as those shown below at the Linux prompt, or by placing them in your startup file (~/.cshrc or ~/.bashrc). Below is the list of modules used for the DGEMM MKL example described below (as of 01/25/21):
module purge
module load slurm
module load cpu
module load gpu/0.15.4
module load intel/19.0.5.281
module load intel-mkl/2020.3.279
Recall that the list of modules to load depends on the application you are using and the compiler and libraries you need. In some cases you will need to use the module spider command to sort out which modules your application requires, and the list may change if some of the dependent software changes.
For AVX2 support, compile with the -xHOST option. Note that -xHOST alone does not enable aggressive optimization, so compilation with -O3 is also suggested. The -fast flag invokes -xHOST, but should be avoided since it also turns on interprocedural optimization (-ipo), which may cause problems in some instances.
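For example (mycode.f90 and mycode.c are hypothetical source files):
ifort -O3 -xHOST -o mycode mycode.f90
icc -O3 -xHOST -o mycode mycode.c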
Intel MKL libraries are available via the intel-mkl modules on Expanse. Once the module is loaded, the environment variable MKL_ROOT points to the location of the MKL libraries. The MKL link advisor can be used to ascertain the link line (substitute the value of MKL_ROOT appropriately).
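As a sketch only, a typical sequential (single-threaded) MKL link line for the Intel Fortran compiler looks like the following (myprog.f is a hypothetical source file); always confirm the exact line with the MKL link advisor for your MKL version:
ifort myprog.f -o myprog -L${MKL_ROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl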
In the example below, we work with a serial MKL example that can be found in the examples/MKL/dgemm folder of the GitHub repository. This example, based on an Intel MKL sample, computes the real matrix product C = alpha*A*B + beta*C using Intel(R) MKL.
- Repository contents:
[user@login01 dgemm]$ ll
total 3758
drwxr-xr-x 2 user abc123    8 Jan 29 00:45 .
drwxr-xr-x 3 user abc123    3 Jan 29 00:25 ..
-rw-r--r-- 1 user abc123 2997 Jan 29 00:25 dgemm_example.f
-rw-r--r-- 1 user abc123  618 Jan 29 00:25 dgemm-Slurm.sb
-rw-r--r-- 1 user abc123  363 Jan 29 00:32 README.txt
- Code snippets:
      PROGRAM MAIN
      IMPLICIT NONE
      DOUBLE PRECISION ALPHA, BETA
      INTEGER          M, P, N, I, J
      PARAMETER        (M=2000, P=200, N=1000)
      DOUBLE PRECISION A(M,P), B(P,N), C(M,N)
[SNIP]
      PRINT *, "Computing matrix product using Intel(R) MKL DGEMM "
      CALL DGEMM('N','N',M,N,P,ALPHA,A,M,B,P,BETA,C,M)
[SNIP]
- README.txt contents:
[user@login01 dgemm]$ cat README.txt
[1] Compile:
module purge
module load slurm
module load cpu
module load gpu/0.15.4
module load intel/19.0.5.281
module load intel-mkl/2020.3.279
ifort -o dgemm_example -mkl -static-intel dgemm_example.f
[2] Run:
sbatch dgemm-Slurm.sb
NOTE: to build with a different compiler, replace "ifort"
in the compile line with the one you want to use.
- Contents of the batch script:
[user@login01 dgemm]$ cat dgemm-Slurm.sb
#!/bin/bash
#SBATCH --job-name="dgemm_example"
#SBATCH --output="dgemm_example.%j.%N.out"
#SBATCH --partition=compute
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=128
#SBATCH --mem=248G
#SBATCH --account=sds173
#SBATCH --export=ALL
#SBATCH -t 00:30:00
# This job runs with 1 node, 128 cores per node for a total of 128 cores.
# Environment
module purge
module load cpu/0.15.4
module load gpu/0.15.4
module load intel/19.0.5.281
module load intel-mkl/2020.3.279
module load slurm
# Use srun to run the job
srun --mpi=pmi2 -n 128 --cpu-bind=rank dgemm_example
- An example of the output:
Top left corner of matrix A:
1. 2. 3. 4. 5. 6.
201. 202. 203. 204. 205. 206.
401. 402. 403. 404. 405. 406.
601. 602. 603. 604. 605. 606.
801. 802. 803. 804. 805. 806.
1001. 1002. 1003. 1004. 1005. 1006.
Top left corner of matrix B:
-1. -2. -3. -4. -5. -6.
-1001. -1002. -1003. -1004. -1005. -1006.
-2001. -2002. -2003. -2004. -2005. -2006.
-3001. -3002. -3003. -3004. -3005. -3006.
-4001. -4002. -4003. -4004. -4005. -4006.
-5001. -5002. -5003. -5004. -5005. -5006.
Top left corner of matrix C:
-2.6666E+09 -2.6666E+09 -2.6667E+09 -2.6667E+09 -2.6667E+09 -2.6667E+09
-6.6467E+09 -6.6467E+09 -6.6468E+09 -6.6468E+09 -6.6469E+09 -6.6470E+09
-1.0627E+10 -1.0627E+10 -1.0627E+10 -1.0627E+10 -1.0627E+10 -1.0627E+10
-1.4607E+10 -1.4607E+10 -1.4607E+10 -1.4607E+10 -1.4607E+10 -1.4607E+10
-1.8587E+10 -1.8587E+10 -1.8587E+10 -1.8587E+10 -1.8588E+10 -1.8588E+10
-2.2567E+10 -2.2567E+10 -2.2567E+10 -2.2567E+10 -2.2568E+10 -2.2568E+10
For more information on the Intel compilers run: [ifort | icc | icpc] -help
GNU Compilers
The GNU compilers can be loaded by executing the following commands at the Linux prompt or placing them in your startup file (~/.cshrc or ~/.bashrc):
module purge
module load slurm
module load cpu
module load gcc
module load openmpi/4.0.4
For AVX2 support, compile with -mavx2, or use -march=znver2 to target the AMD Rome architecture (which includes AVX2). The gcc/10.2.0 module loaded above supports both options.
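A brief sketch of GNU builds with AVX2 enabled (mycode.f90 and mycode.c are hypothetical source files):
gfortran -O3 -mavx2 -o mycode mycode.f90
mpicc -O3 -march=znver2 -fopenmp -o mycode mycode.c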
For more information on the GNU compilers: man [gfortran | gcc | g++]
Language | Serial | MPI | OpenMP | MPI+OpenMP |
---|---|---|---|---|
Fortran | gfortran | mpif90 | gfortran -fopenmp | mpif90 -fopenmp |
C | gcc | mpicc | gcc -fopenmp | mpicc -fopenmp |
C++ | g++ | mpicxx | g++ -fopenmp | mpicxx -fopenmp |
PGI Compilers
The PGI compilers can be loaded by executing the following commands at the Linux prompt or placing them in your startup file (~/.cshrc or ~/.bashrc):
module purge
module load pgi mvapich2
For AVX support, compile with -fast
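For example (mycode.f90 is a hypothetical source file):
pgf90 -fast -o mycode mycode.f90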
For more information on the PGI compilers: man [pgf90 | pgcc | pgCC]
Language | Serial | MPI | OpenMP | MPI+OpenMP |
---|---|---|---|---|
Fortran | pgf90 | mpif90 | pgf90 -mp | mpif90 -mp |
C | pgcc | mpicc | pgcc -mp | mpicc -mp |
C++ | pgCC | mpicxx | pgCC -mp | mpicxx -mp |