Expanse Overview

HPC for the long tail of science:

  • Designed by Dell and SDSC, delivering 5.16 peak petaflops.
  • Designed and operated on the principle that the majority of computational research is performed at modest scale: a large number of jobs that run for less than 48 hours but can be computationally intensive and generate large amounts of data (a minimal submission sketch follows this list).
  • An NSF-funded system available through the eXtreme Science and Engineering Discovery Environment (XSEDE) program (https://www.xsede.org).
  • Supports interactive computing and science gateways.
  • Will offer Composable Systems and Cloud Bursting.
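
As an illustration of the modest-scale jobs described above, the sketch below submits a short batch job through Slurm's sbatch from Python. This is a minimal sketch, not Expanse documentation: the partition name, account string, and application are placeholder assumptions.

  # Minimal sketch: submit a modest-scale Slurm batch job from Python.
  # Assumptions: sbatch is on PATH; the partition "compute" and account
  # "abc123" are placeholders, not confirmed Expanse settings.
  import subprocess
  import textwrap

  job_script = textwrap.dedent("""\
      #!/bin/bash
      #SBATCH --job-name=modest-job
      #SBATCH --partition=compute      # placeholder partition name
      #SBATCH --account=abc123         # placeholder allocation/account
      #SBATCH --nodes=1
      #SBATCH --ntasks-per-node=128
      #SBATCH --time=47:59:00          # under the typical 48-hour run length
      srun ./my_simulation             # placeholder application
      """)

  with open("modest_job.sh", "w") as f:
      f.write(job_script)

  # sbatch prints the assigned job ID on success.
  result = subprocess.run(["sbatch", "modest_job.sh"],
                          capture_output=True, text=True, check=True)
  print(result.stdout.strip())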

Expanse Heterogeneous Architecture

System Summary

  • 13 SDSC Scalable Compute Units (SSCUs)
  • 728 two-socket standard compute nodes
  • 93,184 compute cores (see the arithmetic check after this list)
  • 200 TB DDR4 memory
  • 52 four-way GPU nodes with NVLINK (208 V100 GPUs)
  • 4 large-memory nodes with 2 TB of memory each
  • HDR 100 non-blocking fabric
  • Storage:
    • 12 PB Lustre high-performance storage
    • 7 PB Ceph object storage
    • 1.2 PB on-node NVMe
  • Dell EMC PowerEdge servers
  • Direct liquid cooling
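
The headline totals in this summary follow directly from the node counts; the short Python check below reproduces them, assuming each standard node has two 64-core EPYC 7742 processors (128 cores per node), as described in the processor section later.

  # Quick consistency check of the System Summary figures above.
  standard_nodes = 728
  cores_per_node = 2 * 64        # two 64-core AMD EPYC 7742 sockets per node
  gpu_nodes = 52
  gpus_per_node = 4              # four-way V100 nodes with NVLINK

  print(standard_nodes * cores_per_node)   # 93184 -> the 93,184 compute cores
  print(gpu_nodes * gpus_per_node)         # 208   -> the 208 V100 GPUs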

Expanse Scalable Compute Unit


Expanse Connectivity Fabric


AMD EPYC 7742 Processor Architecture

  • 8 Core Complex Dies (CCDs).
  • CCDs connect to memory, I/O, and each other through the I/O Die.
  • 8 memory channels per socket.
  • DDR4 memory at 3200 MHz.
  • PCIe Gen4, up to 128 lanes of high-speed I/O.
  • Memory and I/O can be abstracted into four quadrants, each with 2 DIMM channels and 32 I/O lanes.
  • 2 Core Complexes (CCXs) per CCD
  • 4 Zen2 cores in each CCX share a 16 MB L3 cache, for a total of 16 x 16 MB = 256 MB of L3 cache (see the sketch after this list).
  • Each core includes a private 512KB L2 cache.
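
The per-socket totals implied by these bullets can be multiplied out directly; the short Python sketch below does so, using only the figures stated in this section.

  # Per-socket topology of the AMD EPYC 7742, using the figures above.
  ccds_per_socket = 8
  ccxs_per_ccd = 2
  cores_per_ccx = 4            # Zen2 cores sharing one 16 MB L3 slice
  l3_per_ccx_mb = 16
  l2_per_core_kb = 512         # private L2 cache per core

  cores_per_socket = ccds_per_socket * ccxs_per_ccd * cores_per_ccx
  l3_per_socket_mb = ccds_per_socket * ccxs_per_ccd * l3_per_ccx_mb

  print(cores_per_socket)      # 64 cores per socket (128 per two-socket node)
  print(l3_per_socket_mb)      # 256 MB of L3, i.e. 16 x 16 MB
  print(cores_per_socket * l2_per_core_kb)   # 32768 KB of private L2 per socket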

New Expanse Features

Composable Systems

Composable Systems will support complex, distributed workflows, making Expanse part of a larger CI ecosystem.

  • Bright Cluster Manager + Kubernetes (see the sketch after this list)
  • Core components developed via the NSF-funded CHASE-CI project (NSF Award #1730158) and the Pacific Research Platform (NSF Award #1541349)
  • Requests for a composable system will be part of an XRAC request
  • Advanced User Support resources are available to assist with projects; this is part of our operations funding.
    • Webinar scheduled for April 2021. See: https://www.sdsc.edu/education_and_training/training_hpc.html
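
Since composed systems are managed through Kubernetes, interacting with one should look like ordinary Kubernetes usage. The sketch below is a minimal, assumed example using the official Python client: it presumes the user already has a kubeconfig for the composed resource, and the namespace name is a placeholder, not an Expanse default.

  # Minimal sketch: inspect a Kubernetes-managed composed resource from Python.
  # Assumptions: a kubeconfig for the composed system exists at the default
  # location, and the namespace "my-workflow" is a placeholder.
  from kubernetes import client, config

  config.load_kube_config()      # read ~/.kube/config
  v1 = client.CoreV1Api()

  # List the pods that make up the composed workflow and their current state.
  pods = v1.list_namespaced_pod(namespace="my-workflow")
  for pod in pods.items:
      print(pod.metadata.name, pod.status.phase)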

Cloud Bursting

Expanse will support integration with public clouds (for example, cloud bursting to AWS):

  • Supports projects that share data, need access to novel technologies, and integrate cloud resources into workflows
  • Slurm + in-house-developed software + Terraform (HashiCorp)
  • Early work funded internally and via NSF E-CAS/Internet2 project for CIPRES (Exploring Cloud for the Acceleration of Science, Award #1904444).
  • Approach is cloud-agnostic and will support the major cloud providers.
  • Users submit directly via Slurm, or as part of a composed system.
  • Options for data movement: data resident in the cloud; remote mounting of file systems; cached file systems (e.g., StashCache); and data transfer during the job (see the staging sketch after this list).
  • Funding for users' cloud resources is not part of an Expanse award: the researcher must have access to cloud computing credits via other NSF awards and funding.
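
As a hedged illustration of the last data-movement option, the sketch below stages input data from an object store at the start of a job and pushes results back at the end, using AWS S3 via boto3. The bucket and key names are placeholders, and the snippet assumes the job already has cloud credentials available (for example, through environment variables); it is not part of the Expanse bursting software itself.

  # Minimal sketch: "data transfer during the job" against AWS S3.
  # Assumptions: boto3 is installed, AWS credentials are available to the job,
  # and the bucket/key names below are placeholders.
  import boto3

  s3 = boto3.client("s3")

  # Pull input data onto node-local storage before the computation starts.
  s3.download_file("my-project-bucket", "inputs/config.json", "config.json")

  # ... run the computation here ...

  # Push results back to the cloud before the job ends.
  s3.upload_file("results.dat", "my-project-bucket", "outputs/results.dat")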