HPC for the long tail of science:
- Designed by Dell and SDSC delivering 5.16 peak petaflops
- Designed and operated on the principle that the majority of computational research is performed at modest scale: large number jobs that run for less than 48 hours, but can be computationally intensvie and generate large amounts of data.
- An NSF-funded system available through the eXtreme Science and Engineering Discovery Environment (XSEDE) program (https://www.xsede.org).
- Supports interactive computing and science gateways.
- Will offer Composible Systems and Cloud Bursting.
System Summary
- 13 SDSC Scalable Compute Units (SSCU)
- 728 x 2s Standard Compute Nodes
- 93,184 Compute Cores
- 200 TB DDR4 Memory
- 52x 4-way GPU Nodes w/NVLINK
-
208 V100s | * 4x 2TB Large Memory Nodes |
- HDR 100 non-blocking Fabric
- 12 PB Lustre High Performance
- Storage
- 7 PB Ceph Object Storage
- 1.2 PB on-node NVMe
- Dell EMC PowerEdge
- Direct Liquid Cooled
Expanse Scaleable Compute Unit
Expanse Connectivity Fabric
AMD EPYC 7742 Processor Architecture
- 8 Core Complex Dies (CCDs).
- CCDs connect to memory, I/O, and each other through the I/O Die.
- 8 memory channels per socket.
- DDR4 memory at 3200MHz.
- PCI Gen4, up to 128 lanes of high speed I/O.
- Memory and I/O can be abstracted into separate quadrants each with 2 DIMM channels and 32 I/O lanes.
- 2 Core Complexes (CCXs) per CCD
- 4 Zen2 cores in each CCX share a16ML3 cache. Total of 16x16=256MB L3 cache.
- Each core includes a private 512KB L2 cache.
New Expanse Features
Composable Systems
Composable Systems will support complex, distributed, workflows – making Expanse part of a larger CI ecosystem.
- Bright Cluster Manager + Kubernetes
- Core components developed via NSF- funded CHASE-CI (NSF Award # 1730158), and the Pacific Research Platform (NSF Award # 1541349)
- Requests for a composable system will be part of an XRAC request
- Advanced User Support resources available to assist with projects - this is part of our operations funding.
- Webinar scheduled for April 2021. See: https://www.sdsc.edu/education_and_training/training_hpc.html
Cloud Bursting
Expanse will support integration with public clouds:
- Supports projects that share data, need access to novel technologies, and integrate cloud resources into workflows
- Slurm + in-house developed software + Terraform (Hashicorp)
- Early work funded internally and via NSF E-CAS/Internet2 project for CIPRES (Exploring Cloud for the Acceleration of Science, Award #1904444).
- Approach is cloud-agnostic and will support the major cloud providers.
- Users submit directly via Slurm, or as part of a composed system.
- Options for data movement: data in the cloud; remote mounting of file systems; cached filesystems (e.g., StashCache), and data transfer during the job.
- Funding for users cloud resources is not part of an Expanse award: the researcher must have access to cloud computing credits via other NSF awards and funding.