High Performance Computing (HPC1)

Update (2/28/2017)

The Try-Before-You-Buy program is now live!  The College has purchased two additional compute nodes that are dedicated to the use of non-members so they can evaluate the utility of HPC1 for their research work, with an eye toward becoming full members.  The process for this program is:

  • All try-before-you-buy (TBYB) participants will be grouped as if they were members of a single research group.  This means they will be given the same access as a group that had purchased two compute nodes.
  • 1 TB of shared disk space will be made available to each participant.
  • TBYB participants who are familiar with HPC environments and the SLURM queue manager may start work immediately upon being approved.  Those who are unfamiliar with the tools must participate in an on-boarding training (no more than two hours) that will be conducted once each month for all potential participants.  Participants can also use this online documentation to learn more about our HPC environment on their own; a short sketch of basic SLURM commands for getting oriented appears after this list.
  • TBYB participants will have full access to compete for resources with HPC1 members for a period of one month following successful on-boarding.  Extensions may be requested.
  • Requests for access must include a statement that the requester can reasonably be expected to purchase HPC1 nodes in the future. This program is meant to lower the adoption barriers for new full-time members of HPC1.  It is not designed to provide access to free computing resources.
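For reference, the commands below are a minimal sketch of how a newly approved participant might confirm what they can see on the cluster. These are standard SLURM commands; the exact partition and account names on HPC1 will depend on how your access was set up.

    sinfo                                       # list the partitions and nodes visible to you
    sacctmgr show associations user=$USER       # show which accounts your login is associated with
    squeue                                      # see what is currently queued and running cluster-wide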

To participate in the program, email a request to coeitss-support@ucdavis.edu with:

  1. Your contact information, along with the contact information of anyone else in your research group who will be participating with you.
  2. A short abstract from each TBYB applicant describing what research they are planning to conduct and what types of models they will use for that research.
  3. A short statement regarding the availability of funds for becoming a full member of HPC1 if your evaluation is successful.
  4. Information about your and your team’s previous experience in HPC environments.

General Information

The College of Engineering High Performance Computing Cluster (HPC1) contains 60 compute nodes and central storage, all connected by InfiniBand networking. Each node contains 64GB of RAM shared by two CPU sockets, each holding an 8-core CPU running at 2.4GHz. Central storage is managed by redundant storage servers, with 200 TB of usable storage allocated to research groups in proportion to the nodes they purchased. The storage is intended for temporary computation and is not backed up or duplicated in any way, except that it is configured as RAID6 and can therefore withstand up to two simultaneous hard drive failures.
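As a point of reference, a batch job sized to one full compute node under this configuration might look like the sketch below. The partition name and the memory figure are assumptions (some RAM must be left for the operating system), not confirmed HPC1 values.

    #!/bin/bash
    # Hypothetical request for one full HPC1 node: 2 sockets x 8 cores = 16 cores.
    #SBATCH --job-name=full-node-test
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=16      # one task per physical core
    #SBATCH --mem=60G                 # leave headroom below the 64GB of physical RAM
    #SBATCH --time=01:00:00
    #SBATCH --partition=shared        # hypothetical partition name; check sinfo

    srun hostname                     # replace with your application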

The cluster was built as a shared resource by participating College of Engineering professors with the understanding that the professors and their affiliated research groups will have complete and instantaneous access to the cluster nodes that they purchased. To illustrate, if a professor who purchased five nodes wants to run a job on those five nodes right away, any jobs currently running on them will be stopped and returned to the input queue, and the professor’s job will start immediately. If the professor needs more resources than the original purchase (say, 10 nodes), they can submit a job requesting those resources, but that job may be bumped if the owners of the other nodes require them.
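In SLURM terms, owner jobs preempt non-owner jobs on purchased nodes, and the preempted jobs are requeued. If you expect to run on nodes you do not own, a sketch along the following lines (the partition name and application are placeholders) marks the job as safe to requeue, so a preempted run returns to the queue cleanly instead of being lost.

    #!/bin/bash
    # Hypothetical job running on nodes owned by other groups, so it may be preempted.
    #SBATCH --partition=owner-overflow   # placeholder partition name
    #SBATCH --nodes=10
    #SBATCH --requeue                    # let SLURM put the job back in the queue if preempted
    #SBATCH --time=04:00:00

    # Checkpoint periodically if your application supports it, so a requeued
    # job can resume rather than restart from scratch.
    srun ./my_simulation --checkpoint-dir "$SLURM_SUBMIT_DIR/checkpoints"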

Jobs are managed by the SLURM queue manager. Access to the cluster can be granted only to the participating professors and their research groups. If you qualify, enter your access application information here and your professor will be contacted to confirm your access.

Documentation on submitting jobs and other helpful links can be found here.
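Until you reach that documentation, the basic SLURM job lifecycle looks like the sketch below; the script name and job ID are placeholders.

    sbatch run_job.sh                # submit a batch script; prints the job ID
    squeue -u $USER                  # list your pending and running jobs
    scontrol show job <jobid>        # detailed state of a specific job
    scancel <jobid>                  # cancel a job
    sacct -j <jobid> --format=JobID,State,Elapsed,MaxRSS    # accounting once it finishes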

If your research group is not part of the cluster and you would like to join, please send an email to coeitss-support@ucdavis.edu so that we can discuss access.  Minimum buy-in is to purchase a single cluster node for $5K.

The current compute node configuration is a 1U Dell PowerEdge R630 server with:

  • Two Intel E5-2630 v3 2.4GHz CPUs with eight cores (16 threads) each
  • 64GB of RDIMM RAM
  • Intel QDR InfiniBand network adapter (40 gigabit, low latency)
  • 1 gigabit Ethernet network adapter
  • One 1TB 7200RPM hard drive
  • 10 Gigabit uplink to campus network backbone
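Once you have an account, you can compare this list against what SLURM itself reports for the nodes; the format string below is just one convenient choice.

    sinfo -N -l                       # per-node summary: CPUs, memory, state
    sinfo -N -o "%N %c %m %f"         # node name, CPU count, memory (MB), features
    scontrol show node <nodename>     # full detail for a single node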

Central storage is allocated based on the number of nodes purchased by a PI/research group, at 4TB per node. For example, if 4 nodes are purchased, 16TB of storage will be allocated to the group. Storage allocations can be expanded if additional nodes are purchased later.
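No quota tool is named in this documentation, so a plain filesystem check is the simplest way to see how much of your allocation is in use. The path below is a hypothetical group directory, not the actual HPC1 mount point.

    df -h /group/<your-group>         # free space on the filesystem holding your allocation
    du -sh /group/<your-group>/*      # per-directory usage within your group's space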

Several compute nodes have the same internal configuration but are Dell PowerEdge R730 2U servers to accommodate the future use of two GPU cards.

Participating Research Groups

Biomedical Engineering
Sharon Aviran
Craig Benham
Yong Duan
Jinyi Qi
Cheemeng Tan

Chemical Engineering
Roland Faller

Civil and Environmental Engineering
Yueyue Fan
Jonathan Herman
Mike Kleeman
Bassam Younis

Computer Science
Computer Science Department
Dipak Ghosal
Yong Jae Lee

Electrical and Computer Engineering
John Owens

Mechanical and Aerospace Engineering
Roger Davis
JP Delplanque
Seongkyu Lee
