jargon

  
== HPC ==
high performance computing. Implies a higher percentage of CPU and memory usage than typical administrative computing, or implies a program too large for, or that takes too long to reasonably run on, a desktop computer. In academia, used for and implies computing for research. Related: HTC, high throughput computing, which is essentially similar but oriented to processing large data files and/or many separate instances of non-parallel computing.

== parallel program ==
A program that is either multi-tasked (like MPI) or multi-threaded (like OpenMP) or both, in order to effectively use more cores and more nodes and get more computing done. May be either shared-memory or distributed-memory. Opposite: a serial program. "Parallel" usually implies that the multiple threads or tasks communicate significantly with each other to subdivide some larger task; HTC implies that the multiple tasks are independent.
  
== big data ==
  
== node ==
a single computer in a box, functionally similar to a desktop computer but typically more powerful and packaged for rackmount in a datacenter. Usually two or four CPU sockets vs. one socket for a desktop. Implies shared memory for the programs running on the node.
  
== CPU/socket/core/thread/processor ==
  
== cluster ==
a group of nodes connected by a network and a common file system. Depending on budget and need for communication, the network may be inexpensive commodity (Gigabit Ethernet) or faster and more expensive (InfiniBand or 40Gb or 100Gb Ethernet).
  
== supercomputer ==
  
== compute node ==
a cluster node dedicated to scientific computing tasks. Usually controlled by a scheduler program. Usually isolated from the public internet by login nodes. Usually has a high-speed network for running parallel computing and delivering files over the network.
  
== head or login node ==
== GPU ==
a graphical processing unit, a specialized type of CPU derived from a graphics card. Effectively has hundreds of small cores. For certain tasks (those that can be effectively parallelized), is much faster than a general-purpose CPU. Presently available versions are "co-processors" that must be attached to a PCI bus which is connected to and controlled by a CPU. Also, sometimes, Xeon Phi, a CPU with many small cores which has the software interface of a general-purpose CPU, but is used on highly parallel codes like a GPU. Xeon Phi is available either as a co-processor or as a CPU. Some would say Xeon Phi is not a GPU since it has no graphics functions.
  
== shared memory and multi-threading ==
In software, a program that runs multiple tasks or software threads, each of which sees the same memory made available by the operating system, and shares that memory using one of the multiple shared-memory/multi-threading communication methods (OpenMP, pthreads or POSIX threads, POSIX shm, MPI over shared memory, etc.). Shared-memory programs cannot run across multiple nodes. In hardware, usually a node or a single computer system that supports shared memory access across its memory. Implies a limit (a little less than the amount of memory in the node) to the memory size of the running program.
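A minimal sketch of shared-memory multi-threading, assuming OpenMP and a C compiler that supports it; the array size and the compile command in the comments are arbitrary choices for illustration:

<code c>
/* Shared-memory sketch: all threads on one node see the same array
   and split the loop iterations among themselves.
   Compile with something like: gcc -fopenmp sum.c */
#include <stdio.h>
#include <omp.h>

int main(void) {
    enum { N = 1000 };               /* arbitrary size for the sketch */
    double data[N];
    for (int i = 0; i < N; i++)
        data[i] = 0.5 * i;

    double sum = 0.0;
    /* Each thread adds up a slice of data[]; the reduction combines
       the per-thread partial sums in shared memory. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += data[i];

    printf("up to %d threads, sum = %f\n", omp_get_max_threads(), sum);
    return 0;
}
</code>

The number of threads is typically set at run time with the OMP_NUM_THREADS environment variable, and the program cannot use more cores or memory than the single node provides.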
  
== distributed memory and multi-tasking ==
In software, a program or group of programs that run on multiple nodes or shared-memory instances and use software such as MPI to communicate between the nodes. In hardware, a cluster that runs distributed-memory programs. Distributed-memory programs are limited in memory size only by the size of the cluster that runs them, though there may be some useful limit that is smaller.
  
== MPI ==
message passing interface, a software standard used for most programs that use distributed memory. MPI was originally designed as multi-tasking over networks between separate computers. Current implementations call lower-level functions, either networking or shared memory, so it can be used transparently over either distributed or shared memory without changing the user interface. On a cluster that means it can run transparently either on one node or on multiple nodes. MPI has multiple implementations (OpenMPI, not to be confused with OpenMP; MVAPICH; several commercial) that must be used consistently to both compile and run an MPI program. Each MPI task is a separate program which can be single-threaded or further multi-threaded using a threading system such as POSIX threads or OpenMP.
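A minimal sketch of an MPI program, assuming an MPI implementation (e.g. OpenMPI or MVAPICH) and its compiler wrapper are available; the file name and task count in the comments are arbitrary:

<code c>
/* Distributed-memory sketch: each MPI task (possibly on a different node)
   computes a partial value and rank 0 collects the sum.
   Compile with mpicc hello_mpi.c, run with e.g. mpirun -np 4 ./a.out */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, ntasks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this task's id, 0..ntasks-1 */
    MPI_Comm_size(MPI_COMM_WORLD, &ntasks);  /* total number of tasks */

    long local = rank + 1, total = 0;
    /* The partial values travel over shared memory or the network,
       whichever the MPI library selects. */
    MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of 1..%d from %d tasks = %ld\n", ntasks, ntasks, total);

    MPI_Finalize();
    return 0;
}
</code>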
  
== single-threaded ==
A software program that cannot take advantage of multi-threading because it was written without multi-threading support, or because dependencies between threads make it incapable of being multi-threaded. Essentially can use only one core on one node regardless of the number of cores available, but multi-tasking can make the other cores usable. Also, multiple separate single-threaded programs can be run on a single node on multiple cores in an HTC fashion if sufficient memory is available.
  
== memory hierarchy ==
A design element used to make fast computers affordable. Memory is arranged in levels, with very small, very fast, and very expensive levels close to the CPU; each succeeding level is larger and slower. Modern computers generally have registers (very fast and of KB size), L1 to L3 or L4 cache of MB size, and main memory of GB size (just "memory" if unspecified). The system automatically handles staging data from main memory through the cache and registers, unless the programmer uses assembly language to control that staging. This makes sequential access to main memory relatively fast, as large blocks of memory can be staged through the cache while computing is ongoing, but random access to main memory is relatively slow, as the processor can idle for around 200 cycles while waiting for a single element of main memory.
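A rough sketch of why access pattern matters in the memory hierarchy; the array size, stride, and any timings it prints are arbitrary and machine-dependent:

<code c>
/* Touch the same number of array elements twice: once sequentially
   (cache lines are reused, so main memory is streamed efficiently) and
   once with a large stride (most accesses wait on main memory). */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    const long N = 1L << 24;        /* ~16M ints, larger than typical caches */
    const long STRIDE = 4096;       /* arbitrary large stride for the sketch */
    int *a = malloc(N * sizeof(int));
    for (long i = 0; i < N; i++) a[i] = 1;

    long sum = 0;
    clock_t t0 = clock();
    for (long i = 0; i < N; i++) sum += a[i];             /* sequential pass */
    clock_t t1 = clock();
    for (long s = 0; s < STRIDE; s++)                     /* strided pass, same total work */
        for (long i = s; i < N; i += STRIDE) sum += a[i];
    clock_t t2 = clock();

    printf("sequential %.2f s, strided %.2f s (sum=%ld)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, sum);
    free(a);
    return 0;
}
</code>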
  
== storage hierarchy ==
By analogy with the memory hierarchy, the practice of using multiple disk storage systems with an HPC system. Each tier of storage is larger and slower than the preceding tier. The first "scratch" tier is relatively small and fast for disk, usually composed of SSDs, and does most direct data movement to the compute nodes. The last tier may be tape or large, inexpensive disk drives and holds longer-term and larger files. Latency is the main difference between tiers. There are few automated systems to move data between storage tiers, so the movement usually has to be done by the user.
  
== scratch file system ==
  
== latency ==
delay, or the time it takes to access a minimal message over a given network. Used to characterize networks in combination with bandwidth. In HPC, latency varies from one processor clock cycle for cache memory to about 10 ms for a spinning disk drive.
  
== bandwidth ==
the amount of data that can be moved over a network per unit of time, usually quoted in bits or bytes per second.
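A back-of-the-envelope sketch of how latency and bandwidth together determine transfer time; the numbers below are hypothetical, not measurements of any particular network:

<code latex>
% Time to move a message of size S over a link with latency L and bandwidth B:
%   T \approx L + S / B
% Hypothetical values: L = 1 microsecond, B = 12.5 GB/s (about 100 Gb/s), S = 1 MB:
T \approx 1\,\mu\mathrm{s} + \frac{1\,\mathrm{MB}}{12.5\,\mathrm{GB/s}}
  \approx 1\,\mu\mathrm{s} + 80\,\mu\mathrm{s} = 81\,\mu\mathrm{s}
</code>

Latency dominates for small messages, bandwidth for large ones.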
  
== VM or virtual machine ==
  
== HPC scheduler ==
A program that maintains a list of batch jobs to be executed on a cluster, ranks them in some priority order, and executes batch jobs on compute nodes as they become available. It tries to keep the cluster from being overloaded or idle. Also the OS scheduler, a part of the kernel that runs on a shared-memory node and allows competing user programs access to CPU time, and the IO scheduler, another part of the kernel that orders multiple disk accesses for a node.
  
== parallel scaling ==
The efficiency of a parallel program, usually defined as the parallel speedup of the program divided by the number of cores occupied. Speedup is defined as the serial run time divided by the parallel run time. Usually parallel computing introduces overhead, and scaling is less than 1 or 100%. Rarely, running on multiple CPUs can make each task fit within the memory cache of each CPU, avoiding waiting for main memory access, and scaling can exceed 1. In most cases, scaling starts at 1 at 1 core (by definition) and decreases as more cores are added, until some point is reached at which adding more cores adds overhead and makes the program slower.
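A worked example with hypothetical run times (not from any real benchmark):

<code latex>
% Speedup and scaling (parallel efficiency) on p cores:
%   S(p) = T_1 / T_p, \qquad E(p) = S(p) / p
% Hypothetical timings: T_1 = 100 s serial, T_8 = 20 s on 8 cores:
S(8) = \frac{100\,\mathrm{s}}{20\,\mathrm{s}} = 5,
\qquad
E(8) = \frac{5}{8} = 0.625 = 62.5\%
</code>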