jargon

  
== HPC ==
high performance computing. Implies a higher percentage of CPU and memory usage than typical administrative computing, or implies a program too large for, or that takes too long on, a desktop computer. In academia, used for and implies computing for research. Also HTC, high throughput computing, essentially similar but oriented to processing large data files.
  
== node ==
a single computer in a box, functionally similar to a desktop computer but typically more powerful and packaged for rackmount in a datacenter. Implies shared memory for the programs running on the node.
  
== cluster ==
a group of nodes connected by a network. Depending on budget and need for communication, the network may be inexpensive commodity (Gigabit Ethernet), faster and more expensive (InfiniBand), or in between (10Gb Ethernet).
  
== supercomputer ==
  
== CPU/socket/core/thread ==
Circa 1980, one computer = one node = one CPU = one socket = one core = one processor = usually one thread. Since then computer design has become more complicated. Typical server computers used for HPC have two to eight sockets (usually two) on a mainboard or motherboard. Each socket holds a CPU ("central processing unit") chip. Each CPU has 1 to 64 cores (or processors), each of which can run an independent program. Intel computers also have 1 or 2 hardware threads or hyperthreads per core, each capable of running an independent program, but which do not add any more hardware capability to the system.
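
As a quick illustration (not part of the original glossary), a minimal C sketch that asks the operating system how many logical processors a node exposes, using the standard POSIX sysconf() call. On a hypothetical two-socket node with 16 cores per socket and 2 hyperthreads per core, it would report 64, even though only 32 of those are full hardware cores.

<code c>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Logical processors the OS reports: sockets x cores x hardware threads. */
    long online = sysconf(_SC_NPROCESSORS_ONLN);
    printf("logical processors online: %ld\n", online);
    return 0;
}
</code>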
  
== GPU ==
a graphics processing unit, a specialized type of processor derived from a graphics card. Effectively has hundreds of small cores. For certain tasks (those that can be effectively parallelized), it is much faster than a general-purpose CPU. Presently available versions must be attached to a PCI bus which is connected to and controlled by a CPU. Also Xeon Phi, a CPU with many small cores which has the software interface of a general-purpose CPU but is used for highly parallel codes like a GPU.
  
== shared memory ==
In software, a program that runs multiple tasks or software threads, each of which sees the same memory available from the operating system, and shares that memory using one of the many shared-memory/multi-threading communication methods (OpenMP, pthreads, POSIX shm, MPI over shared memory, etc.). Shared memory programs cannot run across multiple nodes. In hardware, usually a node or a single computer system that supports shared memory access across its memory. Implies a limit (a little less than the amount of memory in the node) to the memory size of the running program.
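
For illustration only (not from the original page), a minimal shared-memory sketch in C using OpenMP: all threads run inside one process on one node and see the same array, so no data has to be copied between them. Typically compiled with something like gcc -fopenmp.

<code c>
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double data[N];
    for (int i = 0; i < N; i++) data[i] = 0.5 * i;

    double sum = 0.0;
    /* Threads share one address space; the reduction combines
       each thread's partial sum at the end of the loop. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += data[i];

    printf("sum = %g using up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}
</code>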
  
== distributed memory ==
  
== single-threaded ==
A software program that cannot take advantage of multi-threading because it was written without multi-threading support. Essentially it can use only one core on one node regardless of the number of cores available. Multiple single-threaded programs can be run on a single node, on multiple cores, if sufficient memory is available.
  
== memory hierarchy ==
A design element used to make fast computers affordable. Memory is arranged in levels, with very small, very fast, and very expensive levels close to the CPU; each succeeding level is larger and slower. Most modern computers have registers (very fast and of KB size), L1 to L3 or L4 cache of MB size, and main memory of GB size ("memory" if unspecified). The hardware and compiler automatically handle staging data from main memory through the cache and into registers, unless the programmer uses assembly language to control that staging. This makes sequential access to main memory relatively fast, as large blocks of memory can be staged through the cache while computing is ongoing, but random access to main memory is relatively slow, as the processor can idle for 200 cycles while waiting for a single element of main memory.
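
A small C sketch (illustrative only, not from the original page) of why this matters: both loops below touch every element of the same array, but the sequential loop streams whole cache lines through the hierarchy, while the strided loop misses the cache on nearly every access and on most machines runs noticeably slower.

<code c>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1L << 24)        /* 16M ints, much larger than any cache */
#define STRIDE 4096         /* 16 KB jump per access */

int main(void) {
    int *a = malloc(N * sizeof(int));
    for (long i = 0; i < N; i++) a[i] = 1;

    long sum = 0;
    clock_t t0 = clock();
    for (long i = 0; i < N; i++)                 /* sequential access */
        sum += a[i];
    clock_t t1 = clock();
    for (long s = 0; s < STRIDE; s++)            /* strided access */
        for (long i = s; i < N; i += STRIDE)
            sum += a[i];
    clock_t t2 = clock();

    printf("sequential %.2fs  strided %.2fs  (sum=%ld)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, sum);
    free(a);
    return 0;
}
</code>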
  
== storage hierarchy ==
By analogy with the memory hierarchy, the practice of using multiple disk storage systems with an HPC system. Each tier of storage is larger and slower than the preceding tier. The first tier is relatively small and fast for a disk, usually composed of SSD, and does most direct data movement to the compute nodes. The last tier may be tape or large, inexpensive disk drives, and holds longer-term and larger files.
  
== scratch file system ==
A temporary file system, designed for speed rather than reliability, and the first tier in the storage hierarchy. Usually composed of SSD.
  
== SSD ==
Solid state disk, memory chips packaged with an interface that appears to the computer to be a disk drive. Faster than rotating disk drives and still more expensive, though decreasing in price over time.
  
== latency ==
  
== VM or virtual machine ==
a program running on a node that emulates a computer and connects the host computer's resources to the emulated computer. The VM runs an operating system, which runs user programs, like a physical computer. Useful for programs that do not consume a lot of CPU time; also useful for keeping user programs from exceeding memory limits and for providing a way to save the state of a user program. A single powerful computer can run a number of VMs.
  
== HPC scheduler ==
A program that maintains a list of batch jobs to be executed on a cluster, ranks them in some priority order, and executes batch jobs on compute nodes as they become available. Also OS scheduler, a program that runs on a shared-memory node and gives competing user programs access to CPU time, and I/O scheduler, another program that orders multiple disk accesses for a node.
  
== parallel program ==
A program that is either multi-task (like MPI) or multi-threaded (like OpenMP), or both, in order to effectively use more cores and more nodes and get more computing done. May be either shared-memory or distributed-memory. Opposite: a serial program.
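
As an illustrative sketch (not part of the original page), the smallest useful multi-task MPI program in C: each task is a separate process with its own private memory, may run on a different node, and coordinates with the others only through messages. Typically built with mpicc and launched with something like mpirun -np 4 ./a.out.

<code c>
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this task's id, 0..size-1 */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of tasks */

    /* Each task prints from its own address space, possibly on its own node. */
    printf("task %d of %d reporting\n", rank, size);

    MPI_Finalize();
    return 0;
}
</code>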
  
== parallel scaling ==
The efficiency of a parallel program, usually defined as the parallel speedup of the program divided by the number of cores occupied. Speedup is defined as the serial run time divided by the parallel run time. Usually parallel computing introduces overhead, and scaling is less than 1 (100%). Rarely, running on multiple CPUs can make each task fit within the memory cache of each CPU, avoiding waits for main memory access, and scaling can exceed 1. In most cases, scaling starts at 1 on 1 core (by definition) and decreases as more cores are added, until some point is reached at which adding more cores adds overhead and makes the program slower.
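
A tiny worked example (the run times here are made up purely for illustration) showing how speedup and scaling follow from the definitions above:

<code c>
#include <stdio.h>

int main(void) {
    /* Hypothetical run times, for illustration only. */
    double t_serial   = 100.0;   /* seconds on 1 core  */
    double t_parallel =  16.0;   /* seconds on 8 cores */
    int    cores      = 8;

    double speedup = t_serial / t_parallel;  /* 6.25x            */
    double scaling = speedup / cores;        /* 0.78, i.e. ~78%  */

    printf("speedup %.2fx, scaling %.0f%%\n", speedup, 100.0 * scaling);
    return 0;
}
</code>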
  