moving_data

Differences

This shows you the differences between two versions of the page.

moving_data [2020/09/21 19:52]
root old revision restored (2020/09/21 19:42)
moving_data [2021/03/10 20:15] (current)
pwolinsk
Line 1: Line 1:
 ===== Data Transfer to and from AHPCC Clusters ===== ===== Data Transfer to and from AHPCC Clusters =====
  
-A dedicated external data mover node is available, called **tgv.uark.edu** from campus and **dm.uark.edu** from the world. It should be used for moving data to and from the clusters and the Razor parallel file systems.  **tgv**/**dm** is configured with a 10Gb/s network connection and a dedicated 21TB storage system mounted at **''/local_storage''**.  Regular login shells are blocked. The allowed protocols are   +==== Small files (<100MB) ==== 
 + 
 +Three data transfer protocols are supported for moving data to and from the main storage on the AHPCC clusters:
  
    * **''scp''** (secure copy)       * **''scp''** (secure copy)   
    * **''sftp''** (secure ftp)      * **''sftp''** (secure ftp)  
    * **''rsync''**    * **''rsync''**
 +
 +In addition, the **''wget''** and **''curl''** commands are available for downloading data to your account from a public URL.
 +
  
 === Linux and MacOS === === Linux and MacOS ===
  
-To upload a data file from the current directory on your local desktop machine to your /storage directory on **razor**:+To upload a data file from the current directory on your local desktop machine to your /storage directory on **pinnacle**:
 <code> <code>
-pawel@localdesktop$# scp localfile.dat pwolinsk@tgv.uark.edu:/storage/pwolinsk/+pawel@localdesktop$# scp localfile.dat pwolinsk@pinnacle.uark.edu:/storage/pwolinsk/
 </code> </code>
-To download a data file from your /storage directory on **razor** to the current directory on your local desktop machine:+To download a data file from your /storage directory on **pinnacle** to the current directory on your local desktop machine:
 <code> <code>
-pawel@localdesktop$# scp pwolinsk@tgv.uark.edu:/storage/pwolinsk/remotefile.dat .+pawel@localdesktop$# scp pwolinsk@pinnacle.uark.edu:/storage/pwolinsk/remotefile.dat .
 </code> </code>
-You will also have a staging directory on **tgv**/**dm** called **/local_storage/$USER/**.  A new Globus Online instance on **tgv**/**dm** is in preparation, and login shells are available for special situations such as batch **''wget''** from an http server.  Please contact hpc-support@listserv.uark.edu if you need that.+ 
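One way to confirm that a copied file arrived intact is to compare checksums of the source and the copy. The block below is a minimal local sketch, not part of the AHPCC documentation: ''same_checksum'' is a hypothetical helper name, the ''cp'' call stands in for the real ''scp'' transfer, and it assumes the ''sha256sum'' utility is installed (on MacOS ''shasum -a 256'' is the usual equivalent).

```shell
#!/bin/sh
# Hypothetical helper (not an AHPCC tool): returns 0 if both files have the
# same SHA-256 digest, 1 if they differ, 2 if either file cannot be read.
same_checksum() {
    [ -f "$1" ] && [ -f "$2" ] || return 2
    # Reading from stdin makes sha256sum print "<digest>  -" for both files,
    # so the two strings are directly comparable.
    sum_a=$(sha256sum < "$1") || return 2
    sum_b=$(sha256sum < "$2") || return 2
    [ "$sum_a" = "$sum_b" ]
}

# Local demonstration: cp stands in for the scp upload or download.
workdir=$(mktemp -d)
printf 'example payload\n' > "$workdir/localfile.dat"
cp "$workdir/localfile.dat" "$workdir/remotefile.dat"

if same_checksum "$workdir/localfile.dat" "$workdir/remotefile.dat"; then
    echo "checksums match"
else
    echo "checksums differ"
fi

rm -rf "$workdir"
```

For a real transfer, the same comparison would be made between the digest computed on your desktop and the digest computed on the cluster copy after logging in there.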
  
 === Windows === === Windows ===
Line 28: Line 34:
  
 <code> <code>
-C:\Users\Pawel> c:\Users\Pawel\Downloads\pscp.exe filetoupload.txt pwolinsk@razor.uark.edu:+C:\Users\Pawel> c:\Users\Pawel\Downloads\pscp.exe filetoupload.txt pwolinsk@pinnacle.uark.edu:
 </code> </code>
  
-The code above uses secure copy protocol to upload a file "filetoupload.txt" to the home directory of user pwolinsk on razor.uark.edu.+The code above uses secure copy protocol to upload a file "filetoupload.txt" to the home directory of user pwolinsk on pinnacle.uark.edu.
  
 Another popular windows transfer client (GUI) is WinSCP: Another popular windows transfer client (GUI) is WinSCP:
Line 37: Line 43:
 https://winscp.net/eng/download.php https://winscp.net/eng/download.php
  
-<html>+**NOTE:** Pinnacle, Trestles and Razor clusters all share the same main AHPCC storage. So transferring files to any one of the 3 clusters will make those files available on all clusters.
  
-<ul> 
-<table border=0><tr><td bgcolor=#aaaaaa><b>NOTE:</b> The data mover node <b><tt>tgv.uark.edu</tt></b> can also be accessed as <b><tt>dm.uark.edu</tt></b> from outside of the UofA network.  This domain name is assigned an IP address in the network DMZ (demilitarized zone) on a 10Gb/s ethernet network and is the preferred domain name to use for UofA external transfers.</td></tr></table> 
-</html> 
  
-====Data Transfer between Razor & Trestles Clusters =====+==== Large Data Transfer (>100MB) ====
  
-A dedicated internal node named **bridge** is set aside for the purpose of moving data between storage systems of the Razor and Trestles clusters.  The **bridge** node has **//home//**, **//storage//** and **//scratch//** file systems from both Razor and Trestles nodes mounted under these directories:+GLOBUS [[https://globus.org/]] is a service for secure, reliable research data management.  It allows users to move, share, & discover data via a single interface using a web browser.  GLOBUS is designed to move large data sets.
  
-__Trestles file systems__ +scp, sftp and rsync have the advantage of being very simple and do not require any initial setup.  However, large data sets often fail to transfer correctly using these protocols.  GLOBUS does require some initial setup but is much more reliable.  It has many features, such as splitting the transfer into multiple simultaneous streams, encrypting the data in flight, automatically retransmitting data on network failures/timeouts, and verifying data integrity after transfer.  In addition, our installation of GLOBUS is connected to the **100Gb/s** network (while our cluster login nodes used for scp/sftp/rsync are on the 10Gb/s network).
-<code> +
-/trestles/home/ +
-/trestles/storage/ +
-/trestles/scratch/ +
-These are also mounted at: +
-/home +
-/storage +
-/scratch +
-There are also multiple privately owned storage areas /storage[x]. +
-</code> +
-When you log in to **bridge**, you will be located in your Trestles **//home//**.  Also please recall that the Trestles persistent **/scratch/$USER** partition is being phased out and will in the future be only **/scratch/$PBS_JOBID** for each batch job.+
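The split between the small-file and large-file sections can be expressed as a simple size check. The following is only an illustrative sketch: the function names are hypothetical, the threshold assumes 100MB means 100,000,000 bytes, and a file of exactly 100MB (which the page does not classify) is treated as large.

```shell
#!/bin/sh
# Assumption: 100MB is taken as 100,000,000 bytes.
THRESHOLD_BYTES=$((100 * 1000 * 1000))

# Hypothetical helper: print the suggested transfer route for a size in bytes.
suggest_transfer_method() {
    if [ "$1" -lt "$THRESHOLD_BYTES" ]; then
        echo "scp/sftp/rsync"
    else
        echo "globus"
    fi
}

# Hypothetical helper: same suggestion, based on the size of an existing file.
suggest_for_file() {
    [ -f "$1" ] || return 1
    size=$(wc -c < "$1")
    # $((size)) strips any padding some wc implementations add to the count.
    suggest_transfer_method "$((size))"
}

suggest_transfer_method 5000000       # prints: scp/sftp/rsync
suggest_transfer_method 2000000000    # prints: globus
```

The threshold is a rule of thumb taken from the section headings, not a hard limit enforced by the clusters.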
  
-__Razor file systems__ +The GLOBUS service moves data between GLOBUS endpoints.  Each endpoint is a server process running on a machine which can send and receive data.  One such endpoint, named **UARK-Pinnacle**, is set up on the Pinnacle cluster.  It is a //public// endpoint (visible to all GLOBUS users) and is accessible to anyone with an account on Pinnacle.  To transfer data between your account on Pinnacle and your personal workstation/laptop you will need to set up a //private// GLOBUS endpoint, which is only visible and accessible to you.  This requires the installation of GLOBUS personal connect server on your workstation/laptop.
-<code> +
-/razor/home/ +
-/razor/storage/ +
-/razor/scratch/ +
-</code>+
  
-Although **''scp''** to the bridge node is possible from either cluster, we recommend logging into **bridge** node directly and using the **''cp''** or **''mv''** commands to move files, thus using 40Gb/s Infiniband instead of **''scp''** using 1Gb/s ethernet:+=== GLOBUS personal connect server ===
  
-<code>tres-l1:pwolinsk:$ ssh bridge +To install GLOBUS personal connect server
-Last login: Fri Feb 19 14:03:31 2016 from tres-l1 +  - Log into [[https://globus.org]] by clicking on the "Log In" button at the top right of the page.  If your institution is listed in the drop-down list of Organizations, select it and log in with your organization credentials.  Otherwise, if you have a Google account or ORCID ID, use one of those, or click on "GLOBUS ID Sign in", where you will have an option to create a new GLOBUS ID.  (This step only identifies you as a user of the GLOBUS service; it is not related to your Pinnacle account.)
-No Modulefiles Currently Loaded. +  - After logging into the GLOBUS portal, click on "Endpoints" in the left-hand vertical menu.  This will take you to a list of endpoints which you have used (the list will be empty for new accounts).
-bridge:pwolinsk:$ cp /trestles/home/pwolinsk/memusage /razor/home/pwolinsk/ +  - At the top right of the page click on the "Create a personal endpoint" link, and follow the prompts to install, name and start your //private// GLOBUS endpoint on your workstation/laptop.  (Make a note of the name you used for your endpoint.)  After completing the steps, the "Endpoints" list in the GLOBUS portal will show your newly created endpoint on your workstation/laptop.  If you click on it you will be able to browse your local files.  This endpoint is only visible and available to your GLOBUS ID.
-bridge:pwolinsk:$ exit +
-logout +
-Connection to bridge closed+
-tres-l1:pwolinsk:$  +
-</code> +
-**''rsync''** is better for copying whole directories: +
-<code> +
-bridge:pwolinsk:$ cd /razor/home/pwolinsk +
-bridge:pwolinsk:/razor$ rsync -av XSEDE /home/pwolinsk/ +
-sending incremental file list +
-XSEDE/ +
-XSEDE/xsede13/ +
-...omitted every file listed with -v option..+
-XSEDE/xsede14/OpenMP_Exercise/heat.c+
  
-sent 185047 bytes  received 1258 bytes  124203.33 bytes/sec +=== Connecting to UARK-Pinnacle Endpoint === 
-total size is 180459  speedup is 0.97 +To transfer data between the //private// GLOBUS endpoint (which you just created) on your workstation/laptop and the //public// endpoint on Pinnacle we have to find and connect to both endpoints using the "File Manager" in the GLOBUS portal. 
-</code>+  - Click on "File Manager" in the vertical menu on the right in the GLOBUS portal 
 +  - Near the top of the page in the "Collection" text entry type in 'UARK-Pinnacle' As you start typing you will see a list of endpoints below which match the search string you are entering.  Eventually you should see 'UARK-Pinnacle' endpoint in the list.  Select it
 +  - You will then be asked to authenticate to use the endpoint.  Click "Continue" and you'll be redirected to an AHPCC themed login page.  Use your AHPCC account user name and password to log in
 +  - After a successful login you will see a window with a listing of your home directory on Pinnacle.
  
-On **bridge**, **''/home/username''** and **''/trestles/home/username''** are equivalent.+=== Connecting to your personal Endpoint on workstation/laptop === 
 +You are already connected to the //public// UARK-Pinnacle endpoint and logged into your account on Pinnacle.  To transfer files between your local workstation/laptop and the UARK-Pinnacle endpoint, you will also need to find and open your //private// endpoint.  If you already see both 'UARK-Pinnacle' and your private endpoint in the File Manager in the GLOBUS portal, you can drag and drop files between the windows in the File Manager, which will start the transfer of data.  If not, you will have to find and connect to your //private// endpoint on your workstation/laptop: 
 +  - At the top right of the File Manager in the GLOBUS portal, in the Panels section click on the middle icon symbolizing 2 windows side by side. 
 +  - In the empty window in the "File Manager" in the "Collection" text box enter the name of your //private// GLOBUS endpoint you created. Click on the name of your endpoint once it shows up in the list below to connect.  (**NOTE:** the GLOBUS connect personal server has to be started on your workstation/laptop to connect to it via the GLOBUS portal).
  
-**bridge** is not reachable from outside the clusters, please use **tgv**/**dm**.  A data transfer node to the Trestles file systems is planned.+With both endpoints connected you can now drag and drop files between the endpoints.
moving_data.1600717939.txt.gz · Last modified: 2020/09/21 19:52 by root