Data Transfer to and from AHPCC Clusters

Small files (<100MB)

Three data transfer protocols are supported for moving data to and from the main storage on the AHPCC clusters:

  • scp (secure copy)
  • sftp (secure ftp)
  • rsync

In addition, the wget and curl commands are available for downloading data to your account from a public URL.
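As a minimal sketch of a URL download, the helper below wraps curl and falls back to wget when curl is not found. The function name fetch is hypothetical (it is not a command installed on the clusters), and it assumes the URL ends in a plain file name with no query string.

```shell
# fetch URL [DEST_DIR]: download URL into DEST_DIR (default: current directory).
# "fetch" is a hypothetical helper name used only for this illustration.
fetch() (
    url="$1"
    dest_dir="${2:-.}"
    # Name the output file after the last path component of the URL.
    out="$dest_dir/$(basename "$url")"
    if command -v curl >/dev/null 2>&1; then
        # -L follows redirects; --fail returns an error on HTTP 4xx/5xx responses.
        curl -L --fail -o "$out" "$url"
    else
        # -O writes the download to the given file name.
        wget -O "$out" "$url"
    fi
)
```

For example, fetch https://example.org/data/sample.dat /storage/pwolinsk would save sample.dat in that directory; the example.org URL is a placeholder.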

Linux and MacOS

To upload a data file from the current directory on your local desktop machine to your /storage directory on pinnacle:

pawel@localdesktop$ scp localfile.dat pwolinsk@pinnacle.uark.edu:/storage/pwolinsk/

To download a data file from your /storage directory on pinnacle to the current directory on your local desktop machine:

pawel@localdesktop$ scp pwolinsk@pinnacle.uark.edu:/storage/pwolinsk/remotefile.dat .
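rsync is listed among the supported protocols, so a sketch of an equivalent upload with rsync may be useful. The function name sync_to_cluster and the two variables are hypothetical; the user name and host are taken from the scp examples and should be replaced with your own.

```shell
# Account details reused from the scp examples; replace with your own.
CLUSTER_USER="pwolinsk"
CLUSTER_HOST="pinnacle.uark.edu"

# sync_to_cluster PATH: copy a local file or directory to /storage/<user>/ on pinnacle.
# "sync_to_cluster" is a hypothetical helper name used only for this illustration.
sync_to_cluster() (
    src="$1"
    # -a preserves permissions and timestamps and recurses into directories,
    # -v lists each file as it is sent,
    # -P shows progress and keeps partially transferred files for a later rerun.
    rsync -avP "$src" "$CLUSTER_USER@$CLUSTER_HOST:/storage/$CLUSTER_USER/"
)
```

Running sync_to_cluster results would copy a local directory named results; because rsync skips files that are already up to date on the destination, repeating the command after an interruption only sends what is still missing.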

Windows

Older versions of Windows do not include Secure Copy or Secure FTP tools (recent Windows 10 and 11 releases ship an OpenSSH client that provides scp and sftp). Several file transfer clients are available for download. The most popular command-line client is pscp.exe, available here:

https://the.earth.li/~sgtatham/putty/latest/x86/pscp.exe

To transfer files using pscp.exe, download and save it to your Windows machine. Then open a terminal (e.g. Command Prompt, under “Start→All Programs→Accessories→Command Prompt”) and type the full path to the downloaded pscp.exe file followed by two arguments, <source> and <destination>. Each argument can specify either a file on a remote machine (user@host:pathtofile) or a local file (pathtofile). For example:

C:\Users\Pawel> c:\Users\Pawel\Downloads\pscp.exe filetoupload.txt pwolinsk@pinnacle.uark.edu:

The command above uses the secure copy protocol to upload the file “filetoupload.txt” to the home directory of user pwolinsk on pinnacle.uark.edu.

Another popular Windows transfer client, with a graphical interface, is WinSCP:

https://winscp.net/eng/download.php

NOTE: The Pinnacle, Trestles and Razor clusters all share the same main AHPCC storage, so files transferred to any one of the three clusters are available on all of them.

Large Data Transfer (>100MB)

GLOBUS (https://globus.org/) is a service for secure, reliable research data management. It allows users to move, share, and discover data through a single interface in a web browser. GLOBUS is designed to move large data sets.

scp, sftp and rsync have the advantage of being very simple and requiring no initial setup. However, large data sets often fail to transfer correctly using these protocols. GLOBUS requires some initial setup but is much more reliable. Its features include splitting a transfer into multiple simultaneous streams, encrypting the data in flight, automatically retransmitting data after network failures or timeouts, and verifying data integrity after the transfer. In addition, our GLOBUS installation is connected to the 100Gb/s network, while the cluster login nodes used for scp/sftp/rsync are on the 10Gb/s network.
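When a file has been moved with scp, sftp or rsync, a manual integrity check is one way to confirm the copy arrived intact. The sketch below compares a file's SHA-256 digest against an expected value. The function name digest_matches is hypothetical, and the sketch assumes the sha256sum command is available, which is typical on Linux but not on a default MacOS install.

```shell
# digest_matches FILE EXPECTED: succeed only if FILE's SHA-256 digest equals EXPECTED.
# "digest_matches" is a hypothetical helper name used only for this illustration.
digest_matches() (
    # sha256sum prints "<digest>  <file name>"; keep only the digest.
    actual=$(sha256sum "$1" | cut -d' ' -f1)
    # An empty digest means sha256sum failed (for example, the file is missing).
    [ -n "$actual" ] && [ "$actual" = "$2" ]
)

# Possible usage after a download (paths reused from the scp examples):
#   expected=$(ssh pwolinsk@pinnacle.uark.edu sha256sum /storage/pwolinsk/remotefile.dat | cut -d' ' -f1)
#   digest_matches remotefile.dat "$expected" && echo "transfer verified"
```

If the digests differ, the simplest remedy is to repeat the transfer and check again.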

The GLOBUS service moves data between GLOBUS Endpoints. Each Endpoint is a server process running on a machine that can send and receive data. One such endpoint, named UARK-Pinnacle, is set up on the Pinnacle cluster. It is a public endpoint (visible to all GLOBUS users) and is accessible to anyone with an account on Pinnacle. To transfer data between your account on Pinnacle and your personal workstation/laptop, you will need to set up a private GLOBUS endpoint, which is visible and accessible only to you. This requires installing the GLOBUS personal connect server on your workstation/laptop.

GLOBUS personal connect server

To install the GLOBUS personal connect server:

  1. Log in to https://globus.org by clicking the “Log In” button at the top right of the page. If your institution is listed in the drop-down list of organizations, select it and log in with your organization credentials. Otherwise, if you have a Google account or ORCID ID, use one of those, or click “GLOBUS ID Sign in”, where you will have the option to create a new GLOBUS ID. (This step only identifies you as a user of the GLOBUS service; it is not related to your Pinnacle account.)
  2. After logging in to the GLOBUS portal, click “Endpoints” in the left-hand vertical menu. This takes you to a list of the endpoints you have used (the list will be empty for new accounts).
  3. At the top right of the page, click the “Create a personal endpoint” link and follow the prompts to install, name and start your private GLOBUS endpoint on your workstation/laptop. Make a note of the name you give your endpoint. After you complete these steps, the “Endpoints” list in the GLOBUS portal will show the newly created endpoint on your workstation/laptop. If you click on it, you will be able to browse your local files. This endpoint is visible and available only to your GLOBUS ID.

Connecting to UARK-Pinnacle Endpoint

To transfer data between the private GLOBUS endpoint you just created on your workstation/laptop and the public endpoint on Pinnacle, you have to find and connect to both endpoints using the “File Manager” in the GLOBUS portal.

  1. Click “File Manager” in the vertical menu on the left in the GLOBUS portal.
  2. Near the top of the page, in the “Collection” text box, type 'UARK-Pinnacle'. As you type, a list of endpoints matching your search string appears below. Once the 'UARK-Pinnacle' endpoint appears in the list, select it.
  3. You will then be asked to authenticate to use the endpoint. Click “Continue” and you will be redirected to an AHPCC-themed login page. Use your AHPCC account user name and password to log in.
  4. After a successful login you will see a window with a listing of your home directory on Pinnacle.

Connecting to your personal Endpoint on workstation/laptop

You are now connected to the public UARK-Pinnacle Endpoint and logged into your account on Pinnacle. To transfer files between your local workstation/laptop and the UARK-Pinnacle endpoint, you also need to find and open your private endpoint. If you already see both 'UARK-Pinnacle' and your private endpoint in the File Manager in the GLOBUS portal, you can drag and drop files between the two windows, which starts the data transfer. If not, you will have to find and connect to your private endpoint on your workstation/laptop:

  1. At the top right of the File Manager in the GLOBUS portal, in the Panels section, click the middle icon, which shows two windows side by side.
  2. In the empty window of the “File Manager”, enter the name of the private GLOBUS endpoint you created in the “Collection” text box. Click the name of your endpoint once it shows up in the list below to connect. (NOTE: the GLOBUS personal connect server has to be running on your workstation/laptop before you can connect to it via the GLOBUS portal.)

With both endpoints connected you can now drag and drop files between the endpoints.

moving_data.txt · Last modified: 2021/03/10 20:15 by pwolinsk