SCP / SFTP (aka SSH)

DSS can use SSH to interact with remote servers in order to:

  • Read and write datasets
  • Read and write managed folders

DSS can use either the SCP or the SFTP protocol to interact with the remote server.

Note

You can use the DSS download recipe to cache the contents of an SCP/SFTP server.

This can improve performance if you need to read SCP/SFTP files many times and don’t mind the data being copied into a DSS managed folder.

By default, the download recipe will still check the SCP/SFTP server for updates when its output folder is rebuilt. This behavior can be disabled.
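The exact way the download recipe detects changes on the server is not described here. As a purely hypothetical illustration of what “checking for updates” can mean, the sketch below assumes each file is compared by size and modification time, and selects only the files that are new or differ from the cached copy. The names FileStat and files_to_refresh are invented for this example and are not part of DSS.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FileStat:
    """Hypothetical per-file metadata: size in bytes and modification time (epoch seconds)."""
    size: int
    mtime: int


def files_to_refresh(remote: Dict[str, FileStat], cached: Dict[str, FileStat]) -> List[str]:
    """Return remote paths that are missing from the cache or whose size/mtime changed.

    Assumption: change detection by size + modification time. This illustrates
    the idea of an update check; it is not DSS's documented algorithm.
    """
    # cached.get() returns None for files never downloaded, which never equals a FileStat
    return sorted(path for path, stat in remote.items() if cached.get(path) != stat)


remote = {"a.csv": FileStat(10, 100), "b.csv": FileStat(20, 200)}
cached = {"a.csv": FileStat(10, 100), "b.csv": FileStat(20, 150)}
print(files_to_refresh(remote, cached))  # ['b.csv']: only the modified file is re-fetched
```

Disabling the update check would, in this model, simply mean skipping the comparison and reusing whatever is already in the managed folder.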

Defining the SSH connection

Accessing remote files stored on SCP/SFTP servers first requires the definition of an SSH connection to the remote server, as follows:

  • Go to Administration > Connections
  • Click the “New connection” button and select SSH
  • Enter a name for the new connection, and the required connection parameters
  • Save the new connection

SSH connection parameters

Name                            Description
Host                            Host name or IP address of the SSH server to access. Mandatory.
User                            SSH username to use. Mandatory.
Use public key authentication   Checked: use public-key authentication. Unchecked: use password authentication.
Password                        SSH password to use. Mandatory if using password authentication.
Key passphrase                  In public-key authentication mode, optional passphrase used to decrypt the SSH private key.
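The rules in the table above (Host and User always required, Password required only for password authentication, Key passphrase optional) can be summarized as a small validation function. This is a minimal sketch: the SSHConnectionParams class and validate function are hypothetical names that mirror the form fields, not a DSS API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SSHConnectionParams:
    """Hypothetical mirror of the SSH connection form fields."""
    host: str
    user: str
    use_public_key: bool = False
    password: Optional[str] = None
    key_passphrase: Optional[str] = None  # optional; only used with public-key auth


def validate(params: SSHConnectionParams) -> List[str]:
    """Return a list of problems; an empty list means the mandatory fields are filled."""
    errors = []
    if not params.host:
        errors.append("Host is mandatory")
    if not params.user:
        errors.append("User is mandatory")
    # Password is only required when public-key authentication is unchecked
    if not params.use_public_key and not params.password:
        errors.append("Password is mandatory with password authentication")
    return errors


print(validate(SSHConnectionParams("sftp.example.com", "dss", use_public_key=True)))  # []
```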

When using public-key authentication, DSS attempts to connect to the remote server with either of the two standard SSH private keys of the DSS Linux user, stored in $HOME/.ssh/id_dsa and $HOME/.ssh/id_rsa respectively, where $HOME is the home directory of the DSS user account.
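Before enabling public-key authentication, it can help to check which of the two standard key files actually exist for the DSS user. The sketch below is a hypothetical helper (candidate_private_keys is not a DSS function); the order in which it lists the files is illustrative and not necessarily the order DSS tries them. Run it as the DSS user, or pass that user's home directory explicitly.

```python
import os
from pathlib import Path
from typing import List, Optional

# The two standard private key files named above, relative to $HOME
STANDARD_KEYS = (".ssh/id_dsa", ".ssh/id_rsa")


def candidate_private_keys(home: Optional[str] = None) -> List[Path]:
    """Return the standard private key files that exist under the given home directory.

    Defaults to the home directory of the user running this code.
    """
    base = Path(home) if home is not None else Path(os.path.expanduser("~"))
    return [base / rel for rel in STANDARD_KEYS if (base / rel).is_file()]


print(candidate_private_keys())  # e.g. [PosixPath('/home/dataiku/.ssh/id_rsa')] if only id_rsa exists
```

An empty list means neither key file is present, so public-key authentication cannot succeed until one is created and its public part is authorized on the remote server.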

Creating SCP or SFTP datasets

  • From the “Datasets” screen of Data Science Studio, click the “New dataset” button and select “SCP” or “SFTP”
  • Select the connection to use
  • Click “Browse” to locate your files