FTP

Data Science Studio can both read and write datasets directly on FTP servers.

Note

DSS also provides a cached FTP dataset, which can perform better if you only need to read an FTP file and do not mind DSS keeping a local copy of the data.

A cached FTP dataset will still check the FTP server for updates when it is rebuilt.

Defining the remote FTP connection

Accessing remote files stored on FTP servers first requires the definition of a connection to the remote server, as follows:

  • Go to Administration > Connections
  • Click the “New connection” button and select the FTP connection
  • Enter a name for the new connection, and the required connection parameters
  • Save the new connection

FTP connection parameters

| Name | Description |
|---|---|
| Host | Host name or IP address of the FTP server to access (mandatory). |
| User | FTP username to use, or empty for an anonymous FTP connection. |
| Password | FTP password to use, or empty for an anonymous FTP connection. |
| Use passive mode | Check to use FTP “passive” data transfer mode (default). Passive mode is often mandatory when there is a firewall between the Data Science Studio server and the FTP server. |
| Path | Path to the remote folder to use once connected to the FTP server. Start with a / to specify an absolute path; without a leading /, the path is relative to the startup directory. |
| Writable | Check to allow DSS to write datasets on this server. Each dataset is written in a subfolder of the path (or of the startup directory if the path is empty) bearing the name of the dataset. |
| Allow managed | Check to allow DSS to write managed datasets on this server. Requires Writable. |
| Use global proxy | When checked, use the global proxy for this connection. Uncheck this if the FTP server is directly accessible. If you have an HTTP proxy, passive mode is mandatory. |
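
To make the role of each parameter concrete, here is a minimal sketch of an equivalent connection made with Python's standard ftplib module. It is an illustration only and does not describe how DSS connects internally. The names FtpConnectionParams, login_args and open_connection are hypothetical, and the proxy setting is not covered.

```python
from dataclasses import dataclass
from ftplib import FTP


@dataclass
class FtpConnectionParams:
    """Hypothetical container mirroring the connection parameters above."""
    host: str             # mandatory
    user: str = ""        # empty means an anonymous connection
    password: str = ""    # empty means an anonymous connection
    passive: bool = True  # passive mode is the default
    path: str = ""        # remote folder; empty means the startup directory


def login_args(params):
    """Return the (user, password) pair to pass to ftplib's login()."""
    if not params.user:
        # ftplib fills in a placeholder password for the anonymous user.
        return ("anonymous", "")
    return (params.user, params.password)


def open_connection(params, timeout=30):
    """Connect, log in, select the transfer mode and enter the configured path."""
    ftp = FTP()
    ftp.connect(params.host, timeout=timeout)
    ftp.login(*login_args(params))
    ftp.set_pasv(params.passive)
    if params.path:
        # A leading "/" is absolute; otherwise the server resolves the path
        # relative to the directory the session starts in.
        ftp.cwd(params.path)
    return ftp
```

Calling open_connection(FtpConnectionParams(host="ftp.example.com")) would attempt an anonymous, passive-mode session against a placeholder host; replace it with a server you are allowed to reach.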

Defining an FTP dataset

After setting up an FTP connection, add a new dataset to your project, choosing the “Uncached FTP” type, then select your FTP connection.

If necessary, specify a path (interpreted as a subpath of the connection’s path when that path is not empty), or click “Browse” and select a file or directory.
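
The way the connection path and the dataset path combine can be sketched as follows. This is an assumption-laden illustration, not DSS code: resolve_dataset_path is a hypothetical helper, the startup directory /home/ftpuser is a placeholder, and treating a leading / in the dataset path as still relative to the connection path is an assumption.

```python
import posixpath


def resolve_dataset_path(connection_path, dataset_subpath, startup_dir="/home/ftpuser"):
    """Combine the startup directory, connection path and dataset subpath."""
    # posixpath.join discards startup_dir when connection_path is absolute,
    # which matches "start with a / to specify an absolute path".
    base = posixpath.join(startup_dir, connection_path) if connection_path else startup_dir
    # Assumption: the dataset path is always taken relative to the base.
    sub = dataset_subpath.lstrip("/")
    return posixpath.normpath(posixpath.join(base, sub))


print(resolve_dataset_path("/data", "sales/2020.csv"))
print(resolve_dataset_path("incoming", "a.csv"))
```

With the placeholder startup directory, the two calls print /data/sales/2020.csv and /home/ftpuser/incoming/a.csv.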

If the final path is a directory, the dataset is the union of the data in all files in that directory (including sub-directories). The displayed sample only shows data from the first non-empty file.
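
The directory behavior can be pictured with a small sketch that works on a local folder standing in for the remote one. It is not the DSS implementation: the function names are hypothetical, rows are simply text lines, and the sorted file order is an assumption, since the order in which DSS visits files is not stated here.

```python
import os


def list_data_files(root):
    """Return every file under root, sub-directories included, in sorted order."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)


def read_union(root):
    """Concatenate the lines of all files under root."""
    rows = []
    for path in list_data_files(root):
        with open(path, encoding="utf-8") as handle:
            rows.extend(line.rstrip("\n") for line in handle)
    return rows


def sample_source(root):
    """Return the first non-empty file, the one a sample would be drawn from."""
    for path in list_data_files(root):
        if os.path.getsize(path) > 0:
            return path
    return None
```

In this sketch an empty file contributes nothing to the union and is skipped when choosing the sample source.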

Using the FTP dataset for writing

Two cases are supported:

  1. In a folder:
    • the data may be written to several files
    • the content of the folder is wiped before writing
    • writing a managed dataset requires a directory
  2. In a file:
    • you must create the file beforehand (it may be empty)
    • the file is emptied before writing
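
The two write modes can be summarized with a sketch on the local filesystem, used here as a stand-in for the FTP server. The helpers write_to_folder and write_to_file and the part-N.csv file names are hypothetical; how DSS actually names and splits its output files is not specified here.

```python
import os
import shutil


def write_to_folder(folder, parts):
    """Wipe the folder's content, then write one file per part."""
    os.makedirs(folder, exist_ok=True)
    for entry in os.listdir(folder):
        target = os.path.join(folder, entry)
        if os.path.isdir(target) and not os.path.islink(target):
            shutil.rmtree(target)
        else:
            os.remove(target)
    for index, lines in enumerate(parts):
        name = os.path.join(folder, f"part-{index}.csv")  # hypothetical naming
        with open(name, "w", encoding="utf-8") as out:
            out.writelines(line + "\n" for line in lines)


def write_to_file(path, lines):
    """Overwrite an existing file; refuse to create a missing one."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"create {path} before writing (it may be empty)")
    # Opening in "w" mode truncates the file before the new data is written.
    with open(path, "w", encoding="utf-8") as out:
        out.writelines(line + "\n" for line in lines)
```

Either way, previous content does not survive a write: the folder is cleared and the file is truncated, so keep a copy elsewhere if the old data matters.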