Data Science Studio can both read and write datasets directly on FTP servers.
DSS also features a cached FTP dataset, which can provide better performance if you only need to read an FTP file and do not mind a local copy of the data being made.
A cached FTP dataset will still check the FTP server for updates when it is rebuilt.
Defining the remote FTP connection
Accessing remote files stored on FTP servers first requires the definition of a connection to the remote server, as follows:
- Go to Administration > Connections
- Click the “New connection” button and select the FTP connection
- Enter a name for the new connection, and the required connection parameters
- Save the new connection
FTP connection parameters
|Host||Host name or IP address of the FTP server to access (Mandatory)|
|User||FTP username to use, or empty for an anonymous FTP connection|
|Password||FTP password to use, or empty for an anonymous FTP connection|
|Use passive mode||Check to use FTP “passive” data transfer mode (default). Using FTP passive mode is often mandatory when there is a firewall between the Data Science Studio server and the FTP server.|
|Path||Path to the remote folder to use once connected to the FTP server. Start with a|
|Writable||Check to allow DSS to write datasets on this server. Those datasets will be written in a subfolder or the|
|Allow managed||Check to allow DSS to write managed datasets on this server.|
|Use global proxy||When checked, use the global proxy for this connection. Uncheck this if the FTP server is directly accessible. If you have an HTTP proxy, passive mode is mandatory.|
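To make the role of each parameter concrete, here is a minimal sketch that uses Python's standard `ftplib` module to open a comparable connection outside of DSS. It is not how DSS itself connects: the `FtpConnectionParams` class and the `list_remote_folder` function are hypothetical names introduced for illustration, and the sketch does not cover the proxy option.

```python
from dataclasses import dataclass
from ftplib import FTP
from typing import List


@dataclass
class FtpConnectionParams:
    """Hypothetical container mirroring the connection parameters above."""
    host: str              # Host (mandatory)
    user: str = ""         # User; empty for an anonymous connection
    password: str = ""     # Password; empty for an anonymous connection
    passive: bool = True   # Use passive mode (checked by default)
    path: str = ""         # Remote folder to use once connected


def list_remote_folder(params: FtpConnectionParams, timeout: float = 30.0) -> List[str]:
    """Connect with the given parameters and list the configured folder."""
    with FTP(params.host, timeout=timeout) as ftp:
        # ftplib falls back to the "anonymous" user when the user name is empty.
        ftp.login(user=params.user, passwd=params.password)
        # Passive mode lets the client open the data connection itself,
        # which is usually what firewalls require.
        ftp.set_pasv(params.passive)
        if params.path:
            ftp.cwd(params.path)
        return ftp.nlst()
```

Calling `list_remote_folder(FtpConnectionParams(host="ftp.example.com", path="/data"))` against a reachable server (the host name here is a placeholder) is a quick way to check the host, credentials, and path before entering them in the connection form.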
Defining an FTP dataset
After setting up an FTP connection, add a new dataset to your project, choosing the “Uncached FTP” type. Select your FTP connection.
If necessary, specify a path (subpath of the connection’s path if it is not empty) or click “Browse” and select a file or directory.
If the final path is a directory, the data is the union of the data in all the files in that directory (including sub-directories). The sample displayed only presents data from the first non-empty file.
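The directory behaviour can be illustrated with a small in-memory model. This is only a sketch: the remote server is stood in for by a dictionary mapping file paths to rows, the function names are hypothetical, and visiting files in sorted path order is an assumption, since the order DSS uses is not stated here.

```python
from typing import Dict, List, Optional


def union_of_files(files: Dict[str, List[str]], directory: str) -> List[str]:
    """Return the rows of every file under `directory`, sub-directories included."""
    prefix = directory.rstrip("/") + "/"
    rows: List[str] = []
    for path in sorted(files):  # assumption: files are visited in sorted order
        if path.startswith(prefix):
            rows.extend(files[path])
    return rows


def first_non_empty_file(files: Dict[str, List[str]], directory: str) -> Optional[str]:
    """Return the path of the first file under `directory` that has rows, if any."""
    prefix = directory.rstrip("/") + "/"
    for path in sorted(files):
        if path.startswith(prefix) and files[path]:
            return path
    return None


remote = {
    "/data/a.csv": [],              # empty file: skipped when sampling
    "/data/b.csv": ["1", "2"],
    "/data/sub/c.csv": ["3"],       # sub-directories are part of the union
    "/other/d.csv": ["9"],          # outside the dataset path: ignored
}
dataset_rows = union_of_files(remote, "/data")
sample_source = first_non_empty_file(remote, "/data")
```

In this model the dataset contains the rows of `b.csv` and `sub/c.csv`, while the sample would be drawn from `b.csv` only.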
Using the FTP dataset for writing
Two cases are supported:
- In a folder:
  - the data will be written in possibly multiple files
  - the content of the folder is wiped before writing
  - writing a managed dataset requires a directory
- In a file:
  - you must create the file beforehand (it may be empty)
  - the file is emptied before writing
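The two write modes can be sketched on a local filesystem, used here as a stand-in for the FTP server. This is an analogy under stated assumptions, not DSS's implementation: the `write_dataset` function and the `part-N.csv` file names are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def write_dataset(target: Path, chunks: Iterable[List[str]]) -> List[Path]:
    """Write chunks of rows to a folder (one file per chunk) or to a single file."""
    if target.is_dir():
        # Folder case: the existing content is wiped before writing.
        for child in target.iterdir():
            if child.is_dir():
                shutil.rmtree(child)
            else:
                child.unlink()
        written = []
        for index, rows in enumerate(chunks):
            part = target / f"part-{index}.csv"  # hypothetical naming scheme
            part.write_text("".join(row + "\n" for row in rows))
            written.append(part)
        return written
    # File case: the file must have been created beforehand (it may be empty).
    if not target.is_file():
        raise FileNotFoundError(f"create {target} before writing to it")
    # write_text truncates the file, so its previous content is discarded.
    target.write_text("".join(row + "\n" for rows in chunks for row in rows))
    return [target]
```

In both cases the previous content does not survive a write, so a path that holds data you want to keep should not be used as an output target.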