Uncached FTP datasets
Data Science Studio can both read and write datasets directly on FTP servers. If you only need to read an FTP file, consider using a locally cached Remote FTP dataset instead, for better performance and reliability.
A locally cached dataset still checks the FTP server for updates when it is rebuilt.
Define a live-from-FTP input dataset
After setting up an FTP connection, add a new dataset to your project, choose the “Uncached FTP” type, and select your FTP connection.
If necessary, specify a path (relative to the connection’s path, if the connection defines one), or click “Browse” and select a file or directory.
If the final path is a directory, the dataset’s data is the union of the data in all files in that directory, including its sub-directories. The displayed sample only shows data from the first non-empty file.
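To make the directory rule concrete, here is a minimal sketch using Python’s standard `ftplib`. It is not how Data Science Studio implements this. It assumes the server supports the `MLSD` command, visits entries in name order (the order DSS uses to decide which file is “first” is not specified here), and treats each file as raw text lines, ignoring format settings such as CSV header rows. The function names are hypothetical.

```python
import ftplib
import posixpath
from typing import Iterator, List, Optional, Tuple


def iter_files(ftp: ftplib.FTP, path: str) -> Iterator[str]:
    """Yield every regular file under `path`, descending into sub-directories.

    Entries are visited in name order (an assumption, not documented DSS behavior).
    """
    entries = sorted(ftp.mlsd(path, facts=["type"]), key=lambda entry: entry[0])
    for name, facts in entries:
        kind = facts.get("type")
        if kind == "dir":
            yield from iter_files(ftp, posixpath.join(path, name))
        elif kind == "file":
            yield posixpath.join(path, name)
        # Other entry types ("cdir" for the directory itself, "pdir" for its parent,
        # or server-specific types) are skipped.


def read_lines(ftp: ftplib.FTP, path: str) -> List[str]:
    """Download one file as text lines; retrlines strips the line endings."""
    lines: List[str] = []
    ftp.retrlines(f"RETR {path}", lines.append)
    return lines


def read_directory_union(
    ftp: ftplib.FTP, path: str
) -> Tuple[List[str], Optional[List[str]]]:
    """Return (lines from all files under `path`, lines of the first non-empty file)."""
    union: List[str] = []
    sample: Optional[List[str]] = None
    for file_path in iter_files(ftp, path):
        lines = read_lines(ftp, file_path)
        if lines and sample is None:
            sample = lines  # mirrors "sample from the first non-empty file"
        union.extend(lines)
    return union, sample
```

Against a real server, you would first open a session with `ftplib.FTP(host)` and call `login(...)`, then pass that object and the directory path to `read_directory_union`.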
Define an output dataset
Two cases are supported:
- In a folder:
  - the data may be written across multiple files
  - the content of the folder is wiped before writing
  - writing a managed dataset requires a directory
- In a file:
  - you must create the file beforehand (it may be empty)
  - the file is emptied before writing
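The two write modes can be pictured with the following sketch. It uses the local filesystem through `pathlib` as a stand-in for the FTP server, so it only illustrates the rules listed above: wipe the folder, then write one or more files; or require an existing file, then truncate and rewrite it. The function names and the `part-00000` file naming are hypothetical, not what DSS produces.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def write_to_folder(folder: Path, chunks: Iterable[bytes]) -> List[Path]:
    """Folder mode: wipe the folder's contents, then write one file per chunk."""
    folder.mkdir(parents=True, exist_ok=True)  # assumption: create the folder if missing
    for child in folder.iterdir():
        if child.is_dir():
            shutil.rmtree(child)
        else:
            child.unlink()
    written: List[Path] = []
    for index, chunk in enumerate(chunks):
        part = folder / f"part-{index:05d}"  # hypothetical naming scheme
        part.write_bytes(chunk)
        written.append(part)
    return written


def write_to_file(target: Path, data: bytes) -> None:
    """File mode: the file must already exist (it may be empty); it is emptied, then rewritten."""
    if not target.is_file():
        raise FileNotFoundError(f"{target} must be created before it is used as an output")
    target.write_bytes(data)  # opening in "wb" mode truncates the previous content
```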
By default, the output format is a single gzip-compressed, Unix-style CSV file; you can change this.
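As a rough illustration of that default, the following sketch produces gzip-compressed CSV bytes with Unix (LF) line endings using Python’s `csv` and `gzip` modules. It assumes “Unix-style” refers to LF-terminated lines; the exact quoting and escaping rules DSS applies are not described here and may differ from the `csv` module defaults (minimal quoting, doubled quote characters). `to_gzip_csv` is a hypothetical name.

```python
import csv
import gzip
import io
from typing import Any, Iterable, Sequence


def to_gzip_csv(rows: Iterable[Sequence[Any]]) -> bytes:
    """Serialize rows as LF-terminated CSV and gzip-compress the result."""
    buffer = io.BytesIO()
    # newline="" disables newline translation, so "\n" is written as-is on every platform.
    with gzip.open(buffer, "wt", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerows(rows)
    # Closing the gzip stream does not close the BytesIO we passed in.
    return buffer.getvalue()
```

For example, `gzip.decompress(to_gzip_csv([["id", "name"], [1, "a,b"]]))` yields `b'id,name\n1,"a,b"\n'`: the field containing a comma is quoted, and every line ends with a single LF.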
There are two ways to create an output dataset:
- Create a managed dataset when needed:
  When creating a recipe, choose “Create a new dataset”, specify a name, and
  store it in the desired FTP connection. This creates a managed dataset in a directory
  (under the connection’s path) named after the dataset. That FTP connection must have
  both … and “Allow managed datasets” options checked.
- Create a dataset, then use it as output:
  Create a new “Uncached FTP” dataset (same as for input). When you need an
  output dataset, select “Use an existing dataset” and pick your dataset.
  The FTP connection must have the