Azure Blob Storage
DSS can interact with Azure Blob Storage to:
Read and write datasets
Read and write managed folders
Azure Blob Storage is an object storage service: you create “containers” that can store arbitrary binary content and textual metadata under a specific key (the blob name), unique within the container.
While not technically a hierarchical file system with folders, sub-folders and files, that behavior can be emulated by using keys containing /. For instance, you can store your daily logs using keys like 2015/01/24/app.log.
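To make the key-as-path idea concrete, here is a minimal pure-Python sketch of how a flat set of keys can be presented as folders by splitting on /, similar in spirit to a prefix-plus-delimiter listing. It is not DSS or Azure SDK code; `list_children` and the sample keys are hypothetical.

```python
def list_children(keys, prefix=""):
    """Hypothetical helper: emulate a directory listing over flat keys.

    Returns (files, folders) found directly under `prefix`, treating "/"
    as the folder separator.
    """
    files, folders = set(), set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition("/")
        if sep:
            # Key continues below this level: report its first segment as a folder
            folders.add(prefix + head + "/")
        else:
            # No further "/": the key is a "file" at this level
            files.add(key)
    return sorted(files), sorted(folders)


keys = ["2015/01/24/app.log", "2015/01/25/app.log", "2015/02/01/app.log", "README.txt"]
print(list_children(keys))           # (['README.txt'], ['2015/'])
print(list_children(keys, "2015/"))  # ([], ['2015/01/', '2015/02/'])
```

Nothing is stored for the folders themselves: they exist only because some key shares the prefix.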
DSS uses the same filesystem-like mechanism when accessing Azure: when you specify a container, you can browse it to quickly find your data, or you can set the prefix in which DSS may output datasets. Datasets on Azure must therefore be in one of the supported filesystem formats.
Note
Azure Blob as a filesystem-like storage comes with a few limitations:
keys must not start with a /
“files” with names containing / are not supported
“folders” (prefixes) . and .. are not supported
as on a filesystem, a file and a folder with the same name are not supported: if a file some/key exists, it takes precedence over a some/key/ prefix (folder)
multiple successive / are not supported
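The limitations above can be checked before uploading data. The following sketch flags keys that start with /, contain successive /, use . or .. as a folder, or sit under a prefix that also exists as a file key. `check_keys` is a hypothetical helper, not a DSS validation routine. The “files containing /” rule cannot be checked this way, since any / in a key is read as a folder separator.

```python
def check_keys(keys):
    """Hypothetical helper: flag keys that break the filesystem-like limitations.

    Returns a list of (key, problem) pairs.
    """
    key_set = set(keys)
    problems = []
    for key in keys:
        parts = key.split("/")
        if key.startswith("/"):
            problems.append((key, "starts with /"))
        if "//" in key:
            problems.append((key, "contains successive /"))
        # Every segment except the last acts as a "folder" (prefix)
        if any(seg in (".", "..") for seg in parts[:-1]):
            problems.append((key, "uses . or .. as a folder"))
        # If a parent prefix also exists as a file key, that file shadows it
        for i in range(1, len(parts)):
            parent = "/".join(parts[:i])
            if parent in key_set:
                problems.append((key, f"file {parent!r} takes precedence over {parent}/"))
    return problems


keys = [
    "logs/2015/01/24/app.log",  # fine
    "/abs/key",                 # leading /
    "a//b",                     # successive /
    "a/../b",                   # .. used as a folder
    "some/key",                 # a file...
    "some/key/part-0.csv",      # ...that shadows this folder
]
for key, problem in check_keys(keys):
    print(key, "->", problem)
```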
Creating an Azure connection
Before connecting to Azure Blob Storage with DSS, you need to:
Create at least one container in Azure
Retrieve the storage account name and an access key
(See the official Azure documentation for more details.)
To configure your connection, you must specify:
your storage account name in the storageAccount field
your secret key in the accessKey field
ideally, a default managed container for managed datasets
You can also specify a path within the container for managed datasets.
Creating Azure Blob Storage datasets
After creating your Azure connection in Administration, you can create Azure Blob Storage datasets.
From either the Flow or the datasets list, click on New dataset > Azure Blob Storage.
Select the connection in which your files are located.
If available, select the container (either by listing or entering it).
Click on “Browse” to locate your files.
Connections path handling
The Azure connection can be either in “free selection” mode, or in “path restriction mode”.
In “free selection” mode, users can select the container they want to read from, and the path within the container. If the credentials have permission to list containers, a container selector is available to users.
In “path restriction mode”, you choose a container, and optionally a path within that container. Users can only read and write data within that “base container + path”.
To enable “path restriction mode”, write a container name (and optionally a path within the container) in the “Path restrictions” section of the connection settings.
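The difference between the two modes can be sketched as an access check. In this sketch, a restriction is assumed to be a hypothetical (base_container, base_path) pair, and a location is allowed if it is in the base container and at or below the base path. The requested path is normalized with `posixpath.normpath` first, a design choice made here so that .. segments cannot climb out of the base path; DSS's actual enforcement may differ.

```python
import posixpath


def is_allowed(container, path, restriction=None):
    """Hypothetical check of whether (container, path) is usable on a connection.

    restriction=None models "free selection" mode; otherwise it is a
    (base_container, base_path) pair modelling "path restriction mode".
    """
    if restriction is None:
        return True  # free selection: any container and path
    base_container, base_path = restriction
    if container != base_container:
        return False
    base = base_path.strip("/")
    # Normalize so that ".." cannot escape the base path
    target = posixpath.normpath("/" + path).lstrip("/")
    if not base:
        return True  # no path given: the whole container is allowed
    return target == base or target.startswith(base + "/")


restriction = ("dss-data", "team-a")
print(is_allowed("dss-data", "team-a/sales", restriction))      # True
print(is_allowed("dss-data", "team-b/sales", restriction))      # False
print(is_allowed("dss-data", "team-a/../team-b", restriction))  # False
```

Checking `base + "/"` rather than a bare prefix keeps a sibling such as team-ab from matching a team-a restriction.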
Location of managed datasets and folders
For a “free selection” connection
When you create a managed dataset or folder in an Azure connection, DSS will automatically create it within the “Default container” and the “Default path”.
Below that root path, the “naming rule” applies. See Making relocatable managed datasets for more information.
For a “path restriction” connection
When you create a managed dataset or folder in an Azure connection, DSS will automatically create it within the container and path selected in the “Path restrictions” section, and will append the “Default path” from the “managed datasets & folders” section.
Below that root path, the “naming rule” applies. See Making relocatable managed datasets for more information.
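Putting the two cases together, the location of a managed dataset or folder can be sketched as follows. `managed_location` and `join_path` are hypothetical; `name` stands for whatever relative path the naming rule produces, passed in as a plain string since the rule itself is described elsewhere. The container and path values in the usage are made up.

```python
def join_path(*parts):
    """Join path pieces with single "/" separators, skipping empty pieces."""
    return "/".join(p.strip("/") for p in parts if p.strip("/"))


def managed_location(name, default_path, default_container=None, restriction=None):
    """Hypothetical sketch of where a managed dataset or folder ends up.

    name: relative path produced by the naming rule (passed in as-is here).
    restriction: None for "free selection", else the (container, path)
    pair from the "Path restrictions" section.
    """
    if restriction is None:
        # Free selection: default container, then default path
        container, root = default_container, join_path(default_path)
    else:
        # Path restriction: restricted container and path, then default path
        container, base_path = restriction
        root = join_path(base_path, default_path)
    return container, join_path(root, name)


print(managed_location("PROJ/orders", "dss-managed", default_container="data"))
# ('data', 'dss-managed/PROJ/orders')
print(managed_location("PROJ/orders", "dss-managed", restriction=("team-data", "team-a")))
# ('team-data', 'team-a/dss-managed/PROJ/orders')
```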