Azure Blob Storage

DSS can interact with Azure Blob Storage to:

  • Read and write datasets
  • Read and write managed folders

Azure Blob Storage is an object storage service: you create “containers” that can store arbitrary binary content and textual metadata under a specific key, unique within the container.

While not technically a hierarchical file system with folders, sub-folders and files, that behavior can be emulated by using keys containing /. For instance, you can store your daily logs using keys like 2015/01/24/app.log.
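As a rough illustration of this emulation, the sketch below derives a folder-style listing from a flat list of keys. It is plain Python and does not call Azure or DSS; the function name list_folder and the sample keys are hypothetical, chosen only for this example.

```python
def list_folder(keys, prefix=""):
    """Return (subfolders, files) found directly under `prefix`.

    `keys` is a flat list of blob keys; "folders" exist only as
    shared key prefixes ending in "/".
    """
    folders, files = set(), set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition("/")
        if sep:
            # More path segments follow: `head` acts as a sub-folder.
            folders.add(head + "/")
        elif head:
            # No further "/": `head` is a file directly under the prefix.
            files.add(head)
    return sorted(folders), sorted(files)


# Hypothetical sample keys, following the daily-log layout above.
keys = [
    "2015/01/24/app.log",
    "2015/01/25/app.log",
    "2015/readme.txt",
]

print(list_folder(keys))            # top level
print(list_folder(keys, "2015/"))   # one level down
```

Browsing a container in a filesystem-like way amounts to repeating this kind of prefix listing one level at a time.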

DSS uses the same filesystem-like mechanism when accessing Azure: when you specify a container, you can browse it to quickly find your data, or you can set the prefix under which DSS may output datasets. Datasets on Azure must therefore be in one of the supported filesystem formats.

Note

Azure Blob as a filesystem-like storage comes with a few limitations:
  • keys must not start with a /
  • “files” with names containing / are not supported
  • “folders” (prefixes) . and .. are not supported
  • like on a filesystem, a file and a folder with the same name are not supported: if a file some/key exists, it takes precedence over a some/key/ prefix / folder
  • multiple successive / are not supported
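
The limitations above can be turned into a simple pre-flight check on keys. The following is a minimal sketch in plain Python, not the validation DSS itself performs; the function name check_key and the problem messages are hypothetical. The rule about file names containing / cannot be checked on a key, since / is always read as a separator.

```python
def check_key(key, existing_keys=()):
    """Return a list of problems that make `key` unsuitable for
    filesystem-like access, based on the limitations listed above."""
    problems = []
    if key.startswith("/"):
        problems.append("starts with /")
    if "//" in key:
        problems.append("multiple successive /")
    # Every segment except the last one acts as a "folder" (prefix).
    folders = key.split("/")[:-1]
    if any(segment in (".", "..") for segment in folders):
        problems.append("folder named . or ..")
    # A file and a folder with the same name cannot coexist.
    for other in existing_keys:
        if key.startswith(other + "/") or other.startswith(key + "/"):
            problems.append("file/folder name clash with " + other)
    return problems


print(check_key("2015/01/24/app.log"))
print(check_key("/2015//app.log"))
print(check_key("some/key/part.csv", existing_keys=["some/key"]))
```

An empty list means the key raised none of the checked problems.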

Creating an Azure connection

Before connecting to Azure Blob Storage with DSS, you need to:

  • Create at least one container on Azure
  • Retrieve the storage account name and an access key

(See the official documentation for more details)

To configure your connection, you must specify:

  • your storage account in the storageAccount field
  • your secret key in the accessKey field
  • ideally, a default container for managed datasets
  • optionally, a path within the container for managed datasets

Creating Azure Blob datasets

After creating your Azure connection in Administration, you can create Azure Blob datasets.

From either the Flow or the datasets list, click on New dataset > Azure Blob.

  • Select the connection in which your files are located
  • If available, select the container (either by listing or entering it)
  • Click on “Browse” to locate your files.

Connections path handling

The Azure connection can be either in “free selection” mode or in “path restriction” mode.

In “free selection” mode, users can select the container from which they want to read, and the path within the container. If the credentials have the permission to list containers, a container selector will be available to users.

In “path restriction” mode, you choose a container, and optionally a path within the container. Users will only be able to read and write data within that “base container + path”.

To enable “path restriction” mode, enter a container name (and optionally a path within the container) in the “Path restrictions” section of the connection settings.
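
To make the effect of a path restriction concrete, here is a minimal sketch of the check it implies: a location is usable only if it sits in the base container and under the base path. This is plain Python written for illustration and is not how DSS implements the restriction; the function name is_allowed and all container and path names are hypothetical.

```python
def is_allowed(container, path, base_container, base_path=""):
    """Return True if (container, path) lies within the restricted
    "base container + path" area."""
    if container != base_container:
        return False
    base = base_path.strip("/")
    target = path.strip("/")
    if base == "":
        # No path restriction: anything in the base container is allowed.
        return True
    # Compare whole segments so "team-ab" does not match base "team-a".
    return target == base or target.startswith(base + "/")


print(is_allowed("data", "team-a/logs/app.log", "data", "team-a"))
print(is_allowed("data", "team-ab/logs/app.log", "data", "team-a"))
print(is_allowed("other", "team-a/logs/app.log", "data", "team-a"))
```

Comparing on whole path segments, rather than on raw string prefixes, avoids treating a sibling folder with a longer name as being inside the restricted path.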

Location of managed datasets and folders

For a “free selection” connection

When you create a managed dataset or folder in an Azure connection, DSS will automatically create it within the “Default container” and the “Default path”.

Below that root path, the “naming rule” applies. See Making relocatable managed datasets for more information.

For a “path restriction” connection

When you create a managed dataset or folder in an Azure connection, DSS will automatically create it within the container and path selected in the “Path restrictions” section, and will append the “Default path” from the “managed datasets & folders” section.

Below that root path, the “naming rule” applies. See Making relocatable managed datasets for more information.
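
The two cases can be summarized with a short sketch that computes the root location of managed datasets and folders from the connection settings. This is an illustrative reading of the rules above in plain Python, not DSS code: the function names and the sample container and path values are hypothetical, and the “naming rule” that applies below the root is deliberately left out.

```python
def join_path(*parts):
    """Join path fragments with single "/" separators, skipping empty ones."""
    cleaned = [p.strip("/") for p in parts if p and p.strip("/")]
    return "/".join(cleaned)


def managed_root(default_path, default_container=None,
                 restricted_container=None, restricted_path=""):
    """Return (container, root_path) for managed datasets and folders."""
    if restricted_container:
        # "Path restriction" connection: the default path is appended
        # to the restricted container and path.
        return restricted_container, join_path(restricted_path, default_path)
    # "Free selection" connection: default container + default path.
    return default_container, join_path(default_path)


print(managed_root("dss-managed", default_container="data"))
print(managed_root("dss-managed", restricted_container="data",
                   restricted_path="team-a"))
```

With these hypothetical values, the free selection case gives the root dss-managed in container data, while the path restriction case gives team-a/dss-managed in the same container.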