Amazon S3

You can read and/or write datasets from/to Amazon Web Services’ Simple Storage Service (AWS S3). S3 is an object storage service: you create containers (“buckets” in the S3 vocabulary) that can store arbitrary binary content and textual metadata under a specific key, unique in the container.

Although S3 is not technically a hierarchical file system with folders, sub-folders and files, that behavior can be emulated by using keys containing /. For instance, you can store your daily logs using keys like 2015/01/24/app.log. S3 lets you list all objects with a specific prefix, say 2015/01/, and many S3 clients (including the AWS Web Console) can “browse” your buckets this way.
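The prefix-based emulation can be illustrated without touching S3 at all. The following is a minimal sketch in plain Python, working on an in-memory list of keys; the function name list_prefix and the sample keys are hypothetical and are not part of DSS or of any AWS SDK.

```python
def list_prefix(keys, prefix="", delimiter="/"):
    """Emulate a one-level "folder" listing over flat object keys.

    Returns (files, folders): the keys located directly under `prefix`,
    and the distinct sub-prefixes one level below it.
    """
    files, folders = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        # Part of the key that follows the requested prefix
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # A delimiter remains: the key lives in a deeper "folder"
            folders.add(prefix + head + delimiter)
        else:
            files.append(key)
    return files, sorted(folders)


# Hypothetical bucket content: keys are flat strings, not real paths
keys = [
    "2015/01/24/app.log",
    "2015/01/25/app.log",
    "2015/02/01/app.log",
    "README.txt",
]

files, folders = list_prefix(keys, "2015/01/")
print(files)    # keys directly under 2015/01/
print(folders)  # "sub-folders" of 2015/01/
```

Listing the empty prefix returns the top-level "files" and "folders" of the bucket, which is how a browsing client would start.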

DSS uses the same filesystem-like mechanism when accessing S3: when you specify a bucket, you can browse it to quickly find your dataset, or you can set the prefix in which DSS may output datasets. Datasets on S3 thus must be in one of the supported filesystem formats.

Note

While very useful, using S3 as a filesystem comes with a few limitations:
  • keys must not start with a /
  • “files” with names containing / are not supported
  • “folders” (prefixes) . and .. are not supported
  • as on a filesystem, a file and a folder with the same name are not supported: if a file some/key exists, it takes precedence over a some/key/ prefix / folder
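Some of these limitations can be checked on a list of keys before pointing a dataset at them. The sketch below is only an illustration: check_keys is a hypothetical helper, not a DSS function, and it covers the leading /, the . and .. segments, and the file/folder name clash.

```python
def check_keys(keys):
    """Return (key, problem) pairs for keys that hit one of the limitations."""
    key_set = set(keys)
    problems = []
    for key in keys:
        if key.startswith("/"):
            problems.append((key, "starts with /"))
        segments = key.split("/")
        if "." in segments or ".." in segments:
            problems.append((key, "contains a . or .. segment"))
        # A key is hidden when one of its parent "folders" is also a file
        for i in range(1, len(segments)):
            parent = "/".join(segments[:i])
            if parent in key_set:
                problems.append((key, "hidden by file " + parent))
    return problems


# Hypothetical keys, for illustration only
sample = [
    "logs/app.log",
    "/bad/start",
    "a/./b",
    "a/../c",
    "some/key",
    "some/key/part-0",
]
for key, problem in check_keys(sample):
    print(key, "->", problem)
```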

To use an S3 bucket, you must configure an AWS connection in the Administration settings. You can then create an S3 dataset, specify the bucket (or choose it from the dropdown list if your connection has bucket listing permission) and type or browse to the prefix or path for your dataset.

The credentials you specify in an AWS connection must grant the following permissions:

  • To read data from a bucket, it must at least have listing and reading permissions on that bucket:

    s3:ListBucket arn:aws:s3:::examplebucket
    s3:GetObject arn:aws:s3:::examplebucket/*
    
  • To write data to a bucket, it must also have writing, deletion and multipart-upload-aborting permissions on that bucket:

    s3:ListBucket arn:aws:s3:::examplebucket
    s3:GetObject arn:aws:s3:::examplebucket/*
    s3:PutObject arn:aws:s3:::examplebucket/*
    s3:DeleteObject arn:aws:s3:::examplebucket/*
    s3:AbortMultipartUpload arn:aws:s3:::examplebucket/*
    
  • Not mandatory, but bucket listing permission allows you to pick a bucket from a dropdown list instead of manually typing its name when creating a dataset:

    s3:ListAllMyBuckets arn:aws:s3:::*
    
  • Also not mandatory, bucket locating permission can improve performance by making DSS access a bucket via its preferred AWS endpoint:

    s3:GetBucketLocation arn:aws:s3:::*
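
The permissions listed above can be assembled into an IAM policy document. The following is a minimal sketch that uses the standard IAM JSON policy layout; the function build_policy and its parameters are hypothetical names chosen for this example, and the resulting policy should be reviewed against your own security requirements before it is attached to a user or role.

```python
import json


def build_policy(bucket, write=False, list_buckets=False, locate=False):
    """Build an IAM policy dict granting the permissions listed above."""
    statements = [
        {
            # Listing applies to the bucket itself
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": ["arn:aws:s3:::" + bucket],
        }
    ]

    # Object-level actions apply to the keys inside the bucket
    object_actions = ["s3:GetObject"]
    if write:
        object_actions += [
            "s3:PutObject",
            "s3:DeleteObject",
            "s3:AbortMultipartUpload",
        ]
    statements.append(
        {
            "Effect": "Allow",
            "Action": object_actions,
            "Resource": ["arn:aws:s3:::" + bucket + "/*"],
        }
    )

    # Optional permissions: bucket dropdown and endpoint lookup
    optional = []
    if list_buckets:
        optional.append("s3:ListAllMyBuckets")
    if locate:
        optional.append("s3:GetBucketLocation")
    if optional:
        statements.append(
            {
                "Effect": "Allow",
                "Action": optional,
                "Resource": ["arn:aws:s3:::*"],
            }
        )

    return {"Version": "2012-10-17", "Statement": statements}


policy = build_policy("examplebucket", write=True, list_buckets=True, locate=True)
print(json.dumps(policy, indent=2))
```

Calling build_policy("examplebucket") with no other argument yields the read-only variant, with only the listing and reading permissions.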