HDFS datasets data structure
This section applies only to HDFS datasets for which ACL synchronization is used.
When user isolation for Hadoop is disabled, the location of a dataset is specified by a path in a connection.
When user isolation for Hadoop is enabled, DSS uses a different file layout for managed datasets: if the dataset’s configured location is
/user/dataiku/datasets/MYPROJECT/mydataset, then the actual data is written in
The “data” folder belongs to the last user who wrote the dataset (this might be “hive” or “impala”). The “mydataset” folder always belongs to the
The ACLs that prevent access are set on the mydataset folder. Within that folder, it is normal for data files to have world-readable permissions: the restrictive “gateway” ACLs on mydataset prevent unauthorized users from reaching them.
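To check that the gateway ACLs are in place, one option is to inspect the output of the standard `hdfs dfs -getfacl` command on the mydataset folder. The following is a minimal sketch, not part of DSS: the helper names (`parse_getfacl`, `is_gateway_closed`, `users_with_access`, `fetch_getfacl`) are hypothetical, and the sample text with its owner, group and user names is invented for illustration only. It also ignores default ACL entries and the effect of the mask entry, so treat the result as a first indication rather than an authoritative permission check.

```python
import subprocess

# Illustrative sample in the format printed by `hdfs dfs -getfacl`.
# Owner, group and user names are made up for this example.
SAMPLE = """\
# file: /user/dataiku/datasets/MYPROJECT/mydataset
# owner: exampleowner
# group: examplegroup
user::rwx
user:alice:r-x
user:bob:---
group::---
mask::r-x
other::---
"""


def parse_getfacl(text):
    """Parse `hdfs dfs -getfacl` output into owner, group and ACL entries."""
    info = {"owner": None, "group": None, "entries": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header lines look like "# owner: name".
            key, _, value = line.lstrip("# ").partition(":")
            if key in ("owner", "group"):
                info[key] = value.strip()
            continue
        # Drop a trailing "#effective:..." annotation if present.
        entry = line.split("#", 1)[0].strip()
        parts = entry.split(":")
        # Access entries have three fields (kind:name:perms).
        # Default entries ("default:user::rwx") have four and are skipped.
        if len(parts) == 3:
            info["entries"].append(tuple(parts))
    return info


def is_gateway_closed(info):
    """Return True when the `other` entry grants no permission at all."""
    for kind, name, perms in info["entries"]:
        if kind == "other" and name == "":
            return perms == "---"
    return False


def users_with_access(info):
    """List named users whose entry allows listing and traversing the folder."""
    return sorted(
        name
        for kind, name, perms in info["entries"]
        if kind == "user" and name and "r" in perms and "x" in perms
    )


def fetch_getfacl(path):
    """Run `hdfs dfs -getfacl` on a path (requires the hdfs CLI on PATH)."""
    result = subprocess.run(
        ["hdfs", "dfs", "-getfacl", path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


info = parse_getfacl(SAMPLE)
print(is_gateway_closed(info), users_with_access(info))
```

On a machine with Hadoop client access, `parse_getfacl(fetch_getfacl("/user/dataiku/datasets/MYPROJECT/mydataset"))` would run the same checks against the real folder instead of the sample text.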
This behavior is configured in the settings of the HDFS connection, in the “Write ACL synchronization mode” setting.