Advanced topics

What are the sudo authorizations?

When you install impersonation, DSS adds a sudoers rule in /etc/sudoers.d/dataiku-dss-THE_DSS_USER-RANDOM_STRING.

Note

If DSS could not install this sudoers rule, the impersonation setup asks you to install it manually.

This rule allows DSS to run, as root, a small wrapper which is used:

  • To execute user-submitted code as the end-user UNIX accounts
  • To change permissions and ownerships on various files required by user-submitted code

No user-submitted code runs as root. The wrapper (also called the security module) has a specific configuration to limit which users it may run as.

Configuration of the local security module

When DSS runs a command on behalf of an end-user, it consults the security module configuration in DATADIR/security/security-config.ini.

This ini file contains two important pieces of information:

  • Which user groups it may change identity to. This is configured in the [users] section, in the allowed_user_groups setting.
  • Where DSS is located. DSS will not change any file permissions outside of this directory.
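As an illustration of the first point, the following minimal Python sketch reads the allowed_user_groups setting from the [users] section and checks whether a user belongs to an allowed group. It is not the security module's own code. The sample group names, the ";" separator between groups, and the helper names load_allowed_groups and may_impersonate are assumptions made for this example.

```python
import configparser

# Hypothetical sample file. Only the [users] section and the
# allowed_user_groups key come from the documentation; the group names
# and the ";" separator are assumptions made for this sketch.
SAMPLE_CONFIG = """
[users]
allowed_user_groups = dss_users;analysts
"""


def load_allowed_groups(ini_text):
    """Return the set of group names listed in [users] allowed_user_groups."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    raw = parser.get("users", "allowed_user_groups", fallback="")
    return {name.strip() for name in raw.split(";") if name.strip()}


def may_impersonate(user_groups, allowed_groups):
    """True if the user belongs to at least one allowed group."""
    return not allowed_groups.isdisjoint(user_groups)


allowed = load_allowed_groups(SAMPLE_CONFIG)
print(may_impersonate({"analysts", "staff"}, allowed))  # True
print(may_impersonate({"guests"}, allowed))             # False
```

A check like this can help when debugging why a given end-user cannot be impersonated: compare the user's UNIX groups with the groups listed in the file.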

Split DSS datadirs

In some configurations, you might have “split” your DSS datadir by using symbolic links, so that parts of it actually reside in other locations.

To allow the security module to change file permissions in the additional locations, fill in the additional_allowed_file_dirs setting in the [dirs] section.
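The Python sketch below illustrates why a symlinked location needs to be declared. It assumes that a path is checked after symbolic links are resolved, which is an assumption about the security module, not a description of its actual implementation. The helper names is_within and demo, and the directory names, are hypothetical.

```python
import os
import tempfile


def is_within(path, allowed_dirs):
    """True if path, once symlinks are resolved, lies inside one of allowed_dirs."""
    real = os.path.realpath(path)
    for base in allowed_dirs:
        real_base = os.path.realpath(base)
        if os.path.commonpath([real, real_base]) == real_base:
            return True
    return False


def demo():
    """Build a datadir whose 'jobs' folder is a symlink to another location."""
    with tempfile.TemporaryDirectory() as root:
        datadir = os.path.join(root, "datadir")
        extra = os.path.join(root, "bigdisk", "jobs")
        os.makedirs(datadir)
        os.makedirs(extra)
        os.symlink(extra, os.path.join(datadir, "jobs"))

        target = os.path.join(datadir, "jobs", "output.log")
        only_datadir = is_within(target, [datadir])
        with_extra = is_within(target, [datadir, extra])
        return only_datadir, with_extra


print(demo())
```

In this sketch, the file reached through the symbolic link is only accepted once the real target directory is part of the allowed list, which is the role that additional_allowed_file_dirs plays in the configuration.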

File structure of HDFS datasets

In regular security mode, the location of a dataset is specified by a path in a connection.

When multi-user security is enabled, DSS uses a different file layout for managed datasets: if the dataset’s configured location is /user/dataiku/datasets/MYPROJECT/mydataset, then the actual data is written in /user/dataiku/datasets/MYPROJECT/mydataset/data.
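The mapping between the configured location and the actual data location can be sketched in Python as follows. The function name managed_data_path is hypothetical; only the “data” subfolder convention comes from the description above.

```python
import posixpath


def managed_data_path(configured_location):
    """Return the folder where the data files are written when multi-user
    security is enabled: a 'data' subfolder of the configured location."""
    # HDFS paths use "/" separators regardless of the local OS, hence posixpath.
    return posixpath.join(configured_location.rstrip("/"), "data")


print(managed_data_path("/user/dataiku/datasets/MYPROJECT/mydataset"))
# /user/dataiku/datasets/MYPROJECT/mydataset/data
```

This can be useful when an external tool needs to read the files of a managed dataset directly and must point at the “data” subfolder instead of the configured location.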

The “data” folder belongs to the last user who wrote the dataset (this might be “hive” or “impala”). The “mydataset” folder always belongs to the dssuser user.

The ACLs that restrict access are set on the “mydataset” folder. Within the “mydataset” folder, it is normal for data files to have world-readable permissions: the restrictive “gateway” ACLs on “mydataset” prevent unauthorized users from reaching them.

This behavior is configured in the settings of the HDFS connection, in the “Write ACL synchronization mode” setting.