Operations

The multi-user security feature can be adapted to different requirements and ways of organizing security.

Default security configuration

When you enable multi-user security and follow the setup instructions, DSS starts with a configuration that enables a per-project security policy with minimal administrator intervention.

Overview

  • The HDFS connections are declared as usable by all users.
  • Each project writes to a different HDFS folder.
  • Each project writes to a different Hive database.

The separation of folders and Hive databases between projects is ensured by the naming rules defined in the HDFS connection.

Security is thus ensured in two ways:

  • DSS automatically adds ACLs on the actual directories corresponding to datasets, which prevents users who are not in the project’s authorized groups from accessing the folder, even in user-controlled code.
  • Access through Hive or Impala can be controlled using Sentry / Ranger rules.

Note

This default configuration should be usable by everyone. We recommend that you keep it.

Adding a project

In that setting, adding a project requires adding a Hive database and granting permissions to the project’s groups on the database.

  • Create the project in DSS
  • Add the groups who must have access to the project

By default, the new database is called dataiku_PROJECTKEY where PROJECTKEY is the key of the newly created project. You can configure this in the settings of each HDFS connection.
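As a minimal sketch of the default naming described above, the following Python helper derives the database name and the project folder from a project key. The connection root `/user/dataiku/managed` and the key `SALES` are hypothetical values. The helper only mirrors the defaults stated here; the actual names come from the naming rules configured in your HDFS connection.

```python
def default_project_names(connection_root: str, project_key: str) -> dict:
    """Mirror the documented defaults: database dataiku_PROJECTKEY,
    folder = root of the connection + project key."""
    root = connection_root.rstrip("/")
    return {
        "hive_database": f"dataiku_{project_key}",
        "hdfs_folder": f"{root}/{project_key}",
    }


# Hypothetical connection root and project key, for illustration only.
names = default_project_names("/user/dataiku/managed/", "SALES")
print(names["hive_database"])
print(names["hdfs_folder"])
```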

For Sentry

(NB: These steps are provided for information purposes only. Please refer to the official Cloudera documentation for your Cloudera version.)

As Hive administrator:

  • Using beeline or another Hive client:
    • Create the new database
    • Create a role
    • Add all groups who must have access to the project to the new role
    • Grant ALL ON DATABASE to the new role
    • Grant ALL ON URI to the new role (URI being by default: root of the connection + project key)
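The steps above can be sketched as a list of statements to run in beeline. The Python helper below only builds the statement text; it does not connect to Hive. The database, role, group names and the URI are hypothetical, and the exact statement syntax should be checked against the Cloudera documentation for your version.

```python
def sentry_setup_statements(database, role, groups, uri):
    """Build the statements for a new project, in the order of the steps above."""
    statements = [
        f"CREATE DATABASE {database}",
        f"CREATE ROLE {role}",
    ]
    # One GRANT ROLE per group that must have access to the project.
    statements += [f"GRANT ROLE {role} TO GROUP {group}" for group in groups]
    statements.append(f"GRANT ALL ON DATABASE {database} TO ROLE {role}")
    statements.append(f"GRANT ALL ON URI '{uri}' TO ROLE {role}")
    return statements


# All values below are hypothetical and only illustrate the shape of the output.
statements = sentry_setup_statements(
    database="dataiku_SALES",
    role="dataiku_sales_role",
    groups=["analysts", "data_engineers"],
    uri="hdfs://namenode:8020/user/dataiku/managed/SALES",
)
for statement in statements:
    print(statement + ";")
```

Using one role per project keeps later changes small: adding or removing a group only touches the role membership, not the grants.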

For Ranger

(NB: These steps are provided for information purposes only. Please refer to the official Hortonworks documentation for your HDP version.)

As Hive administrator:

  • Using beeline or another Hive client, create the database
  • Go to the Ranger administration UI
  • Go to the Hive module and add a new policy
    • Set Hive database to the name of the new database
    • Set * as Hive table and column
    • Add your end-user group(s) and all permissions in the “Allow conditions” section
    • Save the new policy

New policies might take 1-2 minutes to become effective.
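If you prefer to script the policy rather than use the UI, the settings above map onto a policy document. The sketch below only builds such a document in Python and prints it as JSON; it sends nothing to Ranger. The field names are assumed to follow the policy model of Ranger's public REST API and must be checked against the documentation for your Ranger version. The service name `cluster_hive`, the policy name and the group are hypothetical.

```python
import json


def ranger_hive_policy(service, database, groups):
    """Build a policy giving the groups all permissions on one Hive database."""
    return {
        "service": service,              # name of the Hive service in Ranger (assumed)
        "name": f"{database}_all",       # hypothetical naming convention
        "isEnabled": True,
        "resources": {
            "database": {"values": [database]},
            "table": {"values": ["*"]},   # * as Hive table
            "column": {"values": ["*"]},  # * as Hive column
        },
        "policyItems": [                 # the "Allow conditions" section
            {
                "groups": list(groups),
                "accesses": [{"type": "all", "isAllowed": True}],
            }
        ],
    }


policy = ranger_hive_policy("cluster_hive", "dataiku_SALES", ["analysts"])
print(json.dumps(policy, indent=2))
```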

Adding a user to a group

Read ACLs are group-level, so no intervention is required when a user is added to a group.

Removing a user from a group

The removed user might still have a write ACL if they were the last to modify some datasets. You need to resynchronize the ACLs on all affected datasets in all projects where the user had access.

  • Use the Authorization matrix to check where the user had access
  • Remove the user
  • For each affected project, go to Project > Settings > Config > Security and click “Resync ACLs”.

Adding access to a group

When you add project access to a group, you need to resynchronize the ACLs on the project’s datasets. This ensures that the new group has access.

  • Do the permission change on the DSS project
  • Go to Project > Settings > Config > Security and click “Resync ACLs”.

If you are using Hive and/or Impala, you also need to ensure that the new group has Sentry/Ranger access to the database and URI.

For Sentry

If you have used a role for the project as recommended, you simply need to add the new group to the project’s role.
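With a per-project role, the change is a single statement. The sketch below builds it in Python and rejects names containing unexpected characters before they are placed in the statement. The role and group names are hypothetical, and the statement syntax should be verified against the Cloudera documentation for your version.

```python
import re

# Conservative check: letters, digits and underscores only.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def grant_role_to_group(role, group):
    """Build the statement that adds a group to an existing project role."""
    for name in (role, group):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unexpected characters in {name!r}")
    return f"GRANT ROLE {role} TO GROUP {group}"


# Hypothetical role and group names.
grant_statement = grant_role_to_group("dataiku_sales_role", "marketing")
print(grant_statement + ";")
```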

For Ranger

  • Edit the policy that you created for this project
  • Add the new group
  • Save the policy
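For a scripted workflow, the same edit can be expressed as a change to the policy document. The sketch below assumes the policy was fetched as a dictionary with a single allow item, using the same assumed field names as Ranger's public REST API policy model. It returns an updated copy and does not contact Ranger. The policy content and group names are hypothetical.

```python
import copy


def add_group_to_policy(policy, group):
    """Return a copy of the policy with the group added to its first allow item."""
    updated = copy.deepcopy(policy)  # leave the caller's document untouched
    groups = updated["policyItems"][0].setdefault("groups", [])
    if group not in groups:
        groups.append(group)
    return updated


# Hypothetical existing policy for the project.
existing_policy = {
    "name": "dataiku_SALES_all",
    "policyItems": [
        {"groups": ["analysts"], "accesses": [{"type": "all", "isAllowed": True}]}
    ],
}
updated_policy = add_group_to_policy(existing_policy, "marketing")
print(updated_policy["policyItems"][0]["groups"])
```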

Removing access from a group

When you remove project access from a group, you need to resynchronize the ACLs on the project’s datasets. This ensures that the group loses its existing access.

  • Do the permission change on the DSS project
  • Go to Project > Settings > Config > Security and click “Resync ACLs”.

If you are using Hive and/or Impala, you also need to ensure that the group’s Sentry/Ranger access to the database and URI is revoked.

For Sentry

If you have used a role for the project as recommended, you simply need to remove the group from the project’s role.
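As a sketch, the corresponding statement can be built as follows. The role and group names are hypothetical, and the syntax should be confirmed against the Cloudera documentation for your version.

```python
def revoke_role_from_group(role, group):
    """Build the statement that removes a group from a project role."""
    return f"REVOKE ROLE {role} FROM GROUP {group}"


# Hypothetical role and group names.
revoke_statement = revoke_role_from_group("dataiku_sales_role", "marketing")
print(revoke_statement + ";")
```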

For Ranger

  • Edit the policy that you created for this project
  • Remove the group
  • Save the policy

Other security configurations

Single prefix and Hive database for multiple projects

Instead of using the simple 1-1 mapping between:

  • Projects and HDFS folders
  • Projects and Hive databases

you can alter the HDFS naming rule to reference a specific “team” variable that each project owner must set. Working this way reduces the administrative burden, since each team can create projects as desired. The Hadoop administrator only has to intervene when a new team must be declared.
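To illustrate the effect of a team-based naming rule, the sketch below resolves two rule templates against project variables using Python's `string.Template`. The rule strings `dataiku_${team}` and `${team}/` are hypothetical examples, not DSS defaults, and the resolution logic is a stand-in for what the naming rule does, not DSS code.

```python
from string import Template

# Hypothetical naming rules that reference a "team" variable.
HIVE_DATABASE_RULE = Template("dataiku_${team}")
HDFS_PREFIX_RULE = Template("${team}/")


def resolve_names(project_variables):
    """Resolve both rules; raises KeyError if the project has no "team" variable."""
    return {
        "hive_database": HIVE_DATABASE_RULE.substitute(project_variables),
        "hdfs_prefix": HDFS_PREFIX_RULE.substitute(project_variables),
    }


# Two projects of the same team share one database and one prefix.
churn = resolve_names({"team": "marketing"})
campaigns = resolve_names({"team": "marketing"})
finance = resolve_names({"team": "finance"})
print(churn, finance)
```

Because every project of a team resolves to the same database and prefix, Sentry or Ranger permissions are granted once per team rather than once per project.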

Disabling management of ACLs by DSS

You can ask DSS not to manage any HDFS ACLs. This is configured in the settings of each HDFS connection.

Interaction with externally-managed data

DSS only manages ACLs on the connections where managed datasets are written. DSS does not manage ACLs on “external” connections (this is controlled by the “Synchronize read ACL” and “Write ACL synchronization” settings in the HDFS connection).

It is the administrator’s responsibility to ensure that read ACLs on these datasets are properly set.
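One way to set such read ACLs is the `hdfs dfs -setfacl` command. The sketch below only assembles the command line in Python and prints it; nothing is executed. The path and group names are hypothetical, and `r-x` is an assumption about the access you want to grant. Depending on how the data is produced, default ACLs on the directory may also be needed so that new files inherit the entries.

```python
import shlex


def read_acl_command(path, groups):
    """Build an hdfs command that recursively grants read access to the groups."""
    acl_spec = ",".join(f"group:{group}:r-x" for group in groups)
    return ["hdfs", "dfs", "-setfacl", "-R", "-m", acl_spec, path]


# Hypothetical external dataset folder and groups.
command = read_acl_command("/data/external/sales", ["analysts", "marketing"])
print(shlex.join(command))
```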

Existing Hive table

If externally-managed data already has a Hive table and is not synchronized to the Hive metastore, you need to ensure that Hive-level permissions (Sentry or Ranger) allow access to all relevant groups.

Synchronized Hive table

Even on read-only external data, you can ask DSS to synchronize the definition to the Hive metastore. In that case, you need to ensure that the HDFS-level permissions allow the Hive (and, if used, Impala) users to access the folder.
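A quick way to check this is to inspect the output of `hdfs dfs -getfacl` for the folder. The sketch below parses such output in Python and reports whether a named user entry grants read and execute access. It is a simplification: it ignores the ACL mask, the owner and group entries, and group membership. The sample output, the path and the user names `hive` and `impala` are hypothetical.

```python
def has_named_user_read(getfacl_output: str, user: str) -> bool:
    """Return True if a "user:<name>:<perms>" entry grants r and x to the user."""
    for line in getfacl_output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "# file/owner/group" header
        parts = line.split(":", 2)
        if len(parts) == 3 and parts[0] == "user" and parts[1] == user:
            perms = parts[2][:3]  # drop any trailing "#effective:..." note
            return perms[0] == "r" and perms[2] == "x"
    return False


# Hypothetical getfacl output for an external dataset folder.
SAMPLE = """\
# file: /data/external/sales
# owner: etl
# group: etl
user::rwx
user:hive:r-x
group::r-x
mask::r-x
other::---
"""
print(has_named_user_read(SAMPLE, "hive"))
print(has_named_user_read(SAMPLE, "impala"))
```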