Impersonating other users

Multi-user security requires that the dssuser user be able to “become” other users. DSS does this through two distinct mechanisms:

  • For local code (Python, R, Shell), which executes on the DSS host, DSS uses the sudo mechanism.
  • For Hadoop and Spark code, which executes on the Hadoop cluster, DSS uses a Hadoop feature called impersonation, which allows an authenticated dssuser to submit work to the cluster on behalf of another user.
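
To make the sudo mechanism concrete, here is a minimal Python sketch of how a service account could build a command line that runs local code as another UNIX user. It is an illustration only, not DSS's actual implementation: the function name build_sudo_command, the user "alice" and the script name recipe.py are hypothetical. Only the sudo options -n (never prompt for a password) and -u (target user) are standard.

```python
def build_sudo_command(target_user, command):
    """Return the argv list that runs `command` as `target_user` via sudo.

    Hypothetical helper for illustration; not part of DSS.
    """
    # Reject empty names and names that sudo could mistake for an option.
    if not target_user or target_user.startswith("-"):
        raise ValueError("invalid target user: %r" % (target_user,))
    # -n: fail instead of prompting for a password.
    # -u: run the command as the given user.
    # --: end of sudo's own options, so the command is passed through as-is.
    return ["sudo", "-n", "-u", target_user, "--"] + list(command)


# Example: the service account prepares to run a recipe as end-user "alice".
argv = build_sudo_command("alice", ["python", "recipe.py"])
print(" ".join(argv))
```

The resulting list could be passed to subprocess.run on a host where the sudoers configuration allows the service account to run commands as the target user without a password.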

When multi-user security is enabled, DSS also performs all accesses to the Hive metastore and to HDFS folders as the impersonated user.

Identity mapping

One of the main challenges of multi-user security is preserving the ability to collaborate. In an overly simple multi-user security setup, when a dataset D is built by user A, another user B would not be able to overwrite it, since the files belong to A.

When multi-user security is enabled, DSS goes to great lengths to ensure that collaboration abilities are preserved. It is thus possible to do “full” impersonation, meaning that each end-user connecting to DSS is impersonated as their corresponding underlying Hadoop / UNIX user.

DSS also makes it possible to do more complex mappings of “DSS end-user” to “UNIX/Hadoop user”. For example, you could declare:

  • When working on project A, all users (who have access to project A in DSS) will see their jobs executed as user “projectA” on UNIX/Hadoop
  • When working on project B, all users (who have access to project B in DSS) will see their jobs executed as user “projectB” on UNIX/Hadoop
  • In all other cases, users are impersonated on a 1-to-1 basis.
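
The three rules above can be sketched as an ordered rule list in which the first matching rule wins. This is only an illustration of the logic, not the DSS configuration format: MappingRule, resolve_unix_user, RULES and the project keys "A" and "B" are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class MappingRule:
    """One hypothetical identity-mapping rule (not the DSS format)."""
    project: Optional[str]      # None matches any project
    target_user: Optional[str]  # None means 1-to-1: keep the DSS user name


def resolve_unix_user(dss_user: str, project: str,
                      rules: Sequence[MappingRule]) -> str:
    """Return the UNIX/Hadoop user that a job should run as."""
    for rule in rules:  # rules are evaluated in order; first match wins
        if rule.project is None or rule.project == project:
            if rule.target_user is None:
                return dss_user
            return rule.target_user
    raise LookupError("no rule matches user %r in project %r"
                      % (dss_user, project))


RULES = [
    MappingRule(project="A", target_user="projectA"),
    MappingRule(project="B", target_user="projectB"),
    MappingRule(project=None, target_user=None),  # fallback: 1-to-1
]

print(resolve_unix_user("alice", "A", RULES))
print(resolve_unix_user("bob", "B", RULES))
print(resolve_unix_user("alice", "C", RULES))
```

Placing the 1-to-1 fallback last matters in this sketch: because it matches every project, any rule listed after it would never be reached.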

There are several use cases for this kind of advanced mapping:

  • When not all of your end-users have UNIX accounts, which 1-to-1 impersonation requires for them to run jobs.
  • In some cases, to strengthen security. For example, suppose users U1 and U2 must collaborate on a project, U1 being highly privileged and U2 having low privileges. Since both users work on the same project, U2 can write code that U1 will later execute. If U1 is not careful and does not check the code written by U2, this code will run with U1’s higher privileges. If U2 is hostile, this places the burden on U1 to verify all code written by U2. By mapping both users to a per-project user, you can strictly restrict this “project” user to project-specific resources.