Working with Git

Overview

DSS integrates natively with Git. The sections below describe the parts of DSS that can work with Git.

Version control of projects

Each change that you make in the DSS UI (modify the settings of a dataset, edit a recipe, modify a dashboard, …) is automatically recorded in the version control system.

This gives you:

  • Traceability into all actions performed in DSS

  • The ability to understand the history of each object

  • The ability to revert changes

For more details, see Version control of projects.

Importing Python and R code

If you have code that has been developed outside of DSS and is available in a Git repository (for example, a library created by another team), you can import this repository (or a part of it) into the project libraries, and use it in any code capability of DSS (recipes, notebooks, webapps, …).

For more details, see Importing code from Git in project libraries.

Importing Jupyter Notebooks

If you have Notebooks that have been developed outside of DSS and are available in a Git repository, you can import these Notebooks into a DSS project. You can modify them inside DSS and push the changes back to the remote repository.

For more details, see Importing Jupyter Notebooks from Git.

Developing plugins

When developing plugins, each plugin is a Git repository. You can view the history, revert changes, use branches, and push/pull changes from remote repositories.

For more details, see Git integration in the plugin editor.

Importing plugins

If you have developed a plugin on a DSS instance and have pushed your plugin to a Git repository, you can import this plugin on another DSS instance directly from the Git repository.

For more details, see Installing plugins.

Working with remotes

All integration points explained above include the ability to interact with remote repositories (either pull-only or pull-and-push, depending on the case).

This section explains how you can work with remote repositories.

DSS always uses the git command-line client to work with remote repositories, in non-interactive mode.

This applies to all DSS Git remote features, including:

  • Project version control

  • Git references in project libraries

  • Imported Jupyter Notebooks linked to a Git remote

  • Plugin development remotes

Remote Git access is controlled in two layers:

  • Administrators define which remote repositories users may access through Git group rules

  • Users can then authenticate to allowed SSH remotes with their personal SSH keys

Interaction with SSH-based remotes can use:

  • Per-user SSH keys managed in DSS

  • Or the default system-level SSH behavior configured for the DSS server

HTTPS-based remotes are also supported. In that case, the UNIX account running DSS must have credentials available for the target repository, for example through the Git credentials cache.

Setup

To use personal SSH authentication for Git remotes, an administrator must first enable it, then each user can generate or import their own SSH key in DSS.

Administrator setup

Git remote access is configured in Administration > Settings > Git > Group rules.

Rules are evaluated on a first-match basis: the first rule matching both the user’s groups and the remote URL is applied.

Each rule can define:

  • Whether Git is allowed for the group

  • A whitelist of allowed remote URLs

  • Additional Git configuration options

  • Whether DSS controls the SSH command

  • Whether per-user SSH keys are allowed for that rule

  • An alternate home directory for Git configuration overrides

To let users authenticate with their own SSH keys, the matching rule must:

  • Allow Git

  • Match the remote repository URL

  • Have Let DSS control SSH command enabled

  • Have Allow per-user SSH keys enabled

If a rule denies Git, no remote operation is allowed. If a rule allows Git but disables per-user SSH keys, DSS falls back to the default system-level SSH behavior for that rule.
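The first-match evaluation described above can be sketched as follows. This is an illustrative model only, not DSS code: the GroupRule class, its field names, and the decide function are hypothetical, and the sketch assumes that whitelist entries are regular expressions that must match the whole URL and that a deny rule is matched on group and URL like any other rule.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GroupRule:
    # Hypothetical model of a Git group rule; field names are illustrative.
    group: Optional[str]                      # None = rule without a group name
    allow_git: bool = True
    url_whitelist: List[str] = field(default_factory=list)  # assumed to be regexes
    dss_controls_ssh: bool = False            # "Let DSS control SSH command"
    allow_per_user_keys: bool = False         # "Allow per-user SSH keys"


def find_rule(rules, user_groups, url):
    """Return the first rule matching both the user's groups and the URL."""
    for rule in rules:
        group_ok = rule.group is None or rule.group in user_groups
        url_ok = any(re.fullmatch(p, url) for p in rule.url_whitelist)
        if group_ok and url_ok:
            return rule
    return None


def decide(rules, user_groups, url):
    """Classify the outcome for a user and a remote URL."""
    rule = find_rule(rules, user_groups, url)
    if rule is None or not rule.allow_git:
        return "denied"
    if rule.dss_controls_ssh and rule.allow_per_user_keys:
        return "per-user-keys"
    return "system-ssh"


rules = [
    GroupRule("group1", url_whitelist=[r"git@example\.com:team1/.*"],
              dss_controls_ssh=True, allow_per_user_keys=True),
    GroupRule(None, url_whitelist=[r"git@example\.com:shared/.*"]),
]
print(decide(rules, {"group1"}, "git@example.com:team1/repo.git"))
print(decide(rules, {"group2"}, "git@example.com:shared/lib.git"))
```

Because evaluation stops at the first match, rule order matters: a broad rule placed first would shadow more specific rules below it.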

User setup

Users manage their Git SSH keys in Profile > Credentials > SSH.

Users can:

  • Generate a new SSH key pair directly in DSS

  • Import an existing private key

  • Copy the corresponding public key

  • Reorder their keys

  • Define a Git repos whitelist regex on each key

The public key can then be added on the Git hosting platform, for example as a deploy key or a personal SSH key, depending on the Git provider and the desired scope.

Private keys and passphrases are stored encrypted in DSS user credentials. DSS only exposes the public key and the fingerprint back to the user interface.

Key selection rules

For a given SSH remote:

  • DSS first applies the matching Git group rule

  • DSS then filters the user’s SSH keys using the key-level Git repos whitelist regex

  • An empty regex behaves as a catch-all and matches all repositories

  • Matching keys are sorted by their configured order and loaded into an SSH agent for the Git command

  • If no user key matches, DSS uses the default system-level SSH behavior

This means that Git group rules and user key regexes are separate filters:

  • Group rules control whether the repository may be accessed at all

  • User key regexes control which of the user’s keys are considered for that repository
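The key-level filtering and ordering can be illustrated with a short sketch. The UserKey class and select_keys function are hypothetical names, and the sketch assumes the regex may match anywhere in the URL; the exact matching semantics used by DSS are not stated here.

```python
import re
from dataclasses import dataclass


@dataclass
class UserKey:
    # Hypothetical model of a per-user SSH key entry.
    name: str
    order: int             # position in the user's key list
    repo_regex: str = ""   # "Git repos whitelist regex"; empty = catch-all


def select_keys(keys, url):
    """Keep keys whose regex matches the URL, sorted by configured order."""
    matching = [
        k for k in keys
        if not k.repo_regex or re.search(k.repo_regex, url)
    ]
    return sorted(matching, key=lambda k: k.order)


keys = [
    UserKey("work", order=2, repo_regex=r"github\.com[:/]acme/"),
    UserKey("default", order=1),
    UserKey("other", order=0, repo_regex=r"gitlab\.example\.org"),
]
selected = select_keys(keys, "git@github.com:acme/repo.git")
print([k.name for k in selected])
```

An empty result from select_keys corresponds to the case where DSS uses the default system-level SSH behavior.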

Legacy and admin-managed setup

Per-user SSH keys are the recommended way to authenticate users to SSH remotes.

However, DSS can still use the default system-level SSH configuration of the DSS server. This remains useful for legacy or admin-managed setups, for example when administrators configure SSH keys or Git settings outside DSS.

If you rely on system-level SSH behavior, make sure that:

  • The UNIX account running DSS can connect to the repository without any interactive prompt

  • The SSH host key of the remote server has already been validated
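One way to check both conditions is to run a Git command under non-interactive settings, as the UNIX account running DSS. The sketch below is an assumption about how such a check could be done, not a description of what DSS runs internally. It relies on the standard GIT_TERMINAL_PROMPT and GIT_SSH_COMMAND environment variables of Git and the BatchMode option of OpenSSH; the function names are illustrative.

```python
import os
import subprocess


def noninteractive_env(base=None):
    """Build an environment in which git and ssh fail instead of prompting."""
    env = dict(os.environ if base is None else base)
    env["GIT_TERMINAL_PROMPT"] = "0"  # git: never prompt for credentials
    # ssh: no passphrase or password prompts, reject unknown host keys
    env["GIT_SSH_COMMAND"] = "ssh -o BatchMode=yes -o StrictHostKeyChecking=yes"
    return env


def can_reach(url, timeout=30):
    """Return True if `git ls-remote` succeeds without any prompt."""
    result = subprocess.run(
        ["git", "ls-remote", "--heads", url],
        env=noninteractive_env(),
        stdin=subprocess.DEVNULL,
        capture_output=True,
        timeout=timeout,
    )
    return result.returncode == 0
```

Note that GIT_SSH_COMMAND takes precedence over a core.sshCommand configuration option, so this check does not exercise a custom core.sshCommand.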

Configuration and security

Even when per-user SSH keys are used, interaction with remote repositories is executed by the DSS server in non-interactive mode.

Per-user SSH keys do not bypass Git group rules. Users can only use their personal SSH keys for repositories allowed by the matching rule in Administration > Settings > Git.

If no rule matches for a given group, access to Git remotes is denied to this group. It is often desirable to have a catch-all rule as the last rule, i.e. a rule without a group name that catches users not handled by previous rules.

Warning

Never use .* as a whitelisted URL. It allows users to clone local repositories as the dssuser, which can be abused to read folders (as the dssuser) that the user shouldn’t be allowed to read.

The default value when adding a new rule prevents this.
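The risk can be seen by comparing what a permissive and a restrictive pattern accept. In this sketch, the restrictive pattern, the host name, and the local paths are hypothetical examples; the pattern is not the DSS default value.

```python
import re

permissive = re.compile(r".*")
# Hypothetical restrictive pattern: one SSH host, one organization.
restrictive = re.compile(r"git@git\.example\.com:data-team/[\w.-]+\.git")

candidates = [
    "git@git.example.com:data-team/lib.git",  # intended remote
    "/home/dataiku/some/local/repo",          # hypothetical local path
    "file:///etc",                            # local file URL
]

for url in candidates:
    print(url, bool(permissive.fullmatch(url)), bool(restrictive.fullmatch(url)))
```

The permissive pattern accepts all three candidates, including the local path and the file URL, while the restrictive pattern accepts only the intended remote.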

Example 1: Allow repository URLs explicitly per group

If you want:

  • “group1” to be able to work with remotes “remote1a” and “remote1b”

  • “group2” to be able to work with remote “remote2”

  • All other groups to be denied access to any remote

Configure two rules:

  • Group=group1, URLs whitelist = 2 entries, “remote1a” and “remote1b”

  • Group=group2, URLs whitelist = 1 entry, “remote2”

Alternatively, if you want:

  • “group1” to be able to work only with remote “remote1”

  • All other groups to be able to work with remote “remote2”

Configure two rules:

  • Group=group1, URLs whitelist = 1 entry, “remote1”

  • Group=<empty>, URLs whitelist = 1 entry, “remote2”

Example 2: Use admin-managed SSH behavior per group

This is useful if you want administrators to manage SSH configuration outside DSS instead of relying on per-user SSH keys.

If you want:

  • “group1” to be able to work with any remote, but with SSH key “/home/dataiku/.ssh/group1-key”

  • “group2” to be able to work with any remote, but with SSH key “/home/dataiku/.ssh/group2-key”

  • All other groups to be denied access to any remote

Configure two rules:

  • Group=group1, URLs whitelist = default value, add a configuration option "core.sshCommand" = "ssh -i /home/dataiku/.ssh/group1-key -o StrictHostKeyChecking=yes"

  • Group=group2, URLs whitelist = default value, add a configuration option "core.sshCommand" = "ssh -i /home/dataiku/.ssh/group2-key -o StrictHostKeyChecking=yes"

Note

On Dataiku Cloud, SSH keys created via the SSH extension are available by default to the DSS instance. To specify a custom SSH key, you must also add the -F none option so that SSH does not load the default configuration file. Otherwise, the SSH keys listed by default are also used. A full configuration would look like: "ssh -i /home/dataiku/.ssh/group1-key -o StrictHostKeyChecking=yes -F none".
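To try such a value outside DSS before adding it to a rule, the same option can be passed to Git with its -c flag. The sketch below only builds the command line. The helper names are illustrative, and how DSS itself passes the option to Git is not described here.

```python
import shlex


def ssh_command_for(key_path, ignore_ssh_config=False):
    """Build the value of core.sshCommand for a given private key."""
    parts = ["ssh", "-i", key_path, "-o", "StrictHostKeyChecking=yes"]
    if ignore_ssh_config:
        parts += ["-F", "none"]  # do not load the default SSH configuration file
    return " ".join(shlex.quote(p) for p in parts)


def git_argv(key_path, *git_args, ignore_ssh_config=False):
    """Build a git command line that applies core.sshCommand for one call."""
    value = ssh_command_for(key_path, ignore_ssh_config)
    return ["git", "-c", f"core.sshCommand={value}", *git_args]


print(ssh_command_for("/home/dataiku/.ssh/group1-key"))
print(git_argv("/home/dataiku/.ssh/group1-key", "ls-remote", "git@myserver.com:myrepo"))
```

The resulting list can be passed to subprocess.run as the UNIX account running DSS to confirm that the key grants access to the remote.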

Testing SSH access

Users can validate their SSH setup from Profile > Credentials > SSH with Test your SSH keys with your repositories.

The test accepts one or more repository URLs, separated by newlines or semicolons (;).
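As an illustration of the accepted input format, the following sketch splits such a list into individual URLs. The function name is hypothetical, and trimming whitespace and skipping empty entries are assumptions rather than documented DSS behavior.

```python
import re


def split_repository_urls(text):
    """Split on newlines or semicolons, dropping blanks (assumed behavior)."""
    return [u.strip() for u in re.split(r"[;\n]", text) if u.strip()]


urls = split_repository_urls(
    "git@a.example.com:x.git;git@b.example.com:y.git\n\n git@c.example.com:z.git "
)
print(urls)
```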

For each repository, DSS:

  • Checks whether the repository is allowed by the matching Git group rule

  • Tests each matching per-user SSH key individually

  • Tests the final DSS behavior using all matching user SSH keys together with the default system SSH keys

The overall status is successful only when both conditions are met:

  • The repository is allowed by Dataiku Git access rules

  • Read and write access are confirmed

When testing per-user SSH keys specifically, DSS also indicates whether:

  • Per-user SSH keys are allowed by the matching Git group rule

  • At least one user SSH key matches the repository

The read and write checks are performed as follows:

  • Read access is validated with a shallow clone

  • Write access is validated with git push --dry-run
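A comparable manual check can be sketched with plain Git commands. This is an approximation under stated assumptions, not the DSS implementation: the function names are illustrative, the remote is assumed to be named origin, and the dry-run push targets the currently checked-out HEAD.

```python
import subprocess
import tempfile


def read_check_argv(url, dest):
    """Shallow clone: fetches only the most recent commit."""
    return ["git", "clone", "--depth", "1", url, dest]


def write_check_argv(repo_dir, remote="origin"):
    """Dry-run push: contacts the remote but does not update any ref."""
    return ["git", "-C", repo_dir, "push", "--dry-run", remote, "HEAD"]


def check_access(url):
    """Return read/write results for a repository URL (runs git)."""
    with tempfile.TemporaryDirectory() as tmp:
        read = subprocess.run(read_check_argv(url, tmp),
                              stdin=subprocess.DEVNULL, capture_output=True)
        if read.returncode != 0:
            return {"read": False, "write": False}
        write = subprocess.run(write_check_argv(tmp),
                               stdin=subprocess.DEVNULL, capture_output=True)
        return {"read": True, "write": write.returncode == 0}
```

An empty remote repository has no HEAD commit to push, so this sketch would report write access as failed in that case.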

If a user’s key does not match the repository because of its whitelist regex, it is shown as skipped rather than failed.

Troubleshooting

“Unknown Host Key” issues

The first time you push to a remote, you might encounter an “UnknownHostKey” error. Because DSS enforces strict host key checking, the SSH host key of the remote server must already be known by the DSS server.

Log in to the DSS server and run a single ssh or remote Git command against the remote you want to reach, in order to retrieve and verify the host key. The key is then added to the relevant known_hosts file, and DSS can connect afterwards.

For example, if you want to push to git@myserver.com:myrepo and get an UnknownHostKey error, log in to the server and run ssh git@myserver.com. You will get a prompt to accept the host key. Verify it, accept it, and you can then work with this remote.
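To check in advance whether a host key is already known, the host name can be extracted from the remote URL and looked up with ssh-keygen -F, which searches a known_hosts file. The sketch below is an optional helper with illustrative function names. It assumes the lookup runs as the UNIX account running DSS, and it does not handle non-default ports, which known_hosts stores as [host]:port.

```python
import re
import subprocess
from urllib.parse import urlparse


def ssh_host(remote):
    """Extract the host from an ssh:// URL or an scp-like user@host:path remote."""
    if "://" in remote:
        return urlparse(remote).hostname
    match = re.match(r"(?:[^@/]+@)?([^:/]+):", remote)
    return match.group(1) if match else None


def known_hosts_lookup_argv(remote):
    """Build the ssh-keygen command that searches known_hosts for the host."""
    host = ssh_host(remote)
    if host is None:
        raise ValueError(f"cannot extract an SSH host from {remote!r}")
    return ["ssh-keygen", "-F", host]


def host_is_known(remote):
    """Return True if ssh-keygen prints at least one matching entry."""
    result = subprocess.run(known_hosts_lookup_argv(remote),
                            capture_output=True, text=True)
    return bool(result.stdout.strip())


print(known_hosts_lookup_argv("git@myserver.com:myrepo"))
```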

Other common issues

If a repository test or Git remote operation fails:

  • Check that the repository URL matches an allowed Git group rule

  • Check that the matching rule allows per-user SSH keys

  • Check that at least one user SSH key matches the repository regex

  • Use the repository test tool in Profile > Credentials > SSH to distinguish blocked repositories, regex mismatches, and read or write access failures