Working with Git
Overview
DSS provides native integration with Git. The following parts of DSS can work with Git.
Version control of projects
Each change that you make in the DSS UI (modifying the settings of a dataset, editing a recipe, modifying a dashboard, …) is automatically recorded in the version control system.
This gives you:
Traceability into all actions performed in DSS
The ability to understand the history of each object
The ability to revert changes
For more details, see Version control of projects.
Importing Python and R code
If you have code that has been developed outside of DSS and is available in a Git repository (for example, a library created by another team), you can import this repository (or a part of it) in the project libraries, and use it in any code capability of DSS (recipes, notebooks, webapps, …).
For more details, see Importing code from Git in project libraries.
Importing Jupyter Notebooks
If you have Notebooks that have been developed outside of DSS and are available in a Git repository, you can import these Notebooks in a DSS project. You can modify them inside DSS and push the changes back to the remote repository.
For more details, see Importing Jupyter Notebooks from Git.
Developing plugins
When developing plugins, each plugin is a Git repository. You can view the history, revert changes, use branches, and push/pull changes from remote repositories.
For more details, see Git integration in the plugin editor.
Importing plugins
If you have developed a plugin on a DSS instance and have pushed your plugin to a Git repository, you can import this plugin on another DSS instance directly from the Git repository.
For more details, see Installing plugins.
Working with remotes
All integration points explained above include the ability to interact with remote repositories (pull-only or pull-and-push, depending on the feature).
This section explains how you can work with remote repositories.
DSS always uses the git command-line client to work with remote repositories, in non-interactive mode.
This applies to all DSS Git remote features, including:
Project version control
Git references in project libraries
Imported Jupyter Notebooks linked to a Git remote
Plugin development remotes
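Running in non-interactive mode means that Git and SSH must fail immediately instead of waiting for a password, a passphrase, or a host-key confirmation. The following minimal Python sketch shows one standard way to get that behavior from any Git invocation, using Git's GIT_TERMINAL_PROMPT and GIT_SSH_COMMAND environment variables and OpenSSH's BatchMode option. It illustrates the concept only and is not how DSS launches Git internally; the helper names noninteractive_git_env and run_git are hypothetical.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence


def noninteractive_git_env(base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment in which git and ssh fail instead of prompting."""
    env = dict(os.environ if base_env is None else base_env)
    # Git: never ask for HTTPS usernames or passwords on the terminal.
    env["GIT_TERMINAL_PROMPT"] = "0"
    # SSH: BatchMode=yes disables password/passphrase prompts and host key
    # confirmation. Any existing GIT_SSH_COMMAND options are kept.
    env["GIT_SSH_COMMAND"] = env.get("GIT_SSH_COMMAND", "ssh") + " -o BatchMode=yes"
    return env


def run_git(args: Sequence[str], cwd: Optional[str] = None) -> subprocess.CompletedProcess:
    """Run a git command with no terminal input available, capturing its output."""
    return subprocess.run(
        ["git", *args],
        cwd=cwd,
        env=noninteractive_git_env(),
        stdin=subprocess.DEVNULL,  # nothing can answer a prompt
        capture_output=True,
        text=True,
        check=False,
    )
```

For example, run_git(["ls-remote", "git@myserver.com:myrepo"]) returns a non-zero returncode, with the reason in stderr, when authentication or host key verification would have needed a prompt, rather than hanging.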
Remote Git access is controlled in two layers:
Administrators define which remote repositories users may access through Git group rules
Users can then authenticate to allowed SSH remotes with their personal SSH keys
Interaction with SSH-based remotes can use either:
Per-user SSH keys managed in DSS
The default system-level SSH behavior configured for the DSS server
HTTPS-based remotes are also supported. In that case, the UNIX account running DSS must have credentials available for the target repository, for example through the Git credentials cache.
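To check whether the UNIX account running DSS already has credentials for an HTTPS remote, you can ask Git's configured credential helpers (such as the credentials cache) through git credential fill while terminal prompts are disabled. That command reads key=value lines terminated by a blank line. The Python sketch below assumes it is run as that account; the function names and the URLs in the example are hypothetical.

```python
import os
import subprocess
from urllib.parse import urlsplit


def credential_query(remote_url: str) -> str:
    """Build the stdin text for `git credential fill` from an HTTPS remote URL."""
    parts = urlsplit(remote_url)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError(f"not an HTTPS remote: {remote_url!r}")
    # In the credential protocol, host includes the port when one is given.
    host = parts.hostname if parts.port is None else f"{parts.hostname}:{parts.port}"
    return f"protocol=https\nhost={host}\n\n"  # blank line ends the request


def has_stored_credentials(remote_url: str) -> bool:
    """True if a configured credential helper answers for this remote without prompting."""
    result = subprocess.run(
        ["git", "credential", "fill"],
        input=credential_query(remote_url),
        env={**os.environ, "GIT_TERMINAL_PROMPT": "0"},
        capture_output=True,  # the returned password is captured, never printed
        text=True,
        check=False,
    )
    return result.returncode == 0 and "password=" in result.stdout
```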
Setup
To use personal SSH authentication for Git remotes, an administrator must first enable it, then each user can generate or import their own SSH key in DSS.
Administrator setup
Git remote access is configured in Administration > Settings > Git > Group rules.
Rules are evaluated on a first-match basis: the first rule matching both the user’s groups and the remote URL is applied.
Each rule can define:
Whether Git is allowed for the group
A whitelist of allowed remote URLs
Additional Git configuration options
Whether DSS controls the SSH command
Whether per-user SSH keys are allowed for that rule
An alternate home directory for Git configuration overrides
To let users authenticate with their own SSH keys, the matching rule must:
Allow Git
Match the remote repository URL
Have "Let DSS control SSH command" enabled
Have "Allow per-user SSH keys" enabled
If a rule denies Git, no remote operation is allowed. If a rule allows Git but disables per-user SSH keys, DSS falls back to the default system-level SSH behavior for that rule.
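The rule evaluation described above can be modeled as follows. This is a hypothetical Python sketch, not DSS code. The class and field names are invented, whitelist entries are assumed to be regular expressions matched against the whole URL, and a rule that denies Git is assumed to match on group alone. It follows the first-match description literally and returns which kind of SSH authentication would apply.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class GitGroupRule:
    """Hypothetical model of one rule in Administration > Settings > Git > Group rules."""
    group: str                                   # "" means no group name (catch-all)
    allow_git: bool
    url_whitelist: List[str] = field(default_factory=list)  # assumed full-match regexes
    control_ssh_command: bool = False            # "Let DSS control SSH command"
    allow_per_user_keys: bool = False            # "Allow per-user SSH keys"

    def matches(self, user_groups: Sequence[str], url: str) -> bool:
        if self.group and self.group not in user_groups:
            return False
        if not self.allow_git:
            # Assumption: a rule denying Git applies to its group whatever the URL.
            return True
        return any(re.fullmatch(p, url) for p in self.url_whitelist)


def ssh_mode(rules: Sequence[GitGroupRule], user_groups: Sequence[str], url: str) -> str:
    """Return "denied", "per-user-keys" or "system-ssh" using the first matching rule."""
    rule: Optional[GitGroupRule] = next(
        (r for r in rules if r.matches(user_groups, url)), None
    )
    if rule is None or not rule.allow_git:
        return "denied"  # no matching rule, or the matching rule denies Git
    if rule.control_ssh_command and rule.allow_per_user_keys:
        return "per-user-keys"
    # Per-user keys disabled (or, assumed, SSH command not controlled by DSS):
    # fall back to the default system-level SSH behavior.
    return "system-ssh"
```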
User setup
Users manage their Git SSH keys in Profile > Credentials > SSH.
Users can:
Generate a new SSH key pair directly in DSS
Import an existing private key
Copy the corresponding public key
Reorder their keys
Define a Git repos whitelist regex on each key
The public key can then be added on the Git hosting platform, for example as a deploy key or a personal SSH key, depending on the Git provider and the desired scope.
Private keys and passphrases are stored encrypted in DSS user credentials. DSS only exposes the public key and the fingerprint back to the user interface.
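The fingerprint is a short digest of the public key that you can compare with the one your Git provider shows after you add the key. This Python sketch assumes, without confirmation, that DSS displays fingerprints in the SHA256 format that ssh-keygen -l prints by default. Under that assumption, the fingerprint can be recomputed from the public key line. openssh_fingerprint is a hypothetical helper name.

```python
import base64
import hashlib
import struct


def openssh_fingerprint(public_key_line: str) -> str:
    """Return the SHA256 fingerprint of an OpenSSH public key line, as ssh-keygen -l prints it."""
    fields = public_key_line.split()
    if len(fields) < 2:
        raise ValueError("expected '<key-type> <base64-blob> [comment]'")
    key_type, blob = fields[0], base64.b64decode(fields[1], validate=True)
    # The blob starts with the key type as an SSH string: 4-byte big-endian length + bytes.
    if len(blob) < 4:
        raise ValueError("truncated key blob")
    (type_len,) = struct.unpack(">I", blob[:4])
    if blob[4:4 + type_len].decode("ascii", errors="replace") != key_type:
        raise ValueError("key type does not match the key blob")
    digest = hashlib.sha256(blob).digest()
    # ssh-keygen shows the SHA-256 digest of the decoded blob as unpadded base64.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")
```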
Key selection rules
For a given SSH remote:
DSS first applies the matching Git group rule
DSS then filters the user’s SSH keys using the key-level Git repos whitelist regex
An empty regex behaves as a catch-all and matches all repositories
Matching keys are sorted by their configured order and loaded into an SSH agent for the Git command.
If no user key matches, DSS uses the default system-level SSH behavior
This means that Git group rules and user key regexes are separate filters:
Group rules control whether the repository may be accessed at all
User key regexes control which of the user’s keys are considered for that repository
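The following Python sketch models the key-level filter. It assumes the matching Git group rule has already allowed both the repository and per-user keys. The names are hypothetical, and non-empty regexes are assumed to be searched anywhere in the URL; DSS may anchor them differently.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class UserSshKey:
    """Hypothetical model of a key listed in Profile > Credentials > SSH."""
    name: str
    order: int            # position set by reordering the keys
    repo_regex: str = ""  # "Git repos whitelist regex"; empty means catch-all


def keys_for_repository(keys: List[UserSshKey], url: str) -> List[UserSshKey]:
    """Keys to load into the SSH agent for this URL, in configured order.

    An empty result means the default system-level SSH behavior applies.
    """
    def key_matches(key: UserSshKey) -> bool:
        # Assumption: a non-empty regex may match anywhere in the URL.
        return key.repo_regex == "" or re.search(key.repo_regex, url) is not None

    return sorted((k for k in keys if key_matches(k)), key=lambda k: k.order)
```

Because the group rule is evaluated first, even a catch-all key regex never grants access to a repository that the rules block.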
Legacy and admin-managed setup
Per-user SSH keys are the recommended way to authenticate users to SSH remotes.
However, DSS can still use the default system-level SSH configuration of the DSS server. This remains useful for legacy or admin-managed setups, for example when administrators configure SSH keys or Git settings outside DSS.
If you rely on system-level SSH behavior, make sure that:
The UNIX account running DSS can connect to the repository without any interactive prompt
The SSH host key of the remote server has already been validated
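To confirm that a host key is already known, you can run ssh-keygen -F myserver.com as the UNIX account running DSS. The Python sketch below performs the same lookup on the lines of a known_hosts file, including hashed entries in the HashKnownHosts format. It is a simplified illustration: it ignores wildcard patterns as well as @cert-authority and @revoked markers, and host_is_known is a hypothetical name.

```python
import base64
import hashlib
import hmac
from typing import Iterable


def _hashed_entry_matches(entry: str, host: str) -> bool:
    """Match a hashed host field: |1|base64(salt)|base64(HMAC-SHA1(salt, host))."""
    parts = entry.split("|")
    if len(parts) != 4 or parts[1] != "1":
        return False
    try:
        salt = base64.b64decode(parts[2])
        stored = base64.b64decode(parts[3])
    except ValueError:  # binascii.Error is a ValueError subclass
        return False
    computed = hmac.new(salt, host.encode("utf-8"), hashlib.sha1).digest()
    return hmac.compare_digest(computed, stored)


def host_is_known(known_hosts_lines: Iterable[str], host: str, port: int = 22) -> bool:
    """True if a plain or hashed known_hosts entry exists for host (and non-default port)."""
    name = host if port == 22 else f"[{host}]:{port}"
    for raw in known_hosts_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        hosts_field = line.split()[0]
        if hosts_field.startswith("@"):
            continue  # markers are not handled in this sketch
        if hosts_field.startswith("|"):
            if _hashed_entry_matches(hosts_field, name):
                return True
        elif name in hosts_field.split(","):
            return True
    return False
```

When running as that account, an open file such as open(os.path.expanduser("~/.ssh/known_hosts")) can be passed as known_hosts_lines.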
Configuration and security
As described above, interaction with remote repositories is always executed by the DSS server in non-interactive mode, including when per-user SSH keys are used.
Per-user SSH keys do not bypass Git group rules. Users can only use their personal SSH keys for repositories allowed by the matching rule in Administration > Settings > Git.
If no rule matches for a given group, access to Git remotes is denied to this group. It is often desirable to have a catch-all rule as the last rule, i.e. a rule without a group name that catches users not handled by previous rules.
Warning
Never use .* as a whitelisted URL: it allows users to clone local repositories as the dssuser, which can be abused to read folders (as the dssuser) that a user shouldn't be allowed to read.
The default value when adding a new rule prevents this.
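The risk exists because a whitelist entry is a pattern over the remote URL, and a local path is a valid Git URL. This Python sketch assumes entries are regular expressions matched against the whole URL. It shows that .* accepts local paths, while a pattern pinned to a scheme and a host rejects them. The host git.example.com, the local paths, and both pattern lists are hypothetical; none of them is the DSS default value.

```python
import re
from typing import Sequence


def whitelisted(url: str, patterns: Sequence[str]) -> bool:
    """Assumes whitelist entries are regular expressions that must match the whole URL."""
    return any(re.fullmatch(p, url) for p in patterns)


# Hypothetical pattern lists, for illustration only.
TOO_BROAD = [r".*"]
REMOTE_ONLY = [r"(git@|ssh://git@|https://)git\.example\.com[:/].*"]
```

Requiring ":" or "/" right after the host name also stops look-alike hosts such as git.example.com.attacker.example from matching.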
Example 1: Allow repository URLs explicitly per group
If you want:
“group1” to be able to work with remotes “remote1a” and “remote1b”
“group2” to be able to work with remote “remote2”
All other groups to be denied access to any remote
Configure two rules:
Group=group1, URLs whitelist = 2 entries, “remote1a” and “remote1b”
Group=group2, URLs whitelist = 1 entry, “remote2”
If you want:
“group1” to be able to work only with remote “remote1”
All other groups to be able to work with remote “remote2”
Configure two rules:
Group=group1, URLs whitelist = 1 entry, “remote1”
Group=<empty>, URLs whitelist = 1 entry, “remote2”
Example 2: Use admin-managed SSH behavior per group
This is useful if you want administrators to manage SSH configuration outside DSS instead of relying on per-user SSH keys.
If you want:
“group1” to be able to work with any remote, but with SSH key “/home/dataiku/.ssh/group1-key”
“group2” to be able to work with any remote, but with SSH key “/home/dataiku/.ssh/group2-key”
All other groups to be denied access to any remote
Configure two rules:
Group=group1, URLs whitelist = default value, add a configuration option:
"core.sshCommand" = "ssh -i /home/dataiku/.ssh/group1-key -o StrictHostKeyChecking=yes"
Group=group2, URLs whitelist = default value, add a configuration option:
"core.sshCommand" = "ssh -i /home/dataiku/.ssh/group2-key -o StrictHostKeyChecking=yes"
Note
On Dataiku Cloud, SSH keys created via the SSH extension are available by default to the DSS instance.
To specify a custom SSH key, you must also add the -F none option so that SSH does not load the default configuration file; otherwise, the SSH keys configured there by default are also used.
So a full configuration would look like: "ssh -i /home/dataiku/.ssh/group1-key -o StrictHostKeyChecking=yes -F none".
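The rules above store the SSH command in Git's core.sshCommand setting, which Git runs through the shell. This Python sketch builds the same command strings and uses shlex.join (Python 3.8+) to quote the key path, so that paths containing spaces stay intact. It also shows how to apply the setting to a single Git command with git -c, for example when testing manually as the DSS account. The helper names are hypothetical.

```python
import shlex
from typing import List


def ssh_command(identity_file: str, ignore_config: bool = False) -> str:
    """Build a core.sshCommand value like the ones in Example 2."""
    parts = ["ssh", "-i", identity_file, "-o", "StrictHostKeyChecking=yes"]
    if ignore_config:
        # -F none: do not read the default SSH configuration file.
        parts += ["-F", "none"]
    return shlex.join(parts)  # shell-quotes each argument only when needed


def git_with_ssh_command(identity_file: str, *git_args: str, ignore_config: bool = False) -> List[str]:
    """argv applying the same setting to one git command through `git -c`."""
    value = ssh_command(identity_file, ignore_config)
    return ["git", "-c", f"core.sshCommand={value}", *git_args]
```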
Testing SSH access
Users can validate their SSH setup from Profile > Credentials > SSH with the "Test your SSH keys with your repositories" tool.
The test accepts one or more repository URLs, separated by newlines or semicolons (;).
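Splitting that input can be sketched as follows. How DSS treats blank entries or surrounding spaces is an assumption here, and parse_repository_list is a hypothetical name.

```python
import re
from typing import List


def parse_repository_list(text: str) -> List[str]:
    """Split test input on newlines or ';', dropping blank entries and surrounding spaces."""
    return [item.strip() for item in re.split(r"[;\n]", text) if item.strip()]
```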
For each repository, DSS:
Checks whether the repository is allowed by the matching Git group rule
Tests each matching per-user SSH key individually
Tests the final DSS behavior using all matching user SSH keys together with the default system SSH keys
The overall status is successful only when both conditions are met:
The repository is allowed by Dataiku Git access rules
Read and write access are confirmed
When testing per-user SSH keys specifically, DSS also indicates whether:
Per-user SSH keys are allowed by the matching Git group rule
At least one user SSH key matches the repository
The read and write checks are performed as follows:
Read access is validated with a shallow clone
Write access is validated with git push --dry-run
If a user’s key does not match the repository because of its whitelist regex, it is shown as skipped rather than failed.
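The status rules above can be summarized in a small Python model. It is a hypothetical sketch with invented names. Its Git command lines only approximate the read and write checks described above (a shallow clone, then git push --dry-run from that clone); DSS's exact invocations are not documented here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KeyCheck:
    key_name: str
    regex_matches: bool
    read_ok: Optional[bool] = None   # None when the key was not tested
    write_ok: Optional[bool] = None

    @property
    def status(self) -> str:
        if not self.regex_matches:
            return "skipped"  # excluded by the key's whitelist regex, not a failure
        return "ok" if self.read_ok and self.write_ok else "failed"


def overall_status(allowed_by_rules: bool, read_ok: bool, write_ok: bool) -> str:
    """Success requires the Git access rules AND both the read and the write check."""
    return "success" if allowed_by_rules and read_ok and write_ok else "failure"


def check_commands(url: str, workdir: str) -> List[List[str]]:
    """Approximate git commands for the two checks (assumed, not DSS's exact calls)."""
    return [
        ["git", "clone", "--depth", "1", url, workdir],                 # read: shallow clone
        ["git", "-C", workdir, "push", "--dry-run", "origin", "HEAD"],  # write: dry-run push
    ]
```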
Troubleshooting
“Unknown Host Key” issues
The first time you push to a remote, you might encounter an “UnknownHostKey” error. Because DSS enforces strict host key checking, the SSH host key of the remote server must already be known by the DSS server.
Log in to the DSS server as the UNIX account running DSS, and run a single ssh or remote Git command against the remote you want to use in order to retrieve and verify its host key. The key is then added to the relevant known_hosts file, and DSS can connect afterwards.
For example, if you want to push to git@myserver.com:myrepo and get an UnknownHostKey error, log in to the server and run ssh git@myserver.com. You will be prompted to accept the host key: check its fingerprint, accept it, and you can then work with this remote.
Other common issues
If a repository test or Git remote operation fails:
Check that the repository URL matches an allowed Git group rule
Check that the matching rule allows per-user SSH keys
Check that at least one user SSH key matches the repository regex
Use the repository test tool in Profile > Credentials > SSH to distinguish blocked repositories, regex mismatches, and read or write access failures
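The checks above are ordered so that each one only matters once the previous one passes. A hypothetical Python sketch of that order follows. It assumes the user intends to authenticate with per-user SSH keys, and diagnose is an invented name.

```python
from typing import Optional


def diagnose(rule_allows_repo: bool, rule_allows_per_user_keys: bool,
             matching_user_keys: int, read_ok: bool, write_ok: bool) -> Optional[str]:
    """Return the first failed check from the list above, or None if everything passes."""
    if not rule_allows_repo:
        return "blocked: no Git group rule allows this repository URL"
    if not rule_allows_per_user_keys:
        return "per-user SSH keys are disabled by the matching rule (system-level SSH is used)"
    if matching_user_keys == 0:
        return "no user SSH key regex matches this repository (system-level SSH is used)"
    if not read_ok:
        return "read access failed"
    if not write_ok:
        return "write access failed"
    return None
```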