Instance templates and setup actions¶

Instance templates represent common configuration that can be reused across several instances. An instance template is required to launch an instance, and an instance stays linked to its template for its whole lifecycle.

What is configured through the instance templates includes, but is not limited to:

Identities able to SSH to the instance
Cloud credentials for the managed DSS
Installation of additional dependencies and resources
Pre-baked and custom configurations for DSS

To create, edit and delete templates, head to Instance templates in the left menu of FM. The rest of this document explains each section of the configuration.

SSH Key ¶

Use this field to enter a public SSH key that will be deployed on the instance. This is useful for admins to connect to the machine with SSH. This field is optional.

This key will be available on the centos account, i.e. you will be able to log in as centos@DSS.HOST.IP.

User-assigned service accounts ¶

In most cases, your DSS instances will require GCP credentials in order to operate. These credentials will be used notably to integrate with GAR and GKE.

The recommended way to provide GCP credentials to a DSS instance is to use a service account.

Setup actions ¶

Setup actions are configuration steps run by the agent. As a user, you create a list of setup actions you wish to see executed on the machine.

Add authorized SSH key ¶

This setup action ensures the SSH public key passed as a parameter is present in the ~/.ssh/authorized_keys file of the default admin account. With the currently provided images, the default admin is the centos user.

Install system packages ¶

This setup action is a convenient way to install additional system packages on the machine should you need them. It takes a list of AlmaLinux packages as its only parameter. Either the package name or the package URL can be used.

Set advanced security ¶

This setup action ensures DSS adds security-related HTTP headers. HSTS headers can be toggled separately.

Install a JDBC driver ¶

Instances come pre-configured with drivers for PostgreSQL, MariaDB, Snowflake, AWS Athena and Google BigQuery. If you need another driver, this setup action eases the process. It can download a file over HTTP or HTTPS, from an S3 bucket, or from an ABS container.

Install JDBC Driver parameters¶
Parameter	Expected value
Database type	The type of database you will use. This parameter has no actual effect; it is only used for readability.
URL	The full address of the driver file or archive. Redirections are resolved before download. Download from an HTTP(S) endpoint: http(s)://hostname/path/to/file.(jar|tar.gz|zip). Download from an S3 bucket: s3://BUCKET_NAME/OBJECT_NAME. Download from Azure Blob Storage: abs://STORAGE_ACCOUNT_NAME/CONTAINER_NAME/OBJECT_NAME. Use a driver available on the machine: file://path/to/file.(jar|tar.gz|zip).
Paths in archive	This field must be used when the driver is shipped as a tarball or a ZIP file. List here all the paths where the JAR files can be found in the driver archive. Paths are relative to the top of the archive. Wildcards are supported. Examples of paths: *.jar, subdirA/*.jar, subdirB/*.jar.
HTTP Headers	List of HTTP headers to add to the request, one header per line, for example Header1: Value1 and Header2: Value2. This parameter is ignored for all other kinds of download.
HTTP Username	HTTP: if the endpoint expects Basic Authentication, use this parameter to specify the user name. Azure: if the instance has several Managed Identities, set the client_id of the targeted one in this parameter. To connect to Azure Blob Storage with a SAS Token (not recommended), set the value of this parameter to token.
HTTP Password	HTTP: if the endpoint expects Basic Authentication, use this parameter to specify the password. Azure: to connect to Azure Blob Storage with a SAS Token (not recommended), store the token value in this parameter.
Datadir subdirectory	For very specific use cases only. We recommend leaving it empty.

Run Ansible tasks ¶

This setup action allows you to run arbitrary Ansible tasks at different points of the startup process.

The Stage parameter specifies at which point of the startup sequence the tasks must be executed. There are three stages:

Before DSS install: These tasks will be run before the agent installs (if not already installed) or upgrades (if required) DSS.
After DSS install: These tasks will be run once DSS is installed or upgraded, but not yet started.
After DSS is started: These tasks will be run once DSS is ready to receive public API calls from the agent.

The Ansible tasks parameter lets you write a YAML list of Ansible tasks as if they were written in a role. Available tasks are base Ansible tasks and Ansible modules for Dataiku DSS. When using Dataiku modules, you do not need to set the connection and authentication options: they are handled automatically by FM.

Some additional facts are available:

dataiku.dss.port
dataiku.dss.datadir
dataiku.dss.version
dataiku.dss.node_id: Identifier matching the node id in Fleet Manager, unique per fleet
dataiku.dss.node_type: Node type is either design, automation, deployer or govern
dataiku.dss.logical_instance_id: Unique ID that identifies this instance in the Fleet Manager
dataiku.dss.instance_type: The cloud instance type (also referred to as instance size) used to run this instance
dataiku.dss.was_installed: Available only for stages After DSS install and After DSS startup
dataiku.dss.was_upgraded: Available only for stages After DSS install and After DSS startup
dataiku.dss.api_key: Available only for stage After DSS startup
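As a rough sketch of how these facts might be referenced from a task list, the tasks below print a few of them with the base Ansible debug module. It assumes the facts are reachable as regular Ansible variables under the names listed above, and that dataiku.dss.was_installed behaves like a boolean; neither assumption is confirmed here, so adapt it to what you observe on your instances.

```yaml
---
# Print some of the facts listed above (assumed to be plain Ansible variables).
- name: Show DSS node details
  debug:
    msg: "DSS {{ dataiku.dss.version }} ({{ dataiku.dss.node_type }}) on port {{ dataiku.dss.port }}, datadir {{ dataiku.dss.datadir }}"

# was_installed is only listed for the two later stages, so a task like this
# would not fit the "Before DSS install" stage.
# Assumption: the fact is a boolean (or a value that casts cleanly to one).
- name: Report a fresh installation
  debug:
    msg: "DSS was installed during this startup"
  when: dataiku.dss.was_installed | bool
```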

Example:

---
- dss_group:
    name: datascienceguys
- dss_user:
    login: dsadmin
    password: verylongbutinsecurepassword
    groups: [datascienceguys]

Ansible is run as the Unix user of the agent, and can run administrative tasks with become.
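A minimal sketch of an administrative task using become is shown below. The directory path is a hypothetical placeholder chosen for illustration, and file is the base Ansible module; nothing here is specific to Fleet Manager.

```yaml
---
# Creating a directory outside the agent user's home requires elevated
# privileges, hence "become: true".
# /opt/shared-resources is a hypothetical path used only for illustration.
- name: Create a root-owned shared directory
  file:
    path: /opt/shared-resources
    state: directory
    mode: "0755"
  become: true
```

Tasks that only touch files owned by the agent user should not need become.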

Setup Kubernetes and Spark-on-Kubernetes ¶

This task takes no parameter and pre-configures DSS so you can use Kubernetes clusters and Spark integration with them. It prepares the base images and enables DSS Spark integration.

Add environment variables ¶

This setup action lets you add environment variables that can be used in DSS. These variables are stored in the bin/env-site.sh file.

Add properties ¶

This setup action lets you add properties to DSS.

Add SSH keys ¶

This setup action lets you add SSH keys to the ~/.ssh folder. These keys can be used to connect to other machines from the DSS machine.

To generate your public key on Dataiku Cloud:

go to your launchpad > extension tab > add an extension,
select the SSH integration feature,
enter the hostnames of the remote machines that this key is allowed to connect to,
click to validate and generate the key.

Dataiku Cloud will then automatically generate the key and run a command against the remote host to retrieve (and verify) its SSH host key. You can now copy the generated key and add it to your hosts. To find this key in the future or to generate a new one, go to the extension tab and edit the SSH Integration feature.

Setup proxy ¶

This setup action lets you configure a proxy in front of DSS.

The default value for the NO_PROXY variable is: localhost,127.0.0.1,169.254.169.254,metadata,.google.internal.

169.254.169.254 is the IP used by GCP to host the metadata service.

Add Certificate Authority to DSS truststore ¶

This setup action is a convenient way to add a Certificate Authority to your DSS instances’ truststore. It will then be trusted for Java, R and Python processes. It takes a Certificate Authority in the public PEM format. A chain of trust can also be added by appending all the certificates in the same setup action.

Example (single CA):

-----BEGIN CERTIFICATE-----
(Your Root certificate authority)
-----END CERTIFICATE-----

Example (Chain of Trust):

-----BEGIN CERTIFICATE-----
(Your Primary SSL certificate)
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
(Your Intermediate certificate)
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
(Your Root certificate authority)
-----END CERTIFICATE-----

Warning

The name must be unique for each CA as it is used to write the CA in your instances.

Install code env with Visual ML preset ¶

This setup action installs a code environment with the Visual Machine Learning and Visual Time series forecasting preset.

Enable Install GPU-based preset to install the GPU-compatible packages. Otherwise, the CPU packages are installed.

Leaving Allow in-place update enabled means that if there is a newer version of the preset the next time the setup action runs, and it is compatible with the previously installed code environment, said code environment is updated in place. Otherwise, a new code environment is created with the updated preset.