Installing a Govern node
You need to manually install a Govern node if you plan to use Dataiku governance capabilities. See AI Governance for more information.
Installing a Govern instance is very similar to a regular DSS installation, except for the database requirement below. The "Requirements" and "Installing a new DSS instance" pages thus remain mostly valid.
Database requirements
Govern stores its data in a PostgreSQL database, version 15 or later. We recommend running an up-to-date minor version of PostgreSQL, which includes the latest fixes. A dedicated database and user must be created on the PostgreSQL instance for Govern:
CREATE USER <govern_user> WITH ENCRYPTED PASSWORD '<govern_pwd>';
CREATE DATABASE <govern_db> OWNER <govern_user>;
Where <govern_user>, <govern_pwd> and <govern_db> are values of your choice.
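As an illustration, the two statements above can be generated from chosen values before being passed to a PostgreSQL client. This is a minimal sketch, not part of Dataiku: the helper name `govern_db_setup_sql` is hypothetical, it assumes simple lowercase identifiers that need no quoting, and it assumes PostgreSQL's default string-literal rules (a single quote is escaped by doubling it).

```python
import re
from typing import List

# Hypothetical helper, not a Dataiku API.
# Conservative pattern: lowercase identifiers that PostgreSQL accepts unquoted.
_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")


def govern_db_setup_sql(user: str, password: str, database: str) -> List[str]:
    """Return the CREATE USER / CREATE DATABASE statements for Govern."""
    for name in (user, database):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    # Escape single quotes in the password by doubling them.
    escaped_pwd = password.replace("'", "''")
    return [
        f"CREATE USER {user} WITH ENCRYPTED PASSWORD '{escaped_pwd}';",
        f"CREATE DATABASE {database} OWNER {user};",
    ]


if __name__ == "__main__":
    for statement in govern_db_setup_sql("govern", "change-me", "govern_db"):
        print(statement)
```

The printed statements can then be run by a PostgreSQL superuser, for example through `psql`.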
Important
We recommend regular PostgreSQL backups. Additionally, always perform a fresh backup before a Govern migration to ensure you can roll back the automatic schema changes if needed.
Installation
Unpack the kit, just like for a design node.
Then from the user account which will be used to run Dataiku Govern, enter the following command:
dataiku-dss-VERSION/installer.sh -t govern -d DATA_DIR -p PORT -l LICENSE_FILE
Where:
DATA_DIR is the location of the data directory that you want to use. If the directory already exists, it must be empty.
PORT is the base TCP port to be used for Govern.
LICENSE_FILE is the path to your DSS license file.
In short, all installation steps are the same as for a design node; you simply need to add -t govern to the installer.sh command line.
Dependency handling, enabling startup at boot time, and starting the Govern node work exactly as for the design node.
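The installer invocation can be sketched as a small pre-flight helper that enforces the "existing data directory must be empty" rule before assembling the command. This is only an illustration under stated assumptions: the function name `govern_installer_command` is hypothetical, and the paths and port in the usage example are placeholders.

```python
from pathlib import Path
from typing import List


def govern_installer_command(
    kit_dir: str, data_dir: str, port: int, license_file: str
) -> List[str]:
    """Build the installer.sh argument list for a Govern node (hypothetical helper)."""
    data_path = Path(data_dir)
    # If the data directory already exists, it must be an empty directory.
    if data_path.exists():
        if not data_path.is_dir() or any(data_path.iterdir()):
            raise ValueError(f"{data_dir} exists and is not an empty directory")
    return [
        str(Path(kit_dir) / "installer.sh"),
        "-t", "govern",          # node type
        "-d", str(data_path),    # DATA_DIR
        "-p", str(port),         # base TCP port
        "-l", license_file,      # LICENSE_FILE
    ]


if __name__ == "__main__":
    # Placeholder values; replace with your own kit directory, data dir, port and license.
    print(" ".join(govern_installer_command(
        "dataiku-dss-VERSION", "/path/to/DATA_DIR", 12345, "/path/to/license.json"
    )))
```

The returned list could be passed to `subprocess.run(..., check=True)` from the user account that will run Govern; the sketch itself only prints the command.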
Post-installation steps
Before starting Govern, the PostgreSQL database connection needs to be set up in the settings.
Edit DATA_DIR/config/dip.properties and add the connection settings there:
psql.jdbc.url=jdbc:postgresql://<psql_host>:<psql_port>/<govern_db>?currentSchema=<govern_schema>
psql.jdbc.user=<govern_user>
psql.jdbc.password=<govern_pwd>
Where <govern_user>, <govern_pwd> and <govern_db> should be replaced with the values used previously to create the user and database for Govern.
If Govern should use a specific schema, specify it with ?currentSchema=<govern_schema>. This part is optional and may be removed from the URL if the default schema configured in the database is to be used.
<psql_host> and <psql_port> should point to a running PostgreSQL server.
In order to avoid writing a password in cleartext in the configuration file, encrypt it first using:
DATA_DIR/bin/govern-admin encrypt-password <govern_pwd>
Use the encrypted password string (starting with e:AES:) in the psql.jdbc.password field.
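The connection settings above, including the optional currentSchema parameter, can be sketched as follows. The helper names `govern_jdbc_url` and `govern_dip_properties` are hypothetical, and the sketch assumes that the host, database, schema, user and password values contain no characters requiring URL or properties-file escaping. The password argument can be the encrypted string produced by govern-admin encrypt-password.

```python
from typing import Optional


def govern_jdbc_url(
    host: str, port: int, database: str, schema: Optional[str] = None
) -> str:
    """Build the psql.jdbc.url value; currentSchema is added only when given."""
    url = f"jdbc:postgresql://{host}:{port}/{database}"
    if schema:
        url += f"?currentSchema={schema}"
    return url


def govern_dip_properties(url: str, user: str, password: str) -> str:
    """Render the three lines to add to DATA_DIR/config/dip.properties."""
    return "\n".join([
        f"psql.jdbc.url={url}",
        f"psql.jdbc.user={user}",
        f"psql.jdbc.password={password}",
    ])


if __name__ == "__main__":
    # Placeholder host and names; the password shown is a stand-in, not a real value.
    url = govern_jdbc_url("localhost", 5432, "govern_db", schema="govern")
    print(govern_dip_properties(url, "govern", "<encrypted or cleartext password>"))
```

Omitting the `schema` argument yields a URL without the query string, which matches the case where the database's default schema is used.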
Finally, to bootstrap the initial configuration of Govern, run the following command (only the first time, after the kit installation):
DATA_DIR/bin/govern-admin init-db
Govern can then be started with the standard command:
DATA_DIR/bin/dss start
Connection pool configuration
Important
PostgreSQL Server Requirement
Set the max_connections setting of your PostgreSQL server to at least 500. A high max_connections value costs very little on PostgreSQL and has no significant drawback, whereas a value that is too low can prevent Govern from working entirely. The required connection count does not correlate directly with instance size or with what jobs do; 500 is sufficient for almost all instances.
Configuration Steps
Warning
We highly discourage changing any of the following settings without guidance from Dataiku Support.
Stop the Govern instance.
Edit config/general-settings.json and find the top-level "datasourceConnectionSettings" key.
Fill it out as follows:
"datasourceConnectionSettings": {
  "connectionTimeoutMS": 30000,
  "minimumIdle": 50,
  "maximumPoolSize": 50,
  "idleTimeoutMS": 600000,
  "maxLifetimeMS": 1800000,
  "leakDetectionThresholdMS": 1800000
}
Save the file, then start the Govern instance.
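For reference, the edit described in the steps above could be scripted roughly as follows. This is a sketch only: the function name `apply_pool_settings` is hypothetical, it assumes general-settings.json is plain JSON that can be rewritten with standard indentation, and it should be run only while Govern is stopped and with guidance from Dataiku Support.

```python
import json
from pathlib import Path

# Values copied from the configuration shown above.
POOL_SETTINGS = {
    "connectionTimeoutMS": 30000,
    "minimumIdle": 50,
    "maximumPoolSize": 50,
    "idleTimeoutMS": 600000,
    "maxLifetimeMS": 1800000,
    "leakDetectionThresholdMS": 1800000,
}


def apply_pool_settings(settings_path: str) -> dict:
    """Set the top-level datasourceConnectionSettings key and rewrite the file."""
    path = Path(settings_path)
    settings = json.loads(path.read_text(encoding="utf-8"))
    # Replace the whole key; all other top-level settings are left untouched.
    settings["datasourceConnectionSettings"] = dict(POOL_SETTINGS)
    path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings


if __name__ == "__main__":
    # Placeholder path; point it at the file inside your Govern data directory.
    apply_pool_settings("/path/to/DATA_DIR/config/general-settings.json")
```

Taking a copy of the file before running such a script makes it easy to restore the previous settings.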