Upgrading a DSS instance
On macOS, upgrade your instance by following the instructions on https://www.dataiku.com/product/get-started/mac/, and install directly over the existing application. It’s still a good idea to make a backup of the data directory first.
In the rest of this procedure, DATA_DIR denotes the location of the DSS Data directory.
For each version of DSS, we publish release notes that detail the limitations, attention points, and other notes about the release. We strongly advise that you read all release notes for the new DSS version before starting the upgrade.
Notably, some machine learning models often need to be retrained when upgrading between major DSS versions.
This documentation explains how to upgrade a single DSS instance. After the upgrade completes, it is not possible to roll it back. We therefore strongly advise that you take a backup of the whole DATA_DIR before starting the upgrade procedure.
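As an illustration of the backup advice above, here is a minimal sketch that archives the data directory with tar. The function name backup_data_dir, the backup location, and the archive naming are hypothetical and not part of DSS. The sketch also assumes the backup is taken while nothing is writing to DATA_DIR.

```shell
# Sketch: archive a DSS data directory before an upgrade.
# backup_data_dir and the archive name are illustrative, not DSS tooling.
backup_data_dir() {
    local data_dir="$1"      # path to DATA_DIR
    local backup_root="$2"   # directory that receives the archive
    local stamp archive

    [ -d "$data_dir" ] || { echo "No such directory: $data_dir" >&2; return 1; }
    mkdir -p "$backup_root"

    stamp="$(date +%Y%m%d-%H%M%S)"
    archive="${backup_root}/dss-data-dir-${stamp}.tar.gz"

    # -C makes the paths in the archive relative to the parent of DATA_DIR
    tar czf "$archive" -C "$(dirname "$data_dir")" "$(basename "$data_dir")"

    # Print the archive path so the caller can record or verify it
    echo "$archive"
}
```

It could be called as backup_data_dir DATA_DIR /PATH/TO/BACKUPS, where the second argument is any location with enough free space for a full copy of the data directory.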
Some people perform upgrades by:
Creating a new DSS instance
Exporting projects from the old instance
Importing the projects into the new instance
Only then shutting down the old DSS instance
We do not recommend that you use this approach for the following reasons:
It is much slower and requires many more operations than an instance clone.
While a project export carries all important parts of the projects, some things are NOT part of a project export and will be lost. This includes files written from Jupyter notebooks, SQL notebook results, and the whole “state” of the Flow. In other words, all incremental computation state will be lost and all datasets / partitions will need to be recomputed.
If you want to keep the original instance up and running while trying the migration, please see the following procedure.
Some people prefer to keep an old instance running and to clone it to a new DSS instance that will be upgraded to the new version.
This requires a few additional migration operations and care:
If you are going to run it on the same machine, keep in mind that each instance needs its own block of 10 consecutive TCP ports. Thus, the new instance needs to be installed on a different port range.
Giving the new instance a different installid flag is recommended to avoid conflicts.
The new instance will run all scenarios just like the old one. This could lead to corrupted data.
If Graphite reporting is enabled, you need to change the prefix for the new instance in order not to corrupt the metrics.
We recommend that you get in touch with your Dataiku Customer Success Manager before such a procedure.
In any case, the path would be “duplicate the instance, migrate ports and DATA_DIR, upgrade the new instance” (copying DATA_DIR between DSS instances of distinct versions is not supported).
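The port constraint for a clone on the same machine can be checked with simple arithmetic. The sketch below assumes each instance uses its base port plus the nine ports after it, as described above. The helper names port_block and blocks_overlap and the example port numbers are hypothetical.

```shell
# Sketch: reason about the 10-port blocks used by two DSS instances.
# port_block and blocks_overlap are illustrative helpers, not DSS commands.

# Print the 10 consecutive ports that start at the given base port.
port_block() {
    local base="$1"
    seq "$base" "$((base + 9))"
}

# Succeed (exit 0) if the two 10-port blocks share at least one port.
blocks_overlap() {
    local a="$1" b="$2"
    [ "$a" -le "$((b + 9))" ] && [ "$b" -le "$((a + 9))" ]
}
```

For example, blocks_overlap 11000 11005 succeeds because ports 11005 to 11009 are shared, while blocks_overlap 11000 11010 fails, so those two base ports could coexist on one machine. This checks only the arithmetic, not whether other services are already listening on those ports.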
Before upgrading, it is strongly recommended to back up the whole content of the data directory.
Stop the old version of DSS
Unpack the distribution tarball in the location you have chosen for the new installation directory.
cd SOMEDIR
tar xzf /PATH/TO/dataiku-dss-NEWVERSION.tar.gz
# This creates installation directory SOMEDIR/dataiku-dss-NEWVERSION for the new version
dataiku-dss-NEWVERSION/installer.sh -d DATA_DIR -u
As with a normal installation, DSS will check for missing system dependencies, and ask you to run a dependencies installation command with superuser privileges if needed.
DSS will ask you to confirm migration of the existing data directory.
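The stop, unpack, and upgrade steps above can be sketched as a single function. This is an illustration under stated assumptions, not an official script: upgrade_dss, run_step, and the DRY_RUN variable are hypothetical names, and stopping DSS through DATA_DIR/bin/dss stop is assumed rather than taken from this page. By default the sketch only prints the commands. It does not cover the dependency check or the migration confirmation prompt, which remain interactive.

```shell
# Sketch: the upgrade sequence as one function.
# upgrade_dss, run_step and DRY_RUN are illustrative names, not DSS tooling.

# Print the command when DRY_RUN is 1 (the default), run it otherwise.
run_step() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

upgrade_dss() {
    local data_dir="$1"       # DATA_DIR of the instance to upgrade
    local tarball="$2"        # /PATH/TO/dataiku-dss-NEWVERSION.tar.gz
    local install_root="$3"   # SOMEDIR, parent of the new installation directory
    local version_dir

    # The tarball unpacks into a directory named after itself
    version_dir="${install_root}/$(basename "$tarball" .tar.gz)"

    # Assumption: the old version is stopped with the dss script in DATA_DIR
    run_step "${data_dir}/bin/dss" stop
    run_step tar xzf "$tarball" -C "$install_root"
    run_step "${version_dir}/installer.sh" -d "$data_dir" -u
}
```

Running upgrade_dss DATA_DIR /PATH/TO/dataiku-dss-NEWVERSION.tar.gz SOMEDIR prints the three commands for review. Setting DRY_RUN=0 executes them instead, which should only be done after the backup has been taken.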
If R installation has been performed (see: R integration), you must run the “install-R-integration” step again after the upgrade.
If graphics exports have been enabled, you must replay the same installation procedure.
If you used standalone libraries for Hadoop and/or Spark, you need to rerun the corresponding install procedure.
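As a reminder aid, the sketch below prints the commands to rerun for the integrations that were enabled before the upgrade. It assumes that these steps are invoked through DATA_DIR/bin/dssadmin and that the graphics export step is named install-graphics-export. Check both against the corresponding integration pages. The function post_upgrade_steps and its feature keywords are hypothetical, and the standalone Hadoop and Spark procedures are not covered because they take installation-specific arguments.

```shell
# Sketch: list the integration steps to replay after an upgrade.
# post_upgrade_steps and the feature keywords are illustrative only.
post_upgrade_steps() {
    local data_dir="$1"   # DATA_DIR of the upgraded instance
    shift
    local feature
    for feature in "$@"; do
        case "$feature" in
            # Assumption: both steps are dssadmin subcommands
            r)        echo "${data_dir}/bin/dssadmin install-R-integration" ;;
            graphics) echo "${data_dir}/bin/dssadmin install-graphics-export" ;;
            *)        echo "unknown feature: ${feature}" >&2; return 1 ;;
        esac
    done
}
```

For example, post_upgrade_steps DATA_DIR r graphics prints one command per enabled integration. The function only prints and does not run anything.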
For some major upgrades, you may need to rebuild the code environments that you already have. The reason is that core dependencies may have been updated, and DSS may not be compatible with the old core dependencies anymore.
If you are using code environments with containerized execution, you will need to update all your code environments and make sure that all your code env images have been rebuilt for the selected container configurations.
For more details, please check the release notes of your version.
For some major upgrades, you may need to retrain some of the machine learning models.
Note that in these cases, the packages deployed in an API node also need to be regenerated on DSS and redeployed on the API node.
For more details, please check the release notes of your version.