Installing Python packages
Any Python package can be used in DSS. There is no restriction on which packages can be installed and used.
The recommended way to install your own Python packages is to install them in a code environment.
Additional prerequisites
Some Python packages may require additional system dependencies if they include native code. In particular, you may need to install system development tools, the development package for the Python interpreter itself, and additional development libraries.
If you get an error when installing a Python package, please refer to code environment troubleshooting.
Installing in a specific code environment (recommended)
Please see Operations (Python).
Installing in the built-in DSS environment (not recommended)
In addition to user-controlled code environments, DSS has its own built-in Python virtual environment, dedicated to running the Python components of DSS itself. It is possible, although not recommended, to install your own packages in that built-in environment.
Installing packages in the built-in environment requires shell access on the host running DSS and can only be performed by DSS administrators.
In Dataiku Cloud, the built-in environment is managed for you, so it is not possible to install your own packages in it. Please use a code environment to install non-default packages.
Warning
Please pay attention to the following notes:
The built-in Python environment uses the Python virtualenv mechanism. Importantly, this means that to install packages in the built-in environment you must NOT use your system's pip or python commands; use the pip and python commands of the DSS virtualenv (for example DATA_DIR/bin/pip) instead.
The built-in Python environment uses Python 2.7, 3.6 or 3.7, as chosen at installation time. If you require another version of Python, please use a code environment.
A number of packages are preinstalled in the built-in environment. Modifying the version of these packages is not supported and may cause DSS to stop functioning. Notably, you must not change the version of the pandas, numpy and scikit-learn packages in the built-in environment. We upgrade these dependencies of DSS when releasing a new version, after they have been properly qualified and we have made sure everything works together.
The additional Python packages installed by DATA_DIR/bin/pip or added to DATA_DIR/lib/python are preserved by DSS upgrades.
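The two rules above (use the virtualenv's own pip, and leave pandas, numpy and scikit-learn untouched) can be combined in a small shell helper. This is a minimal sketch, not a tool provided by DSS: the function name dss_safe_install is hypothetical, and it assumes DATA_DIR/bin/pip is the DSS virtualenv pip described above. It records the currently installed versions of the three packages with pip freeze and passes them to pip as a constraints file (-c), so that pip does not install any other version of them.

```shell
# Hypothetical helper, not part of DSS: install packages into the built-in
# environment using the DSS virtualenv's own pip (never the system pip),
# while keeping pandas, numpy and scikit-learn at their current versions.
# Usage: dss_safe_install DATA_DIR package_name [other_package ...]
dss_safe_install() {
  data_dir=$1
  shift
  dss_pip="$data_dir/bin/pip"

  # Refuse to fall back to whatever pip happens to be on the PATH.
  if [ ! -x "$dss_pip" ]; then
    echo "error: $dss_pip not found or not executable" >&2
    return 1
  fi

  constraints=$(mktemp) || return 1
  # Record the exact installed versions (e.g. "numpy==1.16.6") of the
  # packages DSS depends on.
  "$dss_pip" freeze | grep -i -E '^(pandas|numpy|scikit-learn)==' > "$constraints"

  # The constraints file (-c) stops pip from installing other versions
  # of those packages while it resolves the new ones.
  "$dss_pip" install -c "$constraints" "$@"
  status=$?

  rm -f "$constraints"
  return "$status"
}
```

For example: dss_safe_install /path/to/DATA_DIR package_name. If pip reports a conflict with one of the pinned versions, the package needs a different version of a DSS dependency and should be installed in a code environment instead.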
There are three kinds of installation:
Python packages installed by pip;
Python packages installed by the python setup.py install command;
custom Python packages, which do not actually need to be installed but only copied into the DSS Python libraries folder.
Python packages available through pip
Many Python packages can be installed by pip, the Python package installer. This is the easiest and recommended way to install Python packages.
First, open a terminal and go to the DSS data directory. To use the DSS pip, you must use the bin/pip command. For instance, to list the Python packages currently available in DSS, run:
cd DATA_DIR
./bin/pip list
And to install a package:
cd DATA_DIR
./bin/pip install package_name
If everything went well, the output of the command should end with something like:
Successfully installed package_name
Cleaning up...
Installing without Internet access
Here is the standard way to install a pip Python package on a server with no internet access. See also the documentation about the format of requirements.txt.
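One common pip-based way to do this is sketched here under assumptions; it may differ in detail from the standard procedure this page refers to. The packages listed in a requirements.txt file are downloaded on a machine that has Internet access, copied to the DSS host, and installed there from the local directory only. The function names download_packages and install_offline are hypothetical; pip download -d, pip install --no-index and pip install --find-links are standard pip options.

```shell
# Step 1 (hypothetical helper), on a machine WITH Internet access.
# Use a pip whose Python version, OS and CPU architecture match the DSS
# host, so that compatible distribution files are fetched.
# Downloads the listed packages and all their dependencies into dest_dir.
download_packages() {
  req_file=$1
  dest_dir=$2
  pip download -r "$req_file" -d "$dest_dir"
}

# Step 2 (hypothetical helper), on the DSS host WITHOUT Internet access,
# after copying req_file and the downloaded directory over.
# --no-index stops pip from contacting PyPI; --find-links points it at
# the local directory instead.
install_offline() {
  data_dir=$1
  req_file=$2
  pkg_dir=$3
  "$data_dir/bin/pip" install --no-index --find-links "$pkg_dir" -r "$req_file"
}
```

For example, run download_packages requirements.txt packages on the connected machine, copy requirements.txt and the packages directory to the DSS host, then run install_offline /path/to/DATA_DIR requirements.txt packages.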
Installing a “python setup.py install” package
Some packages are not available through pip, and must be installed from source with the python setup.py install command.
Here’s how you can proceed in such a case. First open a terminal and go to the DSS data directory.
Then, instead of running python setup.py install, run the following command:
cd DATA_DIR
./bin/pip install -e package_directory
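where package_directory is the path to the package source directory, which contains the setup.py file. The -e (editable) option links the installed package to this directory instead of copying it, so the directory must not be moved or deleted afterwards.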
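Because pip expects the directory containing setup.py rather than the file itself, the command can be wrapped in a small check. This is a minimal sketch: install_from_source is a hypothetical name, and the function only runs the ./bin/pip install -e command shown above after verifying the directory.

```shell
# Hypothetical helper: install a source package into the built-in DSS
# environment, as a replacement for "python setup.py install".
# Usage: install_from_source DATA_DIR package_directory
install_from_source() {
  data_dir=$1
  pkg_dir=$2
  # package_directory must be the folder that contains setup.py.
  if [ ! -f "$pkg_dir/setup.py" ]; then
    echo "error: no setup.py in $pkg_dir" >&2
    return 1
  fi
  "$data_dir/bin/pip" install -e "$pkg_dir"
}
```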
Installing custom Python packages
Note
This will make your custom Python packages globally available for all Python code running with DSS.
If you have custom Python code, for instance a module with user-defined functions and classes, you can copy it into the lib/python subdirectory of the DSS data directory. You will then be able to import it in all Python recipes and notebooks within DSS.
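As an illustration, the copy can be scripted as follows. This is a sketch under assumptions: add_custom_module and my_utils.py are hypothetical names, and DATA_DIR stands for your DSS data directory.

```shell
# Hypothetical helper: copy a custom module (a .py file) or package
# (a directory) into DATA_DIR/lib/python, from where DSS Python recipes
# and notebooks can import it.
# Usage: add_custom_module DATA_DIR path/to/my_utils.py
add_custom_module() {
  data_dir=$1
  src=$2
  lib_dir="$data_dir/lib/python"
  mkdir -p "$lib_dir" || return 1
  # -R copies package directories recursively; plain files are copied as-is.
  cp -R "$src" "$lib_dir/"
}
```

A recipe or notebook can then use import my_utils. On Python 2.7, a package directory must contain an __init__.py file to be importable. Python caches imported modules, so a notebook kernel that has already imported my_utils keeps the old version until it is restarted.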