Concepts

What is a Code Studio

A Code Studio is a personal space for running a web-based IDE, and optionally one or several web applications.

Code Studios run in Kubernetes and require Elastic AI computation to be set up.

Some of the capabilities made possible by Code Studios include:

  • Editing and debugging Python recipes in Visual Studio Code

  • Debugging Python code in JupyterLab

  • Developing custom web applications using the Streamlit framework

  • Editing and debugging R recipes in RStudio Server

Within each Code Studio instance, you have full access to the terminal and can install any package, perform any action, save your IDE preferences, and so on.

Each Code Studio can be started and stopped.

Each Code Studio is a separate container and has its own filesystem. It cannot access the DSS host filesystem.

Note

In Dataiku Online, a space-admin needs to activate the “Code-Studio” feature in the Launchpad (extension tab > add an extension). Once activated, the feature is ready to use with no additional requirements.

Code Studio templates

Before you can run a Code Studio, the DSS administrator must set up a Code Studio template.

Each template provides a specific development environment and optional additional dependencies.

For example, you could have:

  • one template providing RStudio Server

  • another template containing the Visual Studio Code IDE and the Streamlit framework for developing advanced visualizations

Users can then spawn personal instances of these development environments from the templates. These instances are called Code Studios. Continuing the example, these Code Studios can then be used to edit an existing R recipe in DSS, or to develop a Streamlit webapp using Visual Studio Code.

The template consists of blocks that define what will be available in the Code Studio.

Synchronized files

The Code Studios run separately from DSS, in the Kubernetes cluster.

Some files are therefore synchronized between DSS and the Code Studio, so that recipes, project libraries, and similar files are available within each Code Studio.

Files are:

  • synchronized from DSS to Code Studio when the Code Studio starts

  • synchronized from Code Studio to DSS when the Code Studio stops

  • synchronized both ways (with conflict detection) when clicking the “Synchronize files” button in the UI of the Code Studio
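To make "both ways with conflict detection" concrete, here is a minimal conceptual sketch of a three-way comparison for a single file. It is not DSS's actual algorithm, which is not described here; the `Action` enum, the `decide` function, and the idea of comparing content fingerprints against the last synchronized state are all assumptions made for illustration.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical outcomes for one file during a two-way synchronization.
    NONE = "nothing to do"
    PUSH_TO_DSS = "copy the Code Studio version to DSS"
    PULL_FROM_DSS = "copy the DSS version to the Code Studio"
    CONFLICT = "both sides changed since the last synchronization"


def decide(base: str, dss: str, studio: str) -> Action:
    """Three-way comparison of content fingerprints for one file.

    base   -- fingerprint recorded at the last synchronization
    dss    -- current fingerprint on the DSS side
    studio -- current fingerprint inside the Code Studio
    """
    if dss == studio:
        return Action.NONE           # identical content, whatever happened before
    if dss == base:
        return Action.PUSH_TO_DSS    # only the Code Studio side changed
    if studio == base:
        return Action.PULL_FROM_DSS  # only the DSS side changed
    return Action.CONFLICT           # both changed, and differently


print(decide("v1", "v1", "v2").name)  # PUSH_TO_DSS
print(decide("v1", "v3", "v2").name)  # CONFLICT
```

The point of the sketch is that a conflict only arises when both sides diverged from the last common state; a change on one side alone can be propagated safely.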

Each type of file is synchronized to a particular location in the Code Studio. This location can be overridden in the template settings.

All versioned files are under the control of the DSS instance’s git, so it’s recommended to avoid putting large files or binary files in versioned areas. Instead, large files should preferably go into non-versioned (resources) areas.
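As a practical aid for this recommendation, the following sketch scans a directory for files above a size threshold, so that they can be moved to a resources area before they end up in git. The function name `find_large_files` and the 1 MB default threshold are arbitrary choices for illustration, not limits defined by DSS.

```python
from pathlib import Path


def find_large_files(root, max_bytes=1_000_000):
    """Return (path, size) pairs for regular files under root larger than max_bytes.

    The threshold is an illustrative assumption; pick one that suits your project.
    """
    root = Path(root)
    if not root.is_dir():
        return []  # directory absent: nothing to report
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            if size > max_bytes:
                hits.append((path, size))
    return hits
```

Run inside a Code Studio, the function would be called with the location of a versioned area, and any file it reports is a candidate for a non-versioned location instead.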

Project libraries

The usual Project libraries (see Reusing Python Code and Reusing R Code) are available to all Code Studios of the project. Project libraries are versioned in the version control of the project.

Project libraries can be edited outside of a Code Studio through the “Libraries > Libraries” menu.

Project libraries are available at /home/dataiku/workspace/project-lib-versioned in the Code Studio by default.

Note

Project libraries can also contain non-Python and non-R files, such as stylesheets or small static files. For large files, use Project resources instead.

Project resources

Project resources are non-versioned files that are global to a project, and available to all Code Studios of the project. Project resources are useful for storing artifacts that may be used by several Code Studios in the project, and that should not be versioned (usually because they are large), such as images.

Project resources are available at /home/dataiku/workspace/project-lib-resources in the Code Studio by default.

Code Recipes

The code of code recipes (Python, R, SQL, Scala) is available to all Code Studios of the project, with one file per recipe.

The code is exactly what you see when editing a given recipe in the DSS UI, for example after opening it from the Flow.

Code Recipes are available at /home/dataiku/workspace/recipes in the Code Studio by default.
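Since there is one file per recipe, a quick inventory from inside a Code Studio can be sketched as below. The default directory comes from the text above; the helper name `list_recipe_files` is hypothetical, and because the file naming scheme is not specified here, the sketch simply groups files by whatever extension it finds.

```python
from pathlib import Path

# Default location of synchronized recipe code, per the text above.
RECIPES_DIR = Path("/home/dataiku/workspace/recipes")


def list_recipe_files(recipes_dir=RECIPES_DIR):
    """Return a dict mapping file extension to a sorted list of file names."""
    recipes_dir = Path(recipes_dir)
    groups = {}
    if not recipes_dir.is_dir():
        return groups  # not running in a Code Studio, or location overridden
    for path in sorted(recipes_dir.iterdir()):
        if path.is_file():
            key = path.suffix or "(no extension)"
            groups.setdefault(key, []).append(path.name)
    return groups


for extension, names in list_recipe_files().items():
    print(extension, names)
```

If the template overrides the recipes location, pass the overridden directory as the argument.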

Code Studio versioned files

These files are specific to each Code Studio and are not shared between Code Studios.

These files should be used for Code Studios that define an application (such as a Streamlit Application), for the code of the application itself. For example, the default “Streamlit” block for templates puts the code of the application there.

Code Studio versioned files can be edited in DSS, in “Files > Versioned” in the UI of the Code Studio.

Code Studio versioned files are available at /home/dataiku/workspace/code_studio-versioned in the Code Studio by default.

Code Studio resource files

These files are specific to each Code Studio and are not shared between Code Studios.

These files should be used for Code Studios that define an application (such as a Streamlit Application), for storing artifacts that are needed by the Code Studio and that should not be versioned (usually because they are large), such as images.

Code Studio resource files can be edited in DSS, in “Files > Resources” in the UI of the Code Studio.

Code Studio resource files are available at /home/dataiku/workspace/code_studio-resources in the Code Studio by default.

User config files

These files are shared across all Code Studios of a user and are not shared between users.

They are useful for storing user settings.

In the built-in blocks for Visual Studio Code, RStudio Server and JupyterLab, this folder is used to store the IDE configuration.

User config files are available at /home/dataiku/workspace/user-versioned in the Code Studio by default.

User resource files

These files are shared across all Code Studios of a user and are not shared between users. They are not versioned.

They are useful for storing user artifacts that should not be versioned, such as plugins or tools.

User resource files are available at /home/dataiku/workspace/user-resources in the Code Studio by default.
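The seven default locations described above can be summarized in one small table. The sketch below only restates the paths and sharing scopes given in this section; the names `DEFAULT_LOCATIONS` and `locations_for` are illustrative, and the actual paths may differ if they are overridden in the template settings.

```python
from pathlib import PurePosixPath

# Default workspace root inside the container, as given in the text above.
WORKSPACE = PurePosixPath("/home/dataiku/workspace")

# (label, directory name, sharing scope)
DEFAULT_LOCATIONS = [
    ("Project libraries", "project-lib-versioned", "project"),
    ("Project resources", "project-lib-resources", "project"),
    ("Code recipes", "recipes", "project"),
    ("Code Studio versioned files", "code_studio-versioned", "code studio"),
    ("Code Studio resource files", "code_studio-resources", "code studio"),
    ("User config files", "user-versioned", "user"),
    ("User resource files", "user-resources", "user"),
]


def locations_for(scope):
    """Return the default paths (as strings) shared at the given scope."""
    return [str(WORKSPACE / name) for _, name, s in DEFAULT_LOCATIONS if s == scope]


for label, name, scope in DEFAULT_LOCATIONS:
    print(f"{label:<28} {scope:<12} {WORKSPACE / name}")
```

Here "project" means shared by all Code Studios of the project, "code studio" means specific to one Code Studio, and "user" means shared across all Code Studios of one user.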

Requirements

Note

We recommend using Dataiku Cloud Stacks or Dataiku Online, which fulfill all Code Studios requirements out of the box.

In order to use Code Studios:

  • Elastic AI computation must be fully set up (which includes having a containerized execution config, having the ability to build images and push them to the registry, and having a cluster defined)

  • Your kubectl version should be at least 1.23.
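The version requirement can be checked with a small comparison like the one below. It is a sketch: the helper names are hypothetical, and it assumes you have already obtained a version string such as "v1.27.3" (for example from the output of `kubectl version --client`, whose exact format varies between releases).

```python
import re

# Minimum kubectl version stated in the requirements above.
MINIMUM = (1, 23)


def parse_major_minor(version: str):
    """Extract (major, minor) as integers from strings such as 'v1.27.3' or '1.23'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version: str, minimum=MINIMUM) -> bool:
    """True if the version is at least the minimum (compared numerically, not as text)."""
    return parse_major_minor(version) >= minimum


print(meets_minimum("v1.27.3"))  # True
print(meets_minimum("v1.9.0"))   # False
```

Comparing integer tuples rather than strings matters here: as text, "1.9" would sort after "1.23", which would give the wrong answer.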

Technical details

At its core, a Code Studio is a Kubernetes pod running an HTTP server in a Kubernetes cluster. DSS starts the pod, transfers files between the DSS instance and the container inside the pod, and then forwards requests made in the DSS UI to the HTTP server.

To start and connect to a pod in a Kubernetes cluster, DSS must:

  • prepare container images

  • prepare Kubernetes resource definitions to start a pod in the cluster

  • identify ports on which the pod serves its app

All this is defined within the Code Studio template. Additionally, the template defines which files from the DSS filesystem are shared with the container in the cluster.

Once a Code Studio template has been prepared and made available to some users, they can start creating Code Studios to run the application(s) defined in that template, and access them via DSS. Here, an “application” is anything served by an HTTP server, and the runtime hosts that HTTP server. Each Code Studio spawns some Kubernetes resources, typically a deployment, which results in a pod somewhere in the cluster running the application(s). DSS then synchronizes the files from the DSS filesystem to the container in the pod, as defined in the template, and starts the web server in the container.

Once the Code Studio is running, the user can connect to all applications (that is, their HTTP servers) running in the container, use them as appropriate to edit files or perform analyses, and finally synchronize modified files back from the container to the DSS filesystem.