Using managed EKS clusters

Initial Setup

Install the EKS plugin

Install the EKS plugin from the plugin store.

Prepare your local aws, docker and kubectl commands

Follow the AWS documentation to make sure that, on the DSS machine:

  • The aws command has credentials that give it write access to ECR and full control over EKS
  • The kubectl command is installed
  • The docker command is installed and can build images
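
As a quick sanity check of the list above, the following minimal sketch reports which of the three commands cannot be found on the PATH of the user running it. It only tests that the commands are present. It does not verify the aws credentials or that docker can actually build images. The function name find_missing_commands is an illustrative assumption, not part of DSS or the plugin.

```python
import shutil

# The three commands this section expects on the DSS machine.
REQUIRED_COMMANDS = ("aws", "kubectl", "docker")


def find_missing_commands(commands=REQUIRED_COMMANDS):
    """Return the commands that cannot be found on the current PATH."""
    return [name for name in commands if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_commands()
    if missing:
        print("Missing commands: " + ", ".join(missing))
    else:
        print("aws, kubectl and docker are all on the PATH")
```

Run it as the same system user that runs DSS, since the PATH may differ between users.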

Create base images

Build the base image as indicated in Setting up (Kubernetes).

Create a new execution config

In Administration > Settings > Containerized execution, add a new execution config of type “Kubernetes”.

  • The image registry URL is the one given by aws ecr describe-repositories, without the image name. It typically looks like XXXXXXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/PREFIX, where XXXXXXXXXXXX is your AWS account ID, us-east-1 is the AWS region of the repository and PREFIX is an optional prefix used to organize your repositories.
  • Set “Image pre-push hook” to “Enable push to ECR”
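
To illustrate what "without the image name" means for the registry URL, here is a small sketch that strips the last path segment from a repository URI such as the repositoryUri field reported by aws ecr describe-repositories. It assumes the image name is the final segment of the URI. The function name and the dss-base image name in the usage note are hypothetical.

```python
def registry_url_from_repository_uri(repository_uri: str) -> str:
    """Drop the trailing image name from an ECR repository URI.

    Assumption: the image name is the last "/"-separated segment.
    """
    registry, separator, image_name = repository_uri.rpartition("/")
    if not separator or not registry or not image_name:
        raise ValueError(
            f"Not a repository URI with an image name: {repository_uri!r}"
        )
    return registry
```

For example, passing 123456789012.dkr.ecr.us-east-1.amazonaws.com/PREFIX/dss-base gives 123456789012.dkr.ecr.us-east-1.amazonaws.com/PREFIX, and a URI without a prefix gives the bare registry host.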

Cluster configuration

Connection

The connection is where you define how to connect to AWS. Dataiku recommends that you leave this empty, and instead use AWS credentials in ~/.aws/credentials (i.e. where the aws command finds them).

The connection can be defined either inline in each cluster (not recommended), or as a preset in the plugin’s settings (recommended).
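
If you rely on ~/.aws/credentials as recommended, the sketch below checks that the text of such a file contains a profile with the two standard static keys, aws_access_key_id and aws_secret_access_key. It is an illustration only. The function name is hypothetical, the default profile is an assumption, and the aws command can also obtain credentials by other means that this check does not cover.

```python
import configparser

# Standard key names used in the AWS shared credentials file.
REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def profile_has_static_keys(credentials_text, profile="default"):
    """Return True if the given profile defines both static keys."""
    # interpolation=None keeps "%" characters in values from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    if not parser.has_section(profile):
        return False
    return all(parser.has_option(profile, key) for key in REQUIRED_KEYS)
```

To use it, read the file of the system user that runs DSS, for instance with pathlib.Path.home() / ".aws" / "credentials", and pass its text to the function.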

Network settings

EKS requires two subnets in the same VPC. Your AWS administrator needs to provide you with two subnet identifiers. We strongly recommend that these subnets reside in the same VPC as the DSS host. Otherwise, you will need to manually set up peering and routing between the VPCs.

Additionally, you need to indicate security group IDs. These security groups will be associated with the EKS cluster nodes. The networking requirement is that the DSS machine must accept all inbound connections from the EKS cluster nodes. We recommend that you use the default security group.

The network settings can be defined either inline in each cluster (not recommended), or as a preset in the plugin’s settings (recommended).
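
The subnet rules described in this section can be sketched as a small check. It takes subnet descriptions shaped like the entries of the Subnets list returned by the EC2 DescribeSubnets call (only the VpcId key is used) and, optionally, the VPC ID of the DSS host. The function name check_subnets and the message wording are illustrative assumptions. The DSS host VPC test reflects a recommendation from this page, not a hard requirement.

```python
def check_subnets(subnets, dss_vpc_id=None):
    """Return a list of problems; an empty list means the subnets look fine."""
    problems = []
    if len(subnets) < 2:
        problems.append("EKS requires two subnets, got %d" % len(subnets))

    # Collect the distinct VPCs the subnets belong to.
    vpc_ids = sorted({subnet["VpcId"] for subnet in subnets})
    if len(vpc_ids) > 1:
        problems.append("subnets span several VPCs: " + ", ".join(vpc_ids))

    # Recommendation only: the subnets should live in the DSS host's VPC.
    if dss_vpc_id is not None and vpc_ids and vpc_ids != [dss_vpc_id]:
        problems.append(
            "subnets are not all in the DSS host VPC %s "
            "(peering and routing would be needed)" % dss_vpc_id
        )
    return problems
```

An empty result means the two subnets share one VPC and, when dss_vpc_id is given, that this VPC is the one hosting DSS.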

Cluster nodes

This is where you define the number and type of nodes in the cluster.