MapR¶
Warning
Support for MapR has been removed. We recommend that users plan a migration toward a Kubernetes-based infrastructure.
DSS used to support MapR clusters with the following versions:
MapR Core Components: versions 5.2.0 to 6.1.0
MapR Ecosystem Pack (MEP): versions 3.0.x, and 4.1.x to 6.0.0
Security¶
Connecting to MapR clusters secured with MapR security is supported through a custom installation sequence described below.
User isolation is not supported.
Connecting to secure MapR clusters¶
DSS can connect to secure MapR clusters through a permanent service ticket, issued ahead of time by a cluster administrator, and accessed through the MAPR_TICKETFILE_LOCATION environment variable.
The installation sequence thus becomes the following:
Open a shell session to a cluster administrator account (typically mapr).
Create a permanent service ticket for the service account used by DSS as follows:
maprlogin generateticket -type service -user DSS_USER -out DSS_TICKET_FILE
This creates a permanent service ticket (default duration 10 000 years). You can further adjust this with options to maprlogin.
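For example, to issue a shorter-lived ticket instead (assuming your maprlogin version supports the -duration option, expressed as days:hours:minutes), you could run:
maprlogin generateticket -type service -user DSS_USER -out DSS_TICKET_FILE -duration 365:0:0
Note that with a shorter duration the ticket must be regenerated before it expires, or DSS will lose access to the cluster.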
You can check the service ticket generated with:
maprlogin print -ticketfile DSS_TICKET_FILE
Store this ticket file in a location accessible to, and private to, the DSS service account.
Define the following environment variable in the persistent session initialization file for the DSS service account (.bash_profile, .profile or equivalent):
export MAPR_TICKETFILE_LOCATION=/ABSOLUTE/PATH/TO/DSS_TICKET_FILE
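To check that the variable is picked up in a fresh session, you can, for example, log in as the DSS service account and print the ticket it points to:
maprlogin print -ticketfile "$MAPR_TICKETFILE_LOCATION"
This should display the same ticket information as in the previous step.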
Switch to the DSS service account, and run the DSS installation or upgrade command as usual:
/PATH/TO/dataiku-dss-VERSION/installer.sh ARGS ...
This script will detect that the cluster is secure, and warn you that it will not automatically run the Hadoop integration step.
Run the install-hadoop-integration command with no arguments:
/PATH/TO/DSS_DATADIR/bin/dssadmin install-hadoop-integration
This script will warn you that you did not specify a Kerberos principal and keytab even though the cluster is secure. Press <Enter> to confirm.
The Hadoop integration step should proceed without errors, using the ticket file to authenticate to the cluster.
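As an optional sanity check, you can verify from the DSS service account that the ticket grants access to the cluster filesystem, for example:
hadoop fs -ls /
If this lists the top-level MapR-FS directories without authentication errors, the ticket is being used correctly.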
You can then run the Spark and/or R integration steps using the standard procedures, as needed.
Start DSS and connect to the user interface using an administrator account:
/PATH/TO/DSS_DATADIR/bin/dss start
Complete the installation by configuring HiveServer2 and optionally Impala connection parameters as suitable for your cluster.
If using the default HiveServer2 authentication mode for secure MapR (MapR-SASL), the HiveServer2 connection parameters should be:
Principal: leave empty
Extra URL: auth=maprsasl;saslQop=auth-conf
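For reference, with these settings the resulting HiveServer2 JDBC URL typically has the following form (HIVESERVER2_HOST and the port are placeholders for your cluster, and the exact URL may differ):
jdbc:hive2://HIVESERVER2_HOST:10000/default;auth=maprsasl;saslQop=auth-conf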
Limitations¶
Using S3 as a Hadoop filesystem (see Hadoop filesystems connections (HDFS, S3, EMRFS, WASB, ADLS, GS)) is not supported
Validation of Hive recipes with “UNION” or “UNION ALL” statements is not possible