SharePoint Online

Note

For documentation regarding the SharePoint Online Plugin, go to the section below. The following explanation relates to the SharePoint connection type natively available in DSS, which should serve most integration needs (unless otherwise indicated).

Dataiku can interact with SharePoint Online to:

  • read and write datasets based on SharePoint Online lists

  • read and write datasets based on SharePoint Online stored documents

  • read and write managed folders

There are two types of SharePoint objects that can be leveraged: document libraries and lists.

To interact with a document, you will need to know its SharePoint site, the drive in which it is stored, and its path within this drive.

Dataiku uses the same filesystem-like mechanism when accessing SharePoint Online: when you specify a site and drive, you can browse it to quickly find your data, or you can set the prefix in which Dataiku may output datasets. Datasets on SharePoint thus must be in one of the supported filesystem formats.

To interact with SharePoint lists, you will need to know the site, and the name of the list.
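If you are unsure which sites, drives, and lists exist, one way to check outside of DSS is to query Microsoft Graph directly. The sketch below only builds the request URLs. It assumes the Microsoft Graph v1.0 REST API and an access token obtained separately; the function names are illustrative and are not part of Dataiku.

```python
from urllib.parse import quote

# Assumption: Microsoft Graph v1.0 is queried directly, outside of DSS.
GRAPH_ROOT = "https://graph.microsoft.com/v1.0"


def site_lookup_url(hostname: str, site_name: str) -> str:
    """URL that resolves a site from its hostname and its name under /sites/."""
    return f"{GRAPH_ROOT}/sites/{hostname}:/sites/{quote(site_name)}"


def drives_url(site_id: str) -> str:
    """URL that lists the drives (document libraries) of a site."""
    return f"{GRAPH_ROOT}/sites/{site_id}/drives"


def lists_url(site_id: str) -> str:
    """URL that lists the SharePoint lists of a site."""
    return f"{GRAPH_ROOT}/sites/{site_id}/lists"
```

The site id passed to drives_url and lists_url is the id field returned by the site lookup call.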

Warning

SharePoint list views are not supported by the native connector. If you need to read data from a view, please use the SharePoint Online plugin instead.

Creating a SharePoint Online connection

Creating a SharePoint Online connection gives you a way to authenticate and get data from your SharePoint instance/tenant. Data in SharePoint is arranged in one or more sites on the tenant.

To connect, you will also need to create an App on the Azure portal (an “App registration”) and set up a means to authenticate with it, either via a client secret or using a certificate. Information on how to set up connections with each authentication option, and on their different security implications, follows.

Note that the APIs used by the native SharePoint connector are different from those used by the SharePoint Online plugin. So if you already have an App that connects your Dataiku instance through the plugin, you will need either to create a new one or to update the configuration of the existing one very carefully. Adding a new one is usually much easier to manage.

Connecting using a Certificate

For this method we configure the App on the Azure portal with a certificate, and the corresponding private key is used by DSS to authenticate (using mutual TLS authentication), to create an OAuth2 connection. This is the most secure and recommended method to use.

There are two modes you can use for OAuth2 here, which have different security implications:

  1. Per-user mode - a classic OAuth2 flow (Authorization Code Grant) is used to authenticate each DSS end-user who uses the connection. When the user authenticates, they are redirected from DSS to Azure/Sharepoint and back to establish they have access. This means each user directly using the connection must have a Sharepoint account. The level of access to Sharepoint available to the connection reflects what the user can access in their Sharepoint account - so access control is taken care of mostly automatically. However, bear in mind that when a user imports data into a DSS project with this connection, other users with access to the project will normally still be able to see the data - so the security model on the DSS side still needs to be considered carefully.

  2. Global mode - the access to Sharepoint is authenticated at the level of the Azure App (with an OAuth2 Client Credentials grant). This means the access to Sharepoint available to all users of the connection in DSS reflects the permissions you give the App in Azure. We advise on the permissions to use in the procedure below. Note that this gives an equal level of access to ALL sites in the tenant.

Steps

  • get your certificate and private key ready. If you do not already have one, it can be created for instance by using openssl: openssl req -newkey rsa:2048 -new -nodes -x509 -days 3650 -keyout private_key.pem -out certificate.pem

  • create a new App registration (Azure Portal > Microsoft Entra ID > App registrations)

  • in the Overview tab, note the Application (client) ID and the Directory (tenant) ID

  • (Per-user mode only) in the Authentication tab, add a Redirect URI:

    • choose Web Applications > Web

    • add a redirect URI of DSS_BASE_URL/dip/api/oauth2-callback

Note

For example if Dataiku is accessed at https://dss.mycompany.corp/, the OAuth2 redirect URL is https://dss.mycompany.corp/dip/api/oauth2-callback

  • in Certificates & secrets, upload your certificate (or the certificate.pem created in the first step). Copy the certificate’s thumbprint - if the UI does not allow you to copy the whole thumbprint, run openssl x509 -noout -fingerprint -sha1 -in certificate.pem | cut -d '=' -f2 | sed 's/://g' to get this value.

  • navigate to the API permissions tab and add the appropriate permissions.

    • For Per-User mode, click + Add a permission > Microsoft Graph > Delegated permissions. We advise adding openid, Files.ReadWrite.All, Sites.ReadWrite.All and User.Read.

    • For Global mode, click + Add a permission > Microsoft Graph > Application permissions. If you just need read access, use Sites.Read.All; if you need to write to files but not lists, use Sites.ReadWrite.All; and if you need to write to lists, use Sites.Manage.All. To write to lists without Sites.Manage.All, the option “Truncate instead of delete” can be used for the List datasets in DSS. Note that this gives access to ALL sites in your SharePoint instance.

  • create a new SharePoint Online connection in Dataiku

  • choose “Certificate-Based” as Auth type

  • fill the “Tenant id”, “App id” and “Thumbprint” fields with the fields you noted earlier in the Azure App

  • open the private key file (or private_key.pem created earlier) in a text editor and copy/paste its content into the Client certificate (private key) input box. The section to copy starts and ends with “-----BEGIN PRIVATE KEY-----” / “-----END PRIVATE KEY-----”.

  • “Authorization endpoint” (Per-user mode only) should be “https://login.microsoftonline.com/<<your tenant id>>/oauth2/v2.0/authorize” for single tenant apps, where <<your tenant id>> must be replaced with your tenant’s guid. In the special case of multi-tenant apps, the endpoint is “https://login.microsoftonline.com/common/oauth2/v2.0/authorize”.

  • “Token endpoint” should be “https://login.microsoftonline.com/<<your tenant id>>/oauth2/v2.0/token” for single tenant apps. In the special case of multi-tenant apps, it can be “https://login.microsoftonline.com/common/oauth2/v2.0/token”.

  • “Scope” should be “offline_access User.Read Files.ReadWrite.All Sites.ReadWrite.All Sites.Manage.All” for Per-User mode, or “https://graph.microsoft.com/.default” for Global mode.

  • “Default site” should contain the name for an existing SharePoint site where new managed datasets will be created by default. To find the site name, browse to that site and copy the section of the URL following /sites/. For instance, if the URL looks like “https://my-corp.sharepoint.com/sites/myproject/_layouts/15/viewlsts.aspx?view=14”, the site name is myproject.

  • “Default drive” should contain the name of an existing drive where new managed datasets will be created by default. This drive must belong to the default site. To find the drive name, go to the “Site contents” section and copy the name of the document library containing your drive.

  • “Default path” is the path to the directory within the default drive where managed folders will be created

  • click CREATE to create the connection

  • test the connection. If using Global mode, just click the TEST button. For Per-User mode, you must first connect your own account to SharePoint: complete the per-user steps described next for yourself, then come back to the connection and click the TEST button

For Per-User mode, each user will need to do the following to connect their account to Sharepoint:

  • go to user profile > Credentials

  • Find the connection and click the “Connect” button

  • follow the instructions that appear
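The redirect URI, endpoint and scope values from the steps above follow a fixed pattern, so they can be derived from the DSS base URL, the tenant id and the credentials mode. The helper below is a minimal sketch for preparing the values to paste into the connection form; the class and function names are hypothetical and not part of Dataiku.

```python
from dataclasses import dataclass

LOGIN_HOST = "https://login.microsoftonline.com"
# Scope strings as given in the steps above.
PER_USER_SCOPE = (
    "offline_access User.Read Files.ReadWrite.All "
    "Sites.ReadWrite.All Sites.Manage.All"
)
GLOBAL_SCOPE = "https://graph.microsoft.com/.default"


@dataclass(frozen=True)
class OAuthSettings:
    redirect_uri: str            # only needed for Per-user mode
    authorization_endpoint: str  # only needed for Per-user mode
    token_endpoint: str
    scope: str


def build_settings(dss_base_url: str, tenant_id: str, per_user: bool,
                   multi_tenant: bool = False) -> OAuthSettings:
    """Derive the connection form values (hypothetical helper)."""
    # Multi-tenant apps use the "common" authority instead of the tenant id.
    authority = "common" if multi_tenant else tenant_id
    base = f"{LOGIN_HOST}/{authority}/oauth2/v2.0"
    return OAuthSettings(
        redirect_uri=dss_base_url.rstrip("/") + "/dip/api/oauth2-callback",
        authorization_endpoint=f"{base}/authorize",
        token_endpoint=f"{base}/token",
        scope=PER_USER_SCOPE if per_user else GLOBAL_SCOPE,
    )
```

For example, build_settings("https://dss.mycompany.corp/", tenant_id, per_user=True) yields the redirect URI https://dss.mycompany.corp/dip/api/oauth2-callback.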

Connecting using a Client Secret

Here, the App on Azure portal has a “client secret” credential which is used by DSS to authenticate to create an OAuth2 connection. This is less secure than using a certificate but slightly simpler to set up. The security rules in your Sharepoint tenant might disallow this method and there’s no real reason to use it over the certificate method in a production system.

There are two modes you can use for OAuth2 here, which have different security implications (exactly like the Certificate option):

  1. Per-user mode - a classic OAuth2 flow (Authorization Code Grant) is used to authenticate each DSS end user who uses the connection. When the user authenticates, they are redirected from DSS to Azure/Sharepoint and back to establish they have access. This means each user directly using the connection must have a Sharepoint account. The level of access to Sharepoint available to the connection reflects what the user can access in their Sharepoint account - so access control is taken care of mostly automatically. However, bear in mind that when a user imports data into a DSS project with this connection, other users with access to the project will normally still be able to see the data - so the security model on the DSS side still needs to be considered carefully.

  2. Global mode - the access to Sharepoint is authenticated at the level of the Azure App (with an OAuth2 Client Credentials grant). This means the access to Sharepoint available to all users of the connection in DSS reflects the permissions you give the App in Azure. We advise on the permissions to use in the procedure below. Note that this gives an equal level of access to ALL sites in the tenant.

Steps

  • create a new App registration (Azure Portal > Microsoft Entra ID > App registrations). Dataiku will connect with this app

  • in the Overview tab, note the Application (client) ID and the Directory (tenant) ID

  • (Per-user mode only) in the Authentication tab, add a Redirect URI:

    • choose Web Applications > Web

    • add a redirect URI of DSS_BASE_URL/dip/api/oauth2-callback

Note

For example if Dataiku is accessed at https://dss.mycompany.corp/, the OAuth2 redirect URL is https://dss.mycompany.corp/dip/api/oauth2-callback

  • if you selected the “Web” platform earlier, create a client secret for this application (App registration > Certificates & Secrets), note the client (app) secret

  • navigate to the API permissions tab and add the appropriate permissions.

    • For Per-User mode, click + Add a permission > Microsoft Graph > Delegated permissions. We advise adding openid, Files.ReadWrite.All, Sites.ReadWrite.All and User.Read.

    • For Global mode, click + Add a permission > Microsoft Graph > Application permissions. If you just need read access, use Sites.Read.All; if you need to write to files but not lists, use Sites.ReadWrite.All; and if you need to write to lists, use Sites.Manage.All. To write to lists without Sites.Manage.All, the option “Truncate instead of delete” can be used for the List datasets in DSS. Note that this gives access to ALL sites in your SharePoint instance.

  • create a new SharePoint Online connection in Dataiku

  • choose “Client Secret” as Auth type

  • fill the “Tenant id”, “App id”, and “App secret” fields with the fields you noted earlier in the Azure App

  • “Authorization endpoint” (Per-user mode only) should be “https://login.microsoftonline.com/<<your tenant id>>/oauth2/v2.0/authorize” for single tenant apps, where <<your tenant id>> must be replaced with your tenant’s guid. In the special case of multi-tenant apps, the endpoint is “https://login.microsoftonline.com/common/oauth2/v2.0/authorize”.

  • “Token endpoint” should be “https://login.microsoftonline.com/<<your tenant id>>/oauth2/v2.0/token” for single tenant apps. In the special case of multi-tenant apps, it can be “https://login.microsoftonline.com/common/oauth2/v2.0/token”.

  • “Scope” should be “offline_access User.Read Files.ReadWrite.All Sites.ReadWrite.All Sites.Manage.All” for Per-User mode, or https://graph.microsoft.com/.default for Global mode.

  • “Default site” should contain the name for an existing SharePoint site where new managed datasets will be created by default. To find the site name, browse to that site and copy the section of the URL following /sites/. For instance, if the URL looks like “https://my-corp.sharepoint.com/sites/myproject/_layouts/15/viewlsts.aspx?view=14”, the site name is myproject.

  • “Default drive” should contain the name of an existing drive where new managed datasets will be created by default. This drive must belong to the default site. To find the drive name, go to the “Site contents” section and copy the name of the document library containing your drive.

  • “Default path” is the path to the directory within the default drive where managed folders will be created

  • in credentials, choose the “Credentials mode” - Per User or Global as appropriate.

  • click CREATE to create the connection

  • test the connection. If using Global mode, just click the TEST button. For Per-User mode, you must first connect your own account to SharePoint: complete the per-user steps described next for yourself, then come back to the connection and click the TEST button

For Per-User mode, each user will need to do the following to connect their account to Sharepoint:

  • go to user profile > Credentials

  • Find the connection and click the “Connect” button

  • follow the instructions that appear
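To check an App registration in Global mode independently of DSS, you can request a token yourself with the client secret. The sketch below builds, but does not send, a standard OAuth2 Client Credentials request against the token endpoint given above. It illustrates the grant type and is not a description of how DSS performs the call internally; the function name is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def client_credentials_request(tenant_id: str, client_id: str,
                               client_secret: str) -> Request:
    """Build (without sending) a Client Credentials token request."""
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # Global mode scope, as given in the steps above.
        "scope": "https://graph.microsoft.com/.default",
    }).encode("ascii")
    return Request(
        f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends the request; a successful response is a JSON document containing an access_token field.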

Connecting using user name and password (Resource Owner Password)

Caution

This method is only possible with very specific kinds of account (managed accounts without MFA) and is considered a less secure and legacy method. The other options are preferred. Additional security measures and configuration steps are likely to be needed to use this method (e.g. creating a specific account with limited access, and an exclusion policy for MFA, and setting a trusted IP).

You can set up a SharePoint connection with the username and password of a given SharePoint user pre-entered on the DSS side (this is then used to authenticate with an OAuth2 Resource Owner Password Credentials grant). If the Global mode is used for this option, all DSS end-users have the same level of access as a specific SharePoint user, which may be convenient (though note the warning above). Thus the Global mode has a different meaning here than in the previous methods: in this case, one set of SharePoint user credentials is used for all DSS users using the connection, whereas for the other methods, the App permissions control access. If Per-user mode is used, each user has to enter their own credentials, but this has no particular advantage over the Per-User methods in the previous sections, so that combination is unlikely to be useful.

As per the warning above, this method requires special conditions and security considerations.

Note

This type of connection is only possible with managed accounts. To know if an account is managed, go to this URL using a web browser, after replacing the email address with the one you intend to use: https://login.microsoftonline.com/GetUserRealm.srf?login=your.SharePoint@email.address. The key NameSpaceType should read Managed. Also note the account cannot have MFA enabled.
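The managed-account check can also be scripted. The sketch below builds the GetUserRealm URL for a given email address and inspects the NameSpaceType key of the response. It assumes the endpoint returns a JSON document containing that key; the function names are illustrative.

```python
import json
from urllib.parse import urlencode

REALM_URL = "https://login.microsoftonline.com/GetUserRealm.srf"


def user_realm_url(email: str) -> str:
    """URL to open (or fetch) to inspect the realm of an account."""
    return f"{REALM_URL}?{urlencode({'login': email})}"


def is_managed(response_text: str) -> bool:
    """True if the response reports a managed (non-federated) account.

    Assumption: response_text is a JSON object with a NameSpaceType key.
    """
    payload = json.loads(response_text)
    return payload.get("NameSpaceType") == "Managed"
```

The response body can be fetched with urllib.request.urlopen(user_realm_url(email)).read().decode() and passed to is_managed.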

Steps

  • create a new App registration (Azure Portal > Microsoft Entra ID > App registrations). Dataiku will connect with this app.

  • in Authentication > Advanced settings, allow public client flows

  • navigate to the API permissions tab and add the appropriate permissions. Click + Add a permission > Microsoft Graph > Delegated permissions. We advise adding openid, Files.ReadWrite.All, Sites.ReadWrite.All and User.Read.

  • create a new SharePoint Online connection in Dataiku

  • choose Resource Owner Password as Auth type

  • fill the “Tenant id” and “App id” fields with the fields you noted earlier in the Azure App

  • “Authorization endpoint” should be “https://login.microsoftonline.com/<<your tenant id>>/oauth2/v2.0/authorize” for single tenant apps, where <<your tenant id>> must be replaced with your tenant’s guid. In the special case of multi-tenant apps, the endpoint is “https://login.microsoftonline.com/common/oauth2/v2.0/authorize”.

  • “Token endpoint” should be “https://login.microsoftonline.com/<<your tenant id>>/oauth2/v2.0/token” for single tenant apps. In the special case of multi-tenant apps, it can be “https://login.microsoftonline.com/common/oauth2/v2.0/token”.

  • “Scope” should be “offline_access User.Read Files.ReadWrite.All Sites.ReadWrite.All Sites.Manage.All”.

  • “Default site” should contain the name for an existing SharePoint site where new managed datasets will be created by default. To find the site name, browse to that site and copy the section of the URL following /sites/. For instance, if the URL looks like “https://my-corp.sharepoint.com/sites/myproject/_layouts/15/viewlsts.aspx?view=14”, the site name is myproject.

  • “Default drive” should contain the name of an existing drive where new managed datasets will be created by default. This drive must belong to the default site. To find the drive name, go to the “Site contents” section and copy the name of the document library containing your drive.

  • “Default path” is the path to the directory within the default drive where managed folders will be created

  • pick a credential mode. You probably want to use “Global” - which means this username access will be shared with all the Dataiku users with access rights to this connection. “Per user” means that each Dataiku user will have to put their SharePoint user name and password in their credential page before using the connection (but as noted above, there’s usually no advantage in using this combination)

  • click CREATE to create the connection

  • test the connection. If using Global mode just click the TEST button. For Per-User mode, you need to provide credentials for yourself - just do the next step for yourself, then come back to the connection and click the TEST button

If the credential mode is “Per user”, these extra steps are necessary for each Dataiku user:

  • go to User profile > Credentials

  • click the “Edit” button next to the new connection name

  • follow the instructions that appear
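The rule for finding the “Default site” value (the URL segment following /sites/) can be expressed as a small helper. This is a sketch that assumes the site lives under /sites/, as in the example URL above; the function name is hypothetical.

```python
from urllib.parse import unquote, urlsplit


def site_name_from_url(url: str) -> str:
    """Return the URL segment that follows /sites/ in a SharePoint URL."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) < 2 or segments[0].lower() != "sites":
        raise ValueError(f"no /sites/<name> segment in {url!r}")
    # Decode percent-escapes such as %20 in the site name.
    return unquote(segments[1])
```

For the example URL used in the steps above, the helper returns myproject.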

Advanced connection properties

If you ever encounter timeouts while connecting to SharePoint, you can configure the following timeout properties in the Advanced connection properties section:

Name            Description
ConnectTimeout  Maximum time to establish a connection, in milliseconds (default: 10000, i.e. 10 s)
ReadTimeout     Maximum time to wait for data, in milliseconds (default: 10000, i.e. 10 s)
WriteTimeout    Maximum time to send data, in milliseconds (default: 10000, i.e. 10 s)
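Because the three properties are expressed in milliseconds, it is easy to enter a value that is off by a factor of 1000. The sketch below converts timeouts given in seconds into the property names listed above. It assumes the values are entered as plain integer strings in the connection form; the function name is hypothetical.

```python
def timeout_properties(connect_s: float = 10.0, read_s: float = 10.0,
                       write_s: float = 10.0) -> dict:
    """Convert timeouts in seconds to millisecond property values."""
    def to_ms(seconds: float) -> str:
        if seconds <= 0:
            raise ValueError("timeouts must be positive")
        return str(int(round(seconds * 1000)))

    return {
        "ConnectTimeout": to_ms(connect_s),
        "ReadTimeout": to_ms(read_s),
        "WriteTimeout": to_ms(write_s),
    }
```

For instance, timeout_properties(read_s=60) keeps the 10 s defaults for connecting and writing and raises the read timeout to 60000.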

Creating SharePoint Online datasets

From a SharePoint document

After creating your SharePoint Online connection in Administration, you can create datasets from documents stored on SharePoint.

From either the Flow or the datasets list, click on +Dataset > Cloud Storage & Social > SharePoint Document.

  • select the connection

  • select the SharePoint site and the drive in which your files are located

  • click on “Browse” to locate your files

From a SharePoint list

After creating your SharePoint Online connection in Administration, you can create datasets from a SharePoint list.

From either the Flow or the datasets list, click on +Dataset > Cloud Storage & Social > SharePoint List.

  • select the connection in which your lists are located

  • select the SharePoint site and the list

Location of managed datasets and folders

When you create a managed dataset or folder in a SharePoint Online connection, Dataiku will automatically create it within the “Default site”, “Default drive” and the “Default path”.

Below that root path, the “naming rule” applies. See relocation for more information.

SharePoint Online Plugin

This plugin provides a read/write connector to interact with SharePoint Online documents and lists.

How to set up

Important

This plugin is intended to be used with SharePoint Online only. For the other SharePoint Server editions (2013, 2016, 2019), please refer to the unsupported plugin which can be found here.

Option 1: SharePoint Online login - using a SharePoint login directly (deprecated, replaced with option 5)

This option entails storing a set of end-user Sharepoint login credentials in the configuration, so the plugin will authenticate as this user directly.

Caution

This option is currently being deprecated by Microsoft. Please refer to Option 5 for a replacement.

Limitations

This authentication method cannot be used if the SharePoint account used is set up with MFA or if it belongs to a federated namespace.

You can find out whether or not this is the case by going to this URL (after editing it with your SharePoint account email address): https://login.microsoftonline.com/GetUserRealm.srf?login=your.SharePoint@email.address. The key NameSpaceType should read Managed. If it is Federated, you will need to use “Azure Single Sign On” or “Site App Token” instead.

Set up

  1. Find the tenant and site name of the SharePoint you want to sync with Dataiku.

  2. In Dataiku, go to Plugins > Installed > SharePoint > Settings > SharePoint Online Login. There, fill in the details of the SharePoint instance you are trying to sync with Dataiku.

Say a typical URL for the files you want to give access to is https://dataiku.sharepoint.com/sites/rnd/plugins/Shared%20Documents/safe/list.xlsx

  • Tenant is the subdomain preceding sharepoint.com, here dataiku. If your company uses a custom address to access SharePoint Online, then the tenant is the whole domain name preceded by https://, for instance https://my-corp.com.

  • Site path is the path to the SharePoint site or sub-site you want to give access to. In this example it would be sites/rnd/plugins

  • Root directory is the path to the highest level directory you want your Dataiku users to have access to. In the current example, Shared Documents/safe will let the user browse any files and folders in the sub-directory safe. Default value for Root directory is Shared Documents, but it can also be left blank, in which case the user can access all the document libraries of the rnd/plugins site.
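The three settings above can be derived from a typical file URL once you know where the site path ends. The sketch below takes the file URL and the site path as inputs, since the site path cannot be reliably guessed from the URL alone. The function name and the returned keys are illustrative, not plugin parameters.

```python
from posixpath import dirname
from urllib.parse import unquote, urlsplit


def plugin_settings_from_url(file_url: str, site_path: str) -> dict:
    """Split a SharePoint file URL into tenant, site path and root directory."""
    parts = urlsplit(file_url)
    host = parts.hostname or ""
    suffix = ".sharepoint.com"
    # Standard address: keep only the subdomain. Custom address: keep the
    # whole domain name preceded by https://.
    tenant = host[:-len(suffix)] if host.endswith(suffix) else f"https://{host}"
    path = unquote(parts.path).strip("/")
    site_path = site_path.strip("/")
    if not path.startswith(site_path + "/"):
        raise ValueError("the URL is not located under the given site path")
    relative = path[len(site_path) + 1:]
    return {
        "tenant": tenant,
        "site_path": site_path,
        # Directory that contains the file, relative to the site.
        "root_directory": dirname(relative),
    }
```

With the example URL above and the site path sites/rnd/plugins, the result is the tenant dataiku and the root directory Shared Documents/safe.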

Option 2: Azure Single Sign On - use OAuth2 to authenticate as Sharepoint users

This option allows the end-users of DSS who also have Sharepoint accounts to authenticate as themselves when connecting to Sharepoint. Thus they will be able to access only the information they have the permission to access in Sharepoint. As SSO via OAuth2 is used (specifically an Authorization Code grant), the users will not have to enter Sharepoint/Azure credentials on the DSS side (redirection to and from Sharepoint is used to authorize the user as per the OAuth2 standard).

Note that the data in datasets imported from Sharepoint will normally still be visible to other DSS users who have access to the same project, so it is still important to manage access correctly at a project level in DSS.

  1. From the Azure Portal, go to Microsoft Entra ID > +Add App registrations and create a new App (New registration). Set a name and a redirect URI pointing back to your Dataiku instance. It should follow this structure: https://<<your Dataiku instance domain>>:<<your Dataiku instance port>>/dip/api/oauth2-callback. Unless your Dataiku instance is on localhost, the URI has to point to an HTTPS-secured server.

  2. Click on the newly created app. Copy the Application ID. Then go to the Certificates & secrets > New client secret tab. Set a description, choose an expiry date, and copy the value of the created secret.

  3. Then navigate to the API permissions tab and add the following delegated permissions: User.Read, AllSites.Manage, AllSites.Read, AllSites.Write, MyFiles.Read and MyFiles.Write.

  4. In Dataiku, go to Plugins > Installed > SharePoint > Settings > Azure Single Sign On. There, paste the App ID and App secret copied in step 2. The authorization endpoint should be https://login.microsoftonline.com/common/oauth2/authorize?resource=https://<<Your Tenant>>.sharepoint.com (make sure to set the appropriate tenant).

  5. Finally, each Dataiku user who needs access to SharePoint Online will have to go to Profile & settings > Credentials from their account and click the edit button for the SharePoint preset you are configuring. This redirects to a Microsoft Single Sign On page. Log in to the account if necessary, then click Yes.
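The redirect URI and authorization endpoint used in the steps above are plain string templates, so they can be generated to avoid copy/paste mistakes. This is a minimal sketch; the function name and dictionary keys are hypothetical, and the tenant argument is the subdomain preceding sharepoint.com.

```python
def sso_settings(dss_host: str, dss_port: int, tenant: str) -> dict:
    """Build the redirect URI and authorization endpoint for the SSO preset."""
    return {
        "redirect_uri": f"https://{dss_host}:{dss_port}/dip/api/oauth2-callback",
        "authorization_endpoint": (
            "https://login.microsoftonline.com/common/oauth2/authorize"
            f"?resource=https://{tenant}.sharepoint.com"
        ),
    }
```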

Option 3: Site App Permissions - using Site App token (deprecated)

An application token can be created to give Dataiku access to a given SharePoint site path. This allows the users of DSS authorized to use the connection to access a specific site in SharePoint, but nothing else. Note that only a tenant administrator of the SharePoint site can create this token.

Warning

This option is being discontinued by Microsoft. If you have need for a connection with site-specific access like this, please contact us.

  • Register an app with your SharePoint site by going to the site’s URL followed by /_layouts/15/appregnew.aspx. The whole URL should look like this: https://{your tenant name}.sharepoint.com/{your site path}/_layouts/15/appregnew.aspx. Pick a name, generate the client ID and secret and copy them. Set localhost as App Domain, and https://localhost as redirect URL.

  • Next you will have to give the token the proper access rights. Go to your site’s URL followed by _layouts/15/appinv.aspx. The URL should look like this: https://{your tenant name}.sharepoint.com/{your site path}/_layouts/15/appinv.aspx.

  • Enter the client ID created at the first step and press Lookup

  • In the Permission Request XML box, copy and paste this: <AppPermissionRequests AllowAppOnlyPolicy="true"><AppPermissionRequest Scope="http://sharepoint/content/tenant" Right="FullControl" /></AppPermissionRequests>. More details about rights can be found in SharePoint’s documentation.

  • Finally, press Create and OK.

  • Find your tenant ID. To do this, click on Cog > Site settings > Site collection app permissions. There you should see the App you created in the first step. The App Identifier contains two alphanumerical strings separated by a @ symbol. The string left of the @ is your app ID. The string right of the @ is your tenant ID. Copy it.

  • In Dataiku, go to App > Plugins > Installed > SharePoint Online > Site App Permissions (deprecated). Create a new preset, and paste the tenant name, client secret, client ID and tenant ID. This preset will now be usable by using Site App Permissions as type of authentication.
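Splitting the App Identifier into the app ID and the tenant ID, as described in the steps above, can be sketched as follows. The handling of a prefix ending with a | character before the app ID is an assumption about how some SharePoint pages display the identifier; the function name is hypothetical.

```python
from typing import Tuple


def split_app_identifier(app_identifier: str) -> Tuple[str, str]:
    """Return (app_id, tenant_id) from an identifier of the form app@tenant."""
    left, sep, tenant_id = app_identifier.strip().rpartition("@")
    if not sep or not left or not tenant_id:
        raise ValueError("expected two strings separated by '@'")
    # Assumption: drop any display prefix that ends with '|'.
    app_id = left.rsplit("|", 1)[-1]
    return app_id, tenant_id
```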

Option 4: Certificates - authenticate as an App (OAuth2 client)

With this option you configure access to Sharepoint via an Azure App registration (OAuth2 client), and the data users are allowed to see is controlled by the App’s permissions in Azure. Thus the connection is authenticated to act as the App (in the manner of a service account). Authentication against the Azure App is done using a certificate. (Technically, an OAuth2 Client Credentials grant is used for this option.)

Note

In the configuration below, the access given will be global across the Sharepoint instance - this is the easiest way to configure things. If this does not meet your security needs, and you need to limit access to particular sites or parts of your Sharepoint instance, consider Option 2 instead, or contact us for guidance on advanced configuration of this option.

  1. Get your certificate and private key ready. If you do not already have one, it can be created for instance by using openssl: openssl req -newkey rsa:2048 -new -nodes -x509 -days 3650 -keyout private_key.pem -out certificate.pem

  2. From the Azure Portal, go to Microsoft Entra ID > +Add App registrations to create a new App.

  3. Click on the newly created app. Copy the Application ID and tenant ID. Then go to the Certificates & secrets > Certificates tab. Upload your certificate (or the certificate.pem created in step 1). Copy the certificate’s thumbprint - if the UI does not allow you to copy the whole thumbprint, run openssl x509 -noout -fingerprint -sha1 -in certificate.pem | cut -d '=' -f2 | sed 's/://g' to get this value.

  4. Then navigate to the API permissions tab and add the appropriate permissions. A simple way to configure things is to use one of the following SharePoint application permissions: Sites.Read.All, Sites.ReadWrite.All or Sites.Manage.All. Use Sites.Read.All if you want just read-only access, Sites.ReadWrite.All if you need to write to SharePoint documents/files, and Sites.Manage.All if you need to write to SharePoint Lists. Bear in mind these options give global access - if you need to restrict by site, please contact us (or use Option 2 instead).

  5. In Dataiku, go to Plugins > Installed > SharePoint Online > Settings > Certificates. There, paste the App ID, Tenant ID and certificate’s thumbprint copied in step 3.

  6. Open the private key file (or private_key.pem created at step 1) in a text editor and copy/paste its content into the Client certificate (private key) section of the preset. The section to copy starts and ends with -----BEGIN PRIVATE KEY----- / -----END PRIVATE KEY----- or -----BEGIN ENCRYPTED PRIVATE KEY----- / -----END ENCRYPTED PRIVATE KEY-----

  7. The preset will now be usable by selecting Certificates as type of authentication.
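If openssl is not at hand, the thumbprint from step 3 can be computed with the Python standard library: it is the SHA-1 digest of the DER-encoded certificate, written as uppercase hexadecimal without colons. The sketch below assumes a PEM file such as the certificate.pem created in step 1; the function name is illustrative.

```python
import base64
import hashlib
import re

_PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----(.+?)-----END CERTIFICATE-----", re.DOTALL
)


def certificate_thumbprint(pem_text: str) -> str:
    """SHA-1 thumbprint of the first certificate found in a PEM string."""
    match = _PEM_CERT.search(pem_text)
    if match is None:
        raise ValueError("no PEM certificate block found")
    # The PEM body is the base64 encoding of the DER certificate.
    der = base64.b64decode("".join(match.group(1).split()))
    return hashlib.sha1(der).hexdigest().upper()
```

Typical usage is certificate_thumbprint(open("certificate.pem").read()), which returns a 40-character string.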

Option 5: App username password - OAuth2 but pre-configured with a Sharepoint user

Caution

This method is only possible with very specific kinds of account (managed accounts without MFA) and is considered a less secure and legacy method. The other OAuth options (options 2 and 4) are preferred. Other security measures and configuration steps are likely to be needed to use this method (e.g. creating a specific account with limited access, and an exclusion policy for MFA, and setting a trusted IP).

In this method, a SharePoint end-user’s credentials are entered on the Dataiku side so the authentication handshake is automatic (this is then used to authenticate with an OAuth2 Resource Owner Password Credentials grant). This means a dedicated SharePoint user is used, so all DSS users using it can only access what this dedicated user can see in SharePoint. Thus it may be a useful way to limit access for DSS users to only parts of your SharePoint instance. It also means the DSS users do not need to each have a SharePoint account themselves. However, as noted above, there are security and configuration implications to consider.

  1. From the Azure Portal, go to Microsoft Entra ID > + Add App registrations, create a new App (New registration).

  2. Click on the newly created app. Copy the Tenant ID and Application ID for later use.

  3. Navigate to the Authentication tab. In the Advanced settings, allow public client flows.

  4. Then navigate to the API permissions tab and add the following delegated permissions: User.Read, AllSites.Manage, AllSites.Read, AllSites.Write, MyFiles.Read and MyFiles.Write.

  5. In Dataiku, go to Plugins > Installed > SharePoint > Settings > App username password. There, enter the Tenant ID and App ID copied in step 2. Set your username and password in the preset.

  6. To use the preset, select the Username / password type of authentication, then the preset you just created.

How to use

  • In your Dataiku project flow, select Dataset > SharePoint.

  • Click Shared Documents or Lists, according to the data source you are trying to sync Dataiku to.

  • Pick the authentication type and the preset, and browse to the document or folder you want to use as dataset.

  • For Azure Single Sign On presets, the Dataiku users can access other SharePoint sites they have access rights to. For this, select Show advanced parameters and set the site path and/or root path in Site path preset overwrite / Root directory preset overwrite

  • If the source is a list, you will have to know its name beforehand. Although all the column names will be visible, only those selected in the SharePoint list’s standard view will be populated with data.

If necessary, another list view can be selected:

  1. First, find the name of the view for which the required columns are visible.

  2. In your Dataiku list dataset, tick Show advanced parameters, and write the name of the view in the View name section.

Export data back to SharePoint Online

As a document

  1. On your SharePoint Online, create a destination directory.

  2. On your Dataiku project flow, create a SharePoint folder by pressing Dataset > Folder and then select the SharePoint Server Shared Documents plugin as the Store into parameter.

  3. You will get a red error box at this stage. To resolve the error, go to the Settings tab and set the correct type of authentication and connection. Once this is done, type “/” in the path window and press Browse. Navigate to the destination directory created in step 1 and press OK. It is important that the destination directory is created for the sole use of your Dataiku project. Being a managed folder, its content could be deleted. For instance, with the parameters as shown below, removing the folder from the flow with the drop data option selected will result in the actual “dss” SharePoint directory being deleted as well.

  4. In your Dataiku flow, pick the dataset you want to export back to SharePoint, and select the Export to folder recipe. In the Export Recipe window, click on Use existing and pick the folder you created in steps 2-3. You can change the file format to your liking.

  5. Save and Run, and your dataset will be exported to the SharePoint directory.

As a list

Important

Limitations: Dataiku datasets can also be exported to SharePoint Online as lists. However, this operation will overwrite any existing list bearing the same name. Also, some types (such as lookup or calculated fields) do not exist in Dataiku and would therefore be replaced by strings.

For these reasons, it is strongly recommended to export lists from Dataiku to SharePoint only on specific lists created with the intent of being updated by Dataiku. Keep in mind that any user modification on the SharePoint side will be overwritten at the next sync.

  1. Create your destination list by creating a SharePoint Online list dataset ( +Dataset > SharePoint Online > Lists ). Select the appropriate authentication and list title. Check that this title is not already in use on SharePoint to avoid any deletion. Project variables in the title, such as DSS_${projectKey}_ can help reduce conflicts in the name space.

  2. Name the Dataiku dataset and click Create. An error should show because the dataset does not exist on the SharePoint side.

  3. In your flow, create a sync recipe from the dataset you want to export. In the Output dataset section, use Existing dataset and select the SharePoint dataset named at step 2.

  4. Click Run to start the export.
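To preview what a list title containing a project variable such as DSS_${projectKey}_ would resolve to, the substitution can be imitated locally. Dataiku performs the actual expansion itself; the sketch below only mimics it with Python's string.Template, whose ${name} syntax happens to match, and the function name is hypothetical.

```python
from string import Template


def preview_list_title(title_template: str, project_key: str) -> str:
    """Imitate the expansion of ${projectKey} in a list title (preview only)."""
    # safe_substitute leaves any other ${...} variable untouched.
    return Template(title_template).safe_substitute(projectKey=project_key)
```

For a project whose key is SALES, the title DSS_${projectKey}_customers previews as DSS_SALES_customers.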

Note

  • Upload speed can be increased by using multiple workers. However, this will result in the lines being appended in random order.

  • To trigger the upload from a scenario, use a step to build the recipe’s output dataset.