SharePoint Online
Dataiku can interact with SharePoint Online to:
read and write datasets based on SharePoint Online lists
read and write datasets based on SharePoint Online stored documents
read and write managed folders
There are two types of SharePoint objects that can be leveraged: document libraries and lists.
To interact with a document, you will need to know its SharePoint site, the drive in which it is stored, and its path within this drive.
Dataiku accesses SharePoint Online documents through a filesystem-like mechanism: once you specify a site and drive, you can browse it to quickly find your data, or you can set the prefix under which Dataiku may write datasets. Datasets stored on SharePoint must therefore be in one of the supported filesystem formats.
To interact with SharePoint lists, you will need to know the site, and the name of the list.
Creating a SharePoint Online connection
Before connecting to SharePoint Online with Dataiku, you need to:
create at least one site on your SharePoint instance
define an app on Azure portal and retrieve its client id and client secret
Connecting to SharePoint Online using OAuth2
Dataiku can access SharePoint Online using OAuth2 as a per-user credential.
create a new App registration (Azure Portal > Microsoft Entra ID > App registrations). Dataiku will connect with the identity of this app
in the Overview tab, note the Application (client) ID and the Directory (tenant) ID
in the Authentication tab, add a new Platform
choose the “Web” platform
add a redirect URI of DSS_BASE_URL/dip/api/oauth2-callback
Note
For example if Dataiku is accessed at https://dss.mycompany.corp/, the OAuth2 redirect URL is https://dss.mycompany.corp/dip/api/oauth2-callback
if you selected the “Web” platform earlier, create a client secret for this application (App registration > Certificates & Secrets) and note the client (app) secret
create a new SharePoint Online connection in Dataiku
fill the “Tenant id”, “App id”, and “App secret” fields with the values you noted earlier in the Azure app registration
“Authorization endpoint” should be “https://login.microsoftonline.com/common/oauth2/v2.0/authorize” for multi-tenant apps, or “https://login.microsoftonline.com/<<your tenant id>>/oauth2/v2.0/authorize” for single tenant
“Token endpoint” should be “https://login.microsoftonline.com/common/oauth2/v2.0/token” for multi-tenant apps, or “https://login.microsoftonline.com/<<your tenant id>>/oauth2/v2.0/token” for single tenant
“Scope” should be “offline_access User.Read Files.ReadWrite.All Sites.ReadWrite.All Sites.Manage.All”
“Default site” should contain the name of an existing SharePoint site where new managed datasets will be created by default. To find the site name, browse to that site and copy the section of the URL following /sites/. For instance, if the URL looks like “https://my-corp.sharepoint.com/sites/myproject/_layouts/15/viewlsts.aspx?view=14”, the site name is myproject.
“Default drive” should contain the name of an existing drive where new managed datasets will be created by default. This drive must belong to the default site. To find the drive name, go to the “Site contents” section and copy the name of the document library containing your drive.
“Default path” is the path to the directory within the default drive where managed folders will be created
create the connection (you can’t test it yet)
Then for each user:
go to user profile > Credentials
click the “Edit” button next to the new connection name
follow the instructions that appear
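The redirect URI and the two endpoint URLs from the steps above follow fixed patterns, so they can be derived rather than typed by hand. The following is a minimal Python sketch of those patterns; the function names are hypothetical helpers for illustration and are not part of Dataiku or Azure.

```python
def oauth2_redirect_uri(dss_base_url: str) -> str:
    """Return the OAuth2 redirect URI for a given Dataiku base URL."""
    # Drop any trailing slash so the result never contains "//dip".
    return dss_base_url.rstrip("/") + "/dip/api/oauth2-callback"


def microsoft_endpoints(tenant_id=None) -> dict:
    """Return the authorization and token endpoints.

    Pass the Directory (tenant) ID for a single-tenant app, or nothing
    for a multi-tenant app, which uses the "common" endpoints.
    """
    authority = tenant_id if tenant_id else "common"
    base = f"https://login.microsoftonline.com/{authority}/oauth2/v2.0"
    return {
        "authorization": f"{base}/authorize",
        "token": f"{base}/token",
    }
```

For instance, `oauth2_redirect_uri("https://dss.mycompany.corp/")` yields the redirect URL given in the note above, and `microsoft_endpoints()` yields the multi-tenant values for the “Authorization endpoint” and “Token endpoint” fields.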
Connecting to SharePoint Online using user name and password
Dataiku can access SharePoint Online with a user name and password, used as either per-user or global credentials.
Note
This type of connection is only possible with managed accounts. To check whether an account is managed, open this URL in a web browser, after replacing the email address with the one you intend to use: https://login.microsoftonline.com/GetUserRealm.srf?login=your.SharePoint@email.address. The key NameSpaceType should read Managed.
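The managed-account check in the note can also be scripted. The sketch below is a minimal Python example that assumes the GetUserRealm endpoint returns a JSON document containing the NameSpaceType key, as the note implies; the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

REALM_ENDPOINT = "https://login.microsoftonline.com/GetUserRealm.srf"


def user_realm_url(email: str) -> str:
    """Build the realm lookup URL for the given email address."""
    return REALM_ENDPOINT + "?" + urlencode({"login": email})


def is_managed(realm_json: str) -> bool:
    """Return True if the response reports a managed account."""
    # Assumption: the response body is a JSON object.
    return json.loads(realm_json).get("NameSpaceType") == "Managed"


def check_account(email: str) -> bool:
    """Query the endpoint over the network and interpret the answer."""
    with urlopen(user_realm_url(email), timeout=10) as response:
        return is_managed(response.read().decode("utf-8"))
```

Calling `check_account` with the email address you intend to use requires network access; `is_managed` can be applied to a response saved from the browser instead.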
create a new App registration (Azure Portal > Microsoft Entra ID > App registrations)
in Authentication > Advanced settings, allow public client flows
navigate to the API permissions tab and add the appropriate permissions
create a new SharePoint Online connection in Dataiku
choose User / Password as Auth type
fill the “Tenant id” and “App id” fields with the Directory (tenant) ID and Application (client) ID shown in the Overview tab of the Azure app registration
“Authorization endpoint” should be “https://login.microsoftonline.com/common/oauth2/v2.0/authorize” for multi-tenant apps, or “https://login.microsoftonline.com/<<your tenant id>>/oauth2/v2.0/authorize” for single tenant
“Token endpoint” should be “https://login.microsoftonline.com/common/oauth2/v2.0/token” for multi-tenant apps, or “https://login.microsoftonline.com/<<your tenant id>>/oauth2/v2.0/token” for single tenant
“Scope” should be “offline_access User.Read Files.ReadWrite.All Sites.ReadWrite.All Sites.Manage.All”
“Default site” should contain the name for an existing SharePoint site where new managed datasets will be created by default. To find the site name, browse to that site and copy the section of the URL following /sites/. For instance, if the URL looks like “https://my-corp.sharepoint.com/sites/myproject/_layouts/15/viewlsts.aspx?view=14”, the site name is myproject.
“Default drive” should contain the name of an existing drive where new managed datasets will be created by default. This drive must belong to the default site. To find the drive name, go to the “Site contents” section and copy the name of the document library containing your drive.
“Default path” is the path to the directory within the default drive where managed folders will be created
pick a credential mode. “Per user” means that each Dataiku user must enter their own SharePoint user name and password in their credentials page before using the connection. “Global” means that a single SharePoint user name and password are shared by all Dataiku users who have access rights to this connection.
create the connection
if the credential mode is “Global”, you can test the connection. Otherwise, you have to enter your credentials in your Dataiku user profile first.
If the credential mode is “Per user”, these extra steps are necessary for each Dataiku user:
go to User profile > Credentials
click the “Edit” button next to the new connection name
follow the instructions that appear
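The rule for finding the “Default site” value (copy the section of the URL following /sites/) can be expressed as a small helper. This is a minimal Python sketch; `site_name_from_url` is a hypothetical name, and it assumes the site URL contains a `/sites/` path segment as in the example above.

```python
from urllib.parse import unquote, urlparse


def site_name_from_url(url: str) -> str:
    """Return the path segment that immediately follows "/sites/"."""
    # Split the URL path into non-empty segments, ignoring the query string.
    segments = [s for s in urlparse(url).path.split("/") if s]
    for index, segment in enumerate(segments[:-1]):
        if segment == "sites":
            # Decode percent-escapes such as %20 in the site name.
            return unquote(segments[index + 1])
    raise ValueError(f"no /sites/<name> segment found in {url!r}")
```

Applied to the example URL from the “Default site” step, the helper returns `myproject`.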
Connecting to SharePoint Online using certificates
Dataiku can access SharePoint Online using a certificate and private key as global credentials.
get your certificate and private key ready. If you do not already have them, you can create them, for instance, with openssl: openssl req -newkey rsa:2048 -new -nodes -x509 -days 3650 -keyout private_key.pem -out certificate.pem
create a new App registration (Azure Portal > Microsoft Entra ID > App registrations)
in Authentication > Advanced settings, allow public client flows
in Certificates & secrets, upload your certificate (or certificate.pem created from the first step). Copy the certificate’s thumbprint.
navigate to the API permissions tab and add the appropriate permissions
create a new SharePoint Online connection in Dataiku
choose “Private key” as Auth type
fill the “Tenant id”, “App id” and “Thumbprint” fields with the values you noted earlier in the Azure app registration
open the private key file (or private_key.pem created earlier) in a text editor and copy/paste its content into the Client certificate (private key) input box. The section to copy starts with “-----BEGIN PRIVATE KEY-----” and ends with “-----END PRIVATE KEY-----”.
“Authorization endpoint” should be “https://login.microsoftonline.com/common/oauth2/v2.0/authorize” for multi-tenant apps, or “https://login.microsoftonline.com/<<your tenant id>>/oauth2/v2.0/authorize” for single tenant
“Token endpoint” should be “https://login.microsoftonline.com/common/oauth2/v2.0/token” for multi-tenant apps, or “https://login.microsoftonline.com/<<your tenant id>>/oauth2/v2.0/token” for single tenant
“Scope” should be “https://graph.microsoft.com/.default”
“Default site” should contain the name for an existing SharePoint site where new managed datasets will be created by default. To find the site name, browse to that site and copy the section of the URL following /sites/. For instance, if the URL looks like “https://my-corp.sharepoint.com/sites/myproject/_layouts/15/viewlsts.aspx?view=14”, the site name is myproject.
“Default drive” should contain the name of an existing drive where new managed datasets will be created by default. This drive must belong to the default site. To find the drive name, go to the “Site contents” section and copy the name of the document library containing your drive.
“Default path” is the path to the directory within the default drive where managed folders will be created
create and test the connection
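Two of the steps above lend themselves to a quick local check before filling in the connection form: confirming the thumbprint and isolating the private key section to paste. The following is a minimal Python sketch using only the standard library. It assumes the thumbprint displayed by Azure is the uppercase hexadecimal SHA-1 digest of the DER-encoded certificate, so compare its result with the portal value rather than relying on it alone; the function names are hypothetical.

```python
import hashlib
import ssl

KEY_BEGIN = "-----BEGIN PRIVATE KEY-----"
KEY_END = "-----END PRIVATE KEY-----"


def certificate_thumbprint(pem_certificate: str) -> str:
    """Return the SHA-1 digest of the DER certificate as uppercase hex."""
    # Convert the PEM text (base64 with header and footer) to raw DER bytes.
    der = ssl.PEM_cert_to_DER_cert(pem_certificate)
    return hashlib.sha1(der).hexdigest().upper()


def private_key_block(pem_text: str) -> str:
    """Return the section delimited by the BEGIN/END PRIVATE KEY markers."""
    start = pem_text.find(KEY_BEGIN)
    end = pem_text.find(KEY_END, start)
    if start == -1 or end == -1:
        raise ValueError("no PRIVATE KEY section found")
    # Include the END marker itself in the returned text.
    return pem_text[start:end + len(KEY_END)]
```

To use it, read the contents of certificate.pem and private_key.pem into strings and pass them to `certificate_thumbprint` and `private_key_block` respectively.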
Creating SharePoint Online datasets
From a SharePoint document
After creating your SharePoint Online connection in Administration, you can create datasets from documents stored on SharePoint.
From either the Flow or the datasets list, click on +Dataset > Cloud Storage & Social > SharePoint Document.
select the connection
select the SharePoint site and the drive in which your files are located
click on “Browse” to locate your files
From a SharePoint list
After creating your SharePoint Online connection in Administration, you can create datasets from a SharePoint list.
From either the Flow or the datasets list, click on +Dataset > Cloud Storage & Social > SharePoint List.
select the connection in which your lists are located
select the SharePoint site and the list
Location of managed datasets and folders
When you create a managed dataset or folder in a SharePoint Online connection, Dataiku will automatically create it within the “Default site”, “Default drive” and the “Default path”.
Below that root path, the “naming rule” applies. See Making relocatable managed datasets for more information.