File Handling in SharePoint with Python

Dhrubajyoti Das
Python in Plain English
12 min readMay 16, 2021

--

Data collection has always been a rigorous task for individuals playing in Business Intelligence, Data Science, or Data Analytics, etc.

In addition, when it comes to leverage cloud storage such as Microsoft OneDrive, Microsoft SharePoint, AWS S3, etc., the integration with any data processing tool such as Python, R, Alteryx, Talend, Informatica, etc. becomes a complex task to do.

This article will try to reduce this pain up to a certain extent for sure. I will take a specific source system, i.e., “Microsoft SharePoint,” and discuss its integration methodology and a few “file handling” operations using Python.

We would need either a “Personal” or “Work/School” Microsoft account to begin with. To give a background here, integration with SharePoint/OneDrive happens through Microsoft Graph API. Let’s get started now.

Firstly, we need to create an app in Microsoft Azure AD. Head over to Azure Portal and log in with your MS account.

Click on “Azure Active Directory”

The next step is the registration of your app. Let’s do that.

Click “App Registrations” and then “New Registration”.

Now, put any name of your choice as the application name. Select “Accounts in this organizational directory only (<Domain Name>only — Single tenant)” if you are not too sure about multi-tenant access. In this case, I have selected “single-tenant” as I want my app to be available in the current tenant only. Please read this if you need more clarity. Also, leave redirect URI blank as we will use this application programmatically from Python. Programmatic interaction doesn’t need a reply URL mandatorily as we need to acquire an access token and use it to call different APIs. A redirect URI is required when a web application (generally) uses graph API and returns to its homepage/any page after authentication.

App Registration Page

Once registration is complete, you’ll be redirected to the application overview page, where you will see few key credentials such as client id, tenant id, etc. You can either keep them preserved in a secure note or always come back to this place to fetch them. Along with these two credentials, we also need OAuth(v2) endpoint to grant consent for the first time to use this application from Python and acquire an access token as we advance.

Preserve App ID and Tenant ID for future use
Keep this endpoint handy.

Now that we are ready with the initial setup, we have to add API permissions to our application. Please note, Microsoft Graph API provides plenty of services; however, we will add only two scopes for this demo. This tutorial would, anyway, enable you to explore other APIs on your own. You’ll see that the “User.Read” scope has been added by default. If you want, you may remove it but keeping the same won’t hamper anything.

API Permissions → Add a permission

Next, you need to select “Microsoft Graph” and then “Delegated Permissions”. We won’t choose “Application Permissions” as we will access APIs through Python using a native app that authenticates and runs on behalf of the signed-in user(essentially the MS account you are using) in the background.

Select “Microsoft Graph”
Select “Delegated Permissions”
Add Scopes

Now, I want you to focus on the “Admin Consent Required” section. There are a few APIs, such as ChannelMessage.Read.All that is used to read messages in a channel in Teams require admin approval. This means a user having admin privileges or the O365 administrator needs to approve the permission to use such API in your application. For deployment in an organizational environment, it is highly recommended that you chalk out the full plan and discuss it internally before starting with the development. At last, click “Add permissions.”

It should look similar to the image above.

So, we are pretty much done with allowing API permissions. We need to expose the API/add scopes for the app to use using the “Delegated permissions” method. Along with this, we also need to authorize the application so that the API can trust the app.

Expose an API → Add a scope
This should auto-fetch your App ID. Click on “Save and Continue”.

Put the scope name exactly what it was in API permission. E.g., Sites.ReadWrite.All. Please maintain case sensitivity and avoid white spaces. You also should enable the scope for both admins and users so that the logged-in user can use the app too(that’s what need anyway). Now, you should copy the scope name to “Admin consent display name” and “User consent display name.” Also, add a few lines for the description to understand what’s going on. Descriptions are important to state the purpose of using the API.

You need to add all scopes individually. Once done, your screen should look like below.

Next, we need to authorize the app to use these scopes. Go to “Add a client application.” Put your app ID and check all scopes that you want your app to use. Finally, select “Add application.”

By now, you must be thinking, “Phew! It’s a long process just to set up the application”. Trust me! It’s almost done. We need to alter the manifest and allow public flows. Then, we are good to go to the next phase of the integration, where we’ll use Python to make things happen in reality. We need to allow the implicit grant to acquire an access token using the MSAL library in Python.

Manifest → Change false to true→ Save
We are done here with app configuration. :)

At this moment, we are done with the app setup. Now, it’s time to create a SharePoint site(maybe termed as MS Teams channel too). The “Files” section of a teams channel is nothing but an extension of SharePoint storage. This is already integrated with Teams. This way, you can make the whole process more user-friendly as business users do not have to log in to SharePoint separately and upload files. With teams, business users can handle their files very easily.

You can skip adding people right now. Once you are ready to move your code to production, you can add other people to start using the service. Voila! The channel/SharePoint site is created now. Now, go to “Files” and click on “Open in Sharepoint”. Please make sure you are at “Documents” and not “Documents → General”. “Documents” is the root directory. Once you are in the browser, you will find the SharePoint site name in the URL. Look out for “/sites/site-name” in this section. We need to use the site name later in the code.

Here, we are done with all configurations. From now on, you can use any scripting language/ETL tool that can handle REST APIs. Since I am a Python lover, I will guide you to write few lines of code in Python to perform basic file handling operations such as download, single upload, resumable upload, delete, read a directory, copy, move, etc.

There are two broad steps to make the integration happen. The first is authorizing the logged-in user, and the second is interaction with SharePoint. Also note, authorizing the end-user is a one-time job. Let’s get done with the authorization.

import requests
import json
from requests_oauthlib import OAuth2Session
from oauthlib.oauth2 import MobileApplicationClient

As you must have noticed, we are using OAuth(v2) endpoints for authorization; the code also needs to use a library that gives an option to interact with such endpoints. We are using an extension of requests.Session i.e. OAuth2Session

client_id = "your-app-id"scopes = ['Sites.ReadWrite.All','Files.ReadWrite.All']auth_url = 'https://login.microsoftonline.com/tenant-id/oauth2/v2.0/authorize'#MobileApplicationClient is used to get the Implicit Grant
oauth = OAuth2Session(client=MobileApplicationClient(client_id=client_id), scope=scopes)
authorization_url, state = oauth.authorization_url(auth_url)
authorization_link = oauth.get(authorization_url)print(consent_link.url)

Click on the generated link and accept. Practically, you shouldn’t worry much to read all that the page says as you created the app yourself. So, you know exactly what you are trying to do.

Don’t worry if you see the right one. This would come as we have not set any reply URL. Here, you are done with authorizing the app. Let’s head over to step 2. We will need the msal library to move forward, just “pip” it. Nevertheless, you, anyway, need to install all libraries used here if you don’t have those at the moment.

import os
import requests
import json
import msal
import urllib

msal library is used to authenticate using Microsoft Identity Platform. It’s the key library here.

# Graph API Configuration
CLIENT_ID = 'your-app-id'
TENANT_ID = 'tenant-id'AUTHORITY_URL = 'https://login.microsoftonline.com/{}'.format(TENANT_ID)RESOURCE_URL = 'https://graph.microsoft.com/'SHAREPOINT_HOST_NAME = 'yourcompany.sharepoint.com' # URL of sharepoint host without https://API_VERSION = 'v1.0'USERNAME = 'user@yourcompany.com' #Office365 user's account usernamePASSWORD = 'password'SCOPES = ['Sites.ReadWrite.All','Files.ReadWrite.All'] # Add other scopes/permissions as needed.SITE_NAME = 'your-sharepoint-site-name' # Name of sharepoint site

You can also populate all these variables using the .env file and “dotenv” library. This is certainly a better way to handle credentials in your script as it’s not recommended to put passwords, API keys, etc., in clear text inside a code block. To make this tutorial easier, I have chosen the easiest way. Now, we will create a public client app that would acquire an access token to pass through all API calls.

#Creating a public client app, Aquire a access token for the user and set the header for API calls
pca = msal.PublicClientApplication(CLIENT_ID, authority = AUTHORITY_URL)
token = pca.acquire_token_by_username_password(USERNAME,PASSWORD,SCOPES)headers = {'Authorization': 'Bearer {}'.format(token['access_token'])}

Now, we need to go down the hierarchy and get the “ID” of each level. It’s “Site” → “Drive” → “Item”.

In this case, “Site” is “Sharepoint Tutorial”, “Drive” is “Documents”, and “Item” is all the folders in “Documents”. Now, let’s put a simple text file and download it.

# Get Item ID/Folder ID
item_path = 'General' # You can put an absolute path here to your file
item_url = urllib.parse.quote(item_path)
result = requests.get(f'{RESOURCE_URL}{API_VERSION}/drives/{drive_id}/root:/{item_url}', headers=headers)
item_info = result.json()
item_id = item_info['id']
# Listing children(all files and folders) within General
result = requests.get(f'{RESOURCE_URL}{API_VERSION}/drives/{drive_id}/items/{item_id}/children', headers = headers)
file_name = result.json()['value'][0]['name']

Now, if you want to download all files, you can loop through result.json(). Sine, I have uploaded only 1 file, I will put a specific ID of that file and download it.

# Get File ID(in this case, id of simple_text_file.txt)
file_path = item_path + '/' + file_name
file_url = urllib.parse.quote(file_path)
result = requests.get(f'{RESOURCE_URL}{API_VERSION}/drives/{drive_id}/root:/{file_url}', headers = headers)
file_info = result.json()
file_id = file_info['id']

Once we get the file ID, we need to read its content and write it to our local machine. This will eventually download the file.

result = requests.get(f'{RESOURCE_URL}{API_VERSION}/drives/{drive_id}/items/{file_id}/content', headers = headers)
open(file_name, 'wb').write(result.content)

So far, we have covered two operations: download and reading a directory. Now, let’s close this loop by knowing how to upload a file.

Graph API segregates upload operation in two ways. If the file is < 4 MB, then it can be uploaded in one go. On the other hand, if it’s > 4 MB, we have to create a secure upload session and split the file into chunks. You should know that each chunk can be a maximum of 10 MB. This process is also known as resumable upload.

We’ll run a check whether the file exists. If it doesn’t, we’ll upload a new file; else, we’ll update the existing file's content (basically replace). At the same time, we’ll also check the file size. Please note, resumable upload only supports fail, replace, and rename. Hence, we have to replace the existing file anyway. Hence, we won’t be running a check on file existence in this case.

## UPLOAD A FILE# File Name you want to upload
filename = 'your-file-name-with-extension'
# Sharepoint folder/full path to the folder where you want to upload
folder_path = 'General/uploads'
# Generating relative paths
full_rel_path = urllib.parse.quote(f'{folder_path}/{filename}')
file_rel_path = urllib.parse.quote(filename)
folder_rel_path = urllib.parse.quote(folder_path)

This is the basic configuration. Now, we’ll run a check on file existence.

# Check whether the file already exists
result = requests.get(f'{RESOURCE_URL}{API_VERSION}/drives/{drive_id}/root:/{full_rel_path}', headers=headers)
if result.status_code == 200:
file_exists = 1
file_id = result.json()['id']
else:
file_exists = 0
file_id = ''

Now, we need to get the folder ID where the file needs to be uploaded.

# Get the folder ID
result = requests.get(f'{RESOURCE_URL}{API_VERSION}/drives/{drive_id}/root:/{folder_rel_path}', headers=headers)
folder_id = result.json()['id']

Let’s check the file size as well. os.stat() method would always show the size of the file in bytes.

st = os.stat(filename)
size = st.st_size

Now, we got everything we need to upload the file. Let’s check each condition and execute lines of code accordingly.

if size <= 4194304:
if file_exists == 1:
result = requests.put(
f'{RESOURCE_URL}{API_VERSION}/drives/{drive_id}/items/{file_id}/content',
headers = upload_header,
data=open(filename, 'rb').read()
)
elif file_exists == 0:
file_url = urllib.parse.quote(filename)
result = requests.put(f'{RESOURCE_URL}{API_VERSION}/drives/{drive_id}/items/{folder_id}:/{file_rel_path}:/content'
,headers = upload_header
,data = open(filename, 'rb').read()
)
else:
result = requests.post(
f'{RESOURCE_URL}{API_VERSION}/drives/{drive_id}/items/{folder_id}:/{file_rel_path}:/createUploadSession',
headers={'Authorization': 'Bearer ' + token['access_token']},
json={
'@microsoft.graph.conflictBehavior': 'replace',
'description': 'Uploading a large file',
'fileSystemInfo': {'@odata.type': 'microsoft.graph.fileSystemInfo'},
'name': filename
}
)
upload_url = result.json()['uploadUrl']
CHUNK_SIZE = 10485760
chunks = int(size / CHUNK_SIZE) + 1 if size % CHUNK_SIZE > 0 else 0
with open(filename, 'rb') as fd:
start = 0
for chunk_num in range(chunks):
chunk = fd.read(CHUNK_SIZE)
bytes_read = len(chunk)
upload_range = f'bytes {start}-{start + bytes_read - 1}/{size}'
result = requests.put(upload_url,
headers={
'Content-Length': str(bytes_read),
'Content-Range': upload_range
},
data=chunk
)
result.raise_for_status()
start += bytes_read

Your file should be uploaded now. Always verify the same from the front end to make sure everything is happening the way you wanted.

Another common file handling operation is “delete”. So, let’s delete the file which we just have uploaded. I should mention here that MS Graph API doesn’t support permanent delete yet. This means we need to move the file to recycle bin. Hopefully, Microsoft will introduce permanent delete directly from “Documents” or delete items from recycle bin. Nevertheless, let’s switch back to Python. This is fairly easy.

# Delete a file
filename = 'your-file-name-with-extension'
folder_path = 'General/uploads'
rel_path = urllib.parse.quote(f'{folder_path}/{filename}')result = requests.delete(f'{RESOURCE_URL}{API_VERSION}/sites/{site_id}/drives/{drive_id}/root:/{rel_path}', headers = headers)

Let’s copy the file to another folder. I have created one called “copied”. The configuration looks like below.

# File Name you want to copy
filename = 'your-file-name-with-extension'
# Sharepoint folder/full path to the folder where you want to upload
folder_path = 'General/uploads'
# Generating relative path
full_rel_path = urllib.parse.quote(f'{folder_path}/{filename}')
# Setting up header
copy_header = { 'Authorization': 'Bearer ' + token['access_token'], 'Content-type': 'application/json'}

Now, we need to fetch the id of the file to be copied.

# Get File ID
result = requests.get(f'{RESOURCE_URL}{API_VERSION}/drives/{drive_id}/root:/{full_rel_path}', headers = headers)
file_info = result.json()
file_id = file_info['id']

In addition, we need the id of the source and destination folder.

# Get folder ID of the destination
destination = 'General/copied'
destination_url = urllib.parse.quote(destination)
result = requests.get(f'{RESOURCE_URL}{API_VERSION}/drives/{drive_id}/root:/{destination_url}', headers=headers)
destination_id = result.json()['id']
# Get folder ID of the source
source = 'General/uploads'
source_url = urllib.parse.quote(source)
result = requests.get(f'{RESOURCE_URL}{API_VERSION}/drives/{drive_id}/root:/{source_url}', headers=headers)
source_id = result.json()['id']

Finally, call the API to copy the file.

# Copy the file
result = requests.post(f'{RESOURCE_URL}{API_VERSION}/sites/{site_id}/drive/items/{file_id}/copy'
, headers = copy_header
, json = {
"parentReference": {
"id": destination_id
},
"name": filename
})

Please have a look at the JSON parameter that I have passed. The request body guides the API to copy the file to the destination and its new name. I have kept the same name everywhere as this is the general practice.

Now that we are done with copying files. Why don’t we try moving the same? Let’s create another folder called “moved” and execute the below code. All configurations, fetching file IDs, and folder IDs will be exactly as same as copying a file. We’ll use the PATCH method to perform this.

result = requests.patch(f'{RESOURCE_URL}{API_VERSION}/sites/{site_id}/drive/items/{file_id}'
, headers = copy_header
, json = {
"parentReference": {
"id": destination_id
},
"name": filename
})

Conclusion

And, that’s a wrap! By now, you know how to set up an app in AAD, how to add permissions, expose APIs, authorize users, and call different APIs to perform several file handling operations.

Keep experimenting, keep learning!

More content at plainenglish.io

--

--

Data has always driven me. My primary skillsets are in Data Analytics, BI, ETL, and Visualization. Taking baby steps to explore the world of Data Science.