python integration with powerBI

Python Integration in Power BI

Share This Blog

Introduction:

Microsoft Power BI is a data transforming, data modelling and data visualization tool. Power BI is a collection of software services, apps, and connectors that work together to turn unrelated sources of data into coherent, visually immersive, and interactive insights. The data may be in the form of an Excel spreadsheet, or a collection of cloud-based and on-premises hybrid data warehouses. Just like excel it has a very prompt data query editor that can import and transform data. Power Bi supports different ways to import data from different data sources. One such source is Python Script which forms the basis of our discussion today.

So, let’s start with Python installation and then importing basic excel and csv. Then we enter into advanced analytics of using matplotlib using python scripts in Power BI and then GraphQL integration in python script.

Objective:

  • Importing basic csv or xlsx using Python script as Data Source.
  • Import data from an GraphQL API request to Power BI using Python Scripts.
  • Importing data using Python scripts and using matplotlib.

Installation:

We can download Python (Anaconda) using the link https://www.anaconda.com/products/individual . Then we can go to our windows and search for Anaconda (in CMD base) and Jupyter Notebook (for web-based IDE).

1. Importing basic csv/excel using python script in Power BI as Data Source:

Excel and CSV files can be directly imported from get data without using python script. So why go into this trouble? Python is a very powerful and widely used language, that we can use it in query editor to transform and clean the data. Python is also widely used among data science community utilizing powerful libraries such as NumPy, Pandas, Sci-Kit.

To import data, we can select Get Data from Home Tab and search for Python script in the search bar and then select it. Now let’s pass a small code in the script and check whether it works. Type in import pandas hit ok. If it goes on to show Python in the navigator in the display options then the Python scripting is working. If anyone is selecting python for first time then he/she would get a connection error. This can be fixed in the following steps.

  • First select the file tab in Power BI desktop.
  • Select Options and Settings from the drop-down and then select Options.
  • Now a new pop-up window will open and here we have to select.
  • Python scripting here and check whether the python home directory is configured or not. Once done hit ok and then check typing a demo again.

Now let’s import a csv using python script. Type in the following command in the space in python script:

import pandas as pd
df = pd.read_csv( 'E:\Inside D Drive\Tableau\Sample - Superstore - Copy.csv')
df

To explain the code first we are importing the pandas library. Then we import the csv using read command and the file path and file name. Finally, we print the table. This follows the ok button and here’s our table.

Now select the check box and then you can load/transform data as per requirement.

Next, we can do similar steps for excel also. Running the command, we would get a warning but the output would be shown. Additionally, here we have to select the sheet name or it will take default as first sheet.

import pandas as pd
df = pd.read_excel(r'E:\Inside D Drive\Tableau\Sample - Superstore - Copy.xlsx',sheet_name='People')
df

1.Import data from an API request to the power BI using Python Scripts:

It is another way to integrate python with power BI using python Script. Here we are importing data from API request.

What is an API? API is the acronym for Application Programming Interface, which is a software intermediary that allows two applications to talk to each other.

Why GraphQL in python?In Power Bi we can use the GraphQL query by using the power Query editor, but its bit tough to write power query in Power BI. But by using the python Script we can achieve this very easily. Now here comes the python and power BI integration.

Important things to do!

In this case we are using the GraphQL resources of SpaceX which is publicly available. Use the below link to navigate there https://api.spacex.land/graphql/ In the GraphQL page select your schema and table from the query tab as per your requirement. After selecting the schema, you will see the GraphQL query in the middle tab of that window. It will be generating automatically. Copy that GraphQL query and put that query in the query section of the python script.   By using this query, we can make API request. See the below code.

# Libraries that is used in this code
import pandas as pd
import requests
import json
from datetime import datetime, timedelta
# URL that you are requesting to
url=https://api.spacex.land/graphql/
# GraphQL query copied from the link and keeping it in the (“””) so that it will be consider as a String
query="""query 
{
  launchesPast(limit: 10) {
    mission_name
    launch_date_local
    rocket {    
              rocket_name
              rocket_type
                }
    ships {
             name
             model
             }
                                         }
}
"""
# Response of the request
r=requests.post(url,json={'query':query})
# Loading text format of response as a JSON
Jdata=json.loads(r.text)
# Extracting Key-Value Pairs
Jdata1=Jdata['data']['launchesPast']
# Converting it to a table format
NData=pd.json_normalize(Jdata1,'ships',['mission_name','launch_date_local',['rocket','rocket_name']])
#SpaceX missions history with renamed columns
NData.columns=['Ship_name','Ship_Model','Mission_name', 'Launch_date_local','Rocket_name']
NData

Work with Power BI

Now, we will use the above python code in the power BI. For this first open your Power BI Desktop, in the power BI desktop go to Get data -> more -> Search for Python Script -> select Python script and click connect.

If you are facing any problem in connecting to the python Script in Power BI, go to the top page of this blog so that you may get the solution of your problem. Once you are connected, put the above python script in the script box of the Power BI and click ok. See the screenshot.

  • After clicking ok, in the next page you see it will import all tables. See the screenshot below. 
  • Select the table and load it or transform it. After loading it in the Power BI you can do the visualizations. See the example in the screenshot. This is how you can play with GraphQL APIs in python and integrate python in Power BI.

3. Importing data using Python scripts and using matplotlib:

 Creating Python Visuals in Power BI Desktop:

  • Select the Python visual Icon in the Visualization Pane, as shown in the following screenshot.
  • In the Enable script visuals dialog box that appears, select Enable.
  • When we add a Python visual to a report, then a placeholder Python visual image appears on the report canvas and the Python script editor appears along the bottom of the center pane.
  • On, Fields Panel click on get data and load a Sample dataset like ‘Superstore.csv’.
  • Next, drag the Category, Year, Quarter, Month, Day, Ship Mode, Sub-Category, Order ID fields From Superstore.csv to the Values section where it says Add Data fields here. Screenshot attached below. You can choose the ones you need.
  • Python script can only use fields added to the values section. we can add or remove fields from the Values Section while working on python script. power bi desktop automatically detects field changes.
  • As we select or remove fields, supporting code in the python script editor is automatically generated or removed.
  • Based on our selections (Category, Year, Quarter, Month, Day, Ship Mode, Sub-Category, Order ID), the python script editor generates the following code: –
    • The editor created a dataset data frame, with the fields Superstore.
    • The default aggregation is: do not summarize.
    • Similar to table visuals, fields are grouped and duplicate rows appear only once.
  • With the data frame automatically generated by the fields we selected; we are ready to write a python script that results in plotting to the python.
  • On python script editor, under paste or type your script code here, enter this code:
  • Matplotlib is Python’s 2D plotting library that produces quality figures, using this library it makes easier to generate plots, bar charts, scatter chart and many more.
  • Pandas is also a library for data manipulation and analysis, it provides powerful and flexible data structures that makes an easy task. DataFrame is one of the essential components of these structure.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df_sales=dataset.groupby(['City']).Sales.agg(['sum'])
df_sales_top10=df_sales.sort_values(by='sum', ascending=False).head(10)
df_sales_top10=df_sales_top10.reset_index()
df_sales_top10=df_sales_top10.rename(columns={'sum':'Sales_Sum'})

#Formatting Plot Size, Text
plt.rcParams["figure.figsize"] = (16,5)
plt.rcParams.update({'font.size': 10})
plt.xticks(rotation=45)

#Get Sales_Data Grouped By Sales
x=df_sales_top10['City']
y=round(df_sales_top10['Sales_Sum']/1000,2)
plt.bar(x, y, color='#66ccff')

# displaying the title and Labels
plt.title("Top 10 City By Sales")
plt.xlabel('City',)
plt.ylabel('Total Sales')

#Function for Data Labels 
def addlabels(x,y):
    for i in range(len(x)):
        plt.text(i,y[i]+2,str(y[i])+'k',ha ='center')#,bbox = dict(facecolor = 'red', alpha =.5))

#Adding Data Label and GridLines
addlabels(x, y)
plt.grid(axis = 'y',linestyle = '--', linewidth = 1,color='#66ccff')

#Show Plot
plt.show()
  • The matplotlib library is Imported to plot and create our visuals. When we select the Run script button, the following Bar chart generates in the placeholder Python visual image.

In above Bar chart visual shows that the correlations between City and sales in dollars. In python visual, a dataset of maximum 150,000 rows can be used for python plot, not more than this. Python script will give a timeout error after 5 minutes of execution. Python plots cannot be used for cross-filtering. We can see both visualizations in comparison here.

Python support in Power BI personal gateway

When you publish a Power BI report with Python scripts to the service, these scripts will also be executed when your data is refreshed through the on-premises data gateway in personal mode (personal gateway).

To enable this, you must ensure that the Python runtime with the dependent Python packages is also installed on the machine hosting your personal gateway. Note, Python script execution is not supported for on-premises data gateways shared by multiple users.

Conclusion:

Data visualization helps to identify patterns, trends and correlations that might not be detected otherwise. Visualizations offer an effective way to convey information to the end-user. In this age of big data analysis, where we are drowning by data volume, it is nearly impossible to tell stories without using visualizations. Python offers the developer an option to add customized, scenario-based attractive visuals with just a few lines of codes in their dashboards. 

The 3 ways to use python script are demonstrated above. Power BI and Python visualizations both have their own credits.  But the integration of Python in Power BI helps data scientists mostly. They can work in python visualizations which are in-built rather than the extension visualizations in Power BI. Python can also help in data cleansing by manipulating the query. Data scientists in the 21st century use python as a base language to learn which further helps them in integrating in Power BI.

priyanka
Priyanka Panda