Today, Python is one of the most sought-after skills in the world of Data Science, and we can leverage its power in our Tableau Data Visualisations. While the integration does not work entirely out of the box and requires some initial setup, it is not hard to get up and running. Python is an interpreted, high-level, general-purpose programming language.
TabPy is a Python package that allows you to execute Python code on the fly and display results in Tableau visualizations, so you can quickly deploy advanced analytics applications. The split approach granted by TabPy allows for the best of both worlds: class-leading data visualization capabilities, backed by powerful data science algorithms. One huge benefit of surfacing Python algorithms in Tableau is that users can tune parameters and evaluate their impact on the analysis in real time as the dashboard updates.
In this post, we will introduce Python, show you how to integrate Python in Tableau, and more importantly, leave you with an example that you can build on.
1. Install Python on your machine (or server):
2. Download TabPy server:
Then open the Anaconda Prompt on your machine, type the following command, and press Enter:
conda install -c anaconda tabpy-server
When it asks whether you want to proceed, type y and press Enter.
Here is what that looks like.
After this, navigate to the anaconda2 folder, and inside it navigate to the tabpy_server folder, then type pwd. Copy that location, type cd followed by the location, and press Enter. Use the example below as a reference.
3. Start TabPy server:
Finally, type ./startup.sh and press Enter.
Now it should be ready to run and look like this:
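Besides reading the console output, you can confirm from Python that the server is answering. The snippet below is a minimal sketch, not part of the original setup: it assumes TabPy is listening on localhost at its default port 9004 and that its /info endpoint is available. The helper names tabpy_info_url and fetch_tabpy_info are made up for this illustration.

```python
import json
import urllib.error
import urllib.request


def tabpy_info_url(host="localhost", port=9004):
    """Build the URL of TabPy's /info endpoint (host and port are assumptions)."""
    return f"http://{host}:{port}/info"


def fetch_tabpy_info(host="localhost", port=9004, timeout=5):
    """Return the parsed /info response, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(tabpy_info_url(host, port), timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a response that is not valid JSON.
        return None


if __name__ == "__main__":
    info = fetch_tabpy_info()
    print("TabPy is running" if info is not None else "TabPy is not reachable")
```

If your server runs on another machine or port, pass those values to fetch_tabpy_info instead of relying on the defaults.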
4. Connect TabPy Server to Tableau Desktop:
In Tableau, go to:
5. Worksheet:
We are going to start by opening the Sample Superstore Data Source that is provided with Tableau Desktop. Using this Data Source, we are going to create a Calculated Field that computes Pearson’s Correlation Coefficient (r).
Let us create a Calculated Field called Pearson Correlation Coefficient:
SCRIPT_REAL("import numpy as np
return np.corrcoef(_arg1, _arg2)[0, 1]",
SUM([Sales]), SUM([Profit]))
Things to note:
We are importing the NumPy library.
We are calling the corrcoef function and passing through SUM(Sales) and SUM(Profit).
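To make clear what the script returns, here is a minimal sketch of Pearson's r written in plain Python. It is an illustration only: NumPy's np.corrcoef(x, y)[0, 1] yields the same quantity, and the sales and profit figures below are invented rather than taken from Sample Superstore.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up values: profit rises exactly linearly with sales.
sales = [100.0, 200.0, 300.0, 400.0]
profit = [12.0, 22.0, 32.0, 42.0]
r = pearson_r(sales, profit)
print(round(r, 6))
```

A value close to 1 means profit rises with sales, a value close to -1 means it falls, and a value near 0 means there is little linear relationship.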
We will now build our worksheet:
Change the Mark Type to Circle.
Drag Category onto Columns.
Drag Sales onto Columns.
Drag Profit onto Rows.
Drag Customer Name onto the Detail Mark.
You should now have the following:
Now we will use our Pearson Correlation Coefficient.
Drag Pearson Correlation Coefficient onto the Colour Mark.
You will see an invalid JSON error, but do not worry about that. Please close the dialogue.
Right-click on this object, go to Compute Using and select Customer Name.
Click on the Colour Mark.
Click on Edit Colour and select the Red-Green Diverging palette.
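The Compute Using step above is worth a closer look. As we understand it, computing along Customer Name means that, for each Category, the per-customer SUM(Sales) and SUM(Profit) values are sent to Python as two lists, and the single r that comes back colours every mark in that Category. The sketch below imitates this partitioning in plain Python; the rows and the helper name r_per_partition are hypothetical and only serve the illustration.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


# Hypothetical marks: (category, customer, SUM(Sales), SUM(Profit)).
marks = [
    ("Furniture", "A", 100.0, 10.0),
    ("Furniture", "B", 200.0, 20.0),
    ("Furniture", "C", 300.0, 30.0),
    ("Technology", "A", 100.0, 30.0),
    ("Technology", "B", 200.0, 20.0),
    ("Technology", "C", 300.0, 10.0),
]


def r_per_partition(rows):
    """Group rows by category and compute one r per group."""
    groups = defaultdict(lambda: ([], []))
    for category, _customer, sales, profit in rows:
        groups[category][0].append(sales)
        groups[category][1].append(profit)
    return {cat: pearson_r(s, p) for cat, (s, p) in groups.items()}


result = r_per_partition(marks)
for category, r in result.items():
    print(category, round(r, 6))
```

With a diverging red-green palette, a category like the made-up Furniture rows (positive r) would appear green and one like the Technology rows (negative r) would appear red.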