Transforming Data Using Power Query

After extracting the data that you need from your source, it is very likely that you are going to want to change it or clean it before you load it into the worksheet or the Excel Data Model. It’s rarely the case that your source data is in exactly the right format and, of course, this is where Power Query shows its worth.

When you connect to a data source, Power Query launches the Query Editor, a dedicated window that lets you transform the data from your connections in ways that help you analyze it. Transforming data means modifying it in some way to meet your needs – for example, you could remove a column, change a data type, or merge tables. As you transform data, it collectively takes on the shape you need to further your analysis. The process of applying transformations to one or more sets of data is often called Shaping Data.

 

Step 1: Open the Query Editor by selecting Launch Editor from the Power Query ribbon.

 

dt1

 

Step 2: The Query Editor also opens whenever you connect to a data source, create a new query, or Load an existing query. Power Query keeps track of everything you do with the data. The Query Editor records and gives a proper name to each transformation or step. The Query Editor tracks each operation in the APPLIED STEPS section of the Query Settings pane.

Power Query doesn’t change the original source data. Instead, Power Query records each step you take when connecting or transforming the data, and once you’ve finished shaping the data, it takes a snapshot of the refined data set and brings it into Excel.

 

dt2

 

Step 3: You can also delete steps applied to the query by clicking on the Cross icon present beside every step. Promoted Headers and Changed Type steps automatically appear when launching the Query Editor. Deleting these two steps will give us the output of just the first step, Source. At this point, the column names from the Source file are shown in the first row of the table.

 

dt3

 

Naming Columns: click the Use First Row As Headers button on the Home tab in the Query Editor toolbar to provide an appropriate name to the columns. Once you have done this, a new step will be created called Promoted Headers, and the Query Editor will display its output.

 

dt4

 

Moving Columns: Columns can be moved left or right in the table by dragging and dropping them in the appropriate place, or by right clicking the column and then selecting Move and either Left, Right, To Beginning or To End. For Example, Right Click on the Customer ID column and select Move from the drop-down menu and choose Right. Below you can see the Customer ID column has been moved to the right of the Customer Name column.

 

dt5

 

Removing Columns: It is also important that you do not import columns that you do not need if you are loading data into the Excel Data Model because these columns will increase the amount of memory needed to store the data. You can remove columns from a table by selecting one or more columns in the Results pane and right-click, you can choose the Remove option to remove the selected columns or the Remove Other Columns option to remove all but the selected columns.

 

dt6

 

Splitting Columns: Consider the Order ID column. Notice how the values in that column are divided into three parts by the delimiter ‘ – ‘. If you want each of these names shown in a separate column, click the column to select it, then click the Split Column button on the Home tab of the Query Editor toolbar, and then select By Delimiter or make the same selection from the right-click menu.

 

dt7

 

On selecting By Delimiter, a dialog box opens up. From the Select or enter delimiter drop-down box, select Custom and enter the symbol ‘ – ‘. Under Split, select At each occurrence of the delimiter. Clicking OK will result in the existing Order ID column being removed from the table and being replaced with three columns called Order ID.1 and Order ID.2 and Order ID.3

 

dt8

 

It is also possible to split a column into multiple columns of a certain number of characters by clicking the By Number of Characters option under the Split Column option. Select the Customer ID column, then split the column by number of characters. By doing this it will result in a dialog box where you have to specify the number of characters to split the column and how you want to split them. Enter 2 under Number of characters and Once, as far left as possible for Split, which will result in two columns Customer ID.1 and Customer ID.2. In the first column two characters from the left are stored and the rest of the Customer ID value in the second column.

 

dt10

 

Merging Columns: To merge two columns containing text data into a single column, select two columns then select Merge Columns option from the right click menu or from the Add Column tab in the toolbar.

 

dt11

 

Consider Customer ID.1 and Customer ID.2, the previously separated columns to merge into one column. Select the two columns, on hitting Merge columns it will lead to a dialog box where you can specify any separator for merging and a name for the resulting column after merging. As we don’t need a separator here, select None and enter Customer ID for the name of the new column. Click OK and see that the two columns have been merged into a single Customer ID column.

 

dt12

 

Setting the Data Type of a Column: When you import a CSV or Excel file into the Power Query, it will automatically try to set a data type for each column. In fact, Power Query will try to do this after other operations as well, such as splitting a column. But you can also choose your own data type for a column quite easily by clicking it and then selecting the  appropriate type from the Data Type drop-down box on the Home tab of the toolbar in the Query Editor. Here we have considered the Ship Mode column and selected the type to be Text.

 

dt13

 

Filtering Rows Using Auto-Filter: It’s very likely that you will not need all of the data you have imported and that you will have to filter out some rows. Probably the easiest way to filter rows in a table is to use the Auto-Filter box. You can find this by clicking the down-arrow icon next to a column name in the Query Editor. The Auto-Filter box displays the first 1,000 distinct values that it finds in a column. If there are more than 1,000 distinct values in a column, a Load More button appears which, if you click it, will load 1,000 more distinct values. Unchecking a value in the Auto-Filter box will mean that any row that contains that value for the selected column will be removed from the table.

 

dt14

 

Number Filters: Number Filters menu is available just above the Auto-Filter box for columns of type number. On selecting Number Filters, it allows you to choose from a set of conditions to filter your data for the selected column. Selecting one of these options will display the Filter Rows dialog where you can apply multiple filter conditions with AND or OR logic. Once a filter has been applied, the Clear Filter menu option will be enabled, selecting this menu option will remove the filter.

 

dt15

 

Text Filters and Date Filters: Similar to the Number Filters, Text Filters and Date Filters are available for the columns of type Text and Date respectively. Again you can filter rows in a column according to your required criteria from a given set of conditions.

 

dt16

 

Filtering Rows by Range: There are a number of options available for filtering rows by their position in the table. These options can be accessed from the Query Editor toolbar in the Home tab. The same options can also be accessed by clicking on the Table icon in the top left-hand corner of the table in the Query Editor.

 

dt17

 

Removing Duplicate Values: It is possible to filter rows from a table that have duplicate values in a certain column and also to remove duplicate rows. To remove the duplicates from a single column, Select the column, and select Remove Duplicates from the Right Click menu. To remove duplicates from all the columns you can manually select all the columns and click Remove Duplicates from the toolbar in the Home tab Or click the Table icon in the top left-hand corner of the table and select Remove Duplicates from the menu.

 

dt18

 

Sorting a Table: Tables in Power Query can be sorted in either ascending or descending order by one or more columns. To sort a table by a single column, select that column in the Query Editor and then either click one of the two sort buttons on the toolbar or click the down arrow next to the column name and select Sort Ascending or Sort Descending from the menu. When a column is sorted, a small arrow pointing upward or downward will appear in the column header, indicating whether the table is sorted in ascending or descending order.

 

dt19

 

Replacing Values with Other Values in a Table: The Replace Values button, found in the Column section of the Transform tab on the Query Editor toolbar, allows you to search for values in columns and replace them with other values. The same functionality can be accessed by right-clicking a column and selecting the Replace Values menu item.

 

dt20

After you have clicked this button, the Replace Values dialog will appear. There you must enter the value to find, the value to replace with, and whether the entire cell contents must be matched. Note that the matching is case sensitive in nature. To delete occurrences of a value completely, you need to leave the Replace With box empty. To replace blank text values with a null value, you need to leave the Value To Find text box empty and enter the value “null” in the Replace With text box.

 

dt21

 

Text Transformations: Selecting one or more text columns in the Query Editor will enable the Format drop-down box on the Transforms tab of the Query Editor toolbar and the same functionality can also be accessed from the right-click menu.

 

dt22

  • Lowercase – Sets all text in the selected columns to lower case.
  • Uppercase – Sets all text in the selected columns to upper case.
  • Capitalize Each Word – Makes all words in the selected column start with a capital letter and sets all subsequent letters in a word to lower case.
  • Trim – Removes any leading or trailing white-space characters from text.
  • Clean – Removes any unprintable characters from text in the selected columns.
  • Length – Returns the number of characters of each value in the selected columns.
  • JSON & XML – Allows you to transform a piece of text in a cell into either an XML document or a JSON document. These options are also available in the Parse drop-down box, on the Transform tab in the toolbar.

Number Transformations: Selecting a column of data type Number will enable a number of items in the Number section of the Transform tab in the toolbar. The same functionalities can also be accessed from the right click menu.

 

dt23

 

Date, Time and Duration Transformations: Selecting a column of data type Date, Time and Duration will enable either the Date, Time, or Duration drop-down boxes in the Date & Time section of the Transform tab in the toolbar.

 

dt24

 

Filling Up and Down to Replace Missing Values: The Fill drop-down box on the Transform tab of the Query Editor toolbar, contains two options; Fill Down allows you to replace null values in a column with the last non-null value in a column, and Fill Up does the same but uses the next non-null value in the column. This functionality is extremely useful for cleaning data that contains missing values.

 

dt25

 

Aggregating Values in a Table: Numeric values in a table can be aggregated i.e summed, counted, averaged or otherwise summarized by clicking the Group By button in the Home tab of the toolbar of the Query Editor. Clicking this button opens up the following dialog box.

 

dt27

 

The upper Group by section allows you to specify which columns you want to group by when aggregating. By default, any columns that were selected in the Query Editor will be included in this section; you can add or remove columns using the + and – buttons. The output of the group by operation will include all of the distinct combinations of values from the selected columns.
Secondly, the lower section allows you to specify the aggregation operations that you want to perform on any of the remaining columns. Again, multiple aggregation operations can be specified here by clicking on the + and – buttons. Each aggregation operation will result in a new column being added to the output and the name of that column is specified in the New column name text box.

 

Unpivoting Columns to Rows: When your source data is pivoted data, it might contain one column for each year’s sales. Sometimes data is distributed in a way that it gets difficult to create a report out of it. For example, all the years present in the data source are distributed over different columns showing their respective sales. In that case, it is much more convenient to have a single column for all the year values and one column for all the sales values. The Unpivot button is available in the Transform tab of the toolbar in the Query Editor. The Unpivoted version of the data shows all the year values in a new column called Attribute and the sales values, in a column called Value.

 

dt28

 

Transposing a Table: Transposing a table involves turning the rows of a table into columns and the columns into rows, and it can be achieved by clicking the Transpose button in the Transform tab of the Query Editor toolbar. After transposing, the column names will completely be lost and you have to specify meaningful names to them.

 

dt29

 

Built-in Custom Columns: All of the built-in custom column types can be found on the Insert tab of the Query Editor toolbar. The same functionalities can also be achieved from the right click menu by selecting a column and from the Table icon in the top-left corner of the table.

 

dt30

 

  • Insert Index Column – The Add Index Column button adds a new column to your table containing values either from zero or from one. Index columns are typically not very useful on their own, but they are extremely useful as an intermediate step when creating other, more complex calculations.
  • Duplicating Columns – To duplicate a column simply select the column and click on the Duplicate Column in the toolbar. This column will contain the same values as the original column and will be unaffected by any subsequent changes to the original column.
  • Calculating Values Across Columns – Many different types of calculations such as sum, average, min, max, Standard Deviation can be performed across multiple numeric columns by selecting the columns and clicking one of the options under the Statistics, Standard, or Scientific drop-down boxes.

 

Adding Custom Columns with M Calculations: To create a custom column that uses an M expression, click the Add Custom Column button in the toolbar. When you do this, the Insert Custom Column dialog will appear.

 

dt31

 

Calculations based on values from other columns in the same row are easy to create in this dialog. To reference a value in another column, select the column name in the Available Columns box and then either double-click it or click the Insert button. When you do this, the name of the column, surrounded by square brackets to show it is a column reference, will appear in the Custom Column Formula text box and you can then add to the expression in that text box yourself. When you have finished writing your expression, click the OK button and a new column will be added to the right-hand side of the table containing the calculated values. You can enter the name of the new column in the New column name box.

Leave A Reply

Your email address will not be published. Required fields are marked *