Incremental Refresh in Power BI

Share This Blog

Incremental Refresh in power bi service with Data Gateway

In this article we will see about the following sections.

  1. About Gateway, On-premises data gateway, its types and how a data source is added to a gateway.
  2. What is incremental refresh, how it is configured and how the policy is defined.
  3. About publishing and refreshing of the data from data source.
  4. The benefits and limitations of incremental refresh.
  5. Handling errors during incremental refresh.

About Gateway

A “gateway” is a device or software component that serves as a bridge or connection between two different networks or systems, enabling them to communicate and share information. So, a data gateway is one that acts as an intermediary between two networks for data communication.

An On-premises data gateway decrypts the credentials of on-premises data source (Data stored within an organization’s own physical infrastructure) and connects to it. The gateway sends queries to data source and returns the result to the gateway cloud service (Power BI Service in this case).

This On-premises data gateway needs to be installed for the Power BI cloud Service. And it is installed by keeping some considerations in view such as –

  1. Number of users
  2. Connection Type
  3. Location

In Power BI Service there may be repots that depend on multiple data sources and dashboards on multiple reports. In such cases all such data sources in the first case must go through a single gateway to contribute for the report and all such reports must go through a dedicated gateway to constitute a dashboard.

Types of gateways

There are two types of on-premises data gateways such as –

  • On-premises data gateway: Using this gateway multiple users can connect to multiple on-premises data sources.
  • On-premises data gateway (personal mode): Using this gateway only one user can connect to data sources and this gateway cannot be shared with other users. This is meant only to be used with Power BI and not for other cloud services such as – Fabric, Azure Analysis Services, Azure Logic Apps, etc.

How to Install an on-premises data gateway

  1. Download the standard gateway.

(https://learn.microsoft.com/en-us/data-integration/gateway/service-gateway-install)

  1. Install the gateway by keeping the installation path as default in the installer
  2. Then you need to enter your organization email address for Office 365 and Sign in.
  3. Select “Register a new gateway on this computer” and then hit Next.
  4. Now enter a name for the gateway, and the name must be a unique one across the tenant. Also enter a recovery key for future uses.

Add to an existing gateway cluster – check this box if you already have installed a standard gateway previously in another computer, with which you want to make a cluster. The benefit of making a cluster of gateways is to rout the data requests through the other available gateways when the primary is unavailable or fails in any case.

  1. Now have a review of the information and select Close.

How to add a Gateway data source

Power BI supports many on-premises data sources, and each source has its own particular requirements. A gateway can be used for a single data source or multiple data sources. For this case of ours, lets learn how to add a data source, SQL server data source to the gateway. And the same way other data sources can also be added to the gateway. And for adding a data source you must have a data gateway pre-installed (just as we learned in the previous section).

  1. Select the gear icon from the page header in the Power BI Service, and then select Manage Connections and gateways.
  1. Select New at the top of the screen to add new data source.
  2. Select On-premises on the New connection screen provide Gateway cluster name, provide a connection name, and select Data Source Type, for e.g. – SQL Server.
  3. Enter information about the data source. As given bellow –
  1. Select an authentication method to use when connecting to the data source.
  1. As long as you make an import connection with the data source you can leave Single sign-on options. They are mostly used with DirectQuery-based reports.
  1. Now under General > Privacy level, optionally configure a privacy policy for the data source. This is not applicable for DirectQuery.
  1. Now select Create. And you will find that a new connection has been created.

Now as we have installed a gateway and added a data source to it, we can implement incremental refresh in the dataset to make the dataset/report be able to get refreshed automatically as and when the tables in the dataset gets loaded or updated frequently.

Incremental Refresh

Incremental Refresh

Incremental Refresh and Real Time Data for Datasets

Incremental refresh is configured in Power BI Desktop model. After it is configured in the desktop model then it is published to the Power BI Service. Each table in the dataset contains a single partition and all rows of a table belongs to the single partition. If a table in the dataset contains millions and millions of rows or more, refreshing that table can take long time and consume an excessive number of resources.

But if incremental refresh is implemented, then what the service does is, it dynamically partitions the data and separates the portion of data that needs to be refreshed repeatedly from the data that don’t require any refreshment. This is done with the help of certain parameters that the service makes use of, which is defined while configuring the incremental refresh policy in the Power BI Desktop.

Configure Incremental Refresh Policy in Power BI Desktop

RangeStart and RangeEnd are the two reserved, case-sensitive date/time parameters that need to be created. These parameters, established within the Manage Parameters dialogue within Power Query Editor, are initially employed to filter the data brought into the Power BI Desktop model table, selecting solely the rows containing a date/time within that specific timeframe.

RangeStart signifies the earliest date/time, whereas RangeEnd signifies the most recent or latest date/time. Subsequent to the model’s publication to the service, the service itself automatically replaces RangeStart and RangeEnd. This replacement of the parameters is used to query data, as set by the refresh period indicated in the incremental refresh policy settings.

Suppose you have a table in your data source that updates up to 10,000 rows per day. To limit the number of rows initially loaded into the Power BI Desktop model, you can configure two-days period between RangeStart and RangeEnd.

  1. Go to New Parameters in Manage Parameters in the Power Query Editor.
  1. Create the Parameters.

When the RangeStart and RangeEnd parameters are set, you can apply special date filters to the date column of your table. When you choose Apply, a portion of the data that was chosen by the filters is imported into the model.

Define the Policy

After applying filters and loading a portion of the data into the model, you specify an incremental refresh policy for the table. The service uses the policy to generate and manage table partitions and carry out refresh actions after the model has been published to it. You can specify both necessary and optional options for the policy using the Incremental refresh and real-time data dialog box.

Now go to the Data pane in the Report view section of the model and right-click on the table on which you want to apply the incremental refresh policy.

Then the policy page will pop up and do as following.

  • The Select table listbox defaults to the table you selected in Data pane. Enable incremental refresh for the table with the slider.
  • Archive data starting before refresh date – The historical period in which rows with a date/time in that period are included in the dataset, together with rows for the current incomplete historical period, as well as rows in the refresh period up until the current date and time, is determined by the Archive data commencing before refresh date setting. For example – the table maintains the most recent five complete years of historical data in year segments if you pick five years. Up to and including the refresh period, the table will additionally include rows for the current year divided into quarters, months, or days.
  • Incrementally refresh data starting before refresh date – All rows with a date/time in that period are included in the refresh partitions and refreshed with each refresh operation during the incremental refresh period, which is determined by the incremental refresh setting, “Incrementally refresh data starting before refresh date.” For example – For example, if you give a refresh period of three days, the service overrides the RangeStart and RangeEnd parameters with each refresh operation to produce a query for rows with a date/time within a three-day period, with the beginning and ending dates and times determined by the current date and time.

Publishing and Refreshing of the data

Publish

You publish the model to the service after defining the incremental refresh policy. After you have finished publishing, you can run the initial refresh action on the dataset.

Refresh

After publishing to the service, you run an initial refresh action on the dataset. This should be an individual (manual) refresh so you can track progress. The initial refresh operation can take a long time to complete. Partitions must be formed, historical data must be loaded, objects such as relationships and hierarchies must be built or rebuilt, and computed objects must be recalculated.

Subsequent refresh actions, whether individual or scheduled, are substantially faster since only the incremental refresh partitions are refreshed. Other processing processes, such as partition merging and recalculation, must also occur, but they normally require far less time than the first refresh.

Merits and limitations of Incremental Refresh

  • Fewer refresh cycles, especially useful in fast-changing data requirements.
  • Refreshes are faster as only the recent data changes need to be updated, archiving the old data.
  • The data is more reliable as no long running connection is required.
  • Resource consumption is reduced significantly as smaller datasets require less memory and resources.
  • Large data sets can be scaled as data sets grow without the need to update the entire data in the database.
  • One of the demerits of incremental refresh is – datasets with this feature applied cannot be downloaded directly from the Power BI Service. This restriction is due to the way incremental refresh works and the underlying architecture of the of Power BI.

Handling Errors during Incremental Refresh

  • Verify if the gateway version is up to date.
  • Verify the report has a gateway selected. If there is no gateway selected, the data source might have changed or might be missing.
  • Sometime the connector used does not works properly, which is an indication of datasets used are unsupported data type sources for refresh.
  • Gateway unreachable when we setup with the credentials, it may be because of outdated gateway, Install the latest gateway might solve the issue.

Sitaram Dash