Reference architecture – traditional DWH with PowerBI as a visualization layer
Last year we started working with a customer to build a business intelligence system that could shorten their time to decision by keeping reports and aggregations always ready. At the same time, the customer wanted to enable self-service BI scenarios.
What we essentially decided to build for the customer was a traditional data warehouse with PowerBI as a presentation and visualization layer. The cloud was still not an option, so we had to stay on-premises for most of the layers. The implementation was not so straightforward and turned out to be quite a challenge, on two completely different fronts. On one side, there are the technical challenges of adopting a completely new tool and technology that can connect easily to hundreds of sources and enable interactive data exploration. On the other side, there is governance. PowerBI is so easy to use that anyone working with data immediately feels comfortable with the tool, which drives adoption among users, and many more reports get produced. But what is the single source of truth for those reports? And how do you make sure that all business processes are implemented/coded/understood in all the different models?
So, we came up with a solution that:
- Must work with hundreds of millions of records
- Must be able to scale easily up and out
- Must have a single source of truth layer, where all business logic is defined
- Must offer self-service capabilities, so business analysts can create interactive data exploration reports
- Must be platform independent for users, i.e. work on both Windows and iOS (none of the business users was using Linux distributions whatsoever 😊)
A little more on the components we used:
SQL Server Integration Services – orchestration, data loading, transformations, validations and overall data movement. The DEV-to-PROD cycle is extremely easy, using parametrized connection managers, variables for the sources, etc.
SQL Server DB Engine – to host the landing, staging and DWH layers, with table partitioning for implementing a sliding window scenario. A crucial feature in SQL Server 2017 was the ability to truncate a single partition, which greatly simplified the sliding window implementation.
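To illustrate the sliding window step, here is a minimal T-SQL sketch, assuming a monthly partitioned fact table; the object names (FactSales, pfMonthly, psMonthly) and boundary dates are illustrative, not taken from the actual implementation:

```sql
-- Empty the oldest partition with a metadata-only operation
-- (truncating individual partitions avoids the old SWITCH-to-staging-table dance):
TRUNCATE TABLE dbo.FactSales WITH (PARTITIONS (1));

-- Merge away the now-empty range at the trailing edge of the window:
ALTER PARTITION FUNCTION pfMonthly() MERGE RANGE ('2017-01-01');

-- Designate the filegroup for the next partition, then split a new
-- range at the leading edge to receive the incoming month:
ALTER PARTITION SCHEME psMonthly NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pfMonthly() SPLIT RANGE ('2018-02-01');
```

Because the truncate and merge touch only empty partitions, the whole window slide stays a fast metadata operation even on a table with hundreds of millions of rows.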
SQL Server Analysis Services Tabular – originally we were not planning for AS Tabular, but it was the only solution that could handle the volume of data: it couldn’t fit in the 1 GB dataset limit of PowerBI, and we needed a way around that. Using Analysis Services to host the entire semantic layer also helped centralize all the logic and implement the Single Source of Truth core we wanted to have. Having AS Tabular allowed us to use PowerBI with a Live connection, and if anything new was needed (a measure, for instance), it could be added directly in PowerBI Desktop rather than in the Tabular cube (which might be quite complex).
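As a hypothetical example of that last point, a report author on a Live connection could define a report-level DAX measure in PowerBI Desktop without touching the Tabular model; the Sales table, Amount column and Date table below are illustrative names, not from the customer's model:

```dax
-- Report-level measure added in PowerBI Desktop on top of the Live connection;
-- it is evaluated by the AS Tabular engine but lives only in the report file.
Sales YTD =
TOTALYTD ( SUM ( Sales[Amount] ), 'Date'[Date] )
```

This keeps the shared model as the single source of truth while still letting analysts experiment with their own calculations.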
PowerBI Pro – the self-service and presentation layer, connecting live to the on-prem AS Tabular database through the data gateway. We were very cautious with the gateway, especially regarding data transfer, required traffic and security hardening. We implemented the solution with a single gateway, but it can easily scale out by creating a gateway cluster.
And here are some statistics for the whole implementation:
While building that solution we faced several challenges, but we’ll share those later, so subscribe to our blog and newsletter and follow our webinars for more details.