
End To End BI 2


THE ENVIRONMENTS – BI SIZING & HARDWARE (STAGING)
In the previous article we introduced the End to End BI philosophy and the various layers that comprise the Business Intelligence Data Warehouse (BIDW). In this article we will focus on getting your End to End BI environment ready by considering sizing and hardware, with some reference to software. It assumes that you are a Business Intelligence Consultant or Architect with design authority, or a BI Programme Manager or Business Owner wanting to gain insight into Business Intelligence quickly. If you are technically orientated, your decision making will benefit significantly if you collaborate with an Infrastructure Architect – so go and make contact with one in your organisation.

You should plan to size and procure hardware for five exclusive environments: a sandbox, a development environment, a system test environment, and pre-production and production environments. Let’s take a brief look at the nature of each.


The sandbox is your playground; it’s a place where you can experiment and learn with zero risk of compromising a deployment on a box used to maintain real content. You can install and configure software, write code, and make and break the environment. During my time working for BusinessObjects many software engineers preferred to use VMware for their sandbox. They would install and configure the BI environment and then clone it. Any system corruption – no problem: blow away the corrupt environment, clone your clean environment again and you’re back in business. Ensure that the sandbox has a sufficient specification to support the minimum requirements of the platforms you intend to install.

The other four environments are where you will manage the lifecycle of your Business Intelligence content. Your development and system test environments are cut-down versions of your pre-production environment. Development and system test should run the same software versions and patches as the pre-production and production environments; however, their CPU and RAM are usually considerably lower.

The development environment is the place for authorised developers to develop and unit test their code and content. It is not a place for so-called ‘ghost’ developers to experiment. Ghost developers are people who gain access to a development environment with a view to ‘experimenting’ and learning how things work. They will do this anonymously using a generic logon, a backdoor, or someone else’s password! If you’re a ghost developer – and you know if you are – go and play on the sandbox and don’t risk compromising another developer’s hard work in the development environment. At this point you will benefit from having version control software in place.

After content is unit tested in the development environment it is promoted to the system test environment. This is the first time that the testing team will have the opportunity to test the code, and to test effectively the system test environment must be stable. Any content identified as defective is passed back to development to be fixed. When content passes testing it is promoted to the pre-production environment. In pre-production we expect high volumes of industrial-strength data to be available for the first time; this data may need to be encrypted or anonymised depending on your industry. Pre-production is the opportunity for the content, solution and system to be pressure, stress and volume (PSV) tested before being unleashed on the production environment. It is also a useful place for DBAs to monitor query execution plans and to performance tune.

Your production environment should mirror your pre-production environment and be secure. By passing content in pre-production testing you are saying that the code is fit to go through the Change Advisory Board (CAB) and move to production. Testing on pre-production is only truly legitimate if the pre-production environment is exactly the same as your production environment.

For brevity the remainder of this article will focus on the pre-production environment, but the concepts can be applied successfully across all four environments. Taking the pre-production environment as our worked example, we will need to consider the staging layer, the data mart layer, the ETL system, the Business Intelligence Platform, the RDBMS, hardware and software. This means that there are implications for budget, procurement timelines, licensing, disk space and performance. You may also need to consider contractual terms with third-party suppliers of source data and any outsourcing organisations that manage your source systems. If your organisation utilises an ODS or BCNF entity tables you will need to make additional considerations. You must now make sizing estimates that will allow you to make informed decisions.

Our first consideration is the BIDW layer known as the staging area, as this is where we will (usually) first land the data. Let’s first of all consider the staging tables. This is the place that you will bring your data into from outside the BIDW – a kind of landing area. Remember that your sources for Business Intelligence may be wide ranging and include applications such as mainframe legacy systems and an ODS. Your historic load will probably comprise very large data volumes. You will need additional storage space if you need to store the data in flat files etc. before loading to staging; this can happen because of licensing, operating and contractual limitations.

There are three things to consider regarding the architecture and sizing of your staging area: the data volumes and nature of your transient staging area, your persistent staging area, and any data archiving area. Your transient data is the data made available to the ETL system for transform and load. You will not want to lose your transient data because you may need to reference it at some point in the future. However, for ETL performance reasons you will not want to hold unnecessarily large volumes of transient data, so you will move the data off to your persistent area once the transforms are successful. You must make two decisions: first, how long you need to hold your transient data; second, how long to hold your persistent data. Let’s say that for legal and audit reasons you decide to persist your staging data for five years. At the end of that period you will need to delete or archive the expired data. To make these decisions you need guidance and remit from senior stakeholders.

It’s time to start making some calculations. Here’s an example thought process. One column of type integer equals four bytes. Now let’s say that my staging table has sixty columns, all of type integer: that’s 60 * 4 = 240 bytes of data per row. Now let’s say that the source system serves up 60,000 rows of data per night. That’s 60,000 * 240 = 14,400,000 bytes. And let’s say that I’ll hold my transient data for seven days: that’s 14,400,000 * 7 = 100,800,000 bytes of data. Let’s convert that to gigabytes: 100,800,000 / 1024 / 1024 / 1024 = 0.0939 GB. Now let’s say that I have 50 staging tables of similar size loaded every night. That’s 0.0939 * 50 = 4.69 GB – not a lot of space by today’s standards. Now let’s calculate for persisting: 4.69 * 52 (weeks) * 5 (years) = 1,219.4 GB, or about 1.2 terabytes of data. We have just calculated your incremental load data for the next five years. You will probably also want to load the historic data held in your source systems, so you must repeat the calculations for the number of years of back data (historic) that you want to make available to business users.
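
If you prefer to script these estimates, here is a minimal sketch in Python that reproduces the arithmetic above. The figures (column count, nightly row volume, retention periods) are the worked example’s assumptions, not recommendations.

    # Staging sizing estimate, reproducing the worked example above.
    BYTES_PER_INT = 4        # one integer column occupies four bytes
    COLUMNS = 60             # columns per staging table (all integers here)
    ROWS_PER_NIGHT = 60_000  # rows served up by the source system per night
    TRANSIENT_DAYS = 7       # retention period for transient data
    TABLES = 50              # staging tables of similar size
    PERSIST_YEARS = 5        # legal/audit retention period

    row_bytes = COLUMNS * BYTES_PER_INT                        # 240 bytes per row
    nightly_bytes = ROWS_PER_NIGHT * row_bytes                 # 14,400,000 bytes
    transient_gb = nightly_bytes * TRANSIENT_DAYS / 1024**3    # ~0.0939 GB
    week_all_tables_gb = transient_gb * TABLES                 # ~4.69 GB per week
    persistent_gb = week_all_tables_gb * 52 * PERSIST_YEARS    # ~1,220 GB

    print(f"Transient, one table:  {transient_gb:.4f} GB")
    print(f"Transient, all tables: {week_all_tables_gb:.2f} GB")
    print(f"Persistent, {PERSIST_YEARS} years:   {persistent_gb:.1f} GB (~1.2 TB)")

Adjust the constants to suit your own tables (real staging tables will mix data types and row volumes) and rerun; the same script extends naturally to the historic load calculation.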

So now I have an idea of what disk space I need to procure for staging. Here your Infrastructure Architect can advise on the standard for storing data on a SAN etc. Where possible you should be working within the mandate and standards laid down by the Enterprise Architect.

There are always more details to consider around these areas, but the things we have discussed form the backbone of what you will need to think about. There isn’t ‘one size fits all’, but there are frameworks and patterns which repeat themselves again and again. Your considerations now are the procurement, licensing and installation of any additional/new RDBMS platform and the procurement of any additional hardware to make the staging area performant. In an ideal world these would already be specified in the Solution Architecture, but the reality is that BI Solution Architecture is often overlooked.

The next article in this series will consider your pre-production data marts.


End To End BI 1


End to End BI Recommendations

This is the first in a series of short articles elaborating end to end BI philosophies and methods.

In the end to end BI philosophy (referred to as Cornerstone Solution® by BI System Builders) the BI component of the data warehouse is considered to consist of four layers. The data warehouse is not regarded as simply a set of physical tables and views on an RDBMS.

The four layers are the Staging Layer, the Data Mart Layer, the Semantic Layer, and the Presentation Layer. Collectively these can be thought of as a Business Intelligence Data Warehouse (BIDW). This is an important principle because this way of thinking engenders the beginning of an end to end BI mindset.

Star Schema
Achieving a successful BIDW commences with modelling the data mart layer according to business user requirements. Committing to undertake this activity first becomes a catalyst for all the other activities necessary to create an end to end BI solution:

  • The Business Architecture is considered (‘as is’ and ‘to be’) and the Business Event Analysis and Modelling (BEAM) methodology is used to understand the business requirements. This results in a set of logical dimensional models that will later translate directly into physical data warehouse tables. These physical tables form the ETL target tables in the data mart layer (a minimal sketch of such tables follows this list).
  • Data profiling of source data should now be carried out and fields mapped from source systems/source files to staging tables. This activity enables the commencement of ETL job development to populate the staging tables. Note that a source for BI may be, among others, a flat file, third-party data, a legacy system, an ODS, or a 3NF data warehouse.
  • Fields are now mapped through from staging tables to data mart tables enabling further ETL jobs to be developed.
  • The semantic layer is configured as an abstraction of the data mart tables for business use and querying purposes. It’s a good practice to start this early as it often illuminates any flaws in the dimensional design, thus de-risking the programme.
  • BI reports can be developed in the presentation layer for information consumer purposes.
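
To make the target of this flow concrete, here is a minimal sketch of data mart star schema tables, using Python’s built-in sqlite3 purely for illustration. The sales fact and date/customer dimensions are hypothetical examples, not tables from this article.

    import sqlite3

    # An in-memory database standing in for the data mart layer.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE dim_date (
        date_key      INTEGER PRIMARY KEY,  -- surrogate key, e.g. 20240131
        calendar_date TEXT,
        month_name    TEXT,
        year          INTEGER
    );
    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY,  -- surrogate key
        customer_name TEXT,
        segment       TEXT
    );
    -- The fact table holds measures at a declared grain (here, one row
    -- per customer per day) and references the dimensions by key.
    CREATE TABLE fact_sales (
        date_key      INTEGER NOT NULL REFERENCES dim_date (date_key),
        customer_key  INTEGER NOT NULL REFERENCES dim_customer (customer_key),
        quantity      INTEGER,  -- additive measures
        sales_amount  REAL
    );
    """)

Tables like these become the ETL targets, and the semantic layer is then configured over them for business querying.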

The next article will be a short introduction to sizing, hardware and BI Platform for the BIDW.


Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema by Lawrence Corr and Jim Stagnitto

BEAM Business Event Analysis & Modeling

BACKGROUND

I first worked with Lawrence Corr back in 2002 whilst I was designing content in the Analytical Applications Division (AAD) of the BusinessObjects Product Group. At that time Lawrence was engaged as an external consultant to BusinessObjects, critiquing and advising us (AAD) on our BusinessObjects data warehouse design. Back then Lawrence also gave me my first formal dimensional modeling training when I attended his Data Warehousing Design Techniques course. Lawrence already had a very impressive reputation and was closely associated with Ralph Kimball.

It was therefore of interest to me when Lawrence said that he was writing a new book entitled Agile Data Warehouse Design.  Here’s what I found…

 

AGILE

There are two initial points to make about the book. The first concerns the title, Agile Data Warehouse Design. Despite what the title might initially suggest, I did not find the book to be about delivering a data warehouse using existing Agile techniques such as Scrum and Extreme Programming (XP). It is rather about a structured method of bringing together Business Intelligence requirements analysis and dimensional modeling techniques using an Agile mindset, the goal being to deliver logical models that work, in a highly time-efficient fashion. As such, the Agile Manifesto is listed at the rear of the book and it is easy to see how the methods described meet the aims laid out in the manifesto.

 

PRACTICAL METHOD

Secondly, Agile Data Warehouse Design is a pragmatic book, not just agile theory alone. It provides practical techniques, artifacts, and tools that will enable you to model successfully. I say that because I have already implemented these techniques, known as BEAM, extensively: at a leading insurance company, at a leading car manufacturer working across all their vehicle brands, and for a well-known high street retailer. I have found that business users became actively engaged when introduced to the BEAM technique of the 7Ws (more on the 7Ws later). Furthermore, the BEAM tools made it easy for end users to contribute in an intelligent and structured way. That said, the business users did not need to understand the BEAM techniques themselves; in fact I never mentioned that we were using BEAM at all, and they didn’t need to know. They simply attended the interviews and enjoyed having their brains picked and taking joint ownership of the developing dimensional model.

 

BI BREAKPOINTS

I became all the more interested in reading Agile Data Warehouse Design when I began to realise that it tackles head on several key ‘BI Breakpoints’: the term used by BI System Builders to describe weaknesses in an End to End BI solution that can become points of failure. While the term is not explicitly used in the book, it quickly became clear to me that the BEAM method will help developers address the specific BI Breakpoints between Business Analysis and Data Warehouse design. As such, I found Agile Data Warehouse Design to be highly complementary to the Cornerstone Solution® End to End BI method, which is used by BI System Builders to address BI Breakpoints. You can read more about BI Breakpoints here.

 

WHICH BI BREAKPOINTS DOES BEAM ADDRESS?

BEAM addresses BI Breakpoints around business analysis and dimensional model design. A key issue for effective dimensional modeling, which I’ve faced many times, is that it requires the combination of three different contributing skill sets: Business Domain Expertise, Business Analysis, and Dimensional Modeling. The domain expertise is provided by the business. However, it is the role of the business analyst (BA) to extract that expertise, understand the business process area and then document the business requirements; to do this successfully requires the ability to ask the right questions. Once the BA’s document is available it is translated into a dimensional model by the Business Intelligence and Data Warehouse (BI/DW) team.

Generally speaking, I observe that BAs have a predominantly business background while dimensional modelers (DMs) have a technical one. Frequently a BA is assigned to go to the business and gather user requirements, the result of which is a copious document. Once the document is complete it is handed over to the BI/DW team to work with. Although the document is useful, typically it will not explicitly describe critical dimensional modeling design elements, such as fact granularity and fact and dimension table types and relationships, as required by the BI/DW team for development purposes. Consequently, this handover can become a BI Breakpoint.

The BI/DW team will attempt to interpret the business analysis document as best they can. However, issues can arise because the BI/DW team had no involvement during the analysis stage and could not ask pertinent questions whilst the business analysis was being undertaken. After sign-off of the business analysis document the BA may move on to another project and not be available to provide further help. As contractors and consultants are often used as BAs, they may even have left the business altogether. This can cause a chasm of understanding to open up between the BI/DW team and what the business users had been describing and requesting. Needing clarification or finding information missing, and not knowing or wanting to approach the business again, the BI/DW team may fall back onto something that is more securely under their control as a means to drive their modeling effort – source system data analysis.

The risk of building out dimensional models based on source system analysis is that the final tables will be close in design to the source data but may not model the business process area or meet business user needs.  The tables may not meet business requirements and they may not be a true dimensional design at all. To my mind this is a failing, because ignoring for the moment the new SAP HANA, I have always found dimensional models to be the most effective performance design for use with SAP BusinessObjects tools against a relational database and the best way to think of business process measurement in general.

To help avoid the BI Breakpoint that can occur between the BA and the BI/DW developers we have the notion of cross-functional teams. A cross-functional team is superior to the structure previously mentioned. The team members work closely and simultaneously together, often in the same project room. The DM from the BI/DW team sits in on the BA’s interviews with the business users and starts to construct the logical model design. The dimensional modeler can ask clarifying questions of the BA and business user directly at any point in the process. Furthermore, the evolving logical model design can be frequently replayed to the rest of the team to confirm it. In my experience the cross-functional team has been more successful than the polarised (chasm-forming) BA and BI/DW teams. BEAM takes the concept of the cross-functional team much further and provides an intelligent and effective framework for the BA and BI/DW teams to work together in. Following the BEAM method is an effective antidote to BI Breakpoints.

 

WHAT IS THE BEAM METHOD?

BEAM stands for Business Event Analysis and Modeling. As the name suggests, it combines elements of requirements analysis and data modeling. Its key concept is to use seven dimensional types (the 7Ws) to identify and then elaborate business events. BEAM concentrates on business events rather than known reporting requirements so as to model whole business process areas. This provides a major advantage: modeling a business process area yields a design that can be readily scaled as requirements grow, whereas modeling for a set of reporting requirements alone can lead to a narrow solution – ‘narrow’ because the design may not lend itself to being scaled when new requirements are onboarded. Therefore, the BEAM approach helps avoid the BI Breakpoint of non-scalability. BEAM’s 7Ws approach also lays a solid foundation for ad-hoc reporting and self-service BI by teaching business users – by stealth – to think dimensionally.

The 7Ws used by BEAM are: Who, What, When, Where, How, How Many, and Why. A similar conceptual technique is used in investigative journalism to ensure full story coverage. For a specified business process area the BEAM idea is to identify event stories by asking a ‘who does what’ question and then expressing the answer as a simple story. An example of this would be ‘traders buy commodities’. A series of these ‘who does what’ stories are captured, and then the remainder of the 7Ws, such as the ‘when’ and ‘where’, are asked to drive out their interesting details. All the results are documented in a BEAM table template.
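
As a simple illustration, here is how the ‘traders buy commodities’ event story might be elaborated against the 7Ws, sketched as a Python dictionary. The detail values are illustrative assumptions, not taken from the book.

    # One BEAM-style event story elaborated with the 7Ws.
    # Each W suggests a candidate dimension; 'how many' suggests the facts.
    event_story = {
        "who":      "trader",               # the subject of 'who does what'
        "what":     "commodity",            # the object being bought
        "when":     "trade date and time",
        "where":    "exchange",
        "how":      "order type",           # illustrative detail
        "how many": "quantity and price",   # the measurable facts
        "why":      "client instruction",   # causal detail, often optional
    }

    for w, detail in event_story.items():
        print(f"{w:>8}: {detail}")

Read dimensionally, the ‘who’, ‘what’, ‘when’, ‘where’, ‘how’ and ‘why’ answers become candidate dimensions, while the ‘how many’ answers become the measures of a fact table.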

The BEAM table template is one of several tools employed; you will also learn how to use the BEAM tools of hierarchy charts, timelines, event matrices, and enhanced star schemas. The BEAM method will then take you through modeling events, dimensions, processes and star schemas to provide working software and documentation as detailed in the Agile Manifesto.

From the information gathered in the Business Event Analysis stage it is then possible to easily identify dimension and fact table types. Dimension and fact table patterns are explained in the second half of the book, ‘Modeling’. If you are new to dimensional modeling you will learn much from the vast design and implementation experience of Lawrence and his co-writer Jim Stagnitto. The BEAM method and notation walk you through a natural continuum from the interview stage right through to the final dimensional model.

When the BEAM method is properly understood and implemented it will effectively bridge the gap (BI Breakpoint) between the BA and the DM. Both can work together using BEAM, or, for someone with hybrid skills, the two roles can become one. In summary, Agile Data Warehouse Design is a thoroughly well-written book that addresses BI Breakpoints and brings with it four key benefits. It will show you how to practically apply an effective combined analysis and modeling method (BEAM). It will help engage business communities so that full business process areas can be modelled, making your solution scalable. It will lower costs to the business by reducing analysis and modelling time. And it will reduce the risk of a project struggling, by delivering working software and documentation on time.

You can buy Agile Data Warehouse Design from Amazon and find out more about BEAM and matching agile/dimensional modelling courses on Lawrence’s Decision One Consulting web site.