Data warehouses are repositories designed to facilitate reporting and analysis; such databases can be completely detached from the information system, which means a great gain in speed when launching queries.
Online Transaction Processing, typically used by transaction-oriented applications like vtiger CRM, does not take into account the creation of detached databases, and potential analyses are conducted on the same data used by the application, and are not designed for situations where the amount of data to be analyzed is remarkable.
Data Model used in OLTP.
Datawarehouses configured for OLAP (Analytical Processing) use a multidimensional data model.
The most common style used to build a data warehouse is called star model: The central table is called fact table referencing any number of dimensions (the surrounding tables). Using such a method it is possible to produce reports including information about, for example, the quantity of processors produced in the second quarter of 2006. For such reason (hyper)cubes are created, and in that model data from different dimensions (potentially all variables) can be linked together.
A very important feature of data warehouses is that they can be redundant and summarize information that is usually calculated in the application as their main goal is to report on the information and updates are scarce. A simple example would be saving the contact name in a certain format (surname, name) because we have studied the client's needs and detected that this will be the way the contact name should be presented. With that we avoid having to use a function to calculate this value each time.
Finally data warehouses create versions of the information in order to conserve historical information and in an attempt to give consistent reporting over time. For example, if an account changes it's name, the BI system will create a new time stamped version of the account so that the entities that existed before the change are still related with the same account while new entities entering the system will related with the newer version.
For the vtiger CRM - BI project we have constructed a data warehouse that is capable of saving all the information contained in a productive vtiger CRM. All the entities in the system are saved as dimension tables and versioned based on decisions we have made during the analysis study. For example, the account is versioned each time the city, country, state, zip code, street, member of, email opt out and assigned to fields change.
We have created the fact tables:
In the Sales fact tables we have added specific calculations to obtain important information of loss and profit:
Once created the data warehouse we need to create ETL processes that will regularly extract the information form the production vtiger CRM and load it into the data warehouse for it to be used by the different reporting and analysis tools available. The basic procedure is depicted in the next image:
The Path of BI tools from data extraction to reporting.
As can already be seen, the data warehouse reflects the business logic and must be constructed to adapt perfectly to that logic. Thus changes will be necessary to the data warehouse, and to the rest of the procedure to achieve maximum benefit.
With this simplistic overview we hope to have given you an idea of how powerful the data warehouse concept i, and have prepared you to grasp the next important concept: Online Analytical Processing.