Before starting with BI, we need to know Data Warehouse Concepts.
Basic concepts of Data Warehouse include:
ü Introduction and Definition
ü Data Warehouse Architecture
ü ETL process
ü Staging Area, Metadata, Repository and Data Mart
ü Difference Between OLAP & OLTP
A Data Warehouse is a large store of data which is constructed by many sources of data within an organization to develop analysis, decision making and enables management to take decisions.
The term DWH was first coined by Bill Inmon in 1990. Inmon’s definition is nothing but the features of DWH.
In an organization which database they keep. They have operational database for their own use. An operational database is nothing but the database which is used for the daily transactions in an organization and acts as a source system for the DWH (which will be discussed later in this document).
Why do we prefer DWH instead we have Operational Databases in an organization?
This question is a FAQ when you are trying to understand DWH.
Suppose your organization is a Hospital and you are a Business Head Executive in your organization moreover you want to see that how many people have joined your organization in previous month or how much revenue has been generated. To answer these questions you need to first have data stored in your database. Since an operational database undergoes many transactional processes so the data is updated and updated records remove the previous data and put in new updated data.
Data Warehouse provides us basic generalized and combined view of data present in the database and required for your business. It is nothing but a database kept separate from the operational databases, so that the data is having a record. Hence no frequent updates are done on the DWH. Moreover we can delete, read, update or even insert into the operational databases but in DWH you can only read.
Remember in the very first paragraph I’ve mentioned that Inmon stated about DWH, those are nothing but the 4 features of DWH which are as follows:
Since it provides the information about the subject not about the organizational operations, so it is Subject-Oriented.
Since DWH is comprised of different sources of data and then the data is integrated, so it is integrated.
Ø Time Variant:
Data collected in DWH is noted by particular time and date, so it is Time –Variant.
Ø Non Volatile:
Keeping the history of data is very important, so when new data is inserted then previous data is not deleted.
OLAP (Data Warehouse):
a) Historical processing of Data
b) Used to analyse the business
c) Based on Schema(Star Schema)
d) Database Size from 100GB – 100TB.
OLTP (Operational Database):
a) Day to Day Transactional Processing
b) Used to run the business
c) Based on ER model (ER – Entity Relationship)
d) Database size from 100MB – 100GB.
Data Warehousing is nothing but the process of constructing and utilizing the data warehouse. In this we can gather the data and then analyse it according to business need.
It’s a three tier architecture which is described as below
a) Bottom Tier
b) Middle Tier
c) Top Tier
This tier is related to database server and data warehousing layer. It is a Relational Database (RDBMS), where we user ETL tools which are nothing but the back end tools and they perform all extract, clean, transform and load. It consists of operational database, data warehouse server which in return performs monitoring, administration, and data warehouse and data mart.
Middle tier consists of OLAP server and it can be implemented in following ways:
Ø Relational OLAP: It maps the operations on multidimensional databases to standard relational operations.
Ø Multidimensional OLAP: This model basically focuses on operations on multidimensional databases.
This tier is client tier which deals directly with client, front end – client tier. The tier consists of reporting tools, query tools, analysis tools and data mining tools.