Before starting with BI, we need to
know Data Warehouse Concepts.
Basic concepts of Data Warehouse
include:
ü Introduction and Definition
ü Data Warehouse Architecture
ü ETL process
ü Staging Area, Metadata, Repository
and Data Mart
ü Difference Between OLAP & OLTP
ü Schemas
A Data Warehouse is a large store of
data which is constructed by many sources of data within an organization to
develop analysis, decision making and enables management to take decisions.
The term DWH was first coined by Bill
Inmon in 1990. Inmon’s definition is
nothing but the features of DWH.
In an organization which database
they keep. They have operational database for their own use. An operational
database is nothing but the database which is used for the daily transactions
in an organization and acts as a source system for the DWH (which will be discussed later in this
document).
Why do we prefer DWH instead we have Operational Databases in an
organization?
This question is a FAQ
when you are trying to understand DWH.
Suppose your organization is a
Hospital and you are a Business Head Executive in your organization moreover
you want to see that how many people have joined your organization in previous
month or how much revenue has been generated. To answer these questions you
need to first have data stored in your database. Since an operational database
undergoes many transactional processes so the data is updated and updated
records remove the previous data and put in new updated data.
Data Warehouse provides us basic
generalized and combined view of data present in the database and required for
your business. It is nothing but a database kept separate from the operational
databases, so that the data is having a record. Hence no frequent updates are
done on the DWH. Moreover we can delete, read, update or even insert into the
operational databases but in DWH you can only read.
Remember in the very first paragraph
I’ve mentioned that Inmon stated about DWH, those are nothing but the 4 features
of DWH which are as follows:
Ø Subject-Oriented:
Since
it provides the information about the subject not about the organizational
operations, so it is Subject-Oriented.
Ø Integrated:
Since
DWH is comprised of different sources of data and then the data is integrated,
so it is integrated.
Ø Time Variant:
Data
collected in DWH is noted by particular time and date, so it is Time –Variant.
Ø Non Volatile:
Keeping the
history of data is very important, so when new data is inserted then previous
data is not deleted.
OLAP (Data Warehouse):
a) Historical processing of Data
b) Used to analyse the business
c) Based on Schema(Star Schema)
d) Database Size from 100GB – 100TB.
OLTP (Operational Database):
a) Day to Day Transactional Processing
b) Used to run the business
c) Based on ER model (ER – Entity
Relationship)
d) Database size from 100MB – 100GB.
Data Warehousing is nothing but the
process of constructing and utilizing the data warehouse. In this we can gather
the data and then analyse it according to business need.
It’s a three tier
architecture which is described as below
a) Bottom Tier
b) Middle Tier
c) Top Tier
This tier is related to database
server and data warehousing layer. It is a Relational Database (RDBMS), where
we user ETL tools which are nothing but the back end tools and they perform all
extract, clean, transform and load. It consists of operational database, data warehouse
server which in return performs monitoring, administration, and data warehouse
and data mart.
Middle tier consists of
OLAP server and it can be implemented in following ways:
Ø Relational
OLAP: It maps the
operations on multidimensional databases to standard relational operations.
Ø Multidimensional
OLAP: This model
basically focuses on operations on multidimensional databases.
This tier
is client tier which deals directly with client, front end – client tier. The
tier consists of reporting tools, query tools, analysis tools and data mining
tools.