Featured post

Data Warehouse Schemas

1.      Data Warehousing Schemas A logical representation of the entire database is known as data warehousing schemas. A database requir...

Showing posts with label Basic Data warehouse. Show all posts
Showing posts with label Basic Data warehouse. Show all posts

Sunday, 5 February 2017

Data Marts

Data Marts

Data Marts are nothing but the sub sets of the data warehouse. The information or the data which is inside data mart pertains to single department whereas data warehouse is group of all of them and has all the information regardless of any group.

Why Data Marts.

Ø  To speed up the queries by reducing the volume of data to be scanned.
Ø  To segment data into different hardware platforms.
Ø  To structure data in a form suitable for a user access tool.

Design Data Mart

Data Mart should be a designed as smaller version of snowflake schema, within the data warehouse and should match the database design of data warehouse.
It helps maintaining control over database instances.


The summary table helps to summarize the data mart in the same way of data warehouse.

Data Warehouse Schemas

1.     Data Warehousing Schemas

A logical representation of the entire database is known as data warehousing schemas. A database requires relational model but Data Warehouse (DWH) uses star schema, snowflake schema, etc.

3.1Prerequisites for schema

Before knowing schemas the prominent object to learn is
·         Fact Tables
·         Dimension Tables

Fact Table

A fact table is a table created to store the measurement, metrics and facts of any business process according to business requirements. It is almost surrounded by dimension tables. It has minimum two columns one which stores the references to the dimension table and another one to store extra valuable data. Primary Key of fact table are usually composite primary keys.

Steps to define Fact Table.

Ø  Identify a business process for analysis (like sales).
Ø  Identify measures of facts (sales dollar), by asking questions like 'What number of XX are relevant for the business process?', replacing the XX with various options that make sense within the context of the business.
Ø  Identify dimensions for facts (product dimension, location dimension, time dimension, organization dimension)
Ø  List the columns that describe each dimension (region name, branch name, business unit name).
Ø  Determine the lowest level (granularity) of summary in a fact table (e.g. sales dollars).

Dimension Table

It is one of the companions to the fact table. This table is nothing but a part of fact table which contains all the description about the business in the form text, that is why we call it as descriptive attribute table. The goal of a dimension table is to create standardized, conformed dimensions that can be shared across the enterprise's data warehouse environment, and enable joining to multiple fact tables representing various business processes.

3.2Types of Schemas

There are various types of DWH schemas which are as follows:
·         Star Schema
·         Snowflake Schema
·         Fact Constellation Schema
These all schemas mentioned above are used day to day in data warehousing, most frequently used schema is Star Schema.

Star Schema

The schema makes the star combination of fact table and dimension tables. One fact table will be in centre surrounded by dimension tables. Each dimension in a star schema is represented with only one dimension table.
Dimension Table contains set of attributes which is nothing but the descriptive information about business.

The above example is about sales data of company with respect to four dimensions time, item, location and branch. The fact table also contains the business related specific description, for example here dollars_sold and units_sold.

Snowflake Schema

In snowflake schema, each dimension table itself acts as a reference to another dimension table. To clear this we have an example, one dimension table named as item has column supplier_key which in return is the reference to the supplier dimension table.
This is required only if the tables are normalized, all the dimension tables are to be normalized for this feature.

Due to normalization in the snowflake schema, the redundancy is reduced and becomes easy to store and maintain the data.

Fact Constellation Schema

The schema which has multiple fact tables is known as fact constellation schema also known as galaxy schema. For example here we have two fact tables sales and shipping.


Both share same dimension table items for sales and shipping orders. It is possible to share the dimension table between fact tables.

Data Warehouse A brief Introduction, Features, Architecture

1.     Data Warehouse

Before starting with BI, we need to know Data Warehouse Concepts.
Basic concepts of Data Warehouse include:
ü  Introduction and Definition
ü  Data Warehouse Architecture
ü  ETL process
ü  Staging Area, Metadata, Repository and Data Mart
ü  Difference Between OLAP & OLTP
ü  Schemas

1.1.          What is Data Warehouse (DWH)?

A Data Warehouse is a large store of data which is constructed by many sources of data within an organization to develop analysis, decision making and enables management to take decisions.
The term DWH was first coined by Bill Inmon in 1990.  Inmon’s definition is nothing but the features of DWH.
In an organization which database they keep. They have operational database for their own use. An operational database is nothing but the database which is used for the daily transactions in an organization and acts as a source system for the DWH (which will be discussed later in this document).
Why do we prefer DWH instead we have Operational Databases in an organization?
This question is a FAQ when you are trying to understand DWH.
Suppose your organization is a Hospital and you are a Business Head Executive in your organization moreover you want to see that how many people have joined your organization in previous month or how much revenue has been generated. To answer these questions you need to first have data stored in your database. Since an operational database undergoes many transactional processes so the data is updated and updated records remove the previous data and put in new updated data.
Data Warehouse provides us basic generalized and combined view of data present in the database and required for your business. It is nothing but a database kept separate from the operational databases, so that the data is having a record. Hence no frequent updates are done on the DWH. Moreover we can delete, read, update or even insert into the operational databases but in DWH you can only read.


1.2.          Data Warehouse Features

Remember in the very first paragraph I’ve mentioned that Inmon stated about DWH, those are nothing but the 4 features of DWH which are as follows:
Ø  Subject-Oriented:
Since it provides the information about the subject not about the organizational operations, so it is Subject-Oriented.
Ø  Integrated:
Since DWH is comprised of different sources of data and then the data is integrated, so it is integrated.
Ø  Time Variant:
Data collected in DWH is noted by particular time and date, so it is Time –Variant.
Ø  Non Volatile:
Keeping the history of data is very important, so when new data is inserted then previous data is not deleted.

1.3.          Difference between OLAP and OLTP:

OLAP (Data Warehouse):
a)      Historical processing of Data
b)      Used to analyse the business
c)      Based on Schema(Star Schema)
d)      Database Size from 100GB – 100TB.
OLTP (Operational Database):
a)      Day to Day Transactional Processing
b)      Used to run the business
c)      Based on ER model (ER – Entity Relationship)
d)      Database size from 100MB – 100GB.

1.4.          What is Data Warehousing?

Data Warehousing is nothing but the process of constructing and utilizing the data warehouse. In this we can gather the data and then analyse it according to business need.


1.5.          Data Warehouse Architecture

It’s a three tier architecture which is described as below
a)      Bottom Tier
b)      Middle Tier
c)      Top Tier

                                         

1.5.1.   Bottom Tier

This tier is related to database server and data warehousing layer. It is a Relational Database (RDBMS), where we user ETL tools which are nothing but the back end tools and they perform all extract, clean, transform and load. It consists of operational database, data warehouse server which in return performs monitoring, administration, and data warehouse and data mart.

1.5.2.   Middle Tier

Middle tier consists of OLAP server and it can be implemented in following ways:
Ø  Relational OLAP: It maps the operations on multidimensional databases to standard relational operations.
Ø  Multidimensional OLAP: This model basically focuses on operations on multidimensional databases.

1.5.3.   Top Tier


This tier is client tier which deals directly with client, front end – client tier. The tier consists of reporting tools, query tools, analysis tools and data mining tools.