Featured post

Data Warehouse Schemas

1.      Data Warehousing Schemas A logical representation of the entire database is known as data warehousing schemas. A database requir...

Sunday 5 February 2017

Data Marts

Data Marts

Data Marts are nothing but the sub sets of the data warehouse. The information or the data which is inside data mart pertains to single department whereas data warehouse is group of all of them and has all the information regardless of any group.

Why Data Marts.

Ø  To speed up the queries by reducing the volume of data to be scanned.
Ø  To segment data into different hardware platforms.
Ø  To structure data in a form suitable for a user access tool.

Design Data Mart

Data Mart should be a designed as smaller version of snowflake schema, within the data warehouse and should match the database design of data warehouse.
It helps maintaining control over database instances.


The summary table helps to summarize the data mart in the same way of data warehouse.

Data Warehouse Schemas

1.     Data Warehousing Schemas

A logical representation of the entire database is known as data warehousing schemas. A database requires relational model but Data Warehouse (DWH) uses star schema, snowflake schema, etc.

3.1Prerequisites for schema

Before knowing schemas the prominent object to learn is
·         Fact Tables
·         Dimension Tables

Fact Table

A fact table is a table created to store the measurement, metrics and facts of any business process according to business requirements. It is almost surrounded by dimension tables. It has minimum two columns one which stores the references to the dimension table and another one to store extra valuable data. Primary Key of fact table are usually composite primary keys.

Steps to define Fact Table.

Ø  Identify a business process for analysis (like sales).
Ø  Identify measures of facts (sales dollar), by asking questions like 'What number of XX are relevant for the business process?', replacing the XX with various options that make sense within the context of the business.
Ø  Identify dimensions for facts (product dimension, location dimension, time dimension, organization dimension)
Ø  List the columns that describe each dimension (region name, branch name, business unit name).
Ø  Determine the lowest level (granularity) of summary in a fact table (e.g. sales dollars).

Dimension Table

It is one of the companions to the fact table. This table is nothing but a part of fact table which contains all the description about the business in the form text, that is why we call it as descriptive attribute table. The goal of a dimension table is to create standardized, conformed dimensions that can be shared across the enterprise's data warehouse environment, and enable joining to multiple fact tables representing various business processes.

3.2Types of Schemas

There are various types of DWH schemas which are as follows:
·         Star Schema
·         Snowflake Schema
·         Fact Constellation Schema
These all schemas mentioned above are used day to day in data warehousing, most frequently used schema is Star Schema.

Star Schema

The schema makes the star combination of fact table and dimension tables. One fact table will be in centre surrounded by dimension tables. Each dimension in a star schema is represented with only one dimension table.
Dimension Table contains set of attributes which is nothing but the descriptive information about business.

The above example is about sales data of company with respect to four dimensions time, item, location and branch. The fact table also contains the business related specific description, for example here dollars_sold and units_sold.

Snowflake Schema

In snowflake schema, each dimension table itself acts as a reference to another dimension table. To clear this we have an example, one dimension table named as item has column supplier_key which in return is the reference to the supplier dimension table.
This is required only if the tables are normalized, all the dimension tables are to be normalized for this feature.

Due to normalization in the snowflake schema, the redundancy is reduced and becomes easy to store and maintain the data.

Fact Constellation Schema

The schema which has multiple fact tables is known as fact constellation schema also known as galaxy schema. For example here we have two fact tables sales and shipping.


Both share same dimension table items for sales and shipping orders. It is possible to share the dimension table between fact tables.

OLAP - A Breif Description, Types of OLAP and OLAP operations

1.     OLAP           

OLAP is nothing but Online Analytical Processing. As we have discussed about OLAP and OLTP Detailed description is down below.

1.1.          What is OLAP?

Online Analytical Processing Server basically works on the multidimensional data model. It performs multidimensional analysis of business data and sophisticated data modelling. The chief component of OLAP is the OLAP server which resides between a client and database management system.  It allows higher authority to get an access of information through fast and consistent access to data.

1.2.          Types of OLAP

There are four types of OLAP Servers:
Ø  Relational OLAP
Ø  Multidimensional OLAP
Ø  Hybrid OLAP
Ø  Specialized SQL servers

1.2.1.   Relational OLAP

Relational OLAP is also known as ROLAP which is nothing but the OLAP server which maps multidimensional operations to standard relational operations. ROLAP servers are placed between relational back-end server and client front end tools.
ROLAP has following features:
ü  Implementation of aggregation navigation logic
ü  Optimization for each DBMS back end.
ü  Additional tools and Services.

1.2.2.   Multidimensional OLAP

Multidimensional OLAP is commonly known as MOLAP which utilizes the multidimensional operation. It uses array based multidimensional storage engines for multidimensional views on the data. Many MOLAP severs uses two levels of data storage representation to handle dense sparse data sets.

1.2.3.   Hybrid OLAP

It is combination of both ROLAP and MOLAP.
It offers following features:
ü  Higher Scalability of data of ROLAP
ü  Faster computation of MOLAP.
It is commonly known as HOLAP.

1.2.4.   Difference between ROLAP and MOLAP

ROLAP:
·         Data Retrieval is very slow.
·         Uses relational table.
·         Available space in data warehouse is enough for it.
MOLAP
·         Data retrieval is fast.
·         Uses sparse array to store data.
·         Maintains a separate database for data cubes.

1.3.          OLAP Operations

As discussed in above “what is OLAP“, it is based on multidimensional view of data.
Different types of OLAP operations are as follows:
Ø  Roll – up
Ø  Drill – Down
Ø  Slice – dice
Ø  Pivot (rotate)

1.3.1.   Roll – Up

Roll – up performs aggregation on data cube in any of the following ways:
Ø  Dimension Reduction: In this one dimension is reduced that is hierarchical order is reduced or rolled up.
Ø  Or by increasing dimensions by hierarchical order.
The following diagram illustrates roll – up:


Rollup is always performed on hierarchy. For example here in above example locations are in “cities” and are rolled up to “countries”. Aggregations performed on cities that sum of the items of cities.

1.3.2.   Drill – Down

Drill – Down is the reverse operation of roll – up because here we go from higher form to lower form.

The concept of Hierarchy was “Day > Month > Quarter > Year”, and here same is happening as quarter have been divided into months that is nothing but reverse hierarchy. So the aggregate function has been divided according to months.

1.3.3.   Slice and Dice

Slice is an action performed when you want one slice of information. For example we have data cube where time is in quarters is a dimension and cities as one dimension and items as another dimension. So if want data cube consisting of items and locations in quarter 1.

Here Slicing has been done for time Q1 for all items and locations.
Dice is another sort of action performed when we want a particular block of data it can more than two dimensions. For example if we require small data cube which has condition that time dimensions to be Q1 and Q2, Locations as Toronto and Vancouver and moreover items to be Mobile and Modem

It is a kind of Sub Data Cube. Sub Data Cube derived from a data cube.

1.3.4.   Pivot

Pivot means rotation of the table. So data inside data cube will be rotated along the rotational axes just to provide alternate presentation of data.


Data Warehouse A brief Introduction, Features, Architecture

1.     Data Warehouse

Before starting with BI, we need to know Data Warehouse Concepts.
Basic concepts of Data Warehouse include:
ü  Introduction and Definition
ü  Data Warehouse Architecture
ü  ETL process
ü  Staging Area, Metadata, Repository and Data Mart
ü  Difference Between OLAP & OLTP
ü  Schemas

1.1.          What is Data Warehouse (DWH)?

A Data Warehouse is a large store of data which is constructed by many sources of data within an organization to develop analysis, decision making and enables management to take decisions.
The term DWH was first coined by Bill Inmon in 1990.  Inmon’s definition is nothing but the features of DWH.
In an organization which database they keep. They have operational database for their own use. An operational database is nothing but the database which is used for the daily transactions in an organization and acts as a source system for the DWH (which will be discussed later in this document).
Why do we prefer DWH instead we have Operational Databases in an organization?
This question is a FAQ when you are trying to understand DWH.
Suppose your organization is a Hospital and you are a Business Head Executive in your organization moreover you want to see that how many people have joined your organization in previous month or how much revenue has been generated. To answer these questions you need to first have data stored in your database. Since an operational database undergoes many transactional processes so the data is updated and updated records remove the previous data and put in new updated data.
Data Warehouse provides us basic generalized and combined view of data present in the database and required for your business. It is nothing but a database kept separate from the operational databases, so that the data is having a record. Hence no frequent updates are done on the DWH. Moreover we can delete, read, update or even insert into the operational databases but in DWH you can only read.


1.2.          Data Warehouse Features

Remember in the very first paragraph I’ve mentioned that Inmon stated about DWH, those are nothing but the 4 features of DWH which are as follows:
Ø  Subject-Oriented:
Since it provides the information about the subject not about the organizational operations, so it is Subject-Oriented.
Ø  Integrated:
Since DWH is comprised of different sources of data and then the data is integrated, so it is integrated.
Ø  Time Variant:
Data collected in DWH is noted by particular time and date, so it is Time –Variant.
Ø  Non Volatile:
Keeping the history of data is very important, so when new data is inserted then previous data is not deleted.

1.3.          Difference between OLAP and OLTP:

OLAP (Data Warehouse):
a)      Historical processing of Data
b)      Used to analyse the business
c)      Based on Schema(Star Schema)
d)      Database Size from 100GB – 100TB.
OLTP (Operational Database):
a)      Day to Day Transactional Processing
b)      Used to run the business
c)      Based on ER model (ER – Entity Relationship)
d)      Database size from 100MB – 100GB.

1.4.          What is Data Warehousing?

Data Warehousing is nothing but the process of constructing and utilizing the data warehouse. In this we can gather the data and then analyse it according to business need.


1.5.          Data Warehouse Architecture

It’s a three tier architecture which is described as below
a)      Bottom Tier
b)      Middle Tier
c)      Top Tier

                                         

1.5.1.   Bottom Tier

This tier is related to database server and data warehousing layer. It is a Relational Database (RDBMS), where we user ETL tools which are nothing but the back end tools and they perform all extract, clean, transform and load. It consists of operational database, data warehouse server which in return performs monitoring, administration, and data warehouse and data mart.

1.5.2.   Middle Tier

Middle tier consists of OLAP server and it can be implemented in following ways:
Ø  Relational OLAP: It maps the operations on multidimensional databases to standard relational operations.
Ø  Multidimensional OLAP: This model basically focuses on operations on multidimensional databases.

1.5.3.   Top Tier


This tier is client tier which deals directly with client, front end – client tier. The tier consists of reporting tools, query tools, analysis tools and data mining tools.