Data Warehousing

Sep 27 • Notes • 1631 Views • No Comments on Data Warehousing

Data warehousing is the process of gathering information from different parts of a business process in a centralized database. It can be defined as the collection of data that is used by employees in an organisation for easy access and smooth running. Since the early 1990s, data warehousing has become an essential part of any organisation and this resulted from the emergence of Information Technology and the revolution of the information management system. Data warehousing is an integral part of any organisation. It proves to be helpful in providing easy access to collective information to all the employees of an organisation. A data warehouse system is implemented to support decision making in an organisation. It helps in providing information or data when queries need extensive searching on a large scale.

Properties of Data warehouse
The data warehouse is that portion of an overall data environment that serves as the single integrated source of data for processing information. The data warehouse has specific characteristics that include the following:
(a) Subject Oriented: Information is presented according to specific subjects or areas of interest, not simply as computer files. Data is manipulated to provide information about a particular subject.
(b) Integrated: A single source of information for and about understanding multiple areas of interest. The data warehouse provides one stop shopping and contains information about a variety of subjects. Thus OIRAP data warehouse has information on students, faculties and staff, instructional workload, and student outcomes.
(c) Non volatile:  Stable information that does not change each time an operational process is executed. Information is consistent regardless of when the warehouse is accessed.
(d) Time variant: Containing a history of the subject, as well as current information. Historical information is an important component of a data warehouse.

Layers in data warehouse architecture
Data warehouse architecture consists of the following interconnected layers.
(a) Operational Database layer: The source data for data warehouse. An organization’s Enterprise Resource Planning systems fall into this layer.
(b) Data access layer: The interface between the operational and informational access layer- Tools to extract, transform, load data into the warehouse fall into this layer.
(c) Metadata layer:  The data directory. This is usually more detailed than an operational system data. These are dictionary for the entire warehouse and sometimes dictionaries for the data that can be accessed by a particular reporting and analysis tool.
(d) Informational access layer: The accessed for reporting and analyzing and the tools for reporting and analyzing data. Business intelligence tools fall into this layer.

Data Mart

Data warehousing

Data warehousing

A data structure that is optimized for access. It is designed to facilitate end user analysis of data. It typically supports a single, analytic application used by a distinct set of workers. A data mart is a special form of data ware house, typically containing a topic oriented subset of enterprise data appropriate to a specific business function.

Advantages of Data mart

Data Mart

Data Mart

The advantages of data mart are listed below:
(a) It focuses on the presentation instead of organization of data.
(b) It facilitates data reporting.
(c) It provides meaningful reports to the users of their concern.
(d) It makes the data design simpler.
(e) It helps in incrementally building up the organisational data warehouse.
(f) It ensures security also.

Sources of data
There are different source data for the data warehouse which are kept as referral information for future use especially for decision making at the managerial level. The main sources are:
(a) Archives (Historical data):- Types of data  those were used previously by an organization and are outdated.
(b) Current systems of records (Recent history) : Types of data of recent past.
(c) Operational Transactions (Future Data source): The current day to day data used in transactions which can be a referral for future use.

OLAP stands for On-Line Analytical Processing. It is a method in which multi-dimensional analysis occurs. On-Line Analytical Processing (OLAP) is the technology that enables client applications to efficiently access this data. OLAP provides many benefits to analytical users, for example:
(a) An intuitive multidimensional data model makes it easy to select, navigate, and explore the data.
(b) An analytical query language provides power to explore complex business data relationships.
(c) Pre calculation of frequently queried data enables very fast response time to ad hoc queries.

Properties of Online Transaction Processing
Online Transaction Processing systems typically
(a) Support large numbers of concurrent users who are actively adding and modifying data.
(b) Represent the constantly changing state of an organisation but don’t save its history.
(c) Contain large amount of data including extensive data used to verify transactions.
(d) Have complex structures.
(e) Are tuned to be responsible to transaction activity.
(f) Provide the technology infrastructure to support the day to day operations of an organisation.

Difficulties are encountered when OLTP databases are used
Difficulties often encountered when OLTP (online transaction processing) databases are used for online analysis include the following:
(a) Analysts do not have the technical expertise required to create ad hoc queries against the complex data structure.
(b) Analytical queries that summarize large volumes of data adversely affect the ability of the system to respond to online transactions.
(c) System performance when responding to complex analysis queries can be slow or unpredictable, providing inadequate support to online analytical users.
(d) Constantly changing data interfaces with the consistency of analytical information.

How data warehouse are used to solve the problems in OLTP?
Data warehouses:
(a) Can combine data from heterogeneous data sources into a single homogeneous structure.
(b) Organize data in simplified structures for efficiency of analytical queries rather than for transaction processing.
(c) Contain transformed data that is valid, consistent, consolidated, and formatted for analysis.
(d) Provide stable data that represents business history.
(e) Are updated periodically with additional data rather than frequent transactions.
(f) Simply security requirements.

You may also like to visit: 

Data Warehousing and Data Mining
Important questions and answers on Data Mining
Data Mining concepts is the best source for your learning

Tell us Your Queries, Suggestions and Feedback

Your email address will not be published. Required fields are marked *

« »