Components of a successful Data Warehouse
*
In this document, the term Data Warehouse also pertains to Data Marts and
Operational Data Stores (ODS).
Without it, there is no point in building a data
warehouse. There must be a reason or
reasons the business community needs to resolve through the creation of a data
warehouse.
Since they ultimately are the
customers, they need to be involved with the plan, the design and the
implementation.
The design team must market the concept to their
potential users, who could be almost everyone at the company. It is important that the user community
understands the purpose of the data warehouse, and what it will and won't do.
There needs to be a business model that explains how
the company will implement a data warehouse.
This model needs to be understood and accepted by key business and
information system stakeholders.
The data warehouse users will include many types,
such as casual users and power users. The casual users, who simply want to
review the same data on a regular basis, now represents about 80% of a
company’s decision support populace.
Those power users, who once ruled the decision support roost and pushed
for self-service capabilities, now represent a mere 20% of users.1
A business manager ultimately owns the data that
enters into the warehouse. There are three
new roles that are explained:
Data Steward: The Data Steward is someone from the business community
who has responsibility for the data in the data warehouse. They often need to generate political
support for cleansing of data.
Business Data
Contact: The Business Data Contact is a person or group to
contact if users have business questions about a data object. They are responsible for the business
definition, allowable values and the procedures related to the data. They handle the day-to-day issues and communication for
the Data Steward. They may also provide business requirements to the DW
project team.
Technical Data
Contact: The Technical Data Contact is the person or group to
contact if users have technical questions about a data object. The technical data contact is responsible
for the technical and operational support of the data.
The business user already has to go to many
different places to gather and analyze data.
The data warehouse should not be yet just “another” place to go. It must be THE place to go for decision
support information.
A new department needs to be created to support the
data warehouse. The most important part
(where most companies fail) is to create a team made up of both I/S
professionals and Business Analysts.
Most importantly, a support area must be established
to handle data questions and enhancement requests. The database requires an
ongoing support team for production problems, maintenance and enhancements to
the extract programs and data structures. Tool questions should be handled
through the normal process.
The
recommendation is to create 3 high level teams. These are:
·
Core
Team
·
Development
Team
·
Extended
Team
Each
team has multiple teams and/or departments within them.
There are three important methodologies that need to
be created and implemented:
Data
Management Methodology
This methodology describes the
steps to follow to add/modify data structures within the data warehouse.
Data Delivery Methodology
This methodology describes the
steps to follow to add/modify data delivery mechanisms within the data
warehouse.
Technology
and Training Methodology
This methodology describes the
steps to get an end-user set up with access to the database, and getting them
the appropriate data and tool training to effectively use the database.
This includes the data extract, transform and load (ETL)
tools, the database, and the query and reporting tools. The database can be an ODS, a data warehouse
and data marts or some combination of these.
Appropriate means it will work for the business users. The technology may be already in place.
Before tables are built in a production data
warehouse, test versions are created in a tool such as Microsoft Access. This allows the user to be involved with the
design and population of the table.
Design iterations can be accomplished quickly and easily before
involving other technical staff such as DBAs.
The concept behind Release
Management is quite simple. As requests
come in from end-users to fix, enhance or newly create data structures, they
are prioritized and logically grouped into “Releases”. Each release is small enough to be completed
quickly. Before items are included in a
release, there must be an operational source of data, and there must be
resources to work on it. Each release
will follow the Data Management Methodology explained in this document.
Cleansing and transformation plans need to be
developed. Standard error and exception
reporting needs to be designed and developed as well.
All table names and column names must be
user-friendly. Standard abbreviations should also be used. All columns end with a suffix that explains
what the column is. For example:
|
Suffix
|
Class Word
|
Description
|
|
AT
|
Amount
|
A
data item which denotes a monetary figure.
|
|
CD
|
Code
|
A
data item which contains a specific range of values.
|
|
DT
|
Date
|
A
data item which contains an actual date (MM/DD/YYYY).
|
|
DS
|
Description
|
A
data item which contains descriptive text.
|
|
FL
|
Flag
|
A
data item which contains 2 values such as Y/N, 0/1 or T/F.
|
|
NM
|
Name
|
A
data item which contains the actual name or title of a person, place or
thing.
|
|
NR
|
Number
|
A
data item used to denote numeric values.
|
|
PT
|
Percentage
|
A
data item which denotes a quantity as a percentage.
|
|
QY
|
Quantity
|
A
data item which represents the numeric count of something.
|
|
TS
|
Timestamp
|
A
data item that contains a system timestamp.
A timestamp includes both the date and time of an occurrence.
|
|
TX
|
Text
|
A
data item that contains free form text.
|
Access to the data warehouse must not be granted to anybody
until they have graduated from Data Training school. The business staff from the data warehouse support team provides
the training. 2
This is where the business and technical definitions
(meta data) are stored. It is a
database that is centrally located and available to everybody in the company.
A graphical application that is a front end browser
to the Meta Data Repository. It is
designed to be very easy to use, and provides table and column definitions,
along with data steward information to the business user.
Users need a place to enter their data enhancement
requests, and the data warehouse support team needs a place to report what has
been completed, what is in process and what is pending. This database application allows the entry
of new requests, and the reporting of new and existing requests.
The business user needs a place to go to get online
help. This could be a word processing
document, or a Lotus Notes database. All
documentation, diagrams, report inventory, trouble logs, etc need to be stored
here.
Since the data warehouse support team relies on
other areas (DBA’s, Data Analysts, etc) to complete it’s mission, these other
areas must be committed to specific service levels to meet the users needs and
requirements.
The new data warehouse support team is a permanent
fixture that will support the business users in data questions and
enhancements.
The data warehouse must be backed up on a regular
basis, must have disaster recovery procedures, and must have built in security.
After the data warehouse is up and running in a
production environment, usage may vary dramatically and rapidly. You will need tools to track and analyze
information on usage patterns - by user, time and a host of other factors.
REFERENCES
1.
Eckerson,
Wayne. “The Fallacy of Self-Service Decision Support”, DM Review January 1999.
2.
Rehm,
Clay. “Data Warehouse Dos and Don’ts”, IDWA Navigator, Fall 1998.