The Difference Between a Data Warehouse and a Database

·

·

, ,

Processing Types: OLAP vs OLTP

The most significant difference between databases and data warehouses is how they process data.

Databases use OnLine Transactional Processing (OLTP) to delete, insert, replace, and update large numbers of short online transactions quickly. This type of processing immediately responds to user requests, and so is used to process the day-to-day operations of a business in real-time. For example, if a user wants to reserve a hotel room using an online booking form, the process is executed with OLTP.

Data warehouses use OnLine Analytical Processing (OLAP) to analyze massive volumes of data rapidly. This process gives analysts the power to look at your data from different points of view. For example, even though your database records sales data for every minute of every day, you may just want to know the total amount sold each day. To do this, you need to collect and sum the sales data together for each day. OLAP is specifically designed to do this and using it for data warehousing 1000x faster than if you used OLTP to perform the same calculation.


Optimization

A database is optimized to update (add, modify, or delete) data with maximum speed and efficiency. Response times from databases need to be extremely quick for efficient transaction processing. The most important aspect of a database is that it records the write operation in the system; a company won’t be in business very long if its database didn’t make a record of every purchase!

Data warehouses are optimized to rapidly execute a low number of complex queries on large multi-dimensional datasets.


Data Structure

The data in databases are normalized. The goal of normalization is to reduce and even eliminate data redundancy, i.e., storing the same piece of data more than once. This reduction of duplicate data leads to increased consistency and, thus, more accurate data as the database stores it in only one place.


Normalizing data splits it into many different tables. Each table represents a separate entity of the data. For example, a database recording BOOK SALES may have three tables to denote BOOK information, the SUBJECT covered in the book, and the PUBLISHER.


Normalization of book sales data

Normalizing data ensures the database takes up minimal disk space and so it is memory efficient. However, it is not query efficient. Querying a normalized database can be slow and cumbersome. Since businesses want to perform complex queries on the data in their data warehouse, that data is often denormalized and contains repeated data for easier access.


Denormalization of book sales data

Data Analysis

Databases usually just process transactions, but it is also possible to perform data analysis with them. However, in-depth exploration is challenging for both the user and computer due to the normalized data structure and the large number of table joins you need to perform. It requires a skilled developer or analyst to create and execute complex queries on a DataBase Management System (DBSM), which takes up a lot of time and computing resources. Moreover, the analysis does not go deep – the best you can get is a one-time static report as databases just give a snapshot of data at a specific time.


Data warehouses are designed to perform complex analytical queries on large multi-dimensional datasets in a straightforward manner. There is no need to learn advanced theory or how to use sophisticated DBMS software. Not only is the analysis simpler to perform, but the results are much more useful; you can dive deep and see how your data changes over time, rather than the snapshot that databases provide.


Data Timeline

Databases process the day-to-day transactions for one aspect of the business. Therefore, they typically contain current, rather than historical data about one business process.

Data warehouses are used for analytical purposes and business reporting. Data warehouses typically store historical data by integrating copies of transaction data from disparate sources. Data warehouses can also use real-time data feeds for reports that use the most current, integrated information.


Concurrent Users

Databases support thousands of concurrent users because they are updated in real-time to reflect the business’s transactions. Thus, many users need to interact with the database simultaneously without affecting its performance.


However, only one user can modify a piece of data at a time – it would be disastrous if two users overwrote the same information in different ways at the same time!


In contrast, data warehouses support a limited number of concurrent users. A data warehouse is separated from front-end applications, and using it involves writing and executing complex queries. These queries are computationally expensive, and so only a small number of people can use the system simultaneously.


Database vs. Data Warehouse Comparison

Property Database Data Warehouse
Processing Method OnLine Transaction Processing (OLTP) OnLine Analytical Processing (OLAP)
Optimization Deletes, inserts, replaces and updates large numbers of short online transactions quickly. Rapidly analyze massive volumes of data and provide different viewpoints for analysts.
Data structure Highly normalized data structure with many different tables containing no redundant data.

Thus, data is more accurate but slow to retrieve.
Denormalized data structure with few tables containing repeat data.

Thus, data is potentially less accurate but fast to retrieve.
Data timeline Current, real-time data for one part of the business Historical data for all parts of the business
Data analysis Analysis is slow and painful due to the large number of table joins needed and the small time frame of data available. Analysis is fast and easy due to the small number of table joins needed and the extensive time frame of data available.
Concurrent users Thousands of concurrent users supported.

However, only one user can modify each piece of data at a time.
Small number of concurrent users.
ACID compliance Records data in an ACID-compliant manner to ensure the highest levels of integrity. Not always ACID-compliant though some companies do offer it.
Uptime 99.99% uptime Downtime is built-in to accommodate periodic uploads of new data
Storage Limited to a single data source from a particular business function All data sources from all business functions
Query type Simple transactional queries Complex queries for in-depth analysis
Data summary Highly granular and precise As granular and precise as you want it to be




Leave a Reply