Row-Oriented vs Column-Oriented Databases

Justgiveacar
4 min readSep 3, 2021

There are two types of data storage databases one is a row-oriented database and another one is a column-oriented database.

Row-Oriented (ACID Transactions): They are databases that organize data by the record, keeping all of the data associated with a record next to each other in memory. Row-oriented databases are the traditional way of organizing data and still provide some key benefits for storing data quickly. They are optimized for reading and writing rows efficiently.

Common row-oriented databases:

  • PostgreSQL
  • MySQL

Column-Oriented (Analytics): They are databases that organize data by field, keeping all of the data associated with a field next to each other in memory. Columnar databases have grown in popularity and provide performance advantages to querying data. They are optimized for reading and computing on columns efficiently.

Common column-oriented databases:

  • AWS RedShift
  • Google BigQuery
  • HBase

Row-Oriented Databases

Traditional Database Management Systems (DBMS) were created to store data. They are optimized to read and write a single row of data which leads to a series of design choices including having a row store architecture.

In a row store or row-oriented database, the data is stored row by row, such that the first column of a row will be next to the last column of the previous row.

For instance, let’s take this example data:

This data would be stored on a disk in a row-oriented database in order row by row like this:

This allows the database to write a row quickly because all that needs to be done to write to it is to tack on another row to the end of the data.

Row-oriented databases are still commonly used for Online Transactional Processing (OLTP) style applications since they can manage writes to the database well. However, another use case for databases is to analyze the data within them. This Online Analytical Processing (OLAP) use cases need a database that can support ad hoc querying of the data. This is where row-oriented databases are slower than column-oriented databases.

Reading from Row-Oriented Databases

Row-oriented databases are fast at retrieving a row or a set of rows but when performing an aggregation it brings extra data (columns) into memory which is slower than only selecting the columns that you are performing the aggregation on. In addition, the number of disks the row-oriented database might need to access is usually larger.

Number of Disks accessed

Let’s assume a Disk can only hold enough bytes of data for three columns to be stored on each disk. In a row-oriented database, the table above would be stored as:

So we can see that while adding data to a row-oriented database is quick and easy, getting data out of it can require extra memory to be used and multiple disks to be accessed.

Column-Oriented Databases

Data Warehouses were created in order to support analyzing data. These types of databases are read optimized.

In a column-oriented database, the data is stored such that each row of a column will be next to other rows from that same column.

Let’s look at the same data set again and see how it would be stored in a column-oriented database.

A table is stored one column at a time in order row by row:

Writing to a Column-Oriented Databases

If we want to add a new record, we have to navigate around the data to plug each column into where it should be.

If the data was stored on a single disk it would have the same extra memory problem as a row-oriented database, since it would need to bring everything into memory. However, column-oriented databases will have significant benefits when stored on separate disks.

If we placed the table above into the similarly restricted three columns of data disk they would be stored like this:

Reading from a Column-Oriented Database

To get the sum of the name the computer only needs to go to one disk where stores the name and sum all the values inside of it. No extra memory needs to be pulled in, and it accesses a minimal number of disks.

Summary

We can build better services by weighing the tradeoffs between response latency and data volume and understanding the difference between column-oriented and row-oriented databases. When clients rely on services for data and analytics to their company, choosing the right database can save a lot of headaches.

--

--