Database indexes what? why? and how?

Buddhima Udaranga
4 min readJun 2, 2019

--

You may have heard of these term when working with databases. For me this was a new term and I didn’t have an idea what are they and how they are going to work. Well let’s find out.

Indexes

Almost all the database types are currently supporting indexing. And creating an index is also not a hard task in a database. Query is so simple. But my question was how are they working.

What is an index ?

The most simple answer I got from the internet is that index is a data structure. Then another question pops up. What kind of data structure? That question does not have a simple answer. But in most cases index is a binary tree data structure. Let’s go through a simple example

example database

Consider the below query

SELECT * FROM cricketers WHERE first_name = ‘Steve’;

To extract those two entries with first name is Steve the database will have to go through all 1001 entries of the database. That’s obviously looks inefficient. What if we have another data structure that has sorted well organized First Name column and each entry pointed to the row of the table. That would have been so efficient. Something Like below

Okay now let’s talk math. The time complexity for a lookup in Unsorted array is O(n). The time complexity of lookup in binary tree is O(log(n)) which makes the query extremely efficient. Same applies to insertion and deletion. So it is now obvious that how indexes work. But when we go deep into indexing it is not that simple. Although Binary tree is the most commonly used data structure for the indexing there can be scenarios that using R-tree structures or hash table indexes. That depends on your requirement.

In most databases following syntax will work for creating an index

create index firstName_index ON cricketers (first_name);

The issue I have been through when I was working with indexes is that how I am going to know whether my index is used when executing a query. First thing that we should know is that most of the time it is database decision when to and when not to use the index. We can’t force it to use our index. But still there are ways to do that.

Consider the index we created when we execute a query with related to column name first_name. The database will find whether there are indexes that are created for that column.

If there is then it will decide is it worth to use this index for executing this query. For example there can be cases that not using the index is more efficient. For an example having two or three data entries can lead to not to use index. But again that highly depends on the database that we use.

For database types like H2, MySQL, Postgres, we can use the keyword explain in front of the query that we are executing. That will provide the execution plan.

explain SELECT * FROM cricketers WHERE first_name = ‘Steve’;

But for the db types Oracle, DB2, MSSQL this will not work. For that kind of databases you can use a database client like Datagrep or DBeaver. DBeaver does not support DB2 explain so I recommend using Datagrep.

That’s all about indexes. If you want to try out some indexes please feel free to use indexes used in WSO2 identity server. navigate to

{IS_HOME}/dbscripts

To get db scripts.

Although there are numerous advantageous of indexes there are also few disadvantages of indexes too

  1. Depending on the size of the table an index consumes a significant amount of space.
  2. When data get updated the database also need to update indexes which might cost in performance.

So index is a something that we should use but for where it is necessary only. There are few more terms that comes with the database scope. Such as tablespaces, bufferpools. Let’s meet with them in another article.

References

[1] https://dzone.com/articles/database-btree-indexing-in-sqlite

[2] https://www.programmerinterview.com/database-sql/what-is-an-index/

--

--

No responses yet