Database Indexing Techniques

Indexes are database objects associated with database tables and created to speed up access to data within the tables. Indexes are part of constraint mechanism. When we insert data in table and if a column has primary key constraint, then database checks if row with the same primary key value exists. If it doesn’t then the full table is scanned, An index gives immediate access to key values , so that check for existence can be made virtually instantaneously.

A unique constraint also requires an index. Column of the unique constraint can be left Null unlike primary key constraint. In database schema tables have parent child relationship, with primary key on parent table and foreign key on child table. If indexes are applied on foreign key then retrieval of data becomes faster, deletion of data also speeds up. This can be helpful if a table has billions of records which rather would take hours to retrieve data.

Index is sorted list of key values, structured in a manner that makes search very efficient where each key value is a pointer to the row in the table. Index can also be used to select statement that includes order by, group by or union keyword since data is to be fetched in a sorted order. If index is applied to a column then data retrieval becomes faster automatically. Index can improve performance when you join tables. Depending on the size of tables and memory resources available, it may be quicker to scan tables into memory and join them there, rather than use indexes. Nested loop join first scans through one table using an index on other table to locate matching rows, this is usually a disk intensive operation. Hash join technique reads entire table in memory converts it into hash table and uses hashing algorithm to locate matching rows. This technique is more CPU and Memory intensive. A sort-merge join technique sorts tables on join column and then merges them together. This is often a compromise among disk, CPU and Memory.

An index improves performance for data retrieval but it reduces performance for DML operations. Since a row is inserted into table, a new key must be inserted into every index on table which increases strain on databases. Hence for transaction processing system where more records are inserted, fewer indexes are preferred. Whereas in query-intensive systems such as data warehousing you can create as many indexes as per your requirement.

There are many indexing techniques but most popular is B*Tree index and bit map index. B*tree is default index in most relational database systems.

B*Tree Index :
The top most level of the index is called the root and the lowest level is called the leaf node. All other levels in between are called branches. Both the root and branch contain entries that point to the next level in the index. A leaf node consists of the index key and pointers pointing to the physical location (i.e., row ids) in which the corresponding records are stored. This index is helpful if query has ‘=’ keyword since a particular value is searched in tree structure at leaf node. But it decreases performance if keywords such as ‘>’, ‘<’ ,’>=’ , ‘<=’ exist in query because every node is searched considering these range values. Also keyword NULL leads to full table scan in B*tree indexes.

E.g. Select * from products where product name is NULL.
This query leads to full table scan. Thus B*Tree is useful where table has high cardinality (distinct columns values) .

Bit- Map Index :
In bit map index, Indexes are represented in binary format. i.e in  0′s and 1′s format. This index is helpful where table has low cardinality. Bitmap index stores the column values in bits. Each bit represents a single value. For example, the gender column has two possible values: Male and Female. Two bits will be used in the bitmap to capture the index on the gender column. So the more distinct the value is, the more space is required to store the bitmap. Internally, the database engine, like Oracle, uses a map function to convert the bit location to the distinct value. Thus bitmap index works efficiently in case where table has low cardinality.

Hence indexes can be applied taking into consideration following points:

1) Table is used for intensive query processing (e.g data warehouse,  OLAP systems )  or transaction processing system ( OLTP systems ).

2) More DML operations are performed or only data retrieval is performed.

3) Table column/columns on which index is to be applied has low cardinality (distinct column values) or high cardinality. Depending on cardinality which index is to be applied can be decided. (e.g Bit map index or B* tree index) .

This entry was posted in Database and tagged , , , . Bookmark the permalink.

Leave a Reply