Indexing has become
a critical buzzword in the arena of databases. Not only database users, but
also all of us are using indexes in our daily life. Got puzzled? Let me
give you one scenario. Have you ever noticed how did a librarian arranges and
orders all the books in the library? And when you are asking for a particular
book, he immediately gives it to you within some couple of minutes without
searching the whole library. This is possible because he has already indexed
each book and categorized them according to the author and the publisher and
again ordered all the books falling under same author and publisher in an
alphabetical order of the title. So when you ask for a particular book, he just
refers to the shelf belonging to the author and publisher of the book and the
book is with you in no time.
Let’s take another
common example of maintaining your household components. What you generally do
when you are in need of Feviquick glue. I can bet on that you
will search the whole house like meJ. But the point here is that you can avoid
this irritating search if you would have remembered or noted the place where
you had kept the Feviquick glue. Not only this, but for all
that is part of our daily activities, like books, clothes, visiting places,
etc. things that we use regularly or frequently, we remember them i.e. we keep
information about them in our memory (brain). This is what we called in
database language as indexing so that next time when we use them, it becomes
easier for us to access them. One thing you might have noted here is that I am
insisting about frequent and regular things and not about rare or occasional
things like you might not want to remember the places where you don’t think you
would have any business to do or the strangers in the street to whom you are
not expecting to meet again. But if you want to remember also, nobody will
resist you as human brains have enough capacity to remember everything that
comes throughout its life span. But in database world we have space
constraints as well as processing constraints, so we must
take wise decisions while indexing. From the above discussion, we concluded
that an index is a structure or classification used to find
information easier.
Let’s have a look on how
SQL Server is maintaining indexes in the databases.
Definition: Indexes
are relational data structure created for a table or a view in a database in
order to make data easier to search and retrieve when the query is processed.
The disk space required to store the index structure is usually less than that
required by the table. And the main reason behind are the key values on which
the indexing are usually done and not on all the columns.
Let’s have an example to
elaborate it in more detail: Consider a table nameddbo.EmployeeDim. The details
about dbo.EmployeeDim are given below:
Database: OLAP
Table Name:
dbo.EmployeeDim
Data:
emp_name
|
emp_id
|
emp_DOB
|
Department
|
Arun
|
101
|
1984-03-16 00:00:00.000
|
sales
|
Ahmed
|
201
|
1985-05-24 00:00:00.000
|
finance
|
Bobby
|
104
|
1985-04-09 00:00:00.000
|
HR
|
John
|
156
|
1982-07-12 00:00:00.000
|
marketing
|
Vasim
|
132
|
1982-02-12 00:00:00.000
|
production
|
Steve
|
229
|
1984-12-15 00:00:00.000
|
admin
|
Arif
|
203
|
1983-02-19 00:00:00.000
|
production
|
Script:
USE [OLAP]
GO
/****** Object:
Table [dbo].[EmployeeDim] ******/
CREATE TABLE
[dbo].[EmployeeDim](
[emp_name] [varchar](50) NULL,
[emp_id] [int] NULL,
[emp_DOB] [datetime] NULL,
[Department] [varchar](50) NULL
) ON [PRIMARY]
Table Structure:
Column_name
|
Type
|
Length
|
emp_name
|
varchar
|
50
|
emp_id
|
int
|
4
|
emp_DOB
|
datetime
|
8
|
Department
|
varchar
|
50
|
Here in this example, we
will use information provided by the employee table. You might have noted here
that none of the columns are sorted i.e. the information about the employees
are not in an ordered fashion. Although this is a small example with limited
number of rows, but guess a table like this having tones of records and u have
been told to find the DOB (date of birth) and department for the employee id
‘229’ how u will get it then and imagine how much time it will take to scan
each row to get the information about the employee id ‘229’. Before going for a
deep discussion, let’s have some idea about the terms used in indexing.
Table Scan: Whenever
we are firing a query on a database, the SQL server query optimizer is trying
to fetch the best possible way to run the query for better performance. In this
process, when the Query Optimizer determines that there are no useful indexes
available to produce the output result set, then it will go for table scan. In
table scan, it has to scan the whole table row by row in order to get the
requested rows. This is very slow and is recommended to avoid them. There are
rarely some cases where table scan are faster than index scan i.e. where the
table size is less.
Index Scan:
Auxiliary data structure used to accelerate access to data within the database.
Here, it will not crawl on the whole table rather it will directly fetch those
rows and pages that qualify the requesting query. For this a separate data
structure is maintained to keep information about the key values of the table
(described below).
Note: In some references, you
may find that Table scan is called as Index scan and Index scan is named as
Index seek.
The main property of
index is that it is in sorted form. For example, as in case of dictionary, all
the words are in sorted form, so when you are looking for any word, you are
directly jumping to the corresponding page related to the word. In the same way
the database engine is directly seeking the demanded rows if the corresponding
column is indexed.
In our above example,
suppose we want to find the employee name, his DOB and department information
for employee id ‘132’. The below query will do your job.
SELECT [emp_name]
,[emp_DOB]
,[Department]
FROM [OLAP].[dbo].[EmployeeDim]
WHERE [emp_id] = 156
Let’s have a look on the
background process that has carried on for this simple query. As of now, there
is no index in our table named [EmployeeDim], so the database engine will
search each record, row by row, to get the desired result. Below is shown how
the database engine performs the operation.

The Execution Plan for the above query shows that 100 % of the total cost is spent for Table Scan.
Now let’s create one index on the column named [emp_id] and see how it is working. Follow the below procedures to create an index on a table:
Step1: Open SQL Server
Management Server and navigate to [EmployeeDim] table as below:
Step2: Select the column named [emp_id] and click on “Manage Indexes and Keys” on the toolbar as below:
Step3: On “Indexes/Keys”
dialog box, click on Add button and click on the ellipse button shown beside
the Columns property.

Step4: Then on “Index Columns”
dialog box, select the column ‘emp_id’ in Column Name and click OK button.
After that press Close button and save the table
USE [OLAP]
GO
/****** Object: Index [IX_EmployeeDim] ******/
CREATE NONCLUSTERED
INDEX [IX_EmployeeDim] ON [dbo].[EmployeeDim]
(
[emp_id] ASC
)
• Expand Indexes folder under the table EmployeeDim, you
will find one index was created on the name of IX_EmployeeDim.
• Or else you can run the below query to check all the
available indexes under a table.
Now let’s execute the
same query to find out the information about the employee having employee id
132. This time we may see that SQL Server engine is performing its operation in
a different way and is shown as below:
A separate data
structure is created for the column emp_id in ascending order and has the
reference to the original row with the SL_No column (SL_No is taken to describe
the example pictorially; you can assume it as the row_id maintained internally
for each row by the database engine, it is usually not visible to the user).
Here the SQL server
database engine will directly jump to the row 132 in the index structure by
using search algorithm, usually it performs B-tree search for faster extraction
of data. As soon as it got the row, it will seek the corresponding record in
the main table by the reference id (row_id) and will display the rest of the
information from the table about the employee having emp_id ‘132’.
As in this example, we
have taken a small table, so performance difference is negligible, but in case
of big tables with records more than lakhs, indexes plays a vital role in
fetching data promptly.
Types of indexes: We have basically two types of indexes in SQL
Server.
Clustered index: It’s a
type of index where the data itself is arranged at the indexed column i.e. we
have the actual data at the leaf nodes. For example, in a science definition
pocket book, every word is arranged alphabetically and the definition of the
word is given along with it. Here there is no need to look up for the
referencing page, as in case of other example like in a general book with
indexes at the end of the book, contains page number along with the term or
word which is then followed to get the information about the term.
In our above example, if
we create cluster index on the column emp_id, the table itself will be
rearranged according to the sorting order of emp_id.
Syntax:
USE [OLAP]
GO
/****** Object: Index [IX_EmployeeDim] ******/
CREATE CLUSTERED INDEX
[IX_EmployeeDim] ON [dbo].[EmployeeDim]
(
[emp_id] ASC
)
After executing the
query, perform a select statement on the table [EmployeeDim], you will notice
that all the rows are ordered by [emp_id].
SELECT * FROM
dbo.EmployeeDim
Advantage:
·
Seeking operations are
much faster than normal indexing.
Disadvantage:
·
It will rearrange the
whole data at every operation of UPDATE, INSERT or DELETE on key column /
cluster index column, in order to keep ordering of the data in that column.
Non-Clustered index: Here leaf node contains index pages or
reference to the actual data record instead of data pages (actual data). Refer
the first example for details about the syntax. It will not affect the default
order of the table rather a separate data structure is maintained for each
index.
Advantage:
·
Can have more than one
non-clustered index on a table.
·
Reordering of the table
data is not necessary
Disadvantage:
·
Need separate data
structure. As a result consume more space
·
Over indexing i.e. more
number of indexes may cause performance degradation.
Cluster Indexing
|
Non-Cluster Indexing
|
• Actual data at the leaf nodes.
|
• Reference to the data on the leaf nodes.
|
• Data for the index column is sorted
physically.
|
• Data in the index structure is sorted.
|
• Updating data in the cluster index column
affects the whole table.
|
• Updating data does not affect the whole
table, only the index structure has to be rebuild for the index column.
|
• Only 1 Cluster Index is possible for one
table.
|
• Up to 249 Non Clustered Indexes are
possible for one table in SQL Server 2005 and 999 in SQL Server 2008.
|
• Clustered index are unique in nature for
a given table.
|
• This may not be unique.
|
• Can be efficient on fixed or rarely
changing tables.
|
• Is good for table which is modified
frequently.
|
• Index order matches the physical storage
order of the data.
|
• Index order does not match with the
physical order of the data.
|
• Does not required additional disk space
to store data in the index as it is physically reordering the main table.
|
• Separate data structure is maintained for
each index, resulting additional space consumption.
|
Best practices for Indexing:
·
Create index on columns
that are frequently used for data retrieval or in filter conditions. For
example in SELECT statement and WHERE clause or in ORDER BY,GROUP BY, and
DISTINCT clauses. As in our first example, if most of the queries are referencing
emp_name column instead of emp_id column, then it is wise to create index on
emp_name instead of emp_id i.e. create indexes based on use.
·
Every table should have
a clustered index otherwise it will act as heap table only even if it has
non-clustered indexes, resulting performance degradation.
·
Create indexes on
selective columns only that are frequently used.
·
Try to keep the keys of
clustered index as small as possible in size. (i.e. int, smallint)
·
Optimize cluster index
for ranged data i.e. data that are retrieved using BETWEEN, >, >=, <,
and <=.
·
Creative covering
indexes are a wise decision for most frequently used queries.
·
In case of composite
indexes, order the columns according to the restrictiveness i.e. smaller
quantity first as described in the composite index example above.
·
Create index on all
foreign keys which may be helpful in joining operations.
·
Use multiple narrow
indexes (i.e. on smaller number of columns, may be single column index) rather
than a few wide indexes (composite indexes).
·
Avoid using indexes that
are rarely used or else drop them.
·
Identify poorly
performing queries with the help of SQL Profiler, Query Execution plan, and try
to optimize it.
·
Last but not the least;
remove indexes while performing BULK insert or heavy insert operations
especially in case of Clustered Index as it will perform additional operations
for rearranging the indexes for each insertion of the record. Instead, drop
indexes before BULK insert and recreate it after insertion operation finishes.

No comments:
Post a Comment