Email Subscription Form

Saturday, November 9, 2019

SQL Query Secrets

Have you ever been querying a SQL table, and one of your queries seems to take forever?  And then the next query you run takes milliseconds?  This would frequently happen to me, and I thought it meant that the server that hosted the database was unreliable in some way.  But this week I learned about indexes, and that the way we structure our queries has a huge impact on how long they will take to execute!  In this post, I'll describe what indexes are and talk about the ways we can use them to optimize our queries.

An index is a database structure that is designed to speed up queries in a table.  An easy way to understand this is to think about the index at the back of a book.  Let's say you have a book on car repair, and you want to find information about your car's brakes.  You could look up "brakes" in the index, or you could search through every single page of the book for the word "brakes".  It's pretty obvious which would take less time!

Unlike books, databases can have more than one index.  There are two different kinds of indexes: clustered and unclustered.  A clustered index is used to store a table in sorted order.  There can only be one clustered index, because the table is stored in only one order.  Unclustered indexes are stored in the original table order, but they save the location of certain fields in the table.

Let's take a look at an example.  If we had a table like this, called the Users table:

UserIdStateLastNameFirstNameEmailMobile Phone

and we had a clustered index defined to have UserId as the key, a search on UserId would be very fast, and the data returned would be in order by UserId.

The table could also use unclustered indexes, such as the following:

State- the records in the table are indexed by state
LastNameFirstName- the records in the table are indexed by LastName and FirstName

When you query a database, the query will first look to see if an index can be used to speed up the search.  For example, if I made the request 
select LastName, FirstName from Users where UserId = 5 
the query would use the UserId index and the LastNameFirstName index to find the record.

Similarly, if I made the request
select LastName, FirstName from Users where State = 'MA'
the query would use the LastNameFirstName index and the State index to find the record.

Of course, with a table of only five records, optimizing in this way won't make much of a difference.  But imagine that this table had five million records, and you can see how using an index would be very helpful.

Querying a table on a non-indexed field is called a table scan.  The query needs to search through the entire table for the values, just as a person who wasn't using a book index would have to search through every single page of the book.  

How can you know what indexes a table has?  You can find out with one simple query:
EXEC sp_helpindex "Users" 
where you would replace "Users" with whatever the name of the table is.  This will return a result of all of the clustered and unclustered indexes applied to the table, and the result will include the name of the index, a description of the index, and all the keys used in the index.

If you want to optimize your SQL queries, only ask for the data that you really need, rather than asking for select *.  Because not every field in the table is indexed, looking for every field will take longer.  

Let's say that you want to query the Users table to find the email addresses of all of the users who live in Massachusetts (MA).  But you also would like to have some more information about those users.  You could ask for 
select FirstName, LastName, Email from Users where State = 'MA'.
To find the records, the query will use the FirstNameLastName index and the State index.  Only the Email will be a non-indexed field.

But if you asked for
select * from Users where State = 'MA'
now the query needs to look for two different non-indexed fields: Email and Mobile Phone.

Another helpful tip is to specify all the keys in an index when you want to use that index to make a query.  For example, if you wanted to find the Email for Prunella Prunewhip, you should ask for 
select Email from Users where LastName = 'Prunewhip' and FirstName = 'Prunella'
rather than asking for
select Email from Users where LastName = 'Prunewhip'.
In the second example, the LastNameFirstName index won't be used.

And when you want to use an index, the query will run faster if you specify the keys in the order they appear, so it's better to say
where LastName = 'Prunewhip' and FirstName = 'Prunella'
than it is to say
where FirstName = 'Prunella' and LastName = 'Prunewhip'

Here's one more tip: when you want to use an index, be sure not to manipulate one of the index keys in your query, because this will mean that the index won't be used.  For example, if you had a table like this, called Grades:


and you had an unclustered index called LastNameGrade, and you executed a query like:
select LastName from Grades where (Grade + 100) = 178
the LastNameGrade index wouldn't be used, because the Grade value was being manipulated.  It's necessary for the query to go through the entire table and add 100 to each Grade field in order to search for the correct value.

Armed with this knowledge, you should be able to create queries that will run as fast as possible, getting you the data you need.  I'd like to extend my thanks to my colleague Cindy Gall, whose informative workshop inspired this post!

No comments:

Post a Comment

Code Like a Developer...

I'll be honest: I don't love coding.  Don't get me wrong, I love test automation!  I love the feeling of solving a technical cha...