20.04.16 Designing Indexes and Keys The structure of indexes and keys in relational databases may affect significantly the physical organization of data and thus the performance during data manipulation. The structure of indexes and keys in relational databases may affect significantly the physical organization of data and thus the performance during data manipulation. Sometimes, their selection is either random or automatically generated by ORM or other frameworks or tools, which may seriously threaten the performance and is most often detected in late phases of the project, when changing data scheme is more expensive. In order to avoid such consequences, we must pay attention to the selection and design of keys and indexes. Relational model defines the concept of keys, but also the concept of uniqueness constraint which are often mixed up during implementation. The main difference lies in the fact that the key attribute cannot be NULL, i.e. must have a determined value. From the relational model standpoint, the order of attributes in key (or uniqueness) constraint and the mutual order of tuples are irrelevant. As a relational database management system, SQL Server offers objects that provide the implementation of the relational model, but does not strictly insist on all limitations of the relational model. On the other hand, the details of physical implementation of such objects enable us to optimize data access in accordance with business needs. One of these objects is index, which allows the server to apply the key and uniqueness constraints defined in the relational model. SQL Server offers various types of indexes, but the most commonly used index is the one in the form of a B+ tree. If all data are stored at the leaf level of the index, then such an index is called a clustered index, and then the order of the index directly affects the physical organization of data. Otherwise, when the leaf level of the index contains references to respective table rows, such an index is called a non-clustered index. Out of all potential indexes, including those that need to be present due to key and uniqueness constraints, we may select any of them to be clustered. During such selection, we should consider that the columns of the clustered index should be unique, static (immutable), narrow, fixed in size, must have a value (not null) and ever-increasing values. If the columns do not meet these criteria, each of them will take up more space in all indexes within the given table, but also cause other effects that adversely affect performance such as fragmentation and physical row relocation in clustered index. When selecting an index (clustered or non-clustered), we should also pay attention to the usability of such index. In order for SQL Server to use the index to access data more efficiently, the arrangement of columns of such index must match the selection condition. In other words, if the index is arranged first by column1 then by column2, then the condition stating only column2 for selection is not compatible and such index will not be used. Therefore, for indexes on multiple columns, column order can make the difference between usable and unusable index. Given that the order of attributes is completely irrelevant to key constraints, this is an opportunity to perform optimization without changing the relational model or application layer. For index prefix columns (the ones that are positioned first in an index) we should primarily choose the columns that are often found in joint conditions and often used in filters, especially the ones that are used in the most expensive queries. In addition, we should consider selectivity. The most selective columns are a good candidate for index prefixes. Eventually, one should consider the usual filter operator by given column. Those columns that are always compared by equal sign are better candidates for prefixes in relation to the columns that are usually filtered by range (between, <, >, etc.). It is important to remember that keys and uniqueness constraints are an integral part of the model and that the order of attributes in them is irrelevant. On the other hand, indexes are a physical data structure that allows the enforcement of constraints and more efficient access to data, and the order of columns within them may be of great significance. However, sometimes even the column order cannot achieve a satisfactory result if a wrong index type is used. Therefore, already in the design phase, consider your keys and available index types, find out what they are intended for and how to use them in the most effective manner. Find out more information about physical data organization in SQL Server from Kimberly L. Tripp (@KimberlyLTripp), Kendra Little (@Kendra_Little), but also other prominent members of the global SQL Server community such as Paul Randal, Brent Ozar, Jonathan Kehayias, Jeremiah Peschka, Adam Machanic and many others. For the more ambitious, there are also preparatory materials for (retired) MCM certificates freely available at https://technet.microsoft.com/en-us/dn912438. Autor: Duško Mirković Duško is software architect in Schneider Electric DMS NS. If he had another job, software development would be his hobby. For the last 7 years he spent most of the time developing databases and data access layer for SQL Server, historical databases, data warehouses as well as database development process. Broad foundation in object oriented programming helps him to more efficiently bridge the gap between object and relational model. He believes that one can only be strong as much as the people around him, which gives him motivation to share. When he shuts down the computer he likes to spend time with his family and friends or read epic fantasy and science fiction.