Editorials

Full Text Search Engine

I was reviewing the Full Text Search Engine implementation for SQL Server today, and came across a few things I didn’t know, and wanted to pass on. Before I get to the things new to me, let me start with a high level explanation of what the Full Text Search Engine is and does.

The Full Text Search capabilities have been available in Windows for a long time. SQL Server began using the Full Text Search Engine allowing it to link both SQL Table Data and Textual data to be queried in the same process. So, if you indexed a table containing sentences or XML, the engine could index individual words in text columns of a table. It could also index embedded blob data such as word documents stored in a table, or pointed to by a file stream.

Ultimately, this extends the query capabilities of SQL Server to search for word patterns across multiple columns or documents in an efficient manner. Comparing it to using LIKE on table data you will find it can be much faster. This is especially true if you have many rows of data, or large text fragments contained in your tables.

Full text indexing works by parsing the text in a document or table, finding significant words, and creating a map to the location of those words. This is broken up into different processes to reduce the amount of contention that can occur when data is being modified and queried at the same time. To optimize all of the index manipulation, SQL Server integrated the Full Text Search Engine to run inside the SQL Server process, giving SQL Server more control over the Full Text operations, balancing the loads to optimize query performance.

I did not know that the Full Text Indexes are maintained in fragments, allowing updates to be performed without having to rebuild the whole index. They are Grouped within a database, and work as one logical group. What this means is that as your data is modified, and full text indexes are updated, they become fragmented, and requiring merging of old and new data to get the current state of things.

As a result, it is important to use the REORGANIZE option of the ALTER FULLTEXT CATALOG command which will merge the different fragments into a single current index.

I also found that you can maintain your full text indexes on a schedule, automatically as data is added, or on demand. Using a schedule can increase performance allowing you to push the updates into hours with less activity. Of course this means your data continues to get stale until the next scheduled update occurs.

The Full Text Search capabilities are really cool. If you’d like to read more about them take a look at the MSDN documentation at https://msdn.microsoft.com/en-us/library/ms142571.aspx.

Cheers,

Ben