Archive

Archive for October, 2011

SQL Server: Simple Parameterization

31/10/2011 Leave a comment

Simple parameterization is SQL Server feature which allow the optimizer to parameterize submitted queries. If submitted query has no parameters and has constant values plugged in, the optimizer can choose to treat constant values as parameters and automatically create parameterized query and execution plan for it. Simple parameterization is on by default and it’s designed to reduce cost of generating execution plans. SQL Server uses a simple parameterization for a relatively small set of queries where query plan doesn’t depend on particular constant values.
To see how simple parameterization works I will use AdventureWorks database. Let’s run some simple query and see what is written in SQL Server’s plan cache.

DBCC FREEPROCCACHE
GO
SELECT *
FROM Production.TransactionHistory
WHERE TransactionID = 213429
GO
SELECT cp.objtype,
cp.usecounts,
cp.size_in_bytes,
cp.plan_handle,
st.text,
qp.query_plan
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text (cp.plan_handle) st
CROSS APPLY sys.dm_exec_query_plan (cp.plan_handle) qp
WHERE st.text LIKE '%TransactionHistory%'
AND st.text NOT LIKE '%sys.dm_exec_cached_plans%'
GO

Here are the results from the plan cache query:

We can see that SQL Server created two entries in the cache. The first entry is submitted query with constant value and it’s cached as objtype adhoc. It is a shell query and it doesn’t contain execution plan. A shell query contains only the exact adhoc batch text and pointer to the parameterized (prepared) plan. The second entry is parameterized query and has objtype of prepared. Prepared query contains compiled execution plan and query text with a parameter in place of a constant value. We can also see that shell query use 16KB of cache while prepared plan use 32KB. If we open a query_plan of the shell query we can see that shell query contains only pointer to the prepared plan. From (1) we can see which parameterized plan is used with parameterized query text and in (2) we can see which statement is submitted.

Let’s run same type of the query but with different constant value (213430). If we observe SELECT operator in the actual execution plan of the query we can tell that query is parameterized.

(1) In “Parameter List” we can see “Parameter Compiled Value” and “Parameter Runtime Value”. (2) Shows us parameterized text and (3) gives us hash value of the query plan. We can use QueryPlanHash value to return query plan and execution count of the parameterized query.

SELECT st.text, qs.execution_count, qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text (qs.plan_handle) st
CROSS APPLY sys.dm_exec_query_plan (qs.plan_handle) qp
WHERE qs.query_plan_hash = 0xAFB37EEDFFA28FE3

If we look at results of plan cache query again, we can see that the second shell query is created with the exact text of the second query and it points to a parameterized plan.

Why shell query is cached when it only points to parameterized query? It’s cached only to make it easier to find parameterized query if the exact same query with the same constant is re-executed later.

Now, I will show one interesting behavior of simple parameterization. Let’s run same type of the query but with constant value of 5. Here are the results from the plan cache query:

Two new entries are created. (1) One shell query and one (2) prepared but with parameter data type of tinyint. It’s strange that the optimizer created a new prepared query with tinyint data type parameter when there is a cached plan with int parameter value. It seems that the optimizer, during simple parameterization, use parameter with the lowest possible data type for the specified constant value of number data type. If we execute the query with varchar or nvarchar constant value, data type varchar(8000) and nvarchar(4000) is used as parameter.

As I mentioned at the begging of the post, relatively small set of queries will be simple parameterized. Only simple queries whose plan doesn’t depend on constant value will be considered “safe” for simple parameterization and will be cached. Also, there are a lot of statement types which SQL Server will not even attempt to auto-parameterize.  For complete list of these statement types you can check Apendix A in the article Plan Caching in SQL Server 2008.

Let’s take a look at these simple queries.

SELECT TransactionID, ProductID
FROM Production.TransactionHistory
WHERE ProductID = 725
GO
SELECT *
FROM Production.TransactionHistory
WHERE ProductID = 725
GO

If we execute those 2 queries and check results from plan cache query we can see that first query is parameterized and second one isn’t. Why?

For the first query, Query Optimizer thinks that nonclustered index seek is the best option for the query even when constant value change, because only columns we need are contained in the nonclustered index (clustered index is created on TransactionID column).
The second query is different, we want all columns from the table and a nonclustered index seek and key lookup might be a good choice when only a few rows are returned, but it might be a terrible choice when many rows are returned.  If we look at query_plan from the second query we can see that query is parameterized. It seems that query went through auto-parameterization process but query optimizer decided that it is “unsafe” and parameterized query isn’t cached.

Conclusion:

SQL server will simple parameterize some simple queries whose plan doesn’t depend on different constant values. During simple parameterization, if the optimizer decide that parameterized query is “safe”, two entries are written to the plan cache: shell query and prepared query. Shell query doesn’t contain full plan, it only point to parameterized plan which contain full execution plan for the query. Shell and parameterized queries are cached to reduce cost of generating execution plans. If we run a lot of adhoc queries with different constant value, more parameterized queries for the same type of query but with different parameter data type and large number of shell queries will be written to the plan cache.

SQL Server: Find statistics information in database

13/10/2011 1 comment

Recently I worked on a performance tuning of a database and noticed a lot of user created statistics. I checked few of them and saw that lots of user created statistics are multi-column statistics and some of them have a filter defined. There was no documentation of created statistics and I wanted to find out all statistics with all defined columns and filter. That is easy with sys.stats, but for better visibility, I wanted to list all statistics columns and defined filter in a single concatenated string.
Below script will return all statistics with all columns and defined filter for all user tables and views in a database. Script works in SQL Server 2008 and Denali (SQL Server 2005 doesn’t have filtered statistics).

;WITH Stat(SchemaName, TableName, StatisticsName, StatisticsType, filter_definition, stats_column_id, name)
AS
(
SELECT sh.name AS SchemaName ,
o.name AS TableName,
s.name AS StatisticsName,
CASE
WHEN s.user_created = 1 THEN 'USER'
WHEN s.auto_created = 1 THEN 'AUTO'
WHEN s.auto_created = 0 AND s.user_created = 0 THEN 'INDEX'
ELSE 'UNKNOWN'
END AS StatisticsType,
s.filter_definition,
sc.stats_column_id,
c.name
FROM sys.stats s
INNER JOIN sys.objects o
ON o.object_id = s.object_id
INNER JOIN sys.schemas sh
ON sh.schema_id = o.schema_id
INNER JOIN sys.stats_columns sc
ON sc.stats_id = s.stats_id
AND sc.object_id = s.object_id
INNER JOIN sys.columns c
ON c.object_id = sc.object_id
AND c.column_id = sc.column_id
WHERE o.type in ('U','V') -- UserTable and View
)

SELECT s.SchemaName,
s.TableName,
s.StatisticsName,
STUFF (( SELECT ', ' + s1.name
FROM Stat s1
WHERE s1.TableName=s.TableName
AND s1.StatisticsName=s.StatisticsName
ORDER BY s1.stats_column_id
FOR XML PATH('')
), 1,1, null) + CASE
WHEN s.filter_definition IS NOT NULL THEN
' FILTER: ' + s.filter_definition
ELSE ''
END AS StatisticsColumns
FROM Stat s
/* Filter statistics: USER - User-created, AUTO - automac-created, INDEX - index statistics */
-- WHERE s.StatisticsType in ('USER', 'AUTO')
GROUP BY s.SchemaName, s.TableName, s.StatisticsType, s.filter_definition, s.StatisticsName
ORDER BY s.SchemaName, s.TableName, s.StatisticsType

In the query we can filter for only automatically created and user created statistic if we want to exclude index statistics. Below is sample output for user and automatically created statistics in AdventureWorks database. (1) Show you the output for multi-column statistics with a filter defined.

If you have some suggestions for the script feel free to add comment.

UPDATE STATISTICS: How to cheat the optimizer

10/10/2011 Leave a comment

Did you ever need to see what execution plan will optimizer choose for your query on large set of data but you have a really small test database? Or you are testing your notification system for sort warning, for example, and you need to force query optimizer to think that tables used in a query are smaller? If your answer is yes, then keep reading.

Statistics is very important to the query optimizer for generating execution plans and it’s very important that statistics is up to date in a production database. To update statistics we use UPDATE STATISTICS statement. If you read BOL page for update statistics you can see that statement has some undocumented options. We can use these undocumented options, ROWCOUNT and PAGECOUNT, to cheat the optimizer. In the SQL Server Query Optimization Team blog,  we can read that ROWCOUNT and PAGECOUNT alter the internal metadata of the specified table or index by overriding the counters containing the row and page counts of the object.

In the next example, I will show how to use ROWCOUNT and PAGECOUNT options and how to cheat the optimizer. You will see that internal metadata for the table are changed and that statistics isn’t updated. Never use this in production database.

To see how it works I will use Sales.SalesOrderDetail table in AdventureWorks2008R2 database.
First, let us see statistics for table’s primary key and information’s about rows and pages in the table from sys.partitions and sys.dm_db_partition_stats.

DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail',PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID);

SELECT * FROM sys.partitions
WHERE object_id = object_id('Sales.SalesOrderDetail')

SELECT * FROM sys.dm_db_partition_stats
WHERE object_id = object_id('Sales.SalesOrderDetail')

Results:

DBCC SHOW_STATISTICS tell us (1) that Aug 2 2010 is the last time when statistics was updated and that table has 121.317 rows. Sys.partition (2) also tells us that table has 121.317 rows. sys.dm_db_partition_stats (3) shows us that 1.234 pages are used for storing in-row data.

Now, let’s UPDATE STATISTICS with ROWCOUNT and PAGECOUNT options. I set 500.000 rows and 10.000 pages but you can set any other value.

UPDATE STATISTICS Sales.SalesOrderDetail
WITH ROWCOUNT = 500000,PAGECOUNT = 10000

After we updated statistics with ROWCOUNT and PAGECOUNT option let us see again what is written in statistics and in sys.partitions and sys.dm_db_partition_stats.

From (1) we can see that statistics isn’t updated. From (2) we can see that sys.partition has information that table has 500.000 rows and sys.dm_db_partition_stats thinks that table occupies 10.000 pages (3). These values are used by query optimizer during query processing.
Let’s run a simple query on Sales.SalesOrderDetail table and look at actual execution plan of the query.

SELECT SalesOrderID
FROM Sales.SalesOrderDetail

Query is very simple. We want all SalesOrderID from the table. We can see that (1) actual number of rows is 121.371 but query optimizer thought (2) that there are 500.000 rows in the table.

Using ROWCOUNT and PAGECOUNT options allow us to cheat the query optimizer and to easily see what kind of query plan will be generated on large amount of data.  You can also specify smaller number of rows and pages to simulate a sort warning for example.

Categories: Performance, SQL Server