多列复合索引的使用 绕过微软sql server的一个缺陷
来源:asp之家 发布时间:2012-08-21 10:37:36
然而,微软sql server在处理这类索引时,有个重要的缺陷,那就是把本该编译成索引seek的操作编成了索引扫描,这可能导致严重性能下降
举个例子来说明问题,假设某个表T有索引 ( cityid, sentdate, userid), 现在有个分页列表功能,要获得大于某个多列复合索引V0的若干个记录的查询,用最简单表意的方式写出来就是 V >= V0, 如果分解开来,就是:
cityid > @cityid0 or (cityid = @cityid0 and (sentdate > @sentdate0 or (sentdate = @sentdate0 and userid >= @userid0))),
当你写出上述查询时,你会期待sql server会自动的把上述识别为V >= V0类型的边界条件,并使用index seek操作来实施该查询。然而,微软的sql server (2005版)有一个重要缺陷(其他的sql server如何还不得知), 当它遇到这样sql时,sql server就会采用index scan来实施,结果是您建立好的索引根本就没有被使用,如果这个表的数据量很大,那所造成的性能下降是非常大的。
对于这个问题,我曾经提交给微软的有关人士,他们进一步要求我去一个正式的网站上去提交这个缺陷,我懒得去做。
不过,对这个缺陷,还是有个办法能够绕过去的,只要把上面给出的条件变变形,sql server还是能够变回到是用index seek, 而不是低性能的index scan. 具体请看我的英文原文吧(对不起了, 我一旦写了中文,就不想翻成英文,反过来也一样, 估计大家英文都还可以,实在不行的就看黑体部分吧, ):
The seek predicate of the form "x > bookmark_of_x" is needed in paging related query. The compiler has no difficulty to parse it correctly if x is a single column index, or two columns index, however, if x is a three columns index or more, then the compiler will have a hard time to recognize it. This failure will result in that the seek predicate ended up in residue predicate, which results in a much worse execution plan.
To illustrate the point, take a example,
Create table A( a int, b int, c int, d float, primary key (a, b, c))
now check the plan for the query:
select c, d from A where (a> 111 or a= 111 and
(b > 222 or b = 222 and c > 333))
you can see a table scan op is used, and the Where clause ended up in residue predicate.
However, if you rewrite the query in an equivalent form:
select c, d from A where a> 111 or a= 111 and b > 222 or a= 111 and b= 222 and c >333
Then the compiler can choose an index seek op, which is desired.
The problem is, the compiler should be able to recognize the first form of seek predicate on multiple columns index, it saves the user from having to pay extra time to figure out a get-around, not to mention the first form is a more efficient form of same expression.
上面的问题,可以说是部分的绕过去了,但是,也有绕不过的时候,接着看下面一段:
It looks like that sql server lacks a consept of vector bookmark, or vector comparison or whatever you like to call it.
The workaround is not a perfect workaround. If sql server were to understand the concept of vector bookmark, then the following two would be the same in execution plan and performance:
1. select top(n) * from A where vectorIndex >= @vectorIndex
2. select * from A where vectorIndex >= @vectorIndex and vectorIndex <=@vectorIndexEnd
-- @vectorIndexEnd corresponds to the last row of 1.
However, test has shown that, the second statement takes far more time than the first statement, and sql server actually only seek to the begining of the vector range and scan to the end of the whole Index, instead of stop at the end of the vector range.
Not only sql server compile badly when the vector bookmark has 3 columns, test has shown that even with as few as 2 columns, sql serer still can not correctly recognize this is actually a vector range, example:
3. select top (100) a, b, c, d from A where a> 60 or a= 60 and b > 20
4. select a, b, c, d from A where (a> 60 or a= 60 and b > 20) and
(a< 60 or a= 60 and b <= 21),
上面两个查询实质相同(表中的数据刚好如此),并且给出同业的结果集,但是,3比4的速度要快的多,如果去看execution plan也证明3确实应当比4快.
也就是说, 即使在索引vectorIndex只含两列的情况下, sql server也无法正确的理解范围表达式 @vectorIndex0 < vectorIndex < @vectorIndex1, 它能把前半部分正确的解读为seek, 但是, 后半部分无法正确解读, 导致, sql server会一直扫描到整个表的末尾, 而不是在@vectorIndex1处停下来.
以下测试代码, 有兴趣的人可以拿去自己玩:
代码如下:
CREATE TABLE [dbo].[A](
[a] [int] NOT NULL,
[b] [int] NOT NULL,
[c] [int] NOT NULL,
[d] [float] NULL,
PRIMARY KEY CLUSTERED ([a] ASC, [b] ASC, [c] ASC)
)
declare @a int, @b int, @c int
set @a =1
while @a <= 100
begin
set @b = 1
begin tran
while @b <= 100
begin
set @c = 1
while @c <= 100
begin
INSERT INTO A (a, b, c, d)
VALUES (@a,@b,@c,@a+@b+@c)
set @c = @c + 1
end
set @b = @b + 1
end
commit
set @a = @a + 1
end
SET STATISTICS PROFILE ON
SET STATISTICS time ON
SET STATISTICS io ON
select top (10) a, b, c, d from A where (a> 60 or a= 60 and
(b > 20 or b = 20 and c >= 31))
select a, b, c, d from A where (a> 60 or a= 60 and
(b > 20 or b = 20 and c >= 31)) and (a< 60 or a= 60 and
(b < 20 or b = 20 and c <= 40))
select top (10) a, b, c, d from A where a> 60 or a= 60 and b > 20 or a= 60 and b= 20 and c >= 31
select a, b, c, d from A where (a> 60 or a= 60 and b > 20 or a= 60 and b= 20 and c >= 31) and
(a< 60 or a= 60 and b < 20 or a= 60 and b= 20 and c <= 40)
select top (100) a, b, c, d from A where a> 60 or a= 60 and b > 20
select a, b, c, d from A where (a> 60 or a= 60 and b > 20) and (a< 60 or a= 60 and b <= 21)
select top (100) a, b, c, d from A where a> 60 or a= 60 and b > 20
select a, b, c, d from A where (a> 60 or a= 60 and b > 20) and (a< 60 or a= 60 and b <= 21)


猜你喜欢
- 第一步:首先进入python安装目录下的 【scripts】.第二步:执行安装pyqt5的命令:python37 -m pip instal
- 在 MySQL 中通常我们使用 limit 来完成页面上的分页功能,但是当数据量达到一个很大的值之后,越往后翻页,接口的响应速度就越慢。本文
- 多表查询使用单个select 语句从多个表格中取出相关的查询结果,多表连接通常是建立在有相互关系的父子表上;1交叉连接第一个表格的所有行 乘
- 在使用Sublime Text3 的时候导numpy的包发现报错,找不到这个包,这是因为要配置pip源才能正常导包,进行from numpy
- 目录GitHub 消息的问题解决方案代码实现0.环境准备1、模拟登录github2.模拟进入Inbox3.检查僵尸项目4.取消关注僵尸项目5
- 在这系列视觉设计的文章间隙插一篇字体单位的文章。前文说了,字体单位应该用em而不用px,原因简单来说就是支持IE6下的字体缩放,在页面中按c
- Vue Demi是什么如果你想开发一个同时支持Vue2和Vue3的库可能想到以下两种方式:1.创建两个分支,分别支持Vue2和Vue32.只
- (1)int转strings := strconv.Itoa(i)等价于s := strconv.FormatInt(int64(i), 1
- Pandas DataFrame 取一行数据会得到Series的方法如题,想要取如下dataframe的一行数据,以为得到的还是datafr
- 前言在python中, 切片是一个经常会使用到的语法, 不管是元组, 列表还是字符串, 一般语法就是:sequence[ilow:ihigh
- 前言Helium工具是对Selenium的封装,将Selenium工具的使用变得更加简单。Selenium虽然好,但是在它的使用过程中元素的
- python环境 3.6.5 win7 linux环境同理先尝试了PyInstaller ,打包时一直提示 no module named
- 网上有不少生成缩略图的ASP组件。若你的虚拟空间不支持注册新组件,可能会感觉自己的网站失色不少。心晴不才,结合网上资源写了个无组件生成缩略图
- python 实现删除文件或文件夹  
- 所谓产品其实最终展现在用户面前的只是界面而已,所谓界面绝大多数时候只包括两个部分:图片、文字。重视界面上的每一个像素和每一个文字是UED的基
- 这两天在看webpack,今天卡在webpack-dev-server上了,折腾了一下午,一直无法正常运行,每次服务器也提示正常启动了,但是
- 对角矩阵scipy中的函数在scipy.linalg中,通过tri(N, M=None, k=0, dtype=None)可生成N&
- MySQL-8.0.22-winx64的数据库安装教程,供大家参考,具体内容如下1.安装步骤直接将安装包解压在安装目录之下。2.添加系统变量
- 1. 开发1.1. 架构Gorm使用可链接的API,*gorm.DB是链的桥梁,对于每个链API,它将创建一个新的关系。db, err :=
- pyquery库是jQuery的Python实现,可以用于解析HTML网页内容,使用方法:from pyquery import PyQue