深度分页优化

我们在做开发的时候，一般后台都是会对表格的数据进行分页，但是如果分页的深度很深，并且还需要排序筛选等，那么mysql需要扫描的数据就会非常大，导致对性能有影响。

创建测试数据

首先创建100万数据，这个过程可以使用存储过程：

create table person
(
    id       int          not null
        primary key,
    name     varchar(100) null,
    password varchar(100) null,
    email    varchar(100) null,
    created  timestamp    null,
    notes    varchar(200) null
);

DELIMITER $$

CREATE PROCEDURE insert_persons(IN total_records INT)
BEGIN
    DECLARE i INT DEFAULT 0;
    DECLARE random_name VARCHAR(100);
    DECLARE random_password VARCHAR(100);
    DECLARE random_email VARCHAR(100);
    DECLARE random_notes VARCHAR(200);

    START TRANSACTION;
    WHILE i < total_records
        DO
            SET random_name = CONCAT('name_', i);
            SET random_password = CONCAT('pass_', i);
            SET random_email = CONCAT('email_', i, '@example.com');
            SET random_notes = CONCAT('note_', i);

            INSERT INTO person (id, name, password, email, created, notes)
            VALUES (i, random_name, random_password, random_email, NOW(), random_notes);

            SET i = i + 1;

            -- 每10000条提交一次事务，防止事务太大影响性能
            IF i % 10000 = 0 THEN
                COMMIT;
                START TRANSACTION;
            END IF;
        END WHILE;
    COMMIT;
END$$

DELIMITER ;

CALL insert_persons(1000000);

select  count(*) from person;

测试SQL语句

简单分页，数据量比较少，测试时间大概在0.1s左右：

select  * from person where create_time > '2024-08-07 00:00:00' limit 1,20;

如果是分页深度90万数据，大概要0.8s

select  * from person where create_time > '2024-08-07 00:00:00' limit 900000,20;

我们使用explain查看执行计划：

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	person	null	ALL	null	null	null	null	994034	33.33	Using where

创建索引：

这里发现会走全表扫描，那么我们去给created创建一个索引

create index person_create_time_index on person (create_time);

然后再去看执行计划，这时候会走索引扫描：，然而查询时间依然是2s

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	person	null	range	person_create_time_index	person_create_time_index	5	null	497017	100	Using index condition

我们会发现，加了索引，并且走索引了，时间反而从0.8s增加到了2s呢，这是为什么呢？

因为我们的create_time时间是二级索引，而我们知道二级索引不存实际的数据，只会保存主键Id，然后再通过主键ID去回表查询，找到满足条件的记录，这样也就是需要扫描90w+20条数据，然后扔掉前90w数据。这个过程需要回表90w+20次

数据库首先使用 create_time 索引定位符合 create_time > '2024-08-07 00:00:00' 条件的记录（在我的数据中是第一条）。由于有 LIMIT 900000, 20，所以数据库会先扫描 900000 条记录并跳过它们。

如何优化？

减少回表查询

可以通过减少回表次数来优化。

在B+树索引结构中：

主键索引，叶子节点存放的是整行数据
二级索引，叶子节点存放的是主键的值。

如果我们把查询条件，转移回到主键索引树，那就可以减少回表次数。想要转移到主键索引树查询的话，查询条件得改为主键id

select *
from person
where id >= (select a.id from person a where a.create_time > '2024-08-07 00:00:00' limit 900000,1)
and create_time > '2024-08-07 00:00:00'
limit 20;

这样我们会走两次索引，时间消耗在0.5s左右：

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	person	null	range	PRIMARY,person_create_time_index	PRIMARY	4	null	207292	50	Using where
2	SUBQUERY	a	null	range	person_create_time_index	person_create_time_index	5	null	497017	100	Using where; Using index

子查询可以只查询id，从而避免了大量回表查询带来的消耗，并且只需要定位到第一条满足条件的值，然后再做分页查询

关联查询

关联查询思路和子查询一样，耗时大概也在0.5s左右

select * from person p1
inner join (select p2.id from person p2 where p2.create_time>'2024-08-07 00:00:00' limit 900000,20) p3
on p1.id=p3.id;

查询计划：

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	<derived2>	null	ALL	null	null	null	null	497017	100	null
1	PRIMARY	p1	null	eq_ref	PRIMARY	PRIMARY	4	p3.id	1	100	null
2	DERIVED	p2	null	range	person_create_time_index	person_create_time_index	5	null	497017	100	Using where; Using index

主要思路大概是：先通过person_create_time_index二级索引树查询到满足条件的主键ID，再与原表通过主键ID内连接，这样后面直接走了主键索引了，同时也减少了回表。

游标翻页

每次查询的时候，记录上一次查询到了哪里，然后下次查询的时候直接从这里开始查询，这个记录可以由前端传递

 select  * from person where id>=900000 and create_time>'2024-08-07 00:00:00' limit 20;

这个过程大概耗时0.1s左右

创建测试数据​

测试SQL语句​

如何优化？​

减少回表查询​

关联查询​

游标翻页​