一个 MySQL 问题

问题描述

存在一个 orders 表,有 id, uid, gid 三个字段,查询同时存在 gid 为 1 和 2 的 uid

这里给出了 3 条 sql 语句和相关 EXPLAIN 结果(结果我就不排版了)。

30000 条数据 5000 用户 10 商品

插入了大约 3w 条测试数据后对几条 sql 语句做了一下测试。

1

EXPLAIN SELECT DISTINCT uid FROM orders WHERE uid IN (SELECT uid FROM orders WHERE gid = 1) AND gid = 2

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
1	SIMPLE	orders
NULL
ALL
NULL

NULL

NULL

NULL
28644 10.00 Using where; Using temporary
1 SIMPLE <subquery2>
NULL
eq_ref <auto_key> <auto_key> 4 ops.orders.uid 1 100.00 Distinct
2 MATERIALIZED orders
NULL
ALL
NULL

NULL

NULL

NULL
28644 10.00 Using where; Distinct

2

EXPLAIN SELECT DISTINCT a.uid FROM orders2 a INNER JOIN orders2 b ON a.uid = b.uid AND a.gid = 1 AND b.gid = 2

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
1	SIMPLE	a
NULL
ALL
NULL

NULL

NULL

NULL
28644 10.00 Using where; Using temporary
1 SIMPLE b
NULL
ALL
NULL

NULL

NULL

NULL
28644 1.00 Using where; Distinct; Using join buffer (Block Nested Loop)

3

EXPLAIN SELECT uid FROM (SELECT DISTINCT uid, gid FROM orders WHERE gid = 1 OR gid = 2) a GROUP BY uid HAVING COUNT(*) = 2

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
1	PRIMARY	<derived2>
NULL
ALL
NULL

NULL

NULL

NULL
5442 100.00 Using temporary; Using filesort
2 DERIVED orders
NULL
ALL
NULL

NULL

NULL

NULL
28644 19.00 Using where; Using temporary

加索引

uidgid 加上索引后,再次执行上面 3 条 sql 语句。

1

EXPLAIN SELECT DISTINCT uid FROM orders WHERE uid IN (SELECT uid FROM orders WHERE gid = 1) AND gid = 2

1
2
3
4
5
6
7
8
9
1	SIMPLE	orders
NULL
ref idx_uid,idx_gid idx_gid 4 const 2912 100.00 Using where; Using temporary
1 SIMPLE <subquery2>
NULL
eq_ref <auto_key> <auto_key> 4 ops.orders.uid 1 100.00 Distinct
2 MATERIALIZED orders
NULL
ref idx_uid,idx_gid idx_gid 4 const 3000 100.00 Distinct

2

EXPLAIN SELECT DISTINCT a.uid FROM orders a INNER JOIN orders b ON a.uid = b.uid AND a.gid = 1 AND b.gid = 2

1
2
3
4
5
6
1	SIMPLE	b
NULL
ref idx_uid,idx_gid idx_gid 4 const 2912 100.00 Using temporary
1 SIMPLE a
NULL
ref idx_uid,idx_gid idx_uid 4 ops.b.uid 5 10.47 Using where

3

EXPLAIN SELECT uid FROM (SELECT DISTINCT uid, gid FROM orders WHERE gid = 1 OR gid = 2) a GROUP BY uid HAVING COUNT(*) = 2

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
1	PRIMARY	<derived2>
NULL
ALL
NULL

NULL

NULL

NULL
5912 100.00 Using temporary; Using filesort
2 DERIVED orders
NULL
ALL idx_gid
NULL

NULL

NULL
28644 20.64 Using where; Using temporary

去重

去重后,一个 uid 最多只能有一个 gid,大约 22333 条数据。

1

EXPLAIN SELECT uid FROM orders WHERE uid IN (SELECT uid FROM orders WHERE gid = 1) AND gid = 2

1
2
3
4
5
6
7
8
9
10
11
12
1	SIMPLE	orders
NULL
ref idx_uid,idx_gid idx_gid 4 const 2220 100.00 Using where
1 SIMPLE <subquery2>
NULL
eq_ref <auto_key> <auto_key> 4 ops.orders.uid 1 100.00
NULL

2 MATERIALIZED orders
NULL
ref idx_uid,idx_gid idx_gid 4 const 2233 100.00
NULL

2

EXPLAIN SELECT a.uid FROM orders a INNER JOIN orders b ON a.uid = b.uid AND a.gid = 1 AND b.gid = 2

1
2
3
4
5
6
7
8
1	SIMPLE	b
NULL
ref idx_uid,idx_gid idx_gid 4 const 2220 100.00
NULL

1 SIMPLE a
NULL
ref idx_uid,idx_gid idx_uid 4 ops.b.uid 4 10.00 Using where

3

EXPLAIN SELECT uid FROM orders WHERE gid = 1 OR gid = 2 GROUP BY uid HAVING COUNT(*) = 2

1
2
3
4
5
1	SIMPLE	orders
NULL
index idx_uid,idx_gid idx_uid 4
NULL
22333 19.94 Using where

总结

碰到这个问题时,想过 1 和 2,因为对 MySQL 方面基础不扎实,主观上都否定了,回来测了一下大致才有了比较清晰的认识。

  • 1 这个解法当时讨论了一下,脑子一热忘记后面的 gid 也能用上索引,认为 1 只能用到一次索引,写出这个 sql 语句后就发现问题了。。。
  • 2 在加了索引后是性能上最好的,哪怕去重后性能上几乎不差 3,也是主观上被最快否决的答案。
  • 3 是 dave 给的答案,非常巧妙。在 3w 数据+去重的情况下,性能也是最好的,不过没去重情况下,需要一个 distinct 的子查询时会全表扫描,表现不是很好。

PS: 一开始打算在阿里云上起一个 rds 实例测试,结果发现需要用户余额 >= 100 才配使用,腾讯云就没有这个限制。然后腾讯云 MySQL 管理使用 PMA,emmmmmm

拓展题

找出有 gid 1 但没有 gid 2 的 uid,这题目前想到两个解法

SELECT uid FROM orders WHERE uid NOT IN (SELECT uid FROM orders WHERE gid = 2) AND gid = 1

另一个是

SELECT a.uid FROM orders2 a LEFT JOIN orders2 b ON a.uid = b.uid AND a.gid = 1 AND b.gid = 2 WHERE a.gid = 1 AND b.id IS NULL