问题描述
存在一个 orders
表,有 id
, uid
, gid
三个字段,查询同时存在 gid
为 1 和 2 的 uid
。
这里给出了 3 条 sql 语句和相关 EXPLAIN
结果(结果我就不排版了)。
30000 条数据 5000 用户 10 商品
插入了大约 3w 条测试数据后对几条 sql 语句做了一下测试。
1
1
| EXPLAIN SELECT DISTINCT uid FROM orders WHERE uid IN (SELECT uid FROM orders WHERE gid = 1) AND gid = 2
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
| 1 SIMPLE orders NULL ALL NULL
NULL
NULL
NULL 28644 10.00 Using where; Using temporary 1 SIMPLE <subquery2> NULL eq_ref <auto_key> <auto_key> 4 ops.orders.uid 1 100.00 Distinct 2 MATERIALIZED orders NULL ALL NULL
NULL
NULL
NULL 28644 10.00 Using where; Distinct
|
2
1
| EXPLAIN SELECT DISTINCT a.uid FROM orders2 a INNER JOIN orders2 b ON a.uid = b.uid AND a.gid = 1 AND b.gid = 2
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
| 1 SIMPLE a NULL ALL NULL
NULL
NULL
NULL 28644 10.00 Using where; Using temporary 1 SIMPLE b NULL ALL NULL
NULL
NULL
NULL 28644 1.00 Using where; Distinct; Using join buffer (Block Nested Loop)
|
3
1
| EXPLAIN SELECT uid FROM (SELECT DISTINCT uid, gid FROM orders WHERE gid = 1 OR gid = 2) a GROUP BY uid HAVING COUNT(*) = 2
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
| 1 PRIMARY <derived2> NULL ALL NULL
NULL
NULL
NULL 5442 100.00 Using temporary; Using filesort 2 DERIVED orders NULL ALL NULL
NULL
NULL
NULL 28644 19.00 Using where; Using temporary
|
加索引
给 uid
和 gid
加上索引后,再次执行上面 3 条 sql 语句。
1
1
| EXPLAIN SELECT DISTINCT uid FROM orders WHERE uid IN (SELECT uid FROM orders WHERE gid = 1) AND gid = 2
|
1 2 3 4 5 6 7 8 9
| 1 SIMPLE orders NULL ref idx_uid,idx_gid idx_gid 4 const 2912 100.00 Using where; Using temporary 1 SIMPLE <subquery2> NULL eq_ref <auto_key> <auto_key> 4 ops.orders.uid 1 100.00 Distinct 2 MATERIALIZED orders NULL ref idx_uid,idx_gid idx_gid 4 const 3000 100.00 Distinct
|
2
1
| EXPLAIN SELECT DISTINCT a.uid FROM orders a INNER JOIN orders b ON a.uid = b.uid AND a.gid = 1 AND b.gid = 2
|
1 2 3 4 5 6
| 1 SIMPLE b NULL ref idx_uid,idx_gid idx_gid 4 const 2912 100.00 Using temporary 1 SIMPLE a NULL ref idx_uid,idx_gid idx_uid 4 ops.b.uid 5 10.47 Using where
|
3
EXPLAIN SELECT uid FROM (SELECT DISTINCT uid, gid FROM orders WHERE gid = 1 OR gid = 2) a GROUP BY uid HAVING COUNT(*) = 2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
| 1 PRIMARY <derived2> NULL ALL NULL
NULL
NULL
NULL 5912 100.00 Using temporary; Using filesort 2 DERIVED orders NULL ALL idx_gid NULL
NULL
NULL 28644 20.64 Using where; Using temporary
|
去重
去重后,一个 uid
最多只能有一个 gid
,大约 22333 条数据。
1
1
| EXPLAIN SELECT uid FROM orders WHERE uid IN (SELECT uid FROM orders WHERE gid = 1) AND gid = 2
|
1 2 3 4 5 6 7 8 9 10 11 12 13
| 1 SIMPLE orders NULL ref idx_uid,idx_gid idx_gid 4 const 2220 100.00 Using where 1 SIMPLE <subquery2> NULL eq_ref <auto_key> <auto_key> 4 ops.orders.uid 1 100.00 NULL
2 MATERIALIZED orders NULL ref idx_uid,idx_gid idx_gid 4 const 2233 100.00 NULL
|
2
1
| EXPLAIN SELECT a.uid FROM orders a INNER JOIN orders b ON a.uid = b.uid AND a.gid = 1 AND b.gid = 2
|
1 2 3 4 5 6 7 8
| 1 SIMPLE b NULL ref idx_uid,idx_gid idx_gid 4 const 2220 100.00 NULL
1 SIMPLE a NULL ref idx_uid,idx_gid idx_uid 4 ops.b.uid 4 10.00 Using where
|
3
1
| EXPLAIN SELECT uid FROM orders WHERE gid = 1 OR gid = 2 GROUP BY uid HAVING COUNT(*) = 2
|
1 2 3 4 5
| 1 SIMPLE orders NULL index idx_uid,idx_gid idx_uid 4 NULL 22333 19.94 Using where
|
总结
碰到这个问题时,想过 1 和 2,因为对 MySQL 方面基础不扎实,主观上都否定了,回来测了一下大致才有了比较清晰的认识。
- 1 这个解法当时讨论了一下,脑子一热忘记后面的 gid 也能用上索引,认为 1 只能用到一次索引,写出这个 sql 语句后就发现问题了。。。
- 2 在加了索引后是性能上最好的,哪怕去重后性能上几乎不差 3,也是主观上被最快否决的答案。
- 3 是 dave 给的答案,非常巧妙。在 3w 数据+去重的情况下,性能也是最好的,不过没去重情况下,需要一个 distinct 的子查询时会全表扫描,表现不是很好。
PS: 一开始打算在阿里云上起一个 rds 实例测试,结果发现需要用户余额 >= 100 才配使用,腾讯云就没有这个限制。然后腾讯云 MySQL 管理使用 PMA,emmmmmm
拓展题
找出有 gid 1 但没有 gid 2 的 uid,这题目前想到两个解法
SELECT uid FROM orders WHERE uid NOT IN (SELECT uid FROM orders WHERE gid = 2) AND gid = 1
另一个是
SELECT a.uid FROM orders2 a LEFT JOIN orders2 b ON a.uid = b.uid AND a.gid = 1 AND b.gid = 2 WHERE a.gid = 1 AND b.id IS NULL