数据库概论.陈立军.04.SQL 数据查询(3)

Posted on 2021-04-27 In DB.CLJ Views:

字符串操作、like、正则表达式、全文索引、字符串函数

SQL 数据查询

Select ...
From ...
Where ...
Group by ...
Having ...
Union ...
Order by ...
Limit ...

在线资源

http://sqlfiddle.com

字符串操作

1	列名 [not] like '字符串'

匹配规则

%       匹配零个或多个字符
_       匹配任意单个字符
[]      任何在指定范围内的字符, [a-f], [abcdef]
[^]     任何不在指定范围内的字符, [^a-f], [^abcdef]

转义字符

用 escape 定义转义字符，以去掉特殊字符的特定含义，使其被作为普通字符看待
- 如 escape \，定义 \ 作为转义字符
- 则可用 \% 去匹配 %，用 \_去匹配 _
- 用什么去匹配 \：\\

select * from my_tb
where col1 like 'x%%x__xx' escape 'x'

/*
x%x_x    (x)
%xx_x    (x)
%xx_xx   (v)
%__x     (v)
*/

例子

列出姓名以“张”打头的教师的所有信息

1
2
3

select *
from PROF
where pname like '张%'

列出名称中含有3个以上字符，且倒数第三个是d，倒数第二个是 _ 的课程

1
2
3

select *
from C
where cname like '%_d\__' escape '\'

索引

在 cname 上建有索引 cname_idx，观察下面查询的执行计划，看谁用到了该索引

/* 放弃使用索引 */
select * from C where cname like '%d'

/* 使用索引 */
select * from C where cname like 'd%'

like 作用有限，如何增强数据库的文本处理能力
- 全文检索 + 正则表达式

正则表达式

Regular Expression
一些例子

用户名
/^[a-z0-9_-]{3,16}$/

密码
/^[a-z0-9_-]{6,18}$/

电子邮箱
/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

IP地址
/^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/

HTML标签
/^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/

如果数据库本身不支持正则表达式，那么一切都会变得很麻烦
- 现在的数据库都是直接支持的
用 check 约束表达正则表达式

create table IP_Address(ip char(15) primary key)
alter table IP_Address add constraint CHK_IP_Valid check(
    ip like '_%._%._%._%'
    and ip not like '%.%.%.%.%'
    and ip not like '%[^0-9.]%'
    and ip not like '%[0-9][0-9][0-9][0-9]%'
    and ip not like '%[3-9][0-9][0-9]%'
    and ip not like '%2[6-9][0-9]%'
    and ip not like '%25[6-9]%'
)

数据库中的正则表达式
- 按照 PPT 中，好像是含有正则表示式项即可

select name from my_name
where name regexp '^吴' /* 名字以 '吴' 开头 */

select name from my_name
where name regexp '吴' /* 名字中含有 '吴' */

Oracle

在表定义的时候，可以在添加 constraint 的时候使用正则表达式
- MySQL 和 SQL Server 好像都没有

函数名	说明
REGEXP_LIKE	类似于LIKE 运算符，但执行正则表达式匹配而不是简单的模式匹配
REGEXP_INSTR	在给定字符串中搜索某个正则表达式模式，并返回匹配项的位置
REGEXP_REPLACE	搜索某个正则表达式模式并使用替换字符串替换它
REGEXP_SUBSTR	在给定字符串中搜索某个正则表达式模式并返回匹配的子字符串

Alter table students
add constraint stud_ssn_ck
check (
    REGEXP_LIKE(
        ssn,
        '^([[:digit:]]{3}-[[:digit:]]{2}-[[:digit:]]{4}|[[:digit:]]{9})$'
    )
)

标准 SQL

similar to

全文检索

大量文档不适用于用 like 查询

倒排索引

SQL Server 中的全文索引

创建全文索引

key index 指定 table_name 上唯一码索引的名称, 最好是聚簇索引

create fulltext catalog catalog_name
create fulltext index on
    table_name[(column_name]
key index index_name
on catalog_name

全文索引的使用

1 2	Contains(属性列\|, 查找条件) freetext(属性列\|, 查找文本)

使用 freetext 时，全文查询引擎内部将查找文本拆分为若干个搜索词，并赋予每个词以不同的加权，然后查找匹配

示例

doc(doc_id,title,author,abstract,content)
/* 聚簇索引 */
create unique clustered index doc_idxondoc(doc_id)
/* 建立全文索引 */
create fulltext catalog doc_fulltext_catalog
create fulltext index on doc(title,author,abstract,content)
key index doc_idx
on doc_fulltext_catalog
/* 使用 */
select *
from doc
where contains(author,'Tom and Jerry')

select *
from doc
where contains(*, 'database and not dataspace')

select *
from doc
where freetext(content, 'DeepLearning')

MySQL全文索引

1 2	create fulltextindex ft_indx_nameontb_name(col_name,...) match (col_name,...) against(search_expr[search_modifier])

search_modifier:
- in natural language mode
- in natural language mode with query expansion
- in booleanmode
- with query expansion
例子

create table myDoc(
    id int auto_increment primary key,
    title varchar(100),
    content text,
    fulltext(title,content)
)
insert into myDoc(title,content) values
    ('MySQL Tutorial','DBMSstands for DataBase...'),
    ('How To Use MySQL Well','Afteryou went through a ...'),
    ('Optimizing MySQL','Inthis tutorial we will show ...'),
    ('1001 MySQL Tricks','1. Never run mysqldas root. 2. ...'),
    ('MySQL vs. YourSQL','In the following database comparison ...'),
    ('MySQL Security','Whenconfigured properly, MySQL ...');

select *
from myDoc
where match(title,content)
against('database' in natural language mode)

select id,match(title,content)
against('database') as score /* 匹配分 */
from myDoc


match(title,content) against('+MySQL -YourSQL'in boolean mode)
match(title,content) against('+MySQL +YourSQL'in boolean mode) /* 且 */
match(title,content) against('+MySQL YourSQL'in boolean mode)  /* 或 */

字符串函数

例子

计算字符 a 出现次数

1	select len('databases') − len(replace('databases','a',''))

如果是检索多个字符出现次数的话，还要除以相应的字符个数

1	select len('databases') − len(replace('databases','ta',''))/len('ta')