Python中的字符串匹配算法分析

在Python在str对象的find、index、replace等操作都是基于字符串匹配的。通过阅读Python的源代码可知,在Python中的字符串匹配算法是基于Boyer-Moore、Horspool和Sunday的混合。

关于它的详细介绍可以访问这个网页: http://effbot.org/zone/stringlib.htm

用Python代码实现如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
#!/usr/bin/env python
#-*- coding:utf-8 -*-


def make_delta1(p):
ALPHABET_LEN = 256
i = 0
patternlen = len(p)
delta1 = [0] * ALPHABET_LEN

while i < ALPHABET_LEN:
delta1[i] = patternlen
i += 1

i = 0
while i < patternlen-1:
delta1[ord(p[i])] = patternlen-1 - i
i += 1
return delta1


def find(s, p):
# find first occurrence of p in s
n = len(s)
m = len(p)
delta1 = make_delta1(p)
skip = delta1[ord(p[m-1])]
i = 0
while i <= n-m:
if s[i+m-1] == p[m-1]: # (boyer-moore)
# potential match
if s[i:i+m-1] == p[:m-1]:
return i
if s[i+m] not in p:
i = i + m + 1 # (sunday)
else:
i = i + skip # (horspool)
else:
# skip
if s[i+m] not in p:
i = i + m + 1 # (sunday)
else:
i = i + 1
return -1 # not found

if __name__ == '__main__':
print find("HERE IS A SILMPLE EXAMPLE", "EXAMPLE")