# Greedy String Tiling(GST)算法

LCS算法Longest Common Subsequences 是用于找出最长匹配子串的. 这里其实并没有真正实现维基所述的LCS, 那个要造2维数组来扫, 在Python我就简单化处理了.

TODO: 这个Rabin–Karp algorithm算法看起来也很有趣, 下次实现一下.

``````#! /usr/bin/env python

a=['a','b','c','d','e','f']
b=['d','e','a','b','c','f']

c = ['1','2','3','4','5','6','7','8','9','1','3']
d = ['3','4','5','2','1','7','8','9','1']

def gst(a,b,minlength=2):
if len(a) == 0 or len(b) == 0:
return []
# if py>3.0, nonlocal is better
class markit:
a=[0]
minlen=2
markit.a=[0]*len(a)
markit.minlen=minlength

#output index
out=[]

# To find the max length substr (index)
# apos is the position of a[0] in origin string
def maxsub(a,b,apos=0,lennow=0):
if (len(a) == 0 or len(b) == 0):
return []
if (a[0]==b[0] and markit.a[apos]!=1 ):
return [apos]+maxsub(a[1:],b[1:],apos+1,lennow=lennow+1)
elif (a[0]!=b[0] and lennow>0):
return []
return max(maxsub(a, b[1:],apos), maxsub(a[1:], b,apos+1), key=len)

# Loop to find all longest substr until the length < minlength
while True:
findmax=maxsub(a,b,0,0)
if (len(findmax)<markit.minlen):
break
else:
for i in findmax:
markit.a[i]=1
out+=findmax
return [ a[i] for i in out]

print gst(a,b,2)
``````

◆ 本文地址: http://platinhom.github.io/2016/02/16/Greedy-String-Tiling/, 转载请注明 ◆

/ 已阅读()
Source 类别: Coding