論文の息抜きにSuffixArray
2月上旬投稿予定の論文を書くのに疲れたので,
息抜きに非常に単純なSuffixArrayをPythonで実装してみました.
ソースコード
#!/usr/bin/python # -*- coding: utf-8 -*- import sys def createSA(text): """Create Suffix Array""" print "create SuffixArray" tlen = len(text) if (tlen == 0): return sa = [(text[i:], i) for i in range(tlen)] sa.sort() for i in range(len(sa)): print i, "\t", sa[i][1], "\t", sa[i][0] return sa def searchSA(sa, query): """Search query using Suffix Array""" print "query search using SuffixArray" salen = len(sa) qlen = len(query) if (salen == 0 or qlen == 0): return result = [] lower = 0 higher = salen width = salen / 2 while (True): prefix = sa[lower + width][0][0:qlen] if (prefix < query): lower += width + 1 prefix = sa[higher - width][0][0:qlen] if (prefix > query): higher -= width + 1 if (lower > higher): return if (width == 0): break width /= 2 for i in range(lower, higher+1): print sa[i][1], "\t", sa[i][0] return
実行例
SA = createSA("abracadabra") print "" searchSA(SA, "ab")
実行結果
create SuffixArray 0 10 a 1 7 abra 2 0 abracadabra 3 3 acadabra 4 5 adabra 5 8 bra 6 1 bracadabra 7 4 cadabra 8 6 dabra 9 9 ra 10 2 racadabra query search using SuffixArray 7 abra 0 abracadabra
もっと綺麗にコードが書けるようになりたいですね.