論文の息抜きにSuffixArray

2月上旬投稿予定の論文を書くのに疲れたので,
息抜きに非常に単純なSuffixArrayをPythonで実装してみました.

ソースコード

#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys

def createSA(text):
    """Create Suffix Array"""
    print "create SuffixArray"

    tlen = len(text)
    if (tlen == 0): return
    sa = [(text[i:], i) for i in range(tlen)]
    sa.sort()

    for i in range(len(sa)):
        print i, "\t", sa[i][1], "\t", sa[i][0]
    return sa

def searchSA(sa, query):
    """Search query using Suffix Array"""
    print "query search using SuffixArray"

    salen = len(sa)
    qlen = len(query)
    if (salen == 0 or qlen == 0): return
    result = []

    lower = 0
    higher = salen
    width = salen / 2
    while (True):
        prefix = sa[lower + width][0][0:qlen]
        if (prefix < query):
            lower += width + 1

        prefix = sa[higher - width][0][0:qlen]
        if (prefix > query):
            higher -= width + 1

        if (lower > higher): return
        if (width == 0): break
        width /= 2

    for i in range(lower, higher+1):
        print sa[i][1], "\t", sa[i][0]
    return

実行例

SA = createSA("abracadabra")
print ""
searchSA(SA, "ab")

実行結果

create SuffixArray
0       10      a
1       7       abra
2       0       abracadabra
3       3       acadabra
4       5       adabra
5       8       bra
6       1       bracadabra
7       4       cadabra
8       6       dabra
9       9       ra
10      2       racadabra

query search using SuffixArray
7       abra
0       abracadabra

もっと綺麗にコードが書けるようになりたいですね.