DoorKickers 的網誌

網誌

后缀排序

2020-02-26 12:26:19 By DoorKickers

Pre-defination

S : a string named S that consists of several characters

|S| : the size of S

S[l:r] : a sub-string of S which consists of the characters S[l], ... , S[r]

Suf(i) : a suffix of S that means S[i:|S|]

Algorithm

Sqy Algorithm I is an algorithm that can sort the suffix of a given string S in O(nlogn) time, and the result of this algorithm will be stored in a array called Suffix Array.

Let me show the code to you first, and then I will explain some intricacy details

(Assumed that you have already got the Radix Sort which can sort an array in O(n) time.)

inline void SA_sort(char *s, int n) {
    for (int i = 1; i <= n; i++) ++c[x[i] = s[i]];
    for (int i = 2; i <= m; i++) c[i] += c[i - 1];
    for (int i = n; i >= 1; i--) sa[c[x[i]]--] = i;
    for (int k = 1; k <= n; k <<= 1) {
        int num = 0;
        for (int i = n - k + 1; i <= n; i++) y[++num] = i;
        for (int i = 1; i <= n; i++) {
            if (sa[i] > k)
                y[++num] = sa[i] - k;
        }
        for (int i = 1; i <= m; i++) c[i] = 0;
        for (int i = 1; i <= n; i++) ++c[x[i]];
        for (int i = 2; i <= m; i++) c[i] += c[i - 1];
        for (int i = n; i >= 1; i--) sa[c[x[y[i]]]--] = y[i], y[i] = 0;
        swap(x, y);
        x[sa[1]] = 1, num = 1;
        for (int i = 2; i <= n; i++) {
            x[sa[i]] = (y[sa[i]] == y[sa[i - 1]] && y[sa[i] + k] == y[sa[i - 1] + k]) ? num : ++num;
        }
        if (num == n) break;
        m = num;

    }
}

Ok, that's the code.

Above all, the most important part that can make you better understanding of that snippet is the meaning of each array.

To simplify, we can use the function concept instead of array concept.

A function f is a machine that can get input and return an output.

Good, looks like you understand it completely.

 

Next step : look through the arrays that mentioned in that snippet.

You will comprehend them via function concept (expect c[])

 

x[] : input a initial position i, and return the "value" of the first-key whose length is k and whose start position is i.

y[] : input a current partial rank k of second-keys, and return the initial position i of the first-key which corresponds to the No.k second-key.

sa[] : input a current global rank k of the combination of first-keys and it's corresponding second-key, and return the initial position i which represents for that rank k string.

Remember it!!!!!!!!!!!!!

Ok, the writter is fucked up.

So

To be continued.

評論

No comments yet.

發表評論

可以用@mike来提到mike这个用户,mike会被高亮显示。如果你真的想打“@”这个字符,请用“@@”。

可以用/kel来使用表情 kel。