Difficulty colors follow Luogu's grading, from easiest to hardest: 🔴🟠🟡🟢🩵🔵🟣⚫.

🔗 🟣 P2292 [HNOI2004] L 语言

Problem Statement

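We are given a dictionary of words made of lowercase English letters, and several articles that contain only lowercase English letters and no punctuation.
An article is understandable under the dictionary if it can be split into segments such that every segment is a dictionary word.
For each article, output the length of its longest prefix that is understandable under the dictionary; if no non-empty prefix is understandable, output $0$.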

Constraints

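The dictionary has $n$ words and there are $m$ articles, with $1 \le n \le 20$ and $1 \le m \le 50$.
Each dictionary word satisfies $1 \le |s| \le 20$.
Each article satisfies $1 \le |t| \le 2 \times 10^6$.
All dictionary words and articles consist of lowercase English letters only.
For $80\%$ of the test data, $m \le 20$ and $|t| \le 10^6$.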

Approach: Aho-Corasick automaton + prefix-splitting DP

Looking only at understandability itself, we can define a DP over prefixes:

$$f_i = \begin{cases} \text{true}, & \text{if the prefix of length } i \text{ is understandable} \\ \text{false}, & \text{otherwise} \end{cases}$$

The empty prefix is trivially valid, so $f_0=\text{true}$.

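Now, if some dictionary word is exactly a suffix of the current prefix and the part before it is also understandable, then the current prefix is understandable.

In other words, if a word of length $L$ ends at position $i$, we have the transition:

$$f_i \leftarrow f_i \lor f_{i-L}$$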
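As a small worked example (the dictionary {is, name, what, your} is chosen here only for illustration), take $t = \text{whatisyouname}$. Words end only at positions $4$ (what), $6$ (is) and $13$ (name), so the transition fires three times:

$$\begin{aligned} f_4 &= f_0 = \text{true} && (\text{what}) \\ f_6 &= f_4 = \text{true} && (\text{is}) \\ f_{13} &= f_9 = \text{false} && (\text{name; no word ends at position } 9) \end{aligned}$$

Every other $f_i$ with $i \ge 1$ is false because no word ends there, so the answer for this article is $6$.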

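So the problem becomes: while scanning an article, how can we quickly find which dictionary words end at the current position?

The most direct approach is to enumerate every word at each position and check whether it matches the corresponding suffix of the article. But a single article can be $2\times 10^6$ characters long, and in the worst case this costs up to $|t| \cdot \sum |s| \le 2\times 10^6 \times 400 = 8\times 10^8$ character comparisons per article, with up to $50$ articles, which is clearly unacceptable.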
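As a baseline, here is a minimal sketch of that direct approach. The function name naiveLongestPrefix is hypothetical, and the code assumes lowercase input as the problem guarantees. It is far too slow for the full constraints, but it follows the definition of $f_i$ literally, which makes it handy as a brute-force checker for the faster methods.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Naive reference DP (hypothetical helper, not the author's solution).
// f[i] is true iff the first i characters can be split into dictionary words.
// At every position it compares every word against the suffix t[i-L, i),
// so it costs O(|t| * sum of word lengths) and is only meant as a checker.
int naiveLongestPrefix(const std::vector<std::string>& dict, const std::string& t) {
    const int m = static_cast<int>(t.size());
    std::vector<bool> f(m + 1, false);
    f[0] = true;  // the empty prefix is always understandable
    int ans = 0;
    for (int i = 1; i <= m; ++i) {
        for (const std::string& w : dict) {
            const int L = static_cast<int>(w.size());
            // Transition f[i] <- f[i] || f[i-L] when w ends exactly at position i.
            if (L <= i && f[i - L] && t.compare(i - L, L, w) == 0) {
                f[i] = true;
                break;  // one valid split is enough
            }
        }
        if (f[i]) ans = i;
    }
    return ans;
}
```

With the illustrative dictionary {is, name, what, your}, it returns 14 for whatisyourname, 6 for whatisyouname, and 0 for whaisyourname.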

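Note that the dictionary is fixed while there are many articles, and each article must be scanned while querying all dictionary words that end at the current position. This is exactly the multi-pattern matching scenario, which naturally suggests the Aho-Corasick automaton.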

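Prerequisites

The Aho-Corasick automaton can be viewed as a multi-pattern matching tool built from "a trie + fail links". After inserting all dictionary words into a trie, scanning an article tells us, at each position, which words match there. Here we only use its matching results to drive the DP and do not explain how it works; see OI Wiki for the details.

Therefore, we only need to enumerate the lengths $L_k$ of all words ending at the current position and check whether $f_{i-L_k}$ is true.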
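Since the DP only consumes the lengths of words ending at each position, it can be written without any automaton at all. The following sketch assumes a hypothetical input layout where ends[i] holds those lengths for position i (as an Aho-Corasick scan would report them); dpFromEndLengths is an illustrative name, not part of the original solution.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: ends[i] lists the lengths of all dictionary words that
// end at position i (1-based; ends[0] is unused). Returns the length of the
// longest understandable prefix.
int dpFromEndLengths(const std::vector<std::vector<int>>& ends) {
    assert(!ends.empty());
    const int m = static_cast<int>(ends.size()) - 1;
    std::vector<bool> f(m + 1, false);
    f[0] = true;  // the empty prefix is understandable
    int ans = 0;
    for (int i = 1; i <= m; ++i) {
        for (int L : ends[i]) {
            assert(1 <= L && L <= i);
            if (f[i - L]) {  // a word of length L ends at i right after a valid prefix
                f[i] = true;
                ans = i;
                break;  // one valid length already decides f[i]
            }
        }
    }
    return ans;
}
```

For whatisyouname with the dictionary {is, name, what, your}, the lists are ends[4] = {4}, ends[6] = {2}, ends[13] = {4}, and the function returns 6.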

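Method 1: Enumerating match lengths along the fail chain

After building the automaton, we advance one step in it for each character of the article, reaching the node that corresponds to the current suffix of the article. Walking up the fail chain from that node then finds every word that ends at the current position.

Walking the fail chain directly does find every suffix match, but it passes through many nodes that are not word ends, and those nodes are useless for the DP transition. So we additionally maintain last: it points to the nearest word-end node on the fail chain, effectively skipping the unnecessary nodes.

Starting from the current node (if it is itself a word end) and then following last upward, we obtain, in order, the lengths $L_k$ of all words ending at the current position.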
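To see how last skips nodes that are not word ends, here is a compact array-based sketch. It is a swapped-in layout (the full solution in the Code section uses pointer-based nodes); AcSketch and endLengths are hypothetical names, and only lowercase letters are assumed.

```cpp
#include <array>
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Array-based Aho-Corasick sketch. Node 0 is the root.
// len[v] > 0 marks a word end; last[v] is the nearest word-end node strictly
// up the fail chain of v (0 means none).
struct AcSketch {
    std::vector<std::array<int, 26>> go{std::array<int, 26>{}};
    std::vector<int> fail{0}, last{0}, len{0};

    void insert(const std::string& w) {
        int v = 0;
        for (char ch : w) {
            int c = ch - 'a';
            if (!go[v][c]) {
                go[v][c] = static_cast<int>(go.size());
                go.push_back({});
                fail.push_back(0);
                last.push_back(0);
                len.push_back(0);
            }
            v = go[v][c];
        }
        len[v] = static_cast<int>(w.size());
    }

    void build() {
        std::queue<int> q;
        for (int c = 0; c < 26; ++c)
            if (go[0][c]) q.push(go[0][c]);  // depth-1 nodes keep fail = last = root
        while (!q.empty()) {
            int u = q.front();
            q.pop();
            for (int c = 0; c < 26; ++c) {
                int v = go[u][c];
                if (!v) {  // virtual edge: reuse the fail node's transition
                    go[u][c] = go[fail[u]][c];
                    continue;
                }
                fail[v] = go[fail[u]][c];
                last[v] = len[fail[v]] > 0 ? fail[v] : last[fail[v]];
                q.push(v);
            }
        }
    }

    // Lengths of all words ending at each position 1..|t| (index 0 unused).
    std::vector<std::vector<int>> endLengths(const std::string& t) const {
        std::vector<std::vector<int>> ends(t.size() + 1);
        int v = 0;
        for (size_t i = 1; i <= t.size(); ++i) {
            v = go[v][t[i - 1] - 'a'];
            // Start at v if it is a word end, then hop along last links only.
            for (int x = len[v] > 0 ? v : last[v]; x != 0; x = last[x])
                ends[i].push_back(len[x]);
        }
        return ends;
    }
};
```

For the dictionary {abcd, bcdx, d} and the text abcd, the fail link of node abcd is bcd, which is not a word end, so last jumps straight to d and endLengths reports {4, 1} at position 4.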

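Pruning

Even with last, there may still be many word lengths to enumerate, and this still times out on the strengthened test data of this problem. Two key prunings help:

1. Let $ans$ be the longest understandable length found so far, and let the current position be $i$. For $f_i$ to become true, we need a word ending at $i$ whose start lies at or before $ans$, i.e. a word of length at least $i-ans$; but dictionary words are at most $max\_len$ long. So once $i-ans>max\_len$, no word can bridge this gap and no later position can reconnect to a valid split, so we can stop processing this article immediately.
2. The problem only asks for the longest prefix, not the number of splits or the split itself. So when enumerating the lengths of words ending at the current position, we can stop as soon as one length makes $f_i$ true; checking the others cannot change the value of $f_i$.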

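Key prunings

1. If $i-ans>max\_len$, no word is long enough to bridge the current gap, so scanning of this article can stop.
2. Once a valid length $L$ with $f_{i-L}=\text{true}$ is found, $f_i=\text{true}$ follows, and there is no need to look for other $L$ along last.

Of these, 1. is optional, while 2. is required; without it the solution gets TLE.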

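Complexity analysis

Time complexity: $\mathcal{O}(S|\Sigma| + M\min(n,\max |s|))$, where $S$ is the number of trie nodes, $|\Sigma|=26$, and $M$ is the total length of all $m$ articles. At each position, the walk along last visits only nodes whose words end there. These words are all suffixes of the same prefix, so their lengths are pairwise distinct; hence there are at most $n$ of them and at most $\max |s|\le 20$.
Space complexity: $\mathcal{O}(S|\Sigma| + \max|t|)$, where the $\max|t|$ term comes from the DP array used while processing a single article.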

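Method 2: Bitmask (state-compressed) DP

The bottleneck of Method 1 is trying lengths one by one along the last chain, which costs $\mathcal{O}(\min(n,\max |s|))$ per position in the worst case.

Since the problem guarantees $|s|\le 20$, word lengths can only be $1$ through $20$, so the set of match lengths can be packed into a single integer $\text{matchMask}$: bit $L-1$ is $1$ iff a word of length $L$ ends at the current position.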
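The definition of matchMask can be checked directly with a brute-force helper. This is a hypothetical reference function (matchMaskAt is not from the original solution); it assumes word lengths of at most 20, as guaranteed, so the mask fits in an int.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical reference helper: the bitmask of word lengths ending at
// position i of t (1-based), computed straight from the definition.
// Bit L-1 is set iff some dictionary word of length L equals t[i-L, i).
int matchMaskAt(const std::vector<std::string>& dict, const std::string& t, int i) {
    int mask = 0;
    for (const std::string& w : dict) {
        const int L = static_cast<int>(w.size());
        assert(1 <= L && L <= 20);  // the problem guarantees 1 <= |s| <= 20
        if (L <= i && t.compare(i - L, L, w) == 0) mask |= 1 << (L - 1);
    }
    return mask;
}
```

For the dictionary {a, ab, b} and t = ab, the mask is 1 at position 1 and 3 (lengths 1 and 2) at position 2.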

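When building the automaton, maintain this bitmask for every node. A node's mask combines its own word length with all lengths inherited through its fail link, and all of this can be handled during the BFS. Afterwards, at any position of the scan, reading the current node's mask gives every usable match length in $\mathcal{O}(1)$.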
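The BFS update can be sketched with an array-based automaton as well. MaskAc and masks are hypothetical names, the layout differs from the pointer-based Node in the full solution, and word lengths are assumed to be between 1 and 20. The key line is the one that ORs in the fail node's mask; because BFS finishes shallower nodes first, that mask already covers the whole fail chain.

```cpp
#include <array>
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Array-based Aho-Corasick sketch where mask[v] has bit L-1 set iff a word of
// length L is a suffix of the string spelled by node v. Node 0 is the root.
struct MaskAc {
    std::vector<std::array<int, 26>> go{std::array<int, 26>{}};
    std::vector<int> fail{0}, mask{0};

    void insert(const std::string& w) {
        assert(!w.empty() && w.size() <= 20);
        int v = 0;
        for (char ch : w) {
            int c = ch - 'a';
            if (!go[v][c]) {
                go[v][c] = static_cast<int>(go.size());
                go.push_back({});
                fail.push_back(0);
                mask.push_back(0);
            }
            v = go[v][c];
        }
        mask[v] |= 1 << (w.size() - 1);  // the node's own word length
    }

    void build() {
        std::queue<int> q;
        for (int c = 0; c < 26; ++c)
            if (go[0][c]) q.push(go[0][c]);  // depth-1 nodes: fail = root, mask = own bit
        while (!q.empty()) {
            int u = q.front();
            q.pop();
            for (int c = 0; c < 26; ++c) {
                int v = go[u][c];
                if (!v) {  // virtual edge: reuse the fail node's transition
                    go[u][c] = go[fail[u]][c];
                    continue;
                }
                fail[v] = go[fail[u]][c];
                mask[v] |= mask[fail[v]];  // inherit every length along the fail chain
                q.push(v);
            }
        }
    }

    // matchMask after reading each character of t (index 0 unused).
    std::vector<int> masks(const std::string& t) const {
        std::vector<int> out(t.size() + 1, 0);
        int v = 0;
        for (size_t i = 1; i <= t.size(); ++i) {
            v = go[v][t[i - 1] - 'a'];
            out[i] = mask[v];
        }
        return out;
    }
};
```

With {abcd, bcdx, d}, node abcd gets its own bit 3 plus bit 0 inherited through bcd from d, so masks("abcd")[4] is 9.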

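Likewise, the DP history only needs the most recent $\max |s|$ entries, so it can be packed into another integer $\text{stateMask}$: bit $L-1$ is $1$ iff the prefix ending $L$ characters before the current position is understandable, i.e. $f_{i-L}=1$.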

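So the transition

$$\exists L:\quad \text{a word of length } L \text{ ends at the current position and } f_{i-L}=1$$

becomes an intersection of two integers:

$$\text{matchMask} \ \&\ \text{stateMask} \ne 0$$

If the result is nonzero, some word can be appended to a valid prefix, so the current prefix is understandable.

After processing the current position, shift the state mask left by one bit: every stored prefix is now one character farther from the next position. If the current prefix is valid, set the lowest bit to $1$. Keep only the lowest $\max |s|$ bits, since states farther back than that will never be used again.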
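Separated from the automaton, the shift-and-intersect loop fits in a few lines. This sketch assumes the per-position matchMask values are already available (for example, read off the automaton during the scan); bitmaskDp is a hypothetical name, and maxLen is assumed to be at most 20.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper isolating the state-compressed DP.
// matchMask[i] (1-based, index 0 unused) has bit L-1 set iff a word of length
// L ends at position i; maxLen is the longest word length.
// While processing position i, bit k of stateMask means f[i-1-k] is true,
// so bit L-1 lines up with f[i-L].
int bitmaskDp(const std::vector<int>& matchMask, int maxLen) {
    assert(1 <= maxLen && maxLen <= 20 && !matchMask.empty());
    const int keep = (1 << maxLen) - 1;  // only the last maxLen states matter
    int stateMask = 1;                   // bit 0 = f[0] = true
    int ans = 0;
    for (int i = 1; i < static_cast<int>(matchMask.size()); ++i) {
        const bool ok = (matchMask[i] & stateMask) != 0;  // this is f[i]
        if (ok) ans = i;
        // Every stored prefix moves one step farther away; f[i] enters at bit 0.
        stateMask = ((stateMask << 1) & keep) | (ok ? 1 : 0);
    }
    return ans;
}
```

For whatisyouname with {is, name, what, your}, the nonzero masks are 8 at position 4, 2 at position 6, and 8 at position 13, and the function returns 6: by position 13 the state bit for $f_9$ is 0.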

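Core transformation

Replace "enumerate word lengths" in the DP transition with "intersect length sets". The automaton side packs the match lengths into one mask, the DP side packs the recent reachable states into another, and a single bitwise operation completes the transition.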

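Complexity analysis

Time complexity: building the automaton takes $\mathcal{O}(S|\Sigma|)$; each article needs only a linear scan, $\mathcal{O}(|t|)$, for $\mathcal{O}(M)$ over all articles.
Space complexity: $\mathcal{O}(S|\Sigma| + \max|t|)$. The DP state is packed into a constant number of integers, so no array proportional to the article length is needed, but this implementation still stores the input string.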

Code

Method 1: Enumerating match lengths along the last (output) links

```cpp
#include <bits/stdc++.h>
using namespace std;
const int ALPH = 26;
#define endl '\n'

struct Node {
    array<Node*, ALPH> child;
    Node *fail, *last;
    int length;  // length of the word ending at this node, 0 if none
    Node() : fail(nullptr), last(nullptr), length(0) {
        fill(child.begin(), child.end(), nullptr);
    }
};

class AhoCorasick {
public:
    Node* root;

    AhoCorasick() {
        root = new Node();
    }

    void insert(const string& word) {
        Node* node = root;
        for (char ch : word) {
            int idx = ch - 'a';
            if (node->child[idx] == nullptr) node->child[idx] = new Node();
            node = node->child[idx];
        }
        node->length = word.length();
    }

    void build() {
        root->fail = root->last = root;
        // BFS
        queue<Node*> q;
        for (int i = 0; i < ALPH; ++i) {
            if (root->child[i] == nullptr) {
                // Add a virtual edge: a missing child of the root loops back to the root
                root->child[i] = root;
            } else {
                root->child[i]->fail = root->child[i]->last = root;
                q.push(root->child[i]);
            }
        }
        while (!q.empty()) {
            Node* u = q.front();
            q.pop();
            for (int i = 0; i < ALPH; ++i) {
                Node* v = u->child[i];
                if (v == nullptr) {
                    // Add a virtual edge: reuse the fail node's transition
                    u->child[i] = u->fail->child[i];
                } else {
                    // Where to continue after a mismatch
                    v->fail = u->fail->child[i];
                    // last always points to a node where some word ends (or to the root)
                    v->last = (v->fail->length > 0) ? v->fail : v->fail->last;
                    q.push(v);
                }
            }
        }
    }
};

void solve() {
    int n, q;
    cin >> n >> q;

    AhoCorasick ac;
    string pattern, t;
    int max_len = 0;
    for (int i = 0; i < n; ++i) {
        cin >> pattern;
        ac.insert(pattern);
        max_len = max(max_len, (int)pattern.length());
    }
    ac.build();

    while (q--) {
        cin >> t;
        int m = t.length();

        int ans = 0;
        vector<bool> f(m + 1, false);
        f[0] = true;

        Node* node = ac.root;
        for (int i = 1; i <= m; ++i) {
            // Pruning: reaching i needs a word of length at least i - ans,
            // but no word is longer than max_len
            if (i - ans > max_len) break;

            node = node->child[t[i - 1] - 'a'];

            // No word prefix matches a suffix of t[1..i], so no longer prefix
            // can be understood either
            if (node == ac.root) break;

            // Walk up along last; if node itself is not a word end, its length
            // is 0 and the first step is a harmless no-op
            Node* temp = node;
            while (temp != ac.root) {
                f[i] = f[i] || f[i - temp->length];
                // Pruning: once f[i] holds, stop searching; this is what avoids TLE
                if (f[i]) {
                    ans = i;  // update the answer
                    break;
                }
                temp = temp->last;
            }
        }
        cout << ans << endl;
    }
    return;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
    solve();
    return 0;
}
```

Method 2: Bitmask (state-compressed) DP

```cpp
#include <bits/stdc++.h>
using namespace std;
const int ALPH = 26;
#define endl '\n'

struct Node {
    array<Node*, ALPH> child;
    Node *fail, *last;
    int length, mask;  // mask: bit L-1 is set iff a word of length L ends here

    Node() : fail(nullptr), last(nullptr), length(0), mask(0) {
        fill(child.begin(), child.end(), nullptr);
    }
};

class AhoCorasick {
public:
    Node* root;

    AhoCorasick() {
        root = new Node();
    }

    void insert(const string& word) {
        Node* node = root;
        for (char ch : word) {
            int c = ch - 'a';
            if (node->child[c] == nullptr) {
                node->child[c] = new Node();
            }
            node = node->child[c];
        }
        node->length = word.length();
    }

    void build() {
        root->fail = root->last = root;
        root->mask = 0;
        // BFS
        queue<Node*> q;
        for (int i = 0; i < ALPH; ++i) {
            if (root->child[i] == nullptr) {
                // Add a virtual edge: a missing child of the root loops back to the root
                root->child[i] = root;
            } else {
                Node* v = root->child[i];
                v->fail = v->last = root;
                // Maintain mask: a depth-1 node can only match its own word
                if (v->length > 0) {
                    v->mask |= 1 << (v->length - 1);
                }
                q.push(v);
            }
        }

        while (!q.empty()) {
            Node* u = q.front();
            q.pop();
            for (int i = 0; i < ALPH; ++i) {
                Node* v = u->child[i];
                if (v == nullptr) {
                    // Add a virtual edge: reuse the fail node's transition
                    u->child[i] = u->fail->child[i];
                } else {
                    // Where to continue after a mismatch
                    v->fail = u->fail->child[i];
                    // last points to the nearest word-end node (not needed by the bitmask DP)
                    v->last = (v->fail->length > 0) ? v->fail : v->fail->last;
                    // Inherit every match length of the fail node, then add this node's own length
                    v->mask = v->fail->mask;
                    if (v->length > 0) {
                        v->mask |= 1 << (v->length - 1);
                    }
                    q.push(v);
                }
            }
        }
    }
};

void solve() {
    int n, q;
    cin >> n >> q;

    AhoCorasick ac;
    string pattern, t;
    int max_len = 0;
    for (int i = 0; i < n; ++i) {
        cin >> pattern;
        ac.insert(pattern);
        max_len = max(max_len, (int)pattern.length());
    }

    ac.build();

    // Keep only bits 0 .. max_len - 1
    int U = (1 << max_len) - 1;

    while (q--) {
        cin >> t;
        int m = t.length();

        int ans = 0;
        Node* node = ac.root;

        // Before processing position i, bit k of f means f[i-1-k] is true,
        // so bit L-1 lines up with a word of length L.
        // Processing i = 1 needs f[0], so start with bit 0 set
        int f = 1;
        for (int i = 1; i <= m; ++i) {
            // Pruning: reaching i needs a word of length at least i - ans,
            // but no word is longer than max_len
            if (i - ans > max_len) break;

            node = node->child[t[i - 1] - 'a'];

            // No word prefix matches a suffix of t[1..i], so no longer prefix
            // can be understood either
            if (node == ac.root) break;

            bool ok = (node->mask & f) != 0;
            if (ok) {
                ans = i;
            }

            // Update f: every state moves one step back, and f[i] enters at bit 0
            f = ((f << 1) & U) | (ok ? 1 : 0);
        }
        cout << ans << endl;
    }
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
    solve();
    return 0;
}
```

Final Words

Cover Image Credit

The cover image was created by @真白. All rights belong to the original artist.

It is used here only as a non-commercial cover illustration for this note. I do not claim ownership of the artwork.

If you are the copyright holder and believe this usage is inappropriate, please contact me by email or leave a comment. I will remove the image promptly.