🔗 🟡 2707. Extra Characters in a String 1736

tags: `Biweekly Contest 105` `陣列(Array)` `字串(String)` `動態規劃(Dynamic Programming)` `記憶化搜尋(Memoization)` `雜湊表(Hash Table)` `字典樹(Trie)`

題意

給定一個下標從 $0$ 開始的字串 $s$ 和一個單字字典 $dictionary$ 。你需要將 $s$ 分割成一個或多個不重疊的子字串，每個子字串均存在於字典中。但 $s$ 中可能會有一些額外的字元未出現在任何子字串中。

返回在最佳分割 $s$ 的情況下，剩餘的最少額外字元數。

思路：動態規劃(Dynamic Programming) + 雜湊表(Hash Table) / 字典樹(Trie)

方法一：動態規劃(Dynamic Programming) + 雜湊表(Hash Table)

首先定義問題，令 $dfs(i)$ 表示從第 $i$ 個字元開始，所得到的最少獨立字元數，即子字串 $s[i:]$ 中，需要添加的最少額外字元數量。由於可以將問題分解成重複的子問題，因此可以使用動態規劃(Dynamic Programming) 來解決。

根據我們對問題的定義，則 $dfs(i)$ 可以從以下兩種來源中轉移，從中取最小值：

若選擇 $s[i]$ 作為獨立字元，則 $dfs(i) = dfs(i+1) + 1$
枚舉 $j \in [i, n-1]$ ，若 $s[i:j+1]$ 在字典中，則 $dfs(i) = \displaystyle\min_{\substack{j \in [i, n-1] \\ s[i:j+1] \in dictionary}} \left(dfs(j+1)\right)$

這裡使用記憶化搜尋(Memoization) 來實現動態規劃。

而檢查子字串 $s[i:j+1]$ 是否在字典中，可以使用雜湊表(Hash Table) 保存字典中的單字後來查找。

複雜度分析

時間複雜度： $\mathcal{O}(n^3 + L)$ $O (n^{3} + L)$ ，其中 $L$ $L$ 為字典中所有單字的總長度， $n$ $n$ 為字串 $s$ $s$ 的長度。
- 預處理字典中的單字，需要 $O(L)$ 的時間複雜度。
- 總共有 $n$ 個位置(狀態)，對於每個狀態需要枚舉 $O(n)$ 個可能的子字串，每個子字串需要 $O(n)$ 的時間複雜度來檢查是否在字典中。
空間複雜度： $\mathcal{O}(n + L)$ $O (n + L)$
- 需要 $O(L)$ 的空間複雜度來保存字典中的單字。
- 總共有 $n$ 個狀態，需要 $O(n)$ 的空間複雜度來保存每個狀態的結果。

class Solution:
    def minExtraChar(self, s: str, dictionary: List[str]) -> int:
        n = len(s)
        words = set(dictionary)
        
        # 從第 i 個位置開始，所得到的獨立字元數 
        @cache # Memoization
        def dfs(i: int):
            if i >= n:
                return 0
            res = dfs(i + 1) + 1 # 不選，即第 i 個字元為獨立字元的情況
            for j in range(i, n): # 枚舉以第 i 個字元開頭的字
                if s[i:j+1] in words: # 如果該字串在字典中
                    res = min(res, dfs(j+1)) # 選擇該字串，並遞迴計算剩餘字串的獨立字元數
            return res
        return dfs(0)

class Solution {
public:
    int minExtraChar(string s, vector<string>& dictionary) {
        int n = s.size();
        unordered_set<string> words(dictionary.begin(), dictionary.end());
        vector<int> memo(n, -1);
        function<int(int)> dfs = [&](int i) -> int {
            if (i >= n) return 0;
            int& res = memo[i];
            if (res != -1) return res;
            res = dfs(i + 1) + 1; // 不選，即第 i 個字元為獨立字元的情況
            for (int j = i; j < n; ++j) {
                if (words.count(s.substr(i, j - i + 1))) {
                    res = min(res, dfs(j + 1));
                }
            }
            return res;
        };
        return dfs(0);
    }
};

方法二：動態規劃(Dynamic Programming) + 字典樹(Trie)

在方法一中，我們使用雜湊表(Hash Table)來檢查子字串是否在字典中。然而，雜湊表的查找一個單字的時間複雜度為 $\mathcal{O}(x)$ ，其中 $x$ 為子字串的長度，且當我們從 $s[i:j]$ 轉移到 $s[i:j+1]$ 時，又需要重新計算子字串的雜湊值。故可以使用字典樹(Trie) 來優化查找的時間複雜度。

為此我們首先要創建一個字典樹，將字典中的單字插入到字典樹中，並標記單詞的結尾。在這裡不多贅述字典樹的實現，可以參考程式碼或是自行查詢相關資料。

在轉移時，我們從 $s[i]$ 開始，在字典樹中查找匹配的字串。如果當前字元不在字典樹的子節點中，則停止繼續查找。若找到一個在字典中的單字，即 node.is_end == true ，則可以更新結果。此外，若當前字元 $s[j]$ 不在字典樹的子節點中，則不用繼續往下找，可以直接跳出迴圈。

除了檢查子字串是否在字典中的方式，其餘部分與方法一大致相同。

複雜度分析

時間複雜度： $\mathcal{O}(n^2 + L)$ $O (n^{2} + L)$ ，其中 $L$ $L$ 為字典中所有單字的總長度， $n$ $n$ 為字串 $s$ $s$ 的長度。
- 構建字典樹需要 $O(L)$ 的時間複雜度。
- 總共有 $n$ 個位置(狀態)，對於每個狀態，需要 $O(n)$ 的時間複雜度來查找所有子字串是否在字典中。
空間複雜度： $\mathcal{O}(n + |\Sigma|L)$ $O (n + ∣Σ∣ L)$ ，其中 $|\Sigma|$ $∣Σ∣$ 為字典樹中所有可能的字元數，本題中為 $26$ $26$ 。
- 需要 $O(L)$ 的空間複雜度來保存字典樹中的單字作為節點，而每個節點需要 $|\Sigma|$ 的空間來保存指針。如果使用雜湊表來保存子節點，則需要 $O(L)$ 的空間複雜度。
- 總共有 $n$ 個狀態，故總共需要 $O(n)$ 的空間複雜度來保存每個狀態的結果。

class TrieNode:
    def __init__(self) -> None:
        self.child = {}
        self.is_end = False

class Solution:
    def minExtraChar(self, s: str, dictionary: List[str]) -> int:
        n = len(s)

        root = TrieNode()
        for word in dictionary:
            node = root
            for ch in word:
                if ch not in node.child:
                    node.child[ch] = TrieNode()
                node = node.child[ch]
            node.is_end = True
        
        @cache
        def dfs(i: int):
            if i >= n:
                return 0
            res = dfs(i + 1) + 1 # 不選，即第 i 個字元為獨立字元的情況
            node = root
            for j in range(i, n):
                ch = s[j]
                if ch not in node.child: # 當前字元不在字典樹中，不用繼續往下找
                    break
                node = node.child[ch]
                if node.is_end: # 選，s[i:j+1] 在字典中
                    res = min(res, dfs(j + 1))
            return res
        return dfs(0)

class TrieNode {
public:
    vector<TrieNode*> child;
    bool is_end;
    TrieNode() {
        child = vector<TrieNode*>(26, nullptr);
        is_end = false;
    }
};

class Solution {
public:
    int minExtraChar(string s, vector<string>& dictionary) {
        int n = s.size();

        TrieNode* root = new TrieNode();
        for (const string& word : dictionary) {
            TrieNode* node = root;
            for (char ch : word) {
                if (node->child[ch - 'a'] == nullptr) {
                    node->child[ch - 'a'] = new TrieNode();
                }
                node = node->child[ch - 'a'];
            }
            node->is_end = true;
        }

        vector<int> memo(n, -1);
        function<int(int)> dfs = [&](int i) -> int {
            if (i >= n) return 0;
            int& res = memo[i];
            if (res != -1) return res;
            res = dfs(i + 1) + 1; // 不選，即第 i 個字元為獨立字元的情況
            TrieNode* node = root;
            for (int j = i; j < n; ++j) {
                if (node->child[s[j] - 'a'] == nullptr) break; // 當前字元不在字典樹中，不用繼續往下找
                node = node->child[s[j] - 'a'];
                if (node->is_end) res = min(res, dfs(j + 1)); // 選，s[i:j+1] 在字典中
            }
            return res;
        };
        return dfs(0);
    }
};