🔗 🟡 648. Replace Words

tags: `雜湊表(Hash Table)` `字典樹(Trie)` `字串(String)` `分組循環`

題意

在英語中，我們有一個叫做「字根(root word)」的概念，可以在其後面加上一些其他詞來形成另一個更長的詞 - 我們稱這個詞為「衍生字(derivative)」。例如，當字根 “help” 後面加上衍生字 “ful” 時，我們可以形成一個新詞 “helpful”。

給定一個包含許多字根的字典 $dictionary$ 和一個由空格分隔的句子 $sentence$ ，將句子中所有的衍生字替換為形成它的字根。如果一個衍生字可以被多個字根替換，則用最短的字根進行替換。

返回替換後的句子。

思路

首先要將句子以空格分隔成多個單字，在 Python 中可以直接使用 split(" ")，在 C++ 中可以使用分組循環的方式逐一處理。

對於每個單字 $word$ ，由於我們要找到其最短的字根，因此問題其實就是找到 $word$ 在 $dictionary$ 中的最小前綴(Prefix) 。但需要注意， $word$ 的所有前綴可能都不在 $dictionary$ 中，這時我們就要保留原單字。

方法一：雜湊表(Hash Table)

為了找到 $word$ 在 $dictionary$ 中最小前綴字根，可以將 $dictionary$ 轉換為一個集合(Hash Set)，這樣可以快速查找字根。之後對於 $word$ 的每個前綴 $word[:i]$ ，檢查其是否在字根集合中即可，如果找到匹配的字根，則用該字根替換該單詞；如果找不到，則保留原單詞。

具體步驟如下：

建立字根集合：將字根字典轉換為一個集合(Hash Set)，以便能夠快速查找字根。
處理每個單詞：將句子按空格分割成單詞列表，對於每個單詞，逐一檢查其前綴(Prefix) 是否存在於字根集合中。
替換字根：如果找到匹配的字根，則用該字根替換該單詞；如果找不到，則保留原單詞。
構建結果句子：將處理後的單詞重新組合成一個句子並返回。

複雜度分析

時間複雜度： $\mathcal{O}(d + n \cdot m)$ $O (d + n \cdot m)$ ，其中 $d$ $d$ 是 $dictionary$ $d i c t i o na ry$ 的字元總數， $n$ $n$ 是句子中的單字數量， $m$ $m$ 是每個單字的最大長度。
- 構建雜湊表的時間複雜度為 $\mathcal{O}(d)$ 。
- 在最壞情況下，我們需要對每個單字檢查其所有前綴，因此時間複雜度為 $\mathcal{O}(n \cdot m)$ 。不過嚴格來說應該是 $\displaystyle\sum_{w_i \isin \rm{words}} w_i^2$ ，其中 $w_i$ 是第 $i$ 個單字的長度。
空間複雜度： $\mathcal{O}(d + s)$ $O (d + s)$ ，其中 $s$ $s$ 為字串 $sentence$ $se n t e n ce$ 的長度
- 我們需要額外的空間來存儲字根集合。
- 分割後的單字列表需要 $\mathcal{O}(s)$ 的空間。

class Solution:
    def replaceWords(self, dictionary: List[str], sentence: str) -> str:
        d = set(dictionary) # hash set
        ans = []
        for word in sentence.split(" "):
            for i in range(1, len(word)+1):
                if word[:i] in d:
                    ans.append(word[:i])
                    break
            else:
                ans.append(word)
        return " ".join(ans)

class Solution {
public:
    string replaceWords(vector<string>& dictionary, string sentence) {
        int n = sentence.size(), i = 0, st;
        unordered_set<string> dict;
        for (auto& word : dictionary) dict.insert(word);
        string ans, word;
        // 分組循環
        while (i < n) {
            st = i;
            while (i < n && sentence[i] != ' ') i++;
            word = sentence.substr(st, i - st);
            i++;
            // Process word
            for (int j = 1; j <= word.size(); j++) {
                if (dict.count(word.substr(0, j))) { // find prefix
                    word = word.substr(0, j);
                    break;
                }
            }
            if (ans.empty()) ans = word;
            else ans += " " + word;
        }
        return ans;
    }
};

方法二：字典樹(Trie)

但在方法一的過程中，對於每個單詞，我們需要對其所有前綴進行查找，這樣的時間複雜度是 $\mathcal{O}(n \times m)$ ，其中 $n$ 是句子中的單字數量， $m$ 是每個單字的最大長度。這樣的效率並不高。

那有沒有更好的方法呢？答案是有的，我們可以使用來字典樹(Trie) 優化查找過程。字典樹是一種樹形數據結構，用於存儲字符串集合，通常用於查找單詞的前綴。在這裡，我們可以將字典中的所有字根插入到字典樹中，這樣就可以快速查找單詞的前綴。

在方法一中若要查詢單字 $\rm{cat}$ 的所有前綴是否在字典中，我們需要對其所有前綴 $\rm{c}$ 、 $\rm{ca}$ 、 $\rm{cat}$ 進行查找，時間複雜度為 $\mathcal{O}(w^2)$ ，其中 $w$ 是單字的長度；而在字典樹中，我們只需要對其逐個字元進行查找即可，這樣的時間複雜度是 $\mathcal{O}(w)$ 。

具體步驟如下

構建字典樹：將字根字典中的字根插入到字典樹(Trie)中，以便能夠快速檢查單詞的前綴。
- 對於每個節點，我們需要一個長度為 $26$ 的子節點列表 $child$ ，以及一個標記是否為結尾的變數 $isEnd$ 。
處理每個單詞：將句子按空格分割成單詞列表，對於每個單詞，通過字典樹檢查其前綴是否存在於字根集合中。
替換字根：如果找到匹配的字根，則用該字根替換該單詞；如果找不到，則保留原單詞。
- 找不到字根的情況有兩種：
  - 沒有前綴，即字典樹中沒有以該單詞為前綴的字根。
  - 有前綴，但沒有以該單詞為結尾的字根。
構建結果句子：將處理後的單詞重新組合成一個句子並返回。

複雜度分析

時間複雜度： $\mathcal{O}(d + s)$ $O (d + s)$ ，其中 $d$ $d$ 是字典中所有字根的字元總數， $s$ $s$ 是字串 $sentence$ $se n t e n ce$ 的長度
- 構建字典樹的時間複雜度為 $\mathcal{O}(d)$ 。
- 在最壞情況下，我們需要對每個單字檢查其所有前綴，因此時間複雜度為 $\mathcal{O}(s)$ 。
空間複雜度： $\mathcal{O}(d + s)$ $O (d + s)$ 。
- 存儲字典樹需要 $\mathcal{O}(d)$ 的空間。
- 分割後的單字列表需要 $\mathcal{O}(s)$ 的空間。

class Node: # Trie
    def __init__(self):
        self.child = [None] * 26
        self.isEnd = False
    
class Solution:
    def replaceWords(self, dictionary: List[str], sentence: str) -> str:
        root = Node()
        for word in dictionary:
            cur = root
            for ch in word:
                idx = ord(ch) - ord("a")
                if cur.child[idx] is None:
                    cur.child[idx] = Node()
                cur = cur.child[idx]
            cur.isEnd = True
        ans = []
        for word in sentence.split(" "):
            cur = root
            for i, ch in enumerate(word):
                idx = ord(ch) - ord("a")
                cur = cur.child[idx]
                if cur is None: # no prefix, case 1
                    ans.append(word)
                    break
                if cur.isEnd: # find prefix
                    ans.append(word[:i+1])
                    break
            else: # no prefix, case 2
                ans.append(word)
        return " ".join(ans)

class Solution2 {
public:
    string replaceWords(vector<string>& dictionary, string sentence) {
        Node *root = new Node();
        for (auto& word : dictionary) {
            Node* cur = root;
            for (auto& ch : word) {
                if (!cur->child[ch - 'a']) cur->child[ch - 'a'] = new Node();
                cur = cur->child[ch - 'a'];
            }
            cur->isEnd = true;
        }
        int n = sentence.size(), i = 0, st;
        string ans, word;
        while (i < n) {
            st = i;
            while (i < n && sentence[i] != ' ') i++;
            word = sentence.substr(st, i - st);
            i++;
            // Process word
            Node* cur = root;
            for (int j = 0; j < word.size(); j++) {
                if (!cur->child[word[j] - 'a']) break;
                cur = cur->child[word[j] - 'a'];
                if (cur->isEnd) {
                    word = word.substr(0, j + 1);
                    break;
                }
            }
            if (ans.empty()) ans = word;
            else ans += " " + word;
        }
        return ans;
    }
};