208 Implement Trie (Prefix Tree)

4 minute read

208 Implement Trie (Prefix Tree)

문제

A trie (pronounced as “try”) or prefix tree is a tree data structure used to efficiently store and retrieve keys in a dataset of strings.
There are various applications of this data structure, such as autocomplete and spellchecker.

  • 트라이(trie) 또는 prefix tree: 문자열 데이터 집합에서 키를 효율적으로 저장하고 조회하는 데 사용하는 트리 데이터 구조
  • 자동 완성 또는 맞춤법 검사 등에서 사용

Implement the Trie class:

  • Trie() Initializes the trie object.
  • void insert(String word) Inserts the string word into the trie.
  • boolean search(String word) Returns true if the string word is in the trie (i.e., was inserted before), and false otherwise.
  • boolean startsWith(String prefix) Returns true if there is a previously inserted string word that has the prefix prefix, and false otherwise.
  • Trie(): 트라이 오브젝트 초기화
  • insert(String word): word를 삽입
  • search(String word): 트라이에 word 존재하면 true 반환, 없으면 false 반환
  • startsWith(String prefix): prefix 가지는 word 존재하면 true, 없으면 false

조건

  • 1 <= word.length, prefix.length <= 2000
  • word and prefix consist only of lowercase English letters.
  • At most 3 * 10^4 calls in total will be made to insert, search, and startsWith.

예제

Input
["Trie", "insert", "search", "search", "startsWith", "insert", "search"]
[[], ["apple"], ["apple"], ["app"], ["app"], ["app"], ["app"]]
Output
[null, null, true, false, true, null, true]

Explanation
Trie trie = new Trie();
trie.insert("apple");
trie.search("apple");   // return True
trie.search("app");     // return False
trie.startsWith("app"); // return True
trie.insert("app");
trie.search("app");     // return True

해결

1st

1 생각

  • insert, search는 dict 쓰면 금방 될 거 같다
  • 그런데 startsWith는 어떻게 해야 할까? 떠오르는 생각을 나열해보자
    1. 딕셔너리 key를 모두 순회하면서 prefix로 시작하는 문자열 있는지 찾는다
    2. prefix별로 단어들이 관리하고, prefix에 해당하는 단어들을 한번에 반환
      • 그런데 이러면, 어디서부터 어디까지를 prefix로 볼 것인지가 문제일 거 같다
  • 컴퓨터를 과소평가 한 거 같다. 더 이상 생각이 나지 않아 책을 참고라니, 모든 문자를 노드로 봐서 노드를 연결하여 단어를 만든다.

1 코드

import sys
sys.path.insert(0, '.')
import tracemalloc
import collections
from pai_util.trace import display_top

class TrieNode:
    def __init__(self):
        self.word = False   # 한 단어가 완성됨을 의미 
        self.children = {}  # 연결되는 다음 문자

class Trie:
    def __init__(self):
        """
        Initialize your data structure here.
        """
        self.root = TrieNode()
        self.word_map = collections.defaultdict(bool)


    def insert(self, word: str) -> None:
        """
        Inserts a word into the trie.
        """
        node = self.root
        for char in word:
            if char not in node.children: 
                node.children[char] = TrieNode()
            node = node.children[char] # node.children[char].children[char]...로 연결된다
        node.word = True
        self.word_map[word] = True

    '''
    Runtime: 176 ms, faster than 64.30% of Python3 online submissions for Implement Trie (Prefix Tree).
    Memory Usage: 32.2 MB, less than 37.82% of Python3 online submissions for Implement Trie (Prefix Tree).
    '''
    def search(self, word: str) -> bool:
        """
        Returns if the word is in the trie.
        """
        node = self.root
        for char in word:
            if char not in node.children:
                return False
            node = node.children[char]
        
        return node.word


    '''
    Runtime: 160 ms, faster than 74.00% of Python3 online submissions for Implement Trie (Prefix Tree).
    Memory Usage: 32.2 MB, less than 37.82% of Python3 online submissions for Implement Trie (Prefix Tree).
    '''
    def search2(self, word: str) -> bool:
        """
        Returns if the word is in the trie.
        """
        if word in self.word_map:
            return True
        else:
            return False


    def startsWith(self, prefix: str) -> bool:
        """
        Returns if there is any word in the trie that starts with the given prefix.
        """
        node = self.root
        for char in prefix:
            if char not in node.children:
                return False
            node = node.children[char]
        
        return True

1 결과

Runtime: 168 ms, faster than 71.19% of Python3 online submissions for Implement Trie (Prefix Tree).
Memory Usage: 32.4 MB, less than 36.61% of Python3 online submissions for Implement Trie (Prefix Tree).

1 회고

  • 문자열을 문자로 쪼개서 각각 class의 인스턴스로 만든다? 처음에는 낭비라고 생각을 했다
  • 왜 낭비라고 생각했을까???
    • “apple”만 보면 char[] 배열이고, char 바이트의 연속일 텐데, 굳이 이 char 바이트를 TrieNode 같은 인스턴스로 만드는 게 메모리 낭비로 느껴졌다
    • 근데 전체 테스트 케이스 호출이 3 * 10^4(=30,000) 있었고 결과의 메모리 사용량을 보면 32.4 MB를 사용했는데, 32.4 MB = 32,400,000 byte, 이를 단순히 30,000으로 나누면 1회당 1,080 byte 사용했는데, 이 정도면 요즘 컴퓨터에서는 그렇게 낭비 아니지 않을까? 4k 영상도 핸드폰으로 보는 시대인데?
  • 메모레 체크상위 10개를 표시해 보면 2,000자의 문자열이라도 그렇게 많은 메모리를 차지하지 않는다
tracemalloc.start()
# 1 <= word.length <= 2000
word = 'asdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxgtyeuyriwpasdksdklfnsdmvncxmvnjdgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuef'
obj = Trie()
print(len(word))
obj.insert(word)
print(obj.search(word))
prefix1 = 'asdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwf'
print(obj.startsWith(prefix1))
prefix2 = 'asdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefh'
print(obj.startsWith(prefix2))
prefix3 = 'asdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxgtyeuyriwpasdksdklfnsdmvncxmvnjdgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviue'
print(obj.startsWith(prefix3))
word2 = 'jnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfiddsdfsdfsdfsdfiddsdfsdfidjdksdndvlndslfwoiefhowfcslmnvdslfwoiefhowfcslmnvkjsdhowfcslmnvkjsdbfkjcslmnvkjsdbfkjsdbfjsdhfiuwefgquyvrgquyvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhtxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqasdfsddsvrtgbdidddddytjnhgdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqghasgfsagdstxfzxgcvgquyrwfeutyqwreuywdksdklfnsdmvncxmvnjdgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuef'
print(len(word2))
obj.insert(word2)
word3 = 'dewoidhdslknforhgferiubgvkjfbvkasdjdbfkjasdfjksdbfkjsabhfuiweihfiuqwegfihbvhsbvmnxcbncnndhjdhxcycywioqwiskkdfbnsdkjhfreifgiwreuhfdsjbvckcjbvkjsdfkujshfiuwegfiweufedjihfdskjfnsdkjbfviuerhbvyrefbvifdhbsdkjfbvksdubfiweufhiewuhasdjkcaskjdqiuqwiqwouwiudhwiudbsutcdstytdtcrsfytcrsadtycfasdgcvsd'
print(len(word3))
obj.insert(word3)
prefix4 = 'dewoidhdslknforhgferiubgvkjfbvkasdjdbfkjasdfjksdbfkjsabhfuiweihfiuqwegfihbvhsbvmnxcbncnndhjdhxcycywioqwiskkdfbnsdkjhfreifgiwreuhfdsjbvckcjbvkjsdfkujshfiuwegfiweufedjihfdskjfnsdkjbfviuerhbvyrefbvifdhbsdkjfbvksdubfiweufhiewuhasdjkcaskjdqiuqwiqwouwiudhwiudbsutcdstytdtcrsfytcrsadtycfasdcxvdsfdv'
print(obj.startsWith(prefix4))
prefix5 = 'dewoidhdslknforhgferiubgvkjfbvkasdjdbfkjasdfjksdbfkjsabhfuiweihfiuqwegfihbvhsbvmnxcbncnndhjdhxcycywioqwiskkdfbnsdkjhfreifgiwreuhfdsjbvckcjbvkjsdfkujshfiuwegfiweufedjihfdskjfnsdkjbfviuerhbvyrefbvifdhbsdkjfbvksdubfiweufhiewuhasdjkcaskjdqiuqwiqwouwiudhwiudbsutcdstytdtcrsfytcrsadtycfasd'
print(obj.startsWith(prefix5))
current, peak = tracemalloc.get_traced_memory()
print(f"Current memory usage is {current / 10**6}MB; Peak was {peak / 10**6}MB")

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('traceback')
""" print("[ Top 10 ]")
for stat in top_stats[:10]:
    print('메모리 블록이 할당된 곳:', stat.traceback, '/ 메모리 블록 수(int):', stat.count, '/ 총 메모리 블록의 바이트 단위 크기 (int):', stat.size) """
display_top(snapshot)
tracemalloc.stop()

'''
2000
True
True
True
True
1871
288
False
True
Current memory usage is 1.600048MB; Peak was 1.600374MB
Top 10 lines
#1: 208.py:28: 877.4 KiB
    node.children[char] = TrieNode()
#2: 208.py:9: 422.4 KiB
    self.word = False   # 한 단어가 완성됨을 의미
#3: 208.py:10: 259.9 KiB
    self.children = {}  # 연결되는 다음 문자
#4: 208.py:95: 1.1 KiB
    word3 = 'dewoidhdslknforhgferiubgvkjfbvkasdjdbfkjasdfjksdbfkjsabhfuiweihfiuqwegfihbvhsbvmnxcbncnndhjdhxcycywioqwiskkdfbnsdkjhfreifgiwreuhfdsjbvckcjbvkjsdfkujshfiuwegfiweufedjihfdskjfnsdkjbfviuerhbvyrefbvifdhbsdkjfbvksdubfiweufhiewuhasdjkcaskjdqiuqwiqwouwiudhwiudbsutcdstytdtcrsfytcrsadtycfasdgcvsd'
#5: 208.py:97: 0.4 KiB
    obj.insert(word3)
#6: 208.py:101: 0.4 KiB
    print(obj.startsWith(prefix5))
#7: 208.py:85: 0.4 KiB
    print(obj.search(word))
#8: 208.py:105: 0.4 KiB
    snapshot = tracemalloc.take_snapshot()
#9: 208.py:18: 0.2 KiB
    self.word_map = collections.defaultdict(bool)
#10: 208.py:17: 0.1 KiB
    self.root = TrieNode()
2 other: 0.1 KiB
Total allocated size: 1563.0 KiB
'''
  • 2,000 + 1,871 + 288 = 4,159자에 대해 TrieNode를 생성하였고, 이에 대해 877.4 KiB는 약 0.9 MB 정도
  • java를 봐도 정수 하나를 Integer 클래스의 인스턴스로 만들어서 사용하기도 하는데, 인스턴스로 만들어서 오히려 결과적으로 문제 해결 및 중복 제거가 가능하다면 그게 더 이득이라고 보는 게 맞는 거 같다.

Updated: