208 Implement Trie (Prefix Tree)
208 Implement Trie (Prefix Tree)
문제
A trie (pronounced as “try”) or prefix tree is a tree data structure used to efficiently store and retrieve keys in a dataset of strings.
There are various applications of this data structure, such as autocomplete and spellchecker.
- 트라이(trie) 또는 prefix tree: 문자열 데이터 집합에서 키를 효율적으로 저장하고 조회하는 데 사용하는 트리 데이터 구조
- 자동 완성 또는 맞춤법 검사 등에서 사용
Implement the Trie class:
Trie()
Initializes the trie object.void insert(String word)
Inserts the string word into the trie.boolean search(String word)
Returns true if the string word is in the trie (i.e., was inserted before), and false otherwise.boolean startsWith(String prefix)
Returns true if there is a previously inserted string word that has the prefix prefix, and false otherwise.
Trie()
: 트라이 오브젝트 초기화insert(String word)
:word
를 삽입search(String word)
: 트라이에word
존재하면 true 반환, 없으면 false 반환startsWith(String prefix)
:prefix
가지는word
존재하면 true, 없으면 false
조건
- 1 <= word.length, prefix.length <= 2000
- word and prefix consist only of lowercase English letters.
- At most 3 * 10^4 calls in total will be made to insert, search, and startsWith.
예제
Input
["Trie", "insert", "search", "search", "startsWith", "insert", "search"]
[[], ["apple"], ["apple"], ["app"], ["app"], ["app"], ["app"]]
Output
[null, null, true, false, true, null, true]
Explanation
Trie trie = new Trie();
trie.insert("apple");
trie.search("apple"); // return True
trie.search("app"); // return False
trie.startsWith("app"); // return True
trie.insert("app");
trie.search("app"); // return True
해결
1st
1 생각
- insert, search는 dict 쓰면 금방 될 거 같다
- 그런데 startsWith는 어떻게 해야 할까? 떠오르는 생각을 나열해보자
- 딕셔너리 key를 모두 순회하면서
prefix
로 시작하는 문자열 있는지 찾는다 prefix
별로 단어들이 관리하고,prefix
에 해당하는 단어들을 한번에 반환- 그런데 이러면, 어디서부터 어디까지를 prefix로 볼 것인지가 문제일 거 같다
- 딕셔너리 key를 모두 순회하면서
- 컴퓨터를 과소평가 한 거 같다. 더 이상 생각이 나지 않아 책을 참고라니, 모든 문자를 노드로 봐서 노드를 연결하여 단어를 만든다.
1 코드
import sys
sys.path.insert(0, '.')
import tracemalloc
import collections
from pai_util.trace import display_top
class TrieNode:
def __init__(self):
self.word = False # 한 단어가 완성됨을 의미
self.children = {} # 연결되는 다음 문자
class Trie:
def __init__(self):
"""
Initialize your data structure here.
"""
self.root = TrieNode()
self.word_map = collections.defaultdict(bool)
def insert(self, word: str) -> None:
"""
Inserts a word into the trie.
"""
node = self.root
for char in word:
if char not in node.children:
node.children[char] = TrieNode()
node = node.children[char] # node.children[char].children[char]...로 연결된다
node.word = True
self.word_map[word] = True
'''
Runtime: 176 ms, faster than 64.30% of Python3 online submissions for Implement Trie (Prefix Tree).
Memory Usage: 32.2 MB, less than 37.82% of Python3 online submissions for Implement Trie (Prefix Tree).
'''
def search(self, word: str) -> bool:
"""
Returns if the word is in the trie.
"""
node = self.root
for char in word:
if char not in node.children:
return False
node = node.children[char]
return node.word
'''
Runtime: 160 ms, faster than 74.00% of Python3 online submissions for Implement Trie (Prefix Tree).
Memory Usage: 32.2 MB, less than 37.82% of Python3 online submissions for Implement Trie (Prefix Tree).
'''
def search2(self, word: str) -> bool:
"""
Returns if the word is in the trie.
"""
if word in self.word_map:
return True
else:
return False
def startsWith(self, prefix: str) -> bool:
"""
Returns if there is any word in the trie that starts with the given prefix.
"""
node = self.root
for char in prefix:
if char not in node.children:
return False
node = node.children[char]
return True
1 결과
Runtime: 168 ms, faster than 71.19% of Python3 online submissions for Implement Trie (Prefix Tree).
Memory Usage: 32.4 MB, less than 36.61% of Python3 online submissions for Implement Trie (Prefix Tree).
1 회고
- 문자열을 문자로 쪼개서 각각 class의 인스턴스로 만든다? 처음에는 낭비라고 생각을 했다
- 왜 낭비라고 생각했을까???
- “apple”만 보면 char[] 배열이고, char 바이트의 연속일 텐데, 굳이 이 char 바이트를 TrieNode 같은 인스턴스로 만드는 게 메모리 낭비로 느껴졌다
- 근데 전체 테스트 케이스 호출이 3 * 10^4(=30,000) 있었고 결과의 메모리 사용량을 보면 32.4 MB를 사용했는데, 32.4 MB = 32,400,000 byte, 이를 단순히 30,000으로 나누면 1회당 1,080 byte 사용했는데, 이 정도면 요즘 컴퓨터에서는 그렇게 낭비 아니지 않을까? 4k 영상도 핸드폰으로 보는 시대인데?
- 메모레 체크와 상위 10개를 표시해 보면 2,000자의 문자열이라도 그렇게 많은 메모리를 차지하지 않는다
tracemalloc.start()
# 1 <= word.length <= 2000
word = 'asdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxgtyeuyriwpasdksdklfnsdmvncxmvnjdgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuef'
obj = Trie()
print(len(word))
obj.insert(word)
print(obj.search(word))
prefix1 = 'asdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwf'
print(obj.startsWith(prefix1))
prefix2 = 'asdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefh'
print(obj.startsWith(prefix2))
prefix3 = 'asdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxgtyeuyriwpasdksdklfnsdmvncxmvnjdgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviue'
print(obj.startsWith(prefix3))
word2 = 'jnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfiddsdfsdfsdfsdfiddsdfsdfidjdksdndvlndslfwoiefhowfcslmnvdslfwoiefhowfcslmnvkjsdhowfcslmnvkjsdbfkjcslmnvkjsdbfkjsdbfjsdhfiuwefgquyvrgquyvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhtxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqasdfsddsvrtgbdidddddytjnhgdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvasdfsddsvrtgbdidddddytjnhgtfgghdddddrfeferdhtyjretyjnddsdfsdfidjdksdndvlndslfwoiefhowfcdslmnvkjsdbfkjsdbfjsdhfiuwefgquywesdfsdfliwopfihwlfcnsdvccjbvhsdfjhsfiqoqghasgfsagdstxfzxgcvgquyrwfeutyqwreuywdksdklfnsdmvncxmvnjdgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuefqoqwjpowqpqoiwjdjksbjdnbcnmxbvhfdgvgjghasgfsagdstxfzxgcvgquyrwfeutyqwreuywqtduhsvajvzbxbxbxbhshsdfgtyeuyriwpasdksdklfnsdmvncxmvnfjdgviuef'
print(len(word2))
obj.insert(word2)
word3 = 'dewoidhdslknforhgferiubgvkjfbvkasdjdbfkjasdfjksdbfkjsabhfuiweihfiuqwegfihbvhsbvmnxcbncnndhjdhxcycywioqwiskkdfbnsdkjhfreifgiwreuhfdsjbvckcjbvkjsdfkujshfiuwegfiweufedjihfdskjfnsdkjbfviuerhbvyrefbvifdhbsdkjfbvksdubfiweufhiewuhasdjkcaskjdqiuqwiqwouwiudhwiudbsutcdstytdtcrsfytcrsadtycfasdgcvsd'
print(len(word3))
obj.insert(word3)
prefix4 = 'dewoidhdslknforhgferiubgvkjfbvkasdjdbfkjasdfjksdbfkjsabhfuiweihfiuqwegfihbvhsbvmnxcbncnndhjdhxcycywioqwiskkdfbnsdkjhfreifgiwreuhfdsjbvckcjbvkjsdfkujshfiuwegfiweufedjihfdskjfnsdkjbfviuerhbvyrefbvifdhbsdkjfbvksdubfiweufhiewuhasdjkcaskjdqiuqwiqwouwiudhwiudbsutcdstytdtcrsfytcrsadtycfasdcxvdsfdv'
print(obj.startsWith(prefix4))
prefix5 = 'dewoidhdslknforhgferiubgvkjfbvkasdjdbfkjasdfjksdbfkjsabhfuiweihfiuqwegfihbvhsbvmnxcbncnndhjdhxcycywioqwiskkdfbnsdkjhfreifgiwreuhfdsjbvckcjbvkjsdfkujshfiuwegfiweufedjihfdskjfnsdkjbfviuerhbvyrefbvifdhbsdkjfbvksdubfiweufhiewuhasdjkcaskjdqiuqwiqwouwiudhwiudbsutcdstytdtcrsfytcrsadtycfasd'
print(obj.startsWith(prefix5))
current, peak = tracemalloc.get_traced_memory()
print(f"Current memory usage is {current / 10**6}MB; Peak was {peak / 10**6}MB")
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('traceback')
""" print("[ Top 10 ]")
for stat in top_stats[:10]:
print('메모리 블록이 할당된 곳:', stat.traceback, '/ 메모리 블록 수(int):', stat.count, '/ 총 메모리 블록의 바이트 단위 크기 (int):', stat.size) """
display_top(snapshot)
tracemalloc.stop()
'''
2000
True
True
True
True
1871
288
False
True
Current memory usage is 1.600048MB; Peak was 1.600374MB
Top 10 lines
#1: 208.py:28: 877.4 KiB
node.children[char] = TrieNode()
#2: 208.py:9: 422.4 KiB
self.word = False # 한 단어가 완성됨을 의미
#3: 208.py:10: 259.9 KiB
self.children = {} # 연결되는 다음 문자
#4: 208.py:95: 1.1 KiB
word3 = 'dewoidhdslknforhgferiubgvkjfbvkasdjdbfkjasdfjksdbfkjsabhfuiweihfiuqwegfihbvhsbvmnxcbncnndhjdhxcycywioqwiskkdfbnsdkjhfreifgiwreuhfdsjbvckcjbvkjsdfkujshfiuwegfiweufedjihfdskjfnsdkjbfviuerhbvyrefbvifdhbsdkjfbvksdubfiweufhiewuhasdjkcaskjdqiuqwiqwouwiudhwiudbsutcdstytdtcrsfytcrsadtycfasdgcvsd'
#5: 208.py:97: 0.4 KiB
obj.insert(word3)
#6: 208.py:101: 0.4 KiB
print(obj.startsWith(prefix5))
#7: 208.py:85: 0.4 KiB
print(obj.search(word))
#8: 208.py:105: 0.4 KiB
snapshot = tracemalloc.take_snapshot()
#9: 208.py:18: 0.2 KiB
self.word_map = collections.defaultdict(bool)
#10: 208.py:17: 0.1 KiB
self.root = TrieNode()
2 other: 0.1 KiB
Total allocated size: 1563.0 KiB
'''
- 2,000 + 1,871 + 288 = 4,159자에 대해 TrieNode를 생성하였고, 이에 대해 877.4 KiB는 약 0.9 MB 정도
- java를 봐도 정수 하나를 Integer 클래스의 인스턴스로 만들어서 사용하기도 하는데, 인스턴스로 만들어서 오히려 결과적으로 문제 해결 및 중복 제거가 가능하다면 그게 더 이득이라고 보는 게 맞는 거 같다.