Tries, also known as prefix trees, are an easy way to encode values that share a common prefix, and easily query which values share a given prefix.

Data Structure

A Trie is a tree where the path from the root to the leaf represents a string in the dictionary.

Create Tries

At the beginning, a trie is just a root tree node. Each time we read a string, we insert the string into the trie. 1. Let P point to the root node, and i points to the first character of the string 2. Check if s[i] is already in children list of P: 1) If yes, then move P to the child node and i ++ 2) If no, create a new child node for P with the key s[i], then move P and i++ 3. Repeat step 2 until you are out of characters in the string

Note that for each string, we need to add a stop value (e.g. None) as a child of the node to indicate taht the string is complete.

Examples

The following is a Trie of the string set [foo, foobar, foobaz].

Tires Example

Searching for a string in a trie is basically the same as inserting one, except that instead of adding a new node when you encounter a character that isn’t encoded, you return False indicating that the string isn’t found. If you get to the end of the string and there’s not an indicator that this complete string exists (a None as a child, in this example), then return False. Otherwise, return True.

Finding all strings starting with a given prefix

Firstly, find the tree node encoding the given prefix. Then, find all leaf nodes in the subtree whose root is the found node. You can use either DFS (better is you only want to return one leaf) or BFS.

Python implementation

TrieNode

[-] TrieNode Class
class TrieNode(object):
    def __init__(self, value, string):
        self.value = value
        self.children = {}
        self.string = string

Trie Class

[-] TrieNode Class
class Trie(object):
    """
    Class for Trie tree
    """
    def __init__(self):
        self.root = TrieNode(None)
        
    def add_string(self, s):
        p = self.root
        for c in s:
            if c in p.children:
                p = p.children[c]
            else:
                child = TrieNode(None)
                p.children[c] = child
                p = child
        p.word = s
    
    def find(self, s):
        p = self.root
        for c in s:
            if c in p.children:
                p = p.children[c]
            else:
                return False
        return p.word is not None

    def start_with(self, prefix):
        ret = []
        p = self.root
        for c in prefix:
            if c in p.children:
                p = p.children[c]
            else:
                return ret
        stack = [p]
        while stack:
            node = stack.pop()
            if node.word is not None:
                ret.append(node.word)
            for p in node.children.values():
                stack.append(p)
        return ret

Use cases

You can find all words in the set matching the given prefix or find the given word in $O(n)$ time where $n$ is the length of the word.

[-] Sample code to find all words matching the given prefix
if __name__ == "__main__":
    t = Trie()
    t.add_string("foo")
    t.add_string("food")
    t.add_string("foot")
    t.add_string("zzzdevil")

    print("Start with 'f': %s" % t.start_with('f'))

There is very popular questions (FB interview question) here. The question is, by given a set of strings, find the smallest subset, which contains the prefixes for every strings in the set.

The solution is to DFS the tree, but we stop going deeper when we touch the leaf node (the node represents a word).

[-] Sample code to find the smallest subset that contains prefixes of all words
class Trie(object):
    # ...
    def smallest__subset_prefix(self):
        """
        Find the smallest subset of the dictionary, which contains prefixes of all words in the dictionary
        :return: a list of strings
        """
        ret = []
        # DFS
        stack = [self.root]
        while stack:
            node = stack.pop()
            if node.word is not None:
                ret.append(node.word)
            else:
                for p in node.children.values():
                    stack.append(p)

        return ret
        
if __name__ == "__main__":
    t = Trie()
    t.add_string("foo")
    t.add_string("food")
    t.add_string("foot")
    t.add_string("zzzdevil")

    print("==== Find all strings start with the given prefix ===")
    prefix = "f"
    print("Prefix: %s" % prefix)
    print("Result: %s" % t.start_with('f'))
    print()

    print("==== Find smallest subset contains prefixes of all words ===")
    print("Result: %s" % t.smallest__subset_prefix())
    

References:

https://biesnecker.com/algods/2015/04/13/dive-into-tries/

https://biesnecker.com/sweiqotd/2015/04/14/given-set-of-strings-return-minimum-subset-of-prefixes/