Alien Dictionary

Problem

There is a new alien language that uses the English alphabet, but the order of the letters is unknown.

You are given a list of strings words from the alien language’s dictionary. It is claimed that the strings in words are sorted lexicographically by the rules of this new language.

If this claim is incorrect, and the given arrangement of strings in words cannot correspond to any order of letters, return "".

Otherwise, return a string of the unique letters in the new alien language sorted in lexicographically increasing order by the new language’s rules. If there are multiple solutions, return any of them.

A string a is lexicographically smaller than a string b if either of the following is true:

  • The first letter where they differ is smaller in a than in b.
  • a is a prefix of b and a.length < b.length.

Examples

Example 1:

Input: words = ["z","o"]

Output: "zo"

Explanation:
From "z" and "o", we know 'z' < 'o', so return "zo".

Example 2:

Input: words = ["hrn","hrf","er","enn","rfnn"]

Output: "hernf"

Explanation:

  • from "hrn" and "hrf", we know 'n' < 'f'
  • from "hrf" and "er", we know 'h' < 'e'
  • from "er" and "enn", we know 'r' < 'n'
  • from "enn" and "rfnn" we know 'e' < 'r'
  • so one possible solution is "hernf"

Example 3:

Input: words = ["abc","ab"]

Output: ""

Explanation:
The second word is a prefix of the first word, but the first word appears before the second. This is impossible in a valid lexicographical ordering, so return "".

Constraints

  • 1 <= words.length <= 100
  • 1 <= words[i].length <= 100
  • words[i] consists of only lowercase English letters.

You should aim for a solution with O(N + V + E) time and O(V + E) space, where N is the sum of the lengths of all the strings, V is the number of unique characters (vertices), and E is the number of edges.

Solution

We characters all words to an indegree of 0. For each pairwise words, we check if w1 starts with w2 but is longer. This is invalid, like hello coming before hell in the dictionary. Then, we step along each char and check for the first mismatch. This tells us a potential edge c1 -> c2, and we only count this once. Then we run BFS over 0 degree chars, appending to the result string. If the result string is the number of input chars, then return that. Otherwise invalid.

class Solution {
public:
    string foreignDictionary(vector<string>& words) {
        std::unordered_map<char, int> indegree;
        std::unordered_map<char, std::unordered_set<char>> adj;
        for (const auto &word : words) {
            for (const auto &c : word) {
                indegree[c] = 0;
            }
        }
 
        // Check pairwise words
        for (int i = 0; i < words.size() - 1; ++i) {
            const auto &w1 = words[i];
            const auto &w2 = words[i + 1];
            const auto min_len = std::min(w1.size(), w2.size());
            // If w1 starts with w2 and is longer, then invalid
            // e.g. "hello", "hell" -> "hell" should come before "hello"
            if (w1.size() > w2.size() && w1.substr(0, min_len) == w2.substr(0, min_len)) {
                return "";
            }
            // Create known edges
            // All we can infer is on the first non-equal char, w1[j] -> w2[j]
            for (int j = 0; j < min_len; ++j) {
                const auto c1 = w1[j];
                const auto c2 = w2[j];
                if (c1 != c2) {
                    if (!adj[c1].contains(c2)) {
                        adj[c1].insert(c2);
                        ++indegree[c2];
                    }
                    break;
                }
            }
        }
 
        // Run BFS over words with no prereqs
        std::queue<char> open;
        std::stringstream ss;
        for (const auto &[c, deg] : indegree) {
            if (deg == 0) {
                open.push(c);
            }
        }
        while (!open.empty()) {
            auto c = open.front();
            open.pop();
            ss << c;
            for (const auto n : adj[c]) {
                --indegree[n];
                if (indegree[n] == 0) {
                    open.push(n);
                }
            }
        }
        // Solution if result string contains all characters
        auto res = ss.str();
        return res.size() == indegree.size() ? res : "";
    }
};