Top K Frequent Elements

Problem

Given an integer array nums and an integer k, return the k most frequent elements within the array.

The test cases are generated such that the answer is always unique.

You may return the output in any order.

Examples

Example 1:

Input: nums = [1,2,2,3,3,3], k = 2

Output: [2,3]

Example 2:

Input: nums = [7,7], k = 1

Output: [7]

Constraints

  • 1 <= nums.length <= 10^4.
  • -1000 <= nums[i] <= 1000
  • 1 <= k <= number of distinct elements in nums.

You should aim for a solution with O(n) time and O(n) space, where n is the size of the input array.

Solution

One solution is to map frequencies of each num, then create vector of those with same frequency. Lots of small allocations. If known range is small, juts use a fixed size array and sort (these are technically all constant)

class Solution {
public:
    vector<int> topKFrequent(vector<int>& nums, int k) {
        // One solution is to unordered_map<int, int> frequencies of each num,
        //   create a vector<vector<int>> which are buckets where buckets[i] stores
        //   numbers that appear exactly i times. This is pretty costly though as 
        //   we need to do a lot of small vector allocs
        // Another solution is to use the fact that we have a fixed range of numbers,
        //  -1000 <= nums[i] <= 1000, so we have 2001 distinct values and can just allocate
        //  a single array for this
        constexpr int OFFSET = 1000;
        constexpr int RANGE = 2001;
 
        std::vector<int> frequencies(RANGE, 0); // Count frequency of each number
        std::vector<int> unique_nums;
        unique_nums.reserve(RANGE); // Probably overkill
 
        // Count frequency of each num and store the unique values
        for (const auto &num : nums) {
            const auto idx = static_cast<std::size_t>(num + OFFSET);
            if (frequencies[idx] == 0) {
                unique_nums.push_back(num);
            }
            ++frequencies[idx];
        }
 
        // Here we need to sort the values by frequency, which is O(n log n) on the distinct values,
        // but there is at most 2001 distinct values which is essentially constant time
        std::ranges::sort(unique_nums, [&](int lhs, int rhs) -> bool {
            const auto idx_lhs = static_cast<std::size_t>(lhs + OFFSET);
            const auto idx_rhs = static_cast<std::size_t>(rhs + OFFSET);
            return frequencies[idx_lhs] > frequencies[idx_rhs];
        });
 
        // Get the top K values
        unique_nums.resize(static_cast<std::size_t>(k));
        return unique_nums;
    }
};