context
While working for TRB Capital Management, certain strategies necessitated finding the minimum and maximum of a moving window of prices.
problem statement
Design a data structure supporting the following operations:
-
build(size_t capacity): initialize the data structure with capacity/window sizecapacity -
The data structure must always hold \(\leq\)
capacityprices. -
void push_back(double value) - If the data structure exceeds capacity, remove elements from the front of the window.
-
void pop_front(): remove the price from the front of the window -
size_t size(): return the number of prices in the data structure -
double get(): return the extrema (min or max)
solution
Try to solve it yourself first. The point of this exercise it to create the most theoretically optimal solution you can, not brute-force and move on.
naïve solution
One can design a data structure meeting these requirements through
simulating the operations directly with a
std::deque<double>.
On the upside, this approach is simple to understand. Further, operations are all \(O(1)\) time—that is, nearly all operations. The minimum/maximum element must be found via a linear scan in \(O(n)\) time, certainly far from optimal.
#include <algorithm>
#include <deque>
#include <stdexcept>
class ExtremaCircularBuffer {
public:
ExtremaCircularBuffer(size_t capacity) : capacity(capacity) {}
void push_back(double value) {
if (prices.size() == capacity) {
prices.pop_front();
}
prices.push_back(value);
}
void pop_front() {
if (prices.empty()) {
throw std::out_of_range("Cannot pop_front() from empty buffer");
}
prices.pop_front();
}
size_t size() const { return prices.size(); }
double get() const {
if (prices.empty()) {
throw std::out_of_range("Cannot find max() of empty buffer");
}
return *std::max_element(prices.begin(), prices.end());
}
private:
std::deque<double> prices;
size_t capacity;
};
optimizing the approach
Rather than bear the brunt of the work finding extrema in calls to
get(), we can distribute it across the data structure as it is built.
Maintaining the prices in a sorted order seems to suffice, and gives access to both max and min in \(O(1)\) time. However, all of the problem constraints have not been addressed. Adhering to the interface of a circular buffer is another challenge.
Fortunately, pairing each element with a count allows intelligent
removal/insertion of elements—if an element has a count of
\(0\), remove it from the list of sorted prices. A
std::map<double, size_t>
allows us to do all of this.
Now, we can access extrema instantly. Insertion and deletion take \(O(log(n))\) time thanks to the map—but we can do better.
#include <deque>
#include <map>
#include <stdexcept>
class ExtremaCircularBuffer {
public:
ExtremaCircularBuffer(size_t capacity) : capacity(capacity) {}
void push_back(double value) {
if (prices.size() == capacity) {
double front = prices.front();
if (--sorted_prices[front] == 0)
sorted_prices.erase(front);
prices.pop_front();
}
prices.push_back(value);
++sorted_prices[value];
}
void pop_front() {
if (prices.empty()) {
throw std::out_of_range("Cannot pop_front() from empty buffer");
}
double front = prices.front();
if (--sorted_prices[front] == 0)
sorted_prices.erase(front);
prices.pop_front();
}
size_t size() const { return prices.size(); }
double get_max() const {
if (prices.empty()) {
throw std::out_of_range("Cannot find max() of empty buffer");
}
return sorted_prices.rbegin()->first;
}
double get_min() const {
if (prices.empty()) {
throw std::out_of_range("Cannot find min() of empty buffer");
}
return sorted_prices.begin()->first;
}
private:
std::deque<double> prices;
std::map<double, size_t> sorted_prices;
size_t capacity;
};
monotonic queues deques
Thinking a bit deeper about the problem constraints, it is clear that:
- If an extrema is pushed onto the data structure, all previously pushed elements are irrelevant to any further operations.
Elements are processed in FIFO order, enabling this observation to be exploited. This is the foundationl idea of the monotone priority queue data structure. So, for maintaining a minimum/maximum, the data structure will store a monotonically increasing/decreasing double-ended queue.
This solution does not satisfy a circular buffer inherently. If an arbitrary number of elements are removed from the data structure when an extrema is added, it is certainly not possible to maintain a window of fixed size.
Thus, we make one more observation to meet this criterion:
- If each price (extrema) on the monotonic double-ended queue also maintains a count of previously popped elements, we can deduce the proper action to take when the data structure reaches capacity.
- If elements were previously popped before this extrema was added to the data structure, decrement the price's count of popped elements and do nothing.
- Otherwise, either no elements were pushed before this extrema or they've all been popped. Remove (pop) this element from the deque.
This approach supports all operations in amortized \(O(1)\) time (with a monotonic sequence, elements are added or removed at least once; across a sequence of \(n\) operations, \(n\) total \(O(1)\) operations will be executed).
#include <deque>
#include <stdexcept>
#include <utility>
class ExtremaCircularBuffer {
public:
explicit ExtremaCircularBuffer(size_t capacity) : capacity(capacity) {}
void push_back(double value) {
if (prices.size() == capacity) {
double front_value = prices.front();
pop_max(front_value);
prices.pop_front();
}
prices.push_back(value);
push_max(value);
}
void pop_front() {
if (prices.empty()) {
throw std::out_of_range("Cannot pop_front() from empty buffer");
}
double front_value = prices.front();
pop_max(front_value);
prices.pop_front();
}
size_t size() const { return prices.size(); }
double get_max() const {
if (prices.empty()) {
throw std::out_of_range("Cannot find max() of empty buffer");
}
return maxs.front().first;
}
private:
void push_max(double value) {
size_t popped = 0;
while (!maxs.empty() && maxs.back().first < value) {
popped += maxs.back().second + 1;
maxs.pop_back();
}
maxs.emplace_back(value, popped);
}
void pop_max(double value) {
size_t popped = maxs.front().second;
if (popped == 0) {
maxs.pop_front();
} else {
--maxs.front().second;
}
}
std::deque<double> prices;
std::deque<std::pair<double, size_t>> maxs;
size_t capacity;
};
further improvements
- While the final approach is theoretically faster than the second, with small data sets the overhead of the latter is likely to upset any performance gains.
-
The class could leverage templates to take in a comparator
std::less<double>) to easily specify a minimum/maximumExtremaCircularBufferas well as a value type to support all operations. - As it stands, the class also only maintains one of either extrema, and using two monotonic deques, while still theoretically optimal, doesn't give me a good feeling. The second map-based approach might be favorable here.