feat(algorithms): extrema circular buffer
parent
b7045bfc34
commit
85b584bcf2
2 changed files with 416 additions and 7 deletions
|
|
@ -41,17 +41,361 @@
|
|||
</p>
|
||||
</header>
|
||||
<article class="post-article">
|
||||
<h2>context</h2>
|
||||
<div>
|
||||
<p>
|
||||
While I was working for
<a href="https://trbcap.com/">TRB Capital Management</a>, certain
strategies required finding the minimum and maximum of a moving
window of prices.
|
||||
</p>
|
||||
</div>
|
||||
<h2>problem statement</h2>
|
||||
<p>Design a data structure supporting the following operations:</p>
|
||||
<ul>
|
||||
<li>
|
||||
<span class="inline-code"
|
||||
><code>build(size_t capacity)</code></span
|
||||
>
|
||||
: initialize the data structure with capacity/window size
|
||||
<span class="inline-code"><code>capacity</code></span>
|
||||
</li>
|
||||
<ul>
|
||||
<li>
|
||||
The data structure must always hold \(\leq\)
|
||||
<span class="inline-code"><code>capacity</code></span>
|
||||
prices.
|
||||
</li>
|
||||
</ul>
|
||||
<li>
|
||||
<span class="inline-code"
|
||||
><code>void push_back(double value)</code></span
|
||||
>
: append a price to the back of the window
|
||||
</li>
|
||||
<ul>
|
||||
<li>
|
||||
If the data structure exceeds capacity, remove elements from the
|
||||
front of the window.
|
||||
</li>
|
||||
</ul>
|
||||
<li>
|
||||
<span class="inline-code"><code>void pop_front()</code></span>
|
||||
: remove the price from the front of the window
|
||||
</li>
|
||||
<li>
|
||||
<span class="inline-code"><code>size_t size()</code></span>
|
||||
: return the number of prices in the data structure
|
||||
</li>
|
||||
<li>
|
||||
<span class="inline-code"><code>double get()</code></span>
|
||||
: return the extremum (min or max) of the prices currently in the window
|
||||
</li>
|
||||
</ul>
|
||||
<h2>solution</h2>
|
||||
<p>
|
||||
Try to solve it yourself first. The point of this exercise is to
|
||||
create the most theoretically optimal solution you can, not
|
||||
brute-force and move on.
|
||||
</p>
|
||||
<div class="fold">
|
||||
<h3>naïve solution</h3>
</div>
<div class="problem-content">
<p>
One can design a data structure meeting these requirements by
simulating the operations directly with a
<a
target="blank"
href="https://en.cppreference.com/w/cpp/container/deque"
><span class="inline-code"
><code>std::deque<double></code></span
></a
>.
|
||||
</p>
|
||||
<p>
|
||||
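On the upside, this approach is simple to understand, and
<span class="inline-code"><code>push_back()</code></span>,
<span class="inline-code"><code>pop_front()</code></span>, and
<span class="inline-code"><code>size()</code></span> all run in
\(O(1)\) time. The exception is
<span class="inline-code"><code>get()</code></span>: the
minimum/maximum element must be found via a linear scan in
\(O(n)\) time, certainly far from optimal.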
|
||||
</p>
|
||||
<pre><code class="language-cpp">#include <algorithm>
#include <cstddef>
#include <deque>
#include <stdexcept>

class ExtremaCircularBuffer {
public:
  explicit ExtremaCircularBuffer(std::size_t capacity) : capacity(capacity) {
    // A zero-capacity window would make push_back() pop from an empty deque.
    if (capacity == 0) {
      throw std::invalid_argument("Capacity must be positive");
    }
  }

  // Append a price, evicting the oldest one if the window is full.
  void push_back(double value) {
    if (prices.size() == capacity) {
      prices.pop_front();
    }

    prices.push_back(value);
  }

  // Remove the oldest price.
  void pop_front() {
    if (prices.empty()) {
      throw std::out_of_range("Cannot pop_front() from empty buffer");
    }

    prices.pop_front();
  }

  std::size_t size() const { return prices.size(); }

  // Linear scan for the maximum: O(n) in the window size.
  double get() const {
    if (prices.empty()) {
      throw std::out_of_range("Cannot find max() of empty buffer");
    }

    return *std::max_element(prices.begin(), prices.end());
  }

private:
  std::deque<double> prices;
  std::size_t capacity;
};</code></pre>
|
||||
</div>
|
||||
<div class="fold">
|
||||
<h3>optimizing the approach</h3>
|
||||
</div>
|
||||
<div class="problem-content">
|
||||
<p>
|
||||
Rather than bear the brunt of the work finding extrema in calls to
|
||||
<span class="inline-code"><code>get()</code></span
|
||||
>, we can distribute it across the data structure as it is built.
|
||||
</p>
|
||||
<p>
|
||||
Maintaining the prices in sorted order seems to suffice, and
gives access to both max <i>and</i> min in \(O(1)\) time. However,
not all of the problem constraints have been addressed: the
structure must still behave like a circular buffer, evicting the
oldest price rather than the smallest or largest.
|
||||
</p>
|
||||
<p>
|
||||
Fortunately, pairing each distinct price with a count of its
occurrences in the window allows intelligent removal/insertion of
elements: when a price's count drops to \(0\), remove it from the
sorted prices. A
|
||||
<a
|
||||
target="blank"
|
||||
href="https://en.cppreference.com/w/cpp/container/map"
|
||||
><span class="inline-code"
|
||||
><code>std::map<double, size_t></code></span
|
||||
></a
|
||||
>
|
||||
allows us to do all of this.
|
||||
</p>
|
||||
<p>
|
||||
Now, we can access extrema in \(O(1)\) time. Insertion and deletion
take \(O(\log n)\) time thanks to the map, but we can do better.
|
||||
</p>
|
||||
<pre><code class="language-cpp">#include <cstddef>
#include <deque>
#include <map>
#include <stdexcept>

class ExtremaCircularBuffer {
public:
  explicit ExtremaCircularBuffer(std::size_t capacity) : capacity(capacity) {
    // A zero-capacity window would make push_back() pop from an empty deque.
    if (capacity == 0) {
      throw std::invalid_argument("Capacity must be positive");
    }
  }

  // Append a price, evicting the oldest one if the window is full.
  void push_back(double value) {
    if (prices.size() == capacity) {
      pop_front();
    }

    prices.push_back(value);
    ++sorted_prices[value];
  }

  // Remove the oldest price from both the window and the sorted view.
  void pop_front() {
    if (prices.empty()) {
      throw std::out_of_range("Cannot pop_front() from empty buffer");
    }

    double front = prices.front();

    // Drop the price from the map once its last copy leaves the window.
    if (--sorted_prices[front] == 0) {
      sorted_prices.erase(front);
    }
    prices.pop_front();
  }

  std::size_t size() const { return prices.size(); }

  // The largest key sits at the end of the ordered map.
  double get_max() const {
    if (prices.empty()) {
      throw std::out_of_range("Cannot find max() of empty buffer");
    }

    return sorted_prices.rbegin()->first;
  }

  // The smallest key sits at the beginning of the ordered map.
  double get_min() const {
    if (prices.empty()) {
      throw std::out_of_range("Cannot find min() of empty buffer");
    }

    return sorted_prices.begin()->first;
  }

private:
  std::deque<double> prices;                   // window contents in FIFO order
  std::map<double, std::size_t> sorted_prices; // price -> occurrences in window
  std::size_t capacity;
};</code></pre>
|
||||
</div>
|
||||
<div class="fold">
|
||||
<h3>monotonic <s>queues</s> deques</h3>
|
||||
</div>
|
||||
<div class="problem-content">
|
||||
<p>
|
||||
Thinking a bit deeper about the problem constraints, it is clear
|
||||
that:
|
||||
</p>
|
||||
<ul>
|
||||
<li>
|
||||
If a new extremum is pushed onto the data structure, all previously
pushed elements are irrelevant to any further operations: they
leave the window before the new element does, so none of them can
be the extremum again.
|
||||
</li>
|
||||
</ul>
|
||||
<p>
|
||||
Elements are processed in FIFO order, enabling this observation to
|
||||
be exploited. This is the foundational idea of the
|
||||
<a
|
||||
target="blank"
|
||||
href="https://www.wikiwand.com/en/Monotone_priority_queue"
|
||||
>monotone priority queue</a
|
||||
>
|
||||
data structure. So, for maintaining a minimum/maximum, the data
|
||||
structure will store a monotonically increasing/decreasing
|
||||
double-ended queue.
|
||||
</p>
|
||||
<p>
|
||||
On its own, this solution does not satisfy the circular buffer
interface. If an arbitrary number of elements is discarded from the
data structure whenever an extremum is added, it no longer knows how
many prices the window holds, so it cannot maintain a window of
fixed size.
|
||||
</p>
|
||||
<p>Thus, we make one more observation to meet this criterion:</p>
|
||||
<ul>
|
||||
<li>
|
||||
If each price (extremum) on the monotonic double-ended queue also
maintains a count of the <i>previously popped elements</i> it
displaced, we can deduce the proper action to take when the price
at the front of the window must be removed.
</li>
<ol>
<li>
If the extremum at the front of the deque still has a nonzero
count, the price leaving the window is one of the elements it
displaced: decrement the count and leave the deque untouched.
</li>
<li>
Otherwise, either no elements were pushed before this extremum
or they've all been popped. Remove (pop) this element
from the deque.
|
||||
</li>
|
||||
</ol>
|
||||
</ul>
|
||||
<p>
|
||||
This approach supports all operations in amortized \(O(1)\) time:
each element is added to and removed from the monotonic deque at
most once, so any sequence of \(n\) operations performs \(O(n)\)
total work.
|
||||
</p>
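<p>
The amortized bound can be sketched with a potential function. The
symbols \(\Phi\), \(k\), and \(\hat{c}\) are notation introduced here
for illustration: \(\Phi\) is the number of entries on the monotonic
deque, \(k\) is the number of entries a push discards from the back,
and \(\hat{c}\) is actual cost plus the change in \(\Phi\). A pop does
one unit of work and never grows the deque; a push does \(k + 1\)
units of work, changes \(\Phi\) by \(1 - k\), and may first trigger
one eviction pop when the window is full:
</p>
\[
\hat{c}_{\text{pop}} \le 1 + 0 = 1,
\qquad
\hat{c}_{\text{push}} \le \underbrace{1}_{\text{eviction}} + (k + 1) + (1 - k) = 3.
\]
<p>
Since \(\Phi\) starts at \(0\) and is never negative, any sequence of
\(n\) operations costs at most \(3n\) units in total under this
accounting.
</p>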
|
||||
<pre><code class="language-cpp">#include <cstddef>
#include <deque>
#include <stdexcept>
#include <utility>

class ExtremaCircularBuffer {
public:
  explicit ExtremaCircularBuffer(std::size_t capacity) : capacity(capacity) {
    // A zero-capacity window would make push_back() pop from an empty deque.
    if (capacity == 0) {
      throw std::invalid_argument("Capacity must be positive");
    }
  }

  // Append a price, evicting the oldest one if the window is full.
  void push_back(double value) {
    if (prices.size() == capacity) {
      pop_front();
    }

    prices.push_back(value);
    push_max(value);
  }

  // Remove the oldest price from the window.
  void pop_front() {
    if (prices.empty()) {
      throw std::out_of_range("Cannot pop_front() from empty buffer");
    }

    pop_max();
    prices.pop_front();
  }

  std::size_t size() const { return prices.size(); }

  // The front of the monotonic deque is always the window maximum.
  double get_max() const {
    if (prices.empty()) {
      throw std::out_of_range("Cannot find max() of empty buffer");
    }

    return maxs.front().first;
  }

private:
  // Discard every smaller candidate at the back, remembering how many
  // window elements the new candidate now stands in for.
  void push_max(double value) {
    std::size_t popped = 0;

    while (!maxs.empty() && maxs.back().first < value) {
      popped += maxs.back().second + 1;
      maxs.pop_back();
    }

    maxs.emplace_back(value, popped);
  }

  // Account for the oldest price leaving the window.
  void pop_max() {
    if (maxs.front().second == 0) {
      // The front candidate itself is the oldest price.
      maxs.pop_front();
    } else {
      // The oldest price was displaced earlier; only its count remains.
      --maxs.front().second;
    }
  }

  std::deque<double> prices;                       // window contents in FIFO order
  std::deque<std::pair<double, std::size_t>> maxs; // (candidate, displaced count)
  std::size_t capacity;
};</code></pre>
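<p>
One way to gain confidence in the counting logic is to compare it
against a linear scan after every operation. The sketch below is a
test harness written for illustration, not part of the original
solution: <span class="inline-code"><code>MaxWindow</code></span>,
<span class="inline-code"><code>agrees_with_linear_scan</code></span>,
and <span class="inline-code"><code>run_cross_checks</code></span> are
hypothetical names. The window is a trimmed-down, max-only variant
that tracks its size with a counter, and it assumes a capacity of at
least 1 and is never popped while empty.
</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <random>
#include <utility>

// Hypothetical max-only window using the (candidate, displaced count) scheme.
// Assumes capacity >= 1 and that pop_front() is never called when empty.
class MaxWindow {
public:
  explicit MaxWindow(std::size_t capacity) : capacity_(capacity) {}

  void push_back(double value) {
    if (size_ == capacity_) pop_front();  // evict the oldest price first
    std::size_t displaced = 0;
    while (!maxs_.empty() && maxs_.back().first < value) {
      displaced += maxs_.back().second + 1;
      maxs_.pop_back();
    }
    maxs_.emplace_back(value, displaced);
    ++size_;
  }

  void pop_front() {
    if (maxs_.front().second == 0) {
      maxs_.pop_front();       // the front candidate is the oldest price
    } else {
      --maxs_.front().second;  // the oldest price was displaced earlier
    }
    --size_;
  }

  std::size_t size() const { return size_; }
  double get_max() const { return maxs_.front().first; }

private:
  std::deque<std::pair<double, std::size_t>> maxs_;
  std::size_t capacity_;
  std::size_t size_ = 0;
};

// Returns true if the window matches a linear scan after every operation.
bool agrees_with_linear_scan(std::size_t capacity, unsigned seed, int steps) {
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> price(0, 9);   // small range forces duplicates
  std::uniform_int_distribution<int> action(0, 3);  // 0 requests a pop_front()

  MaxWindow window(capacity);
  std::deque<double> oracle;  // plain FIFO window, scanned linearly

  for (int i = 0; i < steps; ++i) {
    if (action(rng) == 0 && !oracle.empty()) {
      window.pop_front();
      oracle.pop_front();
    } else {
      const double value = static_cast<double>(price(rng));
      if (oracle.size() == capacity) oracle.pop_front();
      oracle.push_back(value);
      window.push_back(value);
    }

    if (window.size() != oracle.size()) return false;
    if (!oracle.empty() &&
        window.get_max() != *std::max_element(oracle.begin(), oracle.end())) {
      return false;
    }
  }
  return true;
}

// Example driver: call from main() or a test runner.
inline void run_cross_checks() {
  assert(agrees_with_linear_scan(1, 1u, 2000));
  assert(agrees_with_linear_scan(3, 42u, 5000));
  assert(agrees_with_linear_scan(16, 7u, 5000));
}
```

<p>
The small price range is deliberate: duplicates exercise the strict
<span class="inline-code"><code>&lt;</code></span> comparison, which
keeps equal prices as separate candidates on the deque.
</p>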
|
||||
<div class="fold"><h3>further improvements</h3></div>
|
||||
<ol>
|
||||
<li>
|
||||
While the final approach is <i>theoretically</i> faster than the
second, with small data sets constant-factor overhead is likely to
offset any performance gains.
|
||||
</li>
|
||||
<li>
|
||||
The class could leverage templates to take in a comparator (e.g.
|
||||
<span class="inline-code"
|
||||
><code>std::less<double></code></span
|
||||
>
|
||||
) to easily specify a minimum/maximum
|
||||
<span class="inline-code"
|
||||
><code>ExtremaCircularBuffer</code></span
|
||||
>
|
||||
as well as a value type to support all operations.
|
||||
</li>
|
||||
<li>
|
||||
As it stands, the class also only maintains one of either
|
||||
extrema, and using two monotonic deques, while still
|
||||
<i>theoretically</i> optimal, doesn't give me a good
|
||||
feeling. The second map-based approach might be favorable here.
|
||||
</li>
|
||||
</ol>
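<p>
The templated variant suggested in the second item might look roughly
like the sketch below. Everything here is an assumption made for
illustration rather than the post's code: the class name
<span class="inline-code"><code>BasicExtremaCircularBuffer</code></span>,
the aliases <span class="inline-code"><code>MaxBuffer</code></span> and
<span class="inline-code"><code>MinBuffer</code></span>, the
<span class="inline-code"><code>demo_usage</code></span> helper, and
the choice to track the size with a counter instead of a second deque.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <stdexcept>
#include <utility>

// Sketch only. Compare(a, b) == true means candidate a is displaced by b.
// std::less<T> keeps the maximum at the front; std::greater<T> the minimum.
template <typename T, typename Compare = std::less<T>>
class BasicExtremaCircularBuffer {
public:
  explicit BasicExtremaCircularBuffer(std::size_t capacity,
                                      Compare compare = Compare())
      : capacity_(capacity), compare_(std::move(compare)) {
    if (capacity_ == 0) {
      throw std::invalid_argument("Capacity must be positive");
    }
  }

  void push_back(const T& value) {
    if (size_ == capacity_) pop_front();  // evict the oldest element first
    std::size_t displaced = 0;
    while (!extrema_.empty() && compare_(extrema_.back().first, value)) {
      displaced += extrema_.back().second + 1;
      extrema_.pop_back();
    }
    extrema_.emplace_back(value, displaced);
    ++size_;
  }

  void pop_front() {
    if (size_ == 0) {
      throw std::out_of_range("Cannot pop_front() from empty buffer");
    }
    if (extrema_.front().second == 0) {
      extrema_.pop_front();       // the front candidate is the oldest element
    } else {
      --extrema_.front().second;  // the oldest element was displaced earlier
    }
    --size_;
  }

  std::size_t size() const { return size_; }

  const T& get() const {
    if (size_ == 0) {
      throw std::out_of_range("Cannot get() extremum of empty buffer");
    }
    return extrema_.front().first;
  }

private:
  std::deque<std::pair<T, std::size_t>> extrema_;  // (candidate, displaced count)
  std::size_t capacity_;
  Compare compare_;
  std::size_t size_ = 0;
};

template <typename T>
using MaxBuffer = BasicExtremaCircularBuffer<T, std::less<T>>;
template <typename T>
using MinBuffer = BasicExtremaCircularBuffer<T, std::greater<T>>;

// Usage example: one buffer per extremum over a window of three prices.
inline void demo_usage() {
  MaxBuffer<double> highs(3);
  MinBuffer<double> lows(3);
  const double ticks[] = {1.0, 3.0, 2.0, 0.0};  // the last push evicts 1.0
  for (double tick : ticks) {
    highs.push_back(tick);
    lows.push_back(tick);
  }
  assert(highs.get() == 3.0);  // window is now {3, 2, 0}
  assert(lows.get() == 0.0);
}
```

<p>
Tracking both extrema this way still needs two buffers, one per
comparator, which is the trade-off the third item weighs against the
map-based approach.
</p>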
|
||||
</div>
|
||||
|
||||
</article>
|
||||
</div>
|
||||
</main>
|
||||
|
|
|
|||