feat(algorithms): extrema circular buffer

This commit is contained in:
Barrett Ruth 2024-07-31 10:22:04 -05:00
parent b7045bfc34
commit 85b584bcf2
2 changed files with 416 additions and 7 deletions

View file

@ -41,17 +41,361 @@
</p>
</header>
<article class="post-article">
<h2>an h2</h2>
<h2>context</h2>
<div>
<p>
While working for
<a href="https://trbcap.com/">TRB Capital Management</a>, certain
strategies necessitated finding the minimum and maximum of a
moving window of prices.
</p>
</div>
<h2>problem statement</h2>
<p>Design a data structure supporting the following operations:</p>
<ul>
<li>
<span class="inline-code"
><code>build(size_t capacity)</code></span
>
: initialize the data structure with capacity/window size
<span class="inline-code"><code>capacity</code></span>
</li>
<ul>
<li>
The data structure must always hold \(\leq\)
<span class="inline-code"><code>capacity</code></span>
prices.
</li>
</ul>
<li>
<span class="inline-code"
><code>void push_back(double value)</code></span
>
</li>
<ul>
<li>
If the data structure exceeds capacity, remove elements from the
front of the window.
</li>
</ul>
<li>
<span class="inline-code"><code>void pop_front()</code></span>
: remove the price from the front of the window
</li>
<li>
<span class="inline-code"><code>size_t size()</code></span>
: return the number of prices in the data structure
</li>
<li>
<span class="inline-code"><code>double get()</code></span>
: return the extrema (min or max)
</li>
</ul>
<h2>solution</h2>
<p>
Try to solve it yourself first. The point of this exercise it to
create the most theoretically optimal solution you can, not
brute-force and move on.
</p>
<div class="fold">
<h3>
<h3>na&iuml;ve solution</h3>
</div>
<div class="problem-content">
<p>
One can design a data structure meeting these requirements through
simulating the operations directly with a
<a
target="blank"
href="https://leetcode.com/problems/container-with-most-water/"
>container with most water</a
>
</h3>
href="https://en.cppreference.com/w/cpp/container/deque"
><span class="inline-code"
><code>std::deque&lt;double&gt;</code></span
></a
>.
</p>
<p>
On the upside, this approach is simple to understand. Further,
operations are all \(O(1)\) time&mdash;that is, nearly all
operations. The minimum/maximum element must be found via a linear
scan in \(O(n)\) time, certainly far from optimal.
</p>
<pre><code class="language-cpp">#include &lt;algorithm&gt;
#include &lt;deque&gt;
#include &lt;stdexcept&gt;
class ExtremaCircularBuffer {
public:
ExtremaCircularBuffer(size_t capacity) : capacity(capacity) {}
void push_back(double value) {
if (prices.size() == capacity) {
prices.pop_front();
}
prices.push_back(value);
}
void pop_front() {
if (prices.empty()) {
throw std::out_of_range("Cannot pop_front() from empty buffer");
}
prices.pop_front();
}
size_t size() const { return prices.size(); }
double get() const {
if (prices.empty()) {
throw std::out_of_range("Cannot find max() of empty buffer");
}
return *std::max_element(prices.begin(), prices.end());
}
private:
std::deque&lt;double&gt; prices;
size_t capacity;
};</code></pre>
</div>
<div class="fold">
<h3>optimizing the approach</h3>
</div>
<div class="problem-content">
<p>
Rather than bear the brunt of the work finding extrema in calls to
<span class="inline-code"><code>get()</code></span
>, we can distribute it across the data structure as it is built.
</p>
<p>
Maintaining the prices in a sorted order seems to suffice, and
gives access to both max <i>and</i> min in \(O(1)\) time. However,
all of the problem constraints have not been addressed. Adhering
to the interface of a circular buffer is another challenge.
</p>
<p>
Fortunately, pairing each element with a count allows intelligent
removal/insertion of elements&mdash;if an element has a count of
\(0\), remove it from the list of sorted prices. A
<a
target="blank"
href="https://en.cppreference.com/w/cpp/container/map"
><span class="inline-code"
><code>std::map&lt;double, size_t&gt;</code></span
></a
>
allows us to do all of this.
</p>
<p>
Now, we can access extrema instantly. Insertion and deletion take
\(O(log(n))\) time thanks to the map&mdash;but we can do better.
</p>
<pre><code class="language-cpp">#include &lt;deque&gt;
#include &lt;map&gt;
#include &lt;stdexcept&gt;
class ExtremaCircularBuffer {
public:
ExtremaCircularBuffer(size_t capacity) : capacity(capacity) {}
void push_back(double value) {
if (prices.size() == capacity) {
double front = prices.front();
if (--sorted_prices[front] == 0)
sorted_prices.erase(front);
prices.pop_front();
}
prices.push_back(value);
++sorted_prices[value];
}
void pop_front() {
if (prices.empty()) {
throw std::out_of_range("Cannot pop_front() from empty buffer");
}
double front = prices.front();
if (--sorted_prices[front] == 0)
sorted_prices.erase(front);
prices.pop_front();
}
size_t size() const { return prices.size(); }
double get_max() const {
if (prices.empty()) {
throw std::out_of_range("Cannot find max() of empty buffer");
}
return sorted_prices.rbegin()-&gt;first;
}
double get_min() const {
if (prices.empty()) {
throw std::out_of_range("Cannot find min() of empty buffer");
}
return sorted_prices.begin()-&gt;first;
}
private:
std::deque&lt;double&gt; prices;
std::map&lt;double, size_t&gt; sorted_prices;
size_t capacity;
};</code></pre>
</div>
<div class="fold">
<h3>monotonic <s>queues</s> deques</h3>
</div>
<div class="problem-content">
<p>
Thinking a bit deeper about the problem constraints, it is clear
that:
</p>
<ul>
<li>
If an extrema is pushed onto the data structure, all previously
pushed elements are irrelevant to any further operations.
</li>
</ul>
<p>
Elements are processed in FIFO order, enabling this observation to
be exploited. This is the foundationl idea of the
<a
target="blank"
href="https://www.wikiwand.com/en/Monotone_priority_queue"
>monotone priority queue</a
>
data structure. So, for maintaining a minimum/maximum, the data
structure will store a monotonically increasing/decreasing
double-ended queue.
</p>
<p>
This solution does not satisfy a circular buffer inherently. If an
arbitrary number of elements are removed from the data structure
when an extrema is added, it is certainly not possible to maintain
a window of fixed size.
</p>
<p>Thus, we make one more observation to meet this criterion:</p>
<ul>
<li>
If each price (extrema) on the monotonic double-ended queue also
maintains a count of <i>previously popped elements</i>, we can
deduce the proper action to take when the data structure reaches
capacity.
</li>
<ol>
<li>
If elements were previously popped before this extrema was
added to the data structure, decrement the price&apos;s count
of popped elements and do nothing.
</li>
<li>
Otherwise, either no elements were pushed before this extrema
or they&apos;ve all been popped. Remove (pop) this element
from the deque.
</li>
</ol>
</ul>
<p>
This approach supports all operations in amortized \(O(1)\) time
(with a monotonic sequence, elements are added or removed at least
once; across a sequence of \(n\) operations, \(n\) total \(O(1)\)
operations will be executed).
</p>
<pre><code class="language-cpp">#include &lt;deque&gt;
#include &lt;stdexcept&gt;
#include &lt;utility&gt;
class ExtremaCircularBuffer {
public:
explicit ExtremaCircularBuffer(size_t capacity) : capacity(capacity) {}
void push_back(double value) {
if (prices.size() == capacity) {
double front_value = prices.front();
pop_max(front_value);
prices.pop_front();
}
prices.push_back(value);
push_max(value);
}
void pop_front() {
if (prices.empty()) {
throw std::out_of_range("Cannot pop_front() from empty buffer");
}
double front_value = prices.front();
pop_max(front_value);
prices.pop_front();
}
size_t size() const { return prices.size(); }
double get_max() const {
if (prices.empty()) {
throw std::out_of_range("Cannot find max() of empty buffer");
}
return maxs.front().first;
}
private:
void push_max(double value) {
size_t popped = 0;
while (!maxs.empty() &amp;&amp; maxs.back().first &lt; value) {
popped += maxs.back().second + 1;
maxs.pop_back();
}
maxs.emplace_back(value, popped);
}
void pop_max(double value) {
size_t popped = maxs.front().second;
if (popped == 0) {
maxs.pop_front();
} else {
--maxs.front().second;
}
}
std::deque&lt;double&gt; prices;
std::deque&lt;std::pair&lt;double, size_t&gt;&gt; maxs;
size_t capacity;
};</code></pre>
<div class="fold"><h3>further improvements</h3></div>
<ol>
<li>
While the final approach is <i>theoretically</i> faster than the
second, with small data sets the overhead of the latter is
likely to upset any performance gains.
</li>
<li>
The class could leverage templates to take in a comparator
<span class="inline-code"
><code>std::less&lt;double&gt;</code></span
>
) to easily specify a minimum/maximum
<span class="inline-code"
><code>ExtremaCircularBuffer</code></span
>
as well as a value type to support all operations.
</li>
<li>
As it stands, the class also only maintains one of either
extrema, and using two monotonic deques, while still
<i>theoretically</i> optimal, doesn&apos;t give me a good
feeling. The second map-based approach might be favorable here.
</li>
</ol>
</div>
<div class="problem-content">content</div>
</article>
</div>
</main>