<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <link rel="stylesheet" href="/styles/common.css" />
    <link rel="stylesheet" href="/styles/post.css" />
    <link rel="icon" type="image/webp" href="/public/logo.webp" />
    <link href="/public/prism/prism.css" rel="stylesheet" />
    <link href="/public/prism/prism-theme.css" rel="stylesheet" />
    <script defer src="/public/prism/prism.js"></script>
    <script
      src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"
      async
    ></script>
    <title>Barrett Ruth</title>
  </head>
  <body class="graph-background">
    <header>
      <a
        href="/"
        style="text-decoration: none; color: inherit"
        onclick="goHome(event)"
      >
        <div class="terminal-container">
          <span class="terminal-prompt">barrett@ruth:~$ /algorithms</span>
          <span class="terminal-cursor"></span>
        </div>
      </a>
    </header>
    <main class="main">
      <div class="post-container">
        <header class="post-header">
          <h1 class="post-title">extrema circular buffer</h1>
          <p class="post-meta">
            <time datetime="2024-07-30">30/07/2024</time>
          </p>
        </header>
        <article class="post-article">
          <h2>context</h2>
          <div>
            <p>
              While I was working at
              <a href="https://trbcap.com/">TRB Capital Management</a>, certain
              strategies required finding the minimum and maximum of a
              moving window of prices.
            </p>
          </div>
          <h2>problem statement</h2>
          <p>Design a data structure supporting the following operations:</p>
          <ul>
            <li>
              <span class="inline-code"
                ><code>build(size_t capacity)</code></span
              >
              : initialize the data structure with capacity/window size
              <span class="inline-code"><code>capacity</code></span>
            </li>
            <ul>
              <li>
                The data structure must always hold \(\leq\)
                <span class="inline-code"><code>capacity</code></span>
                prices.
              </li>
            </ul>
            <li>
              <span class="inline-code"
                ><code>void push_back(double value)</code></span
              >
              : append a price to the back of the window
            </li>
            <ul>
              <li>
                If the data structure would exceed capacity, remove elements
                from the front of the window.
              </li>
            </ul>
            <li>
              <span class="inline-code"><code>void pop_front()</code></span>
              : remove the price from the front of the window
            </li>
            <li>
              <span class="inline-code"><code>size_t size()</code></span>
              : return the number of prices in the data structure
            </li>
            <li>
              <span class="inline-code"><code>double get()</code></span>
              : return the extremum (min or max)
            </li>
          </ul>
          <h2>solution</h2>
          <p>
            Try to solve it yourself first. The point of this exercise is to
            create the most theoretically optimal solution you can, not to
            brute-force it and move on.
          </p>
          <div class="fold">
            <h3>naïve solution</h3>
          </div>
          <div class="problem-content">
            <p>
              One can design a data structure meeting these requirements by
              simulating the operations directly with a
              <a
                target="_blank"
                href="https://en.cppreference.com/w/cpp/container/deque"
                ><span class="inline-code"
                  ><code>std::deque&lt;double&gt;</code></span
                ></a
              >.
            </p>
            <p>
              On the upside, this approach is simple to understand. Further,
              nearly all operations run in \(O(1)\) time. The exception is
              <span class="inline-code"><code>get()</code></span
              >: the minimum/maximum element must be found via a linear
              scan in \(O(n)\) time, which is far from optimal.
            </p>
            <div class="code" data-file="naive.cpp"></div>
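            <p>
              As a rough illustration of the brute-force idea (not necessarily
              the contents of <code>naive.cpp</code>), the sketch below tracks
              a maximum. The class name <code>NaiveMaxBuffer</code>, the
              constructor standing in for <code>build</code>, and the choice to
              evict the oldest price when a full buffer receives a
              <code>push_back</code> are assumptions made for the example.
            </p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical name: a brute-force sliding-window maximum.
class NaiveMaxBuffer {
public:
    // Assumption: the constructor plays the role of build(capacity).
    explicit NaiveMaxBuffer(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    void push_back(double value) {
        // Evict the oldest price first if the window is already full.
        if (window_.size() == capacity_) pop_front();
        window_.push_back(value);
    }

    void pop_front() {
        assert(!window_.empty());
        window_.pop_front();
    }

    std::size_t size() const { return window_.size(); }

    // O(n): scans every price currently in the window.
    double get() const {
        assert(!window_.empty());
        return *std::max_element(window_.begin(), window_.end());
    }

private:
    std::size_t capacity_;
    std::deque<double> window_;
};
```

            <p>
              With a capacity of 3, pushing 1, 3, 2 and then 0.5 evicts the 1,
              and <code>get()</code> still has to rescan the remaining window
              to report 3.
            </p>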
          </div>
          <h3>optimizing the approach</h3>
          <div class="problem-content">
            <p>
              Rather than bear the brunt of the work finding extrema in calls to
              <span class="inline-code"><code>get()</code></span
              >, we can distribute it across the data structure as it is built.
            </p>
            <p>
              Maintaining the prices in sorted order seems to suffice, and
              gives access to both max <i>and</i> min in \(O(1)\) time. However,
              not all of the problem constraints have been addressed. Adhering
              to the interface of a circular buffer is another challenge.
            </p>
            <p>
              Fortunately, pairing each element with a count allows intelligent
              removal/insertion of elements: if an element's count drops to
              \(0\), remove it from the list of sorted prices. A
              <a
                target="_blank"
                href="https://en.cppreference.com/w/cpp/container/map"
                >std::map</a
              >
              allows us to do all of this.
            </p>
            <p>
              Now, we can access extrema in \(O(1)\) time. Insertion and
              deletion take \(O(\log n)\) time thanks to the map, but we can do
              better.
            </p>
            <div class="code" data-file="map.cpp"></div>
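            <p>
              A minimal sketch of the counted-map idea follows; it is not
              necessarily what <code>map.cpp</code> contains. The class name
              <code>MapExtremaBuffer</code>, the separate
              <code>min()</code>/<code>max()</code> accessors, and the extra
              <code>std::deque</code> used to remember arrival order are all
              assumptions of this example.
            </p>

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>

// Hypothetical name: sorted prices with counts, plus FIFO order.
class MapExtremaBuffer {
public:
    explicit MapExtremaBuffer(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    void push_back(double value) {
        if (window_.size() == capacity_) pop_front();
        window_.push_back(value);
        ++counts_[value];  // O(log n) insertion or count bump
    }

    void pop_front() {
        assert(!window_.empty());
        auto it = counts_.find(window_.front());
        // Drop the key once no copy of this price remains in the window.
        if (--it->second == 0) counts_.erase(it);
        window_.pop_front();
    }

    std::size_t size() const { return window_.size(); }

    // The map is ordered, so both ends are available in O(1).
    double min() const {
        assert(!counts_.empty());
        return counts_.begin()->first;
    }
    double max() const {
        assert(!counts_.empty());
        return counts_.rbegin()->first;
    }

private:
    std::size_t capacity_;
    std::deque<double> window_;             // arrival order
    std::map<double, std::size_t> counts_;  // price -> occurrences
};
```

            <p>
              The count is what makes duplicates safe: popping one of two equal
              prices decrements the count rather than erasing the key.
            </p>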
          </div>
          <h3>monotonic <s>queues</s> deques</h3>
          <div class="problem-content">
            <p>
              Thinking a bit deeper about the problem constraints, it is clear
              that:
            </p>
            <ul>
              <li>
                If a new extremum is pushed onto the data structure, all
                previously pushed elements are irrelevant to any further
                operations: they leave the window before it does, so none of
                them can be the extremum again.
              </li>
            </ul>
            <p>
              Elements are processed in FIFO order, enabling this observation to
              be exploited. This is the foundational idea of the
              <a
                target="_blank"
                href="https://www.wikiwand.com/en/Monotone_priority_queue"
                >monotone priority queue</a
              >
              data structure. So, for maintaining a minimum/maximum, the data
              structure will store a monotonically increasing/decreasing
              double-ended queue.
            </p>
            <p>
              This solution does not inherently satisfy the circular buffer
              interface. If an arbitrary number of elements are removed from the
              data structure when an extremum is added, the deque alone can no
              longer maintain a window of fixed size.
            </p>
            <p>Thus, we make one more observation to meet this criterion:</p>
            <ul>
              <li>
                If each price (extremum candidate) on the monotonic double-ended
                queue also maintains a count of the
                <i>previously popped elements</i> it displaced, we can deduce
                the proper action to take when the data structure reaches
                capacity and the front of the window must be removed.
              </li>
              <ol>
                <li>
                  If elements were popped when the front extremum was added to
                  the data structure and some are still counted, decrement the
                  price's count of popped elements and do nothing else.
                </li>
                <li>
                  Otherwise, either no elements were pushed before this extremum
                  or they have all been popped. Remove (pop) this element
                  from the deque.
                </li>
              </ol>
            </ul>
            <p>
              This approach supports all operations in amortized \(O(1)\) time
              (with a monotonic sequence, each element is added and removed at
              most once; across a sequence of \(n\) operations, at most \(O(n)\)
              total \(O(1)\) steps are executed).
            </p>
            <div class="code" data-file="monotonic.cpp"></div>
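            <p>
              The two observations above can be sketched as follows for a
              maximum. This is one possible reading of the idea rather than the
              contents of <code>monotonic.cpp</code>; the names
              <code>MonotonicMaxBuffer</code>, <code>Entry</code>, and
              <code>absorbed</code> are hypothetical, as is the choice to let a
              new price absorb older prices equal to it.
            </p>

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical name: monotonic deque with per-entry popped counts.
class MonotonicMaxBuffer {
    struct Entry {
        double value;
        std::size_t absorbed;  // older prices this entry displaced
    };

public:
    explicit MonotonicMaxBuffer(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    void push_back(double value) {
        if (size_ == capacity_) pop_front();
        std::size_t absorbed = 0;
        // Older prices that are not larger can never be the maximum again;
        // remove them but remember how many window slots they occupied.
        while (!candidates_.empty() && candidates_.back().value <= value) {
            absorbed += candidates_.back().absorbed + 1;
            candidates_.pop_back();
        }
        candidates_.push_back({value, absorbed});
        ++size_;
    }

    void pop_front() {
        assert(size_ > 0);
        Entry& front = candidates_.front();
        if (front.absorbed > 0) {
            --front.absorbed;  // the oldest price was already displaced
        } else {
            candidates_.pop_front();  // the oldest price is this entry
        }
        --size_;
    }

    std::size_t size() const { return size_; }

    double get() const {
        assert(size_ > 0);
        return candidates_.front().value;
    }

private:
    std::size_t capacity_;
    std::size_t size_ = 0;  // logical window size, not deque size
    std::deque<Entry> candidates_;
};
```

            <p>
              Note that <code>size()</code> reports the logical window size,
              which is tracked separately because the deque itself may hold far
              fewer entries than the window holds prices.
            </p>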
            <h3>further improvements</h3>
            <ol>
              <li>
                While the final approach is <i>theoretically</i> faster than the
                second, with small data sets the constant-factor overhead is
                likely to offset any performance gains.
              </li>
              <li>
                The class could leverage templates to take in a comparator
                (e.g.
                <span class="inline-code"
                  ><code>std::less&lt;double&gt;</code></span
                >) to easily specify a minimum/maximum
                <span class="inline-code"
                  ><code>ExtremaCircularBuffer</code></span
                >,
                as well as a value type parameter.
              </li>
              <li>
                As it stands, the class also only maintains one of the two
                extrema, and using two monotonic deques, while still
                <i>theoretically</i> optimal, doesn't give me a good
                feeling. The second, map-based approach might be favorable here.
              </li>
            </ol>
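            <p>
              One way the templated variant suggested above might look is
              sketched below. The class name comes from the post, but the
              layout is an assumption, as is the convention that
              <code>comp(a, b)</code> returning true means <code>a</code> is
              strictly more extreme than <code>b</code> (so
              <code>std::less</code> yields a minimum buffer and
              <code>std::greater</code> a maximum buffer).
            </p>

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>

// Sketch only: comparator and value type are template parameters.
template <typename T, typename Compare = std::less<T>>
class ExtremaCircularBuffer {
    struct Entry {
        T value;
        std::size_t absorbed;  // older values this entry displaced
    };

public:
    explicit ExtremaCircularBuffer(std::size_t capacity,
                                   Compare comp = Compare())
        : capacity_(capacity), comp_(comp) {
        assert(capacity > 0);
    }

    void push_back(const T& value) {
        if (size_ == capacity_) pop_front();
        std::size_t absorbed = 0;
        // Remove back entries that are not strictly more extreme than value.
        while (!candidates_.empty() &&
               !comp_(candidates_.back().value, value)) {
            absorbed += candidates_.back().absorbed + 1;
            candidates_.pop_back();
        }
        candidates_.push_back(Entry{value, absorbed});
        ++size_;
    }

    void pop_front() {
        assert(size_ > 0);
        Entry& front = candidates_.front();
        if (front.absorbed > 0) {
            --front.absorbed;
        } else {
            candidates_.pop_front();
        }
        --size_;
    }

    std::size_t size() const { return size_; }

    const T& get() const {
        assert(size_ > 0);
        return candidates_.front().value;
    }

private:
    std::size_t capacity_;
    std::size_t size_ = 0;
    Compare comp_;
    std::deque<Entry> candidates_;
};
```

            <p>
              Under these assumptions,
              <code>ExtremaCircularBuffer&lt;double&gt;</code> tracks a minimum
              and
              <code>ExtremaCircularBuffer&lt;int, std::greater&lt;int&gt;&gt;</code>
              tracks a maximum over integers.
            </p>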
          </div>
        </article>
      </div>
    </main>
    <script src="/scripts/common.js"></script>
    <script src="/scripts/post.js"></script>
  </body>
</html>