barrettruth.com/posts/trading/extrema-circular-buffer.html

225 lines
9 KiB
HTML

<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link rel="stylesheet" href="/styles/common.css" />
<link rel="stylesheet" href="/styles/post.css" />
<link rel="icon" type="image/webp" href="/public/logo.webp" />
<link href="/public/prism/prism.css" rel="stylesheet" />
<link href="/public/prism/prism-theme.css" rel="stylesheet" />
<script defer src="/public/prism/prism.js"></script>
<script
src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"
async
></script>
<title>Barrett Ruth</title>
</head>
<body class="graph-background">
<header>
<a
href="/"
style="text-decoration: none; color: inherit"
onclick="goHome(event)"
>
<div class="terminal-container">
<span class="terminal-prompt">barrett@ruth:~$ /algorithms</span>
<span class="terminal-cursor"></span>
</div>
</a>
</header>
<main class="main">
<div class="post-container">
<header class="post-header">
<h1 class="post-title">extrema circular buffer</h1>
<p class="post-meta">
<time datetime="2024-07-30">30/07/2024</time>
</p>
</header>
<article class="post-article">
<h2>context</h2>
<div>
<p>
While working for
<a href="https://trbcap.com/">TRB Capital Management</a>, certain
strategies necessitated finding the minimum and maximum of a
moving window of prices.
</p>
</div>
<h2>problem statement</h2>
<p>Design a data structure supporting the following operations:</p>
<ul>
<li>
<span><code>build(size_t capacity)</code></span>
: initialize the data structure with capacity/window size
<span><code>capacity</code></span>
</li>
<ul>
<li>
The data structure must always hold \(\leq\)
<span><code>capacity</code></span>
prices.
</li>
</ul>
<li>
<span><code>void push_back(double value)</code></span>
</li>
<ul>
<li>
If the data structure exceeds capacity, remove elements from the
front of the window.
</li>
</ul>
<li>
<span><code>void pop_front()</code></span>
: remove the price from the front of the window
</li>
<li>
<span><code>size_t size()</code></span>
: return the number of prices in the data structure
</li>
<li>
<span><code>double get()</code></span>
: return the extrema (min or max)
</li>
</ul>
<h2>solution</h2>
<p>
Try to solve it yourself first. The point of this exercise it to
create the most theoretically optimal solution you can, not
brute-force and move on.
</p>
<div class="fold">
<h3>na&iuml;ve solution</h3>
</div>
<div class="problem-content">
<p>
One can design a data structure meeting these requirements through
simulating the operations directly with a
<a
target="blank"
href="https://en.cppreference.com/w/cpp/container/deque"
><span><code>std::deque&lt;double&gt;</code></span></a
>.
</p>
<p>
On the upside, this approach is simple to understand. Further,
operations are all \(O(1)\) time&mdash;that is, nearly all
operations. The minimum/maximum element must be found via a linear
scan in \(O(n)\) time, certainly far from optimal.
</p>
<div class="code" data-file="naive.cpp"></div>
</div>
<h3>optimizing the approach</h3>
<div class="problem-content">
<p>
Rather than bear the brunt of the work finding extrema in calls to
<span><code>get()</code></span
>, we can distribute it across the data structure as it is built.
</p>
<p>
Maintaining the prices in a sorted order seems to suffice, and
gives access to both max <i>and</i> min in \(O(1)\) time. However,
all of the problem constraints have not been addressed. Adhering
to the interface of a circular buffer is another challenge.
</p>
<p>
Fortunately, pairing each element with a count allows intelligent
removal/insertion of elements&mdash;if an element has a count of
\(0\), remove it from the list of sorted prices. A
<a
target="blank"
href="https://en.cppreference.com/w/cpp/container/map"
>std::map</a
>
allows us to do all of this.
</p>
<p>
Now, we can access extrema instantly. Insertion and deletion take
\(O(log(n))\) time thanks to the map&mdash;but we can do better.
</p>
<div class="code" data-file="map.cpp"></div>
</div>
<h3>monotonic <s>queues</s> deques</h3>
<div class="problem-content">
<p>
Thinking a bit deeper about the problem constraints, it is clear
that:
</p>
<ul>
<li>
If an extrema is pushed onto the data structure, all previously
pushed elements are irrelevant to any further operations.
</li>
</ul>
<p>
Elements are processed in FIFO order, enabling this observation to
be exploited. This is the foundationl idea of the
<a
target="blank"
href="https://www.wikiwand.com/en/Monotone_priority_queue"
>monotone priority queue</a
>
data structure. So, for maintaining a minimum/maximum, the data
structure will store a monotonically increasing/decreasing
double-ended queue.
</p>
<p>
This solution does not satisfy a circular buffer inherently. If an
arbitrary number of elements are removed from the data structure
when an extrema is added, it is certainly not possible to maintain
a window of fixed size.
</p>
<p>Thus, we make one more observation to meet this criterion:</p>
<ul>
<li>
If each price (extrema) on the monotonic double-ended queue also
maintains a count of <i>previously popped elements</i>, we can
deduce the proper action to take when the data structure reaches
capacity.
</li>
<ol>
<li>
If elements were previously popped before this extrema was
added to the data structure, decrement the price&apos;s count
of popped elements and do nothing.
</li>
<li>
Otherwise, either no elements were pushed before this extrema
or they&apos;ve all been popped. Remove (pop) this element
from the deque.
</li>
</ol>
</ul>
<p>
This approach supports all operations in amortized \(O(1)\) time
(with a monotonic sequence, elements are added or removed at least
once; across a sequence of \(n\) operations, \(n\) total \(O(1)\)
operations will be executed).
</p>
<div class="code" data-file="monotonic.cpp"></div>
<h3>further improvements</h3>
<ol>
<li>
The class could leverage templates to take in a comparator
<span><code>std::less&lt;double&gt;</code></span>
) to easily specify a minimum/maximum
<span><code>ExtremaCircularBuffer</code></span>
as well as a value type to support all operations.
</li>
<li>
As it stands, the class also only maintains one of either
extrema, and using two monotonic deques, while still
<i>theoretically</i> optimal, doesn&apos;t give me a good
feeling. The second map-based approach might be favorable here.
</li>
</ol>
</div>
</article>
</div>
</main>
<script src="/scripts/common.js"></script>
<script src="/scripts/post.js"></script>
</body>
</html>