extrema circular buffer

context

At TRB Capital Management, certain strategies required finding the minimum and maximum of a moving window of prices.

problem statement

Design a data structure supporting the following operations:

  • build(size_t capacity) : initialize the data structure with a window size of capacity
    • The data structure must always hold \(\leq\) capacity prices.
  • void push_back(double value)
    • If the push would exceed capacity, remove elements from the front of the window first.
  • void pop_front() : remove the price from the front of the window
  • size_t size() : return the number of prices in the data structure
  • double get() : return the extremum (min or max) of the prices currently in the window

solution

Try to solve it yourself first. The point of this exercise is to create the most theoretically optimal solution you can, not to brute-force it and move on.

naïve solution

One can design a data structure meeting these requirements by simulating the operations directly with a std::deque<double>.

On the upside, this approach is simple to understand, and nearly all operations run in \(O(1)\) time. The exception is get(): the minimum/maximum element must be found via a linear scan in \(O(n)\) time, which is far from optimal.
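A minimal sketch of this baseline follows, tracking the minimum only. The class name NaiveWindowMin and the choice to evict the oldest price when a push would overflow the window are assumptions made for illustration, not details taken from the original implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical name: a direct simulation of the window with std::deque.
class NaiveWindowMin {
public:
    explicit NaiveWindowMin(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // O(1): evict the oldest price first if the window is full.
    void push_back(double value) {
        if (window_.size() == capacity_) window_.pop_front();
        window_.push_back(value);
    }

    // O(1): drop the oldest price.
    void pop_front() {
        assert(!window_.empty());
        window_.pop_front();
    }

    std::size_t size() const { return window_.size(); }

    // O(n): every call rescans the whole window.
    double get() const {
        assert(!window_.empty());
        return *std::min_element(window_.begin(), window_.end());
    }

private:
    std::size_t capacity_;
    std::deque<double> window_;
};
```

Swapping std::min_element for std::max_element gives the maximum variant; either way, get() is the only operation that touches every price.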

optimizing the approach

Rather than bear the brunt of the work finding extrema in calls to get(), we can distribute it across the data structure as it is built.

Maintaining the prices in sorted order seems to suffice, and gives access to both max and min in \(O(1)\) time. However, not all of the problem constraints have been addressed. Adhering to the interface of a circular buffer, which removes prices in arrival order rather than sorted order, is another challenge.

Fortunately, pairing each distinct price with a count of its occurrences in the window allows intelligent insertion and removal: increment the count when the price is pushed, decrement it when the price is popped, and erase the price from the sorted list once its count reaches \(0\). A std::map allows us to do all of this.

Now, we can access extrema instantly. Insertion and deletion take \(O(\log n)\) time thanks to the map, but we can do better.
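The counted approach might be sketched as below. One assumption is worth flagging: the map alone does not remember arrival order, so this sketch keeps a second std::deque to know which price pop_front() should remove. The class name CountedWindowExtrema and the separate min()/max() accessors (in place of a single get()) are likewise illustrative choices.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>

// Hypothetical name: sorted prices with occurrence counts.
class CountedWindowExtrema {
public:
    explicit CountedWindowExtrema(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // O(log n): evict the oldest price if full, then count the new one.
    void push_back(double value) {
        if (order_.size() == capacity_) pop_front();
        order_.push_back(value);
        ++counts_[value];
    }

    // O(log n): decrement the oldest price's count, erasing it at zero.
    void pop_front() {
        assert(!order_.empty());
        auto it = counts_.find(order_.front());
        if (--it->second == 0) counts_.erase(it);
        order_.pop_front();
    }

    std::size_t size() const { return order_.size(); }

    // std::map iterates its keys in ascending order, so the extrema sit
    // at the two ends.
    double min() const {
        assert(!counts_.empty());
        return counts_.begin()->first;
    }
    double max() const {
        assert(!counts_.empty());
        return counts_.rbegin()->first;
    }

private:
    std::size_t capacity_;
    std::deque<double> order_;              // arrival order (assumption)
    std::map<double, std::size_t> counts_;  // price -> occurrences in window
};
```

Note that NaN prices would break the ordering std::map relies on, so this sketch assumes they never occur.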

monotonic deques

Thinking a bit deeper about the problem constraints, it is clear that:

  • If a new extremum is pushed onto the data structure, all previously pushed elements become irrelevant to the result of get(): they leave the window before the new extremum does, so none of them can be the answer again.

Elements are processed in FIFO order, enabling this observation to be exploited. This is the foundational idea of the monotone priority queue data structure. So, for maintaining a minimum/maximum, the data structure will store a monotonically increasing/decreasing double-ended queue.

This solution does not inherently satisfy the circular buffer interface. If an arbitrary number of elements are removed from the data structure when an extremum is added, the structure loses track of how many prices the window holds, so it cannot maintain a window of fixed size.

Thus, we make one more observation to meet this criterion:

  • If each price (extremum) on the monotonic double-ended queue also maintains a count of the elements that were popped from the deque when it was pushed, we can deduce the proper action to take when the oldest price must leave the window, either on pop_front() or when the data structure reaches capacity. Looking at the price at the front of the deque:
    1. If its count is positive, the oldest price in the window is one of the elements popped before this extremum was added. Decrement the price's count and leave the price on the deque.
    2. Otherwise, either no elements were pushed before this extremum or they've all been popped. Remove (pop) this element from the deque.

This approach supports all operations in amortized \(O(1)\) time: each element is added to and removed from the monotonic sequence at most once, so a sequence of \(n\) operations performs \(O(n)\) total work.
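The counting scheme described above can be sketched as follows for the minimum case. The names MonotonicWindowMin, Entry, and absorbed are hypothetical, as is the choice to let a new price absorb equal prices as well as larger ones; this is a sketch under those assumptions, not the production implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical name: sliding-window minimum via a monotonic deque.
class MonotonicWindowMin {
public:
    explicit MonotonicWindowMin(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Amortized O(1): each price is pushed once and popped at most once.
    void push_back(double value) {
        if (size_ == capacity_) pop_front();  // keep the window bounded
        std::size_t absorbed = 0;
        // Older prices that are >= value can never be the minimum again.
        // Drop them, but remember how many window slots they still occupy.
        while (!candidates_.empty() && candidates_.back().value >= value) {
            absorbed += candidates_.back().absorbed + 1;
            candidates_.pop_back();
        }
        candidates_.push_back({value, absorbed});
        ++size_;
    }

    // O(1): retire the oldest price in the window.
    void pop_front() {
        assert(size_ > 0);
        if (candidates_.front().absorbed > 0) {
            // Case 1: the oldest price was absorbed by the front entry.
            --candidates_.front().absorbed;
        } else {
            // Case 2: the front entry itself is the oldest price.
            candidates_.pop_front();
        }
        --size_;
    }

    std::size_t size() const { return size_; }

    // O(1): the front of the deque is always the window minimum.
    double get() const {
        assert(size_ > 0);
        return candidates_.front().value;
    }

private:
    struct Entry {
        double value;
        std::size_t absorbed;  // older prices this entry stands in for
    };
    std::size_t capacity_;
    std::size_t size_ = 0;
    std::deque<Entry> candidates_;
};
```

Flipping the comparison in push_back() from >= to <= yields the maximum variant. Because absorbed prices always arrived before the entry that absorbed them, the front entry's count is enough to tell whether the oldest price in the window is the front entry itself.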

further improvements

The final implementation used at TRB includes the following features:

  1. A ring buffer backed by a statically allocated std::array, as any fixed-size queue can be replaced with one
  2. A templatized value type and comparator for flexibility
  3. C++-specific optimizations (rule of 5, smart pointers, and an STL-compliant API)
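To illustrate the first two items, here is a rough sketch of a fixed-capacity ring buffer over std::array with a templated value type. Everything about it (the name RingBuffer, the member names, taking the capacity as a template parameter, and requiring T to be default-constructible) is an assumption for illustration; the actual TRB implementation is not shown in this post.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity double-ended queue with no heap allocation.
// Assumes T is default-constructible and copy-assignable.
template <typename T, std::size_t Capacity>
class RingBuffer {
    static_assert(Capacity > 0, "capacity must be positive");

public:
    void push_back(const T& value) {
        assert(size_ < Capacity);
        slots_[(head_ + size_) % Capacity] = value;  // wrap around the end
        ++size_;
    }

    void pop_front() {
        assert(size_ > 0);
        head_ = (head_ + 1) % Capacity;
        --size_;
    }

    void pop_back() {
        assert(size_ > 0);
        --size_;
    }

    T& front() {
        assert(size_ > 0);
        return slots_[head_];
    }

    T& back() {
        assert(size_ > 0);
        return slots_[(head_ + size_ - 1) % Capacity];
    }

    std::size_t size() const { return size_; }
    bool empty() const { return size_ == 0; }

private:
    std::array<T, Capacity> slots_{};
    std::size_t head_ = 0;  // index of the oldest element
    std::size_t size_ = 0;
};
```

A container like this offers the front/back operations the monotonic deque needs, and since the deque never holds more entries than the window has prices, a buffer sized to the window capacity would not overflow.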