1D1D DP: A Dynamic Programming Optimization

Introduction

1D/1D DP is a dynamic programming optimization that reduces the time complexity of a specific recurrence from \( O(N^2) \) to \( O(N \log N) \) or \( O(N) \). It bears many similarities to the Convex Hull Trick but is slightly more general.

Pre-Requisite Knowledge

This article will assume you know about the following:

Time Complexity
Prefix Sum Array
Basic Dynamic Programming
Binary Search
Deque Data Structure

You should also be familiar with some math operators. \( \displaystyle \sum_{i = 1}^{10} f(i) \) should be read as "The sum of \( f(i) \) over all values of \( i \) from \( 1 \) to \( 10 \). Similarly, \( \displaystyle \max_{i = 1}^{10} f(i) \) should be read as "The maximum of \( f(i) \) over all values of \( i \) from \( 1 \) to \( 10 \)". Also, \( \displaystyle \argmax_{i = 1}^{10} f(i) \) should be read as "The value of \( i \) that yields the maximum value of \( f(i) \), where \( i \) is between \( 1 \) and \( 10\)".

The symbol \( \exists \) means "there exists", the symbol \( \nexists \) means "there does not exist", and the symbol \( \forall \) means "for all". For example, \( \forall i \lt j, \exists k: f(i, k) \gt f(j, k) \) should be read as "For all pairs \( i \) and \( j \) such that \(i \lt j \), there exists some \( k \) such that \( f(i, k) \gt f(j, k) \)".

Finally, \( \lfloor x \rfloor \) means \( x \), rounded down, and \( \lceil x \rceil \) means \( x \), rounded up.

Motivation Problem

Problem Link: APIO '10 P1 - Commando

To summarize the problem statement: Given an array of \( N \) positive numbers, \( (N \leq 10^6) \), partition it into any number of contiguous subarrays. The value for each subarray is as follows: if \( x \) is the sum of the elements in the subarray, the value is \( Ax^2 + Bx + C \) where \( A \lt 0 \) and \( A \), \(B \), and \( C \) are given constants. The total value is the sum of values of the subarrays that the array is partitioned into. Find the maximum possible total value.

For the rest of the article, 1-index notation will be used, i.e. the indexes start from \( 1 \) and end in \( N \). The array in the problem will be referred to as \( a[] \).

Naive Solution

Let \( p[i] \) be the sum of all values of \( a[i] \) from \( 1 \) to \( i \). Note that \( sum_{k = l}^{r} a[k] = p[r] - p[l - 1] \) and that all values of \( p[i] \) for \( 1 \leq i \leq N \) can be calculated in \( O(N) \) since \( p[i] = p[i - 1] + a[i] \).

Now, let \( dp[i] \) be the answer to the problem for the first \( i \) elements of \( a[] \), that is the maximum value that can be obtained by optimally partitioning the first \( i \) elements into subarrays. To get the answer, we pick some \( j \lt i \) and use the subarray from \( j + 1 \) to \( i \), and use \( dp[j] \) to determine the optimal value for the rest of the array from \( 1 \) to \( j \).

The answer for \( dp[i] \) is then the maximum value over all choices of \( j \lt i \), which can be written as: \[ dp[i] = \max_{j = 0}^{i - 1} \left[ dp[j] + \value(p[i] - p[j]) \right] \]

Where \( \value(x) \) = \( Ax^2 + Bx + C\). This can be expanded to \[ dp[i] = \max_{j = 0}^{i - 1} \left[ dp[j] + A(p[i] - p[j])^2 + B(p[i] - p[j]) + C \right] \]

Note that \( dp[0] = 0 \) and \( p[0] = 0 \) in this implementation, and choosing \( j = 0 \) represents taking the entire current array as the only subarray. Since we need to iterate through all indexes less than \( i \) to calculate \( dp[i] \), the time complexity is \( O(N^2) \).

Below is a C++ implemenation:

#include <stdio.h>
#include <algorithm>
using namespace std;
typedef long long int lli;
const int MN = 1e6+1;
const lli INF = 1e18;
lli N, A, B, C, a[MN], p[MN], dp[MN];

int main(){
	scanf("%lld %lld %lld %lld", &N, &A, &B, &C);
	for(int i = 1; i <= N; i++){
		scanf("%lld", &a[i]);
		p[i] = p[i - 1] + a[i];
	}

	for(int i = 1; i <= N; i++){
		lli best = -INF;
		for(int j = 0; j < i; j++){
			lli val = dp[j] + A*(p[i] - p[j])*(p[i] - p[j]) + B*(p[i] - p[j]) + C;
			best = max(best, val);
		}
		dp[i] = best;
	}

	printf("%lld\n", dp[N]);
}

Since the time complexity is \( O(N^2) \) and \( N \leq 10^6 \), this code won't be able to get full points. The 1D1D optimization allows this recurrence to pass by taking advantage of some key properties.

General Form

In the motivation problem, there are \( O(N) \) states, each relying on \( O(N) \) previous states. This is where the name "1D1D" comes from: 1 dimension of states, each relying on one dimension of previous states. All 1D1D DP recurrences follow the form of

General Form

\[ dp[i] = \max_{j = 0}^{i - 1} \left[ d_j + f(i, j) \right] \]

Note that depending on the problem and implementation, \( max \) may be replaced with \( min \) and \( j = 0 \) might instead be \( j = 1 \).

The function \( f(i, j) \) represents a cost or weight function in minimization problems and a value or utility function in maximization problems. In the motivation problem, \( f(i, j) = A(p[i] - p[j])^2 + B(p[i] - p[j]) + C \).

The Quadrangle Inequality

The 1D1D optimization works only if \( f(i, j) \) satisfies a specific property called the Quadrangle Inequality. In some literature, this is called \( f(i, j) \) being "convex". There are two equivalent forms of the inequality:

Quadrangle Inequality

\ \[ \forall j \lt j' \lt i \lt i', f(i, j) + f(i', j') \geq f(i', j) + f(i, j') \]

Since the only time \( f(i, j) \) is used is when \( j \lt i \), we add the restriction that \( j \lt i \) and \( j' \lt i \).

This property looks confusing and difficult to prove given \( f(i, j) \), so we'll come back to it later. This property is important because it implies two other useful properties: the Monotonic Property and the Turning Point Property.

Monotonic Property

Recall that the general recurrence relation is \[ dp[i] = \max_{j = 1}^{i - 1} \left[ dp[j] + f(i, j) \right] \]

We define \( \opt(i) \) to be the value of \( j \) that maximizes \( dp[j] + f(i, j) \), that is, the value of \( j \) that is used to get the final value of \( dp[i] \).

We can write this in math notation as

Definition of \( \opt(i) \)

\[ \opt(i) = \argmax_{j = 1}^{i - 1} \left[ d_j + f(i, j) \right] \]

The Monotonic Property states that as \( i \) increases, \( \opt(i) \) does not decrease. This can be expressed in math notation in two equivalent ways:

Monotonic Property

\[ \forall i, \opt(i) \leq \opt(i + 1) \] or \[ \forall i \lt i', \opt(i) \leq \opt(i') \]

Turning Point Property

Conceptually, if we're trying to calculate \( dp[i] \), then all \( j \lt i \) are considered possible "candidates" for \( \opt(i) \). For some \( j \) and \( j' \), \( j \) is considered to be a "better" candidate if it yields a larger value for \( dp[i] \), that is, \( dp[j] + f(i, j) \gt dp[j'] + f(i, j') \).

The Turning Point Property states that for all \( j \lt j' \), there must exist some "turning point" index \( i \), where \( j' \) becomes a better candidate than \( j \). For all indexes less than \( i \), \( j \) is a better candidate, but for all indexes greater than or equal to \( i \), \( j' \) is a better candidate.

This turning point might be less than \( 0 \), in which case \( j' \) is always a better candidate, or more than \( N \) (past the end of the array), in which case \( j \) is always better.

In other words, the Turning Point Property states that it is never the case that for some \( j \lt j' \), \( j' \) is a better candidate at some index \( i \), but at a later index \( i' \), \( j \) is a better candidate.

This can be written in math notation as:

Turning Point Property

\[ \forall j \lt j', \nexists j \lt j' \lt i \lt i': \\ \begin{align} dp[j'] + f(i, j') & \gt dp[j] + f(i, j) \\ & \text{and} \\ dp[j] + f(i', j) & \gt dp[j'] + f(i', j) \end{align} \]

The O(N²) Technique

Since we know that \( \opt(i) \) doesn't decrease, we can keep track of \( \opt(i - 1) \) in a variable and instead of calculating \[ dp[i] = \max_{j = 0}^{i - 1} \left[ d_j + f(i, j) \right] \] we can make a small change and calculate \[ dp[i] = \max_{j = \opt(i - 1)}^{i - 1} \left[ d_j + f(i, j) \right] \] This solution is still \( O(N^2) \) and in the worst case, \( g(i) = 0 \) for all \( i \), but in practice, this will often be fast enough to pass.

Below is a C++ implementation:

#include <stdio.h>
#include <algorithm>
using namespace std;
typedef long long int lli;
const int MN = 1e6+1;
const lli INF = 1e18;
lli N, A, B, C, a[MN], p[MN], dp[MN];

int main(){
	scanf("%lld %lld %lld %lld", &N, &p, &B, &C);
	for(int i = 1; i <= N; i++){
		scanf("%lld", &a[i]);
		p[i] = p[i - 1] + a[i];
	}

	int opt = 0;
	for(int i = 1; i <= N; i++){
		lli best = -INF;
		for(int j = opt; j < i; j++){
			lli val = dp[j] + p*(p[i] - p[j])*(p[i] - p[j]) + B*(p[i] - p[j]) + C;
			if(val > best){
				best = val;
				opt = j;
			}
		}
		dp[i] = best;
	}

	printf("%lld\n", dp[N]);
}

This code is enough to get full points on the official test data for APIO Commando.

The O(N log N) Technique

Note that the turning point of two candidates \( j \lt j' \) can be determined with a binary search for the first index where \( dp[j'] + f(i, j') \geq dp[j] + f(i, j) \).

The Turning Point Property includes the idea of "candidates" for \( \opt(i) \). In this technique, we keep track of all "useful" candidates.

First, we explore a consequence of the Turning Point Property. Suppose we have three candidates for \( \opt(i) \): \( a \), \( b \), and \( c \). If the turning point of \( a \) and \( b \) is greater than the turning point of \( b \) and \( c \), it means that \( b \) becomes better than \( a \) after \( c \) becomes better than \( b \). This means that \( b \) is "useless" since it will always be worse than either \( a \) or \( c \).

Now, the algorithm emerges. We loop \( i \) from \( 1 \) to \( N \), maintaining a deque of candidates. We look at the front of deque, which contains the value of \( \opt(i) \). Now that \( \opt(i) \) is determined, \( dp[i] \) can be quickly calculated using the recurrence \( dp[i] = dp[j] + f(i, j) \) and setting \( j = \opt(i) \). Now, \( i \) becomes a candidate for future iterations of \( i \), so it needs to be added to the back of the deque. As discussed in the previous paragraph, to make sure there are no useless candidates, the candidate at the back of the deque needs to be repeatedly removed until the turning point of second-from-back and back candidates of the deque is less than the turning point of the candidate at the back of the deque and \( i \).

However, even after making this check, candidates can later become useless. Suppose we have two candidates for \( \opt(i) \): \( a \) and \( b \). Neither of these have been determined to be useless by the criteria in the previous paragraph, but the turning point of \( a \) and \( b \) is at least \( i \). That means \( b \) is better than \( a \) as a candidate and will always be for all \( i' \gt i \), so \( a \) has become useless.

Consequently, \( \opt(i) \) is not necessarily the front of the deque. To find \( \opt(i) \), at the beginning of the each iteration, the front element of the deque should be removed until the turning point of the front element and second-from-front element is greater than \( i \).

The correctness of the algorithm lies in the fact that the first check also makes sure that the turning point between adjacent elements is increasing from the front to the back of the deque. This means the front of the deque has the best candidate as long as the turning point between the front and the second-from-front is greater than \( i \), which the second check makes sure of.

A C++ implementation is given below. Note that in \( \text{turn}() \), \( j' \) is represented by the variable \( k \).

#include <stdio.h>
#include <algorithm>
#include <deque>
using namespace std;
typedef long long int lli;
const int MN = 1e6+1;
const lli INF = 1e18;
lli N, A, B, C, arr[MN], p[MN], dp[MN];

lli val(int i, int j){
	return dp[j] + A*(p[i] - p[j])*(p[i] - p[j]) + B*(p[i] - p[j]) + C;
}

int turn(int j, int k){
	int l = 1, r = N + 1, m;
	while(l < r){
		m = (l + r) / 2;
		if(val(m, k) >= val(m, j)) r = m;
		else l = m + 1;
	}
	return l;
}

int main(){
	scanf("%lld %lld %lld %lld", &N, &A, &B, &C);
	for(int i = 1; i <= N; i++){
		scanf("%lld", &arr[i]);
		p[i] = p[i - 1] + arr[i];
	}

	deque<int> opt = {0};
	for(int i = 1; i <= N; i++){
		while(opt.size() >= 2 && turn(opt[0], opt[1]) <= i)
			opt.pop_front(); 
		dp[i] = val(i, opt.front());
		while(opt.size() >= 2 && turn(opt[opt.size() - 2], opt[opt.size() - 1]) >= turn(opt[opt.size() - 1], i))
			opt.pop_back();
		opt.push_back(i);
	}

	printf("%lld\n", dp[N]);
}

The number of times that the turning point is computed is equal to the number of times any element is removed from the deque, plus two per iteration of the main loop. Each element is added and removed at most once, so it is computed an \( O(N) \) number of times. Computing a turning point takes \( O(\log N) \) time due to the binary search, so the total time complexity is \( O(N \log N) \).

The O(N) Technique

The \( O(N) \) technique is very similar, but is not guaranteed to be always easily applicable. The time complexity reduction comes from computing the turning point of two candidates in \( O(1) \) time.

To show how this is possible, we rearrange the turning point formula for candidates \( j \lt j' \). Recall that \( j' \) is at least as good \( j \) if \( dp[j] + f(i, j) \leq dp[j'] + f(i, j') \).

\[ \require{cancel} \begin{align} dp[j] + f(i, j) & \leq dp[j'] + f(i, j') \\ dp[j] + Ap[i]^2 - 2Ap[i]p[j] + Ap[j]^2 + Bp[i] - Bp[j] + C & \leq dp[j'] + Ap[i]^2 - 2Ap[i]p[j'] + Ap[j']^2 + Bp[i] - Bp[j'] + C \\ dp[j] + \cancel{Ap[i]^2} - 2Ap[i]p[j] + Ap[j]^2 + \cancel{Bp[i]} - Bp[j] + \cancel{C} & \leq dp[j'] + \cancel{Ap[i]^2} - 2Ap[i]p[j'] + Ap[j']^2 + \cancel{Bp[i]} - Bp[j'] + \cancel{C} \\ dp[j] - dp[j'] + Bp[j'] - Bp[j] & \leq 2Ap[i]p[j] - 2Ap[i]p[j'] \\ dp[j] - dp[j'] + B(p[j'] - Bp[j]) & \leq 2Ap[i](p[j] - p[j']) \end{align} \] Note that since \( a[i] \) is positive, \( p[i] \) must be increasing as \( i \) increases, so \( p[j] - p[j'] \) must be negative, and it is given in the problem statement that \( A \) is negative, so \( 2A(p[j] - p[j']) \) is positive, and we can move it to the other side without flipping the inequality sign. \[ \require{cancel} \begin{align} dp[j] - dp[j'] + B(p[j'] - Bp[j]) & \leq 2Ap[i](p[j] - p[j']) \\\\ \frac{dp[j] - dp[j'] + B(p[j'] - Bp[j])}{2A(p[j] - p[j'])} & \leq p[i] \end{align} \]

Now, we can say that for some \( j \lt j' \), \( j' \) at least as good as \( j \) when \( p[i] \) greater than or equal to \( \frac{dp[j] - dp[j'] + B(p[j'] - Bp[j])}{2A(p[j] - p[j'])} \). This turning point function is in terms of \( p[i] \) instead of \( i \), so the code should be modified slightly.

The algorithm for the first check remains the exact same. Note that in the first check, we just need to compare two turning points for which is greater. Since \( p[i] \) is increasing as \( i \) increases, the relative position of turning point doesn't change if it in terms of \( p[i] \) instead of \( i \).

For the second check, the turning point just needs to be compared against \( p[i] \) instead of \( i \).

#include <stdio.h>
#include <algorithm>
#include <deque>
using namespace std;
typedef long long int lli;
const int MN = 1e6+1;
const lli INF = 1e18;
lli N, A, B, C, a[MN], p[MN], dp[MN];

double turn(int j, int k){
	return
	(dp[j] - dp[k] + A*(p[j]*p[j] - p[k]*p[k]) + B*(p[k] - p[j])) /
	(2.0 * A * (p[j] - p[k]));
}

int main(){
	scanf("%lld %lld %lld %lld", &N, &A, &B, &C);
	for(int i = 1; i <= N; i++){
		scanf("%lld", &a[i]);
		p[i] = p[i - 1] + a[i];
	}

	deque<int> opt = {0};
	for(int i = 1; i <= N; i++){
		while(opt.size() >= 2 && turn(opt[0], opt[1]) <= p[i])
			opt.pop_front(); 
		int j = opt.front();
		dp[i] = dp[j] + A*(p[i] - p[j])*(p[i] - p[j]) + B*(p[i] - p[j]) + C;
		while(opt.size() >= 2 && turn(opt[opt.size() - 2], opt[opt.size() - 1]) >= turn(opt[opt.size() - 1], i))
			opt.pop_back();
		opt.push_back(i);
	}

	printf("%lld\n", dp[N]);
}

Recognizing the Quadrangle Inequality

To illustrate how we can prove the Quadrangle Inequality for a recurrence, here is an example of an algebraic proof for the motivation problem, APIO Commando.

While it is possible to write out, expand, and factor terms by hand on paper, it might be more viable, especially during contest, to abandon trying to prove the Quadrangle Inequality. Instead, the naive \( O(N^2) \) solution can be coded up, (not the optimzied version discussed in this article), and if the values of \( \opt(i) \) can be printed out. If \( \opt(i) \) does not decrease for several small test inputs, the Quandrangle Inequality is probably satisfied.

Implementation Details

Avoiding Floating-Point Numbers

To avoid floating points in the previous code example, notice that \( p[i] \) must be greater than or equal to some fractional value for \( j' \) to be at least as good as \( j \), it's enough to check that \( p[i] \) is greater than the fraction, rounded up.

We take advantage of the property that \[ \left\lceil \frac{a}{b} \right\rceil = \begin{cases} \left\lfloor \frac{a}{b} \right\rfloor, & \text{ if } b | a \\ \left\lfloor \frac{a}{b} \right\rfloor + 1, & \text{ otherwise} \end{cases} \] Where \( b | a \) means "\( b \) divides \( a \)" or "\( a \) is divisible by \( b \)". Only the turning point function needs to be modified slightly:

int ceildiv(lli a, lli b){
	return a/b + ((a % b) > 0);
}

int turn(int j, int k){
	return ceildiv(
		dp[j] - dp[k] + A*(p[j]*p[j] - p[k]*p[k]) + B*(p[k] - p[j]),
		2 * A * (p[j] - p[k])
	);
}

Since the modulo operator is expensive, this performs slightly worse than the floating-point version, but has the advantage of eliminating floating-point errors.

Two-Pointers Implemenation

Since the only the back is pushed to, a built-in deque is not required. Only two pointers, one for the back and one for the front, and an array is required. The size of the implicit deque is given by \( \text{back} - \text{front} + 1 \). This method is slightly faster than using a built-in deque. We don't actually use pointers; two can use two integers that keep track of the array indexes.

#include <stdio.h>
#include <algorithm>
#include <deque>
using namespace std;
typedef long long int lli;
const int MN = 1e6+1;
const lli INF = 1e18;
lli N, A, B, C, a[MN], p[MN], dp[MN];
int opt[MN], front, back;

double turn(int j, int k){
	return
	(dp[j] - dp[k] + A*(p[j]*p[j] - p[k]*p[k]) + B*(p[k] - p[j])) /
	(2.0 * A * (p[j] - p[k]));
}

int main(){
	scanf("%lld %lld %lld %lld", &N, &A, &B, &C);
	for(int i = 1; i <= N; i++){
		scanf("%lld", &a[i]);
		p[i] = p[i - 1] + a[i];
	}

	for(int i = 1; i <= N; i++){
		while(back - front >= 1 && turn(opt[front], opt[front + 1]) <= p[i])
			++front;
		int j = opt[front];
		dp[i] = dp[j] + A*(p[i] - p[j])*(p[i] - p[j]) + B*(p[i] - p[j]) + C;
		while(back - front >= 1 && turn(opt[back - 1], opt[back]) >= turn(opt[back], i))
			--back;
		opt[++back] = i;
	}

	printf("%lld\n", dp[N]);
}

Most built-in deques are very fast, so this implementation offers very little improvement in performance. However, it makes Line 29 easier to implement and less likely to mess up.

Minimization Problems

The general form of minimization 1D1D problems is, as expected,

\[ dp[i] = \min_{j = 0}^{i - 1} \left[ dp[j] + f(i, j) \right] \]

The Quadrangle Inequality is the same except for the flipped sign:

\[ f(i, j) + f(i', j') \leq f(i', j) + f(i, j') \]

The Monotonic Property is the exact same, and the Turning Point Property is the same, except a candidate is "better" for \( opt(i) \) if it yields a lower value for \( dp[i] \) since we wish to minimize \( dp[i] \).

In the \( O(N \log N) \) implementation, only the condition in the binary search needs to be changed. The \( O(N) \) algorithm is also the same, except the inequality for the turning point is flipped.

Additional Reading

This marks the end of main article. Thank you for reading!

You can read a proof that the Quadrangle Inequality implies the Monotonic Property and the Turning Point Property here.

This article is heavily based off of two sources:

This blog post by the University of British Columbia ACM-ICPC Team
This Codeforces comment by WJMZBMR

If you prefer to read formal papers, you can also check out the ones mentioned at the end of this Codeforces blog.