Daniel Volya

Random Matrix Theory, Portfolio Optimization, and Quantum Benchmarking

in science and tagged finance and quantum

Today I want to showcase random matrix theory, the Marchenko-Pastur distribution, and its applications, from portfolio optimization to quantum benchmarking.

Random Matrix Theory

One interesting topic in random matrix theory is understanding the eigenvalues of a (large) random matrix. A notable example comes from Eugene Wigner, who suggested that the gaps between lines in the spectrum of a heavy atomic nucleus should be similar to the gaps between the eigenvalues of a random matrix:

Perhaps I am now too courageous when I try to guess the distribution of the distances between successive levels (of energies of heavy nuclei). Theoretically, the situation is quite simple if one attacks the problem in a simpleminded fashion. The question is simply what are the distances of the characteristic values of a symmetric matrix with random coefficients.

-- Eugene Wigner, Results and theory of resonance absorption

Montgomery-Odlyzko Law, the Riemann Zeta Function, and Primes

A slight tangent, but it has been conjectured that the zeros of the Riemann zeta function $$\zeta(z) = \sum_{n=1}^\infty \frac{1}{n^z}$$ are related to the eigenvalues of Hermitian matrices. In particular, there is a relationship between the normalized spacings of the zeros of the Riemann zeta function and the normalized spacings of eigenvalues from the Gaussian Unitary Ensemble (GUE).

Normalized spacings of Riemann zeta function zeros and of GUE eigenvalues.
using LinearAlgebra
using Random
using Statistics
using Plots

function normalized_eigenvalue_spacings(n::Int, num_samples::Int)
    spacings = Float64[]

    for _ in 1:num_samples
        # Draw a GUE matrix: complex Gaussian entries, then Hermitize
        H = randn(ComplexF64, n, n)
        H = (H + adjoint(H)) / 2
        eigenvalues = eigvals(H)
        sorted_eigenvalues = sort(eigenvalues)
        current_spacings = diff(sorted_eigenvalues)
        normalized_spacings = current_spacings / mean(current_spacings)
        append!(spacings, normalized_spacings)
    end
    return spacings
end

n = 100  # Size of the matrix
num_samples = 1000  # Number of samples
gue_spacings = normalized_eigenvalue_spacings(n, num_samples)

# Read the Riemann zeta zeros from the file
# https://www-users.cse.umn.edu/~odlyzko/zeta_tables/index.html
zeta_zeros = map(x -> parse(Float64, x), readlines("zeros3"))
zeta_spacings = diff(zeta_zeros)
normalized_zeta_spacings = zeta_spacings / mean(zeta_spacings)

# Plot the histogram of both distributions
stephist(gue_spacings, bins=50, normalize=true, label="GUE Eigenvalues",
          xlabel="Normalized Spacing", ylabel="Density", title="Comparison of Spacings")
stephist!(normalized_zeta_spacings, bins=50, normalize=true, label="Riemann Zeta Zeros")
savefig("spacings.svg")

This is pretty cool. It tells us that there is a fundamental relationship between physics and number theory, i.e., a connection to prime numbers.

Marchenko-Pastur Distribution

For the rest of this post, we will look at a particular type of random matrix called a Wishart matrix. It is a square matrix of the form $$W = XX^\dagger$$ where $X$ is an $N \times T$ matrix whose entries are i.i.d. standard normal random variables. Note that for real-valued matrices $X^\dagger = X^T$.
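As a quick sanity check, here is a minimal Julia sketch (the sizes are arbitrary, illustrative choices) that constructs a Wishart matrix from a random data matrix and confirms it is symmetric and positive semi-definite:

using LinearAlgebra, Random

N, T = 100, 400
X = randn(N, T)      # i.i.d. standard normal entries
W = X * X'           # Wishart matrix W = X Xᵀ (real case, so X† = Xᵀ)

@assert issymmetric(W)
@assert minimum(eigvals(Symmetric(W))) > -1e-8   # positive semi-definite, up to rounding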

In the case that $T \ge N$ and $N$ is large, the eigenvalues of the sample covariance matrix $W/T$ are approximately distributed according to the Marchenko-Pastur distribution: $$ d\nu(x) = \frac{1}{2\pi \sigma^2}\frac{\sqrt{(\lambda_+ - x)(x-\lambda_-)}}{\lambda x}\,dx $$ where $\lambda = N/T$ and $\lambda_\pm = \sigma^2 \left(1 \pm \sqrt{\lambda}\right)^2$. A description and precise theorem may be found here.

Empirical and theoretical Marchenko-Pastur distribution.
function mp(x, Q, σ=1)
    y = 1 / Q
    b = (σ * (1 + sqrt(1 / Q)))^2  # Largest eigenvalue
    a = (σ * (1 - sqrt(1 / Q)))^2  # Smallest eigenvalue

    if x > b || x < a
        return 0
    else
        return (1 / (2 * π * σ^2 * x * y)) * sqrt((b - x) * (x - a))
    end
end
N = 500
T = 1000
histogram(eigvals(Hermitian(cor(randn(T, N)))), normalize=true, bins=50, label="Empirical")

x = range(1e-3, 3.5; length=200)  # plotting grid covering the Marchenko-Pastur support
plot!(x, mp.(x, T/N), linewidth=3, label="Marchenko-Pastur")
savefig("mp.svg")

Portfolio Optimization

The question we explore here is: what is the best weighted-selection of stocks for us to invest in?

History of some stock prices

Alan Edelman in one of his talks mentioned that some students who attended his class on random matrix theory at MIT dropped out and started a hedge fund. They’re apparently doing pretty well today. So maybe there is something to random matrix theory and the stock market...

Correlation matrices are Wishart matrices

A correlation matrix is an example of a Wishart matrix if the data matrix $X$, containing $T$ data points of $N$ variables, has i.i.d. normal entries.

While it's not always the case, it is often assumed that the logarithm of daily stock returns follows a normal distribution.

This assumption works well enough that much of modern financial theory has been developed based on it. Even today, the fundamental ideas still apply, although the normal distribution is sometimes replaced with a more skewed one. As a result, the correlation matrix of the log returns is what we will look at.

The daily log returns for a handful of stocks. Looks kinda like noise.
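As a small illustration (with a hypothetical price series; the full pipeline below uses TimeSeries.jl's percentchange instead), daily log returns are just differences of log prices:

# Hypothetical daily closing prices for a single stock
prices = [100.0, 101.5, 100.8, 102.3, 103.0]

# Daily log returns: r_t = log(p_t / p_{t-1})
log_returns = diff(log.(prices))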

Correlation filtering

The concept of correlation filtering involves the following steps:

  1. Calculate the eigenvalues of the correlation matrix.
  2. If the eigenvalues fall within the theoretical range specified by the Marchenko-Pastur distribution, set those eigenvalues to zero, then reconstruct the matrix.

Eigenvalues within this range are considered to represent randomness and are therefore discarded. Let's go through an example. Due to the requirement that both $N$ and $T$ be large, we will use a pre-downloaded dataset of stock prices. Here I use 100 randomly selected stocks listed in the S&P 500 index and look at their 8-year history.

using CSV, DataFrames, TimeSeries, Random, Statistics, LinearAlgebra

training_period = 400
n = 100

df = CSV.File("stock_prices.csv") |> DataFrame

# Keep the Date column plus n randomly chosen stock columns
random_columns = append!([1], 1 .+ randperm(length(names(df)) - 1)[1:n])
df = df[:, random_columns]

ta = TimeArray(df, timestamp=:Date)
dr = percentchange(ta)
dr = log.(dr .+ 1)   # daily log returns

returns = values(dr)
in_samples = returns[1:(size(returns)[1] - training_period), :]

corr = cor(in_samples)
variances = diag(cov(in_samples))
standard_devs = sqrt.(variances)

T, N = size(in_samples) # (1756, 100)

After constructing the correlation matrix (size $100\times 100$) I look at the eigenvalues.

Histogram of correlation-matrix eigenvalues, together with the Marchenko-Pastur distribution using $T/N$ and a fitted version.

It appears that our assumptions are mostly valid when there is no temporal structure! Therefore, any deviations suggest a non-random pattern in the data. However, while the empirical data seems to follow the Marchenko-Pastur distribution, the theoretical curve using the raw $T/N$ does not fit so nicely. Part of this deviation can be explained by sampling error. So, as suggested by Merrill Lynch, we can tune our $\lambda$ and $\sigma$ to get a better fit.

using StatsBase, Optim

e = eigvals(Hermitian(corr))       # eigenvalues of the correlation matrix
x_min, x_max = 0.0, maximum(e)     # plotting range (the MP density vanishes beyond λ₊)

histogram(e, normalize=:pdf, bins=25, ylabel="Density", xlabel="Eigenvalues", label="Empirical")
x = range(x_min, x_max; length=100) |> Array
plot!(x, mp.(x, T/N), c=:red, label="Original", linewidth=3)

function objective(params)
    Q = params[1]
    σ = params[2]

    hist_edges = LinRange(minimum(e), maximum(e), 51)
    hist_values = normalize(fit(Histogram, e, hist_edges), mode=:pdf).weights

    mp_values = [mp(edge, Q, σ) for edge in hist_edges[1:end-1]]

    return sum((hist_values - mp_values).^2)
end


# Constraints: Q > 0 and σ > 0
lower_bounds = [0.0, 0.0]
upper_bounds = [Inf, Inf]

# Optimize to find the best parameters using constrained optimization
result = optimize(objective, lower_bounds, upper_bounds, [T/N, 1], Fminbox(BFGS()))

# result = optimize(objective, [T/N, 1], BFGS())
fitted_params = result.minimizer

plot!(x, mp.(x, fitted_params[1], fitted_params[2]), label="Fitted", linewidth=3, c=:black)

With the fitted parameters, we now remove the eigenvalues that fall below the fitted upper Marchenko-Pastur edge $\lambda_+$ and then reconstruct the matrix.

Correlation matrix before and after filtering
e, v = eigen(Hermitian(corr))

σ = fitted_params[2]
Q = fitted_params[1]
max_eig = (σ * (1 + sqrt(1/Q)))^2   # fitted upper Marchenko-Pastur edge λ₊

e[e .> max_eig]   # inspect the eigenvalues above the edge (these carry real structure)

# Eigenvalues at or below λ₊ are treated as noise and zeroed out
e[e .<= max_eig] .= 0

# Reconstruct the correlation matrix and restore the unit diagonal
filtered = v * diagm(e) * v'
filtered[diagind(filtered)] .= 1

plot(heatmap(corr, yflip=true, aspect_ratio=:equal, legend=false, axis=false, grid=false, title="Original"),
     heatmap(filtered, yflip=true, aspect_ratio=:equal, axis=false, legend=false, grid=false, title="Filtered"))

Minimum Variance Portfolio

We will perform a brief in-sample and out-of-sample comparison between the minimum variance portfolios derived from the filtered and standard covariance matrices. Let $ D = \text{diag}(\Sigma) $, which is the matrix where $ D_{i,i} = \Sigma_{i,i} $ and $ D_{i,j} = 0 $ for $ i \neq j $. We will use the relationship between the correlation matrix $ C $ and the covariance matrix $\Sigma $, given by the formula

$$ \Sigma = D^{1/2} C D^{1/2}. $$

By substituting $ C $ with the filtered correlation matrix $ \hat{C} $, we obtain the filtered covariance matrix $ \hat{\Sigma} $.
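For reference, the minimum variance weights computed in the code below come from minimizing the portfolio variance $w^\top \Sigma w$ subject to the budget constraint $w^\top \mathbf{1} = 1$, which gives

$$ w = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}^\top \Sigma^{-1}\mathbf{1}}, $$

and the same formula with $\hat{\Sigma}$ in place of $\Sigma$ yields the filtered weights.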

Variance and filtered variance
# Minimum variance weights: w = Σ⁻¹1 / (1ᵀ Σ⁻¹ 1)
cov_mat = cov(in_samples)
inv_cov = pinv(cov_mat)
o = ones(size(inv_cov)[1])
inv_dot_ones = inv_cov * o
min_var_weights = inv_dot_ones / (inv_dot_ones' * o)

bar(min_var_weights, linecolor=nothing, label="Original Variance")

# Filtered covariance: Σ̂ = D^{1/2} Ĉ D^{1/2}
filtered_cov = diagm(standard_devs) * filtered * diagm(standard_devs)
filt_inv_cov = pinv(filtered_cov)
o = ones(size(filt_inv_cov)[1])
inv_dot_ones = filt_inv_cov * o
filt_min_var_weights = inv_dot_ones / (inv_dot_ones' * o)

bar!(filt_min_var_weights, linecolor=nothing, label="Filtered Variance", xlabel="Stock")

Now, let's plot their return over time. Since both contain short sales (negative weights), we're going to remove the short positions and redistribute their weight.

Comparing the filtered minimum variance portfolio.

Naturally, the covariance matrix changes over time, whether filtered or not, and the filtering itself evolves. Since there is no free lunch, there may be instances where the unfiltered portfolio outperforms the filtered one, but this is likely due to random chance. In this case, we were able to outperform the S&P500.

function get_cumulative_returns_over_time(sample, weights)
    # Ignore short sales: zero out negative weights and renormalize
    weights = copy(weights)
    weights[weights .<= 0] .= 0
    weights ./= sum(weights)

    # Cumulative return of each stock, then the weighted portfolio return
    cumulative_returns = cumprod(1 .+ sample, dims=1) .- 1
    return cumulative_returns * weights
end

cumulative_returns = get_cumulative_returns_over_time(returns, min_var_weights)
cumulative_returns_filt = get_cumulative_returns_over_time(returns, filt_min_var_weights)


in_sample_ind = 1:(size(returns, 1) - training_period)
out_sample_ind = (size(returns, 1) - training_period):(size(returns, 1) - 1)

p1 = plot(cumulative_returns[in_sample_ind], c=:black, xlabel="Time", label="In-sample", title="Minimum Variance Portfolio")
plot!(p1, out_sample_ind, cumulative_returns[out_sample_ind], c=:red, label="Out-sample")

p2 = plot(cumulative_returns_filt[in_sample_ind], c=:black, xlabel="Time", label="In-sample", title="Filtered Minimum Variance Portfolio")
plot!(p2, out_sample_ind, cumulative_returns_filt[out_sample_ind], c=:red, label="Out-sample")


p3 = plot(cumulative_returns, label="Original")
plot!(cumulative_returns_filt, label="Filtered")

plot(p1,p2,p3, layout=(3,1))

Quantum Benchmarking

This section is based on my recent research on random dynamical quantum maps.

Quantum benchmarking is the process of evaluating and comparing the performance of quantum devices or algorithms. It involves a series of tests or protocols designed to measure various aspects of a quantum system's behavior, such as its fidelity, error rates, coherence times, and gate operations. The goal of quantum benchmarking is to assess the quality and reliability of quantum computations, identify sources of noise or errors, and ultimately guide improvements in quantum hardware and algorithms. Here we will introduce a quantum benchmarking protocol based on the Marchenko-Pastur distribution.

Random Quantum States

Suppose that instead of taking the list of stock prices, we instead look at quantum states $\ket{\psi}$. In particular, we are interested in random quantum states. A good way to think about random quantum states is to look at the random unitary operators that transform states. Unitary operators are nice, in the sense that they form a group (the composition of two unitaries $U_1U_2$ is also unitary). Hence it is easier to mathematically prescribe what randomness means. So, we will write a random state as a random unitary (sampled from the Haar measure) applied to an initial known state $\ket{0}$: $$ \ket{\psi_{\text{random}}} = U_\text{Haar}\ket{0}. $$
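As a minimal numerical sketch (the dimension is an arbitrary illustration), a Haar-random unitary can be sampled with the standard QR-based recipe and applied to $\ket{0}$:

using LinearAlgebra, Random

# Haar-random unitary via QR decomposition of a complex Ginibre matrix
function haar_unitary(d)
    Z = randn(ComplexF64, d, d)
    Q, R = qr(Z)
    return Matrix(Q) * Diagonal(sign.(diag(R)))  # fix the phases so Q is Haar-distributed
end

d = 8                                   # e.g. a three-qubit system
ψ0 = zeros(ComplexF64, d); ψ0[1] = 1    # |0⟩
ψ_random = haar_unitary(d) * ψ0         # |ψ_random⟩ = U_Haar |0⟩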

Now suppose that we actually have two quantum systems that have been joined together: the system ($S$) and the environment ($E$). Initially, the system and environment are in a known, separable quantum state, so that we can now write:

$$ \ket{\psi_{\text{random}}} = U_\text{Haar}(\ket{0_S} \otimes \ket{0_E}). $$

The random unitary entangles together the system and the environment to give us a global state $\ket{\psi_{\text{random}}}$.

Density matrices are Wishart matrices

In reality, we do not have knowledge about the environment $E$. This in turn means we have uncertainty about what the state of our system $S$ is (remember that our system is entangled with the environment!). It is convenient to express uncertainty about a quantum system via the density matrix representation. So we re-write the above equation:

$$ \rho_{\text{random}} = U_\text{Haar}(\ket{0_S}\bra{0_S} \otimes \ket{0_E}\bra{0_E} )U_\text{Haar}^\dagger. $$

Mathematically, to determine what the state of our system $S$ is, we use the partial trace, $$ \rho_S^\prime = \mathrm{Tr_E}\left[\rho_\mathrm{random}\right] $$ which encodes all the uncertainty introduced by the environment. The density matrix is a positive semi-definite, Hermitian matrix (this may sound similar to what a correlation matrix is!) And similar to the stock price example from before, we have two variables: the dimension of the system and the dimension of the environment. The eigenvalues of the random density matrix will follow the Marchenko-Pastur distribution with respect to these variables. We will use this property to test the performance of a quantum computer.
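A minimal numerical sketch of this claim, reusing the haar_unitary helper from above (the system and environment dimensions are arbitrary illustrative choices, and the partial trace is done by reshaping the global state vector):

# System of dimension N coupled to an environment of dimension r
N, r = 32, 8
ψ = haar_unitary(N * r)[:, 1]    # U_Haar (|0_S⟩ ⊗ |0_E⟩) is the first column of U_Haar
M = reshape(ψ, N, r)             # arrange the amplitudes as an N×r matrix
ρ_S = M * M'                     # partial trace over the r-dimensional factor

eigs = eigvals(Hermitian(ρ_S))   # should approximately follow Marchenko-Pastur for large N, r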

Random Dynamical Quantum Maps

The Marchenko-Pastur distribution is of course defined in the limit of large dimensions, which may be hard to realize if we want to run the benchmark on a quantum computer. Instead, what we can do is define a small environment and repeatedly apply the same random unitary while introducing a fresh environment each time. On a quantum computer we can achieve this either by introducing more fresh qubits, or by resetting a qubit to a known pure state. We will utilize resets, which, as a result, create an effective larger environment.

Repeating the same quantum operation

The process of applying a unitary and then performing a partial trace can be viewed as a quantum map $\mathcal{E}$ which takes an input density matrix and maps it to a new density matrix:

$$ \mathcal{E}(\rho) = \mathrm{Tr}_\mathrm{env}\left[U(\rho \otimes \rho_\mathrm{env})U^\dagger\right] = \sum_{i=1}^r \bra{e_i} U [\rho \otimes \ket{e_0}\bra{e_0}]U^\dagger \ket{e_i} = \sum_{i=1}^r K_i \rho K_i^\dagger $$

where tracing over $r$ environmental degrees-of-freedom is equivalent to the Kraus representation of rank $r$.

The quantum operation is implemented by introducing fresh environmental qubits and applying the same random unitary.
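Here is a hedged sketch of this construction, again reusing haar_unitary from above: the Kraus operators are sliced out of one fixed random unitary (assuming a system-first tensor ordering, which is an assumption of this sketch) and the map is iterated toward its steady state. The dimensions and iteration count are illustrative only.

# Fixed random unitary on system (dim N) ⊗ environment (dim r)
N, r = 8, 2
U = haar_unitary(N * r)

# Kraus operators K_i = (I_N ⊗ ⟨e_i|) U (I_N ⊗ |e_0⟩), assuming system-first ordering
kraus = [[U[(s1-1)*r + i, (s2-1)*r + 1] for s1 in 1:N, s2 in 1:N] for i in 1:r]

# Iterate E(ρ) = Σ_i K_i ρ K_i†, starting from |0⟩⟨0|, until (approximately) stationary
function iterate_map(kraus, N; steps=50)
    ρ = zeros(ComplexF64, N, N); ρ[1, 1] = 1
    for _ in 1:steps
        ρ = sum(K * ρ * K' for K in kraus)
    end
    return ρ
end

ρ_ss = iterate_map(kraus, N)
eigvals(Hermitian(ρ_ss))   # spectrum to compare against the Marchenko-Pastur prediction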

On a quantum computer, applying a global random unitary is actually a difficult problem. In fact, implementing a truly random global unitary requires resources that scale exponentially. Instead, what we may do is approximate the unitary: with each depth $d$ of pair-wise random unitaries, we construct a global unitary whose $d$-th moment matches that of the truly random unitary distribution.
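One common way to build such an approximation is a layered ("brickwork") circuit of independent two-qubit Haar-random gates. The sketch below, reusing haar_unitary, is an illustrative construction under that assumption rather than the exact circuit used in the paper:

# One brickwork layer of two-qubit Haar gates; offset = 0 or 1 shifts the pairing
function brick_layer(nqubits, offset)
    U = Matrix{ComplexF64}(I, 1, 1)
    q = 1
    if offset == 1
        U = kron(U, Matrix{ComplexF64}(I, 2, 2))   # first qubit idles this layer
        q = 2
    end
    while q + 1 <= nqubits
        U = kron(U, haar_unitary(4))               # Haar-random gate on qubits (q, q+1)
        q += 2
    end
    if q == nqubits
        U = kron(U, Matrix{ComplexF64}(I, 2, 2))   # last qubit idles this layer
    end
    return U
end

# Depth-d circuit: alternate the two pairings layer by layer
approx_random_unitary(nqubits, d) = prod(brick_layer(nqubits, mod(l, 2)) for l in 1:d)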

Marchenko-Pastur Distribution

The distribution of the eigenvalues of the system will now look like

$$ P_{\mathrm{MP}}(\lambda) = \frac{1}{2\pi\kappa}\frac{1}{\lambda}\sqrt{(\lambda_{+} - \lambda)(\lambda-\lambda_{-})} $$ where $\kappa=1/(Nr)$ for system dimension $N$ and environment dimension (rank) $r$, and $$ \lambda_\pm = \frac{1}{N} \left(1\pm \frac{1}{\sqrt{r}} \right)^2. $$
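For reference, a direct Julia transcription of this density (simply implementing the formula above):

# Marchenko-Pastur density for a rank-r random density matrix on an N-dimensional system
function mp_quantum(λ, N, r)
    κ = 1 / (N * r)
    λp = (1 / N) * (1 + 1 / sqrt(r))^2
    λm = (1 / N) * (1 - 1 / sqrt(r))^2
    (λm <= λ <= λp) || return 0.0
    return sqrt((λp - λ) * (λ - λm)) / (2π * κ * λ)
end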

Marchenko-Pastur distribution of the resulting state $\rho$ after undergoing the random operation with depth $d$.

Another way to understand the Marchenko-Pastur distribution in the context of quantum computing is via the following:

$$ \rho_{\mathrm{ss}} = UDU^\dagger = \sum_i \lambda_{\mathrm{MP},i} \ket{\psi_i}\bra{\psi_i}. $$

In words, a random density matrix can be viewed as a collection of random pure states, where their classical probability of occurring is sampled from the Marchenko-Pastur distribution!

Testing Quantum Computers

Now we can look at how different quantum computers perform. Note that we measure directly in the computational basis, which means we do not gain all the information about the state. However, this is okay, as the randomness guarantees that there is no bias toward any particular state, and the distribution is in fact an invariant measure. Measurement probabilities of $\rho_\mathrm{ss}$ are given by its diagonal matrix elements. For instance, the probability of $\ket{011}$ is given as $\bra{011}\rho_{\mathrm{ss}}\ket{011}$, the diagonal matrix element indexed by the bitstring $011$ in the standard computational basis. We denote the measured bitstring as $\ket{x}$; hence the probability of outcome $\ket{x}$ is

$$ \mathrm{Pr}_{\ket{x}} = \bra{x}\rho_{\mathrm{ss}}\ket{x} = \sum_i \lambda_{\mathrm{MP},i}\, |\bra{\psi_i}\ket{x}|^2, $$

where the overlaps $|\bra{\psi_i}\ket{x}|^2$ follow the Porter-Thomas (PT) distribution. You can see the results below.

Histogram of measurement outputs for real, simulation, and theoretical output distribution.
Empirical cumulative distributions for real, simulation, and theoretical output.

We quantify the overall error by calculating the Kolmogorov-Smirnov (KS) distance -- the maximum vertical discrepancy between the empirical and theoretical CDFs. The KS distances for ibm_kyoto and its simulation are 0.103 and 0.081, respectively, while ibm_osaka and its simulation display distances of 0.275 and 0.149. So, in this case, both in simulation and on the actual device, our benchmarking procedure indicates that ibm_osaka performed worse.
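For completeness, a minimal sketch of how such a KS distance can be computed from a sample and a callable theoretical CDF F (this is a generic illustration, not the exact analysis code, which lives in the Qutee.jl examples linked below):

# One-sample Kolmogorov-Smirnov distance: largest gap between the empirical CDF and F
function ks_distance(sample, F)
    x = sort(sample)
    n = length(x)
    d_plus  = maximum((1:n) ./ n .- F.(x))      # ECDF just above each point vs F
    d_minus = maximum(F.(x) .- (0:n-1) ./ n)    # F vs ECDF just below each point
    return max(d_plus, d_minus)
end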

You can read more about the results in the paper. But essentially, what distinguishes this protocol is that we can check not only whether our unitary gates work correctly, but also whether the reset operation works on qubits. Both of these operations are essential not only for achieving quantum computing (DiVincenzo's criteria) but, importantly, for achieving quantum error correction.

The code for the theoretical analysis is available in the Qutee.jl examples. The raw data from quantum computers can be found here.