How many parameters are needed to specify a numerical range?

Joint work with Ryan O'Loughlin, P. Pikul, S. Weis, K. Życzkowski, and I.M. Spitkovsky.

Let's define the joint numerical range of $k$ Hermitian matrices $X_i \in \mathbb{C}^{d\times d}$ by

\begin{equation}\begin{aligned} L(X_1,\ldots,X_k) &= \Big\{ \left( \Tr \rho X_1, \ldots, \Tr \rho X_k\right) : \rho\in\CC^{d\times d},\ \rho=\rho^\dagger,\ \Tr \rho=1,\ \rho\succeq 0\Big\}. \end{aligned} \end{equation}

The (more standard) numerical range $W(X)$ of a single arbitrary (not necessarily Hermitian) matrix $X\in\mathbb{C}^{d\times d}$ can be identified with $L(X_+, X_-)$, where $X_\pm$ provide the decomposition of $X$ into Hermitian and anti-Hermitian parts: $X=X_++i X_-$ [1]. The joint numerical range has a physical interpretation as the set of allowed tuples of quantum expectation values (see the note about joint numerical ranges). This note attempts to provide some hints toward the following question: generically, how many parameters $p(d,k)$ are needed to specify $L(X_1,\ldots, X_k)$ as the matrix dimension $d$ and the number of matrices $k$ change?
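To make the definition concrete, here is a minimal numpy sketch (all names are mine, not from any particular library) that samples random density matrices and maps them through $\rho \mapsto (\Tr \rho X_i)_i$, producing points inside $L$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 3, 2

# Random Hermitian matrices X_i (a GUE-like construction).
ops = []
for _ in range(k):
    g = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    ops.append((g + g.conj().T) / 2)

def random_density_matrix(d, rng):
    # rho = G G^dagger / Tr(G G^dagger) for a Ginibre matrix G is
    # Hermitian, positive semidefinite, and of unit trace.
    g = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def point_of_L(rho, ops):
    # The linear map rho -> (Tr rho X_1, ..., Tr rho X_k) from the definition.
    return np.array([np.trace(rho @ X).real for X in ops])

# A cloud of points inside L(X_1, X_2) in R^2.
pts = np.array([point_of_L(random_density_matrix(d, rng), ops)
                for _ in range(1000)])
```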

Of course, an upper bound on $p(d,k)$ is the number of real parameters of the matrices $X_i$ themselves: as they are Hermitian, and each is defined by $d^2$ real numbers, it is $k d^2$ (and the ordinary numerical range, corresponding to $k=2$, is defined by a non-Hermitian matrix with $2d^2$ independent parameters). However, it is known that for the case of $W(X)$, a complex symmetric $X'$ can always be found such that $W(X)=W(X')$; since a complex symmetric matrix has $d(d+1)/2$ complex entries, this leads to the bound $p(d) \le d(d+1)$. Equivalently, one can observe that any complex matrix $X$ is unitarily similar (via the Schur decomposition) to an upper triangular matrix $X''$, with the same number of parameters.

But is this bound tight? How does one even start trying to understand the problem? As usual in such cases, I turned to numerics.

Boundary points of joint numerical ranges

Let us now consider the problem of finding $\vec x \in L(X_1, \ldots, X_k)$ maximizing the expression $\vec n \cdot \vec x$ for a fixed $\vec n \in \RR^k$. As is evident from the definition, $L(\ldots)$ is the image of a linear map over the set of positive semidefinite unit-trace matrices $\rho$ of a fixed dimension; therefore, the maximization can equivalently be done over the states themselves, and the question reduces to finding

\begin{equation}\begin{aligned} \operatorname*{arg\,max}_{\rho\succeq 0,\,\Tr\rho=1} \Tr \rho \sum_{i=1}^k n_i X_i. \end{aligned} \end{equation}

The set of maximizers can be characterized by the following construction. Let $X:=\sum_{i=1}^k n_i X_i$; for every eigenvector $v\in\CC^{d}$ corresponding to the maximal eigenvalue $\lambda_{\max}$ of $X$, the projector $\rho:=\Pi_{v}$ onto $v$ is a maximizer, and so is every convex combination of such projectors. In other words,

\begin{equation}\begin{aligned} \operatorname*{arg\,max}_{\rho\succeq 0,\,\Tr\rho=1} \Tr \rho \sum_{i=1}^k n_i X_i = \{\rho : \rho\succeq 0,\ \Tr \rho =1,\ \Tr \rho X = \lambda_{\max}\}, \end{aligned} \end{equation}

and consequently $\max_{\vec x \in L} \vec n \cdot \vec x = \lambda_{\max}$. If the set of maximizing points $\vec x$ is not a single point (but a lower-dimensional convex subset of $\RR^k$), its preimage under the map $\rho \mapsto \left( \Tr \rho X_i \right)_{i=1}^k$ cannot be a single point either, and consequently $\sum_i n_i X_i$ has a degenerate eigenspace corresponding to its maximal eigenvalue. But such behavior is not generic: for any fixed $(X_i)_{i=1}^k$, unless all of them are simultaneously proportional to the identity, the set of $\vec n$ for which the eigenspace corresponding to the maximal eigenvalue of $\sum_{i=1}^k n_i X_i$ is degenerate is of measure zero in $\RR^k$ [2].
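In code, finding the supporting point of $L$ in a direction $\vec n$ reduces to a single Hermitian eigenproblem. A minimal sketch, assuming the generic case of a nondegenerate top eigenvalue (function names are mine):

```python
import numpy as np

def boundary_point(ops, n):
    # Supporting point of L(X_1, ..., X_k) in direction n, assuming the
    # maximal eigenvalue of sum_i n_i X_i is nondegenerate (the generic case).
    X = sum(ni * Xi for ni, Xi in zip(n, ops))
    vals, vecs = np.linalg.eigh(X)   # eigenvalues in ascending order
    v = vecs[:, -1]                  # eigenvector of lambda_max
    # rho = |v><v|, so Tr(rho X_i) = <v|X_i|v>; the maximal value
    # of n . x itself is vals[-1], i.e. lambda_max.
    return np.array([np.vdot(v, Xi @ v).real for Xi in ops])
```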

By this construction, if the matrices $X_1, \ldots, X_k$ are fixed, a generic $\vec n$ corresponds to exactly one point $\vec x\in \partial L(X_1, \ldots, X_k)$. This observation will serve as the basis of the numerical analysis below: we are interested in the number of parameters needed to fully specify $L(X_1, \ldots, X_k)$, so implicitly we consider tuples of $k$ matrices up to the equivalence of generating the same joint numerical range. Indeed, if a different tuple $(Y_i)$ generates the same numerical range as $(X_i)$, then by definition the $\vec x \in L$ maximizing $\vec n \cdot \vec x$ is the same.

Numerical procedure

The numerical idea is the following: for fixed $X_1,\ldots, X_k$, let us try to find small (in any reasonable norm) matrices $\delta_i$ such that $L(X_1, \ldots, X_k)=L(X_1+\delta_1, \ldots, X_k+\delta_k)$. A strict description of $L$ is relatively complex; while its geometry can be understood [3] via polynomial equations, the algorithm to generate a comprehensive description is highly nontrivial. Here, I assume that knowledge of finitely many (sufficiently many) points on the boundary is sufficient to fully specify the entire $L$. This is just an assumption, but numerical evidence suggests that it is a reasonable one.

At this point let us fix $N$ vectors $\vec n_1, \ldots, \vec n_N$ — in the calculations, $N\approx 10^3$ and the vectors are sampled independently and uniformly from the sphere $S^{k-1}\subset \RR^k$. Now, let us define the function $\xi: \CC^{k\times d\times d} \rightarrow \RR^{N\times k}$,

\begin{equation}\begin{aligned} \xi(X_1,\ldots, X_k) &= (\vec x_1, \ldots, \vec x_N),\\ \text{where } \vec x_j &:= \operatorname*{arg\,max}_{\vec x\in L(X_1, \ldots, X_k) } \vec n_j \cdot \vec x. \end{aligned} \end{equation}

Given $k$ Hermitian matrices, the function $\xi$ outputs a set of specific points on the boundary of $L$. We assume that the chosen $N\approx 10^3$ is large enough that the output of $\xi$ fully specifies the numerical range in question. Heuristically, this $N$ is much larger than the upper bound $k d^2$ on the number of parameters needed to parameterize $L$, so the claim is not unsubstantiated — but it is not a proven one either.
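Under these conventions, a sketch of $\xi$ might look as follows (reusing boundary_point from the previous sketch; the directions are sampled once and then held fixed):

```python
def sample_directions(N, k, rng):
    # N directions drawn i.i.d. uniformly from the sphere S^{k-1}.
    ns = rng.standard_normal((N, k))
    return ns / np.linalg.norm(ns, axis=1, keepdims=True)

def xi(ops, directions):
    # Stack the supporting boundary points for all fixed directions;
    # the result is a point of R^{N x k}.
    return np.array([boundary_point(ops, n) for n in directions])
```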

Local dimension estimation

The entire idea can now be understood in the following way: for fixed $(X_i):=X_1, \ldots, X_k$, we want to find a linear approximation to $\xi$ in the vicinity of $(X_i)$. Assume we can make a Taylor series expansion

\begin{equation}\begin{aligned} \xi(X_1+\delta_1,\ldots, X_k+\delta_k) &= \xi_0 + \Xi[(\delta_i)_{i=1}^k] + O(\delta^2), \end{aligned} \end{equation}

where $\xi_0:=\xi(X_1,\ldots,X_k)$ and the linear map $\Xi: \CC^{k\times d\times d} \rightarrow \RR^{N\times k}$ transforms the tuple of perturbations $(\delta_i)$ into the tuple of point displacements on the boundary of the numerical range.

The claim is now that the dimension of the image of $\Xi$ equals the number of parameters needed to parameterize $L$ around $(X_1,\ldots,X_k)$. This relies on the assumption that $L$ is fully specified by finitely many boundary points, and on a nonzero convergence radius of the Taylor series. Numerically it seems that these assumptions are reasonable, but you never know!

The numerical investigation performed was exactly the estimation of the image dimension of $\Xi$, by fixing $\vec n_1, \ldots, \vec n_N$ and the operators $X_1, \ldots, X_k$, and approximating $\Xi$ by finite differences: since, if all the above assumptions hold,

\begin{equation}\begin{aligned} \Xi[(\delta_i)_{i=1}^k]=\lim_{t\rightarrow 0} \frac{\xi\big((X_i+t\delta_i)_{i=1}^k\big)-\xi\big((X_i)_{i=1}^k\big)}{t}, \end{aligned}\end{equation}

we just take a small $t\approx 0$ in the above equation. This introduces uncertainties (in addition to inherent numerical noise) in the determination of $\Xi$; so as the image dimension we take the number of singular values of significant size. The cutoff $\epsilon$ below which the magnitude of a singular value is taken to be zero (each such value indicating one kernel direction) is a subjective choice, but for reasonable $(X_i)$ (taken i.i.d. from the Gaussian Unitary Ensemble, of operator norm around 1) it is clear that some singular values of the numerical approximation to $\Xi$ are stable zeros, with only numerical noise affecting their size (e.g. they are stable under changes of $t$ in the approximation).
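Putting the pieces together, a sketch of the finite-difference approximation of $\Xi$ and the singular-value counting could look like this (it reuses xi from above; the basis of Hermitian perturbations, the step t, and the cutoff eps are my illustrative choices):

```python
def hermitian_basis(d):
    # A basis of the d^2-dimensional real vector space of d x d Hermitian matrices.
    basis = []
    for i in range(d):
        b = np.zeros((d, d), complex)
        b[i, i] = 1.0
        basis.append(b)
    for i in range(d):
        for j in range(i + 1, d):
            b = np.zeros((d, d), complex)
            b[i, j] = b[j, i] = 1.0
            basis.append(b)
            b = np.zeros((d, d), complex)
            b[i, j] = 1j
            b[j, i] = -1j
            basis.append(b)
    return basis

def image_dimension(ops, directions, t=1e-6, eps=1e-4):
    # Finite-difference column of Xi for each of the k * d^2 basis
    # perturbations (B applied to one X_i at a time).
    d, k = ops[0].shape[0], len(ops)
    x0 = xi(ops, directions).ravel()
    cols = []
    for i in range(k):
        for B in hermitian_basis(d):
            perturbed = [X + t * B if j == i else X for j, X in enumerate(ops)]
            cols.append((xi(perturbed, directions).ravel() - x0) / t)
    # Image dimension = number of "significant" singular values;
    # the relative cutoff eps is the subjective choice discussed above.
    s = np.linalg.svd(np.stack(cols, axis=1), compute_uv=False)
    return int(np.sum(s > eps * s[0]))
```

Calling image_dimension(ops, sample_directions(1000, k, rng)) for GUE-sampled ops should then, under the stated assumptions, estimate $p(d,k)$.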

One observation at this point: the estimated image dimension seems to be the same for all tuples $(X_i)$ sampled from the Gaussian Unitary Ensemble. I believe this is generic behavior, but it may not be the case on restricted subsets, e.g. if the $(X_i)$ have a joint block form.

The numerically estimated numbers of parameters $p(d,k)$ look like this:

$p(d,k)$  $k=2$   $k=3$   $k=4$   $k=5$   $k=6$   $k=7$
$d=2$     5 [4]   9       13      17      21      25
$d=3$     9       19      28      37      46
$d=4$     14      33      49      65      81
$d=5$     20      51      76      101
$d=6$     27      73      109     145
$d=7$     35      99      148     197
$d=8$     44      129     193

Missing entries are due to numerical uncertainties and time constraints for the calculations; they were not optimized for speed at all and there is a lot of room for improvement. It seems (through an OEIS search) that

\begin{equation}\begin{aligned} p(d,k=2) &= \frac{d(d+3)}2,\\ p(d,k\ge3) &= (k-1)d^2 +1, \end{aligned} \end{equation}

and additionally, $p(1,1)=1$, since the numerical range is a single point in $\RR$, and $p(d\ge 2,1)=2$, as the numerical range is a closed segment specified by its endpoints.
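As a quick arithmetic sanity check, the conjectured formulas can be tested directly against the table entries (a throwaway snippet; the values are hard-coded from the table above):

```python
# The k = 2 column against d(d+3)/2.
table_k2 = {2: 5, 3: 9, 4: 14, 5: 20, 6: 27, 7: 35, 8: 44}
assert all(d * (d + 3) // 2 == p for d, p in table_k2.items())

# A k >= 3 spot check: the d = 4 row against (k-1)*d^2 + 1.
table_d4 = {3: 33, 4: 49, 5: 65, 6: 81}
assert all((k - 1) * 4**2 + 1 == p for k, p in table_d4.items())
```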

  1. So, $X_\pm=X_\pm^\dagger$. 

  2. Numerically speaking, if you just sample $\vec n$, e.g. from the sphere $S^{k-1}$, according to some distribution, you are not going to see the degeneracy. 

  3. With some difficulties. While the Kippenhahn theorem asserts that $L(X_1, X_2)$ is the convex hull of a real algebraic variety (and provides a way to find the defining polynomial equations), no direct analogue exists for a higher number of operators $k$. This does not matter here, as we explicitly do not use such a description, but a numerical approximation. 

  4. Not surprising: every $L(X_1, X_2)$ for $X_1$ and $X_2$ Hermitian matrices of size $d=2$ is an ellipse, and every ellipse (possibly degenerate, but we exclude such cases) is defined by its two foci and semimajor axis, leading to $p(d=2,k=2)=5$ parameters in total. Similar reasoning applied to higher $k$ for $d=2$ leads to $p(d=2,k)=4k-3$, matching the numerical observations.