# Understanding the Tucker decomposition, and compressing tensor-valued data (with R code)

This article is originally published at https://www.alexejgossmann.com/

In many applications, data naturally form an *n*-way tensor with *n > 2*, rather than a “tidy” table.
As mentioned in the beginning of my last blog post, a tensor is essentially a multi-dimensional array:

- a tensor of order one is a vector, which simply is a column of numbers,
- a tensor of order two is a matrix, which is basically numbers arranged in a rectangle,
- a tensor of order three looks like numbers arranged in rectangular box (or a cube, if all modes have the same dimension),
- an
*n*th order (or*n*-way) tensor looks like numbers arranged in an*n*-hyperrectangle… you get the idea…

In this post I introduce the Tucker decomposition (Tucker (1966) “Some mathematical notes on three-mode factor analysis”). The Tucker decomposition family includes methods such as

- the
*higher-order SVD*, or HOSVD, which is a generalization of the matrix SVD to tensors (De Lathauwer, De Moor, and Vanderwalle (2000) “A multilinear singular value decomposition”), - the
*higher order orthogonal iteration*, or HOOI, which delivers the best approximation to a given tensor by another tensor with prescribed mode-1 rank, mode-2 rank, etc. (De Lathauwer, De Moor, and Vanderwalle (2000) “On the Best Rank-1 and Rank-(R1,R2,…,RN) Approximation of Higher-Order Tensors”).

I introduce both approaches, and in order to demonstrate the usefulness of these concepts, I present a simple data compression example using The World Bank’s World Development Indicators dataset (though I use the version available on Kaggle).

However, before we can get started with the decompositions, we need to look at and understand the *k*-mode tensor product.

Throughout this post, I will also introduce the R functions from the package `rTensor`

, which can be used to perform all of the presented computations.

## Tensor times matrix: the *k*-mode product

The -mode product of a tensor with a matrix is written as

The resulting tensor is of size , and contains the elements

It can be hard, at first, to understand what that definition really means, or to visualize it in your mind. I find that it becomes easier once you realize that the *k*-mode product amounts to multiplying each mode-*k* fiber of by the matrix .

We can demonstrate that in R:

```
library(rTensor)
tnsr <- as.tensor(array(1:12, dim = c(2, 2, 3)))
mat <- matrix(1:6, 3, 2)
# 1-mode product performed via the function ttm in rTensor
tnsr_times_mat <- ttm(tnsr = tnsr, mat = mat, m = 1)
```

Now, for example, the first slice of `tnsr_times_mat`

is the same as the matrix product of `mat`

with the first slice of `tnsr`

:

```
tnsr_times_mat@data[ , , 1]
# [,1] [,2]
# [1,] 9 19
# [2,] 12 26
# [3,] 15 33
mat %*% as.matrix(tnsr@data[ , , 1])
# [,1] [,2]
# [1,] 9 19
# [2,] 12 26
# [3,] 15 33
```

You might want to play around some more with the function `ttm`

in R to get a better understanding of the *k*-mode product.

A few important facts about the *k*-mode product:

- if ,
- but (in general ).

## Tucker decomposition

The Tucker decomposition (Tucker (1966)) decomposes a tensor into a core tensor multiplied by a matrix along each mode (i.e., transformed via a -mode product for every ):

Note that might be much smaller than the original tensor if we accept an approximation instead of an exact equality.

In case of three-way tensors, we can hold on to the following mental image:

It is interesting to note that the CP decomposition, that I introduced in a previous blog post, is a special case of the Tucker decomposition, where the core tensor is constrained to be superdiagonal.

### Higher-order SVD (HOSVD)

So, how do you compute the Tucker decomposition?

Many algorithms rely on the following fundamental equivalence:

The above equation uses some notation that was not introduced yet:

- denotes the Kronecker product.
is the mode- unfolding (or mode- matricization) of the tensor . The mode- unfolding arranges the mode- fibers (a

*fiber*is a generalization of*column*to tensors) of as columns into a matrix. The concept may be easiest to understand by looking at an example. The following R code shows a 3-way tensor and all three of its mode- unfoldings (using the`k_unfold`

function from the`rTensor`

package):`tnsr <- as.tensor(array(1:12, dim = c(2, 3, 2))) tnsr@data # , , 1 # # [,1] [,2] [,3] # [1,] 1 3 5 # [2,] 2 4 6 # # , , 2 # # [,1] [,2] [,3] # [1,] 7 9 11 # [2,] 8 10 12 # mode-1 unfolding: k_unfold(tnsr, 1)@data # [,1] [,2] [,3] [,4] [,5] [,6] # [1,] 1 3 5 7 9 11 # [2,] 2 4 6 8 10 12 # mode-2 unfolding: k_unfold(tnsr, 2)@data # [,1] [,2] [,3] [,4] # [1,] 1 2 7 8 # [2,] 3 4 9 10 # [3,] 5 6 11 12 # mode-3 unfolding: k_unfold(tnsr, 3)@data # [,1] [,2] [,3] [,4] [,5] [,6] # [1,] 1 2 3 4 5 6 # [2,] 7 8 9 10 11 12`

A straightforward approach to solve the Tucker decomposition would be to solve each mode- matricized form of the Tucker decomposition (shown in the equivalence above) for . This approach is known as *higher order SVD*, or HOSVD. It can be regarded as a generalization of the matrix SVD, because the matrices are orthogonal, while the tensor is “ordered” and “all-orthogonal” (see De Lathauwer et. al. (2000) for detail). The resulting algorithm is shown below.

In R we can perform HOSVD using the function `hosvd`

from `rTensor`

:

```
tnsr <- rand_tensor(modes = c(30, 40, 50))
hosv_decomp <- hosvd(tnsr)
```

Now `hosv_decomp$Z`

is our matrix , and `hosv_decomp$U`

is a list containing all the matrices . We can use the function `ttl`

, which performs multiple *k*-mode products on multiple modes successively given a tensor and a list of matrices, to check that up to numerical error the equation is satisfied:

```
HOSVD_prod <- ttl(hosv_decomp$Z, hosv_decomp$U, 1:3)
error <- tnsr - HOSVD_prod
table(abs(error@data) < 1e-12)
#
# TRUE
# 60000
```

### Higher order orthogonal iteration (HOOI)

Note that we can also use HOSVD to compress by truncating the matrices . The truncated HOSVD, however, is known to not give the best fit, as measured by the norm of the difference

The *higher order orthogonal iteration*, or HOOI, algorithm finds the optimal approximation (with respect to the Frobenius norm loss) by, essentially, iterating the alternating truncation and SVD until convergence. If we truncate to have columns, then the HOOI solution can be obtained by the following algorithm.

### Application of HOOI to data compression

The example considered below is somewhat silly, given that the tensor I’m compressing isn’t very big, and thus there isn’t much of a point in compressing it. However, I think that the example still shows off very well how the algorithm can be very useful when the data size is much bigger (or the available storage much smaller).

I have downloaded from Kaggle the World Development Indicators dataset, originally collected and published by The World Bank (the original dataset is available here).

The data can be arranged into a three-way tensor with the three modes corresponding to *country* (list of available countries), *indicator* (list of available indicators), and *year* (1960-2014). Since I didn’t have any time to deal with NA values in any creative way, I have kept only three indicators in the dataset. And I have replaced the remaining NAs with a country-wise average value for each particular indicator. Also, I have forgotten to normalize the data :disappointed:. The preprocessing resulted in a tensor of size 247-countries-by-3-indicators-by-55-years, that looks sort of like this:

In particular, large stretches of the data within a given country tend to be nearly constant, or nearly piece-wise constant.

We use the function `tucker`

from `rTensor`

to obtain a Tucker decomposition via HOOI, where we set the ranks to the value 3 at each mode.

```
dim(wdi_tnsr)
# [1] 247 3 55
tucker_decomp <- tucker(wdi_tnsr, ranks = c(3, 3, 3))
str(tucker_decomp)
# List of 7
# $ Z :Formal class 'Tensor' [package "rTensor"] with 3 slots
# .. ..@ num_modes: int 3
# .. ..@ modes : int [1:3] 3 3 3
# .. ..@ data : num [1:3, 1:3, 1:3] -6.60e+10 -1.13e+05 6.24e+05 -7.76e+05 -1.93e+08 ...
# $ U :List of 3
# ..$ : num [1:247, 1:3] -0.02577 -0.00065 -0.01146 -0.19637 -0.17317 ...
# ..$ : num [1:3, 1:3] -1.00 -6.97e-10 -2.08e-02 2.08e-02 -4.70e-08 ...
# ..$ : num [1:55, 1:3] -0.0762 -0.0772 -0.0785 -0.0802 -0.082 ...
# $ conv : logi TRUE
# $ est :Formal class 'Tensor' [package "rTensor"] with 3 slots
# .. ..@ num_modes: int 3
# .. ..@ modes : int [1:3] 247 3 55
# .. ..@ data : num [1:247, 1:3, 1:55] 9.83e+07 4.44e+06 8.81e+07 1.05e+09 8.97e+08 ...
# $ norm_percent: num 99.4
# $ fnorm_resid : num 3.9e+08
# $ all_resids : num [1:2] 3.9e+08 3.9e+08
# NULL
```

To see how well the tensor decomposition approximates the original tensor, we can look at the relative error

```
wdi_tnsr_approx <- ttl(tucker_decomp$Z, tucker_decomp$U, 1:3)
fnorm(wdi_tnsr - wdi_tnsr_approx) / fnorm(wdi_tnsr)
# [1] 0.005908934
```

and at the percentage of the norm of the original tensor explained by the Tucker decomposition

```
tucker_decomp$norm_percent
# [1] 99.40911
```

We, observe that we indeed achieve a recovery with an accuracy of over 99%. For comparison, the original tensor contains `247 * 3 * 55 = 40755`

entries, while the computed Tucker decomposition consists of only `127 * 3 + 3 * 3 + 55 * 3 + 3 * 3 * 3 = 582`

numbers. That’s a reduction in size by a factor greater than 70.

Even though data compression does not make much sense for the size of the dataset considered here, it clearly shows potential to be very useful for purposes of data distribution and data storage, when the data size far exceeds the terabyte range.

Thanks for visiting r-craft.org

This article is originally published at https://www.alexejgossmann.com/

Please visit source website for post related comments.