Multi-way data analysis: a perspective of advanced tensor decomposition

Huiwen Yu

Research output: Book/Report > Ph.D. thesis > Research

Abstract

In the era of artificial intelligence, both the volume and the dimensionality of data are increasing dramatically. For example, thanks to advances in modern instrument technology, an instrument can produce millions of bytes of data in the blink of an eye. More and more data reach analysts in the form of large multi-way arrays, so-called high-order tensors. Analysts and users need to analyze such multi-way data efficiently and extract the essential information from the complex data before subsequent expert analysis can be conducted. Tensor decomposition has proved to be an efficient tool for analyzing this type of multi-way data. High-order tensor models have been applied successfully in different fields to different types of multi-way data and have received increasing attention in recent years. For example, PARAFAC2 is widely used to analyze multi-way GC-MS data (a type of shifted high-order tensor) and to extract the chemical information of interest from such massive data. As a high-order tensor decomposition model, PARAFAC2 rests on very complex mathematical principles. Applying PARAFAC2 to large, irregular high-order tensors requires analysts to have advanced knowledge of tensor methods. Moreover, fitting the PARAFAC2 model is mathematically difficult and, in practice, time-consuming, especially for large datasets. There is an urgent need to optimize and improve the PARAFAC2 model in order to extend and generalize its applications to different fields. It is also important to find possible alternative tensor models to PARAFAC2 that analyze complex and irregular high-order tensor data more efficiently. These topics constitute the main research focus of the thesis. In Paper I, we systematically reviewed some representative high-order tensor models that have been used for food data analysis.
We reviewed the advances, advantages and limitations of the different high-order tensor models, and also summarized the applications of high-order tensor analysis in food process control, quality evaluation and fraud, identification and classification, prediction and quantification, and image analysis in the context of multi-way NIR data. In Paper II, we proposed novel implementations of extrapolation-based PARAFAC2 algorithms to accelerate the fitting of the PARAFAC2 model. We showed that the newly proposed All-at-once Nesterov-like extrapolation PARAFAC2-ALS algorithm achieved the fastest convergence speed whilst maintaining a low fraction of local minima solutions. The algorithms were investigated and validated comprehensively by comparing different PARAFAC2 algorithms on both simulated and real multi-way datasets. In Paper III, we investigated the local minima issue in the PARAFAC2 decomposition. We revealed the local minima problem in multi-way data analysis and then showed how model constraints, optimized initializations and simple algorithmic design in the general PARAFAC2 algorithm can help to avoid local minima. We concluded that the proposed remedies can significantly reduce the occurrence of local minima in PARAFAC2 modeling, or even eliminate all local minima in some cases. In Paper IV, we introduced a new tensor analysis method named PARASIAS for analyzing shifted high-order tensors. Compared to the state-of-the-art PARAFAC2 model, the proposed PARASIAS method had significant advantages in terms of model simplicity, convergence speed, model robustness, the ability to yield non-negative loading matrices from the shift mode and the possibility of easy extension to data with multiple shift modes, although PARASIAS sometimes needed more components than the correct number to model the data and yield better-resolved results.
Original language: English
Publisher: Department of Food Science, Faculty of Science, University of Copenhagen
Number of pages: 135
Publication status: Published - 2022
