Progressive Visual Analytics

The user interface of my tool, PPCA. The top-left window shows previous results and lets the user export them to a CSV file. The bottom-left window lets the user select the dataset and algorithm parameters. The bottom-right window shows the convergence of the current algorithm in real time. The center area shows the projection result.

This is a research project that I worked on for one and a half years, supervised by Jean-Daniel Fekete.

Visual analytics helps users understand complex data and models through graphical representations and interaction. Interaction techniques play an important role in visual analytics, but interaction performance degrades as datasets grow and computations become more complex. With large datasets or sophisticated algorithms, a visual analytics system can no longer remain interactive.

In the traditional visual analysis workflow, users must wait an unpredictable amount of time for a computation to finish. Furthermore, if a parameter needs to be adjusted, the whole computation has to be redone from the beginning. In 2014, Stolper et al. proposed a new paradigm called Progressive Visual Analytics. Instead of waiting until the whole computation is complete, the system visualizes partial results progressively and refines them at each iteration. Users can analyze these partial results immediately. They can also stop the computation as soon as they see that a parameter needs adjusting and restart it, which avoids long waits.
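The paradigm can be illustrated with a minimal C++ sketch. It is not code from PPCA: the names `PartialMean` and `progressive_mean` are hypothetical, and a running mean stands in for a real analysis. The data is processed in chunks, a partial result is reported after every chunk, and the callback can stop the computation early, much as a user would when changing a parameter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical partial result: the running estimate and how much data it covers.
struct PartialMean {
    double value;      // estimate after the chunks processed so far
    std::size_t seen;  // number of samples consumed
    bool complete;     // true once the whole dataset has been processed
};

// Processes `data` in chunks of `chunk_size` samples. After every chunk the
// current estimate is handed to `on_partial`; returning false from the
// callback stops the computation early.
PartialMean progressive_mean(
    const std::vector<double>& data, std::size_t chunk_size,
    const std::function<bool(const PartialMean&)>& on_partial) {
    assert(chunk_size > 0);
    PartialMean result{0.0, 0, data.empty()};
    double sum = 0.0;
    for (std::size_t begin = 0; begin < data.size(); begin += chunk_size) {
        const std::size_t end = std::min(begin + chunk_size, data.size());
        for (std::size_t i = begin; i < end; ++i) sum += data[i];
        result.seen = end;
        result.value = sum / static_cast<double>(end);
        result.complete = (end == data.size());
        if (!on_partial(result)) break;  // the caller asked to stop
    }
    return result;
}
```

A caller would typically redraw the view inside `on_partial` and return `false` when the user has changed a parameter, so that the computation can be restarted with the new settings.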

In this project, I studied how to apply this paradigm to a PCA-based visualization system. Principal Component Analysis (PCA) is a dimensionality reduction method widely used in machine learning and statistics. In visualization, PCA is used to project high-dimensional data into 2D.
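For reference, the textbook formulation of this projection is given below. The notation is introduced here for illustration only: $x_i \in \mathbb{R}^d$ are the $n$ data points, $C$ is their sample covariance matrix, and $w_k$ are its unit-length eigenvectors sorted by decreasing eigenvalue.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
C = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{\top}

C\,w_k = \lambda_k w_k, \qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d \ge 0

y_i = \begin{bmatrix} w_1 & w_2 \end{bmatrix}^{\top} (x_i - \bar{x}) \in \mathbb{R}^2
```

Each point $y_i$ is the 2D position at which the data point $x_i$ is drawn.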

An animation showing how my tool computes PCA progressively

I developed this tool, called PPCA, in C++. It performs traditional or progressive PCA on the chosen dataset. The interface is implemented with Dear ImGui, a lightweight GUI and graphics library that relies on OpenGL for the actual drawing. Basic matrix operations, including matrix multiplication, transposition, and element-wise operations, are computed with OpenCV.

A comparison between the time behavior of a traditional and progressive PCA pipeline.

PPCA is composed of three parts: the GUI, the algorithm module, and the data processing module. The data processing module generates the data matrix, either with a data generator or from files. The algorithm module takes this raw data as input and produces partial or final results.

PPCA is implemented with two threads: a render thread that refreshes the display and a calculation thread that performs the progressive computation. The render thread is the main thread and always stays responsive. The calculation thread blocks once a calculation has finished. When the user modifies some parameters, the new parameter set is passed to the calculation thread, which then computes the projection progressively and passes each partial result to the render thread through a vertex buffer. At each iteration, the calculation thread checks a signal to see whether the user wants to interrupt or pause the calculation.
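The threading scheme described above can be sketched with the C++ standard library. This is an illustration under several assumptions, not PPCA's implementation: `Params`, `Partial`, and `CalculationWorker` are hypothetical names, a mutex-protected struct stands in for the vertex buffer, the "computation" is a toy series, and pausing is left out so that only interruption is shown.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical types: PPCA's real parameters and results are not shown here.
struct Params { int iterations; };
struct Partial { int iteration; double value; };

class CalculationWorker {
public:
    CalculationWorker() : thread_([this] { run(); }) {}

    ~CalculationWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            quit_ = true;
            interrupt_ = true;
        }
        wake_.notify_all();
        thread_.join();
    }

    // Render thread: hand over a new parameter set and abandon the current run.
    void submit(const Params& params) {
        assert(params.iterations >= 0);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_ = params;
            has_pending_ = true;
            busy_ = true;
            interrupt_ = true;
        }
        wake_.notify_all();
    }

    // Render thread: copy the latest partial result, e.g. once per frame.
    Partial snapshot() {
        std::lock_guard<std::mutex> lock(mutex_);
        return partial_;
    }

    // Blocks the caller until no run is active or pending (handy for tests).
    void wait_idle() {
        std::unique_lock<std::mutex> lock(mutex_);
        idle_.wait(lock, [this] { return !busy_; });
    }

private:
    void run() {
        for (;;) {
            Params params{0};
            {
                // Sleep until the render thread submits parameters.
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return has_pending_ || quit_; });
                if (quit_) return;
                params = pending_;
                has_pending_ = false;
                interrupt_ = false;
                partial_ = Partial{0, 0.0};
            }
            double term = 1.0;
            for (int i = 0; i < params.iterations; ++i) {
                if (interrupt_) break;  // checked once per iteration
                std::lock_guard<std::mutex> lock(mutex_);
                partial_.iteration = i + 1;
                partial_.value += term;  // toy refinement: 1 + 1/2 + 1/4 + ...
                term *= 0.5;
            }
            std::lock_guard<std::mutex> lock(mutex_);
            if (!has_pending_) {
                busy_ = false;
                idle_.notify_all();
            }
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::condition_variable idle_;
    Params pending_{0};
    Partial partial_{0, 0.0};
    bool has_pending_ = false;
    bool busy_ = false;
    bool quit_ = false;
    std::atomic<bool> interrupt_{false};
    std::thread thread_;  // declared last so it starts after the other members
};
```

In this sketch the interrupt flag is set and cleared while the mutex is held, so a parameter change that arrives just as a run starts is never lost. The render thread only ever calls `submit` and `snapshot`, both of which hold the lock briefly, which keeps it responsive.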

A comparison of the timelines of three Progressive PCA algorithms applied to the MNIST dataset

I studied and implemented several state-of-the-art PCA algorithms and adapted them to progressive PCA. With this tool, I compared different progressive PCA algorithms on a set of benchmarks and explored their trade-offs in terms of speed, accuracy, latency, and memory usage.

Publication: Fekete, J. D., Chen, Q., Feng, Y., & Renault, J. (2019, October). Practical Use Cases for Progressive Visual Analytics. IEEE VIS DISA 2019.