Speaker

Goutam Chakraborty

Distinguished & Emeritus Professor, Iwate Prefectural University, Japan.

Email: goutam@iwate-pu.ac.jp

Biography:

Prof. Goutam Chakraborty is a Distinguished and Emeritus Professor at Iwate Prefectural University, Japan. From 2022 to 2025, he served as Distinguished Professor and Dean of the Madanapalle Institute of Technology & Science (MITS), India. He has spent short periods as a visiting professor at various universities, including one year at the University of Waterloo, Canada.

His research interests include machine learning, data science, and soft computing algorithms, and their applications in solving pattern recognition, prediction, scheduling, and optimization problems. His work has been applied to various challenges, including wired and wireless network problems. Recently, he has focused on medical data and image analysis, scale-free networks, and matrix completion problems. He has authored approximately 270 peer-reviewed research papers published in well-regarded journals and international conferences. Additionally, he has delivered keynote speeches and invited talks at various international conferences, served as an editor for several journals and edited books, and organized IEEE conferences in various capacities.

Presently, Prof. Chakraborty is co-chair of the steering committee of the Technical Committee on Awareness Computing, IEEE SMC, and chair of the steering committee of the IEEE Transactions on Affective Computing. He is a Life Member of IEEE and ACM.


Relation Between Data Complexity and the Topological Dimension of Its Manifold Space

     For any set of high-dimensional data with latent patterns, such as classes or clusters, the individual data points in the feature space are distributed over a lower-dimensional manifold. This is a curved space whose topological dimension is much lower than the original data dimension. Analysis of the data would be simpler if we worked in this manifold space.

PCA is a lower-dimensional projection of the data. In PCA, we make the simplifying assumption of linear correlation among the elements of the feature vector. This assumption lets us represent the data in a low-dimensional Euclidean space, where the prominent eigenvectors serve as the basis vectors. Here, the representation error is high because, in general, the correlation is not linear.
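To make the linear projection concrete, the following is a minimal PCA sketch in Python using only NumPy; the function name pca_project, the data matrix X, and the target dimension k are illustrative, not part of the talk:

    import numpy as np

    def pca_project(X, k):
        """Project X (n samples x d features) onto its top-k principal axes."""
        Xc = X - X.mean(axis=0)                  # center the data
        C = np.cov(Xc, rowvar=False)             # d x d covariance matrix
        vals, vecs = np.linalg.eigh(C)           # eigh: covariance is symmetric
        W = vecs[:, np.argsort(vals)[::-1][:k]]  # top-k eigenvectors as basis
        Z = Xc @ W                               # low-dimensional representation
        error = np.linalg.norm(Xc - Z @ W.T)     # residual of the linear model
        return Z, error

The residual error quantifies how much the linear-correlation assumption loses when the true manifold is curved.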

The core principle of data analysis is to find the distances between pairs of data points. For high-dimensional data, the Euclidean (or Minkowski) distance is not the true measure, because the data points lie on a curved manifold. The distance needs to be calculated on this curved surface over which the data points are distributed. One efficient tool is the diffusion distance, calculated from a random walk over the data points. A random walk, by its very nature, can measure distances on the curved manifold space. This paves the way to estimating the dimension of the embedding space. We will discuss the algorithm.
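As a sketch of how a random walk yields distances on the manifold, the Python code below implements the standard diffusion-distance formulation of Coifman and Lafon (Gaussian affinities, a row-stochastic transition matrix, and a t-step walk); the kernel width sigma and walk length t are hypothetical parameters, and this is not necessarily the exact algorithm the talk will present:

    import numpy as np
    from scipy.spatial.distance import cdist

    def diffusion_distances(X, sigma=1.0, t=2):
        """Pairwise diffusion distances after a t-step random walk over X."""
        K = np.exp(-cdist(X, X, "sqeuclidean") / (2 * sigma**2))  # affinities
        P = K / K.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
        pi = K.sum(axis=1) / K.sum()          # stationary distribution of the walk
        Pt = np.linalg.matrix_power(P, t)     # t-step transition probabilities
        n = X.shape[0]
        D = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                d2 = np.sum((Pt[i] - Pt[j]) ** 2 / pi)  # weighted L2 between rows
                D[i, j] = D[j, i] = np.sqrt(d2)
        return D

Because the walk can only step between nearby points, the resulting distance follows the data's curved geometry rather than cutting straight through the ambient space.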

When the elements of the data vectors in the data set are uncorrelated (random), the distribution of their mutual distances is normal, with low variance, and the dimension of the manifold space is high. On the other hand, we hypothesize that, if all members of the data set represent simple patterns, the distribution of the mutual distances will not be normal, the divergence from normality will be large, and the dimension of the manifold space will be low. We will establish this with experiments. This will enable us to estimate the complexity of the pattern embedded in the data set.
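Below is a minimal sketch of the kind of experiment described, assuming Euclidean pairwise distances and the D'Agostino-Pearson normality test; the sample sizes and the two synthetic data sets (uncorrelated Gaussian vectors versus points on a circle, a one-dimensional manifold embedded in 100 dimensions) are hypothetical:

    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.stats import normaltest

    rng = np.random.default_rng(0)
    X_random = rng.standard_normal((500, 100))       # uncorrelated features
    theta = rng.uniform(0, 2 * np.pi, 500)
    circle = np.c_[np.cos(theta), np.sin(theta)]     # 1-D manifold (a circle)
    X_pattern = circle @ rng.standard_normal((2, 100))  # embed it in 100-D

    for name, X in [("random", X_random), ("patterned", X_pattern)]:
        d = pdist(X)             # all pairwise Euclidean distances
        stat, p = normaltest(d)  # divergence from normality
        print(f"{name}: stat={stat:.1f}, p={p:.3g}, "
              f"rel. spread={d.std() / d.mean():.3f}")

On such synthetic data, the uncorrelated set typically yields a tightly concentrated, near-normal distance distribution, while the patterned set diverges markedly from normal, in line with the hypothesis above.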