Interconnection

Exploring that which binds everything as one

Artificial Intelligence


The Manifold Hypothesis

How is it possible that AI models can learn meaningful patterns from ultra-high-dimensional data?

A manifold is a space that may be curved globally but locally looks like flat Euclidean space. A sphere, for example, is a 2D manifold living in 3D space. A curved line is a 1D manifold in 2D space.
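To make the idea concrete, here is a toy numpy sketch (the function name and parameterization are my own, not from any library): a sphere's points live in 3D, but each one is pinned down by just two coordinates.

```python
import numpy as np

# A sphere is a 2D manifold: every point on it is determined by just two
# coordinates (polar angle theta, azimuthal angle phi), even though the
# points themselves live in 3D space.
def sphere_point(theta, phi, radius=1.0):
    """Map 2D manifold coordinates (theta, phi) to a 3D point."""
    return np.array([
        radius * np.sin(theta) * np.cos(phi),
        radius * np.sin(theta) * np.sin(phi),
        radius * np.cos(theta),
    ])

p = sphere_point(np.pi / 2, 0.0)   # on the equator -> approx (1, 0, 0)
q = sphere_point(0.0, 0.0)         # north pole     -> (0, 0, 1)

# Every generated point satisfies the sphere constraint x^2 + y^2 + z^2 = r^2.
assert np.isclose(np.linalg.norm(p), 1.0)
```

Two inputs, three outputs: the "extra" dimension is redundant, which is exactly what it means for the sphere to be a 2D manifold.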

Real-world high-dimensional data, such as images, audio, text, and so on, doesn't actually spread out across all of its high-dimensional space. Instead, it clusters near a much lower-dimensional structure, called a manifold, embedded within that space.

Consider an average grayscale image, with tens of thousands of pixels and therefore tens of thousands of dimensions. The manifold hypothesis says that meaningful subjects such as faces or cars occupy only a small slice of that entire feature space, and perhaps only a few hundred or a few thousand dimensions contain meaningful variation.
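A toy numpy sketch of that claim (the latent factors and sizes here are made up for illustration): generate 1000-dimensional "data" secretly driven by only 3 factors, then check how much variance the top principal components capture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: each 1000-dimensional sample is really driven by
# only 3 latent factors (a stand-in for pose, lighting, identity).
n_samples, ambient_dim, latent_dim = 500, 1000, 3
latents = rng.normal(size=(n_samples, latent_dim))
embedding = rng.normal(size=(latent_dim, ambient_dim))
data = latents @ embedding + 0.01 * rng.normal(size=(n_samples, ambient_dim))

# PCA via SVD: how much variance do the top components explain?
centered = data - data.mean(axis=0)
singular_values = np.linalg.svd(centered, compute_uv=False)
variance_ratio = singular_values**2 / np.sum(singular_values**2)

# Despite 1000 ambient dimensions, essentially all of the variance
# lives along just 3 directions.
top3 = variance_ratio[:3].sum()
print(f"variance explained by top 3 components: {top3:.4f}")
```

Real image manifolds are curved, so linear PCA only bounds the picture, but the same concentration of variation is what deep models exploit.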

The manifold hypothesis is what enables learning and generalization. Many deep learning methods can be understood as trying to learn a coordinate system for the data manifold, "unfolding" or "flattening" it into a simpler latent space.
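The simplest possible version of such a coordinate system can be written by hand (a hypothetical toy, not how a deep model actually does it): for points on a unit circle, the angle is a flat 1D latent coordinate that unrolls the curved manifold.

```python
import numpy as np

# A hand-built "encoder/decoder" for the simplest curved manifold: points
# on a unit circle in 2D. The latent space is just the angle, a single
# flat coordinate that "unrolls" the circle. Deep models learn such
# coordinates from data; here the formula is known in closed form.
def encode(xy):
    """2D point on the circle -> 1D latent coordinate (angle)."""
    return np.arctan2(xy[..., 1], xy[..., 0])

def decode(angle):
    """1D latent coordinate -> 2D point on the circle."""
    return np.stack([np.cos(angle), np.sin(angle)], axis=-1)

angles = np.linspace(-3.0, 3.0, 50)
points = decode(angles)        # 50 points in 2D, all on the manifold
recovered = encode(points)     # back to the 1D latent space

assert np.allclose(recovered, angles)  # round trip is lossless on-manifold
```

An autoencoder trained on circle points would have to discover this angle coordinate (or an equivalent one) on its own.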

In practice, data manifolds are also rarely perfectly smooth. They can have discontinuities, holes, or multiple disconnected components, and ongoing research continues to probe this phenomenon.


The Curse of Dimensionality

In high dimensions, nearly everything is orthogonal. Take two random vectors in a thousand-dimensional space, and their dot product will almost certainly be close to zero. This is the law of large numbers at work, averaging out the contributions of individual coordinates.
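A quick numpy experiment makes this concrete (the sample sizes are arbitrary): draw pairs of random unit vectors in 1000 dimensions and look at their cosine similarities.

```python
import numpy as np

rng = np.random.default_rng(42)
dim, trials = 1000, 2000

# Cosine similarity between pairs of random unit vectors in 1000D.
a = rng.normal(size=(trials, dim))
b = rng.normal(size=(trials, dim))
a /= np.linalg.norm(a, axis=1, keepdims=True)
b /= np.linalg.norm(b, axis=1, keepdims=True)
cosines = np.sum(a * b, axis=1)

# The law of large numbers squeezes the dot products toward 0:
# the typical magnitude is roughly 1/sqrt(dim), about 0.03 here.
print(f"mean |cos|: {np.mean(np.abs(cosines)):.4f}")
print(f"max  |cos|: {np.max(np.abs(cosines)):.4f}")
```

Even the most extreme pair out of two thousand stays within a few degrees of a perfect right angle.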

In these high-dimensional spaces, our intuition from low-dimensional spaces breaks down. In 3D, common intuition says that random directions point all over; in high dimensions, "all over" collapses into near-perfect right angles.

For machine learning, this presents real problems. Too many features (dimensions) make data sparse, so distances between points become less meaningful, and models overfit, picking up idiosyncrasies of the training data rather than broadly applicable rules. Fighting that sparsity to find meaningful patterns in such a large state space then requires exponentially more data.
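One way to see the distance problem directly (a toy experiment with arbitrary sample sizes): compare the nearest and farthest neighbor of a random query point as the dimension grows.

```python
import numpy as np

rng = np.random.default_rng(7)

# Distance concentration: as dimension grows, the gap between the nearest
# and farthest neighbor shrinks relative to the distances themselves,
# making "nearest" progressively less meaningful.
def relative_contrast(dim, n_points=500):
    points = rng.uniform(size=(n_points, dim))
    query = rng.uniform(size=dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

for dim in (2, 100, 10_000):
    print(f"dim={dim:>6}: (max-min)/min = {relative_contrast(dim):.3f}")
```

In 2D the farthest point is many times farther than the nearest; in 10,000D every point sits at nearly the same distance, which is why nearest-neighbor methods degrade without dimensionality reduction.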

At times this is actually useful: against a background where almost everything is orthogonal, a pair of vectors with a clearly non-zero dot product stands out like a flare, flagging a relationship that is unlikely to be random.
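A small numpy sketch of that flare effect (the "document" setup here is hypothetical): hide one genuinely related vector among 99 random distractors and score everything by cosine similarity against the query.

```python
import numpy as np

rng = np.random.default_rng(3)
dim = 1000

# One "document" vector, 99 random distractors, and one genuinely related
# vector built as the document plus a little noise. Against the near-zero
# background of random similarities, the related vector stands out.
doc = rng.normal(size=dim)
doc /= np.linalg.norm(doc)

distractors = rng.normal(size=(99, dim))
noise = rng.normal(size=dim)
noise /= np.linalg.norm(noise)
related = doc + 0.5 * noise            # signal plus bounded noise

candidates = np.vstack([distractors, related])
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)

similarities = candidates @ doc
best = int(np.argmax(similarities))

print(f"best match index: {best} (related vector is index 99)")
print(f"best similarity:  {similarities[best]:.3f}")
```

The random distractors cluster near zero similarity while the related vector scores near 0.9, which is the basic principle behind embedding-based retrieval.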