Interconnection

Exploring that which binds everything as one

Artificial Intelligence


The Manifold Hypothesis

How is it possible that AI models can learn meaningful patterns from ultra-high-dimensional data?

A manifold is a space that may be curved globally but locally looks like flat Euclidean space. A sphere, for example, is a 2D manifold living in 3D space. A curved line is a 1D manifold in 2D space.
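To make the idea concrete, here is a toy numpy sketch (the function name and parameterization are my own, not from any library): a sphere's points live in 3D, but each one is pinned down by just two coordinates.

```python
import numpy as np

# A sphere is a 2D manifold: every point on it is determined by just two
# coordinates (polar angle theta, azimuthal angle phi), even though the
# points themselves live in 3D space.
def sphere_point(theta, phi, radius=1.0):
    """Map 2D manifold coordinates (theta, phi) to a 3D point."""
    return np.array([
        radius * np.sin(theta) * np.cos(phi),
        radius * np.sin(theta) * np.sin(phi),
        radius * np.cos(theta),
    ])

p = sphere_point(np.pi / 2, 0.0)   # on the equator -> approx (1, 0, 0)
q = sphere_point(0.0, 0.0)         # north pole     -> (0, 0, 1)

# Every generated point satisfies the sphere constraint x^2 + y^2 + z^2 = r^2.
assert np.isclose(np.linalg.norm(p), 1.0)
```

Two inputs, three outputs: the "extra" dimension is redundant, which is exactly what it means for the sphere to be a 2D manifold.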

Real-world high-dimensional data, such as images, audio, text, and so on, doesn't actually spread out across all of its high-dimensional space. Instead, it clusters near a much lower-dimensional structure, called a manifold, embedded within that space.

Consider an average grayscale image, with tens of thousands of pixels and therefore tens of thousands of dimensions. The manifold hypothesis says that meaningful subjects such as faces or cars occupy only a small slice of that entire feature space, and perhaps only a few hundred or a few thousand dimensions contain meaningful variation.
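A toy numpy sketch of that claim (the latent factors and sizes here are made up for illustration): generate 1000-dimensional "data" secretly driven by only 3 factors, then check how much variance the top principal components capture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: each 1000-dimensional sample is really driven by
# only 3 latent factors (a stand-in for pose, lighting, identity).
n_samples, ambient_dim, latent_dim = 500, 1000, 3
latents = rng.normal(size=(n_samples, latent_dim))
embedding = rng.normal(size=(latent_dim, ambient_dim))
data = latents @ embedding + 0.01 * rng.normal(size=(n_samples, ambient_dim))

# PCA via SVD: how much variance do the top components explain?
centered = data - data.mean(axis=0)
singular_values = np.linalg.svd(centered, compute_uv=False)
variance_ratio = singular_values**2 / np.sum(singular_values**2)

# Despite 1000 ambient dimensions, essentially all of the variance
# lives along just 3 directions.
top3 = variance_ratio[:3].sum()
print(f"variance explained by top 3 components: {top3:.4f}")
```

Real image manifolds are curved, so linear PCA only bounds the picture, but the same concentration of variation is what deep models exploit.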

The manifold hypothesis is what enables learning and generalization. Many deep learning methods can be understood as trying to learn a coordinate system for the data manifold, "unfolding" or "flattening" it into a simpler latent space.
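The simplest possible version of such a coordinate system can be written by hand (a hypothetical toy, not how a deep model actually does it): for points on a unit circle, the angle is a flat 1D latent coordinate that unrolls the curved manifold.

```python
import numpy as np

# A hand-built "encoder/decoder" for the simplest curved manifold: points
# on a unit circle in 2D. The latent space is just the angle, a single
# flat coordinate that "unrolls" the circle. Deep models learn such
# coordinates from data; here the formula is known in closed form.
def encode(xy):
    """2D point on the circle -> 1D latent coordinate (angle)."""
    return np.arctan2(xy[..., 1], xy[..., 0])

def decode(angle):
    """1D latent coordinate -> 2D point on the circle."""
    return np.stack([np.cos(angle), np.sin(angle)], axis=-1)

angles = np.linspace(-3.0, 3.0, 50)
points = decode(angles)        # 50 points in 2D, all on the manifold
recovered = encode(points)     # back to the 1D latent space

assert np.allclose(recovered, angles)  # round trip is lossless on-manifold
```

An autoencoder trained on circle points would have to discover this angle coordinate (or an equivalent one) on its own.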

In practice, data manifolds are also rarely perfectly smooth. They can have discontinuities, holes, or multiple disconnected components, and ongoing research continues to probe this phenomenon.


The Curse of Dimensionality

In high dimensions, nearly everything is orthogonal. Take two random vectors in a thousand-dimensional space, and their dot product will almost certainly be close to zero. This is the law of large numbers at work, averaging out the contributions of individual coordinates.
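A quick numpy experiment makes this concrete (the sample sizes are arbitrary): draw pairs of random unit vectors in 1000 dimensions and look at their cosine similarities.

```python
import numpy as np

rng = np.random.default_rng(42)
dim, trials = 1000, 2000

# Cosine similarity between pairs of random unit vectors in 1000D.
a = rng.normal(size=(trials, dim))
b = rng.normal(size=(trials, dim))
a /= np.linalg.norm(a, axis=1, keepdims=True)
b /= np.linalg.norm(b, axis=1, keepdims=True)
cosines = np.sum(a * b, axis=1)

# The law of large numbers squeezes the dot products toward 0:
# the typical magnitude is roughly 1/sqrt(dim), about 0.03 here.
print(f"mean |cos|: {np.mean(np.abs(cosines)):.4f}")
print(f"max  |cos|: {np.max(np.abs(cosines)):.4f}")
```

Even the most extreme pair out of two thousand stays within a few degrees of a perfect right angle.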

In these high-dimensional spaces, our intuition from low-dimensional spaces breaks down. In 3D, common intuition says that random directions point all over; in high dimensions, "all over" collapses into near-perfect right angles.

For machine learning, this presents real problems. Too many features (dimensions) make data sparse, so distances between points become less meaningful, and models overfit, picking up idiosyncrasies of the training data rather than broadly applicable rules. Fighting that sparsity to find meaningful patterns in such a large state space then requires exponentially more data.
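One way to see the distance problem directly (a toy experiment with arbitrary sample sizes): compare the nearest and farthest neighbor of a random query point as the dimension grows.

```python
import numpy as np

rng = np.random.default_rng(7)

# Distance concentration: as dimension grows, the gap between the nearest
# and farthest neighbor shrinks relative to the distances themselves,
# making "nearest" progressively less meaningful.
def relative_contrast(dim, n_points=500):
    points = rng.uniform(size=(n_points, dim))
    query = rng.uniform(size=dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

for dim in (2, 100, 10_000):
    print(f"dim={dim:>6}: (max-min)/min = {relative_contrast(dim):.3f}")
```

In 2D the farthest point is many times farther than the nearest; in 10,000D every point sits at nearly the same distance, which is why nearest-neighbor methods degrade without dimensionality reduction.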

At times this is actually useful: against a background where almost everything is orthogonal, a pair of vectors with a clearly non-zero dot product stands out like a flare, flagging a relationship that is unlikely to be random.
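A small numpy sketch of that flare effect (the "document" setup here is hypothetical): hide one genuinely related vector among 99 random distractors and score everything by cosine similarity against the query.

```python
import numpy as np

rng = np.random.default_rng(3)
dim = 1000

# One "document" vector, 99 random distractors, and one genuinely related
# vector built as the document plus a little noise. Against the near-zero
# background of random similarities, the related vector stands out.
doc = rng.normal(size=dim)
doc /= np.linalg.norm(doc)

distractors = rng.normal(size=(99, dim))
noise = rng.normal(size=dim)
noise /= np.linalg.norm(noise)
related = doc + 0.5 * noise            # signal plus bounded noise

candidates = np.vstack([distractors, related])
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)

similarities = candidates @ doc
best = int(np.argmax(similarities))

print(f"best match index: {best} (related vector is index 99)")
print(f"best similarity:  {similarities[best]:.3f}")
```

The random distractors cluster near zero similarity while the related vector scores near 0.9, which is the basic principle behind embedding-based retrieval.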