Wednesday, May 2, 2012

A Global Geometric Framework for Nonlinear Dimensionality Reduction

The goal of this paper is almost the same as that of the paper I read last week (Nonlinear Dimensionality Reduction by Locally Linear Embedding). In fact, both papers appeared in the same issue of Science, and this one is printed right before the other. The same concept was developed by two different teams at almost the same time, which is quite interesting.

In this paper, they also want to find the low-dimensional manifold structure underlying data in a high-dimensional space.
Their example is also the "Swiss roll".

[Figure: the "Swiss roll" example. (A) the data in the original high-dimensional input space; (C) the underlying two-dimensional structure unfolded by Isomap.]

Panel "A" shows the original data in the high-dimensional space, and they want to find the underlying structure and unfold it as in "C".
They also assume that the geodesic distance can be approximated by adding up a sequence of "short hops" between neighboring points (the same idea as in the previous paper).
However, they use a different method to recover the underlying structure.
They propose the isometric feature mapping (Isomap) algorithm. The following table gives an overview of the algorithm.

1. Construct the neighborhood graph: connect each pair of points that are neighbors (within a distance threshold, or among each other's K nearest neighbors), with edge weights dX(i,j).
2. Compute shortest paths: estimate the geodesic distances dG(i,j) by the shortest-path distances in the graph G.
3. Construct the d-dimensional embedding: apply classical MDS to the matrix of graph distances DG to obtain coordinates Y.

In the first step, they find the neighbors of every point (either all points closer than a distance threshold, or the K nearest neighbors), and construct a weighted graph G over the data points by connecting neighboring points.
Every point is viewed as a node in G, and the weight of the edge between neighboring points i and j is defined as dX(i,j), the distance between them in the input space.
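As a minimal sketch of this first step (not the authors' code; the function name and the choice of k = 7 are my own for illustration), the neighborhood graph could be built with NumPy like this:

```python
import numpy as np

def build_knn_graph(X, k=7):
    """Build the Isomap neighborhood graph.

    X : (n, D) array of n points in D-dimensional input space.
    Returns an (n, n) matrix W with W[i, j] = dX(i, j) if j is among
    i's k nearest neighbors (or vice versa), and np.inf otherwise.
    """
    n = X.shape[0]
    # Pairwise Euclidean distances dX(i, j) in the input space.
    diff = X[:, None, :] - X[None, :, :]
    dX = np.sqrt((diff ** 2).sum(axis=-1))

    W = np.full((n, n), np.inf)
    np.fill_diagonal(W, 0.0)
    for i in range(n):
        # Indices of the k nearest neighbors of point i (excluding itself).
        neighbors = np.argsort(dX[i])[1:k + 1]
        W[i, neighbors] = dX[i, neighbors]
        W[neighbors, i] = dX[neighbors, i]  # keep the graph symmetric
    return W
```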
After constructing this graph, we can compute the shortest-path distance between every pair of nodes with a dynamic-programming method (the Floyd–Warshall algorithm). They denote this distance as dG(i,j).
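Continuing the sketch above, the graph distances dG(i,j) could be computed with a compact Floyd–Warshall update; this is O(n^3), so for large data sets repeated Dijkstra searches are a common alternative:

```python
def shortest_path_distances(W):
    """Floyd–Warshall: turn edge weights W into geodesic estimates dG."""
    dG = W.copy()
    n = dG.shape[0]
    for k in range(n):
        # Relax every pair (i, j) through intermediate node k.
        dG = np.minimum(dG, dG[:, k:k + 1] + dG[k:k + 1, :])
    return dG
```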
Then we know the distance between every pair of points on the underlying structure, so we can start placing them in a new low-dimensional space. They use classical multidimensional scaling (MDS) to embed the points in a d-dimensional Euclidean space Y that best preserves the distances on the graph.
This is done by minimizing the following cost function.

E = || τ(DG) - τ(DY) ||_L2

Here DY denotes the matrix of Euclidean distances {dY(i,j) = ||yi - yj||}. The τ operator converts distances to inner products (in the paper, τ(D) = -HSH/2, where S is the matrix of squared distances and H is the centering matrix), which supports efficient optimization. The cost function is minimized by setting the coordinates yi according to the top d eigenvectors of the matrix τ(DG).
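As a rough sketch of this MDS step (not the authors' code; it assumes the neighborhood graph is connected, so dG has no infinite entries), one could double-center the squared graph distances and take the top d eigenvectors:

```python
def mds_embedding(dG, d=2):
    """Classical MDS on the graph-distance matrix dG -> (n, d) coordinates."""
    n = dG.shape[0]
    S = dG ** 2                               # squared graph distances
    H = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    tau = -0.5 * H @ S @ H                    # tau(DG): distances -> inner products
    eigvals, eigvecs = np.linalg.eigh(tau)    # symmetric eigendecomposition (ascending)
    top = np.argsort(eigvals)[::-1][:d]       # indices of the top d eigenvalues
    lam, V = eigvals[top], eigvecs[:, top]
    # Row i of Y is the d-dimensional coordinate vector yi of point i.
    return V * np.sqrt(np.maximum(lam, 0.0))
```

Putting the three sketches together, something like mds_embedding(shortest_path_distances(build_knn_graph(X)), d=2) would give 2-D Isomap coordinates for a data matrix X.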
The true dimensionality of the data can be estimated from how the error decreases as the dimensionality of Y is increased, as shown in the figure below (the true dimensionality of each underlying structure is marked by an arrow).

[Figure: the error of the Isomap embedding as a function of the dimensionality of Y; arrows mark the true dimensionality of each underlying structure.]

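To sketch how such a curve could be computed, assuming the error is measured as the residual variance 1 - R^2 between the graph distances and the embedding distances (as in the paper's figure), one might write:

```python
def residual_variance(dG, Y):
    """1 - R^2 between graph distances dG and Euclidean distances in the embedding Y."""
    diff = Y[:, None, :] - Y[None, :, :]
    dY = np.sqrt((diff ** 2).sum(axis=-1))
    r = np.corrcoef(dG.ravel(), dY.ravel())[0, 1]
    return 1.0 - r ** 2

# The "elbow" where this curve flattens suggests the intrinsic dimensionality:
# errors = [residual_variance(dG, mds_embedding(dG, d)) for d in range(1, 11)]
```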
They present results on three data sets in this paper. For all three, the natural appearance of linear interpolations between distant points in the low-dimensional coordinate space confirms that Isomap has captured the data's perceptually relevant structure.