Wednesday, April 25, 2012

Nonlinear dimensionality reduction by locally linear embedding

In this paper, the authors want to find a low-dimensional representation of the observed data that preserves the original data structure. In the real world, we receive many stimuli at the same time, such as the pixel intensities of images, the power spectra of sounds, and so on. These observations typically have a much more compact description, and lie on or close to a smooth low-dimensional manifold. Thus, in this paper, they assume the data are generated on some low-dimensional manifold, as Figure 1 shows.

Traditional methods, such as PCA, cannot preserve distances along the manifold, which can hurt performance. LLE (locally linear embedding) instead tries to find coordinates that follow the manifold, as Figure 1(c) shows.

Figure 2 is an overview of the algorithm. I will describe the steps in detail in the following:
For every point X_i, they first find its K nearest neighbors, and use these neighbors to learn the reconstruction weights that minimize the loss function. (Step 1 in Figure 2)

These weights W_ij, which minimize the reconstruction error ε(W) = Σ_i | X_i − Σ_j W_ij X_j |² subject to Σ_j W_ij = 1, can be found by solving a constrained least-squares problem for each point. (Step 2 in Figure 2)
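Here is a minimal NumPy sketch of how Steps 1 and 2 could be implemented. The function name lle_weights, the brute-force neighbor search, and the regularization constant are my own choices for illustration, not details taken from the paper.

```python
import numpy as np

def lle_weights(X, K):
    # Steps 1-2 (sketch): for each point, find its K nearest neighbors and
    # solve for weights minimizing |X_i - sum_j W_ij X_j|^2 with sum_j W_ij = 1.
    # X has shape (N, D); returns an (N, N) weight matrix, zero outside neighbors.
    N = X.shape[0]
    W = np.zeros((N, N))
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    for i in range(N):
        nbrs = np.argsort(d2[i])[1:K + 1]     # K nearest neighbors (skip the point itself)
        Z = X[nbrs] - X[i]                    # neighbor differences, shape (K, D)
        C = Z @ Z.T                           # local Gram matrix, shape (K, K)
        C += np.eye(K) * 1e-3 * np.trace(C)   # small regularizer in case K > D
        w = np.linalg.solve(C, np.ones(K))    # closed-form constrained least squares
        W[i, nbrs] = w / w.sum()              # normalize so the weights sum to one
    return W
```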
After finding these weights, we try to find the low-dimensional vectors Y_i that represent the points in the embedding space.
This can be done by minimizing the following function:

Φ(Y) = Σ_i | Y_i − Σ_j W_ij Y_j |²

This is almost the same as the function in Step 2. The only difference is that now the weights W_ij are fixed, and we want to find suitable coordinates Y. This can be solved by finding the bottom eigenvectors of a sparse matrix. (Step 3 in Figure 2)
After finding the new positions of all points, we can place them in the low-dimensional space, and the result looks like Figure 1(c).
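And a matching sketch of Step 3, again with my own naming (lle_embed). The paper notes this is a sparse eigenvalue problem; this illustration uses a dense solver for simplicity.

```python
import numpy as np

def lle_embed(W, d):
    # Step 3 (sketch): minimize sum_i |Y_i - sum_j W_ij Y_j|^2 over Y.
    # Equivalent to taking the bottom eigenvectors of M = (I - W)^T (I - W),
    # discarding the constant eigenvector whose eigenvalue is (numerically) zero.
    N = W.shape[0]
    I = np.eye(N)
    M = (I - W).T @ (I - W)
    eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    return eigvecs[:, 1:d + 1]             # skip the bottom (constant) eigenvector

# Example (hypothetical data): map 200 points from 10 dimensions down to 2,
# using lle_weights from the sketch above.
# X = np.random.rand(200, 10)
# Y = lle_embed(lle_weights(X, K=12), d=2)   # Y has shape (200, 2)
```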

They run experiments on facial images and text documents; I only show the result on facial images here.
This figure is generated by projecting all the points onto the first two coordinates, and the bottom images correspond to points along the top-right path (linked by the solid line). The expressions in these facial images change smoothly along the path. That means this dimensionality reduction method preserves semantic distances to some extent.

========================================================
Comment:
The authors first observe the behavior of the data. Data often have correlations between certain dimensions of the high-dimensional space; that is, the data do not fill the whole space, and the authors want to find the underlying structure. They notice that the distance along the manifold is almost the same as the Euclidean distance between nearby points, and assume this property holds for the data. They then use this property to recover the underlying structure. This is a good example of solving a problem by observing the properties of the data.