Wednesday, April 25, 2012

Nonlinear dimensionality reduction by locally linear embedding

In this paper, the authors want to find a low-dimensional representation of the observed data that preserves the original data structure. In the real world, we receive lots of stimuli at the same time, such as the pixel intensities of images, the power spectra of sounds, and so on. These observations typically have a much more compact description and lie on or close to a smooth low-dimensional manifold. Thus, in this paper, they assume the data are generated on some low-dimensional manifold, as Figure 1 shows.

[Figure 1: data sampled from a low-dimensional manifold; panel (c) shows the embedding recovered by LLE]

Traditional methods, such as PCA, cannot preserve the distances on the manifold and might lower the performance. LLE (locally linear embedding) instead tries to find the axes along the manifold, as Figure 1(c) shows.

[Figure 2: the three steps of the LLE algorithm]

Figure 2 gives an overview of the algorithm. I will describe the steps in detail in the following:
For every point, they first find its K nearest neighbors (Step 1 in Figure 2), and then use these neighbors to learn the weights that minimize the following reconstruction cost:

ε(W) = Σ_i | X_i − Σ_j W_ij X_j |²,  subject to Σ_j W_ij = 1 (and W_ij = 0 when X_j is not a neighbor of X_i)

These weights can be found in closed form by solving a constrained least-squares problem (Step 2 in Figure 2).
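The closed-form solution for one point is simple enough to sketch. Below is a minimal NumPy illustration (my own sketch, not the authors' code): shift the neighbors so the point sits at the origin, build the local Gram matrix, solve a linear system against a vector of ones, and rescale so the weights sum to one. The function name, the `reg` parameter, and the regularization term are my own additions for the case where K exceeds the input dimension.

```python
import numpy as np

def reconstruction_weights(X, i, neighbor_idx, reg=1e-3):
    """Solve the constrained least-squares problem for the weights of point i.

    X            : (N, D) data matrix
    neighbor_idx : indices of the K nearest neighbors of X[i]
    reg          : small regularizer, needed when K > D (hypothetical default)
    """
    Z = X[neighbor_idx] - X[i]                              # neighbors shifted to the origin
    C = Z @ Z.T                                             # K x K local Gram matrix
    C = C + reg * np.trace(C) * np.eye(len(neighbor_idx))   # keep C invertible
    w = np.linalg.solve(C, np.ones(len(neighbor_idx)))      # solve C w = 1
    return w / w.sum()                                      # enforce sum_j W_ij = 1
```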
After finding these weights, the next step is to find the low-dimensional vectors Y_i that represent the points in the embedding space.
This can be done by minimizing the following function:

Φ(Y) = Σ_i | Y_i − Σ_j W_ij Y_j |²

That is almost the same as the cost minimized in Step 2; the only difference is that the weights W_ij are now fixed, and we want to find suitable coordinates Y. This can be solved as an eigenvalue problem (Step 3 in Figure 2).
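Step 3 is also easy to sketch. Assuming the N x N weight matrix W has already been assembled from the per-point weights above, the embedding is given by the bottom eigenvectors of M = (I − W)ᵀ(I − W), discarding the constant eigenvector. This dense version is my own illustration; for large N one would use a sparse eigensolver.

```python
import numpy as np

def lle_embedding(W, d):
    """Find the d-dimensional coordinates Y from the (N, N) weight matrix W."""
    N = W.shape[0]
    I = np.eye(N)
    M = (I - W).T @ (I - W)               # cost matrix of the embedding objective
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    # Discard the bottom eigenvector (constant, eigenvalue ~ 0); the next d
    # eigenvectors give the embedding coordinates Y, one row per point.
    return eigvecs[:, 1:d + 1]
```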
After finding the new positions of all points, we can place them in the low-dimensional space, and the result looks like Figure 1(c).

They run experiments on facial images and text documents; I only show the result on facial images here.
Their figure is generated by projecting all the points onto the first two coordinates, and the bottom images correspond to points along the top-right path (linked by a solid line). We can see that the expressions in these facial images change smoothly, which means this dimensionality reduction method does, to some extent, preserve the semantic distances.

========================================================
Comment:
The authors first observe the behavior of the data. Data often have correlations between certain dimensions of the high-dimensional space; that is, the data cannot fill the whole space, and the authors want to find the underlying structure. They note that the distance on the manifold is almost the same as the Euclidean distance between nearby points, assume this property holds for the data, and then use it to recover the underlying structure. This is a good example of solving a problem by observing the properties of the data.

Tuesday, April 10, 2012

Probabilistic latent semantic indexing

In this paper, the authors propose a useful probabilistic method for learning latent semantics: pLSA.

In pLSA, they model every document as a mixture of latent classes, and, given the latent classes, the words are generated independently of the document.
So they model the joint probability as follows, where d represents a document, w represents a word, and z is a latent class:

P(d, w) = Σ_z P(z) P(w|z) P(d|z)

In this paper, they want to minimize the distance between the original data and the data reconstructed through the latent classes (measured by KL divergence). After a series of calculations, the original problem can be converted into maximizing the following log-likelihood function, where n(d, w) denotes the term frequency:

L = Σ_d Σ_w n(d, w) log P(d, w)
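For concreteness, here is a tiny sketch of how this log-likelihood would be computed; the array conventions (P_dz holding P(d|z) as a D x K matrix, P_wz holding P(w|z) as a W x K matrix, P_z a length-K vector) are my own.

```python
import numpy as np

def log_likelihood(n_dw, P_z, P_dz, P_wz):
    """L = sum_{d,w} n(d,w) log P(d,w), with P(d,w) = sum_z P(z) P(d|z) P(w|z)."""
    P_dw = (P_dz * P_z) @ P_wz.T                 # (D, K) weighted by P(z), times (K, W)
    return np.sum(n_dw * np.log(P_dw + 1e-12))   # small constant avoids log(0)
```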

This optimization problem can be solved with the EM algorithm, which works as follows (a small code sketch is given after the two steps):
E step:
Compute the posterior probability P(z|d, w) of each latent class z, using the parameter values estimated in the previous round.
M step:
Then, use P(z|d, w) to update the probabilities P(w|z), P(d|z), and P(z).
Repeat the E and M steps until the change in the results falls under a certain threshold.
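The whole procedure is compact enough to write down. The following is my own minimal NumPy sketch of the plain (non-tempered) EM updates described above, using the symmetric parameterization P(d, w) = Σ_z P(z) P(d|z) P(w|z); the dense D x W x K posterior array is fine for toy data but would not scale to a real corpus.

```python
import numpy as np

def plsa_em(n_dw, K, n_iter=100, seed=0):
    """Fit pLSA with plain EM. n_dw is the (D, W) term-frequency matrix n(d, w)."""
    rng = np.random.default_rng(seed)
    D, W = n_dw.shape
    P_z = np.full(K, 1.0 / K)                              # P(z)
    P_dz = rng.random((D, K)); P_dz /= P_dz.sum(axis=0)    # P(d|z), columns sum to 1
    P_wz = rng.random((W, K)); P_wz /= P_wz.sum(axis=0)    # P(w|z), columns sum to 1

    for _ in range(n_iter):
        # E step: P(z|d,w) proportional to P(z) P(d|z) P(w|z)
        post = P_z[None, None, :] * P_dz[:, None, :] * P_wz[None, :, :]  # (D, W, K)
        post /= post.sum(axis=2, keepdims=True) + 1e-12

        # M step: re-estimate the parameters from the expected counts n(d,w) P(z|d,w)
        counts = n_dw[:, :, None] * post
        P_wz = counts.sum(axis=0); P_wz /= P_wz.sum(axis=0, keepdims=True)
        P_dz = counts.sum(axis=1); P_dz /= P_dz.sum(axis=0, keepdims=True)
        P_z = counts.sum(axis=(0, 1)); P_z /= P_z.sum()

    return P_z, P_dz, P_wz
```

In practice one would also monitor the log-likelihood above and stop once its change falls under the chosen threshold instead of running a fixed number of iterations.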

The above is the traditional EM method for solving the optimization problem. However, the results depend heavily on the initial probabilities, and EM might get stuck in a local maximum.
Thus, they propose tempered EM (TEM), a concept derived from deterministic annealing. The formula in the E step then becomes

P_β(z|d, w) = P(z) [P(d|z) P(w|z)]^β / Σ_z' P(z') [P(d|z') P(w|z')]^β

where the inverse-temperature parameter β ≤ 1 smooths the posterior and makes the search less sensitive to the starting point.

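Assuming the tempered form above (only the conditional part raised to β), the only change to the EM sketch is in the E step, for example:

```python
import numpy as np

def tempered_posterior(P_z, P_dz, P_wz, beta):
    """Tempered E step: the likelihood part is raised to the power beta (<= 1)."""
    post = P_z[None, None, :] * (P_dz[:, None, :] * P_wz[None, :, :]) ** beta
    return post / (post.sum(axis=2, keepdims=True) + 1e-12)
```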
The physical meaning of pLSA can be viewed as finding the bases that generate the documents.
P(w|z_1), P(w|z_2), ..., P(w|z_n) can be viewed as the basis vectors for representing a document. Because the entries of each basis vector sum to 1, the reconstructed document lies on the plane (more precisely, the sub-simplex) spanned by these vectors. The reconstruction error can be viewed as the shortest distance between the original points and this spanned plane, and we want to learn bases that minimize the distance between the learned plane and the training points.
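A tiny numeric example of this picture (with made-up numbers): the reconstructed word distribution of a document is a convex combination of the P(w|z) basis vectors, so it stays on the simplex they span.

```python
import numpy as np

# Hypothetical bases P(w|z) for a 3-word vocabulary and 2 latent classes;
# each column is a probability distribution over words.
P_wz = np.array([[0.7, 0.1],
                 [0.2, 0.3],
                 [0.1, 0.6]])
P_zd = np.array([0.4, 0.6])   # hypothetical mixing weights P(z|d) for one document

P_wd = P_wz @ P_zd            # reconstructed P(w|d) = sum_z P(w|z) P(z|d)
print(P_wd, P_wd.sum())       # roughly [0.34 0.26 0.40], still sums to 1
```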

========================================================
Comment:
I think the most important part of this paper is the physical meaning of pLSA. It is not just a bunch of mathematical formulas and optimization; it also has a very strong physical interpretation that convinced me it can be used on almost any kind of data, rather than being an ad hoc solution. This paper provides a good example of explaining complex math in a simple geometric way.