Tuesday, June 5, 2012

Support vector learning for ordinal regression

In this paper, the authors pose the problem of ordinal regression, which differs from traditional learning problems such as classification and metric regression.

In the traditional classification problem, the loss function is the 0-1 loss.
In regression estimation, the loss function takes the full metric structure into account.
Ordinal regression, the problem this paper wants to solve, shares properties of both classification and metric regression: like classification, the outcome space is a finite set, and like metric regression, there is an ordering among its elements.

The problem is defined as follows.
The input space X is a subset of R^n, with objects represented by feature vectors x = (x1, ..., xn)^T.
Y denotes the set of rank labels, which carry an order.
Given the training set S, we want to find a mapping function h that maps each point to its rank.
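
Since the equation images did not survive, here is roughly how the setup reads in my own notation (a reconstruction, not copied from the paper):

    X \subseteq \mathbb{R}^n, \qquad
    Y = \{r_1, r_2, \dots, r_q\} \ \text{with} \ r_1 \prec r_2 \prec \dots \prec r_q,
    S = \{(x_i, y_i)\}_{i=1}^{\ell}, \qquad
    h : X \rightarrow Y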

The error cost function is a pairwise preference loss, l_pref: it is 1 if and only if the ordering of h(x_1) and h(x_2) disagrees with the ordering of y_1 and y_2, and 0 otherwise.
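
Written out, the missing definition should look roughly like this (my reconstruction, not the paper's exact notation):

    \ell_{\mathrm{pref}}\big(h(x_1), h(x_2); y_1, y_2\big) =
    \begin{cases}
      1 & \text{if the order of } h(x_1), h(x_2) \text{ disagrees with the order of } y_1, y_2,\\
      0 & \text{otherwise.}
    \end{cases}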

The expected cost can then be computed as the expectation of l_pref over pairs of examples drawn independently from the underlying distribution.
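
Reconstructing the missing equation in my own notation (the paper's symbols may differ):

    R_{\mathrm{pref}}(h) \;=\; \mathbb{E}_{(x_1,y_1),(x_2,y_2)}\big[\, \ell_{\mathrm{pref}}\big(h(x_1), h(x_2); y_1, y_2\big) \,\big]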
However, this expected cost is hard to minimize directly.
Thus, the training set is first converted into another set S', which represents the pairwise rankings.
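
A minimal sketch of how such a pairwise set could be built (my own code, assuming numeric rank labels; the function make_pairwise_set and the exact construction are not taken from the paper):

```python
import numpy as np

def make_pairwise_set(X, y):
    """Build a pairwise set S' from feature matrix X and rank labels y.

    For every pair of points with different ranks, the difference vector
    x_i - x_j gets label +1 when y_i outranks y_j, and the mirrored
    difference x_j - x_i gets label -1.
    """
    diffs, labels = [], []
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j]:  # only pairs with different ranks contribute
                diffs.append(X[i] - X[j])
                labels.append(+1)
                diffs.append(X[j] - X[i])
                labels.append(-1)
    return np.array(diffs), np.array(labels)
```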

Theorem 1 then relates the two problems: for every projection function obtained on the set S', the 0-1 loss on S' is proportional to the expected preference cost on S. Hence, the minimum-cost projection function found on S' is the same as the one that would be found on S. The ordinal regression problem is thus converted into a classification problem and can be solved by an SVM.
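
In symbols, the relationship has roughly this shape (my paraphrase, where g_h denotes the pair classifier induced by h and z(y_1, y_2) the true pairwise ordering; this is not the theorem's exact statement):

    R_{\mathrm{pref}}(h) \;\propto\; \Pr\big[\, g_h(x_1, x_2) \neq z(y_1, y_2) \,\big]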

●Solving with an SVM
We want to find a projection vector w and a set of thresholds (biases) that project the original data onto the ranks.

Then, we allow the SVM some degree of violation (slack variables) and solve for w.
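
The missing optimization problem should look roughly like a standard soft-margin SVM applied to the pair differences in S', with each pair written so that its higher-ranked element comes first (my reconstruction; the paper's exact formulation and notation may differ):

    \min_{w,\,\xi} \ \tfrac{1}{2}\|w\|^2 + C \sum_{k} \xi_k
    \quad \text{s.t.} \quad w \cdot \big(x_k^{(1)} - x_k^{(2)}\big) \ge 1 - \xi_k, \quad \xi_k \ge 0 \ \ \text{for all pairs } k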

After finding w, we can determine the thresholds θ(r_i) that separate the different ranks by minimizing a second objective subject to the constraints above.
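
The threshold equations are also missing here; a simple rule consistent with the description, though not necessarily the paper's exact one, places each threshold in the middle of the gap between consecutive ranks along the projection:

    \theta(r_k) \;=\; \tfrac{1}{2}\Big( \max_{i:\,y_i = r_k} w \cdot x_i \;+\; \min_{i:\,y_i = r_{k+1}} w \cdot x_i \Big)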

After doing this, we obtain the final result shown in the paper's figure: the training data projected onto w and partitioned into ranks by the thresholds θ(r_i).

When new data arrives, we simply project it with w and determine its rank by checking which thresholds θ(r_i) its projection falls between.
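
A minimal prediction sketch in the same spirit (the helper predict_rank and its sorted thresholds argument are my own, not from the paper):

```python
import numpy as np

def predict_rank(w, thresholds, x):
    """Assign a rank (numbered 1..q) to a new point x.

    thresholds must hold the ascending cut points theta(r_1) < ... < theta(r_{q-1}).
    The rank is 1 plus the number of thresholds the projection w.x exceeds.
    """
    score = np.dot(w, x)
    return 1 + int(np.searchsorted(thresholds, score))
```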

========================================================
Comment:
    The authors are very smart: they convert the ordinal regression problem into a classification problem, so they can directly use the methods that are already well developed for classification. However, I think the most important thing is to explain why the projection derived on the set S' will be the same as the one derived on S. They only provide a theorem saying that the two are the same, but they do not explain it; I think that is one weak point.
    Another weak point is the notation of this paper. The paper uses too many equations to define simple concepts, such as l_pref. I think it would be much better to provide some illustrations, or to state the meaning directly in words.
