Monday, March 12, 2012

Distinctive Image Features from Scale-Invariant Keypoints

This paper describes SIFT, one of the most widely used scale-invariant features.

SIFT consists of two main parts: a feature detector and a feature descriptor.

●The feature detector has three steps.

1. Scale-space extrema detection

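It first convolves the original image I with Gaussian kernels G of different σ, which gives the scale-space function

$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$$

where * denotes convolution and

$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2) / (2\sigma^2)}$$

Then, calculate the difference between adjacent layers and store it as D

$$D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma)$$

where k = 2^{1/s}, and s is the number of intervals (layers) in one octave, so σ doubles after s steps.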
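As a rough illustration of the equations above, here is a minimal pure-Python sketch that builds the difference-of-Gaussian layers for one octave. The function names are made up for this post, the border handling (clamping) is an assumption, and the defaults sigma0 = 1.6 and s = 3 are the values used in the paper's experiments. It is not the author's implementation.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel, truncated at about 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the border."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [[sum(kernel[j + radius] * row[min(max(x + j, 0), width - 1)]
                 for j in range(-radius, radius + 1))
             for x in range(width)]
            for row in image]

def transpose(image):
    return [list(col) for col in zip(*image)]

def gaussian_blur(image, sigma):
    """L(x, y, sigma): separable Gaussian blur (rows first, then columns)."""
    kernel = gaussian_kernel(sigma)
    rows_done = convolve_rows(image, kernel)
    return transpose(convolve_rows(transpose(rows_done), kernel))

def dog_octave(image, sigma0=1.6, s=3):
    """Return (sigmas, dogs) for one octave, with D = L(k*sigma) - L(sigma)."""
    k = 2 ** (1.0 / s)
    # s + 3 blurred images give s + 2 difference images.
    sigmas = [sigma0 * k ** i for i in range(s + 3)]
    blurred = [gaussian_blur(image, sigma) for sigma in sigmas]
    dogs = [[[hi - lo for lo, hi in zip(row_lo, row_hi)]
             for row_lo, row_hi in zip(img_lo, img_hi)]
            for img_lo, img_hi in zip(blurred, blurred[1:])]
    return sigmas, dogs
```

With s = 3 the fourth blur level has exactly twice the σ of the first, which is the point where the next octave can start from a half-size image.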

The following is a graphical view of these equations.

From the properties of the Gaussian kernel, blurring the image I with 2σ and then halving its resolution gives approximately the same result as blurring the half-scale image with σ. That is why each octave in the figure above starts from an image half the size of the previous one.

After that, compare the D value of each point with its 26 neighbors: 8 in the same layer and 9 in each of the layers directly above and below, as in the figure above. If it is the smallest or the largest of them, save it as a candidate interest point.
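The 26-neighbor comparison can be sketched as follows. This assumes `dogs` is a list of D layers from one octave, each a list of rows, and the function names are hypothetical. Points on the border of a layer and the first and last layers are skipped because they do not have a full set of neighbors.

```python
def is_scale_space_extremum(dogs, layer, y, x):
    """True if dogs[layer][y][x] is strictly larger or smaller than all 26 neighbors."""
    center = dogs[layer][y][x]
    neighbors = [dogs[layer + dl][y + dy][x + dx]
                 for dl in (-1, 0, 1)
                 for dy in (-1, 0, 1)
                 for dx in (-1, 0, 1)
                 if (dl, dy, dx) != (0, 0, 0)]
    return center > max(neighbors) or center < min(neighbors)

def find_candidates(dogs):
    """Return (layer, y, x) for every extremum that has a full neighborhood."""
    found = []
    for layer in range(1, len(dogs) - 1):
        for y in range(1, len(dogs[layer]) - 1):
            for x in range(1, len(dogs[layer][y]) - 1):
                if is_scale_space_extremum(dogs, layer, y, x):
                    found.append((layer, y, x))
    return found
```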

2. Keypoint localization

Some of the selected candidate points might not be stable, so we remove them from the candidate list. For the stable points, we shift the location slightly, to sub-pixel accuracy, by fitting a 3D quadratic function to the nearby D values.

Two rules decide whether a point is stable. A point is considered unstable if it has low contrast or is poorly localized along an edge.
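To give a feeling for how a point gets "moved", here is a one-dimensional simplification: fit a parabola through three neighboring D samples and take its vertex. The paper does the same thing in three dimensions (x, y, and scale) using derivatives of D. The 1-D reduction and the function name are assumptions made for this sketch.

```python
def refine_1d(d_prev, d_center, d_next):
    """Vertex of the parabola through samples at positions -1, 0, +1.

    Returns (offset, value): the shift from the center sample and the
    interpolated D value at the refined position.
    """
    first = (d_next - d_prev) / 2.0            # central first derivative
    second = d_prev - 2.0 * d_center + d_next  # central second derivative
    if second == 0:
        return 0.0, d_center                   # flat: nothing to refine
    offset = -first / second
    value = d_center + 0.5 * first * offset
    return offset, value
```

For example, samples taken from the curve -(x - 0.25)^2 + 3 at x = -1, 0, 1 give back an offset of 0.25 and a peak value of 3.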
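The two rules can be sketched like this. The contrast threshold 0.03 (for pixel values scaled to [0, 1]) and the curvature ratio r = 10 are the values reported in the paper. The function names and the finite-difference Hessian are choices made for this sketch.

```python
def passes_contrast_test(d_value, threshold=0.03):
    """Keep the point only if |D| at the refined location is large enough."""
    return abs(d_value) >= threshold

def passes_edge_test(dog, y, x, r=10.0):
    """Reject edge-like points using the 2x2 Hessian of one D layer.

    The test trace^2 / det < (r + 1)^2 / r bounds the ratio between the
    two principal curvatures without computing eigenvalues.
    """
    dxx = dog[y][x + 1] - 2.0 * dog[y][x] + dog[y][x - 1]
    dyy = dog[y + 1][x] - 2.0 * dog[y][x] + dog[y - 1][x]
    dxy = (dog[y + 1][x + 1] - dog[y + 1][x - 1]
           - dog[y - 1][x + 1] + dog[y - 1][x - 1]) / 4.0
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        return False  # curvatures with different signs, or one of them is zero
    return trace * trace / det < (r + 1.0) ** 2 / r
```

An isolated blob has similar curvature in both directions and passes, while a ridge has curvature in only one direction and is rejected.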

3. Orientation assignment

After all the steps above, we get points like (d) in the figure above.

Then we assign an orientation to every interest point, so that the patch around it can be rotated to its dominant direction before the SIFT descriptor is computed.

The direction and magnitude are simply computed from the gradients of the pixels surrounding the interest point.
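A minimal sketch of this step, using pixel differences for the gradient and a 36-bin histogram (10 degrees per bin, as in the paper) to pick the dominant direction. The paper also weights each sample with a Gaussian window, which is left out here to keep the example short. `L` is assumed to be a blurred image stored as a list of rows, and the function names are hypothetical.

```python
import math

def gradient(L, y, x):
    """Gradient magnitude and orientation (degrees in [0, 360)) at one pixel."""
    dx = L[y][x + 1] - L[y][x - 1]
    dy = L[y + 1][x] - L[y - 1][x]
    magnitude = math.hypot(dx, dy)
    orientation = math.degrees(math.atan2(dy, dx)) % 360.0
    return magnitude, orientation

def dominant_orientation(L, cy, cx, radius=2, bins=36):
    """Histogram of orientations around (cy, cx), weighted by magnitude.

    Returns (angle, hist), where angle is the center of the highest bin.
    """
    hist = [0.0] * bins
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            magnitude, theta = gradient(L, y, x)
            hist[int(theta // (360.0 / bins)) % bins] += magnitude
    peak = max(range(bins), key=hist.__getitem__)
    return (peak + 0.5) * 360.0 / bins, hist
```

Note that the window needs a one-pixel margin, so (cy, cx) must be at least radius + 1 pixels away from the image border.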

After these three detector steps, the feature descriptor is used to describe the patch around each interest point.

●The local image descriptor (feature descriptor)

After rotating the pixels in the region to the main direction, it samples the region on a 16x16 grid. The orientation and magnitude of each sample are calculated as in step 3 above and then accumulated into a 4x4 array of orientation histograms, each summarizing a 4x4 block of samples. With 8 orientation bins per histogram, this gives a 4x4x8 = 128-element feature vector.

The following is an example that uses an 8x8 sample grid accumulated into a 2x2 descriptor array.
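The accumulation can be sketched as below. It assumes the gradient magnitudes and orientations have already been computed on a square sample grid and rotated to the main direction. The function names are hypothetical, and the Gaussian weighting, the interpolation between bins, and the clamping of large values described in the paper are left out.

```python
import math

def normalize(vec):
    """Scale the vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec

def sift_like_descriptor(magnitudes, orientations, cells=4, bins=8):
    """Build a cells*cells*bins vector from a square grid of gradient samples.

    magnitudes and orientations are equally sized square grids, and
    orientations are in degrees relative to the keypoint's main direction.
    """
    size = len(magnitudes)
    cell_size = size // cells
    desc = [0.0] * (cells * cells * bins)
    for y in range(size):
        for x in range(size):
            cell = (y // cell_size) * cells + (x // cell_size)
            b = int((orientations[y][x] % 360.0) // (360.0 / bins)) % bins
            desc[cell * bins + b] += magnitudes[y][x]
    return normalize(desc)
```

A 16x16 grid with the defaults gives 128 values. Calling it with an 8x8 grid and cells=2 gives the 2x2 array with 32 values from the example above.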

========================================================

Comment:

SIFT is a really useful descriptor for handling scaling and rotation. Besides, everything in SIFT is computed locally, so it can cope with changes in viewing angle to some extent. But SIFT still has a limitation: it does not contain any color information. That is why some people proposed CSIFT afterward.
