The authors apply well-known text-retrieval techniques, such as tf-idf weighting and the vector-space representation, to visual words (VWs).
The following is the retrieval algorithm used in this paper:
The main differences from text retrieval are feature extraction, visual word construction, and spatial verification. I will therefore focus on these parts and go through the others quickly.
●Feature extraction (first two steps in pre-processing)
This paper uses two kinds of feature detectors and combines their output.
The first detector is Shape Adapted (SA), which tends to be centered on corner-like features.
The other is Maximally Stable (MS), which tends to find blobs of high contrast with respect to their surroundings.
After the two detectors find the interesting regions, each region is described with a SIFT descriptor (a detailed introduction can be found here).
To obtain more stable features, regions are tracked across frames, and any region that does not survive for more than three frames is discarded.
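The stability filter can be sketched as follows. This is only an illustration: the `(track_id, frame_index)` observation format and the function name `filter_stable_regions` are assumptions made for the example, not something taken from the paper.

```python
def filter_stable_regions(observations, min_frames=3):
    """Keep observations whose track survives for more than `min_frames` frames.

    observations: list of (track_id, frame_index) pairs (hypothetical format).
    """
    # Collect the distinct frames in which each tracked region appears.
    frames_per_track = {}
    for track_id, frame_index in observations:
        frames_per_track.setdefault(track_id, set()).add(frame_index)

    # A track is stable only if it is seen in MORE than `min_frames` frames.
    stable = {t for t, frames in frames_per_track.items() if len(frames) > min_frames}
    return [(t, f) for t, f in observations if t in stable]


observations = [("a", 0), ("a", 1), ("a", 2), ("a", 3),
                ("b", 0), ("b", 1), ("b", 2)]
kept = filter_stable_regions(observations)
```

Here track `a` is seen in four frames and is kept, while track `b` is seen in only three and is dropped.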
●Visual word construction
After extracting features, they run k-means clustering on the descriptors and use the resulting centroids as "visual words". In this paper, they use 6,000 clusters for Shape Adapted regions and 10,000 clusters for Maximally Stable regions.
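A minimal sketch of this step, assuming plain Lloyd-style k-means with Euclidean distance on toy 2-D points (real SIFT descriptors are 128-dimensional, and the distance measure and initialization used in the paper may differ). The helper names `kmeans` and `nearest_centroid` are made up for the illustration.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_centroid(point, centroids):
    """Index of the closest centroid, i.e. the visual word id of `point`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_centroid(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


# Toy descriptors forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids = kmeans(points, k=2)
words = [nearest_centroid(p, centroids) for p in points]
```

Once the centroids are fixed, every new descriptor is quantized to the id of its nearest centroid, which is what lets a frame be treated as a bag of words.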
Each frame is then represented as a vector of visual word frequencies, as in a traditional text-retrieval system.
Each element in the vector is weighted by tf-idf. The similarity is defined by cosine similarity.
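The weighting and ranking can be sketched as below, assuming the standard tf-idf form: (count of the word in the frame / number of words in the frame) × log(number of frames / number of frames containing the word). The function names and the toy frames are illustrative only.

```python
import math
from collections import Counter


def tfidf_vectors(frames, vocab_size):
    """frames: list of lists of visual word ids, one list per frame."""
    n_frames = len(frames)
    # Document frequency: in how many frames does each word occur?
    doc_freq = Counter()
    for words in frames:
        doc_freq.update(set(words))

    vectors = []
    for words in frames:
        vec = [0.0] * vocab_size
        for word, count in Counter(words).items():
            tf = count / len(words)
            idf = math.log(n_frames / doc_freq[word])
            vec[word] = tf * idf
        vectors.append(vec)
    return vectors


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


frames = [[0, 0, 1], [0, 1, 1], [2, 3, 3]]
vectors = tfidf_vectors(frames, vocab_size=4)
sim_close = cosine_similarity(vectors[0], vectors[1])  # frames sharing words
sim_far = cosine_similarity(vectors[0], vectors[2])    # frames sharing nothing
```

Frames that share visual words get a similarity close to 1, and frames with no word in common get 0.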
They also use a stop list to remove some visual words, and found it really useful, as shown in the figure below.
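A stop list for visual words could be built as in the sketch below. The rule used here (drop the most frequent fraction of the vocabulary) and the `top_fraction` parameter are assumptions for illustration; the summary above does not state which words are stopped.

```python
from collections import Counter


def build_stop_list(frames, top_fraction=0.05):
    """Return the set of the most frequent visual words (assumed rule)."""
    counts = Counter(word for words in frames for word in words)
    ranked = [word for word, _ in counts.most_common()]
    n_stop = int(len(ranked) * top_fraction)
    return set(ranked[:n_stop])


def remove_stopped(frames, stop_list):
    """Drop stopped words from every frame before building the vectors."""
    return [[w for w in words if w not in stop_list] for words in frames]


frames = [[0, 0, 0, 1], [0, 2, 3], [0, 4]]
stop_list = build_stop_list(frames, top_fraction=0.25)
filtered = remove_stopped(frames, stop_list)
```

In this toy input, word `0` occurs in every frame, so it says little about any particular frame and is the one that gets stopped.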
One thing worth mentioning is that images carry geometric information and documents do not.
So the authors additionally perform spatial verification on the matched pairs.
For every matched pair, the 15 nearest spatial neighbors of the matched region are found in both the query frame and the target frame. If none of these 30 neighbors (15 in the query, 15 in the target) belongs to another matched pair, the pair is rejected.
The following illustration shows the idea (it searches only the 5 nearest neighbors).
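The check described above can be sketched as follows. This is one interpretation under stated assumptions: a match is kept when at least one other match connects its query-side neighborhood to its target-side neighborhood. The data layout (lists of `(x, y)` positions and `(query_index, target_index)` pairs) and the function names are hypothetical.

```python
def k_nearest(index, positions, k):
    """Indices of the k regions spatially closest to positions[index]."""
    px, py = positions[index]
    others = [i for i in range(len(positions)) if i != index]
    others.sort(key=lambda i: (positions[i][0] - px) ** 2
                              + (positions[i][1] - py) ** 2)
    return set(others[:k])


def spatially_verify(matches, query_pos, target_pos, k=15):
    """Keep matches supported by at least one neighboring match."""
    kept = []
    for q, t in matches:
        q_near = k_nearest(q, query_pos, k)
        t_near = k_nearest(t, target_pos, k)
        # Count other matches that fall inside both neighborhoods.
        votes = sum(1 for q2, t2 in matches
                    if (q2, t2) != (q, t) and q2 in q_near and t2 in t_near)
        if votes > 0:
            kept.append((q, t))
    return kept


# Toy example with k=1: regions 0 and 1 support each other, while the
# match (2, 2) has only an unmatched region (index 3) as its neighbor.
query_pos = [(0, 0), (1, 0), (50, 50), (51, 50)]
target_pos = [(10, 10), (11, 10), (90, 0), (91, 0)]
matches = [(0, 0), (1, 1), (2, 2)]
verified = spatially_verify(matches, query_pos, target_pos, k=1)
```

The number of supporting votes could also be used to re-rank the retrieved frames rather than only to reject matches.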
Converting a new problem into an old one is a clever way to solve it, because well-developed methods for the old problem already exist. This paper is a good example of that approach. After the conversion, we still need to check whether the two problems differ in any property. In this paper, images carry geometric information that documents lack, so the authors add spatial verification at the end.