[IEEE 2014 IEEE Statistical Signal Processing Workshop (SSP) - Gold Coast, Australia...

A LOG-RATIO PAIR APPROACH TO ENDOSCOPIC IMAGE MATCHING

Rohana Abdul Karim1, Mohd Marzuki Mustafa2, Mohd Asyraf Zulkifley3

Department of Electrical, Electronic and Systems Engineering, Faculty of Engineering and Built Environment,

Universiti Kebangsaan Malaysia, Bangi, 43600, Selangor, Malaysia

[email protected],[email protected],[email protected]

ABSTRACT

In this paper, we propose a novel algorithm for endoscopic image matching. The algorithm consists of two main components: a log-ratio descriptor and a probabilistic matching criterion. The log-ratio descriptor is built from selected pairs of grayscale intensities surrounding the keypoint. The spatial distribution of the pairs approximately follows a normal distribution. A probabilistic t-test is then applied to produce a distinctive feature descriptor, and an acceptable probability is calculated from the t-distribution. Finally, the keypoints are matched by comparing the acceptable probability and the nearest-neighbor location information. Simulation results show that the proposed algorithm achieves more than 90% matching across various types of tissue surface and movement.

Index Terms— log-ratio descriptor, keypoint matching, endoscopic image

1. INTRODUCTION

Minimally invasive surgery (MIS) is an advanced surgical technique that aims to reduce tissue injury, pain and scarring, and to speed up recovery. The surgery is performed through a small incision, through which specialized instruments are fed into the patient's body. The instruments are usually long, thin and slender. The main operating tools for MIS are the endoscope, fiber optics and end effectors. The endoscope has a camera mounted at its tip, which is used to capture the internal organs and tissues as well as their texture. The captured image is known as an endoscopic image.

In recent years, there has been rising interest in autonomous endoscopic image processing, such as image enhancement [1] and classification [2]. The main reason is that better processing leads to an improved input image, especially for MIS, where a small enhancement can help to distinguish the internal organs. Moreover, even with naked-eye observation, the surgeon can benefit greatly from better visualization. The images are displayed on devices such as a television or an LED monitor, and the surgeon only needs to observe the screen to recognize and highlight any sign of disease during the inspection.

In a telementoring system, telepointer technology [3] is used to mark and guide the correct entry points for the incision based on the endoscopic image [4]. However, internal organs and tissue surfaces are non-rigid in nature and move continually and involuntarily. As a result, a landmark that was pointed out previously will not remain at the same location a moment later, which leads to wrong localization. In addition, the entry-point markings can shift due to interference from the surgical equipment. Unfortunately, this noise is difficult to identify and hard to rectify, since the exact location after the movement cannot be identified. The reason is the similar appearance of the internal organs, whose tissue surfaces are generic and poorly textured. Hence, it is difficult to distinguish the landmark features from their local environment, which leads to inaccurate localization.

The aim of our proposed system is to maintain the image registration by matching keypoint features. The goal is to keep tracking the landmark features regardless of the keypoint movement. This paper quantitatively evaluates the effectiveness of the log-ratio descriptor in matching the keypoints of the internal organs. To this end, a log-ratio pair approach is implemented as the descriptor for probabilistic matching of endoscopic images.

2. RELATED WORKS

Feature descriptor matching is one approach to searching for similarity between two or more objects in consecutive frames. Generally, descriptors can be classified into two schemes, based on 1) appearance and 2) geometric image transformation. An appearance-based descriptor leverages the information surrounding a keypoint, such as gradient, intensity, location and colour, to build a unique signature. Nevertheless, keypoint localization from one frame to the next requires a more flexible and distinctive descriptor. Therefore, geometric image transformation methods that are invariant to rotation, scale, angle and orientation are needed to describe the keypoints precisely.

2014 IEEE Workshop on Statistical Signal Processing (SSP)

978-1-4799-4975-5/14/$31.00 ©2014 IEEE 185

The SIFT descriptor, based on a 3-D spatial histogram of the image gradient [5], is categorized under geometric image transformation. This method does not perform well on endoscopic images [6] due to the non-rigid nature of the tissues. Thus, numerous studies have attempted to provide alternative descriptors for endoscopic image matching. To overcome the drawbacks of SIFT, Du et al. [6] introduced zone matching to obtain more matching pairs. Another alternative is FREAK, which can be computed faster for keypoint matching because it is a binary descriptor. Its development was motivated by the human visual system, specifically the retina. FREAK was initially designed for embedded applications, and Nguyen et al. [7] were the first to adapt it to endoscopic image matching, by limiting the number of matched keypoints and altering the weight association between the current frame and the reference frame. In addition, Mountney and Yang proposed a context-specific descriptor [8] designed specifically for endoscopic image matching. They represented the descriptor in the form of a decision tree. Prior to building the feature descriptor, the patch data are trained with a number of tests. Each test compares the intensities and colour values of a pair of locations within the patch to decide whether it is a feature or not. However, [5], [6] and [7] do not account for sudden illumination changes as the tissue moves towards and away from the lighting spot. This motivates us to reduce the effect of unexpected illumination changes.

3. METHODOLOGY

3.1. Point of Interest Detection

The STAR detector is chosen as the base point detector. It is a modified version of Center Surround Extremas for Real-time Feature Detection (CenSurE) [9]. STAR is built around the scale-space concept and is invariant to illumination, scale, rotation, affine and perspective changes. The detector consists of three main steps: convolution, non-maximal suppression and corner detection. STAR uses two square boxes as the convolution kernel to approximate the bilevel Laplacian of Gaussian (LoG). One of the kernels is rotated by 45 degrees and overlaid on the other kernel about the same pivot point. Non-maximal suppression then filters the convolution output. This process identifies extrema, either maxima or minima, in a 3 × 3 × 3 neighborhood. Lastly, the scale-adapted Harris measure is used to detect corners, which strengthens the criteria for a true keypoint. Points that are both corners and extrema are selected as the final keypoints.
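
As an illustration of the non-maximal suppression step, the following Python sketch checks for extrema in a 3 × 3 × 3 neighbourhood. It assumes that the filter responses are already available as a list of equally sized 2-D maps, one per scale. The names `responses`, `is_extremum_3x3x3` and `find_extrema` are hypothetical and are not taken from any STAR implementation, and the use of strict inequality is an assumption of this sketch.

```python
def is_extremum_3x3x3(responses, s, y, x):
    """Return True if responses[s][y][x] is a strict maximum or minimum
    of its 26 neighbours in scale and space.

    `responses` is a hypothetical list of 2-D response maps, one per scale.
    """
    centre = responses[s][y][x]
    neighbours = [
        responses[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    return centre > max(neighbours) or centre < min(neighbours)


def find_extrema(responses):
    """List the (scale, row, col) positions of all interior extrema."""
    n_scales = len(responses)
    height = len(responses[0])
    width = len(responses[0][0])
    return [
        (s, y, x)
        for s in range(1, n_scales - 1)
        for y in range(1, height - 1)
        for x in range(1, width - 1)
        if is_extremum_3x3x3(responses, s, y, x)
    ]
```

Border positions and the first and last scales are skipped because they lack a full neighbourhood. In the detector described above, the surviving points would still have to pass the Harris corner test.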

3.2. Log-Ratio Pair Descriptor

A patch p of 25 × 25 pixels is first set up, with each keypoint as the center point of its patch. The patch is then smoothed with a 3 × 3 Gaussian kernel to reduce noise sensitivity.

Let (Xj, Yj) be the first location and (Xj′, Yj′) the second location of a pair, for a sample size N. All locations must be uniquely selected by the user. The spatial arrangement for selecting (X, Y) is similar to the BRIEF method (II), which follows a Gaussian distribution. The advantage of using a Gaussian distribution is that points near the center are sampled more often than points near the edge. Let N be defined as

$$N := \{\, d \mid d \in \mathbb{N},\ d < 30 \,\} \tag{1}$$

For N less than 30, the log-normal distribution is maintained; otherwise, it follows the normal distribution.
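
The pair selection described above can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation. It assumes that the offsets are drawn from an isotropic Gaussian with a standard deviation of one fifth of the patch size, which is the spread used by the corresponding BRIEF sampling strategy, and that N = 29 pairs are used. The function name `sample_pairs`, the seed and the value of N are hypothetical.

```python
import random

PATCH_SIZE = 25   # patch side length in pixels, as stated in the text
N_PAIRS = 29      # hypothetical sample size satisfying N < 30


def sample_pairs(n_pairs=N_PAIRS, patch_size=PATCH_SIZE, seed=0):
    """Draw unique pairs of pixel offsets relative to the patch centre."""
    rng = random.Random(seed)
    sigma = patch_size / 5.0   # assumption: BRIEF-style Gaussian spread
    half = patch_size // 2

    def draw():
        # Redraw until the rounded offset falls inside the patch.
        while True:
            value = int(round(rng.gauss(0.0, sigma)))
            if -half <= value <= half:
                return value

    pairs = set()
    while len(pairs) < n_pairs:
        first = (draw(), draw())
        second = (draw(), draw())
        if first != second:            # a pair must compare two different pixels
            pairs.add((first, second))
    return sorted(pairs)               # fixed order, reused for every keypoint
```

The same list of pairs has to be reused for every keypoint and every frame, otherwise descriptors from different frames would not be comparable.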

Feature vectors Desc are calculated by using the following:

$$\mathrm{Desc}(i)_k = \ln_{(1 \ldots N)} \frac{I(X_j, Y_j)}{I(X_{j'}, Y_{j'})} \tag{2}$$

$$i_k := \{1, 2, 3, 4, 5, 6, \ldots, +\infty\} \tag{3}$$

where I is the pixel intensity in grayscale space on p, and i is the set of keypoints for the current frame k. In each frame, some pixels will saturate because of the lighting spot and specular reflection, while lower-than-expected intensities can also occur because of sudden illumination changes. Therefore, the intensity value is normalized to a smaller range by using a log transformation. The combination of the log and ratio formats tends to make the quality of the keypoints normally distributed.
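
A minimal Python sketch of the descriptor in equation (2) is given below. It assumes that the patch is a square list of rows of grayscale values and that each pair holds two (x, y) offsets from the patch centre. The small constant `eps`, added to avoid taking the logarithm of zero, is an assumption of this sketch and is not mentioned in the text. The name `log_ratio_descriptor` is hypothetical.

```python
import math


def log_ratio_descriptor(patch, pairs, eps=1.0):
    """Return one log intensity ratio per sampling pair.

    patch: square list of rows of grayscale intensities, keypoint at the centre.
    pairs: list of ((x1, y1), (x2, y2)) offsets relative to the centre.
    eps:   hypothetical offset that keeps the ratio finite for zero pixels.
    """
    half = len(patch) // 2
    descriptor = []
    for (x1, y1), (x2, y2) in pairs:
        first = patch[half + y1][half + x1] + eps
        second = patch[half + y2][half + x2] + eps
        descriptor.append(math.log(first / second))
    return descriptor
```

With `eps=0.0` and non-zero pixels, multiplying every intensity in the patch by the same gain leaves the descriptor unchanged, which illustrates why a ratio is attractive under sudden illumination changes.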

3.3. Feature Matching

Feature matching is essential to relate the keypoints between two frames. The proposed feature matching is divided into three steps. First, keypoints are filtered based on physical proximity, characteristic scale and orientation. The second step is the probabilistic matching module, and the third is the neighborhood consistency test. For the first and third steps, our method is similar to [10], with minor modifications to the orientation calculation and the temporal displacement parameter. The parameters for temporal matching remain the same as in [10]. Table 1 summarizes the modifications to the first and third steps. The Hamming distance is then improved upon by using a new probabilistic matching criterion to derive the distance ratio between the descriptors.

The first step aims to prune any keypoint in the previous frame k − 1 that exceeds the filter threshold T. The output C lists the candidate keypoints that are probable matches for ik, which reduces the computational burden of the second step. Mathematically, it can be expressed as

$$C = i_{k-1} < T \tag{4}$$
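
The pruning in equation (4) could look like the following Python sketch. The text does not give the individual thresholds, so `max_dist`, `max_scale_ratio` and `max_angle` are hypothetical values chosen only for illustration, as are the `Keypoint` container and the function name `candidates`.

```python
import math
from collections import namedtuple

# Hypothetical keypoint container: position, characteristic scale (> 0)
# and orientation in degrees.
Keypoint = namedtuple("Keypoint", "x y scale angle")


def candidates(current, previous, max_dist=15.0, max_scale_ratio=1.5, max_angle=30.0):
    """Return indices of previous-frame keypoints that pass all three filters."""
    kept = []
    for index, kp in enumerate(previous):
        dist = math.hypot(kp.x - current.x, kp.y - current.y)
        ratio = max(kp.scale, current.scale) / min(kp.scale, current.scale)
        dtheta = abs(kp.angle - current.angle) % 360.0
        dtheta = min(dtheta, 360.0 - dtheta)   # wrap the angle difference to [0, 180]
        if dist < max_dist and ratio < max_scale_ratio and dtheta < max_angle:
            kept.append(index)
    return kept
```

Only the keypoints returned here would be passed on to the probabilistic test, which keeps the number of descriptor comparisons small.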

The idea of the second step is to compare the mean of the reference sample Desc(i)k with the mean of the observed sample Desc(C), together with their variation relative to the average. This test identifies whether both samples have a similar descriptor or not. The t-test was chosen as the statistical test for two reasons: 1) the feature descriptor is a sample observed from the true population (the whole patch), hence the mean and standard deviation of the true population are unknown; and 2) the sample data are numeric and continuous. Both reasons fit the statistical test for comparing two samples.

The statistical test is divided into two parts. First, a null hypothesis is constructed, which in our case states that there is no difference in mean between the reference sample and the observed sample. Secondly, the probability of matching is calculated to find whether it falls into the acceptable interval or not. This step requires the pre-calculated log-ratio difference (LRD) for a pair of samples, together with the mean μ and standard deviation s of the LRD. The LRD is the difference between the reference sample and the observed sample:

$$\mathrm{LRD} = \mathrm{Desc}(i)_k - \mathrm{Desc}(C) \tag{5}$$

In this paper, the acceptable interval for the true mean of the reference sample is the 30% confidence interval. A lower confidence level makes the test more selective in choosing the sample data and hence produces a more distinctive feature descriptor. The confidence interval is calculated by using equation (6):

$$L_1, L_2 = Q \pm t \left( \frac{S}{\sqrt{N}} \right) \tag{6}$$

where Q is the mean of the reference sample, t is the t-value for the chosen confidence level and S is the standard deviation of the reference sample.

Next, the probabilities are calculated by using the t-distribution with N − 1 degrees of freedom. Let P1 be the probability that the standard t deviates below t1, and P2 the probability that it deviates above t2. The acceptable probability (AP) for the mean LRD is then computed from P1 and P2, and a match is accepted only if AP is higher than 0.8, which infers that both samples come from a common source.

$$t_1 = \frac{L_1 - \mu}{S_s} \quad \text{and} \quad t_2 = \frac{L_2 - \mu}{S_s} \tag{7}$$

$$\text{where} \quad S_s = \frac{s}{\sqrt{N}} \tag{8}$$

$$AP = 1 - (P_1 + P_2) \tag{9}$$
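
Equations (5) to (9) can be traced with the following Python sketch, which uses only the standard library. It is a reading of the equations as written and not the authors' code. The t-distribution is evaluated by numerical integration (Simpson's rule) and the t-value by bisection, which are choices of this sketch; the helper names `t_pdf`, `t_cdf`, `t_ppf` and `acceptable_probability` are hypothetical. The integration clamps |t| at 200, which is only reasonable for moderately large degrees of freedom such as N − 1 with N close to 30.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df):
    """P(T < x), integrating the density from 0 to |x| with Simpson's rule."""
    a = min(abs(x), 200.0)            # assumption: the tail beyond 200 is negligible
    if a == 0.0:
        return 0.5
    steps = 2 * max(100, math.ceil(a / 0.02))   # even count, step size <= 0.01
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    result = 0.5 + area if x > 0 else 0.5 - area
    return min(1.0, max(0.0, result))


def t_ppf(p, df):
    """Value t with P(T < t) = p, found by bisection on t_cdf."""
    lo, hi = -50.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def acceptable_probability(ref_desc, cand_desc, confidence=0.30):
    """Acceptable probability AP of equation (9) for two descriptors."""
    n = len(ref_desc)
    df = n - 1
    lrd = [r - c for r, c in zip(ref_desc, cand_desc)]      # equation (5)
    q, s_ref = mean(ref_desc), stdev(ref_desc)
    mu, s_lrd = mean(lrd), stdev(lrd)
    t_value = t_ppf(0.5 + confidence / 2, df)                # two-sided t-value
    half_width = t_value * s_ref / math.sqrt(n)              # equation (6)
    l1, l2 = q - half_width, q + half_width
    ss = s_lrd / math.sqrt(n)                                # equation (8)
    if ss == 0.0:
        # Degenerate case, handled by assumption: all differences are identical.
        return 1.0 if l1 <= mu <= l2 else 0.0
    t1, t2 = (l1 - mu) / ss, (l2 - mu) / ss                  # equation (7)
    p1 = t_cdf(t1, df)                                       # P(T < t1)
    p2 = 1.0 - t_cdf(t2, df)                                 # P(T > t2)
    return 1.0 - (p1 + p2)                                   # equation (9)
```

A candidate would then be accepted when `acceptable_probability(ref, cand) > 0.8`, following the threshold stated above. In practice a statistics library would normally replace the hand-written distribution functions.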

However, some keypoints return more than one candidate whose AP exceeds the threshold. In this case, a dual matching method is used, which matches the keypoints with their nearest-neighbour location. This process identifies the best-matched keypoint among the most probable candidates.

The last step is neighbourhood consistency. This step tests the relative spatial movement of the pixels, where the dot product is applied to each matched feature whose temporal displacement is greater than 15 pixels.
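
One possible reading of the neighbourhood consistency step is sketched below in Python. The text does not specify how the dot product is evaluated, so this sketch assumes that the displacement of a match is compared with the mean displacement of its neighbouring matches through a normalized dot product. The threshold `min_cos` and the name `is_consistent` are hypothetical; the 15-pixel limit is the value given in the text.

```python
import math


def is_consistent(disp, neighbour_disps, min_disp=15.0, min_cos=0.5):
    """Check whether a match moves in a direction similar to its neighbours.

    disp:            (dx, dy) displacement of the matched keypoint.
    neighbour_disps: list of (dx, dy) displacements of nearby matches.
    min_cos:         hypothetical lower bound on the normalized dot product.
    """
    length = math.hypot(disp[0], disp[1])
    if length <= min_disp or not neighbour_disps:
        return True                      # small motions skip the direction test
    mean_x = sum(d[0] for d in neighbour_disps) / len(neighbour_disps)
    mean_y = sum(d[1] for d in neighbour_disps) / len(neighbour_disps)
    mean_length = math.hypot(mean_x, mean_y)
    if mean_length == 0.0:
        return False                     # neighbours are static, large motion is suspect
    cosine = (disp[0] * mean_x + disp[1] * mean_y) / (length * mean_length)
    return cosine >= min_cos
```

Matches that fail this check would be discarded as spatially inconsistent with their surroundings.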

Table 1: Differences Between Our Work and [10]

Item                    Original            Modification
Orientation             |θSURF − θSIFT|     |θSIFT − θSIFT|
Temporal displacement   5                   15

Fig. 1: Output samples of the matching process: (a) heartbeat, (b) series zoom, (c) unconscious movement, (d) rotation and (e) translation. Red indicates a keypoint that has a match with a keypoint in the previous frame. Blue indicates a keypoint that is not matched with any keypoint in the previous frame.

4. SIMULATION RESULTS AND DISCUSSION

The proposed algorithm was tested on 5 videos [11] that contain translation, scale, rotation, unconscious movement and heartbeat noise. Each video consists of 301 frames. To evaluate the performance of the algorithm, the percentage of newly found features that are correctly matched to previously detected features is calculated. A correct match TP is defined as a keypoint in the current frame whose descriptor matches the previous keypoint and which is located within a certain range of the last known location. A false positive FP is a keypoint that has been identified as a match but has a temporal displacement greater than the allowed threshold. The assumed five-pixel threshold is derived from the characteristics of the in-vivo surgery videos, which indicate that the spatial position of the tissue changes only slightly while the movement of a feature is similar to that of its neighboring features [10]. Fig. 1 shows some output samples of the matching process.

$$\text{Matched}\,\% = \frac{TP - FP}{\text{Total new keypoints found}} \times 100 \tag{10}$$
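
Equation (10) translates directly into a short Python helper. The function name `matched_percent` is hypothetical, and the counts in the usage note are made-up numbers that only show the arithmetic; they are not results from the paper.

```python
def matched_percent(tp, fp, total_new):
    """Matched percentage of equation (10).

    tp:        number of correct matches.
    fp:        number of false positives.
    total_new: total number of new keypoints found in the frame.
    """
    if total_new <= 0:
        raise ValueError("total_new must be positive")
    return 100.0 * (tp - fp) / total_new
```

For example, with 95 correct matches, 3 false positives and 100 new keypoints, the helper evaluates 100 × (95 − 3) / 100.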

It is apparent from the graph in Fig. 2 that our algorithm performs well, with an average matching percentage of more than 90%. The results indicate that the fusion of the log-ratio descriptor and the probabilistic matching criterion produces an adaptable keypoint matching module even under a variety of tissue movements. In addition, the proposed algorithm is able to discriminate the keypoints even though the internal organs have similar tissue surfaces and poor texture.

On the other hand, the matched percentages are not consistent across the tested videos. The video with translation movement records the lowest performance due to object loss. During the translation movement, the camera view moves up and down and from side to side, which leads to loss of track of the keypoint. Part of the area captured in the current frame is not captured in the previous frame. Therefore, many newly detected features have no counterpart in the previous frame, which reduces the matching performance.

Fig. 2: Average percentage of the new features matched with the previously detected features in various types of tissue movements.

5. CONCLUSION

This study was designed to determine the effect of the log-ratio descriptor on endoscopic image matching. To suit the intended goal, we introduced a probabilistic matching criterion inspired by the t-distribution and nearest-neighbour location. The results show that the proposed algorithm performs well in finding good matches for endoscopic images, with an average of more than 90% matched keypoints. The algorithm can be further improved by enhancing and broadening the matching criteria through association matching. In addition, the effect of the spatial arrangement for selecting the log-ratio pairs can be investigated for better matching performance, as well as the implementation of colour constancy in keypoint matching [12].

6. ACKNOWLEDGMENT

We would like to acknowledge funding from Universiti

Kebangsaan Malaysia (GGPM-2012-062) and Universiti

Malaysia Pahang for the SLAI/KPT scholarship awarded

to the first author.

7. REFERENCES

[1] H Okuhata, H Nakamura, S Hara, H Tsutsui, and T Onoye, “Application of the real-time Retinex image enhancement for endoscopic images,” Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference, vol. 2013, pp. 3407–10, Jan. 2013.

[2] M Hafner, M Liedlgruber, A Uhl, A Vecsei, and F Wrba, “Color treatment in endoscopic image classification using multi-scale local color vector patterns,” Medical image analysis, vol. 16, no. 1, pp. 75–86, Jan. 2012.

[3] Rohana Abdul Karim, Nor Farizan Zakaria, Mohd Asyraf Zulkifley, Mohd Marzuki Mustafa, Ismail Sagap, and Nani Harlina Md Latar, “Telepointer technology in telemedicine : a review,” BioMedical Engineering OnLine, vol. 12, no. 1, pp. 21, 2013.

[4] J V Clarke, A H Deakin, A C Nicol, and F Picard, “Measuring the positional accuracy of computer assisted surgical tracking systems,” Computer Aided Surgery, vol. 15, no. 1-3, pp. 13–18, 2010.

[5] David G. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, Nov. 2004.

[6] Pengfei Du, Ya Zhou, Qiaona Xing, and Xiaoming Hu, “Improved SIFT matching algorithm for 3D reconstruction from endoscopic images,” in Proceedings of the 10th International Conference on Virtual Reality Continuum and Its Applications in Industry - VRCAI ’11, New York, New York, USA, Dec. 2011, p. 561, ACM Press.

[7] Thinh T Nguyen, Hoeryong Jung, and Doo Yong Lee, “Markerless tracking for augmented reality for image-guided Endoscopic Retrograde Cholangiopancreatography,” Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference, vol. 2013, pp. 7364–7, Jan. 2013.

[8] Peter Mountney and Guang-Zhong Yang, “Context specific descriptors for tracking deforming tissue,” Medical image analysis, vol. 16, no. 3, pp. 550–61, Apr. 2012.

[9] Motilal Agrawal, Kurt Konolige, and Morten Rufus Blas, “CenSurE: Center Surround Extremas for Realtime Feature Detection and Matching,” pp. 102–115, 2008.

[10] Michael C. Yip, David G. Lowe, Septimiu E. Salcudean, Robert N. Rohling, and Christopher Y. Nguan, “Tissue Tracking and Registration for Image-Guided Surgery,” IEEE Transactions on Medical Imaging, vol. 31, no. 11, pp. 2169–2182, Nov. 2012.

[11] Hamlyn Centre Laparoscopic / Endoscopic Video Datasets, “http://hamlyn.doc.ic.ac.uk/vision/”.

[12] Mohd Asyraf Zulkifley, Wan Mimi Diyana Wan Zaki, Aini Hussain, and Mohd Marzuki Mustafa, “Enhancement of SURF performance through masked grey world approach,” Journal of Computational Information Systems, vol. 8, no. 9, pp. 3911–3919, 2012.

