2014 IEEE Workshop on Statistical Signal Processing (SSP), Gold Coast, Australia, 29 June to 2 July 2014
A LOG-RATIO PAIR APPROACH TO ENDOSCOPIC IMAGE MATCHING
Rohana Abdul Karim1, Mohd Marzuki Mustafa2, Mohd Asyraf Zulkifley3
Department of Electrical, Electronic and Systems Engineering, Faculty of Engineering and Built Environment,
Universiti Kebangsaan Malaysia, Bangi, 43600, Selangor, Malaysia
ABSTRACT
In this paper, we propose a novel algorithm for endoscopic image
matching. The algorithm consists of two main components: a log-ratio
descriptor and a probabilistic matching criterion. The log-ratio
descriptor is built from selected pairs of grayscale intensities that
surround the keypoint. The spatial distribution of the pairs is
approximately normal. A probabilistic t-test is then applied to produce
a distinctive feature descriptor. An acceptable probability is
calculated from the t-distribution. Finally, the keypoints are matched
by comparing the acceptable probability and the nearest-neighbor
location information. Simulation results show that the proposed
algorithm achieves more than 90% matching across various types of
tissue surface and movement.
Index Terms: log ratio descriptor, matching keypoint, endoscopic image
1. INTRODUCTION
Minimally invasive surgery (MIS) is an advanced surgical technique that
aims to reduce tissue injury, lessen pain, leave fewer scars and speed
up recovery. The surgery is performed through a small incision, through
which specialized instruments are fed into the patient's body. The
instruments are usually long, thin and slender. The main operating
tools for MIS are the endoscope, fiber optics and end effectors. The
endoscope has a camera mounted on its tip, which is used to capture the
internal organs and tissues as well as their texture. The captured
image is known as an endoscopic image.
In recent years, there has been rising interest in autonomous
endoscopic image processing, such as image enhancement [1] and
classification [2]. The main reason is that better processing leads to
an improved input image, especially for MIS, where even a small
enhancement can help to distinguish the internal organs. Moreover, even
with naked-eye observation, the surgeon benefits greatly from better
visualization. The images are displayed on devices such as a television
or LED monitor, and the surgeon only needs to observe the screen to
recognize and highlight any sign of disease during the inspection.
In a telementoring system, telepointer technology [3] is used to mark
and guide the right entry points for the incision based on the
endoscopic image [4]. However, internal organs and tissue surfaces are
non-rigid and move continually and involuntarily. As a result, a
landmark that was pointed to previously will not remain at the same
location a moment later, which leads to wrong localization. Besides,
the entry-point markings can also shift due to interference from the
surgical equipment. Unfortunately, this noise is difficult to identify
and hard to rectify since the exact location after the movement cannot
be identified. The reason is the similar appearance of the internal
organs, whose tissue surfaces are generic and poorly textured. Hence,
it is difficult to distinguish the landmark features from their local
environment, which leads to inaccurate localization.
The aim of our proposed system is to maintain the image registration by
matching keypoint features. The goal is to keep tracking the landmark
features regardless of keypoint movement. This paper quantitatively
evaluates the effectiveness of the log-ratio descriptor in matching the
keypoints of the internal organs. To this end, a log-ratio pair
approach is implemented as the descriptor for probabilistic matching of
endoscopic images.
2. RELATED WORKS
Feature descriptor matching is one approach to searching for similarity
between two or more objects in consecutive frames. Generally,
descriptors can be classified into two schemes, based on 1) appearance
and 2) geometric image transformation. An appearance-based descriptor
leverages the information surrounding a keypoint, such as gradient,
intensity, location and colour, to build a unique signature.
Nevertheless, keypoint localization from one frame to the next requires
a more flexible and distinctive descriptor. Therefore, geometric image
transformation methods, which are invariant to properties such as
rotation, scale, angle and orientation, are needed to describe the
keypoints precisely.
2014 IEEE Workshop on Statistical Signal Processing (SSP)
978-1-4799-4975-5/14/$31.00 ©2014 IEEE 185
The SIFT descriptor, based on a 3-D spatial histogram of the image
gradient [5], is categorized under geometric image transformation. This
method does not perform well on endoscopic images [6] due to the
non-rigid nature of the tissues. Thus, numerous studies have attempted
to provide alternative descriptors for endoscopic image matching. To
overcome the drawback of SIFT, Du et al. [6] introduced zone matching
to obtain more matching pairs. Another alternative is FREAK, which can
be computed faster for keypoint matching because it is a binary
descriptor. Its development was motivated by the human visual system,
specifically the retina. FREAK was initially designed for embedded
applications; Nguyen et al. [7] were the first to adapt it for
endoscopic image matching by limiting the number of matched keypoints
and altering the weight association between the current frame and the
reference frame. Besides, Mountney and Yang proposed a context-specific
descriptor [8], which is designed specifically for endoscopic image
matching. They represented the descriptor in the form of a decision
tree. Prior to building the feature descriptor, the patch data are
trained with a number of tests. Each test compares the intensities and
colour values for a pair of locations within the patch to decide
whether it is a feature or not. However, [5] [6] [7] do not account for
the sudden illumination changes that occur as the tissue moves towards
and away from the lighting spot. This motivates us to reduce the effect
of unexpected illumination changes.
3. METHODOLOGY
3.1. Point of Interest Detection
The STAR detector is chosen as the base point detector. It is a
modified version of Center Surround Extremas for Realtime Feature
Detection (CenSurE) [9]. STAR is built around the scale-space concept,
which is invariant to illumination, scale, rotation, affine and
perspective changes. The STAR detector consists of three main steps:
convolution computation, non-maximal suppression and corner detection.
It uses two square boxes as the convolution kernel to approximate the
bilevel Laplacian of Gaussian (LoG). One of the kernels is rotated by
45 degrees and overlaid on the other kernel with the same pivot point.
Then, non-maximal suppression filters the convolution output. This
process identifies extrema, either maxima or minima, in a 3 x 3 x 3
neighborhood. Lastly, the scale-adapted Harris measure is used to
detect corners, which strengthens the criteria for a true keypoint.
Points that are both corners and extrema are selected as the final
keypoints.
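The non-maximal suppression step can be illustrated with a small sketch. The Python code below is only an illustration and not the CenSurE or STAR implementation: it assumes that the filter responses are stored as a nested list `resp[s][y][x]` (scale, row, column), a layout chosen here for convenience, and it treats a point as an extremum only when it is strictly larger or strictly smaller than all 26 neighbours.

```python
def is_extremum(resp, s, y, x):
    """Return True if resp[s][y][x] is a strict max or min of its 3 x 3 x 3 neighbourhood."""
    centre = resp[s][y][x]
    neighbours = [
        resp[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    return centre > max(neighbours) or centre < min(neighbours)


def find_extrema(resp):
    """List (scale, row, col) of interior points that are 3 x 3 x 3 extrema."""
    found = []
    # Border points are skipped because they lack a full neighbourhood.
    for s in range(1, len(resp) - 1):
        for y in range(1, len(resp[s]) - 1):
            for x in range(1, len(resp[s][y]) - 1):
                if is_extremum(resp, s, y, x):
                    found.append((s, y, x))
    return found
```

In a full detector, the points returned by `find_extrema` would still have to pass the scale-adapted Harris test before being accepted as keypoints.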
3.2. Log-Ratio Pair Descriptor
A patch p of 25 x 25 pixels is first set up, with the keypoint as its
center point. The patch is then smoothed with a 3 x 3 Gaussian kernel
to reduce noise sensitivity.
Let (Xj, Yj) be the first point location and (Xj′, Yj′) the second
point location of a pair, for a sample of size N. All locations must be
uniquely selected by the user. The spatial arrangement for selecting
(X, Y) is similar to the BRIEF method (II), which follows a Gaussian
distribution. The advantage of using a Gaussian distribution is that
points near the center are sampled more often than points near the
edge. Define N as
$$N := \{\, d \mid d \in \mathbb{N},\ d < 30 \,\} \tag{1}$$
For N less than 30, the log-normal distribution is maintained;
otherwise, the samples follow a normal distribution.
Feature vectors Desc are calculated by using the following:
$$\mathrm{Desc}(i)_k = \ln \frac{I(X_j, Y_j)}{I(X_{j'}, Y_{j'})}, \quad j = 1, \ldots, N \tag{2}$$
$$i_k := \{1, 2, 3, 4, 5, 6, \ldots, +\infty\} \tag{3}$$
where I is the pixel intensity in grayscale space on the patch p, and i
is the set of keypoints for the current frame k. In each frame, some
pixels will have saturated values because of the lighting spot and
specular reflection, while lower-than-expected intensities can also
occur because of sudden illumination changes. Therefore, the intensity
values are normalized to a smaller range by using a log transformation.
The combination of the log and ratio formats tends to make the quality
of the keypoints normally distributed.
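The descriptor construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the patch is taken to be a 25 x 25 nested list of grayscale values that has already been smoothed, the standard deviation of the sampling Gaussian is set to one fifth of the patch size as in the BRIEF layout, sampled offsets are clipped to the patch border, and one is added to each intensity to avoid taking the logarithm of zero. The function names, the fixed seed and the offset are all hypothetical choices.

```python
import math
import random

PATCH = 25          # patch side length in pixels (from the text)
HALF = PATCH // 2   # offset of the centre pixel


def sample_pairs(n, seed=0):
    """Draw n unique point pairs around the patch centre from a Gaussian.

    sigma = PATCH / 5 and the clipping to the patch border are
    assumptions of this sketch. The fixed seed makes the same pairs
    reusable for every keypoint.
    """
    rng = random.Random(seed)
    sigma = PATCH / 5.0

    def draw():
        v = int(round(rng.gauss(0.0, sigma)))
        return max(-HALF, min(HALF, v)) + HALF

    pairs = []
    seen = set()
    while len(pairs) < n:
        pair = ((draw(), draw()), (draw(), draw()))
        if pair[0] != pair[1] and pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs


def log_ratio_descriptor(patch, pairs):
    """Equation (2): ln(I(Xj, Yj) / I(Xj', Yj')) for every pair.

    One is added to both intensities so that zero-valued pixels do not
    break the logarithm; this offset is an assumption, not from the paper.
    """
    desc = []
    for (x1, y1), (x2, y2) in pairs:
        desc.append(math.log((patch[y1][x1] + 1.0) / (patch[y2][x2] + 1.0)))
    return desc
```

Calling `sample_pairs(29)` once and reusing the result for every keypoint keeps the descriptors comparable between frames. A perfectly uniform patch yields a descriptor of zeros, since every ratio equals one.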
3.3. Feature Matching
Feature matching is essential to relate the keypoints between two
frames. The proposed feature matching is divided into three steps.
First, keypoints are filtered based on physical proximity,
characteristic scale and orientation properties. The second step is a
probabilistic matching module, and the third is a neighbourhood
consistency test. For the first and third steps, our method is similar
to [10], with small modifications to the orientation calculation and
the temporal displacement parameter. The parameters for temporal
matching remain the same as in [10]. Table 1 summarizes the
modifications for the first and third steps. The Hamming distance is
then improved upon by using a new probabilistic matching criterion to
derive the distance ratio between the descriptors.
The first step aims to prune any keypoint in the previous frame k − 1
that exceeds the filter threshold T. The output C lists the candidate
keypoints that might be the probable match for ik, which reduces the
computational burden of the second step. Mathematically, it can be
expressed as
$$C = i_{k-1} < T \tag{4}$$
The idea of the second step is to compare the mean of the reference
sample Desc(i)k with the mean of the observed sample Desc(C), together
with the relative variation from the average. This test identifies
whether both samples have similar descriptors or not. The t-test was
chosen as the statistical test for two reasons: 1) the feature
descriptor is a sample observed from the true population (the whole
patch); hence the mean and standard deviation of the true population
are unknown. 2) The sample data are numeric and continuous. Both
reasons fit the requirements of a statistical test for comparing two
samples.
The test used to validate the algorithm has two parts. First, a null
hypothesis is constructed; in our case, it states that there is no
difference between the mean of the reference sample and the mean of the
observed sample. Secondly, the probability of matching is calculated to
determine whether it falls into the acceptable interval or not. This
step requires the pre-calculated log-ratio difference (LRD) for a pair
of samples, together with the mean μ and standard deviation s of the
LRD. The LRD is the difference between the reference sample and the
observed sample.
$$\mathrm{LRD} = \mathrm{Desc}(i)_k - \mathrm{Desc}(C) \tag{5}$$
In this paper, the acceptable interval for the true mean of the
reference sample is the 30% confidence interval. A lower confidence
level makes the test more selective in choosing the sample data and
hence produces a more distinctive feature descriptor. The confidence
interval is calculated by using equation 6:
$$L_1, L_2 = Q \pm t \left( \frac{S}{\sqrt{N}} \right) \tag{6}$$
where Q is the mean of the reference sample, t is the t-value for the
chosen confidence level and S is the standard deviation of the
reference sample.
Next, the probabilities are calculated by using the t-distribution with
N − 1 degrees of freedom. Let P1 be the probability that a standard t
variate is less than t1 and P2 the probability that it is greater than
t2. The acceptable probability (AP) is then computed for the mean LRD,
and a match is accepted only if AP is higher than 0.8, which suggests
that both samples come from a common source.
$$t_1 = \frac{L_1 - \mu}{S_S} \quad \text{and} \quad t_2 = \frac{L_2 - \mu}{S_S} \tag{7}$$
$$\text{where } S_S = \frac{s}{\sqrt{N}} \tag{8}$$
$$AP = 1 - (P_1 + P_2) \tag{9}$$
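Equations (5) to (9) can be chained into a single routine, sketched below in Python using only the standard library. Several points are assumptions of this sketch and not statements from the paper: the standard deviations use the N − 1 divisor, the t-value for the chosen confidence level is supplied by the caller (for example from a t-table), the cumulative t-distribution is obtained by Simpson integration after the substitution x = √ν tan θ, and a zero standard error is handled as a limiting case. The function names are hypothetical.

```python
import math
import statistics


def t_cdf(t, df, steps=1000):
    """Student t CDF via Simpson's rule after x = sqrt(df) * tan(theta)."""
    upper = math.atan(abs(t) / math.sqrt(df))
    h = upper / steps                        # steps must be even
    total = 1.0 + math.cos(upper) ** (df - 1)
    for k in range(1, steps):
        weight = 4.0 if k % 2 else 2.0
        total += weight * math.cos(k * h) ** (df - 1)
    integral = total * h / 3.0
    const = math.exp(math.lgamma((df + 1) / 2.0)
                     - math.lgamma(df / 2.0)) / math.sqrt(math.pi)
    half = const * integral                  # P(0 < T < |t|)
    return 0.5 + half if t >= 0 else 0.5 - half


def acceptable_probability(ref_desc, cand_desc, t_value):
    """Follow equations (5)-(9) literally and return AP."""
    n = len(ref_desc)
    # Equation (6): confidence limits around the reference mean Q.
    q = statistics.mean(ref_desc)
    half_width = t_value * statistics.stdev(ref_desc) / math.sqrt(n)
    l1, l2 = q - half_width, q + half_width
    # Equation (5): element-wise log-ratio difference.
    lrd = [a - b for a, b in zip(ref_desc, cand_desc)]
    mu = statistics.mean(lrd)
    ss = statistics.stdev(lrd) / math.sqrt(n)    # equation (8)
    if ss == 0.0:
        # Limiting case (assumption): all probability mass sits at mu.
        return 1.0 if l1 <= mu <= l2 else 0.0
    t1 = (l1 - mu) / ss                          # equation (7)
    t2 = (l2 - mu) / ss
    p1 = t_cdf(t1, n - 1)                        # P(T < t1)
    p2 = 1.0 - t_cdf(t2, n - 1)                  # P(T > t2)
    return 1.0 - (p1 + p2)                       # equation (9)
```

A candidate would then be accepted when `acceptable_probability(...)` exceeds 0.8. The value passed as `t_value` must correspond to the chosen confidence level and to N − 1 degrees of freedom.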
However, some keypoints return more than one candidate whose AP exceeds
the threshold. In this case, a dual matching method is used, in which
the keypoints are matched with their nearest-neighbour location. This
process identifies the best-matched keypoint among the most probable
candidates.
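The tie-breaking rule described above can be sketched as follows. The Python function is a hypothetical illustration: it assumes that each candidate is given as an `(x, y, ap)` tuple and that "nearest" means the smallest Euclidean distance to the keypoint location, neither of which is specified in the text.

```python
import math


def pick_best_candidate(keypoint_xy, candidates, ap_threshold=0.8):
    """Among candidates whose AP exceeds the threshold, return the nearest one.

    candidates is a list of (x, y, ap) tuples; this layout is a
    hypothetical choice for the sketch. Returns None if no candidate
    passes the threshold.
    """
    passing = [c for c in candidates if c[2] > ap_threshold]
    if not passing:
        return None
    kx, ky = keypoint_xy
    return min(passing, key=lambda c: math.hypot(c[0] - kx, c[1] - ky))
```

With this rule, a distant candidate with a very high AP loses to a closer candidate that also passes the threshold, which reflects the preference for spatial proximity described in the text.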
The last step is the neighbourhood consistency test. This step tests
the relative spatial movement of the pixels, where the dot product is
applied if the temporal displacement is greater than 15 pixels for each
matched feature.
Table 1: Differences Between Our Work and [10]
Item                    Original            Modification
Orientation             |θSURF − θSIFT|     |θSIFT − θSIFT|
Temporal displacement   5                   15
Fig. 1: Output samples of the matching process for (a) heartbeat,
(b) series zoom, (c) unconscious movement, (d) rotation and
(e) translation. Red indicates a keypoint that has a match with a
keypoint in the previous frame. Blue indicates a keypoint that is not
matched with any keypoint in the previous frame.
4. SIMULATION RESULTS AND DISCUSSION
The proposed algorithm was tested on 5 videos [11] that contain
translation, scale, rotation, unconscious movement and heartbeat noise.
Each video consists of 301 frames. To evaluate the performance of the
algorithm, the percentage of newly found features that are correctly
matched to previously detected features is calculated. A correct match
TP is defined as a keypoint in the current frame that has a matching
descriptor with the previous keypoint and is located within a certain
range of the last known location. A false positive FP, for example, is
a keypoint that has been identified as a match but has a temporal
displacement greater than the allowed threshold. The assumption of a
five-pixel threshold is derived from the characteristics of in-vivo
surgery videos. These characteristics indicate that the spatial
position of the tissue changes only slightly, while the movement of a
feature is similar to that of its neighbouring features [10]. Fig. 1
shows some output samples of the matching process.
$$\text{Matched}\% = \frac{TP - FP}{\text{Total new keypoints found}} \times 100 \tag{10}$$
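As a quick check of equation (10), the score can be computed as in the short Python sketch below. The function name and the counts in the usage note are hypothetical and are not results from the paper.

```python
def matched_percentage(tp, fp, total_new):
    """Equation (10): (TP - FP) / total new keypoints found, times 100."""
    if total_new <= 0:
        raise ValueError("total_new must be positive")
    return 100.0 * (tp - fp) / total_new
```

For instance, 95 correct matches and 2 false positives out of 100 newly found keypoints give `matched_percentage(95, 2, 100)`, which is 93 percent. Note that the false positives are subtracted from the correct matches, so the score penalizes wrong matches rather than merely ignoring them.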
It is apparent from the graph in Fig. 2 that our algorithm performs
well, with an average matching percentage of more than 90%. The results
suggest that the fusion of the log-ratio descriptor and the
probabilistic matching criterion produces an adaptable keypoint
matching module even under a variety of tissue movements. Besides, the
proposed algorithm is able to discriminate the keypoints even though
the internal organs have similar tissue surfaces and poor texture.
On the other hand, the matched percentages are not consistent across
the tested videos. The video with translation movement records the
lowest performance due to object loss. During the translation movement,
the camera view moves up and down and from side to side, which causes
the track of the keypoints to be lost. Part of the area captured in the
current frame is not captured in the previous frame. Therefore, many of
the newly detected features have no counterpart in the previous frame,
which reduces the matching performance.
Fig. 2: Average percentage of the new features matched with the
previously detected features in various types of tissue movements.
5. CONCLUSION
This study was designed to determine the effect of the log-ratio
descriptor on endoscopic image matching. To suit the intended goal, we
introduced a probabilistic matching criterion inspired by the
t-distribution and nearest-neighbour location. The results show that
the proposed algorithm performs well in finding good matches for
endoscopic images, with an average of more than 90% matched keypoints.
The algorithm can be further improved by enhancing and broadening the
matching criteria through association matching. In addition, the effect
of the spatial arrangement used for selecting the log-ratio pairs can
be investigated for better matching performance, as well as the
implementation of colour constancy in keypoint matching [12].
6. ACKNOWLEDGMENT
We would like to acknowledge funding from Universiti
Kebangsaan Malaysia (GGPM-2012-062) and Universiti
Malaysia Pahang for the SLAI/KPT scholarship awarded
to the first author.
7. REFERENCES
[1] H Okuhata, H Nakamura, S Hara, H Tsutsui, and T Onoye, “Application of the real-time Retinex image enhancement for endoscopic images,” Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference, vol. 2013, pp. 3407–10, Jan. 2013.
[2] M Hafner, M Liedlgruber, A Uhl, A Vecsei, and F Wrba, “Color treatment in endoscopic image classification using multi-scale local color vector patterns,” Medical image analysis, vol. 16, no. 1, pp. 75–86, Jan. 2012.
[3] Rohana Abdul Karim, Nor Farizan Zakaria, Mohd Asyraf Zulkifley, Mohd Marzuki Mustafa, Ismail Sagap, and Nani Harlina Md Latar, “Telepointer technology in telemedicine : a review,” BioMedical Engineering OnLine, vol. 12, no. 1, pp. 21, 2013.
[4] J V Clarke, A H Deakin, A C Nicol, and F Picard, “Measuring the positional accuracy of computer assisted surgical tracking systems,” Computer Aided Surgery, vol. 15, no. 1-3, pp. 13–18, 2010.
[5] David G. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, Nov. 2004.
[6] Pengfei Du, Ya Zhou, Qiaona Xing, and Xiaoming Hu, “Improved SIFT matching algorithm for 3D reconstruction from endoscopic images,” in Proceedings of the 10th International Conference on Virtual Reality Continuum and Its Applications in Industry - VRCAI ’11, New York, New York, USA, Dec. 2011, p. 561, ACM Press.
[7] Thinh T Nguyen, Hoeryong Jung, and Doo Yong Lee, “Markerless tracking for augmented reality for image-guided Endoscopic Retrograde Cholangiopancreatography,” Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference, vol. 2013, pp. 7364–7, Jan. 2013.
[8] Peter Mountney and Guang-Zhong Yang, “Context specific descriptors for tracking deforming tissue,” Medical image analysis, vol. 16, no. 3, pp. 550–61, Apr. 2012.
[9] Motilal Agrawal, Kurt Konolige, and Morten Rufus Blas, “CenSurE : Center Surround Extremas for Realtime Feature Detection and Matching,” pp. 102–115, 2008.
[10] Michael C. Yip, David G. Lowe, Septimiu E. Salcudean, Robert N. Rohling, and Christopher Y. Nguan, “Tissue Tracking and Registration for Image-Guided Surgery,” IEEE Transactions on Medical Imaging, vol. 31, no. 11, pp. 2169–2182, Nov. 2012.
[11] Hamlyn Centre Laparoscopic / Endoscopic Video Datasets, “http://hamlyn.doc.ic.ac.uk/vision/”.
[12] Mohd Asyraf Zulkifley, Wan Mimi Diyana Wan Zaki, Aini Hussain, and Mohd Marzuki Mustafa, “Enhancement of surf performance through masked grey world approach,” Journal of Computational Information Systems, vol. 8, no. 9, pp. 3911–3919, 2012.