© 2006 International Society of Nephrology
A novel approach for accurate prediction of spontaneous passage of ureteral stones: Support vector machines
F Dal Moro1,3, A Abate2,3, GRG Lanckriet2,3, G Arandjelovic1, P Gasparella1, P Bassi1, M Mancini1 and F Pagano1
1Department of Urology, University of Padova, Padova, Italy and 2Electrical Engineering and Computer Sciences Department, University of California at Berkeley, Berkeley, California, USA
The objective of this study was to optimally predict the spontaneous passage of ureteral stones in patients with renal colic by applying for the first time support vector machines (SVM), an instance of kernel methods, for classification. After reviewing the results found in the literature, we compared the performances obtained with logistic regression (LR) and accurately trained artificial neural networks (ANN) to those obtained with SVM, that is, the standard SVM, and the linear programming SVM (LP-SVM); the latter techniques show an improved performance. Moreover, we rank the prediction factors according to their importance using Fisher scores and the LP-SVM feature weights. A data set of 1163 patients affected by renal colic has been analyzed and restricted to single out a statistically coherent subset of 402 patients. Nine clinical factors are used as inputs for the classification algorithms, to predict one binary output. The algorithms are cross-validated by training and testing on randomly selected train- and test-set partitions of the data and reporting the average performance on the test sets. The SVM-based approaches obtained a sensitivity of 84.5% and a specificity of 86.9%. The feature ranking based on LP-SVM gives the highest importance to stone size, stone position and symptom duration before check-up. We propose a statistically correct way of employing LR, ANN and SVM for the prediction of spontaneous passage of ureteral stones in patients with renal colic. SVM outperformed ANN, as well as LR. This study will soon be translated into a practical software toolbox for actual clinical usage.
Kidney International (2006) 69, 157–160. doi:10.1038/sj.ki.5000010
F Dal Moro et al.: Accurate prediction of spontaneous passage of ureteral stones
KEYWORDS: urolithiasis; ureteral calculi; support vector machine; artificial intelligence; statistical methods; neural networks
Correspondence: FD Moro, Department of Urology, University of Padova Medical School, Via Giustiniani, 2, Padova I-35128, Italy. E-mail: fabrizio.
3These authors contributed equally to this work.
Received 7 November 2004; revised 16 May 2005; accepted 8 July 2005
Referring to the statistics on the incidence of kidney stone disease in industrialized countries, we understand how important it is to correctly analyze this pathology in order to predict accurately which patients need what sort of intervention. Everybody agrees that considering the stone size is the most important factor for predicting the spontaneous passage of calculi.1,2 However, this does not seem discriminative enough when calculi are of mid-size dimensions. At this stage, the urologist needs more information in order to take a valid clinical decision, but there is no demonstrated result as to which factor should be considered first and what are the actual interactions between all the factors.3
In literature, the statistical methodologies employed have been the multivariate logistic regression (LR)4 and the artificial neural network (ANN).5 In this work, we propose to use the recently developed support vector machines (SVM),6–8 an instance of kernel methods, for classification, as well as linear programming SVM (LP-SVM);9 these are in general believed to outperform the ANN.8,10
The paper will unfold as follows: along with a critical analysis of the results presented in medical literature, with a special focus on the ANN, we describe how the statistical tests are performed. Critical results follow, and a discussion on their significance, both technically and clinically, is developed. Conclusions mark the state of the art of our work, and define some future directions of our research.
Figure 1 plots the achievable true positive (TP) rate (i.e., sensitivity) versus true negative rate (TN) (i.e., specificity) for the different learning algorithms. Each of the four plots corresponds to a different learning algorithm. Each dot within a plot corresponds to the average test-set performance obtained for a certain setting of the algorithm’s ‘hyper-parameters’, that is, parameters that are a priori chosen and are endogenous to the actual training procedure. The choice of SVM and LP-SVM reflects the relative importance the training algorithm should give to false positives versus false negatives. For the ANN and LR, these parameters are, respectively, related to the actual structure of the network or to more technical training issues (weights and thresholds, for
The best results in prediction accuracy were singled out, picking up a point at the upper-right-most part of each of the four plots (see arrows); using the old method of multivariate LR, the outcome showed 90.3% sensitivity and 69.7% specificity (Figure 1a). The ANN matched this performance with 94.9% sensitivity and 62.9% specificity (Figure 1b). When using an SVM, 84.5% sensitivity and 86.9% specificity could be obtained (Figure 1d). LP-SVM presented results that were on the upper rim of the SVM performance (Figure 1c). Again, it was possible to associate to each and every point of this plot a single combination of all the hyper-parameters of
With respect to our second objective, ranking the input factors, Table 1 shows the ranking obtained using Fisher scores and LP-SVM weights, respectively. As both ranking approaches are essentially different, we should not necessarily expect the rankings to be similar. However, when inspecting the results, we saw a rather high overlap within the top five values of both rankings (the factor identified as most significant being the same and three factors from the top five overlapping in both results). This certainly advocates the robustness and significance of the obtained outcomes. Moreover, these rankings were validated by simulations using only the more relevant inputs. More precisely, we set up and ran the training/testing procedure first on the most prominent input, then on the two most influential inputs and finally on the first five in the ranking. For both rankings, similar results were obtained. Using stone size only led to acceptable results. Using more inputs increased the performance, whereas using just the five most important inputs was qualitatively equivalent to the results obtained using all inputs. Therefore, we concluded that the remaining four clinical factors introduce spurious information and, in this specific setting, can be regarded as redundant.
Let us first list and highlight the main pitfalls of the results presented in literature, which have mostly been obtained with the aid of ANN.11 First of all, the used data sets are often of relatively low cardinality, a condition that is more likely to provide poor results or unstable prediction algorithms.12,13
Second, the ANN results in literature are based on using only one hold-out test set and hence the reported performance depends heavily on the particular test set that is used. Therefore, training and testing should be performed more than once and the test-set performances averaged out, to reduce the variance of the performance estimate. Whereas most literature ignores this fact, we applied cross-validation and averaged the performance over 30 randomly chosen test sets, as mentioned before. Therefore, we performed statistically more accurate tests for all our learning algorithms,
Third, we strongly question the ANN results concerning the input rankings: it is known that networks with a structure that is more complex than that of a perceptron (i.e., with one or more hidden layers), offer no clear connection between their weights and the relative relevance of their inputs.14,5 Also, it is wrong to look at the absolute values of the weights of even a perceptron when the inputs are not normalized.12,15 We resolve the pitfall of ANN not allowing the determination of the relative importance of the clinical factors by using Fisher scores and the LP-SVM approach.
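As a rough illustration of per-factor ranking by Fisher score, the sketch below follows the paper's verbal description (difference in class means, corrected by the within-class variances). The exact formula is an assumption: a commonly used form, the squared difference of class means divided by the sum of the two within-class variances, is taken here. The function names `fisher_scores` and `rank_factors` are hypothetical, and this is not the authors' MATLAB code.

```python
from statistics import mean, pvariance


def fisher_scores(rows, labels):
    """Return one Fisher score per input column.

    rows   -- list of equal-length lists of numeric factor values
    labels -- list of 0/1 class labels, one per row
    Assumed form: (mean1 - mean0)^2 / (var0 + var1), per column.
    """
    class0 = [r for r, y in zip(rows, labels) if y == 0]
    class1 = [r for r, y in zip(rows, labels) if y == 1]
    scores = []
    for j in range(len(rows[0])):
        col0 = [r[j] for r in class0]
        col1 = [r[j] for r in class1]
        gap = (mean(col1) - mean(col0)) ** 2
        spread = pvariance(col0) + pvariance(col1)
        if spread > 0:
            scores.append(gap / spread)
        elif gap > 0:
            # Zero within-class variance but different means: perfectly separating.
            scores.append(float("inf"))
        else:
            scores.append(0.0)
    return scores


def rank_factors(names, rows, labels):
    """Return (name, score) pairs sorted from most to least discriminative."""
    scores = fisher_scores(rows, labels)
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
```

Because each column is scored on its own, this ranking ignores interactions between factors, which is the gap the LP-SVM weights are meant to cover.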
Table 1 | Classes of importance of the spontaneous stone expulsion factors according to different methods
Figure 1 | Comparison of the average test-set performances for the four learning algorithms run on normalized data. (a–d) The axes represent specificity and sensitivity. Each dot within a plot corresponds to the average test-set performance obtained for a certain setting of algorithm hyper-parameters that are endogenous to the actual training procedure. As stated in the literature, ANN slightly improves the results obtained through LR, while the kernel algorithms outperform the other two methods.
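The averaging behind each dot in Figure 1 (30 random splits, 50 held-out patients each, mean test-set sensitivity and specificity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: label 1 is taken to mean "needs intervention" and label 0 "spontaneous expulsion", `repeated_holdout` and `sensitivity_specificity` are hypothetical names, and `fit_midpoint` is a deliberately trivial one-feature threshold classifier standing in for LR, ANN, SVM or LP-SVM.

```python
import random
from statistics import mean


def sensitivity_specificity(y_true, y_pred):
    """Return (TP rate, TN rate) for 0/1 labels; None where a class is absent."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return (tp / pos if pos else None, tn / neg if neg else None)


def _mean_defined(values):
    """Average the values that are not None (None if there are none)."""
    defined = [v for v in values if v is not None]
    return sum(defined) / len(defined) if defined else None


def repeated_holdout(X, y, fit, n_test=50, n_repeats=30, seed=0):
    """Average test-set sensitivity and specificity over random splits.

    fit(X_train, y_train) must return a function mapping one sample to 0 or 1.
    """
    rng = random.Random(seed)
    indices = list(range(len(X)))
    sens, spec = [], []
    for _ in range(n_repeats):
        rng.shuffle(indices)  # new random partition for every repeat
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_test = [y[i] for i in test_idx]
        y_hat = [predict(X[i]) for i in test_idx]
        se, sp = sensitivity_specificity(y_test, y_hat)
        sens.append(se)
        spec.append(sp)
    return _mean_defined(sens), _mean_defined(spec)


def fit_midpoint(X_train, y_train):
    """Hypothetical stand-in classifier: threshold the first feature
    halfway between the two class means seen in training."""
    mean0 = mean(x[0] for x, t in zip(X_train, y_train) if t == 0)
    mean1 = mean(x[0] for x, t in zip(X_train, y_train) if t == 1)
    threshold = (mean0 + mean1) / 2
    return lambda x: 1 if x[0] > threshold else 0
```

Running `repeated_holdout` once per hyper-parameter setting, with a `fit` function built for that setting, would yield one (sensitivity, specificity) point per setting, which is how a scatter like Figure 1 could be assembled.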
As for the particular strength of the SVM approach, we first pointed out the broad range of performances that could be achieved in the specificity/sensitivity plane (Figure 1), by varying the SVM hyper-parameter settings. This gave rise to a curve that was similar to a receiver operating characteristic curve, although more specialized. A usual receiver operating characteristic curve would be obtained from one set of classifier weights, using the known testing-threshold shift. In this case, each dot corresponds to a different set of classifier weights, obtained from SVM training for a specific hyper-parameter setting. These plots show how flexible the SVM is in terms of specificity/sensitivity trade-off. The ANN and LR offer a lot less flexibility.
In the case of ANN, we varied several training parameters, resulting in only a small variation in TP and TN rates, although enough to still improve on the prediction accuracy
The points referring to the SVM, being widely spread through the TP/TN plot, show how this method can be more
The SVM prediction improves LR and ANN significantly along the specificity axis. This, important from a statistical standpoint, also has a sharp clinical meaning: a wrong prediction in terms of specificity would result in the patient missing an invasive intervention, which would effectively be needed. Thus, it is clear that the best prediction of spontaneous stone passage will be one that combines an outstanding sensitivity with a remarkable specificity. The SVM approach offers a great variety of predictive sensitivity/specificity combinations, depending on the setting of its hyper-parameters. If we consider a possible optimal operation point (corresponding to a specific hyper-parameter setting), that is, 84.5% sensitivity and 86.9% specificity, the SVM approach shows significantly better results than those
Focusing on the problem of input ranking, we notice how the results obtained with the Fisher scores make sense from a clinical point of view. In earlier work, the ranking, computed with ANN, gave questionable results.12,15 The classification obtained with LP-SVM was similar to the first. We compared the results obtained by those two methods by splitting the spontaneous passage factors in three groups of decreasing importance according to the weights we obtained, so that we could ponder over their clinical value.
Simulations with an increasing number of input features improved until the ‘heaviest’ five inputs were used, the latter leading to results equivalent to those obtained when using all input factors. This means that the last four inputs do not add any further information to the prediction problem and can
The hydration and the medical therapy can increase the rate of spontaneous stone passage, but they were not taken into consideration as parameters. That is because it is common practice, when hydration is considered, to advise each patient with renal colic a minimum 2–3 l of water intake a day. Patients who underwent treatment with Ca antagonists, cortisone or alpha-blocker agents (i.e., Tamsulosin), prior to and/or after the colic episode, were already excluded before: in fact, the efficacy of these treatments has been proven by
The fact that the stone size is by and large the most influential factor explains why the LR (linear) results are not too far from those obtained with the (nonlinear) ANN. Nevertheless, the SVM approach is still able to infer deeper relationships between inputs and outputs, resulting in a better performance, and therefore represents the method of
This work proposes the application of the SVM to drastically improve the prediction results for intervention on renal colic obtained in the literature. The new results, which outclass those obtained via LR and the ANN approach, are particularly interesting from a clinical perspective, as they maintain the ANN level of sensitivity (i.e., correctly predicting that no intervention is needed) while improving significantly on the specificity (i.e., correctly predicting the need for an intervention). The authors are willing to translate these algorithms into a software toolbox, which would then help physicians on their fieldwork. This is the first time an instance of kernel methods, that is, the SVM, has been applied with success to such clinical data. Intelligent systems such as this could markedly reduce costs of therapeutical approaches and recoveries for kidney stone disease. Given the outstanding performance of SVMs, their application in other fields of urology, such as the oncological field, is imminent.
We gathered and sorted the information collected from 1163 patients who were treated for an episode of renal colic in the period from January to December 2003 in the Urology Institute of the Hospital of Padova, Italy. A focused selection of the patients was made on the basis of some important criteria. The patients excluded
patients in whom the colic episode was due to renal calculi;
patients in whom the actual show-up or expulsion of the calculi
patients treated with Ca antagonists, cortisone or α-lytics in the 3 months previous and/or after the colic episode;
patients with anatomic malformations of the excretory tract;
transplanted or mono-kidney patients, under more aggressive therapy;
patients with more than one ureteral calculus;
patients in whom the rigorous follow-up at the 3-month check-up from the episode was not possible; and
patients who, after the access to the emergency unit, underwent extracorporeal shock wave lithotripsy (ESWL), endourological or surgical procedures for stone removal.
Out of 1163 patients with pyeloureteral colic, 402 were found valuable for experiment, as summarized in Figure 2.
Furthermore, for the actual statistical tests, we considered diagnostic criteria for the renal colic such as spontaneous expulsion (as reported by the patient), colic treatments together with ESWL,
imaging showing ureteral calculi and clinical findings of the physician during the colic episode. As already mentioned, all the patients who, after the access to the emergency unit, underwent ESWL, endourological or surgical treatment for the stone removal were excluded. The interval between first renal colic and stone passage was 6 months. In total, we considered nine clinically important factors (i.e., ‘inputs’) for each of the 402 patients (i.e., ‘data points’). We selected the factors among those referred to as most influential in medical literature: age, sex, body mass index, fever, previous urological treatments, previous expulsion of stones, duration of the symptoms (in hours), dimension and position of the stone.18 With each patient is also associated a ‘binary output’ value, corresponding to two classes of patients, that is, those ones with actual spontaneous expulsion of the stone (0) and those needing an
Figure 2 | Scheme for the selection of patients.
Experiments were performed using the learning algorithms LR, ANN, SVM and LP-SVM. Performance was evaluated using cross-validation, a well-known statistical methodology: 50 of the 402 data points (i.e., patients) were randomly selected and not used for training. After training with LR, ANN, SVM and LP-SVM on the 352 training data points, the accuracy of the trained classifier was tested on the hold-out test set of the 50 data points, by reporting the percentage of correctly predicted spontaneous expulsions (true negatives) and the percentage of correctly predicted cases needing intervention (true positives). This procedure was repeated 30 times, resulting in 30 different random splits in training and test sets. Finally, the average true positive and true negative rate on the 30 test sets was reported. Also, all simulations were performed both on the original data set that was not normalized, as well as on a data set with covariates normalized to have zero mean and unit variance.
LR and LP-SVM9 are linear classification methods: they work best if both classes of data can be separated reasonably well in a linear way, that is, using a hyper-plane. If this is not the case, a nonlinear separating function is needed. This can be established with ANN or a standard SVM. The latter is based on a methodology known as kernel-based learning,7,8 which allows one to come up with nonlinear versions of many well-known linear statistical algorithms. In the case of SVM, the kernel methodology is used to obtain the nonlinear SVM algorithm, derived from a linear maximal margin classifier. The algorithms were implemented using MATLAB® and commercial optimization software Mosek®.
The second objective, ranking the clinical factors according to their importance, was addressed in two ways. First, by using Fisher scores: these scores are computed as the difference in means of the factor values, computed for each class (i.e., input), corrected by their variance within each class; these scores therefore analyze the importance of every input factor independently. Second, we used the explicit LP-SVM feature weights:8 these weights were obtained from the training algorithm, looking at all factors simultaneously and thus taking the dependence between the different inputs into
Segura JW, Preminger GM, Assimos DG et al. Ureteral stones clinical guidelines panel summary report on the management of ureteral calculi.
Anagnostu T, Tolley D. Management of ureteric stones. Eur Urol 2004; 45: 714–721.
Miller OF, Kane CJ. Time to stone passage for observed ureteral calculi: a guide for patient education. J Urol 1999; 162: 688–690.
Parekattil SJ, White MD, Moran ME, Kogan BA. A computer model to predict the outcome and duration of ureteral or renal calculous passage. J Urol 2004; 171: 1436–1439.
Ramesh AN, Kambhampati C, Monson JR, Drew PJ. Artificial intelligence in medicine. Ann R Coll Surg Engl 2004; 86: 334–338.
Cristianini N, Schoelkopf B. Support vector machines and kernel methods.
Boser BE, Guyon I, Vapnik V. A Training algorithm for optimal margin classifiers. Proc Comput Learn Theory 1992: 144–152, ACM Press.
Cristianini N, Shawe-Taylor J. An Introduction to Support Vector Machines. Cambridge University Press: Cambridge; 2000.
Bradley PS, Mangasarian OL, Street WN. Feature selection via mathematical programming. INFORMS J Comput 1998; 10: 209–217.
Tu JV. Advantages and disadvantages of using artificial neural networks vs logistic regression for predicting medical outcomes. J Clin Epidemiol
Batuello JT, Gamito EJ, Crawford E et al. Artificial neural network model for the assessment of lymph node spread in patients with clinically localized prostate cancer. Urology 2001; 57: 481–485.
Cummings JM, Boullier JA, Izenberg SD et al. Prediction of spontaneous ureteral calculous passage by an artificial neural network. J Urol 2000;
Bagli DJ, Agarwal SK, Venkateswaran S et al. Artificial neural networks in pediatric urology: prediction of sonographic outcome following pyeloplasty. J Urol 1998; 160: 980–983.
Russel S, Norvig P. Artificial Intelligence, A Modern Approach. 2nd edn, Prentice–Hall: Englewood Cliffs, NJ.
Leane MM, Cumming I, Corrigan OI. The use of artificial neural networks for the selection of the most appropriate formulation and processing variables in order to predict the in vitro dissolution of sustained release minitablets. PharmSciTech 2003; 4: E26.
Porpiglia F, Ghignone G, Fiori C et al. Nifedipine versus Tamsulosin for the management of lower ureteral stones. J Urol 2004; 172: 568–571.
Dellabella M, Milanese G, Muzzonigro G. Efficacy of Tamsulosin in the medical management of juxtavesical ureteral stones. J Urol 2003; 170:
Gomha MA, Sheir KZ, Showky S et al. Can we improve the prediction of stone-free status after extracorporeal shock wave lithotripsy for ureteral stones? A neural network or a statistical model? J Urol 2004; 172: