Automated Classification of Circulating Tumor Cells and the Impact of Interobserver Variability on Classifier Training and Performance


Journal of Immunology Research, Volume 2015, Article ID 573165, 9 pages

Research Article

Applied Systems Biology, Leibniz Institute for Natural Product Research and Infection Biology–Hans-Knöll-Institute HKI, Beutenbergstraße 11a, 07745 Jena, Germany

Friedrich Schiller University Jena, Fürstengraben 1, 07743 Jena, Germany

Received 27 August 2015; Accepted 15 September 2015

Academic Editor: Francesco Pappalardo

Copyright © 2015 Carl-Magnus Svensson et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Application of personalized medicine requires the integration of different data to determine each patient’s unique clinical constitution. The automated analysis of medical data is a growing field where different machine learning techniques are used to minimize the time-consuming task of manual analysis. The evaluation, and often the training, of automated classifiers requires manually labelled data as ground truth. In many cases such labelling is not perfect, either because the data are ambiguous even for a trained expert or because of mistakes. Here we investigated the interobserver variability of image data comprising fluorescently stained circulating tumor cells and its effect on the performance of two automated classifiers, a random forest and a support vector machine. We found that uncertainty in annotation between observers limited the performance of the automated classifiers, especially when it was included in the test set on which classifier performance was measured. The random forest classifier turned out to be resilient to uncertainty in the training data, while the support vector machine’s performance was highly dependent on the amount of uncertainty in the training data. Finally, we introduced the consensus data set as a possible solution for the evaluation of automated classifiers that minimizes the penalty of interobserver variability.
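The effect described in the abstract can be sketched with synthetic data: flip a fraction of training labels to mimic interobserver disagreement and compare how a random forest and a support vector machine degrade. This is an illustrative simulation only, not the authors’ CTC image pipeline; the data generator, noise levels, and model parameters are all assumptions.

```python
# Illustrative sketch (assumed setup, not the authors' pipeline):
# inject label noise into the training set and compare classifier robustness.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for noise in (0.0, 0.1, 0.3):
    # Flip a fraction of the training labels to mimic annotation uncertainty;
    # the clean test labels play the role of a consensus ground truth.
    y_noisy = y_tr.copy()
    flip = rng.random(len(y_noisy)) < noise
    y_noisy[flip] = 1 - y_noisy[flip]

    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_noisy)
    svm = SVC(kernel="rbf").fit(X_tr, y_noisy)
    print(f"noise={noise:.1f}  RF acc={rf.score(X_te, y_te):.3f}  "
          f"SVM acc={svm.score(X_te, y_te):.3f}")
```

In runs of this kind the random forest’s test accuracy typically drops more slowly with increasing label noise than the SVM’s, mirroring the resilience the study reports, though the exact numbers depend entirely on the assumed data and parameters.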

Author: Carl-Magnus Svensson, Ron Hübler, and Marc Thilo Figge


