Categorize, cluster, and classify: A 3-c strategy for scientific discovery in the medical informatics platform of the human brain project

Tal Galili, Alexis Mitelpunkt*, Netta Shachar, Mira Marcus-Kalish, Yoav Benjamini

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


One of the goals of the European Flagship Human Brain Project is to create a platform that will enable scientists to search for new biologically and clinically meaningful discoveries by making use of a large database of neurological data collected from many hospitals. While the patients whose data will be available have been diagnosed, there is widespread concern that their diagnoses, which rely on current medical classification, may be too broad and ambiguous and may thus hide important scientific information.

We therefore offer a search strategy that combines supervised and unsupervised learning in three steps: Categorization, Clustering and Classification. This 3-C strategy runs as follows. Using external medical knowledge, we categorize the available set of features into three types: the patients' assigned disease diagnosis, clinical measurements, and potential biological markers, where the latter may include genomic and brain imaging information. To reduce the number of clinical measurements, a supervised learning algorithm (Random Forest) is applied and only the best-predicting features are kept. We then use unsupervised learning to create new clinical manifestation classes by clustering the selected clinical measurements. Profiles of these clinical manifestation classes are described visually using profile plots and analytically using decision trees, in order to facilitate their clinical interpretation. Finally, we classify the new clinical manifestation classes by relying on the potential biological markers. Our strategy strives to connect potential biomarkers with classes of clinical and functional manifestation, both expressed by meaningful features. We demonstrate this strategy using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort.
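The steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn and synthetic data rather than ADNI, the function name `three_c_sketch` and its parameters are hypothetical, and the abstract names neither the clustering algorithm nor the final classifier, so k-means and a second Random Forest are stand-ins chosen only for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier


def three_c_sketch(diagnosis, clinical, biomarkers,
                   n_keep=3, n_clusters=3, seed=0):
    """Hypothetical sketch of the 3-C steps on pre-categorized feature blocks."""
    # Categorization is assumed to be done by the caller using medical
    # knowledge: diagnosis labels, clinical measurements, potential biomarkers.

    # Feature reduction: rank clinical measurements by how well they predict
    # the assigned diagnosis (Random Forest) and keep the n_keep best.
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(clinical, diagnosis)
    keep = np.argsort(forest.feature_importances_)[::-1][:n_keep]

    # Clustering: build new clinical manifestation classes from the selected
    # measurements. K-means is an assumption made for this sketch.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    classes = kmeans.fit_predict(clinical[:, keep])

    # Analytic profile: a shallow decision tree that describes the clusters
    # in terms of the selected clinical measurements.
    profile_tree = DecisionTreeClassifier(max_depth=3, random_state=seed)
    profile_tree.fit(clinical[:, keep], classes)

    # Classification: predict the new classes from the potential biomarkers.
    # The choice of classifier here is also an assumption.
    marker_model = RandomForestClassifier(n_estimators=200, random_state=seed)
    marker_model.fit(biomarkers, classes)
    return keep, classes, profile_tree, marker_model


# Synthetic stand-in data: 150 patients, 8 clinical measurements (the first
# three carry diagnosis signal), 5 potential biomarkers.
rng = np.random.default_rng(0)
n_patients = 150
diagnosis = rng.integers(0, 3, size=n_patients)
clinical = rng.normal(size=(n_patients, 8))
clinical[:, :3] += 3.0 * diagnosis[:, None]
biomarkers = rng.normal(size=(n_patients, 5))
biomarkers[:, 0] += 2.0 * diagnosis

keep, classes, profile_tree, marker_model = three_c_sketch(
    diagnosis, clinical, biomarkers
)
```

In a real analysis the biomarker classifier would be evaluated on held-out patients rather than on the data used to fit it, and the number of clusters would be chosen from the data instead of being fixed in advance.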

Original language: English
Title of host publication: Discovery Science - 17th International Conference, DS 2014, Proceedings
Editors: Sašo Džeroski, Panče Panov, Dragi Kocev, Ljupčo Todorovski
Publisher: Springer Verlag
Number of pages: 14
ISBN (Electronic): 9783319118116
State: Published - 2014
Event: 17th International Conference on Discovery Science, DS 2014 - Bled, Slovenia
Duration: 8 Oct 2014 to 10 Oct 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 17th International Conference on Discovery Science, DS 2014


  • Bioinformatics
  • Categorization
  • Classification
  • Clustering
  • Disease profiling
  • Medical informatics

