Multisite Connectome-based Disease Classification
Purpose: Making use of Multisite Data-sharing Initiatives
There is great interest in identifying neuroimaging biomarkers of psychiatric disorders. Such discoveries would not only deepen our knowledge of the network architecture of the human brain, but also offer the potential for an objective, machine-based diagnostic system to enter the clinical realm. To this end, multiple data-sharing initiatives have been launched in the neuroimaging field, including ADHD-200, the Alzheimer's Disease Neuroimaging Initiative (ADNI), the Autism Brain Imaging Data Exchange (ABIDE), and the Enhanced NKI-Rockland dataset. These community-wide collaborative efforts offer unique potential, as they foster reproducible research and allow us to examine the association between diseases and biomarkers with unprecedented sample sizes.
Multisite data heterogeneity and multitask learning
However, multisite data present new challenges, as the data aggregation process introduces several sources of systematic confounds, such as variability in scanner quality, image acquisition protocol, subject demographics, and other experimental conditions. To make effective use of multisite data, it is imperative to train the classifiers in a way that accounts for these site-specific heterogeneities. To this end, we propose a classification framework that adopts a multitask learning approach. In brief, the idea behind multitask learning is to train multiple related tasks jointly in order to improve classification performance, under the assumption that the tasks share some common structure. Here, each contributing site is treated as a task.
Our method adopts a penalization scheme that accounts for the following two-way structure that exists in a multisite connectomic dataset: (1) the 6-D spatial structure in the functional connectomes that arises from pairs of points in 3-D brain space, and (2) the inter-site structure captured via the multitask L1/L2-penalty, which allows consistent model interpretation to be made by selecting the same set of informative features across sites.
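To make the inter-site penalty concrete, here is a minimal sketch of an L1/L2 mixed norm and its row-wise shrinkage step in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the weights are assumed to be arranged as a features-by-sites table, and the names `l1_l2_norm` and `prox_l1_l2`, the toy values, and the threshold are all hypothetical. The spatial (6-D) part of the penalty and the SVM loss are not shown.

```python
import math


def l1_l2_norm(W):
    """L1/L2 mixed norm: sum over features (rows) of the Euclidean norm
    of that feature's weights across sites (columns)."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)


def prox_l1_l2(W, threshold):
    """Row-wise group soft-thresholding, i.e. the proximal operator of
    threshold * l1_l2_norm. Each row is shrunk toward zero as a unit, so a
    feature is either kept at every site or removed at every site."""
    out = []
    for row in W:
        norm = math.sqrt(sum(w * w for w in row))
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0 else 0.0
        out.append([scale * w for w in row])
    return out


# Toy weights (made-up values): 3 features (rows) x 2 sites (columns).
W = [[3.0, 4.0], [0.3, -0.4], [0.0, 0.0]]
shrunk = prox_l1_l2(W, threshold=1.0)

# Indices of features that remain nonzero at some site after shrinkage.
selected = [j for j, row in enumerate(shrunk) if any(w != 0.0 for w in row)]
```

Because the shrinkage acts on whole rows, the weakly weighted second feature is zeroed at both sites at once while the first is only scaled down, which is the mechanism behind selecting the same set of features across sites.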
Results: improved classification and model interpretability
Experiments on the publicly available multisite ADHD-200 dataset (7 contributing sites) show that the proposed multitask approach not only can improve classification performance, but also yields interpretable models with a consistent representation of informative features across sites.
Relevant Publications
T. Watanabe, D. Kessler, C. Scott, C. Sripada, "Multisite Disease Classification with Functional Connectomes via Multitask Structured Sparse SVM," Sparsity Techniques in Medical Imaging, 2014. [pdf]
T. Watanabe, D. Kessler, C. Scott, C. Sripada, "Multitask Structured Sparse Support Vector Machine for Multisite Connectome-based Disease Classification," Technical Report, University of Michigan, 2014. [pdf] [code]