Multisite Connectome-based Disease Classification

Purpose: Making use of Multisite Data-sharing Initiatives

There is great interest in identifying neuroimaging biomarkers of psychiatric disorders. Such discoveries will not only deepen our knowledge of the network architecture of the human brain, but also offer the potential for an objective, machine-based diagnostic system to enter the clinical realm. To this end, multiple data-sharing initiatives have been launched in the neuroimaging field, including ADHD-200, the Alzheimer's Disease Neuroimaging Initiative (ADNI), the Autism Brain Imaging Data Exchange (ABIDE), and the Enhanced NKI-Rockland dataset. These community-wide collaborative efforts offer unique potential, as they foster reproducible research and allow us to examine the association between diseases and biomarkers with unprecedented sample sizes.

Multisite data heterogeneity and multitask learning

However, multisite data present new challenges, as the data aggregation process introduces several sources of systematic confounds, such as variability in scanner quality, image acquisition protocol, subject demographics, and other experimental conditions. To make effective use of multisite data, it is imperative to train the classifiers in a way that accounts for these site-specific heterogeneities. To this end, we propose a classification framework that adopts a multitask learning approach, in which each site is treated as a separate task. In brief, the idea behind multitask learning is to train multiple tasks jointly in order to improve classification performance, under the assumption that the tasks are related to each other in some sense.

Our method adopts a penalization scheme that accounts for the following two-way structure that exists in a multisite connectomic dataset: (1) the 6-D spatial structure in the functional connectomes that arises from pairs of points in 3-D brain space, and (2) the inter-site structure captured via the multitask L1/L2-penalty, which allows consistent model interpretation to be made by selecting the same set of informative features across sites.
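To make the 6-D structure in (1) concrete, the following is a minimal sketch, not the implementation used in our papers. It assumes each connectome feature is an edge between two nodes with known 3-D coordinates, and forms a 6-D coordinate for that edge by concatenating the coordinates of its two endpoints. The function name `edge_coordinates_6d` and the toy node positions are hypothetical and chosen only for illustration.

```python
import numpy as np


def edge_coordinates_6d(node_xyz):
    """Return one 6-D coordinate per edge (i, j) with i < j.

    node_xyz: array of shape (n_nodes, 3) holding 3-D node positions.
    Each edge is described by the 3-D position of node i followed by
    the 3-D position of node j (an illustrative convention).
    """
    node_xyz = np.asarray(node_xyz, dtype=float)
    n_nodes = node_xyz.shape[0]
    # Upper-triangular index pairs enumerate every unordered node pair once.
    i_idx, j_idx = np.triu_indices(n_nodes, k=1)
    return np.hstack([node_xyz[i_idx], node_xyz[j_idx]])


# Toy example: 4 nodes (hypothetical positions) give 4 * 3 / 2 = 6 edges.
node_xyz = np.array([[0, 0, 0], [4, 0, 0], [0, 4, 0], [0, 0, 4]])
coords = edge_coordinates_6d(node_xyz)
print(coords.shape)
```

Under this convention, two edges are neighbors in 6-D space when both of their endpoints are close in 3-D brain space, which is the kind of spatial relationship a structured penalty can exploit.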

The multitask L1/L2 penalty (right) allows us to recover models that provide consistent feature selection across sites, which is critical for interpretation.
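The way an L1/L2 penalty selects the same features across sites can be sketched as follows. This is an illustration under stated assumptions rather than our actual solver: it assumes the weights are arranged as a matrix `W` with one row per feature and one column per site, uses the standard definition of the mixed norm (the sum over features of the L2 norm across sites), and shows the well-known group soft-thresholding operator associated with that norm. The names `W`, `tau`, `l1_l2_norm`, and `group_soft_threshold` are hypothetical.

```python
import numpy as np


def l1_l2_norm(W):
    """Mixed L1/L2 norm: sum over features (rows) of the L2 norm across sites."""
    return float(np.sum(np.linalg.norm(W, axis=1)))


def group_soft_threshold(W, tau):
    """Proximal operator of tau * l1_l2_norm.

    Each row is shrunk toward zero as a group, so a feature is either
    kept for all sites or removed for all sites.
    """
    row_norms = np.linalg.norm(W, axis=1, keepdims=True)
    # Guard against division by zero for rows that are already all zero.
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(row_norms, 1e-12))
    return W * scale


# Toy weights: 3 features (rows) x 2 sites (columns).
W = np.array([[3.0, 4.0],   # strong feature, row norm 5.0
              [0.3, 0.4],   # weak feature, row norm 0.5
              [0.0, 0.0]])  # unused feature
penalty = l1_l2_norm(W)
W_shrunk = group_soft_threshold(W, tau=1.0)
print(penalty)
print(W_shrunk)
```

In this toy example the weak feature is zeroed for both sites at once while the strong feature is only shrunk, which is the row-wise sparsity pattern that makes the selected features consistent across sites.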

Results: improved classification and model interpretability

Experiments on the publicly available multisite ADHD-200 dataset (7 contributing sites) show not only that the proposed multitask approach can improve classification performance, but also that it yields interpretable models with a consistent representation of informative features across sites.

The proposed multitask L1/L2-based approach demonstrates superior classification performance compared with its single-task counterparts.

Weight vectors estimated by the proposed method. Here we focus on connectivity patterns in the intra-frontoparietal (6-6) and intra-default network (7-7) regions, which are frequently reported to exhibit disrupted connectivity patterns in ADHD.

Relevant Publications

T. Watanabe, D. Kessler, C. Scott, C. Sripada, "Multisite Disease Classification with Functional Connectomes via Multitask Structured Sparse SVM," Sparsity Techniques in Medical Imaging, 2014. [pdf]

T. Watanabe, D. Kessler, C. Scott, C. Sripada, "Multitask Structured Sparse Support Vector Machine for Multisite Connectome-based Disease Classification," Technical Report, University of Michigan, 2014. [pdf] [code]