Title: T-thinker: A Task-Based Parallel Computing Model for Compute-Intensive Graph Analytics and Beyond

Abstract: Pioneered by Google’s Pregel, the think-like-a-vertex (TLAV) computing model has dominated the area of parallel and distributed graph processing. However, TLAV models are only scalable for data-intensive iterative graph algorithms such as random walks and graph traversal. Unfortunately, researchers have also been using TLAV models to solve compute-intensive graph problems, leading to performance not much beyond that of a serial algorithm because of the IO bottleneck incurred by unnecessarily materializing a large amount of intermediate data. This talk advocates a new parallel computing model called T-thinker, which adopts the think-like-a-task (TLAT) computing paradigm to divide the workloads of compute-intensive problems into tasks while allowing backtracking search within each task to avoid data materialization as much as possible. We will explain how the T-thinker model can achieve an ideal speedup ratio for many compute-intensive problems such as mining dense subgraphs, frequent subgraph pattern mining, and subgraph matching/enumeration. A number of TLAT-based systems will be covered, including G-thinker, G-thinkerQ, T-FSM, PrefixFPM, G2-AIMD and T-DFS, which tackle compute-intensive graph problems in various settings such as on a shared-memory multi-core machine, on a distributed cluster, and on multiple GPUs. We will also explain how the T-thinker model applies beyond the graph domain to problems such as training big models consisting of many decision trees, and massively parallel spatial data processing.
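
As a rough illustration of the task-based idea described in the abstract, the sketch below counts k-cliques by creating one independent task per vertex and running a backtracking search inside each task, so partial cliques are never stored as intermediate data. This is not the API of T-thinker or of any system named above; the function names (`count_cliques_from`, `count_cliques`), the adjacency-set graph representation, and the use of a thread pool are all assumptions made for the example. Python threads do not give real parallel speedup here; the pool only shows how the workload decomposes into tasks.

```python
from concurrent.futures import ThreadPoolExecutor


def count_cliques_from(adj, root, k):
    """One task: count k-cliques whose smallest vertex ID is `root`.

    The search is depth-first with backtracking, so only the current
    candidate set lives in memory; no partial cliques are materialized.
    """
    def extend(size, candidates):
        if size == k:
            return 1
        total = 0
        for v in candidates:
            # Keep only candidates with a larger ID that are adjacent to v,
            # so every clique is counted exactly once (in increasing ID order).
            narrowed = {u for u in candidates if u > v and u in adj[v]}
            total += extend(size + 1, narrowed)
        return total

    return extend(1, {u for u in adj[root] if u > root})


def count_cliques(adj, k, workers=4):
    """Spawn one task per vertex and sum the per-task counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda r: count_cliques_from(adj, r, k), adj))
```

For example, on a complete graph over four vertices, `count_cliques(adj, 3)` counts the triangles, with each vertex's task exploring only cliques in which that vertex has the smallest ID.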

Biography: Da Yan is an Associate Professor in the Department of Computer Sciences of the Luddy School of Informatics, Computing, and Engineering (SICE) at Indiana University Bloomington. He received his Ph.D. degree in Computer Science from the Hong Kong University of Science and Technology in 2014, and his B.S. degree in Computer Science from Fudan University in Shanghai in 2009. He is a 2023 DOE Early Career Research Program (ECRP) awardee and the sole winner of the Hong Kong 2015 Young Scientist Award in Physical/Mathematical Science. His research interests include parallel and distributed systems for big data analytics, data mining, and machine learning (esp. deep learning). He frequently publishes in top DB and AI conferences such as SIGMOD, VLDB, ICDE, KDD, ICML, ICLR, AAAI, IJCAI, and EMNLP, and in top journals such as ACM TODS, VLDB Journal, IEEE TKDE, IEEE TPDS, and ACM Computing Surveys. He also serves extensively as a reviewer for the major top DB and AI conferences and journals, has co-organized events such as the BIOKDD workshop with SIGKDD, Dagstuhl seminars, and a few top conferences, and has served as a guest editor of journals such as IEEE/ACM TCBB, BMC Bioinformatics, and IEEE CG&A.