Ioannidis to Lead $2M BIGDATA Research
ECE Assistant Professor Stratis Ioannidis will lead a BIGDATA collaborative research effort for the “Design and Computation of Scalable Graph Distances in Metric Spaces: A Unified Multiscale Interpretable Perspective”. The project is funded by a $2M NSF grant, awarded by the National Science Foundation’s BIGDATA program. The grant includes a $350K donation from Google in the form of Google Cloud credits, giving NEU researchers direct access to Google’s cloud platform. This multi-institutional research effort is in collaboration with Tina Eliassi-Rad, from Northeastern’s College of Computer and Information Sciences, and Jose Bento, from the Computer Science department of Boston College.
Abstract Source: NSF
Representations of real-world phenomena as graphs (a.k.a. networks) are ubiquitous, ranging from social and information networks, to technological, biological, chemical, and brain networks. Many graph mining tasks — including clustering, anomaly detection, nearest neighbor, similarity search, pattern recognition, and transfer learning — require a distance measure between graphs to be computed efficiently. The existing distance measures between graphs leave a lot to be desired. They are overwhelmingly based on heuristics. Many do not scale to graphs with millions of nodes; others do not satisfy the metric properties of non-negativity, positive definiteness, symmetry, and triangle inequality. This project studies a formal mathematical foundation covering a family of graph distances that overcome these limitations, focusing on real-world applications in biology and social network analysis. It also provides a universal methodology for parallelizing the computation of graph distance metrics within this family over massive graphs with millions of nodes, and scaling it over cloud computing resources.
This project studies, designs, and evaluates graph distances that satisfy the following six properties: (1) They are scalable — i.e., they are strictly subquadratic in runtime and achieve a speedup when computed in parallel. (2) They are metrics — i.e., they satisfy non-negativity, positive definiteness, symmetry, and triangle inequality. (3) They are discriminative, as measured by comparisons to the "chemical distance", which finds the optimal mapping between two graphs that minimizes edge discrepancies. (4) They are statistically robust — i.e., they have confidence intervals. (5) They can incorporate auxiliary information available on nodes and links. (6) They are interpretable to subject matter experts. Rather than providing a single metric, this project explores a family of such graph distance metrics. It also provides a universal methodology, using the Alternating Directions Method of Multipliers (ADMM), to parallelizing the computation of graph distance metrics within this family over massive graphs with millions of nodes. The proposed metrics are evaluated over massive real-world graphs using Apache Spark on a cloud computing infrastructure.