Pioneering a Data-Centric Approach to Distributed Machine Learning
ECE Associate Professor Stratis Ioannidis and ECE Professor Edmund Yeh, in collaboration with ECE Professor Carlee Joe-Wong from Carnegie Mellon University, have been awarded a $1M grant from the National Science Foundation to pioneer a data-centric approach to distributed machine learning. The project utilizes advances in Named Data Networking (NDN) to enable new types of distributed learning algorithms that intelligently move data and model components through heterogeneous networks of sensors, while optimally harnessing the networks’ diverse computation, energy, and bandwidth resources. The project is expected to improve the performance of machine learning algorithms in a vast number of potential applications, ranging from smart cities to satellite data analysis to augmented reality.
Abstract Source: NSF
Machine learning algorithms have revolutionized many fields by giving them the ability to use historical data for making predictions or detecting patterns that can then be used to automate various tasks and create new applications for users. The data that many of today’s machine learning applications require, however, is often collected by a network of multiple sensors. For example, data from environmental sensors in smart cities can be used to predict air pollution or traffic at different locations in the city. Analyzing this data with machine learning algorithms then requires these devices to cooperate with each other, exchanging data and models. This project designs mechanisms for devices to efficiently cooperate.
Distributing machine learning algorithms is particularly challenging when devices are heterogeneously resource-constrained, e.g., with varying compute, power, or bandwidth limitations, as is often the case in today’s networks. Traditional learning algorithms either bring all data to a single location for analysis, or entirely distribute the learning algorithm to the data sources. A more flexible approach that instead intelligently brings data to the computing components of the learning algorithms, and conversely brings computing to data sources, can better harness these devices’ resources but raises a natural question of how data and model components should be moved through the network. This project develops a data-centric approach to distributed learning that utilizes advances in Named Data Networking (NDN) to simplify the process of exchanging information, enabling new types of distributed learning algorithms.
The outcomes of this project may improve distributed learning in a vast number of potential applications, ranging from smart cities to satellite data analysis to augmented reality. The project also supports ongoing efforts in education and in broadening participation in computing among underrepresented communities. These efforts include (i) development of new course materials that teach students about the challenges of realistic machine learning deployments, (ii) recruitment of high school and undergraduate students to work on suitably scoped projects that will contribute to the research vision, and (iii) presentations and mentoring sessions aimed at increasing the participation of underrepresented minorities in computing.
This project is a collaborative effort between Carnegie Mellon University and Northeastern University. Results, including algorithm implementations, technical reports, and measurement datasets, will be made publicly available on a repository hosted by CMU. These will remain available for at least two years after the conclusion of the project.
This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.