Danlin Jia’s PhD Dissertation Defense

Start: 2022-12-08T11:00:00-05:00
End: 2022-12-08T12:00:00-05:00

December 8, 2022 @ 11:00 am - 12:00 pm

“Towards Performance and Cost-efficiency for Data-intensive Applications in Distributed Data Processing Systems”

Abstract:

Data-intensive science (DIS) has experienced a significant boom in the past decade. Emerging data-intensive services and infrastructures contribute to the development of DIS but also raise challenges. An ecosystem designed around performance, scalability, sustainability, and reliability has been constructed to provide high-quality service to DIS applications. The ecosystem consists of services exposed to users for application deployment and, from the system's perspective, infrastructures that support data storage, transfer, and management. DIS applications share typical features, such as memory and I/O intensity. Thus, addressing the bottlenecks triggered by memory-intensive or I/O-intensive workloads in services and infrastructures is essential to improving the performance and cost-efficiency of the whole ecosystem. In this dissertation, we investigate the characteristics of various DIS applications and design new resource allocation and scheduling schemes for the services and infrastructures in the DIS ecosystem.

We first investigate memory optimization in DIS ecosystems. In-memory data analytics frameworks cache critical intermediate data in memory instead of on storage drives. Apache Spark is a commonly adopted in-memory data analytics framework with two memory managers, Static and Unified. The static memory manager lacks flexibility, whereas the unified memory manager puts heavy pressure on the garbage collection of the Java Virtual Machine on which Spark runs. To address these issues, we propose a new learning-based, bidirectional, usage-bounded memory allocation scheme that supports dynamic memory allocation while considering both memory demands and the latency introduced by garbage collection. Distributed data-processing workloads in container-based virtualization benefit from the resource sharing, fast delivery, and excellent portability of containerization, but they also suffer from resource competition and performance interference. This interference degrades performance and significantly increases latency, and the effect worsens under over-provisioning. Motivated by this problem, we design an efficient memory allocation scheme (RITA) for containerized parallel systems to improve data processing latency. RITA monitors applications’ memory usage and cache characteristics and dynamically re-allocates memory resources.

We also propose I/O optimizations for DIS applications and infrastructures. Distributed Deep Learning (DDL) accelerates DNN training by distributing training workloads across multiple computation accelerators, e.g., GPUs. Although a surge of research has been devoted to optimizing DDL training, the impact of data loading on GPU usage and training performance has been relatively under-explored. When multiple DDL applications are deployed, the lack of a practical and efficient technique for data-loader allocation leaves GPUs idle and degrades training throughput. In this dissertation, we thus investigate the impact of data loading on the global training throughput and design a resource allocator that uses the data-loading rate as a knob to reduce GPU idleness. Finally, designs and optimizations for disaggregated storage systems, supported by cutting-edge storage and network techniques, have emerged rapidly. Disaggregated storage systems can scale resources independently and provide high-quality services for hyper-scale architectures. The traditional congestion control mechanism relieves congestion by limiting the data-sending rate of senders. However, such a design sacrifices the storage drive’s performance, because data are generated but stall on storage host nodes when network congestion occurs. To solve this issue, we design a storage-side rate control mechanism that mitigates network congestion without sacrificing I/O performance.

Committee:

Prof. Ningfang Mi (Advisor)

Prof. Xue Lin

Prof. David Kaeli

Details

Date: December 8, 2022
Time: 11:00 am - 12:00 pm
Website: https://northeastern.zoom.us/j/95300695802?pwd=aVp3cWhGa2txbjhmVmtJY1UwV2piUT09

Other

Department: Electrical and Computer Engineering
Topics: MS/PhD Thesis Defense
Audience: Graduate, PhD, Faculty, Staff