
ECE PhD Dissertation Defense: Maher Kachmar

July 13, 2021 @ 10:00 am - 11:00 am

PhD Dissertation Defense: Active Resource Partitioning and Planning for Storage Systems using Time Series Forecasting and Machine Learning Techniques

Maher Kachmar

Location: Zoom

Abstract: In today’s enterprise storage systems, supported data services such as snapshot delete or drive rebuild can result in tremendous performance overhead if executed inline along with heavy foreground IO, often leading to missed Service Level Objectives (SLOs). Moreover, new classes of data services, such as thin provisioning, instant volume snapshots, and data reduction features, make capacity planning and drive wear-out prediction quite challenging. Having enough free storage pool capacity available ensures that the storage system operates in favorable conditions during heavy foreground IO cycles, enabling it to defer background work to a future idle cycle. Static partitioning of storage system resources such as CPU cores or memory caches may lead to missed data reduction rate (DRR) guarantees. However, typical storage system applications such as Virtual Desktop Infrastructure (VDI) or web services follow a repetitive workload pattern that can be learned and/or forecasted. Learning these workload patterns allows us to address several storage system resource partitioning and planning challenges that cannot be overcome with traditional manual tuning and primitive feedback mechanisms.
First, we propose a priority-based background scheduler that learns this pattern and allows storage systems to maintain peak performance and meet SLOs while supporting a number of data services. When foreground IO demand intensifies, system resources are dedicated to servicing foreground IO requests. Any background processing that can be deferred is recorded to be processed in future idle cycles, as long as our forecaster predicts that the storage pool has remaining capacity. A smart background scheduler can adopt a resource partitioning model that allows both foreground and background IO to execute together, as long as foreground IOs are not impacted, harnessing any free cycles to clear background debt. Using traces from VDI and web services applications, we show how our technique outperforms a static policy that sets fixed limits on the deferred background debt, reducing SLO violations from 54.6% (with a fixed background debt watermark) to only 6.2% with our dynamic smart background scheduler.
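The deferral logic described above can be sketched in a few lines. This is an illustrative toy, not the dissertation's implementation: the IOPS threshold, the 20% capacity headroom, and the forecaster hook are all assumptions made for the example.

```python
from collections import deque

class SmartBackgroundScheduler:
    """Sketch of a priority-based background scheduler: defer background
    work under heavy foreground IO, then drain the recorded debt during
    idle cycles while a forecaster still predicts free pool capacity.
    Thresholds and the forecaster interface are illustrative assumptions."""

    def __init__(self, forecaster, busy_iops_threshold=10_000):
        self.forecaster = forecaster              # returns predicted free capacity in [0, 1]
        self.busy_iops_threshold = busy_iops_threshold
        self.debt = deque()                       # deferred background tasks

    def submit_background(self, task, foreground_iops):
        if foreground_iops >= self.busy_iops_threshold:
            # Foreground is hot: record the work as debt instead of running it.
            self.debt.append(task)
            return "deferred"
        task()
        return "executed"

    def idle_tick(self):
        """Run during an idle cycle: clear debt while capacity headroom remains."""
        cleared = 0
        while self.debt and self.forecaster() > 0.2:   # keep 20% headroom (assumed)
            self.debt.popleft()()
            cleared += 1
        return cleared
```

In a VDI-style trace, snapshot deletes submitted during a morning boot storm would be deferred and then drained by `idle_tick` calls overnight.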
Second, we propose a smart capacity planning and recommendation tool that ensures the right number of drives are available in the storage pool in order to meet both capacity and performance constraints, without over-provisioning storage. Equipped with forecasting models that characterize workload patterns, we can predict future storage pool utilization and drive wear-outs. Similarly, to meet SLOs, the tool recommends expanding pool space in order to defer more background work through larger debt bins. Overall, our capacity planning tool provides a day/hour countdown for the next Data Unavailability/Data Loss (DU/DL) event, accurately predicting DU/DL events to cover a future 12-hour time window.
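The countdown idea can be illustrated with a deliberately naive linear-trend forecast; the dissertation's forecasting models are far richer, so treat this as a stand-in showing only the shape of the computation. The function name and the hourly-sample convention are assumptions for the example.

```python
def hours_until_full(utilization_history, capacity=1.0):
    """Naive linear extrapolation of hourly pool-utilization samples,
    counting hours until the pool would fill (a proxy for the next
    Data Unavailability event). Illustration only, not the thesis model."""
    if len(utilization_history) < 2:
        return None                               # not enough data to fit a trend
    # Average utilization growth per hour over the observed window.
    growth = (utilization_history[-1] - utilization_history[0]) / (len(utilization_history) - 1)
    if growth <= 0:
        return float("inf")                       # pool is not filling
    remaining = capacity - utilization_history[-1]
    return remaining / growth
```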
Moreover, supported services such as data deduplication are becoming a common feature adopted in the data center, especially as new storage technologies mature. Static partitioning of storage system resources such as memory caches may lead to missed SLOs, such as the Data Reduction Rate (DRR) or IO latency. Lastly, we propose a Content-Aware Learning Cache (CALC) that uses online reinforcement learning models (Q-Learning, SARSA and Actor-Critic) to actively partition the storage system cache between a deduplicated data digest cache, content cache, and address-based data cache to improve cache hit performance while maximizing data reduction rates. Using traces from popular storage applications, we show how our machine learning approach is robust and can outperform an iterative search method for various datasets and cache sizes. Our content-aware learning cache improves hit rates by 7.1% when compared to iterative search methods, and 18.2% when compared to a traditional LRU-based data cache implementation.
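To make the RL-based partitioning concrete, here is a minimal tabular Q-learning sketch in the spirit of CALC: the state is the percentage split between digest, content, and address caches; actions shift 5% of capacity between partitions; the reward stands in for an observed hit rate. The hit-rate model (peaking at a 20/40/40 split), the action set, and all hyperparameters are invented for illustration, not taken from the dissertation.

```python
import random

# Actions: move 5% of cache capacity between (digest, content, address),
# or stay put. Purely illustrative.
ACTIONS = [(-5, 5, 0), (5, -5, 0), (0, -5, 5), (0, 5, -5), (0, 0, 0)]

def step(state, action):
    digest, content, address = (s + a for s, a in zip(state, action))
    if min(digest, content, address) < 0:
        return state, 0.0                         # illegal move: no change, no reward
    new = (digest, content, address)
    # Hypothetical hit-rate surface with its optimum at a 20/40/40 split.
    reward = 1.0 - (abs(digest - 20) + abs(content - 40) + abs(address - 40)) / 200
    return new, reward

def q_learn(episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Epsilon-greedy tabular Q-learning over cache-partition states;
    returns the best-rewarded partition visited."""
    rng = random.Random(seed)
    Q = {}
    state = (40, 30, 30)                          # initial split (percent)
    best_state, best_reward = state, step(state, (0, 0, 0))[1]
    for _ in range(episodes):
        if rng.random() < eps:
            a = rng.randrange(len(ACTIONS))       # explore
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q.get((state, i), 0.0))
        nxt, reward = step(state, ACTIONS[a])
        best_next = max(Q.get((nxt, i), 0.0) for i in range(len(ACTIONS)))
        old = Q.get((state, a), 0.0)
        Q[(state, a)] = old + alpha * (reward + gamma * best_next - old)
        if reward > best_reward:
            best_state, best_reward = nxt, reward
        state = nxt
    return best_state
```

A real content-aware cache would replace the synthetic reward with measured hit rates from the running system, which is what makes the online formulation attractive: the agent adapts as the workload's dedup-friendliness shifts.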

Details

Date:
July 13, 2021
Time:
10:00 am - 11:00 am
Website:
https://northeastern.zoom.us/j/9812717772#success

Other

Department
Electrical and Computer Engineering
Topics
MS/PhD Thesis Defense