Computing’s invisible challenge

To us, it may not seem like a big deal: CNN’s website is taking too long to load. The day’s most popular YouTube video won’t stop buffering. “Twitter is over capacity.” While these little hiccups in usability may frustrate end users, they merely scratch the surface of the enormous technical challenge that’s confronting the backend.

Northeastern University assistant professor of electrical and computer engineering Ningfang Mi recently learned she was one of 42 early-career researchers to win a Young Investigator Award from the Air Force Office of Scientific Research. The winners will receive their grants over a three-year period.

She plans to use the award to figure out a better way to manage the vast amount of information sharing that takes place online—and push that massive technical challenge even further into the background for end users.

These days most of the data we request online is stored in the so-called “cloud”—a series of virtual computers distributed across physical servers around the world. For instance, Google has 12 data centers across four continents. The 20,000 emails sitting in my Gmail inbox aren’t actually stored on my computer—they’re stored in Google’s cloud, which exists on all those remote servers. Every time I look at one of my emails, I am requesting access to it from one of those servers.

Now consider YouTube. Its billions of hours of video aren’t all sitting on the same physical server; rather, they are stored remotely in the cloud. In this case, I am just one of millions of users requesting the same video at a given moment. And that, Mi explained, is where things get challenging.

Her research is focused on modeling performance in different scenarios and figuring out the best ways to manage resources based on the outcomes of those models. This will give her a sense of the workloads and number of traffic requests that remote servers are likely to have to handle.

“Based on this kind of information,” she said, “how can I find the best configuration for the platform in order to provide the highest quality of service?”
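
As a rough sketch of that idea, the textbook M/M/1 queueing formula (a stand-in here, not Mi’s actual models) predicts a server’s response time from the rate of incoming requests and the rate at which the server can handle them; comparing those predictions across candidate setups picks a configuration:

```python
# A minimal sketch of configuration selection via a performance model.
# The M/M/1 formula below is a textbook stand-in, not Mi's actual models;
# the numbers at the bottom are hypothetical.

def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Expected response time for a single queue with Poisson arrivals and
    exponential service times; valid only while arrival_rate < service_rate."""
    if arrival_rate >= service_rate:
        return float("inf")  # the server cannot keep up and the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)

def best_configuration(arrival_rate: float, configs: dict) -> str:
    """Return the name of the configuration (name -> service rate, requests/sec)
    with the lowest predicted response time for the expected load."""
    return min(configs, key=lambda name: mm1_response_time(arrival_rate, configs[name]))

# Hypothetical load of 800 requests/sec and three candidate platform setups.
print(best_configuration(800.0, {"small": 900.0, "medium": 1200.0, "large": 2000.0}))  # -> "large"
```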

There are two options: she can either move information around within a single server or move it between servers. The best choice will depend on the situation at hand.
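
A toy version of that choice, with made-up utilization figures and thresholds purely for illustration, might weigh the load on the current server against the load elsewhere in the cluster:

```python
# A toy decision rule for the two options above. The thresholds and margin
# are invented for illustration; the real trade-off depends on the situation.

def choose_action(local_utilization: float, peer_utilizations: list,
                  overload_threshold: float = 0.8, migration_margin: float = 0.2) -> str:
    """Decide whether to shuffle data within the current server or migrate
    some of it to a less busy server elsewhere in the cluster."""
    if local_utilization <= overload_threshold:
        return "rebalance within this server"
    if min(peer_utilizations) < local_utilization - migration_margin:
        return "migrate data to the least busy server"
    return "rebalance within this server"  # peers are nearly as busy; migration won't help enough

# Example: this server is at 90% utilization, its peers at 40%, 70%, and 85%.
print(choose_action(0.9, [0.4, 0.7, 0.85]))  # -> "migrate data to the least busy server"
```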

“Before, predictions were based more on average load or traffic, but now we know that in reality the workload changes,” Mi said. “The term I use here is ‘burstiness’ or ‘spikes.’”

Indeed, it all depends on the burstiness of human behavior. Some online phenomena are predictable, Mi said. For instance, you’re likely to see a burst in email activity on the East Coast every weekday at around 9 a.m. EST. Similarly, the Internet is likely to be all-a-flurry across a range of websites on election night as people the world over discuss the race on Twitter, stream acceptance speeches on NBC, and read about the results in The New York Times.
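
One common way to put a number on that burstiness is the index of dispersion: the variance of per-interval request counts divided by their mean. Smooth, Poisson-like traffic sits around 1, while bursty traffic lands far above it. The sketch below applies the statistic to made-up traces and illustrates the concept, not Mi’s measurement setup:

```python
# Index of dispersion: variance / mean of request counts per time interval.
# Values near (or below) 1 indicate smooth traffic; values far above 1
# indicate bursty traffic. The traces are invented for illustration.
from statistics import mean, pvariance

def index_of_dispersion(counts_per_interval: list) -> float:
    m = mean(counts_per_interval)
    return pvariance(counts_per_interval) / m if m > 0 else 0.0

steady = [100, 98, 103, 101, 99, 102]      # e.g. weekday-morning email checks
spiky  = [100, 95, 900, 1200, 110, 105]    # e.g. a video suddenly going viral
print(index_of_dispersion(steady))  # well below 1: very smooth
print(index_of_dispersion(spiky))   # in the hundreds: strongly bursty
```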

But what about when a celebrity unexpectedly passes away or makes a comment that goes viral? Or when a boy in a balloon suddenly becomes one of the biggest news stories on the Internet? No one can predict events like that, so no amount of resource management preparation could ready YouTube for the associated activity spikes.

Mi, for her part, is developing models that will help detect those bursts with more immediacy—and in some cases even predict them a couple of hours in advance. So while we may not know when the next media hoax will drive traffic from millions of curious viewers, at least our computers will be able to handle it better.
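
As a generic illustration of burst detection, and not the models Mi is developing, a simple online detector could flag any interval whose request count jumps well above an exponentially weighted moving average of recent traffic:

```python
# A generic online burst detector: flag an interval as a burst when its
# request count exceeds a smoothed baseline by a large factor. The smoothing
# weight and factor are arbitrary illustrative choices.

def detect_bursts(counts, alpha=0.3, factor=3.0):
    """Yield (interval_index, count) for intervals whose count exceeds
    `factor` times the exponentially weighted moving average so far."""
    baseline = None
    for i, count in enumerate(counts):
        if baseline is not None and count > factor * baseline:
            yield i, count
        baseline = count if baseline is None else alpha * count + (1 - alpha) * baseline

# Hypothetical per-minute request counts with a sudden spike.
trace = [100, 110, 95, 105, 980, 1500, 120, 100]
print(list(detect_bursts(trace)))  # -> [(4, 980), (5, 1500)]
```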
