Experience
Amazon Applied Scientist Internship
Topic 1: Analyze Amazon Day customer behavior and explore machine learning opportunities (2020 Summer)
Topic 2: Analyze Amazon Day customer behavior and explore machine learning opportunities (2019 Summer)
Teaching Assistant at UC Riverside
Course Projects
Web Developer
Topic: Design of a search engine for Wikipedia (demo: WiKiSearch)
Machine Learning and Data Mining
Topic: Label prompt for Yelp image
Research
Non-linear Computing Lab, UC Riverside
Topic: Machine learning-assisted resource management in computing systems
The widespread adoption of the Internet of Things and latency-critical applications has fueled the rapid development of edge colocation data centers (a.k.a. edge colocations), small-scale data centers in distributed locations. Given the limited resources and the demand for low latency in these systems, we explore machine learning for resource management in edge computing systems. We study resource management from the perspective of both the data center operator and users (attackers).
First, we propose battery-assisted power management in edge data centers that accounts for computing performance and thermal behavior under significant workload fluctuations. In particular, the workload fluctuations allow the battery to be frequently recharged and made available for temporary capacity boosts. However, using batteries can overload the data center cooling system, which is designed with a capacity matching that of the power system. We design a novel power management solution, DeepPM, that exploits the UPS battery and the cold air inside the edge data center as energy storage to boost performance. DeepPM uses deep reinforcement learning (DRL) to learn the data center thermal behavior online in a model-free manner and uses it on the fly to determine the power allocation that optimizes latency performance without overheating the data center.
Next, we study the vulnerability and thermal attack opportunities that arise from the mismatch between power load and cooling load in edge colocation data centers. We discover that the sharing of cooling systems also exposes edge colocations to cooling load injection attacks (called thermal attacks) by an attacker, which, if left unchecked, may create thermal emergencies and even trigger system outages.
Importantly, thermal attacks can be launched by leveraging the emerging architecture of built-in batteries integrated with servers, which can conceal the attacker's actual server power (or cooling load). We consider both one-shot attacks (which aim at creating system outages) and repeated attacks (which aim at causing frequent thermal emergencies). For repeated attacks, we present a foresighted attack strategy which, using reinforcement learning, learns on the fly a good timing for attacks based on the battery state and the benign tenants' load.
Topic: Calibration and accuracy monitoring for deep neural networks on operational datasets