PhD Seminar • Systems and Networking • Distributed DNN Training on Serverless Resources

Friday, July 15, 2022 — 2:00 PM to 3:00 PM EDT

Please note: This PhD seminar will be given online.

Runsheng (Benson) Guo, PhD candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Khuzaima Daudjee

Deep Neural Networks (DNNs) are often trained in parallel on a cluster of virtual machines (VMs) to reduce training time. However, this requires explicit cluster management, which is cumbersome and often results in costly over-provisioning of resources. Training DNNs on serverless compute is an attractive alternative that is receiving growing interest. In a serverless environment, cluster management is handled for the user, compute resources can be scaled at a fine-grained level, and users are billed only for the resources they consume. Despite these advantages, existing serverless systems for DNN training are ineffective because they are limited to CPU-based training and bottlenecked by expensive distributed communication.

I will present Hydrozoa, a serverless system we have developed for distributed DNN training. Hydrozoa overcomes existing limitations of serverless DNN training with a novel architecture that combines serverless containers with hybrid-parallel training and supports dynamic worker scaling, which helps improve statistical training efficiency. Hydrozoa achieves significant throughput-per-dollar improvements over existing VM-based and serverless training approaches while relieving the user from the burden of managing machine clusters.


Bio: Runsheng (Benson) Guo is a PhD candidate whose research interests include distributed ML training and serverless and cloud computing.


To join this PhD seminar on Zoom, please go to https://uwaterloo.zoom.us/j/92272133761.

Location 
Online PhD seminar
200 University Avenue West
Waterloo, ON N2L 3G1
Canada