Systems and networking researchers win three awards at 18th IEEE/IFIP NOMS 2022 | Cheriton School of Computer Science

The 18th IEEE/IFIP Network Operations and Management Symposium (NOMS 2022), held in Budapest, Hungary, took network and service management in the era of cloudification, softwarization, and artificial intelligence as its main theme.

Several systems and networking researchers from the Cheriton School of Computer Science along with their national and international colleagues attended the week-long symposium to present their research, winning three awards in total — a best dissertation award, a best student paper award, and a best paper award.

NOMS 2022 Best Dissertation Award

Shihab Chowdhury received the NOMS 2022 Best Dissertation Award for his PhD thesis, titled “Resource management in softwarized networks,” which he defended in February 2021. Shihab conducted his doctoral studies in the Systems and Networking group under the supervision of Raouf Boutaba, Professor and Director of the Cheriton School of Computer Science.

Network softwarization is an emerging paradigm where software controls the treatment of network flows, adds value to these flows by software processing, and orchestrates the on-demand creation of customized networks to meet the needs of various applications. Network softwarization facilitates the programmability of network equipment, allowing multiple virtual networks to function on top of a single physical network structure.

Software-defined networking, network function virtualization, and network virtualization are the three cornerstones of the transformative trend toward network softwarization. Together, they empower network operators to accelerate time-to-market for new services, diversify the supply chain for networking hardware and software, and bring the benefits of agility, economies of scale, and flexibility of cloud computing to networks. The enhanced programmability made possible by softwarization creates unique opportunities for adapting network resources to support applications and users with diverse requirements.

To effectively leverage the flexibility provided by softwarization and realize its full potential, it is of paramount importance to devise mechanisms to allocate resources to different applications and users, and to monitor their use over time. To this end, Shihab’s PhD research advanced the state of the art in how resources are allocated and monitored, and he created a foundation for effective resource management in softwarized networks.

Shihab’s PhD research has made many outstanding and significant contributions to resource management in softwarized communication networks. His strengths in building systems are matched by his mastery of their mathematical foundations, which lets him bring rigorous computational theory to systems implementation.

In addition to the Best Dissertation Award at NOMS 2022, Shihab’s doctoral research was also recognized by the University of Waterloo, which awarded him the 2021 Alumni Gold Medal for outstanding academic performance in a doctoral program.

NOMS 2022 Best Student Paper Award

PhD student Soheil Johari received the NOMS 2022 Best Student Paper Award for “Anomaly detection and localization in NFV systems: an unsupervised learning approach,” research he conducted with his doctoral advisor Raouf Boutaba and colleagues Nashid Shahriar, Massimo Tornatore and Aladdin Saleh.


L to R: Soheil Johari, Nashid Shahriar, Massimo Tornatore and Raouf Boutaba. Aladdin Saleh’s photo was unavailable.

Network function virtualization (NFV), a revolutionary shift in telecommunication service provision, decouples network and service functions from the physical devices on which they run by implementing them through software called virtual network functions or VNFs. Many VNFs have performance equal to that of pure hardware implementations, but with the flexibility and optimization that softwarization allows. Despite these advantages, provisioning and managing VNF-based services adds complexity that makes VNFs more prone to failure than dedicated hardware-based solutions.

Detecting anomalous behaviour in an NFV system and finding its origin are critically important to ensure its reliability. Deep learning methods have shown promising results, but most approaches use supervised learning algorithms, which require substantial amounts of labelled faulty instances — data that is not only scarce but also labour-intensive to correctly identify and label. Using unsupervised learning-based anomaly detection would not only avoid the need for abundant labelled faulty instances, but it could also provide more generalized protection against the variety of anomalous behaviour seen in NFV systems.

The research team developed a novel unsupervised anomaly-detection approach for NFV systems whose training data is contaminated with anomalies. They first trained an unsupervised anomaly-detection method known as the Deep Autoencoding Gaussian Mixture Model (DAGMM) on the contaminated training data as a teacher model, treating the data as if it contained no anomalies. They then used DAGMM to clean the training data by removing potentially anomalous instances. Among these removed instances, they pseudo-labelled as anomalies the samples that DAGMM had classified as anomalous with high confidence.
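The cleaning and pseudo-labelling steps can be illustrated with a short Python sketch. It is not the team's implementation: a squared Mahalanobis distance to the training mean stands in for DAGMM's sample energy, and the contamination ratio and the share of removed samples kept as confident pseudo-labels are hypothetical parameters chosen for illustration.

```python
import numpy as np


def energy_scores(X: np.ndarray) -> np.ndarray:
    """Stand-in for DAGMM's sample energy (an assumption, not DAGMM itself):
    squared Mahalanobis distance of each row to the mean of X.
    Higher scores mean "more anomalous"."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])  # small ridge for stability
    inv = np.linalg.inv(cov)
    diff = X - mu
    return np.einsum("ij,jk,ik->i", diff, inv, diff)


def clean_and_pseudo_label(X, contamination=0.1, confident_fraction=0.5):
    """Teacher step: score the contaminated data as if it were clean, drop the
    highest-scoring `contamination` share, and pseudo-label the most confident
    `confident_fraction` of the dropped samples as anomalies.
    Both parameters are hypothetical. Returns (cleaned_X, pseudo_anomalies)."""
    scores = energy_scores(X)
    order = np.argsort(scores)                     # most normal first
    n_removed = int(round(contamination * len(X)))
    keep_idx = order[: len(X) - n_removed]
    removed_idx = order[len(X) - n_removed:]       # candidate anomalies
    n_confident = int(round(confident_fraction * n_removed))
    confident_idx = removed_idx[len(removed_idx) - n_confident:]  # highest scores
    return X[keep_idx], X[confident_idx]


# Synthetic example: 180 "normal" samples plus 20 injected anomalies.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(180, 3))
anomalies = rng.normal(8.0, 1.0, size=(20, 3))
X = np.vstack([normal, anomalies])

cleaned, pseudo = clean_and_pseudo_label(X, contamination=0.1, confident_fraction=0.5)
print(cleaned.shape, pseudo.shape)  # (180, 3) (10, 3)
```

With a crude stand-in scorer like this one, a few anomalies can survive cleaning and a few normal samples can be removed. Keeping only the highest-confidence removals as pseudo-labels limits how much of that noise reaches the labelled set.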

Once anomalies were detected, the team used an unsupervised machine learning approach to identify the anomalous VNF. Localizing an anomaly is challenging because there might be no labelled anomalous instances in the training data, so different failure scenarios can only be distinguished by comparing the detected anomaly with normal instances.

To accomplish this, the research team used SHapley Additive exPlanations (SHAP), a local AI-explainability method, to localize the detected anomalies. The team then conducted comprehensive experimental analyses on two datasets collected on different NFV testbeds. Their results showed that their solutions outperformed previous methods by up to 22% for anomaly detection and up to 19% for anomaly localization. Future work will focus on improving the generalization of the detection and localization models, evaluating their applicability to larger and more complex datasets, and identifying fault types in NFV systems using unsupervised learning.
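The idea behind Shapley-based localization can be sketched in Python: treat each VNF's group of monitoring features as a "player," use a normal instance as the baseline, and compute how much each group contributes to the anomaly score; the group with the largest share points to the suspected VNF. This is a hedged illustration, not the team's method. It computes exact Shapley values by enumerating coalitions (feasible only for a handful of groups), whereas libraries such as SHAP typically approximate them, and the metrics, feature grouping, baseline, and scoring function are all hypothetical.

```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_by_group(score, x, baseline, groups):
    """Exact Shapley value of each feature group (here: one group per VNF).

    The value of a coalition S is the anomaly score of a hybrid input that
    takes the features of groups in S from the anomalous sample `x` and all
    other features from the normal `baseline`."""
    n = len(groups)

    def value(coalition):
        hybrid = baseline.copy()
        for g in coalition:
            hybrid[groups[g]] = x[groups[g]]
        return score(hybrid)

    phi = np.zeros(n)
    for i in range(n):
        others = [g for g in range(n) if g != i]
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi


# Hypothetical monitoring data: two metrics per VNF, three VNFs in a chain.
mu = np.array([50.0, 0.2, 30.0, 0.1, 40.0, 0.3])      # normal means (baseline)
sigma = np.array([5.0, 0.05, 5.0, 0.05, 5.0, 0.05])   # normal standard deviations
groups = [[0, 1], [2, 3], [4, 5]]                     # feature indices of VNF 0, 1, 2


def score(v):
    """Hypothetical anomaly score: Euclidean norm of per-feature z-scores."""
    return float(np.sqrt(np.sum(((v - mu) / sigma) ** 2)))


x = np.array([52.0, 0.2, 30.0, 0.1, 70.0, 0.45])     # a detected anomaly
phi = shapley_by_group(score, x, mu, groups)
print("suspected VNF:", int(np.argmax(phi)))          # suspected VNF: 2
```

Because Shapley values are efficient, the per-VNF contributions sum to the difference between the anomaly's score and the baseline's score, so the attribution fully explains the detected deviation. The cost grows as 2^n in the number of groups, which is why approximate methods are used for real models.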

NOMS 2022 Best Paper Award

Muhammad Sulaiman, Arash Moayyedi, Mohammad A. Salahuddin and Raouf Boutaba from the Cheriton School of Computer Science, along with Aladdin Saleh from Rogers Communications Canada, received the NOMS 2022 Best Paper Award for “Multi-agent deep reinforcement learning for slicing and admission control in 5G C-RAN.”


L to R: Muhammad Sulaiman, Arash Moayyedi, Mohammad A. Salahuddin and Raouf Boutaba. Aladdin Saleh’s photo was unavailable.

Cloud radio access network (C-RAN) is the next generation of radio access network architecture — a centralized, cloud computing–based architecture for radio access networks that allows for flexible resource management through virtualization and supports large-scale deployments. As C-RAN has been adopted in 5G mobile networks, the underlying network has been re-imagined as a network of interconnected cloud sites with virtual resources. Using network function virtualization (NFV), a service provider can leverage flexible and strategic placement of virtualized network functions (VNFs) at these different cloud sites to reduce network bottlenecks and make optimal use of the infrastructure.

5G mobile networks can support a wide range of services, from enhanced mobile broadband to ultra-reliable low-latency communications to massive machine-type communications. Network slicing is a key enabling technology that offers isolated end-to-end virtual networks, known as 5G network slices, tailored to meet the specific quality-of-service requirements of different services sharing the same infrastructure. Network slices include chains of VNFs, and when placing these VNFs at different cloud sites in 5G C-RAN, it is important to consider the service type and its service-level agreements. Additionally, because resources are limited, an infrastructure provider cannot serve every incoming slice request. Therefore, an admission control decision must also be made for each incoming slice request.

Among the various AI-based solutions, deep reinforcement learning has shown strong performance on challenging decision-making problems. A deep reinforcement learning agent interacts with an environment and, through trial and error, learns the actions that maximize its cumulative reward without a priori knowledge of the environment or the need for massive training datasets. For these reasons, deep reinforcement learning is particularly well suited to admission control and slicing in 5G C-RAN.
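The cumulative reward an agent maximizes is conventionally formalized as the discounted return. The formulation below is the standard textbook definition rather than one taken from the paper; the discount factor \(\gamma\) is a modelling assumption.

```latex
G_t = r_{t+1} + \gamma\, r_{t+2} + \gamma^{2} r_{t+3} + \cdots
    = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}, \qquad 0 \le \gamma < 1,

\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ G_t \right]
```

A small \(\gamma\) makes the agent short-sighted; values close to 1 let rewards that arrive later, such as revenue from future slice requests, weigh on the current decision.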

In practice, future slice requests are not known and must be predicted to make intelligent slicing and admission control decisions. Deep reinforcement learning can take the future consequences of its actions into account while maximizing its cumulative reward. If a certain slicing decision causes future slice requests to be rejected because of a resource bottleneck, a deep reinforcement learning agent can learn to anticipate this and avoid that slicing decision. Similarly, an agent can learn an admission control policy that rejects slice requests with high resource requirements and low revenue.
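A toy, fully deterministic example shows why looking ahead matters for admission control. The request sequence, capacity, and revenue-per-unit threshold below are hypothetical, resources are assumed to stay allocated for the whole episode, and the request sequence is known in advance only for illustration. A deep reinforcement learning agent would learn such a policy from experience rather than having a rule hand-coded.

```python
from dataclasses import dataclass


@dataclass
class SliceRequest:
    demand: int      # hypothetical resource units the slice needs
    revenue: float   # hypothetical revenue if the slice is admitted


def run_episode(requests, capacity, admit):
    """Serve requests in arrival order; admitted slices hold their resources
    until the episode ends (a simplifying assumption)."""
    used, revenue = 0, 0.0
    for req in requests:
        if used + req.demand <= capacity and admit(req):
            used += req.demand
            revenue += req.revenue
    return revenue


def accept_all(req):
    """Greedy baseline: admit whenever resources are available."""
    return True


def revenue_aware(req, min_revenue_per_unit=1.0):
    """Hypothetical rule: reject requests that pay little per resource unit."""
    return req.revenue / req.demand >= min_revenue_per_unit


requests = [SliceRequest(6, 3.0), SliceRequest(4, 8.0), SliceRequest(5, 9.0)]
capacity = 10
print(run_episode(requests, capacity, accept_all))     # 11.0
print(run_episode(requests, capacity, revenue_aware))  # 17.0
```

Admitting the first, low-revenue request blocks the most lucrative one later, so the greedy policy earns less over the episode even though it accepts more requests up front.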

The optimal admission control policy for any radio access network depends on the corresponding slicing policy, and vice versa. Therefore, it is important that the two policies coordinate to achieve the maximum revenue. However, the reward formulation for a single deep reinforcement learning agent does not allow for an effective solution to the joint problem. A promising solution is multi-agent deep reinforcement learning, where separate agents learn the admission control and slicing policies. The reward functions for these policies can be designed so that the two agents learn to work in synergy towards a common goal: revenue maximization.
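The coupling between the two decisions can be sketched as a single environment step in which an admission agent and a slicing agent act in turn and receive the same team reward. This is a hedged illustration, not the paper's reward design: the rule-based functions stand in for the learned agents' policies, and the two-site capacities, VNF demands, and placement penalty are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SliceRequest:
    vnf_demands: list[int]  # hypothetical CPU demand of each VNF in the slice's chain
    revenue: float          # hypothetical revenue if the slice is admitted and placed


def admission_policy(req, free):
    """Stand-in for the learned admission-control agent:
    admit if total free capacity covers the slice's total demand."""
    return sum(req.vnf_demands) <= sum(free)


def slicing_policy(demand, free):
    """Stand-in for the learned slicing agent:
    place the VNF on the cloud site with the most free capacity."""
    return max(range(len(free)), key=lambda site: free[site])


def joint_step(req, free, placement_penalty=1.0):
    """One environment step. Both agents receive the same team reward,
    so each is rewarded only when their decisions work together.
    Returns (admission_reward, slicing_reward, new_free_capacity)."""
    if not admission_policy(req, free):
        return 0.0, 0.0, free                      # rejected: no revenue, no penalty
    trial = list(free)
    for demand in req.vnf_demands:
        site = slicing_policy(demand, trial)
        if trial[site] < demand:                   # admitted but cannot be placed
            return -placement_penalty, -placement_penalty, free
        trial[site] -= demand
    return req.revenue, req.revenue, trial


free = [4, 3]  # hypothetical free CPU units at two cloud sites
r_adm, r_slc, free = joint_step(SliceRequest([2, 2], 5.0), free)
print(r_adm, r_slc, free)    # 5.0 5.0 [2, 1]
r_adm2, r_slc2, free = joint_step(SliceRequest([3], 4.0), free)
print(r_adm2, r_slc2, free)  # -1.0 -1.0 [2, 1]
```

The second request passes the admission check (three free units in total) but no single site has three units, so both agents are penalized. A shared reward of this kind is one common way to push jointly trained agents to avoid such mismatched decisions.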

To this end, the research team proposed a novel multi-agent deep reinforcement learning–based solution that jointly addressed slicing and admission control in 5G C-RAN to improve a service provider’s long-term revenue. To evaluate the efficacy of their proposed solution, the research team developed a C-RAN slicing and admission control simulation framework that also facilitated the evaluation of other solutions, such as a single-agent deep reinforcement learning–based solution.

Their results demonstrated that multi-agent deep reinforcement learning achieved up to an 18% gain in a provider’s long-term revenue compared with approaches that use simple heuristics to address the problems, and up to a 3.8% gain compared with approaches that use deep reinforcement learning to individually address either slicing or admission control. Future studies will evaluate the team’s approach in more complex 5G C-RAN environments and make their proposed approach more robust by training the deep reinforcement learning agent under more diverse and varying network conditions.