
CERAS Web>CerasProjectsAbstracts (2007-05-28, CristianaAmza)

Fine-grained Resource Management and Problem Detection in Dynamic Content Servers

_Principal Investigator:_
Cristiana Amza, Assistant Professor
Electrical and Computer Engineering, University of Toronto

Background: As networked computer systems grow in complexity, automatic problem detection, analysis and correction become essential system management tools. Many commercial tools exist for coordinated monitoring and control of large-scale systems. However, for currently deployed multi-tier networked systems, the complexity of the information these tools display still exceeds the ability of humans to diagnose and respond to problems rapidly and correctly.
The traditional approach to automated problem detection is to develop a priori models of system structure and behavior, which may be represented quantitatively or as a set of event-condition-action rules. While such models provide a basis for system modeling, they have several limitations. They are either costly to build or incomplete, and hence inaccurate. Building them requires extensive knowledge about the system. Finally, they may become obsolete as systems change or encounter unprecedented situations.

Objectives: In this project, we will design and implement groundbreaking techniques for system self-optimization and self-healing at a fine granularity of resources and application contexts. The approach is based on dynamically learned statistical system models. Our self-optimization techniques will address both performance and power concerns. Our self-healing techniques will detect, diagnose and repair system faults at the fine granularity of system components and low-level application contexts.

Potential benefit to Ontario: As system and workload complexity increases, manual management and performance optimization of Internet servers becomes increasingly costly and time-consuming. Yet suboptimal management and tuning can cause severe resource bottlenecks or failures that may cost the cluster owner millions of dollars when the system is unavailable to clients. In large cluster systems with many workloads, the probability of failures, cooling problems or bottlenecks increases. These problems are especially critical in large corporations that require continuous availability to serve an international clientele. Recent statistics show that maintenance costs exceed 75% of the budget of large companies. Thus, the proposed techniques are essential for the long-term survival of large Ontario-based companies and for supporting the growth of smaller companies, as well as for reducing the costs of operation in a range of dynamic content services such as e-commerce, on-line bidding and massively multi-player games.

_Other Projects_

Automated Management of Virtual Database Appliances

Semantically Configurable Modelling Notations and Tools

Model Management for Continuously Evolving Systems

Modeling, Evolution, and Automated Configuration of Software Services

Elaborating and Evaluating UML’s 3-Layer Semantics Architecture

Intelligent Autonomic Computing for Computational Biology

Performance Management of IT Infrastructure

Performance-Model-Assisted Creation and Management of Service Systems

Q3-4-5-8.doc: New Form.doc

Topic revision: r3 - 2007-05-28 - CristianaAmza
