The Centre of Excellence for Research in Advanced Systems (CERAS) is an innovative, collaborative virtual organization that brings together researchers from IBM, universities, and research centres in Canada, the US, and other countries. Its aim is to investigate technologies, techniques, and methods for web-service-centric software, commonly known as Web 2.0. This next generation of web service technologies will enable new types of web applications, introduce new ways of interaction and collaboration over the Internet, and create new business models.
CERAS’s objectives are to (i) create and test seed ideas for distributed virtual enterprises, (ii) demonstrate how emerging applications can be developed, deployed and run more effectively on a virtual infrastructure, and (iii) explore the concept of an academic virtual campus.
CERAS will investigate two technological aspects: virtualization of computing resources and model-driven engineering, with the goal of making a web-service-based IT infrastructure easier to develop, deploy, reconfigure, maintain, and adapt. Of particular interest is a perpetual beta environment in which applications evolve continuously. In this environment, a cycle of application design and development, deployment, operations, and runtime analysis that feeds back into application design is repeated. Our investigation includes applications that rely on dynamically discovering and composing application services. With this approach, an application may be built from components provided by different parties and may require access to data held at remote locations. CERAS will initially focus on applications in the life science and automotive areas. Other application areas will be added in the future. The applications to be deployed on a virtual infrastructure may range from batch to highly interactive.
Future IT infrastructures are expected to support a much greater number and diversity of applications. This will lead to new challenges in managing hardware and software resources. Moreover, different types of computing resources may be available, e.g., clusters, grids, and individual systems, and these resources may be scattered across many geographical regions and time zones. Effective resource management is therefore an important issue. An important objective of CERAS is to conduct research on making application services more autonomic by providing automatic capabilities for their configuration, management, tuning, and repair.
A research infrastructure for CERAS will be developed. This infrastructure is composed of computing resources located at the University of Waterloo (UW), the Ontario Cancer Institute (OCI), and North Carolina State University (NCSU). The resources at UW are used primarily to investigate technological issues related to virtualization of computing resources; those at OCI provide a production environment for scientific and highly interactive applications in the life science area; and those at NCSU support a virtual campus that matches students and instructors with computing resources and enables remote education and computing resource reservations for researchers. A workload portal will be developed; this portal will enable several types of workloads to access the computing resources at the three locations. These resources are heterogeneous and fall under different administrative domains. Usage policies may therefore be imposed at some locations, and there may be restrictions because the computing platform required for a given application is available only at a certain location. The possibility of connecting the resources at the three locations by a high-speed network (e.g., CANARIE) will be explored. The overall infrastructure, seen as one large virtual data centre, will be shared by CERAS researchers and others. It also provides an environment for virtual research collaboration in which a distributed research team may run common experiments and share and exchange research settings.
The proposed research aims at devising an integrated approach in which existing and new applications are deployed and evaluated on a virtual infrastructure. The overall research program can be organized into two layers: application services and computing resource provisioning.
In terms of autonomic computing, autonomic management at the resource provisioning layer is achieved through a collection of autonomic managers; each has the capability to monitor specific resources, analyze the results, plan any necessary changes, and enact these changes in the operating environment. Workloads may also be reconfigured in case of failures. At the application services layer, autonomic managers could re-negotiate service level agreements (SLAs). To perform their planning tasks, autonomic managers rely on the availability of appropriate models of the computing infrastructure, the high-level application goals, user preferences, and various decision models.
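The monitor, analyze, plan, and enact cycle described above can be sketched as follows. This is a minimal illustration, not the CERAS design: the `Resource` and `AutonomicManager` classes, the utilization thresholds, and the one-server-at-a-time policy are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical managed resource: a server pool with a measured utilization."""
    name: str
    servers: int
    utilization: float  # fraction of pool capacity in use, 0.0 to 1.0


class AutonomicManager:
    """Runs one monitor-analyze-plan-enact cycle over a single resource."""

    def __init__(self, resource: Resource, high: float = 0.8, low: float = 0.3):
        self.resource = resource
        self.high = high  # assumed upper utilization threshold
        self.low = low    # assumed lower utilization threshold

    def monitor(self) -> float:
        # A real manager would read sensors; here we read the stored value.
        return self.resource.utilization

    def analyze(self, utilization: float) -> str:
        if utilization > self.high:
            return "overloaded"
        if utilization < self.low and self.resource.servers > 1:
            return "underloaded"
        return "ok"

    def plan(self, symptom: str) -> int:
        # The plan is a change in server count (assumed policy: one at a time).
        return {"overloaded": 1, "underloaded": -1, "ok": 0}[symptom]

    def enact(self, delta: int) -> None:
        if delta == 0:
            return
        old = self.resource.servers
        new = old + delta
        # The same total load is spread over the new number of servers.
        self.resource.utilization = self.resource.utilization * old / new
        self.resource.servers = new

    def run_once(self) -> str:
        symptom = self.analyze(self.monitor())
        self.enact(self.plan(symptom))
        return symptom


pool = Resource(name="web-tier", servers=2, utilization=0.9)
manager = AutonomicManager(pool)
first = manager.run_once()   # pool is overloaded, so one server is added
second = manager.run_once()  # utilization is now inside the target band
```

In this sketch the planning step uses fixed thresholds; the text above envisions planning driven by infrastructure models, application goals, and decision models instead.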
Model-Driven Engineering (MDE) techniques will be used in application design and development. MDE refers to the systematic use of models as primary engineering artifacts throughout the engineering lifecycle. Models are abstractions of a system and its environment, and they play an important role in the proposed research program. Autonomic resource provisioning relies on the availability of various models that are used by the autonomic managers and the system users, including specifications of high-level application goals, specifications of application services, workflow models of service composition and orchestration, specifications of SLAs, decision models and goal models for achieving SLAs, policy models (e.g., business rules), user models, models of computing infrastructure (computing nodes, storage nodes, network links, etc.), performance models, and service deployment models. These models will be used at any stage of the development cycle, including design time and runtime.
Two or more applications in the life science and automotive areas will be selected for our investigation. [To add: concrete problems to be addressed in life sciences and automotive industry] Possible work could include techniques for dynamically discovering and composing application services and tools for application development. These applications will be used as examples in our work on application design and on self-configuration and self-optimization of IT infrastructures.
CERAS’s research program consists of a number of projects. An overview of these projects follows.
A model can be expressed using a general-purpose language such as the Unified Modeling Language (UML) or a mathematics-based language such as first-order predicate calculus or linear temporal logic. Alternatively, the model can be expressed using a domain-specific language (DSL), that is, a language that is specialized for the task at hand. The challenges to be addressed relate to providing effective methods, tools, and theories for engineering appropriate modeling languages to represent different kinds of models, such as those listed in the previous section. The challenges include specifying the abstract and concrete syntax of modeling languages; specifying their semantics; performing qualitative analyses (e.g., safety and liveness) and quantitative analyses (e.g., performance, availability, timeliness, and security); and providing tool support for implementing modeling languages and environments, including editors, analyzers, and code generators. Usability aspects of the languages and tools are of key importance.
The IT infrastructure under consideration consists of a variety of computing resources, e.g., clusters, individual servers, and grids. These resources may be geographically distributed and are accessed via a virtualization layer. The application services supported may have varying workloads and performance requirements. There may also be a cost associated with resource usage. By dynamically adjusting resource allocations, we can ensure that a service meets its performance targets despite workload fluctuations. In addition, we can balance competing resource demands from services that share the same resource pool. Algorithms for automatic, dynamic resource allocation that minimize cost while meeting service level agreements will be developed and evaluated. These algorithms will take into consideration different classes of jobs. The efficiency of various applications running on a virtual infrastructure will also be investigated, with a view to providing feedback to application design. Autonomic control of computing resources based on layered performance models will also be explored.
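As a rough illustration of allocating the cheapest set of resources that still meets a service level target, the sketch below finds the smallest number of identical servers that keeps mean response time under a target. The queueing model is an assumption made for the example, not part of the program described here: load is split evenly, and each server is treated as an M/M/1 queue with mean response time 1 / (service_rate - per_server_arrival_rate). The function names are hypothetical.

```python
import math
from typing import Optional


def response_time(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Mean response time when load is split evenly over M/M/1 servers (assumed model)."""
    per_server = arrival_rate / servers
    if per_server >= service_rate:
        return math.inf  # the servers are saturated; the queue grows without bound
    return 1.0 / (service_rate - per_server)


def min_servers(arrival_rate: float, service_rate: float,
                target: float, max_servers: int) -> Optional[int]:
    """Smallest server count meeting the response-time target, or None if none does.

    With a fixed cost per server, the smallest count is also the cheapest allocation.
    """
    for n in range(1, max_servers + 1):
        if response_time(arrival_rate, service_rate, n) <= target:
            return n
    return None


# 90 requests/s, each server handles 50 requests/s, target mean response time 0.1 s.
needed = min_servers(arrival_rate=90.0, service_rate=50.0, target=0.1, max_servers=10)
```

Re-running `min_servers` whenever the measured arrival rate changes gives a simple form of dynamic allocation; handling several job classes or a shared pool would need a richer model than this one.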
Achieving the self-optimization and self-configuration of IT infrastructures will require the ability to model the available computing and storage infrastructure, including resource aspects such as utilization, performance, and security. The models should enable analyses at both design time and runtime. It should be possible to build the models by gathering data from the running system, modify the model, predict the results through analysis and simulation, and then push the change to the live environment. The modeling facilities will be developed using the tools and methods investigated in the project on modeling language engineering.
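The build, modify, and predict steps can be illustrated with a very small performance model. The sketch below assumes the service demand law from operational analysis (utilization = service demand × throughput); the choice of this law, the sample data, and the function names are assumptions for the example and are not taken from the program description.

```python
from statistics import mean
from typing import List, Tuple


def estimate_service_demand(samples: List[Tuple[float, float]]) -> float:
    """Build the model: estimate seconds of server time per request.

    Each sample is a (throughput in requests/s, utilization in 0..1) pair
    gathered from the running system.
    """
    return mean(u / x for x, u in samples if x > 0)


def predict_utilization(demand: float, throughput: float, servers: int = 1) -> float:
    """Predict per-server utilization for a what-if throughput and server count."""
    return demand * throughput / servers


# Hypothetical measurements from a live system.
observed = [(50.0, 0.25), (100.0, 0.5)]
demand = estimate_service_demand(observed)

# What-if analysis before pushing a change to the live environment.
one_server = predict_utilization(demand, throughput=150.0)
two_servers = predict_utilization(demand, throughput=150.0, servers=2)
```

Only if the predicted utilization is acceptable would the corresponding change be applied to the running system.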
The use of virtualization to improve the availability of the applications that constitute an information service will be investigated. When a failure occurs, some computing resources become unavailable while others remain available. Through redundancy and replication, affected systems can be migrated to the available hardware, restarted there, and continue running. We will address both migration during planned downtime for upgrades and the more challenging scenario of unpredictable failures. Our approach will rely on the capability of a virtual infrastructure to checkpoint the state of a running virtual machine (VM) and to restart from a checkpoint. If the VM on which the application is running fails, we can restart from the latest checkpoint on a different, live VM. The goal is to provide high availability to any application with full transparency to users and little or no software modification. Techniques that perform periodic checkpointing, identify failures, and migrate to live virtual machines with minimum overhead will be developed and evaluated on a virtual infrastructure.
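The checkpoint-and-restart idea can be sketched in a few lines. This is a toy simulation under stated assumptions: the `VirtualMachine` and `CheckpointStore` classes are hypothetical stand-ins for a real hypervisor interface, the VM state is a small dictionary, and the failure is injected at a chosen step rather than detected.

```python
import copy


class VirtualMachine:
    """Hypothetical VM whose whole state is a counter of completed work units."""

    def __init__(self, name: str):
        self.name = name
        self.alive = True
        self.state = {"processed": 0}

    def step(self) -> None:
        if not self.alive:
            raise RuntimeError(f"{self.name} has failed")
        self.state["processed"] += 1


class CheckpointStore:
    """Keeps only the latest checkpoint, as a deep copy of the VM state."""

    def __init__(self):
        self._latest = None

    def save(self, vm: VirtualMachine) -> None:
        self._latest = copy.deepcopy(vm.state)

    def restore(self, vm: VirtualMachine) -> None:
        vm.state = copy.deepcopy(self._latest)


def run_with_failover(primary, standby, store, total_steps, interval, fail_at):
    """Run total_steps work units, checkpointing every `interval` completed units.

    The primary is made to fail on attempt number `fail_at`; work then resumes
    on the standby from the latest checkpoint.
    """
    active = primary
    store.save(active)  # initial checkpoint so a restore is always possible
    attempts = 0
    while active.state["processed"] < total_steps:
        attempts += 1
        if active is primary and attempts == fail_at:
            primary.alive = False  # injected failure
        try:
            active.step()
        except RuntimeError:
            if active is standby:
                raise  # no further replica to fail over to in this sketch
            store.restore(standby)
            active = standby
            continue
        if active.state["processed"] % interval == 0:
            store.save(active)
    return active


vm_a, vm_b = VirtualMachine("vm-a"), VirtualMachine("vm-b")
finished_on = run_with_failover(vm_a, vm_b, CheckpointStore(),
                                total_steps=10, interval=3, fail_at=6)
```

Here the primary fails after completing five units, the latest checkpoint holds three, and the standby redoes the two lost units before finishing. The checkpoint interval therefore trades checkpointing overhead against the amount of work repeated after a failure.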
In some situations, simply assigning additional resources to an application is either impossible or inadequate. For example, the performance of a database system may become limited by data contention, in which case more resources will not help. In such situations, it is necessary to dynamically provision a new instance of the application and share the load between the old and new instances. A major challenge in replicating an application is maintaining global consistency across replicas, both of the database being managed and of the internal state of the database processes, which includes buffer pools, lock tables, user connections, and other memory areas. Mechanisms to effectively implement the dynamic provisioning of new instances of an application, using either Tivoli Intelligent Orchestrator or machine virtualization as the underlying infrastructure, will be investigated.
To effectively manage a multi-tier information service, e.g., a web-based system consisting of web servers, application servers, database servers, and storage servers, the tuning decisions made by the different components of the service must be coordinated. Our current focus is on mechanisms that can be used to coordinate database systems with storage servers providing virtualized storage. We will investigate the problem of end-to-end physical design for database systems and storage systems. That is, given a description of the database system workload, we seek to automatically recommend both a database physical design and a configuration of the underlying storage system. Tools exist for solving these design problems locally within the DBMS and storage tiers. We plan to examine ways to use these tools as components of a solution to the end-to-end design problem.
The NCSU site operates a production environment for a virtual campus. Resources are allocated to faculty, staff, and students, typically for instructional computing. Provisions are made for resource reservation. The typical usage pattern is daytime usage with pre-defined capacity and long-term reservations. The workload for research computing is sporadic and short-term, with unpredictable resource requirements and reservation lengths. At the OCI site, the workload consists of a mix of highly interactive applications and long-running transactions. The facility at UW is mainly used for research in middleware for virtual infrastructure and for evaluation of algorithms for dynamic resource provisioning. A possible project is to work with the three locations to monitor resource usage, with a view to deriving workload models for different types of applications.
The purpose of this project is to design languages for modeling different aspects of application services, such as the specification of services, their orchestration, and deployment; runtime models of service configuration, health, and performance; and specifications of service level agreements and decision models to maintain them. The different models are aimed at providing optimal separation of concerns in application development and management.
A key idea of the overall research is a feedback-based integration between Application Services and Computing Resource Provisioning. The application services layer requests resources from the underlying resource provisioning infrastructure. While the latter makes a best effort to provide the desired resources and to shield the application layer from failures and infrastructure modifications, some failures or changes may be best handled at the application layer. In this project, we will investigate such feedback scenarios and develop a methodology for designing such feedback systems. The ultimate goal is to support the notion of “perpetual beta”: application services that are constantly running and evolving while achieving high reliability. Thanks to the virtual infrastructure, changes to the applications can be tested and deployed without risk. Furthermore, changes to the infrastructure should be possible without disturbing the utility provided by application services. [may need to add a project of how to handle multiple versions in parallel and contain application errors from propagation]
The different models of a running system will be constantly evolving. In this project, we will investigate concepts, tools, and methods to facilitate such change in an efficient and consistent manner. To that end, we will investigate model mappings and transformation, mappings between technology spaces (e.g., semantic web and UML), model merging, and reconciliation of multiple views.
CERAS includes researchers from the following institutions: University of Waterloo, University of Toronto, Carleton University, Queen's University, University of Western Ontario, North Carolina State University (USA), Ontario Cancer Institute, IBM Toronto Lab, IBM Ottawa Lab, and IBM Raleigh (USA). Research funds come from the above institutions as well as from the governments of the US and Canada.
CERAS will be managed by a Research Steering Committee made up of researchers from the above institutions.