Exception Handling Patterns for Hierarchical Scientific Workflows
Rafael Tolosana, Omer Rana, Jose Angel Banares, Pedro Alvarez and Joaquim Ezpeleta
Abstract: Scientific workflows generally involve the distribution of tasks to distributed resources, which may exist in different administrative domains. Such a distribution may lead to faults that may arise at different levels: application level, enactment level, and resource management level, for instance. Detecting these faults, and subsequently adapting the structure of the workflow dynamically remains an important challenge. An approach to supporting such dynamic adaptation is presented, along with an evaluation of the approach using an example from the myexperiment.org workflow repository. An analysis of the overhead in using the approach is also presented, along with the benefits/pitfalls of using the proposed approach.
A New Multi-Objective Optimization Scheme for Grid Resource Allocation
Alexander Van Der Kuijl, Michael Emmerich and Hui Li
Abstract: Grid computing is emerging as an infrastructure for large-scale data processing, resource sharing, and scientific computing. Job scheduling at the Grid level is challenging because Grid schedulers do not control the computing resources across multiple domains. As a result, many traditional algorithms developed for parallel systems are unsuitable for the Grid. In this paper we propose a Grid scheduling algorithm using multi-attribute utility theory and multi-objective optimization. It attempts to make optimal decisions based on the available set of objectives. By comparison with a deadline-and-budget algorithm with three objectives, we show that the proposed Multi-Objective Optimization (MOO) scheduling algorithm obtains a broader set of non-dominated solutions, and solutions of higher quality, that is, closer to the Pareto front of optimal solutions.
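To illustrate what "non-dominated solutions" means in the abstract above, here is a minimal Python sketch of a Pareto filter. It is not the authors' MOO scheduler: the three objectives, the assumption that all of them are minimised, and the candidate values are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b in every objective and
    strictly better in at least one (all objectives are minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical schedules scored on three objectives to minimise,
# e.g. (completion time, cost, queue wait). The values are made up.
candidates = [(10, 5, 3), (8, 7, 3), (12, 6, 4), (8, 7, 2)]
front = non_dominated(candidates)
print(front)
```

Here (8, 7, 3) is dropped because (8, 7, 2) is at least as good everywhere and better in the third objective, and (12, 6, 4) is dropped because (10, 5, 3) beats it in all three.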
Resource Use Pattern Analysis for Opportunistic Grids
Marcelo Finger, Germano Bezerra and Danilo Conde
Abstract: This work presents a method for predicting resource availability in opportunistic grids by means of Use Pattern Analysis (UPA), a technique based on non-supervised learning methods. The basic assumptions of the method and its capability to predict resource availability were demonstrated by simulations; accurate learning techniques and distance metrics are determined. The UPA method was implemented and experiments showed the feasibility of its use in low-overhead scheduling of grid tasks and its superiority over other predictive and non-predictive methods.
Dynamic Self-Scheduling for Parallel Applications with Task Dependencies
Aline Nascimento, Cristina Boeres and Vinod Rebello
Abstract: As grids are in essence heterogeneous, dynamic, shared and distributed environments, managing these kinds of platforms efficiently is extremely complex. Few transparent grid management systems have been developed to cope with these characteristics simultaneously and therefore both new and existing applications must be modified to execute efficiently. A promising scalable approach to deal with these intricacies is the design of self-managing or autonomic applications. Autonomic applications adapt their execution accordingly by considering knowledge about their own behaviour and environmental conditions. This paper focuses on the dynamic scheduling that provides the self-optimizing ability to autonomic applications. Being distributed, collaborative and pro-active, the proposed hierarchical scheduling infrastructure addresses important issues to enable an efficient execution in a computational grid. Unlike other approaches, the cooperative, hybrid and application-specific strategy deals effectively with task dependencies. Several experiments have been analysed in real grid environments highlighting the efficiency and scalability of the proposed infrastructure. This paper presents an intra site dynamic scheduling heuristic for tightly coupled parallel applications represented by DAGs.
Extending XACML Authorisation Model to Support Policy Obligations Handling in Distributed Applications
Yuri Demchenko, Cees de Laat, Oscar Koeroo and Hakon Sagehaug
Abstract: The paper summarises the recent and ongoing developments and discussions in the Grid security community to build an interoperable and scalable AuthZ infrastructure for distributed applications. The paper provides a short overview of the XACML policy format and the definition of policy obligations in the XACML specification. The paper analyses the basic use cases for obligations in computer Grids and on-demand network resource provisioning, abstracted to the general complex resource provisioning (CRP) model, to identify the major requirements and functionalities in obligations handling, which are then proposed as a Reference Model for Obligations Handling (OHRM). The paper refers to ongoing implementations of the obligations interoperability and handling framework in projects such as the EU-funded EGEE and Phosphorus. The proposed implementation is based on adopting and extending the OASIS SAML2.0 profile of the XACML specification, while defining a number of missing interface definitions and semantic conventions. The purpose of this paper is to facilitate wider discussion of the policy obligations concept based on the described ongoing implementations.
Heuristic for resources allocation on utility computing infrastructures
Joao Nuno Silva, Luis Veiga and Paulo Ferreira
Abstract: The use of on-demand utility computing infrastructures, such as Amazon's Elastic Clouds, is a viable way to speed up lengthy parallel computing problems for those who do not have access to other cluster or grid infrastructures. With suitable middleware, bag-of-tasks problems could be easily deployed over a pool of virtual computers created on such infrastructures. In bag-of-tasks problems, as there is no communication between tasks, the number of concurrent tasks can vary over time. In a utility computing infrastructure, if too many virtual computers are created, the speedups are high but may not be cost-effective; if too few computers are created, the cost is low but the speedups may be below expectations. Without previous knowledge of the processing time of each task it is difficult to define how many machines should be created. In this paper we present a heuristic to obtain the best number of machines that should be allocated to process tasks so that, for a given budget, the speedups are maximal. We simulated the proposed heuristic against real and theoretical workloads and evaluated the ratio between number of allocated hosts, charged times, speedups and processing times. With the proposed heuristic, it is possible to obtain speedups in line with the number of allocated computers, while being charged approximately the predefined budget.
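The trade-off described in the abstract above can be sketched in Python under strong simplifying assumptions. This is not the paper's heuristic, which works without prior knowledge of task times. The sketch assumes a known mean task time, whole-hour billing per host, and identical hosts; the function name and all numbers are hypothetical.

```python
import math
from typing import Optional, Tuple


def pick_host_count(num_tasks: int, mean_task_hours: float,
                    price_per_hour: float, budget: float
                    ) -> Optional[Tuple[float, float, int]]:
    """Return (makespan_hours, cost, hosts) for the fastest configuration
    that fits the budget, or None if no configuration fits.

    Assumptions (not from the paper): every task takes mean_task_hours,
    tasks are indivisible, and each host is billed in whole hours.
    """
    best = None
    for hosts in range(1, num_tasks + 1):
        rounds = math.ceil(num_tasks / hosts)       # tasks run per host, worst case
        makespan = rounds * mean_task_hours         # wall-clock time to finish
        cost = hosts * math.ceil(makespan) * price_per_hour
        if cost <= budget:
            candidate = (makespan, cost, hosts)
            # Tuple comparison: shortest makespan first, then lowest cost.
            if best is None or candidate < best:
                best = candidate
    return best


# Hypothetical workload: 40 half-hour tasks, 1 currency unit per host-hour.
print(pick_host_count(40, 0.5, 1, 24))   # -> (1.0, 20, 20)
print(pick_host_count(40, 0.5, 1, 12))   # -> None (budget below total work)
```

With a budget of 24, allocating 40 hosts would finish in half an hour but cost 40 because of whole-hour billing, so the sketch settles on 20 hosts. This is the "too many machines" case the abstract warns about.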
A Group Membership Service for Large-Scale Grids
Fernando Castor Filho, Fabio Kon, Raphael Y. de Camargo and Augusta Marques
Abstract: To avoid wasting grid resources, a grid infrastructure must be capable of consistently detecting failures of its participants as soon as possible. When coupled with a failure recovery mechanism, this capability can avoid general application failures and, in cases where a general failure is inevitable, allow a grid job to be restarted (or rescheduled) as soon as possible. In this paper, we propose a decentralized group membership service that can be incorporated into existing grid middleware to make it more reliable. This service includes a flexible failure detector that adapts dynamically to changing network conditions and can be configured with a number of failure recovery strategies. Moreover, it disseminates information about membership changes (new processes, failures, etc.) in a scalable and efficient manner. We conducted a preliminary evaluation of the proposed service by simulating a grid with up to 140 nodes distributed across three domains separated by a wide-area network. This evaluation showed that the proposed service performs well both in the absence and in the presence of process failures.
A Tool for Isolating Performance in General-Purpose Operating Systems
Valeria Reis and Renato Cerqueira
Abstract: General-purpose Operating Systems do not provide effective mechanisms for application processing reservation. For this reason, some initiatives aim at guaranteeing processing by instrumenting kernels or by isolating performance through the creation of virtual machines. As described in this paper, CPUReserve works differently from these approaches. It is a processing reservation system that runs at user level. Because CPUReserve has a client-server architecture and significant scalability, as suggested by the experiments carried out, it can be used in distributed and shared environments such as computational grids.
Using Clouds to address Grid Limitations
Giacomo Mc Evoy and Bruno Schulze
Abstract: This article reviews the importance of identifying the conceptual components participating in the design of Grid infrastructure, the interfaces presented to other elements and the semantics involved. We show that the middleware layer still exposes too much detail of the underlying implementation, making application development more complex and hindering interoperability and scaling. We also discuss Cloud Computing, an emerging technology that has so far been successful in the IT market, and show how Grids and Clouds are related and to what extent these technologies may provide features that will help accomplish the Grid vision for e-Science applications.
PastryGrid: Decentralisation of the execution of distributed applications in Desktop Grid
Heithem Abbes and Christophe Cerin
Abstract: This paper proposes a decentralised system for managing Desktop Grid (DG). The idea is to bypass the main drawback of existing systems, which place all control in a single master that can fail. Here, each node can play alternately the role of client or server. Our main contribution is the design of the PastryGrid protocol (based on Pastry) for DG in order to decentralise the execution of a distributed application with precedence between tasks. We evaluate our approach against a centralised system on 205 machines executing 2500 tasks. The results show that our decentralised system runs better than the same system configured as a master/slave because it incurs less overhead.
Cyclotron: A Secure, Isolated Virtual Cycle-Scavenging Grid in the Enterprise
Kevin Kane and Blair Dillaway
Abstract: Cycle-scavenging grids appeal to organizations with large numbers of workstations that remain idle outside of working hours as potential sources of grid computing cycles, but security and isolation issues that come with the use of non-dedicated resources have slowed their adoption in the enterprise. In this paper we present Cyclotron, a prototype cycle-stealing grid solution that leverages virtualization and a declarative policy-based access control infrastructure supporting flexible authorization rules and the constrained delegation of access rights to address these requirements.