Dissertations / Theses on the topic 'Distributed computing infrastructure'

Consult the top 45 dissertations / theses for your research on the topic 'Distributed computing infrastructure.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.

1

AlJabban, Tarek. "Distributed database storage management for a cloud computing infrastructure." Thesis, McGill University, 2013. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=114556.

Abstract:
Internet applications have recently witnessed tremendous growth in terms of both size and complexity. Cloud computing is one of the several distributed technologies that have emerged to help these applications meet their objectives of high availability, performance, and scalability. Platform as a Service (PaaS) is one kind of service provided by cloud solutions. These systems often follow a multi-tier architecture consisting mainly of a presentation tier, an application tier, and a database tier. The volumes of data exchanged between the application tier and the database tier become huge, especially for enterprise-level applications. As a result, the design of the database tier in cloud systems has to carefully address the scalability challenges arising from these huge data volumes. In this thesis, we propose a data distribution approach to improve the scalability of the database tier. Our approach is applied to a traditional single database server. It works by replacing the traditionally used single-machine storage paradigm with a distributed storage paradigm. The suggested approach maintains the features that originally exist in the database system, and additionally provides the features of distribution and replication. Distributing the data storage helps improve the system's fault tolerance, as it decreases the possibility of a failure at the database server. It also helps resolve specific performance issues, such as reducing I/O usage and consequently decreasing the possibility of an I/O bottleneck. Yet it produces other performance challenges that need to be addressed. To prove the feasibility of our proposed approach, we use it to implement two extensions to the storage manager module of the PostgreSQL database system, using the HDFS distributed file system and the HBase distributed key-value store.
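To make the storage-paradigm swap concrete, here is a minimal, hypothetical sketch (not code from the thesis; all names are invented): the database layers above program against fixed-size page reads and writes, so a single-machine backend can be exchanged for a distributed key-value backend such as HBase without changing them.

```python
# Hypothetical sketch of a pluggable storage-manager backend, in the spirit of
# the thesis: the page read/write interface stays fixed while the storage
# paradigm behind it changes from a single machine to a distributed store.
import os

PAGE_SIZE = 8192  # PostgreSQL-style fixed-size pages

class PageStore:
    """Interface the upper database layers program against."""
    def write_page(self, rel: str, page_no: int, data: bytes) -> None: ...
    def read_page(self, rel: str, page_no: int) -> bytes: ...

class LocalPageStore(PageStore):
    """Traditional paradigm: pages live in files on one machine."""
    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, rel: str) -> str:
        return os.path.join(self.root, rel)

    def write_page(self, rel, page_no, data):
        mode = "r+b" if os.path.exists(self._path(rel)) else "wb"
        with open(self._path(rel), mode) as f:
            f.seek(page_no * PAGE_SIZE)
            f.write(data.ljust(PAGE_SIZE, b"\0"))

    def read_page(self, rel, page_no):
        with open(self._path(rel), "rb") as f:
            f.seek(page_no * PAGE_SIZE)
            return f.read(PAGE_SIZE)

class DistributedPageStore(PageStore):
    """Distributed paradigm: the same pages, keyed into a replicated
    key-value store (an HBase or HDFS client would be plugged in here)."""
    def __init__(self, kv_client):
        self.kv = kv_client  # assumed to expose put(key, value) / get(key)

    def write_page(self, rel, page_no, data):
        self.kv.put(f"{rel}:{page_no}", data.ljust(PAGE_SIZE, b"\0"))

    def read_page(self, rel, page_no):
        return self.kv.get(f"{rel}:{page_no}")

# A dict stands in for the remote store so the sketch runs as-is.
class FakeKV(dict):
    put = dict.__setitem__
    get = dict.__getitem__

store: PageStore = DistributedPageStore(FakeKV())
store.write_page("accounts", 0, b"tuple data")
print(store.read_page("accounts", 0)[:10])
```

Because both backends satisfy the same interface, fault tolerance and replication can be obtained from the distributed store without touching the query-processing layers above, which is the design point the abstract argues for.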
2

LUCREZIA, FRANCESCO. "Network Infrastructures for Highly Distributed Cloud-Computing." Doctoral thesis, Politecnico di Torino, 2018. http://hdl.handle.net/11583/2706032.

Abstract:
Software-Defined Networking (SDN) is emerging as a solid opportunity for Network Service Providers (NSPs) to reduce costs while at the same time providing better and/or new services. The possibility to flexibly manage and configure highly available and scalable network services through data-model abstractions and easy-to-consume APIs is attractive, and the adoption of such technologies is gaining momentum. At the same time, NSPs are planning to innovate their infrastructures through a process of network softwarisation and programmability. The SDN paradigm aims at improving the design, configuration, maintenance and service-provisioning agility of the network through centralised software control. This is easily achievable in local area networks, typical of data centers, where the benefits of having programmable access to the entire network are not restricted by latency between the network devices and the SDN controller, which is reasonably located in the same LAN as the data-path nodes. In Wide Area Networks (WANs), instead, a centralised control plane limits the speed of reaction to time-constrained network events due to the unavoidable latencies caused by physical distances. Moreover, an end-to-end control shall involve the participation of multiple, domain-specific controllers: access devices, data-center fabrics and backbone networks have very different characteristics, and their control planes could hardly coexist in a single centralised entity, short of very complex solutions which inevitably lead to software bugs, inconsistent states and performance issues. In recent years, the idea of exploiting SDN in WAN infrastructures to connect multiple sites together has spread in both the scientific community and industry. The former has produced interesting results in terms of framework proposals, complexity and performance analysis for network resource allocation schemes, and open-source proof-of-concept prototypes targeting SDN architectures spanning multiple technological and administrative domains. On the other hand, much of this work remains confined to academia, mainly because it is based on pure OpenFlow prototype implementations, on networks emulated on a single general-purpose machine, or on simulations proving the effectiveness of algorithms. Industry has made SDN a reality via closed-source systems, running on single-administrative-domain networks with little if any diversification of access and backbone devices. In this dissertation we present our contributions to the design and implementation of SDN architectures for the control plane of WAN infrastructures. In particular, we studied and prototyped two SDN platforms to build a programmable, intent-based control plane suitable for today's highly distributed cloud infrastructures. Our main contributions are: (i) a holistic, architectural description of a distributed SDN control plane for end-to-end QoS provisioning; we compare the legacy IntServ RSVP protocol with a novel approach for prioritising application-sensitive flows via centralised vantage points, based on a peer-to-peer architecture and thus suitable for the inter-authoritative-domains scenario; (ii) an open-source platform based on a two-layer hierarchy of network controllers designed to provision end-to-end connectivity in real networks composed of heterogeneous devices and links within a single authoritative domain.
This platform has been integrated in CORD, an open-source project whose goal is to bring data-center economics and cloud agility to the NSP central-office infrastructure, combining NFV (Network Function Virtualization), SDN and the elasticity of commodity clouds. Our platform enables the provisioning of connectivity services between multiple CORD sites, up to the customer premises. Our system and software contributions in SDN have thus been combined with an NFV infrastructure for network service automation and orchestration.
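As an illustration of the two-layer controller hierarchy described above, here is a toy, hypothetical sketch (not the platform's actual code or API): a global controller splits an end-to-end connectivity intent into per-domain segments and delegates each segment to the domain controller that owns it.

```python
# Toy illustration (hypothetical names, not the platform's API) of a
# two-layer SDN control hierarchy: a parent controller splits an end-to-end
# connectivity intent into per-domain segments and hands each segment to the
# controller owning that domain (access, fabric, backbone, ...).
from dataclasses import dataclass

@dataclass
class Intent:
    src: str            # e.g. a customer endpoint
    dst: str            # e.g. a VM in a remote site
    bandwidth_mbps: int

class DomainController:
    def __init__(self, name, nodes):
        self.name, self.nodes = name, set(nodes)

    def program_segment(self, ingress, egress, intent):
        # A real controller would install flow rules here (e.g. via OpenFlow).
        print(f"[{self.name}] {ingress} -> {egress} @ {intent.bandwidth_mbps} Mb/s")

class GlobalController:
    """Parent layer: sees each domain as one node and stitches them together."""
    def __init__(self, domains, peering):
        self.domains = domains   # ordered list of DomainController
        self.peering = peering   # peering[i] = border node between domain i and i+1

    def submit(self, intent):
        ingress = intent.src
        for i, dom in enumerate(self.domains):
            egress = self.peering[i] if i < len(self.peering) else intent.dst
            dom.program_segment(ingress, egress, intent)
            ingress = egress

access = DomainController("access", ["customer-A", "olt1"])
fabric = DomainController("fabric", ["leaf1", "spine1"])
core = DomainController("backbone", ["bb1", "vm-42"])
GlobalController([access, fabric, core], ["olt1", "bb1"]).submit(
    Intent("customer-A", "vm-42", 100))
```

The split keeps latency-sensitive decisions inside each domain controller while the global layer only handles the slower, site-to-site stitching, which is the rationale the abstract gives for avoiding a single centralised WAN controller.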
3

Khan, Kashif. "A distributed computing architecture to enable advances in field operations and management of distributed infrastructure." Thesis, University of Manchester, 2012. https://www.research.manchester.ac.uk/portal/en/theses/a-distributed-computing-architecture-to-enable-advances-in-field-operations-and-management-of-distributed-infrastructure(a9181e99-adf3-47cb-93e1-89d267219e50).html.

Abstract:
Distributed infrastructures (e.g., water networks and electric grids) are difficult to manage due to their scale, lack of accessibility, complexity, ageing, and uncertainties in knowledge of their structure. In addition, they are subject to loads that can be highly variable and unpredictable, and to accidental events such as component failure, leakage and malicious tampering. To support in-field operations and central management of these infrastructures, the availability of consistent and up-to-date knowledge about the current state of the network, and about how it would respond to planned interventions, is argued to be highly desirable. However, at present, large-scale infrastructures are "data rich but knowledge poor". Data, algorithms and tools for network analysis are improving, but there is a need to integrate them to support engineering operations more directly. Current ICT solutions are mainly based on specialized, monolithic and heavyweight software packages that restrict the dissemination of dynamic information and its appropriate and timely presentation, particularly to field engineers who operate in resource-constrained and less reliable environments. This thesis proposes a solution to these problems by recognizing that current monolithic ICT solutions for infrastructure management seek to meet the requirements of different human roles and operating environments (defined in this work as field and central sides). It proposes an architectural approach to providing dynamic, predictive, user-centric, device- and platform-independent access to consistent and up-to-date knowledge. This architecture integrates the components required to implement the functionalities of data gathering, data storage, simulation modelling, and information visualization and analysis. These components are tightly coupled in current implementations of software for analysing the behaviour of networks. The architectural approach, by contrast, requires that they be kept as separate as possible and interact only when required, using common and standard protocols. The thesis concentrates particularly on engineering practices in clean-water distribution networks, but the methods are applicable to other structural networks, for example the electricity grid. A prototype implementation is provided that establishes a dynamic hydraulic simulation model and enables the model to be queried via remote access in a device- and platform-independent manner. The thesis provides an extensive evaluation comparing the architecture-driven approach with current approaches, to substantiate the above claims. This evaluation is conducted using benchmarks that are currently published and accepted in the water engineering community. To facilitate it, a working prototype of the whole architecture has been developed and is made available under an open source licence.
4

Peters, Stephen Leslie. "Hyperglue : an infrastructure for human-centered computing in distributed, pervasive, intelligent environments." Thesis, Massachusetts Institute of Technology, 2006. http://hdl.handle.net/1721.1/35594.

Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006.
Includes bibliographical references (p. 161-165).
As intelligent environments (IEs) move from simple kiosks and meeting rooms into the everyday offices, kitchens, and living spaces we use, the need for these spaces to communicate not only with users, but also with each other, will become increasingly important. Users will want to be able to shift their work environment between localities easily, and will also need to communicate with others as they move about. These IEs will thus require two pieces of infrastructure: a knowledge representation (KR) which can keep track of people and their relationships to the world; and a communication mechanism so that the IE can mediate interactions. This thesis seeks to define, explore and evaluate one way of creating this infrastructure, by creating societies of agents that can act on behalf of real-world entities such as users, physical spaces, or informal groups of people. Just as users interact with each other and with objects in their physical location, the agent societies interact with each other along communication channels organized along these same relationships. By organizing the infrastructure through analogies to the real world, we hope to achieve a simpler conceptual model for the users, as well as a communication hierarchy which can be realized efficiently.
by Stephen L. Peters.
Ph.D.
5

Bianchi, Stefano. "Design and Implementation of a Cloud Infrastructure for Distributed Scientific Calculation." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2016.

Abstract:
Cloud computing enables independent end users and applications to share data and pooled resources, possibly located in geographically distributed Data Centers, in a fully transparent way. This need is particularly felt by scientific applications, which must exploit distributed resources in an efficient and scalable way to process large amounts of data. This work proposes an open solution for deploying a Platform as a Service (PaaS) over a set of multi-site data centers, applying open-source virtualization tools to facilitate operation among virtual machines while optimizing the usage of distributed resources. An experimental testbed is set up in an OpenStack environment, and evaluations with different types of TCP sample connections are carried out to demonstrate the functionality of the proposed solution and to obtain throughput measurements in relation to relevant design parameters.
6

Mechtri, Marouen. "Virtual networked infrastructure provisioning in distributed cloud environments." Thesis, Evry, Institut national des télécommunications, 2014. http://www.theses.fr/2014TELE0028/document.

Abstract:
Cloud computing emerged as a new paradigm for on-demand provisioning of IT resources and for infrastructure externalization, and is rapidly and fundamentally revolutionizing the way IT is delivered and managed. The resulting incremental Cloud adoption is fostering, to some extent, cooperation among cloud providers, and it is increasing the needs of tenants and the complexity of their demands. Tenants need to network their distributed and geographically spread cloud resources and services. They also want to easily accomplish their deployments and instantiations across heterogeneous cloud platforms. Traditional cloud providers focus on compute resource provisioning and offer mostly virtual machines to tenants and cloud service consumers, who actually expect full-fledged (complete) networking of their virtual and dedicated resources. They want not only to control and manage their applications but also to control connectivity, so as to easily deploy complex network functions and services in their dedicated virtual infrastructures. The needs of users are thus growing beyond the simple provisioning of virtual machines to the acquisition of complex, flexible, elastic and intelligent virtual resources and services. The goal of this thesis is to enable the provisioning and instantiation of this type of more complex resource while empowering tenants with control and management capabilities, and to enable the convergence of cloud and network services. To reach these goals, the thesis proposes mapping algorithms for optimized in-data-center and in-network resource hosting according to the tenants' virtual infrastructure requests. In parallel with the emergence of cloud services, traditional networks are being extended and enhanced with software networks relying on the virtualization of network resources and functions. Software Defined Networks are especially relevant as they decouple network control and data forwarding and provide the needed network programmability and system and network management capabilities. In such a context, the first part proposes optimal (exact) and heuristic placement algorithms to find the best mapping between the tenants' requests and the hosting infrastructures while respecting the objectives expressed in the demands. This includes localization constraints to place some of the virtual resources and services in the same host and to distribute other resources across distinct hosts. The proposed algorithms achieve simultaneous node (host) and link (connection) mappings. A heuristic algorithm is proposed to address the poor scalability and high complexity of the exact solution(s). The heuristic scales much better and is several orders of magnitude more efficient in terms of convergence time towards near-optimal and optimal solutions. This is achieved by reducing the complexity of the mapping process, using topological patterns to map virtual graphs onto physical graphs, representing respectively the tenants' requests and the providers' physical infrastructures. The proposed approach relies on graph decomposition into topology patterns and on bipartite graph matching techniques. The third part proposes an open-source Cloud Networking framework to achieve cloud and network resource provisioning and instantiation, in order to respectively host and activate the tenants' virtual resources and services. This framework enables and facilitates dynamic networking of distributed cloud services and applications.
This solution relies on a Cloud Network Gateway Manager (CNG-Manager) and software gateways to establish dynamic, on-demand connectivity between cloud and network resources. The CNG-Manager provides the application networking control and supports the deployment of the needed underlying network functions in the tenant's desired infrastructure (or slice, since the physical infrastructure is shared by multiple tenants, each receiving a dedicated and isolated portion/share of the physical resources).
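To give a flavour of the node-mapping subproblem with the localization constraints mentioned above, here is a hypothetical greedy sketch (illustrative only; the thesis proposes exact and pattern-based algorithms that also map links simultaneously): virtual nodes are placed on physical hosts while honouring same-host and distinct-host groups.

```python
# Hypothetical greedy sketch of the node-mapping step: place virtual nodes on
# physical hosts while honouring the two localization constraints named in
# the abstract (same-host groups and distinct-host groups). Link mapping and
# the exact/pattern-based heuristics of the thesis are not shown.
def map_nodes(vnodes, hosts, colocate, anti_colocate):
    """vnodes: {vnode: cpu_demand}; hosts: {host: cpu_free};
    colocate: groups that must share a host; anti_colocate: must not."""
    placement = {}
    groups = {v: i for i, grp in enumerate(colocate) for v in grp}
    for v, demand in sorted(vnodes.items(), key=lambda kv: -kv[1]):
        for h in sorted(hosts, key=lambda x: -hosts[x]):
            if hosts[h] < demand:
                continue
            # same-host constraint: stay on the host the group already uses
            g = groups.get(v)
            if g is not None and any(
                    groups.get(u) == g and placement.get(u) not in (None, h)
                    for u in vnodes):
                continue
            # distinct-host constraint: avoid hosts used by the same group
            if any(placement.get(u) == h
                   for grp in anti_colocate if v in grp for u in grp):
                continue
            placement[v], hosts[h] = h, hosts[h] - demand
            break
        else:
            return None  # request rejected: no feasible host for v
    return placement

print(map_nodes({"fw": 2, "web": 4, "db": 4},
                {"h1": 8, "h2": 6},
                colocate=[("web", "fw")],
                anti_colocate=[("web", "db")]))
# -> fw and web share h1, db lands on h2
```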
7

Svärd, Petter. "Dynamic Cloud Resource Management : Scheduling, Migration and Server Disaggregation." Doctoral thesis, Umeå universitet, Institutionen för datavetenskap, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-87904.

Abstract:
A key aspect of cloud computing is the promise of infinite, scalable resources: cloud services should scale up and down on demand. This thesis investigates methods for dynamic resource allocation and management of services in cloud datacenters, introducing new approaches as well as improvements to established technologies. Virtualization is a key technology for cloud computing, as it allows several operating system instances to run on the same Physical Machine (PM), and cloud services normally consist of a number of Virtual Machines (VMs) that are hosted on PMs. In this thesis, a novel virtualization approach is presented. Instead of running each PM in isolation, resources from multiple PMs in the datacenter are disaggregated and exposed to the VMs as pools of CPU, I/O and memory resources. VMs are provisioned by using the right amount of resources from each pool, thereby enabling both larger VMs than any single PM can host and VMs with tailor-made specifications for their application. Another important aspect of virtualization is live migration of VMs, the concept of moving VMs between PMs without interruption in service. Live migration allows for better PM utilization and is also useful for administrative purposes. In the thesis, two improvements to the standard live migration algorithm are presented: delta compression and page transfer reordering. The improvements can reduce migration downtime, i.e., the time that the VM is unavailable, as well as the total migration time. Postcopy migration, where the VM is resumed on the destination before the memory content is transferred, is also studied. Both userspace and in-kernel postcopy algorithms are evaluated in an in-depth study of live migration principles and performance. Efficient mapping of VMs onto PMs is a key problem for cloud providers, as PM utilization directly impacts revenue. When services are accepted into a datacenter, a decision is made on which PM should host the service VMs. This thesis presents a general approach to service scheduling that allows the same scheduling software to be used across multiple cloud architectures. A number of scheduling algorithms to optimize objectives like revenue or utilization are also studied. Finally, an approach for continuous datacenter consolidation is presented. As VM workloads fluctuate and server availability varies, any initial mapping is bound to become suboptimal over time. The continuous datacenter consolidation approach adjusts the VM-to-PM mapping during operation based on combinations of management actions, such as suspending/resuming PMs, live migrating VMs, and suspending/resuming VMs. Proof-of-concept software and a set of algorithms that allow cloud providers to continuously optimize their server resources are presented in the thesis.
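The delta-compression improvement mentioned above lends itself to a compact illustration. The sketch below is an assumption of ours, not the thesis implementation: a dirty memory page is resent as a compressed XOR difference against the copy the destination already holds, so when only a few bytes changed, the delta is a tiny fraction of the page.

```python
# Sketch of delta-compressed page transfer for live migration (illustrative
# assumptions, not the thesis code): instead of resending a dirty 4 KiB page
# in full, send the compressed XOR difference against the version the
# destination already holds. Long runs of zeros in the XOR compress extremely
# well when only a few bytes of the page changed between transfer rounds.
import zlib

PAGE = 4096

def delta_encode(old: bytes, new: bytes) -> bytes:
    xor = bytes(a ^ b for a, b in zip(old, new))
    return zlib.compress(xor)

def delta_decode(old: bytes, delta: bytes) -> bytes:
    xor = zlib.decompress(delta)
    return bytes(a ^ b for a, b in zip(old, xor))

old = bytearray(PAGE)          # page content as sent in an earlier round
new = bytearray(old)
new[100:108] = b"dirtied!"     # the guest touched 8 bytes since then

delta = delta_encode(bytes(old), bytes(new))
assert delta_decode(bytes(old), delta) == bytes(new)
print(f"full page: {PAGE} B, delta: {len(delta)} B")  # delta is tens of bytes
```

Sending the small delta instead of the whole page is what shrinks both downtime and total migration time when the VM keeps dirtying the same working set.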
8

Mechtri, Marouen. "Virtual networked infrastructure provisioning in distributed cloud environments." Electronic Thesis or Diss., Evry, Institut national des télécommunications, 2014. http://www.theses.fr/2014TELE0028.

Abstract:
Cloud computing emerged as a new paradigm for on-demand provisioning of IT resources and for infrastructure externalization, and is rapidly and fundamentally revolutionizing the way IT is delivered and managed. The resulting incremental Cloud adoption is fostering, to some extent, cooperation among cloud providers, and it is increasing the needs of tenants and the complexity of their demands. Tenants need to network their distributed and geographically spread cloud resources and services. They also want to easily accomplish their deployments and instantiations across heterogeneous cloud platforms. Traditional cloud providers focus on compute resource provisioning and offer mostly virtual machines to tenants and cloud service consumers, who actually expect full-fledged (complete) networking of their virtual and dedicated resources. They want not only to control and manage their applications but also to control connectivity, so as to easily deploy complex network functions and services in their dedicated virtual infrastructures. The needs of users are thus growing beyond the simple provisioning of virtual machines to the acquisition of complex, flexible, elastic and intelligent virtual resources and services. The goal of this thesis is to enable the provisioning and instantiation of this type of more complex resource while empowering tenants with control and management capabilities, and to enable the convergence of cloud and network services. To reach these goals, the thesis proposes mapping algorithms for optimized in-data-center and in-network resource hosting according to the tenants' virtual infrastructure requests. In parallel with the emergence of cloud services, traditional networks are being extended and enhanced with software networks relying on the virtualization of network resources and functions. Software Defined Networks are especially relevant as they decouple network control and data forwarding and provide the needed network programmability and system and network management capabilities. In such a context, the first part proposes optimal (exact) and heuristic placement algorithms to find the best mapping between the tenants' requests and the hosting infrastructures while respecting the objectives expressed in the demands. This includes localization constraints to place some of the virtual resources and services in the same host and to distribute other resources across distinct hosts. The proposed algorithms achieve simultaneous node (host) and link (connection) mappings. A heuristic algorithm is proposed to address the poor scalability and high complexity of the exact solution(s). The heuristic scales much better and is several orders of magnitude more efficient in terms of convergence time towards near-optimal and optimal solutions. This is achieved by reducing the complexity of the mapping process, using topological patterns to map virtual graphs onto physical graphs, representing respectively the tenants' requests and the providers' physical infrastructures. The proposed approach relies on graph decomposition into topology patterns and on bipartite graph matching techniques. The third part proposes an open-source Cloud Networking framework to achieve cloud and network resource provisioning and instantiation, in order to respectively host and activate the tenants' virtual resources and services. This framework enables and facilitates dynamic networking of distributed cloud services and applications.
This solution relies on a Cloud Network Gateway Manager (CNG-Manager) and software gateways to establish dynamic, on-demand connectivity between cloud and network resources. The CNG-Manager provides the application networking control and supports the deployment of the needed underlying network functions in the tenant's desired infrastructure (or slice, since the physical infrastructure is shared by multiple tenants, each receiving a dedicated and isolated portion/share of the physical resources).
9

Rojas, Balderrama Javier. "Gestion du cycle de vie de services déployés sur une infrastructure de calcul distribuée en neuroinformatique." Phd thesis, Université de Nice Sophia-Antipolis, 2012. http://tel.archives-ouvertes.fr/tel-00804893.

Abstract:
Interest is growing among scientific communities in the sharing of data and applications that facilitate research and the establishment of fruitful collaborations. Interdisciplinary domains such as the neurosciences particularly need sufficient computing power for large-scale experimentation. Despite the progress made in deploying such distributed infrastructures, many challenges concerning interoperability and scalability remain unsolved. The permanent evolution of technologies, the intrinsic complexity of production environments, and their limited reliability at runtime are all penalizing factors. This work addresses the modelling and implementation of a service-oriented environment that enables the execution of scientific applications on distributed computing infrastructures, exploiting their high-throughput computing capacity. The model comprises a specification for describing command-line interfaces; a bridge between service-oriented architectures and global computing; and the efficient use of local and remote resources for scalability. A reference implementation demonstrates the feasibility of this approach. Its relevance is illustrated in the context of two research projects driven by large-scale experimental campaigns performed on distributed resources. The environment developed replaces existing systems whose concerns often focus on execution alone: it manages legacy code as services, taking their entire life cycle into account. Moreover, the service-oriented approach aids the design of scientific workflows, which are used as a flexible means of describing applications composed of multiple services. The proposed approach is evaluated both qualitatively and quantitatively using real neuroimage analysis applications. The qualitative experiments are based on optimizing the specificity and sensitivity of the brain segmentation tools used to process Magnetic Resonance Images of patients with multiple sclerosis. The quantitative experiments address the speedup and latency measured during the execution of longitudinal studies of cerebral atrophy in patients affected by Alzheimer's disease.
10

Suthakar, Uthayanath. "A scalable data store and analytic platform for real-time monitoring of data-intensive scientific infrastructure." Thesis, Brunel University, 2017. http://bura.brunel.ac.uk/handle/2438/15788.

Abstract:
Real-time monitoring of data-intensive scientific infrastructures (jobs, data transfers, hardware failures) is vital for efficient operation. Due to the high volume and velocity of the events produced, traditional methods are no longer optimal. Several techniques, as well as enabling architectures, are available to address the Big Data problem. In this respect, this thesis complements existing survey work by contributing an extensive literature review of both traditional and emerging Big Data architectures. Scalability, low latency, fault tolerance, and intelligence are key challenges for traditional architectures, whereas Big Data technologies and approaches have become increasingly popular for use cases that demand scalable, data-intensive (parallel) processing, fault tolerance (through data replication), and support for low-latency computations. In the context of a scalable data store and analytics platform for monitoring data-intensive scientific infrastructure, the Lambda Architecture was adapted and evaluated on the Worldwide LHC Computing Grid, where it proved effective, especially for computationally and data-intensive use cases. In this thesis, an efficient strategy for the collection and storage of large volumes of data for computation is presented. Moving the transformation logic out of the data pipeline and into the analytics layers simplifies the architecture and the overall process: processing time is reduced, untampered raw data are kept at the storage level for fault tolerance, and the required transformations can be performed when needed. An optimised Lambda Architecture (OLA) is presented, which models an efficient way of joining the batch layer and the streaming layer with minimal code duplication in order to support scalability, low latency, and fault tolerance. Several models were evaluated: a pure streaming layer, a pure batch layer, and the combination of the two. Experimental results demonstrate that the OLA performed better than both the traditional architecture and the standard Lambda Architecture. The OLA was further enhanced with an intelligence layer for predicting data access patterns. This layer actively adapts and updates the model built by the batch layer, which eliminates re-training time while providing a high level of accuracy using Deep Learning techniques. The fundamental contribution to knowledge is a scalable, low-latency, fault-tolerant and intelligent architecture, built on heterogeneous technologies, for monitoring a data-intensive scientific infrastructure that can benefit from Big Data technologies and approaches.
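The batch/speed split at the heart of the Lambda Architecture can be shown in a few lines. The hypothetical sketch below (our illustration, not the thesis' OLA code) recomputes a complete batch view over immutable raw events, keeps a cheap speed view for events newer than the last batch run, and merges the two at query time.

```python
# Minimal sketch of the Lambda idea the thesis optimises (hypothetical code):
# raw events are kept immutable; a batch view is recomputed from all raw
# data, a speed view covers only events newer than the last batch run, and
# queries merge the two. The thesis' OLA goes further by sharing logic
# between the layers and adding a predictive intelligence layer on top.
from collections import Counter

raw_events = []  # immutable master dataset: (timestamp, metric) tuples

def batch_view(upto_ts):
    """Slow, complete recomputation over all raw data up to a timestamp."""
    return Counter(m for ts, m in raw_events if ts <= upto_ts), upto_ts

def speed_view(since_ts):
    """Cheap incremental view over events the batch has not absorbed yet."""
    return Counter(m for ts, m in raw_events if ts > since_ts)

def query(metric):
    batch, ts = BATCH            # serving layer: precomputed batch view
    return batch[metric] + speed_view(ts)[metric]

raw_events += [(1, "job_failed"), (2, "job_failed"), (3, "transfer_ok")]
BATCH = batch_view(upto_ts=2)    # batch layer last ran at t=2
raw_events.append((4, "job_failed"))
print(query("job_failed"))       # 3 = 2 from batch view + 1 from speed view
```

Keeping the raw events untouched is what gives the fault tolerance the abstract mentions: any view can always be rebuilt from scratch.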
11

Lakew, Ewnetu Bayuh. "Autonomous cloud resource provisioning : accounting, allocation, and performance control." Doctoral thesis, Umeå universitet, Institutionen för datavetenskap, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-107955.

Abstract:
The emergence of large-scale Internet services, coupled with the evolution of computing technologies such as distributed systems, parallel computing, utility computing, grids, and virtualization, has fueled a movement toward a new resource provisioning paradigm called cloud computing. The main appeal of cloud computing lies in its ability to provide a shared pool of infinitely scalable computing resources for cloud services, which can be quickly provisioned and released on demand with minimal effort. The rapidly growing interest in cloud computing from both the public sector and industry, together with the rapid expansion in scale and complexity of cloud computing resources and the services hosted on them, has made monitoring, controlling, and provisioning cloud computing resources at runtime a very challenging and complex task. This thesis investigates algorithms, models and techniques for autonomously monitoring, controlling, and provisioning the various resources required to meet services' performance requirements, and for accounting for their resource usage. Quota management mechanisms are essential for controlling distributed shared resources so that services do not exceed their allocated or paid-for budget. Appropriate cloud-wide monitoring and control of quotas must be exercised to avoid over- or under-provisioning of resources. To this end, this thesis presents new distributed algorithms that efficiently manage quotas for services running across distributed nodes. Determining the optimal amount of resources needed to meet services' performance requirements is a key task in cloud computing. However, this task is extremely challenging due to multi-faceted issues such as the dynamic nature of cloud environments, the need to support heterogeneous services with different performance requirements, the unpredictable nature of services' workloads, the non-triviality of mapping performance measurements into resources, and resource shortages. Models and techniques are proposed that can predict, at runtime, the optimal amount of resources needed to meet service performance requirements irrespective of variations in workloads. Moreover, different service differentiation schemes are proposed for managing temporary resource shortages due to, e.g., flash crowds or hardware failures. In addition, the resources used by services must be accounted for in order to properly bill customers. Thus, monitoring data for running services should be collected and aggregated to maintain a single global state of the system that can be used to generate a single bill for each customer. However, collecting and aggregating such data across geographically distributed locations is challenging, because the management task itself may consume significant computing and network resources unless done with care. A consistency and synchronization mechanism that alleviates this task is proposed.
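The quota-management problem described above can be illustrated with a simple token-granting scheme. The sketch below is a hypothetical simplification (not the thesis' algorithms): each node enforces admission against a locally held slice of the service's budget and contacts the coordinator only when its slice is exhausted, so the global quota is respected without a round-trip per request.

```python
# Hypothetical sketch of distributed quota enforcement: a service's paid-for
# budget is split into local slices so each node can admit requests without
# a round-trip, and a slice is refilled from the coordinator only when it
# runs dry. The thesis' algorithms are more refined; this only shows why
# local enforcement plus occasional synchronisation beats per-request
# central checks.
class QuotaCoordinator:
    def __init__(self, total):
        self.remaining = total

    def refill(self, want):
        grant = min(want, self.remaining)
        self.remaining -= grant
        return grant

class NodeQuota:
    def __init__(self, coordinator, slice_size):
        self.coord, self.slice_size, self.local = coordinator, slice_size, 0

    def admit(self, cost):
        if self.local < cost:                                  # slice exhausted:
            self.local += self.coord.refill(self.slice_size)   # one sync round-trip
        if self.local >= cost:
            self.local -= cost
            return True
        return False                                           # global budget exceeded

coord = QuotaCoordinator(total=100)
nodes = [NodeQuota(coord, slice_size=40) for _ in range(3)]
admitted = sum(n.admit(1) for n in nodes for _ in range(40))
print(admitted)  # 100: exactly the global quota, enforced mostly locally
```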
12

Ambroz, Luca. "Evaluation of a cloud infrastructure for the CMS distributed data analysis in the top quark sector at the LHC." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2014. http://amslaurea.unibo.it/7703/.

Abstract:
In particle physics, performing data analysis requires large computing and storage capacity. The LHC Computing Grid is a globe-spanning computing infrastructure and, at the same time, a set of services developed by a large community of physicists and computer scientists, distributed across computing centres all over the world. This infrastructure proved its value in the analysis of the data collected during Run-1 of the LHC, playing a fundamental role in the discovery of the Higgs boson. Today, Cloud computing is emerging as a new paradigm for accessing large amounts of resources shared by numerous scientific communities. Given the technical requirements of LHC Run-2 (and beyond), the scientific community is interested in contributing to the development of Cloud technologies and in verifying whether they can provide a complementary approach, or even a valid alternative, to existing technological solutions. The purpose of this thesis is to test a Cloud infrastructure and compare its performance to that of the LHC Computing Grid. Chapter 1 gives a general account of the Standard Model. Chapter 2 describes the LHC accelerator and the experiments operating at it, with particular attention to the CMS experiment. Chapter 3 deals with computing in high-energy physics and examines the Grid and Cloud paradigms. Chapter 4, the last of this work, reports the results of my comparative analysis of Grid and Cloud performance.
13

Peiro, Sajjad Hooman. "Towards Unifying Stream Processing over Central and Near-the-Edge Data Centers." Licentiate thesis, KTH, Programvaruteknik och Datorsystem, SCS, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-193582.

Abstract:
In this thesis, our goal is to enable and achieve effective and efficient real-time stream processing in a geo-distributed infrastructure by combining the power of central data centers and micro data centers. Our research focus is to address the challenges of distributing stream processing applications and placing them closer to data sources and sinks. We enable applications to run in a geo-distributed setting and provide solutions for the network-aware placement of distributed stream processing applications across geo-distributed infrastructures. First, we evaluate Apache Storm, a widely used open-source distributed stream processing system, in the community network Cloud, as an example of a geo-distributed infrastructure. Our evaluation exposes new requirements for stream processing systems to function in a geo-distributed infrastructure. Second, we propose a solution to facilitate the optimal placement of stream processing components on geo-distributed infrastructures. We present a novel method for partitioning a geo-distributed infrastructure into a set of computing clusters, each called a micro data center. According to our results, we can increase the minimum available bandwidth in the network and, likewise, reduce the average latency to less than 50%. Next, we propose a parallel and distributed graph partitioner, called HoVerCut, for fast partitioning of streaming graphs. Since a lot of data can be represented in the form of graphs, graph partitioning can be used to assign graph elements to different data centers to provide data locality for efficient processing. Finally, we provide an approach, called SpanEdge, that enables stream processing systems to work on a geo-distributed infrastructure. SpanEdge unifies stream processing over central and near-the-edge data centers (micro data centers). As a proof of concept, we implement SpanEdge by extending Apache Storm, enabling it to run across multiple data centers.
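Streaming (one-pass) graph partitioning of the kind HoVerCut targets can be sketched with a simple greedy vertex-cut rule. The code below is an illustrative assumption of ours, not HoVerCut itself: each incoming edge is assigned to a partition that already hosts its endpoints where possible, with load as the tie-breaker, and the average number of partitions per vertex (the replication factor) measures partition quality.

```python
# Illustrative greedy vertex-cut partitioner (assumed logic, not HoVerCut
# itself): edges stream in one at a time and are assigned to the partition
# that already hosts the edge's endpoints where possible, with load as the
# tie-breaker. Replicas arise when a vertex's edges land on several
# partitions; good heuristics keep the replication factor low.
from collections import defaultdict

def partition_stream(edges, k):
    load = [0] * k
    replicas = defaultdict(set)          # vertex -> partitions holding it
    assignment = {}
    for u, v in edges:
        candidates = (replicas[u] & replicas[v]
                      or replicas[u] | replicas[v]
                      or set(range(k)))
        p = min(candidates, key=lambda i: load[i])
        assignment[(u, v)] = p
        load[p] += 1
        replicas[u].add(p)
        replicas[v].add(p)
    return assignment, replicas

edges = [(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (2, 4)]
assignment, replicas = partition_stream(edges, k=2)
print(assignment)
rf = sum(len(ps) for ps in replicas.values()) / len(replicas)
print(f"replication factor: {rf:.2f}")  # 1.0 would mean no vertex is cut
```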


14

Toor, Salman Zubair. "Managing Applications and Data in Distributed Computing Infrastructures." Doctoral thesis, Uppsala universitet, Avdelningen för beräkningsvetenskap, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-173467.

Abstract:
During the last decades, the demand for large-scale computational and storage resources in science has increased dramatically. New computational infrastructures enable scientists to enter a new mode of science, e-science, which complements traditional theory and experiment. E-science is inherently interdisciplinary, involving researchers from several disciplines, and also opens up large-scale collaborative efforts in which physically distributed groups of scientists share software tools and data to make scientific progress. Within the field of e-science, new challenges are emerging in managing large-scale distributed computing efforts and distributed data sets. Different models, e.g. grids and clouds, have been introduced over the years, but new solutions built on these models are needed to enable easy and flexible use of distributed computing infrastructures by application scientists. In the first part of the thesis, application execution environments are studied. The goal is to hide technical details of the underlying distributed computing infrastructure and expose secure and user-friendly environments to end users. First, a general-purpose solution using portal technology is described, enabling transparent and easy usage of a variety of grid systems. Then a problem-solving environment for genetic analysis is presented, in which the statistical software R is used as a workflow engine, enhanced with grid-enabled routines for performing the computationally demanding parts of the analysis. Finally, the issue of resource allocation in grid systems is briefly studied, and certain modifications to the distributed resource-brokering model of the ARC middleware are proposed. The second part of the thesis presents solutions for managing and analyzing scientific data using distributed storage resources. First, a new reliable and secure file-oriented distributed storage system, Chelonia, is presented. The architectural design of the system is described and implementation issues are considered. The stability and scalable performance of Chelonia are also verified using several test scenarios. Then, tools for providing an efficient and easy-to-use platform for data analysis built on Chelonia are presented. Here, a database-driven approach is explored: an extended architecture where Chelonia is combined with the Web-Service MEDiator (WSMED) system is implemented, providing web service tools to query data without any further programming. This approach is then developed further, and Chelonia is combined with SciSPARQL, a query language that extends SPARQL to queries over numeric scientific data. This results in a system that is capable of interactive analysis of distributed data sets; advanced application-specific analysis requirements can be fulfilled by writing customized modules in Java, Python or C. The viability of the approach is demonstrated by applying the system to data produced by URDME, a computational environment in systems biology, and results for sample queries expressed in SciSPARQL are presented. Finally, the use of an open-source storage cloud, OpenStack Swift, for the analysis of data from CERN experiments is considered. Here, a pilot implementation for the ROOT data analysis framework is presented together with a performance evaluation.
eSSENCE
15

Toor, Salman. "Managing applications and data in distributed computing infrastructures." Licentiate thesis, Uppsala universitet, Avdelningen för teknisk databehandling, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-121099.

Abstract:
Over the last few decades, the needs of collaborative, distributed scientific communities for computational power and data storage have increased very rapidly. Distributed computing infrastructures such as computing and storage grids provide means to connect geographically distributed resources and help address the needs of these communities. Much progress has been made in developing and operating grids, but several issues still need further attention. This thesis discusses three different aspects of managing large-scale scientific applications in grids:
• Using large-scale scientific applications is often in itself a complex task, and setting them up and running experiments in a distributed environment adds another level of complexity. It is important to design general-purpose and application-specific frameworks that enhance the overall productivity of the scientists. The thesis presents further development of a general-purpose framework in which existing portal technology is combined with tools for robust and middleware-independent job management. A pilot implementation of a domain-specific problem-solving environment based on a grid-enabled R solution is also presented.
• Many current and future applications will need large-scale storage systems. Centralized systems are ultimately not scalable enough to handle huge data volumes, and can also have problems with security and availability. An alternative is a reliable and efficient distributed storage system. The thesis describes the architecture of a self-healing, grid-aware distributed storage cloud, Chelonia, and presents performance results for a pilot implementation.
• In a distributed computing infrastructure it is very important to manage and utilize the available resources efficiently. The thesis reviews different resource-brokering techniques and how they are implemented in different production-level middlewares. A modified resource allocation model for the Advanced Resource Connector (ARC) middleware is also described, and performance experiments are presented.
eSSENCE
16

Espling, Daniel. "Enabling Technologies for Management of Distributed Computing Infrastructures." Doctoral thesis, Umeå universitet, Institutionen för datavetenskap, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-80129.

Abstract:
Computing infrastructures offer remote access to computing power that can be employed, e.g., to solve complex mathematical problems or to host computational services that need to be online and accessible at all times. From the perspective of the infrastructure provider, large amounts of distributed and often heterogeneous computer resources need to be united into a coherent platform that is then made accessible to and usable by potential users. Grid computing and cloud computing are two paradigms that can be used to form such unified computational infrastructures. Resources from several independent infrastructure providers can be joined to form large-scale decentralized infrastructures. The primary advantage of doing this is that it increases the scale of the available resources, making it possible to address more complex problems or to run a greater number of services on the infrastructures. In addition, there are advantages in terms of factors such as fault-tolerance and geographical dispersion. Such multi-domain infrastructures require sophisticated management processes to mitigate the complications of executing computations and services across resources from different administrative domains. This thesis contributes to the development of management processes for distributed infrastructures that are designed to support multi-domain environments. It describes investigations into how fundamental management processes such as scheduling and accounting are affected by the barriers imposed by multi-domain deployments, which include technical heterogeneity, decentralized and (domain-wise) self-centric decision making, and a lack of information on the state and availability of remote resources. Four enabling technologies or approaches are explored and developed within this work: (I) The use of explicit definitions of cloud service structure as inputs for placement and management processes to ensure that the resulting placements respect the internal relationships between different service components and any relevant constraints. (II) Technology for the runtime adaptation of Virtual Machines to enable the automatic adaptation of cloud service contexts in response to changes in their environment caused by, e.g., service migration across domains. (III) Systems for managing meta-data relating to resource usage in multi-domain grid computing and cloud computing infrastructures. (IV) A global fairshare prioritization mechanism that enables computational jobs to be consistently prioritized across a federation of several decentralized grid installations. Each of these technologies will facilitate the emergence of decentralized computational infrastructures capable of utilizing resources from diverse infrastructure providers in an automatic and seamless manner.

Note that the author changed surname from Henriksson to Espling in 2011

17

Chuchuk, Olga. "Optimisation de l'accès aux données au CERN et dans la Grille de calcul mondiale pour le LHC (WLCG)." Electronic Thesis or Diss., Université Côte d'Azur, 2024. http://www.theses.fr/2024COAZ4005.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
The Worldwide LHC Computing Grid (WLCG) offers an extensive distributed computing infrastructure dedicated to the scientific community involved with CERN's Large Hadron Collider (LHC). With storage that totals roughly an exabyte, the WLCG addresses the data processing and storage requirements of thousands of international scientists. As the High-Luminosity LHC phase approaches, the volume of data to be analysed will increase steeply, outpacing the expected gain through the advancement of storage technology. Therefore, new approaches to effective data access and management, such as caches, become essential. This thesis delves into a comprehensive exploration of storage access within the WLCG, aiming to enhance the aggregate science throughput while limiting the cost. Central to this research is the analysis of real file access logs sourced from the WLCG monitoring system, highlighting genuine usage patterns.
In a scientific setting, caching has profound implications. Unlike more commercial applications such as video streaming, scientific data caches deal with varying file sizes, from a mere few bytes to multiple terabytes. Moreover, the inherent logical associations between files considerably influence user access patterns. Traditional caching research has predominantly revolved around uniform file sizes and independent reference models. Contrarily, scientific workloads encounter variances in file sizes, and logical file interconnections significantly influence user access patterns.
My investigations show how LHC's hierarchical data organization, particularly its compartmentalization into datasets, impacts request patterns. Recognizing the opportunity, I introduce innovative caching policies that emphasize dataset-specific knowledge, and compare their effectiveness with traditional file-centric strategies. Furthermore, my findings underscore the "delayed hits" phenomenon triggered by limited connectivity between computing and storage locales, shedding light on its potential repercussions for caching efficiency.
Acknowledging the long-standing challenge of predicting Data Popularity in the High Energy Physics (HEP) community, especially with the upcoming HL-LHC era's storage conundrums, my research integrates Machine Learning (ML) tools. Specifically, I employ the Random Forest algorithm, known for its suitability with Big Data. By harnessing ML to predict future file reuse patterns, I present a dual-stage method to inform cache eviction policies. This strategy combines the power of predictive analytics and established cache eviction algorithms, thereby devising a more resilient caching system for the WLCG.
In conclusion, this research underscores the significance of robust storage services, suggesting a direction towards stateless caches for smaller sites to alleviate complex storage management requirements and open the path to an additional level in the storage hierarchy. Through this thesis, I aim to navigate the challenges and complexities of data storage and retrieval, crafting more efficient methods that resonate with the evolving needs of the WLCG and its global community.
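To illustrate what a dataset-aware eviction policy can look like, here is a toy Python cache that evicts whole least-recently-used datasets instead of individual files. It is a minimal sketch under invented capacities and file sizes, not one of the thesis's actual policies.

    from collections import OrderedDict

    # Toy dataset-aware cache: on a miss, admit the file; when over capacity,
    # evict the least-recently-used *dataset* wholesale rather than single files.
    class DatasetCache:
        def __init__(self, capacity_bytes):
            self.capacity = capacity_bytes
            self.used = 0
            self.files = {}                # filename -> (dataset, size)
            self.datasets = OrderedDict()  # dataset -> set of filenames, in LRU order

        def access(self, filename, dataset, size):
            hit = filename in self.files
            if not hit:
                self.files[filename] = (dataset, size)
                self.datasets.setdefault(dataset, set()).add(filename)
                self.used += size
            self.datasets.move_to_end(dataset)  # mark whole dataset recently used
            while self.used > self.capacity and len(self.datasets) > 1:
                victim, members = self.datasets.popitem(last=False)
                for f in members:
                    self.used -= self.files.pop(f)[1]
            return hit

    cache = DatasetCache(capacity_bytes=100)
    print(cache.access("run1/f1", "run1", 60))  # False: cold miss
    print(cache.access("run2/f1", "run2", 60))  # False: run1 evicted as a whole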
18

Ahmed, Arif. "Efficient cloud application deployment in distributed fog infrastructures." Thesis, Rennes 1, 2020. http://www.theses.fr/2020REN1S004.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Fog computing architectures are composed of a large number of machines distributed across a geographical area such as a city or a region. In this context it is important to support a quick startup of applications deployed in the form of Docker containers. This thesis explores the reasons for slow deployment and identifies three improvement opportunities: (1) improving the Docker cache hit rate; (2) speeding up the image installation operation; and (3) accelerating the application boot phase after the creation of a container.
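The first opportunity, improving the Docker cache hit rate, hinges on layer sharing between images: pulling an image only fetches the layers not already cached on the node. The toy sketch below counts layer-level hits; the image names, layer lists, and counts are invented for illustration.

    # Layer-level cache hits when deploying container images on one fog node.
    cache = set()
    images = {
        "web:v1": ["debian", "python3.8", "app-v1"],
        "web:v2": ["debian", "python3.8", "app-v2"],  # shares two layers with v1
    }

    def deploy(image):
        layers = images[image]
        missing = [l for l in layers if l not in cache]
        cache.update(missing)
        hit_rate = 1 - len(missing) / len(layers)
        print(f"{image}: fetched {len(missing)} layer(s), cache hit rate {hit_rate:.0%}")

    deploy("web:v1")  # cold cache: 0% hits
    deploy("web:v2")  # shared base layers: 67% hits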
19

Papaspyrou, Alexander [Verfasser], Uwe [Akademischer Betreuer] Schwiegelshohn, and Ramin [Gutachter] Yahyapour. "Federated capacity planning for distributed computing infrastructures / Alexander Papaspyrou. Betreuer: Uwe Schwiegelshohn. Gutachter: Ramin Yahyapour." Dortmund : Universitätsbibliothek Dortmund, 2013. http://d-nb.info/1106333322/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

CASTELLANO, GABRIELE. "On the Orchestration of Dynamic Services over Distributed IT Infrastructures." Doctoral thesis, Politecnico di Torino, 2020. http://hdl.handle.net/11583/2841178.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Kiss, Tamas. "Making distributed computing infrastructures interoperable and accessible for e-scientists at the level of computational workflows." Thesis, University of Westminster, 2012. https://westminsterresearch.westminster.ac.uk/item/8z351/making-distributed-computing-infrastructures-interoperable-and-accessible-for-e-scientists-at-the-level-of-computational-workflows.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
As distributed computing infrastructures evolve, and as their take-up by user communities grows, the importance of making different types of infrastructures, based on a heterogeneous set of middleware, interoperable is becoming crucial. This PhD submission, based on twenty scientific publications, presents a unique solution to the challenge of the seamless interoperation of distributed computing infrastructures at the level of workflows. The submission investigates workflow level interoperation inside a particular workflow system (intra-workflow interoperation), and also between different workflow solutions (inter-workflow interoperation). In both cases the interoperation of workflow component execution and the feeding of data into these workflow components are considered. The invented and developed framework enables the execution of legacy applications and grid jobs and services on multiple grid systems, the feeding of data from heterogeneous file and data storage solutions to these workflow components, and the embedding of non-native workflows into a hosting meta-workflow. Moreover, the solution provides a high level user interface that enables e-scientist end-users to conveniently access the interoperable grid solutions without requiring them to study or understand the technical details of the underlying infrastructure. The candidate has also developed an application porting methodology that enables the systematic porting of applications to interoperable and interconnected grid infrastructures, and facilitates the exploitation of the above technical framework.
22

Vervaet, Arthur. "Automated Log-Based Anomaly Detection within Cloud Computing Infrastructures." Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS548.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Cloud computing aims to optimize resource utilization while accommodating a large user base and elastic services. Within this context, cloud computing platforms bear the responsibility of managing their customers' infrastructure. The management of an ever-expanding number of IT resources poses a significant challenge. In this study, conducted in collaboration with 3DS OUTSCALE, a French public cloud provider, we investigate the potential of log data as a valuable source for automated anomaly detection within cloud computing platforms. Logs serve as a widely utilized information source for various purposes, including monitoring, diagnosing, performance evaluation, and maintenance. These logs are generated during runtime and provide insights into the current state of a system. However, achieving automated real-time anomaly detection based on log data remains a complex undertaking. The intricate nature of cloud computing platforms must be duly considered. Extracting relevant information from a multitude of logging sources and accounting for frequent code base evolution poses challenges and introduces the potential for errors. Furthermore, establishing log relationships within such systems is often an insurmountable task. Log parsing solutions aim to separate the variable parts of log messages from their templates. Our first contribution involves a comprehensive study of two state-of-the-art log parsing methods, investigating the impact of hyperparameter tuning and preprocessing on their accuracy. Given the laborious nature of labeling logs related to a cloud computing platform, we sought to identify potential generic values that enable accurate parsing across diverse scenarios. However, our research reveals the infeasibility of finding such requirements, thereby emphasizing the necessity for more robust parsing approaches. Our second contribution introduces USTEP, an innovative online log parsing approach that surpasses existing methods in terms of accuracy, efficiency, and robustness. Notably, USTEP achieves a constant worst-case parsing time complexity, distinguishing it from its predecessors, whose parsing speed degrades as the number of already discovered templates grows. Through a comparative analysis of five online log parsers using 13 open-source datasets and one derived from 3DS OUTSCALE systems, we demonstrate the superior performance of USTEP. Furthermore, we propose USTEP-UP, an architecture that enables the distributed execution of multiple USTEP instances. Our third contribution presents Monilog, a system architecture designed for automated log-based anomaly detection within log data streams. Monilog leverages model/metric pairs to predict log traffic patterns within a system and detect anomalies by identifying deviations in system behavior. Monilog's forecasting models are powered by recent advances in deep learning, and it generates comprehensive reports that highlight the relevant system components and the associated applications. We implemented an instance of Monilog at cloud scale and conducted experimental analyses to evaluate its ability to forecast anomalous events, such as server crashes resulting from virtualization issues. The results obtained strongly support our hypothesis regarding the utility of logs in detecting and predicting abnormal events. Our Monilog implementation successfully identified abnormal periods and provided valuable insights into the applications involved.
With Monilog, we demonstrate the value of logs in predicting anomalies in such environments and provide a flexible architecture for future study. Our work in the log parsing field, with the proposal of USTEP and USTEP-UP, not only provides additional information for building anomaly detection models but also has potential benefits for other log mining applications.
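To give a flavour of the log parsing task USTEP addresses, the toy sketch below masks variable-looking tokens so that messages differing only in parameters collapse to one template. The masking rules and sample messages are invented examples of the general task; this is not the USTEP algorithm.

    import re

    # Toy log parsing: mask variable tokens (IPs, hex ids, numbers) so that
    # messages differing only in parameters map to the same template.
    MASKS = [
        (re.compile(r"\b\d+\.\d+\.\d+\.\d+\b"), "<IP>"),
        (re.compile(r"\b0x[0-9a-f]+\b"), "<HEX>"),
        (re.compile(r"\b\d+\b"), "<NUM>"),
    ]

    def template_of(message):
        for pattern, token in MASKS:
            message = pattern.sub(token, message)
        return message

    templates = {}
    for line in ["connect from 10.0.0.1 port 4222",
                 "connect from 10.0.0.7 port 4875",
                 "worker 12 crashed"]:
        templates.setdefault(template_of(line), []).append(line)
    print(list(templates))  # two templates recovered from three messages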
23

Moise, Diana Maria. "Optimizing data management for MapReduce applications on large-scale distributed infrastructures." Thesis, Cachan, Ecole normale supérieure, 2011. http://www.theses.fr/2011DENS0067/document.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Data-intensive applications are nowadays widely used in various domains to extract and process information, to design complex systems, to perform simulations of real models, etc. These applications exhibit challenging requirements in terms of both storage and computation. Specialized abstractions like Google's MapReduce were developed to efficiently manage the workloads of data-intensive applications. The MapReduce abstraction has revolutionized the data-intensive community and has rapidly spread to various research and production areas. An open-source implementation of Google's abstraction was provided by Yahoo! through the Hadoop project. This framework is considered the reference MapReduce implementation and is currently heavily used for various purposes and on several infrastructures. To achieve high-performance MapReduce processing, we propose a concurrency-optimized file system for MapReduce frameworks. As a starting point, we rely on BlobSeer, a framework that was designed as a solution to the challenge of efficiently storing data generated by data-intensive applications running at large scales. We have built the BlobSeer File System (BSFS), with the goal of providing high throughput under heavy concurrency to MapReduce applications. We also study several aspects related to intermediate data management in MapReduce frameworks. We investigate the requirements of MapReduce intermediate data at two levels: inside the same job, and during the execution of pipeline applications. Finally, we show how BSFS can enable extensions to the de facto MapReduce implementation, Hadoop, such as the support for the append operation. This work also comprises the evaluation and the obtained results in the context of grid and cloud environments.
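For readers unfamiliar with the paradigm, a minimal in-process rendition of the MapReduce abstraction is sketched below: map emits key/value pairs, a shuffle groups them by key, and reduce folds each group. It illustrates the programming model only; it is not Hadoop or BSFS code.

    from collections import defaultdict

    # Minimal in-process MapReduce: the classic word-count example.
    def map_words(document):
        for word in document.split():
            yield word, 1

    def reduce_counts(word, counts):
        return word, sum(counts)

    def mapreduce(inputs, mapper, reducer):
        groups = defaultdict(list)
        for item in inputs:
            for key, value in mapper(item):
                groups[key].append(value)      # the "shuffle" phase
        return [reducer(k, vs) for k, vs in groups.items()]

    print(mapreduce(["to be or not to be"], map_words, reduce_counts))
    # [('to', 2), ('be', 2), ('or', 1), ('not', 1)]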
24

Tsafack, Chetsa Ghislain Landry. "System Profiling and Green Capabilities for Large Scale and Distributed Infrastructures." Phd thesis, Ecole normale supérieure de lyon - ENS LYON, 2013. http://tel.archives-ouvertes.fr/tel-00946583.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Nowadays, reducing the energy consumption of large scale and distributed infrastructures has truly become a challenge for both industry and academia. This is corroborated by the many efforts aiming to reduce the energy consumption of those systems. Initiatives for reducing the energy consumption of large scale and distributed infrastructures can, without loss of generality, be broken into hardware and software initiatives. Unlike their hardware counterpart, software solutions to the energy reduction problem in large scale and distributed infrastructures hardly result in real deployments. On the one hand, this can be justified by the fact that they are application oriented. On the other hand, their failure can be attributed to their complex nature, which often requires vast technical knowledge behind the proposed solutions and/or a thorough understanding of the applications at hand. This restricts their use to a limited number of experts, because users usually lack adequate skills. In addition, although subsystems including the memory are becoming more and more power hungry, current software energy reduction techniques fail to take them into account. This thesis proposes a methodology for reducing the energy consumption of large scale and distributed infrastructures. Broken into three steps known as (i) phase detection, (ii) phase characterization, and (iii) phase identification and system reconfiguration, our methodology abstracts away from any individual application as it focuses on the infrastructure, whose runtime behaviour it analyses in order to take reconfiguration decisions accordingly. The proposed methodology is implemented and evaluated in high performance computing (HPC) clusters of varied sizes through a Multi-Resource Energy Efficient Framework (MREEF). MREEF implements the proposed energy reduction methodology so as to leave users with the choice of implementing their own system reconfiguration decisions depending on their needs. Experimental results show that our methodology reduces the energy consumption of the overall infrastructure by up to 24% with less than 7% performance degradation. By taking into account all subsystems, our experiments demonstrate that the energy reduction problem in large scale and distributed infrastructures can benefit from more than "the traditional" processor frequency scaling. Experiments in clusters of varied sizes demonstrate that MREEF, and therefore our methodology, can easily be extended to a large number of energy-aware clusters. The extension of MREEF to virtualized environments like the cloud shows that the proposed methodology goes beyond HPC systems and can be used in many other computing environments.
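A minimal sketch of the phase detection idea: a new phase begins when the current resource-usage vector drifts too far from the running mean of the current phase. The threshold and the two-dimensional usage vectors are invented; MREEF's actual detection, characterization, and reconfiguration logic is considerably richer.

    import math

    # Toy phase detection over a stream of resource-usage vectors.
    def detect_phases(samples, threshold=0.3):
        phases = [[samples[0]]]
        mean = list(samples[0])
        for s in samples[1:]:
            if math.dist(s, mean) > threshold:   # behaviour changed: new phase
                phases.append([s])
                mean = list(s)
            else:                                # same phase: update running mean
                phases[-1].append(s)
                n = len(phases[-1])
                mean = [m + (x - m) / n for m, x in zip(mean, s)]
        return phases

    usage = [(0.9, 0.2), (0.88, 0.22), (0.1, 0.8), (0.12, 0.85)]  # (cpu, io)
    print([len(p) for p in detect_phases(usage)])  # [2, 2]: two phases detected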
25

Moise, Diana. "Optimisation de la gestion des données pour les applications MapReduce sur des infrastructures distribuées à grande échelle." Phd thesis, École normale supérieure de Cachan - ENS Cachan, 2011. http://tel.archives-ouvertes.fr/tel-00653622.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Data-intensive applications are widely used in various domains to extract and process information, to design complex systems, to perform simulations of real models, etc. These applications pose complex challenges in terms of both storage and computation. In the context of data-intensive applications, we focus on the MapReduce paradigm and its implementations. Introduced by Google, the MapReduce abstraction has revolutionized the data-intensive community and has rapidly spread to various research and production areas. An open-source implementation of Google's abstraction was provided by Yahoo! through the Hadoop project. The Hadoop framework is considered the reference MapReduce implementation and is currently heavily used for various purposes and on several infrastructures. We propose a distributed file system, optimized for highly concurrent access, that can serve as a storage layer for MapReduce applications. We designed the BlobSeer File System (BSFS), based on BlobSeer, a highly efficient distributed storage service that facilitates large-scale data sharing. We also study several aspects related to intermediate data management in MapReduce environments. We investigate the requirements of MapReduce intermediate data at two levels: inside the same MapReduce job, and during the execution of pipelines of MapReduce applications. Finally, we propose extensions to Hadoop, a popular open-source MapReduce framework, such as support for the append operation. This work also comprises the evaluation and results obtained on large-scale infrastructures: computing grids and clouds.
26

Ariyattu, Resmi. "Towards federated social infrastructures for plug-based decentralized social networks." Thesis, Rennes 1, 2017. http://www.theses.fr/2017REN1S031/document.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
In this thesis, we address two issues in the area of decentralized distributed systems: network-aware overlays and collaborative editing. Even though network overlays have been extensively studied, most solutions either ignore the underlying physical network topology, or use mechanisms that are specific to a given platform or application. This is problematic, as the performance of an overlay network strongly depends on the way its logical topology exploits the underlying physical network. To address this problem, we propose Fluidify, a decentralized mechanism for deploying an overlay network on top of a physical infrastructure while maximizing network locality. Fluidify uses a dual strategy that exploits both the logical links of an overlay and the physical topology of its underlying network to progressively align one with the other. The resulting protocol is generic, efficient, scalable and can substantially reduce network overheads and latency in overlay based systems. The second issue that we address focuses on collaborative editing platforms. Distributed collaborative editors allow several remote users to contribute concurrently to the same document. Only a limited number of concurrent users can be supported by the currently deployed editors. A number of peer-to-peer solutions have therefore been proposed to remove this limitation and allow a large number of users to work collaboratively. These decentralized solutions assume, however, that all users are editing the same set of documents, which is unlikely to be the case. To open the path towards more flexible decentralized collaborative editors, we present Filament, a decentralized cohort-construction protocol adapted to the needs of large-scale collaborative editors. Filament eliminates the need for any intermediate DHT, and allows nodes editing the same document to find each other in a rapid, efficient and robust manner by generating an adaptive routing field around themselves. Filament's architecture hinges around a set of collaborating self-organizing overlays that utilize the semantic relations between peers. The resulting protocol is efficient, scalable and provides beneficial load-balancing properties over the involved peers.
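The intuition behind Fluidify can be sketched as a greedy swap heuristic: two overlay positions are exchanged whenever the swap lowers the physical cost of the overlay links. The ring overlay and one-dimensional physical coordinates below are simplifications of the actual decentralized protocol, and the data is invented.

    import random

    # Physical cost of a ring overlay: sum of physical distances along ring links.
    def link_cost(placement, phys_pos):
        n = len(placement)
        return sum(abs(phys_pos[placement[i]] - phys_pos[placement[(i + 1) % n]])
                   for i in range(n))

    # One Fluidify-style step: try a random swap, keep it only if cost drops.
    def fluidify_pass(placement, phys_pos):
        i, j = random.sample(range(len(placement)), 2)
        trial = placement[:]
        trial[i], trial[j] = trial[j], trial[i]
        return trial if link_cost(trial, phys_pos) < link_cost(placement, phys_pos) else placement

    phys = {"a": 0, "b": 9, "c": 1, "d": 10}   # physical coordinates
    ring = ["a", "b", "c", "d"]                # initial overlay ring order
    for _ in range(200):
        ring = fluidify_pass(ring, phys)
    print(ring, link_cost(ring, phys))         # physically close nodes end up adjacent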
27

Fabiyi, Adedeji Oyekanmi. "A methodology for developing scientific software applications in science gateways : towards the easy accessibility and availability of scientific applications." Thesis, Brunel University, 2017. http://bura.brunel.ac.uk/handle/2438/16313.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Distributed Computing Infrastructures (DCIs) have emerged as a viable and affordable solution to the computing needs of communities of practice that may need to improve system performance or enhance the availability of their scientific applications. According to the literature, the ease of access and several other issues relating to interoperability among different resources are the biggest challenges surrounding the use of these infrastructures. The traditional method of using a Command Line Interface (CLI) to access these resources is difficult and can make the learning curve quite steep. This approach can result in the low uptake of DCIs, as it prevents potential users of the infrastructures from adopting the technology. Science Gateways have emerged as a viable option, used to realise high-level, scientific domain-specific user interfaces that hide all the details of the underlying infrastructures and expose only the science-specific aspects of the applications to be executed in the various DCIs. A Science Gateway is a digital interface to advanced technologies which is used to provide adequate support for science and engineering research and education. The focus of this study therefore is to propose and implement a Methodology for dEveloping Scientific Software Applications in science GatEways (MESSAGE). This will be achieved by testing an approach which is considered appropriate for developing applications in Science Gateways. In the course of this study, several Science Gateway functionalities obtained from the review of the literature, which may be utilised to provide services for different communities of practice, are highlighted. To implement the identified functionalities, this study utilises the methodology for developing scientific software applications in Science Gateways. In order to achieve this purpose, this research adopts the Catania Science Gateway Framework (CSGF) and the Future Gateway approach to implement the methods and ideas described in the proposed methodology, as well as the essential services of Science Gateways discussed throughout the thesis. In addition, three different sets of scientific software applications are utilised for the implementation of the proposed methodology. While the first application primarily serves as the case study for implementing the methodology discussed in this thesis, a second application is used to evaluate the entire process. Furthermore, several other real-life scientific applications developed (using two distinctly different Science Gateway frameworks) are also utilised for the purpose of evaluation. Subsequently, a revised MESSAGE methodology for developing scientific software applications in Science Gateways is discussed in a later chapter of this thesis. Following from the implementation of both scientific software applications, which sees the use of portlets to execute single experiments, a study was also conducted to investigate ways in which Science Gateways may be utilised for the execution of multiple experiments in a distributed environment. Finally, similar to making different scientific software applications accessible and available (worldwide) to the communities that need them, the processes involved in making their associated research outputs (such as data, software and results) easily accessible and readily available are also discussed.
The main contribution of this thesis is the MESSAGE methodology for developing scientific software applications in Science Gateways. Other contributions, made in different aspects of this research, include a framework of the essential services required in generic Science Gateways and an approach to developing and executing multiple experiments (via Science Gateway interfaces) within a distributed environment. To a lesser extent, this study also utilises the Open Access Document Repository (OADR) (and other related technologies) to demonstrate the accessibility and availability of research outputs associated with specific scientific software applications, thereby introducing the concept (and thus laying the foundation) of Open Science research.
28

Villebonnet, Violaine. "Scheduling and Dynamic Provisioning for Energy Proportional Heterogeneous Infrastructures." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSEN057/document.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
The increasing number of data centers raises serious concerns regarding their energy consumption. These infrastructures are often over-provisioned and contain servers that are not fully utilized. The problem is that inactive servers can consume as much as 50% of their peak power consumption. This thesis proposes a novel approach for building data centers so that their energy consumption is proportional to the actual load. We propose an original infrastructure named BML, for "Big, Medium, Little", composed of heterogeneous computing resources: from low-power processors to classical servers. The idea is to take advantage of their different characteristics in terms of energy consumption, performance, and switch-on reactivity to adjust the composition of the infrastructure according to load evolutions. We define a generic methodology to compute the most energy-proportional combinations of machines based on hardware profiling data. We focus on web applications whose load varies over time and design a scheduler that dynamically reconfigures the infrastructure, with application migrations and machine switch-ons and switch-offs, to minimize the infrastructure's energy consumption according to the current application requirements. We have developed two different dynamic provisioning algorithms which take into account the time and energy overheads of the different reconfiguration actions in the decision process. We demonstrate, through simulations based on experimentally acquired hardware profiles, that we achieve important energy savings compared to classical data center infrastructures and management.
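A minimal sketch of the combination-selection step: given per-machine-type capacity and power profiles, pick the mix with the lowest power draw that still covers the target load. The profile numbers below are invented stand-ins; the thesis derives such values from real hardware profiling.

    from itertools import product

    # Pick the cheapest (in watts) mix of machine types that serves the load.
    profiles = {                       # type: (capacity in requests/s, watts)
        "little": (100, 4.0),
        "medium": (400, 25.0),
        "big":    (1500, 80.0),
    }

    def best_combination(load_rps, max_per_type=8):
        best = None
        for counts in product(range(max_per_type + 1), repeat=len(profiles)):
            cap = sum(n * profiles[t][0] for n, t in zip(counts, profiles))
            watts = sum(n * profiles[t][1] for n, t in zip(counts, profiles))
            if cap >= load_rps and (best is None or watts < best[0]):
                best = (watts, dict(zip(profiles, counts)))
        return best

    print(best_combination(1200))  # here three mediums (75 W) beat one big box (80 W)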
29

Truong, Huu Tram. "Optimisation des performances et du coût de flots applicatifs s'exécutant sur des infrastructures de cloud." Phd thesis, Université de Nice Sophia-Antipolis, 2010. http://tel.archives-ouvertes.fr/tel-00805511.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Virtual cloud infrastructures are increasingly exploited to address the compute-intensive challenges of both science and industry. They provide on-demand computing, communication, and storage resources to satisfy the needs of large-scale applications. New tools and models are needed to adapt to the diversity of these infrastructures. Estimating the amount of resources consumed by each application is a particularly difficult problem, both for users, who aim to minimize their costs, and for infrastructure providers, who aim to control resource allocation. Even though a virtually unlimited amount of resources can be allocated, a trade-off has to be found between (i) the cost of the allocated infrastructure, (ii) the expected performance, and (iii) the optimal achievable performance, which depends on the level of parallelism inherent to the application. Starting from the use case of medical image analysis, a scientific field representative of a large number of large-scale applications, this thesis proposes a fine-grained cost model that relies on expertise extracted from the application, formalized as a workflow. Four resource allocation strategies based on this cost model are introduced. Taking both computing and communication resources into account, these strategies allow users to determine the amount of computing resources and bandwidth to reserve in order to compose their execution environment. In addition, data transfer optimization and the low reliability of large-scale systems, two well-known problems that impact application performance and therefore the cost of using the infrastructure, are also taken into consideration. The experiments presented in this thesis were carried out on the Aladdin/Grid'5000 platform, using the HIPerNet middleware. This virtual platform manager enables the virtualization of computing and communication resources. A real medical image analysis application was used for all experimental validations. The experimental results show the validity of the approach in terms of controlling the infrastructure cost and the application performance. Our contributions facilitate both the exploitation of cloud infrastructures, offering users a better quality of service, and the provisioning planning of virtualized resources.
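The cost/performance trade-off at the heart of the thesis can be sketched with a simple model: renting more nodes shortens the makespan only up to the application's inherent parallelism, while the bill keeps growing. The Amdahl-style speedup, parallel fraction, and hourly price below are stand-in assumptions, not the thesis's fine-grained cost model.

    # Trade-off between allocated resources, makespan, and rental cost.
    def makespan_hours(base_hours, n_nodes, parallel_fraction=0.9):
        speedup = 1 / ((1 - parallel_fraction) + parallel_fraction / n_nodes)
        return base_hours / speedup

    def cost(base_hours, n_nodes, price_per_node_hour=0.10):
        return makespan_hours(base_hours, n_nodes) * n_nodes * price_per_node_hour

    for n in (1, 4, 16, 64):
        print(n, "nodes:", round(makespan_hours(10, n), 2), "h,",
              round(cost(10, n), 2), "$")
    # Beyond some n, the makespan barely improves while the cost keeps rising.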
30

Tsafack, Chetsa Ghislain Landry. "Profilage système et leviers verts pour les infrastructures distribuées à grande échelle." Phd thesis, Ecole normale supérieure de lyon - ENS LYON, 2013. http://tel.archives-ouvertes.fr/tel-00925320.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Nowadays, reducing the energy consumption of large-scale computing infrastructures has become a real challenge for both academia and industry, as witnessed by the many efforts aiming to reduce their consumption. Without loss of generality, these efforts can be divided into two groups: hardware approaches and software approaches. Unlike hardware approaches, software approaches have met with very little success because of their complexity. Indeed, they focus on applications and often require a very good understanding of the proposed solutions and/or of the application at hand. This restricts their use to a limited number of experts, since users generally lack the skills needed to implement them. Moreover, in addition to being complex to deploy, current solutions only take the processor into account, whereas components such as memory, storage, and the network are also major energy consumers. This thesis proposes a methodology for reducing the energy consumption of large-scale computing infrastructures. Organised in three steps, namely (i) phase detection, (ii) characterization of the detected phases, and (iii) phase identification and system reconfiguration, it abstracts away from any individual application by focusing on the infrastructure, whose behaviour it analyses at runtime in order to take reconfiguration decisions. The proposed methodology is implemented and evaluated on high-performance computing clusters of various sizes through MREEF (Multi-Resource Energy Efficient Framework). MREEF implements the energy reduction methodology in a way that lets users implement their own system reconfiguration mechanisms according to their needs. Experimental results show that the proposed methodology reduces energy consumption by 24% for a performance loss of less than 7%. They also show that reducing the energy consumption of systems can draw on subsystems such as the storage and communication subsystems. Our validations show that our methodology extends easily to a large number of energy-aware clusters. The extension of MREEF to virtualized environments such as the cloud shows that the proposed methodology can be used in many other computing environments.
31

Moise, Diana Maria. "Optimisation de la gestion des données pour les applications MapReduce sur des infrastructures distribuées à grande échelle." Phd thesis, École normale supérieure de Cachan - ENS Cachan, 2011. http://tel.archives-ouvertes.fr/tel-00696062.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Data-intensive applications are widely used in various domains to extract and process information, to design complex systems, to perform simulations of real models, etc. These applications pose complex challenges in terms of both storage and computation. In the context of data-intensive applications, we focus on the MapReduce paradigm and its implementations. Introduced by Google, the MapReduce abstraction has revolutionized the data-intensive community and has rapidly spread to various research and production areas. An open-source implementation of Google's abstraction was provided by Yahoo! through the Hadoop project. The Hadoop framework is considered the reference MapReduce implementation and is currently heavily used for various purposes and on several infrastructures. We propose a distributed file system, optimized for highly concurrent access, that can serve as a storage layer for MapReduce applications. We designed the BlobSeer File System (BSFS), based on BlobSeer, a highly efficient distributed storage service that facilitates large-scale data sharing. We also study several aspects related to intermediate data management in MapReduce environments. We investigate the requirements of MapReduce intermediate data at two levels: inside the same MapReduce job, and during the execution of pipelines of MapReduce applications. Finally, we propose extensions to Hadoop, a popular open-source MapReduce framework, such as support for the append operation. This work also comprises the evaluation and results obtained on large-scale infrastructures: computing grids and clouds.
32

Wieder, Philipp [Verfasser], Ramin [Akademischer Betreuer] Yahyapour, and Dieter [Akademischer Betreuer] Kranzlmüller. "A generic scheduling architecture for service oriented distributed computing infrastructures : Introducing new concepts based on automated negotiation of electronic contracts / Philipp Wieder. Betreuer: Ramin Yahyapour. Gutachter: Dieter Kranzlmüller." Dortmund : Universitätsbibliothek Dortmund, 2013. http://d-nb.info/1099297540/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Fernandez, Héctor. "Coordination flexible fondée sur la métaphore chimique dans les infrastructures de services." Phd thesis, Université Rennes 1, 2012. http://tel.archives-ouvertes.fr/tel-00717057.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
With the development of the Internet of services, dynamically composing loosely coupled distributed services has become the new challenge of large-scale computing. While service composition has become a key element of service-oriented platforms, most service composition systems follow a centralized approach, in which one entity knows all the control-flow and data-flow information of the workflow, raising a number of problems, notably of scalability and reliability. In a world where platforms are increasingly dynamic, new dynamic coordination mechanisms are required. In this context, natural metaphors, and in particular the chemical metaphor, have recently gained particular attention, since they provide abstractions for the flexible coordination of entities. In this thesis, we present a workflow management system based on the chemical metaphor, which provides a high-level execution model for the centralized and decentralized execution of compositions (or workflows). According to this model, services are seen as molecules floating in a chemical solution. The coordination of these services is carried out by a set of reactions between these molecules, expressing the decentralized execution of a workflow. Moreover, while the chemical paradigm is today considered a promising coordination model, it lacks experimental results. We therefore developed a software prototype. Experiments were conducted with workflows from real applications to show the viability of our model.
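A toy rendition of the chemical execution model: the solution is a multiset of molecules, and a reaction rule fires as long as matching reactants are present, until the solution becomes inert. Computing a maximum this way is the classic textbook example of the paradigm; it is not the thesis's workflow engine.

    # Toy "chemical" coordination: keep applying the reaction rule
    # "two molecules react, the smaller one is consumed" until inert.
    def react(solution):
        changed = True
        while changed:
            changed = False
            for i in range(len(solution)):
                for j in range(len(solution)):
                    a, b = solution[i], solution[j]
                    if i != j and a >= b:
                        solution.remove(b)   # reactant consumed
                        changed = True
                        break
                if changed:
                    break
        return solution

    print(react([4, 8, 15, 16, 23, 42]))  # inert solution: [42]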
34

Alvares, De Oliveira Junior Frederico. "Gestion multi autonome pour l'optimisation de la consommation énergétique sur les infrastructures en nuage." Phd thesis, Université de Nantes, 2013. http://tel.archives-ouvertes.fr/tel-00853575.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
As a direct consequence of the increasing popularity of cloud computing services, data centers are growing at a staggering pace and must quickly face energy consumption issues. Paradoxically, cloud computing allows infrastructures and applications to adjust dynamically in order to make the infrastructure more energy efficient and the applications more compliant in terms of quality of service (QoS). However, optimization decisions taken in isolation at one level may indirectly interfere with (or even cancel out) decisions taken at another level; for example, an application may request more resources to maintain its QoS while part of the infrastructure is being shut down for energy reasons. It therefore becomes necessary not only to establish a synergy between the layers of the cloud, but also to make these layers sufficiently flexible and responsive to be able to react to runtime changes and thus take full advantage of this synergy. This thesis proposes a self-adaptation approach that considers both application components (architectural elasticity) and the infrastructure (resource elasticity) to reduce the energy footprint. Each application and the infrastructure are equipped with their own autonomic control loop, which allows them to optimize their operation independently. In order to create a synergy between control loops around a common objective, we propose a model for the coordination and synchronization of multiple control loops. The approach is validated experimentally both qualitatively (QoS improvement and energy gains) and quantitatively (scalability).
35

Giansiracusa, Michelangelo Antonio. "A secure infrastructural strategy for safe autonomous mobile agents." Thesis, Queensland University of Technology, 2005. https://eprints.qut.edu.au/16052/1/Michaelangelo_Giansiracusa_Thesis.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Portable languages and distributed paradigms have driven a wave of new applications and processing models. One of the most promising, certainly from its early marketing, but disappointing (from its limited uptake) is the mobile agent execution and data processing model. Mobile agents are autonomous programs which can move around a heterogeneous network such as the Internet, crossing through a number of different security domains, and perform some work at each visited destination as partial completion of a mission for their agent user. Despite their promise as a technology and paradigm to drive global electronic services (i.e. any Internet-driven-and-delivered service, not solely e-commerce related activities), their uptake on the Internet has been very limited. Chief among the reasons for the paradigm's practical under-achievement is that there is no ubiquitous framework for using Internet mobile agents, and non-trivial security concerns abound for the two major stakeholders (mobile agent users and mobile agent platform owners). While both stakeholders have security concerns with the dangers of the mobile agent processing model, most investigators in the field are of the opinion that protecting mobile agents from malicious agent platforms is more problematic than protecting agent platforms from malicious mobile agents. Traditional cryptographic mechanisms are not well-suited to counter the bulk of the threats associated with the mobile agent paradigm due to the untrusted hosting of an agent and its intended autonomous, flexible movement and processing. In our investigation, we identified that the large majority of the research undertaken on mobile agent security to date has taken a micro-level perspective. By this we mean research focused solely on either of the two major stakeholders, and even then often only on improving measures to address one security issue dear to the stakeholder - for example mobile agent privacy (for agent users) or access control to platform resources (for mobile agent platform owners). We decided to take a more encompassing, higher-level approach in tackling mobile agent security issues. In this endeavour, we developed the beginnings of an infrastructural approach to not only reduce the security concerns of both major stakeholders, but bring them transparently to a working relationship. Strategic utilisation of both existing distributed system trusted third parties (TTPs) and novel mobile agent paradigm-specific TTPs is fundamental in the infrastructural framework we have devised. Besides designing an application- and language-independent framework for supporting a large-scale Internet mobile agent network, our Mobile Agent Secure Hub Infrastructure (MASHIn) proposal encompasses support for flexible access control to agent platform resources. A reliable means to track the location and processing times of autonomous Internet mobile agents is discussed, with fault-tolerant handling support to work around unexpected processing delays. Secure, highly effective (in comparison to existing mechanisms) strategies for providing mobile agent privacy, execution integrity, and stakeholder confidence scores were devised - all of which fit comfortably within the MASHIn framework. We have deliberately considered the interests - without bias - of both stakeholders when designing our solutions. In relation to mobile agent execution integrity, we devised new criteria for assessing the robustness of existing execution integrity schemes.
Whilst none of the existing schemes analysed met a large number of our desired properties for a robust scheme, we identified that the objectives of Hohl's reference states scheme were most admirable - particularly real-time, in-mission execution integrity checking. Subsequently, we revised Hohl's reference states protocols to fit in the MASHIn framework, and were able to overcome not only the two major limitations identified in his scheme, but also meet all of our desired properties for a robust execution integrity scheme (given an acceptable decrease in processing efficiency). The MASHIn offers a promising new perspective for future mobile agent security research and indeed a new framework for enabling safe and autonomous Internet mobile agents. Just as an economy cannot thrive without diligent care given to micro- and macro-level issues, we do not see the security prospects of mobile agents (and ultimately the prospects of the mobile agent paradigm) advancing without diligent research on both levels.
36

Giansiracusa, Michelangelo Antonio. "A Secure Infrastructural Strategy for Safe Autonomous Mobile Agents." Queensland University of Technology, 2005. http://eprints.qut.edu.au/16052/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Portable languages and distributed paradigms have driven a wave of new applications and processing models. One of the most promising (certainly from its early marketing), but disappointing in its limited uptake, is the mobile agent execution and data processing model. Mobile agents are autonomous programs which can move around a heterogeneous network such as the Internet, crossing a number of different security domains and performing some work at each visited destination as partial completion of a mission for their agent user. Despite their promise as a technology and paradigm to drive global electronic services (i.e., any Internet-driven-and-delivered service, not solely e-commerce related activities), their uptake on the Internet has been very limited. Chief among the reasons for the paradigm's practical under-achievement is that there is no ubiquitous framework for using Internet mobile agents, and non-trivial security concerns abound for the two major stakeholders (mobile agent users and mobile agent platform owners). While both stakeholders have security concerns with the dangers of the mobile agent processing model, most investigators in the field are of the opinion that protecting mobile agents from malicious agent platforms is more problematic than protecting agent platforms from malicious mobile agents. Traditional cryptographic mechanisms are not well suited to countering the bulk of the threats associated with the mobile agent paradigm, owing to the untrusted hosting of an agent and its intended autonomous, flexible movement and processing. In our investigation, we identified that the large majority of the research undertaken on mobile agent security to date has taken a micro-level perspective: research focused solely on one of the two major stakeholders, and even then often only on improving measures to address one security issue dear to that stakeholder - for example mobile agent privacy (for agent users) or access control to platform resources (for mobile agent platform owners). We decided to take a more encompassing, higher-level approach to tackling mobile agent security issues. In this endeavour, we developed the beginnings of an infrastructural approach that not only reduces the security concerns of both major stakeholders, but brings them transparently into a working relationship. Strategic utilisation of both existing distributed-system trusted third parties (TTPs) and novel mobile agent paradigm-specific TTPs is fundamental to the infrastructural framework we have devised. Besides designing an application- and language-independent framework for supporting a large-scale Internet mobile agent network, our Mobile Agent Secure Hub Infrastructure (MASHIn) proposal encompasses support for flexible access control to agent platform resources. A reliable means of tracking the location and processing times of autonomous Internet mobile agents is discussed, with fault-tolerant handling support to work around unexpected processing delays. Secure, highly effective (in comparison to existing mechanisms) strategies for providing mobile agent privacy, execution integrity, and stakeholder confidence scores were devised - all of which fit comfortably within the MASHIn framework. We have deliberately considered the interests of both stakeholders - without bias - when designing our solutions. In relation to mobile agent execution integrity, we devised new criteria for assessing the robustness of existing execution integrity schemes.
Whilst none of the existing schemes analysed met a large number of our desired properties for a robust scheme, we identified that the objectives of Hohl's reference-states scheme were most admirable - particularly real-time, in-mission execution integrity checking. Subsequently, we revised Hohl's reference-states protocols to fit the MASHIn framework, and were able not only to overcome the two major limitations identified in his scheme, but also to meet all of our desired properties for a robust execution integrity scheme (given an acceptable decrease in processing efficiency). The MASHIn offers a promising new perspective for future mobile agent security research and indeed a new framework for enabling safe and autonomous Internet mobile agents. Just as an economy cannot thrive without diligent care given to micro- and macro-level issues, we do not see the security prospects of mobile agents (and ultimately the prospects of the mobile agent paradigm) advancing without diligent research on both levels.
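
The tracking and integrity-auditing role that trusted third parties play in such an infrastructure can be illustrated with a small sketch. The following is an intuition-level model only - a hash-chained itinerary log verified at a hub - and not Hohl's reference-states protocol nor the actual MASHIn design; all names are hypothetical.

```python
# Illustrative sketch: a hash-chained itinerary log that a trusted hub could
# use to audit where an agent executed and what state it carried.
import hashlib
import json
import time

def chain_entry(prev_digest: str, platform_id: str,
                state_in: dict, state_out: dict) -> dict:
    """Each visited platform appends one entry linking its execution
    results to everything that happened before (tamper-evident chain)."""
    payload = json.dumps(
        {"prev": prev_digest, "platform": platform_id,
         "state_in": state_in, "state_out": state_out,
         "timestamp": time.time()},
        sort_keys=True)
    digest = hashlib.sha256(payload.encode()).hexdigest()
    return {"payload": payload, "digest": digest}

def verify_chain(entries: list) -> bool:
    """A hub-side TTP recomputes every digest; a platform that rewrites an
    earlier entry invalidates every digest that follows it."""
    prev = "genesis"
    for e in entries:
        body = json.loads(e["payload"])
        if body["prev"] != prev:
            return False
        if hashlib.sha256(e["payload"].encode()).hexdigest() != e["digest"]:
            return False
        prev = e["digest"]
    return True

# Example: two hops, then verification at the secure hub.
log = [chain_entry("genesis", "platform-A", {"offers": []}, {"offers": [99]})]
log.append(chain_entry(log[-1]["digest"], "platform-B",
                       {"offers": [99]}, {"offers": [99, 87]}))
assert verify_chain(log)
```

A real deployment would additionally have each platform sign its entry, since an unauthenticated hash chain only makes reordering and retroactive edits detectable, not forgery by the last hop.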
37

Kuo, Hsun-Hung, and 郭訓宏. "Towards a High-Performance Grid Computing Infrastructure —A Distributed Databases Approach." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/55069768643873995817.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Master's thesis
National Chiao Tung University
Department of Computer and Information Science
Academic year 92 (2003-04)
Grid Computing promises to provide uniform, inexpensive access to computing power through the aggregation and utilization of a potentially unlimited number of storage and computing devices. For Grid infrastructure developers, this goal amounts to creating effective mechanisms that can allocate and coordinate distributed, heterogeneous resources in a robust and secure manner. For Grid application developers, on the other hand, the main challenge is to make the best use of the facilities provided by the infrastructure. Typically, a developer needs to divide a problem into smaller pieces and plan for appropriate data manipulation and transfer among them. Such divide-and-conquer effort is essential when the required memory space is beyond the capabilities of individual machines, but it becomes complicated when the infrastructure provides only low-level facilities. This thesis describes database-specific techniques that can relieve developers of complicated memory management. Simply speaking, we use individual relational databases as computational nodes, for their storage and computation capabilities, and connect them together into a distributed computing platform. In addition, we define a generic schema capable of storing complex data structures, and mechanisms that allow flexible translation between the schema and other computation-friendly tabular structures. We argue that together these constructs form an attractive platform that can greatly simplify Grid application development, thus contributing to the general Grid Computing community in a useful way.
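
The "generic schema" idea - flattening arbitrarily nested structures into a single relational table so that databases can act as both storage and computation nodes - can be sketched as follows. This is a minimal illustration under assumed table and column names, not the schema actually defined in the thesis.

```python
# Sketch: shred any nested structure into one (node, parent, name, value)
# table that a relational database can store and query set-wise.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE node (
    id INTEGER PRIMARY KEY, parent INTEGER, name TEXT, value TEXT)""")

def store(obj, parent=None, name=None):
    """Recursively shred dicts/lists into rows; scalars become leaf values."""
    leaf = None if isinstance(obj, (dict, list)) else str(obj)
    cur = db.execute("INSERT INTO node (parent, name, value) VALUES (?, ?, ?)",
                     (parent, name, leaf))
    node_id = cur.lastrowid
    if isinstance(obj, dict):
        for k, v in obj.items():
            store(v, node_id, k)
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            store(v, node_id, str(i))
    return node_id

root = store({"matrix": [[1, 2], [3, 4]], "name": "job-17"})
# The same table supports set-oriented, "computation-friendly" access:
print(db.execute("SELECT name, value FROM node WHERE parent = ?",
                 (root,)).fetchall())
```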
38

Δεχουνιώτης, Δημήτριος. "Monitoring and control of distributed web services on cloud computing infrastructure." Thesis, 2014. http://hdl.handle.net/10889/7958.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
This thesis concerns two main research areas of distributed web services deployed on cloud computing infrastructure. The first is the monitoring of cloud computing infrastructure. In chapter 2, a novel general technique is used to infer relationships between different service components in a data center. The approach relies on a small set of fuzzy rules, produced by a hybrid genetic algorithm with a high classification rate, and it also measures the strength of the detected dependencies. Although the ground truth about relationships in a network is unknown, the proposed method mines realistic relationships without any prior information about network topology and infrastructure. It can therefore serve administrators as a monitoring tool that gives a clear view of what is happening in the underlying network. Finally, because of the simplicity of our algorithm and the flexibility of FIM, an online approach seems feasible. The second major problem, addressed in chapter 3, is the automated resource control of consolidated web applications on cloud computing infrastructure. ACRA is a modeling and control technique for distributed services that are co-located on a server cluster. The system dynamics are modeled by a group of linear state-space models that cover the whole range of workload conditions; the non-linear terms and uncertainties induced by the varying workload are modeled by an additive term in the local linear models. Because several types of service transactions, with varying time and resource demands, occur during a day, there are many candidate reference values for the SLOs. Given these requirements and the workload circumstances, we choose the appropriate model and compute the closest feasible operating point according to several optimization criteria. Then, using a set-theoretic technique, a state feedback controller is designed that successfully drives and stabilizes the system in the region of the equilibrium point. The ACRA controller computes a positively invariant set in the state space that includes the target set and drives the system trajectories into it, thus providing a stability guarantee and a high level of robustness against system disturbances and nonlinearities. Furthermore, we compare ACRA with an MPC and a PI controller, and the results are very promising, since our solution outperforms both approaches. Secondly, a unified local-level modeling and control framework for consolidated web services in a server cluster is presented, which can be a vital element of a holistic distributed control platform. Admission control and resource allocation are addressed as a common decision problem, with stability and constraint satisfaction guaranteed. A real testbed was built, and from a range of examples under different operating conditions we conclude that both the identification scheme and the controller provide a high level of QoS. A novel component of this approach is the determination of a set of feasible operating (equilibrium) points, which allows choosing the appropriate equilibrium point depending only on the objectives at hand, such as maximizing throughput, minimizing consumption or maximizing profit. Evaluation shows that our approach performs well compared to well-known solutions, such as queuing models and measurement-based approaches to finding equilibrium points. Both controllers achieve their main targets relative to the approaches already proposed in the literature.
Firstly, they satisfy the SLA requirements and the constraints of the underlying cloud computing infrastructure. To the best of our knowledge, these are the only studies that calculate a set of feasible operating points that ensure system stability. Furthermore, they adopt modern control theory and, beyond the stability guarantee, introduce control properties such as positively invariant sets, ultimate boundedness and e-contractive sets.
In this doctoral dissertation, two research problems are addressed. First, a network traffic monitoring technique is developed for discovering the functional relationships between the different components of a web application. The second part solves the problem of automated resource allocation for web applications that share a common cloud computing environment. Relative to the existing literature, the goal of the first chapter of the dissertation is to create a network traffic analysis tool that makes the functional relationships between the components of distributed web services understandable. The resulting dependency graph is a primary tool for many administrator tasks in the areas of performance analysis and root-cause analysis, such as detecting misconfigurations or network attacks, and planning the extension or transformation of cloud infrastructures. The second part of this dissertation deals with the automated allocation of the computing resources of a cloud data center to a set of deployed web applications. Modern virtualization technology is the main enabler of the consolidation of many distributed services in cloud data centers. ACRA (admission control and resource allocation) is an autonomous modeling and control framework that provides accurate models and solves the admission control and resource allocation problems of web applications consolidated in cloud data centers in a unified way. Its goal is to maximize the admission of user requests to the provided service while also fulfilling the prescribed quality-of-service (QoS) requirements. The second local controller presented in this dissertation is an autonomous modeling and control framework for distributed web applications in a cloud environment, which likewise solves the admission control and resource allocation problems simultaneously and in a unified manner.
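
The local-model / state-feedback scheme described above can be sketched in a few lines: a discrete-time linear model of one consolidated service is driven toward a chosen feasible operating point by a fixed feedback gain. The matrices A, B, K and the equilibrium below are illustrative placeholders, not ACRA's identified models or its set-theoretic gain synthesis.

```python
import numpy as np

# state x = [mean response time, CPU utilisation]; input u = allocated CPU share
A = np.array([[0.7, 0.1],
              [0.0, 0.8]])
B = np.array([[-0.5],
              [ 0.6]])
K = np.array([[0.4, -0.9]])      # stabilising feedback gain (placeholder)

x_eq = np.array([0.2, 0.6])      # chosen feasible operating point (SLO reference)
u_eq = np.array([0.5])           # input sustaining it
d = x_eq - A @ x_eq - B @ u_eq   # affine term making x_eq an equilibrium

x = np.array([1.0, 0.9])         # overloaded initial condition
for _ in range(20):
    u = np.clip(u_eq + K @ (x - x_eq), 0.0, 1.0)  # feedback + actuator limits
    x = A @ x + B @ u + d                          # local linear model update
print(np.round(x, 3))  # approaches x_eq because A + B K is Schur stable
```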
39

Ur, Rahman Sumair. "Security for Rural Public Computing." Thesis, 2008. http://hdl.handle.net/10012/3986.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Current research on securing public computing infrastructure like Internet kiosks has focused on the use of smartphones to establish trust in a computing platform or to offload the processing of sensitive information, and the use of new cryptosystems such as Hierarchical Identity-based Encryption (HIBE) to protect kiosk user data. Challenges posed by rural kiosks, specifically (a) the absence of specialized hardware features such as Trusted Platform Modules (TPMs) or a modifiable BIOS in older recycled PCs, (b) the potential use of periodically disconnected links between kiosks and the Internet, (c) the absence of a production-ready implementation of HIBE and (d) the limited availability of smartphones in most developing regions make these approaches difficult, if not impossible, to implement in a rural public computing scenario. In this thesis, I present a practical, unobtrusive and easy-to-use security architecture for rural public computing that uses a combination of physical and cryptographic mechanisms to protect user data, public computing infrastructure and handheld devices that access this infrastructure. Key contributions of this work include (a) a detailed threat analysis of such systems with a particular focus on rural Internet kiosks and handheld devices, (b) a security architecture for rural public computing infrastructure that does not require any specialized hardware, (c) an application-independent and backward-compatible security API for securely sending and receiving data between these systems and the Internet that can operate over delay-tolerant links, (d) an implementation of my scheme for rural Internet kiosks and (e) a performance evaluation of this implementation to demonstrate its feasibility.
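
The flavour of an application-independent security API for delay-tolerant links, as itemized in (c), might look like the sketch below: payloads are signed by the sender and encrypted for the final recipient, so they can be stored and forwarded by untrusted kiosks and ferries. Function names are hypothetical and this is not the thesis's actual API; the sketch assumes the PyNaCl library.

```python
# Sketch: sign-then-encrypt so a message can wait on a disconnected,
# delay-tolerant link without trusting any intermediary.
from nacl.public import PrivateKey, PublicKey, SealedBox
from nacl.signing import SigningKey, VerifyKey

def secure_send(payload: bytes, sender_sk: SigningKey,
                recipient_pk: PublicKey) -> bytes:
    """Intermediaries can store and forward the blob but can neither
    read it nor undetectably modify it."""
    signed = sender_sk.sign(payload)                 # payload + signature
    return SealedBox(recipient_pk).encrypt(signed)   # only recipient can open

def secure_receive(blob: bytes, recipient_sk: PrivateKey,
                   sender_vk: VerifyKey) -> bytes:
    """Decrypt, then verify origin; raises on any tampering."""
    signed = SealedBox(recipient_sk).decrypt(blob)
    return sender_vk.verify(signed)                  # returns original payload

# Example: a kiosk user uploads a form that may sit queued for hours.
user_signing = SigningKey.generate()
server_keys = PrivateKey.generate()
blob = secure_send(b"census-form-0042", user_signing, server_keys.public_key)
assert secure_receive(blob, server_keys, user_signing.verify_key) == b"census-form-0042"
```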
40

Vieira, Pedro Miguel Lima de Jesus. "A Persistent Publish/Subscribe System for Mobile Edge Computing." Master's thesis, 2018. http://hdl.handle.net/10362/71124.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
In recent times, we have seen an incredible growth in users adopting mobile devices and wearables, and while the hardware capabilities of these devices have greatly increased year after year, mobile communications still remain a bottleneck for most applications. This is partially caused by the companies' cloud infrastructure, which effectively represents a large-scale communication hub where all kinds of platforms compete with each other for the servers' processing power and channel throughput. Additionally, wireless technologies used in mobile environments are unreliable, slow and congestion-prone by nature when compared to the wired medium. To reduce the back-and-forth mobile communication overhead, the "Edge" paradigm has recently been introduced with the aim of bringing cloud services closer to customers, by providing an intermediate layer between the end devices and the actual cloud infrastructure, resulting in faster response times. Publish/subscribe systems, such as Thyme, have also been proposed and proven effective for data dissemination at edge networks, due to the interactions' loosely coupled nature and scalability. Nonetheless, relying solely on P2P interactions is not feasible in every scenario, due to the range limitations of wireless protocols. In this thesis we propose and develop Thyme-Infrastructure, an extension to the Thyme framework that utilizes available stationary nodes within the edge infrastructure not only to improve the performance of mobile clients within a BSS, by offloading a portion of the requests to be processed by the infrastructure, but also to connect multiple clusters of users within the same venue, with the goal of creating a persistent and global end-to-end storage network. Our experimental results, both in simulated and real-world scenarios, show adequate response times for interactive usage, and low energy consumption, allowing the application to be used in a variety of events without excessive battery drain. In fact, when compared to the previous version of Thyme, our framework was generally able to improve on all of these metrics. On top of that, we evaluated our system's latencies against a full-fledged cloud solution and verified that our proposal yielded a considerable speedup across the board.
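
The persistence property described above - publications outliving the presence of their publisher and reaching subscribers that arrive later - can be modeled with a toy broker hosted on a stationary edge node. This is an intuition-level sketch, not Thyme-Infrastructure's actual design.

```python
# Sketch: a stationary edge node retains every publication, so
# subscriptions arriving later (or from another cluster) are still served.
from collections import defaultdict
from typing import Callable

class EdgeBroker:
    def __init__(self):
        self.store = defaultdict(list)   # topic -> retained publications
        self.subs = defaultdict(list)    # topic -> live subscriber callbacks

    def publish(self, topic: str, item: bytes):
        self.store[topic].append(item)   # persistence: survives client churn
        for deliver in self.subs[topic]:
            deliver(item)                # push to currently-known subscribers

    def subscribe(self, topic: str, deliver: Callable[[bytes], None]):
        self.subs[topic].append(deliver)
        for past in self.store[topic]:   # replay what was published before
            deliver(past)                # this subscriber arrived

broker = EdgeBroker()
broker.publish("stage/main", b"setlist.jpg")               # nobody listening yet
broker.subscribe("stage/main", lambda m: print("got", m))  # still delivered
```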
41

Aschenbrenner, Andreas. "Reference Framework for Distributed Repositories." Doctoral thesis, 2009. http://hdl.handle.net/11858/00-1735-0000-0006-B3CF-2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Afonso, José Pedro Serra. "Key-Value Storage for handling data in mobile devices." Master's thesis, 2019. http://hdl.handle.net/10362/92282.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
In the current era of technology, computers have shrunk to the point that more than half of the world's population always carries one with them - their mobile devices. These are used in all sorts of different activities, constantly generating information that needs to be stored or processed somewhere. To cope with the huge amounts of data generated by all of these devices, applications have resorted to cloud services to provide them with the much-needed computational and storage resources, but as these remote infrastructures still represented a communication bottleneck, a new paradigm has been emerging: Edge Computing. Instead of processing and storing all the data in more distant cloud services, the data is spread among mobile devices and edge servers connected in a shared network. In order to fully take advantage of the low latency experienced at the edge, applications still need an edge-oriented distributed storage system capable of handling the contents generated by all of these mobile devices. The current state-of-the-art storage systems can provide applications with a storage platform that uses either mobile devices or edge servers as data storing points, but none uses both. In this thesis we propose a Key-Value Edge Storage System named Basil, which uses both mobile devices and edge infrastructure as nodes of the system, capable of providing users from different locations with a cohesive and consistent distributed storage system. Furthermore, we test our KV store against existing NoSQL storage models deployed at the edge, and measure its performance while varying the number of nodes it relies on.
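
The core placement idea - one key space spanning both mobile devices and edge servers - can be sketched with a consistent-hashing ring over both kinds of nodes. Node names and the replica count are illustrative; this is not Basil's actual protocol.

```python
# Sketch: one hash ring over mobile and edge nodes, so a single
# key-value namespace spans both tiers.
import bisect
import hashlib

def h(s: str) -> int:
    return int(hashlib.sha256(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        self.ring = sorted((h(n), n) for n in nodes)

    def owners(self, key: str, replicas: int = 2):
        """First `replicas` nodes clockwise from the key's hash; an owner
        can be a phone or an edge server, whichever the ring yields."""
        points = [p for p, _ in self.ring]
        i = bisect.bisect(points, h(key)) % len(self.ring)
        return [self.ring[(i + k) % len(self.ring)][1] for k in range(replicas)]

ring = Ring(["phone-ana", "phone-rui", "edge-gateway-1", "edge-gateway-2"])
print(ring.owners("venue/photos/42"))   # e.g. a mobile node and an edge node
```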
43

Pan, Jia-Yang, and 潘家仰. "MPI Infrastructures in A Browser-Based Distributed Computing Environment." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/j9dm4f.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
碩士
國立東華大學
資訊工程學系
107
The goal of this thesis is to design and implement an MPI infrastructure in a browser-based distributed computing environment. We examine several popular distributed communication infrastructures and analyze their advantages and disadvantages. The analysis shows that MPI has many suitable features and is one of the best candidates for implementation in browser-based distributed computing environments. Hence, we design a JavaScript version of the MPI infrastructure, called MPIjs. MPIjs follows the standard MPI API and has the features of simplicity and compatibility, so that distributed programs using MPIjs can be ported easily to popular browsers such as Google Chrome and Mozilla Firefox. For validation, we implement a distributed merge sort system using MPIjs, and we conduct experiments on NASA's NAS Grid Benchmarks in three configurations: one client with one server, one client with two servers, and one client with four servers. We compare the efficiency of the same benchmarks on two different platforms: MPIjs on Google Chrome, and MPJ Express on the Java virtual machine. It is interesting to note that the communication performance of MPIjs is better than that of MPJ Express, while the computation performance of Java bytecode on the JVM is better than that of JavaScript programs on Chrome.
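
Since MPIjs follows the standard MPI API, the distributed merge sort used for validation can be sketched with any MPI binding; the sketch below uses Python's mpi4py to keep one language throughout, with illustrative data sizes.

```python
# Sketch of the distributed merge-sort pattern: scatter, sort locally,
# gather and merge. Run with e.g.: mpiexec -n 4 python merge_sort.py
import random
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

data = [random.randint(0, 999) for _ in range(1000)] if rank == 0 else None
chunk = comm.scatter([data[i::size] for i in range(size)] if rank == 0 else None,
                     root=0)
chunk.sort()                                 # each rank sorts its slice locally

def merge(a, b):
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

sorted_chunks = comm.gather(chunk, root=0)   # rank 0 merges the sorted runs
if rank == 0:
    result = []
    for c in sorted_chunks:
        result = merge(result, c)
    assert result == sorted(data)
```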
44

(9034865), Deepika Kaushal. "Bootstrapping a Private Cloud." Thesis, 2020.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Cloud computing allows on-demand provisioning, configuration and assignment of computing resources with minimum cost and effort for users and administrators. Managing the physical infrastructure that underlies cloud computing services relies on the ability to provision and manage bare-metal computer hardware; hence there is a need for the quick loading of operating systems onto bare-metal and virtual machines to service user demands. The focus of the study is on developing a technique to load these machines remotely, which is complicated by the fact that the machines can be present in different Ethernet broadcast domains, physically distant from the provisioning server. The use of available bare-metal provisioning frameworks requires significant skill and time, and there is no easily implementable standard method of booting across separate and different Ethernet broadcast domains. This study proposes a new framework, assembled from existing tools, to provision bare-metal hardware remotely and securely using layer 2 services.
45

(11132985), Thamir Qadah. "High-performant, Replicated, Queue-oriented Transaction Processing Systems on Modern Computing Infrastructures." Thesis, 2021.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
With the shifting landscape of computing hardware architectures and the emergence of new computing environments (e.g., large main-memory systems, hundreds of CPUs, distributed and virtualized cloud-based resources), state-of-the-art designs of transaction processing systems that rely on conventional wisdom miss out on performance optimization opportunities. This dissertation challenges conventional wisdom and rethinks the design and implementation of transaction processing systems for modern computing environments.

We start by tackling the vertical hardware scaling challenge and propose a deterministic approach to transaction processing on emerging multi-socket, many-core, shared-memory architectures to harness their unprecedented available parallelism. Our proposed priority-based, queue-oriented transaction processing architecture eliminates the transaction contention footprint and uses speculative execution to improve the throughput of centralized deterministic transaction processing systems. We build QueCC and demonstrate up to two orders of magnitude better performance than the state-of-the-art.
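
The queue-oriented idea can be illustrated with a toy batch of transfer transactions: a planning phase shards each transaction's operations into per-record queues tagged with the transaction's batch priority, and workers then drain the queues independently. This is an intuition-level sketch (blind writes only, no speculation), not QueCC's actual implementation.

```python
# Sketch: plan a batch into per-record operation queues, then execute each
# queue independently in priority order - no locks, no aborts.
from collections import defaultdict

accounts = {"a": 100, "b": 50, "c": 75}

# A batch of transactions; position in the batch = deterministic priority.
batch = [
    [("a", -10), ("b", +10)],   # txn 0: transfer a -> b
    [("b", -5),  ("c", +5)],    # txn 1: transfer b -> c
]

# Planning phase: shard every operation into its record's queue,
# tagged with the owning transaction's priority.
queues = defaultdict(list)
for priority, txn in enumerate(batch):
    for record, delta in txn:
        queues[record].append((priority, delta))

# Execution phase: each queue could be drained by a different worker.
# Priority order within every queue fixes the serial order globally, so
# the outcome equals running txn 0 then txn 1, without any locking.
for record, ops in queues.items():
    for _, delta in sorted(ops):
        accounts[record] += delta

assert accounts == {"a": 90, "b": 55, "c": 80}
```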

We further tackle the horizontal scaling challenge and propose a distributed queue-oriented transaction processing engine that relies on queue-oriented communication to eliminate the traditional overhead of commitment protocols for multi-partition transactions. We build Q-Store and demonstrate up to a 22x improvement in system throughput over state-of-the-art deterministic transaction processing systems.

Finally, we propose a generalized framework for designing distributed and replicated deterministic transaction processing systems. We introduce the concept of speculative replication to hide the latency overhead of replication. We prototype the speculative replication protocol in QR-Store and perform an extensive experimental evaluation using standard benchmarks. We show that QR-Store can achieve a throughput of 1.9 million replicated transactions per second in under 200 milliseconds, with a replication overhead of 8%-25% compared to non-replicated configurations.
