Dissertations / Theses on the topic 'Large-scale parallel simulations'

To see the other types of publications on this topic, follow the link: Large-scale parallel simulations.

Consult the top 32 dissertations / theses for your research on the topic 'Large-scale parallel simulations.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Benson, Kirk C. "Adaptive Control of Large-Scale Simulations." Diss., Georgia Institute of Technology, 2004. http://hdl.handle.net/1853/5002.

Full text
Abstract:
This thesis develops adaptive simulation control techniques that differentiate between competing system configurations. Here, a system is a real-world environment under analysis. In this context, proposed modifications to a system, denoted by different configurations, are evaluated using large-scale hybrid simulation. Adaptive control techniques, using ranking and selection methods, compare the relative worth of competing configurations and use these comparisons to control the number of required simulation observations. Adaptive techniques necessitate embedded statistical computations suitable for the variety of data found in detailed simulations, including hybrid and agent-based simulations. These embedded statistical computations apply efficient sampling methods to collect data from simulations running on a network of workstations. The National Airspace System provides a test case for the application of these techniques to the analysis and design of complex systems, implemented here in the Reconfigurable Flight Simulator, a large-scale hybrid simulation. Implications of these techniques for the use of simulation as a design activity are also presented.
APA, Harvard, Vancouver, ISO, and other styles
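The ranking-and-selection control loop described in this abstract can be illustrated with a minimal sequential-elimination sketch. Everything below is an illustrative assumption rather than Benson's actual procedure: `simulate` stands in for one observation of a configuration, and a configuration is dropped once its interval falls clearly below the current leader's lower bound, so worse configurations receive fewer observations.

```python
import random
import statistics

def simulate(config_mean, rng):
    """Stand-in for one observation from a large-scale simulation run
    of a given system configuration (hypothetical noise model)."""
    return rng.gauss(config_mean, 1.0)

def select_best(config_means, batch=20, rounds=5, z=2.0, seed=1):
    """Adaptively control the number of observations: sample in batches
    and eliminate configurations whose upper bound falls below the
    current leader's lower bound."""
    rng = random.Random(seed)
    alive = {i: [] for i in range(len(config_means))}
    leader = 0
    for _ in range(rounds):
        for i, obs in alive.items():
            obs.extend(simulate(config_means[i], rng) for _ in range(batch))
        means = {i: statistics.mean(o) for i, o in alive.items()}
        halfw = {i: z * statistics.stdev(o) / len(o) ** 0.5
                 for i, o in alive.items()}
        leader = max(means, key=means.get)
        cutoff = means[leader] - halfw[leader]
        alive = {i: o for i, o in alive.items()
                 if i == leader or means[i] + halfw[i] >= cutoff}
        if len(alive) == 1:
            break
    return leader, {i: len(o) for i, o in alive.items()}

best, samples_used = select_best([1.0, 1.2, 3.0])
```

With these toy means, the third configuration dominates and the others are eliminated early, so only the surviving configuration keeps accumulating observations.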
2

Pulla, Gautam. "High Performance Computing Issues in Large-Scale Molecular Statics Simulations." Thesis, Virginia Tech, 1999. http://hdl.handle.net/10919/33206.

Full text
Abstract:
Successful application of parallel high performance computing to practical problems requires overcoming several challenges. These range from the need to make sequential and parallel improvements in programs to the implementation of software tools which create an environment that aids sharing of high performance hardware resources and limits losses caused by hardware and software failures. In this thesis we describe our approach to meeting these challenges in the context of a Molecular Statics code. We describe sequential and parallel optimizations made to the code and also a suite of tools constructed to facilitate the execution of the Molecular Statics program on a network of parallel machines with the aim of increasing resource sharing, fault tolerance and availability.
Master of Science
3

Kamal, Tariq. "Computational Cost Analysis of Large-Scale Agent-Based Epidemic Simulations." Diss., Virginia Tech, 2016. http://hdl.handle.net/10919/82507.

Full text
Abstract:
Agent-based epidemic simulation (ABES) is a powerful and realistic approach for studying the impacts of disease dynamics and complex interventions on the spread of an infection in the population. Among many ABES systems, EpiSimdemics comes closest to the popular agent-based epidemic simulation systems developed by Eubank, Longini, Ferguson, and Parker. EpiSimdemics is a general framework that can model many reaction-diffusion processes besides the Susceptible-Exposed-Infectious-Recovered (SEIR) models. This model allows the study of complex systems as they interact, thus enabling researchers to model and observe socio-technical trends and forces. Pandemic planning at the world level requires simulation of over 6 billion agents, where each agent has a unique set of demographics, daily activities, and behaviors. Moreover, the stochastic nature of epidemic models, the uncertainty in the initial conditions, and the variability of reactions require the computation of several replicates of a simulation for a meaningful study. Given the hard timelines to respond, running many replicates (15-25) of several configurations (10-100) of these compute-heavy simulations is only possible on high-performance computing (HPC) clusters. These agent-based epidemic simulations are irregular and show poor execution performance on HPC clusters due to the evolutionary nature of their workload, large irregular communication, and load imbalance. For increased utilization of HPC clusters, the simulation needs to be scalable. Many challenges arise when improving the performance of agent-based epidemic simulations on high-performance clusters. First, large-scale graph-structured computation is central to the processing of these simulations, where the star-motif nodes of natural graphs create large computational imbalances and communication hotspots. Second, the computation is performed by classes of tasks that are separated by global synchronization. 
The non-overlapping computations cause idle times, which introduce load-balancing and cost-estimation challenges. Third, the computation is overlapped with communication, which is difficult to measure using simple methods, making cost estimation very challenging. Finally, the simulations are iterative, and the workload (computation and communication) may change across iterations, introducing load imbalances. This dissertation focuses on developing a cost estimation model and load balancing schemes to increase the runtime efficiency of agent-based epidemic simulations on high-performance clusters. While developing the cost model and load balancing schemes, we perform static and dynamic load analyses of such simulations. We also statically quantified the computational and communication workloads in EpiSimdemics. We designed, developed, and evaluated a cost model for estimating the execution cost of large-scale parallel agent-based epidemic simulations (and, more generally, of all constrained producer-consumer parallel algorithms). This cost model uses computational imbalances and communication latencies, and enables the cost estimation of applications where the computation is performed by classes of tasks separated by synchronization. It enables the performance analysis of parallel applications by computing their execution times on a number of partitions. Our evaluations show that the model is helpful in performance prediction, resource allocation, and the evaluation of load balancing schemes. As part of the load balancing algorithms, we adopted the Metis library for partitioning bipartite graphs. We also developed lower-overhead custom schemes called Colocation and MetColoc. We performed an evaluation of Metis, Colocation, and MetColoc. Our analysis showed that the MetColoc scheme gives performance similar to Metis, but with half the partitioning overhead (runtime and memory). 
On the other hand, the Colocation scheme achieves performance similar to Metis on a larger number of partitions, but at much lower partitioning overhead. Moreover, the memory requirements of the Colocation scheme do not increase as we create more partitions. We have also performed a dynamic load analysis of agent-based epidemic simulations. For this, we studied the individual and joint effects of three disease parameters (transmissibility, infection period, and incubation period). We quantified the effects using an analytical equation with separate constants for the SIS, SIR, and SI disease models. The metric that we have developed in this work is useful for cost estimation of constrained producer-consumer algorithms; however, it has some limitations. Its applicability is application-, machine-, and data-specific. In the future, we plan to extend the metric to increase its applicability to a larger set of machine architectures, applications, and datasets.
Ph. D.
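The cost model's central idea, that barrier-separated task classes make each phase cost as much as its slowest partition, can be caricatured in a few lines. The phase and partition numbers below are invented for illustration; the thesis's actual model also accounts for overlapped communication.

```python
def phase_cost(compute_times, comm_times):
    """One barrier-separated phase: the slowest partition's compute plus
    the slowest partition's communication bounds the phase's duration."""
    return max(compute_times) + max(comm_times)

def estimate_cost(phases):
    """phases: list of (compute_per_partition, comm_per_partition) pairs,
    one pair per task class separated by global synchronization."""
    return sum(phase_cost(c, m) for c, m in phases)

# Two partitions, two task classes (e.g. "interact" then "diffuse").
balanced   = [([5.0, 5.0], [1.0, 1.0]), ([3.0, 3.0], [0.5, 0.5])]
imbalanced = [([8.0, 2.0], [1.0, 1.0]), ([3.0, 3.0], [0.5, 0.5])]
```

Both workloads contain the same total work, but the imbalanced one yields a higher estimated execution cost because idle partitions wait at each barrier.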
4

De, Grande Robson E. "Dynamic Load Balancing Schemes for Large-scale HLA-based Simulations." Thèse, Université d'Ottawa / University of Ottawa, 2012. http://hdl.handle.net/10393/23110.

Full text
Abstract:
Dynamic balancing of computation and communication load is vital for the execution stability and performance of distributed, parallel simulations deployed on shared, unreliable resources of large-scale environments. High Level Architecture (HLA) based simulations can experience a decrease in performance due to imbalances that are produced initially and/or during run-time. These imbalances are generated by the dynamic load changes of distributed simulations or by unknown, non-managed background processes resulting from the non-dedication of shared resources. Due to the dynamic execution characteristics of elements that compose distributed simulation applications, the computational load and interaction dependencies of each simulation entity change during run-time. These dynamic changes lead to an irregular load and communication distribution, which increases overhead of resources and execution delays. A static partitioning of load is limited to deterministic applications and is incapable of predicting the dynamic changes caused by distributed applications or by external background processes. Due to the relevance in dynamically balancing load for distributed simulations, many balancing approaches have been proposed in order to offer a sub-optimal balancing solution, but they are limited to certain simulation aspects, specific to determined applications, or unaware of HLA-based simulation characteristics. Therefore, schemes for balancing the communication and computational load during the execution of distributed simulations are devised, adopting a hierarchical architecture. First, in order to enable the development of such balancing schemes, a migration technique is also employed to perform reliable and low-latency simulation load transfers. 
Then, a centralized balancing scheme is designed; this scheme employs local and cluster monitoring mechanisms in order to observe the distributed load changes and identify imbalances, and it uses load reallocation policies to determine a distribution of load and minimize imbalances. As a measure to overcome the drawbacks of this scheme, such as bottlenecks, overheads, global synchronization, and single point of failure, a distributed redistribution algorithm is designed. Extensions of the distributed balancing scheme are also developed to improve the detection of and the reaction to load imbalances. These extensions introduce communication delay detection, migration latency awareness, self-adaptation, and load oscillation prediction in the load redistribution algorithm. Such developed balancing systems successfully improved the use of shared resources and increased distributed simulations' performance.
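The monitoring-and-reallocation cycle described in this abstract can be illustrated with a minimal, purely hypothetical reallocation policy: hosts above the mean load shed their surplus to hosts below it. Migration latency, federate affinity, and the hierarchical architecture of the thesis are all ignored here.

```python
def rebalance(loads):
    """One balancing round: return a migration plan of
    (from_host, to_host, amount) triples that moves surplus load
    from overloaded hosts to underloaded ones."""
    mean = sum(loads) / len(loads)
    surplus = [[i, l - mean] for i, l in enumerate(loads) if l > mean]
    deficit = [[i, mean - l] for i, l in enumerate(loads) if l < mean]
    plan = []
    si = di = 0
    while si < len(surplus) and di < len(deficit):
        amount = min(surplus[si][1], deficit[di][1])
        plan.append((surplus[si][0], deficit[di][0], amount))
        surplus[si][1] -= amount
        deficit[di][1] -= amount
        if surplus[si][1] <= 1e-9:
            si += 1
        if deficit[di][1] <= 1e-9:
            di += 1
    return plan

plan = rebalance([8.0, 2.0, 5.0])
```

Applying the plan would leave every host at the mean load of 5.0; a real scheme would instead trade off this ideal against migration cost and load oscillation.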
5

Li, Qiang. "Simulations of turbulent boundary layers with heat transfer." Licentiate thesis, Stockholm : Skolan för teknikvetenskap, Kungliga Tekniska högskolan, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-11320.

Full text
6

Verma, Poonam Santosh. "Large Scale Computer Investigations of Non-Equilibrium Surface Growth for Surfaces From Parallel Discrete Event Simulations." MSSTATE, 2004. http://sun.library.msstate.edu/ETD-db/theses/available/etd-04192004-140532/.

Full text
Abstract:
The asymptotic scaling properties of conservative algorithms for parallel discrete-event simulations (e.g., for spatially distributed parallel simulations of dynamic Monte Carlo for spin systems) of one-dimensional systems with system size $L$ are studied. The particular case studied here is that of one or two elements assigned to each processor element. The previously studied case of one element per processor is reviewed, and the case of two elements per processor is presented. The key concept is a simulated time horizon, which is an evolving non-equilibrium surface specific to the particular algorithm. It is shown that the flat-substrate initial condition is responsible for the existence of an initial non-scaling regime. Various methods to deal with this non-scaling regime are documented, both the final successful method and unsuccessful attempts. The width of this time horizon relates to desynchronization in the system of processors. Universal properties of the conservative time horizon are derived by constructing a distribution of the interface width at saturation.
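The simulated time horizon of this abstract can be reproduced in miniature. The sketch below is an assumed toy version of the one-element-per-processor conservative rule: a processor advances its local virtual time by a random increment only when that time does not exceed either ring neighbor's, starting from the flat substrate (all times zero) that the abstract discusses; the returned width is the standard deviation of the local times.

```python
import random

def horizon_width(n_pe=100, steps=200, seed=0):
    """Evolve the virtual-time horizon of n_pe processors on a ring and
    return its final width (desynchronization measure)."""
    rng = random.Random(seed)
    t = [0.0] * n_pe  # flat-substrate initial condition
    for _ in range(steps):
        snapshot = list(t)
        for i in range(n_pe):
            left, right = snapshot[i - 1], snapshot[(i + 1) % n_pe]
            # Conservative rule: only local minima of virtual time advance.
            if snapshot[i] <= left and snapshot[i] <= right:
                t[i] += rng.expovariate(1.0)
    mean = sum(t) / n_pe
    return (sum((x - mean) ** 2 for x in t) / n_pe) ** 0.5

w = horizon_width()
```

The width grows from zero and eventually saturates at a system-size-dependent value; its saturation statistics are what the abstract's universality analysis is about.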
7

Kelling, Jeffrey [Verfasser], Sibylle [Akademischer Betreuer] Gemming, Sibylle [Gutachter] Gemming, and Martin [Gutachter] Weigel. "Efficient Parallel Monte-Carlo Simulations for Large-Scale Studies of Surface Growth Processes / Jeffrey Kelling ; Gutachter: Sibylle Gemming, Martin Weigel ; Betreuer: Sibylle Gemming." Chemnitz : Technische Universität Chemnitz, 2018. http://d-nb.info/121482109X/34.

Full text
8

Dad, Cherifa. "Méthodologie et algorithmes pour la distribution large échelle de co-simulations de systèmes complexes : application aux réseaux électriques intelligents (Smart Grids)." Electronic Thesis or Diss., CentraleSupélec, 2018. http://www.theses.fr/2018CSUP0004.

Full text
Abstract:
The emergence of Smart Grids is causing profound changes in the electricity distribution business. Indeed, these networks are seeing new uses (electric vehicles, air conditioning) and new decentralized producers (photovoltaic, wind), which make it harder to balance electricity supply and demand and require introducing a form of distributed intelligence among their components. Given the complexity and scale of a Smart Grid deployment, it must first be co-simulated in order to validate its operation. Within the RISEGrid institute, CentraleSupélec and EDF R&D have developed DACCOSIM, a co-simulation platform based on the FMI (Functional Mock-up Interface) standard, which makes it possible to design and develop large-scale Smart Grids. The key components of this platform are represented as gray boxes called FMUs (Functional Mock-up Units). In addition, the simulators of the physical systems of a Smart Grid can backtrack when an inaccuracy is suspected in FMU computations, unlike discrete-event simulators (control units), which often can only advance in time. To make these different simulators collaborate, we designed a hybrid solution that takes the constraints of all the components into account and precisely identifies the types of events the system faces. This study led to a proposal for a change to the FMI standard. Moreover, it is difficult to simulate a Smart Grid quickly and efficiently, especially when the problem has a national or even regional scale. To fill this gap, we focused on the most computationally intensive part, namely the co-simulation of the physical devices. We therefore proposed methodologies, approaches, and algorithms to distribute these FMUs quickly and efficiently over distributed architectures. The implementation of these algorithms has already made it possible to co-simulate large business cases on a cluster of multi-core PCs. 
The integration of these methods into DACCOSIM will enable EDF engineers to design very large-scale Smart Grids that are more resistant to failures.
9

Tarabay, Ranine. "Simulations des écoulements sanguins dans des réseaux vasculaires complexes." Thesis, Strasbourg, 2016. http://www.theses.fr/2016STRAD034/document.

Full text
Abstract:
Towards a large-scale 3D computational model of physiological hemodynamics, remarkable progress has been made over the past few decades in simulating blood flow in realistic anatomical models constructed from three-dimensional medical imaging data. While accurate anatomic models are of primary importance in simulating blood flow, realistic boundary conditions are equally important in computing velocity and pressure fields. Thus, the first target of this thesis was to investigate the convergence of the unknown fields for various types of boundary conditions, allowing for a flexible framework with respect to the type of input data (velocity, pressure, flow rate, ...). To deal with the associated large computational cost, which requires high-performance computing, we compared the performance of two block preconditioners: the least-squares commutator preconditioner and the pressure convection-diffusion preconditioner. We implemented the latter, in the context of this thesis, in the Feel++ library. To handle fluid-structure interaction, we focused on the approximation of the force exerted by the fluid on the structure, a field that is essential when setting the continuity condition that couples the fluid model with the structure model. Finally, to assess our numerical choices, two benchmarks (the FDA benchmark and the Phantom benchmark) were carried out, and a comparison with experimental and numerical data was established and validated.
10

Grass, Thomas. "Simulation methodologies for future large-scale parallel systems." Doctoral thesis, Universitat Politècnica de Catalunya, 2017. http://hdl.handle.net/10803/461198.

Full text
Abstract:
Since the early 2000s, computer systems have seen a transition from single-core to multi-core systems. While single-core systems included only one processor core on a chip, current multi-core processors include up to tens of cores on a single chip, a trend which is likely to continue in the future. Today, multi-core processors are ubiquitous. They are used in all classes of computing systems, ranging from low-cost mobile phones to high-end High-Performance Computing (HPC) systems. Designing future multi-core systems is a major challenge [12]. The primary design tool used by computer architects in academia and industry is architectural simulation. Simulating a computer system executing a program is typically several orders of magnitude slower than running the program on a real system. Therefore, new techniques are needed to speed up simulation and allow the exploration of large design spaces in a reasonable amount of time. One way of increasing simulation speed is sampling. Sampling reduces simulation time by simulating only a representative subset of a program in detail. In this thesis, we present a workload analysis of a set of task-based programs. We then use the insights from this study to propose TaskPoint, a sampled simulation methodology for task-based programs. Task-based programming models can reduce the synchronization costs of parallel programs on multi-core systems and are becoming increasingly important. Finally, we present MUSA, a simulation methodology for simulating applications running on thousands of cores on a hybrid, distributed shared-memory system. The simulation time required for simulation with MUSA is comparable to the time needed for native execution of the simulated program on a production HPC system. The techniques developed in the scope of this thesis permit researchers and engineers working in computer architecture to simulate large workloads, which were infeasible to simulate in the past. 
Our work enables architectural research in the fields of future large-scale shared-memory and hybrid, distributed shared-memory systems.
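The sampling idea behind the abstract, simulating only representative work in detail and extrapolating, can be sketched under a strong simplifying assumption that is not the thesis's actual method: if tasks of the same type cost the same, one detailed simulation per type suffices.

```python
def sampled_estimate(task_counts, detailed_cost):
    """Sampled-simulation sketch: run the expensive detailed model once
    per task type and scale by the number of instances, instead of
    simulating every task instance in detail.

    task_counts: maps task type -> number of instances in the workload.
    detailed_cost: callable standing in for one detailed simulation."""
    total = 0.0
    detailed_runs = 0
    for ttype, count in task_counts.items():
        cost = detailed_cost(ttype)  # one detailed simulation per type
        detailed_runs += 1
        total += cost * count        # extrapolate to all instances
    return total, detailed_runs

# Toy detailed model: cost proportional to a per-type weight (assumption).
weights = {"fft": 3.0, "gemm": 7.0}
total, runs = sampled_estimate({"fft": 100, "gemm": 50},
                               lambda t: weights[t])
```

Here 150 tasks are covered by only 2 detailed simulations; the real methodology must additionally verify that instances of a type are in fact similar enough to be represented by a sample.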
11

Sornil, Ohm. "Parallel Inverted Indices for Large-Scale, Dynamic Digital Libraries." Diss., Virginia Tech, 2001. http://hdl.handle.net/10919/26131.

Full text
Abstract:
The dramatic increase in the amount of content available in digital forms gives rise to large-scale digital libraries, targeted to support millions of users and terabytes of data. Retrieving information from a system of this scale in an efficient manner is a challenging task due to the size of the collection as well as the index. This research deals with the design and implementation of an inverted index that supports searching for information in a large-scale digital library, implemented atop a massively parallel storage system. Inverted index partitioning is studied in a simulation environment, aiming at a terabyte of text. As a result, a high performance partitioning scheme is proposed. It combines the best qualities of the term and document partitioning approaches in a new Hybrid Partitioning Scheme. Simulation experiments show that this organization provides good performance over a wide range of conditions. Further, the issues of creation and incremental updates of the index are considered. A disk-based inversion algorithm and an extensible inverted index architecture are described, and experimental results with actual collections are presented. Finally, distributed algorithms to create a parallel inverted index partitioned according to the hybrid scheme are proposed, and performance is measured on a portion of the equipment that normally makes up the 100 node Virginia Tech PetaPlex™ system. NOTE: (02/2007) An updated copy of this ETD was added after there were patron reports of problems with the file.
Ph. D.
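One common way to combine term and document partitioning, assumed here purely for illustration (the thesis's hybrid scheme differs in its details), is to split documents into groups and then split each group's postings across nodes by term, so a node holds a (document group, term group) slice of the index.

```python
def hybrid_partition(postings, n_doc_groups, n_term_groups):
    """Hybrid inverted-index partitioning sketch.

    postings: maps term -> sorted list of document ids.
    Returns a dict keyed by (doc_group, term_group) node ids, each
    holding its slice of the posting lists."""
    nodes = {(d, t): {} for d in range(n_doc_groups)
             for t in range(n_term_groups)}
    for term, docs in postings.items():
        tg = len(term) % n_term_groups  # toy term-hash (assumption)
        for doc in docs:
            dg = doc % n_doc_groups     # toy document assignment
            nodes[(dg, tg)].setdefault(term, []).append(doc)
    return nodes

postings = {"parallel": [0, 1, 2, 3], "index": [1, 3], "library": [2]}
nodes = hybrid_partition(postings, n_doc_groups=2, n_term_groups=2)
```

A query for one term now touches only the nodes in a single term group, while long posting lists are still spread across document groups, which is the load-spreading benefit the hybrid scheme aims for.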
12

Gusukuma, Luke. "GPU Based Large Scale Multi-Agent Crowd Simulation and Path Planning." Thesis, Virginia Tech, 2015. http://hdl.handle.net/10919/78098.

Full text
Abstract:
Crowd simulation is used for many applications, including (but not limited to) video games, building planning, training simulators, and various virtual-environment applications. Crowd simulation is particularly useful when real-life practice would be impractical, such as repeatedly evacuating a building, testing crowd flow for various building blueprints, or placing law enforcers in actual crowd-suppression circumstances. In our work, we approach the fidelity-versus-scalability problem of crowd simulation from two angles, a programmability angle and a scalability angle, by creating a new methodology that builds on a struct-of-arrays approach and transforms it into an Object-Oriented Struct of Arrays approach. While the design pattern itself is applied to crowd simulation in our work, the application of crowd simulation exemplifies the variety of applications for which the design pattern can be used.
Master of Science
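The Object-Oriented Struct of Arrays idea from this abstract can be sketched by contrasting it with the conventional array-of-structs layout. The class names and fields below are invented for illustration; the point is that per-field arrays keep data contiguous, as GPU kernels prefer, while methods preserve an object-style interface.

```python
# Array-of-structs: one object per agent (convenient, but each agent's
# fields are scattered across separately allocated objects).
class AgentAoS:
    def __init__(self, x, y):
        self.x, self.y = x, y

# Struct-of-arrays: one array per field; methods give the layout the
# object-oriented face of the "OO struct of arrays" idea.
class CrowdSoA:
    def __init__(self, xs, ys):
        self.xs = list(xs)
        self.ys = list(ys)

    def step(self, dx, dy):
        """Advance every agent; on a GPU this loop would map to one
        coalesced kernel over the contiguous field arrays."""
        self.xs = [x + dx for x in self.xs]
        self.ys = [y + dy for y in self.ys]

    def agent(self, i):
        """Object-style view of agent i, reconstructed from the arrays."""
        return AgentAoS(self.xs[i], self.ys[i])

crowd = CrowdSoA([0.0, 1.0, 2.0], [0.0, 0.0, 0.0])
crowd.step(0.5, 1.0)
```

Callers can still think in terms of individual agents via `agent(i)`, while the update path operates on whole field arrays, which is what makes the pattern both programmable and scalable.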
13

Kodukula, Surya Ravikiran. "An Adaptive Time Window Algorithm for Large Scale Network Emulation." Thesis, Virginia Tech, 2002. http://hdl.handle.net/10919/31160.

Full text
Abstract:
With the continuing growth of the Internet and network protocols, there is a need for Protocol Development Environments. Simulation environments like ns and OPNET require protocol code to be rewritten in a discrete event model. Direct Code Execution Environments (DCEE) solve the Verification and Validation problems by supporting the execution of unmodified protocol code in a controlled environment. Open Network Emulator (ONE) is a system supporting Direct Code Execution in a parallel environment - allowing unmodified protocol code to run on top of a parallel simulation layer, capable of simulating complex network topologies. Traditional approaches to the problem of Parallel Discrete Event Simulation (PDES) broadly fall into two categories. Conservative approaches allow processing of events only after it has been asserted that the event handling would not result in a causality error. Optimistic approaches allow for causality errors and support means of restoring state, i.e., rollback. All standard approaches to the problem of PDES are either flawed by their assumption of existing event patterns in the system or cannot be applied to ONE because their analysis is restricted to simplified models like queues and Petri nets. The Adaptive Time Window algorithm is a bounded optimistic parallel simulation algorithm with the capability to change the degree of optimism with changes in the degree of causality in the network. The optimism at any instant is bounded by an amount of virtual time called the time window. The algorithm assumes efficient rollback capabilities supported by the 'Weaves' framework. The algorithm is reactive and responds to changes in the degree of causality in the system by adjusting the length of its time window. With sufficient history gathered, the algorithm adjusts to the increasing causality in the system with a small time window (conservative approach) and increases it to a higher value (optimistic approach) during idle periods.
The problem of splitting the entire simulation run into time windows of arbitrary length, whereby the total number of rollbacks in the system is minimal, is NP-complete. The Adaptive Time Window algorithm is compared against offline greedy approaches to the NP-complete problem called Oracle Computations. The total number of rollbacks in the system and the total execution time for the Adaptive Time Window algorithm were comparable to the ones for Oracle Computations.
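The reactive policy described above, shrinking the window as rollback pressure rises and growing it during idle periods, can be sketched as follows (an illustrative policy with made-up thresholds and bounds, not ONE's actual controller):

```python
def adapt_window(window, rollbacks, events, w_min=1.0, w_max=64.0):
    """Shrink the optimism time window when the observed rollback rate is
    high; grow it when the system is quiet. Thresholds are illustrative."""
    rate = rollbacks / max(events, 1)
    if rate > 0.1:        # high causality: become more conservative
        window = max(w_min, window / 2.0)
    elif rate < 0.01:     # idle / low causality: become more optimistic
        window = min(w_max, window * 2.0)
    return window
```

A simulation layer would call this between windows, feeding in the rollback and event counts of the window just completed.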
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
14

Liu, Xing. "High-performance algorithms and software for large-scale molecular simulation." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/53487.

Full text
Abstract:
Molecular simulation is an indispensable tool in many different disciplines such as physics, biology, chemical engineering, materials science, drug design, and others. Performing large-scale molecular simulation is of great interest to biologists and chemists, because many important biological and pharmaceutical phenomena can only be observed in very large molecule systems and after sufficiently long time dynamics. On the other hand, molecular simulation methods usually have very steep computational costs, which limits current molecular simulation studies to relatively small systems. The gap between the scale of molecular simulation that existing techniques can handle and the scale of interest has become a major barrier for applying molecular simulation to study real-world problems. Studying large-scale molecular systems with molecular simulation requires developing highly parallel simulation algorithms and constantly adapting them to rapidly changing high-performance computing architectures. However, many existing algorithms and codes for molecular simulation are from more than a decade ago, when they were designed for sequential computers or early parallel architectures. They may not scale efficiently and do not fully exploit features of today's hardware. Given the rapid evolution in computer architectures, the time has come to revisit these molecular simulation algorithms and codes. In this thesis, we demonstrate our approach to addressing the computational challenges of large-scale molecular simulation by presenting both the high-performance algorithms and software for two important molecular simulation applications: Hartree-Fock (HF) calculations and hydrodynamics simulations, on highly parallel computer architectures. The algorithms and software presented in this thesis have been used by biologists and chemists to study problems that could not be solved using existing codes.
The parallel techniques and methods developed in this work can be also applied to other molecular simulation applications.
APA, Harvard, Vancouver, ISO, and other styles
15

Perumalla, Kalyan S. "Techniques for efficient parallel simulation and their application to large-scale telecommunication network models." Diss., Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/13086.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Uppala, Roshni. "Simulating Large Scale Memristor Based Crossbar for Neuromorphic Applications." University of Dayton / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1429296073.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Sun, Yi. "High Performance Simulation of DEVS Based Large Scale Cellular Space Models." Digital Archive @ GSU, 2009. http://digitalarchive.gsu.edu/cs_diss/40.

Full text
Abstract:
Cellular space modeling is becoming an increasingly important modeling paradigm for modeling complex systems with spatial-temporal behaviors. The growing demand for cellular space models has directed researchers to use different modeling formalisms, among which Discrete Event System Specification (DEVS) is widely used due to its formal modeling and simulation framework. The increasing complexity of systems to be modeled calls for cellular space models with large numbers of cells for modeling the systems' spatial-temporal behavior. Improving simulation performance becomes crucial for simulating large scale cellular space models. In this dissertation, we proposed a framework for improving simulation performance for large scale DEVS-based cellular space models. The framework has a layered structure, which includes modeling, simulation, and network layers corresponding to the DEVS-based modeling and simulation architecture. Based on this framework, we developed methods at each layer to overcome performance issues for simulating large scale cellular space models. Specifically, to increase the runtime and memory efficiency of simulating large numbers of cells, we applied Dynamic Structure DEVS (DSDEVS) to cellular space modeling and carried out comprehensive performance measurement. DSDEVS improves simulation performance by making the simulation focus only on those active models, and thus be more efficient than when the entire cellular space is loaded. To reduce the number of simulation cycles caused by extensive message passing among cells, we developed a pre-schedule modeling approach that exploits the model behavior for improving simulation performance. At the network layer, we developed a modified time-warp algorithm that supports parallel simulation of DEVS-based cellular space models. The developed methods have been applied to large scale wildfire spread simulations based on the DEVS-FIRE simulation environment and have achieved significant performance results.
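The time-warp layer rests on the standard optimistic mechanism: checkpoint state before each event, and roll back to the last safe checkpoint when a straggler (an event with a timestamp in the local past) arrives. A minimal single-process sketch of that mechanism (illustrative only, not the DEVS-FIRE implementation):

```python
class TimeWarpLP:
    """Minimal optimistic logical process: state is saved before every
    event so a straggler can trigger a rollback and re-execution."""

    def __init__(self, state=0):
        self.clock = 0.0
        self.state = state
        self.log = []                  # (timestamp, delta, state_before)

    def process(self, ts, delta):
        if ts < self.clock:            # straggler detected
            redo = self.rollback(ts)   # undo optimistic work past ts
            self.apply(ts, delta)
            for ev in redo:            # re-execute rolled-back events
                self.apply(*ev)
        else:
            self.apply(ts, delta)

    def apply(self, ts, delta):
        self.log.append((ts, delta, self.state))  # checkpoint, then act
        self.clock = ts
        self.state += delta

    def rollback(self, ts):
        redo = []
        while self.log and self.log[-1][0] >= ts:
            ev_ts, delta, before = self.log.pop()
            self.state = before        # restore checkpointed state
            redo.append((ev_ts, delta))
        self.clock = self.log[-1][0] if self.log else 0.0
        return list(reversed(redo))
```

In a real parallel run, rollback would also send anti-messages to cancel outputs produced by the undone events; that bookkeeping is omitted here.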
APA, Harvard, Vancouver, ISO, and other styles
18

Ahn, Tae-Hyuk. "Computational Techniques for the Analysis of Large Scale Biological Systems." Diss., Virginia Tech, 2012. http://hdl.handle.net/10919/77162.

Full text
Abstract:
An accelerated pace of discovery in biological sciences is made possible by a new generation of computational biology and bioinformatics tools. In this dissertation we develop novel computational, analytical, and high-performance simulation techniques for biological problems, with applications to the yeast cell division cycle, and to the RNA-Sequencing of the yellow fever mosquito. The cell cycle system exhibits stochastic effects when small numbers of molecules react with each other. Consequently, the stochastic effects of the cell cycle are important, and the evolution of cells is best described statistically. The stochastic simulation algorithm (SSA), the standard stochastic method for chemical kinetics, is often slow because it accounts for every individual reaction event. This work develops a stochastic version of a deterministic cell cycle model, in order to capture the stochastic aspects of the evolution of the budding yeast wild-type and mutant strain cells. In order to efficiently run large ensembles to compute statistics of cell evolution, the dissertation investigates parallel simulation strategies, and presents a new probabilistic framework to analyze the performance of dynamic load balancing algorithms. This work also proposes new accelerated stochastic simulation algorithms based on a fully implicit approach and on stochastic Taylor expansions. Next Generation RNA-Sequencing, a high-throughput technology to sequence cDNA in order to get information about a sample's RNA content, is becoming an efficient genomic approach to uncover new genes and to study gene expression and alternative splicing. This dissertation develops efficient algorithms and strategies to find new genes in Aedes aegypti, which is the most important vector of dengue fever and yellow fever. We report the discovery of a large number of new gene transcripts, and the identification and characterization of genes that showed male-biased expression profiles.
This basic information may open important avenues to control mosquito borne infectious diseases.
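The SSA mentioned above is Gillespie's direct method; for a single decay reaction it reduces to a few lines (a generic textbook sketch, not the dissertation's accelerated variants):

```python
import math
import random

def ssa_decay(x0, k, t_end, rng):
    """Gillespie direct method for the single reaction X -> 0 with rate
    constant k: the propensity is a = k * x, the waiting time to the next
    firing is exponentially distributed with mean 1/a."""
    t, x = 0.0, x0
    while x > 0:
        a = k * x
        t += -math.log(rng.random()) / a   # sample exponential waiting time
        if t > t_end:
            break
        x -= 1                             # fire the reaction
    return x
```

Averaged over a large ensemble, the simulated copy number agrees with the deterministic solution x0 * exp(-k * t); the accelerated algorithms in the dissertation aim to reach such statistics without simulating every reaction event.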
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
19

Zönnchen, Benedikt Sebastian [Verfasser], Hans-Joachim [Akademischer Betreuer] Bungartz, Hans-Joachim [Gutachter] Bungartz, and Gerta [Gutachter] Köster. "Efficient parallel algorithms for large-scale pedestrian simulation / Benedikt Sebastian Zönnchen ; Gutachter: Hans-Joachim Bungartz, Gerta Köster ; Betreuer: Hans-Joachim Bungartz." München : Universitätsbibliothek der TU München, 2021. http://d-nb.info/1237048850/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Tiew, Chin-Yaw. "On improving the performance of parallel fault simulation for synchronous sequential circuits." Thesis, This resource online, 1993. http://scholar.lib.vt.edu/theses/available/etd-03042009-040323/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Manalo, Kevin. "Detailed analysis of phase space effects in fuel burnup/depletion for PWR assembly & full core models using large-scale parallel computation." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/50351.

Full text
Abstract:
Nuclear nonproliferation research and forensics need improved software solutions, particularly in the estimates of the transmutation of nuclear fuel during burnup and depletion. At the same time, parallel computers have become effectively sized to enable full core simulations using highly detailed 3d mesh models. In this work, the capability for modeling 3d reactor models is researched with PENBURN, a burnup/depletion code that couples to the PENTRAN Parallel Sn Transport Solver and also to the Monte Carlo solver MCNP5 using the multigroup option. This research is computationally focused, but also compares a subset of results against experimental Pressurized Water Reactor (PWR) burnup spectroscopy data available for a designated BR3 PWR burnup benchmark. This research also analyzes large-scale Cartesian mesh models that can feasibly be modeled for 3d burnup, and investigates improved finite-differencing schemes used in parallel discrete ordinates transport with PENTRAN, in order to optimize runtimes for full core transport simulation and provide comparative results with Monte Carlo simulations. Further, the research considers software improvements through parallelization, further improving large-model simulation using hybrid OpenMP-MPI. The core simulations that form the basis of this research, utilizing discrete ordinates methods and Monte Carlo methods to drive time- and space-dependent isotopic reactor production using the PENBURN code, will provide more accurate detail of fuel compositions that can benefit nuclear safety, fuel management, non-proliferation, and safeguards applications.
APA, Harvard, Vancouver, ISO, and other styles
22

Lannez, Sébastien. "Optimisation des tournées d'inspection des voies." Phd thesis, INSA de Toulouse, 2010. http://tel.archives-ouvertes.fr/tel-00595070.

Full text
Abstract:
SNCF uses several specialized vehicles to inspect rails for internal cracks. The inspection frequency for each rail depends on the cumulative tonnage that passes over it. The scheduling of the ultrasonic inspection vehicles is currently decentralized. As part of a reorganization study, SNCF wishes to assess the feasibility of optimizing certain inspection routes. This doctoral thesis studies the optimization of ultrasonic inspection vehicle scheduling. A mathematical model is proposed, formulated as an arc routing problem that generalizes several academic problems. An exact solution method applying Benders decomposition is detailed. Building on this approach, a column-and-constraint generation heuristic is presented and evaluated numerically on real data from 2009. Finally, an industrial software tool developed around this approach is presented.
APA, Harvard, Vancouver, ISO, and other styles
23

Ameli, Mostafa. "Heuristic Methods for Calculating Dynamic Traffic Assignment Simulation-based dynamic traffic assignment: meta-heuristic solution methods with parallel computing Non-unicity of day-to-day multimodal user equilibrium: the network design history effect Improving traffic network performance with road banning strategy: a simulation approach comparing user equilibrium and system optimum." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSET009.

Full text
Abstract:
Transport systems are dynamically characterized not only by nonlinear interactions between the different components but also by feedback loops between the state of the network and the decisions of users. In particular, network congestion affects both the distribution of local demand, by modifying route choices, and overall multimodal demand. Depending on the conditions of the network, users may decide to change, for example, their transportation mode. Several equilibria can be defined for transportation systems. The user equilibrium corresponds to the situation where each user is allowed to behave selfishly and to minimize his own travel costs. The system optimum corresponds to a situation where the total transport cost of all the users is minimum. In this context, the study aims to calculate route flow patterns in a network considering different equilibrium conditions and to study the network equilibrium in a dynamic setting. The study focuses on traffic models capable of representing large-scale urban traffic dynamics. Three main issues are addressed. First, fast heuristic and meta-heuristic methods are developed to determine equilibria with different types of traffic patterns. Second, the existence and uniqueness of user equilibria are studied. When there is no uniqueness, the relationship between multiple equilibria is examined. Moreover, the impact of network history is analyzed. Third, a new approach is developed to analyze the network equilibrium as a function of the level of demand. This approach compares user and system optima and aims to design control strategies in order to move the user equilibrium situation towards the system optimum.
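The gap between user equilibrium and system optimum can be made concrete with the classic two-route Pigou example (a standard illustration, not a model from the thesis): with one unit of demand, route A costs x (equal to its own flow) and route B costs a constant 1.

```python
def total_cost(x):
    """Pigou network: route A costs x (congestible), route B costs 1.
    Demand is 1 unit; x is the flow on route A."""
    return x * x + (1.0 - x) * 1.0

def user_equilibrium():
    """UE: no user can lower their own cost by switching routes.
    Route A's cost x must equal route B's cost 1, so x = 1."""
    return 1.0

def system_optimum(steps=10001):
    """SO: grid-search the split that minimizes total cost
    (analytically, d/dx of x^2 + (1 - x) gives x = 1/2)."""
    return min((total_cost(i / (steps - 1)), i / (steps - 1))
               for i in range(steps))[1]
```

At user equilibrium everyone crowds onto route A for a total cost of 1, while the system optimum splits the flow evenly for a total cost of 0.75: exactly the inefficiency that the control strategies in the thesis aim to recover.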
APA, Harvard, Vancouver, ISO, and other styles
24

Malakar, Preeti. "Integrated Parallel Simulations and Visualization for Large-Scale Weather Applications." Thesis, 2013. http://etd.iisc.ernet.in/2005/3907.

Full text
Abstract:
The emergence of the exascale era necessitates development of new techniques to efficiently perform high-performance scientific simulations, online data analysis and on-the-fly visualization. Critical applications like cyclone tracking and earthquake modeling require high-fidelity and high-performance simulations involving large-scale computations and generate huge amounts of data. Faster simulations and simultaneous online data analysis and visualization enable scientists to provide real-time guidance to policy makers. In this thesis, we present a set of techniques for efficient high-fidelity simulations, online data analysis and visualization in environments with varying resource configurations. First, we present a strategy for improving throughput of weather simulations with multiple regions of interest. We propose parallel execution of these nested simulations based on partitioning the 2D process grid into disjoint rectangular regions associated with each subdomain. The process grid partitioning is obtained from a Huffman tree which is constructed from the relative execution times of the subdomains. We propose a novel combination of performance prediction, processor allocation methods and topology-aware mapping of the regions on torus interconnects. We observe up to 33% gain over the default strategy in weather models. Second, we propose a processor reallocation heuristic that minimizes data redistribution cost while reallocating processors in the case of dynamic regions of interest. This algorithm is based on a hierarchical diffusion approach that uses a novel tree reorganization strategy. We have also developed a parallel data analysis algorithm to detect regions of interest within a domain. This helps improve performance of detailed simulations of multiple weather phenomena like depressions and clouds, thereby increasing the lead time to severe weather phenomena like tornadoes and storm surges.
Our method is able to reduce the redistribution time by 25% over a simple partition-from-scratch method. We also show that it is important to consider resource constraints like I/O bandwidth, disk space and network bandwidth for continuous simulation and smooth visualization. High simulation rates on modern-day processors combined with high I/O bandwidth can lead to rapid accumulation of data at the simulation site and eventual stalling of simulations. We show that formulating the problem as an optimization problem can determine optimal execution parameters for enabling smooth simulation and visualization. This approach proves beneficial for resource-constrained environments, whereas a naive greedy strategy leads to stalling and disk overflow. Our optimization method provides about 30% higher simulation rate and consumes about 25-50% less storage space than a naive greedy approach. We have then developed an integrated adaptive steering framework, InSt, that analyzes the combined effect of user-driven steering with automatic tuning of application parameters based on resource constraints and the criticality needs of the application to determine the final parameters for the simulations. It is important to allow the climate scientists to steer the ongoing simulation, especially in the case of critical applications. InSt takes into account both the steering inputs of the scientists and the criticality needs of the application. Finally, we have developed algorithms to minimize the lag between the time when the simulation produces an output frame and the time when the frame is visualized. It is important to reduce the lag so that the scientists can get an on-the-fly view of the simulation, and concurrently visualize important events in the simulation. We present most-recent, auto-clustering and adaptive algorithms for reducing lag.
The lag-reduction algorithms adapt to the available resource parameters and the number of pending frames to be sent to the visualization site by transferring a representative subset of frames. Our adaptive algorithm reduces lag by 72% and provides 37% larger representativeness than the most-recent algorithm for slow networks.
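The Huffman-tree allocation step can be sketched as follows (a simplified illustration of the idea, not the thesis code): repeatedly merge the two cheapest subdomains, then split the processor pool down the tree in proportion to subtree weights, so expensive subdomains receive proportionally more processors.

```python
import heapq
from itertools import count

def huffman_allocate(times, total_procs):
    """Build a Huffman tree from subdomain execution times, then walk it,
    dividing the processor pool in proportion to subtree weights."""
    tie = count()                         # tiebreaker: never compare trees
    heap = [(t, next(tie), name) for name, t in times.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), ((w1, a), (w2, b))))
    _, _, root = heap[0]

    alloc = {}
    def walk(node, procs):
        if isinstance(node, str):         # leaf: a subdomain name
            alloc[node] = max(1, procs)
            return
        (wa, a), (wb, b) = node           # internal: split by weight ratio
        pa = min(max(round(procs * wa / (wa + wb)), 1), procs - 1)
        walk(a, pa)
        walk(b, procs - pa)

    walk(root, total_procs)
    return alloc
```

In the thesis the leaves of such a tree also determine the rectangular regions of the 2D process grid; here only the processor-count split is shown.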
APA, Harvard, Vancouver, ISO, and other styles
25

LIN, JUN-XIONG, and 林俊雄. "A parallel simulation model and its synchronization protocol for large scale discrete event simulations." Thesis, 1987. http://ndltd.ncl.edu.tw/handle/71848256931584644343.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Kelling, Jeffrey. "Efficient Parallel Monte-Carlo Simulations for Large-Scale Studies of Surface Growth Processes." 2017. https://monarch.qucosa.de/id/qucosa%3A31220.

Full text
Abstract:
Lattice Monte Carlo methods are used to investigate far-from- and out-of-equilibrium systems, including surface growth, spin systems and solid mixtures. Applications range from the determination of universal growth or aging behaviors to palpable systems, where coarsening of nanocomposites or self-organization of functional nanostructures are of interest. Such studies require observations of large systems over long time scales, to allow structures to grow over orders of magnitude, which necessitates massively parallel simulations. This work addresses the problem of parallel processing introducing correlations in Monte Carlo updates and proposes a virtually correlation-free domain decomposition scheme to solve it. The effect of correlations on scaling and dynamical properties of surface growth systems and related lattice gases is investigated further by comparing results obtained by correlation-free and intrinsically correlated but highly efficient simulations using a stochastic cellular automaton (SCA). Efficient massively parallel implementations on graphics processing units (GPUs) were developed, which enable large-scale simulations leading to unprecedented precision in the final results. The primary subject of study is the Kardar–Parisi–Zhang (KPZ) surface growth in (2 + 1) dimensions, which is simulated using a dimer lattice gas and the restricted solid-on-solid (RSOS) model. Using extensive simulations, conjectures regarding growth, autocorrelation and autoresponse properties are tested and new precise numerical predictions for several universal parameters are made.
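The double-tiling decomposition with random origin ("DTr") evaluated in this work can be illustrated in one dimension (a schematic sketch, not the GPU implementation): each tile is split into two halves, halves of the same parity are updated concurrently so that active regions never touch, and the origin is re-drawn each sweep so tile borders do not freeze in place and introduce correlations.

```python
import random

def dtr_phases(L, tile, rng):
    """1-D double tiling with random origin: return, for one sweep, the
    two phases of active half-tiles. Within a phase, active blocks are
    separated by an inactive half-tile, so they can be updated in
    parallel without conflicting accesses to shared boundary sites."""
    assert L % tile == 0 and tile % 2 == 0
    origin = rng.randrange(tile)       # re-drawn every sweep (the 'r' in DTr)
    half = tile // 2
    phases = ([], [])
    for t in range(L // tile):
        base = (t * tile + origin) % L
        phases[0].append([(base + i) % L for i in range(half)])
        phases[1].append([(base + half + i) % L for i in range(half)])
    return phases
```

Each block in a phase would be handed to one thread block on the GPU; the two phases are executed one after the other, and together they touch every lattice site exactly once per sweep.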
APA, Harvard, Vancouver, ISO, and other styles
27

Zhang, Keni, George J. Moridis, Yu-Shu Wu, and Karsten Pruess. "A DOMAIN DECOMPOSITION APPROACH FOR LARGE-SCALE SIMULATIONS OF FLOW PROCESSES IN HYDRATE-BEARING GEOLOGIC MEDIA." 2008. http://hdl.handle.net/2429/1166.

Full text
Abstract:
Simulation of the system behavior of hydrate-bearing geologic media involves solving fully coupled mass- and heat-balance equations. In this study, we develop a domain decomposition approach for large-scale gas hydrate simulations with coarse-granularity parallel computation. This approach partitions a simulation domain into small subdomains; the full model domain, consisting of these discrete subdomains, is still simulated simultaneously using multiple processes/processors. Each processor is dedicated to the following tasks for its partitioned subdomain: updating thermophysical properties, assembling mass- and energy-balance equations, solving linear equation systems, and performing various other local computations. The linearized equation systems are solved in parallel with a parallel linear solver, using an efficient interprocess communication scheme. The new domain decomposition approach has been implemented in the TOUGH+HYDRATE code and has demonstrated excellent speedup and good scalability. In this paper, we demonstrate applications of the new approach in simulating field-scale models for gas production from gas-hydrate deposits.
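The partitioning step described in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only: the actual implementation lives in TOUGH+HYDRATE with MPI-based communication, and the function names here (`partition_1d`, `ghost_cells`) are hypothetical, reduced to a 1-D cell numbering.

```python
def partition_1d(n_cells, n_procs):
    """Split n_cells grid cells into n_procs contiguous subdomains,
    spreading the remainder so sizes differ by at most one cell."""
    base, rem = divmod(n_cells, n_procs)
    bounds = []
    start = 0
    for p in range(n_procs):
        size = base + (1 if p < rem else 0)
        bounds.append((start, start + size))  # half-open interval [start, stop)
        start += size
    return bounds

def ghost_cells(bounds, rank):
    """Indices a subdomain must receive from its neighbours so it can
    assemble balance equations for cells on its own boundary."""
    start, stop = bounds[rank]
    ghosts = []
    if rank > 0:
        ghosts.append(start - 1)   # last cell of the left neighbour
    if rank < len(bounds) - 1:
        ghosts.append(stop)        # first cell of the right neighbour
    return ghosts
```

In the real simulator each process would assemble mass- and energy-balance equations for its own cells and exchange the ghost values with its neighbours at every Newton iteration.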
28

Wang, Mingchao. "Large-Scale Simulation of Neural Networks with Biophysically Accurate Models on Graphics Processors." Thesis, 2012. http://hdl.handle.net/1969.1/ETD-TAMU-2012-05-11161.

Full text
Abstract:
Efficient simulation of large-scale mammalian brain models provides a crucial computational means for understanding complex brain functions and neuronal dynamics. However, such tasks are hindered by significant computational complexity. In this work, we attempt to address the significant computational challenge in simulating large-scale neural networks based on the most biophysically accurate Hodgkin-Huxley (HH) neuron models. Unlike simpler phenomenological spiking models, the use of HH models allows one to directly associate the observed network dynamics with the underlying biological and physiological causes, but at a significantly higher computational cost. We exploit recent commodity massively parallel graphics processors (GPUs) to alleviate the significant computational cost of HH model based neural network simulation. We develop look-up table based HH model evaluation and efficient parallel implementation strategies geared towards higher arithmetic intensity and minimal thread divergence. Furthermore, we adopt and develop advanced multi-level numerical integration techniques well suited for the intricate dynamical and stability characteristics of HH models. On a commodity GPU card with 240 streaming processors, for a neural network with one million neurons and 200 million synaptic connections, the presented GPU neural network simulator is about 600X faster than a basic serial CPU based simulator, 28X faster than the CPU implementation of the proposed techniques, and only two to three times slower than the GPU based simulation using simpler spiking models.
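The look-up-table idea mentioned above, replacing repeated transcendental evaluations of the HH rate functions with interpolation into a pre-computed table, can be sketched as follows. This is not the authors' CUDA code; `RateTable` and its grid parameters are assumptions for illustration, applied to the standard HH potassium activation rate.

```python
import math

def alpha_n(v):
    """Hodgkin-Huxley potassium activation rate (standard form, ms^-1, v in mV)."""
    if abs(v + 55.0) < 1e-9:   # removable singularity at v = -55 mV
        return 0.1
    return 0.01 * (v + 55.0) / (1.0 - math.exp(-(v + 55.0) / 10.0))

class RateTable:
    """Pre-tabulate a rate function on a uniform voltage grid; at run time
    replace the exp() call with a linear interpolation (the GPU-friendly part)."""
    def __init__(self, fn, v_min=-100.0, v_max=50.0, n=3000):
        self.v_min = v_min
        self.dv = (v_max - v_min) / (n - 1)
        self.tab = [fn(v_min + i * self.dv) for i in range(n)]

    def __call__(self, v):
        x = (v - self.v_min) / self.dv
        i = min(max(int(x), 0), len(self.tab) - 2)   # clamp to table range
        f = x - i
        return (1.0 - f) * self.tab[i] + f * self.tab[i + 1]
```

With a few thousand table entries the interpolation error is far below the accuracy of the integration scheme, while the per-evaluation cost becomes a memory fetch plus a fused multiply-add.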
29

Hu, Jingzhen. "Biophysically Accurate Brain Modeling and Simulation using Hybrid MPI/OpenMP Parallel Processing." Thesis, 2012. http://hdl.handle.net/1969.1/ETD-TAMU-2012-05-11226.

Full text
Abstract:
In order to better understand the behavior of the human brain, it is very important to perform large-scale neural network simulations, which may reveal the relationship between whole-network activity and the biophysical dynamics of individual neurons. However, considering the complexity of the network and the large number of variables, researchers choose to either simulate smaller neural networks or use simple spiking neuron models. Recently, supercomputing platforms have been employed to greatly speed up the simulation of large brain models. However, these works still have limitations, such as the simplicity of the modeled network structures and the lack of biophysical detail in the neuron models. In this work, we propose a parallel simulator using biophysically realistic neural models for the simulation of large-scale neural networks. In order to improve the performance of the simulator, we adopt several techniques, such as merging linear synaptic receptors mathematically and using two-level time steps, which significantly accelerate the simulation. In addition, we exploit the efficiency of parallel simulation through three parallel implementation strategies: MPI parallelization, MPI parallelization with dynamic load balancing schemes, and hybrid MPI/OpenMP parallelization. Through experimental studies, we illustrate the limitation of the MPI implementation due to imbalanced workload among processors. It is shown that the two developed MPI load balancing schemes are not able to improve the simulation efficiency on the targeted parallel platform. Using 32 processors, the proposed hybrid approach, on the other hand, is more efficient than the MPI implementation and is about 31X faster than a serial implementation of the simulator for a network consisting of more than 100,000 neurons. Finally, it is shown that for large neural networks, the presented approach is able to simulate the transition from the 3 Hz delta oscillation to epileptic behaviors due to alterations of underlying cellular mechanisms.
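The receptor-merging technique exploits linearity: a first-order synaptic state s' = -s/τ + w·δ(t - t_spike) is linear in its inputs, so any number of receptors sharing the same time constant can be collapsed into a single state variable. A minimal sketch of that equivalence (the exact formulation in the thesis may differ; `step` and `simulate` are hypothetical names):

```python
import math

def step(s, dt, tau, spike_weight=0.0):
    """Exact exponential update of a first-order synaptic state over dt,
    adding an instantaneous jump for any spike arriving at the step start."""
    return (s + spike_weight) * math.exp(-dt / tau)

def simulate(weights_per_step, dt, tau, merged):
    """weights_per_step: per-step lists of incoming spike weights.
    merged=True keeps one state; merged=False keeps one state per receptor."""
    if merged:
        s, out = 0.0, []
        for ws in weights_per_step:
            s = step(s, dt, tau, sum(ws))   # all receptors share one state
            out.append(s)
        return out
    states, out = [0.0] * max(len(ws) for ws in weights_per_step), []
    for ws in weights_per_step:
        ws = list(ws) + [0.0] * (len(states) - len(ws))
        states = [step(s, dt, tau, w) for s, w in zip(states, ws)]
        out.append(sum(states))             # total conductance across receptors
    return out
```

Because the dynamics are linear, the merged trajectory matches the sum of the individual receptor trajectories, while the simulator integrates one ODE instead of N.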
30

Jalili-Marandi, Vahid. "Acceleration of Transient Stability Simulation for Large-Scale Power Systems on Parallel and Distributed Hardware." Phd thesis, 2010. http://hdl.handle.net/10048/1266.

Full text
Abstract:
Transient stability analysis is necessary for the planning, operation, and control of power systems. However, its mathematical modeling and time-domain solution are computationally onerous and have attracted the attention of power systems experts and simulation specialists for decades. The ultimate promised goal has always been to perform this simulation as fast as real-time for realistic-sized systems. In this thesis, methods to speed up transient stability simulation for large-scale power systems are investigated. The research reported in this thesis can be divided into two parts. First, real-time simulation on a general-purpose simulator composed of CPU-based computational nodes is considered. A novel approach called Instantaneous Relaxation (IR) is proposed for real-time transient stability simulation on such a simulator. The motivation for this technique comes from the inherent parallelism in the transient stability problem, which allows a coarse-grained decomposition of the resulting system equations. Comparison of the real-time results with the off-line results shows both the accuracy and the efficiency of the proposed method. In the second part of this thesis, Graphics Processing Units (GPUs) are used for the first time for the transient stability simulation of power systems. Data-parallel programming techniques are used on the single-instruction multiple-data (SIMD) architecture of the GPU to implement the transient stability simulations. Several test cases of varying sizes are used to investigate the GPU-based simulation. The simulation results reveal the obvious advantage of using GPUs instead of CPUs for large-scale problems. Part two of this thesis continues with an investigation of multiple GPUs running in parallel. Two different parallel-processing-based techniques are implemented: the IR method, and an incomplete LU factorization based approach. Practical information is provided on how to use multi-threaded programming to manage multiple GPUs running simultaneously for the implementation of the transient stability simulation. The implementation of the IR method on multiple GPUs is the intersection of data parallelism and program-level parallelism, which makes possible the simulation of very large-scale systems with 7020 buses and 1800 synchronous generators.
Energy Systems
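The specifics of the IR method are in the thesis itself; the underlying idea of a coarse-grained decomposition, where each subsystem is solved using the neighbouring subsystem's values from the previous iteration so that the solves can proceed concurrently, can be illustrated with a toy Gauss-Jacobi relaxation on a 2x2 linear system (all names and numbers here are illustrative, not the thesis formulation):

```python
def relax_two_subsystems(a11, c12, c21, a22, b1, b2, sweeps=100):
    """Gauss-Jacobi relaxation: each 'subsystem' solves its own equation
    using the neighbour's value from the previous sweep, so the two solves
    could run concurrently on separate processors."""
    x1 = x2 = 0.0
    for _ in range(sweeps):
        # simultaneous update: both right-hand sides use last sweep's values
        x1, x2 = (b1 - c12 * x2) / a11, (b2 - c21 * x1) / a22
    return x1, x2
```

For diagonally dominant coupling, the iteration converges to the solution of the coupled system while each subsystem only ever solves its own equations, which is what makes the decomposition attractive for real-time simulation on parallel nodes.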
31

Feng, Zhuo. "Modeling and Analysis of Large-Scale On-Chip Interconnects." 2009. http://hdl.handle.net/1969.1/ETD-TAMU-2009-12-7142.

Full text
Abstract:
As IC technologies scale to the nanometer regime, efficient and accurate modeling and analysis of VLSI systems with billions of transistors and interconnects becomes increasingly critical and difficult. VLSI systems impacted by increasingly high-dimensional process-voltage-temperature (PVT) variations demand much more modeling and analysis effort than ever before, while the analysis of large-scale on-chip interconnects, which requires solving tens of millions of unknowns, imposes great challenges in computer-aided design. This dissertation presents new methodologies for addressing these two important challenges in large-scale on-chip interconnect modeling and analysis. In the past, standard statistical circuit modeling techniques have usually employed principal component analysis (PCA) and its variants to reduce parameter dimensionality. Although widely adopted, these techniques can be very limited, since parameter dimension reduction is achieved by merely considering the statistical distributions of the controlling parameters while neglecting the important correspondence between these parameters and the circuit performances (responses) under modeling. This dissertation presents a variety of performance-oriented parameter dimension reduction methods that can lead to more than one order of magnitude parameter reduction for a variety of VLSI circuit modeling and analysis problems. The sheer size of present-day power/ground distribution networks makes their analysis and verification tasks extremely runtime- and memory-inefficient, and at the same time limits the extent to which these networks can be optimized. Given today's commodity graphics processing units (GPUs), which deliver more than 500 GFlops (Flops: floating point operations per second) of computing power and 100 GB/s of memory bandwidth, more than 10X greater than offered by modern general-purpose quad-core microprocessors, it is very desirable to convert this impressive GPU computing power into usable design automation tools for VLSI verification. In this dissertation, for the first time, we show how to exploit recent massively parallel single-instruction multiple-thread (SIMT) based graphics processing unit (GPU) platforms to tackle power grid analysis with very promising performance. Our GPU-based network analyzer is capable of solving tens of millions of power grid nodes in just a few seconds. Additionally, with the above GPU-based simulation framework, the more challenging three-dimensional full-chip thermal analysis can be solved much more efficiently than ever before.
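Power grid analysis of the kind described reduces to solving G·v = i with a symmetric positive-definite conductance matrix. The dissertation's solver is not reproduced here; as an illustration of why such solves map well to GPUs, the following sketch applies textbook conjugate gradient, whose inner loop is nothing but matrix-vector products, dot products, and axpy updates, to a small resistive mesh. The `g_pad` leakage conductance to ground is an assumption that keeps the toy matrix nonsingular.

```python
def grid_matvec(v, n, g=1.0, g_pad=0.1):
    """y = G v for an n-by-n resistive mesh: conductance g between
    neighbouring nodes plus conductance g_pad from every node to ground."""
    y = [0.0] * (n * n)
    for r in range(n):
        for c in range(n):
            k = r * n + c
            acc = g_pad * v[k]
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n and 0 <= cc < n:
                    acc += g * (v[k] - v[rr * n + cc])
            y[k] = acc
    return y

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=500):
    """Textbook CG; every operation is a matvec, dot product, or axpy,
    all of which vectorize naturally on SIMT hardware."""
    x = [0.0] * len(b)
    r = list(b)           # residual b - A x with x = 0
    p = list(r)
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        ap = matvec(p)
        alpha = rs / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol * tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

A production power grid solver would add a preconditioner and exploit sparsity, but the data-parallel skeleton is the same.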
32

Heinze, Georg. "Molekulardynamische Simulation der Oxidation dünner Siliziumnanodrähte: Einfluss von Draht- und Prozessparametern auf die Struktur." 2018. https://monarch.qucosa.de/id/qucosa%3A32834.

Full text
Abstract:
Owing to their excellent electrostatic controllability, silicon nanowires (SiNWs) are a good basis for the development of novel devices such as reconfigurable field-effect transistors (RFETs). Since oxidation can be used to strain SiNWs in a targeted way, and this strain alters the band structure of the silicon, the oxidation process offers a means of modulating the conduction properties of RFETs and obtaining a symmetric transfer characteristic. Studying SiNWs with diameters in the single-digit nanometre range requires an atomistic approach. In the present work, the initial phase of the oxidation of thin SiNWs is simulated with molecular dynamics using a reactive force field. The investigations cover the temperature dependence of the oxidation of <110> SiNWs with initial radii of 10.2 Å as well as the oxidation behaviour of <110> and <100> SiNWs with initial radii of 5.1 Å. Besides the oxygen fraction in the simulation system and the radially resolved density, the radially resolved ratio of oxygen to silicon atoms is tracked over the entire simulation time, and a correlation with the density is established. Furthermore, at 300 K, the evolution of strain during the initial oxidation phase is analysed for the first time; for both <110> and <100> SiNWs, a tensile strain develops in the unoxidized wire core. As an analysis of the partial radial distribution function shows, this strain arises because the basic structure of the silicon is preserved in the oxide during oxidation, while the incorporation of oxygen increases the bond distance. Through bonds to silicon atoms in the oxide, this increased bond distance is also imposed on silicon atoms in the unoxidized core.
Contents: List of Figures, List of Tables, List of Abbreviations, List of Symbols
1. Introduction
2. Theoretical Background 2.1. Molecular Dynamics 2.2. Silicon Nanowires 2.3. Strain and Stress
3. Model System 3.1. Initial Structure 3.2. Pre-Relaxation 3.3. Course of the Oxidation
4. Analysis Methods 4.1. Oxygen Fluence, Degree of Oxidation, and Oxidation Rate 4.2. Mass Density and Silicon Fraction 4.3. Radial Distribution Function 4.4. Strain 4.4.1. <110> Wire 4.4.2. <100> Wire
5. Results and Discussion 5.1. Choice of the Insertion Interval 5.2. Temperature Variation 5.2.1. Degree of Oxidation 5.2.2. Silicon Fraction 5.2.3. Mass Density 5.2.4. Radial Distribution Function 5.3. Radius and Orientation Variation 5.4. Strain
6. Summary and Outlook 6.1. Summary 6.2. Outlook
A. Choice of the Insertion Interval
Bibliography
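The radially resolved O:Si ratio mentioned in the abstract can be computed by binning atoms into cylindrical shells around the wire axis. A minimal sketch on hypothetical coordinate data (the thesis analyses reactive force field trajectories; the function and its parameters are illustrative only):

```python
import math

def radial_o_si_ratio(atoms, axis_xy=(0.0, 0.0), dr=1.0, n_bins=12):
    """Bin atoms into cylindrical shells around the wire axis (assumed
    along z) and return the O:Si count ratio per shell, or None for
    shells containing no silicon. atoms: (element, x, y, z) tuples."""
    counts = [{"Si": 0, "O": 0} for _ in range(n_bins)]
    ax, ay = axis_xy
    for el, x, y, _z in atoms:
        r = math.hypot(x - ax, y - ay)     # distance from the wire axis
        b = int(r / dr)
        if b < n_bins and el in counts[b]:
            counts[b][el] += 1
    return [c["O"] / c["Si"] if c["Si"] else None for c in counts]
```

Evaluating this ratio per trajectory frame gives the radially resolved oxidation profile over the simulation time.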