Dissertations on the topic "High performance scientific computing"

To see other types of publications on this topic, follow the link: High performance scientific computing.

Cite your source in APA, MLA, Chicago, Harvard, and other styles

Choose a source type:

Consult the top 50 dissertations for your research on the topic "High performance scientific computing".

Next to each work in the list of references there is an "Add to bibliography" button. Use it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication as a .pdf file and read its abstract online, when these items are available in the record's metadata.

Browse dissertations across a wide range of disciplines and compile your bibliography correctly.

1

Balakrishnan, Suresh Reuben A/L. "Hybrid High Performance Computing (HPC) + Cloud for Scientific Computing." Thesis, Curtin University, 2022. http://hdl.handle.net/20.500.11937/89123.

Full text of the source
Abstract:
The HPC+Cloud framework has been built to enable on-premise HPC jobs to use resources from cloud computing nodes. As part of designing the software framework, public cloud providers, namely Amazon AWS, Microsoft Azure and NeCTAR were benchmarked against one another, and Microsoft Azure was determined to be the most suitable cloud component in the proposed HPC+Cloud software framework. Finally, an HPC+Cloud cluster was built using the HPC+Cloud software framework and then was validated by conducting HPC processing benchmarks.
APA, Harvard, Vancouver, ISO, and other styles
2

Bentz, Jonathan Lee. "Hybrid programming in high performance scientific computing." [Ames, Iowa : Iowa State University], 2006.

Find the full text of the source
APA, Harvard, Vancouver, ISO, and other styles
3

Calatrava, Arroyo Amanda. "High Performance Scientific Computing over Hybrid Cloud Platforms." Doctoral thesis, Universitat Politècnica de València, 2016. http://hdl.handle.net/10251/75265.

Full text of the source
Abstract:
Scientific applications generally have large computational, memory, and data-management requirements. Such applications have traditionally used high-performance resources, such as shared memory supercomputers, clusters of PCs with distributed memory, or resources from Grid infrastructures, to which the application had to be adapted to run successfully. In recent years, the advent of virtualization techniques, together with the emergence of Cloud Computing, has caused a major shift in the way these applications are executed. However, the execution management of scientific applications on high performance elastic platforms is not a trivial task. This doctoral thesis develops Elastic Cloud Computing Cluster (EC3), an open-source tool able to execute high performance scientific applications by creating self-managed, cost-efficient, virtual, hybrid, elastic clusters on top of IaaS Clouds. These self-managed clusters can adapt their size, i.e. the number of nodes, to the workload, thus creating the illusion of a real cluster without requiring an investment beyond actual usage. They can be fully customized and migrated from one provider to another in a process that is automatic and transparent to the users and jobs running in the cluster. EC3 can also deploy hybrid clusters across on-premises and public Cloud resources, where on-premises resources are supplemented with public Cloud resources to accelerate the execution process. Different instance types, and the use of spot instances combined with on-demand resources, are also cluster configurations supported by EC3. Moreover, using spot instances together with checkpointing techniques, the tool can significantly reduce the total cost of executions while introducing automatic fault tolerance.
EC3 is conceived to facilitate the use of virtual clusters by users who might not have extensive knowledge of these technologies but can still benefit from them. Thus, the tool offers two different interfaces: a web interface where EC3 is exposed as a service for non-expert users, and a powerful command line interface. Moreover, this thesis explores the field of lightweight virtualization, using containers as an alternative to the traditional virtualization solution based on virtual machines. This study analyzes the scenarios suitable for the use of containers and proposes an architecture for the deployment of elastic virtual clusters based on this technology. Finally, to demonstrate the functionality and advantages of the tools developed during this thesis, this document includes several use cases covering different scenarios and fields of knowledge, such as structural analysis of buildings, astrophysics, or biodiversity.
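The elasticity policy the abstract describes, growing and shrinking the node count to match the queued workload, can be sketched as a simple sizing rule (a hypothetical simplification for illustration; the parameter names and values are assumptions, not EC3 configuration options):

```python
import math

def target_nodes(queued_jobs, min_nodes=1, max_nodes=10, jobs_per_node=4):
    """Pick a cluster size proportional to the queued work, clamped to limits."""
    needed = math.ceil(queued_jobs / jobs_per_node)
    return max(min_nodes, min(max_nodes, needed))

# A manager loop would periodically compare this target with the current
# cluster size and ask the IaaS provider to start or terminate nodes.
print(target_nodes(9))     # 3 nodes for 9 queued jobs
print(target_nodes(0))     # 1: never shrinks below min_nodes
```

Clamping to a maximum is what keeps the "illusion of a real cluster" affordable: the bill is bounded even when the queue spikes.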
Calatrava Arroyo, A. (2016). High Performance Scientific Computing over Hybrid Cloud Platforms [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/75265
TESIS
APA, Harvard, Vancouver, ISO, and other styles
4

Agarwal, Dinesh. "Scientific High Performance Computing (HPC) Applications On The Azure Cloud Platform." Digital Archive @ GSU, 2013. http://digitalarchive.gsu.edu/cs_diss/75.

Full text of the source
Abstract:
Cloud computing is emerging as a promising platform for compute- and data-intensive scientific applications. Thanks to its on-demand elastic provisioning capabilities, cloud computing has instigated curiosity among researchers from a wide range of disciplines. However, even though many vendors have rolled out commercial cloud infrastructures, the service offerings are usually only best-effort, without any performance guarantees. Utilization of these resources will be questionable if they cannot meet the performance expectations of deployed applications. Additionally, the lack of familiar development tools hampers the productivity of eScience developers writing robust scientific high performance computing (HPC) applications. There are no standard frameworks currently supported by any large set of vendors offering cloud computing services; consequently, application portability among different cloud platforms is hard for scientific applications. Among all clouds, the emerging Azure cloud from Microsoft in particular remains a challenge for HPC program development, both due to its lack of support for traditional parallel programming models such as the Message Passing Interface (MPI) and map-reduce, and due to its evolving application programming interfaces (APIs). We have designed new frameworks and runtime environments that help HPC application developers by providing easy-to-use tools similar to those known from traditional parallel and distributed computing environments, such as MPI, for scientific application development on the Azure cloud platform. It is challenging to create an efficient framework for any cloud platform, including the Windows Azure platform, as they are mostly offered to users as a black box with a set of APIs to access various service components. The primary contributions of this Ph.D.
thesis are (i) a generic framework for bag-of-tasks HPC applications that serves as the basic building block for application development on the Azure cloud platform, (ii) a set of APIs for HPC application development over the Azure cloud platform, similar to MPI from the traditional parallel and distributed setting, and (iii) Crayons, implemented using the proposed APIs as the first end-to-end parallel scientific application to parallelize fundamental GIS operations.
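The bag-of-tasks model named in contribution (i), many independent tasks pulled from a shared queue by a pool of workers, can be sketched generically (an illustration of the pattern only, not the thesis's Azure framework or its APIs):

```python
import queue
import threading

def run_bag_of_tasks(tasks, worker_count=4):
    """Run independent callables from a shared queue; completion order is arbitrary."""
    q = queue.Queue()
    for t in tasks:
        q.put(t)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = q.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            r = task()  # tasks are self-contained, so no coordination is needed
            with lock:
                results.append(r)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results
```

Because the tasks share no state, the same pattern maps naturally onto cloud worker roles pulling from a hosted queue service.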
APA, Harvard, Vancouver, ISO, and other styles
5

Gulabani, Teena Pratap. "Development of high performance scientific components for interoperability of computing packages." [Ames, Iowa : Iowa State University], 2008.

Find the full text of the source
APA, Harvard, Vancouver, ISO, and other styles
6

Kaplan, Ali. "Collaborative framework for high-performance p2p-based data transfer in scientific computing." [Bloomington, Ind.] : Indiana University, 2009. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3380091.

Full text of the source
Abstract:
Thesis (Ph.D.)--Indiana University, Dept. of Computer Science, 2009.
Title from PDF t.p. (viewed on Jul 19, 2010). Source: Dissertation Abstracts International, Volume: 70-12, Section: B, page: 7668. Adviser: Geoffrey C. Fox.
APA, Harvard, Vancouver, ISO, and other styles
7

Steven, Monteiro Steena Dominica. "Statistical Techniques to Model and Optimize Performance of Scientific, Numerically Intensive Workloads." DigitalCommons@USU, 2016. https://digitalcommons.usu.edu/etd/5228.

Full text of the source
Abstract:
Projecting performance of applications and hardware is important to several market segments—hardware designers, software developers, supercomputing centers, and end users. Hardware designers estimate performance of current applications on future systems when designing new hardware. Software developers make performance estimates to evaluate performance of their code on different architectures and input datasets. Supercomputing centers try to optimize the process of matching computing resources to computing needs. End users requesting time on supercomputers must provide estimates of their application’s run time, and incorrect estimates can lead to wasted supercomputing resources and time. However, application performance is challenging to predict because it is affected by several factors in application code, specifications of system hardware, choice of compilers, compiler flags, and libraries. This dissertation uses statistical techniques to model and optimize performance of scientific applications across different computer processors. The first study in this research offers statistical models that predict performance of an application across different input datasets prior to application execution. These models guide end users to select parameters that produce optimal application performance during execution. The second study offers a suite of statistical models that predict performance of a new application on a new processor. Both studies present statistical techniques that can be generalized to analyze, optimize, and predict performance of diverse computation- and data-intensive applications on different hardware.
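The approach of the first study, fitting a statistical model on observed runs and then predicting performance for unseen inputs, can be illustrated with a toy one-variable least-squares fit (the dissertation's actual models are far richer; the calibration data below is invented):

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Hypothetical (input size, runtime in seconds) pairs from calibration runs:
sizes = [1000, 2000, 4000, 8000]
times = [2.1, 4.0, 8.2, 15.9]
a, b = fit_linear(sizes, times)
predicted = a * 16000 + b  # estimate the runtime of a new, larger input
```

In practice such models use many predictors (dataset properties, hardware counters, compiler flags), but the workflow is the same: calibrate on measured runs, then predict before execution.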
APA, Harvard, Vancouver, ISO, and other styles
8

Lin, Tien-Ju. "Web-based front-end design and scientific computing for material stress simulation software." Thesis, Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/53101.

Full text of the source
Abstract:
A precise simulation requires a large amount of input data, such as geometrical descriptions of the crystal structure, the external forces and loads, and quantitative properties of the material. Although some powerful applications already exist for research purposes, they are not widely used in education due to their complex structure and unintuitive operation. To cater to a generic user base, a front-end application for material simulation software is introduced. With a graphical interface, it provides a more efficient way to conduct simulations and to educate students who want to expand their knowledge in relevant fields. We first discuss how we explored solutions for the front-end application and how we developed it on top of the material simulation software created by a mechanical engineering lab at Georgia Tech Lorraine. The user interface design, the functionality, and the overall user experience are primary factors determining a product's success or failure. This material simulation software helps researchers resolve the motion and interactions of a large ensemble of dislocations in single- or multi-layered 3D materials. However, the algorithm it uses is not well optimized or parallelized, so its speedup does not scale when using more CPUs in the cluster. This problem leads to the second topic, scientific computing, so in this thesis we offer different approaches that attempt to improve the parallelization and optimize the scalability.
APA, Harvard, Vancouver, ISO, and other styles
9

Krishnan, Manoj Kumar. "ProLAS: a novel dynamic load balancing library for advanced scientific computing." Master's thesis, Mississippi State : Mississippi State University, 2003. http://library.msstate.edu/etd/show.asp?etd=etd-11102003-184622.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
10

Malenta, Mateusz. "Exploring the dynamic radio sky with many-core high-performance computing." Thesis, University of Manchester, 2018. https://www.research.manchester.ac.uk/portal/en/theses/exploring-the-dynamic-radio-sky-with-manycore-highperformance-computing(fe86c963-e253-48c0-a907-f8b59c44cf53).html.

Full text of the source
Abstract:
As new radio telescopes and processing facilities are built, the amount of data that has to be processed grows continuously. This poses significant challenges, especially if real-time processing is required, which is important for surveys looking for poorly understood objects, such as Fast Radio Bursts, where quick detection and localisation can enable rapid follow-up observations at different frequencies. With data rates increasing all the time, new processing techniques using the newest hardware, such as GPUs, have to be developed. A new pipeline, called PAFINDER, has been developed to process data taken with a phased array feed, which can generate up to 36 beams on the sky, with data rates of 25 GBps per beam. With the majority of the work done on GPUs, the pipeline reaches real-time performance when generating filterbank files used for offline processing. Full real-time processing, including single-pulse searches, has also been implemented and has been shown to perform well under favourable conditions. The pipeline was successfully used to record and process observations of RRAT J1819-1458 and of positions on the sky where 3 FRBs had been observed previously, including the repeating FRB121102. Detailed examination of the J1819-1458 single-pulse detections revealed a complex emission environment, with pulses coming from three different rotation phase bands and a number of multi-component emissions. No new FRBs and no repeated bursts from FRB121102 were detected. The GMRT High Resolution Southern Sky survey observes the sky at high galactic latitudes, searching for new pulsars and FRBs. 127 hours of data were searched for the presence of any new bursts with the help of a new pipeline developed for this survey. No new FRBs were found, which may be the result of heavy RFI pollution that was not fully removed, despite new mitigation techniques being developed and combined with existing solutions.
Using the best estimates on the total amount of data that has been processed correctly, obtained using new single-pulse simulation software, no detections were found to be consistent with the expected rates for standard candle FRBs with a flat or positive spectrum.
APA, Harvard, Vancouver, ISO, and other styles
11

Li, Shaomeng. "Wavelet Compression for Visualization and Analysis on High Performance Computers." Thesis, University of Oregon, 2018. http://hdl.handle.net/1794/23905.

Full text of the source
Abstract:
As HPC systems move towards exascale, the discrepancy between computational power and I/O transfer rate is only growing larger. Lossy in situ compression is a promising solution to address this gap, since it alleviates I/O constraints while still enabling traditional post hoc analysis. This dissertation explores the viability of such a solution with respect to a specific kind of compressor: wavelets. We examine three aspects of concern regarding the viability of wavelets: 1) information loss after compression, 2) their capability to fit within in situ constraints, and 3) the compressor's capability to adapt to HPC architectural changes. Findings from this dissertation inform the in situ use of wavelet compressors on HPC systems, demonstrate their viability, and argue that this viability will only increase as exascale computing becomes a reality.
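The lossy wavelet scheme under study, transform the data, discard small coefficients, reconstruct an approximation, can be illustrated with a single-level Haar transform (a minimal sketch; production compressors use multi-level transforms, quantization, and entropy coding):

```python
def haar_forward(data):
    """One level of the Haar wavelet transform: pairwise averages and details."""
    avg = [(a + b) / 2 for a, b in zip(data[::2], data[1::2])]
    det = [(a - b) / 2 for a, b in zip(data[::2], data[1::2])]
    return avg, det

def haar_inverse(avg, det):
    """Exact inverse of haar_forward for even-length input."""
    out = []
    for a, d in zip(avg, det):
        out += [a + d, a - d]
    return out

def lossy_roundtrip(data, threshold):
    """Zero out small detail coefficients (the lossy step), then reconstruct."""
    avg, det = haar_forward(data)
    det = [d if abs(d) >= threshold else 0.0 for d in det]
    return haar_inverse(avg, det)

# With threshold 0 the round trip is exact; raising it trades accuracy for
# sparser (more compressible) coefficients.
print(lossy_roundtrip([4.0, 6.0, 10.0, 12.0], 2.0))  # [5.0, 5.0, 11.0, 11.0]
```

The detail coefficients of smooth data are small, so thresholding zeroes most of them, which is where the compression comes from.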
APA, Harvard, Vancouver, ISO, and other styles
12

Zheng, Fang. "Middleware for online scientific data analytics at extreme scale." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/51847.

Full text of the source
Abstract:
Scientific simulations running on High End Computing machines in domains like Fusion, Astrophysics, and Combustion now routinely generate terabytes of data in a single run, and these data volumes are only expected to increase. Since such massive simulation outputs are key to scientific discovery, the ability to rapidly store, move, analyze, and visualize data is critical to scientists' productivity. Yet there are already serious I/O bottlenecks on current supercomputers, and movement toward the Exascale is further accelerating this trend. This dissertation is concerned with the design, implementation, and evaluation of middleware-level solutions to enable high performance and resource efficient online data analytics to process massive simulation output data at large scales. Online data analytics can effectively overcome the I/O bottleneck for scientific applications at large scales by processing data as it moves through the I/O path. Online analytics can extract valuable insights from live simulation output in a timely manner, better prepare data for subsequent deep analysis and visualization, and gain improved performance and reduced data movement cost (both in time and in power) compared to the conventional post-processing paradigm. The thesis identifies the key challenges for online data analytics based on the needs of a variety of large-scale scientific applications, and proposes a set of novel and effective approaches to efficiently program, distribute, and schedule online data analytics along the critical I/O path. 
In particular, its solution approach i) provides a high performance data movement substrate to support parallel and complex data exchanges between simulation and online data analytics, ii) enables placement flexibility of analytics to exploit distributed resources, iii) for co-placement of analytics with simulation codes on the same nodes, uses fine-grained scheduling to harvest idle resources for running online analytics with minimal interference to the simulation, and finally, iv) supports scalable, efficient online spatial indices to accelerate data analytics and visualization on the deep memory hierarchies of high end machines. Our middleware approach is evaluated with leadership scientific applications in domains like Fusion, Combustion, and Molecular Dynamics, and on different High End Computing platforms. Substantial improvements are demonstrated in end-to-end application performance and in resource efficiency at scales of up to 16384 cores, for a broad range of analytics and visualization codes. The outcome is a useful and effective software platform for online scientific data analytics, facilitating large-scale scientific data exploration.
APA, Harvard, Vancouver, ISO, and other styles
13

Kumar, Vijay Shiv. "Specification, Configuration and Execution of Data-intensive Scientific Applications." The Ohio State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=osu1286570224.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
14

Obermiller, Dan. "High Performance Portability with RAJA and Agency." Scholarship @ Claremont, 2017. http://scholarship.claremont.edu/cmc_theses/1557.

Full text of the source
Abstract:
High performance and scientific computing take advantage of high-end and high-spec computer architectures. As these architectures evolve, and new architectures are created, applications may be able to run at greater and greater speeds. These changes present challenges to implementors who wish to take advantage of the newest features and machines. Portability layers such as RAJA and Agency seek to abstract away machine-specific details and allow scientists to take advantage of new features as they become available. We enhance RAJA with a lower-level framework, Agency, to determine whether these layered abstractions provide performance or maintainability benefits.
APA, Harvard, Vancouver, ISO, and other styles
15

Futral, Jeremy Stephen. "A method of evaluation of high-performance computing batch schedulers." UNF Digital Commons, 2019. https://digitalcommons.unf.edu/etd/869.

Full text of the source
Abstract:
According to Sterling et al., a batch scheduler, also called workload management, is an application or set of services that provides a method to monitor and manage the flow of work through the system [Sterling01]. The purpose of this research was to develop a method to assess the execution speed of workloads that are submitted to a batch scheduler. While previous research exists, this research is different in that more complex jobs were devised that fully exercised the scheduler with established benchmarks. This research is important because the reduction of latency, even if it is minuscule, can lead to massive savings of electricity, time, and money over the long term. This is especially important in the era of green computing [Reuther18]. The methodology used to assess these schedulers involved the execution of custom automation scripts. These custom scripts were developed as part of this research to automatically submit custom jobs to the schedulers, take measurements, and record the results. There were multiple experiments conducted throughout the course of the research. These experiments were designed to apply the methodology and assess the execution speed of a small selection of batch schedulers. Due to time constraints, the research was limited to four schedulers. The measurements that were taken during the experiments were wall time, RAM usage, and CPU usage. These measurements captured the utilization of system resources by each of the schedulers. The custom scripts were executed using 1, 2, and 4 servers to determine how well a scheduler scales with network growth. The experiments were conducted on local school resources. All hardware was similar and was co-located within the same data center. While the schedulers that were investigated as part of the experiments are agnostic to whether the system is a grid, cluster, or supercomputer, the investigation was limited to a cluster architecture.
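The core measurement in such a methodology, the wall time of a submitted job, can be sketched with a small wrapper (hypothetical; the research's actual scripts targeted specific schedulers' submission commands and also recorded RAM and CPU usage):

```python
import subprocess
import time

def time_command(cmd):
    """Run a command to completion and return (wall_seconds, return_code)."""
    start = time.perf_counter()
    proc = subprocess.run(cmd, capture_output=True)
    return time.perf_counter() - start, proc.returncode

# For Slurm, for example, one might time a blocking submission:
#   wall, rc = time_command(["sbatch", "--wait", "job.sh"])
```

Repeating such a measurement across schedulers and cluster sizes (1, 2, and 4 servers, as above) yields comparable latency figures.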
APA, Harvard, Vancouver, ISO, and other styles
16

Jamaliannasrabadi, Saba. "High Performance Computing as a Service in the Cloud Using Software-Defined Networking." Bowling Green State University / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1433963448.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
17

Neuwirth, Sarah Marie [Verfasser], and Ulrich [Akademischer Betreuer] Brüning. "Accelerating Network Communication and I/O in Scientific High Performance Computing Environments / Sarah Marie Neuwirth ; Betreuer: Ulrich Brüning." Heidelberg : Universitätsbibliothek Heidelberg, 2019. http://d-nb.info/1177045133/34.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
19

López, Huguet Sergio. "Elastic, Interoperable and Container-based Cloud Infrastructures for High Performance Computing." Doctoral thesis, Universitat Politècnica de València, 2021. http://hdl.handle.net/10251/172327.

Full text of the source
Abstract:
Tesis por compendio
[EN] Scientific applications generally imply a variable and unpredictable computational workload that institutions must address by dynamically adjusting the allocation of resources to their different computational needs. Scientific applications may require high capacity, e.g. the concurrent usage of computational resources for processing several independent jobs (High Throughput Computing or HTC), or high capability, by means of using high-performance resources to solve complex problems (High Performance Computing or HPC). The computational resources required by this type of application usually have a very high cost that may exceed the availability of the institution's resources, or the available resources may not be well suited to the scientific applications, especially in the case of infrastructures prepared for the execution of HPC applications. Indeed, it is possible that the different parts that compose an application require different types of computational resources. Nowadays, cloud service platforms have become an efficient solution to meet the needs of HTC applications, as they provide a wide range of computing resources accessible on demand. For this reason, the number of hybrid computational infrastructures has increased during the last years. Hybrid computational infrastructures combine infrastructures hosted on cloud platforms with computational resources hosted at the institutions themselves, named on-premise infrastructures. As scientific applications can be processed on different infrastructures, application delivery has become a key issue. Nowadays, containers are probably the most popular technology for application delivery, as they ease reproducibility, traceability, versioning, isolation, and portability. The main objective of this thesis is to provide an architecture and a set of services to build up hybrid processing infrastructures that fit the needs of different workloads.
Hence, the thesis considers aspects such as elasticity and federation. First, it explores vertical and horizontal elasticity, developing a proof of concept that provides vertical elasticity on top of an elastic cloud architecture for data analytics. Afterwards, an elastic cloud architecture comprising heterogeneous computational resources was implemented for medical imaging processing, using multiple processing queues for jobs with different requirements. The development of this architecture was framed in a collaboration with a company called QUIBIM. In the last part of the thesis, the previous work was evolved to design and implement an elastic, multi-site, multi-tenant cloud architecture for medical image processing, in the framework of the European project PRIMAGE. This architecture uses distributed storage and integrates external services for authentication and authorization based on OpenID Connect (OIDC). The tool kube-authorizer has been developed to provide access control to the resources of the processing infrastructure in an automated way, using the information obtained in the authentication process to create the corresponding policies and roles. Finally, another tool, hpc-connector, has been developed to enable the integration of HPC processing infrastructures into cloud infrastructures without requiring modifications to either infrastructure, cloud or HPC. It should be noted that, during the realization of this thesis, different contributions to open-source container and job management technologies have been made by developing open-source tools and components and by implementing recipes for the automated configuration of the different architectures designed, from a DevOps perspective. The results obtained support the feasibility of combining vertical and horizontal elasticity to implement deadline-based QoS policies, as well as the feasibility of the federated authentication model for combining public and on-premise clouds.
López Huguet, S. (2021). Elastic, Interoperable and Container-based Cloud Infrastructures for High Performance Computing [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/172327
APA, Harvard, Vancouver, ISO, and other styles
20

Bas, Erdeniz Ozgun. "Load-Balancing Spatially Located Computations using Rectangular Partitions." The Ohio State University, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=osu1306909831.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
21

Lemon, Alexander Michael. "A Shared-Memory Coupled Architecture to Leverage Big Data Frameworks in Prototyping and In-Situ Analytics for Data Intensive Scientific Workflows." BYU ScholarsArchive, 2019. https://scholarsarchive.byu.edu/etd/7545.

Full text of the source
Abstract:
There is a pressing need for creative new data analysis methods which can sift through scientific simulation data and produce meaningful results. The types of analyses and the amount of data handled by current methods are still quite restricted, and new methods could provide scientists with a large productivity boost. New methods could be simple to develop in big data processing systems such as Apache Spark, which is designed to process many input files in parallel while treating them logically as one large dataset. This distributed model, combined with the large number of analysis libraries created for the platform, makes Spark ideal for processing simulation output. Unfortunately, the filesystem becomes a major bottleneck in any workflow that uses Spark in such a fashion. Faster transports are not intrinsically supported by Spark, and its interface almost denies the possibility of maintainable third-party extensions. By leveraging the semantics of Scala and Spark's recent scheduler upgrades, we force co-location of Spark executors with simulation processes and enable fast local inter-process communication through shared memory. This provides a path for bulk data transfer into the Java Virtual Machine, removing the current Spark ingestion bottleneck. Besides showing that our system makes this transfer feasible, we also demonstrate a proof-of-concept system integrating traditional HPC codes with bleeding-edge analytics libraries. This provides scientists with guidance on how to apply our libraries to gain a new and powerful tool for developing new analysis techniques in large scientific simulation pipelines.
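The shared-memory hand-off this abstract describes can be caricatured in a few lines. The sketch below is a generic illustration, not the thesis code: it uses Python's `multiprocessing.shared_memory` as a stand-in for the Scala/Spark mechanism, with a child process playing the simulation and the parent playing the analytics side.

```python
# Generic sketch of the idea above (not the thesis code): a "simulation"
# process writes its output into shared memory and an analytics step reads
# it in place, bypassing the filesystem entirely.
from multiprocessing import Process, shared_memory

def _simulate(name, n):
    # Child process: fill the shared buffer with synthetic "simulation" output.
    shm = shared_memory.SharedMemory(name=name)
    data = memoryview(shm.buf).cast('d')
    for i in range(n):
        data[i] = i / (n - 1)
    data.release()
    shm.close()

def run_pipeline(n=10_000):
    # Parent process: allocate the buffer, run the simulation, analyze in place.
    shm = shared_memory.SharedMemory(create=True, size=n * 8)
    p = Process(target=_simulate, args=(shm.name, n))
    p.start()
    p.join()
    data = memoryview(shm.buf).cast('d')
    mean = sum(data) / n                      # the "analytics" step
    data.release()
    shm.close()
    shm.unlink()
    return mean
```

Here the buffer is written and read in place; no file is ever created, which is the point of the approach.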
APA, Harvard, Vancouver, ISO, and other styles
22

Lundgren, Jacob. "Pricing of American Options by Adaptive Tree Methods on GPUs." Thesis, Uppsala universitet, Avdelningen för beräkningsvetenskap, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-265257.

Full text of the source
Abstract:
An assembled algorithm for pricing American options with absolute, discrete dividends using adaptive lattice methods is described. Considerations for hardware-conscious programming on both CPU and GPU platforms are discussed, to provide a foundation for the investigation of several approaches for deploying the program onto GPU architectures. The performance results of the approaches are compared to those of a central processing unit (CPU) reference implementation, and to each other. In particular, an approach of designating subtrees to be calculated in parallel, at the cost of redundantly calculating overlapping elements, is described. Among the examined methods, this attains the best performance results in a "realistic" region of calculation parameters. A fifteen- to thirty-fold improvement in performance over the CPU reference implementation is observed as the problem size grows sufficiently large.
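The lattice pricing at the core of this thesis can be illustrated with a plain Cox-Ross-Rubinstein binomial tree (a minimal sketch, without the discrete dividends, adaptivity, or GPU mapping studied in the thesis):

```python
import math

def american_put_binomial(s0, k, r, sigma, t, n):
    """Price an American put on a CRR binomial tree (no dividends)."""
    dt = t / n
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp(r * dt) - d) / (u - d)   # risk-neutral up probability
    disc = math.exp(-r * dt)
    # Terminal payoffs at the leaves of the tree.
    values = [max(k - s0 * u**j * d**(n - j), 0.0) for j in range(n + 1)]
    # Backward induction with an early-exercise check at every node.
    for i in range(n - 1, -1, -1):
        for j in range(i + 1):
            cont = disc * (p * values[j + 1] + (1 - p) * values[j])
            exercise = k - s0 * u**j * d**(i - j)
            values[j] = max(cont, exercise)
    return values[0]
```

Backward induction with the exercise check at every node is exactly the per-level sweep that the thesis maps onto parallel GPU subtrees.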
APA, Harvard, Vancouver, ISO, and other styles
23

Skjerven, Brian M. "A parallel implementation of an agent-based brain tumor model." Link to electronic thesis, 2007. http://www.wpi.edu/Pubs/ETD/Available/etd-060507-172337/.

Full text of the source
Abstract:
Thesis (M.S.) -- Worcester Polytechnic Institute.
Keywords: Visualization; Numerical analysis; Computational biology; Scientific computation; High-performance computing. Includes bibliographical references (p.19).
APA, Harvard, Vancouver, ISO, and other styles
24

Dirand, Estelle. "Développement d'un système in situ à base de tâches pour un code de dynamique moléculaire classique adapté aux machines exaflopiques." Thesis, Université Grenoble Alpes (ComUE), 2018. http://www.theses.fr/2018GREAM065/document.

Full text of the source
Abstract:
The exascale era will widen the gap between the data generation rate and the time needed to manage output and analysis in a post-processing way, dramatically increasing the end-to-end time to scientific discovery and calling for a shift toward new data processing methods. The in situ paradigm proposes to analyze data while still resident in the supercomputer memory to reduce the need for data storage. Several techniques already exist: executing simulation and analytics on the same nodes (in situ), using dedicated nodes (in transit), or combining the two approaches (hybrid). Most traditional in situ techniques target simulations that are not able to fully benefit from the ever-growing number of cores per processor, and they are not designed for the emerging manycore processors. Task-based programming models, on the other hand, are expected to become a standard for these architectures, but few task-based in situ techniques have been developed so far. This thesis proposes to study the design and integration of a novel task-based in situ framework inside a task-based molecular dynamics code designed for exascale supercomputers. We benefit from the composability properties of the task-based programming model to implement the TINS hybrid framework. Analytics workflows are expressed as graphs of tasks that can in turn generate children tasks to be executed in transit or interleaved with simulation tasks in situ. The in situ execution is performed thanks to an innovative dynamic helper core strategy that uses the work stealing concept to finely interleave simulation and analytics tasks inside a compute node with a low overhead on the simulation execution time. TINS uses the Intel® TBB work stealing scheduler and is integrated into ExaStamp, a task-based molecular dynamics code. Various experiments have shown that TINS is up to 40% faster than state-of-the-art in situ libraries.
Molecular dynamics simulations of up to 2 billion particles on up to 14,336 cores have shown that TINS is able to execute complex analytics workflows at a high frequency with an overhead smaller than 10%.
APA, Harvard, Vancouver, ISO, and other styles
25

Johnson, Buxton L. Sr. "HYBRID PARALLELIZATION OF THE NASA GEMINI ELECTROMAGNETIC MODELING TOOL." UKnowledge, 2017. http://uknowledge.uky.edu/ece_etds/99.

Full text of the source
Abstract:
Understanding, predicting, and controlling electromagnetic field interactions on and between complex RF platforms requires high fidelity computational electromagnetic (CEM) simulation. The primary CEM tool within NASA is GEMINI, an integral equation based method-of-moments (MoM) code for frequency domain electromagnetic modeling. However, GEMINI is currently limited in the size and complexity of problems that can be effectively handled. To extend GEMINI's CEM capabilities beyond those currently available, primary research is devoted to integrating the MFDlib library developed at the University of Kentucky with GEMINI for efficient filling, factorization, and solution of large electromagnetic problems formulated using integral equation methods. A secondary research project involves the hybrid parallelization of GEMINI for the efficient speedup of the impedance matrix filling process. This thesis discusses the research, development, and testing of the secondary research project on the High Performance Computing DLX Linux supercomputer cluster. Initial testing of GEMINI's existing MPI parallelization establishes the benchmark for speedup and reveals performance issues subsequently solved by the NASA CEM Lab. The hybrid parallelization combines GEMINI's existing coarse-level MPI parallelization with fine-level parallel threading in OpenMP. Simple and nested OpenMP threading are compared. Final testing documents the improvements realized by hybrid parallelization.
APA, Harvard, Vancouver, ISO, and other styles
26

Abbasi, Mohammad Hasan. "Data services: bringing I/O processing to petascale." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/42694.

Full text of the source
Abstract:
The increasing size of high performance computing systems and the associated increase in the volume of generated data have resulted in an I/O bottleneck for these applications. This bottleneck is further exacerbated by the imbalance in the growth of processing capability compared to storage capability, due mainly to the power and cost requirements of scaling the storage. This thesis introduces data services, a new abstraction which provides significant benefits for data intensive applications. Data services combine low overhead data movement with flexible placement of data manipulation operations to address the I/O challenges of leadership class scientific applications. The impact of asynchronous data movement on application runtime is minimized by utilizing novel server side data movement schedulers to avoid contention related jitter in application communication. Additionally, the JITStager component is presented. Utilizing dynamic code generation and flexible code placement, the JITStager allows data services to be executed as a pipeline extending from the application to storage. It is shown in this thesis that data services can add new functionality to the application without having a significant negative impact on performance.
APA, Harvard, Vancouver, ISO, and other styles
27

Su, Yu. "Big Data Management Framework based on Virtualization and Bitmap Data Summarization." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1420738636.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
28

Schepke, Claudio. "Distribuição de dados para implementações paralelas do Método de Lattice Boltzmann." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2007. http://hdl.handle.net/10183/8810.

Full text of the source
Abstract:
Computational Fluid Dynamics is an important research area in the Scientific Computing context. Through the modeling and simulation of the properties of liquids and gases, it is possible to obtain numerical results for different physical structures and everyday phenomena of great economic importance. The evolution of computational systems made it possible to develop new simulation techniques and approaches in this area. One of the techniques currently used is the Lattice Boltzmann Method, an iterative numerical strategy for modeling and simulating the mesoscopic dynamics of fluid flows. Different types of physical systems can be simulated through this technique, such as immiscible substances and flows in porous media. However, since the dimension of the physical systems is usually large, it is necessary to adopt strategies that make it possible to obtain accurate results in an acceptable computational time. Thus, parallelizing the operations is the best alternative to increase the performance of the method. An efficient way to parallelize a numerical method is to make use of refined data distribution techniques, such as data partitioning in blocks. Such parallelization approaches were adopted in this work for two- and three-dimensional implementations of the Lattice Boltzmann Method. The objective of the work was to evaluate the performance enhancement offered by this technique and to define the factors that influence the best partitioning configurations. The results showed that data partitioning in blocks provides a considerable performance increase for parallel implementations, especially for the three-dimensional version of the method. For some configurations adopted in the case studies, the execution time was reduced by up to 30% in relation to the one-dimensional partitioning strategy.
The best configurations for block data distribution were those in which the blocks were closest to square or cubic in shape with respect to the coordinate dimensions.
APA, Harvard, Vancouver, ISO, and other styles
29

Venkatasubramanian, Sundaresan. "Tuned and asynchronous stencil kernels for CPU/GPU systems." Thesis, Atlanta, Ga. : Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/29728.

Full text of the source
Abstract:
Thesis (M. S.)--Computing, Georgia Institute of Technology, 2009.
Committee Chair: Vuduc, Richard; Committee Member: Kim, Hyesoon; Committee Member: Vetter, Jeffrey. Part of the SMARTech Electronic Thesis and Dissertation Collection.
APA, Harvard, Vancouver, ISO, and other styles
30

Gueunet, Charles. "Calcul haute performance pour l'analyse topologique de données par ensembles de niveaux." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS120.

Full text of the source
Abstract:
Topological Data Analysis requires efficient algorithms to deal with the continuously increasing size and level of detail of data sets. In this manuscript, we focus on three fundamental topological abstractions based on level sets: merge trees, contour trees, and Reeb graphs. We propose three new efficient parallel algorithms for the computation of these abstractions on multi-core shared-memory workstations. The first algorithm developed in the context of this thesis is based on multi-thread parallelism for the contour tree computation. A second algorithm revisits the reference sequential algorithm to compute this abstraction and is based on local propagations expressible as parallel tasks. This new algorithm is in practice twice as fast in sequential as the reference algorithm designed in 2000 and offers one order of magnitude speedups in parallel. A last algorithm, also relying on task-based local propagations, computes a more generic abstraction: the Reeb graph. Contrary to concurrent approaches, these methods provide the augmented versions of these structures, hence enabling the full extent of level-set based analysis. The algorithms presented in this manuscript yield the fastest implementations available today to compute these abstractions. This work has been integrated into the open-source platform Topology Toolkit (TTK).
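The reference sequential join-tree computation that this thesis revisits can be sketched as a sweep over vertices in increasing scalar order driven by a union-find structure (a minimal illustration assuming distinct scalar values, not TTK's implementation):

```python
def join_tree(values, edges):
    """Compute the augmented join tree of a scalar field on a graph by
    sweeping vertices in increasing order with a union-find structure.
    Returns a dict mapping each vertex to its parent in the tree."""
    n = len(values)
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    uf = list(range(n))
    def find(x):
        while uf[x] != x:
            uf[x] = uf[uf[x]]          # path halving
            x = uf[x]
        return x
    highest = list(range(n))           # highest processed vertex per component
    parent = {}
    processed = [False] * n
    for v in sorted(range(n), key=lambda i: values[i]):
        processed[v] = True
        for w in adj[v]:
            if processed[w]:
                rw, rv = find(w), find(v)
                if rw != rv:
                    parent[highest[rw]] = v   # component grows (or merges) at v
                    uf[rw] = rv
        highest[find(v)] = v
    return parent
```

Leaves of the resulting tree are the minima of the field, and vertices where two components attach are the merge saddles; the thesis parallelizes exactly this kind of propagation as independent tasks.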
APA, Harvard, Vancouver, ISO, and other styles
31

Lee, Joo Hong. "Hybrid Parallel Computing Strategies for Scientific Computing Applications." Diss., Virginia Tech, 2012. http://hdl.handle.net/10919/28882.

Full text of the source
Abstract:
Multi-core, multi-processor, and Graphics Processing Unit (GPU) computer architectures pose significant challenges with respect to the efficient exploitation of parallelism for large-scale, scientific computing simulations. For example, a simulation of the human tonsil at the cellular level involves the computation of the motion and interaction of millions of cells over extended periods of time. Also, the simulation of Radiative Heat Transfer (RHT) effects by the Photon Monte Carlo (PMC) method is an extremely computationally demanding problem. The PMC method is an example of the Monte Carlo simulation method, an approach extensively used in a wide range of application areas. Although the basic algorithmic framework of these Monte Carlo methods is simple, they can be extremely computationally intensive. Therefore, an efficient parallel realization of these simulations depends on a careful analysis of the nature of these problems and the development of an appropriate software framework. The overarching goal of this dissertation is to develop and understand what the appropriate parallel programming model should be to exploit these disparate architectures, both from the metric of efficiency and from a software engineering perspective. In this dissertation we examine these issues through a performance study of PathSim2, a software framework for the simulation of large-scale biological systems, using two different parallel architectures: distributed and shared memory. First, a message-passing implementation of a multiple germinal center simulation by PathSim2 is developed and analyzed for distributed memory architectures. Second, a germinal center simulation is implemented on a shared memory architecture with two parallelization strategies based on Pthreads and OpenMP. Finally, we present work targeting a complete hybrid, parallel computing architecture.
With this work we develop and analyze a software framework for generic Monte Carlo simulations implemented on multiple, distributed memory nodes consisting of a multi-core architecture with attached GPUs. This simulation framework is divided into two asynchronous parts: (a) a threaded, GPU-accelerated pseudo-random number generator (or producer), and (b) a multi-threaded Monte Carlo application (or consumer). The advantage of this approach is that this software framework can be directly used within any Monte Carlo application code, without requiring application-specific programming of the GPU. We examine this approach through a performance study of the simulation of RHT effects by the PMC method on a hybrid computing architecture. We present a theoretical analysis of our proposed approach, discuss methods to optimize performance based on this analysis, and compare this analysis to experimental results obtained from simulations run on two different hybrid, parallel computing architectures.
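The asynchronous producer/consumer split described above can be illustrated with a toy Monte Carlo estimator. This is a hypothetical Python sketch of the pattern only, not the dissertation's GPU-accelerated framework: a producer thread stands in for the accelerated random-number generator and the consumer for the Monte Carlo application.

```python
import queue
import random
import threading

BATCH = 10_000
N_BATCHES = 20

def producer(q, rng):
    # Producer side: fill the queue with batches of (x, y) samples.
    # A None sentinel signals that no more batches will arrive.
    for _ in range(N_BATCHES):
        q.put([(rng.random(), rng.random()) for _ in range(BATCH)])
    q.put(None)

def consumer(q):
    # Consumer side: a Monte Carlo application (here, estimating pi)
    # that only ever sees batches of random numbers, never the generator.
    hits = total = 0
    while (batch := q.get()) is not None:
        hits += sum(1 for x, y in batch if x * x + y * y <= 1.0)
        total += len(batch)
    return 4.0 * hits / total

q = queue.Queue(maxsize=4)  # bounded queue decouples the two parts
t = threading.Thread(target=producer, args=(q, random.Random(42)))
t.start()
pi_estimate = consumer(q)
t.join()
```

Because the consumer depends only on the queue interface, the producer could be swapped for any other batch source without touching the application code, which is the decoupling the abstract describes.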
Ph. D.
Стилі APA, Harvard, Vancouver, ISO та ін.
32

Ferreira, Davi Morais. "Integração de bibliotecas científicas de propósito especial em uma plataforma de componentes paralelos." Universidade Federal do Ceará, 2010. http://www.teses.ufc.br/tde_busca/arquivo.php?codArquivo=7166.

Повний текст джерела
Анотація:
Fundação Cearense de Apoio ao Desenvolvimento Científico e Tecnológico
A contribuição das tradicionais bibliotecas científicas mostra-se consolidada na construção de aplicações de alto desempenho. No entanto, tal artefato de desenvolvimento possui algumas limitações de integração, de produtividade em aplicações de larga escala e de flexibilidade para mudanças no contexto do problema. Por outro lado, a tecnologia de desenvolvimento baseada em componentes, recentemente proposta como alternativa viável para a arquitetura de aplicações de Computação de Alto Desempenho (CAD), tem fornecido meios para superar esses desafios. Vemos assim que as bibliotecas científicas e a programação orientada a componentes são técnicas complementares na melhoria do processo de desenvolvimento de aplicações modernas de CAD. Dessa forma, este trabalho tem por objetivo propor um método sistemático para integração de bibliotecas científicas sobre a plataforma de componentes paralelos HPE (Hash Programming Environment), buscando oferecer os aspectos vantajosos complementares do uso de componentes e de bibliotecas científicas aos desenvolvedores de programas paralelos que implementam aplicações de alto desempenho. A proposta deste trabalho vai além da construção de um simples encapsulamento da biblioteca em um componente; visa proporcionar ao uso das bibliotecas científicas os benefícios de integração, de produtividade em aplicações de larga escala e da flexibilidade para mudanças no contexto do problema. Como forma de exemplificar e validar o método, temos incorporado bibliotecas de resolução de sistemas lineares ao HPE, elegendo três representantes significativos: PETSc, Hypre e SuperLU.
The contribution of traditional scientific libraries shows to be consolidated in the construction of high-performance applications. However, such an artifact of development possesses some limitations in integration, productivity in large-scale applications, and flexibility for changes in the context of the problem. On the other hand, component-based development technology, recently proposed as a viable alternative for the architecture of High-Performance Computing (HPC) applications, has provided a means to overcome these challenges. Thus we see that scientific libraries and component-oriented programming are complementary techniques in the improvement of the development process of modern HPC applications. Accordingly, this work aims to propose a systematic method for the integration of scientific libraries on a platform of parallel components, HPE (Hash Programming Environment), to offer the complementary advantageous aspects of the use of components and scientific libraries to developers of parallel programs that implement high-performance applications. The purpose of this work goes beyond the construction of a simple encapsulation of the library in a component; it aims to provide, in the use of scientific libraries, the benefits of integration, productivity in large-scale applications, and flexibility for changes in the context of a problem. As a way to illustrate and validate the method, we have incorporated libraries of linear system solvers into HPE, electing three significant representatives: PETSc, Hypre, and SuperLU.
Стилі APA, Harvard, Vancouver, ISO та ін.
33

Mayer, Anthony Edward. "Composite construction of high performance scientific applications." Thesis, Imperial College London, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.252520.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
34

Roberts, Stephen I. "Energy-aware performance engineering in high performance computing." Thesis, University of Warwick, 2017. http://wrap.warwick.ac.uk/107784/.

Повний текст джерела
Анотація:
Advances in processor design have delivered performance improvements for decades. As physical limits are reached, however, refinements to the same basic technologies are beginning to yield diminishing returns. Unsustainable increases in energy consumption are forcing hardware manufacturers to prioritise energy efficiency in their designs. Research suggests that software modifications will be needed to exploit the resulting improvements in current and future hardware. New tools are required to capitalise on this new class of optimisation. This thesis investigates the field of energy-aware performance engineering. It begins by examining the current state of the art, which is characterised by ad-hoc techniques and a lack of standardised metrics. Work in this thesis addresses these deficiencies and lays stable foundations for others to build on. The first contribution is a set of criteria defining the properties that energy-aware optimisation metrics should exhibit. These criteria show that current metrics cannot meaningfully assess the utility of code or correctly guide its optimisation. New metrics are proposed to address these issues, and theoretical and empirical proofs of their advantages are given. This thesis then presents the Power Optimised Software Envelope (POSE) model, which allows developers to assess whether power optimisation is worth pursuing for their applications. POSE is used to study the optimisation characteristics of codes from the Mantevo mini-application suite running on a Haswell-based cluster. The results obtained show that of these codes TeaLeaf has the most scope for power optimisation while PathFinder has the least. Finally, POSE modelling techniques are extended to evaluate the system-wide scope for energy-aware performance optimisation. System Summary POSE allows developers to assess the scope a system has for energy-aware software optimisation independent of the code being run.
Стилі APA, Harvard, Vancouver, ISO та ін.
35

Palamadai, Natarajan Ekanathan. "Portable and productive high-performance computing." Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/108988.

Повний текст джерела
Анотація:
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 115-120).
Performance portability of computer programs, and programmer productivity in writing them are key expectations in software engineering. These expectations lead to the following questions: Can programmers write code once, and execute it at optimal speed on any machine configuration? Can programmers write parallel code to simple models that hide the complex details of parallel programming? This thesis addresses these questions for certain "classes" of computer programs. It describes "autotuning" techniques that achieve performance portability for serial divide-and-conquer programs, and an abstraction that improves programmer productivity in writing parallel code for a class of programs called "Star". We present a "pruned-exhaustive" autotuner called Ztune that optimizes the performance of serial divide-and-conquer programs for a given machine configuration. Whereas the traditional way of autotuning divide-and-conquer programs involves simply coarsening the base case of recursion optimally, Ztune searches for optimal divide-and-conquer trees. Although Ztune, in principle, exhaustively enumerates the search domain, it uses pruning properties that greatly reduce the size of the search domain without significantly sacrificing the quality of the autotuned code. We illustrate how to autotune divide-and-conquer stencil computations using Ztune, and present performance comparisons with state-of-the-art "heuristic" autotuning. Not only does Ztune autotune significantly faster than a heuristic autotuner, the Ztuned programs also run faster on average than their heuristic autotuner tuned counterparts. Surprisingly, for some stencil benchmarks, Ztune actually autotuned faster than the time it takes to execute the stencil computation once. 
We introduce the Star class that includes many seemingly different programs like solving symmetric, diagonally-dominant tridiagonal systems, executing "watershed" cuts on graphs, sample sort, fast multipole computations, and all-prefix-sums and its various applications. We present a programming model, which is also called Star, to generate and execute parallel code for the Star class of programs. The Star model abstracts the pattern of computation and interprocessor communication in the Star class of programs, hides low-level parallel programming details, and offers ease of expression, thereby improving programmer productivity in writing parallel code. In addition, we present parallel algorithms, which offer asymptotic improvements over prior art, for two programs in the Star class: a Trip algorithm for solving symmetric, diagonally-dominant tridiagonal systems, and a Wasp algorithm for executing watershed cuts on graphs. The Star model is implemented in the Julia programming language, and leverages Julia's capabilities in expressing parallelism in code concisely, and in supporting both shared-memory and distributed-memory parallel programming alike.
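As an illustration of the all-prefix-sums computation named above, together with one of its classic applications (stream compaction), here is a minimal sequential sketch. The Star model itself is a parallel Julia framework; this does not attempt to reproduce it.

```python
def exclusive_scan(xs):
    # All-prefix-sums: out[i] is the sum of xs[0..i-1], with out[0] == 0.
    out, running = [], 0
    for x in xs:
        out.append(running)
        running += x
    return out

def compact(xs, pred):
    # Classic scan application: turn per-element flags into dense write
    # offsets, then scatter the surviving elements into the output.
    flags = [1 if pred(x) else 0 for x in xs]
    offsets = exclusive_scan(flags)
    out = [None] * sum(flags)
    for x, f, o in zip(xs, flags, offsets):
        if f:
            out[o] = x
    return out

scan = exclusive_scan([3, 1, 7, 0, 4])      # -> [0, 3, 4, 11, 11]
evens = compact([3, 8, 1, 6, 2, 9], lambda x: x % 2 == 0)
```

The scatter step is what makes scan the backbone of parallel sample sort and similar Star-class programs: once offsets are known, every element can be written independently.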
by Ekanathan Palamadai Natarajan.
Ph. D.
Стилі APA, Harvard, Vancouver, ISO та ін.
36

Zhou, He. "High Performance Computing Architecture with Security." Diss., The University of Arizona, 2015. http://hdl.handle.net/10150/578611.

Повний текст джерела
Анотація:
Multi-processor embedded systems are the future promise of high performance computing architecture. However, they still suffer from low network efficiency and security threats. Simply upgrading to multi-core systems has been proven to provide only minor speedup compared with single core systems. The router architecture of network-on-chip (NoC) uses shared input buffers such as virtual channels and crossbar switches that only allow sequential data access, so the speed and efficiency of on-chip communication is limited. In addition, the performance of conventional NoC topology is limited by routing latency and energy consumption, since its network diameter increases with the rising number of nodes. Security has also become a serious concern for embedded systems. Even with cryptographic algorithms, embedded systems are still very vulnerable to side channel attacks (SCAs). Among SCA approaches, power analysis is an efficient and powerful attack. Once the encryption location in an instruction sequence is identified, power analysis can be applied to exploit the embedded system. To improve on-chip network parallelism, this dissertation proposes a new router microarchitecture based on a new data structure called the virtual collision array. Sequential data requests are partially eliminated in the virtual collision array before entering the router pipeline. To facilitate the new router architecture, a new workload assignment is applied to increase data request elimination. Through a task flow partitioning algorithm, we minimize sequential data access and then schedule tasks while minimizing the total router delay. For NoC topology, this dissertation presents a new hybrid NoC (HyNoC) architecture. We introduce an adaptive routing scheme to provide reconfigurable on-chip communication with both wired and wireless links. 
In addition, based on a mathematical model established on cross-correlation, this dissertation proposes two obfuscation methodologies, Real Instruction Insertion and AES Mimic, to prevent SCA power analysis attacks.
Стилі APA, Harvard, Vancouver, ISO та ін.
37

Mani, Sindhu. "Empirical Performance Analysis of High Performance Computing Benchmarks Across Variations in Cloud Computing." UNF Digital Commons, 2012. http://digitalcommons.unf.edu/etd/418.

Повний текст джерела
Анотація:
High Performance Computing (HPC) applications are data-intensive scientific software requiring significant CPU and data storage capabilities. Researchers have examined the performance of the Amazon Elastic Compute Cloud (EC2) environment across several HPC benchmarks; however, an extensive HPC benchmark study and a comparison between Amazon EC2 and Windows Azure (Microsoft's cloud computing platform), with metrics such as memory bandwidth, Input/Output (I/O) performance, and communication and computational performance, are largely absent. The purpose of this study is to perform an exhaustive HPC benchmark comparison on the EC2 and Windows Azure platforms. We implement existing benchmarks to evaluate and analyze the performance of two public clouds spanning both IaaS and PaaS types. We use Amazon EC2 and Windows Azure as platforms for hosting HPC benchmarks with variations such as instance types, number of nodes, hardware and software. This is accomplished by running benchmarks including STREAM, IOR and NPB on these platforms on a varied number of nodes for small and medium instance types. These benchmarks measure memory bandwidth, I/O performance, and communication and computational performance. Benchmarking cloud platforms provides useful objective measures of their worthiness for HPC applications in addition to assessing their consistency and predictability in supporting them.
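For reference, the triad kernel at the heart of the STREAM benchmark mentioned above can be sketched as follows. This is an illustrative Python rendition of the kernel and its bytes-counted bandwidth formula, not the benchmark's official C implementation, and Python timings are not comparable to real STREAM numbers.

```python
import time

N = 200_000
a = [0.0] * N
b = [1.5] * N
c = [2.0] * N
scalar = 3.0

t0 = time.perf_counter()
for i in range(N):
    # STREAM "triad": a[i] = b[i] + scalar * c[i]
    a[i] = b[i] + scalar * c[i]
elapsed = time.perf_counter() - t0

# STREAM counts three 8-byte accesses per iteration (read b, read c, write a).
bytes_moved = 3 * 8 * N
bandwidth_mb_s = bytes_moved / elapsed / 1e6
```

The point of the bytes-counted convention is that the reported bandwidth depends only on the kernel's data traffic and the wall-clock time, making numbers comparable across platforms.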
Стилі APA, Harvard, Vancouver, ISO та ін.
38

De, la Cruz Raúl. "Leveraging performance of 3D finite difference schemes in large scientific computing simulations." Doctoral thesis, Universitat Politècnica de Catalunya, 2015. http://hdl.handle.net/10803/325139.

Повний текст джерела
Анотація:
Gone are the days when engineers and scientists conducted most of their experiments empirically. During those decades, actual tests were carried out in order to assess the robustness and reliability of forthcoming product designs and prove theoretical models. With the advent of the computational era, scientific computing has definitely become a feasible solution compared with empirical methods, in terms of effort, cost and reliability. Large and massively parallel computational resources have reduced simulation execution times and have improved their numerical results due to the refinement of the sampled domain. Several numerical methods coexist for solving the Partial Differential Equations (PDEs). Methods such as the Finite Element (FE) and the Finite Volume (FV) are especially well suited for dealing with problems where unstructured meshes are frequent. Unfortunately, this flexibility is not bestowed for free. These schemes entail higher memory latencies due to the handling of irregular data accesses. Conversely, the Finite Difference (FD) scheme has shown to be an efficient solution for problems where structured meshes suit the domain requirements. Many scientific areas use this scheme due to its higher performance. This thesis focuses on improving FD schemes to leverage the performance of large scientific computing simulations. Different techniques are proposed, such as the Semi-stencil, a novel algorithm that increases the FLOP/Byte ratio for medium- and high-order stencil operators by reducing the accesses and promoting data reuse. The algorithm is orthogonal and can be combined with techniques such as spatial- or time-blocking, adding further improvement. New trends in Symmetric Multi-Processing (SMP) systems, where tens of cores are replicated on the same die, pose new challenges due to the exacerbation of the memory wall problem. 
In order to alleviate this issue, our research is focused on different strategies to reduce pressure on the cache hierarchy, particularly when different threads are sharing resources due to Simultaneous Multi-Threading (SMT). Several domain decomposition schedulers for work-load balance are introduced, ensuring quasi-optimal results without jeopardizing the overall performance. We combine these schedulers with spatial-blocking and auto-tuning techniques, exploring the parametric space and reducing misses in the last-level cache. As an alternative to the brute-force methods used in auto-tuning, where a huge parametric space must be traversed to find a suboptimal candidate, performance models are a feasible solution. Performance models can predict the performance on different architectures, selecting suboptimal parameters almost instantly. In this thesis, we devise a flexible and extensible performance model for stencils. The proposed model is capable of supporting multi- and many-core architectures including complex features such as hardware prefetchers, SMT context and algorithmic optimizations. Our model can be used not only to forecast execution time, but also to make decisions about the best algorithmic parameters. Moreover, it can be included in run-time optimizers to decide the best SMT configuration based on the execution environment. Some industries rely heavily on FD-based techniques for their codes. Nevertheless, many cumbersome aspects arising in industry are still scarcely considered in academic research. In this regard, we have collaborated in the implementation of an FD framework which covers the most important features that an HPC industrial application must include. Some of the node-level optimization techniques devised in this thesis have been included in the framework in order to contribute to the overall application performance. 
We show results for a couple of strategic applications in industry: an atmospheric transport model that simulates the dispersal of volcanic ash and a seismic imaging model used in Oil & Gas industry to identify hydrocarbon-rich reservoirs.
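A minimal example of the kind of structured-grid FD operator this thesis optimizes, assuming a second-order 1D Laplacian. The thesis targets 3D, higher-order stencils; this sketch only shows the neighbor-access pattern that drives the FLOP/Byte and cache-behavior arguments above.

```python
def laplacian_1d(u, h):
    # Second-order central finite difference on interior points:
    # out[i] = (u[i-1] - 2*u[i] + u[i+1]) / h^2, boundaries left at zero.
    n = len(u)
    out = [0.0] * n
    inv_h2 = 1.0 / (h * h)
    for i in range(1, n - 1):
        out[i] = (u[i - 1] - 2.0 * u[i] + u[i + 1]) * inv_h2
    return out

# On u(x) = x^2 the discrete second derivative is 2 at every interior point,
# which makes correctness easy to check.
h = 0.1
u = [(i * h) ** 2 for i in range(11)]
lap = laplacian_1d(u, h)
```

Each output point reads a small, fixed neighborhood of the structured grid, which is exactly the regular access pattern that lets FD kernels be blocked and reused far more aggressively than unstructured FE/FV schemes.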
Atrás quedaron los días en los que ingenieros y científicos realizaban sus experimentos empíricamente. Durante esas décadas, se llevaban a cabo ensayos reales para verificar la robustez y fiabilidad de productos venideros y probar modelos teóricos. Con la llegada de la era computacional, la computación científica se ha convertido en una solución factible comparada con métodos empíricos, en términos de esfuerzo, coste y fiabilidad. Los supercomputadores han reducido el tiempo de las simulaciones y han mejorado los resultados numéricos gracias al refinamiento del dominio. Diversos métodos numéricos coexisten para resolver las Ecuaciones Diferenciales Parciales (EDPs). Métodos como Elementos Finitos (EF) y Volúmenes Finitos (VF) están bien adaptados para tratar problemas donde las mallas no estructuradas son frecuentes. Desafortunadamente, esta flexibilidad no se confiere de forma gratuita. Estos esquemas conllevan latencias más altas debido al acceso irregular de datos. En cambio, el esquema de Diferencias Finitas (DF) ha demostrado ser una solución eficiente cuando las mallas estructuradas se adaptan a los requerimientos. Esta tesis se enfoca en mejorar los esquemas DF para impulsar el rendimiento de las simulaciones en la computación científica. Se proponen diferentes técnicas, como el Semi-stencil, un nuevo algoritmo que incrementa el ratio de FLOP/Byte para operadores de stencil de orden medio y alto reduciendo los accesos y promoviendo el reuso de datos. El algoritmo es ortogonal y puede ser combinado con técnicas como spatial- o time-blocking, añadiendo mejoras adicionales. Las nuevas tendencias hacia sistemas con procesadores multi-simétricos (SMP) -donde decenas de cores son replicados en el mismo procesador- plantean nuevos retos debido a la exacerbación del problema del ancho de memoria. 
Para paliar este problema, nuestra investigación se centra en estrategias para reducir la presión en la jerarquía de cache, particularmente cuando diversos threads comparten recursos debido a Simultaneous Multi-Threading (SMT). Introducimos diversos planificadores de descomposición de dominios para balancear la carga asegurando resultados casi óptimos sin poner en riesgo el rendimiento global. Combinamos estos planificadores con técnicas de spatial-blocking y auto-tuning, explorando el espacio paramétrico y reduciendo los fallos en la cache de último nivel. Como alternativa a los métodos de fuerza bruta usados en auto-tuning, donde un espacio paramétrico se debe recorrer para encontrar un candidato, los modelos de rendimiento son una solución factible. Los modelos de rendimiento pueden predecir el rendimiento en diferentes arquitecturas, seleccionando parámetros subóptimos casi de forma instantánea. En esta tesis, ideamos un modelo de rendimiento para stencils flexible y extensible. El modelo es capaz de soportar arquitecturas multi-core incluyendo características complejas como prefetchers, SMT y optimizaciones algorítmicas. Nuestro modelo puede ser usado no solo para predecir los tiempos de ejecución, sino también para tomar decisiones de los mejores parámetros algorítmicos. Además, puede ser incluido en optimizadores run-time para decidir la mejor configuración SMT. Algunas industrias confían en técnicas DF para sus códigos. Sin embargo, no todos los aspectos que aparecen en la industria han sido sometidos a investigación. En este aspecto, hemos diseñado e implementado desde cero una infraestructura DF que cubre las características más importantes que una aplicación industrial debe incluir. Algunas de las técnicas de optimización propuestas en esta tesis han sido incluidas para contribuir en el rendimiento global a nivel industrial. 
Mostramos resultados de un par de aplicaciones estratégicas para la industria: un modelo de transporte atmosférico que simula la dispersión de ceniza volcánica y un modelo de imagen sísmica usado en la industria del petróleo y gas para identificar reservas ricas en hidrocarburos.
Стилі APA, Harvard, Vancouver, ISO та ін.
39

Azhar, Rizwan. "Upgrading and Performance Analysis of Thin Clients in Server Based Scientific Computing." Thesis, Linköpings universitet, Kommunikationssystem, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-65900.

Повний текст джерела
Анотація:
Server Based Computing (SBC) technology allows applications to be deployed, managed, supported and executed on the server and not on the client; only the screen information is transmitted between the server and client. This architecture solves many fundamental problems with application deployment, technical support, data storage, and hardware and software upgrades. This thesis is targeted at upgrading and evaluating the performance of thin clients in scientific Server Based Computing. The performance of a Linux-based SBC was assessed via methods of both quantitative and qualitative research. The quantitative method used benchmarks that measured typical-load performance with SAR and graphics performance with X11perf, Xbench and SPECviewperf. A structured interview, a qualitative research method, was adopted in which a number of open-ended questions in a specific order were presented to users in order to estimate user-perceived performance. The first performance bottleneck identified was the CPU speed. The second performance bottleneck, with respect to graphics-intensive applications, includes the network latency of the X11 protocol and the subsequent performance of old thin clients. An upgrade of both the computational server and the thin clients was suggested. The evaluation after the upgrade involved performance analysis via quantitative and qualitative methods. The results showed that the new configuration had improved the performance.
Стилі APA, Harvard, Vancouver, ISO та ін.
40

Ahrens, James P. "Scientific experiment management with high-performance distributed computation /." Thesis, Connect to this title online; UW restricted, 1996. http://hdl.handle.net/1773/6974.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
41

Choi, Jee Whan. "Power and performance modeling for high-performance computing algorithms." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/53561.

Повний текст джерела
Анотація:
The overarching goal of this thesis is to provide an algorithm-centric approach to analyzing the relationship between time, energy, and power. This research is aimed at algorithm designers and performance tuners so that they may be able to make decisions on how algorithms should be designed and tuned depending on whether the goal is to minimize time or to minimize energy on current and future systems. First, we present a simple analytical cost model for energy and power. Assuming a simple von Neumann architecture with a two-level memory hierarchy, this model predicts energy and power for algorithms using just a few simple parameters, such as the number of floating point operations (FLOPs or flops) and the amount of data moved (bytes or words). Using highly optimized microbenchmarks and a small number of test platforms, we show that although this model uses only a few simple parameters, it is, nevertheless, accurate. We can also visualize this model using energy “arch lines,” analogous to the “rooflines” in time. These “rooflines in energy” allow users to easily assess and compare different algorithms’ intensities in energy and time to various target systems’ balances in energy and time. This visualization of our model gives us many interesting insights, and as such, we refer to our analytical model as the energy roofline model. Second, we present the results of our microbenchmarking study of time, energy, and power costs of computation and memory access of several candidate compute-node building blocks of future high-performance computing (HPC) systems. Over a dozen server-, desktop-, and mobile-class platforms that span a range of compute and power characteristics were evaluated, including x86 (both conventional and Xeon Phi accelerator), ARM, graphics processing units (GPU), and hybrid (AMD accelerated processing units (APU) and other system-on-chip (SoC)) processors. 
The purpose of this study was twofold: first, to extend the validation of the energy roofline model to a more comprehensive set of target systems to show that the model works well independent of system hardware and microarchitecture; second, to improve the model by uncovering and remedying potential shortcomings, such as incorporating the effects of power “capping,” a multi-level memory hierarchy, and different implementation strategies on power and performance. Third, we incorporate dynamic voltage and frequency scaling (DVFS) into the energy roofline model to explore its potential for saving energy. Rather than the more traditional approach of using DVFS to reduce energy, whereby a “slack” in computation is used as an opportunity to dynamically cycle down the processor clock, the energy roofline model can be used to determine precisely how the time and energy costs of different operations, both compute and memory, change with respect to frequency and voltage settings. This information can be used to target a specific optimization goal, whether that be time, energy, or a combination of both. In the final chapter of this thesis, we use our model to predict the energy dissipation of a real application running on a real system. The fast multipole method (FMM) kernel was executed on the GPU component of the Tegra K1 SoC under various frequency and voltage settings, and a breakdown of instructions and data access patterns was collected via performance counters. The total energy dissipation of FMM was then calculated as a weighted sum of these instructions and the associated costs in energy. On eight different voltage and frequency settings and eight different algorithm-specific input parameters per setting, for a total of 64 test cases, the accuracy of the energy roofline model for predicting total energy dissipation was within 6.2%, with a standard deviation of 4.7%, when compared to actual energy measurements. 
Despite its simplicity and its foundation on the first principles of algorithm analysis, the energy roofline model has proven to be both practical and accurate for real applications running on a real system. And as such, it can be an invaluable tool for algorithm designers and performance tuners with which they can more precisely analyze the impact of their design decisions on both performance and energy efficiency.
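The energy-costing idea behind the energy roofline model can be sketched in a few lines: total energy as a weighted sum of operation counts, plus a balance/intensity comparison analogous to the time roofline. The per-operation energy constants below are made-up placeholders for illustration, not measured values from the thesis.

```python
# Hypothetical machine constants (joules per operation) -- placeholders only.
EPS_FLOP_J = 0.5e-9    # energy cost of one floating point operation
EPS_WORD_J = 10.0e-9   # energy cost of moving one word to/from memory

def energy_joules(flops, words):
    # Total energy as a weighted sum of operation counts.
    return flops * EPS_FLOP_J + words * EPS_WORD_J

def energy_balance():
    # Machine balance in energy: how many flops "cost" one word of movement.
    return EPS_WORD_J / EPS_FLOP_J

# A kernel doing 2n flops over n words, e.g. a dot product of length n.
n = 1_000_000
e = energy_joules(2 * n, n)
intensity = (2 * n) / n              # flops per word for this kernel
memory_bound_in_energy = intensity < energy_balance()
```

Comparing a kernel's intensity against the machine's energy balance is the "arch line" style of reasoning: when intensity falls below the balance, data movement, not arithmetic, dominates the energy bill.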
Стилі APA, Harvard, Vancouver, ISO та ін.
42

Beniamine, David. "Analyzing the memory behavior of parallel scientific applications." Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM088/document.

Повний текст джерела
Анотація:
Depuis plusieurs décennies, afin de réduire la consommation énergétique des processeurs, les constructeurs fabriquent des ordinateurs de plus en plus parallèles. Dans le même temps, l'écart de fréquence entre les processeurs et la mémoire a significativement augmenté. Pour compenser cet écart, les processeurs modernes embarquent une hiérarchie de caches complexe. Développer un programme efficace sur de telles machines est une tâche complexe. Par conséquent, l'analyse de performance est devenue une étape majeure lors du développement d'applications requérant des performances. La plupart des outils d'analyse de performances se concentrent sur le point de vue du processeur. Ces outils voient la mémoire comme une entité monolithique et sont donc incapables de comprendre comment elle est accédée. Cependant, la mémoire est une ressource critique et les schémas d'accès à cette dernière peuvent impacter les performances de manière significative. Quelques outils permettant l'analyse de performances mémoire existent, cependant ils sont basés sur un échantillonnage à gros grain. Par conséquent, ces outils se concentrent sur une petite partie de l'exécution et manquent le comportement global de l'application. De plus, l'échantillonnage à gros grain ne permet pas de collecter des schémas d'accès. Dans cette thèse, nous proposons deux outils différents pour analyser le comportement mémoire d'une application. Le premier outil est conçu spécifiquement pour les machines NUMA (Non-Uniform Memory Access) et fournit plusieurs visualisations du schéma global de partage de chaque structure de données entre les fils d'exécution. Le deuxième outil collecte des traces mémoire à grain fin avec informations temporelles. Nous proposons de visualiser ces traces soit à l'aide d'un outil générique de gestion de traces, soit en utilisant une approche programmatique basée sur R. De plus, nous évaluons ces deux outils en les comparant à des outils existants de trace mémoire en termes de performances, de précision et de complétude.
For a few decades now, to reduce energy consumption, processor vendors have built more and more parallel computers. At the same time, the gap between processor and memory frequency has increased significantly. To mitigate this gap, processors embed a complex hierarchical cache architecture. Writing efficient code for such computers is a complex task. Therefore, performance analysis has become an important step in the development of applications seeking performance. Most existing performance analysis tools focus on the point of view of the processor. These tools see the main memory as a monolithic entity and thus are not able to understand how it is accessed. However, memory is a common bottleneck in High Performance Computing, and the pattern of memory accesses can impact the performance significantly. There are a few tools to analyze memory performance; however, these tools are based on coarse-grained sampling. Consequently, they focus on a small part of the execution, missing the global memory behavior. Furthermore, such coarse-grained sampling is not able to collect memory access patterns. In this thesis we propose two different tools to analyze the memory behavior of an application. The first tool is designed specifically for Non-Uniform Memory Access (NUMA) machines and provides visualizations of the global sharing pattern inside each data structure between the threads. The second one collects fine-grained memory traces with temporal information. We can visualize these traces either with a generic trace management framework or with a programmatic exploration using R. Furthermore, we evaluate both of these tools, comparing them with state-of-the-art memory analysis tools in terms of performance, precision and completeness
APA, Harvard, Vancouver, ISO, and other styles
43

Orobitg, Cortada Miquel. "High performance computing on biological sequence alignment." Doctoral thesis, Universitat de Lleida, 2013. http://hdl.handle.net/10803/110930.

Full text of the source
Abstract:
Multiple Sequence Alignment (MSA) is a powerful tool for important biological applications. MSAs are computationally difficult to calculate, and most formulations of the problem lead to NP-Hard optimization problems. To perform large-scale alignments with thousands of sequences, new challenges must be resolved to adapt MSA algorithms to the High-Performance Computing era. In this thesis we propose three different approaches to overcome some limitations of the main MSA methods. The first proposal is a new guide-tree construction algorithm that improves the degree of parallelism, in order to resolve the bottleneck of the progressive alignment stage. The second proposal optimizes the consistency library, improving the execution time and scalability of MSA and enabling the method to treat more sequences. Finally, we propose Multiple Trees Alignment (MTA), an MSA method that aligns multiple guide trees in parallel, evaluates the alignments obtained, and selects the best one as the result. The experimental results demonstrate that MTA considerably improves the quality of the alignments.
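The MTA selection step described above, scoring candidate alignments in parallel and keeping the best, can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the thesis implementation: the candidate alignments are assumed to be precomputed (real MTA derives one from each guide tree), and `sum_of_pairs` and `best_alignment` are hypothetical helper names using the classic sum-of-pairs score.

```python
from concurrent.futures import ThreadPoolExecutor

def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Classic sum-of-pairs score over all residue pairs in each column.
    `alignment` is a list of equal-length aligned sequences ('-' = gap)."""
    score = 0
    for col in zip(*alignment):
        for i in range(len(col)):
            for j in range(i + 1, len(col)):
                a, b = col[i], col[j]
                if a == '-' or b == '-':
                    score += gap
                elif a == b:
                    score += match
                else:
                    score += mismatch
    return score

def best_alignment(candidates, workers=4):
    """MTA-style selection: score each candidate alignment (e.g. one per
    guide tree) in parallel and keep the highest-scoring one."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(sum_of_pairs, candidates))
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]
```

In this sketch the parallelism is only over the scoring of finished candidates; in the method described above, building the alignments from the different guide trees is itself parallelized.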
APA, Harvard, Vancouver, ISO, and other styles
44

Algire, Martin. "Distributed multi-processing for high performance computing." Thesis, McGill University, 2000. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=31180.

Full text of the source
Abstract:
Parallel computing can take many forms. From a user's perspective, it is important to consider the advantages and disadvantages of each methodology. This project attempts to provide some perspective on the methods of parallel computing and to indicate where the tradeoffs lie along the continuum. Parallelizable problems enable researchers to maximize the computing resources available for a problem, and thus to push the limits of the problems that can be solved. Solving any particular problem in parallel requires some very important design decisions, which may dramatically affect the portability, performance, and cost of implementing a software solution to the problem. The results of this work indicate that although performance improvements are indeed possible, they are heavily dependent on the application in question and may require much more programming effort and expertise to implement.
APA, Harvard, Vancouver, ISO, and other styles
45

King, Graham A. "High performance computing systems for signal processing." Thesis, Southampton Solent University, 1996. http://ssudl.solent.ac.uk/2424/.

Full text of the source
Abstract:
The submission begins by demonstrating that the conditions required for consideration under the University's research degree regulations have been met in full. There then follows a commentary which starts by explaining the origin of the research theme concerned and continues by discussing the nature and significance of the work. This has been an extensive programme to devise new methods of improving the computational speed and efficiency required for effective implementation of FIR and IIR digital filters and transforms. The problems are analysed, and initial experimental work is described which sought to quantify the performance to be derived from peripheral vector processors. For some classes of computation, especially in real time, it was necessary to turn to pure systolic array hardware engines, and a large number of innovations are suggested, both in array architecture and in the creation of a new hybrid opto-electronic adder capable of improving the performance of processing elements for the array. This significant and original research is extended further by including a class of computation involving a bit-sliced co-processor. A means of measuring the performance of this system is developed and discussed. The contribution of the work has been evident in: software innovation for horizontal-architecture microprocessors; improved multi-dimensional systolic array designs; the development of completely new implementations of processing elements in such arrays; and the details of co-processing architectures for bit-sliced microprocessors. The use of Read Only Memory in creating n-dimensional FIR or IIR filters, and in executing the discrete cosine transform, is a further innovative contribution that has enabled researchers to re-examine the case for pre-calculated systems previously using stored squares. The Read Only Memory work has suggested that Read Only Memory chips may be combined in a way architecturally similar to systolic array processing elements.
This led to original concepts of pipelining for memory devices. The work is entirely coherent in that it covers the application of these contributions to a set of common processes, producing a set of performance-graded and scalable solutions. In order for effective solutions to be proposed, it was necessary to demonstrate a solid underlying appreciation of the computational mechanics involved. Whilst the published papers within this submission assume such an understanding, two appendices are provided to demonstrate the essential groundwork necessary to underpin the work resulting in these publications. The improved results obtained from the programme were threefold: execution time; theoretical clocking speeds and circuit areas; and speed-up ratios. In the case of the investigations involving vector signal processors, the issue was one of quantifying the performance bounds of the architecture in performing specific combinations of signal processing functions. An important aspect of this work was the optimisation achieved in the programming of the device. The use of innovative techniques reduced the execution time for the complex combinational algorithms involved to below 10 milliseconds. Given the real-time constraints of typical applications and the aims of this research, the work evolved toward dedicated hardware solutions. Systolic arrays were thus a significant area of investigation. In such systems, meritorious criteria are concerned with achieving: higher regularity in architectural structure; data exchanges only with nearest-neighbour processing elements; minimised global distribution functions such as power supplies and clock lines; minimised latency; minimisation of the use of latches; the elimination of output adders; and the design of higher-speed processing elements.
The programme has made original and significant contributions to the art of effective array design, culminating in systems calculated to clock at 100 MHz when using 1-micron CMOS technology, whilst reducing transistor count compared with contemporary implementations. The improvements vary by specific design but are of the order of a 30-100% speed advantage and 20-30% less real-estate usage. The third type of result was obtained when considering operations best executed by dedicated microcode running on bit-sliced engines. The main issues for this part of the work were the development of effective interactions between host processors and the bit-sliced processors used for computationally intensive and repetitive functions, together with the evaluation of the relative performance of new bit-sliced microcode solutions. The speed-up obtained relative to a range of state-of-the-art microprocessors (68040, 80386, 32032) ranged from 2:1 to 8:1. The programme of research is represented by sixteen papers divided into three groups corresponding to the following stages in the work: problem definition and initial responses involving vector processors; the synthesis of higher-performance solutions using dedicated hardware; and bit-sliced solutions.
APA, Harvard, Vancouver, ISO, and other styles
46

Debbage, Mark. "Reliable communication protocols for high-performance computing." Thesis, University of Southampton, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.358359.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
47

Cox, Simon J. "Development and applications of high performance computing." Thesis, University of Southampton, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.242712.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
48

Tavara, Shirin. "High-Performance Computing For Support Vector Machines." Licentiate thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-16556.

Full text of the source
Abstract:
Machine learning algorithms are very successful in solving classification and regression problems; however, the immense amount of data created by digitalization slows down the training and prediction processes, if they can be completed at all. High-Performance Computing (HPC), and particularly parallel computing, is a promising tool for improving the time performance of machine learning algorithms. Support Vector Machines (SVM) is one of the most popular supervised machine learning techniques to benefit from advances in HPC for overcoming big-data problems; however, implementing SVM efficiently in parallel is a complex endeavour. While there are many parallel techniques to improve the performance of SVM, there is no clear roadmap for every application scenario. This thesis is based on a collection of publications. It addresses the problems surrounding parallel implementations of SVM through four research questions, all of which are answered across three research articles. For the first research question, the thesis investigates the influence of important factors, such as parallel algorithms, HPC tools, and heuristics, on the efficiency of parallel SVM implementations. This leads to identifying the state-of-the-art parallel implementations of SVMs, with their pros and cons, and suggests possible avenues for future research; it is up to the user to strike a balance between computation time and classification accuracy. For the second research question, the thesis explores the impact of changes in problem size, and the values of the corresponding SVM parameters that lead to significant performance. This addresses the impact of the problem size on the optimal choice of important parameters; in addition, the thesis shows the existence of a threshold between the number of cores and the training time. For the third research question, the thesis investigates the impact of the network topology on the performance of a network-based SVM. This leads to three key contributions.
The first contribution shows how much the expansion property of the network impacts convergence. The second shows which network topologies make efficient use of the available computing power. The third supplies an implementation that makes the theoretical advances practically available. The results show that graphs with large spectral gaps and higher degrees exhibit accelerated convergence. For the last research question, the thesis combines all the contributions of the articles and offers recommendations towards implementing an efficient framework for SVMs on large-scale problems.
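The reported link between a network's expansion property and convergence can be illustrated with the standard notion of the spectral gap of a regular graph's adjacency matrix. The sketch below is generic linear algebra, not code from the thesis; it compares a complete graph (a good expander) with a ring (a poor one):

```python
import numpy as np

def spectral_gap(adj):
    """Spectral gap of a d-regular graph: the difference between the
    largest and second-largest eigenvalues of the adjacency matrix.
    A larger gap means better expansion, hence faster information
    spread (and, per the thesis results, faster convergence)."""
    eig = np.sort(np.linalg.eigvalsh(adj))[::-1]
    return float(eig[0] - eig[1])

def ring(n):
    """Adjacency matrix of a cycle on n nodes (degree 2, poor expander)."""
    a = np.zeros((n, n))
    for i in range(n):
        a[i, (i + 1) % n] = a[(i + 1) % n, i] = 1.0
    return a

# Complete graph on 8 nodes: eigenvalues are 7 and -1, so the gap is 8.
complete = np.ones((8, 8)) - np.eye(8)
```

For the 8-node ring the gap is only 2 - sqrt(2), which matches the observation above that denser, well-connected topologies converge faster.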
APA, Harvard, Vancouver, ISO, and other styles
49

Wong, Yee Lok Ph D. Massachusetts Institute of Technology. "High-performance computing with PetaBricks and Julia." Thesis, Massachusetts Institute of Technology, 2011. http://hdl.handle.net/1721.1/67818.

Full text of the source
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Mathematics, 2011.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 163-170).
We present two recent parallel programming languages, PetaBricks and Julia, and demonstrate how these two languages can be used to re-examine classic numerical algorithms through new approaches to high-performance computing. PetaBricks is an implicitly parallel language that allows programmers to naturally express algorithmic choice explicitly at the language level. The PetaBricks compiler and autotuner are not only able to compose a complex program from fine-grained algorithmic choices but also to find the right choices for many other parameters, including data distribution, parallelization, and blocking. We re-examine classic numerical algorithms with PetaBricks, and show that the PetaBricks autotuner produces nontrivial optimal algorithms that are difficult to reproduce otherwise. We also introduce the notion of variable-accuracy algorithms, in which accuracy measures and requirements are supplied by the programmer and incorporated by the PetaBricks compiler and autotuner in the search for optimal algorithms. We demonstrate the accuracy/performance trade-offs on benchmark problems, and show how nontrivial algorithmic choices can change with different user accuracy requirements. Julia is a new high-level programming language that aims at achieving performance comparable to traditional compiled languages, while remaining easy to program and offering flexible parallelism without extensive effort. We describe a problem in large-scale terrain data analysis which motivates the use of Julia. We apply classical filtering techniques to study the terrain profiles and propose a measure based on Singular Value Decomposition (SVD) to quantify terrain surface roughness. We then give a brief tutorial of Julia and present the results of our serial blocked SVD algorithm implementation in Julia. We also describe the parallel implementation of our SVD algorithm and discuss how flexible parallelism can be further explored using Julia.
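The idea of an SVD-based roughness measure can be illustrated with a toy example. The abstract does not specify the thesis's actual formula, so `svd_roughness` below is a hypothetical stand-in: it measures the fraction of a terrain grid's spectral energy that falls outside the top k singular values, so that smooth, low-rank surfaces score near 0 and noisy ones score higher.

```python
import numpy as np

def svd_roughness(elevation, k=1):
    """Illustrative roughness measure (hypothetical, not the thesis's):
    fraction of squared singular values beyond the leading k. A rank-k
    (smooth) surface scores ~0; added noise raises the residual energy."""
    s = np.linalg.svd(elevation, compute_uv=False)
    total = np.sum(s ** 2)
    return float(np.sum(s[k:] ** 2) / total)

# A rank-1 surface (outer product of two ramps) is perfectly "smooth".
ramp = np.linspace(0.0, 1.0, 64)
smooth = np.outer(ramp, ramp)
# Adding noise spreads energy across many singular values.
rng = np.random.default_rng(0)
rough = smooth + 0.5 * rng.standard_normal((64, 64))
```

Because it only reads the singular values, such a measure slots directly into the blocked SVD computation the abstract describes.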
by Yee Lok Wong.
Ph.D.
APA, Harvard, Vancouver, ISO, and other styles
50

Ravindrudu, Rahul. "Benchmarking More Aspects of High Performance Computing." Ames, Iowa : Oak Ridge, Tenn. : Ames Laboratory ; distributed by the Office of Scientific and Technical Information, U.S. Dept. of Energy, 2004. http://www.osti.gov/servlets/purl/837280-06M7ga/webviewable/.

Full text of the source
Abstract:
Thesis (M.S.); Submitted to Iowa State Univ., Ames, IA (US); 19 Dec 2004.
Published through the Information Bridge: DOE Scientific and Technical Information. "IS-T 2196." Rahul Ravindrudu. US Department of Energy, 12/19/2004. The report is also available in paper and microfiche from NTIS.
APA, Harvard, Vancouver, ISO, and other styles
