Dissertations / Theses: 'Parallel'

1

Ferlin, Edson Pedro. "Avaliação de métodos de paralelização automática." Universidade de São Paulo, 1997. http://www.teses.usp.br/teses/disponiveis/76/76132/tde-09102008-111750/.

Abstract:

This work presents concepts and definitions of parallel processing that apply to automatic parallelization, together with the analysis of data dependences and the conditions they impose. It then applies four parallelization methods: Hyperplane, Unimodular Transformation, Communication-Free Data Allocation, and Partitioning & Labeling. These methods transform a sequential program into an equivalent parallel one. The resulting programs were run on a distributed-memory system that communicates by message passing through MPI (Message-Passing Interface), and measurements were collected to evaluate and compare the methods.
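
The Hyperplane and Unimodular Transformation methods named above both work from dependence vectors. The Python below is a minimal sketch that assumes uniform (constant-distance) dependences written as integer vectors. It checks whether a transformation matrix is unimodular and legal, meaning every transformed dependence stays lexicographically positive. It also searches for a hyperplane vector pi with pi·d > 0 for every dependence d. The function names and the brute-force search bound are illustrative assumptions, not the thesis's implementation.

```python
from itertools import product

def lex_positive(v):
    """True if the first non-zero component of v is positive."""
    for x in v:
        if x != 0:
            return x > 0
    return False  # the zero vector is not lexicographically positive

def mat_vec(T, d):
    return tuple(sum(t * x for t, x in zip(row, d)) for row in T)

def det(M):
    """Integer determinant by cofactor expansion (fine for small loop nests)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** c * M[0][c] * det([row[:c] + row[c + 1:] for row in M[1:]])
               for c in range(len(M)))

def is_legal_unimodular(T, deps):
    """T must have |det T| = 1 and keep every T.d lexicographically positive."""
    return abs(det(T)) == 1 and all(lex_positive(mat_vec(T, d)) for d in deps)

def find_hyperplane(deps, bound=3):
    """Smallest integer pi (by sum of |components|) with pi.d > 0 for all d.
    All iterations on one hyperplane pi.i = t are independent of each other."""
    dim = len(deps[0])
    candidates = sorted(product(range(-bound, bound + 1), repeat=dim),
                        key=lambda p: (sum(abs(x) for x in p), p))
    for pi in candidates:
        if all(sum(p * x for p, x in zip(pi, d)) > 0 for d in deps):
            return pi
    return None

# Loop nest: for i: for j: A[i][j] = A[i-1][j] + A[i][j-1]
deps = [(1, 0), (0, 1)]
print(find_hyperplane(deps))                          # time step t = pi . (i, j)
print(is_legal_unimodular([[0, 1], [1, 0]], deps))    # loop interchange
print(is_legal_unimodular([[1, 0], [0, -1]], deps))   # reversing the j loop
```

With these two dependences, the hyperplane i + j = t groups each anti-diagonal into one parallel step.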

2

Oliver, William R. "The Matrix a metaphorical paralell [i.e. parallel] to language." Electronic thesis, 2008. http://dl.uncw.edu/etd/2008-3/oliverw/williamoliver.pdf.

3

Yousif, Hilal M. "Parallel algorithms for MIMD parallel computers." Thesis, Loughborough University, 1986. https://dspace.lboro.ac.uk/2134/15113.

Abstract:

This thesis mainly covers the design and analysis of asynchronous parallel algorithms that can run on MIMD (Multiple Instruction Multiple Data) parallel computers, in particular the NEPTUNE system at Loughborough University. First, the fundamentals of parallel computer architectures are introduced, and different parallel architectures are described and compared. The principles of parallel programming and the design of parallel algorithms are also outlined. The main characteristics of the 4-processor MIMD NEPTUNE system are presented. The performance indicators used to measure parallelism in a given system, namely the speed-up and efficiency factors, are also defined. Both numerical and non-numerical algorithms are covered. For the numerical solution of partial differential equations, a new parallel 9-point block iterative method is developed. The blocks are organized so that each process holds its own group of 9 points of the network, which allows the processes to run in parallel. Parallel implementations of both the 9-point and the 4-point block iterative methods were programmed using natural and red-black orderings with synchronous and asynchronous approaches. The results of these implementations were compared and analysed. Next, a parallel version of the A.G.E. (Alternating Group Explicit) method is developed. In this method the explicit nature of the difference equation is revealed and exploited to solve both linear and non-linear 2-point boundary value problems. Two strategies, using the synchronous and asynchronous approaches, were used to implement the parallel A.G.E. method, and their results were compared. The results of the parallel A.G.E. method were also compared with the corresponding results from parallel versions of the Jacobi, Gauss-Seidel and S.O.R. methods.
Finally, a computational complexity analysis of the parallel A.G.E. algorithms is included. In the area of non-numerical algorithms, the problems of sorting and searching were studied. The sorting methods investigated were the Shell sort and the digit sort. For each method, different parallel strategies and approaches were used and compared to find the best results obtainable on the parallel machine. For searching, the sequential search of an unordered table and the binary search algorithm were investigated and implemented in parallel, and the results are presented. A complexity analysis of these methods is also given. The thesis concludes with a chapter summarizing the main results.
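
Red-black ordering is what lets a point iterative method run in parallel. The Python below is a minimal sketch under stated assumptions. It solves the simple 5-point Laplace problem on a unit square, not the thesis's 9-point block method. It colours the grid points like a chessboard, so every red point depends only on black neighbours and vice versa. Each half-sweep could therefore be shared among processors without races. The name `red_black_gauss_seidel` and the boundary values are illustrative choices.

```python
def red_black_gauss_seidel(n, top=1.0, tol=1e-12, max_iter=10_000):
    """Solve Laplace's equation on an n x n interior grid with u = top on the
    upper boundary and u = 0 on the other three sides (5-point stencil)."""
    u = [[0.0] * (n + 2) for _ in range(n + 2)]
    u[0] = [top] * (n + 2)  # row 0 is the top boundary
    for sweep in range(1, max_iter + 1):
        max_change = 0.0
        for colour in (0, 1):  # 0 = red points, 1 = black points
            # A point of one colour only reads neighbours of the other colour,
            # so all updates in this half-sweep are independent of each other.
            for i in range(1, n + 1):
                for j in range(1, n + 1):
                    if (i + j) % 2 != colour:
                        continue
                    new = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
                    max_change = max(max_change, abs(new - u[i][j]))
                    u[i][j] = new
        if max_change < tol:
            return u, sweep
    return u, max_iter

u, sweeps = red_black_gauss_seidel(9)
print(f"converged in {sweeps} sweeps, centre value {u[5][5]:.6f}")
```

By symmetry, the centre value approaches 0.25, a quarter of the value obtained when all four sides are held at 1.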

4

Harrison, Ian. "Locality and parallel optimizations for parallel supercomputing." Diss., 2003. http://hdl.handle.net/10066/1274.

5

Su, (Philip) Shin-Chen. "Parallel subdomain method for massively parallel computers." Diss., Georgia Institute of Technology, 1992. http://hdl.handle.net/1853/17376.

6

Gecgel, Murat. "Parallel, Navier." Master's thesis, METU, 2003. http://etd.lib.metu.edu.tr/upload/12604807/index.pdf.

Abstract:

The aim of this study is to extend a parallel Fortran 90 code to compute three-dimensional laminar and turbulent flowfields over rotary-wing configurations. The code solves the unsteady, thin-layer Navier-Stokes equations with a finite-volume discretization and a compact, four-step Runge-Kutta type time integration technique. The zero-order Baldwin-Lomax turbulence model is used for the computation of turbulent flowfields. A fine, viscous, H-type structured grid is employed in the computations. To reduce computational time and memory requirements, parallel processing with distributed memory is used. Data communication among the processors is handled with the MPI (Message Passing Interface) communication libraries. Laminar and turbulent solutions are obtained around a two-bladed UH-1 helicopter rotor, along with a turbulent solution around a flat plate. For the rotary-wing configurations, non-lifting and lifting rotor cases are handled separately for subsonic and transonic blade-tip speeds. The results are generally in good agreement with the experimental data.
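
One common reading of a "compact, four-step Runge-Kutta type" integrator is the low-storage multistage scheme associated with Jameson. In that scheme each stage restarts from the beginning-of-step value, so only two copies of the solution are stored. The Python below is a sketch that assumes the stage coefficients 1/4, 1/3, 1/2, 1. It replaces the discretized Navier-Stokes residual with a scalar model equation. For this linear model problem, one step reproduces the fourth-order Taylor expansion of the exact solution.

```python
import math

# Stage coefficients of a Jameson-style four-stage scheme (an assumption:
# the thesis only describes it as a "compact, four step Runge-Kutta type" method).
ALPHAS = (1 / 4, 1 / 3, 1 / 2, 1.0)

def multistage_step(u0, dt, residual):
    """Advance du/dt = -R(u) by one step using only two storage slots:
    the start-of-step value u0 and the current stage value u."""
    u = u0
    for alpha in ALPHAS:
        u = u0 - alpha * dt * residual(u)
    return u

def integrate(u0, dt, t_end, residual):
    u = u0
    for _ in range(round(t_end / dt)):
        u = multistage_step(u, dt, residual)
    return u

# Model problem du/dt = -u, i.e. R(u) = u, with exact solution exp(-t).
u_end = integrate(1.0, 0.01, 1.0, lambda u: u)
print(u_end, math.exp(-1.0))
```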

7

Windowmaker, Tricia. "Parallel adolescents." Honors in the Major Thesis, University of Central Florida, 2010. http://digital.library.ucf.edu/cdm/ref/collection/ETH/id/1525.

Abstract:

This item is only available in print in the UCF Libraries.
Bachelors
Arts and Humanities
English

8

Hassel, Karen Louise. "Parallel memories." The Ohio State University, 1993. http://rave.ohiolink.edu/etdc/view?acc_num=osu1314801102.

9

Andersson, Håkan. "Parallel Simulation : Parallel computing for high performance LTE radio network simulations." Thesis, Mittuniversitetet, Institutionen för informationsteknologi och medier, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-12390.

Abstract:

Radio access technologies for cellular mobile networks are continuously evolving to meet future demands for higher data rates and lower end-to-end delays. In the research and development of LTE, radio network simulations play an essential role. The evolution of parallel processing hardware makes it desirable to parallelize LTE radio network simulations with multithreading techniques, as opposed to distributing experiments over processors as independent simulation job processes. One hypothesis is that the parallel speedup gain diminishes when many parallel simulation jobs run concurrently on the same machine, due to the increased memory requirements. A multithreaded prototype of the Ericsson LTE simulator has been constructed. It encapsulates the scheduling, execution and synchronization of asynchronous physical layer computations. To provide implementation transparency, an algorithm is proposed that sorts and synchronizes log events, enabling a sequential logging model on top of non-deterministic execution. To evaluate and compare multithreading techniques with parallel simulation job distribution, a large number of experiments were carried out for four very diverse simulation scenarios. The evaluation analysed the average measured execution times and compared them with ideal estimates derived from Amdahl's law in order to quantify overhead. The proposed multithreaded task-oriented framework has been shown to provide a convenient way to execute LTE physical layer models asynchronously on multi-core processors. It still provides deterministic results equivalent to those of a sequential simulator. However, the results indicate that distributing parallel independent jobs over processors is currently more efficient than multithreading, even though the achieved speedup is far from ideal.
This conclusion is based on the observation that the overhead caused by increased memory requirements, memory access and system bus congestion is currently smaller than the thread management and synchronization overhead of the proposed multithreaded Java prototype.
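
The comparison with ideal estimates from Amdahl's law can be made concrete. The sketch below assumes the standard form S(p) = 1 / ((1 - f) + f / p), where f is the parallelizable fraction of the sequential run time and p is the number of cores. It treats any measured time above the ideal time as overhead. The numbers in the usage lines are made up for illustration and are not taken from the thesis.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Ideal speedup from Amdahl's law: S(p) = 1 / ((1 - f) + f / p)."""
    f = parallel_fraction
    return 1.0 / ((1.0 - f) + f / workers)

def overhead(t_serial, t_measured, parallel_fraction, workers):
    """Measured run time minus the Amdahl-ideal run time for the same setup."""
    t_ideal = t_serial / amdahl_speedup(parallel_fraction, workers)
    return t_measured - t_ideal

# Hypothetical numbers, for illustration only: a 100 s sequential run whose
# parallelizable part is 90 %, measured at 40 s on 4 cores.
print(round(amdahl_speedup(0.9, 4), 3))         # ideal speedup on 4 cores
print(round(overhead(100.0, 40.0, 0.9, 4), 2))  # seconds attributed to overhead
```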

10

Dai, Jiehua. "Automatic Parallel Memory Address Generation for Parallel DSP Computing." Thesis, Linköping University, Department of Electrical Engineering, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-11110.

Abstract:

The concept of Parallel Vector (scratch-pad) Memories (PVM) was introduced as one solution for parallel computing in DSP. PVM can provide parallel memory addressing efficiently with minimum latency. The parallel address generator for PVM proposed in this thesis makes parallel programming more efficient. However, because no cache hides the complexity of memory access, the cost of programming is high. To minimize this cost, automatic parallel memory address generation is needed to hide the complexities of memory access.

This thesis investigates methods for implementing conflict-free vector addressing algorithms on a parallel hardware structure. In particular, vector addressing requirements extracted from the behaviour model are matched to a prepared parallel memory addressing template. This allows data to be supplied in parallel from the main memory to the on-chip vector memory.

Based on the template and on how the main and on-chip parallel vector memories are used, models for data pre-allocation and permutation in the scratch-pad memories of an ASIP can be decided and configured. Exposing the parallel memory accesses in the source code yields a memory access flow graph (MFG). The MFG is then combined with hardware information to match templates in the template library. When the MFG matches a template, a suitable permutation equation is obtained. From it, a permutation table containing the target addresses for data pre-allocation and permutation is created. This makes it possible to generate memory addresses for parallel memory accesses automatically.

A tool named Permutator, implemented in C++ combined with XML, was created to achieve this goal. Selecting a memory access coding template specifies the permutation formulas. The PVM address table can then be generated for data pre-allocation, so that efficient parallel memory access is possible.

The results show that Permutator hides the memory access complexities, so the programming cost is reduced. It works well when each algorithm, together with its related hardware information, corresponds to a template case, in which case extra memory cost is eliminated.
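
Conflict-free parallel addressing can be illustrated with a classic skewing scheme. The Python below is not Permutator or its template library. It assumes an N x N matrix spread over N memory banks, with element (row, col) stored in bank (row + col) mod N. Under that layout, rows and columns can each be fetched in one parallel access, while the main diagonal conflicts when N is even. All function names are hypothetical.

```python
def skewed_bank(row, col, n_banks):
    """Textbook skewed storage: element (row, col) lives in bank (row + col) mod N."""
    return (row + col) % n_banks

def is_conflict_free(elements, n_banks):
    """A parallel access is conflict-free if no two elements share a bank."""
    banks = [skewed_bank(r, c, n_banks) for r, c in elements]
    return len(set(banks)) == len(banks)

def permutation_table(elements, n_banks):
    """For each element of one parallel access, return (bank, offset). In this
    layout each bank holds exactly one element per row, so the offset is the row."""
    return [(skewed_bank(r, c, n_banks), r) for r, c in elements]

N = 8
row_3 = [(3, c) for c in range(N)]
col_5 = [(r, 5) for r in range(N)]
diagonal = [(k, k) for k in range(N)]
print(is_conflict_free(row_3, N), is_conflict_free(col_5, N), is_conflict_free(diagonal, N))
print(permutation_table(row_3, N))
```

Choosing an odd number of banks also makes the main diagonal conflict-free, because 2k mod N then takes distinct values for k = 0 ... N - 1.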

11

Ghanemi, Salim. "Non-numerical parallel algorithms for asynchronous parallel computer systems." Thesis, Loughborough University, 1987. https://dspace.lboro.ac.uk/2134/28016.

Abstract:

The work in this thesis mainly covers the design and analysis of many important non-numerical parallel algorithms that run on MIMD-type Parallel Computer Systems (PCSs), in particular the NEPTUNE and SEQUENT BALANCE 8000 PCSs available at Loughborough University of Technology.

12

Aswad, Mustafa K. H. "Architecture aware parallel programming in Glasgow parallel Haskell (GPH)." Thesis, Heriot-Watt University, 2012. http://hdl.handle.net/10399/2589.

Abstract:

General-purpose computing architectures are quickly becoming manycore and hierarchical, i.e. a core can communicate more quickly locally than globally. To be effective on such architectures, programming models must be aware of the communication hierarchy. This thesis investigates a programming model that shares responsibility for task placement, load balance, thread creation, and synchronisation between the application developer and the runtime system. The main contribution of this thesis is four new architecture-aware constructs for Glasgow parallel Haskell. These constructs exploit information about task size and aim to reduce communication for small tasks, preserve data locality, or distribute large units of work. We define a semantics for the constructs that specifies the sets of PEs that each construct identifies, and we check four properties of the semantics using QuickCheck. We report a preliminary investigation of architecture-aware programming models that abstract over the new constructs. In particular, we propose architecture-aware evaluation strategies and skeletons. We investigate three common paradigms, namely data parallelism, divide-and-conquer and nested parallelism, on hierarchical architectures with up to 224 cores. The results show that the architecture-aware programming model consistently delivers better speedup and scalability than existing constructs, together with a dramatic reduction in execution-time variability. We also present a comparison of functional multicore technologies. It reports some of the first ever multicore results for the Feedback Directed Implicit Parallelism (FDIP) language and for the semi-explicit parallelism languages GpH and Eden. The comparison reflects the growing maturity of the field by systematically evaluating four parallel Haskell implementations on a common multicore architecture. It contrasts the programming effort each language requires with the parallel performance delivered.
We investigate the minimum thread granularity required to achieve satisfactory performance for three implementations of parallel functional languages on a multicore platform. The results show that GHC-GUM requires a larger thread granularity than Eden and GHC-SMP, and that the required granularity rises as the number of cores rises.
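
Thread granularity is often controlled with a size threshold in divide-and-conquer code. The sketch below uses Python's `concurrent.futures` as a stand-in for GpH's `par` and evaluation strategies, so it mirrors only the structure. CPython threads do not speed up CPU-bound pure-Python work. The `threshold` parameter is an illustrative stand-in for the granularity the thesis measures, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split(xs, threshold):
    """Divide-and-conquer: halve the input until pieces are at most `threshold` long.
    The threshold sets the task granularity: larger values mean fewer, bigger tasks."""
    if len(xs) <= threshold:
        return [xs]
    mid = len(xs) // 2
    return split(xs[:mid], threshold) + split(xs[mid:], threshold)

def parallel_sum(xs, threshold, workers=4):
    """Sum each leaf as one task, then combine the partial sums sequentially."""
    leaves = split(xs, threshold)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, leaves))  # one task per leaf
    return sum(partials), len(leaves)

data = list(range(1000))
for threshold in (10, 100, 1000):
    total, tasks = parallel_sum(data, threshold)
    print(f"threshold={threshold:5d}  tasks={tasks:4d}  sum={total}")
```

Raising the threshold cuts the number of tasks, and with it the per-task creation and scheduling cost. The price is fewer opportunities to balance load across cores.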

13

Enomoto, Cristina. "Uma linguagem para especificação de fluxo de execução em aplicações paralelas." [s.n.], 2005. http://repositorio.unicamp.br/jspui/handle/REPOSIP/261813.

Abstract:

Advisor: Marco Aurelio Amaral Henriques
Master's dissertation, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação
Many existing grid and distributed computing systems only allow applications with a basic task execution flow, in which parallel tasks are distributed and their results are later collected. Other systems allow a dependence relationship among tasks, represented by a directed acyclic graph. Even with this model, however, many applications that could be parallelized cannot be executed. Examples include genetic algorithms and numerical methods that rely on some kind of iterative processing. This dissertation proposes a specification language for the execution flow of parallel applications. The language provides more flexible task flow control, supporting conditional branches and loops with controlled iterations. It is based on XML (eXtensible Markup Language) notation, which gives it important characteristics such as flexibility and simplicity. To evaluate these and other characteristics, the language was implemented on the JoiN parallel processing system. The language enables the creation and execution of new parallel applications whose task flows contain loops and/or conditional branches. It also proved simple to use and caused no noticeable overhead to the parallel system.
Master's degree
Computer Engineering
Master in Electrical Engineering
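
Loops and conditional branches are what take such a task flow beyond a DAG. The sketch below is a toy interpreter for a hypothetical XML notation loosely modelled on the description above. The element names (`workflow`, `task`, `parallel`, `loop`, `if`), the attributes, and the task handlers are assumptions for illustration. They are not the syntax of the proposed language or the JoiN API.

```python
import threading
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Hypothetical notation: <parallel> runs its children concurrently, <loop> repeats
# while a named condition holds (capped by max), <if> picks <then> or <else>.
SPEC = """
<workflow>
  <task name="init"/>
  <loop max="10" while="not_converged">
    <parallel>
      <task name="evaluate_a"/>
      <task name="evaluate_b"/>
    </parallel>
    <task name="select"/>
  </loop>
  <if test="converged">
    <then><task name="report"/></then>
    <else><task name="warn"/></else>
  </if>
</workflow>
"""

def run(node, tasks, conditions, state):
    if node.tag == "task":
        tasks[node.get("name")](state)
    elif node.tag == "parallel":
        with ThreadPoolExecutor() as pool:
            for future in [pool.submit(run, c, tasks, conditions, state) for c in node]:
                future.result()  # wait for every branch and re-raise its errors
    elif node.tag == "loop":
        limit, cond = int(node.get("max")), conditions[node.get("while")]
        iteration = 0
        while iteration < limit and cond(state):
            for child in node:
                run(child, tasks, conditions, state)
            iteration += 1
    elif node.tag == "if":
        branch = node.find("then" if conditions[node.get("test")](state) else "else")
        if branch is not None:
            for child in branch:
                run(child, tasks, conditions, state)
    else:  # <workflow> or any other container: run children in order
        for child in node:
            run(child, tasks, conditions, state)

lock = threading.Lock()
state = {"evals": 0, "iter": 0, "log": []}

def evaluate(s):
    with lock:  # the two evaluate tasks may run at the same time
        s["evals"] += 1

tasks = {
    "init": lambda s: s["log"].append("init"),
    "evaluate_a": evaluate,
    "evaluate_b": evaluate,
    "select": lambda s: s.update(iter=s["iter"] + 1),
    "report": lambda s: s["log"].append("report"),
    "warn": lambda s: s["log"].append("warn"),
}
conditions = {
    "not_converged": lambda s: s["iter"] < 3,
    "converged": lambda s: s["iter"] >= 3,
}
run(ET.fromstring(SPEC.strip()), tasks, conditions, state)
print(state["evals"], state["iter"], state["log"])
```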

14

Vera, Rodríguez Gonzalo. "R/parallel Parallel Computing for R in non‐dedicated environments." Doctoral thesis, Universitat Autònoma de Barcelona, 2010. http://hdl.handle.net/10803/121248.

Abstract:

Traditionally, parallel computing has been associated with special-purpose applications designed to run on complex computing clusters. These clusters are specifically set up with a software stack of dedicated libraries and advanced administration tools to manage complex IT infrastructures. These High Performance Computing (HPC) solutions are the most efficient in terms of performance and scalability. However, they impose technical and practical barriers on most ordinary scientists, who have limited IT knowledge, time and resources and cannot adopt classical HPC solutions without considerable effort. Moreover, two important technological advances are increasing the need for parallel computing. First, in bioinformatics, and similarly in other experimental sciences, new high-throughput screening devices generate huge amounts of data in a very short time. These data must be analysed in equally short periods to avoid delaying experimental work. Second, processor design has changed: to increase raw performance, the current strategy is to increase the number of processing units per chip, so parallel applications are required to use the new processing capacity. In both cases, users may need to update their current sequential applications and computing resources to obtain the processing capacity their particular needs require. Parallel computing is becoming a natural way to obtain increased performance and is required by new computer systems. Solutions adapted to mainstream users should therefore be developed for seamless adoption.
To enable the adoption of parallel computing, new methods and technologies are required to remove or mitigate the barriers that currently prevent many users from evolving their sequential running environments. This work takes as a practical case a scenario that especially suffers from these problems: bioinformaticians analysing molecular data with methods written in the R language. With large datasets, they often have to wait days or weeks for their data to be processed. Otherwise they must manually split their data, look for available computers to run these subsets, and collect the scattered results. Most of these R applications are based on parallel loops. A loop is called a parallel loop if there is no data dependency among its iterations. Its iterations can then be processed in any order or even simultaneously, which makes them amenable to parallelization. Parallel loops are found in a large number of scientific applications. Previous contributions deal with partial aspects of the problems these users face, such as providing access to additional computing resources or enabling the coding of parallel problems. However, none provides a complete solution that does not presume advanced users with access to traditional HPC platforms. Our contribution consists of the design and evaluation of methods that enable easy parallelization of R applications based on parallel loops. These methods use non-dedicated environments as the computing platform and target users without experience in parallel computing or system management. As a proof of concept, and to evaluate the feasibility of our proposal, an extension of R called R/parallel has been developed and tested in real environments with real bioinformatics problems.
The results show that a software layer can enable users without previous knowledge or skills to adapt their applications with minimal effort and perform concurrent computations on the available computers. This holds even with little information about the running environment and a high degree of uncertainty about the quantity and quality of the available resources. In addition to proving the feasibility of our proposal, we contribute a new self-scheduling scheme suitable for parallel loops in dynamic environments. Its results show that it can outperform previous contributions in best-effort environments. The main conclusion is that mechanisms can be provided that let users without specialist knowledge, and with limited time, conveniently take advantage of parallel computing technologies. This holds even with limited information about the environment and the technologies involved. Such mechanisms close the gap between classical HPC solutions and mainstream users of common applications, in our case applications based on parallel loops in R.
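
The parallel loops and self-scheduling mentioned above can be sketched briefly. The abstract does not detail the thesis's own self-scheduling scheme. The Python below therefore uses the classic guided self-scheduling rule, chunk = ceil(remaining / P), as a named stand-in, with threads standing in for the non-dedicated computers. `gss_chunks` and `parallel_loop` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def gss_chunks(n_iterations, n_workers):
    """Guided self-scheduling: each request takes ceil(remaining / P) iterations,
    so chunks start large (low scheduling overhead) and shrink (good load balance)."""
    chunks, start = [], 0
    while start < n_iterations:
        size = -(-(n_iterations - start) // n_workers)  # integer ceiling division
        chunks.append(range(start, start + size))
        start += size
    return chunks

def parallel_loop(body, n_iterations, n_workers=4):
    """Run a parallel loop (no dependences between iterations). Each idle worker
    pulls the next chunk from a shared queue, which is what makes it self-scheduling."""
    chunks = iter(gss_chunks(n_iterations, n_workers))
    lock = threading.Lock()
    results = [None] * n_iterations

    def worker():
        while True:
            with lock:
                chunk = next(chunks, None)
            if chunk is None:
                return
            for i in chunk:
                results[i] = body(i)  # iterations are independent, any order is fine

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for future in [pool.submit(worker) for _ in range(n_workers)]:
            future.result()
    return results

print([len(c) for c in gss_chunks(100, 4)])
print(parallel_loop(lambda i: i * i, 10))
```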

15

Normann, Per. "Parallel graph coloring : Parallel graph coloring on multi-core CPUs." Thesis, Uppsala universitet, Avdelningen för beräkningsvetenskap, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-227656.

Abstract:

In recent times, an evident trend in hardware has been the move to multi-core CPUs. As a result, an increasing number of sequential algorithms are parallelized to fit these new multi-core environments. The greedy Multi-Coloring algorithm is a strictly sequential algorithm used in a wide range of applications. The application in focus is decomposition by graph coloring for preconditioning techniques suitable for iterative solvers like the and methods. To perform all phases of these iterative solvers in parallel, the graph analysis phase needs to be parallelized. Although many attempts have been made to parallelize graph coloring, none of them has made the greedy Multi-Coloring algorithm obsolete. In this work, techniques for parallel graph coloring are proposed and studied. Quantitative results, measured as the number of colors, confirm that the best new algorithm, the Normann algorithm, performs on the same level as the greedy Multi-Coloring algorithm. Furthermore, in multi-core environments the parallel Normann algorithm outperforms the classical greedy Multi-Coloring algorithm for all large test matrices. With the features of the Normann algorithm quantified and presented in this work, it is now possible to perform all phases of iterative solvers like and methods in parallel.
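
The abstract does not specify the Normann algorithm, so the sketch below illustrates two related ideas instead. The first is the baseline sequential greedy (first-fit) coloring. The second is a generic speculative, round-based scheme in the spirit of Gebremedhin and Manne, which is one well-known way to parallelize coloring and not necessarily the Normann algorithm. In that scheme, vertices are colored optimistically as if in parallel, and conflicts are repaired in later rounds. The "parallel" phase is simulated here by reading from a snapshot, and all names are hypothetical.

```python
def smallest_free_color(v, adj, colors):
    """Smallest color not used by any already-colored neighbour of v."""
    used = {colors[u] for u in adj[v] if u in colors}
    c = 0
    while c in used:
        c += 1
    return c

def greedy_coloring(adj):
    """Sequential greedy (first-fit) coloring in vertex order."""
    colors = {}
    for v in sorted(adj):
        colors[v] = smallest_free_color(v, adj, colors)
    return colors

def speculative_coloring(adj):
    """All vertices in the worklist pick a color from the same snapshot, as if they
    ran concurrently. Conflicting edges are then detected, and the higher-numbered
    endpoint is recolored in the next round. The lowest vertex never loses, so the
    worklist shrinks every round."""
    colors, worklist = {}, sorted(adj)
    while worklist:
        snapshot = dict(colors)
        for v in worklist:  # independent work: could be split across cores
            colors[v] = smallest_free_color(v, adj, snapshot)
        losers = {max(u, v) for v in worklist for u in adj[v] if colors[u] == colors[v]}
        for v in losers:
            del colors[v]
        worklist = sorted(losers)
    return colors

def is_proper(adj, colors):
    return all(colors[u] != colors[v] for v in adj for u in adj[v])

cycle5 = {v: [(v - 1) % 5, (v + 1) % 5] for v in range(5)}
print(greedy_coloring(cycle5), speculative_coloring(cycle5))
```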

16

Goswami, Dhrubajyoti. "Parallel architectural skeletons, re-usable building blocks for parallel applications." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2001. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp05/NQ65241.pdf.

17

Rau-Chaplin, Andrew. "On parallel data structures and parallel geometric applications for multicomputers." Dissertation, Computer Science, Carleton University, Ottawa, 1992.

18

Álvarez, Mesa Mauricio. "Parallel video decoding." Doctoral thesis, Universitat Politècnica de Catalunya, 2011. http://hdl.handle.net/10803/80382.

Abstract:

Digital video is a popular technology used in many different applications. The quality of video, expressed in its spatial and temporal resolution, has increased steadily in recent years. To reduce the bitrate required for its storage and transmission, a new generation of video encoders and decoders (codecs) has been developed. The latest video codec standard, known as H.264/AVC, includes sophisticated compression tools that require more computing resources than any previous video codec. The combination of high-quality video and the advanced compression tools in H.264/AVC has significantly increased the computational requirements of video decoding applications. The main objective of this thesis is to provide the performance required for real-time decoding of high-quality video using programmable architectures. Our solution has been the simultaneous exploitation of multiple levels of parallelism. On the one hand, video decoders have been modified to extract as much parallelism as possible. On the other hand, general-purpose architectures have been enhanced to exploit the type of parallelism present in video codec applications.
First, we analyzed the scalability of two Single Instruction Multiple Data (SIMD) extensions: a one-dimensional (1D) extension and a two-dimensional (2D) matrix extension. We showed that scaling the 2D extension yields higher performance with lower complexity than scaling the 1D extension. Next, we characterized H.264/AVC decoding for high-definition (HD) applications and identified its main kernels. Because no suitable benchmark existed for HD video decoding, we developed our own, called HD-VideoBench, which includes complete video encoding and decoding applications together with a set of HD video sequences. We then optimized the most important kernels of the H.264/AVC decoder using SIMD instructions. However, the results did not reach the maximum possible performance because of the negative effect of data misalignment in memory. As a solution, we evaluated the hardware and software support needed to perform unaligned accesses; this support produced significant performance improvements in the application.
Separately, we investigated how to extract task-level parallelism and found that none of the existing mechanisms could scale to massively parallel systems. As an alternative, we developed a new algorithm that was able to find thousands of independent tasks by exploiting macroblock-level parallelism. We then implemented a parallel version of the H.264 decoder on a distributed shared memory (DSM) machine. However, this implementation did not reach the maximum possible performance because of the negative impact of synchronization operations and of the entropy-decoding kernel. To remove these bottlenecks, we evaluated picture-level parallelization of the entropy-decoding stage combined with macroblock-level parallelization of the remaining kernels. The overhead of synchronization operations was almost completely eliminated by using hardware-accelerated operations. With all of these improvements, real-time decoding of high-definition, high-frame-rate video became possible. The overall result is a scalable solution able to use the growing number of processors in multicore architectures.
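As an illustration of what macroblock-level parallelism looks like inside a single frame, the sketch below computes the classic intra-frame "2D-wave" schedule. It is not the new algorithm developed in the thesis; it assumes, as a simplification, that each macroblock depends only on its left, top-left, top and top-right neighbours in the same frame, and the function name `wavefront_schedule` is hypothetical.

```python
def wavefront_schedule(mb_cols: int, mb_rows: int) -> dict[int, list[tuple[int, int]]]:
    """Group macroblocks (x, y) by the earliest wavefront step at which they can be decoded.

    Assumes each macroblock depends only on its left, top-left, top and
    top-right neighbours in the same frame (the classic 2D-wave model).
    """
    steps: dict[int, list[tuple[int, int]]] = {}
    for y in range(mb_rows):
        for x in range(mb_cols):
            # The latest-finishing dependencies are left (x - 1, y) and
            # top-right (x + 1, y - 1), both at step x + 2y - 1, so (x, y)
            # can start at step x + 2y.
            steps.setdefault(x + 2 * y, []).append((x, y))
    return steps


# A 1920x1088 frame holds 120 x 68 macroblocks of 16x16 pixels.
schedule = wavefront_schedule(120, 68)
print(len(schedule))                               # 254 wavefront steps
print(max(len(mbs) for mbs in schedule.values()))  # 60 independent macroblocks at peak
```

Under these assumptions a 1080p frame decodes in 254 steps instead of 8160 sequential macroblocks (an ideal speedup of about 32), with at most 60 macroblocks in flight at once.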

19

Poggiali, Dario. "Parallel geometry processing." Zürich : ETH, Eidgenössische Technische Hochschule Zürich, cgl Computer Graphics Laboratory, 2008. http://e-collection.ethbib.ethz.ch/show?type=dipl&nr=393.

20

Gamble, James Graham. "Explicit parallel programming." Thesis, 1990. http://scholar.lib.vt.edu/theses/available/etd-06082009-171019/.

21

Holt, C. M. "Quasi-parallel processing." Thesis, University of Oxford, 1986. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.375244.

22

Filos, Jason. "Parallel Transmission MRI." Thesis, Imperial College London, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.516789.

23

練偉森 and Wai-sum Lin. "Adaptive parallel rendering." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1999. http://hub.hku.hk/bib/B31221415.

24

Stahl, Frederic Theodor. "Parallel rule induction." Thesis, University of Portsmouth, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.508872.

Abstract:

Classification rule induction on large datasets is a major challenge in data mining, in a world where massive amounts of data are recorded. There are two main approaches to classification rule induction: the 'divide and conquer' approach and the 'separate and conquer' approach. Although both approaches deliver comparable classification accuracy, they differ in rule representation and, in certain circumstances, in the quality of the rules. The 'divide and conquer' approach yields the intuitive representation of classification rules as a tree, which is easy for humans to assimilate. However, modular rules induced by the 'separate and conquer' approach generally perform better in environments where the classifier's training data are noisy or contain clashes. The term 'modular rules' denotes any set of rules describing some domain of interest; such rules will generally not fit together naturally in a decision tree. Both approaches are challenged by increasingly large volumes of data. There have been several attempts to scale up the 'divide and conquer' approach, but there is very little work on scaling up the 'separate and conquer' approach. One general approach is to use supercomputers with faster hardware to process these huge amounts of data, yet modest-sized organisations may not be able to afford such hardware. However, most organisations have local workstations that they use for many applications, such as word processing or spreadsheets. These workstations are usually connected in a local network, used mainly during normal working hours, and idle overnight and at weekends. During these idle times, the networked workstations could be used for data mining applications on large datasets. This research focuses on a cheap solution for modest-sized organisations that cannot afford fast supercomputers.
For this reason, this work aims to utilise the computational power and memory of a network of workstations. It presents a novel framework for scaling up modular classification rule induction, based on a distributed blackboard architecture. The framework, called PMCRI (Parallel Modular Classification Rule Inducer), provides an underlying communication infrastructure for parallelising a whole family of modular classification rule induction algorithms: the Prism family. Experimental results show good scale-up behaviour on various datasets and thus confirm the success of PMCRI.

25

Minbashian, Behnam. "Intelligent parallel controllers." Thesis, University of Reading, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.331933.

26

Köse, Cemal. "Parallel volume visualisation." Thesis, University of Bristol, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.361100.

27

Jackson, Robert Owen. "Heterogeneous parallel computing." Thesis, University of Birmingham, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.366162.

28

Gomes, Luís Manuel dos Santos. "Parallel texts alignment." Master's thesis, FCT - UNL, 2009. http://hdl.handle.net/10362/2051.

Abstract:

Work presented within the Master's programme in Computer Engineering (Mestrado em Engenharia Informática), in partial fulfilment of the requirements for the degree of Master in Computer Engineering.
Alignment of parallel texts (texts that are translations of each other) is a required step for many applications that use parallel texts, including statistical machine translation, automatic extraction of translation equivalents, automatic creation of concordances, etc. This dissertation presents a new methodology for parallel text alignment that departs from previous work in several ways. One important departure is a shift of goals concerning the use of lexicons for obtaining correspondences between the texts. Previous methods try to infer a bilingual lexicon as part of the alignment process and use it to obtain correspondences between the texts. Some of those methods can use external lexicons to complement the inferred one, but they tend to treat them as secondary. This dissertation presents several arguments supporting the thesis that lexicon inference should not be embedded in the alignment process. The method described complies with this statement and relies exclusively on externally managed lexicons to obtain correspondences. Moreover, the algorithms presented can handle very large lexicons containing terms of arbitrary length. Besides the exclusive use of external lexicons, this dissertation presents a new method for obtaining correspondences between translation equivalents found in the texts. It uses a decision criterion based on features that have been overlooked by prior work. The proposed method is iterative and refines the alignment at each iteration: the alignment obtained in one iteration guides the search for new correspondences in the next, which in turn are used to compute a finer alignment. This iterative scheme allows the method to correct correspondence errors from previous iterations in the face of new information.

29

Hulot, Carlos. "Parallel tracking systems." Thesis, University of Southampton, 1995. https://eprints.soton.ac.uk/264882/.

Abstract:

Tracking systems provide an important analysis technique that can be used in many different areas of science. Tracking can be defined as the estimation of the dynamic state of moving objects based on ‘inaccurate’ measurements taken by sensors. The area encompasses a wide range of subjects, although its two most essential elements are estimation and data association. Tracking systems apply to relatively simple as well as more complex applications, including air traffic control, ocean surveillance and control, sonar tracking, military surveillance, missile guidance, physics particle experiments, global positioning systems and aerospace. This thesis describes an investigation into state-of-the-art tracking algorithms and distributed-memory architectures (Multiple Instruction Multiple Data, “MIMD”, systems) for parallel processing of tracking systems. The first algorithm investigated is the Interacting Multiple Model (IMM), which has recently been shown to be one of the most cost-effective in its class. IMM scalability is investigated for tracking a single target in a clean environment. Next, the IMM is coupled with a well-established Bayesian data association technique known as Probabilistic Data Association (PDA) to permit tracking a target in different clutter environments (IMMPDA). As in the previous case, IMMPDA scalability is investigated for tracking a single target in different clutter environments. To evaluate the effectiveness of these new parallel techniques, standard languages and parallel software systems (providing message-passing facilities) have been used. The main objective is to demonstrate how these complex algorithms can, in general, benefit from being implemented on parallel architectures.

30

Hering, Klaus. "Parallel Cycle Simulation." Universität Leipzig, 1996. https://ul.qucosa.de/id/qucosa%3A34504.

Abstract:

Parallelization of logic simulation at the register-transfer and gate levels is a promising way to accelerate the extremely time-consuming simulation of whole processor structures. This report considers parallel simulation realized by means of the functional simulator parallel-TEXSIM, which is based on the clock-cycle algorithm. In such a simulation, several simulator instances co-operate over a loosely coupled processor system, each instance simulating part of a synchronous hardware design. Parallel simulation therefore requires the hardware model to be partitioned beforehand, and this partitioning essentially determines the efficiency of the subsequent simulation. A framework of formal concepts for an abstract description of parallel cycle simulation is developed, providing the basis for evaluating partitions within partitioning algorithms. Starting from the definition of a Structural Hardware Model as a special bipartite graph, Sequential Cycle Simulation is introduced as a sequence of actions. Following a cone-based partitioning approach, a Parallel Structural Hardware Model is defined as a set of Structural Hardware Models. Furthermore, a model of parallel computation called Communicating Processors, closely related to the well-known LogP model, is introduced. Together with the preceding concepts, it forms the basis for describing Parallel Cycle Simulation as a sequence of action sets.

31

Zhang, Hua (b. 1954). "Practical Parallel Processing." Thesis, University of North Texas, 1996. https://digital.library.unt.edu/ark:/67531/metadc278769/.

Abstract:

The physical limitations of uniprocessors and the real-time requirements of numerous practical applications have made parallel processing an essential technology in military, industrial and scientific research. In this dissertation, we investigate the parallelization of three practical applications using three parallel machine models. The applications are as follows. Finitely inductive (FI) sequence processing is a pattern recognition technique used in many fields. We first propose four parallel FI algorithms on the EREW PRAM. The time complexity of parallel factoring and following by bucket packing is $O(sk^2 n/p)$, and these algorithms are optimal under some conditions. Parallel factoring and following by hashing requires $O(sk^2 n/p)$ time when uniform hash functions are used and $\log p \le kn/p$ and $pm \approx n$. Their speedup is proportional to the number of processors used. In these results, $s$ is the number of levels, $k$ is the size of the antecedents, $n$ is the length of the input sequence and $p$ is the number of processors. We also describe algorithms for raster/vector conversion, based on the scan model, that handle block-like connected components of arbitrary geometrical shapes with multi-level nested doughnuts for the IES (image exploitation system). Both the parallel raster-to-vector algorithm and the parallel vector-to-raster algorithm require $O(\log n^2)$ or $O(\log^2 n^2)$ time (depending on the sorting algorithm used) for images of size $n^2$ using $p = n^2$ processors. The DWT (discrete wavelet transform) is useful not only in data compression but also in signal processing, image processing and graphics, so it is important to investigate efficient parallelizations of wavelet transforms. The time complexity of the parallel forward DWT on the parallel virtual machine with a linear processor organization is $O((s_0+s_1)mn/p)$, where $s_0$ and $s_1$ are the lengths of the filters and $p$ is the number of processors used.
The time complexity of the inverse DWT is also $O((s_0+s_1)mn/p)$. If the processors are organized as a 2D array of $P_{row} P_{col}$ processors, both the interleaved parallel DWT and IDWT have time complexity $O((s_0+s_1)mn/(P_{row} P_{col}))$. We have parallelized three applications and achieved optimal or best-possible performance for each of them on each of the chosen machine models. Future research will involve continued examination of parallel architectures for implementing practical problems.
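The $O((s_0+s_1)mn/p)$ bound comes from giving each of the $p$ processors a contiguous share of the signal and applying both filters to it. A minimal one-dimensional sketch of that decomposition, assuming the Haar wavelet ($s_0 = s_1 = 2$) and using a thread pool as a stand-in for processors, is below; the function names are hypothetical, and in CPython these threads illustrate the decomposition rather than deliver real speedup.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def haar_block(signal: list[float], start: int, stop: int) -> tuple[list[float], list[float]]:
    """One level of the orthonormal Haar DWT on signal[start:stop] (start and stop even)."""
    approx, detail = [], []
    for i in range(start, stop, 2):
        a, b = signal[i], signal[i + 1]
        approx.append((a + b) / math.sqrt(2))  # low-pass filter, s0 = 2 taps
        detail.append((a - b) / math.sqrt(2))  # high-pass filter, s1 = 2 taps
    return approx, detail


def parallel_haar(signal: list[float], p: int) -> tuple[list[float], list[float]]:
    """Split the sample pairs into p contiguous blocks, one per worker, then concatenate."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = len(signal) // 2
    # Block boundaries fall on even indices so that no sample pair is split.
    bounds = [2 * (k * pairs // p) for k in range(p + 1)]
    with ThreadPoolExecutor(max_workers=p) as pool:
        parts = list(pool.map(lambda k: haar_block(signal, bounds[k], bounds[k + 1]), range(p)))
    approx = [c for a, _ in parts for c in a]
    detail = [c for _, d in parts for c in d]
    return approx, detail


approx, detail = parallel_haar([float(v) for v in range(1, 17)], p=3)
```

Because the Haar filters never overlap neighbouring pairs, the blocks need no communication; with longer filters ($s_0, s_1 > 2$) each worker would also have to read a few samples beyond its block boundaries.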

32

Handler, Caroline. "Parallel process placement." Thesis, Rhodes University, 1989. http://hdl.handle.net/10962/d1002033.

Abstract:

This thesis investigates methods for automatically allocating processes to the available processors in a given network configuration. The research covers several algorithms for optimal process allocation, among them an algorithm using a branch-and-bound technique, an algorithm based on graph theory, and a heuristic algorithm involving cluster analysis. These have been implemented and tested together with the gathering of performance statistics during program execution, which are used to improve subsequent allocations. The system has been implemented on a network of loosely coupled microcomputers using multi-port serial communication links to simulate a transputer network. The concurrent programming language occam has been implemented, with its explicit process allocation constructs replaced by an automatic placement algorithm. This allows the source code to be completely separated from hardware considerations.

33

Bhalerao, Rohit Dinesh. "Parallel XML parsing." Diss., 2007. Online access via UMI.

34

Pan, Yinfei. "Parallel XML parsing." Diss., 2009. Online access via UMI.

35

Lin, Wai-sum. "Adaptive parallel rendering." Hong Kong : University of Hong Kong, 1999. http://sunzi.lib.hku.hk/hkuto/record.jsp?B20868236.

36

Keller, Kody. "Parallel and Allegory." BYU ScholarsArchive, 2014. https://scholarsarchive.byu.edu/etd/4200.

Abstract:

Parallel and Allegory is a series of four pieces that look more deeply into specific Christian beliefs, most directly those dealing with specific parallels and allegorical relationships. Specific symbols such as nails, hammers, wood, trees, people, fruit, a cup, a knife, a rope and a stone were the focus of the pieces in the exhibition. Four combinations of these symbols were created to prompt dialogue and introspection.

37

Gómez Sánchez, Pilar. "Analyzing the parallel applications’ I/O behavior impact on HPC systems." Doctoral thesis, Universitat Autònoma de Barcelona, 2018. http://hdl.handle.net/10803/586177.

Abstract:

The volume of data generated by scientific applications keeps growing, and with it the pressure on the I/O system of HPC systems. For this reason, an I/O behavior model is proposed for scientific MPI (Message Passing Interface) parallel applications, with the goal of analyzing the applications’ impact on the I/O system. Analyzing MPI parallel applications at the POSIX-IO level makes it possible to observe how the application’s data are handled at that level. This research work presents the definition of the I/O behavior model at the POSIX-IO level (PIOM-PX), the methodology applied to extract this model, and the PIOM-PX-Trace-Tool. Because PIOM-PX is based on the concept of an I/O phase, it can identify the most significant phases: those that have more influence than others on the I/O system and could cause a bottleneck or poor performance. Analysis based on I/O phases makes it possible to identify, delimit, and try to reduce each phase’s impact on the I/O system. PIOM-PX is part of the proposed model PIOM, which integrates the I/O behavior model at the POSIX-IO level (PIOM-PX) and the I/O behavior model at the MPI-IO level (PIOM-MP, formerly known as PAS2P-IO). The model provides the information needed to replicate an application’s behavior on different systems using programmable synthetic programs. PIOM-PX-Trace-Tool intercepts the POSIX-IO operations used during the application’s execution. The experiments were carried out on several standard HPC systems and on a cloud platform, where the usefulness of the proposed model, PIOM, was verified.

38

Mikayelyan, Parandzem. "Parallel Realities: How to handle parallel-proceedings in investor-state disputes?" Thesis, Uppsala universitet, Juridiska institutionen, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-412153.

39

Saarimaki, Anton. "Bulk synchronous parallel: practical experience with a model for parallel computing." Dissertation, Computer Science, Carleton University, Ottawa, 1995.

40

Varisteas, Georgios. "Effective cooperative scheduling of task-parallel applications on multiprogrammed parallel architectures." Doctoral thesis, KTH, Programvaruteknik och Datorsystem, SCS, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-175461.

Abstract:

Emerging architecture designs include tens of processing cores on a single chip die, and the number of cores is expected to reach the hundreds within a few years. However, most common parallel workloads cannot fully utilize such systems. They expose fluctuating parallelism and do not scale indefinitely, as there is usually a point beyond which synchronization costs outweigh the gains of parallelism. Together, these issues suggest that large-scale systems will either be multiprogrammed or have their unneeded resources powered off. Multiprogramming leads to hardware resource contention and, as a result, to application performance degradation, even when there are enough resources, due to negative sharing effects and increased bus traffic. This degradation is most often quite unbalanced between co-runners, as some applications dominate the hardware over others. Current operating systems blindly give applications access to as many resources as they ask for. This leads to over-committing the system with too many threads, memory contention and increased bus traffic. Because an application has no insight into system-wide resource demands, most parallel workloads create as many threads as there are available cores. If every co-running application does the same, the system ends up with $N$ times as many threads as cores, where $N$ is the number of co-running applications. Threads then need to time-share cores, so the continuous context switching and cache-line evictions generate considerable overhead. This thesis proposes a novel solution across all software layers that achieves throughput optimization and uniform performance degradation of co-running applications. Through a novel, fully automated approach (DVS and Palirria), task-parallel applications can accurately quantify their available parallelism online, generating a meaningful metric as parallelism feedback to the operating system.
A second component in the operating system scheduler (Pond) uses such feedback from all co-runners to effectively partition the available resources. The proposed two-level scheduling scheme ultimately makes each co-runner degrade its performance by the same factor, relative to how it would execute with unrestricted, isolated access to the same hardware. We call this fair scheduling, departing from the traditional notion of equal opportunity, which causes uneven degradation: some experiments show at least one application degrading its performance 10 times less than its co-runners.
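The over-commitment argument is simple arithmetic: with $N$ co-runners each spawning one thread per core on a $C$-core machine, $N \cdot C$ threads compete for $C$ cores. The sketch below shows one simple way an allocator could instead partition cores using per-application parallelism feedback. It is a hypothetical policy (proportional shares with largest-remainder rounding), not Pond's actual algorithm, and it assumes each application reports its parallelism as a single integer.

```python
def partition_cores(demands: dict[str, int], cores: int) -> dict[str, int]:
    """Split `cores` among co-runners in proportion to their reported parallelism.

    Hypothetical policy: proportional shares with largest-remainder rounding,
    never giving an application more cores than it asked for.
    """
    total = sum(demands.values())
    if total <= cores:
        return dict(demands)  # no contention: every application gets its full demand
    # Integer arithmetic avoids floating-point rounding surprises.
    alloc = {app: d * cores // total for app, d in demands.items()}
    leftover = cores - sum(alloc.values())
    by_remainder = sorted(demands, key=lambda app: demands[app] * cores % total, reverse=True)
    for app in by_remainder[:leftover]:
        alloc[app] += 1  # hand the remaining cores to the largest fractional shares
    return alloc


print(partition_cores({"a": 12, "b": 4, "c": 2}, cores=8))  # {'a': 5, 'b': 2, 'c': 1}
```

Each co-runner receives at most as many cores as its reported parallelism, so the machine is never oversubscribed by this layer; whether the resulting slowdowns are equal, as the thesis's fairness goal requires, depends on how well the feedback metric predicts each application's scaling.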

41

Wottrich, Rodolfo Guilherme (b. 1990). "Loop parallelization in the cloud using OpenMP and MapReduce." Universidade Estadual de Campinas, 2014. http://repositorio.unicamp.br/jspui/handle/REPOSIP/275500.

Abstract:

Orientadores: Guido Costa Souza de Araújo, Rodolfo Jardim de Azevedo
Dissertação (mestrado) - Universidade Estadual de Campinas, Instituto de Computação
Made available in DSpace on 2018-08-24T12:44:05Z (GMT). No. of bitstreams: 1 Wottrich_RodolfoGuilherme_M.pdf: 2132128 bytes, checksum: b8ac1197909b6cdaf96b95d6097649f3 (MD5) Previous issue date: 2014
Resumo: A busca por paralelismo sempre foi um importante objetivo no projeto de sistemas computacionais, conduzida principalmente pelo constante interesse na redução de tempos de execução de aplicações. Programação paralela é uma área de pesquisa ativa, na qual o interesse tem crescido devido à emergência de arquiteturas multicore. Por outro lado, aproveitar as grandes capacidades de computação e armazenamento da nuvem e suas características desejáveis de flexibilidade e escalabilidade oferece várias oportunidades interessantes para abordar problemas de pesquisa relevantes em computação científica. Infelizmente, em muitos casos a implementação de aplicações na nuvem demanda conhecimento específico de interfaces de programação paralela e APIs, o que pode se tornar um fardo na programação de aplicações complexas. Para superar tais limitações, neste trabalho propomos OpenMR, um modelo de execução baseado na sintaxe e nos princípios da API OpenMP que facilita a tarefa de programar sistemas distribuídos (isto é, clusters locais ou a nuvem remota). Especificamente, este trabalho aborda o problema de executar a paralelização de laços, usando OpenMR, em um ambiente distribuído, através do mapeamento de iterações do laço para nós MapReduce. Assim, a interface de programação para a nuvem se torna a própria linguagem, livrando o desenvolvedor da tarefa de se preocupar com detalhes da distribuição de cargas de trabalho e dados. Para avaliar a validade da proposta, modificamos benchmarks da suite SPEC OMP2012 para se encaixarem no modelo proposto, desenvolvemos outros toy benchmarks que são I/O-bound e executamo-os em duas configurações: (a) um cluster de computadores disponível localmente através de uma LAN padrão; e (b) clusters disponíveis remotamente através dos serviços Amazon AWS. Comparamos os resultados com a execução utilizando OpenMP em uma arquitetura SMP e mostramos que a técnica de paralelização proposta é factível e demonstra boa escalabilidade
Abstract: The pursuit of parallelism has always been an important goal in the design of computer systems, driven mainly by the constant interest in reducing program execution time. Parallel programming is an active research area, which has grown in interest due to the emergence of multicore architectures. On the other hand, harnessing the large computing and storage capabilities of the cloud and its desirable flexibility and scaling features offers a number of interesting opportunities to address some relevant research problems in scientific computing. Unfortunately, in many cases the implementation of applications on the cloud demands specific knowledge of parallel programming interfaces and APIs, which may become a burden when programming complex applications. To overcome such limitations, in this work we propose OpenMR, an execution model based on the syntax and principles of the OpenMP API which eases the task of programming distributed systems (i.e. local clusters or remote cloud). Specifically, this work addresses the problem of performing loop parallelization, using OpenMR, in a distributed environment, through the mapping of loop iterations to MapReduce nodes. By doing so, the cloud programming interface becomes the programming language itself, freeing the developer from the task of worrying about the details of distributing workload and data. To assess the validity of the proposal, we modified benchmarks from the SPEC OMP2012 suite to fit the proposed model, developed other I/O-bound toy benchmarks and executed them in two settings: (a) a computer cluster locally available through a standard LAN; and (b) clusters remotely available through the Amazon AWS services. We compare the results to the execution using OpenMP in an SMP architecture and show that the proposed parallelization technique is feasible and demonstrates good scalability
Master's degree
Computer Science
Master in Computer Science
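As a rough illustration of what "mapping loop iterations to MapReduce nodes" can mean, the following Python sketch splits a loop's iteration space into contiguous chunks (similar to an OpenMP static schedule), runs each chunk as a "map" task, and combines the partial results in a "reduce" step. It is a minimal sketch under stated assumptions, not OpenMR's actual implementation: the names `chunk_ranges`, `loop_body`, `map_chunk` and `parallel_for_sum` are hypothetical, and a local thread pool stands in for remote cluster or cloud nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(n, num_nodes):
    """Split the iteration space [0, n) into num_nodes contiguous chunks,
    similar to an OpenMP static schedule."""
    base, extra = divmod(n, num_nodes)
    ranges, start = [], 0
    for k in range(num_nodes):
        size = base + (1 if k < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def loop_body(i):
    # Hypothetical loop body: the work performed by one iteration.
    return i * i


def map_chunk(bounds):
    # "Map" task: execute the loop body over one chunk and emit a partial sum.
    lo, hi = bounds
    return sum(loop_body(i) for i in range(lo, hi))


def parallel_for_sum(n, num_nodes=4):
    """Roughly what `#pragma omp parallel for reduction(+:total)` expresses,
    re-expressed as map tasks plus a reduce step. A local thread pool stands
    in for remote MapReduce nodes (an assumption of this sketch)."""
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(map_chunk, chunk_ranges(n, num_nodes)))
    # "Reduce" step: combine the partial results from every chunk.
    return sum(partials)


if __name__ == "__main__":
    print(parallel_for_sum(1000))  # prints 332833500
```

The programmer only writes the loop body; chunking, dispatch and the final combination are handled by the runtime, which is the kind of burden the abstract says OpenMR removes from the developer.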

42

Barrera-Gonzalez, Claudia Patricia. "Variable swing optimal parallel links: minimal power, maximal density for parallel links." Access to citation, abstract and download form provided by ProQuest Information and Learning Company; downloadable PDF file, 120 p, 2009. http://proquest.umi.com/pqdweb?did=1818417501&sid=11&Fmt=2&clientId=8331&RQT=309&VName=PQD.

43

Jin, Xiaoming. "A practical realization of parallel disks for a distributed parallel computing system." [Gainesville, Fla.] : University of Florida, 2000. http://etd.fcla.edu/etd/uf/2000/ane5954/master.PDF.

Abstract:

Thesis (M.S.)--University of Florida, 2000.
Title from first page of PDF file. Document formatted into pages; contains ix, 41 p.; also contains graphics. Vita. Includes bibliographical references (p. 39-40).

44

Pinto, Vinícius Garcia. "Escalonamento por roubo de tarefas em sistemas Multi-CPU e Multi-GPU." Biblioteca Digital de Teses e Dissertações da UFRGS, 2013. http://hdl.handle.net/10183/71270.

Abstract:

In recent years, one of the alternatives adopted to increase the performance of high-performance computing systems has been the use of hybrid architectures. These architectures consist of multicore processors and specialized coprocessors, such as GPUs, which act as accelerators for some types of operations. However, current parallel programming models and tools are not well suited to hybrid scenarios and yield poorly portable applications. Task parallelism, considered a generic, high-level programming paradigm, can be adopted in this scenario, but it requires dynamic scheduling algorithms such as work stealing. In this context, this work presents a middleware (WORMS) that supports task parallelism with work-stealing scheduling on hybrid multi-CPU and multi-GPU systems. The middleware allows each task to have implementations for both CPUs and GPUs, deciding at runtime which implementation to run according to the available hardware resources. The results obtained with WORMS show that, for some applications, it is possible to outperform both reference tools for CPU execution and reference tools for GPU execution.
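To make the work-stealing idea concrete, the following Python sketch simulates it sequentially, in rounds: each worker owns a double-ended queue, takes its own newest task from one end, and, when idle, steals the oldest task from a randomly chosen busy victim. Each task carries one implementation per device kind, and the worker picks the one matching its device at run time, loosely mirroring the CPU/GPU choice the abstract describes. This is an assumption-laden sketch, not WORMS's API: `make_task` and `run_work_stealing` are hypothetical names, the "gpu" variant is plain Python, and a real runtime would use concurrent workers and lock-free deques.

```python
import random
from collections import deque


def make_task(x):
    """Hypothetical task carrying one implementation per device kind.
    Both variants are plain Python here; a real runtime would launch a GPU
    kernel for the "gpu" variant."""
    return {"cpu": lambda: ("cpu", x * x), "gpu": lambda: ("gpu", x * x)}


def run_work_stealing(worker_kinds, initial_queues, seed=0):
    """Sequential, round-based simulation of work stealing.

    worker_kinds:   device kind of each worker, e.g. ["cpu", "cpu", "gpu"]
    initial_queues: one list of tasks per worker
    Returns (results, steals).
    """
    rng = random.Random(seed)
    deques = [deque(q) for q in initial_queues]
    results, steals = [], 0
    while any(deques):
        for w, kind in enumerate(worker_kinds):
            if deques[w]:
                # Owner takes its newest task from the bottom (LIFO end).
                task = deques[w].pop()
            else:
                victims = [v for v, d in enumerate(deques) if v != w and d]
                if not victims:
                    continue
                # Idle worker steals the oldest task from a random victim (FIFO end).
                task = deques[rng.choice(victims)].popleft()
                steals += 1
            # Choose the implementation for this worker's device, else fall back to CPU.
            impl = task.get(kind, task["cpu"])
            results.append(impl())
    return results, steals


if __name__ == "__main__":
    queues = [[make_task(i) for i in range(8)], [], []]
    results, steals = run_work_stealing(["cpu", "cpu", "gpu"], queues)
    print(len(results), steals)  # prints 8 5
```

Starting with all eight tasks on one worker, the two idle workers balance the load by stealing; taking the oldest task when stealing tends to grab larger, coarser-grained pieces of work in recursive task graphs, which is the usual rationale for the two-ended design.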

45

Zafalon, Geraldo Francisco Donega. "Algoritmos de alinhamento múltiplo e técnicas de otimização para esses algoritmos utilizando Ant Colony." São José do Rio Preto : [s.n.], 2009. http://hdl.handle.net/11449/89350.

Abstract:

Advisor: José Márcio Machado
Committee member: Liria Matsumoto Sato
Committee member: Renata Spolon Lobato
Biology, as a well-developed science, has been divided into several areas, genetics among them. This area has grown in importance over the last fifty years because of the many benefits it can bring, mainly to humans. As genetics began to pose problems of great computational complexity, computational strategies were incorporated into it, giving rise to bioinformatics. Bioinformatics has developed significantly in recent years, and this development is accelerating as the genomic problems posed by biologists grow in complexity. Computer scientists have therefore worked to develop new computational techniques for biologists, mainly strategies for multiple sequence alignment. Once sequences are aligned, biologists can draw further inferences about them, especially through pattern recognition, another active area of bioinformatics. Through pattern recognition, biologists can find hot spots among the sequences and thus contribute to curing diseases, to genetic improvements in agriculture, and to many other applications. This work develops and compares two computational techniques for multiple sequence alignment: one is based on the pure progressive multiple sequence alignment technique, and the other is a multiple sequence alignment technique optimized with the ant colony heuristic. Both techniques use parallel strategies in some of their stages, aiming to reduce the execution time of the algorithms. Performance and alignment-quality tests conducted with both strategies showed that the optimized approach gives better results than the pure progressive approach.
Master's degree

46

Zafalon, Geraldo Francisco Donega [UNESP]. "Algoritmos de alinhamento múltiplo e técnicas de otimização para esses algoritmos utilizando Ant Colony." Universidade Estadual Paulista (UNESP), 2009. http://hdl.handle.net/11449/89350.

47

Fox, Paul James. "Massively parallel neural computation." Thesis, University of Cambridge, 2013. https://www.repository.cam.ac.uk/handle/1810/245013.

Abstract:

Reverse-engineering the brain is one of the US National Academy of Engineering’s “Grand Challenges.” The structure of the brain can be examined at many different levels, spanning many disciplines from low-level biology through psychology and computer science. This thesis focusses on real-time computation of large neural networks using the Izhikevich spiking neuron model. Neural computation has been described as “embarrassingly parallel” as each neuron can be thought of as an independent system, with behaviour described by a mathematical model. However, the real challenge lies in modelling neural communication. While the connectivity of neurons has some parallels with that of electrical systems, its high fan-out results in massive data processing and communication requirements when modelling neural communication, particularly for real-time computations. It is shown that memory bandwidth is the most significant constraint to the scale of real-time neural computation, followed by communication bandwidth, which leads to a decision to implement a neural computation system on a platform based on a network of Field Programmable Gate Arrays (FPGAs), using commercial off-the-shelf components with some custom supporting infrastructure. This brings implementation challenges, particularly lack of on-chip memory, but also many advantages, particularly high-speed transceivers. An algorithm to model neural communication that makes efficient use of memory and communication resources is developed and then used to implement a neural computation system on the multi-FPGA platform. Finding suitable benchmark neural networks for a massively parallel neural computation system proves to be a challenge. A synthetic benchmark that has biologically-plausible fan-out, spike frequency and spike volume is proposed and used to evaluate the system. 
It is shown to be capable of computing the activity of a network of 256k Izhikevich spiking neurons with a fan-out of 1k in real-time using a network of 4 FPGA boards. This compares favourably with previous work, with the added advantage of scalability to larger neural networks using more FPGAs. It is concluded that communication must be considered as a first-class design constraint when implementing massively parallel neural computation systems.
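To illustrate why per-neuron updates are described as "embarrassingly parallel" while communication is the hard part, the following pure-Python sketch integrates the standard Izhikevich equations, v' = 0.04v^2 + 5v + 140 - u + I and u' = a(bv - u), with the reset v <- c, u <- u + d when v >= 30 mV, and delivers each spike to every target in the neuron's fan-out list. The regular-spiking parameters (a = 0.02, b = 0.2, c = -65, d = 8), the forward-Euler step of 0.5 ms, the constant input current and the synaptic weight are assumptions chosen for illustration; this is not the thesis's FPGA communication algorithm.

```python
def izhikevich_step(v, u, I, a=0.02, b=0.2, c=-65.0, d=8.0, dt=0.5):
    """Advance one Izhikevich neuron by dt milliseconds (forward Euler).

    Returns (v, u, spiked). Defaults are the regular-spiking parameters.
    """
    spiked = v >= 30.0              # spike threshold in mV
    if spiked:
        v, u = c, u + d             # after-spike reset
    v = v + dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
    u = u + dt * a * (b * v - u)
    return v, u, spiked


def simulate(synapses, steps, I_ext=10.0, weight=0.5):
    """Simulate a small network; synapses[i] lists the targets of neuron i
    (its fan-out). Returns the total number of spikes."""
    n = len(synapses)
    v = [-65.0] * n
    u = [0.2 * -65.0] * n
    incoming = [0.0] * n
    spike_count = 0
    for _ in range(steps):
        next_incoming = [0.0] * n
        for i in range(n):
            # Per-neuron update: depends only on this neuron's own state and
            # input, so these updates could run in parallel.
            v[i], u[i], spiked = izhikevich_step(v[i], u[i], I_ext + incoming[i])
            if spiked:
                spike_count += 1
                # Spike delivery: one spike fans out to every target neuron.
                for j in synapses[i]:
                    next_incoming[j] += weight
        incoming = next_incoming
    return spike_count


if __name__ == "__main__":
    ring = [[(i + 1) % 4] for i in range(4)]
    print(simulate(ring, steps=200))  # total spikes over 100 ms of simulated time
```

With a fan-out of 1k, as in the benchmark above, every spike triggers a thousand writes into `next_incoming`, which is why memory and communication bandwidth, rather than the arithmetic of the neuron model, dominate the cost at scale.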

48

Paulos, Yonas Kinfu. "Sedimentation between parallel plates." Thesis, University of British Columbia, 1991. http://hdl.handle.net/2429/30055.

Abstract:

Settling basins can be shortened by using a stack of horizontal parallel plates, which develop boundary layers in which sedimentation can occur. The purpose of this study is to examine the design parameters for such a system and to apply this approach to a fish rearing channel in which settling length is strictly limited. Flow between parallel rough and smooth plates has been modelled together with the sediment concentration profile. An accurate description of boundary-layer flow requires solving the Navier-Stokes equations, and because of the complexity of these equations for turbulent flow, some assumptions are made to relate the Reynolds stresses to the turbulent kinetic energy and the turbulent energy dissipation rate. The simplified equations are solved using a numerical method based on the approach of the TEACH code. The flow parameters obtained from the turbulent flow model are used to obtain the sediment concentration profile between the settling plates. The sedimentation process is solved numerically using the general transport equation. The lower plate is assumed to retain sediments reaching the bottom. The design of a sedimentation tank for a fish rearing unit with high flow velocity has been investigated. The effectiveness of the sedimentation tank depends on the uniformity of flow attained at the inlet, and experiments were conducted to find the most suitable geometry for achieving uniform flow distribution without affecting other aspects of the fish rearing unit's performance. The main difficulties to overcome were the heavy circulation in the sedimentation tank and the clogging of the distributing system by suspended particles. Several distributing systems were investigated; the best is discussed in detail. It was concluded that a stack of horizontal parallel plates can be used in fish rearing systems where space for settling sediments is limited. 
Flow distribution along the vertical at the entrance to the plates determines the efficiency of the sediment settling process, and a suitable geometric configuration can be constructed to distribute the high-velocity flow uniformly across the vertical. The sediment removal ratio for flow between smooth and rough parallel plates has been calculated by numerical modelling. The results show that almost the same pattern of sediment deposition occurs for both the smooth-smooth and rough-smooth plate arrangements.
Faculty of Applied Science
Department of Civil Engineering
Graduate

49

Landry, Barry R. "Parallel PSK demodulator development." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp04/mq23813.pdf.

50

Langis, Christian. "Mesh simplification in parallel." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape7/PQDD_0020/MQ48438.pdf.
