Dissertations / Theses on the topic 'Parallel computers Evaluation'

Consult the top 50 dissertations / theses below for research on the topic 'Parallel computers Evaluation.' Abstracts are included where available in the metadata.

1

Grove, Duncan A. "Performance modelling of message-passing parallel programs." Title page, contents and abstract only, 2003. http://web4.library.adelaide.edu.au/theses/09PH/09phg8832.pdf.

Abstract:
This dissertation describes a new performance modelling system, called the Performance Evaluating Virtual Parallel Machine (PEVPM). It uses a novel bottom-up approach, where submodels of individual computation and communication events are dynamically constructed from data-dependencies, current contention levels and the performance distributions of low-level operations, which define performance variability in the face of contention.
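As a rough illustration of the bottom-up idea, and not the PEVPM implementation itself, a per-event submodel can be read as drawing an event's duration from a measured distribution indexed by the contention in effect when the event fires; everything below (names, sizes) is hypothetical.

```c
/* Hypothetical sketch of a bottom-up event submodel: an event's duration
 * is drawn from a measured distribution selected by the current
 * contention level. Not the PEVPM code; all names are illustrative. */
#include <stdlib.h>

#define LEVELS  8    /* modelled contention levels              */
#define SAMPLES 64   /* measured samples per level (pre-filled) */

static double duration[LEVELS][SAMPLES];   /* microseconds */

/* draw one predicted duration for an event at a given contention level */
double sample_event(int contention)
{
    if (contention < 0)       contention = 0;
    if (contention >= LEVELS) contention = LEVELS - 1;
    return duration[contention][rand() % SAMPLES];
}
```

A whole-program prediction would then accumulate such samples along the data dependencies between events.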
2

Ligon, Walter Batchelor III. "An empirical evaluation of architectural reconfigurability." Diss., Georgia Institute of Technology, 1992. http://hdl.handle.net/1853/8204.

3

Maache, Ahmed. "A prototype parallel multi-FPGA accelerator for SPICE CMOS model evaluation." Thesis, University of Southampton, 2011. https://eprints.soton.ac.uk/173435/.

Abstract:
Due to the ever-increasing complexity of circuits, EDA tools and algorithms are demanding more computational power. This has made transistor-level simulation a growing bottleneck in the circuit development process. This thesis serves as a proof of concept to evaluate and quantify the cost of using multi-FPGA systems in SPICE-like simulations in terms of acceleration, throughput, area, and power. To this end, a multi-FPGA architecture is designed to exploit the inherent parallelism in the device model evaluation phase within the SPICE simulator. A code transformation flow which converts the high-level device model code to structural VHDL was also implemented. This flow showed that an automatic compiler system to design, map, and optimise SPICE-like simulations on FPGAs is feasible. This thesis has two main contributions. The first contribution is the multi-FPGA accelerator for the device model evaluation, which demonstrated a speedup of 10 times over a conventional processor while consuming six times less power. Results also showed that it is feasible to describe and optimise FPGA pipelined implementations to exploit other classes of applications similar to the SPICE device model evaluation. The constant throughput of the pipelined architecture is one of the main factors that allow the FPGA accelerator to outperform conventional processors. The second contribution lies in the use of multi-FPGA synthesis to optimise the inter-FPGA connections through altering the process of mapping partitions to FPGA devices. A novel technique is introduced which reduces the inter-FPGA connections by an average of 18%. The speedup and power efficiency results showed that the proposed multi-FPGA system can be used by the SPICE community to accelerate transistor-level simulation. The experimental results also showed that it is worthwhile continuing this research further to explore the use of FPGAs to accelerate other EDA tools.
4

Dhandapani, Mangayarkarasi. "Performance evaluation of high performance parallel I/O." Master's thesis, Mississippi State : Mississippi State University, 2003. http://sun.library.msstate.edu/ETD-db/theses/available/etd-07072003-155031/unrestricted/mythesis.pdf.

5

Larriba, Pey Josep Lluís. "Design and evaluation of tridiagonal solvers for vector and parallel computers." Doctoral thesis, Universitat Politècnica de Catalunya, 1995. http://hdl.handle.net/10803/6012.

6

Jiang, Jie Cheng. "Performance monitoring in transputer-based multicomputer networks." Thesis, University of British Columbia, 1990. http://hdl.handle.net/2429/28968.

Abstract:
Parallel architectures, like the transputer-based multicomputer network, offer potentially enormous computational power at modest cost. However, writing programs on a multicomputer to exploit parallelism is very difficult due to the lack of tools to help users understand the run-time behavior of the parallel system and detect performance bottlenecks in their programs. This thesis examines the performance characteristics of parallel programs in a multicomputer network, and describes the design and implementation of a real-time performance monitoring tool on transputers. We started with a simple graph theoretical model in which a parallel computation is represented as a weighted directed acyclic graph, called the execution graph. This model allows us to easily derive a variety of performance metrics for parallel programs, such as program execution time, speedup, efficiency, etc. From this model, we also developed a new analysis method called weighted critical path analysis (WCPA), which incorporates the notion of parallelism into critical path analysis and helps users identify the program activities which have the most impact on performance. Based on these ideas, the design of a real-time performance monitoring tool was proposed and implemented on a 74-node transputer-based multicomputer. The major problems in parallel and distributed monitoring addressed in this thesis are: global state and global clock, minimization of monitoring overhead, and the presentation of meaningful data. New techniques and novel approaches to these problems have been investigated and implemented in our tool. Lastly, benchmarks are used to measure the accuracy and the overhead of our monitoring tool. We also demonstrate how this tool was used to improve the performance of an actual parallel application by more than 50%.
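For context, the core of critical path analysis on such an execution graph is a longest-path computation over the weighted DAG; a minimal sketch in C, assuming nodes are already numbered in topological order (WCPA, as described above, additionally weights activities by the parallelism available while they run):

```c
/* Minimal sketch (not the thesis code): longest path through a weighted
 * execution DAG, the core of critical path analysis. Nodes are assumed
 * to be numbered in topological order, so edges only go forward. */
#include <stdio.h>

#define N 5
double weight[N] = {2.0, 3.0, 1.0, 4.0, 2.0};  /* activity costs   */
int edge[N][N] = {{0,1,1,0,0},                  /* adjacency matrix */
                  {0,0,0,1,0},
                  {0,0,0,1,1},
                  {0,0,0,0,1},
                  {0,0,0,0,0}};

int main(void)
{
    double finish[N];                /* earliest finish time per node */
    for (int v = 0; v < N; v++) {
        double start = 0.0;          /* latest finish of predecessors */
        for (int u = 0; u < v; u++)
            if (edge[u][v] && finish[u] > start) start = finish[u];
        finish[v] = start + weight[v];
    }
    printf("critical path length: %.1f\n", finish[N-1]);
    return 0;
}
```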
7

Afsahi, Ahmad. "Design and evaluation of communication latency hiding/reduction techniques for message-passing environments." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape4/PQDD_0019/NQ48225.pdf.

8

Li, Xiaogang. "Efficient and parallel evaluation of XQuery." Columbus, Ohio : Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1139939037.

9

Fu, Jingsong. "ParPlum : a system for evaluating parallel program optimization methods." PDXScholar, 1991. https://pdxscholar.library.pdx.edu/open_access_etds/4177.

Abstract:
The diversity of application programs and parallel architectures makes the mapping problem complicated and hard to evaluate. The quality of a mapping is machine and application dependent and varies due to inaccurate values of application and architecture characteristics. A system for developing, applying and evaluating mappings must have four characteristics: (1) Simplicity: A mapping procedure can be evaluated by separately evaluating its submappings, so the complicated problem can be simplified. (2) Generality: A wide range of application programs and architectures can be easily represented and all mapping algorithms can be easily implemented. (3) Multifunctionality: All the mapping steps, application programs, target architectures, and related cost functions can vary and are easy to evaluate. (4) Support for sensitivity analysis: The sensitivity of mapping quality to the inaccuracy of cost functions and characteristics of applications and architectures can be easily tested. ParPlum, which is presented in this thesis, is aimed at creating and evaluating mappings on different parallel architectures with different application programs. Sensitivity analysis is another major focus. The design philosophy of ParPlum is to narrow the multidimensional optimization problem down into sub-problems with one or fewer dimensions. Mapping, for example, can be divided into three submappings: partitioning, allocating, and scheduling. This leads to the implementation of the ParPlum system, the use of a data flow style, the distribution of ParPlum libraries, and the development of the ParPlum pipeline.
10

Tang, Dezheng. "Mapping Programs to Parallel Architectures in the Real World." PDXScholar, 1992. https://pdxscholar.library.pdx.edu/open_access_etds/4534.

Abstract:
Mapping an application program to a parallel architecture can be described as a multidimensional optimization problem. To simplify the problem, we divide the overall mapping process into three sequential substeps: partitioning, allocating, and scheduling, with each step using a few details of the program and architecture description. Due to the difficulty of accurately describing the program and architecture, and the fact that each substep uses incomplete information, inaccuracy is pervasive in the real-world mapping process. We hypothesize that the inaccuracy and the use of suboptimal, heuristic mapping methods may greatly affect the mapping or submapping performance and lead to a non-optimal solution. We do not discard the typical approach used by most researchers, in which total execution time or speedup is the criterion to evaluate the quality of the mapping. However, we improve on this approach by including the effects of inaccuracy. We believe that, due to the presence of inaccuracy in the mapping process, investigating the impact of inaccuracy on the mapping quality is crucial to achieving good mappings. The motivation of this work is to identify the various inaccuracies during the mapping procedure and explore the sensitivity of mapping quality to the inaccurate parameters. To conduct the sensitivity examination, the Global Cluster partitioning algorithm and some models were used. The models use some program and architecture characteristics, or lower-level meters, to characterize the mapping solution space. The algorithm searches the solution space and makes decisions based on the information provided by the models. The experiments were implemented on a UNIX LAN of Sun workstations for different data flow graphs. The graphs use three parallel programming paradigms: fine-grained, coarse-grained, and pipelined styles, to represent some high-level application programs: vector inner product calculation, matrix multiplication, and Gaussian elimination respectively. The experimental results show that varying system behavior affects the accuracy of lower-level meters, and the quality of the mapping algorithm is very sensitive to the inaccuracies.
11

Surma, David Ray 1963. "Design and performance evaluation of parallel architectures for image segmentation processing." Thesis, The University of Arizona, 1989. http://hdl.handle.net/10150/277042.

Abstract:
The design of parallel architectures to perform image segmentation processing is given. In addition, the various designs are evaluated as to their performance, and a discussion of an optimal design is given. In this thesis, a set of eight segmentation algorithms has been provided as a starting point. Four of these algorithms will be evaluated and partitioned using two techniques. From this study of partitioning, and considering the data flow through the total system, architectures utilizing parallel techniques will be derived. Timing analysis of these architectures will be given using pen-and-paper techniques for three of today's current technologies. Next, NETWORK II.5 simulations will be run to provide performance measures. Finally, evaluations of the various architectures will be made, as well as of the applicability of using NETWORK II.5 as a simulation language.
12

Sankaran, Rajesh Madukkarumukumana. "Performance Evaluation of Specialized Hardware for Fast Global Operations on Distributed Memory Multicomputers." PDXScholar, 1995. https://pdxscholar.library.pdx.edu/open_access_etds/4919.

Abstract:
Workstation cluster multicomputers are increasingly being applied for solving scientific problems that require massive computing power. Parallel Virtual Machine (PVM) is a popular message-passing model used to program these clusters. One of the major performance limiting factors for cluster multicomputers is their inefficiency in performing parallel program operations involving collective communications. These operations include synchronization, global reduction, broadcast/multicast operations and orderly access to shared global variables. Hall has demonstrated that a secondary network with a wide tree topology and centralized coordination processors (COP) could improve the performance of global operations on a variety of distributed architectures [Hall94a]. My hypothesis was that the efficiency of many PVM applications on workstation clusters could be significantly improved by utilizing a COP system for collective communication operations. To test my hypothesis, I interfaced the COP system with PVM. The interface software includes a virtual memory-mapped secondary network interface driver, and a function library which allows the COP system to be used in place of PVM function calls in application programs. My implementation makes it possible to easily port any existing PVM application to perform fast global operations using the COP system. To evaluate the performance improvements of using a COP system, I measured the cost of various PVM global functions, derived the cost of equivalent COP library global functions, and compared the results. To analyze the impact of global operations on the overall execution time of applications, I instrumented a complex molecular dynamics PVM application and performed measurements. The measurements were performed for a sample cluster size of 5 and for message sizes up to 16 kilobytes. The comparison of PVM and COP system global operation performance clearly demonstrates that the COP system can speed up a variety of global operations involving small-to-medium sized messages by factors of 5-25. Analysis of the example application for a sample cluster size of 5 shows that the speedup provided by my global function libraries and the COP system reduces overall execution time for this and similar applications by more than 1.5 times. Additionally, the performance improvement seen by applications increases as the cluster size increases, thus providing a scalable solution for performing global operations.
13

Hansen, Christian Leland. "Towards Comparative Profiling of Parallel Applications with PPerfDB." PDXScholar, 2001. https://pdxscholar.library.pdx.edu/open_access_etds/2666.

Abstract:
Due to the complex nature of parallel programming, it is difficult to diagnose and solve performance related problems. Knowledge of program behavior is obtained experimentally, with repeated runs of a slightly modified version of the application or the same code in different environments. In these circumstances, comparative performance analysis can provide meaningful insights into the subtle effects of system and code changes on parallel program behavior by highlighting the difference in performance results across executions. I have designed and implemented modules which extend the PPerfDB performance tool to allow access to existing performance data generated by several commonly used tracing tools. Access occurs from within the experiment management framework provided by PPerfDB for the identification of system parameters, the representation of multiple sets of execution data, and the formulation of data queries. Furthermore, I have designed and implemented an additional module that will generate new data using dynamic instrumentation under the control of PPerfDB. This was done to enable the creation of novel experiments for performance hypothesis testing and to ultimately automate the diagnostic and tuning process. As data from such diverse sources has very different representations, various techniques to allow comparisons are presented. I have generalized the definition of the Performance Difference operator, which automatically detects divergence in multiple data sets, and I have defined an Overlay operation to provide uniform access to both dynamically generated and tracefile-based data. The use and application of these new operations, along with an indication of some of the issues involved in the creation of a fully automatic comparative profiler, is presented via several case studies performed on an IBM SP2 using different versions of an MPI application.
14

Mohror, Kathryn Marie. "Scalable event tracking on high-end parallel systems." PDXScholar, 2010. https://pdxscholar.library.pdx.edu/open_access_etds/2811.

Abstract:
Accurate performance analysis of high end systems requires event-based traces to correctly identify the root cause of a number of the complex performance problems that arise on these highly parallel systems. These high-end architectures contain tens to hundreds of thousands of processors, pushing application scalability challenges to new heights. Unfortunately, the collection of event-based data presents scalability challenges itself: the large volume of collected data increases tool overhead, and results in data files that are difficult to store and analyze. Our solution to these problems is a new measurement technique called trace profiling that collects the information needed to diagnose performance problems that traditionally require traces, but at a greatly reduced data volume. The trace profiling technique reduces the amount of data measured and stored by capitalizing on the repeated behavior of programs, and on the similarity of the behavior and performance of parallel processes in an application run. Trace profiling is a hybrid between profiling and tracing, collecting summary information about the event patterns in an application run. Because the data has already been classified into behavior categories, we can present reduced, partially analyzed performance data to the user, highlighting the performance behaviors that comprised most of the execution time.
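A toy rendering of the data-reduction idea, under our own simplifying assumption that each event carries a small integer signature: consecutive repeats collapse into (signature, count, total time) summaries instead of individual trace records.

```c
/* Toy sketch of trace-profiling-style reduction: consecutive events with
 * the same signature collapse into one summary record. Illustrative
 * only; not the thesis's actual classification scheme. */
#include <stdio.h>

struct summary { int signature; long count; double total_time; };

int main(void)
{
    int    sig[]  = {1, 1, 1, 2, 2, 1, 1};            /* event stream */
    double dur[]  = {0.5, 0.4, 0.5, 2.0, 2.1, 0.5, 0.6};
    int n = 7;

    struct summary out[16];
    int m = 0;
    for (int i = 0; i < n; i++) {
        if (m > 0 && out[m-1].signature == sig[i]) {  /* repeat: merge */
            out[m-1].count++;
            out[m-1].total_time += dur[i];
        } else {                                      /* new pattern   */
            out[m].signature  = sig[i];
            out[m].count      = 1;
            out[m].total_time = dur[i];
            m++;
        }
    }
    for (int i = 0; i < m; i++)
        printf("sig %d: %ld events, %.1f s\n",
               out[i].signature, out[i].count, out[i].total_time);
    return 0;
}
```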
15

Parker, Brandon S. "CLUE: A Cluster Evaluation Tool." Thesis, University of North Texas, 2006. https://digital.library.unt.edu/ark:/67531/metadc5444/.

Abstract:
Modern high performance computing is dependent on parallel processing systems. Most current benchmarks reveal only the high level computational throughput metrics, which may be sufficient for single processor systems, but can lead to a misrepresentation of true system capability for parallel systems. A new benchmark is therefore proposed. CLUE (Cluster Evaluator) uses a cellular automata algorithm to evaluate the scalability of parallel processing machines. The benchmark also uses algorithmic variations to evaluate individual system components' impact on the overall serial fraction and efficiency. CLUE is not a replacement for other performance-centric benchmarks, but rather shows the scalability of a system and provides metrics to reveal where one can improve overall performance. CLUE is a new benchmark which demonstrates a better comparison among different parallel systems than existing benchmarks and can diagnose where a particular parallel system can be optimized.
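One standard way to expose a serial fraction from measured scalability data is the Karp-Flatt metric, f = (1/S - 1/p) / (1 - 1/p) for speedup S on p processors; a small sketch, offered as background rather than as CLUE's actual formula:

```c
/* Karp-Flatt experimentally determined serial fraction: background for
 * the serial-fraction metric mentioned above, not CLUE's own code. */
#include <stdio.h>

double serial_fraction(double speedup, int p)
{
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p);
}

int main(void)
{
    /* e.g. a measured speedup of 6.2 on 8 processors */
    printf("f = %.3f\n", serial_fraction(6.2, 8));
    return 0;
}
```

A serial fraction that grows with p points at overheads such as communication or load imbalance rather than inherently sequential work.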
16

Suh, Taeweon. "Integration and Evaluation of Cache Coherence Protocols for Multiprocessor SoCs." Diss., Georgia Institute of Technology, 2006. http://hdl.handle.net/1853/14065.

Abstract:
System-on-a-chip (SoC) design is characterized by heavy reuse of IP blocks to satisfy specific computing needs for target applications, reduce overall design cost, and expedite time-to-market. To meet their performance goals and cost constraints, SoC designers integrate multiple, sometimes heterogeneous, processor IPs to perform particular functions. This design approach is called Multiprocessor SoC (MPSoC). In this thesis, I investigated generic methodologies for enabling efficient communication among heterogeneous processors and quantified the efficiency of coherence traffic. Hardware techniques for two main MPSoC architectures were studied: integration of cache coherence protocols for shared-bus-based MPSoCs, and cache coherence support for non-shared-bus-based MPSoCs. In the shared-bus-based MPSoCs, the integration techniques guarantee data consistency among incompatible coherence protocols. An integrated protocol will contain common states from these coherence protocols. A snoop-hit buffer and region-based cache coherence were also proposed to further enhance coherence performance. For the non-shared-bus-based MPSoCs, bypass and bookkeeping approaches were proposed to maintain coherence in a new cache coherence-enforced memory controller. Simulations based on micro-benchmarks and an RTOS kernel showed the benefits of my methodologies over a generic software solution. This thesis also evaluated and quantified the efficiency of coherence traffic based on a novel emulation platform using an FPGA. The proposed technique can completely isolate the intrinsic delay of the coherence traffic to demonstrate the impact of coherence traffic on system performance. Unlike previous evaluation methods, this technique eliminated non-deterministic factors in measurements, such as bus arbitration delay and stalls in the pipelined bus. The experimental results showed that the cache-to-cache transfer in the Intel server system is less efficient than main memory access.
17

Squillante, Mark S. "Issues in shared-memory multiprocessor scheduling : a performance evaluation /." Thesis, Connect to this title online; UW restricted, 1990. http://hdl.handle.net/1773/6858.

18

Leung, K. H. W., and 梁海宏. "Implementation and performance evaluation of doubly-linked list protocols on a cluster of workstations." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1999. http://hub.hku.hk/bib/B31223060.

19

Balasubramaniam, Mahadevan. "Performance analysis and evaluation of dynamic loop scheduling techniques in a competitive runtime environment for distributed memory architectures." Master's thesis, Mississippi State : Mississippi State University, 2003. http://library.msstate.edu/etd/show.asp?etd=etd-04022003-154254.

20

Codrescu, Lucian. "An evaluation of the Pica architecture for an object recognition application." Thesis, Georgia Institute of Technology, 1998. http://hdl.handle.net/1853/15483.

21

Subbiah, Arun. "Design and evaluation of a distributed diagnosis algorithm for arbitrary network topologies in dynamic fault environments." Thesis, Georgia Institute of Technology, 2001. http://hdl.handle.net/1853/13273.

22

Grossman, J. P. 1973. "Design and evaluation of the Hamal parallel computer." Thesis, Massachusetts Institute of Technology, 2003. http://hdl.handle.net/1721.1/16909.

Abstract:
Parallel shared-memory machines with hundreds or thousands of processor-memory nodes have been built; in the future we will see machines with millions or even billions of nodes. Associated with such large systems is a new set of design challenges. Many problems must be addressed by an architecture in order for it to be successful; of these, we focus on three in particular. First, a scalable memory system is required. Second, the network messaging protocol must be fault-tolerant. Third, the overheads of thread creation, thread management and synchronization must be extremely low. This thesis presents the complete system design for Hamal, a shared-memory architecture which addresses these concerns and is directly scalable to one million nodes. Virtual memory and distributed objects are implemented in a manner that requires neither inter-node synchronization nor the storage of globally coherent translations at each node. We develop a lightweight fault-tolerant messaging protocol that guarantees message delivery and idempotence across a discarding network. A number of hardware mechanisms provide efficient support for massive multithreading and fine-grained synchronization.
Experiments are conducted in simulation, using a trace-driven network simulator to investigate the messaging protocol and a cycle-accurate simulator to evaluate the Hamal architecture. We determine implementation parameters for the messaging protocol which optimize performance. A discarding network is easier to design and can be clocked at a higher rate, and we find that with this protocol its performance can approach that of a non-discarding network. Our simulations of Hamal demonstrate the effectiveness of its thread management and synchronization primitives. In particular, we find register-based synchronization to be an extremely efficient mechanism which can be used to implement a software barrier with a latency of only 523 cycles on a 512 node machine.
23

Sivasubramaniam, Anand. "A framework for evaluating architectural issues of parallel systems." Diss., Georgia Institute of Technology, 1995. http://hdl.handle.net/1853/9194.

24

Pennycook, Simon J. "Evaluating the performance of legacy applications on emerging parallel architectures." Thesis, University of Warwick, 2012. http://wrap.warwick.ac.uk/57050/.

Abstract:
The gap between a supercomputer's theoretical maximum ("peak") floating-point performance and that actually achieved by applications has grown wider over time. Today, a typical scientific application achieves only 5-20% of any given machine's peak processing capability, and this gap leaves room for significant improvements in execution times. This problem is most pronounced for modern "accelerator" architectures - collections of hundreds of simple, low-clocked cores capable of executing the same instruction on dozens of pieces of data simultaneously. This is a significant change from the low number of high-clocked cores found in traditional CPUs, and effective utilisation of accelerators typically requires extensive code and algorithmic changes. In many cases, the best way in which to map a parallel workload to these new architectures is unclear. The principal focus of the work presented in this thesis is the evaluation of emerging parallel architectures (specifically, modern CPUs, GPUs and Intel MIC) for two benchmark codes - the LU benchmark from the NAS Parallel Benchmark Suite and Sandia's miniMD benchmark - which exhibit complex parallel behaviours that are representative of many scientific applications. Using combinations of low-level intrinsic functions, OpenMP, CUDA and MPI, we demonstrate performance improvements of up to 7x for these workloads. We also detail a code development methodology that permits application developers to target multiple architecture types without maintaining completely separate implementations for each platform. Using OpenCL, we develop performance portable implementations of the LU and miniMD benchmarks that are faster than the original codes, and at most 2x slower than versions highly tuned for particular hardware. Finally, we demonstrate the importance of evaluating architectures at scale (as opposed to on single nodes) through performance modelling techniques, highlighting the problems associated with strong-scaling on emerging accelerator architectures.
25

Pout, Mike. "Performance evaluation of an associative processor array for computer vision tasks." Thesis, University of Bristol, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.358020.

26

Söderquist, Fredrik. "Evaluation of Methodology for Parallel Scheduling." Thesis, Linköping University, Department of Electrical Engineering, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-2867.

Abstract:

In the rapidly progressing evolution of technology, more and more emphasis is put on developing proper tools for the task of designing new and revolutionary systems. These tools are required in order to allow a designer to fully utilize the power of new architectures and techniques. This thesis examines the current state of available scheduling tools for embedded systems by evaluating and analyzing a number of different tools. An attempt is made to provide an overview of how the tools are constructed and what types of methodology have been used.

27

Hernández-González, Emilio. "A methodology for the design of parallel benchmarks." Thesis, University of Southampton, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.242181.

28

O'Gorman, Russell John. "Design and application of the RPA II." Thesis, University of Southampton, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.330153.

29

Hall, James E. "Performance evaluations of a parallel and expandable database computer -- the Multi-Backend Database computer." Thesis, Monterey, California. Naval Postgraduate School, 1989. http://hdl.handle.net/10945/27202.

30

Papaefstathiou, Efstathios. "A framework for characterising parallel systems for performance evaluation." Thesis, University of Warwick, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.307345.

31

Komathukattil, Deepa V. "Evaluating Speedup in Parallel Compilers." UNF Digital Commons, 2012. http://digitalcommons.unf.edu/etd/417.

Abstract:
Parallel programming is prevalent in every field mainly to speed up computation. Advancements in multiprocessor technology fuel this trend toward parallel programming. However, modern compilers are still largely single threaded and do not take advantage of the machine resources available to them. There has been a lot of work done on compilers that add parallel constructs to the programs they are compiling, enabling programs to exploit parallelism at run time. Auto parallelization of loops by a compiler is one such example. Researchers have done very little work towards parallelizing the compilation process itself. The research done here focuses on parallel compilers that target computation speedup by parallelizing the process of program compilation during the lexical analysis and semantic analysis phase. Parallelization brings along with it issues like synchronization, concurrency and communication overhead. In the semantic analysis phase, these issues are of particular relevance during the construction of the symbol table. Research done on a concurrent compiler developed at the University of Toronto in 1991 proposed three techniques to address the generation of the symbol table [Seshadri91]. The goal here is to implement a parallel compiler using concepts from those techniques as references. The research done here will augment the work done formerly and measure the performance speedup obtained.
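To make the symbol-table synchronization issue concrete, here is a minimal coarse-lock sketch using POSIX threads. It is illustrative only, not one of the three techniques from [Seshadri91]; a per-bucket lock would reduce the contention that a single table-wide lock creates.

```c
/* Coarse-lock sketch of a symbol table shared by parallel semantic-
 * analysis threads: one mutex serializes all inserts. Illustrative
 * only; not the referenced techniques. */
#include <pthread.h>

#define TABLE_SIZE 1024

struct symbol { char name[64]; int type; struct symbol *next; };

static struct symbol *table[TABLE_SIZE];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

static unsigned hash(const char *s)
{
    unsigned h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

void insert_symbol(struct symbol *sym)
{
    unsigned h = hash(sym->name);
    pthread_mutex_lock(&table_lock);   /* correct but contended */
    sym->next = table[h];
    table[h]  = sym;
    pthread_mutex_unlock(&table_lock);
}
```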
32

Teran, Maria. "An analysis for evaluating the cost/profit effectiveness of parallel systems." Master's thesis, Mississippi State : Mississippi State University, 2002. http://library.msstate.edu/etd/show.asp?etd=etd-10282002-201316.

33

Lingtorp, Alexander, and Simon Mossmyr. "Performance comparison of parallel turbulent noise evaluation with different gradient selection methods." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-208410.

Abstract:
Noise is of vital interest in many parts of computer science, especially in the computer graphics field where noise is used to create nature-like effects. Perlin's 1985 algorithm to generate noise remains the most popular in spite of many alternatives having been presented over the years. In this report we have examined the execution time impact of two new gradient table data structures and a new hash method for this algorithm, suggested by Perlin in 2002 and Olano in 2005 respectively. Our implementation simulated turbulence and ran in parallel on a modern GPU using the OpenCL framework. We also examined if the turbulence method's octave summation could benefit from parallelization. Results suggest that Olano's hash method performs significantly faster, while Perlin's original gradient table data structure performs slightly faster than the suggested improvements. We also found that a parallelization of the octave summation in the turbulence method performs significantly faster.
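For reference, the 2002 gradient selection the report compares against computes the gradient's dot product directly from the hash bits instead of indexing a stored gradient table; the function below follows Perlin's published reference implementation.

```c
/* Gradient selection from Perlin's 2002 "Improving Noise": the hash's
 * low 4 bits pick one of 12 gradient directions, and the dot product
 * with (x,y,z) is formed without any stored gradient table. */
double grad(int hash, double x, double y, double z)
{
    int h = hash & 15;                       /* low 4 bits of the hash */
    double u = h < 8 ? x : y;                /* first coordinate       */
    double v = h < 4 ? y : (h == 12 || h == 14 ? x : z);
    return ((h & 1) == 0 ? u : -u) + ((h & 2) == 0 ? v : -v);
}
```

On a GPU this trades a memory lookup for a few branches, which is part of what the report's timing comparison probes.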
34

Engström, Gustav, and Marcus Falgert. "Implementation and Evaluation of Concurrency on Parallella." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-177385.

Abstract:
The question asked is what optimizations can be done when working with the Parallella board from Adapteva and how they differ from other concurrent solutions. Parallella is a small supercomputer with a unique 16-core co-processor that we were to utilize. We have been working on parallelizing image manipulation software, and then analyzing the results of some performed tests. The goal is to conclude how to properly utilize the Epiphany accelerator, and also see how it performs in comparison to other CPUs. This project is a part of the PaPP project, which will utilize Parallella, and the work can be seen as an initial evaluation of the board. We have tested the board to see how it holds up, made our best efforts to adapt to the hardware, and explain our way of working. This report is worth reading for anyone who has little experience with Parallella and who desires to learn how well it works and what it is good for. There are descriptions of all libraries used and detailed thoughts on how to implement software solutions for Epiphany. This is a bachelor-level project and was performed with no prior knowledge of Parallella.
35

Markomanolis, Georgios. "Performance Evaluation and Prediction of Parallel Applications." PhD thesis, École normale supérieure de Lyon - ENS LYON, 2014. http://tel.archives-ouvertes.fr/tel-00951125.

Abstract:
Analyzing and understanding the performance behavior of parallel applications on various compute infrastructures is a long-standing concern in the High Performance Computing community. When the targeted execution environments are not available, simulation is a reasonable approach to obtain objective performance indicators and explore various "what-if?" scenarios. In this work we present a framework for the off-line simulation of MPI applications. The main originality of our work with regard to the literature is to rely on time-independent execution traces. This allows for an extreme scalability as heterogeneous and distributed resources can be used to acquire a trace. We propose a format where, for each event that occurs during the execution of an application, we log the volume of instructions for a computation phase or the bytes and the type of a communication. To acquire time-independent traces of the execution of MPI applications, we have to instrument them to log the required data. There exist many profiling tools which can instrument an application. We propose a scoring system that corresponds to our framework's specific requirements and evaluate the most well-known open source profiling tools according to it. Furthermore we introduce an original tool called Minimal Instrumentation that was designed to fulfill the requirements of our framework. We study different instrumentation methods and we also investigate several acquisition strategies. We detail the tools that extract the time-independent traces from the instrumentation traces of some well-known profiling tools. Finally we evaluate the whole acquisition procedure and we present the acquisition of large scale instances. We describe in detail the procedure to provide a realistic simulated platform file to our trace replay tool, taking into consideration the topology of the real platform and the calibration procedure with regard to the application that is going to be simulated. Moreover we present the implemented trace replay tools that we used during this work. We show that our simulator can predict the performance of some MPI benchmarks with less than 11% relative error between real execution and simulation for the cases where there is no performance issue. Finally, we identify the reasons for the performance issues and we propose solutions.
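A sketch of what a time-independent trace record might look like, with field names that are ours rather than the framework's actual format: volumes are logged, never timestamps.

```c
/* Hypothetical time-independent trace record: each event stores work
 * volumes (instructions or bytes), not times, so traces acquired on
 * heterogeneous machines remain comparable. Names are illustrative. */
typedef enum { EV_COMPUTE, EV_SEND, EV_RECV } event_kind;

typedef struct {
    int        rank;         /* MPI rank that executed the event     */
    event_kind kind;
    long long  instructions; /* volume for a computation phase       */
    long long  bytes;        /* payload size for a communication     */
    int        peer;         /* partner rank for send/recv, else -1  */
} ti_event;
```

Replay then multiplies these volumes by the simulated platform's calibrated rates to obtain predicted times.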
36

Rudd, Kevin Edward. "Parallel three-dimensional acoustic and elastic wave simulation methods with applications in nondestructive evaluation." W&M ScholarWorks, 2007. https://scholarworks.wm.edu/etd/1539623332.

Abstract:
In this dissertation, we present two parallelized simulation techniques for three-dimensional acoustic and elastic wave propagation based on the finite integration technique. We demonstrate their usefulness in solving real-world problems with examples in the three very different areas of nondestructive evaluation, medical imaging, and security screening. More precisely, these include concealed weapons detection, periodontal ultrasonography, and guided wave inspection of complex piping systems. We have employed these simulation methods to study complex wave phenomena and to develop and test a variety of signal processing and hardware configurations. Simulation results are compared to experimental measurements to confirm the accuracy of the parallel simulation methods.
37

Rahman, Kamela Choudhury. "Complete Design Methodology of a Massively Parallel and Pipelined Memristive Stateful IMPLY Logic Based Reconfigurable Architecture." PDXScholar, 2016. http://pdxscholar.library.pdx.edu/open_access_etds/2956.

Abstract:
Continued dimensional scaling of CMOS processes is approaching fundamental limits; therefore, alternative new devices and microarchitectures are explored to address the growing need for area scaling and performance gain. New nanotechnologies, such as memristors, are emerging. Memristors can be used to perform stateful logic with nanowire crossbars, which allows for the implementation of very large binary networks that can be easily reconfigured. This research involves the design of a memristor-based massively parallel datapath for various applications, specifically SIMD (Single Instruction Multiple Data)-like architectures and parallel pipelines. The dissertation develops a new model of massively parallel memristor-CMOS hybrid datapath architectures at a systems level, as well as a complete methodology to design them. One innovation of the proposed approach is that the datapath design is based on space-time diagrams that use stateful IMPLY gates built from binary memristors. This notation aids in circuit minimization in logic design, calculation of delay and memristor costs, and sneak-path avoidance. Another innovation of the proposed methodology is a general new architecture model, MsFSMD (Memristive stateful Finite State Machine with Datapath), that has two interacting sub-systems: 1) a controller composed of a memristive RAM, MsRAM, acting as a pulse generator, along with a finite state machine realized in CMOS, a CMOS counter, CMOS multiplexers and CMOS decoders; 2) a massively parallel, pipelined datapath realized with a new variant of a CMOL-like nanowire crossbar array, MsCMOL (Memristive stateful CMOL), with binary stateful memristor-based IMPLY gates. The next contribution of the dissertation is a new type of FPGA. In contrast to the previous memristor-based FPGA (mrFPGA), the proposed MsFPGA (Memristive stateful logic Field Programmable Gate Array) uses memristors for memory, connection programming, and combinational logic implementation. With its regular structure of square abutting blocks of memristive nanowire crossbars and their short connections, the proposed architecture is highly reconfigurable. As an example of using the proposed new FPGA to realize biologically inspired systems, the detailed design of a pipelined Euclidean Distance processor is presented and its various applications are mentioned. Euclidean Distance calculation is widely used by many neural network and associative memory based algorithms.
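The stateful IMPLY primitive named above overwrites its target with q <- (NOT p) OR q; together with a clear-to-0 operation it is functionally complete. A small simulation, purely illustrative of the logic rather than of the dissertation's datapath, showing NAND built from two IMPLY steps:

```c
/* Stateful IMPLY in miniature: material implication computed into the
 * target memristor's state. NAND follows from two IMPLY steps onto a
 * work memristor cleared to logic 0. Illustrative simulation only. */
#include <stdio.h>

/* one IMPLY operation: returns the new state of the target q */
int imply(int p, int q) { return !p || q; }

int nand(int p, int q)
{
    int s = 0;        /* work memristor initialized to logic 0 */
    s = imply(p, s);  /* s = NOT p                */
    s = imply(q, s);  /* s = NOT q OR NOT p = NAND(p, q) */
    return s;
}

int main(void)
{
    for (int p = 0; p <= 1; p++)
        for (int q = 0; q <= 1; q++)
            printf("NAND(%d,%d) = %d\n", p, q, nand(p, q));
    return 0;
}
```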
38

Zhou, Jun. "Parallel Go on CUDA with Monte Carlo Tree Search." University of Cincinnati / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1367942396.

39

Li, Shen Carmen C. Duren Russell Walker. "Evaluating Impulse C and multiple parallelism partitions for a low-cost reconfigurable computing system." Waco, Tex. : Baylor University, 2008. http://hdl.handle.net/2104/5280.

40

Karlbom, David. "A Performance Evaluation of MPI Shared Memory Programming." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-188676.

Abstract:
The thesis investigates the Message Passing Interface (MPI) support for shared memory programming on modern hardware architecture with multiple Non-Uniform Memory Access (NUMA) domains. We investigate its performance in two case studies: the matrix-matrix multiplication and Conway’s game of life. We compare MPI shared memory performance in terms of execution time and memory consumption with the performance of implementations using OpenMP and MPI point-to-point communication, also called "MPI two-sided". We perform strong scaling tests in both test cases. We observe that MPI two-sided implementation is 21% and 18% faster than the MPI shared and OpenMP implementations respectively in the matrix-matrix multiplication when using 32 processes. MPI shared uses less memory space: when compared to MPI two-sided, MPI shared uses 45% less memory. In the Conway’s game of life, we find that MPI two-sided implementation is 10% and 82% faster than the MPI shared and OpenMP implementations respectively when using 32 processes. We also observe that not mapping virtual memory to a specific NUMA domain can lead to an increment in execution time of 64% when using 32 processes. The use of MPI shared is viable for intranode communication on modern hardware architecture with multiple NUMA domains.
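For readers unfamiliar with the mechanism being measured, a minimal sketch of MPI-3 shared-memory windows in C: ranks on the same node allocate a shared window and then load and store each other's segments directly (error handling omitted).

```c
/* Minimal sketch of MPI-3 shared-memory windows, the mechanism the
 * thesis evaluates: node-local ranks share one window and access each
 * other's segments with plain loads and stores. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Comm node;   /* communicator of ranks sharing one node */
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node);

    int rank;
    MPI_Comm_rank(node, &rank);

    double *mine;    /* base of this rank's own segment */
    MPI_Win win;
    MPI_Win_allocate_shared(1024 * sizeof(double), sizeof(double),
                            MPI_INFO_NULL, node, &mine, &win);

    /* locate rank 0's segment; it is directly load/store accessible */
    MPI_Aint size;
    int disp;
    double *base;
    MPI_Win_shared_query(win, 0, &size, &disp, &base);

    MPI_Win_fence(0, win);
    if (rank == 0) base[0] = 42.0;   /* plain store into shared memory */
    MPI_Win_fence(0, win);
    printf("rank %d sees %.1f\n", rank, base[0]);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

Compared with two-sided messaging, no explicit copy is needed; the remaining cost is synchronization, here expressed with plain fences.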
41

Saraiva Silva, Ivan. "Évaluation des performances au niveau système d'une architecture SIMD appliquée à la comparaison de séquences génétiques." Paris 6, 1995. http://www.theses.fr/1995PA066461.

Abstract:
Genetic sequence comparison is now a routine activity in molecular biology. So-called exact comparisons are most often performed with dynamic programming algorithms such as those of Needleman and Wunsch or Smith and Waterman. These algorithms, however, have a theoretical complexity of order O(m.n), where m and n are the lengths of the compared sequences. This complexity, combined with the large volume of genetic databases, makes exhaustive comparisons impossible on classical sequential machines. This thesis presents performance evaluations of a new parallel, SIMD, systolic architecture applied to genetic sequence comparison. The architecture takes the form of a coprocessor board attached to a standard bus (the PCI bus) for PC-type computers. For the performance measurements and analyses presented here, we developed a parallel version of the Smith and Waterman algorithm and simulated its execution on the coprocessor board. The simulations were run with a cycle-accurate simulator capable of accounting for both the time of the computations performed by the coprocessor and the time of the data exchanges managed by the host. Using such a simulator also allowed us to generate test vectors for the validation of a structural VHDL model of the architecture.
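The O(m.n) cost discussed above comes from the Smith and Waterman recurrence H(i,j) = max(0, H(i-1,j-1) + s(a_i,b_j), H(i-1,j) - g, H(i,j-1) - g); a plain sequential sketch with a linear gap penalty, shown for contrast with the systolic version the thesis parallelizes:

```c
/* Plain sequential Smith-Waterman scoring with a linear gap penalty:
 * the O(m*n) computation the systolic coprocessor parallelizes.
 * Assumes input strings shorter than 256 characters. */
#include <stdio.h>
#include <string.h>

#define MATCH     2
#define MISMATCH -1
#define GAP       1

int sw_score(const char *a, const char *b)
{
    int m = strlen(a), n = strlen(b), best = 0;
    static int H[256][256];   /* row 0 and column 0 stay at 0 */
    for (int i = 1; i <= m; i++)
        for (int j = 1; j <= n; j++) {
            int s = (a[i-1] == b[j-1]) ? MATCH : MISMATCH;
            int h = H[i-1][j-1] + s;             /* diagonal        */
            if (H[i-1][j] - GAP > h) h = H[i-1][j] - GAP;  /* up    */
            if (H[i][j-1] - GAP > h) h = H[i][j-1] - GAP;  /* left  */
            if (h < 0) h = 0;                    /* local alignment */
            H[i][j] = h;
            if (h > best) best = h;
        }
    return best;
}

int main(void)
{
    printf("score: %d\n", sw_score("ACACACTA", "AGCACACA"));
    return 0;
}
```

On a systolic array, one processing element per anti-diagonal cell keeps all elements busy, since each cell depends only on its three neighbors.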
42

La, Fratta Patrick Anthony. "Evaluating the Design and Performance of a Single-Chip Parallel Computer Using System-Level Models and Methodology." Thesis, Virginia Tech, 2005. http://hdl.handle.net/10919/32424.

Abstract:
As single-chip systems are predicted to soon contain over a billion transistors, design methodologies are evolving dramatically to account for the fast evolution of technologies and product properties. Novel methodologies feature the exploration of design alternatives early in development, support for IPs, and early error detection, all with a decreasing time-to-market. In order to accommodate these product complexities and development needs, the modeling levels at which designers work have quickly changed, as development at higher levels of abstraction allows for faster simulations of system models and earlier estimates of system performance while considering design trade-offs. Recent design advancements to exploit instruction-level parallelism on single-processor computer systems have become exceedingly complex, and modern applications present an increasing potential to be partitioned and parallelized at the thread level. The new Single-Chip, Message-Passing (SCMP) parallel computer is a tightly coupled mesh of processing nodes designed to exploit thread-level parallelism as efficiently as possible. By minimizing the latency of communication among processors, memory access time, and the time for context switching, the system designer will undoubtedly observe an overall performance increase. This study presents in-depth evaluations and quantitative analyses of various design and performance aspects of SCMP through the development of abstract hardware models following a formalized, well-defined methodology. The performance evaluations are taken through benchmark simulation while taking into account system-level communication and synchronization among nodes as well as node-level timing and interaction amongst node components. Through the exploration of alternatives and optimization of the components within the SCMP models, maximum system performance in the hardware implementation can be achieved.
43

Savas, Suleyman. "Implementation and Evaluation of MPEG-4 Simple Profile Decoder on a Massively Parallel Processor Array." Thesis, Högskolan i Halmstad, Sektionen för Informationsvetenskap, Data– och Elektroteknik (IDE), 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-14549.

Abstract:
The high demand for video decoding has pushed developers to implement decoders on parallel architectures. This thesis presents the implementation of an MPEG-4 decoder on a massively parallel processor array (MPPA), the Ambric 2045, by converting the CAL actor language implementation of the decoder. This decoder is the Xilinx model of the MPEG-4 Simple Profile decoder and consists of four main blocks: parser, acdc, idct2d and motion. The parser block was developed in another thesis work [20], and the rest of the decoder, which consists of the other three blocks, is implemented in this thesis work. Afterwards, in order to complete the decoder, the parser block is combined with the other three blocks. Several methods are developed for conversion purposes. Additionally, a number of other methods are developed in order to overcome the constraints of the Ambric architecture, such as the lack of division support. Initially, for debugging purposes, the decoder is implemented on a simulator designed for the Ambric architecture. Finally, the implementation is uploaded to the Ambric 2045 chip and tested with different input streams. The performance of the implementation is analyzed, with results that compare well against the standards currently in use in the market; these results can be considered satisfactory for real-time applications as well. Furthermore, the results are compared with the results of the CAL implementation, running on a single 2 GHz Intel i7 processor, in terms of speed and efficiency. The Ambric implementation runs 4.7 times faster than the CAL implementation when a small input stream (300 frames with a resolution of 176x144) is used. However, when a large input stream (384 frames with a resolution of 720x480) is used, the Ambric implementation performs approximately 32 times better than the CAL implementation in terms of decoding speed and throughput. Performance may increase further with the size of the input stream, up to a point.
44

Elliott, Grant (Grant Andrew). "Design and evaluation of a quasi-passive robotic knee brace : on the effects of parallel elasticity on human running." Thesis, Massachusetts Institute of Technology, 2012. http://hdl.handle.net/1721.1/71474.

Abstract:
While the effects of series compliance on running biomechanics are documented, the effects of parallel compliance are known only for the simpler case of hopping. As many practical exoskeleton and orthosis designs act in parallel with the leg, it is desirable to understand the effects of such an intervention. Spring-like forces offer a natural choice of perturbation, as they are both biologically motivated and energetically inexpensive to implement. To this end, this thesis investigates the hypothesis that the addition of an external elastic element at the knee during the stance phase of running results in a reduction in knee extensor activation so that total joint quasi-stiffness is maintained. To test this, an exoskeleton is presented, consisting of a leaf spring in parallel with the knee joint and a clutch which engages this spring only during stance. The design of a custom interference clutch, made necessary by the need for high holding torque but low mass, is discussed in detail, as are problems of human attachment. The greater applicability of this clutch design to other problems in rehabilitation and augmentation is also addressed. Motion capture of four subjects is used to investigate the consequences of running with this exoskeleton. Leg stiffness is found to increase with distal mass, but no significant change in leg stiffness or total knee stiffness is observed due to the activation of the clutched parallel knee spring. However, preliminary evidence suggests differing responses between trained marathon runners, who appear to maintain biological knee torque, and recreational runners, who appear to maintain total knee torque. Such a relationship between degree of past training and effective utilization of an external force is suggestive of limitations on the applications of assistive devices.
45

Wang, Xiaoyang. "Evaluation of two word alignment systems." Thesis, Linköping University, Department of Computer and Information Science, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-2215.

Abstract:

This project evaluates two different systems that generate word alignments on English-Swedish data. The systems used are the Giza++ system, which may generate a variety of statistical translation models, and the I*Trix system developed at IDA/NLPLab, which generates word pairs with frequencies.

The file formats of these two systems, how to run them, and the differences between the two systems are addressed in this paper. Evaluation in this project considers a variety of parameters such as corpus size, characteristics of the corpus, the effect of linguistic knowledge, etc. At the end of this paper, the conclusions of the evaluation of the two systems are presented. In general, Giza++ performs better on large corpora while I*Trix is better for small corpora. In particular, for corpora with a high statistical ratio or special resources, I*Trix performs better.

46

Gannholm, Lovisa. "A Comparative Evaluation Between Two Design Solutions for an Information Dashboard." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-102134.

Abstract:
This study is a software usability design case about information presentation in a software dashboard. The dashboard is supposed to present system information about an enterprise resource planning system. The study aims to evaluate if the intended users of the dashboard prefer a list-based or an object-based presentation of the information and why. It also investigates if the possibility to get familiar with the prototype affects the evaluation's result. The study was performed using parallel prototypes and evaluation with users. The use of parallel prototypes is a rather unexplored area. Likewise, little research has been done in the area of how user experience changes over time. Two prototypes were created, presenting the same information in two different design solutions, one list-based and one object-based. The prototypes were evaluated with ten presumptive users, with respect to usability. The evaluation consisted of two parts, one quantitative and one qualitative. Half of the respondents got a chance to get familiar with the list-based prototype, and half the object-based prototype, after which they evaluated both sequentially. The result of the evaluation showed that seven out of ten respondents preferred the list-based prototype. The two primary reasons were that they are more used to the list-based concept from their work, and that the list-based prototype presented all information about an application at once. In the object-based prototype the user had to make a request for each type of information, which opened up in a new pop-up window. The primary reason that three of the ten respondents preferred the object-based prototype was that it had a more modern look, and gave a cleaner impression since it only presented the information the respondent was interested in at each point in time. The result also implied that the possibility to get familiar with the prototype by testing it for a couple of days affected the result. Eight out of ten respondents preferred the prototype they had gotten familiar with, and the only ones that liked or preferred the object-based prototype were those who had gotten familiar with it. The results of the study support the results of the existing research done by Dow et al. (2010) on the use of parallel prototypes, i.e. creating several prototypes in parallel, and conform with the results of the research of Karapanos et al. (2009) on how user experience changes over time. Some other interesting information that emerged from the study was that all but one of the respondents thought that the prototype they got familiar with had an acceptable level of usability. The study also validated that all respondents are positive about using a dashboard in their work, and that the presented information was enough for a first version of the dashboard. It also validated that the different groups of users would use the dashboard differently, and therefore are in need of slightly different information.
47

J'lali, Yousra. "DirectX 12: Performance Comparison Between Single- and Multithreaded Rendering when Culling Multiple Lights." Thesis, Blekinge Tekniska Högskola, Fakulteten för datavetenskaper, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-20201.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Background. As newer computers are constructed, more advanced and powerful hardware comes along with them. This leads corporations to enhance various program attributes and features to take advantage of that hardware, hence improving performance. A relatively new API which serves to facilitate such logic is Microsoft DirectX 12. There are numerous opinions about this specific API, and to get a slightly better understanding of its capabilities with respect to hardware utilization, this research puts it under some tests. Objectives. This article's aim is to steadily perform tests and comparisons in order to find out which method has better performance when using DirectX 12: single-threading or multithreading. For performance measurements, the average CPU and GPU utilizations are gathered, as well as the average FPS and the time it takes to perform the Render function. When all results have been collected, the comparison between the methods is assessed. Methods. The main method used in this research is experiments. To find out the performance differences between the two methods, they undergo different trials while data is gathered. There are four experiments for the single-threaded and multithreaded applications, respectively. Each test varies in the number of lights and objects rendered in the simulation environment, gradually escalating from 50 to 100, 1000 and, lastly, 5000. Results. A similar pattern was discovered throughout all four experiments: the multithreaded application used considerably more of the CPU than the single-threaded version. And despite there being less simultaneous work done by the GPU in the one-threaded program, it appeared to use more of the GPU than the multithreaded one. Furthermore, the system with many threads tended to perform the Render function faster than its counterpart, regardless of which test was executed. Nevertheless, the two applications never differed in FPS. Conclusion. Half of the hypotheses stated in this article were contradicted after an unexpected turn of events. It was believed that the multithreaded system would utilize less of the CPU and more of the GPU; instead, the outcome was the opposite. Another hypothesis, that the system with multiple threads would execute the Render function faster than the other version, was strongly supported by the results. In addition, inserting more objects and lights into the scene did increase the applications' utilization of both the CPU and GPU, which also supported another hypothesis. In conclusion, the multithreaded program performs faster but still shows no gain in FPS compared to single-threading. The multithreaded version also utilizes more CPU and less GPU.
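The thesis's DirectX 12 code is not reproduced in the abstract; as a minimal, API-agnostic illustration of the timing methodology it describes, the C++ sketch below splits a stand-in light-culling workload across one or several threads and reports the average time per simulated frame. All function names, thread counts and workloads here are invented for illustration, not taken from the thesis.

// Hypothetical sketch of the measurement setup: average "Render" time with
// the per-light culling work on one thread versus several.
#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstdio>
#include <thread>
#include <vector>

// Stand-in for per-light culling work against a set of objects.
static void cullLights(int firstLight, int lastLight, int numObjects) {
    volatile double sink = 0.0;
    for (int l = firstLight; l < lastLight; ++l)
        for (int o = 0; o < numObjects; ++o)
            sink += std::sqrt(static_cast<double>(l * numObjects + o + 1));
}

// Run one simulated frame with the culling split across numThreads threads;
// returns the elapsed wall-clock time in milliseconds.
static double renderFrame(int numLights, int numObjects, int numThreads) {
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    int chunk = (numLights + numThreads - 1) / numThreads;
    for (int t = 0; t < numThreads; ++t) {
        int first = t * chunk;
        int last = std::min(numLights, first + chunk);
        if (first < last)
            workers.emplace_back(cullLights, first, last, numObjects);
    }
    for (auto& w : workers) w.join();
    std::chrono::duration<double, std::milli> dt =
        std::chrono::steady_clock::now() - start;
    return dt.count();
}

int main() {
    const int frames = 20;
    for (int n : {50, 100, 1000, 5000}) {   // scene sizes, as in the four tests
        for (int threads : {1, 4}) {        // single- vs multithreaded
            double total = 0.0;
            for (int f = 0; f < frames; ++f)
                total += renderFrame(n, n, threads);
            std::printf("%5d lights, %d thread(s): %.3f ms/frame avg\n",
                        n, threads, total / frames);
        }
    }
}

On a multi-core machine the 4-thread runs finish each frame faster without changing the amount of work, which mirrors the abstract's finding that multithreading speeds up the Render function without improving FPS when the bottleneck lies elsewhere.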
48

Hazra, Tushar K., Richard A. Stephenson, and Gregory M. Troendly. "EVOLUTION OF THE COST EFFECTIVE, HIGH PERFORMANCE GROUND SYSTEMS: A QUANTITATIVE APPROACH." International Foundation for Telemetering, 1994. http://hdl.handle.net/10150/608555.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
International Telemetering Conference Proceedings / October 17-20, 1994 / Town & Country Hotel and Conference Center, San Diego, California
During recent years of small-satellite space access missions, the trend has been towards designing low-cost ground control centers to maintain the space/ground cost ratio. The use of personal computers (PCs) in combination with high-speed transputer modules as embedded parallel processors provides a relatively affordable, highly versatile and reliable desktop workstation upon which satellite telemetry systems can be built to meet the ever-growing challenges of space missions today and in the future. This paper presents the feasibility of cost-effective, high-performance ground systems; a quantitative analysis and study in terms of performance, speedup, efficiency, and the compatibility of the architecture with commercial off-the-shelf (COTS) tools; and, finally, introduces an operational high-performance, low-cost ground system to reinforce the concept.
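The abstract cites speedup and efficiency as its quantitative measures without defining them; the standard definitions for a p-processor system (an assumption here, not a quotation from the paper) are:

% T_1: runtime on one processor; T_p: runtime on p processors.
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}

For example, a 16-transputer configuration that cuts a task's runtime from 80 s to 10 s achieves a speedup of S_16 = 8 and an efficiency of E_16 = 0.5.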
49

Nguyen, Ken D. "Multiple Biological Sequence Alignment: Scoring Functions, Algorithms, and Evaluations." Digital Archive @ GSU, 2011. http://digitalarchive.gsu.edu/cs_diss/62.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Aligning multiple biological sequences such as protein sequences or DNA/RNA sequences is a fundamental task in bioinformatics and sequence analysis. These alignments may contain invaluable information that scientists need to predict the sequences' structures, determine the evolutionary relationships between them, or discover drug-like compounds that can bind to the sequences. Unfortunately, multiple sequence alignment (MSA) is NP-Complete. In addition, the lack of a reliable scoring method makes it very hard to align the sequences reliably and to evaluate the alignment outcomes. In this dissertation, we have designed a new scoring method for use in multiple sequence alignment. Our scoring method encapsulates stereo-chemical properties of sequence residues and their substitution probabilities into a tree-structured scoring scheme. This new technique provides a reliable scoring scheme with low computational complexity. In addition to the new scoring scheme, we have designed an overlapping sequence clustering algorithm for use in our three new multiple sequence alignment algorithms. One of our alignment algorithms uses a dynamic weighted guidance tree to perform multiple sequence alignment in a progressive fashion. The use of a dynamic weighted tree allows errors in the early alignment stages to be corrected in subsequent stages. The other two algorithms utilize sequence knowledge-bases and sequence consistency to produce biologically meaningful sequence alignments. To improve the speed of multiple sequence alignment, we have developed a parallel algorithm that can be deployed on reconfigurable computer models. Analytically, our parallel algorithm is the fastest progressive multiple sequence alignment algorithm.
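The dissertation's tree-structured scoring scheme is not detailed in the abstract; as a point of reference, the C++ sketch below shows the standard sum-of-pairs (SP) score that new MSA scoring functions are typically compared against. The substitution scores and gap penalty are toy values, not any published matrix.

// Sum-of-pairs score of a multiple alignment: for each column, sum the
// pairwise scores over all pairs of rows, then sum over columns.
#include <cstdio>
#include <string>
#include <vector>

static int pairScore(char a, char b) {
    const int gapPenalty = -2;
    if (a == '-' && b == '-') return 0;          // gap-gap pairs score nothing
    if (a == '-' || b == '-') return gapPenalty; // residue-gap pair
    return (a == b) ? 2 : -1;                    // toy match/mismatch scores
}

static int sumOfPairs(const std::vector<std::string>& alignment) {
    int total = 0;
    size_t cols = alignment.front().size();
    for (size_t c = 0; c < cols; ++c)
        for (size_t i = 0; i < alignment.size(); ++i)
            for (size_t j = i + 1; j < alignment.size(); ++j)
                total += pairScore(alignment[i][c], alignment[j][c]);
    return total;
}

int main() {
    // A toy three-way DNA alignment (all rows must have equal length).
    std::vector<std::string> alignment = {"ACG-T", "AC-GT", "ACGGT"};
    std::printf("sum-of-pairs score: %d\n", sumOfPairs(alignment));
}

SP scoring is quadratic in the number of sequences per column, which is one motivation for lower-complexity schemes like the tree-structured one the dissertation proposes.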
50

Jacquiot, Olivier. "Evaluation d'architectures parallèles à mémoire virtuelle partagée distribuée : étude et réalisation d'un émulateur." Phd thesis, Grenoble INPG, 1996. http://tel.archives-ouvertes.fr/tel-00004995.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
The main goal of this thesis is to study and build an efficient emulator of parallel machines equipped with a distributed shared virtual memory. This emulator must make it possible to evaluate the load that machines of this type induce on the interconnection network, in order to choose the best topology for it. To this end, the work is divided into two parts. The first is a survey of the range of techniques that can be used when building a memory hierarchy or when maintaining the coherence of the data held in that hierarchy. The second part describes how the emulator works. For the emulator to perform well, it must be able to vary a large number of parameters of the emulated machine and to run a large number of applications of significant size. To achieve this, we use a technique that actually executes the instructions and simulates only the page transfers over the network. The emulator's parameters are the number of processors, the characteristics of the network (bandwidth, latency), and the type of coherence maintenance used (5 possible). As for the applications, their size can be varied and, for some of them, the distribution of their data. The emulator runs on top of a MACH micro-kernel and a UNIX server. It exploits certain features of the MACH micro-kernel, in particular external pagers.
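The abstract names the emulator's tunable parameters (processor count, network bandwidth and latency, coherence protocol) but not its interface; this hypothetical C++ sketch just makes that parameter space concrete. All type and field names are invented, the five protocol names are placeholders for the five unnamed schemes, and the page-transfer cost model is a simple latency-plus-transfer assumption, not the thesis's own model.

#include <cstdio>

enum class CoherenceProtocol {            // five schemes are supported per the
    WriteInvalidate, WriteUpdate,         // abstract, but not named there;
    MigrateOnly, ReadReplicate,           // these identifiers are placeholders
    HomeBased
};

struct EmulationConfig {
    int processors;               // number of emulated processors
    double networkBandwidthMBps;  // interconnection-network throughput
    double networkLatencyUs;      // per-message latency, microseconds
    CoherenceProtocol protocol;   // coherence-maintenance scheme
    int pageSizeBytes;            // granularity of the simulated page traffic
};

// Estimated cost of shipping one page over the network: a plain
// latency + size/bandwidth model (an assumption, not the thesis's model).
static double pageTransferUs(const EmulationConfig& c) {
    return c.networkLatencyUs +
           (c.pageSizeBytes / (c.networkBandwidthMBps * 1e6)) * 1e6;
}

int main() {
    // One point in the parameter space a network-load experiment might sweep.
    EmulationConfig cfg{16, 100.0, 50.0,
                        CoherenceProtocol::WriteInvalidate, 4096};
    std::printf("%d processors, one 4 KB page transfer: %.1f us\n",
                cfg.processors, pageTransferUs(cfg));
}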
