Contents
A selection of scholarly literature on the topic "Parallel code optimization"
Cite a source in APA, MLA, Chicago, Harvard, and other citation styles
Consult the lists of current articles, books, dissertations, reports, and other scholarly sources on the topic "Parallel code optimization".
Next to every work in the bibliography there is an "Add to bibliography" option. Use it, and the bibliographic reference for the chosen work is generated automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).
You can also download the full text of the scholarly publication as a PDF and read an online annotation of the work, if the relevant parameters are present in the metadata.
Journal articles on the topic "Parallel code optimization"
Özcan, Ender, and Esin Onbaşioğlu. "Memetic Algorithms for Parallel Code Optimization". International Journal of Parallel Programming 35, no. 1 (December 2, 2006): 33–61. http://dx.doi.org/10.1007/s10766-006-0026-x.
Luo, Hao, Guoyang Chen, Pengcheng Li, Chen Ding, and Xipeng Shen. "Data-centric combinatorial optimization of parallel code". ACM SIGPLAN Notices 51, no. 8 (November 9, 2016): 1–2. http://dx.doi.org/10.1145/3016078.2851182.
Bailey, Duane A., Janice E. Cuny, and Bruce B. MacLeod. "Reducing communication overhead: A parallel code optimization". Journal of Parallel and Distributed Computing 4, no. 5 (October 1987): 505–20. http://dx.doi.org/10.1016/0743-7315(87)90021-9.
Shang, Zhi. "Large-Scale CFD Parallel Computing Dealing with Massive Mesh". Journal of Engineering 2013 (2013): 1–6. http://dx.doi.org/10.1155/2013/850148.
Özturan, Can, Balaram Sinharoy, and Boleslaw K. Szymanski. "Compiler Technology for Parallel Scientific Computation". Scientific Programming 3, no. 3 (1994): 201–25. http://dx.doi.org/10.1155/1994/243495.
Kiselev, E. A., P. N. Telegin, and A. V. Baranov. "Impact of Parallel Code Optimization on Computer Power Consumption". Lobachevskii Journal of Mathematics 44, no. 12 (December 2023): 5306–19. http://dx.doi.org/10.1134/s1995080223120211.
Safarik, Jakub, and Vaclav Snasel. "Acceleration of Particle Swarm Optimization with AVX Instructions". Applied Sciences 13, no. 2 (January 4, 2023): 734. http://dx.doi.org/10.3390/app13020734.
Chowdhary, K. R., Rajendra Purohit, and Sunil Dutt Purohit. "Source-to-source translation for code-optimization". Journal of Information and Optimization Sciences 44, no. 3 (2023): 407–16. http://dx.doi.org/10.47974/jios-1350.
Wang, Shengyue, Pen-Chung Yew, and Antonia Zhai. "Code Transformations for Enhancing the Performance of Speculatively Parallel Threads". Journal of Circuits, Systems and Computers 21, no. 2 (April 2012): 1240008. http://dx.doi.org/10.1142/s0218126612400087.
Siow, C. L., Jaswar, and Efi Afrizal. "Computational Fluid Dynamic Using Parallel Loop of Multi-Cores Processor". Applied Mechanics and Materials 493 (January 2014): 80–85. http://dx.doi.org/10.4028/www.scientific.net/amm.493.80.
Der volle Inhalt der QuelleDissertationen zum Thema "Parallel code optimization"
Cordeiro, Silvio Ricardo. „Code profiling and optimization in transactional memory systems“. reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2014. http://hdl.handle.net/10183/97866.
Transactional Memory has shown itself to be a promising paradigm for the implementation of shared-memory concurrent applications that eschew a lock-based model of data synchronization. Rather than conditioning exclusive access on the value of a lock that is shared across concurrent threads, Transactional Memory attempts to execute critical sections optimistically, rolling back the modifications in the event of a data access conflict. However, while the lock-based approach has acquired a significant body of debugging, profiling and automated optimization tools (as one of the oldest and most researched synchronization techniques), the field of Transactional Memory is still comparably recent, and programmers are usually tasked with an unguided manual tuning of their transactional applications when facing efficiency problems. We propose a system in which code profiling in a simulated hardware implementation of Transactional Memory is used to characterize a transactional application, which forms the basis for the automated tuning of the underlying speculative system for the efficient execution of that particular application. We also propose a profile-guided approach to the scheduling of threads in a software-based implementation of Transactional Memory, using collected data to predict the likelihood of conflicts and determine what thread to schedule based on this prediction. We present the results achieved under both designs.
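For readers unfamiliar with the execution model this dissertation tunes, the following minimal C++ sketch illustrates the optimistic read-validate-commit pattern at the heart of Transactional Memory. It is our own illustration under simplifying assumptions (a single version word guarding a single datum); the names version, shared_value, and transactional_increment are invented here and do not come from the dissertation.

```cpp
// Minimal sketch of optimistic execution: read a version, do speculative
// work, and commit only if no conflicting writer intervened; otherwise
// roll back by retrying with fresh state.
#include <atomic>

std::atomic<unsigned> version{0};   // even = quiescent, odd = commit in flight
std::atomic<int> shared_value{0};   // datum guarded by the version word

void transactional_increment() {
    for (;;) {
        unsigned v = version.load(std::memory_order_acquire);
        if (v & 1u) continue;                            // commit in flight: retry
        int snapshot = shared_value.load(std::memory_order_relaxed);
        int result = snapshot + 1;                       // speculative "critical section"
        unsigned expected = v;
        // Try to enter the commit phase; the CAS fails if another
        // transaction committed first -- the rollback-and-retry case.
        if (version.compare_exchange_weak(expected, v + 1,
                                          std::memory_order_acq_rel)) {
            shared_value.store(result, std::memory_order_relaxed);
            version.store(v + 2, std::memory_order_release);
            return;                                      // commit succeeded
        }
    }
}
```

Under contention this loop wastes the speculative work and retries, which is exactly the kind of conflict behavior the proposed profiling aims to expose.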
Hong, Changwan. "Code Optimization on GPUs". The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1557123832601533.
Faber, Peter. "Code Optimization in the Polyhedron Model - Improving the Efficiency of Parallel Loop Nests". Universität Passau, 2007. http://www.opus-bayern.de/uni-passau/volltexte/2008/1251/.
Fassi, Imen. "XFOR (Multifor): A new programming structure to ease the formulation of efficient loop optimizations". Thesis, Strasbourg, 2015. http://www.theses.fr/2015STRAD043/document.
We propose a new programming structure named XFOR (Multifor), dedicated to data-reuse-aware programming. It allows several for-loops to be handled simultaneously and their respective iteration domains to be mapped onto each other. Additionally, XFOR eases the application and composition of loop transformations. Experiments show that XFOR codes provide significant speed-ups when compared to the original code versions, but also to the Pluto-optimized versions. We implemented the XFOR structure through the development of three software tools: (1) a source-to-source compiler named IBB (for Iterate-But-Better!), which automatically translates any C/C++ code containing XFOR-loops into an equivalent code where XFOR-loops have been translated into for-loops; IBB also benefits from optimizations implemented in the polyhedral code generator CLooG, which is invoked by IBB to generate for-loops from an OpenScop specification; (2) an XFOR programming environment named XFOR-WIZARD, which assists the programmer in rewriting a program with classical for-loops into an equivalent but more efficient program using XFOR-loops; (3) a tool named XFORGEN, which automatically generates XFOR-loops from any OpenScop representation of transformed loop nests produced by an automatic optimizer.
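XFOR itself is a language extension, so standard C++ can only approximate the idea, but the following sketch (our own example; neither XFOR syntax nor IBB output) shows the kind of iteration-domain mapping the abstract describes: two loops over the same domain are overlaid so that a value produced by the first statement is reused by the second while still in cache.

```cpp
// Two separate loop nests: a[i] is written in the first loop and re-read
// from memory in the second, long after it has left the cache.
#include <cstddef>

void separate(double* a, double* b, std::size_t n) {
    for (std::size_t i = 1; i < n; ++i) a[i] = a[i - 1] + 1.0;
    for (std::size_t i = 1; i < n; ++i) b[i] = a[i] * 2.0;
}

// Overlaid ("fused") version: the two iteration domains are mapped onto
// each other at offset zero, so b consumes a[i] immediately after it is
// produced. This fusion is legal here because the second statement only
// reads values the first has already written.
void fused(double* a, double* b, std::size_t n) {
    for (std::size_t i = 1; i < n; ++i) {
        a[i] = a[i - 1] + 1.0;
        b[i] = a[i] * 2.0;
    }
}
```

XFOR lets the programmer express such overlaps (including non-zero offsets) declaratively instead of rewriting the loops by hand.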
Irigoin, François. "Partitionnement des boucles imbriquées : une technique d'optimisation pour les programmes scientifiques" [Partitioning of nested loops: an optimization technique for scientific programs]. Paris 6, 1987. http://www.theses.fr/1987PA066437.
He, Guanlin. "Parallel algorithms for clustering large datasets on CPU-GPU heterogeneous architectures". Electronic thesis, Université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG062.
Clustering, which aims at achieving natural groupings of data, is a fundamental and challenging task in machine learning and data mining. Numerous clustering methods have been proposed in the past, among which k-means is one of the most famous and most commonly used, due to its simplicity and efficiency. Spectral clustering is a more recent approach that usually achieves higher clustering quality than k-means, but classical spectral clustering algorithms suffer from a lack of scalability because of their high complexity in terms of number of operations and of memory space requirements. This scalability challenge can be addressed by applying approximation methods or by employing parallel and distributed computing. The objective of this thesis is to accelerate spectral clustering and make it scalable to large datasets by combining representatives-based approximation with parallel computing on CPU-GPU platforms. Considering different scenarios, we propose several parallel processing chains for large-scale spectral clustering. We design optimized parallel algorithms and implementations for each module of the proposed chains: parallel k-means on CPU and GPU, parallel spectral clustering on GPU using a sparse storage format, parallel filtering of data noise on GPU, etc. Our experiments reach high performance and validate the scalability of each module and of the complete chains.
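The scalability argument above rests on the fact that the dominant kernel of k-means, assigning each point to its nearest centroid, is embarrassingly parallel: every point's search is independent. As a rough illustration (our own OpenMP sketch with invented names, not the thesis's CPU-GPU implementation):

```cpp
// Assign each d-dimensional point to its nearest centroid. Assumes
// points is n x d and centroids is k x d, both row-major, and labels
// has n entries.
#include <cstddef>
#include <limits>
#include <vector>

void assign_clusters(const std::vector<double>& points,
                     const std::vector<double>& centroids,
                     std::vector<int>& labels,
                     std::size_t n, std::size_t k, std::size_t d) {
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(n); ++i) {
        double best = std::numeric_limits<double>::max();
        int best_c = 0;
        for (std::size_t c = 0; c < k; ++c) {
            double dist = 0.0;                 // squared Euclidean distance
            for (std::size_t j = 0; j < d; ++j) {
                double diff = points[i * d + j] - centroids[c * d + j];
                dist += diff * diff;
            }
            if (dist < best) { best = dist; best_c = static_cast<int>(c); }
        }
        labels[i] = best_c;                    // independent per point
    }
}
```

The same structure maps naturally onto a GPU, with one thread (or thread block) per point, which is what makes k-means a good candidate for the heterogeneous chains the thesis builds.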
Fang, Juing. "Décodage pondéré des codes en blocs et quelques sujets sur la complexité du décodage" [Soft-decision decoding of block codes and some topics on decoding complexity]. Paris, ENST, 1987. http://www.theses.fr/1987ENST0005.
Tagliavini, Giuseppe. "Optimization Techniques for Parallel Programming of Embedded Many-Core Computing Platforms". Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amsdottorato.unibo.it/8068/1/TESI.pdf.
Drebes, Andi. "Dynamic optimization of data-flow task-parallel applications for large-scale NUMA systems". Thesis, Paris 6, 2015. http://www.theses.fr/2015PA066330/document.
Within the last decade, microprocessor development reached a point at which higher clock rates and more complex micro-architectures became less energy-efficient, such that power consumption and energy density were pushed beyond reasonable limits. As a consequence, the industry has shifted to more energy-efficient multi-core designs, integrating multiple processing units (cores) on a single chip. The number of cores is expected to grow exponentially, and future systems are expected to integrate thousands of processing units. In order to provide sufficient memory bandwidth in these systems, main memory is physically distributed over multiple memory controllers with non-uniform access to memory (NUMA). Past research has identified programming models based on fine-grained, dependent tasks as a key technique to unleash the parallel processing power of massively parallel general-purpose computing architectures. However, the execution of task-parallel programs on architectures with non-uniform memory access and the dynamic optimizations needed to mitigate NUMA effects have received little attention. In this thesis, we explore the main factors affecting performance and data locality of task-parallel programs and propose a set of transparent, portable and fully automatic on-line mechanisms for mapping tasks to cores and data to memory controllers in order to improve data locality and performance. Placement decisions are based on information about point-to-point data dependences, readily available in the run-time systems of modern task-parallel programming frameworks. The experimental evaluation of these techniques is conducted on our implementation in the run-time of the OpenStream language and on a set of high-performance scientific benchmarks. Finally, we designed and implemented Aftermath, a tool for performance analysis and debugging of task-parallel applications and run-times.
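The placement mechanisms described here consume point-to-point dependence information that task-parallel run-times already track. As a rough illustration of what such dependences look like at the source level, the sketch below uses OpenMP task depend clauses (the thesis itself targets the OpenStream run-time; the function and variable names are our own, and a and b are assumed to have the same length):

```cpp
// Two tasks linked by an explicit producer/consumer edge on array a.
// The depend clauses hand the run-time exactly the point-to-point edge
// a NUMA-aware mapper needs: it can co-locate the consumer task with
// the memory controller that holds a's pages.
#include <cstddef>
#include <vector>

void producer_consumer(std::vector<double>& a, std::vector<double>& b) {
    double* pa = a.data();        // plain pointers used as dependence handles
    double* pb = b.data();
    std::size_t n = a.size();
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: pa[0:n])                  // producer of a
        for (std::size_t i = 0; i < n; ++i) pa[i] *= 2.0;

        #pragma omp task depend(in: pa[0:n]) depend(out: pb[0:n])  // consumer
        for (std::size_t i = 0; i < n; ++i) pb[i] = pa[i] + 1.0;
    }   // implicit barrier: both tasks complete before returning
}
```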
Child, Ryan. "Performance and Power Optimization of Parallel Discrete Event Simulations Using DVFS". University of Cincinnati / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1342730759.
Books on the topic "Parallel code optimization"
Faber, Peter. Code Optimization in the Polyhedron Model - Improving the Efficiency of Parallel Loop Nests. Lulu Press, Inc., 2009.
Faber, Peter. Code Optimization in the Polyhedron Model - Improving the Efficiency of Parallel Loop Nests. Paperback edition. Lulu Press, Inc., 2009.
Performance Optimization of Numerically Intensive Codes (Software, Environments and Tools). Society for Industrial Mathematics, 2001.
Bäck, Thomas. Evolutionary Algorithms in Theory and Practice. Oxford University Press, 1996. http://dx.doi.org/10.1093/oso/9780195099713.001.0001.
Book chapters on the topic "Parallel code optimization"
Dekel, Eliezer, Simeon Ntafos, and Shie-Tung Peng. "Parallel tree techniques and code optimization". In VLSI Algorithms and Architectures, 205–16. Berlin, Heidelberg: Springer Berlin Heidelberg, 1986. http://dx.doi.org/10.1007/3-540-16766-8_18.
Andersson, Niclas, and Peter Fritzson. "Object Oriented Mathematical Modelling and Compilation to Parallel Code". In Applied Optimization, 99–182. Boston, MA: Springer US, 1997. http://dx.doi.org/10.1007/978-1-4613-3400-2_5.
Sarkar, Vivek. "Challenges in Code Optimization of Parallel Programs". In Lecture Notes in Computer Science, 1. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009. http://dx.doi.org/10.1007/978-3-642-00722-4_1.
Taylor, Ryan, and Xiaoming Li. "A Code Merging Optimization Technique for GPU". In Languages and Compilers for Parallel Computing, 218–36. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-36036-7_15.
Martinez Caamaño, Juan Manuel, Willy Wolff, and Philippe Clauss. "Code Bones: Fast and Flexible Code Generation for Dynamic and Speculative Polyhedral Optimization". In Euro-Par 2016: Parallel Processing, 225–37. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-43659-3_17.
Avis, David, and Gary Roumanis. "A Portable Parallel Implementation of the lrs Vertex Enumeration Code". In Combinatorial Optimization and Applications, 414–29. Cham: Springer International Publishing, 2013. http://dx.doi.org/10.1007/978-3-319-03780-6_36.
Wcisło, R., J. Kitowski, and J. Mościński. "Parallelization of a code for animation of multi-object system". In Applied Parallel Computing: Industrial Computation and Optimization, 697–709. Berlin, Heidelberg: Springer Berlin Heidelberg, 1996. http://dx.doi.org/10.1007/3-540-62095-8_75.
Damani, Sana, and Vivek Sarkar. "Common Subexpression Convergence: A New Code Optimization for SIMT Processors". In Languages and Compilers for Parallel Computing, 64–73. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-72789-5_5.
Epshteyn, Arkady, María Jesús Garzaran, Gerald DeJong, David Padua, Gang Ren, Xiaoming Li, Kamen Yotov, and Keshav Pingali. "Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization". In Languages and Compilers for Parallel Computing, 259–73. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/978-3-540-69330-7_18.
Taubert, Oskar, Marie Weiel, Daniel Coquelin, Anis Farshian, Charlotte Debus, Alexander Schug, Achim Streit, and Markus Götz. "Massively Parallel Genetic Optimization Through Asynchronous Propagation of Populations". In Lecture Notes in Computer Science, 106–24. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-32041-5_6.
Der volle Inhalt der QuelleKonferenzberichte zum Thema "Parallel code optimization"
Sarkar, Vivek. "Code optimization of parallel programs". In Proceedings of the Sixth Annual IEEE/ACM International Symposium on Code Generation and Optimization. New York, NY, USA: ACM Press, 2008. http://dx.doi.org/10.1145/1356058.1356087.
Wang, Fang, Shixin Cheng, Wei Xu, and Haifeng Wang. "Design and Code Optimization of Parallel Concatenated Gallager Codes". In 2007 IEEE 18th International Symposium on Personal, Indoor and Mobile Radio Communications. IEEE, 2007. http://dx.doi.org/10.1109/pimrc.2007.4394240.
Buck, Ian. "GPU Computing: Programming a Massively Parallel Processor". In International Symposium on Code Generation and Optimization (CGO'07). IEEE, 2007. http://dx.doi.org/10.1109/cgo.2007.13.
Soliman, Karim, Marwa El Shenawy, and Ahmed Abou El Farag. "Loop unrolling effect on parallel code optimization". In ICFNDS'18: International Conference on Future Networks and Distributed Systems. New York, NY, USA: ACM, 2018. http://dx.doi.org/10.1145/3231053.3231060.
Luo, Hao, Guoyang Chen, Pengcheng Li, Chen Ding, and Xipeng Shen. "Data-centric combinatorial optimization of parallel code". In PPoPP '16: 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. New York, NY, USA: ACM, 2016. http://dx.doi.org/10.1145/2851141.2851182.
Dubey, A., and T. Clune. "Optimization of a parallel pseudospectral MHD code". In Proceedings, Frontiers '99: Seventh Symposium on the Frontiers of Massively Parallel Computation. IEEE, 1999. http://dx.doi.org/10.1109/fmpc.1999.750602.
Suriana, Patricia, Andrew Adams, and Shoaib Kamil. "Parallel associative reductions in Halide". In 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 2017. http://dx.doi.org/10.1109/cgo.2017.7863747.
Zhang, Yongpeng, and F. Mueller. "HiDP: A hierarchical data parallel language". In 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 2013. http://dx.doi.org/10.1109/cgo.2013.6494994.
Dewey, Kyle, Vineeth Kashyap, and Ben Hardekopf. "A parallel abstract interpreter for JavaScript". In 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 2015. http://dx.doi.org/10.1109/cgo.2015.7054185.
Lee, Yunsup, R. Krashinsky, V. Grover, S. W. Keckler, and K. Asanovic. "Convergence and scalarization for data-parallel architectures". In 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 2013. http://dx.doi.org/10.1109/cgo.2013.6494995.
Der volle Inhalt der QuelleBerichte der Organisationen zum Thema "Parallel code optimization"
Hisley, Dixie M. Enabling Programmer-Controlled Combined Memory Consistency for Parallel Code Optimization. Fort Belvoir, VA: Defense Technical Information Center, Juli 2003. http://dx.doi.org/10.21236/ada416794.