Theses on the topic "Parallel code optimization"
Consult the best theses for your research on the topic "Parallel code optimization".
Cordeiro, Silvio Ricardo. "Code profiling and optimization in transactional memory systems". Biblioteca Digital de Teses e Dissertações da UFRGS, 2014. http://hdl.handle.net/10183/97866.
Transactional Memory has shown itself to be a promising paradigm for the implementation of shared-memory concurrent applications that eschew a lock-based model of data synchronization. Rather than conditioning exclusive access on the value of a lock that is shared across concurrent threads, Transactional Memory attempts to execute critical sections optimistically, rolling back the modifications in the event of a data access conflict. However, while the lock-based approach has acquired a significant body of debugging, profiling and automated optimization tools (as one of the oldest and most researched synchronization techniques), the field of Transactional Memory is still comparably recent, and programmers are usually tasked with an unguided manual tuning of their transactional applications when facing efficiency problems. We propose a system in which code profiling in a simulated hardware implementation of Transactional Memory is used to characterize a transactional application, which forms the basis for the automated tuning of the underlying speculative system for the efficient execution of that particular application. We also propose a profile-guided approach to the scheduling of threads in a software-based implementation of Transactional Memory, using collected data to predict the likelihood of conflicts and determine what thread to schedule based on this prediction. We present the results achieved under both designs.
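To make the optimistic execution model described in this abstract concrete, here is a minimal sketch of the execute-then-validate idea on a single shared word: speculative work commits with a compare-and-swap, and a failed commit is counted as an abort, the kind of profile datum an automated tuner would consume. This is illustrative C++ with invented names, not the thesis's simulated hardware TM.

```cpp
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

// Illustrative only: one shared word updated optimistically by several threads.
std::atomic<long> shared_counter{0};
std::atomic<long> abort_count{0};

void transactional_increment(int times) {
    for (int i = 0; i < times; ++i) {
        for (;;) {
            long snapshot = shared_counter.load();   // "read set": a single word
            long result   = snapshot + 1;            // speculative computation
            // Commit: succeeds only if no conflicting writer intervened.
            if (shared_counter.compare_exchange_strong(snapshot, result))
                break;
            abort_count.fetch_add(1);                // conflict: roll back and retry
        }
    }
}

int main() {
    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t)
        workers.emplace_back(transactional_increment, 100000);
    for (auto& w : workers) w.join();
    // The abort rate is the kind of profile datum an automated tuner could use.
    std::cout << "value=" << shared_counter << " aborts=" << abort_count << "\n";
}
```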
Hong, Changwan. "Code Optimization on GPUs". The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1557123832601533.
Faber, Peter. "Code Optimization in the Polyhedron Model - Improving the Efficiency of Parallel Loop Nests". 2007. http://www.opus-bayern.de/uni-passau/volltexte/2008/1251/.
Fassi, Imen. "XFOR (Multifor) : A new programming structure to ease the formulation of efficient loop optimizations". Thesis, Strasbourg, 2015. http://www.theses.fr/2015STRAD043/document.
We propose a new programming structure named XFOR (Multifor), dedicated to data-reuse-aware programming. It allows several for-loops to be handled simultaneously and their respective iteration domains to be mapped onto one another. Additionally, XFOR eases the application and composition of loop transformations. Experiments show that XFOR codes provide significant speed-ups when compared to the original code versions, and also to the Pluto-optimized versions. We implemented the XFOR structure through the development of three software tools: (1) a source-to-source compiler named IBB (Iterate-But-Better!), which automatically translates any C/C++ code containing XFOR-loops into an equivalent code where the XFOR-loops have been translated into for-loops. IBB also benefits from optimizations implemented in the polyhedral code generator CLooG, which it invokes to generate for-loops from an OpenScop specification; (2) an XFOR programming environment named XFOR-WIZARD that assists the programmer in rewriting a program with classical for-loops into an equivalent but more efficient program using XFOR-loops; (3) a tool named XFORGEN, which generates XFOR-loops from any OpenScop representation of transformed loop nests produced by an automatic optimizer.
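Since this abstract is about a loop-level programming construct, a small illustration may help. The plain C++ sketch below hand-fuses a producer loop with a shifted consumer loop to exploit data reuse, roughly the kind of overlap of iteration domains that XFOR lets the programmer express declaratively. The loop bodies and the offset of one iteration are invented for illustration; this is not IBB's actual output.

```cpp
#include <cstdio>
#include <vector>

int main() {
    const int n = 1000;
    std::vector<double> a(n), b(n, 0.0);
    // Original program: for (i) a[i] = f(i);  then  for (i in 1..n-2) b[i] = stencil(a).
    // Fused: run the consumer one iteration behind the producer, so a[i-2..i]
    // are still hot in cache when the corresponding b element is computed.
    for (int i = 0; i < n; ++i) {
        a[i] = 0.5 * i;                                   // producer body
        if (i >= 2)                                       // consumer body, shifted
            b[i - 1] = (a[i - 2] + a[i - 1] + a[i]) / 3.0;
    }
    std::printf("b[1]=%f b[%d]=%f\n", b[1], n - 2, b[n - 2]);
}
```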
Irigoin, François. "Partitionnement des boucles imbriquées : une technique d'optimisation pour les programmes scientifiques". Paris 6, 1987. http://www.theses.fr/1987PA066437.
He, Guanlin. "Parallel algorithms for clustering large datasets on CPU-GPU heterogeneous architectures". Electronic Thesis or Diss., université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG062.
Clustering, which aims at achieving natural groupings of data, is a fundamental and challenging task in machine learning and data mining. Numerous clustering methods have been proposed in the past, among which k-means is one of the most famous and commonly used due to its simplicity and efficiency. Spectral clustering is a more recent approach that usually achieves higher clustering quality than k-means. However, classical spectral clustering algorithms suffer from a lack of scalability due to their high complexity in terms of number of operations and memory space requirements. This scalability challenge can be addressed by applying approximation methods or by employing parallel and distributed computing. The objective of this thesis is to accelerate spectral clustering and make it scalable to large datasets by combining representatives-based approximation with parallel computing on CPU-GPU platforms. Considering different scenarios, we propose several parallel processing chains for large-scale spectral clustering. We design optimized parallel algorithms and implementations for each module of the proposed chains: parallel k-means on CPU and GPU, parallel spectral clustering on GPU using a sparse storage format, parallel filtering of data noise on GPU, etc. Our various experiments achieve high performance and validate the scalability of each module and of the complete chains.
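As a rough illustration of one module of such processing chains, the sketch below parallelizes the k-means assignment step, in which each point independently finds its nearest centroid, across two CPU threads. It uses 1-D points and invented names for brevity; the thesis's actual CPU-GPU implementations are far more elaborate.

```cpp
#include <cmath>
#include <cstdio>
#include <functional>
#include <thread>
#include <vector>

// Assign each point in [lo, hi) to its nearest centroid; iterations are
// independent, so disjoint ranges can run on different threads (or GPU blocks).
void assign_range(const std::vector<double>& pts, const std::vector<double>& ctr,
                  std::vector<int>& label, std::size_t lo, std::size_t hi) {
    for (std::size_t i = lo; i < hi; ++i) {
        int best = 0;
        double bestd = std::fabs(pts[i] - ctr[0]);
        for (std::size_t k = 1; k < ctr.size(); ++k) {
            double d = std::fabs(pts[i] - ctr[k]);
            if (d < bestd) { bestd = d; best = static_cast<int>(k); }
        }
        label[i] = best;
    }
}

int main() {
    std::vector<double> pts{0.1, 0.2, 5.1, 5.3, 9.8};
    std::vector<double> ctr{0.0, 5.0, 10.0};
    std::vector<int> label(pts.size());
    std::size_t mid = pts.size() / 2;
    // Split the point range across two workers.
    std::thread t(assign_range, std::cref(pts), std::cref(ctr),
                  std::ref(label), std::size_t{0}, mid);
    assign_range(pts, ctr, label, mid, pts.size());
    t.join();
    for (std::size_t i = 0; i < pts.size(); ++i)
        std::printf("point %zu -> cluster %d\n", i, label[i]);
}
```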
Fang, Juing. "Décodage pondère des codes en blocs et quelques sujets sur la complexité du décodage". Paris, ENST, 1987. http://www.theses.fr/1987ENST0005.
Tagliavini, Giuseppe. "Optimization Techniques for Parallel Programming of Embedded Many-Core Computing Platforms". Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amsdottorato.unibo.it/8068/1/TESI.pdf.
Texto completoDrebes, Andi. "Dynamic optimization of data-flow task-parallel applications for large-scale NUMA systems". Thesis, Paris 6, 2015. http://www.theses.fr/2015PA066330/document.
Within the last decade, microprocessor development reached a point at which higher clock rates and more complex micro-architectures became less energy-efficient, such that power consumption and energy density were pushed beyond reasonable limits. As a consequence, the industry has shifted to more energy-efficient multi-core designs, integrating multiple processing units (cores) on a single chip. The number of cores is expected to grow exponentially, and future systems are expected to integrate thousands of processing units. In order to provide sufficient memory bandwidth in these systems, main memory is physically distributed over multiple memory controllers with non-uniform access to memory (NUMA). Past research has identified programming models based on fine-grained, dependent tasks as a key technique to unleash the parallel processing power of massively parallel general-purpose computing architectures. However, the execution of task-parallel programs on architectures with non-uniform memory access, and the dynamic optimizations needed to mitigate NUMA effects, have received little attention. In this thesis, we explore the main factors affecting the performance and data locality of task-parallel programs, and we propose a set of transparent, portable and fully automatic on-line mechanisms that map tasks to cores and data to memory controllers in order to improve data locality and performance. Placement decisions are based on information about point-to-point data dependences, readily available in the run-time systems of modern task-parallel programming frameworks. The experimental evaluation of these techniques is conducted on our implementation in the run-time of the OpenStream language and a set of high-performance scientific benchmarks. Finally, we designed and implemented Aftermath, a tool for performance analysis and debugging of task-parallel applications and run-times.
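The placement idea, deciding where a task should run from dependence information the run-time already has, can be reduced to a toy heuristic: run each task on the NUMA node that holds most of its input bytes. The data structures below are hypothetical and do not reflect OpenStream's actual API.

```cpp
#include <cstdio>
#include <vector>

// One entry per incoming point-to-point dependence: where the data lives
// and how many bytes the task will read from it. (Hypothetical structure.)
struct Input { int home_node; std::size_t bytes; };

// Pick the NUMA node that owns the largest share of the task's input data,
// so the task's reads are mostly local.
int pick_node(const std::vector<Input>& inputs, int num_nodes) {
    std::vector<std::size_t> weight(num_nodes, 0);
    for (const auto& in : inputs) weight[in.home_node] += in.bytes;
    int best = 0;
    for (int n = 1; n < num_nodes; ++n)
        if (weight[n] > weight[best]) best = n;
    return best;
}

int main() {
    std::vector<Input> deps{{0, 4096}, {1, 65536}, {1, 8192}};
    std::printf("place task on node %d\n", pick_node(deps, 2));  // prints node 1
}
```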
Child, Ryan. "Performance and Power Optimization of Parallel Discrete Event Simulations Using DVFS". University of Cincinnati / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1342730759.
Belgin, Mehmet. "Structure-based Optimizations for Sparse Matrix-Vector Multiply". Diss., Virginia Tech, 2010. http://hdl.handle.net/10919/30260.
Gao, Xiaoyang. "Integrated compiler optimizations for tensor contractions". Columbus, Ohio : Ohio State University, 2008. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1198874631.
Darwish, Mohammed. "Lot-sizing and scheduling optimization using genetic algorithm". Thesis, Högskolan i Skövde, Institutionen för ingenjörsvetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-17045.
Texto completoТевяшев, А. Д. y Д. І. Гольдинер. "System Analysis of The Parallel Execution Problem". Thesis, 2019. http://openarchive.nure.ua/handle/document/11959.
Faber, Peter. "Code optimization in the polyhedron model : improving the efficiency of parallel loop nests". 2008. http://d-nb.info/991047869/34.
Mullapudi, Ravi Teja. "Polymage : Automatic Optimization for Image Processing Pipelines". Thesis, 2015. http://etd.iisc.ac.in/handle/2005/3757.
Texto completoMullapudi, Ravi Teja. "Polymage : Automatic Optimization for Image Processing Pipelines". Thesis, 2015. http://etd.iisc.ernet.in/2005/3757.
Texto completoNandakumar, K. S. "Combining Conditional Constant Propagation And Interprocedural Alias Analysis". Thesis, 1995. https://etd.iisc.ac.in/handle/2005/1739.
Texto completoNandakumar, K. S. "Combining Conditional Constant Propagation And Interprocedural Alias Analysis". Thesis, 1995. http://etd.iisc.ernet.in/handle/2005/1739.
Texto completoSjöblom, William. "Idiom-driven innermost loop vectorization in the presence of cross-iteration data dependencies in the HotSpot C2 compiler". Thesis, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-172789.
Ryoo, Shane. "Program optimization strategies for data-parallel many-core processors". 2008. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3314878.
Source: Dissertation Abstracts International, Volume: 69-05, Section: B, page: 3190. Adviser: Wen-mei W. Hwu. Includes bibliographical references (leaves 137-145). Available on microfilm from ProQuest Information and Learning.
Chen, Shih-Chang (陳世璋). "Developing GEN_BLOCK Redistribution Algorithms and Optimization Techniques on Parallel, Distributed and Multi-Core Systems". Thesis, 2010. http://ndltd.ncl.edu.tw/handle/31998536558283534640.
Chung Hua University, Doctoral Degree Program in Engineering Science, 2010 (ROC year 99).
Parallel computing systems have been used to solve complex scientific problems, extending sequential programming languages with aggregate data structures such as arrays. With improvements in hardware architectures, a parallel system can be a single cluster, multiple clusters, or a multi-cluster with multi-core machines. Under this paradigm, appropriate data distribution is critical to the performance of each phase of a multi-phase program. Because the phases of a program differ from one another, the optimal distribution changes with the characteristics of each phase, as well as with those of the following phase. To achieve good load balancing, improved data locality and reduced inter-processor communication at runtime, data must be redistributed between phases. In this study, formulas for message generation, three scheduling algorithms (for a single cluster, for multiple clusters, and for a multi-cluster system with multi-core machines) and a power-saving technique are proposed for the GEN_BLOCK redistribution problem. The message-generation formulas supply the source, destination and data information that a scheduling algorithm needs before it can produce effective results; each node can use them to obtain this information simply, efficiently and independently. An effective scheduling algorithm for a single cluster system is proposed for heterogeneous environments; it guarantees a minimal number of schedule steps and also shortens communication cost. Multi-cluster computing performs GEN_BLOCK redistribution over complex networks and heterogeneous processors; to suit this architecture, a new scheduling algorithm is proposed that further reduces communication cost by classifying inter-cluster transmissions into three types and scheduling the transmissions within a node together to avoid synchronization delay. When multi-core machines form part of a parallel system, existing scheduling algorithms are unlikely to deliver good performance, and they do not take power saving into account. Therefore, four kinds of transmission time are defined for messages to increase scheduling efficiency, and, while the proposed scheduling algorithm runs, a power-saving technique evaluates the voltage level of each core to save energy on such complex systems.
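The message-generation step described in this abstract can be illustrated with plain interval arithmetic: in a GEN_BLOCK redistribution, every source/destination block pair whose index ranges overlap yields exactly one message covering the intersection. The sketch below assumes block layouts given as prefix bounds; it does not reproduce the thesis's closed-form formulas or its scheduling algorithms.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

struct Msg { int src, dst, begin, end; };  // half-open element range [begin, end)

// Block layouts are given as prefix bounds of size P+1: processor p owns
// elements [bounds[p], bounds[p+1]). Each overlapping (source, destination)
// pair produces one message for the intersection of the two intervals.
std::vector<Msg> messages(const std::vector<int>& src_bounds,
                          const std::vector<int>& dst_bounds) {
    std::vector<Msg> out;
    for (std::size_t s = 0; s + 1 < src_bounds.size(); ++s)
        for (std::size_t d = 0; d + 1 < dst_bounds.size(); ++d) {
            int b = std::max(src_bounds[s], dst_bounds[d]);
            int e = std::min(src_bounds[s + 1], dst_bounds[d + 1]);
            if (b < e) out.push_back({(int)s, (int)d, b, e});
        }
    return out;
}

int main() {
    // 12 elements: source blocks {0..4, 5..8, 9..11}, destination {0..2, 3..8, 9..11}.
    auto msgs = messages({0, 5, 9, 12}, {0, 3, 9, 12});
    for (const auto& m : msgs)
        std::printf("P%d -> P%d : [%d,%d)\n", m.src, m.dst, m.begin, m.end);
}
```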
Nikjah, Reza. "Performance evaluation and protocol design of fixed-rate and rateless coded relaying networks". PhD thesis, 2010. http://hdl.handle.net/10048/1674.