A selection of scholarly literature on the topic "Cluster OpenMP implementations"

Format your source according to APA, MLA, Chicago, Harvard, and other citation styles

Select a source type:

Consult the lists of relevant articles, books, theses, conference papers, and other scholarly sources on the topic "Cluster OpenMP implementations".

Next to every work in the list of references you will find an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication in .pdf format and read its abstract online, if these details are available in the metadata.

Journal articles on the topic "Cluster OpenMP implementations"

1

Saeed, Firas Mahmood, Salwa M. Ali, and Mohammed W. Al-Neama. "A parallel time series algorithm for searching similar sub-sequences." Indonesian Journal of Electrical Engineering and Computer Science 25, no. 3 (March 1, 2022): 1652. http://dx.doi.org/10.11591/ijeecs.v25.i3.pp1652-1661.

Full text of the source
Abstract:
Dynamic time warping (DTW) is an important similarity metric for most time series applications. DTW computations are costly, especially for gigantic sequence databases, which leads to an urgent need to accelerate them. The multi-core cluster systems now available, with their scalability and performance/cost ratio, meet this need for more powerful and efficient performance. This paper proposes a highly efficient parallel vectorized algorithm with high performance for computing DTW, addressed to multi-core clusters using Intel quad-core Xeon co-processors, and deduces an efficient architecture. Implementations employ the potential of both the message passing interface (MPI) and OpenMP libraries. The implementation is based on the OpenMP parallel programming technology and the offload execution mode, in which sub-sequences are prepared on the processor side and uploaded to the co-processor for the DTW computations. The results of experiments confirm the effectiveness of the algorithm.
Styles: APA, Harvard, Vancouver, ISO, etc.
2

Al-Neama, Mohammed W., Naglaa M. Reda, and Fayed F. M. Ghaleb. "An Improved Distance Matrix Computation Algorithm for Multicore Clusters." BioMed Research International 2014 (2014): 1–12. http://dx.doi.org/10.1155/2014/406178.

Full text of the source
Abstract:
Distance matrix has diverse usage in different research areas. Its computation is typically an essential task in most bioinformatics applications, especially in multiple sequence alignment. The gigantic explosion of biological sequence databases leads to an urgent need for accelerating these computations. The DistVect algorithm was introduced in the paper of Al-Neama et al. (in press) as a recent approach to vectorizing distance matrix computation. It showed efficient performance in both sequential and parallel computing. However, the multicore cluster systems now available, with their scalability and performance/cost ratio, meet the need for more powerful and efficient performance. This paper proposes DistVect1 as a highly efficient parallel vectorized algorithm with high performance for computing the distance matrix, addressed to multicore clusters. It reformulates the DistVect1 vectorized algorithm in terms of cluster primitives. It deduces an efficient approach to partitioning and scheduling computations suited to this type of architecture. Implementations employ the potential of both MPI and OpenMP libraries. Experimental results show that the proposed method achieves around a 3-fold speedup over SSE2. Further, it also achieves speedups of more than 9 orders of magnitude compared to the publicly available parallel implementation utilized in ClustalW-MPI.
Styles: APA, Harvard, Vancouver, ISO, etc.
3

Thomas, Nathan, Steven Saunders, Tim Smith, Gabriel Tanase, and Lawrence Rauchwerger. "ARMI: A High Level Communication Library for STAPL." Parallel Processing Letters 16, no. 02 (June 2006): 261–80. http://dx.doi.org/10.1142/s0129626406002617.

Full text of the source
Abstract:
ARMI is a communication library that provides a framework for expressing fine-grain parallelism and mapping it to a particular machine using shared-memory and message passing library calls. The library is an advanced implementation of the RMI protocol and handles low-level details such as scheduling incoming communication and aggregating outgoing communication to coarsen parallelism. These details can be tuned for different platforms to allow user codes to achieve the highest performance possible without manual modification. ARMI is used by STAPL, our generic parallel library, to provide a portable, user transparent communication layer. We present the basic design as well as the mechanisms used in the current Pthreads/OpenMP, MPI implementations and/or a combination thereof. Performance comparisons between ARMI and explicit use of Pthreads or MPI are given on a variety of machines, including an HP-V2200, Origin 3800, IBM Regatta and IBM RS/6000 SP cluster.
Styles: APA, Harvard, Vancouver, ISO, etc.
4

SCHUBERT, GERALD, HOLGER FEHSKE, GEORG HAGER, and GERHARD WELLEIN. "HYBRID-PARALLEL SPARSE MATRIX-VECTOR MULTIPLICATION WITH EXPLICIT COMMUNICATION OVERLAP ON CURRENT MULTICORE-BASED SYSTEMS." Parallel Processing Letters 21, no. 03 (September 2011): 339–58. http://dx.doi.org/10.1142/s0129626411000254.

Full text of the source
Abstract:
We evaluate optimized parallel sparse matrix-vector operations for several representative application areas on widespread multicore-based cluster configurations. First the single-socket baseline performance is analyzed and modeled with respect to basic architectural properties of standard multicore chips. Beyond the single node, the performance of parallel sparse matrix-vector operations is often limited by communication overhead. Starting from the observation that nonblocking MPI is not able to hide communication cost using standard MPI implementations, we demonstrate that explicit overlap of communication and computation can be achieved by using a dedicated communication thread, which may run on a virtual core. Moreover we identify performance benefits of hybrid MPI/OpenMP programming due to improved load balancing even without explicit communication overlap. We compare performance results for pure MPI, the widely used "vector-like" hybrid programming strategies, and explicit overlap on a modern multicore-based cluster and a Cray XE6 system.
Styles: APA, Harvard, Vancouver, ISO, etc.
5

Речкалов, Т. В., and М. Л. Цымблер. "A parallel data clustering algorithm for Intel MIC accelerators." Numerical Methods and Programming (Vychislitel'nye Metody i Programmirovanie), no. 2 (March 28, 2019): 104–15. http://dx.doi.org/10.26089/nummet.v20r211.

Full text of the source
Abstract:
PAM (Partitioning Around Medoids) is a partitioning clustering algorithm where each cluster is represented by an object from the input dataset (called a medoid). Medoid-based clustering is used in a wide range of applications: the segmentation of medical and satellite images, the analysis of DNA microarrays and texts, etc. Currently, there are parallel implementations of PAM for GPU and FPGA systems, but not for Intel Many Integrated Core (MIC) accelerators. In this paper, we propose a novel parallel PhiPAM clustering algorithm for Intel MIC systems. Computations are parallelized by the OpenMP technology. The algorithm exploits a sophisticated memory data layout and loop tiling technique, which allows one to efficiently vectorize computations with Intel MIC. Experiments performed on real data sets show a good scalability of the algorithm.
Styles: APA, Harvard, Vancouver, ISO, etc.
6

Osthoff, Carla, Francieli Zanon Boito, Rodrigo Virote Kassick, Laércio Lima Pilla, Philippe O. A. Navaux, Claudio Schepke, Jairo Panetta, et al. "Atmospheric models hybrid OpenMP/MPI implementation multicore cluster evaluation." International Journal of Information Technology, Communications and Convergence 2, no. 3 (2012): 212. http://dx.doi.org/10.1504/ijitcc.2012.050411.

Full text of the source
Styles: APA, Harvard, Vancouver, ISO, etc.
7

Mahinthakumar, G., and F. Saied. "A Hybrid Mpi-Openmp Implementation of an Implicit Finite-Element Code on Parallel Architectures." International Journal of High Performance Computing Applications 16, no. 4 (November 2002): 371–93. http://dx.doi.org/10.1177/109434200201600402.

Full text of the source
Abstract:
The hybrid MPI-OpenMP model is a natural parallel programming paradigm for emerging parallel architectures that are based on symmetric multiprocessor (SMP) clusters. This paper presents a hybrid implementation adapted for an implicit finite-element code developed for groundwater transport simulations. The original code was parallelized for distributed memory architectures using MPI (Message Passing Interface) with a domain decomposition strategy. OpenMP directives were then added to the code (a straightforward loop-level implementation) to use multiple threads within each MPI process. To improve the OpenMP performance, several loop modifications were adopted. The parallel performance results are compared for four modern parallel architectures. The results show that for most of the cases tested, the pure MPI approach outperforms the hybrid model. The exceptions to this observation were mainly due to a limitation in the MPI library implementation on one of the architectures. A general conclusion is that while the hybrid model is a promising approach for SMP cluster architectures, at the time of this writing, the payoff may not justify converting all existing MPI codes to hybrid codes. However, improvements in OpenMP compilers combined with potential MPI limitations in SMP nodes may make the hybrid approach more attractive for a broader set of applications in the future.
Styles: APA, Harvard, Vancouver, ISO, etc.
8

Smith, Lorna, and Mark Bull. "Development of Mixed Mode MPI / OpenMP Applications." Scientific Programming 9, no. 2-3 (2001): 83–98. http://dx.doi.org/10.1155/2001/450503.

Full text of the source
Abstract:
MPI / OpenMP mixed mode codes could potentially offer the most effective parallelisation strategy for an SMP cluster, as well as allowing the different characteristics of both paradigms to be exploited to give the best performance on a single SMP. This paper discusses the implementation, development and performance of mixed mode MPI / OpenMP applications. The results demonstrate that this style of programming will not always be the most effective mechanism on SMP systems and cannot be regarded as the ideal programming model for all codes. In some situations, however, significant benefit may be obtained from a mixed mode implementation. For example, benefit may be obtained if the parallel (MPI) code suffers from: poor scaling with MPI processes due to load imbalance or too fine-grained a problem size, memory limitations due to the use of a replicated data strategy, or a restriction on the possible combinations of MPI processes. In addition, if the system has a poorly optimised or limited scaling MPI implementation, then a mixed mode code may increase the code performance.
Styles: APA, Harvard, Vancouver, ISO, etc.
9

Huang, Lei, Barbara Chapman, and Zhenying Liu. "Towards a more efficient implementation of OpenMP for clusters via translation to global arrays." Parallel Computing 31, no. 10-12 (October 2005): 1114–39. http://dx.doi.org/10.1016/j.parco.2005.03.015.

Full text of the source
Styles: APA, Harvard, Vancouver, ISO, etc.
10

Li, Hua Zhong, Yong Sheng Liang, Tao He, and Yi Li. "AOI Multi-Core Parallel System for TFT-LCD Defect Detection." Advanced Materials Research 472-475 (February 2012): 2325–31. http://dx.doi.org/10.4028/www.scientific.net/amr.472-475.2325.

Full text of the source
Abstract:
The present Automatic Optical Inspection (AOI) technology can hardly satisfy online inspection requirements for large-scale high-speed, high-precision and high-sensitivity TFT-LCD. First, through studying the working principle of TFT-LCD Defect AOI System, the system architecture for mixed-parallel multi-core computer cluster is proposed to satisfy design requirements. Second, the study focuses on the software framework of AOI system and related key software technology. Finally, the fusion programming model for parallel image processing and its implementation strategy is proposed based on OpenMP, MPI, OpenCV, and Intel Integrated Performance Primitives (IPP).
Styles: APA, Harvard, Vancouver, ISO, etc.

Dissertations on the topic "Cluster OpenMP implementations"

1

Tran, Van Long. "Optimization of checkpointing and execution model for an implementation of OpenMP on distributed memory architectures." Thesis, Evry, Institut national des télécommunications, 2018. http://www.theses.fr/2018TELE0017/document.

Full text of the source
Abstract:
OpenMP and MPI have become the standard tools for developing parallel programs on shared-memory and distributed-memory architectures respectively. Compared to MPI, OpenMP is easier to use, because OpenMP automatically executes code in parallel and synchronizes results through its directives, clauses, and runtime functions, while MPI requires programmers to do all of this manually. Therefore, efforts have been made to port OpenMP to distributed-memory architectures. However, excluding CAPE, no solution has met both requirements: 1) to be fully compliant with the OpenMP standard and 2) to deliver high performance. CAPE stands for Checkpointing-Aided Parallel Execution. It is a framework that automatically translates OpenMP programs and provides runtime functions to execute them on distributed-memory architectures based on checkpointing techniques. In order to execute an OpenMP program on a distributed-memory system, CAPE uses a set of templates to translate the OpenMP source code into CAPE source code, which is then compiled by a standard C/C++ compiler and executed on the distributed-memory system under the support of the CAPE framework. Basically, the idea of CAPE is the following: the program first runs on a set of nodes of the system, each node executing it as a process. Whenever the program reaches a parallel section, the master distributes the jobs to the slave processes using Discontinuous Incremental Checkpoints (DICKPT). After sending the checkpoints, the master waits for the results returned by the slaves. The master then receives and merges the resulting checkpoints before injecting them into its memory. The slave nodes, for their part, receive their checkpoints, inject them into their memory, and compute their assigned jobs; the results are sent back to the master using DICKPT.
At the end of the parallel region, the master sends the checkpointed result to every slave to synchronize the memory space of the program as a whole. In experiments, CAPE has shown very high performance on distributed-memory systems and is a viable solution fully compatible with OpenMP. However, CAPE is still in the development stage: its checkpoint mechanism and execution model need to be optimized in order to improve its performance, capabilities, and reliability. This thesis presents the approaches proposed to optimize and improve checkpoints, to design and implement a new execution model, and to extend the capabilities of CAPE. First, we proposed arithmetics on checkpoints, which models a checkpoint's data structure and its operations. This modeling helps optimize checkpoint size and reduce merging time, while improving checkpoint capabilities. Second, we developed TICKPT (Time-stamp Incremental Checkpointing) as an instance of arithmetics on checkpoints. TICKPT improves on DICKPT by adding a timestamp to checkpoints to identify their order. Analysis and comparative experiments show that TICKPT checkpoints are not only smaller but also have less impact on the performance of programs that use checkpointing. Third, we designed and implemented a new execution model and new prototypes for CAPE based on TICKPT. The new execution model allows CAPE to use resources efficiently, avoid the risk of bottlenecks, and satisfy Bernstein's conditions. As a result, these approaches significantly improve the performance, capabilities, and reliability of CAPE. Fourth, OpenMP data-sharing attributes are implemented in CAPE based on arithmetics on checkpoints and TICKPT. This also demonstrates the soundness of the direction we took, and it makes CAPE more complete.
Styles: APA, Harvard, Vancouver, ISO, etc.
2

Cai, Jie. "Region-based techniques for modeling and enhancing cluster OpenMP performance." Phd thesis, 2011. http://hdl.handle.net/1885/8865.

Full text of the source
Abstract:
Cluster OpenMP enables the use of the OpenMP shared memory programming model on clusters. Intel has released a cluster OpenMP implementation called Intel Cluster OpenMP (CLOMP). While this offers better programmability than message passing alternatives such as the Message Passing Interface (MPI), such convenience comes with overheads resulting from having to maintain the consistency of the underlying shared memory abstraction. CLOMP is no exception. This thesis introduces models for understanding the overheads of cluster OpenMP implementations like CLOMP and proposes techniques for enhancing their performance. Cluster OpenMP systems are usually implemented using page-based software distributed shared memory systems. A key issue for such systems is maintaining the consistency of the shared memory space. This forms a major source of overhead, and it is driven by detecting and servicing page faults. To understand these systems, we evaluate their performance with different OpenMP applications, and we also develop a benchmark, called MCBENCH, to characterize the memory consistency costs. Using MCBENCH, we discover that this overhead is proportional to the number of writers to the same shared page and the number of shared pages. Furthermore, we divide an OpenMP program into parallel and serial regions. Based on the regions, we develop two region-based models to rationalize the numbers and types of page faults and their associated performance costs. The models highlight the fact that the major overhead comes from servicing the type of page fault that requires data to be transferred across the network. With this understanding, we have developed three region-based prefetch (ReP) techniques based on the execution history of each region. The first ReP technique (TReP) considers temporal paging behaviour between consecutive executions of the same region.
The second technique (HReP) considers both the temporal paging behaviour between consecutive region executions and the spatial paging behaviour within a region execution. The last technique (DReP) utilizes a novel stride-augmented run length encoding (sRLE) method to address both the temporal and spatial paging behaviour between consecutive region executions. The ReP techniques effectively reduce the number of page faults and aggregate data into larger transfers, which leverages the bandwidth provided by the interconnect. All three ReP techniques are implemented in the runtime library of CLOMP to enhance its performance. Both the original and the enhanced CLOMP are evaluated using the NAS Parallel Benchmark OpenMP (NPB-OMP) suite and two LINPACK OpenMP benchmarks on two clusters connected with Ethernet and InfiniBand interconnects. The performance data is quantitatively analyzed and modeled, and MCBENCH is used to evaluate the impact of the ReP techniques on memory consistency costs. The evaluation results demonstrate that, on average, CLOMP spends 75% and 55% of the overall elapsed time of the NPB-OMP benchmarks on Gigabit Ethernet and double data rate InfiniBand networks respectively. These ratios are effectively reduced by ~60% and ~40% after implementing the ReP techniques in the CLOMP runtime. For the LINPACK benchmarks, with the assistance of sRLE, DReP significantly outperforms the other ReP techniques, reducing page fault handling costs by 50% and 58% on the Ethernet and InfiniBand networks respectively.
Styles: APA, Harvard, Vancouver, ISO, etc.

Book chapters on the topic "Cluster OpenMP implementations"

1

Wong, H. J., J. Cai, A. P. Rendell, and P. Strazdins. "Micro-benchmarks for Cluster OpenMP Implementations: Memory Consistency Costs." In OpenMP in a New Era of Parallelism, 60–70. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008. http://dx.doi.org/10.1007/978-3-540-79561-2_6.

Full text of the source
Styles: APA, Harvard, Vancouver, ISO, etc.
2

Eachempati, Deepak, Lei Huang, and Barbara Chapman. "Strategies and Implementation for Translating OpenMP Code for Clusters." In High Performance Computing and Communications, 420–31. Berlin, Heidelberg: Springer Berlin Heidelberg, 2007. http://dx.doi.org/10.1007/978-3-540-75444-2_42.

Full text of the source
Styles: APA, Harvard, Vancouver, ISO, etc.
3

Liu, Zhenying, Lei Huang, Barbara Chapman, and Tien-Hsiung Weng. "Efficient Implementation of OpenMP for Clusters with Implicit Data Distribution." In Lecture Notes in Computer Science, 121–36. Berlin, Heidelberg: Springer Berlin Heidelberg, 2005. http://dx.doi.org/10.1007/978-3-540-31832-3_11.

Full text of the source
Styles: APA, Harvard, Vancouver, ISO, etc.
4

Briguglio, Sergio, Beniamino Di Martino, Giuliana Fogaccia, and Gregorio Vlad. "Hierarchical MPI+OpenMP Implementation of Parallel PIC Applications on Clusters of Symmetric MultiProcessors." In Recent Advances in Parallel Virtual Machine and Message Passing Interface, 180–87. Berlin, Heidelberg: Springer Berlin Heidelberg, 2003. http://dx.doi.org/10.1007/978-3-540-39924-7_27.

Full text of the source
Styles: APA, Harvard, Vancouver, ISO, etc.
5

Cabral, Frederico, Carla Osthoff, Roberto Pinto Souto, Gabriel P. Costa, Sanderson L. Gonzaga de Oliveira, Diego N. Brandão, and Mauricio Kischinhevsky. "An Improved OpenMP Implementation of the TVD–Hopmoc Method Based on a Cluster of Points." In High Performance Computing for Computational Science – VECPAR 2018, 132–45. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-15996-2_10.

Full text of the source
Styles: APA, Harvard, Vancouver, ISO, etc.
6

Takahashi, Daisuke. "A Hybrid MPI/OpenMP Implementation of a Parallel 3-D FFT on SMP Clusters." In Parallel Processing and Applied Mathematics, 970–77. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11752578_117.

Full text of the source
Styles: APA, Harvard, Vancouver, ISO, etc.
7

Pavão, Pedro Nuno Rebelo, João Pedro Almeida Couto, and Maria Manuela Santos Natário. "A Tale of Different Realities." In The Role of Knowledge Transfer in Open Innovation, 262–80. IGI Global, 2019. http://dx.doi.org/10.4018/978-1-5225-5849-1.ch013.

Full text of the source
Abstract:
This chapter aims to identify the determinants that affect innovation capacity at the regional level in Europe. It proposes modelling territorial innovation capacity and identifies relevant factors influencing innovation capacity at a regional level. The chapter uses the Regional Innovation Scoreboard database and cluster analysis to detect behavioural patterns in the innovation performance of European regions. The results show that innovation capacity is related to regional governance, particularly regional autonomy, regional control of innovation policy, influence over the allocation of structural funds, and the region's location within the European Union. Cohesion policy criteria are also a significant factor, demonstrating the adequacy of the European regional policy's new programming regarding innovation policy. These results point to the importance of the participation of regions in the formulation and implementation of bottom-up strategies to develop innovation dynamics and to build partnerships with other public and/or private actors.
Styles: APA, Harvard, Vancouver, ISO, etc.
8

Ntseane, Peggy Gabo, and Idowu Biao. "Learning Cities." In Advances in Electronic Government, Digital Divide, and Regional Development, 73–93. IGI Global, 2019. http://dx.doi.org/10.4018/978-1-5225-8134-5.ch004.

Full text of the source
Abstract:
This chapter opens with the suggestion that the "learning cities" concept may well apply to ancient cities, since learning has characterized life in all cities of the world since time immemorial. However, it is acknowledged that the "learning cities" construct was specifically originated during the 20th century for the purpose of helping city dwellers cope with the challenges of modern city life. Dwelling on the situation in Sub-Saharan Africa, the chapter reveals that learning cities projects are not currently popular in the sub-continent. This lack of interest has been attributed to the fact that Africans were never, and are still not, taken along during the process of transformation of both ancient and modern spaces into cities. Consequently, it is recommended here that a transformative learning process that uses both indigenous knowledges and endogenous city clusters as learning pads be adopted to revitalize the implementation of learning cities projects in Sub-Saharan Africa.
Styles: APA, Harvard, Vancouver, ISO, etc.
9

Willis, James S., Matthieu Schaller, Pedro Gonnet, and John C. Helly. "A Hybrid MPI+Threads Approach to Particle Group Finding Using Union-Find." In Parallel Computing: Technology Trends. IOS Press, 2020. http://dx.doi.org/10.3233/apc200050.

Full text of the source
Abstract:
The Friends-of-Friends (FoF) algorithm is a standard technique used in cosmological N-body simulations to identify structures. Its goal is to find clusters of particles (called groups) that are separated by at most a cut-off radius. N-body simulations typically use most of the memory present on a node, leaving very little free for a FoF algorithm to run on-the-fly. We propose a new method that utilises the common Union-Find data structure and a hybrid MPI+threads approach. The algorithm can also be expressed elegantly in a task-based formalism if such a framework is used in the rest of the application. We have implemented our algorithm in the open-source cosmological code, SWIFT. Our implementation displays excellent strong- and weak-scaling behaviour on realistic problems and compares favourably (speed-up of 18x) over other methods commonly used in the N-body community.
Styles: APA, Harvard, Vancouver, ISO, etc.
10

Teodoro, George. "Efficient Execution of Dataflows on Parallel and Heterogeneous Environments." In Advances in Systems Analysis, Software Engineering, and High Performance Computing, 1–17. IGI Global, 2013. http://dx.doi.org/10.4018/978-1-4666-2533-4.ch001.

Full text of the source
Abstract:
Current advances in computer architectures have transformed clusters, the traditional high-performance distributed platforms, into hierarchical environments where each node has multiple heterogeneous processing units, including accelerators such as GPUs. Although parallel heterogeneous environments are becoming common, their efficient use is still an open problem. Current tools for the development of parallel applications are mainly concerned with the exclusive use of accelerators, while it is argued that adequate coordination of heterogeneous computing cores can significantly improve performance. The approach taken in this chapter to use such environments efficiently, evaluated in the context of replicated dataflow applications, consists of scheduling processing tasks according to their characteristics and to the processors' specificities. Thus, the available hardware can be better utilized by executing each task on the processor best suited to it. The proposed approach has been evaluated using two applications for which CPU-only and GPU-only implementations were previously available. The experimental results show that using both devices simultaneously can improve performance significantly; moreover, the proposed method doubled the performance of a demand-driven approach that utilizes both CPU and GPU on the two applications in several scenarios.
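The chapter's central idea, assigning each task to the processor best suited for it, can be sketched with a toy greedy scheduler. This is an assumption-laden illustration, not the author's method: the per-device cost estimates are taken as given, and the `schedule` function and its longest-task-first greedy policy are a simplification introduced here.

```python
def schedule(tasks, devices):
    """Greedy heterogeneous scheduler: place each task on the device
    that would finish it earliest, given per-device cost estimates.
    `tasks` maps a task name to {device: estimated_cost};
    `devices` is a list of device names. Returns per-device task lists."""
    finish = {d: 0.0 for d in devices}   # accumulated busy time per device
    plan = {d: [] for d in devices}
    # Scheduling the longest tasks first tends to reduce the greedy makespan.
    for name, cost in sorted(tasks.items(), key=lambda kv: -min(kv[1].values())):
        best = min(devices, key=lambda d: finish[d] + cost[d])
        finish[best] += cost[best]
        plan[best].append(name)
    return plan
```

With a GPU-friendly task and a CPU-friendly task, the scheduler sends each where it runs fastest instead of queuing both on the accelerator.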
APA, Harvard, Vancouver, ISO and other styles

Conference papers on the topic "Cluster OpenMP implementations"

1

Cai, Jie, Alistair P. Rendell, Peter E. Strazdins, and H'sien Jin Wong. "Performance models for Cluster-enabled OpenMP implementations." In 2008 13th Asia-Pacific Computer Systems Architecture Conference (ACSAC). IEEE, 2008. http://dx.doi.org/10.1109/apcsac.2008.4625433.

Full text of the source
APA, Harvard, Vancouver, ISO and other styles
2

Santander-Jimenez, Sergio, and Miguel A. Vega-Rodriguez. "Applying OpenMP-based parallel implementations of NSGA-II and SPEA2 to study phylogenetic relationships." In 2014 IEEE International Conference On Cluster Computing (CLUSTER). IEEE, 2014. http://dx.doi.org/10.1109/cluster.2014.6968779.

Full text of the source
APA, Harvard, Vancouver, ISO and other styles
3

Noor, Nor Rizuan Mat, and Tanya Vladimirova. "Parallel implementation of lossless clustered integer KLT using OpenMP." In 2012 NASA/ESA Conference on Adaptive Hardware and Systems (AHS). IEEE, 2012. http://dx.doi.org/10.1109/ahs.2012.6268639.

Full text of the source
APA, Harvard, Vancouver, ISO and other styles
4

Tran, Van Long, Eric Renault, and Viet Hai Ha. "Optimization of Checkpoints and Execution Model for an Implementation of OpenMP on Distributed Memory Architectures." In 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID). IEEE, 2017. http://dx.doi.org/10.1109/ccgrid.2017.119.

Full text of the source
APA, Harvard, Vancouver, ISO and other styles
5

Xuan, Huailiang, Weiqin Tong, Zhixun Gong, and Youwen Lan. "Implementation and performance analysis of hybrid MPI+OpenMP programming for parallel MLFMA on SMP cluster." In 2012 Third International Conference on Intelligent Control and Information Processing (ICICIP). IEEE, 2012. http://dx.doi.org/10.1109/icicip.2012.6391557.

Full text of the source
APA, Harvard, Vancouver, ISO and other styles
6

Andreev, Vyacheslav Viktorovich, Olga Vyacheslavovna Andreeva, and Vasiliy Evgenievich Gai. "Computer Modelling Based on the Percolation Theory of the Third Stage of Cracks Formation and Development on the Steel Microstructures Surfaces." In 32nd International Conference on Computer Graphics and Vision. Keldysh Institute of Applied Mathematics, 2022. http://dx.doi.org/10.20948/graphicon-2022-1059-1064.

Full text of the source
Abstract:
The modeling of the process of destruction of structural materials under cyclic loads in the high-cycle fatigue regime was considered. A phenomenological analysis was carried out of the main stages of crack formation and of the mechanisms and schemes of crack initiation observed during the destruction of real objects. Implementing these mechanisms using the tools of percolation theory made it possible to increase the reliability of modeling the processes that occur in real conditions during the destruction of a structure. A damage-accumulation model parameter was formulated for images of the surface microstructure of metals and alloys that makes it possible to detect the moment when crack formation completes. The fractal dimension of the percolation cluster built on the cells belonging to the damage was chosen as this parameter. To calculate the sizes of percolation clusters, the Hoshen–Kopelman multiple-labeling algorithm was used. The existing algorithm was supplemented with an auxiliary label for open cells belonging to the percolation cluster, which made it possible to avoid the additional operation of comparing labels and re-marking nodes when combining parts of a single cluster. To confirm the effectiveness of the proposed parameter, a simulation of the damage-accumulation process on images of the surface microstructure was performed. The error did not exceed 6.6% for calculations using the values of the fractal dimension of percolation clusters built on the cells belonging to the crack.
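The Hoshen–Kopelman raster-scan labeling mentioned in the abstract can be sketched in a few lines. This is a minimal textbook version on a 2D occupancy grid with 4-connectivity, written for illustration; the authors' auxiliary-label optimization and the fractal-dimension computation are not reproduced here.

```python
def hoshen_kopelman(grid):
    """Label connected clusters of occupied cells (value 1) in a 2D grid
    using the Hoshen-Kopelman raster scan with union-find label merging.
    Returns a grid of cluster labels (0 = empty cell)."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # parent[k] = canonical label of label k; index 0 unused

    def find(k):  # union-find with path halving
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    next_label = 1
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            up = find(labels[r - 1][c]) if r and labels[r - 1][c] else 0
            left = find(labels[r][c - 1]) if c and labels[r][c - 1] else 0
            if not up and not left:
                parent.append(next_label)      # start a new cluster
                labels[r][c] = next_label
                next_label += 1
            elif up and left and up != left:
                parent[max(up, left)] = min(up, left)  # merge two clusters
                labels[r][c] = min(up, left)
            else:
                labels[r][c] = up or left
    # Second pass: resolve every cell to its canonical cluster label.
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                labels[r][c] = find(labels[r][c])
    return labels
```

Cluster sizes (or, as in the paper, the cells feeding a fractal-dimension estimate) then follow from a simple count per canonical label.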
APA, Harvard, Vancouver, ISO and other styles
7

Darmawan, B. "The Implementation of Hybrid Parallel Computation for Complex and Fine Reservoir Model Using Cluster Technology." In Indonesian Petroleum Association 44th Annual Convention and Exhibition. Indonesian Petroleum Association, 2021. http://dx.doi.org/10.29118/ipa21-e-53.

Full text of the source
Abstract:
Pertamina EP plays an important role in maintaining the oil production supply for national energy stability. Thus, it bears a great responsibility to accelerate all development plans and execute them in a timely manner. However, realizing those plans poses a big challenge, since the company is not fully equipped with the advanced computing technology needed to boost the reservoir modeling and simulation phase. As a result, finalizing and executing 33 Plan of Development (POD) projects within 5 years looked like a never-ending task. To face the challenge, Pertamina EP evaluated the possibility of creating a cluster that can accommodate a high number of simulations and a high simulation load. The evaluation covered compiling, sorting and selecting the analog reservoir model (highest grid number and longest simulation time), benchmarking, and performance tests to find the most optimal cluster configuration. A supercomputer was then procured and configured based on the optimized model, and the setup was completed by testing the three most extreme POD models. This paper describes the success story and innovation of simulating a complex, finer-scale reservoir model using hybrid parallel-computing technology on a set of 8 high-performance compute nodes. Three models were tested with satisfying results. The paper discusses the parallel scalability of complex computing systems on multi-CPU clusters. A multi-CPU distributed-memory computing system is shown to improve and accelerate reservoir modeling and simulation time when used in combination with a new, so-called "hybrid" approach. In this approach, the common Message Passing Interface (MPI) synchronization between the cluster nodes is interleaved with shared-memory, thread-based synchronization at the node level. The model with the longest simulation time was accelerated by 60%.
The model with the highest number of simulation steps was accelerated by 80%. The model with the greatest number of grid cells (21.7 million active grids) finally finished its simulation in just 27 minutes, where previously it was impossible even to open and run it. The successful case study was then followed by the implementation of the cluster computing technology for two pilot POD projects, which led to very good results. With this improvement, Pertamina EP can finally perform the probabilistic simulation recommended by SKKMIGAS in PTK Rev-2/2018. It is now possible to run multiple reservoir realizations for all 33 structures for each POD.
APA, Harvard, Vancouver, ISO and other styles
8

Clauberg, Jan, Michael Leistner, and Heinz Ulbrich. "Hybrid-Parallel Calculation of Jacobians in Multi-Body Dynamics." In ASME 2013 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. American Society of Mechanical Engineers, 2013. http://dx.doi.org/10.1115/detc2013-12245.

Full text of the source
Abstract:
Implicit integration methods are often used for numerically stiff multi-body systems because they allow larger step sizes. But within most implicit integration methods, the Jacobian of the right-hand side must be calculated at every step. In the case of multi-body dynamics, this is the vector of generalized forces, and the calculation of its Jacobian can be so expensive that it often outweighs the advantage of a bigger step size. For large systems, this task can take more than 99% of the whole simulation time, while it can easily be parallelized. In this paper, a hybrid-parallel implementation of the calculation of this Jacobian is presented. With a combination of MPI and OpenMP, simulations are run on up to 512 cores. The experiments are carried out on a high-performance-computing cluster with an InfiniBand network. The performance of a pure MPI implementation is compared to several hybrid variants, using different numbers of OpenMP threads per process.
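The reason this Jacobian parallelizes so well is that each column requires only one extra evaluation of the right-hand side, independent of all other columns. A minimal sketch of that column-wise decomposition, with a Python thread pool standing in for the paper's MPI+OpenMP distribution (the name `jacobian_fd` and the forward-difference scheme are assumptions made here, not details from the paper):

```python
from concurrent.futures import ThreadPoolExecutor

def jacobian_fd(f, x, eps=1e-6, workers=4):
    """Forward-difference Jacobian of f: R^n -> R^m. Each column needs
    just one extra evaluation of f and touches no shared state, so the
    columns can be computed fully in parallel (MPI across nodes and
    OpenMP threads within a node in the paper; a thread pool here)."""
    f0 = f(x)

    def column(j):
        xp = list(x)
        xp[j] += eps                      # perturb one coordinate
        fj = f(xp)
        return [(fj[i] - f0[i]) / eps for i in range(len(f0))]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        cols = list(pool.map(column, range(len(x))))
    # Transpose columns into row-major J[i][j] = d f_i / d x_j.
    return [[cols[j][i] for j in range(len(x))] for i in range(len(f0))]
```

In a real multi-body code the columns would be blocked across MPI ranks first, with each rank fanning its block out to OpenMP threads, which is exactly the hybrid layout the paper benchmarks.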
APA, Harvard, Vancouver, ISO and other styles
9

Miniello, G., and M. La Salandra. "HIGH RESOLUTION IMAGE PROCESSING AND LAND COVER CLASSIFICATION FOR HYDRO- GEOMORPHOLOGICAL HIGH-RISK AREA MONITORING." In 9th International Conference "Distributed Computing and Grid Technologies in Science and Education". Crossref, 2021. http://dx.doi.org/10.54546/mlit.2021.12.40.001.

Full text of the source
Abstract:
High-resolution image processing for land surface monitoring is fundamental to analyze the impact of different geomorphological processes on the Earth's surface under different climate change scenarios. In this context, photogrammetry is one of the most reliable techniques to generate high-resolution topographic data, being key to territorial mapping and change detection analysis of landforms in hydro-geomorphological high-risk areas. An important issue arises as soon as the main goal is to conduct analyses over extended areas of the Earth's surface (such as fluvial systems) in a short time, since the need to capture large datasets to develop detailed topographic models may limit the photogrammetric process, due to the high demand for high-performance hardware. In order to investigate the best setup of computing resources for these very peculiar tasks, a study of the performance of a photogrammetric workflow based on a FOSS (Free Open-Source Software) SfM (Structure from Motion) algorithm was conducted using different cluster configurations, leveraging the computing power of the ReCaS-Bari data center infrastructure, which hosts several services such as HTC, HPC, IaaS and PaaS. By exploiting the high computing resources available on the clusters and choosing a specific setup for the workflow steps, an important reduction of several hours in the processing time was recorded, especially compared to classic photogrammetric programs run on a single workstation with commercial software. The high quality of the image details can be used for land cover classification and preliminary change detection studies using Machine Learning techniques. A subset of the datasets used for the workflow implementation has been considered to test the performance of different Convolutional Neural Networks, using progressively more complex layer sequences, data augmentation and callback functions for training the models. All the results are given in terms of model accuracy and loss and performance evaluation.
APA, Harvard, Vancouver, ISO and other styles
10

Lisboa, Flávio Gomes da Silva. "A scalable distributed system based on microservices for collecting pod logs from a Kubernetes cluster." In Congresso Latino-Americano de Software Livre e Tecnologias Abertas. Sociedade Brasileira de Computação - SBC, 2021. http://dx.doi.org/10.5753/latinoware.2021.19916.

Full text of the source
Abstract:
This article presents the architecture of a distributed log collection system for Kubernetes clusters. Initially, we present the motivation for creating the system. Then, we present an overview of the system, consisting of several microservices. Next, we describe the implementation of each of the microservices. All system components are free and open software. This article is an example of how a distributed system built from microservices can be heterogeneous with respect to the programming languages used.
APA, Harvard, Vancouver, ISO and other styles
