Journal articles on the topic 'Parallel computing paradigm'

Consult the top 50 journal articles for your research on the topic 'Parallel computing paradigm.'

1

Gorodnyaya, Lidia. "FUNCTIONAL PROGRAMMING FOR PARALLEL COMPUTING." Bulletin of the Novosibirsk Computing Center. Series: Computer Science, no. 45 (2021): 29–48. http://dx.doi.org/10.31144/bncc.cs.2542-1972.2021.n45.p29-48.

Abstract:
The paper is devoted to modern trends in the application of functional programming to the problems of organizing parallel computations. Functional programming is considered a meta-paradigm for solving the problems of developing multi-threaded programs for multiprocessor complexes and distributed systems, as well as for solving the problems associated with rapid IT development. The semantic and pragmatic principles of functional programming and the consequences of these principles are described. The paradigm analysis of programming languages and systems is used, which allows assessing their similarities and differences. Taking these features into account is necessary when predicting the course of application processes, as well as when planning the study and organization of program development. There are reasons to believe that functional programming is capable of improving program performance through its adaptability to modeling and prototyping. A variety of features and characteristics inherent in the development and debugging of long-lived parallel computing programs is shown. The author emphasizes the prospects of functional programming as a universal technique for solving complex problems burdened with difficult-to-verify and poorly compatible requirements. A brief outline of the requirements for a multiparadigm parallel programming language is given.
2

Zmejev, D. N., and N. N. Levchenko. "Aspects of Creating Parallel Programs in Dataflow Programming Paradigm." Informacionnye Tehnologii 28, no. 11 (November 17, 2022): 597–606. http://dx.doi.org/10.17587/it.28.597-606.

Abstract:
The imperative programming paradigm is the main one for creating sequential and parallel programs for the vast majority of modern computers, including supercomputers. A defining feature of the imperative paradigm is the sequential ordering of commands, which is an obstacle to the creation of efficient parallel programs, since parallelism is achieved at the expense of additional code. One solution to this overhead problem is to create a computing model, and a system architecture implementing it, for which parallel execution of the algorithm is an inherent property: the dataflow computing model with a dynamically formed context and the architecture of the parallel dataflow computing system "Buran". A complete transition to dataflow systems is hampered, among other things, by the conceptual difference between the dataflow programming paradigm and the imperative one. The article compares these two paradigms. First, parallel data processing is an inherent property of a dataflow program. Second, a dataflow program consists of three elements: a set of initial data, program code, and a parameterizable distribution function. And third, the approach to algorithmizing a task is conceptually different: the data themselves store information about who should process them (in traditional programs, on the contrary, the command stores information about what data should be processed). The article also presents the structure of a dataflow program and the procedure for creating a dataflow algorithm. The translation of the basic algorithmic constructions (sequence, branching, loops) is illustrated on simple example problems.
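
The firing rule at the heart of the dataflow paradigm can be illustrated with a short sketch. The following Python fragment is purely illustrative (it is not the Buran system's programming interface, and all names in it are invented for the example): tokens carry the address of the instruction that must consume them, and a node fires as soon as all of its operands have arrived.

```python
# Minimal sketch of a dataflow firing rule: tokens carry the address
# (node id, port) of the instruction that must consume them, and a node
# fires as soon as all of its input ports hold a token. Names here are
# illustrative, not the Buran system's API.
from collections import defaultdict

nodes = {
    "add": (lambda a, b: a + b, 2),   # (operation, arity)
    "mul": (lambda a, b: a * b, 2),
}
wiring = {"add": [("mul", 0)]}        # add's result feeds mul's port 0

pending = defaultdict(dict)           # node id -> {port: value}

def send(node_id, port, value, sink):
    """Deliver a token; fire the node once every operand has arrived."""
    pending[node_id][port] = value
    op, arity = nodes[node_id]
    if len(pending[node_id]) == arity:
        args = [pending[node_id][p] for p in range(arity)]
        result = op(*args)
        targets = wiring.get(node_id)
        if targets:
            for tgt, tgt_port in targets:
                send(tgt, tgt_port, result, sink)
        else:
            sink.append(result)

out = []
send("add", 0, 2, out); send("add", 1, 3, out)   # operands of add
send("mul", 1, 10, out)                          # second operand of mul
print(out)   # [50] == (2 + 3) * 10
```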
3

Wang, Nen-Zi, and Hsin-Yi Chen. "A cross-platform parallel programming model for fluid-film lubrication optimization." Industrial Lubrication and Tribology 70, no. 6 (August 13, 2018): 1002–11. http://dx.doi.org/10.1108/ilt-11-2016-0283.

Abstract:
Purpose: A cross-platform paradigm (computing model), which combines the graphical user interface of MATLAB and parallel Fortran programming, is proposed for fluid-film lubrication analysis. The purpose of this paper is to take advantage of the effective multithreaded computing of OpenMP and MATLAB's user-friendly interface and real-time display capability. Design/methodology/approach: A validation of the computing performance of MATLAB and Fortran coding for solving two simple sliders by iterative solution methods is conducted. Online display of the particles' search process is incorporated in the MATLAB coding, and the execution of the air foil bearing optimum design is conducted using OpenMP multithreaded computing in the background. The optimization analysis is conducted by the particle swarm optimization method for an air foil bearing design. Findings: It is found that the MATLAB programs require longer execution times than their Fortran counterparts in iterative methods. The execution time of the air foil bearing optimum design is significantly reduced by using OpenMP computing. As a result, the cross-platform paradigm can provide a useful graphical user interface, and very little rewriting of the original numerical models, which are usually optimized for either serial or parallel computing, is required. Research limitations/implications: Iterative methods are commonly applied in fluid-film lubrication analyses. In this study, iterative methods are used as the solution methods, which may not be an effective way to compute in the MATLAB setting. Originality/value: In this study, a cross-platform paradigm consisting of standalone MATLAB and Fortran codes is proposed. The approach combines the best of the two paradigms, and each code can be modified or maintained independently for different applications.
4

Lohani, Bhanu Prakash, Vimal Bibhu, and Ajit Singh. "Review of Evolutionary Algorithms based on parallel computing paradigm." International Journal of Computer Science and Engineering 4, no. 6 (June 25, 2017): 1–4. http://dx.doi.org/10.14445/23488387/ijcse-v4i6p101.

5

Zhao, Jin, Fan, Song, Zhou, and Jiang. "High-performance Overlay Analysis of Massive Geographic Polygons That Considers Shape Complexity in a Cloud Environment." ISPRS International Journal of Geo-Information 8, no. 7 (June 26, 2019): 290. http://dx.doi.org/10.3390/ijgi8070290.

Abstract:
Overlay analysis is a common task in geographic computing that is widely used in geographic information systems, computer graphics, and computer science. With the breakthroughs in Earth observation technologies, particularly the emergence of high-resolution satellite remote sensing, geographic data have grown explosively. The overlay analysis of massive and complex geographic data has become a computationally intensive task. Distributed parallel processing in a cloud environment provides an efficient solution to this problem. The cloud computing paradigm represented by Spark has become the standard for massive data processing in industry and academia due to its large-scale and low-latency characteristics, and it has attracted further attention as a way to solve the overlay analysis of massive data. Existing studies mainly focus on how to implement parallel overlay analysis in a cloud computing paradigm but pay less attention to the impact of spatial data graphics complexity on parallel computing efficiency, especially the data skew caused by differences in graphic complexity. Geographic polygons often have complex graphical structures, such as large vertex counts and composite structures including holes and islands. When the Spark paradigm is used for the overlay analysis of massive geographic polygons, its efficiency is closely related to factors such as data organization and algorithm design. Considering the influence of the shape complexity of polygons on the performance of overlay analysis, we design and implement a parallel processing algorithm based on the Spark paradigm in this paper. Based on the analysis of polygon shape complexity, the overlay analysis speed is improved via reasonable data partitioning, a distributed spatial index, a minimum bounding rectangle filter, and other optimizations, while high speed and parallel efficiency are maintained.
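
The complexity-aware partitioning idea can be sketched independently of Spark. The fragment below is a hypothetical greedy partitioner, not the paper's implementation: polygons are weighted by vertex count and assigned to the currently least-loaded partition, so that a few very complex shapes do not skew one partition.

```python
# Sketch of complexity-aware data partitioning (illustrative only, not
# the paper's Spark implementation): polygons are weighted by vertex
# count and greedily assigned to the currently least-loaded partition.
import heapq

def partition_by_complexity(polygons, n_partitions):
    """polygons: list of (polygon_id, vertex_count) pairs."""
    heap = [(0, p) for p in range(n_partitions)]   # (load, partition id)
    heapq.heapify(heap)
    assignment = {p: [] for p in range(n_partitions)}
    # Placing the largest polygons first gives a better greedy balance.
    for poly_id, n_vertices in sorted(polygons, key=lambda x: -x[1]):
        load, part = heapq.heappop(heap)
        assignment[part].append(poly_id)
        heapq.heappush(heap, (load + n_vertices, part))
    return assignment

polys = [("a", 12000), ("b", 300), ("c", 11000), ("d", 500), ("e", 450)]
print(partition_by_complexity(polys, 2))
# {0: ['a'], 1: ['c', 'd', 'e', 'b']} (loads 12000 vs 12250)
```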
6

Yu, Qiu Dong, Yun Chen Tian, and Xu Feng Hua. "Research on Security of Agricultural Information Model Based on Cloud Computing." Applied Mechanics and Materials 687-691 (November 2014): 1970–73. http://dx.doi.org/10.4028/www.scientific.net/amm.687-691.1970.

Abstract:
In 2007, cloud computing came into public view as a new computing paradigm. It is a technology that grew out of parallel computing, distributed computing, utility computing, and grid computing, and it is also the result of developments in network storage, virtualization, and load balancing. As an Internet-based supercomputing paradigm, cloud computing allows users to dynamically share hardware, software, and data resources. The process of sharing resources inevitably involves the security of network transmission. In order to solve the security problems faced by the cloud-based integration of agricultural network data resources, this study proposes a network security model which promises to be very useful in the near future.
7

Sbalzarini, Ivo F. "Abstractions and Middleware for Petascale Computing and Beyond." International Journal of Distributed Systems and Technologies 1, no. 2 (April 2010): 40–56. http://dx.doi.org/10.4018/jdst.2010040103.

Abstract:
As high-performance computing moves to the petascale and beyond, a number of algorithmic and software challenges need to be addressed. This paper reviews the main performance-limiting factors in today’s high-performance computing software and outlines a possible new programming paradigm to address them. The proposed paradigm is based on abstract parallel data structures and operations that encapsulate much of the complexity of an application, but still make communication overhead explicit. The authors argue that all numerical simulations can be formulated in terms of the presented abstractions, which thus define an abstract semantic specification language for parallel numerical simulations. Simulations defined in this language can automatically be translated to source code containing the appropriate calls to a middleware that implements the underlying abstractions. Finally, the structure and functionality of such a middleware are outlined while demonstrating its feasibility on the example of the parallel particle-mesh library (PPM).
8

LIN, RONG, STEPHAN OLARIU, JAMES L. SCHWING, and JINGYUAN ZHANG. "COMPUTING ON RECONFIGURABLE BUSES—A NEW COMPUTATIONAL PARADIGM." Parallel Processing Letters 04, no. 04 (December 1994): 465–76. http://dx.doi.org/10.1142/s0129626494000430.

Abstract:
Up to now, buses have been used exclusively to ferry data around. The contribution of this work is to show that buses can be used both as topological descriptors and as powerful computational devices. We illustrate the power of this paradigm by designing two fast algorithms for image segmentation and parallel visibility. Our algorithm for image segmentation uses a novel technique involving building a bus around every region of interest in the image. With a binary image pretiled in the natural way on a reconfigurable mesh of size N×N, our segmentation algorithm runs in O(log N) time, improving by a factor of O(log N) over the state of the art. Next, we exhibit a very simple algorithm to solve the parallel visibility problem on an image of size N×N. Our algorithm runs in O(log N) time; the only previously known algorithm for this problem runs in O(log N) time on a hypercube with N processors. To support these algorithms, a set of basic building blocks is developed which are of independent interest. These include solutions to the following problems on a bus of length N: (1) computing the prefix maxima of items stored by the processors on the bus, even if none of the processors knows its rank on the bus; (2) computing the rank of every processor on the bus; (3) electing a leader on a closed bus.
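
For reference, the prefix-maxima building block named in item (1) is shown below as a plain sequential loop; the paper's contribution is computing this primitive in O(log N) time on a reconfigurable bus, which a serial rendering cannot capture.

```python
# The prefix-maxima primitive, written out sequentially for reference;
# the paper computes it in O(log N) time by reconfiguring bus segments.
def prefix_maxima(items):
    out, best = [], float("-inf")
    for x in items:
        best = max(best, x)
        out.append(best)
    return out

print(prefix_maxima([3, 1, 4, 1, 5, 9, 2, 6]))  # [3, 3, 4, 4, 5, 9, 9, 9]
```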
9

Moreno Escobar, Jesús Jaime, Oswaldo Morales Matamoros, Ricardo Tejeida Padilla, Liliana Chanona Hernández, Juan Pablo Francisco Posadas Durán, Ana Karen Pérez Martínez, Ixchel Lina Reyes, and Hugo Quintana Espinosa. "Biomedical Signal Acquisition Using Sensors under the Paradigm of Parallel Computing." Sensors 20, no. 23 (December 7, 2020): 6991. http://dx.doi.org/10.3390/s20236991.

Abstract:
There are several pathologies attacking the central nervous system and diverse therapies for each specific disease. These therapies seek as far as possible to minimize or offset the consequences caused by these types of pathologies and disorders in the patient. Therefore, comprehensive neurological care is delivered through neurorehabilitation therapies, to improve patients' quality of life and facilitate their participation in society. One way to know how neurorehabilitation therapies help patients is by measuring changes in their brain activity by means of electroencephalograms (EEG). EEG data-processing applications used in neuroscience research are known to be highly computing- and data-intensive. Our proposal is an integrated system of electroencephalographic, electrocardiographic, bioacoustic, and digital image acquisition analysis that provides neuroscience experts with tools to estimate the efficiency of a great variety of therapies. The three main axes of this proposal are: parallel or distributed capture, filtering and adaptation of biomedical signals, and synchronization in real sampling epochs. Thus, the present proposal outlines a general system whose main objective is to be a wireless benchmark in the field. In this way, it can acquire biomedical signals and provide analysis tools for measuring brain interactions when the brain is stimulated by an external system during therapies, for example. The system also supports extreme environmental conditions when necessary, which broadens the spectrum of its applications. In addition, sensors can be added or removed depending on the needs of the research, generating a wide range of configurations limited by the number of CPU cores, i.e., the more biosensors, the more CPU cores are required. To validate the proposed integrated system, it is used in a dolphin-assisted therapy with patients with infantile cerebral palsy and obsessive-compulsive disorder, as well as with a neurotypical subject. Event synchronization of sample periods helped isolate the same therapy stimulus and allowed it to be analyzed by tools such as the power spectrum or fractal geometry.
10

Joshi, R. K., and D. J. Ram. "Anonymous remote computing: a paradigm for parallel programming on interconnected workstations." IEEE Transactions on Software Engineering 25, no. 1 (1999): 75–90. http://dx.doi.org/10.1109/32.748919.

11

Nguyen, DuyQuang, and Miguel J. Bagajewicz. "Parallel computing approaches to sensor network design using the value paradigm." Computers & Chemical Engineering 35, no. 6 (June 2011): 1119–34. http://dx.doi.org/10.1016/j.compchemeng.2010.07.014.

12

Nowicki, Marek, Magdalena Ryczkowska, Łukasz Gorski, Michał Szynkiewicz, and Piotr Bała. "PCJ - a Java Library for Heterogenous Parallel Computing." WSEAS TRANSACTIONS ON COMPUTERS 21 (March 23, 2022): 81–87. http://dx.doi.org/10.37394/23205.2022.21.12.

Abstract:
With the wide adoption of multicore and multiprocessor systems, parallel programming became a very important element of computer science. Programming multicore systems is still complicated and far from easy. The difficulties are caused, amongst others, by parallel tools, libraries and programming models which are not easy to use, especially for an inexperienced programmer. In this paper, we present PCJ - a Java library for parallel programming of heterogeneous multicore systems. PCJ adopts the Partitioned Global Address Space (PGAS) paradigm, which makes programming easy. We present the basic functionality of the PCJ library and its usage for the parallelization of selected applications. The scalability of a genetic algorithm implementation is presented. The parallelization of an N-body algorithm implementation with PCJ is also described.
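
PCJ itself is a Java library, but the PGAS idea it adopts can be sketched in a language-neutral way. The sketch below uses Python's multiprocessing shared memory and does not reflect PCJ's actual API: each worker owns one slice of a single global array, and any access to another worker's slice is explicit in the code.

```python
# Language-neutral sketch of the PGAS idea (not PCJ's API): every worker
# owns one slice of a single global array and may read other slices,
# with the remote access visible explicitly in the code.
from multiprocessing import Array, Process

N_WORKERS, SLICE = 4, 3

def worker(rank, shared):
    base = rank * SLICE
    for i in range(SLICE):              # fill the locally-owned slice
        shared[base + i] = rank + i / 10
    left = (rank - 1) % N_WORKERS       # explicit read of a remote slice
    _ = shared[left * SLICE]            # (may race with the neighbour's
                                        # writes; real PGAS runtimes
                                        # provide barriers for this)

if __name__ == "__main__":
    shared = Array("d", N_WORKERS * SLICE)   # the global address space
    procs = [Process(target=worker, args=(r, shared))
             for r in range(N_WORKERS)]
    for p in procs: p.start()
    for p in procs: p.join()
    print(list(shared))   # [0.0, 0.1, 0.2, 1.0, 1.1, 1.2, ...]
```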
13

Seoane, Luís F. "Evolutionary aspects of reservoir computing." Philosophical Transactions of the Royal Society B: Biological Sciences 374, no. 1774 (April 22, 2019): 20180377. http://dx.doi.org/10.1098/rstb.2018.0377.

Abstract:
Reservoir computing (RC) is a powerful computational paradigm that allows high versatility with cheap learning. While other artificial intelligence approaches need exhaustive resources to specify their inner workings, RC is based on a reservoir with highly nonlinear dynamics that does not require a fine tuning of its parts. These dynamics project input signals into high-dimensional spaces, where training linear readouts to extract input features is vastly simplified. Thus, inexpensive learning provides very powerful tools for decision-making, controlling dynamical systems, classification, etc. RC also facilitates solving multiple tasks in parallel, resulting in a high throughput. Existing literature focuses on applications in artificial intelligence and neuroscience. We review this literature from an evolutionary perspective. RC’s versatility makes it a great candidate to solve outstanding problems in biology, which raises relevant questions. Is RC as abundant in nature as its advantages should imply? Has it evolved? Once evolved, can it be easily sustained? Under what circumstances? (In other words, is RC an evolutionarily stable computing paradigm?) To tackle these issues, we introduce a conceptual morphospace that would map computational selective pressures that could select for or against RC and other computing paradigms. This guides a speculative discussion about the questions above and allows us to propose a solid research line that brings together computation and evolution with RC as test model of the proposed hypotheses. This article is part of the theme issue ‘Liquid brains, solid brains: How distributed cognitive architectures process information’.
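
The central claim, that a fixed nonlinear reservoir plus a trained linear readout suffices for temporal tasks, can be made concrete with a minimal echo state network sketch. All hyperparameters below are illustrative choices, not values from the paper.

```python
# Minimal echo-state-network sketch of reservoir computing: a fixed
# random recurrent reservoir projects the input history nonlinearly,
# and only a linear readout is trained (here by least squares).
import numpy as np

rng = np.random.default_rng(0)
n_res, T = 200, 1000
u = rng.uniform(-1, 1, T)                 # input signal
target = np.roll(u, 3)                    # task: recall input 3 steps back

W_in = rng.uniform(-0.5, 0.5, n_res)
W = rng.normal(0, 1, (n_res, n_res))
W *= 0.9 / np.abs(np.linalg.eigvals(W)).max()   # spectral radius < 1

x = np.zeros(n_res)
states = np.empty((T, n_res))
for t in range(T):                        # run the fixed reservoir
    x = np.tanh(W @ x + W_in * u[t])
    states[t] = x

# Train the linear readout on the states, discarding a washout period.
W_out, *_ = np.linalg.lstsq(states[100:], target[100:], rcond=None)
pred = states[100:] @ W_out
print("readout MSE:", np.mean((pred - target[100:]) ** 2))
```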
14

SHAMS, SOHEIL, and JEAN-LUC GAUDIOT. "PARALLEL IMPLEMENTATIONS OF NEURAL NETWORKS." International Journal on Artificial Intelligence Tools 02, no. 04 (December 1993): 557–81. http://dx.doi.org/10.1142/s0218213093000266.

Abstract:
Neural network models have attracted much attention recently by demonstrating their potential at being an effective paradigm for implementing human-like intelligent processing. Neural network models, applied to “real-world” problems, demand high processing rates. Fortunately, neural network models contain several inherently parallel computing structures which can be utilized for high throughput implementations on parallel processing architectures. In this paper we describe the basic computational requirements and the various interconnection structures that are used by neural network models. A number of inherently parallel aspects of neural computing are described in detail along with a description of their specific demands on the supporting parallel processing architecture. The main obstacle in achieving efficient parallel implementations of neural networks is shown to be associated with the difficulty in efficiently supporting the complex and widely differing interconnection structures used by various neural network models. In this paper we survey several proposed implementation techniques organized based on a taxonomy of neural network interconnection structures.
15

Xiao, Wei, Chun Lei Ji, and Jian Dun Li. "Design and Implementation of Massive Data Retrieving Based on Cloud Computing Platform." Applied Mechanics and Materials 303-306 (February 2013): 2235–40. http://dx.doi.org/10.4028/www.scientific.net/amm.303-306.2235.

Abstract:
Considering the low efficiency of massive data retrieval in traditional parallel processing, and taking advantage of the great availability of the cloud computing paradigm, we propose a hybrid solution based on the Map-Reduce model and the distributed computing framework Spark. Moreover, we design and implement this solution in our lab. The results show that the solution can effectively improve the performance of massive data retrieval.
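
The Map-Reduce flow the solution builds on can be rendered in a few lines. The sketch below is a toy word-count in pure Python, not the paper's Spark implementation: map emits key-value pairs in parallel, a shuffle groups them by key, and reduce folds each group.

```python
# Toy rendering of the Map-Reduce flow (the paper targets a Spark
# cluster): map emits (key, value) pairs, shuffle groups them by key,
# and reduce folds each group into one result.
from collections import defaultdict
from multiprocessing import Pool

def map_phase(line):                       # map: emit (word, 1) pairs
    return [(w, 1) for w in line.split()]

def run(lines):
    with Pool() as pool:                   # map tasks run in parallel
        mapped = pool.map(map_phase, lines)
    groups = defaultdict(list)             # shuffle: group values by key
    for pairs in mapped:
        for k, v in pairs:
            groups[k].append(v)
    return {k: sum(vs) for k, vs in groups.items()}   # reduce: fold

if __name__ == "__main__":
    print(run(["big data retrieval", "big cloud", "data data"]))
    # {'big': 2, 'data': 3, 'retrieval': 1, 'cloud': 1}
```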
16

Shang, Yizi, Guiming Lu, Ling Shang, and Guangqian Wang. "Parallel processing on block-based Gauss-Jordan algorithm for desktop grid." Computer Science and Information Systems 8, no. 3 (2011): 739–59. http://dx.doi.org/10.2298/csis100907026s.

Abstract:
Two kinds of parallelism exist in the block-based Gauss-Jordan (BbGJ) algorithm: intra-step and inter-step parallelism. However, the existing parallel paradigm for the BbGJ algorithm, aimed only at intra-step parallelism, can't meet the requirement of simultaneously dispatching as many tasks as possible to the computing nodes of a desktop grid platform exploiting thousands of volunteer computing resources. To overcome this problem, this paper presents a hybrid parallel paradigm for desktop grid platforms, exploiting all the parallelizable parts of the BbGJ algorithm. As is well known, volatility is the key issue of desktop grid platforms, and faults are unavoidable during program execution. So the version of the BbGJ algorithm adapted for desktop grid platforms should take volatility into consideration. To solve this problem, the paper adopts a multi-copy distribution strategy and a multi-queue based task preemption method to ensure that key tasks can be executed on time, and thus that the whole set of tasks can be finished in a shorter period of time.
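
The two kinds of parallelism can be seen directly in the algorithm's structure. The following sketch of block Gauss-Jordan (no pivoting, illustrative only, not the paper's desktop-grid code) marks the inner loop whose row-block updates are mutually independent, which is the intra-step parallelism; the successive values of k form the inter-step dimension.

```python
# Block-based Gauss-Jordan sketch (no pivoting): within one elimination
# step k, the updates of different row blocks i are independent of each
# other (intra-step parallelism); the steps k are the inter-step axis.
import numpy as np

def block_gauss_jordan(A, b, s):
    """Solve A x = b with block size s (s must divide A's order)."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), b.reshape(-1, 1)])
    for k in range(0, n, s):
        piv = np.linalg.inv(M[k:k+s, k:k+s])        # invert pivot block
        M[k:k+s] = piv @ M[k:k+s]                   # normalise pivot rows
        for i in range(0, n, s):                    # independent updates;
            if i != k:                              # each i could run on
                M[i:i+s] -= M[i:i+s, k:k+s] @ M[k:k+s]   # its own node
    return M[:, -1]

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 6)) + 6 * np.eye(6)         # well-conditioned
b = rng.normal(size=6)
print(np.allclose(block_gauss_jordan(A, b, 2), np.linalg.solve(A, b)))
# True
```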
17

SLAWINSKI, JAROSLAW, and VAIDY SUNDERAM. "TOWARDS COMPUTING AS A UTILITY VIA ADAPTIVE MIDDLEWARE: AN EXPERIMENT IN CROSS-PARADIGM EXECUTION." Parallel Processing Letters 23, no. 02 (June 2013): 1340002. http://dx.doi.org/10.1142/s0129626413400021.

Abstract:
Rapid advances in cloud computing have made the vision of utility computing a near-reality, but only in certain domains. For science and engineering parallel or distributed applications, on-demand access to resources within grids and clouds is hampered by two major factors: communication performance and paradigm mismatch issues. We propose a framework for addressing the latter aspect via software adaptations that attempt to reconcile model and interface differences between application needs and resource platforms. Such matching can greatly enhance flexibility in choice of execution platforms — a key characteristic of utility computing — even though they may not be a natural fit or may incur some performance loss. Our design philosophy, middleware components, and experiences from a cross-paradigm experiment are described.
18

Городняя, Лидия Васильевна. "Perspectives of Functional Programming of Parallel Computations." Russian Digital Libraries Journal 24, no. 6 (January 26, 2022): 1090–116. http://dx.doi.org/10.26907/1562-5419-2021-24-6-1090-1116.

Abstract:
The article is devoted to the results of an analysis of modern trends in functional programming, considered as a metaparadigm for solving the problems of organizing parallel computations and multithreaded programs for multiprocessor complexes and distributed systems. Taking into account the multi-paradigm nature of parallel programming, a paradigm analysis of functional programming languages and systems is used. This makes it possible to reduce the complexity of the problems being solved by decomposing programs into autonomously developed components, and to evaluate their similarities and differences. Such considerations are necessary when predicting the course of application processes, as well as when planning the study and organizing the development of programs. There is reason to believe that functional programming has the ability to improve program performance. A variety of paradigmatic characteristics inherent in the preparation and debugging of long-lived parallel computing programs are shown.
19

Turek, Wojciech, Aleksander Byrski, John Hughes, Kevin Hammond, and Marek Zaionc. "Special issue on Parallel and distributed computing based on the functional programming paradigm." Concurrency and Computation: Practice and Experience 30, no. 22 (August 29, 2018): e4842. http://dx.doi.org/10.1002/cpe.4842.

20

Wang, Ziling, Li Luo, Jie Li, Lidan Wang, and Shukai Duan. "Reconfigurable nonvolatile Boolean logic with one-transistor-two-memristor for in-memory computing." Semiconductor Science and Technology 36, no. 12 (November 17, 2021): 125023. http://dx.doi.org/10.1088/1361-6641/ac363b.

Abstract:
In-memory computing is highly expected to break the von Neumann bottleneck and memory wall. The memristor, with its inherent nonvolatile property, is considered a strong candidate to execute this new computing paradigm. In this work, we have presented a reconfigurable nonvolatile logic method based on a one-transistor-two-memristor device structure, inhibiting the sneak path in large-scale crossbar arrays. By merely adjusting the applied voltage signals, all 16 binary Boolean logic functions can be achieved in a single cell. More complex computing tasks including a one-bit parallel full adder and a set-reset latch have also been realized with optimization, showing a simple operation process, high flexibility, and low computational complexity. Circuit verification based on Cadence PSpice simulation is also provided, proving the feasibility of the proposed design. The work in this paper is intended to make progress in constructing architectures for the in-memory computing paradigm.
21

Błażewicz, Jacek, Adrian Moret-Salvador, and Rafał Walkowiak. "Parallel tabu search approaches for two-dimensional cutting." Parallel Processing Letters 14, no. 01 (March 2004): 23–32. http://dx.doi.org/10.1142/s0129626404001684.

Abstract:
A tabu search based approach is studied as a method for solving the two-dimensional irregular cutting problem in parallel. We use and compare different variants of the method and various parallel computing systems. The systems used are based on the message-passing or shared-memory paradigm. Parallel algorithms using both methods of communication are proposed. The efficiency of computer system utilization is discussed in the context of the unpredictable time requirements of parallel tasks. We present results for different variants of the method together with efficiency measures for the parallel implementations, where IBM SP2 and CRAY T3E systems, respectively, have been used.
22

Sardar, Tanvir Habib, and Ahmed Rimaz Faizabadi. "Parallelization and analysis of selected numerical algorithms using OpenMP and Pluto on symmetric multiprocessing machine." Data Technologies and Applications 53, no. 1 (February 4, 2019): 20–32. http://dx.doi.org/10.1108/dta-05-2018-0040.

Abstract:
Purpose: In recent years, there has been a gradual shift from sequential to parallel computing. Nowadays, nearly all computers have multicore processors. To exploit the available cores, parallel computing becomes necessary, increasing speed by processing huge amounts of data in real time. The purpose of this paper is to parallelize a set of well-known programs using different techniques to determine the best way to parallelize each program examined. Design/methodology/approach: A set of numeric algorithms is parallelized by hand parallelization using OpenMP and by auto-parallelization using the Pluto tool. Findings: The work discovers that a few of the algorithms are well suited to auto-parallelization using the Pluto tool, but many of the algorithms execute more efficiently with OpenMP hand parallelization. Originality/value: The work provides an original study of parallelization using the OpenMP programming paradigm and the Pluto tool.
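
OpenMP and Pluto operate on C code, but the effect of hand parallelization can be shown with a language-neutral analogue. The sketch below splits the iteration space of an embarrassingly parallel loop across workers, much as a '#pragma omp parallel for' schedule would; it is an illustration, not the paper's benchmark code.

```python
# Language-neutral analogue of hand-parallelizing a numeric loop: the
# iteration space is split into contiguous chunks, one per worker,
# mirroring a static OpenMP parallel-for schedule.
from concurrent.futures import ProcessPoolExecutor
import math

def chunk_sum(bounds):
    lo, hi = bounds
    return sum(math.sqrt(i) * math.sin(i) for i in range(lo, hi))

def parallel_loop(n, workers=4):
    step = n // workers
    chunks = [(w * step, n if w == workers - 1 else (w + 1) * step)
              for w in range(workers)]
    with ProcessPoolExecutor(max_workers=workers) as ex:
        return sum(ex.map(chunk_sum, chunks))   # combine partial sums

if __name__ == "__main__":
    print(parallel_loop(1_000_000))
```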
23

Al-Shafei, Ahmed, Hamidreza Zareipour, and Yankai Cao. "High-Performance and Parallel Computing Techniques Review: Applications, Challenges and Potentials to Support Net-Zero Transition of Future Grids." Energies 15, no. 22 (November 18, 2022): 8668. http://dx.doi.org/10.3390/en15228668.

Abstract:
The transition towards net-zero emissions is inevitable for humanity’s future. Of all the sectors, electrical energy systems emit the most emissions. This urgently requires the witnessed accelerating technological landscape to transition towards an emission-free smart grid. It involves massive integration of intermittent wind and solar-powered resources into future power grids. Additionally, new paradigms such as large-scale integration of distributed resources into the grid, proliferation of Internet of Things (IoT) technologies, and electrification of different sectors are envisioned as essential enablers for a net-zero future. However, these changes will lead to unprecedented size, complexity and data of the planning and operation problems of future grids. It is thus important to discuss and consider High Performance Computing (HPC), parallel computing, and cloud computing prospects in any future electrical energy studies. This article recounts the dawn of parallel computation in power system studies, providing a thorough history and paradigm background for the reader, leading to the most impactful recent contributions. The reviews are split into Central Processing Unit (CPU) based, Graphical Processing Unit (GPU) based, and Cloud-based studies and smart grid applications. The state-of-the-art is also discussed, highlighting the issue of standardization and the future of the field. The reviewed papers are predominantly focused on classical imperishable electrical system problems. This indicates the need for further research on parallel and HPC approaches applied to future smarter grid challenges, particularly to the integration of renewable energy into the smart grid.
24

A. J., Umbarkar, and Joshi M. S. "REVIEW OF PARALLEL GENETIC ALGORITHM BASED ON COMPUTING PARADIGM AND DIVERSITY IN SEARCH SPACE." ICTACT Journal on Soft Computing 03, no. 04 (July 1, 2013): 615–22. http://dx.doi.org/10.21917/ijsc.2013.0089.

25

Attiya, Ibrahim, Laith Abualigah, Doaa Elsadek, Samia Allaoua Chelloug, and Mohamed Abd Elaziz. "An Intelligent Chimp Optimizer for Scheduling of IoT Application Tasks in Fog Computing." Mathematics 10, no. 7 (March 29, 2022): 1100. http://dx.doi.org/10.3390/math10071100.

Abstract:
The cloud computing paradigm is evolving rapidly to address the challenges of new emerging paradigms, such as the Internet of Things (IoT) and fog computing. As a result, cloud services usage is increasing dramatically with the recent growth of IoT-based applications. To successfully fulfill application requirements while efficiently harnessing cloud computing power, intelligent scheduling approaches are required to optimize the scheduling of IoT application tasks on computing resources. In this paper, the chimp optimization algorithm (ChOA) is incorporated with the marine predators algorithm (MPA) and disruption operator to determine the optimal solution to IoT applications’ task scheduling. The developed algorithm, called CHMPAD, aims to avoid entrapment in the local optima and improve the exploitation capability of the basic ChOA as its main drawbacks. Experiments are conducted using synthetic and real workloads collected from the Parallel Workload Archive to demonstrate the applicability and efficiency of the presented CHMPAD method. The simulation findings reveal that CHMPAD can achieve average makespan time improvements of 1.12–43.20% (for synthetic workloads), 1.00–43.43% (for NASA iPSC workloads), and 2.75–42.53% (for HPC2N workloads) over peer scheduling algorithms. Further, our evaluation results suggest that our proposal can improve the throughput performance of fog computing.
26

Guzmán, Eduardo, Mario Vázquez, David Del Valle, and Paulino Pérez-Rodríguez. "Artificial Neuronal Networks: A Bayesian Approach Using Parallel Computing." Revista Colombiana de Estadística 41, no. 2 (July 1, 2018): 173–89. http://dx.doi.org/10.15446/rce.v41n2.55250.

Abstract:
An Artificial Neural Network (ANN) is a learning and automatic processing paradigm inspired by the biological behavior of neurons and the structure of the brain. The brain is a complex system; its basic processing units are the neurons, which are distributed massively throughout the brain and share multiple connections between them. ANNs try to emulate some characteristics of humans and can be thought of as intelligent systems that perform some tasks differently than a conventional computer does. ANNs can be used to perform complex activities, for example: pattern recognition and classification, weather prediction, genetic value prediction, etc. The algorithms used to train an ANN are in general complex, so there is a need for alternatives that significantly reduce the time required for training. In this work, we present an algorithm based on the 'divide and conquer' strategy which allows training an ANN with a single hidden layer. Some of the subproblems of the general training algorithm are solved using parallel computing techniques, which improves the performance of the resulting application. The proposed algorithm was implemented using the C++ programming language and the Open MPI and ScaLAPACK libraries. We present some application examples and assess the application's performance. The results show that it is possible to significantly reduce the time necessary to execute the program that implements the algorithm to train the ANN.
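
A hedged sketch of the divide-and-conquer idea follows (the paper's actual implementation is C++ with Open MPI and ScaLAPACK, and this is not its code): the training set is split into shards, each worker computes the gradient of a one-hidden-layer network on its shard, and the partial gradients are combined.

```python
# Data-parallel gradient sketch for a one-hidden-layer network: each
# worker handles one shard of the training set ("divide"), and the
# partial gradients are averaged afterwards ("conquer").
import numpy as np
from multiprocessing import Pool

def shard_gradient(args):
    X, y, W1, w2 = args
    h = np.tanh(X @ W1)                       # hidden layer
    err = h @ w2 - y                          # linear output error
    g_w2 = h.T @ err / len(y)
    g_W1 = X.T @ ((err[:, None] * w2) * (1 - h**2)) / len(y)
    return g_W1, g_w2

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X, y = rng.normal(size=(1000, 5)), rng.normal(size=1000)
    W1, w2 = rng.normal(size=(5, 8)) * 0.1, rng.normal(size=8) * 0.1
    shards = [(X[i::4], y[i::4], W1, w2) for i in range(4)]
    with Pool(4) as pool:
        parts = pool.map(shard_gradient, shards)
    g_W1 = sum(p[0] for p in parts) / 4       # combine partial gradients
    g_w2 = sum(p[1] for p in parts) / 4
    print(g_W1.shape, g_w2.shape)             # (5, 8) (8,)
```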
27

CORDASCO, GENNARO, and ARNOLD L. ROSENBERG. "ON SCHEDULING SERIES-PARALLEL DAGs TO MAXIMIZE AREA." International Journal of Foundations of Computer Science 25, no. 05 (August 2014): 597–621. http://dx.doi.org/10.1142/s0129054114500245.

Abstract:
The AREA of a schedule for executing DAGs is the average number of DAG-chores that are eligible for execution at each step of the computation. AREA maximization is a new optimization goal for schedules that execute DAGs within computational environments, such as Internet-based computing, clouds, and volunteer computing projects, that are dynamically heterogeneous, in the sense that the environments' constituent computers can change their effective powers at times and in ways that are not predictable. This paper is motivated by the thesis that, within dynamically heterogeneous environments, DAG-schedules that have larger AREAs execute a computation-DAG with smaller completion time under many circumstances; this thesis is supported by preliminary simulation-based experiments. While every DAG admits an AREA-maximizing schedule, it is likely computationally difficult to find such a schedule for an arbitrary DAG. Earlier work has shown how to craft AREA-maximizing schedules efficiently for a number of families of DAGs whose structures are reminiscent of many scientific computations. The current paper extends this work by showing how to efficiently craft AREA-maximizing schedules for series-parallel DAGs, a family that models a multithreading computing paradigm. The techniques for crafting these schedules promise to apply also to other large families of recursively defined DAGs. Moreover, the ability to derive these schedules efficiently leads to an efficient AREA-oriented heuristic for scheduling arbitrary DAGs.
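
The AREA measure itself is easy to compute for a given schedule, which the sketch below does for a tiny series-parallel DAG (one node executed per step; all names are invented for the example): at each step, count the nodes whose parents are all done but which have not yet run, then average over the steps.

```python
# Sketch of the AREA quality measure from the abstract: the average,
# over the steps of a schedule, of the number of eligible nodes
# (all parents done, node itself not yet executed).
def area(schedule, parents):
    done, total = set(), 0
    for node in schedule:
        eligible = [v for v in parents
                    if v not in done and parents[v] <= done]
        total += len(eligible)
        done.add(node)
    return total / len(schedule)

# A small series-parallel DAG: s -> {a, b} -> t
parents = {"s": set(), "a": {"s"}, "b": {"s"}, "t": {"a", "b"}}
print(area(["s", "a", "b", "t"], parents))   # steps see 1,2,1,1 -> 1.25
```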
28

Mazhar, H., T. Heyn, A. Pazouki, D. Melanz, A. Seidl, A. Bartholomew, A. Tasora, and D. Negrut. "CHRONO: a parallel multi-physics library for rigid-body, flexible-body, and fluid dynamics." Mechanical Sciences 4, no. 1 (February 12, 2013): 49–64. http://dx.doi.org/10.5194/ms-4-49-2013.

Abstract:
Abstract. The last decade witnessed a manifest shift in the microprocessor industry towards chip designs that promote parallel computing. Until recently the privilege of a select group of large research centers, Teraflop computing is becoming a commodity owing to inexpensive GPU cards and multi to many-core x86 processors. This paradigm shift towards large scale parallel computing has been leveraged in CHRONO, a freely available C++ multi-physics simulation package. CHRONO is made up of a collection of loosely coupled components that facilitate different aspects of multi-physics modeling, simulation, and visualization. This contribution provides an overview of CHRONO::Engine, CHRONO::Flex, CHRONO::Fluid, and CHRONO::Render, which are modules that can capitalize on the processing power of hundreds of parallel processors. Problems that can be tackled in CHRONO include but are not limited to granular material dynamics, tangled large flexible structures with self contact, particulate flows, and tracked vehicle mobility. The paper presents an overview of each of these modules and illustrates through several examples the potential of this multi-physics library.
29

Khakhanova, A., S. Chumachenko, D. Rakhlis, І. Hahanov, and V. Hahanov. "QUANTUM DIGITAL-ANALOGUE COMPUTING." Radio Electronics, Computer Science, Control, no. 4 (December 4, 2022): 40. http://dx.doi.org/10.15588/1607-3274-2022-4-4.

Abstract:
Context. Nature is the relation among processes and phenomena: nothing exists in the universe without relations. A computer executes transactions of relations between data with the help of control and execution mechanisms. Quantum relations are a superposition of particles and their states. Superposition and entanglement are equivalent concepts: entanglement is a non-local superposition of deterministic states. A quantum computer executes unconditional transactions of relations between qubit data; it is an analog device for the parallel solution of combinatorial problems. Practically oriented definitions of quantum computer concepts are the path to developing scalable quantum parallel algorithms for solving combinatorial problems. Any algorithm can be reduced to a sequence of operations without conditions, because any truth table is a collection of a complete system of condition-states. Any sequence of actions can always be reduced to one parallel operation. Conditions and sequences arise only when the developer wants to use previously created primitive constructs to build an always non-optimal computing unit. A paradigm for quantum computer creation based on photonic transactions on the electrons of an atom may exclude the use of quantum logic. The evolutionary path from the classical computer to the quantum one is: "memory-address-transaction" (MAT) → "electron-address-transaction" → "electron-address-quantaction" (EAQ) → state-superposition-logic. The meeting point of classical and quantum computers is photon transactions on the structure of electrons. Everything that is calculated on a quantum computer can be calculated in parallel on a classical one at the cost of memory redundancy. The given example is a memory-driven algorithm for modeling digital products based on qubit-vector descriptions of functionality, which significantly boosts computing processes through the parallel execution of logical operations. Objective. Simulation of correct SoC-component behavior based on a vector representation of the logic, and formation of a computing approach based on the superposition of classical, quantum, and analog computing processes, which in its development should rely on technological qubit, tabular, and vector data structures for the parallel solution of combinatorial problems. Method. MAT-computing implements any algorithm through read-write transactions in memory. Qubit-vector models for describing functionalities differ from known truth tables in the compactness of the description and their suitability for implementing parallel algorithms for the synthesis and analysis of digital devices and SoC components. Results. 1) A metric of technological data structures, oriented towards parallel troubleshooting in digital systems using two logical vector operations, was proposed for the first time. 2) A metric of relations between the individual components of a quantum computer, allowing the organization of a quantum deterministic computer, was further developed. 3) Quantum architectural solutions that allow solving coverage problems in a quasi-parallel mode were proposed for the first time. 4) Architectural solutions based on analog-to-digital computing, which can be used to solve problems of the parallel analysis of digital systems, were further developed. 5) Vector-qubit structures of logic data that allow quasi-parallel simulation of digital circuits were proposed. Conclusions. Qubit models, quantum methods, and combinatorial algorithms for the technical diagnosis of digital devices have been implemented, which can significantly (up to 25%) reduce the time of test synthesis, deductive modeling of faulty and correct behavior, and the search for defective states by introducing the innovative idea of using qubit-vector data structures for describing logical components. Comparative assessments show an increase in the efficiency of algorithms for modeling digital devices with qubit models and methods compared to tabular ones. The superposition of a classical, quantum, and analog computer is presented in an integrated fashion, which makes it possible to find better solutions for recognition and decision making.
30

Iacono-Manno, Carmelo Marcello, Marco Fargetta, Roberto Barbera, Alberto Falzone, Giuseppe Andronico, Salvatore Monforte, Annamaria Muoio, et al. "The Sicilian Grid Infrastructure for High Performance Computing." International Journal of Distributed Systems and Technologies 1, no. 1 (January 2010): 40–54. http://dx.doi.org/10.4018/jdst.2010090803.

Abstract:
The conjugation of High Performance Computing (HPC) and the Grid paradigm with applications based on commercial software is one of the major challenges of today's e-Infrastructures. Several research communities from either industry or academia need to run highly parallel applications based on licensed software over hundreds of CPU cores; satisfactory fulfillment of such requests is one of the keys to the penetration of this computing paradigm into the industrial world and to the sustainability of Grid infrastructures. This problem has been tackled in the context of the PI2S2 project, which created a regional e-Infrastructure in Sicily, the first in Italy over a regional area. The present article describes the features added in order to integrate an HPC facility into the PI2S2 Grid infrastructure: the adoption of the InfiniBand low-latency interconnect, the gLite middleware extended to support MPI/MPI2 jobs, the newly developed license server, and the specific scheduling policy adopted. Moreover, it shows the results of some relevant use cases from computational fluid dynamics (Fluent, OpenFOAM), chemistry (GAMESS), astrophysics (Flash), and bioinformatics (ClustalW).
31

Turton, I., and S. Openshaw. "High-Performance Computing and Geography: Developments, Issues, and Case Studies." Environment and Planning A: Economy and Space 30, no. 10 (October 1998): 1839–56. http://dx.doi.org/10.1068/a301839.

Abstract:
In this paper we outline some of the results that were obtained by the application of a Cray T3D parallel supercomputer to human geography problems. We emphasise the fundamental importance of high-performance computing (HPC) as a future relevant paradigm for doing geography. We offer an introduction to recent developments and illustrate how new computational intelligence technologies can start to be used to make use of opportunities created by data riches from geographic information systems, artificial intelligence tools, and HPC in geography.
32

Sittig, Dean F., David Foulser, Nicholas Carriero, George McCorkle, and Perry L. Miller. "A parallel computing approach to genetic sequence comparison: The master-worker paradigm with interworker communication." Computers and Biomedical Research 24, no. 2 (April 1991): 152–69. http://dx.doi.org/10.1016/0010-4809(91)90027-t.

33

Symes, Mark D., and Leroy Cronin. "The Crystal Computer - Computing with Inorganic Cellular Frameworks and Nets." International Journal of Nanotechnology and Molecular Computation 3, no. 1 (January 2011): 24–34. http://dx.doi.org/10.4018/jnmc.2011010103.

Abstract:
The enormous potential of parallel computing has led to the first prototype devices being constructed. However, all the examples to date rely on complicated chemical and/or physical manipulations, and hence do not lend themselves to the kind of widespread investigation necessary to advance the field. This article presents a new paradigm for parallel computing: the use of solid, single crystalline materials as cellular automata suggesting the idea of the “Crystal Computer,” now possible due to a new class of crystalline cellular materials that undergo single-crystal-to-single-crystal (SC-SC) oxidation and reduction (REDOX) reactions. Two avenues are proposed for investigation: reversible single-crystal to single-crystal electronic transformations and solid-state spin transfer within spin-crossover complexes. Both schemes allow computation to occur in three dimensions, within cheap and easy to assemble materials and using commonplace techniques for input and readout.
34

Huang, Wei, Jianzhong Zhou, and Dongying Zhang. "On-the-Fly Fusion of Remotely-Sensed Big Data Using an Elastic Computing Paradigm with a Containerized Spark Engine on Kubernetes." Sensors 21, no. 9 (April 23, 2021): 2971. http://dx.doi.org/10.3390/s21092971.

Abstract:
Remotely-sensed satellite image fusion is indispensable for the generation of long-term gap-free Earth observation data. While cloud computing (CC) provides the big picture for RS big data (RSBD), the fundamental question of the efficient fusion of RSBD on CC platforms has not yet been settled. To this end, we propose a lightweight cloud-native framework for the elastic processing of RSBD in this study. With the scaling mechanisms provided by both the Infrastructure as a Service (IaaS) and Platform as a Services (PaaS) of CC, the Spark-on-Kubernetes operator model running in the framework can enhance the efficiency of Spark-based algorithms without considering bottlenecks such as task latency caused by an unbalanced workload, and can ease the burden to tune the performance parameters for their parallel algorithms. Internally, we propose a task scheduling mechanism (TSM) to dynamically change the Spark executor pods’ affinities to the computing hosts. The TSM learns the workload of a computing host. Learning from the ratio between the number of completed and failed tasks on a computing host, the TSM dispatches Spark executor pods to newer and less-overwhelmed computing hosts. In order to illustrate the advantage, we implement a parallel enhanced spatial and temporal adaptive reflectance fusion model (PESTARFM) to enable the efficient fusion of big RS images with a Spark aggregation function. We construct an OpenStack cloud computing environment to test the usability of the framework. According to the experiments, TSM can improve the performance of the PESTARFM using only PaaS scaling to about 11.7%. When using both the IaaS and PaaS scaling, the maximum performance gain with the TSM can be even greater than 13.6%. The fusion of such big Sentinel and PlanetScope images requires less than 4 min in the experimental environment.
35

Li, Peng, Jonathan C. Beard, and Jeremy D. Buhler. "Deadlock-free buffer configuration for stream computing." International Journal of High Performance Computing Applications 31, no. 5 (December 20, 2016): 441–50. http://dx.doi.org/10.1177/1094342016675679.

Abstract:
Stream computing is a popular paradigm for parallel and distributed computing, where compute nodes are connected by first-in first-out data channels. Each channel can be considered as a concatenation of several data buffers, including an output buffer for the sender and an input buffer for the receiver. The configuration of buffer sizes impacts the performance as well as the correctness of the application. In this article, we focus on application deadlocks that are caused by incorrect configuration of buffer sizes. We describe three types of deadlock in streaming applications, categorized by how they can be created. To avoid them, we first prove necessary and sufficient conditions for deadlock-free computations; then based on the theorems, we propose both compile-time and runtime solutions for deadlock avoidance.
36

Di Modica, Giuseppe, and Orazio Tomarchio. "A Hierarchical Hadoop Framework to Process Geo-Distributed Big Data." Big Data and Cognitive Computing 6, no. 1 (January 6, 2022): 5. http://dx.doi.org/10.3390/bdcc6010005.

Abstract:
In the past twenty years, we have witnessed an unprecedented production of data worldwide that has generated a growing demand for computing resources and has stimulated the design of computing paradigms and software tools to efficiently and quickly obtain insights from such Big Data. State-of-the-art parallel computing techniques such as MapReduce guarantee high performance in scenarios where the involved computing nodes are equally sized and clustered via broadband network links, and the data are co-located with the cluster of nodes. Unfortunately, the mentioned techniques have proven ineffective in geographically distributed scenarios, i.e., computing contexts where nodes and data are geographically distributed across multiple distant data centers. In the literature, researchers have proposed variants of the MapReduce paradigm that obtain awareness of the constraints imposed in those scenarios (such as the imbalance of node computing power and of interconnecting links) to enforce smart task scheduling strategies. We have designed a hierarchical computing framework in which a context-aware scheduler orchestrates computing tasks that leverage the potential of the vanilla Hadoop framework within each data center taking part in the computation. In this work, after presenting the features of the developed framework, we advocate the opportunity of fragmenting the data in a smart way so that the scheduler produces a fairer distribution of the workload among the computing tasks. To prove the concept, we implemented a software prototype of the framework and ran several experiments on a small-scale testbed. Test results are discussed in the last part of the paper.
37

Lapegna, Marco, Walter Balzano, Norbert Meyer, and Diego Romano. "Clustering Algorithms on Low-Power and High-Performance Devices for Edge Computing Environments." Sensors 21, no. 16 (August 10, 2021): 5395. http://dx.doi.org/10.3390/s21165395.

Abstract:
The synergy between Artificial Intelligence and the Edge Computing paradigm promises to transfer decision-making processes to the periphery of sensor networks without the involvement of central data servers. For this reason, we recently witnessed an impetuous development of devices that integrate sensors and computing resources in a single board to process data directly on the collection place. Due to the particular context where they are used, the main feature of these boards is the reduced energy consumption, even if they do not exhibit absolute computing powers comparable to modern high-end CPUs. Among the most popular Artificial Intelligence techniques, clustering algorithms are practical tools for discovering correlations or affinities within data collected in large datasets, but a parallel implementation is an essential requirement because of their high computational cost. Therefore, in the present work, we investigate how to implement clustering algorithms on parallel and low-energy devices for edge computing environments. In particular, we present the experiments related to two devices with different features: the quad-core UDOO X86 Advanced+ board and the GPU-based NVIDIA Jetson Nano board, evaluating them from the performance and the energy consumption points of view. The experiments show that they realize a more favorable trade-off between these two requirements than other high-end computing devices.
38

Sheetal, Annabathula Phani, Ravi Teja Bhima, Radha Karampudi, and Srisailapu D. Vara Prasad. "Load Balancing and Parallel Computation Model for Performance and Accuracy over the Cluster of Nodes." Ingénierie des systèmes d information 27, no. 2 (April 30, 2022): 343–48. http://dx.doi.org/10.18280/isi.270219.

Abstract:
Cloud computing is an Internet-based computing paradigm that has contributed to the rapid advancement of communication technology by providing services to clients of assorted kinds through online computing resources, in terms of both hardware and software applications as well as software development, testing, and platform tools. Large-scale heterogeneous distributed computing environments offer the promise of access to a huge quantity of computing resources at a comparatively low price. In order to reduce software development and deployment effort in such complex environments, high-level parallel programming languages exist that must be supported by sophisticated operating techniques. The anticipated uptake of cloud computing brings numerous advantages for consumers in terms of cost and flexibility. Building on well-established research in Internet solutions, networks and utility computing, virtualization, and so on, Service-Oriented Architectures and the Internet of Services (IoS) have implications for a wide range of technological issues such as parallel computing and load balancing, as well as high availability and scalability. Effective load balancing methods are essential to solving these issues. In this article we propose a load balancing method called the Adaptive Task Load Model (ATLM), and on top of it we develop an Adaptive Parallel Distributed computing Model (ADPM). ADPM employs a more flexible synchronization approach to cut down the time spent in synchronous operations while still maintaining the model's integrity, and it applies the ATLM load balancing technique to solve the straggler problem caused by the performance disparity between nodes, ensuring model correctness. The results indicate that combining ADPM and ATLM improves training efficiency without compromising model correctness.
APA, Harvard, Vancouver, ISO, and other styles
39

AlShathri, Samah Ibrahim, Samia Allaoua Chelloug, and Dina S. M. Hassan. "Parallel Meta-Heuristics for Solving Dynamic Offloading in Fog Computing." Mathematics 10, no. 8 (April 11, 2022): 1258. http://dx.doi.org/10.3390/math10081258.

Full text
Abstract:
The internet of things (IoT) concept has been extensively investigated in many modern smart applications, which enable a set of sensors to either process the collected data locally or send them to the cloud for remote processing. Unfortunately, cloud datacenters are located far away from IoT devices, and consequently, the transmission of IoT data may be delayed. In this paper, we investigate fog computing, a new paradigm that overcomes many issues of cloud computing. More importantly, dynamic task offloading in fog computing is a challenging problem that requires an optimal decision for processing the tasks generated in each time slot. Thus, exact optimization methods based on the Lyapunov function have been widely used for solving dynamic offloading, which represents an NP-hard problem. To overcome the scalability issue of exact optimization techniques, we have explored well-known population-based meta-heuristics for optimizing the offloading process of a set of dynamic tasks using Orthogonal Frequency Division Multiplexing (OFDM) communication. Hence, a parallel multi-threading framework is proposed for generating the optimal offloading solution while selecting the best sub-carrier for each offloaded task. More importantly, our contribution associates a thread with each IoT device and generates a population of random solutions. Next, each population is updated and evaluated according to the proposed fitness function, which considers a tradeoff between delay and energy consumption. Upon the arrival of new tasks at each time slot, an evaluation is performed to maintain some individuals of the previous population while generating new individuals based on some criteria. Our results have been compared to the results achieved using Lyapunov optimization. They demonstrate the convergence of the fitness function, the scalability of the parallel Particle Swarm Optimization (PSO) approach, and the performance in terms of the offline error and the execution cost.
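
For reference, the canonical single-particle PSO update that each thread in such a framework would run is sketched below in C++; the inertia and acceleration coefficients are common defaults, and the delay/energy fitness itself is left abstract, since the paper's exact formulation is not reproduced here.

    #include <cstddef>
    #include <random>
    #include <vector>

    struct Particle {
        std::vector<double> x, v, best_x;     // position, velocity, personal best
        double best_f;
    };

    void pso_step(Particle& p, const std::vector<double>& global_best,
                  std::mt19937& rng,
                  double w = 0.7, double c1 = 1.5, double c2 = 1.5) {
        std::uniform_real_distribution<double> u(0.0, 1.0);
        for (std::size_t j = 0; j < p.x.size(); ++j) {
            p.v[j] = w * p.v[j]
                   + c1 * u(rng) * (p.best_x[j] - p.x[j])      // cognitive pull
                   + c2 * u(rng) * (global_best[j] - p.x[j]);  // social pull
            p.x[j] += p.v[j];
        }
    }
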
APA, Harvard, Vancouver, ISO, and other styles
40

HERRMANN, CHRISTOPH A., and CHRISTIAN LENGAUER. "USING METAPROGRAMMING TO PARALLELIZE FUNCTIONAL SPECIFICATIONS." Parallel Processing Letters 12, no. 02 (June 2002): 193–210. http://dx.doi.org/10.1142/s0129626402000926.

Full text
Abstract:
Metaprogramming is a paradigm for enhancing a general-purpose programming language with features catering for a special-purpose application domain, without a need for a reimplementation of the language. In a staged compilation, the special-purpose features are translated and optimised by a domain-specific preprocessor, which hands over to the general-purpose compiler for translation of the domain-independent part of the program. The domain we work in is high-performance parallel computing. We use metaprogramming to enhance the functional language Haskell with features for the efficient, parallel implementation of certain computational patterns, called skeletons.
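
The paper's skeletons are Haskell constructs expanded by a domain-specific preprocessor; as a loose C++ analogue only, the template below captures the essence of a map skeleton: the pattern owns the parallel structure, while the caller supplies just the element-wise function.

    #include <vector>

    template <typename T, typename F>
    std::vector<T> map_skeleton(const std::vector<T>& in, F f) {
        std::vector<T> out(in.size());
        #pragma omp parallel for              // the skeleton owns the parallelism
        for (long i = 0; i < (long)in.size(); ++i)
            out[i] = f(in[i]);                // the user supplies only f
        return out;
    }

    // usage: auto squares = map_skeleton(xs, [](double x) { return x * x; });
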
APA, Harvard, Vancouver, ISO, and other styles
41

Huang, Lan, Teng Gao, Dalin Li, Zihao Wang, and Kangping Wang. "A Highly Configurable High-Level Synthesis Functional Pattern Library." Electronics 10, no. 5 (February 25, 2021): 532. http://dx.doi.org/10.3390/electronics10050532.

Full text
Abstract:
FPGA has recently played an increasingly important role in heterogeneous computing, but Register Transfer Level design flows are not only inefficient in design, but also require designers to be familiar with the circuit architecture. High-level synthesis (HLS) allows developers to design FPGA circuits more efficiently with a more familiar programming language, a higher level of abstraction, and automatic adaptation of timing constraints. When using HLS tools, such as Xilinx Vivado HLS, specific design patterns and techniques are required in order to create high-performance circuits. Moreover, designing efficient concurrency and data flow structures requires a deep understanding of the hardware, imposing additional learning costs on programmers. In this paper, we propose a functional pattern library based on the MapReduce model, implemented with C++ templates, which can quickly realize high-performance parallel pipelined computing models on FPGA from a few simple parameters. Using this pattern library allows flexible adaptation of parallel and flow structures in algorithms, which greatly improves coding efficiency. The contributions of this paper are as follows. (1) Four standard functional operators suitable for hardware parallel computing are defined. (2) Functional concurrent programming patterns are described based on C++ templates and Xilinx HLS. (3) The efficiency of this programming paradigm is verified with two algorithms of different complexity.
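
A minimal sketch in the spirit of such a library is shown below (the interface is our assumption, not the library's actual API): a C++ template fixes a pipelined map-reduce loop for Vivado HLS, while the map and reduce behaviors arrive as functors.

    template <int N, typename Tin, typename Tout, typename Map, typename Reduce>
    Tout map_reduce(const Tin in[N], Tout acc, Map m, Reduce r) {
        for (int i = 0; i < N; ++i) {
            #pragma HLS PIPELINE II=1         // one element per clock once the pipe fills
            acc = r(acc, m(in[i]));
        }
        return acc;
    }

    // usage sketch: sum of squares over a 1024-element buffer
    // int s = map_reduce<1024>(buf, 0,
    //                          [](int x) { return x * x; },
    //                          [](int a, int b) { return a + b; });
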
APA, Harvard, Vancouver, ISO, and other styles
42

LUO, JUN, and SANGUTHEVAR RAJASEKARAN. "PARALLIZING 1-DIMENSIONAL ESTUARINE MODEL." International Journal of Foundations of Computer Science 15, no. 06 (December 2004): 809–21. http://dx.doi.org/10.1142/s0129054104002765.

Full text
Abstract:
Wave simulation is an important problem in engineering, and wave simulation models play significant roles in environmental protection. Wave simulations have thus far been done mostly serially. In order to meet the demand for increased spatial and temporal resolution and uncertainty analysis in environmental models for ecological assessments and water resources management, it is essential to develop high-performance hydrodynamics and water quality models using parallel techniques. In this paper, algorithms for parallelizing 1-D wave simulation models are proposed. In particular, we introduce a paradigm called Less Talk for reducing communication costs in parallel computing. Our wave models have been implemented using Parallel Virtual Machine (PVM) and multithreading. The experiments show that the algorithms employing Less Talk are efficient and that the parallel wave simulation models achieve excellent speedup.
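
The paper's implementation uses PVM, and its Less Talk optimizations are not reproduced here; the MPI-style C++ sketch below only illustrates the baseline ghost-cell exchange of a 1-D domain decomposition, i.e. the per-step communication that such a scheme tries to reduce.

    #include <mpi.h>
    #include <vector>

    void exchange_ghosts(std::vector<double>& u, int rank, int nprocs) {
        int left  = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;
        int n = (int)u.size();                // u[0] and u[n-1] are ghost cells
        MPI_Sendrecv(&u[1],     1, MPI_DOUBLE, left,  0,   // send left edge, receive right ghost
                     &u[n - 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[n - 2], 1, MPI_DOUBLE, right, 1,   // send right edge, receive left ghost
                     &u[0],     1, MPI_DOUBLE, left,  1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
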
APA, Harvard, Vancouver, ISO, and other styles
43

Epicoco, Italo, Silvia Mocavero, Andrew R. Porter, Stephen M. Pickles, Mike Ashworth, and Giovanni Aloisio. "Hybridisation strategies and data structures for the NEMO ocean model." International Journal of High Performance Computing Applications 32, no. 6 (January 29, 2017): 864–81. http://dx.doi.org/10.1177/1094342016684930.

Full text
Abstract:
This work describes the introduction of a second level of parallelism based on the OpenMP shared memory paradigm to NEMO, one of the most widely used ocean models in the European climate community. Although the existing parallelisation scheme in NEMO, based on the MPI paradigm, has served it well for many years, it is becoming unsuited to current high-performance computing architectures due to their increasing tendency to have fat nodes containing tens of compute cores. Three different parallel approaches for introducing OpenMP are presented, discussed and compared on several platforms. Finally, we also consider the effect on performance of the data layout employed in NEMO.
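
The generic shape of the two-level scheme (not NEMO code): MPI decomposes the ocean grid across processes, and within each subdomain OpenMP threads split the loop nest, as in this illustrative stencil update.

    // Called by one MPI rank on its own subdomain; in/out are ni x nj fields.
    void step_subdomain(const double* in, double* out, int ni, int nj) {
        #pragma omp parallel for collapse(2)  // second level: threads within one MPI rank
        for (int j = 1; j < nj - 1; ++j)
            for (int i = 1; i < ni - 1; ++i)
                out[j * ni + i] = 0.25 * (in[j * ni + i - 1] + in[j * ni + i + 1]
                                        + in[(j - 1) * ni + i] + in[(j + 1) * ni + i]);
    }
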
APA, Harvard, Vancouver, ISO, and other styles
44

Burdonov, Igor Borisovich, Nina Vladimirovna Yevtushenko, and Alexander Sergeevitch Kossatchev. "Implementation of distributed and parallel computing in the SDN network." Proceedings of the Institute for System Programming of the RAS 34, no. 3 (2022): 159–72. http://dx.doi.org/10.15514/ispras-2022-34(3)-11.

Full text
Abstract:
The paper discusses the execution of a program of tasks on the SDN data plane, modeled by a finite connected undirected graph of physical connections; execution is understood in the sense of the object-oriented programming paradigm as consisting of objects and messages that objects can exchange. Objects are implemented in hosts; several different objects can be implemented in one host, and the same object can be implemented in several hosts. Messages between objects implemented in different hosts are transmitted in packets, which are routed by switches based on identifiers assigned to packets, that is, on a set of values of some packet parameters in the packet header. Two problems are tackled in the work: 1) minimizing the number of identifiers, and 2) setting up the switches to implement the paths that packets should take. These problems are solved in two cases: A) a packet intended for some object must reach exactly one host in which this object is implemented; B) a packet can reach several hosts, but the desired object must be implemented in one and only one of them. It is shown that problem 1 in case A is equivalent to the set covering problem, and the minimum number of identifiers in the worst case is min{n, m}, where n is the number of objects and m is the number of hosts implementing objects. In case B, the problem is a special modification of the set covering problem, and the hypothesis is proposed that the minimum number of identifiers in the worst case is min{⌊lb(n + 1)⌋, m}; so far, an upper bound of O(ln(min{n, m}) × ln(min{n, m})) has been obtained. To solve problem 2 in cases A and B, algorithms for setting up the switches are proposed, which have complexity O(m) and O(km), respectively, where m is the number of edges of the graph of physical connections and k is the number of the required packet identifiers.
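
Since problem 1 reduces to set covering, the classical greedy heuristic, which repeatedly picks the set covering the most still-uncovered elements and achieves a logarithmic approximation factor, is the natural baseline; the C++ sketch below is a textbook version, not the authors' algorithm.

    #include <cstddef>
    #include <set>
    #include <vector>

    std::vector<int> greedy_cover(const std::vector<std::set<int>>& sets,
                                  std::set<int> universe) {
        std::vector<int> chosen;
        while (!universe.empty()) {
            int best = -1;
            std::size_t gain = 0;
            for (std::size_t s = 0; s < sets.size(); ++s) {
                std::size_t g = 0;
                for (int e : sets[s]) g += universe.count(e);
                if (g > gain) { gain = g; best = (int)s; }
            }
            if (best < 0) break;              // remaining elements are uncoverable
            for (int e : sets[best]) universe.erase(e);
            chosen.push_back(best);
        }
        return chosen;
    }
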
APA, Harvard, Vancouver, ISO, and other styles
45

Samaké, A., M. Alassane, A. Mahamane, and O. Diallo. "A SCALABLE HYBRID CPU-GPU COMPUTATIONAL FRAMEWORK FOR A FINITE ELEMENT-BASED AIR QUALITY MODEL." Advances in Mathematics: Scientific Journal 12, no. 1 (January 2, 2023): 45–61. http://dx.doi.org/10.37418/amsj.12.1.3.

Full text
Abstract:
We propose a scalable computational framework for the hybrid CPU-GPU implementation of a traffic-induced and finite element-based air quality model. The hybrid computing paradigm we investigate consists in combining a CPU-based distributed-memory programming approach using the Message Passing Interface (MPI) with a GPU programming model for the finite element numerical integration using the Compute Unified Device Architecture (CUDA), a general-purpose parallel computing platform released by NVIDIA Corporation and featured on its own GPUs. Scalability results obtained from numerical experiments on two major road traffic-induced air pollutants, namely the fine and inhalable particulate matter PM2.5 and PM10, are illustrated. These achievements, including speedup and efficiency analyses, show that the framework scales well up to 256 CPU cores used concurrently with GPUs on a hybrid computing system.
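
Structurally (with assumed names, and the CUDA kernel left as a comment), the hybrid pattern such a framework follows can be sketched as:

    #include <mpi.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        // 1. partition: each rank owns a contiguous block of mesh elements;
        // 2. offload: the rank copies its elements to the GPU and launches the
        //    element-integration kernel there, e.g. (CUDA, not compiled here)
        //    integrate_elements<<<blocks, threads>>>(d_elems, d_result);
        // 3. assemble: per-rank contributions are combined globally.
        double local = 0.0, global = 0.0;     // placeholder local contribution
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        MPI_Finalize();
        return 0;
    }
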
APA, Harvard, Vancouver, ISO, and other styles
46

Kehagias, Dimitris, Michael Grivas, and Grammati Pantziou. "Using a hybrid platform for cluster, NoW and GRID computing." Facta universitatis - series: Electronics and Energetics 18, no. 2 (2005): 205–18. http://dx.doi.org/10.2298/fuee0502205k.

Full text
Abstract:
Clusters, Networks of Workstations (NoW) and Grids offer a new, highly available and cost-effective parallel computing paradigm. Their simplicity, versatility and scalable power have made them a rapidly adopted technology, and they seem to form the new way of computing. Although these platforms are based on the same principles, they differ significantly and require special attention to their characteristics. Future computer scientists, programmers and analysts have to be well prepared in both administrative and programming issues in order to ensure a faster and smoother transition. We present the architecture of a dynamic clustering system consisting of Beowulf-class clusters and NoW, as well as our experience in constructing and using such a system as an educational infrastructure for HPC.
APA, Harvard, Vancouver, ISO, and other styles
47

Barcelona-Pons, Daniel, Pierre Sutra, Marc Sánchez-Artigas, Gerard París, and Pedro García-López. "Stateful Serverless Computing with Crucial." ACM Transactions on Software Engineering and Methodology 31, no. 3 (July 31, 2022): 1–38. http://dx.doi.org/10.1145/3490386.

Full text
Abstract:
Serverless computing greatly simplifies the use of cloud resources. In particular, Function-as-a-Service (FaaS) platforms enable programmers to develop applications as individual functions that can run and scale independently. Unfortunately, applications that require fine-grained support for mutable state and synchronization, such as machine learning (ML) and scientific computing, are notoriously hard to build with this new paradigm. In this work, we aim at bridging this gap. We present Crucial, a system to program highly parallel stateful serverless applications. Crucial retains the simplicity of serverless computing. It is built upon the key insight that FaaS resembles concurrent programming at the scale of a datacenter. Accordingly, a distributed shared memory layer is the natural answer to the needs for fine-grained state management and synchronization. Crucial makes it possible to port a multi-threaded code base to serverless effortlessly, where it can benefit from the scalability and pay-per-use model of FaaS platforms. We validate Crucial with the help of micro-benchmarks and by considering various stateful applications. Beyond classical parallel tasks (e.g., a Monte Carlo simulation), these applications include representative ML algorithms such as k-means and logistic regression. Our evaluation shows that Crucial obtains performance superior or comparable to Apache Spark at similar cost (18%–40% faster). We also use Crucial to port (part of) a state-of-the-art multi-threaded ML library to serverless. The ported application is up to 30% faster than with a dedicated high-end server. Finally, we attest that Crucial can rival in performance a single-machine, multi-threaded implementation of a complex coordination problem. Overall, Crucial delivers all these benefits with less than 6% of changes in the code bases of the evaluated applications.
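
Not Crucial's actual API, but the key insight can be sketched in C++: if the shared state of a multi-threaded program hides behind a narrow interface, the same algorithm can be re-targeted from in-process memory to a remote shared-state layer.

    #include <atomic>

    struct SharedCounter {                    // in-process backing store...
        std::atomic<long> value{0};
        long fetch_add(long d) { return value.fetch_add(d); }
    };
    // ...a serverless port keeps this interface but implements fetch_add()
    // as a call into the distributed shared memory layer, leaving the
    // worker functions that use the counter untouched.
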
APA, Harvard, Vancouver, ISO, and other styles
48

Yamazaki, Ichitaro, Jakub Kurzak, Piotr Luszczek, and Jack Dongarra. "Design and Implementation of a Large Scale Tree-Based QR Decomposition Using a 3D Virtual Systolic Array and a Lightweight Runtime." Parallel Processing Letters 24, no. 04 (December 2014): 1442004. http://dx.doi.org/10.1142/s0129626414420043.

Full text
Abstract:
A systolic array provides an alternative computing paradigm to the von Neumann architecture. Though it failed in the past as a hardware paradigm for designing integrated circuits, we are now discovering that the systolic array as a software virtualization layer can lead to an extremely scalable execution paradigm. To demonstrate this scalability, in this paper we design and implement a 3D virtual systolic array to compute a tile QR decomposition of a tall-and-skinny dense matrix. Our implementation is based on a state-of-the-art algorithm that factorizes a panel based on a tree reduction. Freed from the constraint of a planar layout, we present a three-dimensional virtual systolic array architecture for this algorithm. Using a runtime developed as a part of the Parallel Ultra Light Systolic Array Runtime (PULSAR) project, we demonstrate on a Cray XT5 machine how our virtual systolic array can be mapped to a large-scale machine to obtain excellent parallel performance. This is an important contribution since such a QR decomposition is used, for example, to compute a least squares solution of an overdetermined system, which arises in many scientific and engineering problems.
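
Conceptually (placeholder types, not PULSAR code), the tree reduction at the heart of such a tall-and-skinny QR merges the small R factors of independently factorized panels pairwise up a binary tree, as in this C++ sketch:

    #include <cstddef>
    #include <utility>
    #include <vector>

    struct R { std::vector<double> upper; };  // packed k x k triangular factor

    // Placeholder: a real version factorizes the stacked pair [a; b] and
    // returns the new triangular factor.
    R qr_merge(const R& a, const R& b) { return a.upper.empty() ? b : a; }

    R tree_reduce(std::vector<R> rs) {
        while (rs.size() > 1) {               // each level halves the count;
            std::vector<R> next;              // merges within a level are independent
            for (std::size_t i = 0; i + 1 < rs.size(); i += 2)
                next.push_back(qr_merge(rs[i], rs[i + 1]));
            if (rs.size() % 2) next.push_back(rs.back());
            rs = std::move(next);
        }
        return rs.front();                    // the R factor of the whole panel
    }
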
APA, Harvard, Vancouver, ISO, and other styles
49

Hamidi, Hodjat, Abbas Vafaei, and Seyed Amir Hassan Monadjemi. "Analysis and Evaluation of a New Algorithm Based Fault Tolerance for Computing Systems." International Journal of Grid and High Performance Computing 4, no. 1 (January 2012): 37–51. http://dx.doi.org/10.4018/jghpc.2012010103.

Full text
Abstract:
In this paper, the authors present a new approach to algorithm-based fault tolerance (ABFT) for high-performance computing systems. The algorithm-based fault tolerance approach transforms a system that does not tolerate a specific type of fault, called the fault-intolerant system, into a system that provides a specific level of fault tolerance, namely recovery. The ABFT techniques that detect errors rely on the comparison of parity values computed in two ways: the parallel processing of input parity values produces output parity values comparable with parity values regenerated from the original processed outputs, and convolution codes can be applied for the redundancy. This method is a new approach to concurrent error correction in fault-tolerant computing systems. The paper proposes a novel computing paradigm to provide fault tolerance for numerical algorithms. The authors also present, implement, and evaluate early detection in ABFT.
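
The paper's scheme builds on convolutional codes, which are not reproduced here; the simpler linear checksum below only demonstrates the detection principle ABFT rests on: for y = A·x, the sum of y's entries must equal the dot product of A's column-sum vector with x.

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // A is n x n row-major; returns true if the checksum identity holds.
    bool abft_check(const std::vector<double>& A, const std::vector<double>& x,
                    const std::vector<double>& y, std::size_t n, double tol = 1e-9) {
        std::vector<double> colsum(n, 0.0);   // e^T A, computed once per matrix
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                colsum[j] += A[i * n + j];
        double lhs = 0.0, rhs = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            lhs += y[j];                      // e^T y
            rhs += colsum[j] * x[j];          // (e^T A) x
        }
        return std::fabs(lhs - rhs) <= tol;   // a mismatch signals a fault
    }
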
APA, Harvard, Vancouver, ISO, and other styles
50

Kong, Byungyun, Geonmo Ryu, Sangwook Bae, Seo-Young Noh, and Heejun Yoon. "An Efficient Approach to Consolidating Job Schedulers in Traditional Independent Scientific Workflows." Applied Sciences 10, no. 4 (February 21, 2020): 1455. http://dx.doi.org/10.3390/app10041455.

Full text
Abstract:
The current research paradigm is data-driven. Researchers are beginning to deploy computing facilities to produce and analyze large amounts of data. As requirements for computing power grow, data processing on traditional workstations comes under constant pressure for efficient resource management. In such an environment, tremendous amounts of data are processed using parallel computing for efficient and effective research results. HTCondor, as an example, provides computing power for researchers' data analysis. Although such a system works well in a traditional computing cluster environment, an efficient methodology is needed to meet the ever-increasing demands of computing with limited resources. In this paper, we propose an approach to integrating clusters that can share their computing power on the basis of a priority policy. Our approach makes it possible to share worker nodes while maintaining the resources allocated to each group. In addition, we have utilized historical user-usage data to analyze the problems that occurred during job execution due to resource sharing, together with the actual operating results. Our findings can provide a reasonable guideline for limited computing power shared by multiple scientific groups.
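
The concrete policy is specific to the authors' HTCondor setup; as a hypothetical C++ sketch of the idea, the fragment below honors each group's guaranteed quota first and then lends idle nodes to queued jobs in priority order.

    #include <algorithm>
    #include <vector>

    struct Group { int quota; int queued; int priority; int granted = 0; };

    void allocate(std::vector<Group>& groups, int total_nodes) {
        int free_nodes = total_nodes;
        for (auto& g : groups) {              // honour guaranteed quotas first
            g.granted = std::min(g.quota, g.queued);
            free_nodes -= g.granted;
        }
        std::sort(groups.begin(), groups.end(),
                  [](const Group& a, const Group& b) { return a.priority > b.priority; });
        for (auto& g : groups) {              // lend idle nodes in priority order
            int extra = std::min(free_nodes, g.queued - g.granted);
            if (extra > 0) { g.granted += extra; free_nodes -= extra; }
        }
    }
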
APA, Harvard, Vancouver, ISO, and other styles