Journal articles on the topic 'Multi-core execution'


Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Multi-core execution.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Fang, Juan, and Hong Bo Zhang. "An Improved Architecture for Multi-Core Prefetching." Advanced Materials Research 505 (April 2012): 253–56. http://dx.doi.org/10.4028/www.scientific.net/amr.505.253.

Full text
Abstract:
The "Memory Wall" problem has become a bottleneck for processor performance, and chip multiprocessors (CMPs) aggravate memory access latency. Many hardware prefetching techniques, such as Future Execution, have been proposed to address this challenge. This paper introduces runahead execution (another hardware prefetching technique, first used on single-core processors) and Future Execution, then proposes improvements to Future Execution and presents results and analysis of data from the SPEC2000 benchmarks.
APA, Harvard, Vancouver, ISO, and other styles
2

Chen, Xiaowen. "Command-Triggered Microcode Execution for Distributed Shared Memory Based Multi-Core Network-on-Chips." Journal of Software 10, no. 2 (February 2015): 142–61. http://dx.doi.org/10.17706/jsw.10.2.142-161.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Maity, Arka, Anuj Pathania, and Tulika Mitra. "PkMin: Peak Power Minimization for Multi-Threaded Many-Core Applications." Journal of Low Power Electronics and Applications 10, no. 4 (September 30, 2020): 31. http://dx.doi.org/10.3390/jlpea10040031.

Full text
Abstract:
Multiple multi-threaded tasks constitute a modern many-core application. An accompanying generic Directed Acyclic Graph (DAG) represents the execution precedence relationships between the tasks. The application comes with a hard deadline and high peak power consumption. Parallel execution of multiple tasks on multiple cores results in quicker execution, but higher peak power. Peak power single-handedly determines the cooling costs in many-cores, while its violations could induce performance-crippling execution uncertainties. Less task parallelization, on the other hand, results in lower peak power, but a more prolonged, deadline-violating execution. The problem of peak power minimization in many-cores is to determine a task-to-core mapping configuration in the spatio-temporal domain that minimizes the peak power consumption of an application while ensuring the application still meets its deadline. All previous works on peak power minimization for many-core applications (with or without a DAG) assume only single-threaded tasks. We are the first to propose a framework, called PkMin, which minimizes the peak power of many-core applications with a DAG that have multi-threaded tasks. PkMin leverages the inherent convexity in the execution characteristics of multi-threaded tasks to find a configuration that satisfies the deadline as well as minimizes peak power. Evaluation on hundreds of applications shows PkMin on average results in 49.2% lower peak power than a similar state-of-the-art framework.
APA, Harvard, Vancouver, ISO, and other styles
4

Griffin, Thomas A. N., Kenneth Shankland, Jacco van de Streek, and Jason Cole. "MDASH: a multi-core-enabled program for structure solution from powder diffraction data." Journal of Applied Crystallography 42, no. 2 (March 5, 2009): 360–61. http://dx.doi.org/10.1107/s0021889809006852.

Full text
Abstract:
The simulated annealing approach to structure solution from powder diffraction data, as implemented in the DASH program, is easily amenable to parallelization at the individual run level. Modest increases in speed of execution can therefore be achieved by executing individual DASH runs on the individual cores of CPUs.
APA, Harvard, Vancouver, ISO, and other styles
5

Chen, Gang, Kai Huang, Long Cheng, Biao Hu, and Alois Knoll. "Dynamic Partitioned Cache Memory for Real-Time MPSoCs with Mixed Criticality." Journal of Circuits, Systems and Computers 25, no. 06 (March 31, 2016): 1650062. http://dx.doi.org/10.1142/s0218126616500626.

Full text
Abstract:
Shared cache interference in multi-core architectures has been recognized as one of the major factors that degrade the predictability of a mixed-critical real-time system. Due to unpredictable cache interference, the behavior of the shared cache is hard to predict and analyze statically in multi-core architectures executing mixed-critical tasks, which not only makes it difficult to estimate the worst-case execution time (WCET) but also introduces significant worst-case timing penalties for critical tasks. Cache management in mixed-critical multi-core systems has therefore become a challenging task. In this paper, we present a dynamic partitioned cache memory for mixed-critical real-time multi-core systems. In this architecture, critical tasks can dynamically allocate and release cache resources during the execution interval according to the real-time workload. This dynamic partitioned cache can, on the one hand, provide predictable cache performance for critical tasks; on the other hand, the released cache can be dynamically used by non-critical tasks to improve their average performance. We demonstrate and prototype our system design on an embedded FPGA platform. Measurements from the prototype clearly demonstrate the benefits of the dynamic partitioned cache for mixed-critical real-time multi-core systems.
APA, Harvard, Vancouver, ISO, and other styles
6

Suleman, M. Aater, Onur Mutlu, Moinuddin K. Qureshi, and Yale N. Patt. "Accelerating critical section execution with asymmetric multi-core architectures." ACM SIGARCH Computer Architecture News 37, no. 1 (March 2009): 253–64. http://dx.doi.org/10.1145/2528521.1508274.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Chen, Kuan-Chung, and Chung-Ho Chen. "Enabling SIMT Execution Model on Homogeneous Multi-Core System." ACM Transactions on Architecture and Code Optimization 15, no. 1 (April 2, 2018): 1–26. http://dx.doi.org/10.1145/3177960.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Kulkarni, Abhishek, Latchesar Ionkov, Michael Lang, and Andrew Lumsdaine. "Optimizing process creation and execution on multi-core architectures." International Journal of High Performance Computing Applications 27, no. 2 (April 2, 2013): 147–61. http://dx.doi.org/10.1177/1094342013481483.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Suleman, M. Aater, Onur Mutlu, Moinuddin K. Qureshi, and Yale N. Patt. "Accelerating critical section execution with asymmetric multi-core architectures." ACM SIGPLAN Notices 44, no. 3 (February 28, 2009): 253–64. http://dx.doi.org/10.1145/1508284.1508274.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Huang, Shujuan, Yi'an Zhu, Bailin Liu, and Feng Xiao. "Research on Three Dimensional Scheduling Model for Embedded Multi-Core System." Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University 36, no. 5 (October 2018): 1020–25. http://dx.doi.org/10.1051/jnwpu/20183651020.

Full text
Abstract:
This paper proposes a new three-dimensional scheduling model that divides tasks into harmonic and non-harmonic tasks to meet the high demands of embedded multi-core platforms. According to the characteristic parameters of the tasks, the value of the rectangular area is taken as the attribute of the execution region, which is divided into an executive region, an interference region, and a free region according to the characteristics of the area. Using the attributes of the different regions, the tasks are allocated to different cores. Experimental results show that the proposed method optimizes system utilization and throughput more fully than PEDF.
APA, Harvard, Vancouver, ISO, and other styles
11

Saidi, Salah Eddine, Nicolas Pernet, and Yves Sorel. "A method for parallel scheduling of multi-rate co-simulation on multi-core platforms." Oil & Gas Science and Technology – Revue d’IFP Energies nouvelles 74 (2019): 49. http://dx.doi.org/10.2516/ogst/2019009.

Full text
Abstract:
The design of cyber-physical systems is a complex process and relies on the simulation of the system behavior before its deployment. Such is the case, for instance, of joint simulation of the different subsystems that constitute a hybrid automotive powertrain. Co-simulation allows system designers to simulate a whole system composed of a number of interconnected subsystem simulators. Traditionally, these subsystems are modeled by experts of different fields using different tools, and then integrated into one tool to perform simulation at the system-level. This results in complex and compute-intensive co-simulations and requires the parallelization of these co-simulations in order to accelerate their execution. The simulators composing a co-simulation are characterized by the rates of data exchange between the simulators, defined by the engineers who set the communication steps. The RCOSIM approach allows the parallelization on multi-core processors of co-simulations using the FMI standard. This approach is based on the exploitation of the co-simulation parallelism where dependent functions perform different computation tasks. In this paper, we extend RCOSIM to handle additional co-simulation requirements. First, we extend the co-simulation to multi-rate, i.e. where simulators are assigned different communication steps. We present graph transformation rules and an algorithm that allow the execution of each simulator at its respective rate while ensuring correct and efficient data exchange between simulators. Second, we present an approach based on acyclic orientation of mixed graphs for handling mutual exclusion constraints between functions that belong to the same simulator due to the non-thread-safe implementation of FMI. We propose an exact algorithm and a heuristic for performing the acyclic orientation. The final stage of the proposed approach consists in scheduling the co-simulation on a multi-core architecture. 
We propose an algorithm and a heuristic for computing a schedule which minimizes the total execution time of the co-simulation. We evaluate the performance of our proposed approach in terms of the obtained execution speed. By applying our approach on an industrial use case, we obtained a maximum speedup of 2.91 on four cores.
APA, Harvard, Vancouver, ISO, and other styles
12

Mikami, Hiro, Kei Torigoe, Makoto Inokawa, and Masato Edahiro. "LLVM Instruction Latency Measurement for Software-Hardware Interface for Multi-many-core." INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY 22 (May 27, 2022): 50–63. http://dx.doi.org/10.24297/ijct.v22i.9231.

Full text
Abstract:
The increasing scale and complexity of embedded systems and the use of multi-many-core processors have resulted in a corresponding increase in the demand for software development with a high degree of parallelism. The degree of parallelism in software and the accuracy of performance estimation in the early design stages of model-based development can be improved by estimating performance of blocks in models and utilizing the estimate for parallelization. Research is therefore being performed on a software performance estimation technique that uses the IEEE2804-2019 hardware feature description called software-hardware interface for multi-many-core (SHIM). In SHIM, each LLVM-IR instruction is associated with the execution cycle of the target processor. Because several types of assembly instruction sequences for the target processor are generated from a given LLVM-IR instruction, it is not easy to estimate the number of execution cycles. In this study, we propose a regression analysis method to estimate the execution cycles for each LLVM-IR instruction. It is observed that our method estimated the execution cycles within the target error of ±20% in experiments using a Raspberry Pi3 Model B+.
APA, Harvard, Vancouver, ISO, and other styles
13

Rebaya, Asma, Karol Desnos, and Salem Hasnaoui. "Translating Hierarchical Simulink Applications to Real-time multi-core Execution." Universal Journal of Electrical and Electronic Engineering 7, no. 4 (August 2020): 242–61. http://dx.doi.org/10.13189/ujeee.2020.070403.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Jena, Swagat Kumar, Satyabrata Das, and Satya Prakash Sahoo. "Design and Development of a Parallel Lexical Analyzer for C Language." International Journal of Knowledge-Based Organizations 8, no. 1 (January 2018): 68–82. http://dx.doi.org/10.4018/ijkbo.2018010105.

Full text
Abstract:
The future of computing is rapidly moving towards massively multi-core architectures because of their power and cost advantages. Multi-core processors are now used almost everywhere, and the number of cores per chip is steadily increasing. To exploit the full potential offered by multi-core architectures, system software such as compilers should be designed for parallelized execution. In the past, various significant efforts have been made to change the design of traditional compilers to take advantage of future multi-core platforms. This paper focuses on introducing parallelism into the lexical analysis phase of the compilation process. The main objective of our proposal is to perform lexical analysis, i.e., finding the tokens in an input stream, in parallel. We use the parallel constructs available in OpenMP to achieve parallelism in the lexical analysis process on multi-core machines. The experimental results of our proposal show a significant performance improvement of the parallel lexical analysis phase over the sequential version in terms of execution time.
APA, Harvard, Vancouver, ISO, and other styles
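The chunk-splitting idea behind parallel lexical analysis can be sketched in Python (a hedged illustration, not the paper's OpenMP implementation: the toy token classes, the whitespace-aligned chunking, and all names here are assumptions; a real lexer would also need to handle constructs like comments and string literals that can straddle chunk boundaries):

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Token classes for a tiny C-like subset (illustrative only).
TOKEN_RE = re.compile(r"""
    (?P<KEYWORD>\b(?:int|return|if|else|while)\b)
  | (?P<ID>[A-Za-z_]\w*)
  | (?P<NUM>\d+)
  | (?P<OP>[+\-*/=;(){}<>])
""", re.VERBOSE)

def lex_chunk(chunk):
    """Tokenize one chunk sequentially."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(chunk)]

def split_on_whitespace(source, n):
    """Cut the source into n chunks, moving each cut forward to the next
    whitespace character so no token straddles a chunk boundary."""
    cuts, step = [0], max(1, len(source) // n)
    for i in range(1, n):
        j = i * step
        while j < len(source) and not source[j].isspace():
            j += 1
        cuts.append(min(j, len(source)))
    cuts.append(len(source))
    return [source[a:b] for a, b in zip(cuts, cuts[1:])]

def parallel_lex(source, workers=4):
    """Tokenize the chunks concurrently, then concatenate in order."""
    chunks = split_on_whitespace(source, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lex_chunk, chunks)
    return [tok for part in parts for tok in part]
```

Because the chunks are concatenated in their original order, the parallel result matches what a single sequential scan would produce.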
15

Rinku, Dhruva R., and M. AshaRani. "Reinforcement learning based multi core scheduling (RLBMCS) for real time systems." International Journal of Electrical and Computer Engineering (IJECE) 10, no. 2 (April 1, 2020): 1805. http://dx.doi.org/10.11591/ijece.v10i2.pp1805-1813.

Full text
Abstract:
Embedded systems with multi-core processors are increasingly popular because of the diversity of applications that can run on them. In this work, a reinforcement learning based scheduling method is proposed to handle real-time tasks in multi-core systems with effective CPU usage and lower response times. Task priorities are varied dynamically to ensure fairness, using reinforcement learning based priority assignment and a Multi-Core Multi-Level Feedback Queue (MCMLFQ) to manage task execution in multi-core systems.
APA, Harvard, Vancouver, ISO, and other styles
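As a rough sketch of the queueing structure such a scheduler builds on, here is a minimal multi-level feedback queue in Python. This is an assumption-laden illustration, not the paper's method: it uses a fixed demote-on-full-quantum rule, whereas RLBMCS learns the priority adjustments; the class name, quanta, and task format are invented here.

```python
from collections import deque

class MLFQ:
    """Minimal multi-level feedback queue: tasks enter at the highest
    priority level and are demoted when they use a full time quantum."""

    def __init__(self, quanta=(2, 4, 8)):
        self.levels = [deque() for _ in quanta]   # one FIFO per priority
        self.quanta = quanta                      # time slice per level

    def add(self, task):
        # task = [name, remaining_time]; new tasks get top priority
        self.levels[0].append(task)

    def run(self):
        """Run to completion; return the (task, slice) dispatch order."""
        order = []
        while any(self.levels):
            lvl = next(i for i, q in enumerate(self.levels) if q)
            name, rem = self.levels[lvl].popleft()
            slice_ = min(rem, self.quanta[lvl])
            order.append((name, slice_))
            rem -= slice_
            if rem > 0:  # used its full quantum -> demote one level
                nxt = min(lvl + 1, len(self.levels) - 1)
                self.levels[nxt].append([name, rem])
        return order
```

In the paper's setting, an RL agent would replace the fixed demotion rule, choosing priority adjustments from observed response times.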
16

Liu, Cong, Wen Wang, and Zhi Ying Wang. "Speculative High Performance Computation on Heterogeneous Multi-Core." Advanced Materials Research 1049-1050 (October 2014): 2126–30. http://dx.doi.org/10.4028/www.scientific.net/amr.1049-1050.2126.

Full text
Abstract:
Thread-level speculation has been proposed and researched to parallelize traditional sequential applications on homogeneous multi-core architectures. In this paper, a heterogeneous multi-core hardware simulation system is presented, which provides a TLS execution mechanism. With a novel TLS programming model and a number of new speculative tuning techniques, the benchmark Gzip is parallelized with speedups ranging from −3% to 195% on a four-core processor, and the speedups of the test benchmarks are 30%, 43%, and 156% with arbitrary, hotspot, and insight speculation, respectively.
APA, Harvard, Vancouver, ISO, and other styles
17

Uddin, Irfan. "One-IPC high-level simulation of microthreaded many-core architectures." International Journal of High Performance Computing Applications 31, no. 2 (July 28, 2016): 152–62. http://dx.doi.org/10.1177/1094342015584495.

Full text
Abstract:
The microthreaded many-core architecture comprises multiple clusters of fine-grained multi-threaded cores. The management of concurrency is supported in the instruction set architecture of the cores, and the computational work in an application is asynchronously delegated to different clusters of cores, where each cluster is allocated dynamically. Computer architects are always interested in analyzing the complex interaction amongst dynamically allocated resources. Generally, a detailed cycle-accurate simulation of the execution time is used. However, the cycle-accurate simulator for the microthreaded architecture executes at a rate of 100,000 instructions per second, divided over the number of simulated cores. This means that the evaluation of a complex application executing on a contemporary multi-core machine can be very slow. To perform efficient design space exploration, we present a co-simulation environment in which the detailed execution of instructions in the pipeline of microthreaded cores and the interactions amongst the hardware components are abstracted. We evaluate the high-level simulation framework against the cycle-accurate simulation framework. The results show that the high-level simulator is faster and less complicated than the cycle-accurate simulator, at the cost of some accuracy.
APA, Harvard, Vancouver, ISO, and other styles
18

Karasik, O. N., and A. A. Prihozhy. "ADVANCED SCHEDULER FOR COOPERATIVE EXECUTION OF THREADS ON MULTI-CORE SYSTEM." «System analysis and applied information science», no. 1 (May 4, 2017): 4–11. http://dx.doi.org/10.21122/2309-4923-2017-1-4-11.

Full text
Abstract:
Three architectures of the cooperative thread scheduler in a multithreaded application executed on a multi-core system are considered. Architecture A0 is based on the synchronization and scheduling facilities provided by the operating system. Architecture A1 introduces a new synchronization primitive and a single queue of blocked threads in the scheduler, which reduces the interaction between the threads and the operating system and significantly speeds up the processes of blocking and unblocking threads. Architecture A2 replaces the single queue of blocked threads with dedicated queues, one for each synchronization primitive, extends the number of internal states of the primitive, reduces the interdependence of the scheduling threads, and further speeds up the processes of blocking and unblocking threads. All scheduler architectures are implemented on Windows operating systems and based on User Mode Scheduling. Important experimental results are obtained for multithreaded applications that implement two blocked parallel algorithms for solving linear algebraic equation systems by Gaussian elimination. The algorithms differ in how data is distributed among threads and in their thread synchronization models. The number of threads varied from 32 to 7936. Architecture A1 shows an acceleration of up to 8.65% and architecture A2 of up to 11.98% compared to architecture A0 for the blocked parallel algorithms computing the triangular form and performing the back substitution. On the back substitution stage of the algorithms, architecture A1 gives an acceleration of up to 125%, and architecture A2 of up to 413%, compared to architecture A0.
The experiments clearly show that the proposed architectures A1 and A2 outperform A0 depending on the number of thread blocking and unblocking operations that occur during the execution of multithreaded applications. The conducted computational experiments demonstrate the improvement of parameters of multithreaded applications on a heterogeneous multi-core system due to the proposed advanced versions of the thread scheduler.
APA, Harvard, Vancouver, ISO, and other styles
19

Soliman, Mostafa I. "Exploiting ILP, TLP, and DLP to Improve Multi-Core Performance of One-Sided Jacobi SVD." Parallel Processing Letters 19, no. 02 (June 2009): 355–75. http://dx.doi.org/10.1142/s0129626409000262.

Full text
Abstract:
This paper shows how the performance of singular value decomposition (SVD) is enhanced through the exploitation of ILP, TLP, and DLP on Intel multi-core processors using superscalar execution, multi-threaded computation, and streaming SIMD extensions, respectively. To facilitate the exploitation of TLP on multiple execution cores, the well-known cyclic one-sided Jacobi algorithm is restructured to work in parallel. On two dual-core Intel Xeon processors with hyper-threading technology running at 3.0 GHz, our results show that the multi-threaded implementation of one-sided Jacobi SVD is about four times faster than the single-threaded superscalar implementation. Furthermore, the multi-threaded SIMD implementation speeds up the execution of single-threaded one-sided Jacobi by a factor of 10, which is close to the ideal speedup. On a reasonably large matrix that fits in the L2 cache, our results show that a performance of 11 GFLOPS (double precision) is achieved on the target system through the exploitation of ILP, TLP, and DLP as well as the memory hierarchy.
APA, Harvard, Vancouver, ISO, and other styles
20

HU, WEI, TIANZHOU CHEN, QINGSONG SHI, and SHA LIU. "CRITICAL-PATH DRIVEN ROUTERS FOR ON-CHIP NETWORKS." Journal of Circuits, Systems and Computers 19, no. 07 (November 2010): 1543–57. http://dx.doi.org/10.1142/s021812661000689x.

Full text
Abstract:
Multithreaded programming has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors. The performance bottleneck of a multithreaded program is its critical path, whose length is its total execution time. As the number of cores within a processor increases, Network-on-Chip (NoC) has been proposed as a promising approach for inter-core communication. In order to optimize the performance of a multithreaded program running on an NoC based multi-core platform, we design and implement the critical-path driven router, which prioritizes inter-thread communication on the critical path when routing packets. The experimental results show that the critical-path driven router improves the execution time of the test case by 14.8% compared to the ordinary router.
APA, Harvard, Vancouver, ISO, and other styles
21

Kaiser, Benjamin, Matthias Keinert, Armin Lechler, and Alexander Verl. "CNC Tool Path Generation on Multi-Core Processors." Applied Mechanics and Materials 794 (October 2015): 339–46. http://dx.doi.org/10.4028/www.scientific.net/amm.794.339.

Full text
Abstract:
Tool path generation for CNC machine tools largely determines the quality, accuracy, and productivity of the manufacturing process and is therefore a focus of research activities. Many approaches to this topic yield complex algorithms and thus demand sufficient processing performance to realize these algorithms in a CNC real-time environment. For that reason, this paper presents an approach to using multi-core processors for CNC tool path generation functions. A partitioning concept is presented that allows multiple threads realizing interpolation and arc-length calculation algorithms to execute concurrently. Using the example of B-spline interpolation, the execution time of the tool path generation function could be reduced significantly with the presented approach.
APA, Harvard, Vancouver, ISO, and other styles
22

Ouyang, Xiangzhen, and Yian Zhu. "Core-aware combining: Accelerating critical section execution on heterogeneous multi-core systems via combining synchronization." Journal of Parallel and Distributed Computing 162 (April 2022): 27–43. http://dx.doi.org/10.1016/j.jpdc.2022.01.001.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Chu, Edward Tsung Hsien, and Ming Ru Tsai. "Using Multi-Core to Debug Interactive Applications." Applied Mechanics and Materials 764-765 (May 2015): 1007–11. http://dx.doi.org/10.4028/www.scientific.net/amm.764-765.1007.

Full text
Abstract:
Instrumentation technology has been widely used in debugging interactive applications, such as interactive games and virtual reality. Debug code is instrumented into a target program in order to collect run-time information. Although instrumentation provides detailed information about the target program's behavior, it can significantly prolong execution time, change program behavior, and lead to incorrect debugging results, especially for time-dependent and real-time applications. This paper aims to design a scalable parallel debugging mechanism that reduces instrumentation overhead while collecting detailed run-time information. We design a new synchronization mechanism for instrumentation, named MDM, which uses multiple buffers to process debug messages. A binding mechanism is also used to specify the relationship between the target program, helper threads, and cores. We conduct a case study of augmented reality interactive games on an Intel Core i7-2600 processor with Linux 2.6.38. Compared to existing methods, MDM can reduce instrumentation overhead by up to 19%.
APA, Harvard, Vancouver, ISO, and other styles
24

Zhang, Lei, Ren Ping Dong, and Ya Ping Yu. "Realization of SMS4 Algorithm Based on Share Memory of the Heterogeneous Multi-Core Password Chip System." Applied Mechanics and Materials 668-669 (October 2014): 1368–73. http://dx.doi.org/10.4028/www.scientific.net/amm.668-669.1368.

Full text
Abstract:
In order to meet the demands of rapid modern cryptographic communication, a heterogeneous multi-core password system architecture based on shared memory is set up on a Xilinx XUP Virtex-II Pro chip in this paper. Under this architecture, the encryption and decryption of the SMS4 algorithm are realized quickly. The architecture is compared with a homogeneous multi-core password system and a heterogeneous single-core password system in terms of execution time, throughput, and resource utilization of the SMS4 algorithm. The experimental results show that the heterogeneous multi-core password system based on shared memory has better performance.
APA, Harvard, Vancouver, ISO, and other styles
25

Iman Fitri Ismail, Akmal Nizam Mohammed, Bambang Basuno, Siti Aisyah Alimuddin, and Mustafa Alas. "Evaluation of CFD Computing Performance on Multi-Core Processors for Flow Simulations." Journal of Advanced Research in Applied Sciences and Engineering Technology 28, no. 1 (September 11, 2022): 67–80. http://dx.doi.org/10.37934/araset.28.1.6780.

Full text
Abstract:
Previous parallel computing implementations for Computational Fluid Dynamics (CFD) focused extensively on Complex Instruction Set Computer (CISC). Parallel programming was incorporated into the previous generation of the Raspberry Pi Reduced Instruction Set Computer (RISC). However, it yielded poor computing performance due to the processing power limits of the time. This research focuses on utilising two Raspberry Pi 3 B+ with increased processing capability compared to its previous generation to tackle fluid flow problems using numerical analysis and CFD. Parallel computing elements such as Secure Shell (SSH) and the Message Passing Interface (MPI) protocol were implemented for Advanced RISC Machine (ARM) processors. The parallel network was then validated by a processor call attempt and core execution test. Parallelisation of the processors enables the study of fluid flow and computational fluid dynamics (CFD) problems, such as validation of the NACA 0012 airfoil and an additional case of the Laplace equation for computing the temperature distribution via the parallel system. The experimental NACA 0012 data was validated using the parallel system, which can simulate the airfoil's physics. Each core was enabled and tested to determine the system's performance in parallelising the execution of various programming algorithms such as pi calculation. A comparison of the execution time for the NACA 0012 validation case yielded a parallelisation efficiency above 50%. The case studies confirmed the Raspberry Pi 3 B+'s successful parallelisation independent of external software and machines, making it a self-sustaining compact demonstration cluster of parallel computers for CFD.
APA, Harvard, Vancouver, ISO, and other styles
26

SANTOS COSTA, VíTOR, INÊS DUTRA, and RICARDO ROCHA. "Threads and or-parallelism unified." Theory and Practice of Logic Programming 10, no. 4-6 (July 2010): 417–32. http://dx.doi.org/10.1017/s1471068410000190.

Full text
Abstract:
One of the main advantages of Logic Programming (LP) is that it provides an excellent framework for the parallel execution of programs. In this work we investigate novel techniques to efficiently exploit parallelism from real-world applications in low cost multi-core architectures. To achieve these goals, we revive and redesign the YapOr system to exploit or-parallelism based on a multi-threaded implementation. Our new approach takes full advantage of the state-of-the-art fast and optimized YAP Prolog engine and shares the underlying execution environment, scheduler and most of the data structures used to support YapOr's model. Initial experiments with our new approach consistently achieve almost linear speedups for most of the applications, proving itself as a good alternative for exploiting implicit parallelism in the currently available low cost multi-core architectures.
APA, Harvard, Vancouver, ISO, and other styles
27

AbdulRazzaq, Atheer Akram, Qusay Shihab Hamad, and Ahmed Majid Taha. "Parallel implementation of maximum-shift algorithm using OpenMp." Indonesian Journal of Electrical Engineering and Computer Science 22, no. 3 (June 1, 2021): 1529. http://dx.doi.org/10.11591/ijeecs.v22.i3.pp1529-1539.

Full text
Abstract:
String matching is considered one of the central problems in the field of computer science, and many computer applications provide users with string matching services. The growth in the number of databases created and maintained on many computing devices has driven researchers toward robust techniques for addressing this problem. In this study, the Maximum-Shift string matching algorithm is implemented with multi-core technology through the OpenMP paradigm, in order to decrease the sequential time and increase the speedup and efficiency of the algorithm. Deoxyribonucleic acid (DNA), protein, and English text datasets are used to test how parallel execution affects the performance of the Maximum-Shift algorithm in a multi-core environment. The results show that the difference between the parallel and sequential execution of the Maximum-Shift algorithm depends on the data type. The English text showed the best results in parallel execution time compared with the other datasets, whereas the DNA dataset showed the highest results in terms of speedup and efficiency.
APA, Harvard, Vancouver, ISO, and other styles
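The text-partitioning form of parallel string matching can be sketched as follows. This is a hedged Python illustration: the paper parallelizes the Maximum-Shift algorithm with OpenMP, while this sketch substitutes Python's built-in substring search and shows only the chunking-with-overlap idea; all names here are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def find_in_chunk(args):
    """Report absolute positions of pattern matches that START inside
    [start, end). The search window is extended by len(pattern)-1 so a
    match straddling the chunk boundary is still found, and because each
    match is attributed to the chunk where it starts, no duplicates occur."""
    text, pattern, start, end = args
    hi = min(len(text), end + len(pattern) - 1)
    hits, i = [], text.find(pattern, start, hi)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1, hi)
    return hits

def parallel_match(text, pattern, workers=4):
    """Partition the text among workers and search the chunks concurrently."""
    step = max(1, len(text) // workers)
    bounds = [(text, pattern, i, min(i + step, len(text)))
              for i in range(0, len(text), step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(find_in_chunk, bounds)
    return sorted(h for part in results for h in part)
```

The same decomposition applies regardless of the per-chunk matcher, so a shift-table algorithm such as Maximum-Shift could replace the built-in search inside each chunk.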
28

Qiu, Yue, and Jing Feng Zang. "Based on Improved Genetic Algorithm for Task Scheduling of Heterogeneous Multi-Core Processor." Advanced Materials Research 1030-1032 (September 2014): 1671–75. http://dx.doi.org/10.4028/www.scientific.net/amr.1030-1032.1671.

Full text
Abstract:
This paper puts forward an improved genetic scheduling algorithm to improve the execution efficiency of task scheduling in heterogeneous multi-core processor systems and exploit their full performance. Task attribute values and height values are introduced to construct the initial population, and a sorting method for the tasks of each individual is selected at random with 50% probability, producing a high-quality initial population and ensuring its diversity. The experimental results show that the improved algorithm performs better than both the traditional genetic algorithm and the HEFT algorithm, reducing task execution time.
APA, Harvard, Vancouver, ISO, and other styles
29

Maghsoudloo, Mohammad, and Hamid Zarandi. "Parallel execution tracing: An alternative solution to exploit under-utilized resources in multi-core architectures for control-flow checking." Facta universitatis - series: Electronics and Energetics 29, no. 2 (2016): 243–60. http://dx.doi.org/10.2298/fuee1602243m.

Full text
Abstract:
In this paper, a software behavior-based technique is presented to detect control-flow errors in multi-core architectures. The analysis of a key point leads to the proposed technique: employing under-utilized CPU resources in multi-core processors to check the execution flow of programs concurrently and in parallel with the main executions. To evaluate the proposed technique, a quad-core processor system was used as the simulation environment, and the behavior of the SPEC CPU2006 benchmarks was studied as the target for comparison with conventional techniques. The experimental results, with regard to both detection coverage and performance overhead, demonstrate that on average about 94% of control-flow errors can be detected by the proposed technique, with less performance overhead than previous techniques. Note: this article has been retracted; see the retraction notice at http://dx.doi.org/10.2298/FUEE1801155E.
APA, Harvard, Vancouver, ISO, and other styles
30

T R, Vinay, and Ajeet A. Chikkamannur. "A Machine Learning Technique for Abstraction of Modules in Legacy System and Assigning them on Multicore Machines Using and Controlling p-threads." International Journal on Recent and Innovation Trends in Computing and Communication 10, no. 12 (December 31, 2022): 21–25. http://dx.doi.org/10.17762/ijritcc.v10i12.5837.

Full text
Abstract:
Hardware and software technology has undergone a sea change in the recent past. Hardware has moved from single-core to multi-core machines capable of executing multiple tasks at the same time, yet traditional software (legacy systems) is still in use in the business world today. It is not easy to replace such systems with new software, as they carry a great deal of knowledge and business value. Building a new software system by gathering the requirements afresh also consumes substantial resources in terms of skilled staff, time, and money, and in the end the customer may not have confidence in the new software. Instead of building a new system, this work develops a semi-automated methodology that learns about the program itself (machine learning on the program) to abstract the independent modules present at the same abstraction level (the implementation level) and recode the legacy single-threaded program into a multi-threaded parallel program. A case study program is considered, and the execution times of both the original program and the reengineered program are measured and analyzed on a multi-core machine.
APA, Harvard, Vancouver, ISO, and other styles
31

Alonso, Pedro, Manuel F. Dolz, Rafael Mayo, and Enrique S. Quintana-Ortí. "Energy-efficient execution of dense linear algebra algorithms on multi-core processors." Cluster Computing 16, no. 3 (May 12, 2012): 497–509. http://dx.doi.org/10.1007/s10586-012-0215-x.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Rajalakshmi, N. R., Ankur Dumka, Manoj Kumar, Rajesh Singh, Anita Gehlot, Shaik Vaseem Akram, Divya Anand, Dalia H. Elkamchouchi, and Irene Delgado Noya. "A Cost-Optimized Data Parallel Task Scheduling with Deadline Constraints in Cloud." Electronics 11, no. 13 (June 28, 2022): 2022. http://dx.doi.org/10.3390/electronics11132022.

Full text
Abstract:
Large-scale distributed systems have the advantages of high processing speeds and large communication bandwidths over the network. The processing of huge real-world data within a time constraint becomes tricky, due to the complexity of data parallel task scheduling in a time constrained environment. This paper proposes data parallel task scheduling in cloud to address the minimization of cost and time constraints. By running concurrent executions of tasks on multi-core cloud resources, the number of parallel executions could be increased correspondingly, thereby, finishing the task within the deadline is possible. A mathematical model is developed here to minimize the operational cost of data parallel tasks by feasibly assigning a load to each virtual machine in the cloud data center. This work experiments with a machine learning model that is replicated on the multi-core cloud heterogeneous resources to execute different input data concurrently to accomplish distributive learning. The outcome of concurrent execution of data-intensive tasks on different parts of the input dataset gives better solutions in terms of processing the task by the deadline at optimized cost.
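The feasibility argument at the heart of such deadline-constrained scheduling, that enough parallel executions on multi-core cloud resources finish the work by the deadline at a computable cost, reduces to simple arithmetic. The following sketch uses our own parameter names (not the paper's model) and assumes equally fast cores with perfect speedup.

```python
import math

def cores_needed(total_work, rate_per_core, deadline):
    """Minimum number of equally fast cores so that splitting the work
    evenly finishes within the deadline (perfect speedup assumed)."""
    if deadline <= 0:
        raise ValueError("deadline must be positive")
    return math.ceil(total_work / (rate_per_core * deadline))

def cost(total_work, rate_per_core, deadline, price_per_core_hour):
    """Operational cost: rented cores times rental duration (the deadline)."""
    n = cores_needed(total_work, rate_per_core, deadline)
    return n * price_per_core_hour * deadline
```

For instance, 100 units of work at 10 units per core-hour and a 2-hour deadline needs 5 cores; one extra unit of work tips the requirement to 6.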
APA, Harvard, Vancouver, ISO, and other styles
33

DURAN, ALEJANDRO, EDUARD AYGUADÉ, ROSA M. BADIA, JESÚS LABARTA, LUIS MARTINELL, XAVIER MARTORELL, and JUDIT PLANAS. "OmpSs: A PROPOSAL FOR PROGRAMMING HETEROGENEOUS MULTI-CORE ARCHITECTURES." Parallel Processing Letters 21, no. 02 (June 2011): 173–93. http://dx.doi.org/10.1142/s0129626411000151.

Full text
Abstract:
In this paper, we present OmpSs, a programming model based on OpenMP and StarSs, that can also incorporate the use of OpenCL or CUDA kernels. We evaluate the proposal on different architectures, SMP, GPUs, and hybrid SMP/GPU environments, showing the wide usefulness of the approach. The evaluation is done with six different benchmarks, Matrix Multiply, BlackScholes, Perlin Noise, Julia Set, PBPI and FixedGrid. We compare the results obtained with the execution of the same benchmarks written in OpenCL or OpenMP, on the same architectures. The results show that OmpSs greatly outperforms both environments. With the use of OmpSs the programming environment is more flexible than traditional approaches to exploit multiple accelerators, and due to the simplicity of the annotations, it increases programmer's productivity.
APA, Harvard, Vancouver, ISO, and other styles
34

Sørensen, Peter, and Jan Madsen. "Generating Process Network Communication Infrastructure for Custom Multi-Core Platforms." International Journal of Embedded and Real-Time Communication Systems 1, no. 1 (January 2010): 37–63. http://dx.doi.org/10.4018/jertcs.2010103003.

Full text
Abstract:
We present an approach for generating implementations of abstraction layers implementing the communication infrastructure of applications modeled as process networks. Our approach is unique in that it does not rely on assumptions about the capabilities and topology of the underlying platform. Instead, a generic implementation is adapted to the particular platform based on information retrieved from analyzing the platform. At the heart of the approach is a novel method for analyzing the capabilities of custom execution platforms composed of components. The versatility and usefulness of the approach and analysis method is demonstrated through a case study.
APA, Harvard, Vancouver, ISO, and other styles
35

Nozal, Raúl, and Jose Luis Bosque. "Straightforward Heterogeneous Computing with the oneAPI Coexecutor Runtime." Electronics 10, no. 19 (September 29, 2021): 2386. http://dx.doi.org/10.3390/electronics10192386.

Full text
Abstract:
Heterogeneous systems are the core architecture of most computing systems, from high-performance computing nodes to embedded devices, due to their excellent performance and energy efficiency. Efficiently programming these systems has become a major challenge due to the complexity of their architectures and the efforts required to provide them with co-execution capabilities that can fully exploit the applications. There are many proposals to simplify the programming and management of acceleration devices and multi-core CPUs. However, in many cases, portability and ease of use compromise the efficiency of different devices—even more so when co-executing. Intel oneAPI, a new and powerful standards-based unified programming model, built on top of SYCL, addresses these issues. In this paper, oneAPI is provided with co-execution strategies to run the same kernel between different devices, enabling the exploitation of static and dynamic policies. This work evaluates the performance and energy efficiency for a well-known set of regular and irregular HPC benchmarks, using two heterogeneous systems composed of an integrated GPU and CPU. Static and dynamic load balancers are integrated and evaluated, highlighting single and co-execution strategies and the most significant key points of this promising technology. Experimental results show that co-execution is worthwhile when using dynamic algorithms and improves the efficiency even further when using unified shared memory.
APA, Harvard, Vancouver, ISO, and other styles
36

Bielecki, Włodzimierz, Piotr Błaszyński, and Marek Pałkowski. "3D Tiled Code Generation for Nussinov’s Algorithm." Applied Sciences 12, no. 12 (June 9, 2022): 5898. http://dx.doi.org/10.3390/app12125898.

Full text
Abstract:
Current state-of-the-art parallel codes that calculate the maximum number of base pairs for a given RNA sequence by means of Nussinov's algorithm do not achieve speedup close to the number of processors used to execute those codes on multi-core computers. This is because known codes do not make full use of, and derive benefit from, the cache memory of such computers. There is a need for new approaches that increase cache exploitation on multi-core computers. One such possibility is increasing the dimension of tiles in the generated tiled code while keeping the generated tiles of similar size. The article presents an approach allowing us to produce 3D parallel tiled code calculating Nussinov's RNA folding, i.e., code with the maximal tile dimension possible for the loop nest executing Nussinov's algorithm. The approach guarantees that generated tiles are of similar size. The code generated with the presented approach is characterized by increased locality and outperforms all closely related codes examined by us. This allows us to considerably reduce the execution time required to compute the maximum number of pairs of any nested structure for larger RNA sequences by means of Nussinov's algorithm.
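The 3D tiled code itself is beyond a short sketch, but the underlying Nussinov recurrence that the paper's loop nest executes is compact: dp[i][j] holds the maximum number of nested base pairs in the subsequence i..j.

```python
def nussinov(seq):
    """Maximum number of nested base pairs (Watson-Crick plus G-U wobble)."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n < 2:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(1, n):              # increasing subsequence length
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]           # j left unpaired
            for k in range(i, j):         # j paired with some k in i..j-1
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]
```

The triangular dependence pattern of this dp table is exactly what makes cache-aware tiling of the loop nest worthwhile.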
APA, Harvard, Vancouver, ISO, and other styles
37

R, Maheswari, Pattabiraman V, and Sharmila P. "RECONFIGURABLE FPGA BASED SOFT-CORE PROCESSOR FOR SIMD APPLICATIONS." Asian Journal of Pharmaceutical and Clinical Research 10, no. 13 (April 1, 2017): 180. http://dx.doi.org/10.22159/ajpcr.2017.v10s1.19632.

Full text
Abstract:
Objective: The prospective need for SIMD (Single Instruction, Multiple Data) applications such as video and image processing in a single system requires greater flexibility in computation to deliver high-quality real-time data. This paper analyzes an FPGA (Field Programmable Gate Array) based high-performance Reconfigurable OpenRISC1200 (ROR) soft-core processor for SIMD. Methods: The ROR1200 ensures performance improvement through data-level parallelism, executing SIMD instructions simultaneously in HPRC (High Performance Reconfigurable Computing) at reduced resource utilization through an RRF (Reconfigurable Register File) with multiple core functionalities. This work analyzes the functionality of the reconfigurable architecture by illustrating the implementation of two different image processing operations: image convolution and image quality improvement. The MAC (Multiply-Accumulate) unit of the ROR1200 performs image convolution, and the execution unit with HPRC is used for image quality improvement. Result: With parallel execution on multiple cores, the proposed processor improves image quality by doubling the frame rate up to 60 fps (frames per second) with a peak power consumption of 400 mW. The processor thus achieves a computational cost of 12 ms at a refresh rate of 60 Hz, with a MAC critical path delay of 1.29 ns. Conclusion: This FPGA-based processor is a feasible solution for portable embedded SIMD-based applications that need high performance at reduced power consumption.
APA, Harvard, Vancouver, ISO, and other styles
38

UDDIN, M. IRFAN, MICHIEL W. VAN TOL, and CHRIS R. JESSHOPE. "HIGH LEVEL SIMULATION OF SVP MANY-CORE SYSTEMS." Parallel Processing Letters 21, no. 04 (December 2011): 413–38. http://dx.doi.org/10.1142/s0129626411000308.

Full text
Abstract:
The Microgrid is a many-core architecture comprising multiple clusters of fine-grained multi-threaded cores. The SVP API supported by the cores allows for the asynchronous delegation of work to different clusters of cores which can be acquired dynamically. We want to explore the execution of complex applications and their interaction with dynamically allocated resources. To date, any evaluation of the Microgrid has used a detailed emulation with a cycle accurate simulation of the execution time. Although the emulator can be used to evaluate small program kernels, it only executes at a rate of 100K instructions per second, divided over the number of emulated cores. This makes it inefficient to evaluate a complex application executing on many cores using dynamic allocation of clusters. In order to obtain a more efficient evaluation we have developed a co-simulation environment that executes high level SVP control code but which abstracts the scheduling of the low-level threads using two different techniques. The co-simulation is evaluated for both performance and simulation accuracy.
APA, Harvard, Vancouver, ISO, and other styles
39

Dümmler, Jörg, Thomas Rauber, and Gudula Rünger. "Combined Scheduling and Mapping for Scalable Computing with Parallel Tasks." Scientific Programming 20, no. 1 (2012): 45–67. http://dx.doi.org/10.1155/2012/514940.

Full text
Abstract:
Recent and future parallel clusters and supercomputers use symmetric multiprocessors (SMPs) and multi-core processors as basic nodes, providing a huge amount of parallel resources. These systems often have hierarchically structured interconnection networks combining computing resources at different levels, starting with the interconnect within multi-core processors up to the interconnection network combining nodes of the cluster or supercomputer. The challenge for the programmer is that these computing resources should be utilized efficiently by exploiting the available degree of parallelism of the application program and by structuring the application in a way which is sensitive to the heterogeneous interconnect. In this article, we pursue a parallel programming method using parallel tasks to structure parallel implementations. A parallel task can be executed by multiple processors or cores and, for each activation of a parallel task, the actual number of executing cores can be adapted to the specific execution situation. In particular, we propose a new combined scheduling and mapping technique for parallel tasks with dependencies that takes the hierarchical structure of modern multi-core clusters into account. An experimental evaluation shows that the presented programming approach can lead to a significantly higher performance compared to standard data parallel implementations.
APA, Harvard, Vancouver, ISO, and other styles
40

Weisz, Sergiu, and Marta Bertran Ferrer. "Adding multi-core support to the ALICE Grid Middleware." Journal of Physics: Conference Series 2438, no. 1 (February 1, 2023): 012009. http://dx.doi.org/10.1088/1742-6596/2438/1/012009.

Full text
Abstract:
The major upgrade of the ALICE experiment for LHC Run 3 poses unique challenges and opportunities for new software development. In particular, the entirely new data-taking and processing software of ALICE relies on process parallelism and large amounts of shared objects in memory. Thus, from the single-core, single-thread workloads of the past, the new workloads are exclusively multithreaded. This requires a profound change in ALICE Grid middleware job handling, from scheduling to execution, and the entire middleware has therefore been rewritten during the past three years to support the new multithreaded reality. This paper presents the ALICE middleware development for multi-core job management and the tools used to achieve an efficient and secure environment. In particular, it covers job isolation and scheduling and how they can be implemented in different site configurations, such as sites shared with other experiments or High Performance Computing resources.
APA, Harvard, Vancouver, ISO, and other styles
41

Zhai, Wenzheng, Yue-Li Hu, and Feng Ran. "CQPSO scheduling algorithm for heterogeneous multi-core DAG task model." Modern Physics Letters B 31, no. 19-21 (July 27, 2017): 1740050. http://dx.doi.org/10.1142/s0217984917400504.

Full text
Abstract:
Efficient task scheduling is critical to achieving high performance in a heterogeneous multi-core computing environment. The paper focuses on the heterogeneous multi-core directed acyclic graph (DAG) task model and proposes a novel task scheduling method based on an improved chaotic quantum-behaved particle swarm optimization (CQPSO) algorithm. A task priority scheduling list is built, and the processor with the minimum cumulative earliest finish time (EFT) is selected as the target of the first task assignment, so that task precedence relationships are satisfied and the total execution time of all tasks is minimized. The experimental results show that the proposed algorithm is simple and feasible, converges quickly, has strong optimization ability, and can be applied to task scheduling optimization in other heterogeneous and distributed environments.
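The EFT-based assignment step that the abstract describes can be sketched as a simple list scheduler. The sketch assumes independent tasks and a fixed priority order, and omits the CQPSO search itself; the cost table is illustrative.

```python
def schedule_min_eft(priority_list, exec_time):
    """Assign each task (in priority order) to the processor with the minimum
    earliest finish time: the processor's current load plus the task's cost there."""
    n_procs = len(exec_time[priority_list[0]])
    ready = [0.0] * n_procs              # time at which each processor frees up
    placement = {}
    for task in priority_list:
        eft = [ready[p] + exec_time[task][p] for p in range(n_procs)]
        best = min(range(n_procs), key=lambda p: eft[p])
        ready[best] = eft[best]
        placement[task] = best
    return placement, max(ready)

# Heterogeneous costs: exec_time[task][proc] (made-up numbers)
times = {0: [3, 5], 1: [4, 2], 2: [6, 6]}
plan, total = schedule_min_eft([0, 1, 2], times)
```

Here task 0 lands on processor 0, tasks 1 and 2 on processor 1, for a schedule length of 8.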
APA, Harvard, Vancouver, ISO, and other styles
42

Zhang, Jikai, Haidong Lan, Yuandong Chan, Yuan Shang, Bertil Schmidt, and Weiguo Liu. "BGSA: a bit-parallel global sequence alignment toolkit for multi-core and many-core architectures." Bioinformatics 35, no. 13 (November 16, 2018): 2306–8. http://dx.doi.org/10.1093/bioinformatics/bty930.

Full text
Abstract:
Motivation: Modern bioinformatics tools for analyzing large-scale NGS datasets often need to include fast implementations of core sequence alignment algorithms in order to achieve reasonable execution times. We address this need by presenting the BGSA toolkit for optimized implementations of popular bit-parallel global pairwise alignment algorithms on modern microprocessors. Results: BGSA outperforms Edlib, SeqAn and BitPAl for pairwise edit distance computations, and Parasail, SeqAn and BitPAl when using more general scoring schemes for pairwise alignments of a batch of sequence reads, on both standard multi-core CPUs and Xeon Phi many-core CPUs. Furthermore, the banded edit distance performance of BGSA on a Xeon Phi-7210 outperforms the highly optimized NVBio implementation on a Titan X GPU for the seed verification stage of a read mapper by a factor of 4.4. Availability and implementation: BGSA is open-source and available at https://github.com/sdu-hpcl/BGSA. Supplementary information: Supplementary data are available at Bioinformatics online.
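The primitive behind such toolkits, bit-parallel global alignment, is exemplified by Myers' bit-vector edit distance, which processes one text character per iteration using a handful of word-wide logical operations. A plain-Python rendering follows; the toolkit's own SIMD/many-core implementation is far more elaborate.

```python
def edit_distance(p, t):
    """Myers' bit-parallel Levenshtein distance between pattern p and text t
    (pattern length limited only by Python's arbitrary-precision integers)."""
    m = len(p)
    if m == 0:
        return len(t)
    mask = (1 << m) - 1
    Peq = {}                                  # per-character match bitmasks
    for i, c in enumerate(p):
        Peq[c] = Peq.get(c, 0) | (1 << i)
    Pv, Mv, score = mask, 0, m                # positive/negative delta vectors
    high = 1 << (m - 1)
    for c in t:
        Eq = Peq.get(c, 0)
        Xv = Eq | Mv
        Xh = (((Eq & Pv) + Pv) ^ Pv) | Eq
        Ph = (Mv | ~(Xh | Pv)) & mask
        Mh = Pv & Xh
        if Ph & high:                         # score of last row goes up...
            score += 1
        elif Mh & high:                       # ...or down by at most one
            score -= 1
        Ph = ((Ph << 1) | 1) & mask
        Mh = (Mh << 1) & mask
        Pv = (Mh | ~(Xv | Ph)) & mask
        Mv = Ph & Xv
    return score
```

For example, `edit_distance("kitten", "sitting")` returns the classic value 3.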
APA, Harvard, Vancouver, ISO, and other styles
43

Huang, Kai, Ming Jing, Xiaowen Jiang, Siheng Chen, Xiaobo Li, Wei Tao, Dongliang Xiong, and Zhili Liu. "Task-Level Aware Scheduling of Energy-Constrained Applications on Heterogeneous Multi-Core System." Electronics 9, no. 12 (December 5, 2020): 2077. http://dx.doi.org/10.3390/electronics9122077.

Full text
Abstract:
Minimizing the schedule length of parallel applications that run on a heterogeneous multi-core system subject to an energy consumption constraint has recently attracted much attention. The key point of this problem is the strategy for pre-allocating the energy consumption of unscheduled tasks. Previous articles used the minimum value, the average value, or a power-consumption-weighted value as the pre-allocated energy consumption of tasks, but they all ignored the levels of tasks: tasks at different levels have different impacts on the overall schedule length when they are allocated the same energy consumption. Taking task levels into account, we designed a novel task energy consumption pre-allocation strategy that is conducive to minimizing the schedule length, and developed a novel task scheduling algorithm based on it. After obtaining the preliminary scheduling results, we also propose a task execution frequency re-adjustment mechanism that can re-adjust the execution frequency of tasks to further reduce the overall schedule length. We carried out a considerable number of experiments with practical parallel application models. The results show that our method achieves better performance than existing algorithms.
APA, Harvard, Vancouver, ISO, and other styles
44

Vivanco, José María, Matthias Keinert, Armin Lechler, and Alexander Verl. "Analysis and Design of Computerized Numerical Controls for Execution on Multi-core Systems." Procedia CIRP 41 (2016): 864–69. http://dx.doi.org/10.1016/j.procir.2015.12.021.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Muresano, Ronal, Hugo Meyer, Dolores Rexachs, and Emilio Luque. "An approach for an efficient execution of SPMD applications on Multi-core environments." Future Generation Computer Systems 66 (January 2017): 11–26. http://dx.doi.org/10.1016/j.future.2016.06.016.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Sesha Saiteja, Maddula N. V., K. Sai Sumanth Reddy, D. Radha, and Minal Moharir. "Multi-Core Architecture and Network on Chip: Applications and Challenges." Journal of Computational and Theoretical Nanoscience 17, no. 1 (January 1, 2020): 239–45. http://dx.doi.org/10.1166/jctn.2020.8657.

Full text
Abstract:
Technology improves performance and shrinks in size day by day; a reduction in size increases density, which in turn can improve performance. These statements apply very well to the improvement of computer architecture. System on Chip (SoC) brought the concept of multiple cores on a single chip, and multi-core or many-core architectures are the future of computing. Technology has succeeded in reducing size and increasing density, but improving performance enough to justify including ever more cores remains a challenge for many-core technology, as do utilizing all cores and improving the execution performance they deliver. This paper discusses the basics of many-core architecture, comparisons, and applications. Further, it covers the basics of Network on Chip (NoC), its architectural components, and various current NoC research problems, including improving communication performance by routing around congested paths.
APA, Harvard, Vancouver, ISO, and other styles
47

Chen, Kuo Yi, Fuh Gwo Chen, and Jr Shian Chen. "A Cost-Effective Hardware Approach for Measuring Power Consumption of Modern Multi-Core Processors." Applied Mechanics and Materials 110-116 (October 2011): 4569–73. http://dx.doi.org/10.4028/www.scientific.net/amm.110-116.4569.

Full text
Abstract:
Multiple processor cores are built within a chip by advanced VLSI technology. With decreasing prices, multi-core processors are widely deployed in both server and desktop systems. The workload of multi-threaded applications can be separated across cores by multiple threads, so that application threads run concurrently to maximize the overall execution speed of the application. Moreover, following the current trend of green computing, most modern multi-core processors provide dynamic frequency tuning; these power-level tuning techniques are based on Dynamic Voltage and Frequency Scaling (DVFS). In order to evaluate the performance of various power-saving approaches, an appropriate technique for measuring the power consumption of multi-core processors is important. However, most approaches estimate CPU power consumption only from CMOS power consumption data and CPU frequency, and thus capture only the dynamic power consumption of multi-core processors; the static power consumption is not included. In this study, a hardware approach to measuring the power consumption of multi-core processors is proposed, so that the power consumption of a CPU can be measured precisely and the performance of CPU power-saving approaches can be evaluated well.
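The DVFS setting the paper targets rests on the standard dynamic CMOS power relation, P_dyn ≈ C·V²·f, which, as the abstract notes, omits static (leakage) power; hence the case for measuring rather than estimating. A sketch with illustrative (made-up) operating points:

```python
def dynamic_power(c_eff, voltage, freq_hz):
    """Dynamic CMOS power estimate: effective switched capacitance (farads)
    times voltage squared times switching frequency (Hz) -> watts."""
    return c_eff * voltage ** 2 * freq_hz

# Illustrative operating points: halving f usually also permits a lower V,
# so dynamic power falls superlinearly -- the rationale behind DVFS.
high = dynamic_power(1e-9, 1.2, 3.0e9)   # about 4.3 W
low = dynamic_power(1e-9, 0.9, 1.5e9)    # about 1.2 W
```

Note that the estimate drops to more than half when frequency is halved, precisely because voltage enters squared; static power, by contrast, persists regardless of f.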
APA, Harvard, Vancouver, ISO, and other styles
48

Chen, Yong Heng, Wan Li Zuo, and Feng Lin He. "Optimization Strategy of Bidirectional Join Enumeration in Multi-Core CPUS." Applied Mechanics and Materials 44-47 (December 2010): 383–87. http://dx.doi.org/10.4028/www.scientific.net/amm.44-47.383.

Full text
Abstract:
Most contemporary database query optimizers exploit System R's bottom-up dynamic programming (DP) method to find the optimal query execution plan (QEP) without evaluating redundant sub-plans. As modern microprocessors employ multiple cores to accelerate computation, parallel algorithms have been proposed to parallelize the bottom-up DP query optimization process. However, the top-down DP method can derive upper bounds for the costs of the plans it generates, which is not available to the typical bottom-up DP method, since the latter generates and costs all subplans before considering larger containing plans. This paper combines the strengths of the two approaches and proposes a comprehensive and practical graph-traversal-driven algorithm, referred to here as DPbid, for parallelizing query optimization on multi-core processor architectures. We have implemented this search strategy, and experimental results show that it improves optimization time effectively compared with known existing algorithms.
APA, Harvard, Vancouver, ISO, and other styles
49

Park, Sihyeong, Mi-Young Kwon, Hoon-Kyu Kim, and Hyungshin Kim. "Execution Model to Reduce the Interference of Shared Memory in ARINC 653 Compliant Multicore RTOS." Applied Sciences 10, no. 7 (April 3, 2020): 2464. http://dx.doi.org/10.3390/app10072464.

Full text
Abstract:
Multicore architecture is applied to contemporary avionics systems to deal with complex tasks. However, multicore architectures can cause interference by contention because the cores share hardware resources. This interference reduces the predictable execution time of safety-critical systems, such as avionics systems. To reduce this interference, methods of separating hardware resources or limiting capacity by core have been proposed. Existing studies have modified kernels to control hardware resources. Additionally, an execution model has been proposed that can reduce interference by adjusting the execution order of tasks without software modification. Avionics systems require several rigorous software verification procedures. Therefore, modifying existing software can be costly and time-consuming. In this work, we propose a method to apply execution models proposed in existing studies without modifying commercial real-time operating systems. We implemented the time-division multiple access (TDMA) and acquisition execution restitution (AER) execution models with pseudo-partition and message queuing on VxWorks 653. Moreover, we propose a multi-TDMA model considering the characteristics of the target hardware. For the interference analysis, we measured the L1 and L2 cache misses and the number of main memory requests. We demonstrated that the interference caused by memory sharing was reduced by at least 60% in the execution model. In particular, multi-TDMA doubled utilization compared to TDMA and also reduced the execution time by 20% compared to the AER model.
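The TDMA execution model described above partitions time into fixed slots during which only one core may issue shared-memory requests, trading idle intervals for freedom from inter-core contention. A minimal sketch of the slot arbitration (slot length and core count are illustrative, not taken from the paper):

```python
def active_core(t, n_cores, slot_len):
    """Which core owns the shared-memory slot at time t under plain TDMA."""
    return (t // slot_len) % n_cores

def may_access(core, t, n_cores=4, slot_len=10):
    """A core may issue memory requests only during its own slot; outside it,
    the core must execute from private resources or stall."""
    return active_core(t, n_cores, slot_len) == core
```

A multi-TDMA variant, as evaluated in the paper, would grant more than one slot per round to cores with heavier memory demand, raising utilization over this plain round-robin scheme.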
APA, Harvard, Vancouver, ISO, and other styles
50

Sodinapalli, Nagendra Prasad, Subhash Kulkarni, Nawaz Ahmed Sharief, and Prasanth Venkatareddy. "An efficient resource utilization technique for scheduling scientific workload in cloud computing environment." IAES International Journal of Artificial Intelligence (IJ-AI) 11, no. 1 (March 1, 2022): 367. http://dx.doi.org/10.11591/ijai.v11.i1.pp367-378.

Full text
Abstract:
Recently, numerous data-intensive workflows have been generated with the growth of Internet of Things (IoT) technologies. Existing methodologies have emphasized heterogeneous cloud frameworks for executing these data-intensive workflows, and efficient resource scheduling plays a very important role in provisioning workload execution on such frameworks. Building a trade-off model that meets both the energy constraint and the workload task deadline requirement is challenging. Recently, a number of multi-objective workload schedulers have aimed at minimizing the power budget while meeting task deadline constraints; however, these models induce significant overhead when demand and the number of processing cores increase. To address this, the workload is modeled by considering that each sub-task requires dynamic memory, cache, accessible slots, execution time, and I/O access. Utilizing resources more efficiently thus requires better cache resource management. Here an efficient resource utilization (ERU) model is presented, designed to use cache resources more efficiently, reduce last-level cache misses, and meet workload task deadline requirements. The ERU model is very efficient compared with standard resource management methodologies in terms of reducing execution time, power consumption, and energy consumption when executing scientific workloads on a heterogeneous cloud platform.
APA, Harvard, Vancouver, ISO, and other styles