Journal articles on the topic 'Extensible processor'


Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Extensible processor.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Martin, Grant. "What is a configurable, extensible processor?" ACM SIGDA Newsletter 38, no. 16 (August 15, 2008): 1. http://dx.doi.org/10.1145/1862846.1862847.

2

Martin, Grant. "What is a configurable, extensible processor?" ACM SIGDA Newsletter 38, no. 17 (September 2008): 1. http://dx.doi.org/10.1145/1862849.1862850.

3

Gonzalez, R. E. "Xtensa: a configurable and extensible processor." IEEE Micro 20, no. 2 (2000): 60–70. http://dx.doi.org/10.1109/40.848473.

4

Martin, Grant. "Multi-Processor SoC-Based Design Methodologies Using Configurable and Extensible Processors." Journal of Signal Processing Systems 53, no. 1-2 (November 29, 2007): 113–27. http://dx.doi.org/10.1007/s11265-007-0153-7.

5

Sun, F., S. Ravi, A. Raghunathan, and N. K. Jha. "Custom-Instruction Synthesis for Extensible-Processor Platforms." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 23, no. 2 (February 2004): 216–28. http://dx.doi.org/10.1109/tcad.2003.822133.

6

Misko, Joshua, Shrikant S. Jadhav, and Youngsoo Kim. "Extensible Embedded Processor for Convolutional Neural Networks." Scientific Programming 2021 (April 21, 2021): 1–12. http://dx.doi.org/10.1155/2021/6630552.

Abstract:
Convolutional neural networks (CNNs) require significant computing power during inference. Smart phones, for example, may not run a facial recognition system or search algorithm smoothly due to the lack of resources and supporting hardware. Methods for reducing memory size and increasing execution speed have been explored, but choosing effective techniques for an application requires extensive knowledge of the network architecture. This paper proposes a general approach to preparing a compressed deep neural network processor for inference with minimal additions to existing microprocessor hardware. To show the benefits of the proposed approach, an example CNN for synthetic aperture radar target classification is modified and complementary custom processor instructions are designed. The modified CNN is examined to show the effects of the modifications, and the custom processor instructions are profiled to illustrate the potential performance increase from the new extended instructions.
7

Noori, Hamid, Farhad Mehdipour, Kazuaki Murakami, Koji Inoue, and Morteza Saheb Zamani. "An architecture framework for an adaptive extensible processor." Journal of Supercomputing 45, no. 3 (February 1, 2008): 313–40. http://dx.doi.org/10.1007/s11227-008-0174-4.

8

Dutheil, Julien Y., Sylvain Gaillard, and Eva H. Stukenbrock. "MafFilter: a highly flexible and extensible multiple genome alignment files processor." BMC Genomics 15, no. 1 (2014): 53. http://dx.doi.org/10.1186/1471-2164-15-53.

9

Bauer, L., M. Shafique, and J. Henkel. "Efficient Resource Utilization for an Extensible Processor Through Dynamic Instruction Set Adaptation." IEEE Transactions on Very Large Scale Integration (VLSI) Systems 16, no. 10 (October 2008): 1295–308. http://dx.doi.org/10.1109/tvlsi.2008.2002430.

10

Sano, Kentaro, Luzhou Wang, and Satoru Yamamoto. "Prototype implementation of array-processor extensible over multiple FPGAs for scalable stencil computation." ACM SIGARCH Computer Architecture News 38, no. 4 (September 14, 2010): 80–86. http://dx.doi.org/10.1145/1926367.1926381.

11

Li, Lin, Shengbing Zhang, and Juan Wu. "Design of Deep Learning VLIW Processor for Image Recognition." Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University 38, no. 1 (February 2020): 216–24. http://dx.doi.org/10.1051/jnwpu/20203810216.

Abstract:
To meet the application demands of high-resolution image recognition and efficient localization processing in the aviation and aerospace fields, and to address the insufficient parallelism of existing designs, an extensible multiprocessor-cluster deep learning processor architecture based on VLIW is designed by optimizing the computation of each layer of a deep convolutional neural network model. The design adopts parallel processing of feature maps and neurons, instruction-level parallelism based on very long instruction words (VLIW), data-level parallelism across multiprocessor clusters, and pipelining. Test results on an FPGA prototype system show that the processor can effectively complete image classification and object detection applications. The peak performance of the processor is up to 128 GOP/s when it operates at 200 MHz. On the selected benchmarks, the processor is at least about 12X faster than a CPU and 7X faster than a GPU. Compared with the results of the software framework, the average error in the test accuracy of the processor is less than 1%.
12

Cetin, E., R. C. S. Morling, and I. Kale. "An extensible complex fast Fourier transform processor chip for real-time spectrum analysis and measurement." IEEE Transactions on Instrumentation and Measurement 47, no. 1 (1998): 95–99. http://dx.doi.org/10.1109/19.728798.

13

Lagadec, Loïc, Damien Picard, Youenn Corre, and Pierre-Yves Lucas. "Experiment Centric Teaching for Reconfigurable Processors." International Journal of Reconfigurable Computing 2011 (2011): 1–14. http://dx.doi.org/10.1155/2011/952560.

Abstract:
This paper presents a setup for teaching configware to master's students. Our approach focuses on experimentation and learning-by-doing while being supported by research activity. The central project we submit to students addresses building a simple RISC processor that supports an extensible instruction set thanks to its reconfigurable functional unit. The originality comes from the fact that the students make use of the Biniou framework. Biniou is a research tool whose scope covers tasks ranging from describing the RFU and synthesizing it as VHDL code to implementing applications over it. Once done, students exhibit a deep understanding of the domain, ensuring the ability to adapt quickly to state-of-the-art techniques.
14

WAGGY, SCOTT B., ALEC KUCALA, and SEDAT BIRINGEN. "PARALLEL IMPLEMENTATION OF A NAVIER–STOKES SOLVER: TURBULENT EKMAN LAYER DIRECT SIMULATION." International Journal of Computational Methods 11, no. 05 (October 2014): 1350070. http://dx.doi.org/10.1142/s0219876213500709.

Abstract:
A massively parallel direct numerical solution procedure for the turbulent Ekman layer is presented. The simulations study the dynamics of turbulence in this flow by solving the incompressible Navier–Stokes equations with Coriolis and buoyancy terms. The governing equations are integrated via a semi-implicit time advancement algorithm which is massively parallelized using the Portable, Extensible Toolkit for Scientific Computation (PETSc) libraries. Accuracy of the numerical scheme was validated by comparisons of simulation results with the hydrodynamic linear stability theory for Poiseuille flow. Two cases are presented to demonstrate the capabilities of the code: (a) a neutrally stable case of Reynolds number, Re = 400 and (b) an unstably stratified case at Re = 1,000 requiring very high resolution in all coordinate directions. Results indicate that the scalability is not limited by the overall size of the problem, but rather by the number of mesh points per processor. Strong scaling is demonstrated for both cases with as few as 10,000 unknowns per processor.
15

Wait, Eric, Mark Winter, and Andrew R. Cohen. "Hydra image processor: 5-D GPU image analysis library with MATLAB and python wrappers." Bioinformatics 35, no. 24 (June 26, 2019): 5393–95. http://dx.doi.org/10.1093/bioinformatics/btz523.

Abstract:
Summary: Light microscopes can now capture data in five dimensions at very high frame rates, producing terabytes of data per experiment. Five-dimensional data has three spatial dimensions (x, y, z), multiple channels (λ) and time (t). Current tools are prohibitively time consuming and do not efficiently utilize available hardware. The Hydra Image Processor (HIP) is a new library providing hardware-accelerated image processing accessible from interpreted languages including MATLAB and Python. HIP automatically distributes data/computation across system and video RAM, allowing hardware-accelerated processing of arbitrarily large images. HIP also partitions compute tasks optimally across multiple GPUs. HIP includes a new kernel renormalization reducing boundary effects associated with widely used padding approaches. Availability and implementation: HIP is free and open source software released under the BSD 3-Clause License. Source code and compiled binary files will be maintained at http://www.hydraimageprocessor.com. A comprehensive description of all MATLAB and Python interfaces and user documents is provided. HIP includes GPU-accelerated support for most common image processing operations in 2-D and 3-D and is easily extensible. HIP uses the NVIDIA CUDA interface to access the GPU. CUDA is well supported on Windows and Linux, with macOS support in the future.
16

Huang, LinYun, Young-Pil Lee, Yong-Seon Moon, and Young-Chul Bae. "Noble Implementation of Motor Driver with All Programmable SoC for Humanoid Robot or Industrial Device." International Journal of Humanoid Robotics 14, no. 04 (November 16, 2017): 1750028. http://dx.doi.org/10.1142/s0219843617500281.

Abstract:
Currently, as the requirements for simple implementations in motor control technologies increase, System-on-Chip (SoC) devices such as the Zynq All Programmable SoC have been devised to meet those requirements. Because a CPU and an FPGA can be assembled into one SoC device, motor-control functions and additional processing tasks can be consolidated into a single SoC device. The control algorithms, networking and other tasks are off-loaded to the programmable logic, which can include multiple control cores and multiple control systems. Because the whole design fits on a single chip, the motor-control hardware can be made simpler, more reliable, and less expensive. In this paper, in order to implement a motor controller for humanoid robots or industrial devices, we apply the latest All Programmable SoC technology, which integrates FPGA technologies and embedded processor technologies. We also propose a motor-controller structure that uses an All Programmable SoC to decentralize the functions of a typical motor driver across the FPGA and the embedded processor. We verify the feasibility of the implemented motor controller on the Zynq EPP (Extensible Processing Platform), a kind of All Programmable SoC made by Xilinx. To do this, we perform velocity control and position control of a BLDC motor with a digital PI controller.
17

Li, Hong Yi, Cheng Yang, Xiao Yu Wu, and Ya Ning Wu. "A Kind of Video Abstracting System Base on Hadoop." Applied Mechanics and Materials 687-691 (November 2014): 2186–91. http://dx.doi.org/10.4028/www.scientific.net/amm.687-691.2186.

Abstract:
Digital video, as a kind of large-scale resource, plays an important role among the internet's multimedia resources. Resource consumption is one of the major issues in extracting data from digital video. However, if it is possible to know the outline of a video by browsing several pictures describing it, that will save us a huge amount of time. Up to now, video processing systems using a cloud environment have not been common (except for video transcoding systems). This demo shows an extensible video processing system based on the Apache Hadoop cloud environment. Our system utilizes the FFmpeg video encoder and the OpenCV graphics processor. The basic process of building a video abstracting system includes segmenting scenes, followed by extracting representative key frames. The source files of the video are stored in HDFS, while the segmentation and key frame information is stored in HBase. In this way, fast searching can be achieved.
18

Vlahopoulos, Nickolas, and Michael M. Bernitsas. "Three-Dimensional Nonlinear Dynamics of Nonintegral Riser Bundle." Journal of Ship Research 35, no. 01 (March 1, 1991): 40–57. http://dx.doi.org/10.5957/jsr.1991.35.1.40.

Abstract:
The dynamic behavior of a nonintegral riser bundle is studied parametrically. The dynamics of each component-riser is analyzed by a three-dimensional, nonlinear, large deflection, small strain model with coupled bending and torsion. Component-risers are slender, thin-walled, extensible or inextensible tubular beam-columns, subject to response and deformation dependent hydrodynamic loads. The connector equations of equilibrium are used to derive the connector forces and moments. Substructuring can thus be achieved even though in three dimensions connectors do not impose linearly dependent deflections at substructure interfaces. The developed time incremental and iterative finite-element computer code is used to analyze the effects of water depth, distribution of connectors, distance between component risers and number of finite elements in the numerical model. The problem of total CPU (central processor unit) time and the advantages of substructuring are discussed by running cases of up to 1094 degrees of freedom.
19

CASEAU, YVES, FRANÇOIS-XAVIER JOSSET, and FRANÇOIS LABURTHE. "CLAIRE: combining sets, search and rules to better express algorithms." Theory and Practice of Logic Programming 2, no. 6 (November 2002): 769–805. http://dx.doi.org/10.1017/s1471068401001363.

Abstract:
This paper presents a programming language which includes paradigms that are usually associated with declarative languages, such as sets, rules and search, into an imperative (functional) language. Although these paradigms are separately well known and are available under various programming environments, the originality of the CLAIRE language comes from the tight integration, which yields interesting run-time performances, and from the richness of this combination, which yields new ways in which to express complex algorithmic patterns with few elegant lines. To achieve the opposite goals of a high abstraction level (conciseness and readability) and run-time performance (CLAIRE is used as a C++ preprocessor), we have developed two kinds of compiler: first, a pattern pre-processor handles iterations over both concrete and abstract sets (data types and program fragments), in a completely user-extensible manner; secondly, an inference compiler transforms a set of logical rules into a set of functions (demons that are used through procedural attachment).
20

Saadawi, Gilan M., and James H. Harrison. "Definition of an XML Markup Language for Clinical Laboratory Procedures and Comparison with Generic XML Markup." Clinical Chemistry 52, no. 10 (October 1, 2006): 1943–51. http://dx.doi.org/10.1373/clinchem.2006.071449.

Abstract:
Abstract Background: Clinical laboratory procedure manuals are typically maintained as word processor files and are inefficient to store and search, require substantial effort for review and updating, and integrate poorly with other laboratory information. Electronic document management systems could improve procedure management and utility. As a first step toward building such systems, we have developed a prototype electronic format for laboratory procedures using Extensible Markup Language (XML). Methods: Representative laboratory procedures were analyzed to identify document structure and data elements. This information was used to create a markup vocabulary, CLP-ML, expressed as an XML Document Type Definition (DTD). To determine whether this markup provided advantages over generic markup, we compared procedures structured with CLP-ML or with the vocabulary of the Health Level Seven, Inc. (HL7) Clinical Document Architecture (CDA) narrative block. Results: CLP-ML includes 124 XML tags and supports a variety of procedure types across different laboratory sections. When compared with a general-purpose markup vocabulary (CDA narrative block), CLP-ML documents were easier to edit and read, less complex structurally, and simpler to traverse for searching and retrieval. Conclusion: In combination with appropriate software, CLP-ML is designed to support electronic authoring, reviewing, distributing, and searching of clinical laboratory procedures from a central repository, decreasing procedure maintenance effort and increasing the utility of procedure information. A standard electronic procedure format could also allow laboratories and vendors to share procedures and procedure layouts, minimizing duplicative word processor editing. Our results suggest that laboratory-specific markup such as CLP-ML will provide greater benefit for such systems than generic markup.
21

LOIDL, HANS-WOLFGANG, PHILIP W. TRINDER, and CARSTEN BUTZ. "TUNING TASK GRANULARITY AND DATA LOCALITY OF DATA PARALLEL GPH PROGRAMS." Parallel Processing Letters 11, no. 04 (December 2001): 471–86. http://dx.doi.org/10.1142/s0129626401000737.

Abstract:
The performance of data parallel programs often hinges on two key coordination aspects: the computational costs of the parallel tasks relative to their management overhead — task granularity; and the communication costs induced by the distance between tasks and their data — data locality. In data parallel programs both granularity and locality can be improved by clustering, i.e. arranging for parallel tasks to operate on related sub-collections of data. The GPH parallel functional language automatically manages most coordination aspects, but also allows some high-level control of coordination using evaluation strategies. We study the coordination behavior of two typical data parallel programs, and find that while they can be improved by introducing clustering evaluation strategies, further performance improvements can be achieved by restructuring the program. We introduce a new generic Cluster class that allows clustering to be systematically introduced, and improved by program transformation. In contrast to many other parallel program transformation approaches, we transform realistic programs and report performance results on a 32-processor Beowulf cluster. The cluster class is highly-generic and extensible, amenable to reasoning, and avoids conflating computation and coordination aspects of the program.
22

Arsenault, Kristi R., Sujay V. Kumar, James V. Geiger, Shugong Wang, Eric Kemp, David M. Mocko, Hiroko Kato Beaudoing, et al. "The Land surface Data Toolkit (LDT v7.2) – a data fusion environment for land data assimilation systems." Geoscientific Model Development 11, no. 9 (September 5, 2018): 3605–21. http://dx.doi.org/10.5194/gmd-11-3605-2018.

Abstract:
Abstract. The effective applications of land surface models (LSMs) and hydrologic models pose a varied set of data input and processing needs, ranging from ensuring consistency checks to more derived data processing and analytics. This article describes the development of the Land surface Data Toolkit (LDT), which is an integrated framework designed specifically for processing input data to execute LSMs and hydrological models. LDT not only serves as a preprocessor to the NASA Land Information System (LIS), which is an integrated framework designed for multi-model LSM simulations and data assimilation (DA) integrations, but also as a land-surface-based observation and DA input processor. It offers a variety of user options and inputs to processing datasets for use within LIS and stand-alone models. The LDT design facilitates the use of common data formats and conventions. LDT is also capable of processing LSM initial conditions and meteorological boundary conditions and ensuring data quality for inputs to LSMs and DA routines. The machine learning layer in LDT facilitates the use of modern data science algorithms for developing data-driven predictive models. Through the use of an object-oriented framework design, LDT provides extensible features for the continued development of support for different types of observational datasets and data analytics algorithms to aid land surface modeling and data assimilation.
23

Kamal, Mehdi, Ali Afzali-Kusha, Saeed Safari, and Massoud Pedram. "Design of NBTI-resilient extensible processors." Integration 49 (March 2015): 22–34. http://dx.doi.org/10.1016/j.vlsi.2014.12.001.

24

Shalaby, Nadia, Andy Bavier, Yitzchak Gottlieb, Scott Karlin, Larry Peterson, Xiaohu Qie, Tammo Spalink, and Mike Wawrzoniak. "Building extensible routers using network processors." Software: Practice and Experience 35, no. 12 (2005): 1155–94. http://dx.doi.org/10.1002/spe.667.

25

Xiao, Chenglong, and Emmanuel Casseau. "Exact custom instruction enumeration for extensible processors." Integration 45, no. 3 (June 2012): 263–70. http://dx.doi.org/10.1016/j.vlsi.2011.11.011.

26

Xiao, Chenglong, Shanshan Wang, Wanjun Liu, and Emmanuel Casseau. "Parallel custom instruction identification for extensible processors." Journal of Systems Architecture 76 (May 2017): 149–59. http://dx.doi.org/10.1016/j.sysarc.2016.11.011.

27

Fei, Y., S. Ravi, A. Raghunathan, and N. K. Jha. "A Hybrid Energy-Estimation Technique for Extensible Processors." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 23, no. 5 (May 2004): 652–64. http://dx.doi.org/10.1109/tcad.2004.826546.

28

Sun, Fei, S. Ravi, A. Raghunathan, and N. K. Jha. "Application-specific heterogeneous multiprocessor synthesis using extensible processors." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 25, no. 9 (September 2006): 1589–602. http://dx.doi.org/10.1109/tcad.2005.858269.

29

Chen, Xiaoyong, Douglas L. Maskell, and Yang Sun. "Fast Identification of Custom Instructions for Extensible Processors." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 26, no. 2 (February 2007): 359–68. http://dx.doi.org/10.1109/tcad.2006.883915.

30

Li, T., W. Jigang, Y. Deng, T. Srikanthan, and X. Lu. "Accelerating identification of custom instructions for extensible processors." IET Circuits, Devices & Systems 5, no. 1 (2011): 21. http://dx.doi.org/10.1049/iet-cds.2010.0073.

31

Bonzini, P., and L. Pozzi. "Recurrence-Aware Instruction Set Selection for Extensible Embedded Processors." IEEE Transactions on Very Large Scale Integration (VLSI) Systems 16, no. 10 (October 2008): 1259–67. http://dx.doi.org/10.1109/tvlsi.2008.2001863.

32

Goyal, Puneet, and Narayan Chaturvedi. "Multiple Output Complex Instruction Matching Algorithm for Extensible Processors." International Journal of Computer Applications 49, no. 21 (July 31, 2012): 31–35. http://dx.doi.org/10.5120/7897-1240.

33

Yazdanbakhsh, Amir, Mehdi Kamal, Sied Mehdi Fakhraie, Ali Afzali-Kusha, Saeed Safari, and Massoud Pedram. "Implementation-aware selection of the custom instruction set for extensible processors." Microprocessors and Microsystems 38, no. 7 (October 2014): 681–91. http://dx.doi.org/10.1016/j.micpro.2014.05.007.

34

LUKAC, RASTISLAV, PAVOL GALAJDA, and ALENA GALAJDOVA. "LUM PROCESSOR WITH NEURAL DECISION." International Journal of Pattern Recognition and Artificial Intelligence 20, no. 05 (August 2006): 747–62. http://dx.doi.org/10.1142/s0218001406004934.

Abstract:
This paper focuses on impulsive noise filtering and outliers rejection in gray-scale images. The proposed method combines neural networks, lower-upper-middle (LUM) smoothers and adaptive switching operations to produce a high-quality enhanced image. Extensive experimentation reported in this paper indicates that the proposed method is sufficiently robust, achieves an excellent balance between noise suppression and signal-detail preservation, and outperforms some well-known filters both subjectively and objectively.
35

SCHAFFER, KEVIN, and ROBERT A. WALKER. "USING HARDWARE MULTITHREADING TO OVERCOME BROADCAST/REDUCTION LATENCY IN AN ASSOCIATIVE SIMD PROCESSOR." Parallel Processing Letters 18, no. 04 (December 2008): 491–509. http://dx.doi.org/10.1142/s0129626408003533.

Abstract:
The latency of broadcast/reduction operations has a significant impact on the performance of SIMD processors. This is especially true for associative programs, which make extensive use of global search operations. Previously, we developed a prototype associative SIMD processor that uses hardware multithreading to overcome the broadcast/reduction latency. In this paper we show, through simulations of the processor running an associative program, that hardware multithreading is able to improve performance by increasing system utilization, even for processors with hundreds or thousands of processing elements. However, the choice of thread scheduling policy used by the hardware is critical in determining the actual utilization achieved. We consider three thread scheduling policies and show that a thread scheduler that avoids issuing threads that will stall due to pipeline dependencies or thread synchronization operations is able to maintain system utilization independent of the number of threads.
36

KWON, YOUNG-SU, and NAK-WOONG EUM. "APPLICATION-ADAPTIVE RECONFIGURATION OF MEMORY ADDRESS SHUFFLER FOR FPGA-EMBEDDED INSTRUCTION-SET PROCESSOR." Journal of Circuits, Systems and Computers 19, no. 07 (November 2010): 1435–47. http://dx.doi.org/10.1142/s0218126610006748.

Abstract:
Programmability requirement in reconfigurable systems necessitates the integration of soft processors in FPGAs. The extensive memory bandwidth sets a major performance bottleneck in soft processors for media applications. While the parallel memory system is a viable solution to account for a large amount of memory transactions in media processors, memory access conflicts caused by multiple memory buses limit the overall performance. We propose and evaluate the configurable memory address shuffler integrated in memory access arbiter for the parallel memory system in a soft processor. The novel address shuffling algorithm profiles memory access pattern of the application, produces the access conflict graph, relocates decomposed memory sub-pages based on the access conflict graph, and finally generates a synthesizable code of the address shuffler. The address shuffler efficiently translates the requested memory addresses into the shuffled addresses such that the amount of simultaneous accesses to the identical physical memory block diminishes. The reconfigurability of the address shuffler enables the adaptive address shuffling depending on the memory access pattern of an application running on the soft processor. The configurable address shuffler removes 80% of access conflicts on average for benchmarks where the hardware overhead of the shuffler is 1592 LUTs which is 14% of LUT size of the processor core.
37

Sun, Fei, Srivaths Ravi, Anand Raghunathan, and Niraj K. Jha. "A Synthesis Methodology for Hybrid Custom Instruction and Coprocessor Generation for Extensible Processors." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 26, no. 11 (November 2007): 2035–45. http://dx.doi.org/10.1109/tcad.2007.906457.

38

Kamal, Mehdi, Ali Afzali-Kusha, Saeed Safari, and Massoud Pedram. "Impact of Process Variations on Speedup and Maximum Achievable Frequency of Extensible Processors." ACM Journal on Emerging Technologies in Computing Systems 10, no. 3 (April 2014): 1–25. http://dx.doi.org/10.1145/2567665.

39

Sari, Aitzan, and Mihalis Psarakis. "A Flexible Fault Injection Platform for the Analysis of the Symptoms of Soft Errors in FPGA Soft Processors." Journal of Circuits, Systems and Computers 26, no. 08 (April 11, 2017): 1740009. http://dx.doi.org/10.1142/s0218126617400096.

Abstract:
Due to the high vulnerability of SRAM-based FPGAs to single-event upsets (SEUs), effective fault-tolerant soft processor architectures must be considered when we use FPGAs to build embedded systems for critical applications. In the past, the detection of symptoms of soft errors in the behavior of microprocessors has been used for the implementation of low-budget error detection techniques, instead of costly hardware redundancy techniques. To enable the development of such low-cost error detection techniques for FPGA soft processors, we propose an in-depth analysis of the symptoms of SEUs in the FPGA configuration memory. To this end, we present a flexible fault injection platform based on an open-source CAD framework (RapidSmith) for the soft error sensitivity analysis of soft processors in Xilinx SRAM-based FPGAs. Our platform supports the estimation of soft error sensitivity per configuration bit/frame, processor component and benchmark. The fault injection is performed on-chip by a dedicated microcontroller which also monitors processor behavior to identify specific symptoms as consequences of soft errors. The performed analysis showed that these symptoms can be used to build an efficient, low-cost error detection scheme. The proposed platform is demonstrated through an extensive fault injection campaign in the Leon3 soft processor.
40

Greer, Bruce, John Harrison, Greg Henry, Wei Li, and Peter Tang. "Scientific Computing on the Itanium® Processor." Scientific Programming 10, no. 4 (2002): 329–37. http://dx.doi.org/10.1155/2002/193478.

Abstract:
The 64-bit Intel® Itanium® architecture is designed for high-performance scientific and enterprise computing, and the Itanium processor is its first silicon implementation. Features such as extensive arithmetic support, predication, speculation, and explicit parallelism can be used to provide a sound infrastructure for supercomputing. A large number of high-performance computer companies are offering Itanium® -based systems, some capable of peak performance exceeding 50 GFLOPS. In this paper we give an overview of the most relevant architectural features and provide illustrations of how these features are used in both low-level and high-level support for scientific and engineering computing, including transcendental functions and linear algebra kernels.
41

Hua, Jing, Yingqiong Peng, Yilu Xu, Kun Cao, and Jing Jia. "Makespan Minimization for Multiprocessor Real-Time Systems under Thermal and Timing Constraints." Journal of Circuits, Systems and Computers 28, no. 09 (August 2019): 1950145. http://dx.doi.org/10.1142/s0218126619501457.

Abstract:
With the continued scaling of CMOS devices, the exponential increase in power density has strikingly elevated the temperature of on-chip systems. In this paper, the problem of allocating and scheduling frame-based real-time applications on multiprocessors is addressed, with the goal of minimizing the makespan under thermal and timing constraints. The proposed algorithms consist of offline and online components. The offline component assigns the applications accepted at static time to processors in a way that the finish times of the processors are balanced. The online component first selects the processor with the highest allocation probability for each application accepted at runtime. The allocation probability is calculated by taking the processor workload and temperature profiles into consideration. A higher allocation probability for a processor indicates that better performance with respect to makespan and temperature can be achieved by executing the application on that processor. Then, the operating frequencies of applications are determined by making the most of slack in order to reduce the peak temperature under the timing constraint. Extensive simulations were performed to validate the effectiveness of the proposed approach. Experimental results have shown that the static makespan of the proposed scheme is very close to the optimal schedule length, within a small margin varying from 0.118 s to 0.249 s, and the dynamic makespan of the proposed scheme can be adapted to satisfy varying system design constraints. The peak temperature of the proposed algorithms can be up to [Formula: see text] lower than that of the benchmarking schemes.
42

Dixon, Matthew, Jörg Lotze, and Mohammad Zubair. "A portable, extensible and fast stochastic volatility model calibration using multi and many-core processors." Concurrency and Computation: Practice and Experience 28, no. 3 (November 20, 2015): 866–77. http://dx.doi.org/10.1002/cpe.3727.

43

Kamal, Mehdi, Ali Afzali-Kusha, Saeed Safari, and Massoud Pedram. "Yield and Speedup Improvements in Extensible Processors by Allocating Extra Cycles to Some Custom Instructions." ACM Transactions on Design Automation of Electronic Systems 21, no. 2 (January 28, 2016): 1–25. http://dx.doi.org/10.1145/2830566.

44

Faraci, Giuseppe, Alfio Lombardo, and Giovanni Schembra. "A Processor-Sharing Scheduling Strategy for NFV Nodes." Journal of Electrical and Computer Engineering 2016 (2016): 1–10. http://dx.doi.org/10.1155/2016/3583962.

Abstract:
The introduction of the two paradigms SDN and NFV to “softwarize” the current Internet is making management and resource allocation two key challenges in the evolution towards the Future Internet. In this context, this paper proposes Network-Aware Round Robin (NARR), a processor-sharing strategy, to reduce delays in traversing SDN/NFV nodes. The application of NARR alleviates the job of the Orchestrator by automatically working at the intranode level, dynamically assigning the processor slices to the virtual network functions (VNFs) according to the state of the queues associated with the output links of the network interface cards (NICs). An extensive simulation set is presented to show the improvements achieved with respect to two other processor-sharing strategies chosen as references.
45

Günzel, Mario, Christian Hakert, Kuan-Hsun Chen, and Jian-Jia Chen. "HEART: Hybrid Memory and Energy-Aware Real-Time Scheduling for Multi-Processor Systems." ACM Transactions on Embedded Computing Systems 20, no. 5s (October 31, 2021): 1–23. http://dx.doi.org/10.1145/3477019.

Abstract:
Dynamic power management (DPM) reduces the power consumption of a computing system when it idles, by switching the system into a low power state for hibernation. When all processors in the system share the same component, e.g., a shared memory, powering off this component during hibernation is only possible when all processors idle at the same time. For a real-time system, the schedulability property has to be guaranteed on every processor, especially if idle intervals are considered to be actively introduced. In this work, we consider real-time systems with hybrid shared-memory architectures, which consist of shared volatile memory (VM) and non-volatile memory (NVM). Energy-efficient execution is achieved by applying DPM to turn off all memories during the hibernation mode. Towards this, we first explore the hybrid memory architectures and suggest a task model, which features configurable hibernation overheads. We propose a multi-processor procrastination algorithm (HEART), based on partitioned earliest-deadline-first (pEDF) scheduling. Our algorithm facilitates reducing the energy consumption by actively enlarging the hibernation time. It enforces all processors to idle simultaneously without violating the schedulability condition, such that the system can enter the hibernation state, where shared memories are turned off. Through extensive evaluation of HEART, we demonstrate (1) the increase in potential hibernation time and, correspondingly, the decrease in energy consumption, and (2) that our algorithm is not only more general but also has better performance than the state of the art with respect to energy efficiency in most cases.
46

Yin, G., and Y. M. Zhu. "On W.P.1 Convergence of A Parallel Stochastic Approximation Algorithm." Probability in the Engineering and Informational Sciences 3, no. 1 (January 1989): 55–75. http://dx.doi.org/10.1017/s0269964800000978.

Abstract:
To find zeros or locate maximum values of a regression function with noisy measurements, a commonly used algorithm is the RM or KW procedure. In various applications, the dimensionality of the problems involved might be quite large. As a result, enormous memory space and extensive computation time may be needed. Motivated by the recent progress in stochastic approximation methods for decentralized and distributed computing, a parallel stochastic approximation algorithm is developed in this paper. The essence is to take advantage of state-space decompositions, and to exploit the opportunities provided by parallel processing and asynchronous communication. In lieu of utilizing a single processor as in the classical cases, a number of parallel processors are employed to solve the underlying problem in a cooperative way. First, the large dimensional vector is partitioned into a number of subvectors with relatively small dimension, then each of the subvectors is assigned to one of the processors. The processors compute and communicate in an asynchronous manner and at random times. Under rather weak conditions, the global convergence of the parallel algorithm is obtained via the methods of randomly varying truncations.
47

Mahmood, Ausif. "Behavioral Simulation and Performance Evaluation of Multi-Processor Architectures." VLSI Design 4, no. 1 (January 1, 1996): 59–68. http://dx.doi.org/10.1155/1996/91035.

Abstract:
The development of multi-processor architectures requires extensive behavioral simulations to verify the correctness of design and to evaluate its performance. A high level language can provide maximum flexibility in this respect if the constructs for handling concurrent processes and a time mapping mechanism are added. This paper describes a novel technique for emulating hardware processes involved in a parallel architecture such that an object-oriented description of the design is maintained. The communication and synchronization between hardware processes is handled by splitting the processes into their equivalent subprograms at the entry points. The proper scheduling of these subprograms is coordinated by a timing wheel which provides a time mapping mechanism. Finally, a high level language pre-processor is proposed so that the timing wheel and the process emulation details can be made transparent to the user.
48

Ababneh, Ismail M., Saad Bani-Mohammad, and Motasem Al Smadi. "Corner-Boundary Processor Allocation for 3D Mesh-Connected Multicomputers." International Journal of Cloud Applications and Computing 5, no. 1 (January 2015): 1–13. http://dx.doi.org/10.4018/ijcac.2015010101.

Abstract:
This research paper presents a new contiguous allocation strategy for 3D mesh-connected multicomputers. The proposed strategy maintains a list of maximal free sub-meshes and gives priority to allocating corner and boundary free sub-meshes. The goal of corner and boundary allocation is to decrease the number of leftover free sub-meshes and increase their sizes, which is expected to reduce processor fragmentation and improve overall system performance. The proposed strategy, which is referred to as Turning Corner-Boundary Free List (TCBFL) strategy, is compared, using extensive simulation experiments, to several existing allocation strategies for 3D meshes. These are the First-Fit (FF), Turning First-Fit Free List (TFFFL), and Turning Busy List (TBL) allocation strategies. The simulation results show that TCBFL produces average turnaround times and mean system utilization values that are superior to those of previous strategies.
49

NICOL, DAVID M., and WEIZHEN MAO. "ON BOTTLENECK PARTITIONING OF k-ARY n-CUBES." Parallel Processing Letters 06, no. 03 (September 1996): 389–99. http://dx.doi.org/10.1142/s0129626496000376.

Abstract:
Graph partitioning is a topic of extensive interest, with applications to parallel processing. In this context graph nodes typically represent computation, and edges represent communication. One seeks to distribute the workload by partitioning the graph so that every processor has approximately the same workload, and the communication cost (measured as a function of edges exposed by the partition) is minimized. Measures of partition quality vary; in this paper we consider a processor’s cost to be the sum of its computation and communication costs, and consider the cost of a partition to be the bottleneck, or maximal processor cost induced by the partition. For a general graph the problem of finding an optimal partitioning is intractable. In this paper we restrict our attention to the class of k-ary n-cube graphs with uniformly weighted nodes. Given mild restrictions on the node weight and number of processors, we identify partitions yielding the smallest bottleneck. We also demonstrate by example that some restrictions are necessary for the partitions we identify to be optimal. In particular, there exist cases where partitions that evenly partition nodes need not be optimal.
50

Mikheev, Andrei, and Liubov Liubushkina. "Russian morphology: An engineering approach." Natural Language Engineering 1, no. 3 (September 1995): 235–60. http://dx.doi.org/10.1017/s135132490000019x.

Abstract:
Morphological analysis, which is at the heart of natural language processing, requires computationally effective morphological processors. In this paper an approach to the organization of an inflectional morphological model and its application for the Russian language are described. The main objective of our morphological processor is not the classification of word constituents, but rather an efficient computational recognition of morpho-syntactic features of words and the generation of words according to requested morpho-syntactic features. Another major concern that the processor aims to address is the ease of extending the lexicon. The templated word-paradigm model used in the system has an engineering flavour: paradigm formation rules are of a bottom-up (word specific) nature rather than general observations about the language, and word formation units are segments of words rather than proper morphemes. This approach allows us to handle uniformly both general cases and exceptions, and requires extremely simple data structures and control mechanisms which can be easily implemented as finite-state automata. The morphological processor described in this paper is fully implemented for a substantial subset of Russian (more than 1,500,000 word-tokens – 95,000 word paradigms) and provides an extensive list of morpho-syntactic features together with stress positions for words utilized in its lexicon. Special dictionary management tools were built for browsing, debugging and extension of the lexicon. The actual implementation was done in C and C++, and the system is available for the MS-DOS, MS-Windows and UNIX platforms.