Journal articles on the topic 'Pipeline datapath'




Consult the top 33 journal articles for your research on the topic 'Pipeline datapath.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Ravikumar, C. P., and V. Saxena. "TOGAPS: A Testability Oriented Genetic Algorithm For Pipeline Synthesis." VLSI Design 5, no. 1 (January 1, 1996): 77–87. http://dx.doi.org/10.1155/1996/65320.

Abstract:
In this paper, we describe TOGAPS, a Testability-Oriented Genetic Algorithm for Pipeline Synthesis. The input to TOGAPS is an unscheduled data flow graph along with a specification of the desired pipeline latency. TOGAPS generates a register-level description of a datapath which is near-optimal in terms of area, meets the latency requirement, and is highly testable. Genetic search is employed to explore a 3-D search space, the three dimensions being the chip area, average latency, and the testability of the datapath. Testability of a design is evaluated by counting the number of self-loops in the structure graph of the data path. Each design is characterized by a four-tuple consisting of (i) the latency and schedule information, (ii) the module allocation, (iii) operation-to-module binding, and (iv) value-to-register binding. Accordingly, we maintain the population of designs in a hierarchical manner. The topmost level of this hierarchy consists of the latency and schedule information, which together characterize the timing performance of the design. The middle level of the hierarchy consists of a number of allocations for a given latency/schedule duplet. The lowest level of the hierarchy consists of a number of bindings for a specific latency/schedule/allocation. An initial population of designs is constructed from the given data flow graph using different latency cycles whose average latency is in the specified range. Multiple scheduling heuristics are used to generate schedules for the DFG. For each of the resulting scheduled data flow graphs, we decide on an allocation of modules and registers based on a lower bound estimated using the schedule and latency information. The operation-to-module binding and the value-to-register binding are then carried out. A fitness measure is evaluated for each of the resulting data paths; this fitness measure includes one component for each of the three search dimensions. Crossover and mutation operators are used to generate new designs from the current set of parent designs. The crossover operator attempts to combine the properties of two designs. The mutation operators include addition and deletion of pure delays before scheduling, as well as changes in the register and module allocation prior to binding. The genetic algorithm applies the rule of survival of the fittest to obtain a near-optimal solution to the otherwise intractable problem of data path synthesis. We have implemented TOGAPS on a Sun/SPARC 10 and studied its performance on a number of benchmark examples. Results indicate that TOGAPS finds area-optimal datapaths for the specified latency cycle, while reducing the number of self-loops in the data path.
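As a rough illustration of the kind of search the abstract describes, the Python sketch below evolves candidate datapath designs under a fitness that combines area, average latency, and self-loop count. The encoding, operators, and weights are simplified assumptions for illustration, not TOGAPS internals.

```python
# Minimal sketch of a genetic search over candidate datapaths (illustrative
# assumptions only -- not the TOGAPS encoding or operators).
import random

def fitness(design, w_area=1.0, w_latency=1.0, w_selfloops=1.0):
    """Lower is better: chip area, average latency, and the number of
    self-loops in the structure graph (the testability proxy) combined."""
    return (w_area * design["area"]
            + w_latency * design["avg_latency"]
            + w_selfloops * design["self_loops"])

def crossover(a, b):
    """Combine two parents: keep one parent's schedule, take the other's
    allocation and binding."""
    child = dict(a)
    child["allocation"] = b["allocation"]
    child["binding"] = b["binding"]
    return child

def mutate(design):
    """Perturb the module allocation (the real operators also insert or
    delete pure delays before scheduling and change register allocation)."""
    child = dict(design)
    child["allocation"] = max(1, child["allocation"] + random.choice([-1, 1]))
    return child

def evolve(population, evaluate, generations=50, survivors=10):
    """Survival of the fittest; `evaluate` recomputes area / latency /
    self-loop metrics for a modified design."""
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[:survivors]
        children = [evaluate(mutate(crossover(random.choice(parents),
                                              random.choice(parents))))
                    for _ in range(len(population) - survivors)]
        population = parents + children
    return min(population, key=fitness)
```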
2

Kingyens, Jeffrey, and J. Gregory Steffan. "The Potential for a GPU-Like Overlay Architecture for FPGAs." International Journal of Reconfigurable Computing 2011 (2011): 1–15. http://dx.doi.org/10.1155/2011/514581.

Abstract:
We propose a soft processor programming model and architecture inspired by graphics processing units (GPUs) that are well-matched to the strengths of FPGAs, namely, highly parallel and pipelinable computation. In particular, our soft processor architecture exploits multithreading, vector operations, and predication to supply a floating-point pipeline of 64 stages via hardware support for up to 256 concurrent thread contexts. The key new contributions of our architecture are mechanisms for managing threads and register files that maximize data-level and instruction-level parallelism while overcoming the challenges of port limitations of FPGA block memories as well as memory and pipeline latency. Through simulation of a system that (i) is programmable via NVIDIA's high-level Cg language, (ii) supports AMD's CTM r5xx GPU ISA, and (iii) is realizable on an XtremeData XD1000 FPGA-based accelerator system, we demonstrate the potential for such a system to achieve 100% utilization of a deeply pipelined floating-point datapath.
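A back-of-the-envelope way to see why so many thread contexts are needed: with round-robin issue and at most one instruction in flight per thread, a pipeline of depth D stays full only once the thread count reaches D. The sketch below is a simplification under those assumptions (it ignores memory latency and port conflicts) and is not the paper's simulation model.

```python
# Simplified utilization model: each thread keeps at most one instruction in
# flight, so a 64-stage pipeline needs ~64 ready threads to stay full.
def pipeline_utilization(num_threads: int, pipeline_depth: int = 64) -> float:
    return min(1.0, num_threads / pipeline_depth)

if __name__ == "__main__":
    for threads in (16, 64, 256):
        print(f"{threads:3d} threads -> {pipeline_utilization(threads):.0%} utilization")
```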
3

Lee, Y. H., M. Khalil-Hani, and M. N. Marsono. "An FPGA-Based Quantum Computing Emulation Framework Based on Serial-Parallel Architecture." International Journal of Reconfigurable Computing 2016 (2016): 1–18. http://dx.doi.org/10.1155/2016/5718124.

Abstract:
Hardware emulation of quantum systems can mimic more efficiently the parallel behaviour of quantum computations, thus allowing higher processing speed-up than software simulations. In this paper, an efficient hardware emulation method that employs a serial-parallel hardware architecture targeted for field programmable gate array (FPGA) is proposed. Quantum Fourier transform and Grover’s search are chosen as case studies in this work since they are the core of many useful quantum algorithms. Experimental work shows that, with the proposed emulation architecture, a linear reduction in resource utilization is attained against the pipeline implementations proposed in prior works. The proposed work contributes to the formulation of a proof-of-concept baseline FPGA emulation framework with optimization on datapath designs that can be extended to emulate practical large-scale quantum circuits.
4

Kashima, Ryota, Ikki Nagaoka, Masamitsu Tanaka, Taro Yamashita, and Akira Fujimaki. "64-GHz Datapath Demonstration for Bit-Parallel SFQ Microprocessors Based on a Gate-Level-Pipeline Structure." IEEE Transactions on Applied Superconductivity 31, no. 5 (August 2021): 1–6. http://dx.doi.org/10.1109/tasc.2021.3061353.

5

Alachiotis, Nikolaos, and Alexandros Stamatakis. "A Vector-Like Reconfigurable Floating-Point Unit for the Logarithm." International Journal of Reconfigurable Computing 2011 (2011): 1–12. http://dx.doi.org/10.1155/2011/341510.

Abstract:
The use of reconfigurable computing for accelerating floating-point intensive codes is becoming common due to the availability of DSPs in new-generation FPGAs. We present the design of an efficient, pipelined floating-point datapath for calculating the logarithm function on reconfigurable devices. We integrate the datapath into a stand-alone LUT-based (Lookup Table) component, the LAU (Logarithm Approximation Unit). We extended the LAU, by integrating two architecturally independent, LAU-based datapaths into a larger component, the VLAU (vector-like LAU). The VLAU produces 2 results/cycle, while occupying the same amount of memory as the LAU. Under single precision, one LAU is 12 and 1.7 times faster than the GNU and Intel Math Kernel Library (MKL) implementations, respectively. The LAU is also 1.6 times faster than the FloPoCo reconfigurable logarithm architecture. Under double precision, one LAU is 20 and 2.6 times faster than the respective GNU and MKL functions and 1.4 times faster than the FloPoCo logarithm. The VLAU is approximately twice as fast as the LAU, both under single and double precision.
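The abstract does not give the LAU's actual table organization, so the following is only a generic Python sketch of a LUT-based logarithm approximation of the sort such pipelined units implement: split the float into exponent and mantissa, look up log2 of the mantissa in a small table, and interpolate. The table size and interpolation scheme are assumptions.

```python
# Generic sketch of a LUT-based logarithm approximation (an assumption about
# the general technique, not the LAU's table layout or precision). x must be > 0.
import math

LUT_BITS = 8
LUT = [math.log2(1.0 + i / 2**LUT_BITS) for i in range(2**LUT_BITS + 1)]

def log2_lut(x: float) -> float:
    """log2(x) = e + log2(m) with m in [1, 2); log2(m) comes from a small
    table indexed by the top mantissa bits, with linear interpolation."""
    m, e = math.frexp(x)            # x = m * 2**e, m in [0.5, 1)
    m, e = 2.0 * m, e - 1           # renormalize so m is in [1, 2)
    t = (m - 1.0) * 2**LUT_BITS     # position within the table
    i = int(t)
    frac = t - i
    return e + LUT[i] + frac * (LUT[i + 1] - LUT[i])

if __name__ == "__main__":
    for x in (0.75, 3.0, 1234.5):
        print(x, log2_lut(x), math.log2(x))
```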
6

Titus, Anita. "Datapath Optimization in AES using Pipelined Architecture." International Journal for Research in Applied Science and Engineering Technology 8, no. 8 (August 31, 2020): 940–44. http://dx.doi.org/10.22214/ijraset.2020.31056.

7

Cekli, Serap, and Ali Akman. "Enhanced SPIHT Algorithm with Pipelined Datapath Architecture Design." Electrica 19, no. 1 (March 5, 2019): 29–36. http://dx.doi.org/10.26650/electrica.2018.15101.

8

Nabi, Syed Waqar, and Wim Vanderbauwhede. "Automatic Pipelining and Vectorization of Scientific Code for FPGAs." International Journal of Reconfigurable Computing 2019 (November 18, 2019): 1–12. http://dx.doi.org/10.1155/2019/7348013.

Abstract:
There is a large body of legacy scientific code in use today that could benefit from execution on accelerator devices like GPUs and FPGAs. Manual translation of such legacy code into device-specific parallel code requires significant manual effort and is a major obstacle to wider FPGA adoption. We are developing an automated optimizing compiler, TyTra, to overcome this obstacle. The TyTra flow aims to compile legacy Fortran code automatically for FPGA-based acceleration, while applying suitable optimizations. We present the flow with a focus on two key optimizations, automatic pipelining and vectorization. Our compiler frontend extracts patterns from legacy Fortran code that can be pipelined and vectorized. The backend first creates fine- and coarse-grained pipelines and then automatically vectorizes both the memory access and the datapath based on a cost model, generating an OpenCL-HDL hybrid working solution for FPGA targets on the Amazon cloud. Our results show up to 4.2× performance improvement over baseline OpenCL code.
9

Cappuccino, G., G. Cocorullo, P. Corsonello, and S. Perri. "High speed self-timed pipelined datapath for square rooting." IEE Proceedings - Circuits, Devices and Systems 146, no. 1 (1999): 16. http://dx.doi.org/10.1049/ip-cds:19990271.

10

Arató, Péter, István Béres, Andrzej Rucinski, Robert Davis, and Roy Torbert. "A high-level datapath synthesis method for pipelined structures." Microelectronics Journal 25, no. 3 (May 1994): 237–47. http://dx.doi.org/10.1016/0026-2692(94)90015-9.

11

Xing, Xianwu, and Ching Chuen Jong. "Multivoltage Multifrequency Low-Energy Synthesis for Functionally Pipelined Datapath." IEEE Transactions on Very Large Scale Integration (VLSI) Systems 17, no. 9 (September 2009): 1348–52. http://dx.doi.org/10.1109/tvlsi.2008.2002684.

12

Sergiyenko, A. M., V. A. Romankevich, and A. A. Serhienko. "Genetic Programming of Application-Specific Pipelined Datapaths." Èlektronnoe modelirovanie 42, no. 2 (April 9, 2020): 25–40. http://dx.doi.org/10.15407/emodel.42.02.025.

13

Arató, Péter, Zoltán Ádám Mann, and András Orbán. "Time-constrained scheduling of large pipelined datapaths." Journal of Systems Architecture 51, no. 12 (December 2005): 665–87. http://dx.doi.org/10.1016/j.sysarc.2005.02.001.

14

Jun, Hong-Shin, and Sun-Young Hwang. "Design of a pipelined datapath synthesis system for digital signal processing." IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2, no. 3 (September 1994): 292–303. http://dx.doi.org/10.1109/92.311638.

15

Sergiyenko, A. M., and I. V. Mozghovyi. "Hardware Decompressor Design." Èlektronnoe modelirovanie 45, no. 5 (October 10, 2023): 113–28. http://dx.doi.org/10.15407/emodel.45.05.113.

Abstract:
The common lossless compression algorithms were analyzed, and the LZW algorithm was selected for hardware implementation. To express parallelism, this algorithm is represented as a cyclo-dynamic dataflow (CDDF). A hardware synthesis method for designing pipelined datapaths is proposed, which optimizes the CDDF considering the features of the FPGA primitives and maps it to hardware using a VHDL description. Using this method, an LZW decompressor is developed, which exhibits a high performance-to-hardware cost ratio. The decompressor can be utilized in communication channels and other application-specific systems for data loading from memory, generating graphical stencils, and more.
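For readers unfamiliar with the algorithm the datapath implements, here is a minimal software reference for LZW decoding in Python. It follows the standard byte-oriented textbook variant; the paper's hardware code widths and dictionary policy may differ.

```python
# Minimal software reference for LZW decoding (a standard textbook variant;
# not the paper's hardware design).
def lzw_decompress(codes):
    """Decode a list of integer codes produced by a byte-oriented LZW encoder."""
    table = {i: bytes([i]) for i in range(256)}   # initial single-byte entries
    next_code = 256
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:                   # the KwKwK special case
            entry = prev + prev[:1]
        else:
            raise ValueError("bad LZW code")
        out.append(entry)
        table[next_code] = prev + entry[:1]       # grow the dictionary
        next_code += 1
        prev = entry
    return b"".join(out)
```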
16

Jin, Zheming, and Jason D. Bakos. "A Heuristic Scheduler for Port-Constrained Floating-Point Pipelines." International Journal of Reconfigurable Computing 2013 (2013): 1–9. http://dx.doi.org/10.1155/2013/849545.

Abstract:
We describe a heuristic scheduling approach for optimizing floating-point pipelines subject to input port constraints. The objective of our technique is to maximize functional unit reuse while minimizing the following performance metrics in the generated circuit: (1) maximum multiplexer fanin, (2) datapath fanout, (3) number of multiplexers, and (4) number of registers. For a set of systems biology markup language (SBML) benchmark expressions, we compare the resource usages given by our method to those given by a branch-and-bound enumeration of all valid schedules. Compared with the enumeration results, our heuristic requires on average 33.4% fewer multiplexer bits and 32.9% fewer register bits than the worst case, while requiring only 14% more multiplexer bits and 4.5% more register bits than the optimal case. We also compare our results against those given by the state-of-the-art high-level synthesis tool Xilinx AutoESL. For the most complex of our benchmark expressions, our synthesis technique requires 20% fewer FPGA slices than AutoESL.
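As a hedged sketch of the general idea only, the Python below performs greedy list scheduling with a per-cycle issue limit standing in for the input-port constraint; the authors' actual heuristic and its cost metrics (mux fanin, fanout, register count) are not modeled.

```python
# Hedged sketch of list scheduling under a per-cycle issue (port) limit.
# Generic greedy heuristic, assumes an acyclic dependence graph and unit latency.
def list_schedule(ops, deps, ports_per_cycle=2):
    """ops: iterable of op names; deps: dict op -> set of predecessor ops.
    Returns dict op -> start cycle, issuing at most `ports_per_cycle` ops per
    cycle and only after all predecessors have already been issued."""
    schedule, remaining, cycle = {}, set(ops), 0
    while remaining:
        ready = [o for o in remaining if deps.get(o, set()) <= set(schedule)]
        for op in sorted(ready)[:ports_per_cycle]:
            schedule[op] = cycle
            remaining.discard(op)
        cycle += 1
    return schedule

# Example: the expression (a*b + c*d) * e with one issue port per cycle.
deps = {"m1": set(), "m2": set(), "add": {"m1", "m2"}, "m3": {"add"}}
print(list_schedule(["m1", "m2", "add", "m3"], deps, ports_per_cycle=1))
```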
17

Koch, Andreas. "Efficient Integration of Pipelined IP Blocks into Automatically Compiled Datapaths." EURASIP Journal on Embedded Systems 2007 (2007): 1–9. http://dx.doi.org/10.1155/2007/65173.

18

Koch, Andreas. "Efficient Integration of Pipelined IP Blocks into Automatically Compiled Datapaths." EURASIP Journal on Embedded Systems 2007, no. 1 (2007): 065173. http://dx.doi.org/10.1186/1687-3963-2007-065173.

19

Han, Liang, Jie Chen, and Xiaodong Chen. "Power optimization for the datapath of a 32-bit reconfigurable pipelined DSP processor." Journal of Electronics (China) 22, no. 6 (November 2005): 650–57. http://dx.doi.org/10.1007/bf02687846.

20

Yoo, Hee-Jin, Ju-Young Oh, Jun-Yong Lee, and Do-Soon Park. "A Scheduling Approach using Gradual Mobility Reduction for Synthesizing Pipelined Datapaths." KIPS Transactions:PartA 9A, no. 3 (September 1, 2002): 379–86. http://dx.doi.org/10.3745/kipsta.2002.9a.3.379.

21

Jin, Seunghun, Dongkyun Kim, Thuy Tuong Nguyen, Daijin Kim, Munsang Kim, and Jae Wook Jeon. "Design and Implementation of a Pipelined Datapath for High-Speed Face Detection Using FPGA." IEEE Transactions on Industrial Informatics 8, no. 1 (February 2012): 158–67. http://dx.doi.org/10.1109/tii.2011.2173943.

22

Guo, Wei, KwangHyok Ri, Luping Cui, and Jizeng Wei. "An Area-Efficient Unified Architecture for Multi-Functional Double-Precision Floating-Point Computation." Journal of Circuits, Systems and Computers 24, no. 10 (October 25, 2015): 1550151. http://dx.doi.org/10.1142/s0218126615501510.

Abstract:
In this paper, we propose a unified architecture for computation of double-precision floating-point division, reciprocal, square root, inverse square root and multiplication with a significant area reduction. First, a double-precision multiplication-based divider, the common datapath shared with these arithmetic computations, is optimized by a modified Goldschmidt algorithm to achieve better area efficiency. In this algorithm, a linear-degree minimax approximation instead of a second-degree one is used to obtain a 15-bit precision estimate of the reciprocal, so that we can get a rather small lookup table (LUT) as well as a reduced amount of computation when accumulating the partial products. Two Goldschmidt iterations specially designed for hardware reuse are performed to gain the final accurate result of division. By virtue of the pipelined processing, the time cost for the two iterations is minimized. Second, a reconfigurable datapath with a little extra area cost is introduced to dynamically support multiple double-precision computations by executing the optimized divider iteratively. The design is finally implemented and synthesized in a SMIC 0.13-μm CMOS process. The experimental results show that the proposed design can achieve a speed of 400 MHz with an area of 61.6 K logic gates and a 9-Kb LUT. Compared with other works, the area efficiency (performance/area ratio) of the proposed unified architecture is increased by about 20% on average, which is a better performance-area trade-off for embedded microprocessors.
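The core of the shared datapath is the Goldschmidt iteration. The Python sketch below shows the arithmetic with a crude linear reciprocal seed in place of the paper's 15-bit minimax table (an assumption), and with the divisor pre-normalized to [0.5, 1).

```python
# Goldschmidt division q = a/b: scale numerator and denominator by successive
# correction factors until the denominator converges to 1. The linear seed is
# a stand-in for the paper's 15-bit minimax table; each iteration roughly
# squares the relative error, so a better seed needs fewer iterations.
def goldschmidt_div(a: float, b: float, iterations: int = 2) -> float:
    assert 0.5 <= b < 1.0, "divisor assumed pre-normalized to [0.5, 1)"
    f = 48.0 / 17.0 - (32.0 / 17.0) * b   # crude linear approximation of 1/b
    n, d = a * f, b * f                   # scale numerator and denominator
    for _ in range(iterations):
        f = 2.0 - d                       # correction factor
        n, d = n * f, d * f               # d -> 1 and n -> a/b quadratically
    return n

print(goldschmidt_div(1.0, 0.75), 1.0 / 0.75)   # approximation vs. exact
```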
23

Nummer, Muhammad, and Manoj Sachdev. "Experimental Results for Slow-speed Timing Characterization of High-speed Pipelined Datapaths." Journal of Electronic Testing 27, no. 1 (November 3, 2010): 9–17. http://dx.doi.org/10.1007/s10836-010-5186-3.

24

Chowdhury, Shubhajit Roy, Dipankar Chakrabarti, and Hiranmay Saha. "FPGA realization of a smart processing system for clinical diagnostic applications using pipelined datapath architectures." Microprocessors and Microsystems 32, no. 2 (March 2008): 107–20. http://dx.doi.org/10.1016/j.micpro.2007.12.001.

25

Salehi, Sayed Ahmad, Rasoul Amirfattahi, and Keshab K. Parhi. "Pipelined Architectures for Real-Valued FFT and Hermitian-Symmetric IFFT With Real Datapaths." IEEE Transactions on Circuits and Systems II: Express Briefs 60, no. 8 (August 2013): 507–11. http://dx.doi.org/10.1109/tcsii.2013.2268411.

26

Yin, Xiao-Bo, Feng Yu, and Zhen-Guo Ma. "Resource-Efficient Pipelined Architectures for Radix-2 Real-Valued FFT With Real Datapaths." IEEE Transactions on Circuits and Systems II: Express Briefs 63, no. 8 (August 2016): 803–7. http://dx.doi.org/10.1109/tcsii.2016.2530862.

27

Wilson, T. C., N. Mukherjee, M. K. Garg, and D. K. Banerji. "An ILP Solution for Optimum Scheduling, Module and Register Allocation, and Operation Binding in Datapath Synthesis." VLSI Design 3, no. 1 (January 1, 1995): 21–36. http://dx.doi.org/10.1155/1995/23249.

Abstract:
We present an integrated and optimal solution to the problems of operator scheduling, module and register allocation, and operator binding in datapath synthesis. The solution is based on an integer linear programming (ILP) model that minimizes a weighted sum of module area and total execution time under very general assumptions of module capabilities. In particular, a module may execute an arbitrary combination of operations, possibly using different numbers of control steps for different operations. Furthermore, operations may be implemented by a variety of modules, possibly requiring different numbers of control steps depending on the modules chosen. This generality in the complexity and mixture of modules is unique to our system and leads to an optimum selection of modules to meet specified design constraints. Significant extensions include the ability to incorporate pipelined functional units and operator chaining in an integrated manner. Straightforward extension to multi-block synthesis is discussed briefly but the details are omitted due to space considerations.
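To make the flavor of such a model concrete, a generic scheduling-and-binding ILP (not the authors' exact formulation, which also covers operator chaining and pipelined units) can be written with binary variables x_{o,m,t} meaning operation o starts on module m at control step t:

```latex
% Generic scheduling/binding ILP sketch (not the paper's model).
% x_{o,m,t} = 1 iff operation o starts on module m at step t; y_m = 1 iff
% module m is instantiated; a_m = area of m; d_{o,m} = delay of o on m.
\begin{align*}
\min \quad & \alpha \sum_{m} a_m\, y_m \;+\; \beta\, T \\
\text{s.t.} \quad
& \sum_{m}\sum_{t} x_{o,m,t} = 1 && \forall o
  && \text{(schedule and bind each operation once)} \\
& \sum_{o}\;\sum_{\tau = t - d_{o,m} + 1}^{t} x_{o,m,\tau} \le y_m && \forall m,\, t
  && \text{(one operation per module at a time)} \\
& \sum_{m}\sum_{t} (t + d_{o,m})\, x_{o,m,t} \le \sum_{m}\sum_{t} t\, x_{o',m,t}
  && \forall (o \to o')
  && \text{(data dependences)} \\
& \sum_{m}\sum_{t} (t + d_{o,m})\, x_{o,m,t} \le T && \forall o
  && \text{(bound on total execution time)} \\
& x_{o,m,t} \in \{0,1\}, \quad y_m \in \{0,1\}
\end{align*}
```

The weights α and β reproduce the weighted area-time objective mentioned in the abstract.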
28

Livramento, Vinícius Dos S., Bruno G. Moraes, Brunno A. Machado, Eduardo Boabaid, and José Luiz Güntzel. "Evaluating the Impact of Architectural Decisions on the Energy Efficiency of FDCT/IDCT Configurable IP Cores." Journal of Integrated Circuits and Systems 7, no. 1 (December 27, 2012): 23–36. http://dx.doi.org/10.29292/jics.v7i1.353.

Abstract:
The development of mobile multimedia devices follows the platform-based design methodology in which IP cores are the building blocks. In the context of mobile devices there is a concern about battery lifetime, which leads to the need for energy-efficient IP cores. This paper presents four energy-efficient FDCT/IDCT configurable IP cores. These architectures are based on Massimino’s algorithm, which was chosen due to its high accuracy and parallelism. The four architectures were built by combining fully-combinational or pipelined datapaths, using either one or two 1-D DCT blocks with a transpose buffer that assures the optimal minimum latency of eight cycles. Synthesis results for 90nm showed that our most efficient architecture, which uses two pipelined 1-D blocks, achieved a maximum frequency of 250 MHz at a total power of 14.03 mW. Such a frequency was enough to process 16x 1080p@30fps videos in real time (nearly 2 GigaPixels/s). Comparisons with related work, in terms of energy efficiency (μJ/MPixels), revealed that our most energy-efficient architecture is at least 2 times as efficient as other DCT architectures. Moreover, the four designed architectures were also synthesized by using common low-power techniques. These results showed that pipelined versions at high throughput tend to benefit more from using Low-Vdd and High-Vt combined than the combinational ones, thus becoming the most energy efficient.
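The architectures above exploit the separability of the 2-D DCT: one 1-D DCT pass over the rows, a transpose, and a second 1-D pass, which is what the 1-D DCT blocks plus transpose buffer implement. A plain Python sketch of that decomposition follows (a direct O(N²) 1-D DCT-II for clarity; Massimino's fast algorithm is not reproduced here).

```python
# Row-column (separable) 2-D DCT sketch, the decomposition the hardware
# realizes with 1-D DCT blocks and a transpose buffer.
import math

def dct_1d(x):
    """Orthonormal 1-D DCT-II, direct O(N^2) form."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            * (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
            for k in range(n)]

def transpose(block):
    return [list(row) for row in zip(*block)]

def dct_2d(block):
    rows = [dct_1d(row) for row in block]              # first 1-D pass
    cols = [dct_1d(row) for row in transpose(rows)]    # transpose buffer + second pass
    return transpose(cols)
```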
29

Josipović, Lana, Shabnam Sheikhha, Andrea Guerrieri, Paolo Ienne, and Jordi Cortadella. "Buffer Placement and Sizing for High-Performance Dataflow Circuits." ACM Transactions on Reconfigurable Technology and Systems 15, no. 1 (March 31, 2022): 1–32. http://dx.doi.org/10.1145/3477053.

Abstract:
Commercial high-level synthesis tools typically produce statically scheduled circuits. Yet, effective C-to-circuit conversion of arbitrary software applications calls for dataflow circuits, as they can handle efficiently variable latencies (e.g., caches), unpredictable memory dependencies, and irregular control flow. Dataflow circuits exhibit an unconventional property: registers (usually referred to as “buffers”) can be placed anywhere in the circuit without changing its semantics, in strong contrast to what happens in traditional datapaths. Yet, although functionally irrelevant, this placement has a significant impact on the circuit’s timing and throughput. In this work, we show how to strategically place buffers into a dataflow circuit to optimize its performance. Our approach extracts a set of choice-free critical loops from arbitrary dataflow circuits and relies on the theory of marked graphs to optimize the buffer placement and sizing. Our performance optimization model supports important high-level synthesis features such as pipelined computational units, units with variable latency and throughput, and if-conversion. We demonstrate the performance benefits of our approach on a set of dataflow circuits obtained from imperative code.
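A small illustration of the marked-graph property the optimization relies on: the throughput of each choice-free cycle is bounded by its token (buffer) count divided by its total latency, and the whole circuit is limited by its slowest cycle. The sketch below computes that bound; it is illustrative only and not the paper's optimization model.

```python
# Marked-graph throughput bound: a choice-free cycle with B buffer slots
# (tokens) and total latency L caps throughput at B / L; the circuit runs at
# the minimum over its critical cycles. (Illustrative sketch only.)
def cycle_throughput(buffer_slots: int, total_latency: int) -> float:
    return buffer_slots / total_latency

def circuit_throughput(cycles):
    """cycles: iterable of (buffer_slots, total_latency) per critical loop."""
    return min(cycle_throughput(b, l) for b, l in cycles)

# A loop with 1 token and 3 cycles of latency caps throughput at 1/3;
# adding a buffer slot (a register) would raise that loop's bound to 2/3.
print(circuit_throughput([(1, 3), (2, 4)]))   # -> 0.333...
```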
30

Soliman, Mostafa I., and Elsayed A. Elsayed. "Simultaneous Multithreaded Matrix Processor." Journal of Circuits, Systems and Computers 24, no. 08 (August 12, 2015): 1550114. http://dx.doi.org/10.1142/s0218126615501145.

Abstract:
This paper proposes a simultaneous multithreaded matrix processor (SMMP) to improve the performance of data-parallel applications by exploiting instruction-level parallelism (ILP), data-level parallelism (DLP), and thread-level parallelism (TLP). In SMMP, the well-known five-stage pipeline (baseline scalar processor) is extended to execute multi-scalar/vector/matrix instructions on unified parallel execution datapaths. SMMP can issue four scalar instructions from two threads each cycle or four vector/matrix operations from one thread, where the execution of vector/matrix instructions in threads is done in round-robin fashion. Moreover, this paper presents the implementation of our proposed SMMP using VHDL targeting the FPGA Virtex-6. In addition, the performance of SMMP is evaluated on some kernels from the basic linear algebra subprograms (BLAS). Our results show that the hardware complexity of SMMP is 5.68 times higher than the baseline scalar processor. However, speedups of 4.9, 6.09, 6.98, 8.2, 8.25, 8.72, 9.36, 11.84 and 21.57 are achieved on the BLAS kernels of applying a Givens rotation, scalar times vector plus another, vector addition, vector scaling, setting up a Givens rotation, dot-product, matrix–vector multiplication, Euclidean length, and matrix–matrix multiplication, respectively. The average speedup over the baseline is 9.55 and the average speedup over complexity is 1.68. Compared with the Xilinx MicroBlaze, the complexity of SMMP is 6.36 times higher; however, its speedup ranges from 6.87 to 12.07 on vector/matrix kernels, with an average of 9.46.
31

John, Elwyn G., Z. Ghassemlooy, Malcolm Woolfson, Steve Harrold, M. Fleury, Mike Barnes, and Math Bollen. "Book Reviews: A Guide to Microsoft Excel for Scientists and Engineers, Fiber Bragg Gratings, Signal Detection Theory, Analog BiCMOS Design: Practices and Pitfalls, High Level Synthesis of Pipelined Datapaths, Electronic Control of Switched Reluctance Machines, Power Quality Primer." International Journal of Electrical Engineering & Education 39, no. 2 (April 2002): 175–80. http://dx.doi.org/10.7227/ijeee.39.2.9.

32

Kashima, Ryota, Ikki Nagaoka, Tomoki Nakano, Masamitsu Tanaka, Taro Yamashita, and Akira Fujimaki. "Lowering Latency in a High-Speed Gate-Level-Pipelined Single Flux Quantum Datapath Using an Interleaved Register File." IEEE Transactions on Applied Superconductivity, 2023, 1–6. http://dx.doi.org/10.1109/tasc.2023.3249131.

33

"Energy-Efficient and High-throughput Implementations of Lightweight Block Cipher." International Journal of Innovative Technology and Exploring Engineering 9, no. 2S (December 31, 2019): 35–41. http://dx.doi.org/10.35940/ijitee.b1022.1292s19.

Abstract:
Security in resource-constrained devices has drawn great attention from researchers in recent years. To enable secure transmission of critical information in such devices, lightweight cryptography algorithms have come to the fore to a large extent. KLEIN is a popular lightweight block cipher used to address such issues. In this paper, different architectures of the KLEIN block cipher are presented. One of the designs enhances throughput efficiency at the expense of a larger area. To build such designs, pipeline registers are placed at different positions in the datapath. The proposed design transforms the input data into protected output at a speed of 2414.13 Mbps on the xc5vlx50t-3ff1136 device. In addition, the second design completes one or more rounds in a single clock cycle and yields energy-efficient, high-throughput implementations. This allows a trade-off between area and speed to be analyzed for high-speed applications. Moreover, the proposed design shows that increasing the area of the cipher implementation results in higher plaintext-to-ciphertext throughput. All results are simulated and verified for various families using the Xilinx ISE design suite.