Journal articles on the topic 'Computation-in-memory'


Create an accurate reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Computation-in-memory.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Stern, Peter. "Parallel computation in memory-making." Science 355, no. 6321 (January 12, 2017): 143.17–145. http://dx.doi.org/10.1126/science.355.6321.143-q.

2

Chen, Zehui, Clayton Schoeny, and Lara Dolecek. "Hamming Distance Computation in Unreliable Resistive Memory." IEEE Transactions on Communications 66, no. 11 (November 2018): 5013–27. http://dx.doi.org/10.1109/tcomm.2018.2840717.

3

Yehl, Kevin, and Timothy Lu. "Scaling computation and memory in living cells." Current Opinion in Biomedical Engineering 4 (December 2017): 143–51. http://dx.doi.org/10.1016/j.cobme.2017.10.003.

4

Sun, Zhong, Giacomo Pedretti, Elia Ambrosi, Alessandro Bricalli, and Daniele Ielmini. "In‐Memory Eigenvector Computation in Time O(1)." Advanced Intelligent Systems 2, no. 8 (May 20, 2020): 2000042. http://dx.doi.org/10.1002/aisy.202000042.

5

Hwang, Myeong-Eun, and Sungoh Kwon. "A 0.94 μW 611 KHz In-Situ Logic Operation in Embedded DRAM Memory Arrays in 90 nm CMOS." Electronics 8, no. 8 (August 5, 2019): 865. http://dx.doi.org/10.3390/electronics8080865.

Abstract:
Conventional computers based on the von Neumann architecture perform computation through repeated data movements between separate processing and memory units, where each movement costs time and energy. Departing from this approach, we experimentally study memory that can perform computation as well as store data within a generic memory array, in a non-von Neumann fashion. A memory array can innately perform the NOR operation, which is functionally complete and can thus realize any Boolean function, including inversion (NOT), disjunction (OR), and conjunction (AND). With a theoretical exploration of a memory array performing Boolean computation alongside data storage, we demonstrate this further potential of memory arrays with a test chip fabricated in a 90 nm logic process. Measurement results confirm valid in-situ memory logic operations in a 32-kbit memory system that successfully operates down to 135 mV, consuming 130 nW at 750 Hz and reducing power and data traffic between the units by five orders of magnitude at the cost of performance.
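The functional completeness of NOR, which the abstract for this entry relies on, is easy to check directly. A plain Python sketch (illustrative only, independent of the paper's circuit-level realization):

```python
# NOT, OR, and AND expressed purely in terms of NOR, mirroring the
# claim that a NOR-capable memory array can realize any Boolean function.

def nor(a: int, b: int) -> int:
    return 1 - (a | b)

def not_(a: int) -> int:           # NOT x  = NOR(x, x)
    return nor(a, a)

def or_(a: int, b: int) -> int:    # x OR y = NOT(NOR(x, y))
    return not_(nor(a, b))

def and_(a: int, b: int) -> int:   # x AND y = NOR(NOT x, NOT y)
    return nor(not_(a), not_(b))

# Exhaustive check over all input combinations.
for a in (0, 1):
    for b in (0, 1):
        assert not_(a) == 1 - a
        assert or_(a, b) == (a | b)
        assert and_(a, b) == (a & b)
```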
6

Andrade, Marcus V. A., Salles V. G. Magalhães, Mirella A. Magalhães, W. Randolph Franklin, and Barbara M. Cutler. "Efficient viewshed computation on terrain in external memory." GeoInformatica 15, no. 2 (November 26, 2009): 381–97. http://dx.doi.org/10.1007/s10707-009-0100-9.

7

Goswami, Mrinal, Jayanta Pal, Mayukh Roy Choudhury, Pritam P. Chougule, and Bibhash Sen. "In memory computation using quantum-dot cellular automata." IET Computers & Digital Techniques 14, no. 6 (November 1, 2020): 336–43. http://dx.doi.org/10.1049/iet-cdt.2020.0008.

8

Jafari, Atousa, Christopher Münch, and Mehdi Tahoori. "A Spintronic 2M/7T Computation-in-Memory Cell." Journal of Low Power Electronics and Applications 12, no. 4 (December 6, 2022): 63. http://dx.doi.org/10.3390/jlpea12040063.

Abstract:
Computing data-intensive applications on the von Neumann architecture leads to significant performance and energy overheads. The concept of computation in memory (CiM) addresses the bottleneck of von Neumann machines by reducing the data movement in the computing system. Emerging resistive non-volatile memory technologies, as well as volatile memories (SRAM and DRAM), can be used to realize architectures based on the CiM paradigm. In this paper, we propose a hybrid cell design to provide the opportunity for CiM by combining the magnetic tunnel junction (MTJ) and the conventional 6T-SRAM cell. The cell performs CiM operations based on stateful in-array computation, which has better scalability for multiple operands compared with stateless computation in the periphery. Various logic operations such as XOR, OR, and IMP can be performed with the proposed design. In addition, the proposed cell can also operate as a conventional memory cell to read and write volatile as well as non-volatile data. The obtained simulation results show that the proposed CiM-A design can increase the performance of regular memory architectures by reducing the delay by 8 times and the energy by 13 times for database query applications consisting of consecutive bitwise operations with minimum overhead.
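Of the operations listed in this abstract, material implication (IMP) is the least familiar; a truth-table sketch in plain Python (behavioral only, not the cell-level mechanism) shows what it computes and hints at its expressiveness:

```python
# IMP(p, q) = (NOT p) OR q; in stateful memristive logic the result
# typically overwrites one operand's cell, but here we model only
# the Boolean behavior.

def imp(p: int, q: int) -> int:
    return (1 - p) | q

# IMP plus the constant 0 (FALSE) is functionally complete, e.g.:
def not_(p: int) -> int:           # NOT p  = IMP(p, 0)
    return imp(p, 0)

def or_(p: int, q: int) -> int:    # p OR q = IMP(NOT p, q)
    return imp(not_(p), q)

# Truth table of IMP: only (p=1, q=0) yields 0.
table = [(p, q, imp(p, q)) for p in (0, 1) for q in (0, 1)]
print(table)   # [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 1)]
```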
9

Khan, Kamil, Sudeep Pasricha, and Ryan Gary Kim. "A Survey of Resource Management for Processing-In-Memory and Near-Memory Processing Architectures." Journal of Low Power Electronics and Applications 10, no. 4 (September 24, 2020): 30. http://dx.doi.org/10.3390/jlpea10040030.

Abstract:
Due to the amount of data involved in emerging deep learning and big data applications, operations related to data movement have quickly become a bottleneck. Data-centric computing (DCC), as enabled by processing-in-memory (PIM) and near-memory processing (NMP) paradigms, aims to accelerate these types of applications by moving the computation closer to the data. Over the past few years, researchers have proposed various memory architectures that enable DCC systems, such as logic layers in 3D-stacked memories or charge-sharing-based bitwise operations in dynamic random-access memory (DRAM). However, application-specific memory access patterns, power and thermal concerns, memory technology limitations, and inconsistent performance gains complicate the offloading of computation in DCC systems. Therefore, designing intelligent resource management techniques for computation offloading is vital for leveraging the potential offered by this new paradigm. In this article, we survey the major trends in managing PIM and NMP-based DCC systems and provide a review of the landscape of resource management techniques employed by system designers for such systems. Additionally, we discuss the future challenges and opportunities in DCC management.
10

Ou, Qiao-Feng, Bang-Shu Xiong, Lei Yu, Jing Wen, Lei Wang, and Yi Tong. "In-Memory Logic Operations and Neuromorphic Computing in Non-Volatile Random Access Memory." Materials 13, no. 16 (August 10, 2020): 3532. http://dx.doi.org/10.3390/ma13163532.

Abstract:
Recent progress in the development of artificial intelligence technologies, aided by deep learning algorithms, has led to an unprecedented revolution in neuromorphic circuits, bringing us ever closer to brain-like computers. However, the vast majority of advanced algorithms still have to run on conventional computers. Thus, their capacities are limited by what is known as the von-Neumann bottleneck, where the central processing unit for data computation and the main memory for data storage are separated. Emerging forms of non-volatile random access memory, such as ferroelectric random access memory, phase-change random access memory, magnetic random access memory, and resistive random access memory, are widely considered to offer the best prospect of circumventing the von-Neumann bottleneck. This is due to their ability to merge storage and computational operations, such as Boolean logic. This paper reviews the most common kinds of non-volatile random access memory and their physical principles, together with their relative pros and cons when compared with conventional CMOS-based circuits (Complementary Metal Oxide Semiconductor). Their potential application to Boolean logic computation is then considered in terms of their working mechanism, circuit design and performance metrics. The paper concludes by envisaging the prospects offered by non-volatile devices for future brain-inspired and neuromorphic computation.
11

Yu, Jintao, Razvan Nane, Imran Ashraf, Mottaqiallah Taouil, Said Hamdioui, Henk Corporaal, and Koen Bertels. "Skeleton-Based Synthesis Flow for Computation-in-Memory Architectures." IEEE Transactions on Emerging Topics in Computing 8, no. 2 (April 1, 2020): 545–58. http://dx.doi.org/10.1109/tetc.2017.2760927.

12

OSKIN, MARK, DIANA KEEN, JUSTIN HENSLEY, LUCIAN-VLAD LITA, and FREDERIC T. CHONG. "OPERATING SYSTEMS TECHNIQUES FOR PARALLEL COMPUTATION IN INTELLIGENT MEMORY." Parallel Processing Letters 12, no. 03n04 (September 2002): 311–26. http://dx.doi.org/10.1142/s0129626402001014.

Abstract:
Advances in DRAM density have led to several proposals to perform computation in memory [1] [2] [3]. Active Pages is a page-based model of intelligent memory that can exploit large amounts of parallel computation in data-intensive applications. With a simple VLIW processor embedded near each page on DRAM, Active Page memory systems achieve up to 1000X speedups over conventional memory systems [4]. Active Pages are specifically designed to support virtualized hardware resources. In this study, we examine operating system techniques that allow Active Page memories to share, or multiplex, embedded VLIW processors across multiple physical Active Pages. We explore the trade-off between individual page-processor performance and page-level multiplexing. We find that hardware costs of computational logic can be reduced from 31% of DRAM chip area to 12%, through multiplexing, without significant loss in performance. Furthermore, manufacturing defects that disable up to 50% of the page processors can be tolerated through efficient resource allocation and associative multiplexing.
13

Acker, Daniel, Suzanne Paradis, and Paul Miller. "Stable memory and computation in randomly rewiring neural networks." Journal of Neurophysiology 122, no. 1 (July 1, 2019): 66–80. http://dx.doi.org/10.1152/jn.00534.2018.

Abstract:
Our brains must maintain a representation of the world over a period of time much longer than the typical lifetime of the biological components producing that representation. For example, recent research suggests that dendritic spines in the adult mouse hippocampus are transient with an average lifetime of ~10 days. If this is true, and if turnover is equally likely for all spines, ~95% of excitatory synapses onto a particular neuron will turn over within 30 days; however, a neuron’s receptive field can be relatively stable over this period. Here, we use computational modeling to ask how memories can persist in neural circuits such as the hippocampus and visual cortex in the face of synapse turnover. We demonstrate that Hebbian plasticity during replay of presynaptic activity patterns can integrate newly formed synapses into pre-existing memories. Furthermore, we find that Hebbian plasticity during replay is sufficient to stabilize the receptive fields of hippocampal place cells in a model of the grid-cell-to-place-cell transformation in CA1 and of orientation-selective cells in a model of the center-surround-to-simple-cell transformation in V1. Together, these data suggest that a simple plasticity rule, correlative Hebbian plasticity of synaptic strengths, is sufficient to preserve neural representations in the face of synapse turnover, even in the absence of activity-dependent structural plasticity. NEW & NOTEWORTHY Recent research suggests that synapses turn over rapidly in some brain structures; however, memories seem to persist for much longer. We show that Hebbian plasticity of synaptic strengths during reactivation events can preserve memory in computational models of hippocampal and cortical networks despite turnover of all synapses. Our results suggest that memory can be stored in the correlation structure of a network undergoing rapid synaptic remodeling.
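The core mechanism in this abstract, a correlative Hebbian rule re-training newly formed synapses during replay, can be sketched in a toy model (all numbers below are illustrative assumptions, not the paper's simulations):

```python
# Toy model: a binary pattern is stored in synaptic weights, half the
# synapses "turn over" (are replaced with weight 0), and Hebbian updates
# during replay (dw = eta * pre * post) restore the memory.
import random

random.seed(0)
n = 100
pattern = [random.choice((0, 1)) for _ in range(n)]  # presynaptic activity
w = [float(p) for p in pattern]                      # weights encoding the memory

def recall(w, pattern):
    # Fraction of the stored pattern still carried by the weights.
    return sum(wi * pi for wi, pi in zip(w, pattern)) / sum(pattern)

# Synapse turnover: a random half of the synapses become naive (weight 0).
for i in random.sample(range(n), n // 2):
    w[i] = 0.0

# Replay reactivates the presynaptic pattern and the postsynaptic cell;
# the correlative Hebbian rule then rebuilds the lost weights.
eta, post = 0.5, 1
for _ in range(3):
    w = [min(1.0, wi + eta * pre_i * post) for wi, pre_i in zip(w, pattern)]

print(recall(w, pattern))   # 1.0: the memory survives full turnover of half the synapses
```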
14

Du Nguyen, Hoang Anh, Lei Xie, Mottaqiallah Taouil, Razvan Nane, Said Hamdioui, and Koen Bertels. "On the Implementation of Computation-in-Memory Parallel Adder." IEEE Transactions on Very Large Scale Integration (VLSI) Systems 25, no. 8 (August 2017): 2206–19. http://dx.doi.org/10.1109/tvlsi.2017.2690571.

15

Gulafshan, Gulafshan, Selma Amara, Rajat Kumar, Danial Khan, Hossein Fariborzi, and Yehia Massoud. "Bitwise Logical Operations in VCMA-MRAM." Electronics 11, no. 18 (September 6, 2022): 2805. http://dx.doi.org/10.3390/electronics11182805.

Abstract:
Today’s technology demands compact, portable, fast, and energy-efficient devices. One approach to making energy-efficient devices is in-memory computation, which addresses the memory-bottleneck issues of present computing systems by utilizing a spintronic device, the magnetic tunnel junction (MTJ). Further, area and energy can be reduced through approximate computation. We present a circuit design based on the logic-in-memory computing paradigm using voltage-controlled magnetic anisotropy magnetoresistive random access memory (VCMA-MRAM). During computation, multiple bit cells within the memory array are selected in parallel by activating multiple word lines. With slight modification via control signals, the designed circuit performs the logic operations Read/NOT, AND/NAND, and OR/NOR, as well as an arithmetic SUM operation (a 1-bit approximate adder with 75% accuracy for the sum and an accurate carry out). All simulations were performed at a 45 nm CMOS technology node with a VCMA-MTJ compact model using the HSPICE simulator. Simulation results show that the proposed circuit’s approximate adder consumes about 300% less energy and is 2.3 times faster than its exact counterpart.
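The "75% accuracy for SUM and accurate carry out" figure has a simple interpretation: one classic approximate full adder keeps the exact majority carry and approximates the sum as NOT(carry), which matches the exact sum in 6 of 8 input cases. The construction below is an assumption for illustration, not necessarily the paper's circuit:

```python
# Exact vs. approximate 1-bit full adder; the carry is always exact,
# the sum is approximated as NOT(carry) and is right in 6/8 cases (75%).
# This particular approximation is assumed here for illustration.

def exact_full_adder(a: int, b: int, cin: int):
    s = a ^ b ^ cin
    cout = (a & b) | (b & cin) | (a & cin)
    return s, cout

def approx_full_adder(a: int, b: int, cin: int):
    cout = (a & b) | (b & cin) | (a & cin)   # exact majority carry
    return 1 - cout, cout                    # approximate sum = NOT(cout)

cases = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
sum_hits = sum(exact_full_adder(*x)[0] == approx_full_adder(*x)[0] for x in cases)
carry_hits = sum(exact_full_adder(*x)[1] == approx_full_adder(*x)[1] for x in cases)
print(f"sum: {sum_hits}/8 correct, carry: {carry_hits}/8 correct")  # sum: 6/8, carry: 8/8
```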
16

Yang, Renyu, Junzhong Shen, Mei Wen, Yasong Cao, and Yuhang Li. "Integration of Single-Port Memory (ISPM) for Multiprecision Computation in Systolic-Array-Based Accelerators." Electronics 11, no. 10 (May 16, 2022): 1587. http://dx.doi.org/10.3390/electronics11101587.

Abstract:
On-chip memory is one of the core components of deep learning accelerators. In general, on-chip memory accounts for around 30% of the total chip area. With the increasing complexity of deep learning algorithms, integrating the much larger on-chip memory that algorithms demand will become a challenge for accelerators, while the on-chip memory must also support multiprecision computation across the different precisions (such as FP32 and FP16) used in training and inference. To address this, this paper explores the use of single-port memory (SPM) in systolic-array-based deep learning accelerators. We propose transformation methods for the multiple precision-computation scenarios, respectively, to avoid conflicts between simultaneous read and write requests on the SPM. We then prove that the two methods are feasible and can be implemented in hardware without affecting the computation efficiency of the accelerator. Experimental results show that the two methods yield about 30% and 25% improvements in area cost when the accelerator integrates SPM, without affecting the accelerator's throughput and with almost negligible hardware cost.
17

Bloem, Ilona M., Yurika L. Watanabe, Melissa M. Kibbe, and Sam Ling. "Visual Memories Bypass Normalization." Psychological Science 29, no. 5 (March 29, 2018): 845–56. http://dx.doi.org/10.1177/0956797617747091.

Abstract:
How distinct are visual memory representations from visual perception? Although evidence suggests that briefly remembered stimuli are represented within early visual cortices, the degree to which these memory traces resemble true visual representations remains something of a mystery. Here, we tested whether both visual memory and perception succumb to a seemingly ubiquitous neural computation: normalization. Observers were asked to remember the contrast of visual stimuli, which were pitted against each other to promote normalization either in perception or in visual memory. Our results revealed robust normalization between visual representations in perception, yet no signature of normalization occurring between working memory stores—neither between representations in memory nor between memory representations and visual inputs. These results provide unique insight into the nature of visual memory representations, illustrating that visual memory representations follow a different set of computational rules, bypassing normalization, a canonical visual computation.
18

Hamdioui, Said, Elena-Ioana Vatajelu, and Alberto Bosio. "Guest Editorial: Computation-In-Memory (CIM): from Device to Applications." ACM Journal on Emerging Technologies in Computing Systems 18, no. 2 (April 30, 2022): 1–3. http://dx.doi.org/10.1145/3503263.

19

George, Alan, and Esmond Ng. "Some shared memory is desirable in parallel sparse matrix computation." ACM SIGNUM Newsletter 23, no. 2 (April 1988): 9–13. http://dx.doi.org/10.1145/47917.47919.

20

Mutlu, Onur, Saugata Ghose, Juan Gómez-Luna, and Rachata Ausavarungnirun. "Processing data where it makes sense: Enabling in-memory computation." Microprocessors and Microsystems 67 (June 2019): 28–41. http://dx.doi.org/10.1016/j.micpro.2019.01.009.

21

Zhao, Yang, Fengyu Qian, Faquir Jain, and Lei Wang. "Quantum-Dot Transistor Based Multi-Bit Multiplier Unit for In-Memory Computing." International Journal of High Speed Electronics and Systems 29, no. 01n04 (March 2020): 2040007. http://dx.doi.org/10.1142/s0129156420400078.

Abstract:
In-memory computing is an emerging technique to fulfill the fast-growing demand for high-performance data processing. This technique provides fast processing and high throughput by accessing data stored in the memory array rather than dealing with complicated operations and data movement on the hard drive. For data processing, the most important computation is the dot product, which is also the core computation for applications such as deep learning neural networks, machine learning, etc. As multiplication is the key function in the dot product, it is critical to improve its performance and achieve faster in-memory processing. In this paper, we present a design with the ability to perform in-memory multi-bit multiplications. The proposed design is implemented using quantum-dot transistors, which enable multi-bit computations in the memory cell. Experimental results demonstrate that the proposed design provides reliable in-memory multi-bit multiplications with high density and high energy efficiency. Statistical analysis is performed using Monte Carlo simulations to investigate process variations and error effects.
22

Yantır, Hasan Erdem, Ahmed M. Eltawil, and Khaled N. Salama. "Efficient Acceleration of Stencil Applications through In-Memory Computing." Micromachines 11, no. 6 (June 26, 2020): 622. http://dx.doi.org/10.3390/mi11060622.

Abstract:
Traditional computer architectures suffer severely from the bottleneck between the processing elements and memory, which is the biggest barrier to their scalability. Nevertheless, the amount of data that applications need to process is increasing rapidly, especially in the era of big data and artificial intelligence. This fact forces new constraints in computer-architecture design towards more data-centric principles. Therefore, new paradigms such as in-memory and near-memory processors have begun to emerge to counteract the memory bottleneck by bringing memory closer to computation or integrating the two. Associative processors are a promising candidate for in-memory computation, combining the processor and memory in the same location to alleviate the memory bottleneck. One class of applications that needs iterative processing of huge amounts of data is stencil codes. Considering this feature, associative processors can provide a paramount advantage for stencil codes. For demonstration, two in-memory associative-processor architectures for 2D stencil codes are proposed, implemented in both emerging memristor and traditional SRAM technologies. The proposed architecture achieves promising efficiency for a variety of stencil applications and thus proves its applicability to scientific stencil computing.
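For readers unfamiliar with stencil codes, the access pattern this abstract targets looks like the generic 2D five-point (Jacobi-style) sweep below, written in plain Python rather than as the proposed associative-processor mapping:

```python
# Each interior cell is repeatedly recomputed from itself and its four
# neighbours; iterating this over a large grid is what makes stencil
# codes so memory-traffic heavy on conventional architectures.

def stencil_step(grid):
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]            # border cells stay fixed
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = (grid[i][j] + grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1]) / 5.0
    return out

# A hot spot on a cold plate diffuses outward over the sweeps.
g = [[0.0] * 5 for _ in range(5)]
g[2][2] = 100.0
for _ in range(3):
    g = stencil_step(g)
print(round(g[2][2], 2))   # 10.4
```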
23

Kohata, Naoki, Toru Yamaguchi, Takanobu Baba, and Hideki Hashimoto. "Chaotic Evolutionary Parallel Computation on Intelligent Agents." Journal of Robotics and Mechatronics 10, no. 5 (October 20, 1998): 424–30. http://dx.doi.org/10.20965/jrm.1998.p0424.

Abstract:
This paper proposes evolutionary computation using chaotic dynamics rather than the conventional genetic algorithm (GA) for intelligent agents such as welfare robots. The proposed evolutionary computation applies chaotic retrieval to associative memory. We applied this evolutionary computation to multiagent robots moving side by side in step. Evolutionary computation is inherently parallel, so we implement its parallel-processing algorithm on the A-NET (Actors NETwork) parallel object-oriented computer to show the usefulness of parallel processing in evolutionary computation.
24

Zahedi, Mahdi, Muah Abu Lebdeh, Christopher Bengel, Dirk Wouters, Stephan Menzel, Manuel Le Gallo, Abu Sebastian, Stephan Wong, and Said Hamdioui. "MNEMOSENE: Tile Architecture and Simulator for Memristor-based Computation-in-memory." ACM Journal on Emerging Technologies in Computing Systems 18, no. 3 (July 31, 2022): 1–24. http://dx.doi.org/10.1145/3485824.

Abstract:
In recent years, we are witnessing a trend toward in-memory computing for future generations of computers that differs from traditional von-Neumann architecture in which there is a clear distinction between computing and memory units. Considering that data movements between the central processing unit (CPU) and memory consume several orders of magnitude more energy compared to simple arithmetic operations in the CPU, in-memory computing will lead to huge energy savings as data no longer needs to be moved around between these units. In an initial step toward this goal, new non-volatile memory technologies, e.g., resistive RAM (ReRAM) and phase-change memory (PCM), are being explored. This has led to a large body of research that mainly focuses on the design of the memory array and its peripheral circuitry. In this article, we mainly focus on the tile architecture (comprising a memory array and peripheral circuitry) in which storage and compute operations are performed in the (analog) memory array and the results are produced in the (digital) periphery. Such an architecture is termed compute-in-memory-periphery (CIM-P). More precisely, we derive an abstract CIM-tile architecture and define its main building blocks. To bridge the gap between higher-level programming languages and the underlying (analog) circuit designs, an instruction-set architecture is defined that is intended to control and, in turn, sequence the operations within this CIM tile to perform higher-level more complex operations. Moreover, we define a procedure to pipeline the CIM-tile operations to further improve the performance. To simulate the tile and perform design space exploration considering different technologies and parameters, we introduce the fully parameterized first-of-its-kind CIM tile simulator and compiler. Furthermore, the compiler is technology-aware when scheduling the CIM-tile instructions. 
Finally, using the simulator, we perform several preliminary design space explorations regarding the three competing technologies, ReRAM, PCM, and STT-MRAM concerning CIM-tile parameters, e.g., the number of ADCs. Additionally, we investigate the effect of pipelining in relation to the clock speeds of the digital periphery assuming the three technologies. In the end, we demonstrate that our simulator is also capable of reporting energy consumption for each building block within the CIM tile after the execution of in-memory kernels considering the data-dependency on the energy consumption of the memory array. All the source codes are publicly available.
25

Oskin, Mark, Lucian-Vlad Lita, Frederic T. Chong, Justin Hensley, and Diana Keen. "Algorithmic Complexity with Page-Based Intelligent Memory." Parallel Processing Letters 10, no. 01 (March 2000): 99–109. http://dx.doi.org/10.1142/s0129626400000111.

Abstract:
High DRAM densities will make intelligent memory chips a commodity in the next five years [1] [2]. This paper focuses upon a promising model of computation in intelligent memory, Active Pages [3], where computation is associated with each page of memory. Computational hardware scales linearly and inexpensively with data size in this model, reducing the order of many algorithms. This scaling can, for example, reduce linear-time algorithms to [Formula: see text]. When page-based intelligent memory chips become available in commodity, they will change the way programmers select and utilize algorithms. In this paper, we analyze the asymptotic performance of several common algorithms as problem sizes scale. We also derive the optimal page size, as a function of problem size, for each algorithm running with intelligent memory. Finally, we validate these analyses with simulation results.
26

Kaburcuk, Fatih, and Atef Elsherbeni. "Efficient Electromagnetic Analysis of a Dispersive Head Model Due to Smart Glasses Embedded Antennas at Wi-Fi and 5G Frequencies." Applied Computational Electromagnetics Society 36, no. 2 (March 16, 2021): 159–67. http://dx.doi.org/10.47037/2020.aces.j.360207.

Abstract:
Numerical study of the electromagnetic interaction between an adjacent antenna and a human head model requires long computation times and large computer memory. In this paper, two speed-up techniques for a dispersive algorithm based on the finite-difference time-domain method are used to reduce the required computation time and computer memory. To evaluate the validity of these two speed-up techniques, specific absorption rate (SAR) and temperature-rise distributions in a dispersive human head model due to radiation from an antenna integrated into a pair of smart glasses are investigated. The antenna integrated into the pair of smart glasses has wireless connectivity at 2.4 GHz and 5th-generation (5G) cellular connectivity at 4.9 GHz. Two different positions for the antenna integrated into the frame are considered in this investigation. These techniques provide a remarkable reduction in computation time and computer memory.
27

Cılasun, Hüsrev, Salonik Resch, Zamshed I. Chowdhury, Erin Olson, Masoud Zabihi, Zhengyang Zhao, Thomas Peterson, et al. "Spiking Neural Networks in Spintronic Computational RAM." ACM Transactions on Architecture and Code Optimization 18, no. 4 (December 31, 2021): 1–21. http://dx.doi.org/10.1145/3475963.

Abstract:
Spiking Neural Networks (SNNs) represent a biologically inspired computation model capable of emulating neural computation in the human brain and brain-like structures. The main promise is very low energy consumption. Classic von Neumann architecture-based SNN accelerators in hardware, however, often fall short of addressing demanding computation and data-transfer requirements efficiently at scale. In this article, we propose a promising alternative to overcome scalability limitations, based on a network of in-memory SNN accelerators, which can reduce energy consumption by up to 150.25× when compared to a representative ASIC solution. The significant reduction in energy comes from two key aspects of the hardware design that minimize data-communication overheads: (1) each node represents an in-memory SNN accelerator based on a spintronic Computational RAM array, and (2) a novel De Bruijn graph-based architecture establishes the SNN array connectivity.
28

Dey, Sowvik, Mihir Kumar Mahata, and Amiya Karmakar. "An FPGA-Based Design of a Fault-Tolerant Shared Memory Structure." International Journal of Electronics, Communications, and Measurement Engineering 11, no. 1 (January 1, 2022): 1–11. http://dx.doi.org/10.4018/ijecme.312258.

Abstract:
In the current era of smart computation, faster processing speed is needed. Executing a process in a parallel manner can achieve higher throughput. To avoid the von Neumann bottleneck phenomenon, memory speed must be high. A high-speed memory can provide contiguous data to a high-speed processor, preserving that processor's high performance. Various high-speed memory technologies exist in the market, such as interleaved memory and cache memory. Multiprocessing technology is also used to achieve high-performance computation. It is still challenging to build a high-speed memory device that can supply data to a multiprocessing system contiguously. Shared memory can fulfill all of the requirements of a multiprocessing system. The efficiency of shared memory can be enhanced by introducing a fault-tolerance mechanism without affecting inter-process communication, as discussed in this paper.
29

Chua, Kyle Matthew Chan, Janz Aeinstein Fauni Villamayor, Lorenzo Campos Bautista, and Roger Luis Uy. "Implementation of hyyrö’s bit-vector algorithm using advanced vector extensions 2." International Journal of Advances in Intelligent Informatics 5, no. 3 (October 29, 2019): 230. http://dx.doi.org/10.26555/ijain.v5i3.362.

Abstract:
The Advanced Vector Extensions 2 (AVX2) instruction set architecture was introduced by Intel’s Haswell microarchitecture, which features improved processing power, wider vector registers, and a rich instruction set. This study presents an implementation of Hyyrö’s bit-vector algorithm for pairwise deoxyribonucleic acid (DNA) sequence alignment that takes advantage of the Single-Instruction-Multiple-Data (SIMD) computing capabilities of AVX2 on modern processors. It investigated the effects of the lengths of the query and reference sequences on I/O load time, computation time, and memory consumption. The results reveal that the experiment achieved an I/O load time of ϴ(n), a computation time of ϴ(n*⌈m/64⌉), and a memory consumption of ϴ(n). The implementation exhibited a higher time complexity than the expected ϴ(n) due to instructional and architectural limitations. Nonetheless, it was on par with other experiments in terms of computation-time complexity and memory consumption.
APA, Harvard, Vancouver, ISO, and other styles
30

Kim, HanBit, Seokhie Hong, and HeeSeok Kim. "Lightweight Conversion from Arithmetic to Boolean Masking for Embedded IoT Processor." Applied Sciences 9, no. 7 (April 5, 2019): 1438. http://dx.doi.org/10.3390/app9071438.

Full text
Abstract:
A masking method is a widely known countermeasure against side-channel attacks. To apply a masking method to cryptosystems consisting of Boolean and arithmetic operations, such as ARX (Addition, Rotation, XOR) block ciphers, a masking conversion algorithm should be used. Masking conversion algorithms can be classified into two categories: “Boolean to Arithmetic (B2A)” and “Arithmetic to Boolean (A2B)”. The A2B algorithm generally requires more execution time than the B2A algorithm. Using pre-computation tables, the A2B algorithm substantially reduces its execution time, although it requires additional space in RAM. At CHES 2012, B. Debraize proposed a conversion algorithm that somewhat reduced the memory cost of using pre-computation tables. However, it still requires 2^(k+1) entries of (k+1) bits each, where k denotes the size of the processed data. In this paper, we propose a low-memory A2B masking conversion algorithm that requires only 2^k entries of k bits each. Our contributions are three-fold. First, we show specifically how to reduce the pre-computation table entries from (k+1) bits to k bits; as a result, the memory used for the pre-computation table is reduced from 2^(k+1)·(k+1) bits to 2^k·k bits. Second, we optimize the execution times of the pre-computation phase and the conversion phase, and determine that our pre-computation algorithm requires approximately half as many operations as Debraize’s algorithm. The results of the 8/16/32-bit simulation show improved speed in both the pre-computation phase and the conversion phase compared to Debraize’s results. Finally, we verify the security of the algorithm against side-channel attacks as well as the soundness of the proposed algorithm.
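For orientation: an arithmetic masking of a secret x is a pair (A, r) with x = (A + r) mod 2^k, while a Boolean masking is a pair (B, r′) with x = B ⊕ r′. The sketch below only illustrates this input/output relation of an A2B conversion; it recombines x in the clear, so it is deliberately not a side-channel countermeasure like the table-based algorithms discussed above, just a functional reference:

```python
import secrets

def a2b_reference(A, r, k):
    """Functional reference for Arithmetic-to-Boolean mask conversion.

    NOT side-channel secure: x is recombined in the clear, which a real
    A2B algorithm (e.g. a table-based one) must never do."""
    mask = (1 << k) - 1
    x = (A + r) & mask           # arithmetic unmasking: x = A + r mod 2^k
    r_new = secrets.randbits(k)  # fresh Boolean mask
    return x ^ r_new, r_new      # output shares satisfy x = B ^ r_new
```

A secure conversion must compute B without ever materializing x; the pre-computation tables discussed in the abstract exist precisely to propagate the carries of A + r share-wise.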
APA, Harvard, Vancouver, ISO, and other styles
31

Cassinerio, M., N. Ciocchini, and D. Ielmini. "Logic Computation in Phase Change Materials by Threshold and Memory Switching." Advanced Materials 25, no. 41 (August 15, 2013): 5975–80. http://dx.doi.org/10.1002/adma.201301940.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

HAMANN, HEIKO, and HEINZ WÖRN. "EMBODIED COMPUTATION." Parallel Processing Letters 17, no. 03 (September 2007): 287–98. http://dx.doi.org/10.1142/s0129626407003022.

Full text
Abstract:
The traditional computational devices and models, such as the von Neumann architecture or the Turing machine, are strongly influenced by concepts of central control and perfection. The standard models of computation seem to cover the reality of computation only partially and lack, in particular, the ability to describe more natural forms of computation. In this paper we propose the concept of embodied computation, a straightforward advancement of well-known concepts such as amorphous computing, emergent phenomena, and embodied cognitive science. Many embodied microscopic computational devices form a single macroscopic device of embodied computation, and the solution to computational problems emerges from a huge number of local interactions. The system's memory is the sum of the positional information and possibly of the internal states. Such systems are very robust and allow different methodologies for analyzing computation. To support this theoretical concept, some simulation-based results are given and potential benefits of this approach are discussed.
APA, Harvard, Vancouver, ISO, and other styles
33

DIALLO, MOHAMADOU, AFONSO FERREIRA, and ANDREW RAU-CHAPLIN. "A NOTE ON COMMUNICATION-EFFICIENT DETERMINISTIC PARALLEL ALGORITHMS FOR PLANAR POINT LOCATION AND 2D VORONOÏ DIAGRAM." Parallel Processing Letters 11, no. 02n03 (June 2001): 327–40. http://dx.doi.org/10.1142/s0129626401000622.

Full text
Abstract:
In this note we describe deterministic parallel algorithms for planar point location and for building the Voronoï Diagram of n co-planar points. These algorithms are designed for BSP/CGM-like models of computation, where p processors, with [Formula: see text] local memory each, communicate through some arbitrary interconnection network. They are communication-efficient since they require, respectively, O(1) and O( log p) communication steps and [Formula: see text] local computation per step. Both algorithms require [Formula: see text] local memory.
APA, Harvard, Vancouver, ISO, and other styles
34

Li, Wei, and Zhao Deng. "Decision Support System in a Memristor-Based Mobile CIM Architecture Applied on Big Data Computation and Storage." Scientific Programming 2021 (December 14, 2021): 1–8. http://dx.doi.org/10.1155/2021/9041150.

Full text
Abstract:
Data computation and storage are essential parts of developing big data applications. Memristor device technology could remove the speed and energy-efficiency bottlenecks in existing data processing. The present experimental work investigates a decision support system in a new architecture, the computation-in-memory (CIM) architecture, which can be utilized to store and process big data in the same physical location at a faster rate. The decision support system is used for data computation and storage, with the aims of helping memory units read, write, and erase data and supporting their decisions under big-data communication ambiguities. Data communication is realized within the crossbar with the support of peripheral controller blocks. The feasibility of the CIM architecture, the adaptive read, write, and erase methods, and the memory accuracy were investigated. Simulation Program with Integrated Circuit Emphasis (SPICE) results show that the proposed CIM architecture has the potential to improve computing efficiency, energy consumption, and performance area by at least two orders of magnitude. The CIM architecture may be used to mitigate the big-data processing limits caused by conventional computer architecture and complementary metal-oxide-semiconductor (CMOS) transistor process technologies.
APA, Harvard, Vancouver, ISO, and other styles
35

Pommerening, Florian, and Malte Helmert. "Incremental LM-Cut." Proceedings of the International Conference on Automated Planning and Scheduling 23 (June 2, 2013): 162–70. http://dx.doi.org/10.1609/icaps.v23i1.13560.

Full text
Abstract:
In heuristic search and especially in optimal classical planning the computation of accurate heuristic values can take up the majority of runtime. In many cases, the heuristic computations for a search node and its successors are very similar, leading to significant duplication of effort. For example most landmarks of a node that are computed by the LM-cut algorithm are also landmarks for the node's successors. We propose to reuse these landmarks and incrementally compute new ones to speed up the LM-cut calculation. The speed advantage obtained by incremental computation is offset by higher memory usage. We investigate different search algorithms that reduce memory usage without sacrificing the faster computation, leading to a substantial increase in coverage for benchmark domains from the International Planning Competitions.
APA, Harvard, Vancouver, ISO, and other styles
36

ACAR, UMUT A., MATTHIAS BLUME, and JACOB DONHAM. "A consistent semantics of self-adjusting computation." Journal of Functional Programming 23, no. 3 (May 2013): 249–92. http://dx.doi.org/10.1017/s0956796813000099.

Full text
Abstract:
AbstractThis paper presents a semantics of self-adjusting computation and proves that the semantics is correct and consistent. The semantics introduces memoizing change propagation, which enhances change propagation with the classic idea of memoization to enable reuse of computations even when memory is mutated via side effects. During evaluation, computation reuse via memoization triggers a change-propagation algorithm that adapts the reused computation to the memory mutations (side effects) that took place since the creation of the computation. Since the semantics includes both memoization and change propagation, it involves both non-determinism (due to memoization) and mutation (due to change propagation). Our consistency theorem states that the non-determinism is not harmful: any two evaluations of the same program starting at the same state yield the same result. Our correctness theorem states that mutation is not harmful: Self-adjusting programs are compatible with purely functional programming. We formalize the semantics and its meta-theory in the LF logical framework and machine check our proofs using Twelf.
APA, Harvard, Vancouver, ISO, and other styles
37

YIN, PENG-YENG. "A TABU SEARCH APPROACH TO POLYGONAL APPROXIMATION OF DIGITAL CURVES." International Journal of Pattern Recognition and Artificial Intelligence 14, no. 02 (March 2000): 243–55. http://dx.doi.org/10.1142/s0218001400000167.

Full text
Abstract:
Most previous polygonal approximation methods are sub-optimal, and their compression ratio with respect to a given error tolerance is not good enough for some applications. Recently, a genetic-based approach that can provide near-optimal solutions was presented, but its demands on computation and memory are too high. In this paper, a new polygonal approximation method using the tabu search technique is presented. The proposed method produces a higher compression ratio than the sub-optimal approaches and the genetic-based approach, and it is more efficient in both computation cost and memory use.
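Tabu search, the metaheuristic this paper applies to polygon-vertex selection, is a local search with a short-term memory (the tabu list) that forbids recently visited solutions so the search can escape local optima. A generic, hedged skeleton follows; it is not the paper's polygonal-approximation encoding, and `neighbors` and `cost` are placeholder callables supplied by the caller:

```python
def tabu_search(init, neighbors, cost, tenure=7, iters=200):
    """Generic tabu-search skeleton: greedy local moves plus a
    short-term memory forbidding the last `tenure` visited solutions."""
    current = best = init
    tabu = [init]
    for _ in range(iters):
        # Consider only non-tabu neighbors, even if they worsen the cost.
        candidates = [n for n in neighbors(current) if n not in tabu]
        if not candidates:
            break
        current = min(candidates, key=cost)
        tabu.append(current)
        if len(tabu) > tenure:   # expire the oldest tabu entry
            tabu.pop(0)
        if cost(current) < cost(best):
            best = current
    return best
```

For example, `tabu_search(0, lambda x: [x - 1, x + 1], lambda x: (x - 3) ** 2)` walks to the integer minimum at 3, accepting forced uphill moves when all improving neighbors are tabu.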
APA, Harvard, Vancouver, ISO, and other styles
38

Rajput, Anil Kumar, and Manisha Pattanaik. "A Nonvolatile 7T2M SRAM Cell with Improved Noise Margin for Energy Efficient In Memory Boolean Computations." International Journal of Engineering Research in Electronics and Communication Engineering 9, no. 1 (January 31, 2022): 1–8. http://dx.doi.org/10.36647/ijerece/09.01.a001.

Full text
Abstract:
Current computing systems face the von Neumann bottleneck (VNB) due to the high prominence of big-data applications such as artificial intelligence and neuromorphic computing. In-memory computation is one of the emerging computing paradigms to mitigate this VNB. In this paper, a memristor-based robust 7T2M Nonvolatile-SRAM (NvSRAM) is proposed for energy-efficient in-memory computation. The 7T2M NvSRAM is designed using CMOS and memristors with a higher resistance ratio, which improves the write margin by 74.44% and the energy consumption for read and write operations by 5.10% and 9.66%, respectively, over a conventional 6T SRAM, at the cost of an increased write delay. The read-decoupled path with the VGND line enhances the read margin and the read-path Ion/Ioff ratio of the 7T2M NvSRAM cell by 2.69× and 102.42%, respectively, over a conventional 6T SRAM. The proposed cell uses a stacking transistor to reduce the leakage power in standby mode by 64.20% over a conventional 6T SRAM. In addition to normal SRAM functions, the proposed 7T2M NvSRAM performs In-Memory Boolean Computation (IMBC) operations such as NAND, AND, NOR, OR, and XOR in a single cycle without compute-disturb (the stored data flipping during in-memory computation). It achieves an average energy consumption of 4.29 fJ/bit at 1.8 V for IMBC operations.
APA, Harvard, Vancouver, ISO, and other styles
39

Hotta, Wataru, Shunichi Suzuki, and Muneo Hori. "On Contraction of Three-Dimensional Multiple Shear Mechanism Model for Evaluation of Large Scale Liquefaction Using High Performance Computing." Geosciences 9, no. 1 (January 12, 2019): 38. http://dx.doi.org/10.3390/geosciences9010038.

Full text
Abstract:
For more reliable evaluation of liquefaction, an analysis model of higher fidelity should be used even though it requires more numerical computation. We developed a parallel finite element method (FEM), implemented with the non-linear multiple shear mechanism model. A bottleneck experienced when implementing the model is the use of vast amounts of CPU memory for material state parameters. We succeeded in drastically reducing the computation requirements of the model by suitably approximating the formulation of the model. An analysis model of high fidelity was constructed for a soil-structure system, and the model was analyzed by using the developed parallel FEM on a parallel computer. The amount of required CPU memory was reduced. The computation time was reduced as well, and the practical applicability of the developed parallel FEM is demonstrated.
APA, Harvard, Vancouver, ISO, and other styles
40

Lan, Qiang, Zelong Wang, Mei Wen, Chunyuan Zhang, and Yijie Wang. "High Performance Implementation of 3D Convolutional Neural Networks on a GPU." Computational Intelligence and Neuroscience 2017 (2017): 1–8. http://dx.doi.org/10.1155/2017/8348671.

Full text
Abstract:
Convolutional neural networks have proven to be highly successful in applications such as image classification, object tracking, and many other tasks based on 2D inputs. Recently, researchers have started to apply convolutional neural networks to video classification, which constitutes a 3D input and requires far larger amounts of memory and much more computation. FFT based methods can reduce the amount of computation, but this generally comes at the cost of an increased memory requirement. On the other hand, the Winograd Minimal Filtering Algorithm (WMFA) can reduce the number of operations required and thus can speed up the computation, without increasing the required memory. This strategy was shown to be successful for 2D neural networks. We implement the algorithm for 3D convolutional neural networks and apply it to a popular 3D convolutional neural network which is used to classify videos and compare it to cuDNN. For our highly optimized implementation of the algorithm, we observe a twofold speedup for most of the 3D convolution layers of our test network compared to the cuDNN version.
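The WMFA mentioned in the abstract trades multiplications for additions. Its smallest 1D instance, F(2,3), produces two outputs of a 3-tap filter from a 4-element input tile using 4 multiplications instead of 6; 2D and 3D convolutions nest this transform along each axis. A plain-Python sketch of F(2,3) (not the paper's GPU implementation) is:

```python
def winograd_f23(d, g):
    """Winograd minimal filtering F(2,3): two outputs of a 3-tap FIR
    from a length-4 input tile, with 4 multiplies instead of 6."""
    # Filter transform (precomputable once per filter).
    g0 = g[0]
    g1 = (g[0] + g[1] + g[2]) / 2.0
    g2 = (g[0] - g[1] + g[2]) / 2.0
    g3 = g[2]
    # Input transform + elementwise multiplies (the 4 multiplications).
    m1 = (d[0] - d[2]) * g0
    m2 = (d[1] + d[2]) * g1
    m3 = (d[2] - d[1]) * g2
    m4 = (d[1] - d[3]) * g3
    # Output transform.
    return [m1 + m2 + m3, m2 - m3 - m4]
```

The filter transform is computed once and reused across tiles, which is why the savings grow with input size without any extra memory for intermediate FFT buffers.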
APA, Harvard, Vancouver, ISO, and other styles
41

Dou, Wan Feng, Jing Zhao, Kun Yang, and Min Xu. "Data Parallel and Scheduling Mechanism Based on Petri Nets." Applied Mechanics and Materials 543-547 (March 2014): 3264–67. http://dx.doi.org/10.4028/www.scientific.net/amm.543-547.3264.

Full text
Abstract:
Data-parallel and task-parallel methods are the basic methods frequently used for algorithm design in parallel computing. The data-parallel method, as its name implies, partitions the data to be processed into small blocks, taking into account storage and computing capacity, such as the memory size of a computation node, the number of nodes participating in the parallel computation, and the total data size. On the other hand, the data-dispensing strategy is an important problem that must be considered carefully to increase computational efficiency. According to the characteristics of digital terrain analysis, Petri nets are introduced to describe the parallel relationships among data partitions, based on a data granularity model that considers two computing modes, shared memory and distributed memory; corresponding scheduling algorithms are proposed for load balancing. The experimental results show that our method is well suited to data partitioning and dispensation, in particular in the distributed-memory mode.
APA, Harvard, Vancouver, ISO, and other styles
42

Xu, Shilin, and Caili Guo. "Computation Offloading in a Cognitive Vehicular Networks with Vehicular Cloud Computing and Remote Cloud Computing." Sensors 20, no. 23 (November 29, 2020): 6820. http://dx.doi.org/10.3390/s20236820.

Full text
Abstract:
To satisfy the explosive growth of computation-intensive vehicular applications, we investigated the computation offloading problem in a cognitive vehicular network (CVN). Specifically, in our scheme, vehicular cloud computing (VCC)- and remote cloud computing (RCC)-enabled computation offloading are jointly considered. So far, extensive research has been conducted on RCC-based computation offloading, while studies on VCC-based computation offloading are relatively rare. In fact, due to the dynamics and uncertainty of on-board resources, VCC-based computation offloading is more challenging than the RCC one, especially in vehicular scenarios with expensive inter-vehicle communication or a poor communication environment. To solve this problem, we propose to leverage the VCC's computation resources for computation offloading in a perception-exploitation manner, which mainly comprises two stages: resource discovery and computation offloading. In the resource discovery stage, based on the action-observation history, a Long Short-Term Memory (LSTM) model is proposed to predict the on-board resource utilization status at the next time slot. Thereafter, based on the obtained computation resource distribution, a decentralized multi-agent Deep Reinforcement Learning (DRL) algorithm is proposed to solve the collaborative computation offloading with VCC and RCC. Last but not least, the proposed algorithms' effectiveness is verified with a host of numerical simulation results from different perspectives.
APA, Harvard, Vancouver, ISO, and other styles
43

Likhoded, N. A. "Conditions for the existence of broadcast and spatial locality in computation threads." Proceedings of the National Academy of Sciences of Belarus. Physics and Mathematics Series 58, no. 3 (October 12, 2022): 292–99. http://dx.doi.org/10.29235/1561-2430-2022-58-3-292-299.

Full text
Abstract:
Graphics Processing Units (GPUs) are considered as the target computer for implementing parallel algorithms. The set of algorithm operations to be implemented on the GPU must be split into computation threads; the threads are grouped into computation blocks that are executed atomically on stream processors. The threads of a single block are executed on a stream processor in groups called warps; the threads of a warp execute simultaneously. The efficiency of a parallel algorithm depends on the way the data is stored in GPU memory. If all threads of a warp request the same datum when executing the current operator, it is desirable to place that datum in the shared or constant GPU memory; in this case, its distribution across the cores of the multiprocessor is effectively realized by means of a broadcast. If the threads of a warp request data located close together in memory, there is spatial locality of data, which makes it advisable to place this data in the GPU's global memory. Implementing a broadcast or exploiting spatial locality by placing data in memory of the appropriate type significantly reduces the traffic when exchanging data between the memory levels of the GPU. This paper formulates and proves the necessary and sufficient conditions under which a broadcast is possible or spatial locality of data exists. The conditions are formulated in terms of the functions that determine the use of array elements at their occurrences in the algorithm's operators and the functions that define the information dependencies of the algorithm. The results of this work can be used to optimize parallel algorithms when they are implemented on a GPU.
APA, Harvard, Vancouver, ISO, and other styles
44

Liao, Quingbo, and George A. McMechan. "2-D pseudo-spectral viscoacoustic modeling in a distributed-memory multi-processor computer." Bulletin of the Seismological Society of America 83, no. 5 (October 1, 1993): 1345–54. http://dx.doi.org/10.1785/bssa0830051345.

Full text
Abstract:
Abstract Two pseudo-spectral implementations of 2-D viscoacoustic modeling are developed in a distributed-memory multi-processor computing environment. The first involves simultaneous computation of the response of one model to many source locations and, as it requires no interprocessor communication, is perfectly parallel. The second involves computation of the response, to one source, of a large model that is distributed across all processors. In the latter, local rather than global, Fourier transforms are used to minimize interprocessor communication and to eliminate the need for matrix transposition. In both algorithms, absorbing boundaries are defined as zones of decreased Q as part of the model, and so require no extra computation. An empirical method of determining sets of relaxation times for a broad range of Q values eliminates the need for iterative fitting of Q-frequency curves.
APA, Harvard, Vancouver, ISO, and other styles
45

Liu, Mingshuo, Payal Borulkar, Mousam Hossain, Ronald F. Demara, and Yu Bai. "Spin-Orbit Torque Neuromorphic Fabrics for Low-Leakage Reconfigurable In-Memory Computation." IEEE Transactions on Electron Devices 69, no. 4 (April 2022): 1727–35. http://dx.doi.org/10.1109/ted.2021.3140040.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Viswanathan, S., D. R. Perl, K. M. Visscher, M. J. Kahana, and R. Sekuler. "Homogeneity computation: How interitem similarity in visual short-term memory alters recognition." Psychonomic Bulletin & Review 17, no. 1 (January 15, 2010): 59–65. http://dx.doi.org/10.3758/pbr.17.1.59.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Rangachar Srinivasa, Srivatsa, Akshay Krishna Ramanathan, Xueqing Li, Wei-Hao Chen, Sumeet Kumar Gupta, Meng-Fan Chang, Swaroop Ghosh, Jack Sampson, and Vijaykrishnan Narayanan. "ROBIN: Monolithic-3D SRAM for Enhanced Robustness with In-Memory Computation Support." IEEE Transactions on Circuits and Systems I: Regular Papers 66, no. 7 (July 2019): 2533–45. http://dx.doi.org/10.1109/tcsi.2019.2897497.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Braham, Yosra, Yaroub Elloumi, Mohamed Akil, and Mohamed Hedi Bedoui. "Parallel computation of Watershed Transform in weighted graphs on shared memory machines." Journal of Real-Time Image Processing 17, no. 3 (July 18, 2018): 527–42. http://dx.doi.org/10.1007/s11554-018-0804-x.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Gilbert, Jean Charles, and Jorge Nocedal. "Automatic differentiation and the step computation in the limited memory BFGS method." Applied Mathematics Letters 6, no. 3 (May 1993): 47–50. http://dx.doi.org/10.1016/0893-9659(93)90032-i.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Zayer, Fakhreddine, Baker Mohammad, Hani Saleh, and Gabriele Gianini. "RRAM Crossbar-Based In-Memory Computation of Anisotropic Filters for Image Preprocessing." IEEE Access 8 (2020): 127569–80. http://dx.doi.org/10.1109/access.2020.3004184.

Full text
APA, Harvard, Vancouver, ISO, and other styles
