Journal articles on the topic "In-memory compute"

To see the other types of publications on this topic, follow the link: In-memory compute.

Format your source in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic "In-memory compute".

Next to every source in the list of references there is an "Add to bibliography" button. Press on it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scientific publication in .pdf format and read the online abstract of the work if such details are available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Varnava, Christiana. "Photonic devices compute in memory." Nature Electronics 2, no. 3 (March 2019): 91. http://dx.doi.org/10.1038/s41928-019-0226-1.

2

John-Africa, Elijah, and Victor T. Emmah. "Performance Evaluation of LSTM and RNN Models in the Detection of Email Spam Messages." European Journal of Information Technologies and Computer Science 2, no. 6 (November 26, 2022): 24–30. http://dx.doi.org/10.24018/compute.2022.2.6.80.

Abstract:
Email spam is an unwanted bulk message sent to a recipient's email address without the recipient's explicit consent. It is usually a means of advertising and maximizing profit, especially with the increased use of the internet for social networking, but it can also be very frustrating and annoying to the recipients of these messages. Recent research has shown that about 14.7 billion spam messages are sent out every single day, of which more than 45% are promotional sales content that the recipient did not specifically opt in to. This has gotten the attention of many researchers in the area of natural language processing. In this paper, we use Long Short-Term Memory (LSTM) for the classification task of separating spam from ham messages. The performance of LSTM is compared with that of a Recurrent Neural Network (RNN), which can also be used for a classification task of this nature but suffers from short-term memory and tends to lose important information from earlier time steps when making predictions. The evaluation of the results shows that LSTM achieved 97% accuracy with both the Adam and RMSprop optimizers, compared to RNN with 94% accuracy with RMSprop and 87% accuracy with Adam.
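As a concrete illustration of the comparison this abstract describes, a minimal sketch of such a spam/ham classifier follows. The vocabulary size, sequence length, hyperparameters, and toy arrays are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of an LSTM spam/ham classifier of the kind evaluated above.
# Vocabulary size, sequence length, and hyperparameters are illustrative.
import numpy as np
from tensorflow.keras import layers, models

VOCAB = 10_000   # assumed vocabulary size
MAXLEN = 100     # assumed padded message length

model = models.Sequential([
    layers.Embedding(VOCAB, 64),
    layers.LSTM(64),                      # layers.SimpleRNN(64) gives the RNN baseline
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop",        # the paper compares RMSprop and Adam
              loss="binary_crossentropy",
              metrics=["accuracy"])

# x: integer-encoded, padded messages; y: 1 = spam, 0 = ham (toy data here)
x = np.random.randint(0, VOCAB, size=(32, MAXLEN))
y = np.random.randint(0, 2, size=(32,))
model.fit(x, y, epochs=1, batch_size=8)
```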
3

Zhao, Dongyan, Yubo Wang, Jin Shao, Yanning Chen, Zhiwang Guo, Cheng Pan, Guangzhi Dong, et al. "Compute-in-Memory for Numerical Computations." Micromachines 13, no. 5 (May 2, 2022): 731. http://dx.doi.org/10.3390/mi13050731.

Abstract:
In recent years, compute-in-memory (CIM) has been extensively studied as a way to improve the energy efficiency of computing by reducing data movement. At present, CIM is frequently used in data-intensive computing. Data-intensive computing applications, such as the many kinds of neural networks (NNs) used in machine learning (ML), are regarded as ‘soft’ computing tasks, i.e., computations that can tolerate low computing precision with little accuracy degradation. However, ‘hard’ tasks aimed at numerical computations require high-precision computing and are also accompanied by energy efficiency problems. Numerical computations arise in many applications, including partial differential equations (PDEs) and large-scale matrix multiplication. Therefore, it is necessary to study CIM for numerical computations. This article reviews recent developments in CIM for numerical computations. The different kinds of numerical methods for solving partial differential equations and the associated matrix transformations are derived in detail. The paper also discusses the iterative computation of large-scale matrices, which strongly affects the efficiency of numerical computations. The working procedure of a ReRAM-based partial differential equation solver is introduced with particular emphasis. Other PDE solvers, and other research on CIM for numerical computations, are also summarized. Finally, prospects and the future of high-accuracy CIM for numerical computations are discussed.
4

Handy, Jim, and Tom Coughlin. "Semiconductor Architectures Enable Compute in Memory." Computer 56, no. 5 (May 2023): 126–29. http://dx.doi.org/10.1109/mc.2023.3252099.

5

Miller, Ethan, Achilles Benetopoulos, George Neville-Neil, Pankaj Mehra, and Daniel Bittman. "Pointers in Far Memory." Queue 21, no. 3 (June 23, 2023): 75–93. http://dx.doi.org/10.1145/3606029.

Abstract:
Effectively exploiting emerging far-memory technology requires consideration of operating on richly connected data outside the context of the parent process. Operating-system technology in development offers help by exposing abstractions such as memory objects and globally invariant pointers that can be traversed by devices and newly instantiated compute. Such ideas will allow applications running on future heterogeneous distributed systems with disaggregated memory nodes to exploit near-memory processing for higher performance and to independently scale their memory and compute resources for lower cost.
6

Wan, Weier, Rajkumar Kubendran, Clemens Schaefer, Sukru Burc Eryilmaz, Wenqiang Zhang, Dabin Wu, Stephen Deiss, et al. "A compute-in-memory chip based on resistive random-access memory." Nature 608, no. 7923 (August 17, 2022): 504–12. http://dx.doi.org/10.1038/s41586-022-04992-8.

Abstract:
Realizing increasingly complex artificial intelligence (AI) functionalities directly on edge devices calls for unprecedented energy efficiency of edge hardware. Compute-in-memory (CIM) based on resistive random-access memory (RRAM) [1] promises to meet such demand by storing AI model weights in dense, analogue and non-volatile RRAM devices, and by performing AI computation directly within RRAM, thus eliminating power-hungry data movement between separate compute and memory [2-5]. Although recent studies have demonstrated in-memory matrix-vector multiplication on fully integrated RRAM-CIM hardware [6-17], it remains a goal for a RRAM-CIM chip to simultaneously deliver high energy efficiency, versatility to support diverse models and software-comparable accuracy. Although efficiency, versatility and accuracy are all indispensable for broad adoption of the technology, the inter-related trade-offs among them cannot be addressed by isolated improvements on any single abstraction level of the design. Here, by co-optimizing across all hierarchies of the design from algorithms and architecture to circuits and devices, we present NeuRRAM—a RRAM-based CIM chip that simultaneously delivers versatility in reconfiguring CIM cores for diverse model architectures, energy efficiency that is two-times better than previous state-of-the-art RRAM-CIM chips across various computational bit-precisions, and inference accuracy comparable to software models quantized to four-bit weights across various AI tasks, including accuracy of 99.0 percent on MNIST [18] and 85.7 percent on CIFAR-10 [19] image classification, 84.7-percent accuracy on Google speech command recognition [20], and a 70-percent reduction in image-reconstruction error on a Bayesian image-recovery task.
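To make the core idea concrete for readers browsing this list, here is a toy numerical model (not the paper's code) of the quantized, noisy matrix-vector multiply such a chip performs in the memory array. The 4-bit precision and Gaussian programming noise are assumptions.

```python
# Toy model of in-memory matrix-vector multiplication: weights live in the
# array as quantized, noisy conductances, and the multiply-accumulate
# happens where the data is stored.
import numpy as np

rng = np.random.default_rng(0)

def program_weights(w, bits=4, sigma=0.02):
    """Quantize weights symmetrically (as differential conductance pairs
    would store them) and add per-device programming noise."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    w_q = np.round(w / scale) * scale
    return w_q + rng.normal(0.0, sigma, size=w.shape)

W = rng.standard_normal((64, 128))   # layer weights held in the array
x = rng.standard_normal(128)         # input activations driven onto the array

y_analog = program_weights(W) @ x    # one-step in-array multiply-accumulate
y_digital = W @ x
rel_err = np.linalg.norm(y_analog - y_digital) / np.linalg.norm(y_digital)
print(f"relative error vs. digital reference: {rel_err:.3f}")
```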
7

Wang, Ruihong, Jianguo Wang, Stratos Idreos, M. Tamer Özsu, and Walid G. Aref. "The case for distributed shared-memory databases with RDMA-enabled memory disaggregation." Proceedings of the VLDB Endowment 16, no. 1 (September 2022): 15–22. http://dx.doi.org/10.14778/3561261.3561263.

Abstract:
Memory disaggregation (MD) allows for scalable and elastic data center design by separating compute (CPU) from memory. With MD, compute and memory are no longer coupled into the same server box. Instead, they are connected to each other via ultra-fast networking such as RDMA. MD can bring many advantages, e.g., higher memory utilization, better independent scaling (of compute and memory), and lower cost of ownership. This paper makes the case that MD can fuel the next wave of innovation on database systems. We observe that MD revives the great debate of "shared what" in the database community. We envision that distributed shared-memory databases (DSM-DB, for short) - that have not received much attention before - can be promising in the future with MD. We present a list of challenges and opportunities that can inspire next steps in system design making the case for DSM-DB.
8

Yu, Shimeng, Wonbo Shim, Xiaochen Peng, and Yandong Luo. "RRAM for Compute-in-Memory: From Inference to Training." IEEE Transactions on Circuits and Systems I: Regular Papers 68, no. 7 (July 2021): 2753–65. http://dx.doi.org/10.1109/tcsi.2021.3072200.

9

Alam, Shamiul, Md Mazharul Islam, Md Shafayat Hossain, Akhilesh Jaiswal, and Ahmedullah Aziz. "CryoCiM: Cryogenic compute-in-memory based on the quantum anomalous Hall effect." Applied Physics Letters 120, no. 14 (April 4, 2022): 144102. http://dx.doi.org/10.1063/5.0092169.

Abstract:
The scaling of the already matured complementary metal-oxide-semiconductor technology is steadily approaching its physical limit, motivating the quest for a suitable alternative. Cryogenic operation offers a promising pathway toward continued improvement in computing speed and energy efficiency without aggressive scaling. However, the memory wall bottleneck of the traditional von-Neumann architecture persists even at cryogenic temperature. That is where a compute-in-memory (CiM) architecture, which embeds computing within the memory unit, comes into play. Computations within the memory unit help to reduce the expensive data transfer between the memory and the computing units. Therefore, CiM provides extreme energy efficiency that can enable lower cooling cost at cryogenic temperature. In this work, we demonstrate CryoCiM, a cryogenic compute-in-memory framework utilizing a nonvolatile memory system based on the quantum anomalous Hall effect (QAHE). Our design can perform memory read/write and universal binary logic operations (NAND, NOR, and XOR). We custom design a peripheral circuit assembly that can perform the read/write and single-cycle in-memory logic operations. The utilization of a QAHE-based memory system promises robustness against process variations, through the usage of topologically protected resistive states for data storage. CryoCiM is a major step toward utilizing exclusively cryogenic phenomena to serve the dual purpose of storage and computation with ultra-low power (∼nano-watts) operations.
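A behavioral sketch of the single-cycle in-memory logic operations named in this abstract (NAND, NOR, XOR) may help. This is plain Python mimicking the functional behavior only; the 8-bit word width and dict-backed "array" are illustrative, and nothing here models the QAHE device physics.

```python
# Behavioral sketch of single-cycle in-memory logic: two stored words are
# combined in place, so no operand travels to a separate compute unit.
WIDTH = 8
MASK = (1 << WIDTH) - 1

memory = {0: 0b1100_1010, 1: 0b1010_0110}   # two rows of the toy array

def in_memory_logic(row_a, row_b, op):
    a, b = memory[row_a], memory[row_b]
    if op == "NAND":
        return ~(a & b) & MASK
    if op == "NOR":
        return ~(a | b) & MASK
    if op == "XOR":
        return (a ^ b) & MASK
    raise ValueError(f"unsupported op: {op}")

for op in ("NAND", "NOR", "XOR"):
    print(f"{op}: {in_memory_logic(0, 1, op):0{WIDTH}b}")
```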
10

Redwan, Sadi M., Md Rashed-Al-Mahfuz, and Md Ekramul Hamid. "Recognizing Command Words using Deep Recurrent Neural Network for Both Acoustic and Throat Speech." European Journal of Information Technologies and Computer Science 3, no. 2 (May 22, 2023): 7–13. http://dx.doi.org/10.24018/compute.2023.3.2.88.

Abstract:
The importance of speech command recognition in human-machine interaction systems has increased in recent years. In this study, we propose a deep neural network-based system for acoustic and throat command speech recognition. We apply a preprocessing pipeline to create the input of the deep learning model. First, speech commands are decomposed into components using well-known signal decomposition techniques. The Mel-frequency cepstral coefficients (MFCC) feature extraction method is applied to each component of the speech commands to obtain the feature inputs for the recognition system. At this stage, we apply and compare different speech decomposition techniques, such as wavelet packet decomposition (WPD), continuous wavelet transform (CWT), and empirical mode decomposition (EMD), in order to find the best technique for our model; we observe that WPD shows the best performance in terms of classification accuracy. This paper investigates a long short-term memory (LSTM)-based recurrent neural network (RNN), which is trained using the extracted MFCC features. The proposed neural network is trained and tested using acoustic speech commands, and we also train and test the model using throat-microphone speech commands. Lastly, a transfer learning technique is employed to increase the test accuracy for throat speech recognition: the weights of the model trained on the acoustic signal are used to initialize the model used for throat speech recognition. Overall, we found significant classification accuracy for both acoustic and throat command speech. The LSTM performs much better than the GMM-HMM model, convolutional neural networks such as CNN-tpool2, and residual networks such as res15 and res26, with an accuracy score of over 97% on Google's Speech Commands dataset, and we achieve 95.35% accuracy on our throat speech dataset using the transfer learning technique.
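For readers who want to reproduce the front end sketched in this abstract, the MFCC step might look as follows. librosa is our assumed tooling (the paper does not name a library), and the file name and parameters are placeholders.

```python
# Sketch of an MFCC front end for command-word recognition.
import librosa
import numpy as np

y, sr = librosa.load("command.wav", sr=16_000)      # hypothetical recording
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape: (13, n_frames)

# An LSTM consumes a (time, features) sequence, so transpose before training.
features = mfcc.T.astype(np.float32)
print(features.shape)
```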
11

Xu, Zheng Guang, Chen Chen, and Xu Hong Liu. "An Efficient View-Point Invariant Detector and Descriptor." Advanced Materials Research 659 (January 2013): 143–48. http://dx.doi.org/10.4028/www.scientific.net/amr.659.143.

Abstract:
Many computer vision applications need keypoint correspondence between images taken under different view conditions. Generally speaking, traditional algorithms target applications with either good invariance to affine transformation or high speed of computation. Nowadays, the wide use of computer vision algorithms on handheld devices such as mobile phones and on embedded devices with low memory and computation capability has set the goal of making descriptors faster to compute and more compact while remaining robust to affine transformation and noise. To best address the whole process, this paper covers keypoint detection, description, and matching. Binary descriptors are computed by comparing the intensities of pairs of sampling points in image patches, and they are matched by Hamming distance using an SSE 4.2-optimized popcount. In the experimental results, we show that our algorithm is fast to compute with lower memory usage and is invariant to view-point change, blur change, brightness change, and JPEG compression.
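The matching step described here boils down to XOR plus popcount. A small sketch follows, with random 256-bit integers standing in for real descriptors and Python's int.bit_count() (Python 3.10+) playing the role of the SSE 4.2 POPCNT instruction.

```python
# Brute-force nearest-neighbour matching of binary descriptors by Hamming
# distance; descriptors here are random stand-ins for real ones.
import random

def hamming(d1: int, d2: int) -> int:
    return (d1 ^ d2).bit_count()   # number of differing bits

random.seed(0)
descriptors_a = [random.getrandbits(256) for _ in range(5)]
descriptors_b = [random.getrandbits(256) for _ in range(5)]

for i, da in enumerate(descriptors_a):
    j, dist = min(((j, hamming(da, db)) for j, db in enumerate(descriptors_b)),
                  key=lambda t: t[1])
    print(f"descriptor {i} -> match {j} (distance {dist})")
```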
12

Wan, Zhe, Tianyi Wang, Yiming Zhou, Subramanian S. Iyer, and Vwani P. Roychowdhury. "Accuracy and Resiliency of Analog Compute-in-Memory Inference Engines." ACM Journal on Emerging Technologies in Computing Systems 18, no. 2 (April 30, 2022): 1–23. http://dx.doi.org/10.1145/3502721.

Abstract:
Recently, analog compute-in-memory (CIM) architectures based on emerging analog non-volatile memory (NVM) technologies have been explored for deep neural networks (DNNs) to improve scalability, speed, and energy efficiency. Such architectures, however, leverage charge conservation, an operation with infinite resolution, and thus are susceptible to errors; the inherent stochasticity in any analog NVM used to execute DNNs will compromise performance. Several reports have demonstrated the use of analog NVM for CIM at a limited scale, but it is unclear whether the uncertainties in computation will prohibit large-scale DNNs. To explore this critical issue of scalability, this article first presents a simulation framework to evaluate the feasibility of large-scale DNNs based on CIM architecture and analog NVM. Simulation results show that DNNs trained for high-precision digital computing engines are not resilient against the uncertainty of analog NVM devices. To avoid such catastrophic failures, this article introduces an analog bi-scale representation for the DNN and a Hessian-aware stochastic gradient descent training algorithm to enhance the inference accuracy of trained DNNs. As a result of these enhancements, DNNs such as Wide ResNets for the CIFAR-100 image recognition problem are demonstrated to have significant improvements in accuracy without adding cost to the inference hardware.
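A minimal version of the resiliency experiment this abstract describes can be mocked up in a few lines: perturb a trained linear layer with a multiplicative Gaussian device-noise model (our assumption, not the paper's device model) and watch the prediction accuracy degrade.

```python
# Probe how argmax predictions of a linear layer degrade when its weights
# are re-drawn with analog-device noise on every inference.
import numpy as np

rng = np.random.default_rng(1)

def accuracy_under_noise(W, X, y, sigma, trials=20):
    """Mean fraction of correct argmax predictions over noisy re-draws."""
    hits = 0.0
    for _ in range(trials):
        W_noisy = W * (1 + rng.normal(0, sigma, W.shape))  # multiplicative noise
        hits += np.mean((X @ W_noisy).argmax(axis=1) == y)
    return hits / trials

W = rng.standard_normal((32, 10))
X = rng.standard_normal((512, 32))
y = (X @ W).argmax(axis=1)            # labels of the noise-free model

for sigma in (0.0, 0.05, 0.2):
    print(f"sigma={sigma}: accuracy={accuracy_under_noise(W, X, y, sigma):.3f}")
```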
13

Fung, Larry S. K., Mohammad O. Sindi, and Ali H. Dogru. "Multiparadigm Parallel Acceleration for Reservoir Simulation." SPE Journal 19, no. 04 (January 6, 2014): 716–25. http://dx.doi.org/10.2118/163591-pa.

Abstract:
With the advent of the multicore central-processing unit (CPU), today's commodity PC clusters are effectively a collection of interconnected parallel computers, each with multiple multicore CPUs and large shared random access memory (RAM), connected together by means of high-speed networks. Each computer, referred to as a compute node, is a powerful parallel computer on its own. Each compute node can be equipped further with acceleration devices such as the general-purpose graphical processing unit (GPGPU) to further speed up computational-intensive portions of the simulator. Reservoir-simulation methods that can exploit this heterogeneous hardware system can be used to solve very-large-scale reservoir-simulation models and run significantly faster than conventional simulators. Because typical PC clusters are essentially distributed shared-memory computers, this suggests that the use of mixed-paradigm parallelism (distributed-shared memory), such as message-passing interface and open multiprocessing (MPI-OMP), should work well for computational efficiency and memory use. In this work, we compare and contrast the single-paradigm programming models, MPI or OMP, with the mixed-paradigm, MPI-OMP, programming model for a class of solver methods that is suited to the different modes of parallelism. The results showed that the distributed-memory (MPI-only) model has superior multicompute-node scalability, whereas the shared-memory (OMP-only) model has superior parallel performance on a single compute node. The mixed MPI-OMP model and OMP-only model are more memory-efficient for the multicore architecture than the MPI-only model because they require less or no halo-cell storage for the subdomains. To exploit the fine-grain shared-memory parallelism available on the GPGPU architecture, algorithms should be suited to single-instruction multiple-data (SIMD) parallelism, and any recursive operations are serialized. In addition, solver methods and data stores need to be reworked to coalesce memory access and to avoid shared memory-bank conflicts. Wherever possible, the cost of data transfer through the peripheral component interconnect express (PCIe) bus between the CPU and GPGPU needs to be hidden by means of asynchronous communication. We applied multiparadigm parallelism to accelerate compositional reservoir simulation on a GPGPU-equipped PC cluster. On a dual-CPU-dual-GPGPU compute node, the parallelized solver running on the dual-GPGPU Fermi M2090Q achieved up to 19 times speedup over the serial CPU (1-core) results and up to 3.7 times speedup over the parallel dual-CPU X5675 results in a mixed MPI + OMP paradigm for a 1.728-million-cell compositional model. Parallel performance shows a strong dependency on the subdomain sizes. Parallel CPU solve has a higher performance for smaller domain partitions, whereas GPGPU solve requires large partitions for each chip for good parallel performance. This is related to improved cache efficiency on the CPU for small subdomains and the loading requirement for massive parallelism on the GPGPU. Therefore, for a given model, the multinode parallel performance decreases for the GPGPU relative to the CPU as the model is further subdivided into smaller subdomains to be solved on more compute nodes. To illustrate this, a modified SPE5 (Killough and Kossack 1987) model with various grid dimensions was run to generate comparative results.

Parallel performances for three field compositional models of various sizes and dimensions are included to further elucidate and contrast CPU-GPGPU single-node and multiple-node performances. A PC cluster with the Tesla M2070Q GPGPU and the 6-core Xeon X5675 Westmere was used to produce the majority of the reported results. Another PC cluster with the Tesla M2090Q GPGPU was available for some cases, and the results are reported for the modified SPE5 (Killough and Kossack 1987) problems for comparison.
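As a hint of what the MPI half of the mixed MPI-OMP paradigm looks like in practice, here is a minimal halo-exchange sketch using mpi4py (our choice of binding; the paper's simulator is not public). Thread-level OMP parallelism within each rank is omitted.

```python
# 1-D domain decomposition: each rank owns a subdomain and exchanges one
# halo cell with its neighbours. Run with e.g. `mpiexec -n 4 python halo.py`.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_local = 8                                  # interior cells per subdomain
u = np.full(n_local + 2, float(rank))        # +2 halo cells at the ends

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# Exchange halos with both neighbours (Sendrecv avoids deadlock).
comm.Sendrecv(sendbuf=u[1:2], dest=left, recvbuf=u[-1:], source=right)
comm.Sendrecv(sendbuf=u[-2:-1], dest=right, recvbuf=u[0:1], source=left)

print(f"rank {rank}: halos = ({u[0]}, {u[-1]})")
```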
14

Wilde, D., and S. Rajopadhye. "Memory Reuse Analysis in the Polyhedral Model." Parallel Processing Letters 07, no. 02 (June 1997): 203–15. http://dx.doi.org/10.1142/s0129626497000218.

Abstract:
In the context of developing a compiler for ALPHA, a functional data-parallel language based on systems of affine recurrence equations (SAREs), we address the problem of transforming scheduled single-assignment code into multiple-assignment code. We show how the polyhedral model allows us to statically compute the lifetimes of program variables, and thus enables us to derive necessary and sufficient conditions for reusing memory.
15

Bhaskar, Archana, and Rajeev Ranjan. "Optimized memory model for hadoop map reduce framework." International Journal of Electrical and Computer Engineering (IJECE) 9, no. 5 (October 1, 2019): 4396. http://dx.doi.org/10.11591/ijece.v9i5.pp4396-4407.

Abstract:
MapReduce is the preferred computing framework for large-scale data analysis and processing applications. Hadoop is a widely used MapReduce framework across different communities due to its open-source nature. Cloud service providers such as Microsoft Azure HDInsight offer resources to their customers, who pay only for what they use. A critical challenge for a cloud service provider, however, is to meet user task service-level agreement (SLA) requirements (task deadlines). Currently, the onus is on the client to compute the amount of resources required to run a job on the cloud. This work presents a novel memory optimization model for the Hadoop MapReduce framework, namely MOHMR (Optimized Hadoop Map Reduce), to process data in real time and utilize system resources efficiently. MOHMR presents an accurate model to compute job memory optimization and also a model to provision the amount of cloud resources required to meet a task deadline. MOHMR first builds a profile for each job and then computes the memory optimization time of the job using a greedy approach. Experiments were conducted on the Microsoft Azure HDInsight cloud platform considering different applications, such as text computing and bioinformatics, to evaluate the performance of MOHMR over existing models; the results show significant improvement in terms of computation time. Overall, good correlation is reported between practical and theoretical memory optimization values.
16

Luo, Yandong, and Shimeng Yu. "AILC: Accelerate On-Chip Incremental Learning With Compute-in-Memory Technology." IEEE Transactions on Computers 70, no. 8 (August 1, 2021): 1225–38. http://dx.doi.org/10.1109/tc.2021.3053199.

17

Yu, Shimeng, Hongwu Jiang, Shanshi Huang, Xiaochen Peng, and Anni Lu. "Compute-in-Memory Chips for Deep Learning: Recent Trends and Prospects." IEEE Circuits and Systems Magazine 21, no. 3 (2021): 31–56. http://dx.doi.org/10.1109/mcas.2021.3092533.

18

Spetalnick, Samuel, and Arijit Raychowdhury. "A Practical Design-Space Analysis of Compute-in-Memory With SRAM." IEEE Transactions on Circuits and Systems I: Regular Papers 69, no. 4 (April 2022): 1466–79. http://dx.doi.org/10.1109/tcsi.2021.3138057.

19

Thirumala, Sandeep Krishna, Yi-Tse Hung, Shubham Jain, Arnab Raha, Niharika Thakuria, Vijay Raghunathan, Anand Raghunathan, Zhihong Chen, and Sumeet Gupta. "Valley-Coupled-Spintronic Non-Volatile Memories With Compute-In-Memory Support." IEEE Transactions on Nanotechnology 19 (2020): 635–47. http://dx.doi.org/10.1109/tnano.2020.3012550.

20

Yu, Shimeng. "Special Topic on Exploratory Devices and Circuits for Compute-in-Memory." IEEE Journal on Exploratory Solid-State Computational Devices and Circuits 6, no. 1 (June 2020): iii—iv. http://dx.doi.org/10.1109/jxcdc.2020.3001859.

21

Jung, Daejin, Sunjung Lee, Wonjong Rhee, and Jung Ho Ahn. "Partitioning Compute Units in CNN Acceleration for Statistical Memory Traffic Shaping." IEEE Computer Architecture Letters 17, no. 1 (January 1, 2018): 72–75. http://dx.doi.org/10.1109/lca.2017.2773055.

22

Seo, Jae-Sun. "Special Topic on Energy-Efficient Compute-in-Memory With Emerging Devices." IEEE Journal on Exploratory Solid-State Computational Devices and Circuits 8, no. 2 (December 2022): iii—v. http://dx.doi.org/10.1109/jxcdc.2022.3231764.

23

HUBRECHTS, HENDRIK. "MEMORY EFFICIENT HYPERELLIPTIC CURVE POINT COUNTING." International Journal of Number Theory 07, no. 01 (February 2011): 203–14. http://dx.doi.org/10.1142/s1793042111004034.

Abstract:
In recent algorithms that use deformation to compute the number of points on varieties over a finite field, certain differential equations of matrices over p-adic fields emerge. We present a novel strategy to solve this kind of equation in a memory-efficient way. The main application is an algorithm requiring quasi-cubic time and only quadratic memory in the parameter n that solves the following problem: for E a hyperelliptic curve of genus g over a finite field of extension degree n and small characteristic, compute its zeta function. This improves substantially upon Kedlaya's result, which has the same quasi-cubic time asymptotics but also requires cubic memory.
24

Wu, Chenyuan, Mohammad Javad Amiri, Jared Asch, Heena Nagda, Qizhen Zhang, and Boon Thau Loo. "FlexChain." Proceedings of the VLDB Endowment 16, no. 1 (September 2022): 23–36. http://dx.doi.org/10.14778/3561261.3561264.

Abstract:
While permissioned blockchains enable a family of data center applications, existing systems suffer from imbalanced loads across compute and memory, exacerbating the underutilization of cloud resources. This paper presents FlexChain, a novel permissioned blockchain system that addresses this challenge by physically disaggregating CPUs, DRAM, and storage devices to process different blockchain workloads efficiently. Disaggregation allows blockchain service providers to upgrade and expand hardware resources independently to support a wide range of smart contracts with diverse CPU and memory demands. Moreover, it ensures efficient resource utilization and hence prevents resource fragmentation in a data center. We have explored the design of XOV blockchain systems in a disaggregated fashion and developed a tiered key-value store that can elastically scale its memory and storage. Our design significantly speeds up the execution stage. We have also leveraged several techniques to parallelize the validation stage in FlexChain to further improve the overall blockchain performance. Our evaluation results show that FlexChain can provide independent compute and memory scalability, while incurring at most 12.8% disaggregation overhead. FlexChain achieves almost identical throughput as the state-of-the-art distributed approaches with significantly lower memory and CPU consumption for compute-intensive and memory-intensive workloads respectively.
25

Harshavardhan, K. S. "Programming in OpenCL and its advantages in a GPU Framework." International Journal for Research in Applied Science and Engineering Technology 10, no. 7 (July 31, 2022): 3739–43. http://dx.doi.org/10.22214/ijraset.2022.45835.

Abstract:
OpenCL is a framework for building applications that run on heterogeneous platforms containing CPUs, GPUs, and DSPs. It provides an interface for parallel programming that can be used to take advantage of a GPU's highly parallel computing power, and it is mostly used by programmers who need complete control over the parallelization process and who are required to write portable heterogeneous code. OpenCL views a processing unit as a collection of compute units, which in turn are made up of work items. Within OpenCL's workflow and memory hierarchy, each work item behaves like a thread in terms of its control flow and memory model; a collection of work items is called a work group, which is mapped to a compute unit. The language used to write “compute kernels” is called the kernel language; OpenCL uses C/C++ to express the kernel computations executed on the device. The host code specifies everything needed for the computation on the device, including creating buffers, calling kernels, and mapping memory back from the device to the CPU. OpenCL also has specific optimization techniques that help improve parallelization when computing on a GPU, which results in better performance numbers.
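To ground the work-item model this abstract describes, a minimal host-plus-kernel example follows, using the pyopencl binding (our choice; any OpenCL host language works the same way). Each of the 1024 work items computes one element of a vector sum.

```python
# Minimal OpenCL vector add: the host builds buffers and a kernel, then
# launches 1024 work items, one per output element.
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

a = np.random.rand(1024).astype(np.float32)
b = np.random.rand(1024).astype(np.float32)

mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

kernel = """
__kernel void add(__global const float *a,
                  __global const float *b,
                  __global float *out) {
    int gid = get_global_id(0);   // this work item's index
    out[gid] = a[gid] + b[gid];
}
"""
prog = cl.Program(ctx, kernel).build()
prog.add(queue, a.shape, None, a_buf, b_buf, out_buf)  # global size = 1024

out = np.empty_like(a)
cl.enqueue_copy(queue, out, out_buf)
assert np.allclose(out, a + b)
```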
26

AKL, SELIM G. "THREE COUNTEREXAMPLES TO DISPEL THE MYTH OF THE UNIVERSAL COMPUTER." Parallel Processing Letters 16, no. 03 (September 2006): 381–403. http://dx.doi.org/10.1142/s012962640600271x.

Abstract:
It is shown that the concept of a Universal Computer cannot be realized. Specifically, instances of a computable function [Formula: see text] are exhibited that cannot be computed on any machine [Formula: see text] that is capable of only a finite and fixed number of operations per step. This remains true even if the machine [Formula: see text] is endowed with an infinite memory and the ability to communicate with the outside world while it is attempting to compute [Formula: see text]. It also remains true if, in addition, [Formula: see text] is given an indefinite amount of time to compute [Formula: see text]. This result applies not only to idealized models of computation, such as the Turing Machine and the like, but also to all known general-purpose computers, including existing conventional computers (both sequential and parallel), as well as contemplated unconventional ones such as biological and quantum computers. Even accelerating machines (that is, machines that increase their speed at every step) cannot be universal.
27

Suresh, Naveen, Neelesh Chinnakonda Ashok Kumar, Srikumar Subramanian, and Gowri Srinivasa. "Memory augmented recurrent neural networks for de-novo drug design." PLOS ONE 17, no. 6 (June 23, 2022): e0269461. http://dx.doi.org/10.1371/journal.pone.0269461.

Abstract:
A recurrent neural network (RNN) is a machine learning model that learns the relationship between elements of an input series, in addition to inferring a relationship between the data input to the model and the target output. Memory augmentation allows the RNN to learn the interrelationships between elements of the input over a protracted length of the input series. Inspired by the success of the stack-augmented RNN (StackRNN) in generating strings for various applications, we present two memory-augmented RNN-based architectures, the Neural Turing Machine (NTM) and the Differentiable Neural Computer (DNC), for the de-novo generation of small molecules. We trained a character-level convolutional neural network (CNN) to predict the properties of a generated string and compute a reward or loss in a deep reinforcement learning setup to bias the generator to produce molecules with the desired property. Further, we compare the performance of these architectures to gain insight into their relative merits in terms of the validity and novelty of the generated molecules and the degree of property bias towards the computational generation of de-novo drugs. We also compare the performance of these architectures with simpler recurrent neural networks (vanilla RNN, LSTM, and GRU) without an external memory component to explore the impact of augmented memory in the task of de-novo generation of small molecules.
28

Choe, Gihun, and Shimeng Yu. "(Invited) Impact of Polarization Variation on Ferroelectric Field-Effect Transistor and Compute-in-Memory." ECS Transactions 109, no. 4 (September 30, 2022): 73–85. http://dx.doi.org/10.1149/10904.0073ecst.

Abstract:
The discovery of hafnia-based ferroelectric materials made the ferroelectric field-effect transistor (FeFET) a promising nonvolatile memory device and enabled aggressive scaling. However, the ferroelectric layer possesses polarization variation (PV) induced by its crystallinity, and FeFETs therefore suffer from performance variation. Hence, it is critical to assess this influence quantitatively in order to utilize the FeFET for storage or compute-in-memory applications. In this review, recent trends in and progress on FeFET performance variation are surveyed. We present the impact of PV on three-dimensional (3D) FeFETs and, to show their capability for compute-in-memory applications, discuss the inference accuracy under these structures. Next, the Voronoi diagram is introduced to model the different sizes and shapes of ferroelectric grains. A comparative study of device variability from different sources is carried out under different technology nodes. Finally, a machine learning-aided methodology to analyze the variability of FeFETs based on metrology data is proposed.
29

Lu, Anni, Xiaochen Peng, Yandong Luo, and Shimeng Yu. "Benchmark of the Compute-in-Memory-Based DNN Accelerator With Area Constraint." IEEE Transactions on Very Large Scale Integration (VLSI) Systems 28, no. 9 (September 2020): 1945–52. http://dx.doi.org/10.1109/tvlsi.2020.3001526.

30

Floréen, Patrik, and Pekka Orponen. "Attraction Radii in Binary Hopfield Nets are Hard to Compute." Neural Computation 5, no. 5 (September 1993): 812–21. http://dx.doi.org/10.1162/neco.1993.5.5.812.

Abstract:
We prove that it is an NP-hard problem to determine the attraction radius of a stable vector in a binary Hopfield memory network, and even that the attraction radius is hard to approximate. Under synchronous updating, the problems are already NP-hard for two-step attraction radii; direct (one-step) attraction radii can be computed in polynomial time.
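While the paper proves that computing attraction radii is NP-hard in general, the one-step (direct) radius it mentions as polynomially computable is easy to probe empirically for a small network: flip k bits of a stored pattern and check whether one synchronous update restores it. The network size and single stored pattern below are toy choices.

```python
# Empirically find the one-step attraction radius of a stored pattern in a
# binary Hopfield network under synchronous updating.
import itertools
import numpy as np

rng = np.random.default_rng(2)
n = 12
xi = rng.choice([-1, 1], size=n)            # one stored pattern
W = np.outer(xi, xi) - np.eye(n)            # Hebbian weights, zero diagonal

def one_step_attracts(k: int) -> bool:
    """True if every k-bit corruption is repaired by one synchronous update."""
    for flips in itertools.combinations(range(n), k):
        x = xi.copy()
        x[list(flips)] *= -1
        if not np.array_equal(np.sign(W @ x), xi):
            return False
    return True

radius = 0
while one_step_attracts(radius + 1):
    radius += 1
print("one-step attraction radius:", radius)
```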
31

Zhang, Yingqiang, Chaoyi Ruan, Cheng Li, Xinjun Yang, Wei Cao, Feifei Li, Bo Wang, et al. "Towards cost-effective and elastic cloud database deployment via memory disaggregation." Proceedings of the VLDB Endowment 14, no. 10 (June 2021): 1900–1912. http://dx.doi.org/10.14778/3467861.3467877.

Abstract:
It is challenging for cloud-native relational databases to meet the ever-increasing needs of scaling compute and memory resources independently and elastically. The recent emergence of memory disaggregation architecture, relying on high-speed RDMA network, offers opportunities to build cost-effective and elastic cloud-native databases. There exist proposals to let unmodified applications run transparently on disaggregated systems. However, running relational database kernel atop such proposals experiences notable performance degradation and time-consuming failure recovery, offsetting the benefits of disaggregation. To address these challenges, in this paper, we propose a novel database architecture called LegoBase, which explores the co-design of database kernel and memory disaggregation. It pushes the memory management back to the database layer for bypassing the Linux I/O stack and re-using or designing (remote) memory access optimizations with an understanding of data access patterns. LegoBase further splits the conventional ARIES fault tolerance protocol to independently handle the local and remote memory failures for fast recovery of compute instances. We implemented LegoBase atop MySQL. We compare LegoBase against MySQL running on a standalone machine and the state-of-the-art disaggregation proposal Infiniswap. Our evaluation shows that even with a large fraction of data placed on the remote memory, LegoBase's system performance in terms of throughput (up to 9.41% drop) and P99 latency (up to 11.58% increase) is comparable to the monolithic MySQL setup, and significantly outperforms (1.99x-2.33x, respectively) the deployment of MySQL over Infiniswap. Meanwhile, LegoBase introduces an up to 3.87x and 5.48x speedup of the recovery and warm-up time, respectively, over the monolithic MySQL and MySQL over Infiniswap, when handling failures or planned re-configurations.
32

Ajani, Taiwo Samuel, Agbotiname Lucky Imoize, and Aderemi A. Atayero. "An Overview of Machine Learning within Embedded and Mobile Devices–Optimizations and Applications." Sensors 21, no. 13 (June 28, 2021): 4412. http://dx.doi.org/10.3390/s21134412.

Abstract:
Embedded systems technology is undergoing a phase of transformation owing to novel advancements in computer architecture and breakthroughs in machine learning applications. The application areas of embedded machine learning (EML) include accurate computer vision schemes, reliable speech recognition, innovative healthcare, robotics, and more. However, there exists a critical drawback in the efficient implementation of ML algorithms targeting embedded applications: machine learning algorithms are generally computationally and memory intensive, making them unsuitable for resource-constrained environments such as embedded and mobile devices. In order to efficiently implement these compute- and memory-intensive algorithms within the embedded and mobile computing space, innovative optimization techniques are required at the algorithm and hardware levels. To this end, this survey explores current research trends within this circumference. First, we present a brief overview of compute-intensive machine learning algorithms such as hidden Markov models (HMMs), k-nearest neighbors (k-NNs), support vector machines (SVMs), Gaussian mixture models (GMMs), and deep neural networks (DNNs). Furthermore, we consider different optimization techniques currently adopted to squeeze these computational and memory-intensive algorithms within resource-limited embedded and mobile environments. Additionally, we discuss the implementation of these algorithms in microcontroller units, mobile devices, and hardware accelerators. In conclusion, we give a comprehensive overview of key application areas of EML technology, point out key research directions, and highlight key take-away lessons for future research exploration in the embedded machine learning domain.
33

Pleiter, Dirk. "HPC Systems in the Next Decade – What to Expect, When, Where." EPJ Web of Conferences 245 (2020): 11004. http://dx.doi.org/10.1051/epjconf/202024511004.

Abstract:
HPC systems have seen impressive performance growth over a period of many years, and the next milestone is expected to be reached soon with the deployment of exascale systems in 2021. In this paper, we provide an overview of the exascale challenges from a computer architecture perspective and explore technological and other constraints. The analysis of upcoming architectural options and emerging technologies allows us to set expectations for application developers, who will have to cope with heterogeneous architectures, increasingly diverse compute technologies, and deeper memory and storage hierarchies. Finally, we discuss needs resulting from changing science and engineering workflows, which must be addressed by making HPC systems available as part of more open e-infrastructures that also provide other compute and storage services.
34

Marinescu, Radu, Akihiro Kishimoto, Adi Botea, Rina Dechter, and Alexander Ihler. "Anytime Recursive Best-First Search for Bounding Marginal MAP." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 7924–32. http://dx.doi.org/10.1609/aaai.v33i01.33017924.

Abstract:
Marginal MAP is a difficult mixed inference task for graphical models. Existing state-of-the-art solvers for this task are based on a hybrid best-first and depth-first search scheme that allows them to compute upper and lower bounds on the optimal solution value in an anytime fashion. These methods, however, are memory-intensive schemes (via the best-first component) and do not have an efficient memory management mechanism. For this reason, they are often less effective in practice, especially on difficult problem instances with very large search spaces. In this paper, we introduce a new recursive best-first search based bounding scheme that operates efficiently within limited memory and computes anytime upper and lower bounds that improve over time. An empirical evaluation demonstrates the effectiveness of our proposed approach against current solvers.
35

AUSIELLO, GIORGIO, ANDREA RIBICHINI, PAOLO G. FRANCIOSA, and GIUSEPPE F. ITALIANO. "COMPUTING GRAPH SPANNERS IN SMALL MEMORY: FAULT-TOLERANCE AND STREAMING." Discrete Mathematics, Algorithms and Applications 02, no. 04 (December 2010): 591–605. http://dx.doi.org/10.1142/s1793830910000905.

Abstract:
Let G be an undirected graph with m edges and n vertices. A spanner of G is a subgraph which preserves approximate distances between all pairs of vertices. An f-vertex fault-tolerant spanner is a subgraph which preserves approximate distances under the failure of any set of at most f vertices. The contribution of this paper is twofold: we present algorithms for computing fault-tolerant spanners, and we propose streaming algorithms for computing spanners in very small internal memory. In particular, we give algorithms for computing f-vertex fault-tolerant (3,2)- and (2,1)-spanners of G with the following bounds: our (3,2)-spanner contains O(f^(4/3) n^(4/3)) edges and can be computed in time Õ(f^2 m), while our (2,1)-spanner contains O(f n^(3/2)) edges and can be computed in time [Formula: see text]. Both algorithms improve significantly on previously known bounds. Assume that the graph G is presented as an input stream of edges, which may appear in any arbitrary order, and that we do not know m and n in advance. We show how to compute (3,2)- and (2,1)-spanners of G efficiently, using only very small internal memory and a slow-access external memory device. Our spanners have asymptotically optimal size, and the I/O complexity of our algorithms for computing such spanners is optimal up to a polylogarithmic factor. Our f-vertex fault-tolerant (3,2)- and (2,1)-spanners can also be computed efficiently in the same computational model described above.
36

Álvarez-Bueno, Celia, Vicente Martínez-Vizcaíno, Estela Jiménez López, María Eugenia Visier-Alfonso, Andrés Redondo-Tébar, and Iván Cavero-Redondo. "Comparative Effect of Low-Glycemic Index versus High-Glycemic Index Breakfasts on Cognitive Function: A Systematic Review and Meta-Analysis." Nutrients 11, no. 8 (July 24, 2019): 1706. http://dx.doi.org/10.3390/nu11081706.

Abstract:
This systematic review and meta-analysis aims to compare the effect of High-Glycemic Index (GI) versus Low-GI breakfasts on cognitive functions, including memory and attention, of children and adolescents. We systematically searched the MEDLINE (via PubMed), EMBASE, Cochrane Central Register of Controlled Trials, Cochrane Database of Systematic Reviews, and Web of Science databases, from their inception until June 2019. Articles comparing the effect of Low-GI versus High-GI breakfasts on the cognitive function (i.e., immediate memory, delayed memory, and attention) of children and adolescents were included. The DerSimonian and Laird method was used to compute the pooled effect sizes (ESs) and their respective 95% confidence intervals (CIs). The pooled ESs were 0.13 (95% CI: −0.11, 0.37) for immediate memory and 0.07 (95% CI: −0.15, 0.28) for delayed memory. For attention, the pooled ES was −0.01 (95% CI: −0.27, 0.26). In summary, GI breakfasts do not affect cognitive domains in children and adolescents.
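The DerSimonian and Laird pooling used in this meta-analysis fits in a few lines: given per-study effect sizes and variances, estimate the between-study variance tau^2 and the random-effects pooled effect with its 95% CI. The input numbers below are invented for illustration, not the study's data.

```python
# DerSimonian-Laird random-effects meta-analysis in plain numpy.
import numpy as np

def dersimonian_laird(es, var):
    es, var = np.asarray(es, float), np.asarray(var, float)
    w = 1.0 / var                                   # fixed-effect weights
    fixed = np.sum(w * es) / np.sum(w)
    q = np.sum(w * (es - fixed) ** 2)               # Cochran's Q
    df = len(es) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                   # between-study variance
    w_star = 1.0 / (var + tau2)                     # random-effects weights
    pooled = np.sum(w_star * es) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

es, ci = dersimonian_laird([0.25, -0.05, 0.10], [0.04, 0.03, 0.05])
print(f"pooled ES = {es:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```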
37

Huang, Shanshi, Hongwu Jiang, Xiaochen Peng, Wantong Li, and Shimeng Yu. "Secure XOR-CIM Engine: Compute-In-Memory SRAM Architecture With Embedded XOR Encryption." IEEE Transactions on Very Large Scale Integration (VLSI) Systems 29, no. 12 (December 2021): 2027–39. http://dx.doi.org/10.1109/tvlsi.2021.3120296.

38

Choe, Gihun, Anni Lu, and Shimeng Yu. "3D AND-Type Ferroelectric Transistors for Compute-in-Memory and the Variability Analysis." IEEE Electron Device Letters 43, no. 2 (February 2022): 304–7. http://dx.doi.org/10.1109/led.2021.3139574.

39

Peters, Adaranijo, George Oikonomou, and Georgios Zervas. "In Compute/Memory Dynamic Packet/Circuit Switch Placement for Optically Disaggregated Data Centers." Journal of Optical Communications and Networking 10, no. 7 (June 29, 2018): B164. http://dx.doi.org/10.1364/jocn.10.00b164.

40

Zhang, Zhixiao, Xin Si, Srivatsa Srinivasa, Akshay Krishna Ramanathan, and Meng-Fan Chang. "Recent Advances in Compute-in-Memory Support for SRAM Using Monolithic 3-D Integration." IEEE Micro 39, no. 6 (November 1, 2019): 28–37. http://dx.doi.org/10.1109/mm.2019.2946489.

41

Kim, Dong-Hwan, Su-Yong Lee, Yonggi Jo, Duk Y. Kim, Zaeill Kim, and Taek Jeong. "A Method to Compute the Schrieffer–Wolff Generator for Analysis of Quantum Memory." Entropy 23, no. 10 (September 27, 2021): 1260. http://dx.doi.org/10.3390/e23101260.

Abstract:
Quantum illumination uses entangled light consisting of signal and idler modes to achieve a higher detection rate of a low-reflectivity object in noisy environments. The best performance of quantum illumination is achieved by measuring the returned signal mode together with the idler mode; thus, it is necessary to prepare a quantum memory that can keep the idler mode ideal. To send a signal towards a long-distance target, entangled light in the microwave regime is used, and a microwave quantum memory using microwave cavities coupled with a transmon qubit was recently demonstrated. We propose an ordering of bosonic operators to efficiently compute the Schrieffer–Wolff transformation generator for analyzing such a quantum memory. Our proposed method is applicable to a wide class of systems described by bosonic operators whose interaction part represents a definite number of quanta transferred.
42

Giannoula, Christina, Kailong Huang, Jonathan Tang, Nectarios Koziris, Georgios Goumas, Zeshan Chishti, and Nandita Vijaykumar. "Architectural Support for Efficient Data Movement in Fully Disaggregated Systems." ACM SIGMETRICS Performance Evaluation Review 51, no. 1 (June 26, 2023): 5–6. http://dx.doi.org/10.1145/3606376.3593533.

Abstract:
Traditional data centers include monolithic servers that tightly integrate CPU, memory and disk (Figure 1a). Instead, Disaggregated Systems (DSs) [8, 13, 18, 27] organize multiple compute (CC), memory (MC) and storage devices as independent, failure-isolated components interconnected over a high-bandwidth network (Figure 1b). DSs can greatly reduce data center costs by providing improved resource utilization, resource scaling, failure-handling and elasticity in modern data centers [5, 8-10, 11, 13, 18, 27]. The MCs provide large pools of main memory (remote memory), while the CCs include the on-chip caches and a few GBs of DRAM (local memory) that acts as a cache of remote memory. In this context, a large fraction of the application's data (~80%) [8, 18, 27] is located in remote memory, which can cause large performance penalties from remotely accessing data over the network.
43

Neider, Daniel, Alexander Weinert, and Martin Zimmermann. "Synthesizing optimally resilient controllers." Acta Informatica 57, no. 1-2 (October 31, 2019): 195–221. http://dx.doi.org/10.1007/s00236-019-00345-7.

Abstract:
Recently, Dallal, Neider, and Tabuada studied a generalization of the classical game-theoretic model used in program synthesis, which additionally accounts for unmodeled intermittent disturbances. In this extended framework, one is interested in computing optimally resilient strategies, i.e., strategies that are resilient against as many disturbances as possible. Dallal, Neider, and Tabuada showed how to compute such strategies for safety specifications. In this work, we compute optimally resilient strategies for a much wider range of winning conditions and show that they do not require more memory than winning strategies in the classical model. Our algorithms only have a polynomial overhead in comparison to the ones computing winning strategies. In particular, for parity conditions, optimally resilient strategies are positional and can be computed in quasipolynomial time.
44

Teršek, Matija, Lojze Žust, and Matej Kristan. "eWaSR—An Embedded-Compute-Ready Maritime Obstacle Detection Network." Sensors 23, no. 12 (June 7, 2023): 5386. http://dx.doi.org/10.3390/s23125386.

Abstract:
Maritime obstacle detection is critical for safe navigation of autonomous surface vehicles (ASVs). While the accuracy of image-based detection methods has advanced substantially, their computational and memory requirements prohibit deployment on embedded devices. In this paper, we analyze the current best-performing maritime obstacle detection network, WaSR. Based on the analysis, we then propose replacements for the most computationally intensive stages and propose its embedded-compute-ready variant, eWaSR. In particular, the new design follows the most recent advancements of transformer-based lightweight networks. eWaSR achieves comparable detection results to state-of-the-art WaSR with only a 0.52% F1 score performance drop and outperforms other state-of-the-art embedded-ready architectures by over 9.74% in F1 score. On a standard GPU, eWaSR runs 10× faster than the original WaSR (115 FPS vs. 11 FPS). Tests on a real embedded sensor OAK-D show that, while WaSR cannot run due to memory restrictions, eWaSR runs comfortably at 5.5 FPS. This makes eWaSR the first practical embedded-compute-ready maritime obstacle detection network. The source code and trained eWaSR models are publicly available.
45

Ding, Yifan, Nicholas Botzer, and Tim Weninger. "HetSeq: Distributed GPU Training on Heterogeneous Infrastructure." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 17 (May 18, 2021): 15432–38. http://dx.doi.org/10.1609/aaai.v35i17.17813.

Abstract:
Modern deep learning systems like PyTorch and TensorFlow are able to train enormous models with billions (or trillions) of parameters on a distributed infrastructure. These systems require that the internal nodes have the same memory capacity and compute performance. Unfortunately, most organizations, especially universities, have a piecemeal approach to purchasing computer systems, resulting in a heterogeneous infrastructure that cannot be used to compute large models. The present work describes HetSeq, a software package adapted from the popular PyTorch package that provides the capability to train large neural network models on heterogeneous infrastructure. Experiments with language translation and text and image classification show that HetSeq scales over heterogeneous systems. Additional information, support documents, and source code are publicly available at https://github.com/yifding/hetseq.
46

Choe, Gihun, and Shimeng Yu. "(Invited) Impact of Polarization Variation on Ferroelectric Field-Effect Transistor and Compute-in-Memory." ECS Meeting Abstracts MA2022-02, no. 32 (October 9, 2022): 1184. http://dx.doi.org/10.1149/ma2022-02321184mtgabs.

Abstract:
The discovery of hafnia-based ferroelectric materials made the ferroelectric field-effect transistor (FeFET) a more promising nonvolatile memory device than ever. Compared to perovskite-based ferroelectrics, it can have a larger coercive field, better compatibility with CMOS fabrication, and good scalability. The importance of shrinking the FeFET cannot be overemphasized, just as for CMOS; however, scaling down devices results in device-to-device variation. In the case of the FeFET, the deposited ferroelectric layer possesses polarization variation (PV) induced by its crystallinity, and the FeFET therefore suffers from performance variation. Hence, it is critical to assess this influence quantitatively in order to utilize the FeFET for storage or compute-in-memory applications. In this review, recent trends in and progress on FeFET performance variation are surveyed. First, we present the impact of PV on three-dimensional (3D) NAND FeFETs and 3D AND FeFETs; in addition, to show their capability for compute-in-memory applications, the inference accuracy is discussed under these structures. Second, the Voronoi diagram is introduced to model the different sizes and shapes of ferroelectric grains. Third, a comparative study of device variability from different sources is carried out at the 28 nm to 7 nm technology nodes. Last, a machine learning-aided methodology to analyze the variability of the FeFET based on metrology results is proposed.
47

Pommerening, Florian, and Malte Helmert. "Incremental LM-Cut." Proceedings of the International Conference on Automated Planning and Scheduling 23 (June 2, 2013): 162–70. http://dx.doi.org/10.1609/icaps.v23i1.13560.

Abstract:
In heuristic search, and especially in optimal classical planning, the computation of accurate heuristic values can take up the majority of runtime. In many cases, the heuristic computations for a search node and its successors are very similar, leading to significant duplication of effort. For example, most landmarks of a node that are computed by the LM-cut algorithm are also landmarks for the node's successors. We propose to reuse these landmarks and incrementally compute new ones to speed up the LM-cut calculation. The speed advantage obtained by incremental computation is offset by higher memory usage. We investigate different search algorithms that reduce memory usage without sacrificing the faster computation, leading to a substantial increase in coverage for benchmark domains from the International Planning Competitions.
48

Roijers, Diederik Marijn, Shimon Whiteson, and Frans A. Oliehoek. "Computing Convex Coverage Sets for Faster Multi-objective Coordination." Journal of Artificial Intelligence Research 52 (March 31, 2015): 399–443. http://dx.doi.org/10.1613/jair.4550.

Abstract:
In this article, we propose new algorithms for multi-objective coordination graphs (MO-CoGs). Key to the efficiency of these algorithms is that they compute a convex coverage set (CCS) instead of a Pareto coverage set (PCS). Not only is a CCS a sufficient solution set for a large class of problems, it also has important characteristics that facilitate more efficient solutions. We propose two main algorithms for computing a CCS in MO-CoGs. Convex multi-objective variable elimination (CMOVE) computes a CCS by performing a series of agent eliminations, which can be seen as solving a series of local multi-objective subproblems. Variable elimination linear support (VELS) iteratively identifies the single weight vector, w, that can lead to the maximal possible improvement on a partial CCS and calls variable elimination to solve a scalarized instance of the problem for w. VELS is faster than CMOVE for small and medium numbers of objectives and can compute an ε-approximate CCS in a fraction of the runtime. In addition, we propose variants of these methods that employ AND/OR tree search instead of variable elimination to achieve memory efficiency. We analyze the runtime and space complexities of these methods, prove their correctness, and compare them empirically against a naive baseline and an existing PCS method, both in terms of memory-usage and runtime. Our results show that, by focusing on the CCS, these methods achieve much better scalability in the number of agents than the current state of the art.
49

Minhas, Umar Ibrahim, Roger Woods, and Georgios Karakonstantis. "Evaluation of Static Mapping for Dynamic Space-Shared Multi-task Processing on FPGAs." Journal of Signal Processing Systems 93, no. 5 (February 13, 2021): 587–602. http://dx.doi.org/10.1007/s11265-020-01633-z.

Abstract:
Whilst FPGAs have been used in cloud ecosystems, it remains extremely challenging to achieve high compute density when mapping heterogeneous multi-tasks onto shared resources at runtime. This work addresses the challenge by treating the FPGA resource as a service and employing multi-task processing at the high level, design space exploration, and static off-line partitioning in order to allow more efficient mapping of heterogeneous tasks onto the FPGA. In addition, a new, comprehensive runtime functional simulator is used to evaluate the effect of various spatial and temporal constraints on both the existing and new approaches when varying system design parameters. A comprehensive suite of real high-performance computing tasks was implemented on a Nallatech 385 FPGA card; the results show that our approach can provide on average 2.9× and 2.3× higher system throughput for compute- and mixed-intensity tasks, and 0.2× lower throughput for memory-intensive tasks due to external memory access latency and bandwidth limitations. The work is extended by introducing a novel scheduling scheme to enhance the temporal utilization of resources under the proposed approach. Additional results for large queues of mixed-intensity tasks (compute and memory) show that the proposed partitioning and scheduling approach can provide higher than 3× system speedup over previous schemes.
50

Lutteropp, Sarah, Alexey M. Kozlov, and Alexandros Stamatakis. "A fast and memory-efficient implementation of the transfer bootstrap." Bioinformatics 36, no. 7 (November 22, 2019): 2280–81. http://dx.doi.org/10.1093/bioinformatics/btz874.

Abstract:
Motivation: Recently, Lemoine et al. suggested the transfer bootstrap expectation (TBE) branch support metric as an alternative to classical phylogenetic bootstrap support for taxon-rich datasets. However, the original TBE implementation in the booster tool is compute- and memory-intensive.
Results: We developed a fast and memory-efficient TBE implementation. We improve upon the original algorithm by Lemoine et al. via several algorithmic and technical optimizations. On empirical as well as on random tree sets with varying taxon counts, our implementation is up to 480 times faster than booster. Furthermore, it only requires memory that is linear in the number of taxa, which leads to 10× to 40× memory savings compared with booster.
Availability and implementation: Our implementation has been partially integrated into pll-modules and RAxML-NG and is available under the GNU Affero General Public License v3.0 at https://github.com/ddarriba/pll-modules and https://github.com/amkozlov/raxml-ng. The parallel version that also computes additional TBE-related statistics is available at https://github.com/lutteropp/raxml-ng/tree/tbe.
Supplementary information: Supplementary data are available at Bioinformatics online.
We offer discounts on all premium plans for authors whose works are included in thematic literature selections. Contact us to get a unique promo code!
