Journal articles on the topic 'Cache memory'

To see the other types of publications on this topic, follow the link: Cache memory.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Cache memory.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Verbeek, N. A. M. "Food cache recovery by Northwestern Crows (Corvus caurinus)." Canadian Journal of Zoology 75, no. 8 (August 1, 1997): 1351–56. http://dx.doi.org/10.1139/z97-760.

Abstract:
This field study examined experimentally whether Northwestern Crows (Corvus caurinus) used random search or memory to relocate food caches. The crows cached food items in the ground, one per cache, and covered the cache before leaving the site. Most caches were recovered within 24 h. The crows found caches made by me 1 m from their own caches significantly less often than they found their own caches. Replacing the covering of the cache with material other than the crows used did not significantly affect recovery success, but the crows found significantly fewer of their caches when the latter were experimentally moved 15 cm. Adding a 25 cm long stick to the site 15 cm from the cache significantly decreased a crow's ability to relocate its cache, but not when it was placed 30 cm away. A 50 cm long stick placed 15 or 30 cm away had the same negative effect on a crow's ability to relocate its cache, but not when it was placed 45 cm away. When memory is used, recovery success can be as high as 99%; when random search is used, it can be as low as 6%.
2

Bednekoff, Peter A., and Russell P. Balda. "Social Caching and Observational Spatial Memory in Pinyon Jays." Behaviour 133, no. 11-12 (1996): 807–26. http://dx.doi.org/10.1163/156853996x00251.

Abstract:
In the wild, pinyon jays (Gymnorhinus cyanocephalus) live in large, integrated flocks and cache tens of thousands of seeds per year. This study explored social aspects of caching and recovering by pinyon jays. In Experiment 1, birds cached in a large experimental room under three conditions: alone, paired with a dominant, and paired with a subordinate. In all cases, birds recovered caches alone seven days later. Individuals ate more seeds before caching when alone than when paired and started caching sooner when subordinate than when dominant. Pinyon jays accurately returned to sites containing their own caches but not to sites containing caches made by partner birds. However, they went to areas containing partner caches sooner than would be expected, indicating memory for the general areas containing caches made by other pinyon jays. In Experiments 2 and 3 birds were placed closer to each other and allowed to recover one or two days after caching. In Experiment 2, both free-flying and caged observers found caches with accuracies above chance. Cachers made significantly fewer errors than observers. During Experiment 3, caged observers saw the cachers recover some seeds one day after they were cached. On the next day cachers and observers were separately allowed to visit all cache sites. Both cachers and observers performed accurately and did not differ in accuracy. Neither group discriminated between extant and depleted caches. Observational spatial memory in pinyon jays may allow economical cache robbery by wild pinyon jays under some circumstances, particularly shortly after caches are created.
3

DRACH, N., A. GEFFLAUT, P. JOUBERT, and A. SEZNEC. "ABOUT CACHE ASSOCIATIVITY IN LOW-COST SHARED MEMORY MULTI-MICROPROCESSORS." Parallel Processing Letters 05, no. 03 (September 1995): 475–87. http://dx.doi.org/10.1142/s0129626495000436.

Abstract:
Sizes of on-chip caches on current commercial microprocessors range from 16 Kbytes to 36 Kbytes. These microprocessors can be directly used in the design of a low-cost single-bus shared-memory multiprocessor without any second-level cache. In this paper, we explore the viability of such a multi-microprocessor. Simulation results clearly establish that the performance of such a system will be quite poor if the on-chip caches are direct-mapped. On the other hand, when the on-chip caches are partially associative, the achieved level of performance is quite promising. In particular, two recently proposed innovative cache structures, the skewed-associative cache organization and the semi-unified cache organization, are shown to work well.
4

Zhu, Wei, and Xiaoyang Zeng. "Decision Tree-Based Adaptive Reconfigurable Cache Scheme." Algorithms 14, no. 6 (June 1, 2021): 176. http://dx.doi.org/10.3390/a14060176.

Abstract:
Applications have different preferences for caches, sometimes even across their different running phases. Caches with fixed parameters may compromise the performance of a system. To solve this problem, we propose a real-time adaptive reconfigurable cache based on the decision tree algorithm, which can optimize the average memory access time of the cache without modifying the cache coherence protocol. By monitoring the running state of the application, the cache associativity is periodically tuned to the optimal associativity, as determined by the decision tree model. This paper implements the proposed decision tree-based adaptive reconfigurable cache in the GEM5 simulator and designs the key modules using Verilog HDL. The simulation results show that the proposed decision tree-based adaptive reconfigurable cache reduces the average memory access time compared with other adaptive algorithms.
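The average memory access time (AMAT) objective that such reconfiguration targets can be illustrated with a back-of-the-envelope calculation. The sketch below is not the paper's decision-tree model; it simply shows how one might pick, from a few hypothetical profiled (hit time, miss rate) pairs, the associativity that minimizes AMAT. All numbers are made up.

```python
# Minimal AMAT sketch: pick the cache associativity with the lowest
# average memory access time. The profiled numbers are hypothetical.

MISS_PENALTY_CYCLES = 100  # assumed cost of going to the next memory level

# associativity -> (hit time in cycles, measured miss rate)
profiled = {
    1: (1.0, 0.08),   # direct-mapped: fast hit, more conflict misses
    2: (1.2, 0.05),
    4: (1.4, 0.035),
    8: (1.7, 0.033),
}

def amat(hit_time, miss_rate, miss_penalty=MISS_PENALTY_CYCLES):
    """AMAT = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

best = min(profiled, key=lambda ways: amat(*profiled[ways]))
for ways, (ht, mr) in profiled.items():
    print(f"{ways}-way: AMAT = {amat(ht, mr):.2f} cycles")
print(f"chosen associativity: {best}-way")
```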
5

Wang, Ming Qian, Jie Tao Diao, Nan Li, Xi Wang, and Kai Bu. "A Study on Reconfiguring On-Chip Cache with Non-Volatile Memory." Applied Mechanics and Materials 644-650 (September 2014): 3421–25. http://dx.doi.org/10.4028/www.scientific.net/amm.644-650.3421.

Abstract:
NVM has become a promising technology to partly replace SRAM as on-chip cache and reduce the gap between the core and the cache. To take full advantage of both NVM and SRAM, we propose a Hybrid Cache that constructs the on-chip cache hierarchy with different technologies. As shown in the article, the performance and power consumption of the Hybrid Cache have a large advantage over caches based on a single technology. In addition, we show some other methods that can optimize the performance of the hybrid cache.
6

Prihozhy, A. A. "Simulation of direct mapped, k-way and fully associative cache on all pairs shortest paths algorithms." «System analysis and applied information science», no. 4 (December 30, 2019): 10–18. http://dx.doi.org/10.21122/2309-4923-2019-4-10-18.

Abstract:
Caches are an intermediate level between the fast CPU and slow main memory. A cache stores copies of frequently used data and reduces the access time to the main memory. Caches exploit temporal and spatial locality during program execution. When the processor accesses memory, the cache behavior depends on whether the data is in the cache: a cache hit occurs if it is, and a cache miss occurs otherwise. In the latter case, the cache may have to evict other data. Misses produce processor stalls and slow down the computations. The replacement policy chooses the data to evict, trying to predict future accesses to memory. The hit and miss rates depend on the cache type: direct-mapped, set-associative, or fully associative. The least recently used replacement policy serves the sets. The miss rate strongly depends on the executed algorithm. The all-pairs shortest-paths algorithms solve many practical problems, and it is important to know which algorithm and which cache type match best. This paper presents a technique for simulating the direct-mapped, k-way associative, and fully associative caches during algorithm execution, to measure the frequency of read-data-to-cache and write-data-to-memory operations. We have measured the frequencies versus the cache size, the data block size, the amount of processed data, the type of cache, and the type of algorithm. After comparing the basic and blocked Floyd-Warshall algorithms, we conclude that the blocked algorithm localizes data accesses well within one block but does not localize data dependencies among blocks. The direct-mapped cache performs significantly worse than the associative caches; we can improve its performance by appropriately mapping virtual addresses to physical locations.
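As a companion to this abstract, here is a minimal, illustrative cache simulator (not the paper's tool) that counts hits and misses for a trace of addresses. Direct-mapped, k-way set-associative, and fully associative caches are all covered by varying the number of ways; replacement within a set is LRU, as in the abstract. The cache size, block size, and address trace below are hypothetical parameters.

```python
# Illustrative set-associative cache simulator with LRU replacement.
# ways=1 gives a direct-mapped cache; ways=num_blocks gives a fully
# associative cache. All parameters are hypothetical.

def simulate(trace, cache_bytes=1024, block_bytes=64, ways=4):
    num_blocks = cache_bytes // block_bytes
    num_sets = max(1, num_blocks // ways)
    # each set is an ordered list of tags; front = least recently used
    sets = [[] for _ in range(num_sets)]
    hits = misses = 0
    for addr in trace:
        block = addr // block_bytes
        idx, tag = block % num_sets, block // num_sets
        lru = sets[idx]
        if tag in lru:
            hits += 1
            lru.remove(tag)        # refresh recency
        else:
            misses += 1
            if len(lru) >= ways:   # evict the least recently used tag
                lru.pop(0)
        lru.append(tag)            # most recently used at the back
    return hits, misses

# Example: a strided walk over an array larger than the cache.
trace = [i * 8 for i in range(0, 4096)] * 2
for ways in (1, 4, 16):
    h, m = simulate(trace, ways=ways)
    print(f"{ways}-way: hit rate = {h / (h + m):.2%}")
```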
7

Mutanga, Alfred. "A SystemC Cache Simulator for a Multiprocessor Shared Memory System." International Letters of Social and Humanistic Sciences 13 (October 2013): 75–87. http://dx.doi.org/10.18052/www.scipress.com/ilshs.13.75.

Abstract:
In this research we built a SystemC Level-1 data cache system in a distributed shared memory architectural environment, with each processor having its own local cache. Using a set of Fast-Fourier Transform and Random trace files we evaluated the cache performance, based on the number of cache hits/misses, of the caches using snooping and directory-based cache coherence protocols. A series of experiments were carried out, with the results of the experiments showing that the directory-based MOESI cache coherency protocol has a performance edge over the snooping Valid-Invalid cache coherency protocol.
8

Clayton, N. S., D. P. Griffiths, N. J. Emery, and A. Dickinson. "Elements of episodic–like memory in animals." Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences 356, no. 1413 (September 29, 2001): 1483–91. http://dx.doi.org/10.1098/rstb.2001.0947.

Abstract:
A number of psychologists have suggested that episodic memory is a uniquely human phenomenon and, until recently, there was little evidence that animals could recall a unique past experience and respond appropriately. Experiments on food–caching memory in scrub jays question this assumption. On the basis of a single caching episode, scrub jays can remember when and where they cached a variety of foods that differ in the rate at which they degrade, in a way that is inexplicable by relative familiarity. They can update their memory of the contents of a cache depending on whether or not they have emptied the cache site, and can also remember where another bird has hidden caches, suggesting that they encode rich representations of the caching event. They make temporal generalizations about when perishable items should degrade and also remember the relative time since caching when the same food is cached in distinct sites at different times. These results show that jays form integrated memories for the location, content and time of caching. This memory capability fulfils Tulving's behavioural criteria for episodic memory and is thus termed ‘episodic–like’. We suggest that several features of episodic memory may not be unique to humans.
9

Aasaraai, Kaveh, and Andreas Moshovos. "NCOR: An FPGA-Friendly Nonblocking Data Cache for Soft Processors with Runahead Execution." International Journal of Reconfigurable Computing 2012 (2012): 1–12. http://dx.doi.org/10.1155/2012/915178.

Abstract:
Soft processors often use data caches to reduce the gap between processor and main memory speeds. To achieve high efficiency, simple, blocking caches are used. Such caches are not appropriate for processor designs such as Runahead and out-of-order execution that require nonblocking caches to tolerate main memory latencies. Instead, these processors use non-blocking caches to extract memory-level parallelism and improve performance. However, conventional non-blocking cache designs are expensive and slow on FPGAs as they use content-addressable memories (CAMs). This work proposes NCOR, an FPGA-friendly non-blocking cache that exploits the key properties of Runahead execution. NCOR does not require CAMs and utilizes smart cache controllers. A 4 KB NCOR operates at 329 MHz on Stratix III FPGAs while it uses only 270 logic elements. A 32 KB NCOR operates at 278 MHz and uses 269 logic elements.
10

Shukur, Hanan, Subhi Zeebaree, Rizgar Zebari, Omar Ahmed, Lailan Haji, and Dildar Abdulqader. "Cache Coherence Protocols in Distributed Systems." Journal of Applied Science and Technology Trends 1, no. 3 (June 24, 2020): 92–97. http://dx.doi.org/10.38094/jastt1329.

Abstract:
The performance of distributed systems is affected significantly by cache coherence protocols due to their role in maintaining data consistency. Cache coherence protocols also have the important task of keeping the caches in a multiprocessor environment consistent across the interconnect. Moreover, the overall performance of a distributed shared-memory multiprocessor system is influenced by the type of cache coherence protocol used. The major challenge of shared-memory devices is to maintain cache coherence. Therefore, in past years many contributions have been presented to address cache issues and to improve the performance of distributed systems. This paper systematically reviews a number of methods used for cache coherence protocols in distributed systems.
11

Li, Xiaoming, Weiping Xu, Qing Zhu, Jinxing Hu, Han Hu, and Yeting Zhang. "A Multi-Level Cache Approach for Realtime Visualization of Massive 3D GIS Data." International Journal of 3-D Information Modeling 1, no. 3 (July 2012): 37–48. http://dx.doi.org/10.4018/ij3dim.2012070104.

Abstract:
The real-time visualization of 3D GIS at the scale of a whole city faces the challenge of high-efficiency dynamic data loading. Based on a multi-tier distributed 3D GIS framework, this paper presents a multi-level cache approach for dynamic data loading. It aims to establish, in the 3D GIS spatial database engine (3DGIS-SDE), a unified management mechanism for caches on three levels: the client memory cache (CMC) oriented to sharing applications, the client file cache (CFC) organized by index, and the application server memory cache (ASMC) of structural consistency. With the help of the optimized cache replacement policy, multi-level cache consistency maintenance, and multithreaded loading model designed in the paper, the engine is able to adaptively make full use of the caches at each level according to their own application properties and achieve effective coordination between them. Finally, a practical 3D GIS database based on Oracle 11g is employed for testing. The experimental results show that this approach can satisfy multi-user concurrent applications of 3D visual exploration.
12

Modi, Garima, Aritra Bagchi, Neetu Jindal, Ayan Mandal, and Preeti Ranjan Panda. "CABARRE: Request Response Arbitration for Shared Cache Management." ACM Transactions on Embedded Computing Systems 22, no. 5s (September 9, 2023): 1–24. http://dx.doi.org/10.1145/3608096.

Abstract:
Modern multi-processor systems-on-chip (MPSoCs) are characterized by caches shared by multiple cores. These shared caches receive requests issued by the processor cores. Requests that are subject to cache misses may result in the generation of responses. These responses are received from the lower level of the memory hierarchy and written to the cache. The outstanding requests and responses contend for the shared cache bandwidth. To mitigate the impact of the cache bandwidth contention on the overall system performance, an efficient request and response arbitration policy is needed. Research on shared cache management has neglected the additional cache contention caused by responses, which are written to the cache. We propose CABARRE, a novel request and response arbitration policy at shared caches, so as to improve the overall system performance. CABARRE shows a performance improvement of 23% on average across a set of SPEC workloads compared to straightforward adaptations of state-of-the-art solutions.
13

CHEN, HSIN-CHUAN, and JEN-SHIUN CHIANG. "A HIGH-PERFORMANCE SEQUENTIAL MRU CACHE USING VALID-BIT ASSISTANT SEARCH ALGORITHM." Journal of Circuits, Systems and Computers 16, no. 04 (August 2007): 613–26. http://dx.doi.org/10.1142/s0218126607003824.

Abstract:
The most recently used (MRU) cache is a set-associative cache designed to support associativity higher than 2. However, its access time is increased because the MRU information must be fetched before accessing the sequential MRU (SMRU) cache. In this paper, focusing on the SMRU cache with subblock placement, we propose an MRU cache scheme that separates the valid bits from the data memory and uses these valid bits to reduce the number of unnecessary accesses to the memory banks. This approach increases the probability of front hits and significantly improves the average access time relative to the SMRU cache without the valid-bit assistant search, especially for large associativity and small subblock sizes.
14

Zhang, Qizhen, Philip A. Bernstein, Daniel S. Berger, and Badrish Chandramouli. "Redy." Proceedings of the VLDB Endowment 15, no. 4 (December 2021): 766–79. http://dx.doi.org/10.14778/3503585.3503587.

Abstract:
Redy is a cloud service that provides high performance caches using RDMA-accessible remote memory. An application can customize the performance of each cache with a service level objective (SLO) for latency and throughput. By using remote memory, it can leverage stranded memory and spot VM instances to reduce the cost of its caches and improve data center resource utilization. Redy automatically customizes the resource configuration for the given SLO, handles the dynamics of remote memory regions, and recovers from failures. The experimental evaluation shows that Redy can deliver its promised performance and robustness under remote memory dynamics in the cloud. We augment a production key-value store, FASTER, with a Redy cache. When the working set exceeds local memory, using Redy is significantly faster than spilling to SSDs.
15

"Cache Coherence Protocol Design and Simulation Using IES (Invalid Exclusive read/write Shared) State." Baghdad Science Journal 14, no. 1 (March 5, 2017): 219–30. http://dx.doi.org/10.21123/bsj.14.1.219-230.

Abstract:
In recent multiprocessor systems, cache memories are used to access data instead of main memory, which reduces access latency and improves processor efficiency. In such systems, when different caches are installed in different processors in a shared-memory architecture, difficulties appear when consistency must be maintained between the cache memories of the different processors, so a cache coherence protocol is very important in these kinds of systems. MSI, MESI, MOSI, MOESI, etc. are well-known protocols for solving the cache coherence problem. In this research we propose integrating two states of the MESI cache coherence protocol, Exclusive and Modified, so that a single state responds to read and write requests at the same time and holds the line exclusively for those requests. The write-back to main memory from another processor holding a modified copy is also removed in the proposed protocol when that copy is invalidated by a write to the same address, because in all cases the latest written value is used; where write-back is normally used to protect data from loss, preprocessing steps in the IES protocol maintain and save the data in main memory when it is evicted from the cache. All of this increases processor efficiency by reducing accesses to main memory.
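For readers unfamiliar with the baseline protocol being modified, the sketch below encodes the textbook MESI states and a few representative transitions as a plain lookup table. It is a simplification for illustration only and does not implement the IES protocol proposed in the paper; the event names are assumptions chosen for readability.

```python
# Textbook MESI cache-coherence states (simplified illustration, not the
# IES protocol proposed in the paper). Each entry maps
# (current_state, event) -> next_state for one cache line.
MESI = {
    ("I", "local_read_miss_exclusive"): "E",  # no other cache holds the line
    ("I", "local_read_miss_shared"):    "S",  # another cache supplies the line
    ("I", "local_write_miss"):          "M",
    ("E", "local_write"):               "M",  # silent upgrade, no bus traffic
    ("E", "remote_read"):               "S",
    ("S", "local_write"):               "M",  # invalidates other copies
    ("S", "remote_write"):              "I",
    ("M", "remote_read"):               "S",  # supplies data, may write back
    ("M", "remote_write"):              "I",
}

def next_state(state, event):
    return MESI.get((state, event), state)  # unlisted events keep the state

# Example: a line read exclusively, then written, then read by another core.
s = "I"
for ev in ("local_read_miss_exclusive", "local_write", "remote_read"):
    s = next_state(s, ev)
    print(ev, "->", s)
```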
16

Roque, João V., João D. Lopes, Mário P. Véstias, and José T. de Sousa. "IOb-Cache: A High-Performance Configurable Open-Source Cache." Algorithms 14, no. 8 (July 21, 2021): 218. http://dx.doi.org/10.3390/a14080218.

Abstract:
Open-source processors are increasingly being adopted by the industry, which requires all sorts of open-source implementations of peripherals and other system-on-chip modules. Despite the recent advent of open-source hardware, the available open-source caches have low configurability, lack support for single-cycle pipelined memory accesses, and use non-standard hardware interfaces. In this paper, the IObundle cache (IOb-Cache), a high-performance configurable open-source cache, is proposed, developed and deployed. The cache has front-end and back-end modules for fast integration with processors and memory controllers. The front-end module supports the native interface, and the back-end module supports the native interface and the standard Advanced eXtensible Interface (AXI). The cache is highly configurable in structure and access policies. The back-end can be configured to read bursts of multiple words per transfer to take advantage of the available memory bandwidth. To the best of our knowledge, IOb-Cache is currently the only configurable cache that supports pipelined Central Processing Unit (CPU) interfaces and the AXI memory bus interface. Additionally, it has a write-through buffer and an independent controller for fast, most of the time 1-cycle, writing together with 1-cycle reading, while previous works only support 1-cycle reading. This allows the best clocks-per-instruction (CPI) to be close to one (1.055). IOb-Cache is integrated into the IOb System-on-Chip (IOb-SoC) GitHub repository, which has 29 stars and is already being used in 50 projects (forks).
17

Ho, Nam, Paul Kaufmann, and Marco Platzner. "Evolution of application-specific cache mappings." International Journal of Hybrid Intelligent Systems 16, no. 3 (September 28, 2020): 149–61. http://dx.doi.org/10.3233/his-200281.

Abstract:
Reconfigurable caches offer an intriguing opportunity to tailor cache behavior to applications for better run-times and energy consumptions. While one may adapt structural cache parameters such as cache and block sizes, we adapt the memory-address-to-cache-index mapping function to the needs of an application. Using a LEON3 embedded multi-core processor with reconfigurable cache mappings, a metaheuristic search procedure, and MiBench applications, we show in this work how to accurately compare non-deterministic performances of applications and how to use this information to implement an optimization procedure that evolves application-specific cache mappings for the LEON3 multi-core processor.
18

Askari, Mahmoud, and Nick Ivanov. "The Dependence of Physical Memory Footprint of Processor on the Applications." Asian Journal of Computer Science and Technology 2, no. 2 (November 5, 2013): 4–10. http://dx.doi.org/10.51983/ajcst-2013.2.2.1724.

Abstract:
Recently, with the growing gap between processor and memory speeds, parallel performance on chip multicore processors has become more attractive for filling this gap. In this direction, calculating the cycles per instruction (CPI) and its relationship with the cache miss ratio is important. In this paper, the impact of cache usage on an Intel i5-460M processor is investigated with SPEC CPU2000 fixed-point workloads. First, the model of the memory hierarchy is discussed. Afterwards, the dependency of cache memory usage is discussed. In parts IV and V, regression analysis of the data and the results are considered. Experiments using VTune 2013 counters demonstrate that switching off particular caches, depending on the kind of application, enhances processor performance.
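The relationship between CPI and the cache miss ratio that such a regression analysis explores is commonly written as a simple additive model. The sketch below uses that textbook form with hypothetical numbers; it is not the paper's fitted model or its measured coefficients.

```python
# Textbook CPI model: CPI = base CPI + memory stall cycles per instruction,
# where stalls = accesses/instruction * miss rate * miss penalty.
# All numbers below are hypothetical, not the paper's measurements.

def effective_cpi(base_cpi, mem_accesses_per_instr, miss_rate, miss_penalty):
    return base_cpi + mem_accesses_per_instr * miss_rate * miss_penalty

print(effective_cpi(base_cpi=1.0,
                    mem_accesses_per_instr=0.3,
                    miss_rate=0.02,
                    miss_penalty=150))   # -> 1.9 cycles per instruction
```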
19

Jalil, Luma Fayeq, Maha Abdul kareem H. Al-Rawi, and Abeer Diaa Al-Nakshabandi. "Cache coherence protocol design using VMSI (Valid Modified Shared Invalid) states." Journal of University of Human Development 3, no. 1 (March 31, 2017): 274. http://dx.doi.org/10.21928/juhd.v3n1y2017.pp274-281.

Abstract:
In this research we propose the design of a new cache coherence protocol named VMSI, in order to solve the coherence problem, i.e., the inconsistency of data between caches that appears in recent multiprocessor systems through read and write operations. The main purpose of this protocol is to increase processor efficiency by reducing traffic between the processor and memory, which is achieved by removing the write-back to main memory in the case of reads or writes to shared caches, because the protocol relies on a directory inside the cache that contains all the data, representing a subset of main memory.
20

Zhang, Zhaoyang, Wenqi Shao, Yixiao Ge, Xiaogang Wang, Jinwei Gu, and Ping Luo. "Cached Transformers: Improving Transformers with Differentiable Memory Cache." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 15 (March 24, 2024): 16935–43. http://dx.doi.org/10.1609/aaai.v38i15.29636.

Abstract:
This work introduces a new Transformer model called Cached Transformer, which uses Gated Recurrent Cache (GRC) attention to extend the self-attention mechanism with a differentiable memory cache of tokens. GRC attention enables attending to both past and current tokens, increasing the receptive field of attention and allowing for exploring long-range dependencies. By utilizing a recurrent gating unit to continuously update the cache, our model achieves significant advancements in six language and vision tasks, including language modeling, machine translation, ListOps, image classification, object detection, and instance segmentation. Furthermore, our approach surpasses previous memory-based techniques in tasks such as language modeling and displays the ability to be applied to a broader range of situations.
21

Sardar, Rupam. "Cache Memory Organization and Virtual Memory." INTERANTIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 08, no. 03 (March 7, 2024): 1–6. http://dx.doi.org/10.55041/ijsrem29124.

Abstract:
Cache memory is mainly incorporated in systems to overcome the performance gap between main memory and the CPU. Since the speed of processors is ever increasing, a need arises for a faster cache memory that can help bridge the gap between processor and memory speeds. Therefore, this paper proposes an architecture built around three improvement techniques, namely a victim cache, sub-blocks, and memory banks. These three techniques are implemented one after the other to improve the speed and performance of the cache relative to main memory.
22

Zulfa, Mulki Indana, Sri Maryani, Ardiansyah Ardiansyah, Triyanna Widiyaningtyas, and Waleed Ali Ali. "Application Level Caching Approach Based on Enhanced Aging Factor and Pearson Correlation Coefficient." JOIV : International Journal on Informatics Visualization 8, no. 1 (March 31, 2024): 31. http://dx.doi.org/10.62527/joiv.8.1.2143.

Abstract:
Relational database management systems (RDBMS) have long served as the fundamental infrastructure for web applications. An RDBMS is characterized by relatively slow access speeds because its data is stored on disk. This weakness can be overcome using an in-memory database (IMDB): each query result can be stored in the IMDB to accelerate future accesses. However, due to the limited capacity of the server cache in the IMDB, an appropriate data priority assessment mechanism needs to be developed. This paper presents the similarCache framework, which considers four data vectors, namely the data size, timestamp, aging factor, and controller access statistics for each web page; these serve as the foundation for determining the replacement policy whenever the content of the server cache changes. The proposed similarCache employs the Pearson correlation coefficient to quantify the similarity levels among the cached data in the server cache. The cached data with the lowest Pearson correlation coefficients are the first to be evicted from memory. The proposed similarCache was empirically evaluated through simulations conducted on four IRcache datasets. The simulation outcomes revealed that the data access patterns and the configuration of the allocated memory cache significantly influenced the hit ratio performance. In particular, the simulations on the SV dataset with the smallest memory space configuration exhibited a 2.33% and 1% superiority over the SIZE and FIFO algorithms, respectively. Future tasks include building a cache that can adapt to data access patterns by determining the standard deviation. The proposed similarCache should raise the Pearson coefficient for frequently available data to the same level as the most accessed data in exceptional cases.
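To make the correlation-based eviction criterion concrete, the following sketch computes pairwise Pearson correlation coefficients over per-item feature vectors (size, timestamp, aging factor, access count, as listed in the abstract) and evicts the item whose average correlation with the rest of the cache is lowest. The feature values and the averaging step are illustrative assumptions, not the published similarCache algorithm.

```python
# Illustrative Pearson-correlation-based eviction (assumption-laden sketch,
# not the published similarCache algorithm).
from statistics import mean, pstdev

def pearson(x, y):
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)

# key -> feature vector: (size, timestamp, aging factor, access count)
cache = {
    "/index":    (12.0, 100.0, 0.9, 40.0),
    "/style":    (10.0,  98.0, 0.8, 35.0),
    "/rare.pdf": (900.0,  5.0, 0.1,  1.0),
}

def pick_victim(cache):
    def avg_corr(key):
        others = [v for k, v in cache.items() if k != key]
        return mean(pearson(cache[key], o) for o in others)
    return min(cache, key=avg_corr)

print("evict:", pick_victim(cache))  # the entry least similar to the rest
```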
23

Calciu, Irina, M. Talha Imran, Ivan Puddu, Sanidhya Kashyap, Hasan Al Maruf, Onur Mutlu, and Aasheesh Kolli. "Using Local Cache Coherence for Disaggregated Memory Systems." ACM SIGOPS Operating Systems Review 57, no. 1 (June 26, 2023): 21–28. http://dx.doi.org/10.1145/3606557.3606561.

Abstract:
Disaggregated memory provides many cost savings and resource provisioning benefits for current datacenters, but software systems enabling disaggregated memory access result in high performance penalties. These systems require intrusive code changes to port applications for disaggregated memory or employ slow virtual memory mechanisms to avoid code changes. Such mechanisms result in high overhead page faults to access remote data and high dirty data amplification when tracking changes to cached data at page-granularity. In this paper, we propose a fundamentally new approach for disaggregated memory systems, based on the observation that we can use local cache coherence to track applications' memory accesses transparently, without code changes, at cache-line granularity. This simple idea (1) eliminates page faults from the application critical path when accessing remote data, and (2) decouples the application memory access tracking from the virtual memory page size, enabling cache-line granularity dirty data tracking and eviction. Using this observation, we implemented a new software runtime for disaggregated memory that improves average memory access time and reduces dirty data amplification.
24

MITTAL, SPARSH, and ZHAO ZHANG. "EnCache: A DYNAMIC PROFILING-BASED RECONFIGURATION TECHNIQUE FOR IMPROVING CACHE ENERGY EFFICIENCY." Journal of Circuits, Systems and Computers 23, no. 10 (October 14, 2014): 1450147. http://dx.doi.org/10.1142/s0218126614501473.

Abstract:
With each CMOS technology generation, leakage energy consumption has been dramatically increasing and hence, managing leakage power consumption of large last-level caches (LLCs) has become a critical issue in modern processor design. In this paper, we present EnCache, a novel software-based technique which uses dynamic profiling-based cache reconfiguration for saving cache leakage energy. EnCache uses a simple hardware component called profiling cache, which dynamically predicts energy efficiency of an application for 32 possible cache configurations. Using these estimates, system software reconfigures the cache to the most energy efficient configuration. EnCache uses dynamic cache reconfiguration and hence, it does not require offline profiling or tuning the parameter for each application. Furthermore, EnCache optimizes directly for the overall memory subsystem (LLC and main memory) energy efficiency instead of the LLC energy efficiency alone. The experiments performed with an x86-64 simulator and workloads from SPEC2006 suite confirm that EnCache provides larger energy saving than a conventional energy saving scheme. For single core and dual-core system configurations, the average savings in memory subsystem energy over a shared baseline configuration are 30.0% and 27.3%, respectively.
25

Azevedo, Arnaldo, and Ben Juurlink. "A Multidimensional Software Cache for Scratchpad-Based Systems." International Journal of Embedded and Real-Time Communication Systems 1, no. 4 (October 2010): 1–20. http://dx.doi.org/10.4018/jertcs.2010100101.

Abstract:
In many kernels of multimedia applications, the working set is predictable, making it possible to schedule the data transfers before the computation. Many other kernels, however, process data that is known just before it is needed or have working sets that do not fit in the scratchpad memory. Furthermore, multimedia kernels often access two or higher dimensional data structures and conventional software caches have difficulties to exploit the data locality exhibited by these kernels. For such kernels, the authors present a Multidimensional Software Cache (MDSC), which stores 1- to 4-dimensional blocks to mimic in cache the organization of the data structure. Furthermore, it indexes the cache using the matrix indices rather than linear memory addresses. MDSC also makes use of the lower overhead of Direct Memory Access (DMA) list transfers and allows exploiting known data access patterns to reduce the number of accesses to the cache. The MDSC is evaluated using GLCM, providing an 8% performance improvement compared to the IBM software cache. For MC, several optimizations are presented that reduce the number of accesses to the MDSC.
26

Baker, Myron Charles, Eric Stone, Ann Eileen Miller Baker, Robert J. Shelden, Patricia Skillicorn, and Mark D. Mantych. "Evidence Against Observational Learning in Storage and Recovery of Seeds by Black-Capped Chickadees." Auk 105, no. 3 (July 1, 1988): 492–97. http://dx.doi.org/10.1093/auk/105.3.492.

Abstract:
Abstract Recovery of cached sunflower seeds by Black-capped Chickadees (Parus atricapillus) was observed in four laboratory experiments. Results of the first experiment were consistent with the hypothesis that chickadees use spatial memory to recover seeds cached 24 h earlier. The second experiment demonstrated that individuals have a high recovery rate for their own caches and a low recovery rate for caches made by another. The third and fourth experiments demonstrated that one chickadee observing another caching seeds provided no recovery benefit to the observer in comparison to its performance when recovering seeds hidden in its absence. This result held for 2-h and for 6-min delays between observation and attempted recovery. We believe that spatial memory is used by chickadees, that the individual carrying out the caching has a large recovery advantage over a conspecific that searches the same patch, and that the perceptual and motor experience involved in the act of traveling to a cache location may be necessary for the establishment of spatial memory.
27

Rostami-Sani, Sajjad, Mojtaba Valinataj, and Saeideh Alinezhad Chamazcoti. "Parloom: A New Low-Power Set-Associative Instruction Cache Architecture Utilizing Enhanced Counting Bloom Filter and Partial Tags." Journal of Circuits, Systems and Computers 28, no. 12 (November 2019): 1950203. http://dx.doi.org/10.1142/s0218126619502037.

Abstract:
The cache system dissipates a significant amount of energy compared to the other memory components. This will be intensified if a cache is designed with a set-associative structure to improve the system performance because the parallel accesses to the entries of a set for tag comparisons lead to even more energy consumption. In this paper, a novel method is proposed as a combination of a counting Bloom filter and partial tags to mitigate the energy consumption of set-associative caches. This new hybrid method noticeably decreases the cache energy consumption especially in highly-associative instruction caches. In fact, it uses an enhanced counting Bloom filter to predict cache misses with a high accuracy as well as partial tags to decrease the overall cache size. This way, unnecessary tag comparisons can be prevented and therefore, the cache energy consumption is considerably reduced. Based on the simulation results, the proposed method provides the energy reduction from 22% to 31% for 4-way–32-way set-associative L1 caches bigger than 16 kB running the MiBench programs. The improvements are attained with a negligible system performance degradation compared to the traditional cache system.
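The counting Bloom filter used for miss prediction is a standard structure: counters are incremented when an element is inserted, decremented on eviction, and a zero counter proves the element is absent, so the expensive tag comparisons can be skipped. Below is a generic counting Bloom filter sketch with made-up parameters; it is not the enhanced filter or the partial-tag logic of the paper.

```python
# Generic counting Bloom filter (illustrative parameters, not the paper's
# enhanced filter). A query returning False is a guaranteed miss, so the
# expensive tag comparisons can be skipped; True may be a false positive.
import hashlib

class CountingBloomFilter:
    def __init__(self, num_counters=1024, num_hashes=3):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _indexes(self, key):
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % len(self.counters)

    def add(self, key):          # on cache line insertion
        for idx in self._indexes(key):
            self.counters[idx] += 1

    def remove(self, key):       # on cache line eviction
        for idx in self._indexes(key):
            self.counters[idx] -= 1

    def might_contain(self, key):
        return all(self.counters[idx] > 0 for idx in self._indexes(key))

cbf = CountingBloomFilter()
cbf.add(0x1A2B00)
print(cbf.might_contain(0x1A2B00))  # True
print(cbf.might_contain(0xDEAD00))  # almost certainly False -> skip lookup
cbf.remove(0x1A2B00)
print(cbf.might_contain(0x1A2B00))  # False after eviction
```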
28

Mittal, Shaily, and Nitin. "Memory Map: A Multiprocessor Cache Simulator." Journal of Electrical and Computer Engineering 2012 (2012): 1–12. http://dx.doi.org/10.1155/2012/365091.

Abstract:
Nowadays, Multiprocessor System-on-Chip (MPSoC) architectures are mainly focused on by manufacturers to provide increased concurrency, instead of increased clock speed, for embedded systems. However, managing concurrency is a tough task. Hence, one major issue is to synchronize concurrent accesses to shared memory. An important characteristic of any system design process is memory configuration and data flow management. Although it is very important to select a correct memory configuration, it might be equally imperative to choreograph the data flow between various levels of memory in an optimal manner. Memory Map is a multiprocessor simulator to choreograph data flow in individual caches of multiple processors and shared memory systems. This simulator allows the user to specify cache reconfigurations and the number of processors within the application program and evaluates the cache miss and hit rates for each configuration phase, taking into account reconfiguration costs. The code is open source and written in Java.
29

Fang, Juan, Zelin Wei, and Huijing Yang. "Locality-Based Cache Management and Warp Scheduling for Reducing Cache Contention in GPU." Micromachines 12, no. 10 (October 17, 2021): 1262. http://dx.doi.org/10.3390/mi12101262.

Abstract:
GPGPUs have gradually become mainstream acceleration components in high-performance computing. The long latency of memory operations is the bottleneck of GPU performance. In a GPU, multiple threads are grouped into one warp for scheduling and execution. The L1 data caches have little capacity, and multiple warps share one small cache. That makes the cache suffer a large amount of cache contention and pipeline stalls. We propose Locality-Based Cache Management (LCM), combined with Locality-Based Warp Scheduling (LWS), to reduce cache contention and improve GPU performance. Each load instruction can be divided into three types according to locality: data used only once (streaming locality), data accessed multiple times in the same warp (intra-warp locality), and data accessed in different warps (inter-warp locality). According to the locality of the load instruction, LCM applies cache bypassing to streaming-locality requests to improve the cache utilization rate and extends inter-warp memory request coalescing to make full use of the inter-warp locality, combining with LWS to alleviate cache contention. LCM and LWS can effectively improve cache performance, thereby improving overall GPU performance. Through experimental evaluation, our LCM and LWS obtain an average performance improvement of 26% over the baseline GPU.
30

Zulfa, Mulki Indana, Rudy Hartanto, Adhistya Erna Permanasari, and Waleed Ali. "LRU-GENACO: A Hybrid Cached Data Optimization Based on the Least Used Method Improved Using Ant Colony and Genetic Algorithms." Electronics 11, no. 19 (September 20, 2022): 2978. http://dx.doi.org/10.3390/electronics11192978.

Abstract:
An optimization strategy for cached data offloading plays a crucial role in the edge network environment. This strategy can improve the performance of edge nodes with limited cache memory to serve data service requests from user terminals. The main challenge that must be solved in optimizing cached data offloading is assessing and selecting the cached data with the highest profit to be stored in the cache memory. Selecting the appropriate cached data can improve the utility of memory space to increase HR and reduce LSR. In this paper, we model the cached data offloading optimization strategy as the classic optimization KP01. The cached data offloading optimization strategy is then improved using a hybrid approach of three algorithms: LRU, ACO, and GA, called LRU-GENACO. The proposed LRU-GENACO was tested using four real proxy log datasets from IRCache. The simulation results show that the proposed LRU-GENACO hit ratio is superior to the LRU GDS SIZE algorithms by 13.1%, 26.96%, 53.78%, and 81.69%, respectively. The proposed LRU-GENACO method also reduces the average latency by 25.27%.
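The abstract models cached-data selection as the classic 0/1 knapsack problem (KP01). As background, the sketch below solves a tiny KP01 instance by dynamic programming, choosing which objects to keep so that total profit is maximized within a cache-capacity budget; the profits, sizes, and capacity are made-up numbers, and the hybrid LRU/ACO/GA heuristics of LRU-GENACO are not reproduced here.

```python
# Background sketch: cached-data selection as 0/1 knapsack (KP01), solved by
# dynamic programming. Values are made up; this is not the LRU-GENACO method.

def knapsack(sizes, profits, capacity):
    n = len(sizes)
    dp = [0] * (capacity + 1)                 # dp[c] = best profit at capacity c
    keep = [[False] * (capacity + 1) for _ in range(n)]
    for i in range(n):
        for c in range(capacity, sizes[i] - 1, -1):
            if dp[c - sizes[i]] + profits[i] > dp[c]:
                dp[c] = dp[c - sizes[i]] + profits[i]
                keep[i][c] = True
    # backtrack to find the chosen objects
    chosen, c = [], capacity
    for i in range(n - 1, -1, -1):
        if keep[i][c]:
            chosen.append(i)
            c -= sizes[i]
    return dp[capacity], chosen[::-1]

sizes   = [4, 3, 2, 5]    # object sizes (cache blocks)
profits = [10, 7, 6, 12]  # expected benefit of keeping each object cached
print(knapsack(sizes, profits, capacity=7))  # -> (18, [2, 3])
```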
31

Yan, Pei Xiang, Jiang Jiang, Xian Ju Yang, and Min Xuan Zhang. "A Probabilistic Cache Sharing Mechanism for Chip Multiprocessors." Applied Mechanics and Materials 135-136 (October 2011): 119–25. http://dx.doi.org/10.4028/www.scientific.net/amm.135-136.119.

Abstract:
Capacity sharing is an efficient way for private L2 caches to utilize cache resources in chip multiprocessors. We propose a probabilistic sharing mechanism using a reuse replacement strategy. This mechanism adopts decoupled tag and data arrays and partitions the data arrays into private and shared regions. Probability is introduced to control the capability of each core to compete for shared data resources. We assign high probabilities to cores with stressed memory demands and dynamically adjust these probabilities according to the monitored run-time memory demands. Simulation results on PARSEC benchmarks show that our mechanism outperforms a conventional LRU-managed private cache. Compared with a reuse-replacement-managed private cache without sharing among cores, our mechanism achieves an average L2 miss rate reduction of 8.70%.
32

Lichti, Nathanael I., Harmony J. Dalgleish, and Michael A. Steele. "Interactions among Shade, Caching Behavior, and Predation Risk May Drive Seed Trait Evolution in Scatter-Hoarded Plants." Diversity 12, no. 11 (October 31, 2020): 416. http://dx.doi.org/10.3390/d12110416.

Abstract:
Although dispersal is critical to plant life history, the relationships between seed traits and dispersal success in animal-dispersed plants remain unclear due to complex interactions among the effects of seed traits, habitat structure, and disperser behavior. We propose that in plants dispersed by scatter-hoarding granivores, seed trait evolution may have been driven by selective pressures that arise from interactions between seedling shade intolerance and predator-mediated caching behavior. Using an optimal foraging model that accounts for cache concealment, hoarder memory, and perceived predation risk, we show that hoarders can obtain cache-recovery advantages by placing caches in moderately risky locations that force potential pilferers to engage in high levels of vigilance. Our model also demonstrates that the level of risk needed to optimally protect a cache increases with the value of the cached food item. If hoarders perceive less sheltered, high-light conditions to be more risky and use this information to protect their caches, then shade-intolerant plants may increase their fitness by producing seeds with traits valued by hoarders. Consistent with this hypothesis, shade tolerance in scatter-hoarded tree species is inversely related to the value of their seeds as perceived by a scatter-hoarding rodent.
33

Shen, Lili, Ning Wu, and Gaizhen Yan. "Fuzzy-Based Thermal Management Scheme for 3D Chip Multicores with Stacked Caches." Electronics 9, no. 2 (February 18, 2020): 346. http://dx.doi.org/10.3390/electronics9020346.

Abstract:
By using through-silicon vias (TSVs), three-dimensional integration technology can stack large memory on top of the cores as a last-level on-chip cache (LLC) to reduce off-chip memory accesses and enhance system performance. However, the integration of more on-chip caches increases the chip power density, which might lead to temperature-related issues in power consumption, reliability, cooling cost, and performance. An effective thermal management scheme is required to ensure the performance and reliability of the system. In this study, a fuzzy-based thermal management scheme (FBTM) is proposed that simultaneously considers the cores and the stacked caches. The proposed method combines a dynamic cache reconfiguration scheme with a fuzzy-based control policy in a temperature-aware manner. The dynamic cache reconfiguration scheme determines the size of the cache for the processor core according to the application, which achieves a substantial amount of power consumption savings. The fuzzy-based control policy is used to change the frequency level of the processor core based on the dynamic cache reconfiguration, a process which can further improve the system performance. Experiments show that, compared with other thermal management schemes, the proposed FBTM can achieve, on average, a 3-degree reduction in temperature and a 41% reduction in leakage energy.
34

Yang, Juncheng, Yao Yue, and K. V. Rashmi. "A Large-scale Analysis of Hundreds of In-memory Key-value Cache Clusters at Twitter." ACM Transactions on Storage 17, no. 3 (August 31, 2021): 1–35. http://dx.doi.org/10.1145/3468521.

Abstract:
Modern web services use in-memory caching extensively to increase throughput and reduce latency. There have been several workload analyses of production systems that have fueled research in improving the effectiveness of in-memory caching systems. However, the coverage is still sparse considering the wide spectrum of industrial cache use cases. In this work, we significantly further the understanding of real-world cache workloads by collecting production traces from 153 in-memory cache clusters at Twitter, sifting through over 80 TB of data, and sometimes interpreting the workloads in the context of the business logic behind them. We perform a comprehensive analysis to characterize cache workloads based on traffic pattern, time-to-live (TTL), popularity distribution, and size distribution. A fine-grained view of different workloads uncovers the diversity of use cases: many are far more write-heavy or more skewed than previously shown and some display unique temporal patterns. We also observe that TTL is an important and sometimes defining parameter of cache working sets. Our simulations show that the ideal replacement strategy in production caches can be surprising, for example, FIFO works the best for a large number of workloads.
35

Tabbassum, Kavita, Shah Nawaz Talpur, Sanam Narejo, and Noor-u.-Zaman Leghari. "Management of Scratchpad Memory Using Programming Techniques." Mehran University Research Journal of Engineering and Technology 38, no. 2 (April 1, 2019): 305–12. http://dx.doi.org/10.22581/muet1982.1902.05.

Abstract:
Using conventional approaches, processors are unable to achieve effective energy reduction. In upcoming processors, the on-chip memory system will be the major restriction. On-chip memories are managed in software-managed chips (SMCs) and can work with on-chip caches, where within a block of caches software can explicitly read and write specific or complete memory references, or they can work separately as scratchpad memory. In embedded systems, scratchpad memory is generally used as an addition to caches or as a substitute for a cache, but because of their greater ease of programmability, cache-based architectures are still chosen in numerous applications. In contrast to conventional caches in embedded systems, scratchpad memories (SPMs) are increasingly used because of their better energy and silicon-area efficiency. The power consumption of ported applications can be significantly lowered, and the portability of scratchpad architectures improved, with the language-agnostic software management method suggested in this manuscript. To enhance memory configuration and optimization on SPM-based architectures, a variety of current methods are reviewed to find opportunities for optimization; the usage of new methods and their applicability to numerous memory management schemes are also discussed in this paper.
36

ALVES, MARCO A. Z., HENRIQUE C. FREITAS, and PHILIPPE O. A. NAVAUX. "HIGH LATENCY AND CONTENTION ON SHARED L2-CACHE FOR MANY-CORE ARCHITECTURES." Parallel Processing Letters 21, no. 01 (March 2011): 85–106. http://dx.doi.org/10.1142/s0129626411000096.

Abstract:
Several studies point out the benefits of a shared L2 cache, but some other properties of shared caches must be considered to lead to a thorough understanding of all chip multiprocessor (CMP) bottlenecks. Our paper evaluates and explains shared cache bottlenecks, which are very important considering the rise of many-core processors. The results of our simulations with 32 cores show low performance when L2 cache memory is shared between 2 or 4 cores. In these two cases, the increase of L2 cache latency and contention are the main causes responsible for the increase of execution time.
37

Jaamoum, Amine, Thomas Hiscock, and Giorgio Di Natale. "Noise-Free Security Assessment of Eviction Set Construction Algorithms with Randomized Caches." Applied Sciences 12, no. 5 (February 25, 2022): 2415. http://dx.doi.org/10.3390/app12052415.

Abstract:
Cache timing attacks, i.e., a class of remote side-channel attack, have become very popular in recent years. Eviction set construction is a common step for many such attacks, and algorithms for building them are evolving rapidly. On the other hand, countermeasures are also being actively researched and developed. However, most countermeasures have been designed to secure last-level caches and few of them actually protect the entire memory hierarchy. Cache randomization is a well-known mitigation technique against cache attacks that has a low-performance overhead. In this study, we attempted to determine whether address randomization on first-level caches is worth considering from a security perspective. In this paper, we present the implementation of a noise-free cache simulation framework that enables the analysis of the behavior of eviction set construction algorithms. We show that randomization at the first level of caches (L1) brings about improvements in security but is not sufficient to mitigate all known algorithms, such as the recently developed Prime–Prune–Probe technique. Nevertheless, we show that L1 randomization can be combined with a lightweight random eviction technique in higher-level caches to mitigate known conflict-based cache attacks.
38

Zhang, Jingyu, Ruihan Zhang, Osama Alfarraj, Amr Tolba, and Gwang-Jun Kim. "A Memory-Aware Spark Cache Replacement Strategy." Journal of Internet Technology 23, no. 6 (November 2022): 1185–90. http://dx.doi.org/10.53106/160792642022112306002.

Abstract:
Spark is currently the most widely used distributed computing framework, and its key data abstraction concept, Resilient Distributed Dataset (RDD), brings significant performance improvements in big data computing. In application scenarios, Spark jobs often need to replace RDDs due to insufficient memory. Spark uses the Least Recently Used (LRU) algorithm by default as the cache replacement strategy. This algorithm only considers the most recent use time of RDDs as the replacement basis. This characteristic may cause the RDDs that need to be reused to be evicted when performing cache replacement, resulting in a decrease in Spark performance. In response to the above problems, this paper proposes a memory-aware Spark cache replacement strategy, which comprehensively considers the cluster memory usage, RDD size, RDD dependencies, usage times and other information when performing cache replacement and selects the RDDs to be evicted. Furthermore, this paper designs extensive corresponding experiments to test and analyze the performance of the memory-aware Spark cache replacement strategy. The experimental data show that the proposed strategy can improve the performance by up to 13% compared with the LRU algorithm in different scenarios.
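For reference, the default LRU behavior that a memory-aware strategy replaces can be captured in a few lines. The sketch below is a generic LRU cache, here keyed by an RDD identifier with made-up sizes; it is not Spark's actual block-management code.

```python
# Generic LRU cache sketch (not Spark's block manager): evicts the least
# recently used entry when adding a new one would exceed the memory budget.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity_mb):
        self.capacity = capacity_mb
        self.used = 0
        self.entries = OrderedDict()   # rdd_id -> size_mb, oldest first

    def get(self, rdd_id):
        if rdd_id in self.entries:
            self.entries.move_to_end(rdd_id)   # mark as most recently used
            return True
        return False

    def put(self, rdd_id, size_mb):
        while self.used + size_mb > self.capacity and self.entries:
            _, evicted_size = self.entries.popitem(last=False)  # evict LRU
            self.used -= evicted_size
        self.entries[rdd_id] = size_mb
        self.used += size_mb

cache = LRUCache(capacity_mb=100)
cache.put("rdd_1", 60)
cache.put("rdd_2", 30)
cache.get("rdd_1")          # touch rdd_1 so rdd_2 becomes the LRU victim
cache.put("rdd_3", 40)      # evicts rdd_2, even if rdd_2 is needed again soon
print(list(cache.entries))  # ['rdd_1', 'rdd_3']
```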
39

Guo, Shengpeng. "Sports Smart Data Writing Based on New-Type Semiconductor Nonvolatile Storage Mode." Scientific Programming 2022 (August 30, 2022): 1–13. http://dx.doi.org/10.1155/2022/2770333.

Abstract:
The development of smart sports has just opened a new chapter. With today's digital transformation in all areas, the data content of smart sports, with its many branches, is huge. However, the current storage system based on external disks has not kept up with computers' increasingly high requirements for IO performance; DRAM- and SRAM-based memory and cache also face power consumption and capacity expansion problems; and the constraints that the storage system places on the overall performance of the computer system are increasingly prominent. Based on the data storage needs of smart sports in digital transformation, this article explores a better cache method for writing smart sports data under a new semiconductor nonvolatile storage mode. The experimental results in this article show that, comparing single-value and dual-value STT-RAM (spin-torque transfer RAM) caches, the three-valued cache with a hierarchical page-swap mapping design achieves a significant improvement in IPC performance under each test load: compared with the single-value type, the average performance increases by 16%, and the overall energy consumption including memory is reduced by an average of 12.3%.
40

Tabbassum, Kavita, Shahnawaz Talpur, and Noor-u.-Zaman Laghari. "Managing Scratchpad Memory Architecture for Lower Power Consumption Using Programming Techniques." Asian Journal of Applied Science and Engineering 9, no. 1 (May 18, 2020): 79–86. http://dx.doi.org/10.18034/ajase.v9i1.31.

Abstract:
In embedded systems, scratchpad memory is generally used as an addition to caches or as a substitute for a cache, but because of their greater ease of programmability, cache-based architectures are still chosen in numerous applications. The power consumption of ported applications can be significantly lowered, and the portability of scratchpad architectures improved, with our suggested language-agnostic software management method. To enhance the memory configuration on relevant architectures, a variety of present methods are reviewed to find opportunities for optimization; the usage of new methods and their applicability to numerous memory schemes are also discussed in this paper.
41

Asiatici, Mikhail, and Paolo Ienne. "Request, Coalesce, Serve, and Forget: Miss-Optimized Memory Systems for Bandwidth-Bound Cache-Unfriendly Applications on FPGAs." ACM Transactions on Reconfigurable Technology and Systems 15, no. 2 (June 30, 2022): 1–33. http://dx.doi.org/10.1145/3466823.

Full text
Abstract:
Applications such as large-scale sparse linear algebra and graph analytics are challenging to accelerate on FPGAs due to their short, irregular memory accesses, resulting in low cache hit rates. Nonblocking caches reduce the bandwidth required by misses by requesting each cache line only once, even when there are multiple misses corresponding to it. However, such a reuse mechanism is traditionally implemented using an associative lookup. This limits the number of misses that are considered for reuse to a few tens, at most. In this article, we present an efficient pipeline that can process and store thousands of outstanding misses in cuckoo hash tables in on-chip SRAM with minimal stalls. This brings the same bandwidth advantage as a larger cache for a fraction of the area budget, because outstanding misses do not need a data array, which can significantly speed up irregular memory-bound latency-insensitive applications. In addition, we extend nonblocking caches to generate variable-length bursts to memory, which increases the bandwidth delivered by DRAMs and their controllers. The resulting miss-optimized memory system provides up to 25% speedup with 24× area reduction on 15 large sparse matrix-vector multiplication benchmarks evaluated on an embedded and a datacenter FPGA system.
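A software sketch of this request-coalesce-serve-forget idea is given below. A plain Python dict stands in for the paper's on-chip cuckoo hash tables, and the explicit on_line_arrival call stands in for what would be an asynchronous DRAM response in hardware; it is a conceptual illustration, not the paper's pipeline.

class MissOptimizedFrontend:
    def __init__(self):
        self.outstanding = {}          # line_addr -> list of (req_id, offset)

    def request(self, req_id, addr, line_size=64):
        line_addr, offset = addr - addr % line_size, addr % line_size
        if line_addr in self.outstanding:
            # Coalesce: the line is already in flight, just record the waiter.
            self.outstanding[line_addr].append((req_id, offset))
            return None
        # First miss for this line: issue exactly one memory request.
        self.outstanding[line_addr] = [(req_id, offset)]
        return line_addr               # address to send to the memory controller

    def on_line_arrival(self, line_addr, data):
        # Serve every coalesced waiter, then forget the entry (no data array kept).
        for req_id, offset in self.outstanding.pop(line_addr):
            print(f"request {req_id}: byte {data[offset]} from line {line_addr:#x}")

frontend = MissOptimizedFrontend()
frontend.request(req_id=1, addr=0x1005)              # issues a fetch for line 0x1000
frontend.request(req_id=2, addr=0x1009)              # coalesced with the fetch in flight
frontend.on_line_arrival(0x1000, bytes(range(64)))   # serves both waiters, then forgets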
APA, Harvard, Vancouver, ISO, and other styles
42

D.Kesavan, G., and P. N.Karthikayan. "Request Schedule Oriented Compression Cache Memory." International Journal of Engineering & Technology 7, no. 2.19 (April 17, 2018): 80. http://dx.doi.org/10.14419/ijet.v7i2.19.15053.

Full text
Abstract:
Using cache memory reduces the overall memory access time needed to fetch data. Because cache use is closely tied to a system's performance, the caching process should take as little time as possible, and many cache optimization techniques are available to speed it up, such as reducing the miss rate, reducing the miss penalty, and reducing the time to hit in the cache. Recent advances have made it possible to compress data in the cache, exploit recent data-use patterns, and so on. Most of these techniques focus on increasing cache capacity or improving replacement policies to raise the hit ratio, and the available compression and optimization techniques address only capacity- and replacement-related issues. This paper instead deals with scheduling cache memory requests according to the compressed cache organization, so that cache searching and indexing take considerably less time and requests are serviced faster. For the capacity and replacement improvements, dictionary-sharing-based caching is used. In this scheme, multiple requests are foreseen using a prefetcher and are searched according to the cache organization, promoting an easier indexing process. The benefit comes from both compressed storage and easier storage access.
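The following Python sketch illustrates generic dictionary-shared compression of cache lines, in which 32-bit words found in a dictionary shared across lines are stored as short indices. It is an assumed, simplified scheme for illustration only, not the organization proposed in the paper.

DICT_SIZE = 16

def compress_line(words, dictionary):
    # words: list of 32-bit ints making up one cache line.
    # Returns (is_index, value) pairs; dictionary entries are shared across lines.
    out = []
    for w in words:
        if w in dictionary:
            out.append((True, dictionary.index(w)))   # short index instead of 32 bits
        else:
            if len(dictionary) < DICT_SIZE:
                dictionary.append(w)                  # grow the shared dictionary
            out.append((False, w))                    # stored uncompressed
    return out

def decompress_line(encoded, dictionary):
    return [dictionary[v] if is_idx else v for is_idx, v in encoded]

shared_dict = []
line = [0, 0, 0xDEADBEEF, 0, 0xDEADBEEF, 1, 1, 0]
encoded = compress_line(line, shared_dict)
assert decompress_line(encoded, shared_dict) == line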
APA, Harvard, Vancouver, ISO, and other styles
43

Pratama, Gunadi, Jajang Mulyana, and Wawan Kusdiawan. "MEMBANGUN PROXY SERVER WEB CACHE DENGAN ANALISIS PERBANDINGAN CACHE REPLACEMENT PADA SQUID PROXY." Syntax : Jurnal Informatika 9, no. 2 (October 25, 2020): 98–109. http://dx.doi.org/10.35706/syji.v9i2.3823.

Full text
Abstract:
A computer network often includes a proxy server, one of whose functions is caching. The caching mechanism on a proxy server stores objects that result from client computers' requests to the internet and serves them again when a client accesses them later, without fully re-fetching them from the internet. Because cache memory is a fixed-size store, it can become full; to optimize cache memory performance, proxy servers provide cache replacement methods. Cache replacement on a proxy server removes objects from cache memory and replaces them with new objects so that the cache does not fill up. The authors therefore build a web-cache proxy server and analyze the Least Recently Used and Greedy Dual Size Frequency algorithms as cache replacement rules on the Squid proxy, using the Cisco PPDIOO Network Development Life Cycle (NDLC) development method. Keywords: Proxy server, Caching, Cache Replacement, LRU, GDSF, Cisco PPDIOO.
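The sketch below contrasts the two policies at the level of their eviction keys: LRU simply evicts the least recently referenced object, while GDSF assigns each object a priority H(p) = L + frequency(p) * cost(p) / size(p), where L is an aging value set to the priority of the last evicted object. The Python code implements only the GDSF side, with cost assumed uniform; the sizes and capacity in the example are illustrative, not taken from the paper.

class GDSFCache:
    # Objects with low frequency and large size get the lowest priority H and
    # are evicted first; the aging value L keeps formerly popular objects from
    # occupying the cache forever.

    def __init__(self, capacity_bytes):
        self.capacity, self.used, self.L = capacity_bytes, 0, 0.0
        self.entries = {}                    # url -> [H, frequency, size]

    def access(self, url, size, cost=1.0):
        if url in self.entries:
            entry = self.entries[url]
            entry[1] += 1                    # hit: bump frequency
            entry[0] = self.L + entry[1] * cost / entry[2]
            return "hit"
        # Miss: evict lowest-priority objects until the new one fits.
        while self.entries and self.used + size > self.capacity:
            victim = min(self.entries, key=lambda u: self.entries[u][0])
            self.L = self.entries[victim][0]     # aging: inflate future priorities
            self.used -= self.entries[victim][2]
            del self.entries[victim]
        self.entries[url] = [self.L + 1 * cost / size, 1, size]
        self.used += size
        return "miss"

cache = GDSFCache(capacity_bytes=100)
print(cache.access("/logo.png", size=40))    # miss
print(cache.access("/logo.png", size=40))    # hit
print(cache.access("/video.mp4", size=90))   # miss, evicts the lower-priority object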
APA, Harvard, Vancouver, ISO, and other styles
44

Wijaya, Marvin Chandra. "Improving Cache Hits On Replacment Blocks Using Weighted LRU-LFU Combinations." International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 10 (November 2, 2023): 1542–50. http://dx.doi.org/10.17762/ijritcc.v11i10.8706.

Full text
Abstract:
Block replacement refers to the process of selecting a block of data or a cache line to be evicted or replaced when a new block needs to be brought into a cache or a memory hierarchy. In computer systems, block replacement policies are used in caching mechanisms, such as in CPU caches or disk caches, to determine which blocks are evicted when the cache is full and new data needs to be fetched. The combination of LRU (Least Recently Used) and LFU (Least Frequently Used) in a weighted manner is known as the "LFU2" algorithm. LFU2 is an enhanced caching algorithm that aims to leverage the benefits of both LRU and LFU by considering both recency and frequency of item access. In LFU2, each item in the cache is associated with two counters: the usage counter and the recency counter. The usage counter tracks the frequency of item access, while the recency counter tracks the recency of item access. These counters are used to calculate a combined weight for each item in the cache. Based on the experimental results, the weighted LRU-LFU combination increased the cache hit rate from 94.8% and 95.5% under the individual baseline policies (LRU and LFU) to 96.6%.
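A minimal sketch of such a weighted recency-plus-frequency eviction rule is shown below. The 0.5/0.5 weights, the unnormalized combination of a clock tick with a use count, and the class interface are illustrative assumptions rather than the paper's exact formulation.

import itertools

class WeightedLruLfuCache:
    def __init__(self, capacity, w_recency=0.5, w_frequency=0.5):
        self.capacity = capacity
        self.w_r, self.w_f = w_recency, w_frequency
        self.clock = itertools.count(1)
        self.blocks = {}                 # key -> [last_access_tick, use_count, value]

    def _weight(self, key):
        last, count, _ = self.blocks[key]
        # Higher weight = more valuable to keep. A real implementation would
        # normalize the two terms; here they are combined directly for brevity.
        return self.w_r * last + self.w_f * count

    def get(self, key):
        if key not in self.blocks:
            return None                  # miss
        entry = self.blocks[key]
        entry[0] = next(self.clock)      # refresh the recency counter
        entry[1] += 1                    # bump the usage counter
        return entry[2]

    def put(self, key, value):
        if key not in self.blocks and len(self.blocks) >= self.capacity:
            victim = min(self.blocks, key=self._weight)   # evict lowest combined weight
            del self.blocks[victim]
        self.blocks[key] = [next(self.clock), 1, value]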
APA, Harvard, Vancouver, ISO, and other styles
45

LEE, JE-HOON, and HYUN GUG CHO. "ASYNCHRONOUS INSTRUCTION CACHE MEMORY FOR AVERAGE-CASE PERFORMANCE." Journal of Circuits, Systems and Computers 23, no. 05 (May 8, 2014): 1450063. http://dx.doi.org/10.1142/s0218126614500637.

Full text
Abstract:
This paper presents an asynchronous instruction cache memory for average-case performance, rather than worst-case performance. Even though the proposed instruction cache design is based on a fixed delay model, it can achieve high throughput by employing a new memory segmentation technique that divides cache memory cell arrays into multiple memory segments. The conventional bit-line memory segmentation divides a whole memory system into multiple segments so that all memory segments have the same size. On the contrary, we propose a new bit-line segmentation technique for the cache memory which consists of multiple segments, but all the memory segments have the same delay bound. We use resistor-capacitor (R-C) modeling of the bit-line delay for the content addressable memory–random access memory (CAM–RAM) structure in a cache in order to estimate the total bit-line delay. Then, we decide the number of segments to trade off between the throughput and complexity of a cache system. We synthesized a 128 KB cache memory consisting of various segments from 1 to 16 using the Hynix 0.35-μm CMOS process. From the simulation results, our implementation with dividing factor 4 and 16 can reduce the average cache access time to 28% and 35% when compared to the non-segmented counterpart system. It also shows that our implementation can reduce the average cache access time by 11% and 17% when compared to the bit-line segmented cache that consists of the same number of segments that have the same size.
APA, Harvard, Vancouver, ISO, and other styles
46

Charrier, Dominic E., Benjamin Hazelwood, Ekaterina Tutlyaeva, Michael Bader, Michael Dumbser, Andrey Kudryavtsev, Alexander Moskovsky, and Tobias Weinzierl. "Studies on the energy and deep memory behaviour of a cache-oblivious, task-based hyperbolic PDE solver." International Journal of High Performance Computing Applications 33, no. 5 (April 15, 2019): 973–86. http://dx.doi.org/10.1177/1094342019842645.

Full text
Abstract:
We study the performance behaviour of a seismic simulation using the ExaHyPE engine with a specific focus on memory characteristics and energy needs. ExaHyPE combines dynamically adaptive mesh refinement (AMR) with ADER-DG. It is parallelized using tasks, and it is cache efficient. AMR plus ADER-DG yields a task graph which is highly dynamic in nature and comprises both arithmetically expensive tasks and tasks which challenge the memory’s latency. The expensive tasks and thus the whole code benefit from AVX vectorization, although we suffer from memory access bursts. A frequency reduction of the chip improves the code’s energy-to-solution. Yet, it does not mitigate burst effects. The bursts’ latency penalty becomes worse once we add Intel Optane technology, increase the core count significantly, or make individual, computationally heavy tasks fall out of close caches. Thread overbooking to hide away these latency penalties becomes counterproductive with noninclusive caches, as it destroys the cache and vectorization character. In cases where memory-intense and computationally expensive tasks overlap, ExaHyPE’s cache-oblivious implementation nevertheless can exploit deep, noninclusive, heterogeneous memory effectively, as main memory misses arise infrequently and slow down only a few cores. We thus propose that upcoming supercomputing simulation codes with dynamic, inhomogeneous task graphs be actively supported by thread runtimes in intermixing tasks of different compute character, and we propose that future hardware actively allow codes to downclock the cores running particular task types.
APA, Harvard, Vancouver, ISO, and other styles
47

Gozlan, Itamar, Chen Avin, Gil Einziger, and Gabriel Scalosub. "Go-to-Controller is Better: Efficient and Optimal LPM Caching with Splicing." ACM SIGMETRICS Performance Evaluation Review 51, no. 1 (June 26, 2023): 15–16. http://dx.doi.org/10.1145/3606376.3593546.

Full text
Abstract:
Data center networks must support huge forwarding policies as they handle the traffic of the various tenants. Since such policies cannot be stored within the limited memory available at commodity switches, SDN controllers can manage the memory available at the switch as a cache, updating and changing the forwarding rules in the cache according to the policy and workloads dynamics. Most policies, such as Longest-prefix-match (LPM) policies, include dependencies between the forwarding rules, which introduce consistency constraints on the structure of the cached content, affecting the performance in terms of throughput and delay. Previous work suggested the concept of splicing to address such deficiencies, where modified Go-to-Controller rules can be inserted into the cache to improve performance while maintaining consistency. We present the first optimal algorithm for determining the cache content with splicing, as well as several efficient heuristics with some performance guarantees. We evaluate our solutions using traces derived from real systems and traffic, and show that splicing can reduce the cache miss ratio by as much as 30%, without increasing the cache size. We further propose a new metric which can provide a quick estimate as to the potential benefits of splicing compared to classical LPM-caching. The full version of our work appeared in [2].
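The toy Python sketch below illustrates the consistency constraint and the splicing idea on a three-rule LPM policy: caching a short prefix is only safe if its more specific overlapping prefixes are also handled, so go-to-controller placeholders are inserted for the ones that are not cached. It is a simplified illustration under assumed data structures, not the optimal algorithm from the paper.

import ipaddress

def covers(short, long):
    # True if prefix `long` is more specific and nested inside `short`.
    s, l = ipaddress.ip_network(short), ipaddress.ip_network(long)
    return l.prefixlen > s.prefixlen and l.subnet_of(s)

def cache_with_splicing(rule, policy, cache):
    # Cache `rule`; every dependent longer prefix that is not already cached
    # gets a spliced go-to-controller placeholder instead of its full action,
    # so lookups for those prefixes are punted to the controller, not misrouted.
    cache[rule] = policy[rule]
    for other in policy:
        if covers(rule, other) and other not in cache:
            cache[other] = "GO_TO_CONTROLLER"

policy = {"10.0.0.0/8": "port1", "10.1.0.0/16": "port2", "10.1.2.0/24": "port3"}
cache = {}
cache_with_splicing("10.0.0.0/8", policy, cache)
print(cache)   # the /8 rule plus go-to-controller entries for the /16 and /24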
APA, Harvard, Vancouver, ISO, and other styles
48

Carter, John B., Wilson C. Hsieh, Leigh B. Stoller, Mark Swanson, Lixin Zhang, and Sally A. McKee. "Impulse: Memory System Support for Scientific Applications." Scientific Programming 7, no. 3-4 (1999): 195–209. http://dx.doi.org/10.1155/1999/209416.

Full text
Abstract:
Impulse is a new memory system architecture that adds two important features to a traditional memory controller. First, Impulse supports application‐specific optimizations through configurable physical address remapping. By remapping physical addresses, applications control how their data is accessed and cached, improving their cache and bus utilization. Second, Impulse supports prefetching at the memory controller, which can hide much of the latency of DRAM accesses. Because it requires no modification to processor, cache, or bus designs, Impulse can be adopted in conventional systems. In this paper we describe the design of the Impulse architecture, and show how an Impulse memory system can improve the performance of memory‐bound scientific applications. For instance, Impulse decreases the running time of the NAS conjugate gradient benchmark by 67%. We expect that Impulse will also benefit regularly strided, memory‐bound applications of commercial importance, such as database and multimedia programs.
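A rough illustration of the remapping idea (not Impulse's actual interface) is the following Python sketch, which exposes one column of a row-major matrix through a dense "shadow" address range so that a linear walk over the shadow range gathers elements that are physically strided. The addresses and matrix dimensions are made up for the example.

def make_column_remap(base_addr, n_rows, n_cols, elem_size, col):
    # Map a dense shadow offset to the physical address of the col-th element
    # of each row in a row-major matrix, so the column appears contiguous.
    def shadow_to_physical(shadow_offset):
        row = shadow_offset // elem_size
        assert row < n_rows, "shadow offset outside the remapped column"
        return base_addr + (row * n_cols + col) * elem_size
    return shadow_to_physical

# A 1024x1024 matrix of 8-byte elements: consecutive shadow offsets map to
# physical addresses 8 KiB apart, but the processor sees a dense stream
# because the controller gathers them into contiguous cache lines.
remap = make_column_remap(base_addr=0x10000000, n_rows=1024, n_cols=1024,
                          elem_size=8, col=5)
print(hex(remap(0)), hex(remap(8)))   # 0x10000028 0x10002028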
APA, Harvard, Vancouver, ISO, and other styles
49

Kim, Beomjun, Yongtae Kim, Prashant Nair, and Seokin Hong. "Exploiting Data Compression for Adaptive Block Placement in Hybrid Caches." Electronics 11, no. 2 (January 12, 2022): 240. http://dx.doi.org/10.3390/electronics11020240.

Full text
Abstract:
STT-RAM (Spin-Transfer Torque Random Access Memory) appears to be a viable alternative to SRAM-based on-chip caches. Due to its high density and low leakage power, STT-RAM can be used to build massive capacity last-level caches (LLC). Unfortunately, STT-RAM has a much longer write latency and a much greater write energy than SRAM. Researchers developed hybrid caches made up of SRAM and STT-RAM regions to cope with these challenges. In order to store as many write-intensive blocks in the SRAM region as possible in hybrid caches, an intelligent block placement policy is essential. This paper proposes an adaptive block placement framework for hybrid caches that incorporates metadata embedding (ADAM). When a cache block is evicted from the LLC, ADAM embeds metadata (i.e., write intensity) into the block. Metadata embedded in the cache block are then extracted and used to determine the block’s write intensity when it is fetched from main memory. Our research demonstrates that ADAM can enhance performance by 26% (on average) when compared to a baseline block placement scheme.
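The sketch below illustrates the general idea of carrying a write-intensity counter with an evicted block and using it on the next fetch to choose the cache region. The counter width, threshold, and bit-packing scheme are assumptions for illustration and do not reproduce ADAM's actual metadata encoding.

WRITE_COUNTER_BITS = 3
WRITE_COUNTER_MAX = (1 << WRITE_COUNTER_BITS) - 1
SRAM_THRESHOLD = 2            # blocks written at least this often go to SRAM

def embed_metadata(block_bits, write_count):
    # Pack a saturating write counter alongside the block when it is evicted.
    count = min(write_count, WRITE_COUNTER_MAX)
    return (block_bits << WRITE_COUNTER_BITS) | count

def extract_metadata(stored_bits):
    # Recover the block and its write counter when the block is fetched again.
    return stored_bits >> WRITE_COUNTER_BITS, stored_bits & WRITE_COUNTER_MAX

def choose_region(write_count):
    # Write-intensive blocks go to the small SRAM region, the rest to STT-RAM.
    return "SRAM" if write_count >= SRAM_THRESHOLD else "STT-RAM"

stored = embed_metadata(0xABCDEF, write_count=5)
block, count = extract_metadata(stored)
assert block == 0xABCDEF and count == 5
print(choose_region(count))   # SRAM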
APA, Harvard, Vancouver, ISO, and other styles
50

Wu, Lan, and Wei Zhang. "Cache-Aware SPM Allocation to Reduce Worst-Case Execution Time for Hybrid SPM-Caches." Journal of Circuits, Systems and Computers 27, no. 05 (February 6, 2018): 1850080. http://dx.doi.org/10.1142/s0218126618500809.

Full text
Abstract:
Scratch-Pad Memories (SPMs) have been increasingly used in real-time and embedded systems. However, it remains an open and challenging problem to reduce the worst-case execution time (WCET) for the hybrid SPM-cache architecture, where an SPM and a cache memory are placed on-chip in parallel to cooperatively improve performance and/or energy efficiency. In this paper, we study four SPM allocation strategies of different complexities to reduce the WCET for hybrid SPM-caches. These strategies differ in whether they cooperate with the cache and whether they are aware of the WCET. Our evaluation shows that cache-aware, WCET-oriented SPM allocation can minimize the WCET for real-time benchmarks with little or even positive impact on the average-case execution time (ACET).
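As an illustration of what a cache-aware, WCET-oriented allocation might look like, the Python sketch below greedily places data objects into the SPM by estimated WCET gain per byte, discounting accesses that a cache analysis already classifies as guaranteed hits. All field names, the miss penalty, and the greedy heuristic are assumptions for illustration, not the paper's algorithms.

from dataclasses import dataclass

@dataclass
class DataObject:
    name: str
    size: int                 # bytes
    wcet_accesses: int        # accesses on the worst-case path
    guaranteed_hits: int      # of those, statically classified cache hits
    miss_penalty: int = 100   # cycles saved per access moved into the SPM (assumed)

    def wcet_gain(self) -> int:
        # Accesses already guaranteed to hit in the cache gain nothing from the SPM.
        return (self.wcet_accesses - self.guaranteed_hits) * self.miss_penalty

def allocate_spm(objects, spm_size):
    # Greedy: fill the SPM with the objects giving the most WCET gain per byte.
    chosen, remaining = [], spm_size
    for obj in sorted(objects, key=lambda o: o.wcet_gain() / o.size, reverse=True):
        if obj.size <= remaining and obj.wcet_gain() > 0:
            chosen.append(obj.name)
            remaining -= obj.size
    return chosen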
APA, Harvard, Vancouver, ISO, and other styles
