Dissertations / Theses on the topic 'Cache memory'

To see the other types of publications on this topic, follow the link: Cache memory.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Cache memory.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Sehat, Kamiar. "Evaluation of caches and cache coherency." Thesis, University of Cambridge, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.335240.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Brewer, Jeffery R. "Reconfigurable cache memory /." Available to subscribers only, 2009. http://proquest.umi.com/pqdweb?did=1885437651&sid=8&Fmt=2&clientId=1509&RQT=309&VName=PQD.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Brewer, Jeffery Ramon. "Reconfigurable Cache Memory." OpenSIUC, 2009. https://opensiuc.lib.siu.edu/theses/48.

Full text
Abstract:
AN ABSTRACT OF THE THESIS OF Jeffery R. Brewer, for the Master's degree in Electrical and Computer Engineering, presented on May 22, 2009 at Southern Illinois University Carbondale. TITLE: Reconfigurable Cache Memory MAJOR PROFESSOR: Dr. Nazeih Botros As chip designers continue to push the performance of microprocessors to higher levels, the energy demand grows. The need for integrated chips that provide energy savings without degrading performance is paramount. The cache memory typically occupies over fifty percent of the area of today's microprocessor chip and consumes a significant percentage of the total power. Therefore, by designing a reconfigurable cache that is able to dynamically adjust to a smaller cache size without a significant degradation in performance, we are able to realize power savings. Tournament caching is a reconfiguration method that tracks the current performance of the cache and compares it to that of possible smaller or larger cache sizes [1]. The results in this thesis show that a reconfigurable cache memory implemented with a configuration mechanism like Tournament caching would take advantage of associativity and cache size while providing energy conservation.
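A rough sketch of the tournament idea, assuming a shadow miss counter for a candidate smaller configuration; the class name, window length and miss budget below are illustrative, not values from the thesis:

```python
# Tournament-style resizing sketch: count misses for the current cache size
# and for a (simulated) smaller size over a window of accesses, then shrink
# only if the smaller configuration loses at most a small miss-ratio budget.

class TournamentSizer:
    def __init__(self, miss_budget=0.01, window=100_000):
        self.miss_budget = miss_budget    # tolerated extra miss ratio
        self.window = window              # accesses per tournament round
        self.accesses = 0
        self.misses_current = 0           # misses at the current size
        self.misses_smaller = 0           # misses a half-sized cache would take

    def record(self, hit_current, hit_smaller):
        """Feed one access; returns True when a shrink decision is made."""
        self.accesses += 1
        self.misses_current += not hit_current
        self.misses_smaller += not hit_smaller
        if self.accesses < self.window:
            return False
        extra = (self.misses_smaller - self.misses_current) / self.window
        self.accesses = self.misses_current = self.misses_smaller = 0
        return extra <= self.miss_budget  # shrink: saves power, misses tolerable
```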
APA, Harvard, Vancouver, ISO, and other styles
4

Gieske, Edmund Joseph. "Critical Words Cache Memory." University of Cincinnati / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1208368190.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Huang, Cheng-Chieh. "Optimizing cache utilization in modern cache hierarchies." Thesis, University of Edinburgh, 2016. http://hdl.handle.net/1842/19571.

Full text
Abstract:
The memory wall is one of the major performance bottlenecks in modern computer systems. SRAM caches have been used successfully to bridge the performance gap between the processor and the memory. However, an SRAM cache's latency grows with its size, so simply increasing the size of a cache can hurt performance. To address this, modern processors employ multiple levels of caches, each of a different size, forming the so-called memory hierarchy. Upon a miss, the processor looks up the data from the highest level (the L1 cache) down to the lowest level (main memory). Such a design effectively reduces the negative performance impact of simply using one large cache. However, because SRAM has lower storage density than other volatile storage, the size of an SRAM cache is restricted by the available on-chip area. With modern applications requiring more and more memory, researchers continue to look at techniques for increasing the effective cache capacity. In general, researchers approach this problem from two angles: maximizing the utilization of current SRAM caches, or exploiting new technology to support larger capacities in cache hierarchies. The first part of this thesis focuses on how to maximize the utilization of the existing SRAM cache. In our first work, we observe that not all words belonging to a cache block are accessed around the same time; in fact, a subset of words is consistently accessed sooner than others. We call this subset the critical words. In our study, we found that these critical words can be predicted using the access footprint. Based on this observation, we propose the critical-words-only cache (co-cache). Unlike a conventional cache, which stores all words that belong to a block, the co-cache only stores the words that we predict to be critical. In this work, we convert an L2 cache to a co-cache and use the L1's access footprint information to predict critical words. Our experiments show the co-cache can outperform a conventional L2 cache on workloads whose working-set sizes are greater than the L2 cache size. To handle workloads whose working-set sizes fit in the conventional L2, we propose the adaptive co-cache (aco-cache), which allows the co-cache to be configured back into a conventional cache. The second part of this thesis focuses on how to efficiently enable a large-capacity on-chip cache. In the near future, 3D stacking technology will allow us to stack one or more DRAM chips onto the processor. The total size of these chips is expected to be on the order of hundreds of megabytes or even a few gigabytes. Recent works have proposed using this space as an on-chip DRAM cache. However, the tags of the DRAM cache create a classic space/time trade-off. On the one hand, we would like the latency of a tag access to be small, as it contributes to both hit and miss latencies; accordingly, we would like to store these tags in a faster medium such as SRAM. However, with hundreds of megabytes of die-stacked DRAM cache, the space overhead of the tags would be huge. For example, it would cost around 12 MB of SRAM to store all the tags of a 256 MB DRAM cache (with conventional 64 B blocks). Clearly this is too large, considering that some current chip multiprocessors have an L3 that is smaller. Prior works have proposed storing these tags along with the data in the stacked DRAM array (tags-in-DRAM). However, this scheme increases the access latency of the DRAM cache.
To optimize access latency in the DRAM cache, we propose the aggressive tag cache (ATCache). Similar to a conventional cache, the ATCache caches recently accessed tags to exploit temporal locality; it exploits spatial locality by prefetching tags from nearby cache sets. In addition, we address the high miss latency and cache pollution caused by excessive prefetching. To reduce this overhead, we propose cost-effective prefetching, a combination of dynamic prefetching-granularity tuning and hit-prefetching, to throttle the number of sets prefetched. Our proposed ATCache (which consumes 0.4% of the overall tag size) can satisfy over 60% of DRAM cache tag accesses on average. The last work proposed in this thesis is a DRAM-Cache-Aware (DCA) DRAM controller. In this work, we first address the challenge of scheduling requests in the DRAM cache. While many recent DRAM cache works build on a tags-in-DRAM scheme, storing these tags in the DRAM array increases the complexity of a DRAM cache request: in contrast to a conventional request to DRAM main memory, a request to the DRAM cache now translates into multiple DRAM cache accesses (tag and data). In this work, we address the challenge of scheduling these DRAM cache accesses. We start by exploring whether a conventional DRAM controller works well in this scenario; we introduce two potential designs and study their limitations. From this study, we derive a set of design principles that an ideal DRAM cache controller must satisfy. We then propose a DRAM-cache-aware (DCA) DRAM controller based on these principles. Our experimental results show that DCA can outperform the baseline by over 14%.
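The 12 MB figure above is easy to reproduce with back-of-the-envelope arithmetic; the 3-bytes-per-block tag entry assumed below (roughly 24 bits of tag plus state) is an assumption chosen to be consistent with the quoted numbers, not a value from the thesis:

```python
# Tag-storage arithmetic for a die-stacked DRAM cache with SRAM tags.
dram_cache = 256 * 2**20       # 256 MB DRAM cache
block = 64                     # conventional 64 B blocks
blocks = dram_cache // block   # ~4 million blocks
tag_bytes = 3                  # ~24 bits of tag + state per block (assumed)
print(blocks * tag_bytes / 2**20, "MB of SRAM for tags")  # -> 12.0
```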
APA, Harvard, Vancouver, ISO, and other styles
6

GIESKE, EDMUND J. "B+ TREE CACHE MEMORY PERFORMANCE." University of Cincinnati / OhioLINK, 2004. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1092344402.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Van, Vleet Taylor. "Dynamic cache-line sizes /." Thesis, Connect to this title online; UW restricted, 2000. http://hdl.handle.net/1773/6899.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Srinivasan, James Richard. "Improving cache utilisation." Thesis, University of Cambridge, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.609508.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Ramaswamy, Subramanian. "Active management of Cache resources." Diss., Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/24663.

Full text
Abstract:
This dissertation addresses two sets of challenges facing processor design as the industry enters the deep sub-micron region of semiconductor design. The first set of challenges relates to the memory bottleneck. As the focus shifts from scaling processor frequency to scaling the number of cores, performance growth demands increasing die area, and scaling the number of cores places a concurrent area demand in the form of larger caches. While on-chip caches occupy 50-60% of chip area and consume 20-30% of the energy expended on-chip, their performance and energy efficiencies are less than 15% and 1% respectively for a range of benchmarks. The second set of challenges is posed by transistor leakage and process variation (inter-die and intra-die) at future technology nodes. Leakage power is anticipated to increase exponentially and to sharply lower defect-free yield with successive technology generations. For performance scaling to continue, cache efficiencies have to improve significantly. This thesis proposes and evaluates a broad family of such improvements. The dissertation first contributes a model for cache efficiencies and finds them to be extremely low: performance efficiencies below 15% and energy efficiencies on the order of 1%. Studying the sources of inefficiency leads to a framework for efficiency improvement based on two interrelated strategies. The approach for improving energy efficiency primarily relies on sizing the cache to match the application memory footprint during a program phase while powering down all remaining cache sets; importantly, the sized cache is fully functional, with no references to inactive sets. Improving performance efficiency primarily relies on cache shaping, i.e., changing the placement function and thereby the manner in which memory shares the cache. Sizing and shaping are applied at different phases of the design cycle: i) post-manufacturing and offline, ii) at compile time, and iii) at run time. This thesis proposes and explores techniques at each phase, collectively realizing a repertoire of techniques for future memory system designers. The techniques combine hardware and software mechanisms and are demonstrated to provide substantive improvements with modest overheads.
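A minimal sketch of the sizing strategy, under the assumption that the number of active sets is kept a power of two so the index function can simply be masked; all names are illustrative:

```python
# Cache "sizing" sketch: power on only enough sets to cover the phase
# footprint, and restrict the set-index function so that references never
# map to an inactive set (the powered-down sets are simply never indexed).

def active_sets(footprint_blocks, ways, total_sets):
    """Smallest power-of-two set count covering the phase footprint."""
    need = max(1, footprint_blocks // ways)
    sets = 1
    while sets < need and sets < total_sets:
        sets *= 2
    return sets

def set_index(block_addr, sets_on):
    # Masking with (sets_on - 1) keeps all references inside active sets.
    return block_addr & (sets_on - 1)

sets_on = active_sets(footprint_blocks=6000, ways=8, total_sets=4096)
print(sets_on, set_index(0x12345, sets_on))  # 1024 active sets
```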
APA, Harvard, Vancouver, ISO, and other styles
10

Kumar, Krishna. "Visible synchronization-based cache coherence." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape11/PQDD_0001/MQ44885.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Fix, James D. "Cache performance analysis of algorithms /." Thesis, Connect to this title online; UW restricted, 2002. http://hdl.handle.net/1773/6880.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Jakšić, Zoran. "Cache memory design in the FinFET era." Doctoral thesis, Universitat Politècnica de Catalunya, 2015. http://hdl.handle.net/10803/316394.

Full text
Abstract:
The major problem in future technology scaling is the variation in process parameters, interpreted as imperfections in the fabrication process. Moreover, devices are more sensitive to environmental changes of temperature and supply voltage, as well as to ageing. All these influences manifest in integrated circuits as increased power consumption, reduced maximum operating frequency and an increased number of failures. These effects have been partially overcome with the introduction of FinFET technology, which has solved the problem of variability caused by Random Dopant Fluctuations. However, in the next ten years the channel length is projected to shrink to 10nm, where the variability source generated by Line Edge Roughness will dominate and its effects on threshold voltage variations will become critical. Embedded memories, with their cells as the basic building unit, are the most prone to these effects because of their very small dimensions. Because of that, memories should be designed with particular care in order to make further technology scaling possible. This thesis explores upcoming 10nm FinFETs and the existing issues in cache memory design with this technology. Moreover, it presents some original and novel techniques at different levels of design abstraction for mitigating the effects of process and environmental variability. First, an original method for simulating the variability of Tri-Gate FinFETs is presented, using a conventional HSPICE simulation environment and BSIM-CMG model cards. With that accomplished, a thorough characterisation of traditional SRAM cell circuits (6T and 8T) is performed. The possibility of using Independent-Gate FinFETs for increasing cell stability is also explored. Gain cells have appeared in the recent past as an attractive alternative for cache memory design; this thesis explores this idea by presenting a detailed circuit analysis of the dynamic 3T gain cell for 10nm FinFETs. Finally, the thesis presents a micro-architectural optimisation of a high-speed cache implemented with 3T gain cells, showing how the cache coherency states can be used to reduce the refresh energy of the memory as well as memory ageing.
APA, Harvard, Vancouver, ISO, and other styles
13

He, Bingsheng. "Cache-oblivious query processing /." View abstract or full-text, 2008. http://library.ust.hk/cgi/db/thesis.pl?CSED%202008%20HE.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Page, Daniel Stephen. "Effective use of partitioned cache memories." Thesis, University of Bristol, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.369524.

Full text
APA, Harvard, Vancouver, ISO, and other styles
15

Naz, Afrin Kavi Krishna M. "Split array and scalar data caches a comprehensive study of data cache organization /." [Denton, Tex.] : University of North Texas, 2007. http://digital.library.unt.edu/permalink/meta-dc-3932.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Ramesh, Bharath. "Samhita: Virtual Shared Memory for Non-Cache-Coherent Systems." Diss., Virginia Tech, 2013. http://hdl.handle.net/10919/23687.

Full text
Abstract:
Among the key challenges of computing today are the emergence of many-core architectures and the resulting need to effectively exploit explicit parallelism. Indeed, programmers are striving to exploit parallelism across virtually all platforms and application domains. The shared memory programming model effectively addresses the parallelism needs of mainstream computing (e.g., portable devices, laptops, desktops, servers), giving rise to a growing ecosystem of shared memory parallel techniques, tools, and design practices. However, to meet the extreme demands for processing and memory of critical problem domains, including scientific computation and data intensive computing, computing researchers continue to innovate in the high-end distributed memory architecture space to create cost-effective and scalable solutions. The emerging distributed memory architectures are both highly parallel and increasingly heterogeneous. As a result, they do not present the programmer with a cache-coherent view of shared memory, either across the entire system or even at the level of an individual node. Furthermore, it remains an open research question which programming model is best for the heterogeneous platforms that feature multiple traditional processors along with accelerators or co-processors. Hence, we have two contradicting trends. On the one hand, programming convenience and the presence of shared memory call for a shared memory programming model across the entire heterogeneous system. On the other hand, increasingly parallel and heterogeneous nodes lacking cache-coherent shared memory call for a message passing model. In this dissertation, we present the architecture of Samhita, a distributed shared memory (DSM) system that addresses the challenge of providing shared memory for non-cache-coherent systems. We define regional consistency (RegC), the memory consistency model implemented by Samhita. We present performance results for Samhita on several computational kernels and benchmarks, on both cluster supercomputers and heterogeneous systems. The results demonstrate the promising potential of Samhita and the RegC model, and include the largest scale evaluation by a significant margin for any DSM system reported to date.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
17

Leung, Shun-Tak Albert. "Array restructuring for cache locality /." Thesis, Connect to this title online; UW restricted, 1996. http://hdl.handle.net/1773/7023.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Pendyala, Ragini. "Cache memory design with embedded LRU replacement policy /." Available to subscribers only, 2006. http://proquest.umi.com/pqdweb?did=1240704191&sid=10&Fmt=2&clientId=1509&RQT=309&VName=PQD.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Archibald, James K. "The cache coherence problem in shared-memory multiprocessors /." Thesis, Connect to this title online; UW restricted, 1987. http://hdl.handle.net/1773/6955.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Sebek, Filip. "Instruction cache memory : issues in real-time systems /." Västerås : Mälardalen University, 2002. http://www.mrtc.mdh.se/publications/0433.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Berg, Stefan Georg. "A cache-based prefetching memory system for mediaprocessors /." Thesis, Connect to this title online; UW restricted, 2002. http://hdl.handle.net/1773/6877.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Ashenden, Peter J. "An experimental system for evaluating cache coherence protocols in shared memory multiprocessors /." Title page, contents and abstract only, 1997. http://web4.library.adelaide.edu.au/theses/09PH/09pha824.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Xiang, Ping. "ANALYZING INSTRUCTION BASED CACHE REPLACEMENT POLICIES." Master's thesis, University of Central Florida, 2010. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/2589.

Full text
Abstract:
The increasing speed gap between microprocessors and off-chip DRAM makes last-level caches (LLCs) a critical component for computer performance. Multi-core processors aggravate the problem, since multiple processor cores compete for the LLC. As a result, LLCs typically consume a significant amount of the die area, and effective utilization of LLCs is mandatory for both performance and power efficiency. We present a novel replacement policy for LLCs. The fundamental observation is to view LLCs as a shared resource among multiple address streams, with each stream generated by a static memory access instruction. The management of LLCs in both single-core and multi-core processors can then be modeled as a competition among multiple instructions. In our proposed scheme, we prioritize those instructions based on the number of LLC accesses and reuses, and only allow cache lines with high instruction priorities to replace those of low priorities. The hardware support for our proposed replacement policy is lightweight. Our experimental results, based on a set of SPEC 2006 benchmarks, show that it achieves significant performance improvement over the least-recently-used (LRU) replacement policy for benchmarks with high numbers of LLC misses. To handle LRU-friendly workloads, the set sampling technique is adopted to retain the benefits of the LRU replacement policy.
M.S.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Computer Engineering MSCpE
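A toy rendering of the priority idea from the abstract above; the exact ranking formula here is an assumption for illustration (the thesis ranks instructions by LLC accesses and reuses, but its precise mechanism differs):

```python
# Instruction-based replacement sketch: track LLC accesses and reuses per
# static memory instruction (PC); an incoming line may evict the LRU victim
# only if its PC has at least the victim PC's priority.

from collections import defaultdict

accesses = defaultdict(int)   # LLC accesses per instruction PC
reuses = defaultdict(int)     # LLC hits (reuses) per instruction PC

def record_access(pc, hit):
    accesses[pc] += 1
    reuses[pc] += hit

def priority(pc):
    # One plausible ranking: reuses per access.
    return reuses[pc] / accesses[pc] if accesses[pc] else 0.0

def should_replace(incoming_pc, victim_pc):
    return priority(incoming_pc) >= priority(victim_pc)
```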
APA, Harvard, Vancouver, ISO, and other styles
24

Rasquinha, Mitchelle. "An energy efficient cache design using spin torque transfer (STT) RAM." Thesis, Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/42715.

Full text
Abstract:
The advent of many-core architectures has coincided with the energy- and power-limited design of modern processors. Projections for main memory clearly show a widening of the processor-memory gap. Increasing cache capacity to help reduce this gap will lead to increased energy and area usage and, due to small growth in die size, impede the performance scaling that has accompanied Moore's Law to date. Among the dominant sources of energy consumption is the on-chip memory hierarchy, specifically the L2 cache and the Last Level Cache (LLC). This work explores the use of a novel non-volatile memory technology, Spin Torque Transfer RAM (STT RAM), for the design of the L2/LLC caches. While STT RAM is a promising memory technology, it has some limitations, particularly in terms of write energy and write latencies. The main objective of this thesis is to use a novel cell design for a non-volatile 1T1MTJ cell and demonstrate its use at the L2 and LLC cache levels, with architectural optimizations to maximize energy reduction. The proposed cache hierarchy dissipates significantly less energy (both leakage and dynamic) and uses less area in comparison to conventional SRAM-based cache designs.
APA, Harvard, Vancouver, ISO, and other styles
25

Harper, John Stuart. "Analytic cache modelling of numerical programs." Thesis, University of Warwick, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.343887.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Zhao, DongYan. "Using processor cache memory to speed up spatial operations." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2001. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/MQ56384.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Schwalb, David, Jens Krüger, and Hasso Plattner. "Cache conscious column organization in in-memory column stores." Universität Potsdam, 2013. http://opus.kobv.de/ubp/volltexte/2013/6389/.

Full text
Abstract:
Cost models are an essential part of database systems, as they are the basis of query performance optimization. Based on predictions made by cost models, the fastest query execution plan can be chosen and executed, or algorithms can be tuned and optimised. In-memory databases shift the focus from disk to main memory accesses and CPU costs, compared to disk-based systems where input and output costs dominate the overall costs and other processing costs are often neglected. However, modelling memory accesses is fundamentally different, and common models no longer apply. This work presents a detailed parameter evaluation for the plan operators scan with equality selection, scan with range selection, positional lookup and insert in in-memory column stores. Based on this evaluation, a cost model based on cache misses is developed for estimating the runtime of the considered plan operators using different data structures. Considered are uncompressed columns, bit-compressed and dictionary-encoded columns with sorted and unsorted dictionaries. Furthermore, tree indices on the columns and dictionaries are discussed. Finally, partitioned columns consisting of one partition with a sorted and one with an unsorted dictionary are investigated. New values are inserted in the unsorted dictionary partition and moved periodically by a merge process to the sorted partition. An efficient attribute-merge algorithm is described, supporting the update performance required to run enterprise applications on read-optimised databases. Further, a memory-traffic-based cost model for the merge process is provided.
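In the spirit of the cache-miss-based cost model described above, a sequential scan over an uncompressed column can be costed by the number of cache lines it touches; the parameter values below are illustrative, not the paper's calibrated model:

```python
# Minimal cache-miss cost estimate for a full-column scan with equality
# selection: rows * value_bytes of data touched sequentially, one potential
# miss per cache line.

import math

def scan_cost_ns(rows, value_bytes, line_bytes=64, miss_ns=100):
    misses = math.ceil(rows * value_bytes / line_bytes)
    return misses * miss_ns

print(scan_cost_ns(rows=10_000_000, value_bytes=4))  # 62,500,000 ns
```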
APA, Harvard, Vancouver, ISO, and other styles
28

Koneru, Venkata Raja Ramchandar. "Fault Insertion and Fault Analysis of Neural Cache Memory." University of Cincinnati / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1592171695469746.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Romer, Theodore H. "Using virtual memory to improve cache and TLB performance /." Thesis, Connect to this title online; UW restricted, 1998. http://hdl.handle.net/1773/6913.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Elver, Marco Iskender. "Memory consistency directed cache coherence protocols for scalable multiprocessors." Thesis, University of Edinburgh, 2016. http://hdl.handle.net/1842/22073.

Full text
Abstract:
The memory consistency model, which formally specifies the behavior of the memory system, is used by programmers to reason about parallel programs. From a hardware design perspective, weaker consistency models permit various optimizations in a multiprocessor system: this thesis focuses on designing and optimizing the cache coherence protocol for a given target memory consistency model. Traditional directory coherence protocols are designed to be compatible with the strictest memory consistency model, sequential consistency (SC). When they are used for chip multiprocessors (CMPs) that provide more relaxed memory consistency models, such protocols turn out to be unnecessarily strict. Usually, this comes at the cost of scalability, in terms of per-core storage due to sharer tracking, which poses a problem with the increasing number of cores in today's CMPs, most of which are no longer sequentially consistent. The recent convergence towards programming-language-based relaxed memory consistency models has sparked renewed interest in lazy cache coherence protocols. These protocols exploit synchronization information by enforcing coherence only at synchronization boundaries via self-invalidation. As a result, such protocols do not require sharer tracking, which benefits scalability. On the downside, such protocols are only readily applicable to a restricted set of consistency models, such as Release Consistency (RC), which expose synchronization information explicitly. In particular, existing architectures with stricter consistency models (such as x86) cannot readily make use of lazy coherence protocols without either adapting the protocol to satisfy the stricter consistency model, or changing the architecture's consistency model to (a variant of) RC, typically at the expense of backward compatibility. The first part of this thesis explores both these options, with a focus on a practical approach satisfying backward compatibility. Because of the wide adoption of Total Store Order (TSO) and its variants in x86 and SPARC processors, and the existing parallel programs written for these architectures, we first propose TSO-CC, a lazy cache coherence protocol for the TSO memory consistency model. TSO-CC does not track sharers; instead it relies on self-invalidation and detection of potential acquires (in the absence of explicit synchronization), using per-cache-line timestamps to efficiently and lazily satisfy the TSO memory consistency model. Our results show that TSO-CC achieves, on average, performance comparable to a MESI directory protocol, while TSO-CC's storage overhead per cache line scales logarithmically with increasing core count. Next, we propose an approach for the x86-64 architecture, which is a compromise between retaining the original consistency model and using a more storage-efficient lazy coherence protocol. First, we propose a mechanism to convey synchronization information via a simple ISA extension, while retaining backward compatibility with legacy codes and older microarchitectures. Second, we propose RC3 (based on TSO-CC), a scalable cache coherence protocol for RCtso, the resulting memory consistency model. RC3 does not track sharers and relies on self-invalidation on acquires. To satisfy RCtso efficiently, the protocol reduces self-invalidations transitively using per-L1 timestamps only. RC3 outperforms a conventional lazy RC protocol by 12%, achieving performance comparable to a MESI directory protocol for RC-optimized programs. RC3's storage overhead per cache line scales logarithmically with increasing core count and reduces on-chip coherence storage overheads by 45% compared to TSO-CC.
Finally, it is imperative that hardware adheres to the promised memory consistency model. Indeed, consistency-directed coherence protocols cannot be verified against conventional coherence definitions (e.g. SWMR), and few existing verification methodologies apply. Furthermore, as the full consistency model is used as the specification, the protocol's interaction with other components of the system (e.g. the pipeline) must not be neglected in the verification process. Therefore, verifying a system with such protocols in the context of interacting components is even more important than before. One common way to do this is by executing tests, where specific threads of instruction sequences are generated and their executions are checked for adherence to the consistency model. It would be extremely beneficial to execute such tests under simulation, i.e. while the functional design implementation of the hardware is being prototyped. Most prior verification methodologies, however, target post-silicon environments, which would be too slow when used for simulation-based memory consistency verification. We propose McVerSi, a test generation framework for fast memory consistency verification of a full-system design implementation under simulation. Our primary contribution is a Genetic Programming (GP) based approach to memory consistency test generation, which relies on a novel crossover function that prioritizes memory operations contributing to non-determinism, thereby increasing the probability of uncovering memory consistency bugs. To guide tests towards exercising as much logic as possible, the simulator's reported coverage is used as the fitness function. Furthermore, we increase test throughput by making the test workload simulation-aware. We evaluate our proposed framework using the Gem5 cycle-accurate simulator in full-system mode with Ruby (with configurations that use Gem5's MESI protocol, and our proposed TSO-CC together with an out-of-order pipeline). We discover 2 new bugs in the MESI protocol due to the faulty interaction of the pipeline and the cache coherence protocol, highlighting that even conventional protocols should be verified rigorously in the context of a full system. Crucially, these bugs would not have been discovered through individual verification of the pipeline or the coherence protocol. We study 11 bugs in total. Our GP-based test generation approach finds all bugs consistently, therefore providing much higher guarantees compared to alternative approaches (pseudo-random test generation and litmus tests).
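The self-invalidation principle behind such lazy protocols can be caricatured in a few lines; this toy (with invented names) drops clean data at an acquire and re-fetches lazily, whereas TSO-CC refines the idea with timestamps so that far less data has to be discarded:

```python
# Toy lazy-coherence sketch: no sharer tracking; a core self-invalidates its
# clean cached data at a synchronization boundary and re-fetches on demand.

class LazyL1:
    def __init__(self):
        self.valid = {}     # addr -> data
        self.dirty = set()  # locally written addresses

    def write(self, addr, data):
        self.valid[addr] = data
        self.dirty.add(addr)   # made visible at the next release (omitted)

    def acquire(self):
        # Synchronization boundary: drop possibly-stale clean copies.
        self.valid = {a: d for a, d in self.valid.items() if a in self.dirty}

    def read(self, addr, fetch):
        if addr not in self.valid:
            self.valid[addr] = fetch(addr)  # lazily re-fetch consistent data
        return self.valid[addr]
```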
APA, Harvard, Vancouver, ISO, and other styles
31

Hamelin, Claire. "Couples de spin-orbite en vue d'applications aux mémoires cache." Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAY061/document.

Full text
Abstract:
Replacing the DRAM and SRAM technologies used in cache memories is a challenge for the microelectronics industry, which faces demands for miniaturization and for reducing the amplitude and duration of the currents used to write and read data. Magnetic Random Access Memory (MRAM) is one of the best candidates for the replacement of SRAM and DRAM, and the recent discovery of spin-orbit torques (SOT) has opened the way to a combination of the two technologies called SOT-MRAM. These memories are very promising because they combine non-volatility and good reliability, but many technical and theoretical challenges remain.
This thesis aims at switching magnetization by spin-orbit torque with sub-nanosecond current pulses and at reducing the amplitude of the SOT writing currents. This work is preliminary to a proof of concept of an SOT-MRAM written with ultra-short current pulses of relatively low amplitude. To do so, we studied Ta-CoFeB-MgO-based memory cells, for which we verified the dependence of the critical current on pulse length and external magnetic field. We then demonstrated, on an SOT-MRAM-type cell, ultrafast writing with current pulses as short as 400 ps. Next, we focused on reducing the critical SOT writing current with an electric field, showing that the magnetic anisotropy can be modulated by an electric field. When the anisotropy is lowered while a current pulse is injected through the tantalum track, the critical current density for switching the CoFeB magnetization is reduced. These results are very promising for the development of SOT-MRAM and motivate deeper study. The magnetization switching mechanism appears to be nucleation followed by propagation of magnetic domain walls; this hypothesis is based on physical trends observed in the experiments as well as on numerical simulations.
APA, Harvard, Vancouver, ISO, and other styles
32

Ammari, Rami J. "A study for reducing conflict misses in data cache." Master's thesis, Mississippi State : Mississippi State University, 2004. http://library.msstate.edu/etd/show.asp?etd=etd-04032004-211908.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Sorenson, Elizabeth S. "Cache characterization and performance studies using locality surfaces /." Diss., CLICK HERE for online access, 2005. http://contentdm.lib.byu.edu/ETD/image/etd950.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Furis, Mihai Alexandru Johnson Jeremy. "Cache miss analysis of Walsh-Hadamard Transform algorithms /." Philadelphia : Drexel University, 2003. http://dspace.library.drexel.edu/handle/1721.1/109.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Lodde, Mario. "Smart Memory and Network-On-Chip Design for High-Performance Shared-Memory Chip Multiprocessors." Doctoral thesis, Universitat Politècnica de València, 2014. http://hdl.handle.net/10251/35325.

Full text
Abstract:
The cache hierarchy and the network-on-chip (NoC) are two key components of chip multiprocessors (CMPs). Most of the traffic in the NoC consists of messages that the caches exchange as dictated by the coherence protocol. The amount of traffic, the proportion of short and long messages, and the overall traffic pattern vary depending on the geometry of the caches and on the coherence protocol. The NoC architecture and the cache hierarchy are in fact tightly coupled, and these two components should be designed and evaluated jointly to study how varying one affects the performance of the other. Moreover, each component must adjust to the requirements and opportunities of the other, and vice versa. Different classes of messages are usually sent over different virtual networks or over NoCs with different bandwidth, separating long and short messages. However, messages can also be classified by the kind of information they carry: some messages, such as data requests, need fields to store information (block address, request type, etc.); others, such as acknowledgement messages (ACKs), carry no information except the destination node ID. They only convey temporal information, in the sense that the reception of an ACK indicates that the source node has received the message being acknowledged and completed all the operations required by the coherence protocol. This second class of messages does not need much bandwidth: latency is far more important, since the destination node is typically blocked waiting for them. In this thesis a dedicated network is developed to transmit the second class of messages; the network is very simple and fast, and delivers ACKs with a latency of a few clock cycles. By reducing the latency and the NoC traffic due to ACKs, it is possible to: (i) speed up the invalidation phase of writes in a system using a directory-based coherence protocol; (ii) improve the performance of a broadcast-based coherence protocol, approaching that of a directory protocol but without the area cost of storing the directory; and (iii) efficiently implement dynamic mapping of blocks to the last-level caches, with the goal of placing blocks as close as possible to the cores that use them. The final objective is a co-design of the NoC and the cache hierarchy that minimizes the scalability problems of coherence protocols. The ultimate goal is the implementation of a CMP with dynamic allocation of cache and network resources, such that these resources can be partitioned efficiently and independently in order to assign different partitions to different applications in a virtualized environment.
Lodde, M. (2014). Smart Memory and Network-On-Chip Design for High-Performance Shared-Memory Chip Multiprocessors [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/35325
APA, Harvard, Vancouver, ISO, and other styles
36

Baek, Seungcheol. "High-performance memory system architectures using data compression." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/51863.

Full text
Abstract:
The Chip Multi-Processor (CMP) paradigm has cemented itself as the archetypal philosophy of future microprocessor design. Rapidly diminishing technology feature sizes have enabled the integration of ever-increasing numbers of processing cores on a single chip die. This abundance of processing power has magnified the venerable processor-memory performance gap, known as the "memory wall". To bridge this performance gap, a high-performing memory structure is needed. An attractive solution to overcoming this processor-memory performance gap is using compression in the memory hierarchy. In this thesis, to use compression techniques more efficiently, compressed-cacheline size information is studied, and size-aware cache management techniques and a hot-cacheline prediction technique for dynamic early decompression are proposed. The proposed works in this thesis also attempt to mitigate the limitations of phase change memory (PCM), such as low write performance and limited long-term endurance. One promising solution is the deployment of hybridized memory architectures that fuse dynamic random access memory (DRAM) and PCM, combining the best attributes of each technology by using the DRAM as an off-chip cache. A dual-phase compression technique is proposed for high-performing DRAM/PCM hybrid environments, and a multi-faceted wear-leveling technique is proposed for the long-term endurance of compressed PCM. This thesis also includes a new compression-based hybrid multi-level cell (MLC)/single-level cell (SLC) PCM management technique that aims to combine the performance edge of SLCs with the higher capacity of MLCs in a hybrid environment.
APA, Harvard, Vancouver, ISO, and other styles
37

Nwachukwu, Izuchukwu Udochi. "Techniques for Improving Uniformity in Direct Mapped Caches." Thesis, University of North Texas, 2011. https://digital.library.unt.edu/ark:/67531/metadc68025/.

Full text
Abstract:
Directly-mapped caches are an attractive option for processor designers, as they combine fast lookup times with reduced complexity and area. However, directly-mapped caches are prone to higher miss rates, as there are no candidates for replacement on a cache miss, so data residing in a cache set has to be evicted to the next-level cache. Another issue that inhibits cache performance is the non-uniformity of accesses exhibited by most applications: some sets are under-utilized while others receive the majority of accesses. This implies that increasing the size of caches may not lead to proportionally improved cache hit rates. Several solutions that address cache non-uniformity have been proposed in the literature over the past decade, and each proposal independently claims the benefit of reduced conflict misses. However, because the published results use different benchmarks and different experimental setups, there is no established frame of reference for comparing them. In this work we report a side-by-side comparison of these techniques. Finally, we propose an Adaptive-Partitioned cache for multi-threaded applications. This design limits inter-thread thrashing while dynamically reducing traffic to heavily accessed sets.
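The access non-uniformity at issue is easy to visualize on a trace; the cache geometry and toy trace below are assumptions for illustration:

```python
# Histogram of direct-mapped set indices over an address trace: a heavily
# skewed histogram is exactly the non-uniformity the abstract describes.

from collections import Counter

def set_histogram(addr_trace, sets=1024, line=64):
    return Counter((addr // line) % sets for addr in addr_trace)

trace = [0x1000 + 64 * (i % 10) for i in range(10_000)]  # toy hot-set trace
hist = set_histogram(trace)
print(len(hist), "of 1024 sets used; hottest:", hist.most_common(1))
```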
APA, Harvard, Vancouver, ISO, and other styles
38

Karlsson, Martin. "Cache memory design trade-offs for current and emerging workloads." Licentiate thesis, Uppsala universitet, Avdelningen för datorteknik, 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-86156.

Full text
Abstract:
The memory system is the key to performance in contemporary computer systems. When designing a new memory system, architectural decisions are often arbitrated based on their expected performance effect. It is therefore very important to make performance estimates based on workloads that accurately reflect the future use of the system. This thesis presents the first memory system characterization study of Java-based middleware, an emerging workload likely to be an important design consideration for next-generation processors and servers. Manufacturing technology has reached a point where it is now possible to fit multiple full-scale processors and integrate board-level features on a chip. The raised competition for chip resources has increased the need to design more effective caches without trading off area or power. Two common ways to improve cache performance are to increase the size or the associativity of the cache. Both of these approaches come at a high cost in chip area as well as power. This thesis presents two new cache organizations, each aimed at more efficient use of either power or area. First, the Elbow cache is presented, which is shown to be a power-efficient alternative to highly set-associative caches. Secondly, a selective cache allocation algorithm, RASCAL, is presented that significantly reduces the miss ratio at a limited cost in area.
APA, Harvard, Vancouver, ISO, and other styles
39

Manjikian, Naraig. "Program transformations for cache locality enhancement on shared-memory multiprocessors." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp03/NQ27692.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Vishwasrao, Saket Dilip. "Performance Evaluation of Web Archiving Through In-Memory Page Cache." Thesis, Virginia Tech, 2017. http://hdl.handle.net/10919/78252.

Full text
Abstract:
This study proposes and evaluates a new method for Web archiving. We leverage the caching infrastructure in Web servers for archiving. Redis is used as the page cache and its persistence mechanism is exploited for archiving. We experimentally evaluate the performance of our archival technique using the Greek version of Wikipedia deployed on Amazon cloud infrastructure. We show that there is a slight increase in latencies of the rendered pages due to archiving. Though the server performance is comparable at larger page cache sizes, the maximum throughput the server can handle decreases significantly at lower cache sizes due to more disk write operations as a result of archiving. Since pages are dynamically rendered and the technology stack of Wikipedia is extensively used in a number of Web applications, our results should have broad impact.
Master of Science
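A sketch of the serving path this implies, assuming redis-py and a Redis server configured with persistence (RDB and/or AOF) so that cached pages survive as an archive; the key scheme and render hook are illustrative, not taken from the thesis:

```python
# Serve-from-page-cache with archiving as a side effect: rendered pages are
# stored in Redis, whose persistence mechanism writes them to disk.

import time
import redis  # pip install redis

r = redis.Redis(host="localhost", port=6379)

def serve(url, render):
    page = r.get(url)
    if page is None:
        page = render(url).encode()
        r.set(url, page)                          # page cache entry
        r.set(f"{url}@{int(time.time())}", page)  # timestamped archival copy
    return page
```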
APA, Harvard, Vancouver, ISO, and other styles
41

Kim, Donglok. "Extended data cache prefetching using a reference prediction table /." Thesis, Connect to this title online; UW restricted, 1997. http://hdl.handle.net/1773/6127.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Naz, Afrin. "Split array and scalar data cache: A comprehensive study of data cache organization." Thesis, University of North Texas, 2007. https://digital.library.unt.edu/ark:/67531/metadc3932/.

Full text
Abstract:
Existing cache organizations suffer from the inability to distinguish different types of localities, and non-selectively cache all data rather than taking special advantage of the locality type. This causes unnecessary movement of data among the levels of the memory hierarchy and increases the miss ratio. In this dissertation I propose a split data cache architecture that groups memory accesses as scalar or array references according to their inherent locality and subsequently maps each group to a dedicated cache partition. In this system, because scalar and array references no longer negatively affect each other, cache interference is diminished, delivering better performance. Further improvement is achieved by the introduction of a victim cache, prefetching, data flattening and reconfigurability to tune the array and scalar caches for specific applications. The most significant contribution of my work is the introduction of a novel cache architecture for embedded microprocessor platforms. My proposed cache architecture uses reconfigurability coupled with split data caches to reduce the area and power consumed by cache memories while retaining performance gains. My results show excellent reductions in both memory size and memory access times, translating into reduced power consumption. Since there was a large reduction in miss rates at L1 caches, further power reduction is achieved by partially or completely shutting down the L2 data or L2 instruction caches. The savings in cache sizes resulting from these designs can be used for other processor activities, including instruction and data prefetching and branch-prediction buffers. The potential benefits of such techniques for embedded applications have been evaluated in my work. I also explore how my cache organization performs for non-numeric data structures. I propose a novel idea called "data flattening," a profile-based memory allocation technique that compresses sparsely scattered pointer data into regular contiguous memory locations, and explore the potential of my proposed split cache organization for data treated with the data flattening method.
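A minimal sketch of the split organization, with the array/scalar classification passed in as a flag; a real design derives that classification from the reference's inherent locality (e.g., via the compiler), which this toy does not model:

```python
# Split data cache sketch: array references and scalar references go to
# separate direct-mapped partitions, so streaming array data cannot evict
# the scalar working set.

class DirectMapped:
    def __init__(self, sets, line=64):
        self.tags = [None] * sets
        self.sets, self.line = sets, line

    def access(self, addr):
        block = addr // self.line
        idx = block % self.sets
        hit = self.tags[idx] == block
        self.tags[idx] = block   # allocate/refresh on access
        return hit

array_cache, scalar_cache = DirectMapped(64), DirectMapped(64)

def access(addr, is_array_ref):
    cache = array_cache if is_array_ref else scalar_cache
    return cache.access(addr)
```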
APA, Harvard, Vancouver, ISO, and other styles
43

Harmon, C. Reid Jr. "IPU/LTB:a method for reducing effective memory latency." Diss., Georgia Institute of Technology, 2003. http://hdl.handle.net/1853/5352.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Sridharan, Aswinkumar. "Adaptive and intelligent memory systems." Thesis, Rennes 1, 2016. http://www.theses.fr/2016REN1S140/document.

Full text
Abstract:
In this thesis, we have focused on addressing interference at the shared memory-hierarchy resources: the last-level cache and off-chip memory access, in the context of large-scale multicore systems. Towards this end, the first work focuses on shared last-level caches, where the number of applications sharing the cache can exceed the associativity of the cache. To manage caches in such situations, our solution estimates the cache footprint of applications to approximate how well they could utilize the cache. A quantitative estimate of cache utility explicitly allows enforcing different priorities across applications. The second part brings prefetch awareness into cache management; in particular, we observe that prefetched cache blocks exhibit good reuse behavior in the context of larger caches. Our third work focuses on addressing interference between on-demand and prefetch requests at the shared off-chip memory access. This work is based on two fundamental observations: the fraction of prefetch requests generated, and its correlation with prefetch usefulness and prefetcher-caused interference. Together, these two observations lead to controlling the flow of prefetch requests between the LLC and off-chip memory.
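One way to act on the two observations is a simple flow controller; the thresholds below are illustrative assumptions, not the thesis's mechanism:

```python
# Prefetch flow-control sketch: throttle prefetches when they form a large
# fraction of off-chip traffic while being inaccurate.

class PrefetchThrottle:
    def __init__(self, frac_hi=0.5, acc_lo=0.4):
        self.demand = self.prefetch = self.useful = 0
        self.frac_hi, self.acc_lo = frac_hi, acc_lo

    def record(self, is_prefetch, was_useful=False):
        if is_prefetch:
            self.prefetch += 1
            self.useful += was_useful
        else:
            self.demand += 1

    def allow_prefetch(self):
        total = self.demand + self.prefetch
        if self.prefetch == 0 or total == 0:
            return True
        frac = self.prefetch / total
        accuracy = self.useful / self.prefetch
        return not (frac > self.frac_hi and accuracy < self.acc_lo)
```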
APA, Harvard, Vancouver, ISO, and other styles
45

Kaplan, Scott Frederick. "Compressed caching and modern virtual memory simulation /." Digital version accessible at:, 1999. http://wwwlib.umi.com/cr/utexas/main.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Kimbrel, Tracy. "Parallel prefetching and caching /." Thesis, Connect to this title online; UW restricted, 1997. http://hdl.handle.net/1773/6943.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Wong, Wayne A. "Techniques utilizing memory reference characteristics for improved performance /." Thesis, Connect to this title online; UW restricted, 2002. http://hdl.handle.net/1773/6934.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Harmon, C. Reid. "IPU/LTB a method for reducing effective memory latency /." Available online, Georgia Institute of Technology, 2004:, 2003. http://etd.gatech.edu/theses/available/etd-04072004-180020/unrestricted/harmon%5Fc%5Fr%5F200312%Fphd.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Davari, Mahdad. "Advances Towards Data-Race-Free Cache Coherence Through Data Classification." Doctoral thesis, Uppsala universitet, Avdelningen för datorteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-320595.

Full text
Abstract:
Providing a consistent view of the shared memory based on precise and well-defined semantics—memory consistency model—has been an enabling factor in the widespread acceptance and commercial success of shared-memory architectures. Moreover, cache coherence protocols have been employed by the hardware to remove from the programmers the burden of dealing with the memory inconsistency that emerges in the presence of the private caches. The principle behind all such cache coherence protocols is to guarantee that consistent values are read from the private caches at all times. In its most stringent form, a cache coherence protocol eagerly enforces two invariants before each data modification: i) no other core has a copy of the data in its private caches, and ii) all other cores know where to receive the consistent data should they need the data later. Nevertheless, by partly transferring the responsibility for maintaining those invariants to the programmers, commercial multicores have adopted weaker memory consistency models, namely the Total Store Order (TSO), in order to optimize the performance for more common cases. Moreover, memory models with more relaxed invariants have been proposed based on the observation that more and more software is written in compliance with the Data-Race-Free (DRF) semantics. The semantics of DRF software can be leveraged by the hardware to infer when data in the private caches might be inconsistent. As a result, hardware ignores the inconsistent data and retrieves the consistent data from the shared memory. DRF semantics therefore removes from the hardware the burden of eagerly enforcing the strong consistency invariants before each data modification. Instead, consistency is guaranteed only when needed. This results in manifold optimizations, such as reducing the energy consumption and improving the performance and scalability. The efficiency of detecting and discarding the inconsistent data is an important factor affecting the efficiency of such coherence protocols. For instance, discarding the consistent data does not affect the correctness, but results in performance loss and increased energy consumption. In this thesis we show how data classification can be leveraged as an effective tool to simplify the cache coherence based on the DRF semantics. In particular, we introduce simple but efficient hardware-based private/shared data classification techniques that can be used to efficiently detect the inconsistent data, thus enabling low-overhead and scalable cache coherence solutions based on the DRF semantics.
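A common page-granularity form of such classification fits in a few lines; this sketch uses the usual first-accessor convention and is only a simplified stand-in for the thesis's hardware techniques:

```python
# Private/shared classification sketch: a page starts private to its first
# accessor and is promoted to shared when a second core touches it, letting
# DRF-based self-invalidation skip data classified as private.

classification = {}  # page -> ("private", owner core) or ("shared", None)

def classify(page, core):
    state = classification.get(page)
    if state is None:
        classification[page] = ("private", core)
    elif state[0] == "private" and state[1] != core:
        classification[page] = ("shared", None)
    return classification[page][0]

print(classify(0x10, core=0))  # private
print(classify(0x10, core=1))  # shared
```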
APA, Harvard, Vancouver, ISO, and other styles
50

Chandran, Pravin Chander. "Design of ALU and Cache memory for an 8 bit microprocessor." Connect to this title online, 2007. http://etd.lib.clemson.edu/documents/1202498822/.

Full text
APA, Harvard, Vancouver, ISO, and other styles