Dissertations on the topic "In-memory compute"
Browse the top 50 dissertations on the topic "In-memory compute".
Scrbak, Marko. "Methodical Evaluation of Processing-in-Memory Alternatives". Thesis, University of North Texas, 2019. https://digital.library.unt.edu/ark:/67531/metadc1505199/.
Thomas, Jonathan. "Asynchronous Validity Resolution in Sequentially Consistent Shared Virtual Memory". Fogler Library, University of Maine, 2001. http://www.library.umaine.edu/theses/pdf/Thomas.pdf.
Jiang, Song. "Efficient caching algorithms for memory management in computer systems". W&M ScholarWorks, 2004. https://scholarworks.wm.edu/etd/1539623446.
Squillante, Mark S. "Issues in shared-memory multiprocessor scheduling : a performance evaluation /". Thesis, Connect to this title online; UW restricted, 1990. http://hdl.handle.net/1773/6858.
Sperens, Martin. "Dynamic Memory Managment in C++". Thesis, Luleå tekniska universitet, Datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-76611.
Chan, Chun Keung. "A study on non-volatile memory scaling in the sub-100nm regime /". View abstract or full-text, 2005. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202005%20CHAN.
Zeffer, Håkan. "Hardware–Software Tradeoffs in Shared-Memory Implementations". Licentiate thesis, Uppsala universitet, Avdelningen för datorteknik, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-86369.
McDonald, Ian Lindsay. "Memory management in a distributed system of single address space operating systems supporting quality of service". Thesis, University of Glasgow, 2001. http://theses.gla.ac.uk/5427/.
Bearpark, Keith. "Learning and memory in genetic programming". Thesis, University of Southampton, 2000. https://eprints.soton.ac.uk/45930/.
Olson, Julius, and Emma Södergren. "Long Term Memory in Conversational Robots". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-260316.
This report covers the implementation of a long-term memory in the robot Furhat. The idea behind this memory was to prevent the robot from being repetitive and asking overly similar or identical questions to a conversation partner. The project includes the use of tf-idf, as well as initial attempts with word2vec, to create vector representations of the dialogue system's questions, and clustering of these representations with the k-means algorithm. The tests carried out yielded good results, which is promising for implementing a similar mechanism in Furhat's dialogue system and for future research on long-term memory functionality in chatbots in general.
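As a rough illustration of the tf-idf idea mentioned in the abstract above, the following sketch flags a candidate question as repetitive when its tf-idf vector is close to that of a question already asked. It is not the thesis's method: it swaps k-means clustering for a plain cosine-similarity threshold, and the function names, the whitespace tokenizer, the smoothed idf formula, and the `threshold` value are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Smoothed idf (an assumption): ln((1 + n) / (1 + df)) + 1.
            term: (count / len(tokens)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_repetitive(candidate, asked, threshold=0.8):
    """True if `candidate` is too similar to any question in `asked`."""
    vectors = tfidf_vectors(asked + [candidate])
    candidate_vector = vectors[-1]
    return any(cosine(candidate_vector, v) >= threshold for v in vectors[:-1])
```

A dialogue system could call `is_repetitive` before asking a question and pick another candidate when it returns `True`; a clustering step such as k-means would instead group the stored questions once and compare against cluster membership.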
Mao, Yandong. "Fast in-memory storage systems : two aspects". Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/93819.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 109-114).
This dissertation addresses two challenges relating to in-memory storage systems. The first challenge is storing and retrieving data at a rate close to the capabilities of the underlying memory system, particularly in the face of parallel accesses from multiple cores. We present Masstree, a high performance in-memory key-value store that runs on a single multi-core server. Masstree is derived from a concurrent B+tree. It provides lock-free reads for good multi-core performance, which requires special care to avoid writes interfering with concurrent reads. To reduce time spent waiting for memory for workloads with long common key prefixes, Masstree arranges a set of B+trees into a Trie. Masstree uses software prefetch to further hide DRAM latency. Several optimizations improve concurrency. Masstree achieves millions of queries per second on a 16-core server, which is more than 30x as fast as MongoDB [6] or VoltDB [17]. The second challenge is replicating storage for fault-tolerance without being limited by slow writes to stable disk storage. Lazy VSR is a quorum-based replication protocol that is fast and can recover from simultaneous crashes of all the replicas as long as a majority revive with intact disks. The main idea is to acknowledge requests after recording them in memory, and to write updates to disk in the background, allowing large batched writes and thus good performance. A simultaneous crash of all replicas may leave the replicas with significantly different on-disk states; much of the design of Lazy VSR is concerned with reconciling these states efficiently during recovery. Lazy VSR's client-visible semantics are unusual in that the service may discard recent acknowledged updates if a majority of replicas crash. To demonstrate that clients can nevertheless make good use of Lazy VSR, we built a file system backend on it. Evaluation shows that Lazy VSR achieves much better performance than a version of itself with traditional group commit. 
Lazy VSR achieves 1.7x the performance of ZooKeeper [42] and 3.6x the performance of MongoDB [6].
by Yandong Mao.
Ph. D.
Zhao, Anthony Dong. "Modeling image-to-image confusions in memory". Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/100611.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Title as it appears in MIT Commencement Exercises program, June 5, 2015: Metamers in memory: predicting pairwise image confusions with deep learning. Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 81-83).
Previous experiments have examined what causes images to be remembered or forgotten. In these experiments, participants sometimes create false positives when identifying images they have seen before, but the precise cause of these false positives has remained unclear. We examine confusions between individual images as a possible cause of these false positives. We first introduce a new experimental task for measuring the rates at which participants confuse one image for another and show that the images prone to false positives are also ones that people tend to confuse. Second, we show that there is a correlation between how often people confuse pairs of images and how similar they find those pairs. Finally, we train a Siamese neural network to predict confusions between pairs of images. By studying the mechanisms behind the failures of memory, we hope to increase our understanding of memory as a whole and move closer to a computational model of memory.
by Anthony Dong Zhao.
M. Eng. in Computer Science and Engineering
Malviya, Nirmesh. "Recovery algorithms for in-memory OLTP databases". Thesis, Massachusetts Institute of Technology, 2012. http://hdl.handle.net/1721.1/75716.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 63-66).
Fine-grained, record-oriented write-ahead logging, as exemplified by systems like ARIES, has been the gold standard for relational database recovery. In this thesis, we show that in modern high-throughput transaction processing systems, this is no longer the optimal way to recover a database system. In particular, as transaction throughputs get higher, ARIES-style logging starts to represent a non-trivial fraction of the overall transaction execution time. We propose a lighter weight, coarse-grained command logging technique which only records the transactions that were executed on the database. It then does recovery by starting from a transactionally consistent checkpoint and replaying the commands in the log as if they were new transactions. By avoiding the overhead of fine-grained, page-level logging of before and after images (and substantial associated I/O), command logging can yield significantly higher throughput at run-time. Recovery times for command logging are higher compared to ARIES, but especially with the advent of high-availability techniques that can mask the outage of a recovering node, recovery speeds have become secondary in importance to run-time performance for most applications. We evaluated our approach on an implementation of TPC-C in a main memory database system (VoltDB), and found that command logging can offer 1.5x higher throughput than a main-memory optimized implementation of ARIES.
by Nirmesh Malviya.
S.M.
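The command-logging recovery scheme described in Malviya's abstract above (log only the executed commands, then recover by loading a consistent checkpoint and replaying the log) can be sketched in a few lines. This is a toy model under stated assumptions, not VoltDB's implementation: the class name, the two example commands, and the in-memory "checkpoint" are hypothetical, and the sketch relies on every command being deterministic.

```python
import copy


def _deposit(state, account, amount):
    state[account] = state.get(account, 0) + amount


def _transfer(state, src, dst, amount):
    # Deterministic: the outcome depends only on the current state and arguments.
    if state.get(src, 0) >= amount:
        state[src] -= amount
        state[dst] = state.get(dst, 0) + amount


class CommandLoggedStore:
    """Toy key-value store that logs commands instead of per-record changes."""

    COMMANDS = {"deposit": _deposit, "transfer": _transfer}

    def __init__(self):
        self.state = {}
        self.checkpoint = {}
        self.log = []  # commands executed since the last checkpoint

    def execute(self, name, *args):
        handler = self.COMMANDS[name]  # raises KeyError for unknown commands
        self.log.append((name, args))  # one small log entry per transaction
        handler(self.state, *args)

    def take_checkpoint(self):
        # Transactionally consistent here because no command runs concurrently.
        self.checkpoint = copy.deepcopy(self.state)
        self.log.clear()

    def recover(self):
        # Start from the checkpoint and replay commands as if they were new.
        state = copy.deepcopy(self.checkpoint)
        for name, args in self.log:
            self.COMMANDS[name](state, *args)
        self.state = state
```

The trade-off the abstract describes is visible even here: `execute` appends a single tuple regardless of how many records a command touches, while `recover` has to re-run every logged command rather than simply reapplying saved record images.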
Tu, Stephen Lyle. "Fast transactions for multicore in-memory databases". Thesis, Massachusetts Institute of Technology, 2013. http://hdl.handle.net/1721.1/82375.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 55-57).
Though modern multicore machines have sufficient RAM and processors to manage very large in-memory databases, it is not clear what the best strategy for dividing work among cores is. Should each core handle a data partition, avoiding the overhead of concurrency control for most transactions (at the cost of increasing it for cross-partition transactions)? Or should cores access a shared data structure instead? We investigate this question in the context of a fast in-memory database. We describe a new transactionally consistent database storage engine called MAFLINGO. Its cache-centered data structure design provides excellent base key-value store performance, to which we add a new, cache-friendly serializable protocol and support for running large, read-only transactions on a recent snapshot. On a key-value workload, the resulting system introduces negligible performance overhead as compared to a version of our system with transactional support stripped out, while achieving linear scalability versus the number of cores. It also exhibits linear scalability on TPC-C, a popular transactional benchmark. In addition, we show that a partitioning-based approach ceases to be beneficial if the database cannot be partitioned such that only a small fraction of transactions access multiple partitions, making our shared-everything approach more relevant. Finally, based on a survey of results from the literature, we argue that our implementation substantially outperforms previous main-memory databases on TPC-C benchmarks.
by Stephen Lyle Tu.
S.M.
Sevcik, Jaroslav. "Program transformations in weak memory models". Thesis, University of Edinburgh, 2009. http://hdl.handle.net/1842/3132.
Shelor, Charles F. "Dataflow Processing in Memory Achieves Significant Energy Efficiency". Thesis, University of North Texas, 2018. https://digital.library.unt.edu/ark:/67531/metadc1248478/.
Hedberg, Charlie Forsberg, and Alexander Pedersen. "Artificial Intelligence : Memory-driven decisions in games". Thesis, Blekinge Tekniska Högskola, Institutionen för teknik och estetik, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-3640.
Developing AI (artificial intelligence) in games can be a hard and challenging task. Sometimes it is desirable to create behaviors that follow some kind of logical pattern. To do this, information must be collected and processed. This bachelor's thesis presents an algorithm that can assist current AI technologies in collecting and memorizing information about the surroundings. The thesis also covers guidelines for practical implementation, established through investigation and tests.
This is a reflection component of a digital media production.
Vrljicak, Tomislav. "Reinforcement learning in stochastic games against bounded memory opponents". Thesis, McGill University, 2005. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=98512.
Schwiebert, Loren. "A comprehensive study of communication in distributed memory multiprocessors /". The Ohio State University, 1995. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487864986609059.
Laing, C. D. "A reflective process memory in decision making". Thesis, University of Bristol, 1998. http://hdl.handle.net/1983/eb6a9ded-1e28-454e-baea-286bfe75f9bf.
Kearney, Garrett Donough Anthony. "Design of a memory based expert system for interpreting facial expressions in terms of signalled emotions". Thesis, University of Greenwich, 1991. http://gala.gre.ac.uk/6376/.
Doudalis, Ioannis. "Hardware assisted memory checkpointing and applications in debugging and reliability". Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/42700.
Ghosh, Mrinmoy. "Microarchitectural techniques to reduce energy consumption in the memory hierarchy". Diss., Atlanta, Ga. : Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/28265.
Committee Chair: Lee, Hsien-Hsin S.; Committee Member: Chatterjee, Abhijit; Committee Member: Mukhopadhyay, Saibal; Committee Member: Pande, Santosh; Committee Member: Yalamanchili, Sudhakar.
Barrie, Anne. "An exploratory study of computer-assisted memory training in head injury and schizophrenia /". Title page, contents and abstract only, 1989. http://web4.library.adelaide.edu.au/theses/09ARPS/09arpsb275.pdf.
Fagrell, Per, and Richard Eklycke. "Implementing Memory Protection in a Minimal OS". Thesis, Linköping University, Department of Computer and Information Science, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-17355.
The car industry has created a series of standards called AutoSAR as a response to the increasing number of processors in modern vehicles. Among these specifications is one for real-time operating systems (RTOS). This RTOS standard includes requirements for memory protection. This thesis outlines the work involved in introducing the memory protection outlined in this specification in the OSEck operating system. The work consisted of updating the operating system, implementing the AutoSAR OS API, and updating the suite of tools used to build the finished system. The AutoSAR specifications were found to be very thorough and well thought out. The OS API was successfully implemented, along with the data structures needed to support it. The existing software tools were updated to conform with the new requirements from AutoSAR, and additional software was created to ease the configuration process. Memory protection was successfully implemented in the OSEck operating system, including two implementations of the trap interface. The memory protection functionality adds yet another layer of user-configuration to the operating system. Also, additional overhead for system calls, context switches and message passing is expected. A general evaluation of how OSEck application performance is affected is beyond the scope of this thesis, but preliminary studies of additional instruction counts on certain system calls have been performed.
Hsieh, Wilson Cheng-Yi. "Dynamic computation migration in distributed shared memory systems". Thesis, Massachusetts Institute of Technology, 1995. http://hdl.handle.net/1721.1/36635.
Vita.
Includes bibliographical references (p. 123-131).
by Wilson Cheng-Yi Hsieh.
Ph.D.
Ezell, Novice M. J. (Novice Marie Johnson) 1976. "Analysis of memory usage in a LaserJet printer". Thesis, Massachusetts Institute of Technology, 1999. http://hdl.handle.net/1721.1/80062.
Includes bibliographical references (leaf 56).
by Novice M.J. Ezell.
S.B. and M.Eng.
Shai, Yee-man. "Effects of computer presentation formats on learning among elderly and younger adults the role of cognitive abilities /". Click to view the E-thesis via HKUTO, 2006. http://sunzi.lib.hku.hk/hkuto/record/B35804440.
Yuki, Tomofumi. "Dissertation beyond shared memory loop parallelism in the polyhedral model". Thesis, Colorado State University, 2013. http://pqdtopen.proquest.com/#viewpdf?dispub=3565471.
With the introduction of multi-core processors, motivated by power and energy concerns, parallel processing has become mainstream. Parallel programming is much more difficult due to its non-deterministic nature, and because of parallel programming bugs that arise from non-determinacy. One solution is automatic parallelization, where it is entirely up to the compiler to efficiently parallelize sequential programs. However, automatic parallelization is very difficult, and only a handful of successful techniques are available, even after decades of research.
Automatic parallelization for distributed memory architectures is even more problematic in that it requires explicit handling of data partitioning and communication. Since data must be partitioned among multiple nodes that do not share memory, the original memory allocation of sequential programs cannot be directly used. One of the main contributions of this dissertation is the development of techniques for generating distributed memory parallel code with parametric tiling.
Our approach builds on important contributions to the polyhedral model, a mathematical framework for reasoning about program transformations. We show that many affine control programs can be uniformized with only simple techniques. Being able to assume uniform dependences significantly simplifies distributed memory code generation, and also enables parametric tiling. Our approach is implemented in the AlphaZ system, a system for prototyping analyses, transformations, and code generators in the polyhedral model. The key features of AlphaZ are memory re-allocation and explicit representation of reductions.
We evaluate our approach on a collection of polyhedral kernels from the PolyBench suite, and show that our approach scales as well as PLuTo, a state-of-the-art shared memory automatic parallelizer using the polyhedral model.
Automatic parallelization is only one approach to dealing with the non-deterministic nature of parallel programming, one that leaves the difficulty entirely to the compiler. Another approach is to develop novel parallel programming languages. These languages, such as X10, aim to provide a highly productive parallel programming environment by including parallelism in the language design. However, even in these languages, parallel bugs remain an important issue that hinders programmer productivity.
Another contribution of this dissertation is to extend the array dataflow analysis to handle a subset of X10 programs. We apply the result of dataflow analysis to statically guarantee determinism. Providing static guarantees can significantly increase programmer productivity by catching questionable implementations at compile-time, or even while programming.
Demaine, Erik. "Effcient Simulation of Message-Passing in Distributed-Memory Architectures". Thesis, University of Waterloo, 1996. http://hdl.handle.net/10012/1069.
Sandblom, Johan. "Episodic memory in the human prefrontal cortex /". Stockholm, 2007. http://diss.kib.ki.se/2007/978-91-7357-136-4/.
Olsson, Markus. "Design and Implementation of Transactions in a Column-Oriented In-Memory Database System". Thesis, Umeå University, Department of Computing Science, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-32705.
Coldbase is a column-oriented in-memory database implemented in Java that is used with a specific workload in mind. Coldbase is optimized to receive large streams of timestamped trading data arriving at a fast pace while allowing simple but frequent queries that analyse the data concurrently. By limiting the functionality, Coldbase is able to reach a high performance while the memory consumption is low. This thesis presents ColdbaseTX, which is an extension to Coldbase that adds support for transactions. It uses an optimistic approach by storing all writes of a transaction locally and applying them when the transaction commits. Readers are separated from writers by using two versions of the data, which makes it possible to guarantee that readers are never blocked. Benchmarks compare Coldbase to ColdbaseTX regarding both performance and memory efficiency. The results show that ColdbaseTX introduces a small overhead in both memory and performance, which however is deemed acceptable since the gain is support for transactions.
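The transaction scheme summarized in the abstract above (buffer a transaction's writes locally, apply them at commit, and keep readers on a separate version so they never block) might look roughly like the sketch below. It is not ColdbaseTX code, which is written in Java: the class and method names are hypothetical, the "two versions" are approximated by copying the published dictionary at each commit, and conflict validation between concurrent writers is left out.

```python
import threading


class TwoVersionStore:
    """Readers see the published version; commits build and swap in a new one."""

    def __init__(self):
        self._published = {}  # treated as immutable once published
        self._commit_lock = threading.Lock()  # serializes writers only

    def snapshot(self):
        # Readers grab the current version without taking any lock.
        return self._published

    def read(self, key):
        return self._published.get(key)

    def begin(self):
        return Transaction(self)

    def _install(self, writes):
        with self._commit_lock:
            new_version = dict(self._published)  # second version of the data
            new_version.update(writes)
            self._published = new_version  # single reference swap


class Transaction:
    def __init__(self, store):
        self._store = store
        self._writes = {}  # local write buffer, invisible to other readers

    def write(self, key, value):
        self._writes[key] = value

    def read(self, key):
        # A transaction sees its own pending writes first.
        if key in self._writes:
            return self._writes[key]
        return self._store.read(key)

    def commit(self):
        self._store._install(self._writes)
        self._writes = {}
```

A reader that holds a `snapshot()` keeps seeing the old version even after a commit, which is what lets queries run without waiting for writers; the cost is the extra memory for the second copy, in line with the small memory overhead the abstract reports.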
Das, Jayita. "Auxiliary Roles in STT-MRAM Memory". Scholar Commons, 2014. https://scholarcommons.usf.edu/etd/5613.
Åslin, Fredrik. "Evaluation of Hierarchical Temporal Memory in algorithmic trading". Thesis, Linköping University, Department of Computer and Information Science, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-54235.
This thesis looks into how one could use Hierarchical Temporal Memory (HTM) networks to generate models that could be used as trading algorithms. The thesis begins with a brief introduction to algorithmic trading and commonly used concepts when developing trading algorithms. The thesis then proceeds to explain what an HTM is and how it works. To explore whether an HTM could be used to generate models that could be used as trading algorithms, the thesis conducts a series of experiments. The goal of the experiments is to iteratively optimize the settings for an HTM and try to generate a model that, when used as a trading algorithm, would have more profitable trades than losing trades. The setup of the experiments is to train an HTM to predict if it is a good time to buy some shares in a security and hold them for a fixed time before selling them again. A fair number of the models generated during the experiments were profitable on data the models had never seen before; therefore the author concludes that it is possible to train an HTM so it can be used as a profitable trading algorithm.
Lundgren, Björn, and Anders Ödlund. "Exposure of Patterns in Parallel Memory Acces". Thesis, Linköping University, Department of Electrical Engineering, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-9795.
The concept and advantages of a Parallel Memory Architecture (PMA) in computer systems have long been known, but only recently has it become interesting to implement modular parallel memories even in handheld embedded systems. This thesis presents a method to analyse source code to expose possible parallel memory accesses. Memory access patterns may be found and categorized, and the corresponding code marked for optimization. As a result, a PMA compatible with the found pattern(s) and code optimization may be specified.
Mills, Richard Tran. "Dynamic adaptation to CPU and memory load in scientific applications". W&M ScholarWorks, 2004. https://scholarworks.wm.edu/etd/1539623457.
Ho, Yuk. "Application of minimal perfect hashing in main memory indexing". Thesis, Massachusetts Institute of Technology, 1994. http://hdl.handle.net/1721.1/36445.
Atmaca, Eralp 1976. "Hysteresis and memory effects in nanocrystal embedded MOS capacitors". Thesis, Massachusetts Institute of Technology, 2000. http://hdl.handle.net/1721.1/58686.
Issued separately by degree.
Includes bibliographical references (p. 103-106).
Nanocrystal Memory is a promising new memory type which utilizes silicon nanocrystals and quantum mechanical direct tunneling current for charge storage. This thesis presents the work done to characterize the memory effect in nanocrystal embedded metal-oxide-semiconductor (NC-MOS) capacitors, the fundamental components of the nanocrystal memory. Various properties of the NC-MOS capacitors including gate stack composition, oxide charge storage and interface traps are studied by making high frequency and quasi-static capacitance voltage and current voltage measurements. High frequency and quasi-static capacitance characteristics reveal hysteresis which is evidence for the memory effect. A hysteresis of 2 V is demonstrated which is large enough to enable the use of nanocrystal embedded devices as memory devices. Measurement results suggest that the tunneling in the accumulation bias regime is mostly electron tunneling from the channel into the nanocrystals, and the tunneling in the inversion bias regime is hole tunneling from the channel into the nanocrystals. Charge is stored in the nanocrystals either in the discrete quantum dot states or in the interface traps that surround the nanocrystals. The oxide thickness is varied to control the tunneling rate and the retention time. A thinner tunnel oxide is necessary for achieving a higher tunneling rate which provides a faster write/erase. However, when the barrier thickness is lower, the charge confined in the nanocrystals can leak back into the channel more easily. Measurement conditions such as bias schemes, hold times, sweep rates and illumination can significantly influence the memory effect. It is demonstrated that the memory effect is enhanced by longer hold times, wider sweep regimes and light.
by Eralp Atmaca.
M.Eng. and S.B.
Stamatoiu, Oana L. (Oana Liana) 1981. "Learning commonsense categorical knowledge in a thread memory system". Thesis, Massachusetts Institute of Technology, 2004. http://hdl.handle.net/1721.1/17988.
Includes bibliographical references (p. 89-92).
If we are to understand how we can build machines capable of broad purpose learning and reasoning, we must first aim to build systems that can represent, acquire, and reason about the kinds of commonsense knowledge that we humans have about the world. This endeavor suggests steps such as identifying the kinds of knowledge people commonly have about the world, constructing suitable knowledge representations, and exploring the mechanisms that people use to make judgments about the everyday world. In this work, I contribute to these goals by proposing an architecture for a system that can learn commonsense knowledge about the properties and behavior of objects in the world. The architecture described here augments previous machine learning systems in four ways: (1) it relies on a seven dimensional notion of context, built from information recently given to the system, to learn and reason about objects' properties; (2) it has multiple methods that it can use to reason about objects, so that when one method fails, it can fall back on others; (3) it illustrates the usefulness of reasoning about objects by thinking about their similarity to other, better known objects, and by inferring properties of objects from the categories that they belong to; and (4) it represents an attempt to build an autonomous learner and reasoner that sets its own goals for learning about the world and deduces new facts by reflecting on its acquired knowledge. This thesis describes this architecture, as well as a first implementation, that can learn from sentences such as "A blue bird flew to the tree" and "The small bird flew to the cage" that birds can fly. One of the main contributions of this work lies in suggesting a further set of salient ideas about how we can build broader purpose commonsense artificial learners and reasoners.
by Oana L. Stamatoiu.
M.Eng.
Zhang, Guowei Ph D. Massachusetts Institute of Technology. "Architectural support to exploit commutativity in shared-memory systems". Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/106073.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 57-64).
Parallel systems are limited by the high costs of communication and synchronization. Exploiting commutativity has historically been a fruitful avenue to reduce traffic and serialization. This is because commutative operations produce the same final result regardless of the order they are performed in, and therefore can be processed concurrently and without communication. Unfortunately, software techniques that exploit commutativity, such as privatization and semantic locking, incur high runtime overheads. These overheads offset the benefit and thereby limit the applicability of software techniques. To avoid high overheads, it would be ideal to exploit commutativity in hardware. In fact, hardware already provides much of the functionality that is required to support commutativity. For instance, private caches can buffer and coalesce multiple updates. However, current memory hierarchies can understand only reads and writes, which prevents hardware from recognizing and accelerating commutative operations. The key insight this thesis develops is that, with minor hardware modifications and minimal extra complexity, cache coherence protocols, the key component of communication and synchronization in shared-memory systems, can be extended to allow local and concurrent commutative operations. This thesis presents two techniques that leverage this insight to exploit commutativity in hardware. First, Coup provides architectural support for a limited number of single-instruction commutative updates, such as addition and bitwise logical operations. Coup allows multiple private caches to simultaneously hold update-only permission to the same cache line. Caches with update-only permission can locally buffer and coalesce updates to the line, but cannot satisfy read requests. Upon a read request, Coup reduces the partial updates buffered in private caches to produce the final value.
Second, COMMTM is a commutativity-aware hardware transactional memory (HTM) that supports an even broader range of multi-instruction, semantically commutative operations, such as set insertions and ordered puts. COMMTM extends the coherence protocol with a reducible state tagged with a user-defined label. Multiple caches can hold a given line in the reducible state with the same label, and transactions can implement arbitrary user-defined commutative operations through labeled loads and stores. These commutative operations proceed concurrently, without triggering conflicts or incurring any communication. A non-commutative operation (e.g., a conventional load or store) triggers a user-defined reduction that merges the different cache lines and may abort transactions with outstanding reducible updates. Coup and COMMTM reduce communication and synchronization in many challenging parallel workloads. At 128 cores, Coup accelerates state-of-the-art implementations of update-heavy algorithms by up to 2.4x, and COMMTM outperforms a conventional eager-lazy HTM by up to 3.4x and reduces or eliminates wasted work due to transactional aborts.
by Guowei Zhang.
S.M.
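The update-only idea described in Zhang's abstract above can be imitated in software to show why commutativity removes the need for communication: each private cache buffers and coalesces its own additions, and the partial updates are reduced only when a read arrives. The sketch below is a software analogy under stated assumptions, not the Coup hardware protocol; the class name, the integer `cache_id` scheme, and the restriction to addition are all hypothetical choices for this example.

```python
class UpdateOnlyLine:
    """Software analogy of one cache line shared in update-only mode."""

    def __init__(self, num_caches, initial=0):
        self._value = initial             # last fully reduced value
        self._partial = [0] * num_caches  # per-cache buffered deltas

    def add(self, cache_id, delta):
        # Commutative update: coalesced locally, no other cache is contacted.
        self._partial[cache_id] += delta

    def read(self):
        # A read forces a reduction of all buffered partial updates.
        self._value += sum(self._partial)
        self._partial = [0] * len(self._partial)
        return self._value
```

Because addition is commutative and associative, the reduced value is the same whatever order the caches issued their updates in, so the only synchronization point is `read`. A non-commutative operation such as an ordinary store has no such property, which is why, per the abstract, such operations fall outside this fast path.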
Beane, Glen L. "The Effects of Microprocessor Architecture on Speedup in Distrbuted Memory Supercomputers". Fogler Library, University of Maine, 2004. http://www.library.umaine.edu/theses/pdf/BeaneGL2004.pdf.
Pełny tekst źródłaAbbas, Gulfam, i Naveed Asif. "Performance Tradeoffs in Software Transactional Memory". Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-6059.
Pełny tekst źródłaMcNamee, Dylan James. "Virtual memory alternatives for transaction buffer management in a single-level store /". Thesis, Connect to this title online; UW restricted, 1996. http://hdl.handle.net/1773/6961.
Pełny tekst źródłaJohnson, Gregory. "Beliefs of Graduate Students About Unstructured Computer Use in Face-to-Face Classes with Internet Access and its Influence on Student Recall". Doctoral diss., University of Central Florida, 2009. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/2089.
Ph.D.
Department of Educational Research, Technology and Leadership
Education
Education PhD
Balasubramaniam, Mahadevan. "Performance analysis and evaluation of dynamic loop scheduling techniques in a competitive runtime environment for distributed memory architectures". Master's thesis, Mississippi State : Mississippi State University, 2003. http://library.msstate.edu/etd/show.asp?etd=etd-04022003-154254.
Muñiz, Navarro José Alberto. "A hybrid data structure for dense keys in in-memory database systems". Thesis, Massachusetts Institute of Technology, 2010. http://hdl.handle.net/1721.1/62662.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 71-72).
This thesis presents a data structure which performs well for in-memory indexing of keys that are unevenly distributed into clusters with a high density of keys. This pattern is prevalent, for example, in systems that use tables with keys where one field is auto-incremented. These types of tables are widely used. The proposed data structure, the H-Tree, consists of a B+ Tree with intervals as keys and arrays as values. Each array holds a cluster of values, while the clusters themselves are managed by the B+ Tree for space and cache efficiency. Using the H-Tree as an in-memory indexing structure for an implementation of the TPC-C benchmark sped up the transaction processing time by up to 50% compared to an implementation based on B+ Trees, and showed even more dramatic performance gains in the presence of few and large clusters of data.
by José Alberto Muñiz Navarro.
M.Eng.
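The interval-to-array layout described in the abstract above can be sketched in a few lines. This is a simplified illustration, not the thesis implementation: the name ClusterIndex is hypothetical, a sorted Python list searched with bisect stands in for the B+ Tree, and adjacent clusters are never merged.

```python
import bisect


class ClusterIndex:
    """Interval-keyed index: interval [start, start + len(array)) maps to a dense array."""

    def __init__(self):
        self.starts = []   # sorted interval start keys (stand-in for the B+ Tree)
        self.arrays = []   # arrays[i][k] holds the value for key starts[i] + k

    def _locate(self, key):
        # Position of the last interval whose start is <= key, or -1 if none.
        return bisect.bisect_right(self.starts, key) - 1

    def get(self, key):
        i = self._locate(key)
        if i >= 0:
            offset = key - self.starts[i]
            if offset < len(self.arrays[i]):
                return self.arrays[i][offset]   # direct array access inside a cluster
        return None

    def put(self, key, value):
        i = self._locate(key)
        if i >= 0:
            offset = key - self.starts[i]
            array = self.arrays[i]
            if offset < len(array):     # key already inside the cluster: overwrite
                array[offset] = value
                return
            if offset == len(array):    # key extends the cluster by one slot
                array.append(value)
                return
        # Key is far from every existing cluster: open a new one.
        self.starts.insert(i + 1, key)
        self.arrays.insert(i + 1, [value])


index = ClusterIndex()
for key in range(100, 105):             # auto-incremented keys form one dense cluster
    index.put(key, f"row-{key}")
index.put(5000, "row-5000")             # a distant key opens a second cluster
```

With this layout, a lookup inside a cluster costs one search over the interval starts plus one array access, which is the space and cache argument the abstract makes for dense keys.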
Gul, Saba. "Novelty in goal-oriented machines using a thread memory structure". Thesis, Massachusetts Institute of Technology, 2009. http://hdl.handle.net/1721.1/53143.
Includes bibliographical references (leaves 55-56).
Resourcefulness and creativity are desirable properties for an intelligent machine. The incredible adeptness of the human mind at seeing situations from diverse viewpoints allows it to conjure many techniques to accomplish the same goal, and hence recover elegantly when one method fails. In the context of goal-oriented machines, this thesis presents a system that finds substitutes for the typical physical resource used to accomplish a goal, by finding novel uses for other, available resources: uses that these resources were not originally meant or designed for. In a domain where an object can serve multiple functions, this requires: (1) understanding the functional context the object is operating in; (2) building a realistic representation of the given objects, which often do not fall neatly into tightly-structured categorizations, but instead share properties with other 'boundary' objects. The system does this by learning from examples, and using the average member, or 'stereotype', as the class representative; (3) allowing imperfection: identifying properties that are not crucial for goal satisfaction, and selectively ignoring them; and (4) measuring similarity between objects to find the best substitute. The system bootstraps with knowledge about the properties of the objects and is given positive and negative examples for the goal. It can infer, for example, that two objects such as an orange (the typical resource) and a ball (the positive example) are related in the context of finding a throwable object on account of their similarity in shape and size, but unrelated in the context of finding an ingredient for a fruit salad, because one is a fruit and the other is not.
It then finds a substitute that shares shape and size features with the orange. If, on the other hand, we need an ingredient for a fruit salad, we can supply it another edible fruit as a positive example. The system is implemented in Java; its performance is illustrated with 7 examples in the domain of everyday objects.
by Saba Gul.
M.Eng.
Qazi, Masood. "Circuit design for embedded memory in low-power integrated circuits". Thesis, Massachusetts Institute of Technology, 2012. http://hdl.handle.net/1721.1/75645.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 141-152).
This thesis explores the challenges for integrating embedded static random access memory (SRAM) and non-volatile memory (based on ferroelectric capacitor technology) into low-power integrated circuits. First considered is the impact of process variation in deep-submicron technologies on SRAM, which must exhibit higher density and performance at increased levels of integration with every new semiconductor generation. Techniques to speed up the statistical analysis of physical memory designs by a factor of 100 to 10,000 relative to the conventional Monte Carlo Method are developed. The proposed methods build upon the Importance Sampling simulation algorithm and efficiently explore the sample space of transistor parameter fluctuation. Process variation in SRAM at low voltage is further investigated experimentally with a 512 kb 8T SRAM test chip in 45 nm SOI CMOS technology. For active operation, an AC coupled sense amplifier and regenerative global bitline scheme are designed to operate at the limit of on-current and off-current separation on a single-ended SRAM bitline. The SRAM operates from 1.2 V down to 0.57 V with access times from 400 ps to 3.4 ns. For standby power, a data retention voltage sensor predicts the mismatch-limited minimum supply voltage without corrupting the contents of the memory. The leakage power of SRAM forces the chip designer to seek non-volatile memory in applications such as portable electronics that retain significant quantities of data over long durations. In this scenario, the energy cost of accessing data must be minimized. This thesis presents a ferroelectric random access memory (FRAM) prototype that addresses the challenges of sensing diminishingly small charge under conditions favorable to low access energy with a time-to-digital sensing scheme. The 1 Mb 1T1C FRAM fabricated in 130 nm CMOS operates from 1.5 V to 1.0 V with corresponding access energy from 19.2 pJ to 9.8 pJ per bit.
Finally, the computational state of sequential elements interspersed in CMOS logic also restricts the ability to power gate. To enable simple and fast turn-on, ferroelectric capacitors are integrated into the design of a standard cell register, whose non-volatile operation is made compatible with the digital design flow. A test-case circuit containing ferroelectric registers exhibits non-volatile operation, consumes less than 1.3 pJ per bit of state information, and takes fewer than 10 clock cycles to save or restore, with no minimum standby power requirement between active periods.
by Masood Qazi.
Ph.D.
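The abstract above builds on importance sampling to estimate rare failure probabilities faster than plain Monte Carlo. The following sketch shows only the textbook form of that idea on a toy problem, the tail probability of a standard normal variable; it is not the thesis's algorithm, and the function names, the threshold of 4, and the sample count are illustrative assumptions.

```python
import math
import random


def tail_probability_is(threshold, n_samples, rng):
    """Estimate P(Z > threshold) for Z ~ N(0, 1) by mean-shifted importance sampling."""
    total = 0.0
    for _ in range(n_samples):
        # Draw from the proposal N(threshold, 1), centered on the failure boundary.
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            # Likelihood ratio of N(0, 1) to N(threshold, 1) evaluated at x.
            total += math.exp(-threshold * x + 0.5 * threshold * threshold)
    return total / n_samples


def tail_probability_exact(threshold):
    """Closed-form reference value from the complementary error function."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


rng = random.Random(0)                      # fixed seed for repeatability
estimate = tail_probability_is(4.0, 20_000, rng)
exact = tail_probability_exact(4.0)
relative_error = abs(estimate - exact) / exact
```

About half of the proposal samples land in the failure region here, whereas plain Monte Carlo with the same 20,000 samples would expect fewer than one such sample, since the true probability is on the order of 3e-5.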
Zheng, Wenting. "Fast checkpoint and recovery techniques for an in-memory database". Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/91701.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 61-62).
Multicore in-memory databases for modern machines can support extraordinarily high transaction rates for online transaction processing workloads. A potential weakness of such databases, however, is recovery from crash failures. We show that techniques for disk-based persistence can be efficient enough to keep up with current systems' huge memory sizes and fast transaction rates, be smart enough to avoid additional contention, and provide fast recovery. This thesis presents SiloR, a persistence system built for a very fast multicore database system called Silo. We show that naive logging and checkpoints make normal-case execution slower, but that careful design of the persistence system allows us to keep up with many workloads without negative impact on runtime performance. We design the checkpoint and logging system to utilize multicore's resources to its fullest extent, both during runtime and during recovery. Parallelism allows the system to recover fast. Experiments show that a large database (approximately 50 GB) can be recovered in under five minutes.
by Wenting Zheng.
M.Eng.
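The combination of checkpoints and logging mentioned in the abstract above follows a general pattern that a toy example can make concrete. The sketch below is not SiloR's design: it is single-threaded, keeps the "disk" in Python lists, and uses hypothetical names (TinyStore, recover). It shows only that recovery loads the latest checkpoint and then replays the log records written after it.

```python
class TinyStore:
    """Toy in-memory key-value store with a redo log and checkpoints."""

    def __init__(self):
        self.data = {}
        self.next_tid = 1
        self.log = []               # stand-in for an on-disk redo log
        self.checkpoint = (0, {})   # (last transaction id covered, snapshot)

    def write(self, key, value):
        tid = self.next_tid
        self.next_tid += 1
        self.log.append((tid, key, value))   # log first, then apply
        self.data[key] = value
        return tid

    def take_checkpoint(self):
        # Copy the state and remember which transactions it already reflects.
        self.checkpoint = (self.next_tid - 1, dict(self.data))
        # Log records covered by the checkpoint are no longer needed.
        self.log = [rec for rec in self.log if rec[0] > self.checkpoint[0]]


def recover(checkpoint, log):
    """Rebuild a store from a checkpoint plus a redo log ordered by transaction id."""
    last_tid, snapshot = checkpoint
    store = TinyStore()
    store.data = dict(snapshot)
    store.checkpoint = (last_tid, dict(snapshot))
    store.next_tid = last_tid + 1
    for tid, key, value in log:
        if tid > last_tid:          # skip writes the snapshot already contains
            store.data[key] = value
            store.log.append((tid, key, value))
            store.next_tid = tid + 1
    return store


db = TinyStore()
db.write("a", 1)
db.write("b", 2)
db.take_checkpoint()
db.write("a", 3)
db.write("c", 4)
# Simulate a crash: only the checkpoint and the log survive.
restored = recover(db.checkpoint, list(db.log))
```

After the simulated crash, restored.data equals {"a": 3, "b": 2, "c": 4}. The parallel logging and parallel recovery that the abstract emphasizes are deliberately left out of this sketch.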
Kjelso, Morten. "A quantitative evaluation of data compression in the memory hierarchy". Thesis, Loughborough University, 1997. https://dspace.lboro.ac.uk/2134/10596.