
Dissertations / Theses on the topic 'Prefetching'


Consult the top 50 dissertations / theses for your research on the topic 'Prefetching.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1. Lee, Vivian Tin-Wai. "User directed prefetching." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2001. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/MQ57760.pdf.

2. Lee, Vivian Tin-Wai. "User directed prefetching." Dissertation, Computer Science, Carleton University, Ottawa, 2000.

3. Kimbrel, Tracy. "Parallel prefetching and caching." Thesis, University of Washington, 1997. http://hdl.handle.net/1773/6943.

4. Figueroa, Amparito Alexandra Morales. "Prefetching Content in Multimedia Presentations." Pontifícia Universidade Católica do Rio de Janeiro, 2014. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=24285@1.

Abstract:
When delivering and presenting multimedia applications over a communication network, presentation lag can be a major and critical factor affecting the quality of the multimedia presentation. In a good-quality presentation the synchronism is always preserved; hence all the contents are presented in a continuous way, according to the authoring specifications. In this dissertation, a multimedia content prefetching plan is proposed in order to minimize the presentation lag and guarantee the synchronism between the media objects that constitute the multimedia application. The proposed mechanism targets multimedia applications developed in the NCL declarative language and takes advantage of their event-based synchronism to determine the ideal retrieval order of the media objects and to calculate their retrieval start times. Furthermore, important issues to be considered in a prefetching environment are raised, and the different algorithms that make up the prefetching plan are developed.
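The central computation in such a plan is deciding when each media object's retrieval must begin so that it is ready by its scheduled presentation time. The sketch below illustrates only that idea; the object names, sizes, link bandwidth, and the full-download assumption are illustrative and are not taken from the thesis, whose plan is derived from NCL's event-based synchronism.

```python
# A minimal sketch, assuming each object must be fully downloaded before its
# scheduled presentation time; all names and numbers are illustrative.

def retrieval_schedule(objects, bandwidth_bps):
    """objects: list of (name, size_bytes, presentation_start_s).
    Returns (name, latest_retrieval_start_s), earliest first."""
    schedule = []
    for name, size_bytes, start_s in objects:
        fetch_s = size_bytes * 8 / bandwidth_bps     # transfer time on the link
        schedule.append((name, start_s - fetch_s))   # latest safe start time
    return sorted(schedule, key=lambda item: item[1])

demo = [("intro_video", 5_000_000, 0.0),
        ("narration_audio", 1_000_000, 2.0),
        ("closing_image", 200_000, 10.0)]
for name, t in retrieval_schedule(demo, bandwidth_bps=8_000_000):
    print(f"{name}: begin retrieval at t = {t:+.2f} s")
```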
5. Charmchi, Niloofar. "Compressed cache layout aware prefetching." Thesis, Rennes 1, 2020. http://www.theses.fr/2020REN1S017.

Abstract:
The speed gap between CPU and memory impairs performance. Cache compression and hardware prefetching are two techniques that can confront this bottleneck by decreasing last-level cache misses. Moreover, compression and prefetching have positive interactions, as prefetching benefits from higher cache capacity and compression increases the effective cache size. This study proposes Compressed cache Layout Aware Prefetching (CLAP), which leverages recently proposed sector-based compressed cache layouts, such as SCC or YACC, to create a synergy between compressed caches and prefetching. The idea of this approach is to prefetch contiguous blocks that can be compressed and co-allocated together with the requested block on a miss. Prefetched blocks that share storage with existing blocks do not need to evict a valid existing entry; therefore, CLAP avoids cache pollution. In order to decide which co-allocatable blocks to prefetch, we propose a compression predictor. In our experimental evaluations, CLAP reduces the number of cache misses by 9% and improves performance by 3% on average, compared to a compressed cache. Furthermore, in order to obtain further improvements, we unify CLAP with other prefetchers and introduce two adaptive CLAPs which select the best prefetcher based on the application.
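The co-allocation test at the heart of CLAP can be pictured as follows: on a miss, examine the neighbouring blocks of the same sector and select those whose predicted compressed size fits in the space left in the requested block's data entry. A minimal sketch, assuming 64-byte blocks, 4-block sectors, and a stand-in predictor rather than the compression predictor proposed in the thesis:

```python
# Illustrative parameters: 64-byte blocks, 4-block sectors, one data entry
# the size of a single uncompressed block.

def clap_candidates(miss_addr, predict_size, block=64, sector_blocks=4):
    """Return neighbour block addresses predicted to compress well enough to
    share the missing block's data entry (so no valid entry is evicted)."""
    sector_base = miss_addr - miss_addr % (block * sector_blocks)
    budget = block - predict_size(miss_addr)   # space left in the entry
    chosen = []
    for i in range(sector_blocks):
        addr = sector_base + i * block
        if addr == miss_addr:
            continue
        if predict_size(addr) <= budget:       # fits alongside chosen blocks
            chosen.append(addr)
            budget -= predict_size(addr)
    return chosen

# Toy predictor: pretend every block compresses to 16 bytes.
print(clap_candidates(0x1040, lambda addr: 16))   # [4096, 4224, 4288]
```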
6. Sharma, Mayank. "Performance Evaluation of an Enhanced Popularity-Based Web Prefetching Technique." University of Akron / OhioLINK, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=akron1164053047.

7. Touma, Rizkallah. "Computer-language based data prefetching techniques." Doctoral thesis, Universitat Politècnica de Catalunya, 2019. http://hdl.handle.net/10803/665207.

Abstract:
Data prefetching has long been used as a technique to improve access times to persistent data. It is based on retrieving data records from persistent storage to main memory before the records are needed. Data prefetching has been applied to a wide variety of persistent storage systems, from file systems to Relational Database Management Systems and NoSQL databases, with the aim of reducing access times to the data maintained by the system and thus improving the execution times of the applications using this data. However, most existing solutions to data prefetching have been based on information that can be retrieved from the storage system itself, whether in the form of heuristics based on the data schema or data access patterns detected by monitoring access to the system. These approaches have multiple disadvantages in terms of the rigidity of the heuristics they use, the accuracy of the predictions they make, and/or the time they need to make these predictions, a process often performed while the applications are accessing the data, causing considerable overhead. In light of the above, this thesis proposes two novel approaches to data prefetching based on predictions made by analyzing the instructions and statements of the computer languages used to access persistent data. The proposed approaches take into consideration how the data is accessed by the higher-level applications, make accurate predictions, and are performed without causing any additional overhead. The first of the proposed approaches analyzes instructions of applications written in object-oriented languages in order to prefetch data from Persistent Object Stores. The approach is based on static code analysis that is done prior to the application execution and hence does not add any overhead. It also includes various strategies to deal with cases that require runtime information unavailable prior to the execution of the application. We integrate this analysis approach into an existing Persistent Object Store and run a series of extensive experiments to measure the improvement obtained by prefetching the objects predicted by the approach. The second approach analyzes statements and historic logs of the declarative query language SPARQL in order to prefetch data from RDF triplestores. The approach measures two types of similarity between SPARQL queries in order to detect recurring query patterns in the historic logs. Afterwards, it uses the detected patterns to predict subsequent queries and launch them before they are requested, prefetching the data they need. Our evaluation shows that the proposed approach makes high-accuracy predictions and can achieve a high cache hit rate when caching the results of the predicted queries.
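For the SPARQL side, the pattern detection can be approximated by normalising queries into templates and counting which template tends to follow which in the historic log. The sketch below uses a crude regex normalisation as a stand-in for the thesis's two similarity measures, and it predicts templates rather than fully parameterised queries:

```python
from collections import defaultdict
import re

def template(query):
    """Replace literals and numbers so similar queries share one template."""
    return re.sub(r'"[^"]*"|\b\d+\b', "?", query)

def learn_transitions(log):
    """Count template-to-template transitions in a historic query log."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(log, log[1:]):
        counts[template(prev)][template(nxt)] += 1
    return counts

def predict_next(counts, current):
    """Most frequent follower of the current query's template, if any."""
    followers = counts.get(template(current))
    return max(followers, key=followers.get) if followers else None

log = ['SELECT ?p WHERE { ?p :name "Ada" }',
       'SELECT ?b WHERE { ?b :author "Ada" }',
       'SELECT ?p WHERE { ?p :name "Alan" }',
       'SELECT ?b WHERE { ?b :author "Alan" }']
model = learn_transitions(log)
print(predict_next(model, 'SELECT ?p WHERE { ?p :name "Grace" }'))
```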
8. Grannæs, Marius. "Bandwidth-Aware Prefetching in Chip Multiprocessors." Thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2006. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-10115.

Abstract:

Chip Multiprocessors (CMPs) are an increasingly popular architecture, and a growing number of vendors now offer CMP solutions. The shift from uniprocessors to CMP architectures is driven by the increasing complexity of cores, the processor-memory performance gap, limitations in ILP, and increasing power requirements. Prefetching is a successful technique commonly used in high-performance processors to hide latency. In a CMP, prefetching offers new opportunities and challenges, as current uniprocessor heuristics will need adaptation or redesign to integrate with CMPs. In this thesis, I look at the state of the art in prefetching and CMP architecture. I conduct experiments on how unmodified uniprocessor prefetching heuristics perform in a CMP. In addition, I propose a new prefetching scheme based on bandwidth monitoring and prediction through performance counters, suited for embedded CMP systems. This new prefetching scheme has been simulated with SimpleScalar. It offers lower bandwidth usage (up to 47.8%) while retaining most of the performance gains from prefetching for low-accuracy prefetching heuristics.
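The core of such a scheme is a gate that compares measured bandwidth against a threshold before allowing prefetches to issue. A minimal sketch, assuming a periodically sampled byte counter; the high-water mark and counter interface are illustrative, not the heuristic evaluated in the thesis:

```python
def prefetch_allowed(bytes_this_interval, interval_s, peak_bytes_per_s,
                     high_water=0.75):
    """Allow prefetch issue only while measured bus utilisation is below the
    high-water mark, so demand traffic keeps priority."""
    utilisation = bytes_this_interval / (interval_s * peak_bytes_per_s)
    return utilisation < high_water

# Example: 6.4 GB/s peak, 5.6 GB observed in the last second -> throttle.
print(prefetch_allowed(5.6e9, 1.0, 6.4e9))   # False (87.5% utilised)
```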

9. Cevikbas, Safak Burak. "Visibility Based Prefetching With Simulated Annealing." Master's thesis, METU, 2008. http://etd.lib.metu.edu.tr/upload/12609324/index.pdf.

Abstract:
Complex urban scene rendering is not feasible without culling invisible geometry before the rendering actually takes place. Visibility culling can be performed on predefined regions of the scene, where for each region a potentially visible set (PVS) of the scene geometry is computed. Rendering cost is reduced since only the single PVS associated with the viewer's region is rendered instead of a larger set. However, when the viewer leaves a region and enters one of its neighbours, disposing of the currently loaded PVS and loading the new PVS causes stalls. Prefetching policies are utilized to overcome stalls by loading the PVS of a region before the viewer enters it. This study presents a prefetching method for interactive urban walkthroughs. Regions and the transitions among them are represented as a graph where the regions are the nodes and the transitions are the edges. Groups of nodes are formed according to statistical data on transitions and used as the prefetching policy. Heuristics for constructing groups of nodes are developed, and Simulated Annealing is utilized to construct optimized groups based on the developed heuristics. The proposed method and the underlying application of Simulated Annealing are customized to minimize the average transition cost.
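The statistics-driven part of such a policy can be sketched as ranking a region's neighbours by recorded transition counts and prefetching the PVSs of the most likely ones; the grouping and simulated-annealing optimisation described in the thesis are not reproduced here:

```python
def regions_to_prefetch(current_region, transition_counts, budget=2):
    """Rank neighbouring regions by how often past walkthroughs moved there,
    and prefetch the PVSs of the most likely ones within a budget."""
    neighbours = transition_counts.get(current_region, {})
    ranked = sorted(neighbours, key=neighbours.get, reverse=True)
    return ranked[:budget]

stats = {"plaza": {"market": 41, "alley": 12, "bridge": 3}}
print(regions_to_prefetch("plaza", stats))   # ['market', 'alley']
```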
10. Palpanas, Themistoklis. "Web prefetching using partial match prediction." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape10/PQDD_0022/MQ40748.pdf.

11. Bilgin, Ahmet Soydan. "Deriving Efficient SQL Sequences Via Prefetching." NCSU, 2008. http://www.lib.ncsu.edu/theses/available/etd-01032008-141946/.

Abstract:
Modern information architectures place business logic in an application server and persistent objects in a relational DBMS. To effectively realize such architectures, we must surmount the problem of efficiently fetching objects from the DBMS to the application server. Object access patterns are not random; they are driven by applications and user behaviors. Naive implementations retrieve objects from the DBMS as the objects are requested by the application, costing a DBMS roundtrip for each query. This fact, coupled with the growing performance bottleneck of computer storage systems, has resulted in a significant amount of research on improving object access behavior by predicting which objects will be accessed in the future, because throughput will continue to improve but latency will not. Latency will be an ever-increasing component of object access cost, and object access cost is usually the bottleneck for modern high-performance systems. This yields unacceptably poor performance when application servers submit sequences of relational queries to DBMSs. A reasonable approach would be to generate prefetch queries that retrieve objects that will later be requested by the application. However, whereas some prefetch queries would be beneficial, others would not, and distinguishing between them is nontrivial in practice, because commercial DBMSs do not expose efficient query response-time estimators. First, there is no standardized interface for an application server to make the database system calculate the costs (e.g., response time) of a given query. Second, the total cost of query evaluation must also include the roundtrip costs between the application server and the DBMS. Consequently, in current practice, programmers spend enormous amounts of time tuning the queries by which objects are retrieved by an application. This dissertation develops an application-independent approach for generating prefetch queries that can be implemented in conventional middleware systems. The main contribution is a set of application-independent guidelines for selecting, based on the application's access patterns and additional parameters, efficient ways of merging the application's data requests into prefetch queries. The guidelines take the current configuration, such as local or wide-area networks, into account, allowing the middleware to select strategies that give good performance in a wider range of configurations. The ensuing performance gains are evaluated in realistic settings based on a retail database inspired by the SPECJ performance test suite.
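One concrete form such a merge can take is folding a predicted sequence of single-row lookups into one IN-list query, trading many round trips for one. A minimal sketch with a hypothetical schema; the dissertation's guidelines additionally weigh the network configuration before choosing whether to merge:

```python
def merge_into_prefetch(table, key_col, predicted_keys, max_in_list=100):
    """Fold a predicted sequence of point queries into one IN-list query,
    trading one larger round trip for many small ones."""
    keys = list(predicted_keys)[:max_in_list]
    placeholders = ", ".join(["?"] * len(keys))
    sql = f"SELECT * FROM {table} WHERE {key_col} IN ({placeholders})"
    return sql, keys

sql, params = merge_into_prefetch("orders", "customer_id", [17, 23, 42])
print(sql)   # SELECT * FROM orders WHERE customer_id IN (?, ?, ?)
```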
12. De la Ossa Pérez, Bernardo Antonio. "Web Prefetching Techniques in Real Environments." Doctoral thesis, Universitat Politècnica de València, 2012. http://hdl.handle.net/10251/15574.

Abstract:
This thesis studies the application of prefetching techniques to the World Wide Web (WWW) from a realistic and practical point of view. Prefetching is applied to the web to reduce the latency perceived by users since, basically, it consists of predicting and preprocessing the users' next accesses. Until now, the available literature on web prefetching has concentrated on theoretical questions and has not considered some of the problems that appear when implementing the technique under real conditions. Moreover, existing research works use simplified models for evaluation that do not consider how practical aspects actually affect the implementation of a prefetching technique. In addition, barely a few works have used performance indices that are relevant to users when evaluating the benefits that prefetching can achieve. In order to overcome these three restrictions, Delfos has been developed: a web prefetching framework that implements prediction and prefetching in a real system, can be integrated into the web architecture without modifying the standard web protocols, and is compatible with existing software. Delfos can also be used to evaluate and compare prefetching techniques and prediction algorithms, as well as to assist in the design of new ones, since it provides detailed statistical information about the experiments carried out. As an example, Delfos has been used to propose, test and evaluate a new technique (Prediction at Prefetch, PAP) that considerably reduces the user-perceived latency at no additional cost with respect to the basic prefetching mechanism. The prediction algorithms proposed in the research literature that achieve the highest precision incur a high computational cost, which is a problem for including them in real systems. To mitigate this drawback, this thesis also proposes a new low-cost prediction algorithm (Referrer Graph, RG).
De La Ossa Pérez, BA. (2011). Web Prefetching Techniques in Real Environments [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/15574
13. Bracamonte, Nunez Matherey. "Prefetching for the Kilo-Instruction Processor." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-174843.

Abstract:
The large latency of memory accesses in modern computer systems is a key obstacle to achieving high processor utilization, so techniques to reduce or tolerate large memory latencies are essential. Prefetching is one of the most widely studied mechanisms in the literature. This mechanism predicts the future effective addresses of loads in order to bring their data into the upper, faster levels of the memory hierarchy in advance. Another technique to alleviate the memory gap is out-of-order commit, implemented in Kilo-Instruction processors. This technique is based on the fact that instructions independent of a delinquent load can be executed even if the data for that load is not yet available. The goal of this thesis project is to study a stride prefetching mechanism built on top of a Kilo-Instruction processor. I implemented the stride prefetching mechanism on SimpleScalar 3.0 and evaluated several sets of prefetch parameters by simulating them in a Kilo-Instruction processor environment using the SPEC2000 benchmarks. The results show that the prefetching scheme effectively eliminates a major portion of the data access penalty in a uniprocessor environment but provides less than 15% speedup improvement when applied to the Kilo-Instruction processor.
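A stride prefetcher of the kind studied here is typically a small PC-indexed table that tracks each load's last address and delta, issuing prefetches once the delta repeats. A minimal sketch; the confidence threshold and prefetch degree are illustrative assumptions, not the parameter sets evaluated in the thesis:

```python
class StridePrefetcher:
    """Minimal PC-indexed stride table."""
    def __init__(self, degree=2):
        self.table = {}          # pc -> (last_addr, stride, confidence)
        self.degree = degree     # how many strides ahead to prefetch

    def access(self, pc, addr):
        """Record a load and return the addresses to prefetch, if any."""
        last, stride, conf = self.table.get(pc, (addr, 0, 0))
        new_stride = addr - last
        conf = min(conf + 1, 3) if (new_stride == stride != 0) else 0
        self.table[pc] = (addr, new_stride, conf)
        if conf >= 2:            # stride confirmed twice: predict ahead
            return [addr + new_stride * i for i in range(1, self.degree + 1)]
        return []

pf = StridePrefetcher()
for a in [0x100, 0x140, 0x180, 0x1c0]:       # constant 0x40 stride
    out = pf.access(pc=0x400123, addr=a)
print([hex(x) for x in out])                 # ['0x200', '0x240']
```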
14. Chen, Tien-Fu. "Data prefetching for high-performance processors." Thesis, University of Washington, 1993. http://hdl.handle.net/1773/6871.

15. Doshi, Punit Rameshchandra. "Adaptive prefetching for visual data exploration." Thesis, Worcester Polytechnic Institute, 2003. http://www.wpi.edu/Pubs/ETD/Available/etd-0131103-203307.

Abstract:
Thesis (M.S.)--Worcester Polytechnic Institute.
Keywords: Adaptive prefetching; Large-scale multivariate data visualization; Semantic caching; Hierarchical data exploration; Exploratory data analysis. Includes bibliographical references (p.66-70).
16. Ainsworth, Sam. "Prefetching for complex memory access patterns." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/277804.

Abstract:
Modern-day workloads, particularly those in big data, are heavily memory-latency bound. This is because of both irregular memory accesses, which have no discernible pattern in their memory addresses, and large data sets that cannot fit in any cache. However, this need not be a barrier to high performance. With some data-structure knowledge it is typically possible to bring data into the fast on-chip memory caches early, so that it is already available by the time it needs to be accessed. This thesis makes three contributions. I first contribute an automated software prefetching compiler technique to insert high-performance prefetches into program code to bring data into the cache early, achieving 1.3x geometric mean speedup on the most complex processors, and 2.7x on the simplest. I also provide an analysis of when and why this is likely to be successful, which data structures to target, and how to schedule software prefetches well. Then I introduce a hardware solution, the configurable graph prefetcher. This uses the example of breadth-first search on graph workloads to motivate how a hardware prefetcher armed with data-structure knowledge can avoid the instruction overheads, inflexibility and limited latency tolerance of software prefetching. The configurable graph prefetcher sits at the L1 cache and observes memory accesses; it can be configured by a programmer to be aware of a limited number of different data access patterns, achieving 2.3x geometric mean speedup on graph workloads on an out-of-order core. My final contribution extends the hardware used for the configurable graph prefetcher to make an event-triggered programmable prefetcher, using a set of very small micro-controller-sized programmable prefetch units (PPUs) to cover a wide set of workloads. I do this by developing a highly parallel programming model that can be used to issue prefetches, thus allowing high-throughput prefetching with power and area overheads of only around 3%, and a 3x geometric mean speedup for a variety of memory-bound applications. To facilitate its use, I then develop compiler techniques to help automate the process of targeting the programmable prefetcher. These provide a variety of tradeoffs from easiest to use to best performance.
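For the software side, the transformation targets loops with indirect accesses of the form data[index[i]], inserting a prefetch for the element needed some iterations ahead. A minimal sketch in which `prefetch` stands in for a hardware prefetch instruction (Python has none) and the lookahead distance is illustrative:

```python
def sum_indirect(index, data, lookahead=8, prefetch=lambda x: None):
    """Walk data[index[i]] while requesting, each iteration, the element
    that will be needed `lookahead` iterations later."""
    total = 0
    for i in range(len(index)):
        if i + lookahead < len(index):
            prefetch(data[index[i + lookahead]])   # touch the future element
        total += data[index[i]]
    return total

print(sum_indirect([3, 0, 2, 1], [10, 20, 30, 40]))   # 40+10+30+20 = 100
```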
17. Ekemark, Per. "Static Multi-Versioning for Efficient Prefetching." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-300753.

Abstract:
Energy efficiency is one of the biggest challenges in modern computer architecture. Increased performance and improved energy efficiency are always in demand, whether for battery longevity in mobile applications or thermal limits in high-performance computing. To reach the best result, hardware and software must complement each other. Software Decoupled Access-Execute (DAE) is a technique where memory operations are rearranged and grouped together by the compiler. Larger sections of memory-bound code allow more efficient use of hardware Dynamic Voltage and Frequency Scaling (DVFS), which can save energy without affecting performance. While previous work on automatically generated software DAE has used a one-size-fits-all approach, this work adds a parametrisation aspect, making it possible to explore multiple versions of optimised code. Given a parameter, a heuristic can scale how aggressively DAE is applied in order to generate multiple optimised versions, increasing the chance of finding a better fit for a wider variety of programs. On targeted code in 7 programs from the SPEC CPU2006 benchmark suite, this technique yields an average energy-delay product (EDP) improvement of 12% (10% energy and 2% performance) over coupled execution (non-DAE), with a peak EDP improvement of 36%. The multi-versioning aspect also proves useful, as different programs, and even alternate workloads of the same program, reach peak performance with different optimisation versions.
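The access/execute split itself can be pictured on a simple gather-and-compute loop, as below. This is a conceptual sketch only: real DAE prefetches into the cache rather than copying, and pairs each phase with a DVFS frequency change.

```python
def coupled(data, idx):
    return [data[i] * 2 + 1 for i in idx]

def decoupled(data, idx):
    # Access phase: memory-bound, would run at a low CPU frequency;
    # real DAE prefetches into the cache instead of copying values.
    gathered = [data[i] for i in idx]
    # Execute phase: compute-bound, would run at a high CPU frequency.
    return [x * 2 + 1 for x in gathered]

data, idx = list(range(100)), [7, 42, 3]
assert coupled(data, idx) == decoupled(data, idx) == [15, 85, 7]
```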
18. Torrents Lapuerta, Martí. "Improving prefetching mechanisms for tiled CMP platforms." Doctoral thesis, Universitat Politècnica de Catalunya, 2016. http://hdl.handle.net/10803/404418.

Abstract:
Recently, high-performance processor designs have evolved toward Chip-Multiprocessor (CMP) architectures to deal with instruction-level parallelism limitations and, more importantly, to manage the power consumption that is becoming unaffordable due to the increased transistor count and clock frequency. At present, this architecture, which implements multiple processing cores on a single die, is commercially available with up to twenty-four processors on a single chip, and there are roadmaps and research trends suggesting that the number of cores will increase in the near future. The increasing number of cores has turned the interconnection network into a key component with a significant impact on performance. Moreover, as the number of cores increases, tiled architectures are foreseen to provide a scalable solution to handle design complexity, and the Network-on-Chip (NoC) emerges as a solution to growing on-chip wire delays. On the other hand, CMP designs are likely to be equipped with latency-hiding techniques like prefetching in order to reduce the negative impact on performance that high cache miss rates would otherwise cause. Unfortunately, the extra network messages that prefetching entails can drastically increase power consumption and latency in the NoC. In this thesis, we do not develop a new prefetching technique for CMPs but propose improvements applicable to any of them. Specifically, we analyze the behavior of prefetching in CMPs and its impact on the interconnect. We propose several dynamic management techniques to improve the performance of the prefetching mechanism in the system. Furthermore, we identify the main problems of implementing prefetching in distributed memory systems like tiled architectures and propose directions to solve them. Finally, we propose several research lines to continue the work done in this thesis.
19. Bowman, Ivan. "Scalpel: Optimizing Query Streams Using Semantic Prefetching." Thesis, University of Waterloo, 2005. http://hdl.handle.net/10012/1093.

Abstract:
Client applications submit streams of relational queries to database servers. For simple requests, inter-process communication costs account for a significant portion of user-perceived latency. This trend increases with faster processors, larger memory sizes, and improved database execution algorithms, and it is not significantly offset by improvements in communication bandwidth. Caching and prefetching are well-studied approaches to reducing user-perceived latency. Caching is useful in many applications, but it does not help if future requests rarely match previous requests. Prefetching can help in this situation, but only if we are able to predict future requests. This prediction is complicated in the case of relational queries by the presence of request parameters: a prefetching algorithm must predict not only a query that will be executed in the future, but also the actual parameter values that will be supplied. We have found that, for many applications, the streams of submitted queries contain patterns that can be used to predict future requests. Further, there are correlations between the results of earlier requests and the actual parameter values used in future requests. We present the Scalpel system, a prototype implementation that detects these patterns of queries and optimizes request streams using context-based predictions of future requests. Scalpel uses its predictions to provide a form of semantic prefetching, which involves combining a predicted series of requests into a single request that can be issued immediately. Scalpel's semantic prefetching reduces not only the latency experienced by the application but also the total cost of query evaluation. We describe how Scalpel learns to predict optimizable request patterns by observing the application's request stream during a training phase. We also describe the types of query pattern rewrites that Scalpel's cost-based optimizer considers. Finally, we present empirical results that show the costs and benefits of Scalpel's optimizations. We have found that even when an application is well suited to its original configuration, it may behave poorly when moving to a new configuration such as a wireless network. The optimizations performed by Scalpel take the current configuration into account, allowing it to select strategies that give good performance in a wider range of configurations.
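One rewrite in this class joins an outer query with the inner query it is observed to trigger per row, so both result sets arrive in a single round trip. A minimal sketch with a hypothetical customer/orders schema; Scalpel's cost-based optimizer chooses among several rewrite types rather than always joining:

```python
# Observed pattern in the request stream: the application runs Q1, then runs
# Q2 once per Q1 row, with Q2's parameter taken from that row's `id` column.
Q1 = "SELECT id, name FROM customer WHERE region = ?"
Q2 = "SELECT * FROM orders WHERE customer_id = ?"

# Semantic prefetch: one combined request issued immediately in Q1's place,
# returning both result sets in a single round trip instead of N+1.
COMBINED = """
SELECT c.id, c.name, o.*
FROM customer c
LEFT OUTER JOIN orders o ON o.customer_id = c.id
WHERE c.region = ?
"""
```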
20. Zhang, Jun. "Spatial trend prefetching for online maps mashups." Thesis, University of British Columbia, 2008. http://hdl.handle.net/2429/5540.

Abstract:
Mashups merge specific information with online map applications by displaying related markers on maps. Sometimes these markers are displayed very slowly. In this thesis, we present an approach that improves the performance of such applications by reducing the network latency of those responses. We use a Spatial Trend Web Prefetching Model to predict the areas that the user may reach after the next movements. We classify movements into three patterns. The model then checks the history of operations performed by a specific user, finds the pattern the user may be following, and predicts the user's next possible operations according to the algorithm corresponding to that pattern. In our experiments in a lab environment (Nearby Cities application) and an Internet environment (Skype-Google Map application), our approach achieves a hit rate of about 85% when the movement interval is at least 1000 ms, and above 30% when the movement interval is less than 1000 ms.
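The linear case of such trend prediction extrapolates the last pan vector one step and prefetches the tiles around the predicted viewport centre. A minimal sketch; this single pattern is a stand-in for the three movement patterns classified in the thesis:

```python
def predict_tiles(history, fanout=1):
    """history: recent viewport centres as (tile_x, tile_y) pairs.
    Continue the last pan vector and return tiles around the new centre."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    cx, cy = x1 + (x1 - x0), y1 + (y1 - y0)    # extrapolate the trend
    return [(cx + dx, cy + dy)
            for dx in range(-fanout, fanout + 1)
            for dy in range(-fanout, fanout + 1)]

print(predict_tiles([(10, 5), (11, 5)]))   # viewer panning east: (12, 5) area
```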
21. Selvidge, Charles William. "Compilation-based prefetching for memory latency tolerance." Thesis, Massachusetts Institute of Technology, 1992. http://hdl.handle.net/1721.1/13236.

Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1992.
Includes bibliographical references (leaves 160-164).
by Charles William Selvidge.
Ph.D.
22. Wuerges, Emílio. "WCET-aware prefetching of unlocked instruction caches." Thesis, Universidade Federal de Santa Catarina, 2015. https://repositorio.ufsc.br/xmlui/handle/123456789/134804.

Abstract:
Thesis (doctorate) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia de Automação e Sistemas, Florianópolis, 2015.
Embedded computing requires increasing throughput at low power budgets. It asks for growing energy efficiency when executing programs of rising complexity. Many embedded systems are also real-time systems, whose temporal correctness is asserted through schedulability analysis, which often assumes that the WCET of each task is known at design time. As a result of the growing software complexity, a significant amount of energy is spent in supplying instructions through the memory hierarchy. Since an instruction cache consumes around 40% of an embedded processor's energy and affects the energy spent in main memory, it becomes a relevant optimization target. However, since it largely impacts the WCET, cache behavior must be either constrained via cache locking or predicted by WCET analysis. To achieve energy efficiency under real-time constraints, a compiler must have extended awareness of the hardware platform. However, real-time compilers ignore energy, although they quickly determine bounds for the WCET, whereas embedded compilers accurately estimate energy but require time-consuming profiling. That is why this thesis proposes a unifying method to estimate memory energy consumption that is based on Abstract Interpretation, the very same mathematical framework employed for the WCET analysis of caches. The estimates exhibit derivatives that are as accurate as those obtained by profiling, but are computed 1000 times faster, making them suitable for driving code optimization through iterative improvement. Since cache locking gives up energy efficiency for predictability, this thesis proposes a novel code optimization, based on software prefetching, which reduces the miss rate of unlocked instruction caches and, provably, does not increase the WCET. The proposed optimization is compared with a state-of-the-art partial cache locking technique for the 37 programs of the Malardalen WCET benchmarks under 36 cache configurations and two distinct target technologies (2664 use cases). On average, to achieve an improvement of 68% in the WCET, partial cache locking required 8% more energy. On the other hand, software prefetching decreased energy consumption by 11% while leading to an improvement of 15% in the WCET, thereby reconciling energy efficiency and real-time guarantees.
23. Yesilmurat, Serdar. "A Prefetching Method for Interactive Web GIS Applications." Master's thesis, METU, 2010. http://etd.lib.metu.edu.tr/upload/12611639/index.pdf.

Abstract:
A Web GIS system has the major issue of serving map data to client applications. Since most GIS services provide their geospatial data in basic image formats like PNG and JPEG, constructing those images and transferring them over the internet are costly operations. To enhance this inefficient process, various approaches have been offered. Caching the responses to requests on the client side is the most commonly implemented solution. However, this method is not adequate by itself. Besides caching the responses, predicting the client's next possible requests and updating the cache with the responses to those requests provides a remarkable performance improvement. This procedure is called "prefetching". Via prefetching, caching mechanisms can be used more effectively and efficiently. This study proposes a prefetching algorithm called Retrospective Adaptive Prefetch (RAP). The algorithm is built on a heuristic method that takes the user's former actions into consideration. This method reduces the user-perceived response time and improves users' navigation efficiency. The caching mechanism developed takes the memory capacity of the client machine into account to adjust the cache capacity by default; otherwise, the cache size can be configured manually. RAP is compared with four other methods. According to the experiments, RAP provides better performance enhancements than the other compared methods.
24. Blair, Stuart Andrew. "On the classification and evaluation of prefetching schemes." Thesis, University of Glasgow, 2003. http://theses.gla.ac.uk/2274/.

25. Acharjee, Utpal. "Personalized and artificial intelligence Web caching and prefetching." Thesis, University of Ottawa (Canada), 2006. http://hdl.handle.net/10393/27215.

Abstract:
Web caching and prefetching are the most popular and widely used solutions to remedy Internet performance problems. Performance is increased when a combination of caching and prefetching is used rather than either technique individually. Web caching reduces bandwidth consumption and network latency by serving the user's request from its own cache instead of the original Internet source. Prefetching is a technique that preloads and caches web objects that are not currently requested by the user but are expected to be requested in the near future. It provides low retrieval latency for users as well as high hit ratios. Existing methods for caching and prefetching are mostly traditional sharable proxy cache servers. In our personalized caching and prefetching approach, the system builds a user profile associated with the user's web behaviour by parsing the keywords from the HTML pages browsed by the user. The keywords of a user profile are updated by adding a new keyword or incrementing its associated weight if it is already in the profile. This user profile reflects the user's web behaviour or interests. In this cache and prefetch prediction module we consider both static and dynamic web behaviour. We have designed and implemented an artificial-intelligence multilayer neural-network-based caching and prediction algorithm to personalize the Proteus Proxy server with this mechanism. Enhanced Proteus is a multilingual and internationally supported Proxy system and can work with both mobile and traditional Proxy-server-based sharable environments. In the prefetch option of Proteus, we also implemented a unique content-filtering feature that blocks the downloading of unwanted web objects.
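The profile-update rule lends itself to a compact sketch: add each extracted keyword or bump its weight, then score candidate pages by their overlap with the profile. Plain counts stand in here for the thesis's neural-network prediction stage:

```python
def update_profile(profile, page_keywords):
    """Add each keyword, or increment its weight if already in the profile."""
    for kw in page_keywords:
        profile[kw] = profile.get(kw, 0) + 1
    return profile

def interest_score(profile, candidate_keywords):
    """Overlap between a candidate page's keywords and the user profile,
    usable for ranking prefetch candidates."""
    return sum(profile.get(kw, 0) for kw in candidate_keywords)

profile = {}
update_profile(profile, ["cache", "latency", "proxy"])
update_profile(profile, ["cache", "prefetch"])
print(interest_score(profile, ["cache", "prefetch"]))   # 2 + 1 = 3
```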
26. Waern, Jonatan. "Profiling-Assisted Prefetching with Just-In-Time Compilation." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-234483.

Abstract:
The Decoupled Access/Execute (DAE) approach is a method to reduce the energy consumption of task-based programs, based on dividing tasks into two phases, where the first phase prefetches data at a low CPU frequency and the following phase performs computation at a high CPU frequency. The goal of this project is to extend this approach to sequential programs and to examine the benefits of optimising the access phase to better suit the architecture the program runs on and the program input. By using a Just-In-Time compiler to dynamically optimise the program and by utilising profiling tools to obtain runtime information, we have examined the possible benefits of the DAE approach on sequential programs with optimised access phases. We compared the benefits with the cost of dynamic optimisation by testing the method on selected benchmarks from the SPEC CPU 2006 suite. The results indicate that dynamically optimised DAE is suitable for many types of sequential programs, but care must be taken when determining how to optimise the access phases.
27. Song, Dong Ho. "An accurate prefetching policy for object oriented systems." Thesis, University of Newcastle Upon Tyne, 1991. http://hdl.handle.net/10443/2054.

Abstract:
In the latest high-performance computers, there is a growing requirement for accurate prefetching (AP) methodologies for advanced object management schemes in virtual memory and migration systems. The major issue in achieving this goal is finding a simple way of accurately predicting the objects that will be referenced in the near future and grouping them so that they can be fetched at the same time. The basic notion of AP is to build relationships that logically group related objects and to prefetch them on that basis, rather than relying on physical grouping and demand fetching as existing restructuring or grouping schemes do. In this way, AP tries to overcome some of the shortcomings posed by physical grouping methods. Prefetching also makes use of the properties of object-oriented languages to build inter- and intra-object relationships as a means of logical grouping. This thesis describes how this relationship can be established at compile time and how it can be used for accurate object prefetching in virtual memory systems. In addition, AP performs control-flow and data-dependency analysis to reinforce the relationships and to find the dependencies of a program. The user program is decomposed into prefetching blocks which contain all the information needed for block prefetching, such as long branches and function calls at major branch points. The proposed prefetching scheme is implemented by extending a C++ compiler and evaluated on a virtual memory simulator. The results show a significant reduction both in the number of page faults and in memory pollution. In particular, AP can suppress many page faults that occur during transition phases, which are unmanageable by other ways of fetching. AP can be applied to local and distributed virtual memory systems so as to reduce the fault rate by fetching groups of objects at the same time and consequently lessening operating-system overheads.
28. Thakker, Dhawal. "Prefetching and clustering techniques for network based storage." Thesis, Middlesex University, 2009. http://eprints.mdx.ac.uk/6451/.

Abstract:
The usage of network-based applications is increasing as network speeds increase, and the use of streaming applications, e.g. BBC iPlayer, YouTube, etc., running over network infrastructure is becoming commonplace. These applications access data sequentially. However, as processor speeds and the amount of memory available increase, the rate at which streaming applications access data is now faster than the rate at which the blocks can be fetched consecutively from network storage. In addition to sequential access, the system also needs to promptly satisfy demand misses in order for applications to continue their execution. This thesis proposes a design to provide Quality-of-Service (QoS) for streaming applications (sequential accesses) and demand misses, such that streaming applications can run without jitter (once they are started) and demand misses can be satisfied in reasonable time using network storage. To implement the proposed design in real time, the thesis presents an analytical model to estimate the average time taken to service a demand miss. Further, it defines and explores the operational space where the proposed QoS could be provided. Using database techniques, this region is then encapsulated into an autonomous algorithm, which is verified using simulation. Finally, a prototype Experimental File System (EFS) is designed and implemented to test the algorithm on a real test-bed.
29. Selfa Oliver, Vicent. "Adaptive Prefetching and Cache Partitioning for Multicore Processors." Doctoral thesis, Universitat Politècnica de València, 2018. http://hdl.handle.net/10251/112423.

Abstract:
El acceso a la memoria principal en los procesadores actuales supone un importante cuello de botella para las prestaciones, dado que los diferentes núcleos compiten por el limitado ancho de banda de memoria, agravando la brecha entre las prestaciones del procesador y las de la memoria principal. Distintas técnicas atacan este problema, siendo las más relevantes el uso de jerarquías de caché multinivel y la prebúsqueda. Las cachés jerárquicas aprovechan la localidad temporal y espacial que en general presentan los programas en el acceso a los datos, para mitigar las enormes latencias de acceso a memoria principal. Para limitar el número de accesos a la memoria DRAM, fuera del chip, los procesadores actuales cuentan con grandes cachés de último nivel (LLC). Para mejorar su utilización y reducir costes, estas cachés suelen compartirse entre todos los núcleos del procesador. Este enfoque mejora significativamente el rendimiento de la mayoría de las aplicaciones en comparación con el uso de cachés privados más pequeños. Compartir la caché, sin embargo, presenta una problema importante: la interferencia entre aplicaciones. La prebúsqueda, por otro lado, trae bloques de datos a las cachés antes de que el procesador los solicite, ocultando la latencia de memoria principal. Desafortunadamente, dado que la prebúsqueda es una técnica especulativa, si no tiene éxito puede contaminar la caché con bloques que no se usarán. Además, las prebúsquedas interfieren con los accesos a memoria normales, tanto los del núcleo que emite las prebúsquedas como los de los demás. Esta tesis se centra en reducir la interferencia entre aplicaciones, tanto en las caché compartidas como en el acceso a la memoria principal. Para reducir la interferencia entre aplicaciones en el acceso a la memoria principal, el mecanismo propuesto en esta disertación regula la agresividad de cada prebuscador, activando o desactivando selectivamente algunos de ellos, dependiendo de su rendimiento individual y de los requisitos de ancho de banda de memoria principal de los otros núcleos. Con respecto a la interferencia en cachés compartidos, esta tesis propone dos técnicas de particionado para la LLC, las cuales otorgan más espacio de caché a las aplicaciones que progresan más lentamente debido a la interferencia entre aplicaciones. La primera propuesta de particionado de caché requiere hardware específico no disponible en procesadores comerciales, por lo que se ha evaluado utilizando un entorno de simulación. La segunda propuesta de particionado de caché presenta una familia de políticas que superan las limitaciones en el número de particiones y en el número de vías de caché disponibles mediante la agrupación de aplicaciones en clústeres y la superposición de particiones de caché, por lo que varias aplicaciones comparten las mismas vías. Dado que se ha implementado utilizando los mecanismos para el particionado de la LLC que presentan algunos procesadores Intel modernos, esta propuesta ha sido evaluada en una máquina real. Los resultados experimentales muestran que el mecanismo de prebúsqueda selectiva propuesto en esta tesis reduce el número de solicitudes de memoria principal en un 20%, cosa que se traduce en mejoras en la equidad del sistema, el rendimiento y el consumo de energía. 
Por otro lado, con respecto a los esquemas de partición propuestos, en comparación con un sistema sin particiones, ambas propuestas reducen la iniquidad del sistema en un promedio de más del 25%, independientemente de la cantidad de aplicaciones en ejecución, y esta reducción en la injusticia no afecta negativamente al rendimiento.
Accessing main memory represents a major performance bottleneck in current processors, since the different cores compete among them for the limited offchip bandwidth, aggravating even more the so called memory wall. Several techniques have been applied to deal with the core-memory performance gap, with the most preeminent ones being prefetching and hierarchical caching. Hierarchical caches leverage the temporal and spacial locality of the accessed data, mitigating the huge main memory access latencies. To limit the number of accesses to the off-chip DRAM memory, current processors feature large Last Level Caches. These caches are shared between all the cores to improve the utilization of the cache space and reduce cost. This approach significantly improves the performance of most applications compared to using smaller private caches. Cache sharing, however, presents an important shortcoming: the interference between applications. Prefetching, on the other hand, brings data blocks to the caches before they are requested, hiding the main memory latency. Unfortunately, since prefetching is a speculative technique, inaccurate prefetches may pollute the cache with blocks that will not be used. In addition, the prefetches interfere with the regular memory requests, both the ones from the application running on the core that issued the prefetches and the others. This thesis focuses on reducing the inter-application interference, both in the shared cache and in the access to the main memory. To reduce the interapplication interference in the access to main memory, the proposed approach regulates the aggressiveness of each core prefetcher, and selectively activates or deactivates some of them, depending on their individual performance and the main memory bandwidth requirements of the other cores. With respect to interference in shared caches, this thesis proposes two LLC partitioning techniques that give more cache space to the applications that have their progress diminished due inter-application interferences. The first cache partitioning proposal requires dedicated hardware not available in commercial processors, so it has been evaluated using a simulation framework. The second proposal dealing with cache partitioning presents a family of partitioning policies that overcome the limitations in the number of partitions and the number of available ways by grouping applications and overlapping cache partitions, so multiple applications share the same ways. Since it has been implemented using the cache partitioning features of modern Intel processors it has been evaluated in a real machine. Experimental results show that the proposed selective prefetching mechanism reduces the number of main memory requests by 20%, which translates to improvements in unfairness, performance, and energy consumption. On the other hand, regarding the proposed partitioning schemes, compared to a system with no partitioning, both reduce unfairness more than 25% on average, regardless of the number of applications running in the multicore, and this reduction in unfairness does not negatively affect the performance.
Selfa Oliver, V. (2018). Adaptive Prefetching and Cache Partitioning for Multicore Processors [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/112423
Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Berg, Stefan Georg. "A cache-based prefetching memory system for mediaprocessors /." Thesis, Connect to this title online; UW restricted, 2002. http://hdl.handle.net/1773/6877.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Sair, Suleyman. "Predictor-directed data prefetching for pointer-based applications /." Diss., Connect to a 24 p. preview or request complete full text in PDF format. Access restricted to UC campuses, 2003. http://wwwlib.umi.com/cr/ucsd/fullcit?p3090445.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Payami, Maryam. "Instruction prefetching techniques for ultra low-power multicore architectures." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2016. http://amslaurea.unibo.it/12462/.

Full text
Abstract:
As the gap between processor and memory speeds increases, memory latency has become a critical bottleneck for computing performance. To reduce this bottleneck, designers have been working on techniques to hide these latencies. On the other hand, the design of embedded processors typically targets low cost and low power consumption, so techniques that can satisfy these constraints are more desirable for embedded domains. While out-of-order execution, aggressive speculation, and complex branch prediction algorithms can help hide memory access latency in high-performance systems, they cost a heavy power budget and are not suitable for embedded systems. Prefetching is another popular method for hiding memory access latency and has been studied extensively for high-performance processors. For embedded processors with strict power requirements, however, the application of complex prefetching techniques is greatly limited, and a low-power, low-energy solution is therefore desired in this context. In this work, we focus on instruction prefetching for ultra-low-power processing architectures and aim to reduce the energy overhead of this operation by proposing a combination of simple, low-cost, and energy-efficient prefetching techniques. We study a wide range of applications, from cryptography to computer vision, and show that our proposed mechanisms can effectively improve the hit rate of almost all of them to above 95%, achieving an average performance improvement of more than 2X. Moreover, by synthesizing our designs using state-of-the-art technologies, we show that the prefetchers increase the system's power consumption by less than 15% and its total silicon area by less than 1%. Altogether, the proposed schemes achieve a total energy reduction of 1.9X, enabling significantly longer battery life.
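As a rough illustration of this design space, the following behavioral model simulates a tiny direct-mapped instruction cache with and without next-line prefetching, one of the simplest low-cost techniques, over a mostly sequential address trace; all sizes and the trace generator are assumptions for the sketch, not the thesis's experimental setup.

```python
import random

LINE = 16   # bytes per cache line (assumed)
SETS = 64   # lines in the direct-mapped cache (assumed)

def run(trace, prefetch=True):
    tags = [None] * SETS
    hits = 0
    for pc in trace:
        line = pc // LINE
        if tags[line % SETS] == line:
            hits += 1
        else:
            tags[line % SETS] = line   # demand fill on a miss
        if prefetch:                   # next-line prefetch into its set
            nxt = line + 1
            tags[nxt % SETS] = nxt
    return hits / len(trace)

# Mostly sequential code with occasional taken branches.
random.seed(1)
trace, pc = [], 0
for _ in range(100_000):
    trace.append(pc)
    pc = random.randrange(0, 4096) if random.random() < 0.05 else pc + 4
print(f"hit rate without prefetch:   {run(trace, False):.3f}")
print(f"hit rate with next-line pf:  {run(trace, True):.3f}")
```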
APA, Harvard, Vancouver, ISO, and other styles
33

Cortés, Toni. "Cooperative caching and prefetching in parallel/distributed file systems." Doctoral thesis, Universitat Politècnica de Catalunya, 1997. http://hdl.handle.net/10803/6009.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Büttner, Markus. "Kombinatorische integrierte Prefetching- und Cachingalgorithmen für Einzel- und Mehrplattensysteme." [S.l. : s.n.], 2004. http://deposit.ddb.de/cgi-bin/dokserv?idn=97213591X.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Demke, Angela K. "Automatic I/O prefetching for out-of-core applications." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp04/mq28765.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Kulkarni, Amit Vasant. "Implementing and Evaluating SCM Algorithms for Rate-Aware Prefetching." NCSU, 2009. http://www.lib.ncsu.edu/theses/available/etd-12112008-154909/.

Full text
Abstract:
File system prefetching has been widely studied and used to hide the high latency of disk I/O. However, very few algorithms explicitly take the file access rate or burstiness into account when distributing resources, especially prefetching memory. In this work we draw parallels between file system prefetching and the field of Supply Chain Management (SCM), particularly Inventory Theory. We describe two commonly used SCM algorithms that directly address access rate and uncertainty, implement them as prefetching algorithms in the Linux kernel, and present performance results. Our results show that with these SCM-based algorithms, we can improve the throughput of standard Linux file transfer applications by up to 33% and the throughput of some server workloads (such as Video-on-Demand) by up to 41%.
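To make the inventory-theory analogy concrete, a classic base-stock (order-up-to) policy can be read as a rule for how many blocks to keep prefetched ahead of a reader; the sketch below illustrates that mapping under invented parameters and is not the thesis's actual algorithm.

```python
import math

def order_up_to_level(rate, lead_time, sigma, service_z=1.65):
    """Base-stock level S = expected demand over the lead time plus safety
    stock. Read 'demand' as blocks consumed by the application per second,
    'lead time' as the disk fetch latency, and S as how many blocks should
    be in flight or cached ahead of the reader (z = 1.65 gives roughly a
    95% service level, assuming normally distributed demand)."""
    expected = rate * lead_time
    safety = service_z * sigma * math.sqrt(lead_time)
    return math.ceil(expected + safety)

# An application reading 400 blocks/s with bursty demand (sigma = 120),
# against a 25 ms fetch latency (all values invented for illustration):
depth = order_up_to_level(rate=400, lead_time=0.025, sigma=120)
print(f"keep ~{depth} blocks prefetched ahead")
```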
APA, Harvard, Vancouver, ISO, and other styles
37

Angelino, Elaine Lee. "Accelerating Markov chain Monte Carlo via parallel predictive prefetching." Thesis, Harvard University, 2014. http://nrs.harvard.edu/urn-3:HUL.InstRepos:13070022.

Full text
Abstract:
We present a general framework for accelerating a large class of widely used Markov chain Monte Carlo (MCMC) algorithms. This dissertation demonstrates that MCMC inference can be accelerated in a model of parallel computation that uses speculation to predict and complete computational work ahead of when it is known to be useful. By exploiting fast, iterative approximations to the target density, we can speculatively evaluate many potential future steps of the chain in parallel. In Bayesian inference problems, this approach can accelerate sampling from the target distribution, without compromising exactness, by exploiting subsets of data. It takes advantage of whatever parallel resources are available, but produces results exactly equivalent to standard serial execution. In the initial burn-in phase of chain evaluation, it achieves speedup over serial evaluation that is close to linear in the number of available cores.
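The key idea, speculatively evaluating proposals that would follow the pending accept/reject decision, can be sketched in a few lines; the target density, proposal scheme, and thread pool below are placeholder assumptions (a real implementation would use processes or cluster workers).

```python
import math, random
from concurrent.futures import ThreadPoolExecutor

def log_density(x):              # placeholder for an expensive likelihood
    return -0.5 * x * x

def run(x0, steps, workers=2):
    rng = random.Random(0)
    # Pre-draw all randomness, so speculative work is exactly the
    # computation a serial execution would have performed.
    props = [rng.gauss(0, 1) for _ in range(steps + 1)]
    us = [rng.random() for _ in range(steps)]
    x, logp_x, samples = x0, log_density(x0), []
    with ThreadPoolExecutor(workers) as pool:   # processes in a real system
        for i in range(steps):
            y = x + props[i]
            f_y = pool.submit(log_density, y)
            # Speculate one step down the "accept" branch of the outcome
            # tree while f_y runs; a full implementation would reuse this
            # result as the next step's density (wasted on reject).
            f_spec = pool.submit(log_density, y + props[i + 1])
            if math.log(us[i]) < f_y.result() - logp_x:
                x, logp_x = y, f_y.result()
                f_spec.result()
            samples.append(x)
    return samples

print(len(run(0.0, 100)))
```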
Engineering and Applied Sciences
APA, Harvard, Vancouver, ISO, and other styles
38

Zacharopoulos, Georgios. "Employing Hardware Transactional Memory in Prefetching for Energy Efficiency." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-260556.

Full text
Abstract:
Energy efficiency is becoming a highly significant topic in modern hardware design. The need for lower energy consumption in our computers and longer battery life in our laptops and smartphones keeps growing, without accepting performance loss in our machines in return. Much work is being conducted towards that cause, and as a result our lives could become more convenient. For this project, we investigated the use of Hardware Transactional Memory (HTM) in the prefetching phase of the Decoupled Access/Execute (DAE) model [1]. The challenge posed by the DAE model is to make sure the memory state remains intact while prefetching data. We propose a solution that overcomes this challenge by employing the HTM supported by Intel's latest processors: an innovative use of HTM, implemented in the Access phase of the DAE model. Evaluation showed that our approach preserves the benefits of the DAE model. Furthermore, we are able to extend the model to applications where its use was previously not possible.
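The structure at issue can be sketched as follows; the hardware transaction is only indicated by comments (on real hardware the Access phase would be wrapped with Intel TSX RTM intrinsics such as _xbegin()/_xend()), and all names and the slicing are illustrative assumptions.

```python
# Sketch of the Decoupled Access/Execute pattern: the access phase walks the
# next loop slice purely to warm the cache; the execute phase then runs at
# full speed. On hardware, wrapping the access phase in an HTM transaction
# would roll back any accidental store, keeping the memory state intact.

def access_phase(data, lo, hi):
    # --- on hardware, the transaction would begin here (_xbegin) ---
    sink = 0
    for i in range(lo, hi):        # read-only traversal: loads only
        sink ^= data[i]
    # --- and end here (_xend) ---
    return sink                    # value discarded; the point is the loads

def execute_phase(data, lo, hi):
    for i in range(lo, hi):
        data[i] = data[i] * 3 + 1  # the actual computation, now cache-warm

def dae_loop(data, slice_len=1024):
    for lo in range(0, len(data), slice_len):
        hi = min(lo + slice_len, len(data))
        access_phase(data, lo, hi)   # prefetch slice (low DVFS frequency)
        execute_phase(data, lo, hi)  # compute slice (high frequency)

dae_loop(list(range(10_000)))
```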
APA, Harvard, Vancouver, ISO, and other styles
39

Knafla, Nils. "Prefetching techniques for client server object-oriented database systems." Thesis, University of Edinburgh, 1999. http://hdl.handle.net/1842/10999.

Full text
Abstract:
The performance of many object-oriented database applications suffers from the page fetch latency determined by the expense of disk access. In this work we suggest several prefetching techniques to avoid, or at least reduce, page fetch latency. In practice no prediction technique is perfect and no prefetching technique can eliminate the demand fetch time entirely. We are therefore interested in the trade-off between the level of accuracy required to obtain good results in terms of elapsed time reduction and the processing overhead needed to achieve that level of accuracy. If prefetching accuracy is high, the total elapsed time of an application can be reduced significantly; if prefetching accuracy is low, many incorrect pages are prefetched and the extra load on the client, network, server and disks decreases whole-system performance. Access patterns of object-oriented databases are often complex and usually hard to predict accurately. The main thrust of our work therefore concentrates on analysing the structure of object relationships to obtain knowledge about page reference patterns. We designed a technique, called OSP, which prefetches pages according to a time constraint established by the duration of a page fetch. In addition, every page has an associated weight that decides whether a prefetch is executed. We implemented OSP in the EXODUS storage manager by adding multithreading to the database client. The performance of OSP is evaluated on different machines in interaction with buffer management, distributed databases and other system parameters.
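The decision rule described, issue a prefetch only if the page's weight is high enough and it fits within the time constraint set by the duration of a page fetch, might be sketched as follows; the weights, budget accounting, and selection loop are illustrative assumptions, not the OSP implementation.

```python
FETCH_TIME = 10.0  # ms: duration of one page fetch, the time budget (assumed)

def plan_prefetches(candidates, weight_threshold=0.6):
    """candidates: (weight, page_id, eta_ms) derived from the object graph,
    where the weight estimates how likely the page is to be referenced.
    Returns the pages worth prefetching within one fetch-time window."""
    chosen, budget = [], FETCH_TIME
    for weight, page, eta in sorted(candidates, reverse=True):
        if weight < weight_threshold:
            break                 # too speculative: would pollute the cache
        if eta <= budget:
            chosen.append(page)
            budget -= eta         # client/network/server capacity used up
    return chosen

pages = [(0.9, "A", 4.0), (0.7, "B", 5.0), (0.65, "C", 3.0), (0.3, "D", 1.0)]
print(plan_prefetches(pages))     # ['A', 'B'] under these assumptions
```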
APA, Harvard, Vancouver, ISO, and other styles
40

Khan, Muneeb. "Optimizing Performance in Highly Utilized Multicores with Intelligent Prefetching." Doctoral thesis, Uppsala universitet, Datorarkitektur och datorkommunikation, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-272095.

Full text
Abstract:
Modern processors apply sophisticated techniques, such as deep cache hierarchies and hardware prefetching, to increase performance. Such complex hardware structures have helped improve performance in general; however, their full potential is not realized, as software often utilizes the memory hierarchy inefficiently. Performance can be improved further by ensuring careful interaction between software and hardware, typically by increasing cache utilization and conserving DRAM bandwidth, i.e., retaining more useful data in the caches and lowering data requests to the DRAM. One way to achieve this is to conserve space across the cache hierarchy and increase the opportunity for temporal reuse of cached data. Similarly, conserving DRAM bandwidth is essential for performance in highly utilized multicores, as it can easily become a critical resource. When multiple cores are active and the per-core share of DRAM bandwidth shrinks, its efficient utilization plays an important role in improving overall performance. Together, the cache hierarchy and the DRAM bandwidth play a significant role in defining the overall performance of multicores. Based on deep insight from memory-behavior modeling of software, this thesis explores five software-only methods to analyze and increase performance in multicores. The underlying philosophy that drives these techniques is to increase cache utilization and conserve DRAM bandwidth by 1) making data prefetching more accurate, and 2) lowering the miss rate in the cache hierarchy, either by preserving useful data longer through cache-bypassing the less useful data or via code size compaction using compiler options. First, we show how microarchitecture-independent memory access profiles can be used to analyze the instruction cache performance of software. We use this information in a compiler pass to recompile application phases (those with large instruction cache miss rates) for smaller code size, in an effort to improve the application's instruction cache behavior. Second, we demonstrate how a resource-efficient software prefetching method can be combined with hardware prefetching to improve performance in multicores running software that exhibits irregular memory access patterns. Third, we show that hardware prefetching on high-performance commodity multicores is sub-optimal, and demonstrate how a resource-efficient software-only prefetching method can perform better in fully utilized multicores. Fourth, we present an adaptive prefetching approach that dynamically combines software and hardware prefetching in a runtime system to improve performance in highly utilized multicores. Finally, in the fifth work we develop a method to predict per-core prefetching configurations that deliver near-optimal overall multicore performance. These software techniques enable us to tap greater performance in multicores (up to 50%) without requiring more processing resources.
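A toy version of the per-phase decision such a runtime automates, choosing a prefetching configuration from observed miss rate and bandwidth pressure, could look like the sketch below; the thresholds and configuration names are invented for illustration.

```python
# Illustrative runtime policy: per program phase, choose between hardware
# prefetching, software prefetching, both, or neither, from two measurements.
# Thresholds are invented; the thesis derives its decisions from
# memory-behavior models of the software.

def choose_config(llc_misses_per_kilo_instr, bw_utilization, irregular):
    if llc_misses_per_kilo_instr < 1.0:
        return "none"          # compute-bound phase: prefetching only costs
    if bw_utilization > 0.85:
        # Bandwidth is the bottleneck: issue only accurate software
        # prefetches and throttle speculative hardware overfetch.
        return "sw-only"
    if irregular:
        return "sw+hw"         # hw misses the pattern; sw covers the gaps
    return "hw-only"           # regular strides: hardware handles it cheaply

for phase in [(0.4, 0.30, False), (12.0, 0.95, True), (8.0, 0.50, False)]:
    print(phase, "->", choose_config(*phase))
```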
APA, Harvard, Vancouver, ISO, and other styles
41

Fougner, Alexander. "Increasing energy efficiency and instruction scheduling by software prefetching." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-292714.

Full text
Abstract:
With the increasing problems related to semiconductor process node shrinkage and the expansion of the mobile devices market, the requirements on energy efficiency are continuously tightened. Alternative methods such as Decoupled Access/Execute adapt software to better fit dynamic voltage and frequency scaling. Targeting the energy-inefficient out-of-order execution logic, new methods propose to increase energy efficiency by moving the out-of-order logic from the hardware level to the software level by enabling reordering of loop iterations. One way to enable reordering of iterations is to transform a loop into backwards recursion. The aim of this thesis is to investigate transformations of loops into recursions and to evaluate the resulting performance impact. This thesis presents a source-language-independent implementation of an LLVM compiler pass transforming loops into forward and backward recursions. The performance impact is evaluated by choosing parallel loops from the Rodinia benchmark suite and measuring the recursion overhead for different recursion depths. In certain cases, tight loops showed overhead ranging from 22% to 78% in the backwards-recursion case, depending on recursion depth, whereas for some loose loops the observed overhead was as low as 1% regardless of the recursion depth.
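The transformation itself is easy to picture; below is one plausible reading of the same loop written as ordinary iteration and as a backward recursion where the body work happens as the calls unwind (a hypothetical example, not the thesis's LLVM output).

```python
# A simple loop ...
def loop_sum(a):
    s = 0
    for i in range(len(a)):
        s += a[i] * a[i]
    return s

# ... rewritten as backward recursion: the recursive call is made first and
# each iteration's body work happens as the calls unwind, which is what
# enables reordering of iterations. Every live-in becomes an argument, and
# the recursion depth equals the iteration count, hence the overhead the
# thesis measures at different depths.
def rec_sum(a, i=None):
    if i is None:
        i = len(a) - 1
    if i < 0:
        return 0
    return rec_sum(a, i - 1) + a[i] * a[i]

data = list(range(100))
assert loop_sum(data) == rec_sum(data)
```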
APA, Harvard, Vancouver, ISO, and other styles
42

Kim, Donglok. "Extended data cache prefetching using a reference prediction table /." Thesis, Connect to this title online; UW restricted, 1997. http://hdl.handle.net/1773/6127.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Garside, Jamie. "Real-time prefetching on shared-memory multi-core systems." Thesis, University of York, 2015. http://etheses.whiterose.ac.uk/10711/.

Full text
Abstract:
In recent years, there has been a growing trend towards using multi-core processors in real-time systems to cope with the rising computation requirements of real-time tasks. Coupled with this, the rising memory requirements of these tasks push demand beyond what can be provided by small, private on-chip caches, requiring the use of larger, slower off-chip memories such as DRAM. Due to the cost, power requirements and complexity of these memories, they are typically shared between all of the tasks within the system. In order for the execution time of these tasks to be bounded, the response time of the memory and the interference from other tasks also need to be bounded. While there is a great amount of current research on bounding this interference, one popular method is to effectively partition the available memory bandwidth between the processors in the system. Of course, as the number of processors increases, so does the worst-case blocking time. It is difficult to further optimise the arbitration scheme; instead, this scaling problem needs to be approached from another angle. Prefetching has previously been shown to improve the execution time of tasks by speculatively issuing memory accesses ahead of time for items which may be useful in the near future, although prefetchers are typically not used in real-time systems due to their unpredictable nature. Instead, this work presents a framework by which a prefetcher can be safely used alongside a composable memory arbiter, a predictable prefetching scheme, and finally a method by which this predictable prefetcher can be used to improve the worst-case execution time of a running task.
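One way to picture such a framework is a slot-based (composable) memory arbiter in which demand accesses own their slots and prefetches may only use slots their owner leaves idle, so worst-case bounds are unchanged; the sketch below uses invented parameters and is not the thesis's arbiter.

```python
from collections import deque

# Composable TDM arbiter sketch: core i owns every i-th slot. A prefetch is
# dispatched only in a slot whose owner has no pending demand access, so the
# worst-case latency bound of every core is exactly what it was without
# prefetching. Structures and parameters are illustrative.

N_CORES = 4
demand = [deque() for _ in range(N_CORES)]    # per-core demand queues
prefetch = deque()                            # shared prefetch queue

def arbitrate(n_slots):
    schedule = []
    for slot in range(n_slots):
        owner = slot % N_CORES
        if demand[owner]:
            schedule.append(("demand", owner, demand[owner].popleft()))
        elif prefetch:
            schedule.append(("prefetch", owner, prefetch.popleft()))
        else:
            schedule.append(("idle", owner, None))
    return schedule

demand[0].extend(["a0", "a1"])
demand[2].append("c0")
prefetch.extend(["p0", "p1", "p2"])
for entry in arbitrate(8):
    print(entry)
```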
APA, Harvard, Vancouver, ISO, and other styles
44

Díaz, Pedro. "Mechanisms to improve the efficiency of hardware data prefetchers." Thesis, University of Edinburgh, 2011. http://hdl.handle.net/1842/5650.

Full text
Abstract:
A well-known performance bottleneck in computer architecture is the so-called memory wall. This term refers to the huge disparity between on-chip and off-chip access latencies. Historically speaking, the operating frequency of processors has increased at a steady pace, while most past advances in memory technology have been in density, not speed. Nowadays, the trend of ever-increasing processor operating frequencies has been replaced by an increasing number of CPU cores per chip. This will continue to exacerbate the memory wall problem, as several cores now have to compete for off-chip data access. As multi-core systems pack more and more cores, the access latency observed by each core is expected to continue to increase. Although the causes of the memory wall have changed, it is, and will remain in the near future, a very significant challenge in computer architecture design. Prefetching has been an important technique for amortizing the effect of the memory wall. With prefetching, data or instructions that are expected to be used in the near future are speculatively moved up in the memory hierarchy, where the access latency is smaller. This dissertation focuses on hardware data prefetching at the last cache level before memory (last-level cache, LLC). Prefetching at the LLC usually offers the best performance increase, as this is where the disparity between hit and miss latencies is largest. Hardware prefetchers operate by examining the miss address stream generated by the cache and identifying patterns and correlations between the misses. Most prefetchers divide the global miss stream into several sub-streams according to some pre-specified criteria. This process is known as localization. The benefits of localization are well established: it increases the accuracy of the predictions and helps filter out spurious, non-predictable misses. However, localization has one important drawback: since the misses are classified into different sub-streams, important chronological information is lost. A consequence of this is that most localizing prefetchers issue prefetches in an untimely manner, fetching data too far in advance. This behavior promotes data pollution in the cache. The first part of this thesis proposes a new class of prefetchers based on the novel concept of Stream Chaining. With Stream Chaining, the prefetcher tries to reconstruct the chronological information lost in the process of localization, while at the same time keeping its benefits. We describe two novel Stream Chaining prefetching algorithms based on two state-of-the-art localizing prefetchers: PC/DC and C/DC. We show how both prefetchers issue prefetches in a more timely manner than their non-chaining counterparts, increasing performance by as much as 55% (10% on average) on a suite of sequential benchmarks, while consuming roughly the same amount of memory bandwidth. In order to hide the effects of the memory wall, hardware prefetchers are usually configured to aggressively prefetch as much data as possible. However, a highly aggressive prefetcher can have negative effects on performance: factors such as prefetching accuracy, cache pollution and memory bandwidth consumption have to be taken into account. This is especially important in the context of multi-core systems, where typically each core has its own prefetching engine and there is high competition for memory access.
Several prefetch throttling and filtering mechanisms have been proposed to maximize the effect of prefetching in multi-core systems. The general strategy behind these heuristics is to promote prefetches that are more likely to be used and to cause less interference. Traditionally these methods operate at the source level, i.e., directly on the prefetch engine they are assigned to control. In multi-core systems all prefetches are aggregated in a FIFO-like data structure called the Prefetch Request Queue (PRQ), where they wait to be dispatched to memory. The second part of this thesis shows that a traditional FIFO PRQ does not promote timely prefetching behavior and usually negates part of the performance benefits achieved by throttling heuristics. We propose a novel approach to prefetch aggressiveness control in multi-cores that performs throttling at the PRQ (i.e., global) level, using global knowledge of the metrics of all prefetchers and information about the global state of the PRQ. To do this, we introduce the Resizable Prefetching Heap (RPH), a data structure modeled after a binary heap that promotes timely dispatch of prefetches as well as fairness in the distribution of prefetching bandwidth. The RPH is designed as a drop-in replacement for traditional FIFO PRQs. We compare our proposal against a state-of-the-art source-level throttling algorithm (HPAC) in an 8-core system. Unlike previous research, we evaluate both multiprogrammed and multithreaded (parallel) workloads, using a modern prefetching algorithm (C/DC). Our experimental results show that RPH-based throttling increases the performance benefits obtained by HPAC by as much as 148% (53.8% on average) in multiprogrammed workloads and as much as 237% (22.5% on average) in parallel benchmarks, while consuming roughly the same amount of memory bandwidth. When comparing the speedup over fixed-degree prefetching, RPH increases the average speedup of HPAC from 7.1% to 10.9% in multiprogrammed workloads, and from 5.1% to 7.9% in parallel benchmarks.
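The data-structure change at the heart of this second part, replacing the FIFO PRQ with a heap ordered by a global notion of priority, can be sketched directly; the priority function below is an invented stand-in for the RPH policy, which additionally enforces fairness and resizing.

```python
import heapq, itertools

# Sketch: prefetch requests from all cores meet in one queue. A FIFO
# dispatches in arrival order; a heap dispatches by priority, here an
# invented score mixing prefetcher accuracy and urgency (how soon the
# block is expected to be demanded).

counter = itertools.count()        # tie-breaker keeps heap ordering total

class HeapPRQ:
    def __init__(self):
        self.heap = []

    def push(self, core, block, accuracy, eta):
        score = -accuracy + 0.1 * eta   # lower = dispatched first (assumed)
        heapq.heappush(self.heap, (score, next(counter), core, block))

    def dispatch(self):
        _, _, core, block = heapq.heappop(self.heap)
        return core, block

prq = HeapPRQ()
prq.push(core=0, block=0x10, accuracy=0.9, eta=2)   # accurate and urgent
prq.push(core=1, block=0x20, accuracy=0.3, eta=1)   # inaccurate
prq.push(core=2, block=0x30, accuracy=0.8, eta=9)   # accurate, not urgent
while prq.heap:
    print(prq.dispatch())
```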
APA, Harvard, Vancouver, ISO, and other styles
45

Mahmood, Omer. "ADAPTIVE PROFILE DRIVEN DATA CACHING AND PREFETCHING IN MOBILE ENVIRONMENT." Thesis, The University of Sydney, 2005. http://hdl.handle.net/2123/714.

Full text
Abstract:
This thesis describes a new method of calculating data priority by using adaptive mobile user and device profiles which change with user location, time of day, available networks and data access history. The profiles are used for data prefetching, selection of the most suitable wireless network, and cache management on the mobile device, in order to optimally utilize the device's storage capacity and available bandwidth. Some of the inherent characteristics of mobile devices arising from user movements are non-persistent connections, limited bandwidth and storage capacity, and changes in the device's geographical location and connection (e.g., the connection can switch from GPRS to WLAN to Bluetooth). New research is being carried out to make mobile devices work more efficiently by reducing and/or eliminating their limitations. The focus of this research is to propose, evaluate and test a new user profiling technique which specifically caters to the needs of mobile device users who are required to access large amounts of data, possibly more than the device can store, during the course of a day or week. This work involves the development of an intelligent user profiling system, along with a mobile device caching system, which first allocates weights (priorities) to the different sets and subsets of the total given data based on the user's location, appointment information and preferences, the device capabilities, and the available networks. The profile then automatically changes the data weights with user movements, the history of cached data access, and the characteristics of the available networks. The Adaptive User and Device Profiles were designed to handle a broad range of issues associated with:
• Changing network types and conditions
• Limited storage capacity and document type support of mobile devices
• Changes in user data needs due to their movements at different times of the day
Many research areas have been addressed through this research, but the primary focus has remained on the following four core areas: selecting the most suitable wireless network; allocating weights to different datasets and subsets by integrating the user's movements; previously accessed data; and the time of day together with user appointment information and device capabilities.
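A toy version of the weight (priority) computation such a profile performs, combining location, schedule, history, and device capability into one score per data subset, might look like the sketch below; the features and coefficients are invented for illustration.

```python
# Illustrative data-priority score for one data subset given the current
# context. The feature set and the weights are invented; the thesis's
# profiles additionally adapt them with user movements and access history.

def priority(subset, ctx):
    score = 0.0
    if subset["location"] == ctx["location"]:
        score += 0.4                           # relevant where the user is
    if subset["topic"] in ctx["appointments"]:
        score += 0.3                           # needed for an upcoming event
    score += 0.2 * subset["past_access_rate"]  # history of use (0..1)
    if subset["format"] in ctx["device_formats"]:
        score += 0.1                           # device can render it
    return score

ctx = {"location": "office", "appointments": {"budget"},
       "device_formats": {"pdf", "txt"}}
doc = {"location": "office", "topic": "budget",
       "past_access_rate": 0.5, "format": "pdf"}
print(f"priority = {priority(doc, ctx):.2f}")  # 0.90 under these assumptions
```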
APA, Harvard, Vancouver, ISO, and other styles
46

Mahmood, Omer. "ADAPTIVE PROFILE DRIVEN DATA CACHING AND PREFETCHING IN MOBILE ENVIRONMENT." University of Sydney. Information Technologies, 2005. http://hdl.handle.net/2123/714.

Full text
Abstract:
This thesis describes a new method of calculating data priority by using adaptive mobile user and device profiles which change with user location, time of day, available networks and data access history. The profiles are used for data prefetching, selection of the most suitable wireless network, and cache management on the mobile device, in order to optimally utilize the device's storage capacity and available bandwidth. Some of the inherent characteristics of mobile devices arising from user movements are non-persistent connections, limited bandwidth and storage capacity, and changes in the device's geographical location and connection (e.g., the connection can switch from GPRS to WLAN to Bluetooth). New research is being carried out to make mobile devices work more efficiently by reducing and/or eliminating their limitations. The focus of this research is to propose, evaluate and test a new user profiling technique which specifically caters to the needs of mobile device users who are required to access large amounts of data, possibly more than the device can store, during the course of a day or week. This work involves the development of an intelligent user profiling system, along with a mobile device caching system, which first allocates weights (priorities) to the different sets and subsets of the total given data based on the user's location, appointment information and preferences, the device capabilities, and the available networks. The profile then automatically changes the data weights with user movements, the history of cached data access, and the characteristics of the available networks. The Adaptive User and Device Profiles were designed to handle a broad range of issues associated with:
• Changing network types and conditions
• Limited storage capacity and document type support of mobile devices
• Changes in user data needs due to their movements at different times of the day
Many research areas have been addressed through this research, but the primary focus has remained on the following four core areas: selecting the most suitable wireless network; allocating weights to different datasets and subsets by integrating the user's movements; previously accessed data; and the time of day together with user appointment information and device capabilities.
APA, Harvard, Vancouver, ISO, and other styles
47

Sundin, Albin. "Word Space Models for Web User Clustering and Page Prefetching." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-82012.

Full text
Abstract:
This study evaluates methods for clustering web users via vector space models, for the purpose of web page prefetching, with possible applications to server optimization. An experiment using Latent Semantic Analysis (LSA) is conducted to investigate whether LSA can reproduce the encouraging results obtained in previous research with Random Indexing (RI) and a chaos-based optimization algorithm (CAS-C). This is motivated not only by LSA being another vector space model, but also by a study indicating that LSA outperforms RI in a task similar to the web user clustering and prefetching task. The prefetching task was used to verify the applicability of LSA, where both RI and CAS-C have shown promising results. The original data set from the RI web user clustering and prefetching task was modeled using weighted (tf-idf) LSA. Clusters were defined using a common clustering algorithm (k-means). The least scattered cluster configuration for the model was identified by combining an internal validity measure (SSE) and a relative criterion validity measure (the SD index). The assumed optimal cluster configuration was then used for the web page prefetching task. Precision and recall of the LSA-based method are found to be on par with RI and CAS-C, in that it solves the web user clustering and prefetching task with characteristics similar to those of unweighted RI. The hypothesized inherent gains in precision and recall from using LSA were neither confirmed nor conclusively disproved. The effects of different weighting functions for RI are discussed, and a number of methodological factors are identified for further research concerning LSA-based clustering and prefetching.
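The pipeline the study follows, tf-idf weighting, LSA via truncated SVD, then k-means, compresses to a few lines with scikit-learn; the session texts, dimensionality, and k below are placeholders, not the study's settings.

```python
# Sketch of the evaluated pipeline: tf-idf -> LSA (truncated SVD) -> k-means.
# The "documents" are per-user sessions of visited page identifiers; the
# data, the dimensionality and k are placeholders for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

sessions = [
    "index.html news.html sports.html",     # one line per web user
    "index.html login.html mail.html",
    "news.html sports.html scores.html",
    "login.html mail.html calendar.html",
]

X = TfidfVectorizer(token_pattern=r"\S+").fit_transform(sessions)
lsa = TruncatedSVD(n_components=2).fit_transform(X)   # latent "word space"
labels = KMeans(n_clusters=2, n_init=10).fit_predict(lsa)
print(labels)   # users in the same cluster share pages worth prefetching
```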
APA, Harvard, Vancouver, ISO, and other styles
48

Suryanarayana, Vidya Rao. "Credit scheduling and prefetching in hypervisors using Hidden Markov Models." Thesis, Wichita State University, 2010. http://hdl.handle.net/10057/3749.

Full text
Abstract:
Advances in storage technologies like storage area networking and the virtualization of servers and storage have revolutionized how the explosive data volumes of modern times are stored. With such technologies, resource consolidation has become an increasingly easy task to accomplish, which has in turn simplified access to remote data. Recent advances in hardware have boosted drive capacities, and hard disks have become far less expensive than before. However, with these advances in storage technologies come bottlenecks in terms of performance and interoperability. When it comes to virtualization, especially server virtualization, many guest operating systems run on the same hardware. Hence, it is very important to ensure that each guest is scheduled at the right time, and to decrease the latency of data access. Various hardware advances have made prefetching of data into the cache easy and efficient. However, interoperability between vendors must be assured, and more efficient algorithms need to be developed for these purposes. In virtualized environments where hundreds of virtual machines can be running, very good scheduling algorithms are needed to reduce the latency and the wait time of the virtual machines in the run queue. Current algorithms are oriented more towards providing fair access to the virtual machines and are less concerned with reducing latency. This can be a major bottleneck in time-critical applications like scientific applications, which have now started deploying SAN technologies to store their explosive data. Also, when data needs to be extracted from these storage arrays to analyze and process it, the latency of a read operation has to be reduced in order to improve performance. The research done in this thesis aims to reduce the scheduling delay in a Xen hypervisor and also to reduce the latency of reading data from disk using Hidden Markov Models (HMMs). The scheduling and prefetching scenarios are modeled using a Gaussian and a discrete HMM, and the latency involved is evaluated. The HMM is a statistical analysis technique used to classify and predict data that has a repetitive pattern over time. The results show that using an HMM decreases the scheduling and access latencies involved. The proposed technique is mainly intended for virtualization scenarios involving hypervisors and storage arrays. Various patterns of data access involving different ratios of reads and writes are considered, and a discrete HMM (DHMM) is used to prefetch the next most probable block of data that might be read by a guest. Also, a Gaussian HMM (GHMM) is used to classify the arrival times of the requests in a Xen hypervisor, and the GHMM is incorporated into the credit scheduler in order to reduce the scheduling latency. The results are evaluated numerically, and it is found that scheduling the virtual machines (domains) at the correct time indeed decreases the waiting times of the domains in the run queue.
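To illustrate the DHMM prediction side, the sketch below runs the forward algorithm over a history of accessed block classes and prefetches the most probable next observation; the two-state model and all probabilities are invented for illustration.

```python
import numpy as np

# Invented 2-state discrete HMM over 3 block classes (e.g., sequential,
# strided, random regions). Given the observation history, compute the
# forward probabilities and prefetch the most likely next block class.
A = np.array([[0.8, 0.2],          # state transition probabilities
              [0.3, 0.7]])
B = np.array([[0.7, 0.2, 0.1],     # P(observed block class | state)
              [0.1, 0.3, 0.6]])
pi = np.array([0.5, 0.5])          # initial state distribution

def predict_next(obs):
    alpha = pi * B[:, obs[0]]                 # forward algorithm
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        alpha /= alpha.sum()                  # normalize for stability
    next_obs_probs = (alpha @ A) @ B          # P(next observation)
    return int(np.argmax(next_obs_probs))

history = [0, 0, 1, 0, 0]                     # recently accessed block classes
print("prefetch block class:", predict_next(history))
```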
Thesis (M.S.)--Wichita State University, College of Engineering, Dept. of Electrical Engineering and Computer Science.
APA, Harvard, Vancouver, ISO, and other styles
49

Wong, Gordon. "Web prefetching with client clustering." 2004. http://link.library.utoronto.ca/eir/EIRdetail.cfm?Resources__ID=94960&T=F.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Esfahbod, Behdad. "Preload: An adaptive prefetching Daemon." 2006. http://link.library.utoronto.ca/eir/EIRdetail.cfm?Resources__ID=450330&T=F.

Full text
APA, Harvard, Vancouver, ISO, and other styles