Dissertations on the topic "Automatic dynamic memory management"




Consult the top 43 dissertations for your research on the topic "Automatic dynamic memory management".

Next to every entry in the bibliography there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style of your choice: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication in .pdf format and read its abstract online, when these are available in the metadata.

Browse dissertations on a wide variety of disciplines and compile your bibliography correctly.

1

Österlund, Erik. "Automatic memory management system for automatic parallelization." Thesis, Linnéuniversitetet, Institutionen för datavetenskap, fysik och matematik, DFM, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-13693.

Full text of the source
Abstract:
With Moore’s law coming to an end and the era of multiprocessor chips emerging, the need for ways of dealing with the essential problems of concurrency is becoming urgent. Automatic parallelization for imperative languages and pure functions in functional programming languages both try to prove independence statically. This thesis argues that independence is dynamic in nature. Static analysis for automatic parallelization has so far achieved little beyond trivial optimizations. This thesis shows a new approach in which dynamic analysis of the system is provided at very low cost by a garbage collector that has to traverse all live cells anyway. Immutable sub-graphs of objects that cannot change state are found. Their methods become pure functions that can be parallelized. The garbage collector implemented is a kind of replicating collector. It is about three times faster than Boehm’s collector at garbage collection, fully concurrent, and provides the dynamic analysis almost for free.
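The idea of piggybacking a dynamic purity analysis on a collector traversal can be sketched as follows. This is an illustrative reconstruction, not code from the thesis; the object-graph representation and names are invented:

```python
# Hypothetical sketch: during a GC-style traversal, mark objects whose
# entire reachable subgraph is immutable. Methods on such objects depend
# only on frozen state, so calls on them could be scheduled in parallel.

def deeply_immutable(obj, fields, mutable, _seen=None):
    """Return True if obj and everything reachable from it is immutable.

    fields:  dict mapping an object to the objects it references
    mutable: set of objects flagged mutable at allocation time
    """
    if _seen is None:
        _seen = set()
    if obj in _seen:          # cycles: assume immutable until refuted elsewhere
        return True
    _seen.add(obj)
    if obj in mutable:
        return False
    return all(deeply_immutable(c, fields, mutable, _seen)
               for c in fields.get(obj, ()))
```

A collector that already walks all live cells can compute this marking during its regular trace, which is why the abstract describes the analysis as "almost for free".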
APA, Harvard, Vancouver, ISO, and other styles
2

Stojanovic, Marta. "Automatic memory management in Java." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2001. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp05/MQ65392.pdf.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
3

Doddapaneni, Srinivas P. "Automatic dynamic decomposition of programs on distributed memory machines." Diss., Georgia Institute of Technology, 1997. http://hdl.handle.net/1853/8158.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
4

Zhang, Yang. "Dynamic Memory Management for the Loci Framework." MSSTATE, 2004. http://sun.library.msstate.edu/ETD-db/theses/available/etd-04062004-215627/.

Full text of the source
Abstract:
Resource management is a critical part of high-performance computing software. While management of processing resources to increase performance is the most critical, efficient management of memory resources plays an important role in solving large problems. This thesis research seeks to create an effective dynamic memory management scheme for a declarative data-parallel programming system. In such systems, some form of automatic resource management is a requirement. Using the Loci framework, this thesis research focuses on exploring such opportunities. We believe there exists an automatic memory management scheme for such declarative data-parallel systems that provides a good compromise between memory utilization and performance. In addition to basic memory management, this thesis research also seeks to develop methods that take advantage of the cache memory subsystem and explore the balance between memory utilization and parallel communication costs in such declarative data-parallel frameworks.
APA, Harvard, Vancouver, ISO, and other styles
5

Van Vleet, Taylor. "Dynamic cache-line sizes." Thesis, Connect to this title online; UW restricted, 2000. http://hdl.handle.net/1773/6899.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
6

Herrmann, Edward C. "Threaded Dynamic Memory Management in Many-Core Processors." University of Cincinnati / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1277132326.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
7

Burrell, Tiffany. "System Identification in Automatic Database Memory Tuning." Scholar Commons, 2010. https://scholarcommons.usf.edu/etd/1583.

Full text of the source
Abstract:
Databases are very complex systems that require database system administrators to perform system tuning in order to achieve optimal performance. Memory tuning is vital to the performance of a database system because when the database workload exceeds its memory capacity, the results of the queries running on the system are delayed and can cause substantial user dissatisfaction. To solve this problem, this thesis presents a platform modeled after a closed feedback control loop to control the level of multi-query processing. This platform provides two key assets. First, the system identification is acquired, which is one of the two crucial steps involved in developing a closed feedback loop. Second, the platform provides a means to experimentally study the database tuning problem and verify the effectiveness of research ideas related to database performance.
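The closed feedback loop described in this abstract can be sketched as a simple proportional controller that nudges a memory knob toward a latency target. This is a hedged illustration; the function names, gain, and latency model are assumptions, not the thesis's actual platform:

```python
# Illustrative proportional feedback loop: adjust a buffer size until the
# measured query latency approaches a target. A real tuner would first
# perform system identification to choose the gain; here the gain is a
# hand-picked assumption.

def tune_memory(measure_latency, buffer_mb, target_ms,
                gain=0.5, steps=20, lo=64, hi=4096):
    """Iteratively adjust buffer_mb; positive error (too slow) grows it."""
    for _ in range(steps):
        err = measure_latency(buffer_mb) - target_ms
        buffer_mb = min(hi, max(lo, buffer_mb + gain * err))
    return buffer_mb
```

With a latency model where latency falls as memory grows, the loop settles near the buffer size that meets the target.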
APA, Harvard, Vancouver, ISO, and other styles
8

Ma, Yuke. "Design, Test and Implement a Reflective Scheduler with Task Partitioning Support of a Grid." Thesis, Cranfield University, 2008. http://hdl.handle.net/1826/3510.

Full text of the source
Abstract:
How to manage a dynamic environment and how to provide task partitioning are two key concerns when developing distributed computing applications. The emergence of Grid computing environments exacerbates these problems. Conventional resource management systems are based on a relatively static resource model and a centralized scheduler that assigns computing resources to users. Distributed management introduces resource heterogeneity: not only the set of available resources, but even the set of resource types, is constantly changing. Obviously this is unsuitable for the present Grid. In addition, the Grid provides users with the physical infrastructure to run parallel programs, and this increasing availability creates more demand for parallelization technologies. Based on the problems outlined above, this thesis therefore provides a novel scheduler which not only enables dynamic management but also provides a skeleton library to support task partitioning. Dynamic management is derived from the concept of reflectiveness, which allows the Grid to perform like an efficient market with some limited government controls. To supplement the reflective mechanism, this thesis integrates a statistical forecasting approach to predict the environment of the Grid in the next period. The task partitioning support is extended from the skeleton libraries of the parallel computing and cluster computing areas. The thesis shows how this idea can be applied in the Grid environment to simplify the user’s programming work. Later in this PhD thesis, a Petri-net based simulation methodology is introduced to examine the performance of the reflective scheduler. Moreover, a real testing environment is set up by using the reflective scheduler to run a geometry optimization application.
In summary, by combining knowledge from economics, statistics, mathematics and computer science, this newly invented scheduler not only provides a convenient and efficient way to parallelize users’ tasks, but also significantly improves the performance of the Grid.
APA, Harvard, Vancouver, ISO, and other styles
9

Li, Wentong Kavi Krishna M. "High performance architecture using speculative threads and dynamic memory management hardware." [Denton, Tex.] : University of North Texas, 2007. http://digital.library.unt.edu/permalink/meta-dc-5150.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
10

Li, Wentong. "High Performance Architecture using Speculative Threads and Dynamic Memory Management Hardware." Thesis, University of North Texas, 2007. https://digital.library.unt.edu/ark:/67531/metadc5150/.

Full text of the source
Abstract:
With the advances in very large scale integration (VLSI) technology, hundreds of billions of transistors can be packed into a single chip. With this increased hardware budget, how to take advantage of the available hardware resources becomes an important research area. Some researchers have shifted from the control-flow von Neumann architecture back to dataflow architecture in order to explore scalable architectures leading to multi-core systems with several hundred processing elements. In this dissertation, I address how the performance of modern processing systems can be improved while attempting to reduce hardware complexity and energy consumption. My research tackles both central processing unit (CPU) performance and memory subsystem performance. More specifically, I describe my research related to the design of an innovative decoupled multithreaded architecture that can be used in multi-core processor implementations. I also address how memory management functions can be off-loaded from processing pipelines to further improve system performance and eliminate the cache pollution caused by runtime management functions.
APA, Harvard, Vancouver, ISO, and other styles
11

Li, Bo. "Modeling and Runtime Systems for Coordinated Power-Performance Management." Diss., Virginia Tech, 2019. http://hdl.handle.net/10919/87064.

Full text of the source
Abstract:
Emergent systems in high-performance computing (HPC) expect maximal efficiency to achieve the goal of a power budget under 20-40 megawatts for 1 exaflop set by the Department of Energy. To optimize efficiency, emergent systems provide multiple power-performance control techniques to throttle different system components and the scale of concurrency. In this dissertation, we focus on three throttling techniques: CPU dynamic voltage and frequency scaling (DVFS), dynamic memory throttling (DMT), and dynamic concurrency throttling (DCT). We first conduct an empirical analysis of the performance and energy trade-offs of different architectures under the throttling techniques. We show the impact on performance and energy consumption on Intel x86 systems with Intel Xeon Phi accelerators and an Nvidia general-purpose graphics processing unit (GPGPU). We show the trade-offs and potential for improving efficiency. Furthermore, we propose a parallel performance model for coordinating DVFS, DMT, and DCT simultaneously. We present a multivariate linear regression-based approach to approximate the impact of DVFS, DMT, and DCT on performance for performance prediction. Validation using 19 HPC applications/kernels on two architectures (i.e., Intel x86 and IBM BG/Q) shows up to 7% and 17% prediction error, respectively. Thereafter, we develop metrics for capturing the performance impact of DVFS, DMT, and DCT. We apply an artificial neural network model to approximate the nonlinear effects on performance impact and present a runtime control strategy accordingly for power capping. Our validation using 37 HPC applications/kernels shows up to a 20% performance improvement under a given power budget compared with the Intel RAPL-based method.
Ph. D.
System efficiency on high-performance computing (HPC) systems is the key to achieving the goal of power budget for exascale supercomputers. Techniques for adjusting the performance of different system components can help accomplish this goal by dynamically controlling system performance according to application behaviors. In this dissertation, we focus on three techniques: adjusting CPU performance, memory performance, and the number of threads for running parallel applications. First, we profile the performance and energy consumption of different HPC applications on both Intel systems with accelerators and IBM BG/Q systems. We explore the trade-offs of performance and energy under these techniques and provide optimization insights. Furthermore, we propose a parallel performance model that can accurately capture the impact of these techniques on performance in terms of job completion time. We present an approximation approach for performance prediction. The approximation has up to 7% and 17% prediction error on Intel x86 and IBM BG/Q systems respectively under 19 HPC applications. Thereafter, we apply the performance model in a runtime system design for improving performance under a given power budget. Our runtime strategy achieves up to 20% performance improvement to the baseline method.
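A minimal sketch of a multivariate linear-regression performance model of the kind described above, fit by solving the normal equations. The feature choice (one knob per throttling technique) and the data are illustrative assumptions, not the dissertation's model:

```python
# Ordinary least squares for a linear performance model: runtime as a
# linear function of configuration knobs (e.g. frequency level, memory
# throttle level, thread count), solved via the normal equations
# (X'X) w = X'y with Gaussian elimination.

def fit_linear(X, y):
    """X: rows of features (first element is a 1.0 bias); y: observed times."""
    n = len(X[0])
    A = [[sum(X[k][i] * X[k][j] for k in range(len(X))) for j in range(n)]
         for i in range(n)]
    b = [sum(X[k][i] * y[k] for k in range(len(X))) for i in range(n)]
    for col in range(n):                     # forward elimination w/ pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):           # back substitution
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w

def predict(w, feats):
    """Predict runtime for a knob setting (bias is prepended here)."""
    return sum(wi * fi for wi, fi in zip(w, [1.0] + feats))
```

Such a model can then be inverted at runtime to pick the knob setting with the best predicted performance under a power cap.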
APA, Harvard, Vancouver, ISO, and other styles
12

Shalan, Mohamed A. "Dynamic memory management for embedded real-time multiprocessor system-on-a-chip." Diss., Georgia Institute of Technology, 2003. http://etd.gatech.edu/theses/available/etd-11252003-131621/unrestricted/shalanmohameda200312.pdf.

Full text of the source
Abstract:
Thesis (Ph. D.)--Electrical and Computer Engineering, Georgia Institute of Technology, 2004.
Vincent Mooney, Committee Chair; John Barry, Committee Member; James Hamblen, Committee Member; Karsten Schwan, Committee Member; Linda Wills, Committee Member. Includes bibliography.
APA, Harvard, Vancouver, ISO, and other styles
13

Xia, Xiuxian. "Dynamic power distribution management for all electric aircraft." Thesis, Cranfield University, 2011. http://dspace.lib.cranfield.ac.uk/handle/1826/6285.

Full text of the source
Abstract:
In recent years, with the rapid development of electric and electronic technology, the All-Electric Aircraft (AEA) concept has attracted more and more attention; it uses only electric power, in place of conventional hydraulic and pneumatic power, to supply all the airframe systems. To meet the power requirements under various flight stages and operating conditions, the AEA approach has pushed aircraft electrical power generation capacity up to 1.6 MW. To satisfy power quality and stability requirements, advanced power electronic interfaces and more efficient power distribution systems must be investigated. Moreover, to take full advantage of the available electrical power, novel dynamic power distribution management research and design for an AEA must be carried out. The main objective of this thesis is to investigate and develop a methodology for more efficient power distribution management, with the purpose of minimizing the rated power generating capacity and the mass of the electrical power system (EPS), including the power generation system and the power distribution system, in an AEA. It is important to analyse and compare the existing electrical power distribution management approaches in current aircraft; therefore the electrical power systems of the A320 and B777, especially their power management systems, are discussed in this thesis. The baseline aircraft, the Flying Crane, is the outcome of the group design project, which ran from March 2008 to September 2010 in three stages: conceptual design, preliminary design and detailed design. The dynamic power distribution management research is based on the power distribution system of the Flying Crane. The main task of the investigation is to analyse and manage the power usage among and inside typical airframe systems by using the dynamic power distribution management method.
The characteristics and operation of these systems are investigated in detail. By using dynamic power distribution management, all the electrical consumers and sub-systems powered by electricity are managed effectively, and the performance of the aircraft can be improved by reducing the peak load requirement on board. Furthermore, the electrical system architecture, the distributed power distribution system and the dynamic power distribution management system for an AEA are presented. Finally, the mass of the whole electrical power system is estimated and analysed.
APA, Harvard, Vancouver, ISO, and other styles
14

Kim, Seyeon. "Node-oriented dynamic memory management for real-time systems on ccNUMA architecture systems." Thesis, University of York, 2013. http://etheses.whiterose.ac.uk/5712/.

Full text of the source
Abstract:
Since the 1960s, most operating systems and programming languages have been able to use dynamic memory allocation and deallocation. Although memory allocation has always required explicit interaction with an allocator, deallocation can be either explicit or implicit. Surprisingly, even though memory allocation/deallocation algorithms have been studied extensively over the last five decades, limited attention has been focused on the real-time properties. Most algorithms are general-purpose and do not satisfy the requirements of real-time systems. Furthermore, the few allocators supporting real-time systems do not scale well on multiprocessors. The increasing demand for high-performance computational processing has resulted in the trend of having many cores. ccNUMA architecture systems are part of this trend and provide a systematic scalable design. This thesis contends that current memory allocators for Operating Systems that support cc-NUMA architecture are not appropriate for real-time applications. We further contend that those real-time allocators that have been proposed in the literature are not cc-NUMA aware. The thesis proposes and implements (a prototype of) a new NUMA-aware dynamic memory allocation algorithm for use in soft real-time systems. We study the behaviour of our new allocation algorithm in comparison with related allocators both theoretically and practically.
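The core idea of NUMA-aware allocation, serving requests from a per-node free list and falling back to a remote node only when the local list is exhausted, can be sketched as follows. This toy model (class and method names invented) omits the real-time guarantees the thesis targets:

```python
# Hypothetical illustration: one free list per memory node; allocations
# prefer the requesting node so most accesses stay node-local, which is
# the property a ccNUMA-aware allocator exploits.

class NumaAllocator:
    def __init__(self, blocks_per_node, nodes=2):
        self.free = {n: list(range(blocks_per_node)) for n in range(nodes)}

    def alloc(self, node):
        """Return (node, block), preferring the requesting node."""
        if self.free[node]:
            return node, self.free[node].pop()
        for other, blocks in self.free.items():   # remote fallback
            if blocks:
                return other, blocks.pop()
        raise MemoryError("out of blocks")

    def free_block(self, node, block):
        self.free[node].append(block)
```

A real-time variant would additionally bound the worst-case cost of both paths, which is where most general-purpose allocators fall short.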
APA, Harvard, Vancouver, ISO, and other styles
15

Peterson, Thomas. "Dynamic Allocation for Embedded Heterogeneous Memory : An Empirical Study." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-223904.

Full text of the source
Abstract:
Embedded systems are omnipresent and contribute to our lives in many ways by instantiating functionality in larger systems. To operate, embedded systems require well-functioning software, hardware, and an interface between them. The hardware and software of these systems are under constant change as new technologies arise. One change these systems are currently undergoing is experimentation with different memory management techniques for RAM, as novel non-volatile RAM (NVRAM) technologies have been invented. These NVRAM technologies often come with asymmetrical read and write latencies, which motivates designing memories consisting of multiple NVRAMs. As a consequence of these properties and memory designs, there is a need for memory management that minimizes latencies. This thesis addresses the problem of memory allocation on heterogeneous memory by conducting an empirical study. The first part of the study examines free-list, bitmap and buddy-system based allocation techniques; the free-list allocation technique is concluded to be superior. Thereafter, multi-bank memory architectures are designed and memory-bank selection strategies are established. These strategies are based on size thresholds as well as memory bank occupancies. The evaluation of these strategies did not yield any major conclusions but showed that some strategies were more appropriate for certain application behaviors.
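A memory-bank selection strategy of the kind evaluated in this thesis, combining a size threshold with bank occupancy, might look like the following sketch. The threshold values and bank names are assumptions for illustration:

```python
# Illustrative bank-selection policy for heterogeneous NVRAM: route small
# allocations to a fast bank and large ones to a slower, larger bank,
# unless the preferred bank is nearly full.

def pick_bank(size, occupancy, size_threshold=256, occ_limit=0.9):
    """occupancy: dict bank -> fraction used; returns 'fast' or 'slow'."""
    preferred = "fast" if size <= size_threshold else "slow"
    fallback = "slow" if preferred == "fast" else "fast"
    return preferred if occupancy[preferred] < occ_limit else fallback
```

The thesis's finding that no strategy dominated suggests such thresholds would in practice be tuned per application behavior.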
APA, Harvard, Vancouver, ISO, and other styles
16

Gazi, Boran. "Dynamic buffer management policy for shared memory packet switches by employing per-queue thresholds." Thesis, Northumbria University, 2007. http://nrl.northumbria.ac.uk/3695/.

Full text of the source
Abstract:
One of the main problems concerning high-performance communications networks is the unavoidable congestion in network nodes. Network traffic is normally characterised as "bursty", which may use up network resources during peak periods. As a consequence, end-user applications are subject to end-to-end delays and disruptions. Simultaneous transmission of packets on a finite bandwidth channel might result in contentions, where one or more packets are prevented from entering the transmission channel, resulting in packet losses. Hence, the motivations of this thesis are two-fold: the investigation and evaluation of switch architectures with electronic and optical buffers, and the development and evaluation of an improved dynamic threshold policy for the shared memory switch architecture. In this work, switch architectures based on modular designs are evaluated, with simulation results showing that modular switch structures, i.e. multistage interconnection networks with optical delay line buffers, offer packet loss rate, throughput and average delay time similar to their electronic counterparts. Such optical architectures emulate the prime features of the shared memory switch architecture under general traffic conditions. Although the shared memory switch architecture is superior to other buffering approaches, its performance is inadequate under imbalanced input traffic. Here its limiting features are investigated by means of numerical analysis. Different buffer management schemes, namely static thresholds, dynamic thresholds, pre-emptive and adaptive control, are investigated by using a Markov simulation model. An improved dynamic buffer management policy, the decay function threshold (DFT) policy, is proposed, and it is compared with the dynamic thresholds (DT), partial sharing partial partitioning (PSPP) and dynamic queue thresholds (DQT) buffer management policies by means of simulations using bursty traffic source models such as the interrupted Poisson process (IPP).
Simulation results show that the proposed policy is as good as the well-known dynamic thresholds policy in the presence of best-effort traffic and offers improved packet loss performance when multicast traffic is considered. An integration framework for dynamic buffer management and bandwidth scheduling is also presented in this study. This framework employs loosely coupled buffer management and scheduling (weighted round robin, weighted fair queueing, etc.), providing support for quality-of-service traffic. Conducted tests show that this framework matches the best-effort packet loss performance of the dynamic thresholds policy.
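The dynamic-threshold family of policies discussed above can be illustrated with the classic rule that a queue may accept a packet only while its length stays below a constant times the unused shared buffer space, so per-queue thresholds tighten automatically as the memory fills. Parameter names here are illustrative, not the thesis's DFT formulation:

```python
# Dynamic-threshold admission test in the style of the DT policy: the
# effective per-queue threshold is alpha * (free shared memory), so a
# long queue is throttled as total occupancy rises.

def admit(queue_len, total_used, buffer_size, alpha=1.0):
    """Admit the arriving packet iff queue_len < alpha * free space."""
    return queue_len < alpha * (buffer_size - total_used)
```

A decay-function variant would replace the linear `alpha * free` term with a decaying function of occupancy, which is the refinement the thesis proposes.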
APA, Harvard, Vancouver, ISO, and other styles
17

Huang, Jipeng. "Efficient Context Sensitivity for Dynamic Analyses via Calling Context Uptrees and Customized Memory Management." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1397231571.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
18

Vijayakumar, Smita. "A Framework for Providing Automatic Resource and Accuracy Management in a Cloud Environment." The Ohio State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=osu1274194090.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
19

Young, Jeffrey. "Dynamic partitioned global address spaces for high-efficiency computing." Thesis, Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/26467.

Full text of the source
Abstract:
Thesis (M. S.)--Electrical and Computer Engineering, Georgia Institute of Technology, 2009.
Committee Chair: Yalamanchili, Sudhakar; Committee Member: Riley, George; Committee Member: Schimmel, David. Part of the SMARTech Electronic Thesis and Dissertation Collection.
APA, Harvard, Vancouver, ISO, and other styles
20

Ramaswamy, Lakshmish Macheeri. "Towards Efficient Delivery of Dynamic Web Content." Diss., Georgia Institute of Technology, 2005. http://hdl.handle.net/1853/7646.

Full text of the source
Abstract:
The advantages of cache cooperation in edge cache networks serving dynamic web content were studied. The design of cooperative edge cache grid, a large-scale cooperative edge cache network for delivering highly dynamic web content with varying server update frequencies, was presented. A cache-clouds-based architecture was proposed to promote low-cost cache cooperation in the cooperative edge cache grid. An Internet-landmarks-based scheme, called the selective landmarks-based server-distance sensitive clustering scheme, for grouping edge caches into cooperative clouds was presented. A dynamic hashing technique for efficient, load-balanced, and reliable document lookups and updates was presented. A utility-based scheme for cooperative document placement in cache clouds was proposed. The proposed architecture and techniques were evaluated through trace-based simulations using both real-world and synthetic traces. Results showed that the proposed techniques provide significant performance benefits. A framework for automatically detecting cache-effective fragments in dynamic web pages was presented. Two types of fragments in web pages, namely shared fragments and lifetime-personalization fragments, were identified and formally defined. A hierarchical fragment-aware web page model called the augmented-fragment tree model was proposed. An efficient algorithm to detect maximal fragments that are shared among multiple documents was proposed. A practical algorithm for detecting fragments based on their lifetime and personalization characteristics was designed. The proposed framework and algorithms were evaluated through experiments on real web sites. The effect of adopting the detected fragments on web caches and origin servers was experimentally studied.
APA, Harvard, Vancouver, ISO, and other styles
21

Sinha, Udayan Prabir. "Memory Management Error Detection in Parallel Software using a Simulated Hardware Platform." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-219606.

Full text of the source
Abstract:
Memory management errors in concurrent software running on multi-core architectures can be difficult and costly to detect and repair. Examples of errors are usage of uninitialized memory, memory leaks, and data corruptions due to unintended overwrites of data that are not owned by the writing entity. If memory management errors could be detected at an early stage, for example when using a simulator before the software has been delivered and integrated in a product, significant savings could be achieved. This thesis investigates and develops methods for detection of usage of uninitialized memory in software that runs on a virtual hardware platform. The virtual hardware platform has models of Ericsson Radio Base Station hardware for baseband processing and digital radio processing. It is a bit-accurate representation of the underlying hardware, with models of processors and peripheral units, and it is used at Ericsson for software development and integration. There are tools available, such as Memcheck (Valgrind), and MemorySanitizer and AddressSanitizer (Clang), for memory management error detection. The features of such tools have been investigated, and memory management error detection algorithms were developed for a given processor’s instruction set. The error detection algorithms were implemented in a virtual platform, and issues and design considerations reflecting the application-specific instruction set architecture of the processor, were taken into account. A prototype implementation of memory error presentation with error locations mapped to the source code of the running program, and presentation of stack traces, was done, using functionality from a debugger. An experiment, using a purpose-built test program, was used to evaluate the error detection capability of the algorithms in the virtual platform, and for comparison with the error detection capability of Memcheck. 
The virtual platform implementation detects all known errors in the program except one, and reports them to the user in an appropriate manner. Some false positives are reported, mainly due to limited awareness of the operating system used on the simulated processor.
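The uninitialized-memory detection technique investigated here can be reduced to a toy shadow-memory model: one shadow bit per byte records whether the byte has been written, and a load from an unwritten byte is reported. Real tools such as Memcheck and MemorySanitizer track validity at bit granularity and propagate it through computations; this sketch covers only plain loads and stores:

```python
# Toy shadow memory for uninitialized-read detection: every store sets
# the shadow bit for its byte; a load checks the bit first and reports
# a use of uninitialized memory when it is clear.

class ShadowMemory:
    def __init__(self, size):
        self.mem = bytearray(size)
        self.init = [False] * size        # shadow: byte written yet?

    def store(self, addr, value):
        self.mem[addr] = value
        self.init[addr] = True

    def load(self, addr):
        if not self.init[addr]:
            raise RuntimeError(f"uninitialized read at {addr:#x}")
        return self.mem[addr]
```

In a simulator, the same bookkeeping is attached to the modeled processor's memory accesses, which is what makes early, pre-integration detection possible.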
APA, Harvard, Vancouver, ISO, and other styles
22

Gangadharappa, Tejus A. "Designing Support For MPI-2 Programming Interfaces On Modern InterConnects." The Ohio State University, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=osu1243908626.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
23

Green, Craig Elkton. "Composite thermal capacitors for transient thermal management of multicore microprocessors." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/44772.

Full text of the source
Abstract:
While 3D stacked multi-processor technology offers the potential for significant computing advantages, these architectures also face the significant challenge of small, localized hotspots with very large heat fluxes due to the placement of asymmetric cores, heterogeneous devices and performance driven layouts. In this thesis, a new thermal management solution is introduced that seeks to maximize the performance of microprocessors with dynamically managed power profiles. To mitigate the non-uniformities in chip temperature profiles resulting from the dynamic power maps, solid-liquid phase change materials (PCMs) with an embedded heat spreader network are strategically positioned near localized hotspots, resulting in a large increase in the local thermal capacitance in these problematic areas. Theoretical analysis shows that the increase in local thermal capacitance results in an almost twenty-fold increase in the time that a thermally constrained core can operate before a power gating or core migration event is required. Coupled to the PCMs are solid state coolers (SSCs) that serve as a means for fast regeneration of the PCMs during the cool down periods associated with throttling events. Using this combined PCM/SSC approach allows for devices that operate with the desirable combination of low throttling frequency and large overall core duty cycles, thus maximizing computational throughput. The impact of the thermophysical properties of the PCM on the device operating characteristics has been investigated from first principles in order to better inform the PCM selection or design process. Complementary to the theoretical characterization of the proposed thermal solution, a prototype device called a "Composite Thermal Capacitor (CTC)" that monolithically integrates micro heaters, PCMs and a spreader matrix into a Si test chip was fabricated and tested to validate the efficacy of the concept. 
A prototype CTC was shown to increase allowable device operating times by over 7X and address heat fluxes of up to ~395 W/cm2. Various methods for regenerating the CTC have been investigated, including air, liquid, and solid state cooling, and operational duty cycles of over 60% have been demonstrated.
APA, Harvard, Vancouver, ISO, and other styles
24

Zhu, Yong. "Routing, Resource Allocation and Network Design for Overlay Networks." Diss., Georgia Institute of Technology, 2006. http://hdl.handle.net/1853/14017.

Full text of the source
Abstract:
Overlay networks have been the subject of significant research and practical interest recently in addressing the inefficiency and ossification of the current Internet. In this thesis, we cover various aspects of overlay network design, including overlay routing algorithms, overlay network assignment and multihomed overlay networks. We also examine the behavior of overlay networks under a wide range of network settings and identify several key factors that affect the performance of overlay networks. Based on these findings, practical design guidelines are also given. Specifically, this thesis addresses the following problems: 1) Dynamic overlay routing: We perform an extensive simulation study to investigate the performance of available bandwidth-based dynamic overlay routing from three important aspects: efficiency, stability, and safety margin. Based on the findings, we propose a hybrid routing scheme that achieves good performance in all three aspects. We also examine the effects of several factors on overlay routing performance, including network load, traffic variability, link-state staleness, number of overlay hops, measurement errors, and native sharing effects. 2) Virtual network assignment: We investigate the virtual network (VN) assignment problem in the scenario of network virtualization. Specifically, we develop a basic VN assignment scheme without reconfiguration and use it as the building block for all other advanced algorithms. Subdividing heuristics and adaptive optimization strategies are presented to further improve the performance. We also develop a selective VN reconfiguration scheme that prioritizes the reconfiguration for the most critical VNs. 3) Overlay network configuration tool for PlanetLab: We develop NetFinder, an automatic overlay network configuration tool to efficiently allocate PlanetLab resources to individual overlays. 
NetFinder continuously monitors the resource utilization of PlanetLab and accepts a user-defined overlay topology as input and selects the set of PlanetLab nodes and their interconnection for the user overlay. 4) Multihomed overlay network: We examine the effectiveness of combining multihoming and overlay routing from the perspective of an overlay service provider (OSP). We focus on the corresponding design problem and examine, with realistic network performance and pricing data, whether the OSP can provide a network service that is profitable, better (in terms of round-trip time), and less expensive than the competing native ISPs.
APA, Harvard, Vancouver, ISO, and other styles
25

Saxena, Abhinav. "Knowledge-Based Architecture for Integrated Condition Based Maintenance of Engineering Systems." Diss., Georgia Institute of Technology, 2007. http://hdl.handle.net/1853/16125.

Full text of the source
Abstract:
A paradigm shift is emerging in system reliability and maintainability. The military and industrial sectors are moving away from the traditional breakdown and scheduled maintenance to adopt concepts referred to as Condition Based Maintenance (CBM) and Prognostic Health Management (PHM). In addition to signal processing and subsequent diagnostic and prognostic algorithms these new technologies involve storage of large volumes of both quantitative and qualitative information to carry out maintenance tasks effectively. This not only requires research and development in advanced technologies but also the means to store, organize and access this knowledge in a timely and efficient fashion. Knowledge-based expert systems have been shown to possess capabilities to manage vast amounts of knowledge, but an intelligent systems approach calls for attributes like learning and adaptation in building autonomous decision support systems. This research presents an integrated knowledge-based approach to diagnostic reasoning for CBM of engineering systems. A two level diagnosis scheme has been conceptualized in which first a fault is hypothesized using the observational symptoms from the system and then a more specific diagnostic test is carried out using only the relevant sensor measurements to confirm the hypothesis. Utilizing the qualitative (textual) information obtained from these systems in combination with quantitative (sensory) information reduces the computational burden by carrying out a more informed testing. An Industrial Language Processing (ILP) technique has been developed for processing textual information from industrial systems. Compared to other automated methods that are computationally expensive, this technique manipulates standardized language messages by taking advantage of their semi-structured nature and domain limited vocabulary in a tractable manner. 
A Dynamic Case-based reasoning (DCBR) framework provides a hybrid platform for diagnostic reasoning and an integration mechanism for the operational infrastructure of an autonomous Decision Support System (DSS) for CBM. This integration involves data gathering, information extraction procedures, and real-time reasoning frameworks to facilitate the strategies and maintenance of critical systems. As a step further towards autonomy, DCBR builds on a self-evolving knowledgebase that learns from its performance feedback and reorganizes itself to deal with non-stationary environments. A unique Human-in-the-Loop Learning (HITLL) approach has been adopted to incorporate human feedback in the traditional Reinforcement Learning (RL) algorithm.
APA, Harvard, Vancouver, ISO, and other styles
26

Frampton, Daniel John. "An Investigation into Automatic Dynamic Memory Management Strategies using Compacting Collection." Thesis, 2003. http://hdl.handle.net/1885/39951.

Full text of the source
Abstract:
Modern object oriented languages such as Java and C# have been gaining widespread industry support in recent times. Such languages rely on a runtime infrastructure that provides automatic dynamic memory management services. The performance of such services is a crucial component of overall system performance. This thesis discusses work undertaken in relation to automatic memory management using the Java Memory Management Toolkit (JMTk) running on the Jikes Research Virtual Machine (Jikes RVM). The primary goal of this work was to develop an automatic memory management strategy employing a compacting collector to run on this platform. Compacting collectors are an important class of collectors used in several production runtimes, including Microsoft's Common Language Runtime and IBM's Java Runtime Environment. The development of a strategy using compaction makes an important contribution to JMTk, and provides a platform where side-by-side comparisons between compacting collectors and other important classes of collector can be made. A compacting collector differs from the collectors that currently exist in JMTk in several important ways. Prior to this work, JMTk and Jikes RVM did not have an implementation of a compacting collector, nor the structure to fully support one. This work has achieved its primary goal in providing an implementation of a compacting collector. It describes how both JMTk and Jikes RVM were modified to support such collectors. Although substantial, this project should be considered but a first step into the investigation of this class of collectors. It is anticipated that through broadening the set of operations supported by JMTk and Jikes RVM that this work will also allow new classes of collectors to be implemented and compared. The cost of performing a compacting collection was shown to be very significant given the current implementation. 
The use of compaction in a generational collector demonstrated increased performance, bringing it in-line with other generational collectors in JMTk. This work shows that there are benefits in reducing memory fragmentation through the use of compacting collectors. When discounting the cost of the collection, the implemented compacting collectors come close to matching or outperforming other collection strategies. The difficulty now lies in attempting to reduce the cost of compacting collection.
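The compacting collection discussed above can be illustrated with a minimal sketch of sliding compaction: live objects are slid toward the start of the heap, leaving one contiguous free region. The toy heap model and names below are illustrative, not JMTk's actual implementation, and the classic third pass that rewrites references via the forwarding table is omitted for brevity.

```python
# Minimal sketch of sliding compaction over a toy heap.
# Each cell is an object id (live or garbage) or None (free space).

def compact(heap, live):
    """Slide live objects to the front, preserving allocation order.

    Returns (new_heap, forwarding) where forwarding maps each surviving
    object's old index to its new index.
    """
    forwarding = {}
    new_heap = []
    # Pass 1: assign forwarding addresses to live objects in order.
    for old_idx, obj in enumerate(heap):
        if obj is not None and obj in live:
            forwarding[old_idx] = len(new_heap)
            new_heap.append(obj)
    # Pass 2: the remainder of the heap is now one contiguous free region,
    # which is what eliminates fragmentation.
    new_heap.extend([None] * (len(heap) - len(new_heap)))
    return new_heap, forwarding

heap = ["a", None, "b", "c", None, "d"]
live = {"a", "c"}                      # "b" and "d" are garbage
new_heap, fwd = compact(heap, live)
```

After compaction, allocation can proceed with a cheap bump pointer into the single free region, which is the fragmentation benefit the abstract refers to.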
APA, Harvard, Vancouver, ISO, and other styles
27

Pai, Sreepathi. "Efficient Dynamic Automatic Memory Management And Concurrent Kernel Execution For General-Purpose Programs On Graphics Processing Units." Thesis, 2014. http://etd.iisc.ernet.in/handle/2005/2609.

Full text of the source
Abstract:
Modern supercomputers now use accelerators to achieve their performance with the most widely used accelerator being the Graphics Processing Unit (GPU). However, achieving the performance potential of systems that combine a GPU and CPU is an arduous task which could be made easier with the assistance of the compiler or runtime. In particular, exploiting two features of GPU architectures -- distributed memory and concurrent kernel execution -- is critical to achieve good performance, but in current GPU programming systems, programmers must exploit them manually. This can lead to poor performance. In this thesis, we propose automatic techniques that: i) perform data transfers between the CPU and GPU, ii) allocate resources for concurrent kernels, and iii) schedule concurrent kernels efficiently without programmer intervention.

Most GPU programs access data in GPU memory for performance. Manually inserting data transfers that move data to and from this GPU memory is an error-prone and tedious task. In this work, we develop a software coherence mechanism to fully automate all data transfers between the CPU and GPU without any assistance from the programmer. Our mechanism uses compiler analysis to identify potential stale data accesses and uses a runtime to initiate transfers as necessary. This avoids the redundant transfers exhibited by all other existing automatic memory management proposals for general-purpose programs. We integrate our automatic memory manager into the X10 compiler and runtime, and find that it not only results in smaller and simpler programs, but also eliminates redundant memory transfers. Tested on eight programs ported from the Rodinia benchmark suite, it achieves (i) a 1.06x speedup over hand-tuned manual memory management, and (ii) a 1.29x speedup over another recently proposed compiler-runtime automatic memory management system. Compared to other existing runtime-only (ADSM) and compiler-only (OpenMPC) proposals, it also transfers 2.2x to 13.3x less data on average.
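The stale-access idea behind such a software coherence mechanism can be sketched as a small per-buffer state machine: each buffer tracks which sides hold a valid copy, a transfer is issued only when the accessing side is stale, and a write invalidates the other side. The class and method names here are hypothetical, not the X10 runtime's API.

```python
# Sketch of software coherence for automatic CPU-GPU transfers.
# A transfer happens only when the accessing side holds a stale copy,
# which is how redundant transfers are avoided.

class CoherentBuffer:
    def __init__(self, name):
        self.name = name
        self.valid = {"cpu", "gpu"}   # both copies start consistent
        self.transfers = 0

    def access(self, side, write=False):
        if side not in self.valid:    # stale copy here: transfer needed
            self.transfers += 1
            self.valid.add(side)
        if write:                     # a write invalidates the other copy
            self.valid = {side}

buf = CoherentBuffer("x")
buf.access("gpu", write=True)   # kernel writes x on the GPU
buf.access("gpu")               # GPU reuse: no transfer
buf.access("cpu")               # CPU read: exactly one transfer
buf.access("cpu")               # second CPU read: redundant transfer avoided
```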

Each new generation of GPUs vastly increases the resources available to GPGPU programs. GPU programming models (like CUDA) were designed to scale to use these resources. However, we find that CUDA programs actually do not scale to utilize all available resources, with over 30% of resources going unused on average for programs of the Parboil2 suite. Current GPUs therefore allow concurrent execution of kernels to improve utilization. We study concurrent execution of GPU kernels using multiprogrammed workloads on current NVIDIA Fermi GPUs. On two-program workloads from Parboil2 we find concurrent execution is often no better than serialized execution. We identify lack of control over resource allocation to kernels as a major serialization bottleneck. We propose transformations that convert CUDA kernels into elastic kernels which permit fine-grained control over their resource usage. We then propose several elastic-kernel aware runtime concurrency policies that offer significantly better performance and concurrency than the current CUDA policy. We evaluate our proposals on real hardware using multiprogrammed workloads constructed from benchmarks in the Parboil2 suite. On average, our proposals increase system throughput (STP) by 1.21x and improve the average normalized turnaround time (ANTT) by 3.73x for two-program workloads over the current CUDA concurrency implementation.

Recent NVIDIA GPUs use a FIFO policy in their thread block scheduler (TBS) to schedule thread blocks of concurrent kernels. We show that FIFO leaves performance to chance, resulting in significant loss of performance and fairness. To improve performance and fairness, we propose use of the Shortest Remaining Time First (SRTF) policy instead. Since SRTF requires an estimate of runtime (i.e. execution time), we introduce Structural Runtime Prediction, which uses the grid structure of GPU programs for predicting runtimes. Using a novel Staircase model of GPU kernel execution, we show that kernel runtime can be predicted by profiling only the first few thread blocks. We evaluate an online predictor based on this model on benchmarks from ERCBench and find that predictions made after the execution of a single thread block are between 0.48x and 1.08x of actual runtime. We implement the SRTF policy for concurrent kernels using this predictor and evaluate it on two-program workloads from ERCBench. SRTF improves STP by 1.18x and ANTT by 2.25x over FIFO. Compared to MPMax, a state-of-the-art resource allocation policy for concurrent kernels, SRTF improves STP by 1.16x and ANTT by 1.3x. To improve fairness, we also propose SRTF/Adaptive, which controls the resource usage of concurrently executing kernels to maximize fairness. SRTF/Adaptive improves STP by 1.12x, ANTT by 2.23x and Fairness by 2.95x compared to FIFO. Overall, our implementation of SRTF achieves STP to within 12.64% of Shortest Job First (SJF, an oracle optimal scheduling policy), bridging 49% of the gap between FIFO and SJF.
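The effect of SRTF over FIFO on turnaround time can be sketched in a few lines, with kernel names and predicted runtimes that are purely illustrative (standing in for what Structural Runtime Prediction would supply):

```python
# Sketch of SRTF kernel ordering from predicted remaining runtimes.

def srtf_order(predicted):
    """predicted: kernel -> predicted remaining runtime (ms).
    SRTF runs the kernel with the shortest remaining time first."""
    return sorted(predicted, key=predicted.get)

def turnaround(order, predicted):
    """Completion time of each kernel when run back-to-back."""
    t, done = 0.0, {}
    for k in order:
        t += predicted[k]
        done[k] = t
    return done

predicted = {"k_fft": 40.0, "k_blur": 5.0, "k_gemm": 120.0}
srtf = turnaround(srtf_order(predicted), predicted)
fifo = turnaround(["k_fft", "k_blur", "k_gemm"], predicted)  # arrival order
```

Total turnaround is lower under SRTF (5 + 45 + 165 = 215 here, versus 40 + 45 + 165 = 250 for FIFO), which is the effect the ANTT improvements above quantify.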

APA, Harvard, Vancouver, ISO, and other styles
28

Quinane, Luke. "An Examination of Deferred Reference Counting and Cycle Detection." Thesis, 2003. http://hdl.handle.net/1885/42030.

Full text of the source
Abstract:
Object-oriented programming languages are becoming increasingly important, as are managed runtime systems. An area of importance in such systems is dynamic automatic memory management. A key function of dynamic automatic memory management is detecting and reclaiming discarded memory regions; this is also referred to as garbage collection. A significant proportion of research has been conducted in the field of memory management, and more specifically garbage collection techniques. In the past, adequate comparisons against a range of competing algorithms and implementations have often been overlooked. JMTk is a flexible memory management toolkit, written in Java, which attempts to provide a testbed for such comparisons. This thesis aims to examine the implementation of one algorithm currently available in JMTk: the deferred reference counter. Other research has shown that the reference counter in JMTk performs poorly in both throughput and responsiveness. Several aspects of the reference counter are tested, including the write barrier, allocation cost, increment and decrement processing, and cycle detection. These examinations found the bump-pointer to be 8% faster than the free-list in raw allocation. The cost of the reference counting write barrier was determined to be 10% on the PPC architecture and 20% on the i686 architecture. Processing increments in the write barrier was found to be up to 13% faster than buffering them until collection time on a uniprocessor platform. Cycle detection was identified as a key area of cost in reference counting. In order to improve the performance of the deferred reference counter and to contribute to the JMTk testbed, a new algorithm for detecting cyclic garbage was described. This algorithm is based on a mark-scan approach to cycle detection. Using this algorithm, two new cycle detectors were implemented and compared to the original trial deletion cycle detector.
The semi-concurrent cycle detector had the best throughput, outperforming trial deletion by more than 25% on the javac benchmark. The non-concurrent cycle detector had poor throughput, attributed to poor triggering heuristics. Both new cycle detectors had poor pause times; even so, the semi-concurrent cycle detector had the lowest pause times on the javac benchmark. The work presented in this thesis contributes an evaluation of the components of the reference counter and a comparison between approaches to reference counting implementation. Prior to this work, the cost of the reference counter's components had not been quantified. Additionally, past work presented different approaches to reference counting implementation as a whole, rather than as individual components.
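The core of cycle detection for reference counting can be sketched in one common formulation (not necessarily the thesis's exact algorithm): subtract the reference counts contributed by edges internal to a candidate set, then mark from objects that still have external references; whatever stays unmarked is cyclic garbage. The object-graph model below is illustrative.

```python
# Sketch of mark-scan style cycle detection over a candidate set.

def cyclic_garbage(rc, edges, candidates):
    """rc: object -> reference count; edges: object -> list of referents.
    Returns the subset of candidates that is unreachable cyclic garbage."""
    external = dict(rc)
    for src in candidates:
        for dst in edges.get(src, []):
            if dst in candidates:
                external[dst] -= 1          # remove internal contributions
    # Mark phase: objects with surviving external references are live,
    # as is everything they reach inside the candidate set.
    live, stack = set(), [o for o in candidates if external[o] > 0]
    while stack:
        obj = stack.pop()
        if obj in live:
            continue
        live.add(obj)
        stack.extend(d for d in edges.get(obj, []) if d in candidates)
    return set(candidates) - live

# a <-> b is a dead cycle; c <-> d is a cycle kept alive externally.
rc = {"a": 1, "b": 1, "c": 2, "d": 1}
edges = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
garbage = cyclic_garbage(rc, edges, {"a", "b", "c", "d"})
```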
APA, Harvard, Vancouver, ISO, and other styles
29

SUN, SHOU-LIANG, and 孫守亮. "Dynamic Memory Management for Xen Virtualization Platforms." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/s4xhyj.

Full text of the source
Abstract:
Master's thesis
National Pingtung University
Department of Computer Science and Information Engineering
Academic year 105 (2016-17)
Virtual memory is a standard mechanism in modern operating systems, providing extended memory space from disk for memory-overcommit situations (i.e., systems with insufficient memory). Although a sufficient amount of memory can be provided this way, performance is impaired because enabling virtual memory generates a large number of disk I/O requests. In virtualization environments, moreover, these I/O requests may degrade other virtual machines (VMs), because VMs share the disk of the same physical machine (i.e., disk contention). In this thesis, we propose a dynamic memory management approach for Xen virtualization systems, called critical amount guaranteed memory allocation (CAGMA), to expand or shrink the allocated memory of a VM dynamically with a guaranteed amount of available memory. Under CAGMA, a critical memory amount is calculated for each VM periodically, as well as whenever a swapping event occurs or virtual memory is enabled. The allocated memory of each VM is then adjusted according to its critical memory amount, so that the number of I/O requests generated for virtual memory is greatly reduced and the performance degradation problem is prevented. Our proposed CAGMA has been implemented in Xen 4.2.2, and a series of experiments has been conducted, with encouraging results.
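The shape of such a policy can be sketched as follows: each VM's allocation is driven toward its critical amount (working set plus a guaranteed slack), capped by what the host can actually supply. The sizing rule and numbers are hypothetical placeholders, not CAGMA's actual formula.

```python
# Sketch of critical-amount-driven ballooning for VMs on one host.

def target_allocation(working_set_mb, guaranteed_slack_mb, host_free_mb):
    """Critical amount = working set plus guaranteed slack, capped by
    what the host can add on top of the working set."""
    critical = working_set_mb + guaranteed_slack_mb
    return min(critical, working_set_mb + host_free_mb)

def rebalance(vms, slack_mb, host_mb):
    """vms: name -> current working set (MB).
    Returns name -> adjusted allocation (MB)."""
    alloc = {}
    free = host_mb - sum(vms.values())
    for name, ws in vms.items():
        alloc[name] = target_allocation(ws, slack_mb, max(free, 0))
    return alloc

alloc = rebalance({"vm1": 800, "vm2": 400}, slack_mb=128, host_mb=2048)
```

Keeping each VM above its working set by a guaranteed margin is what avoids the swapping (and hence the disk I/O) that the abstract identifies as the source of contention.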
APA, Harvard, Vancouver, ISO, and other styles
30

Hsiao, Kuang-tse, and 蕭光哲. "Customized Dynamic Memory Management for Embedded Systems." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/31795309229125915660.

Full text of the source
Abstract:
Master's thesis
National Cheng Kung University
Institute of Computer and Communication Engineering
Academic year 97 (2008-09)
More and more applications for multimedia or network services are being ported to embedded systems. These applications usually depend on dynamic memory for their execution, due to the inherent unpredictability of their processing. Many general dynamic memory management policies, and implementations of them, are available to achieve good performance in general-purpose systems. However, a common, general-purpose memory management mechanism may not be suitable for embedded platforms because of their resource constraints. Instead, customized dynamic memory management is a more suitable solution. When customization of dynamic memory management is integrated into the porting or development process, it can be an effective approach to achieving efficient memory usage and better performance at low cost. This thesis presents the design and implementation of customized dynamic memory management for embedded system platforms. The customized dynamic memory management has two parts: configurable dynamic memory management mechanisms for applications, and supporting tools that help select among those mechanisms. Given several dynamic memory management mechanisms, the application developer may choose suitable ones based on the pattern of memory requests. The supporting tools provide information on memory usage by collecting data about memory requests at run time and analyzing the collected data. The data are collected from both the application side and the operating system side; the choice of memory management mechanism can therefore balance these two aspects and yield better performance. The customized dynamic memory management of this thesis is implemented in the Zinix micro-kernel operating system running on a TI DaVinci EVM. The implementation includes memory management policies in a function library used by application software, and page management policies in the operating system.
To help choose a suitable memory management policy for application software, a tool that analyzes the memory usage of the application and suggests a suitable choice has also been implemented. The implementation was tested with H.264 decoder software and 3D object rendering software. The results are promising, and the efficient memory usage makes the test programs execute faster.
APA, Harvard, Vancouver, ISO, and other styles
31

Ke, Dian-Chia, and 柯典嘉. "Dynamic Memory Management on uC/OS-II." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/11987121996736990793.

Full text of the source
Abstract:
Master's thesis
National Taiwan University
Graduate Institute of Computer Science and Information Engineering
Academic year 98 (2009-10)
There is an increasing demand for more memory to satisfy the complex execution of applications, even in embedded and real-time systems. Larger memory is usually provided through paging or virtual memory. However, page faults in such systems significantly impact memory access performance and result in unpredictable response times, so reducing the page fault rate is critical to improving system performance. In this thesis, a dynamic memory management scheme based on paging is proposed for μC/OS-II. Memory pages are allocated to tasks according to task priorities, so pages of high-priority tasks are more likely to be kept in memory, while those of low-priority tasks are more easily replaced. In this way, pages are prevented from being re-allocated frequently, resulting in a lower page fault rate. In my experiments, the page fault rate was reduced by 66%. Further experiments under different scenarios were designed to test how performance is influenced. The results show that the prioritized scheme improves the overall page fault rate, especially with a wide memory access range, a larger number of tasks, and a low aging strategy. I believe dynamic memory management with task priority can effectively improve the overall performance of many embedded and real-time systems.
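The victim-selection side of such a priority-aware scheme can be sketched in a few lines: when a frame must be reclaimed, evict a page belonging to the lowest-priority task, so high-priority pages tend to stay resident. The frame model is illustrative, not the thesis's implementation.

```python
# Sketch of priority-aware page replacement.
# Convention as in uC/OS-II: a LOWER number means a HIGHER task priority.

def pick_victim(frames):
    """frames: list of (page, task_priority) pairs.
    Evict a page of the lowest-priority task (largest priority number)."""
    return max(frames, key=lambda f: f[1])[0]

frames = [("pA", 3), ("pB", 10), ("pC", 7)]
victim = pick_victim(frames)   # pB belongs to the lowest-priority task
```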
APA, Harvard, Vancouver, ISO, and other styles
32

Hsu, Chao-Hung, and 徐肇鴻. "Dynamic Memory Optimization and Parallelism Management for OpenCL." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/78409180200517991120.

Full text of the source
Abstract:
Master's thesis
National Chiao Tung University
Institute of Computer Science and Engineering
Academic year 102 (2013-14)
Recently, multiprocessor platforms have become the trend for achieving high performance. Multiprocessor platforms may be categorized into homogeneous and heterogeneous multiprocessor platforms. For applications with large amounts of concurrency, such as digital signal processing and linear-algebra matrix operations, executing on heterogeneous multiprocessors usually achieves higher performance than on homogeneous multiprocessors. However, programming applications to execute on heterogeneous multiprocessors is difficult and tedious. OpenCL (Open Computing Language), released by the Khronos Group, is one of the programming standards for heterogeneous multiprocessors, and provides portability across heterogeneous multiprocessor platforms. OpenCL supports three types of device: CPUs (Central Processing Units), GPUs (Graphics Processing Units), and accelerators. Our research focuses on platforms with CPUs and GPUs, because GPUs are now in widespread use. On such a platform, two programming issues may significantly affect GPU computing performance. One is workload distribution, including parallelizing the application into work-items and distributing work-items into work-groups. The other is the use of the GPU memory hierarchy. To fully utilize the characteristics of GPUs, programmers have to be not only proficient at parallel programming but also familiar with the hardware specification. Therefore, in this thesis, we propose a compilation pass to automatically optimize OpenCL kernels. The input is a naive kernel which is functionally correct but not optimized for performance. Our compilation pass transforms the input kernel function with optimizations including kernel function analysis, work-group rearrangement, memory coalescing, and work-item merging.
In addition, our framework is implemented in a runtime system, so that it may dynamically adjust the optimization parameters according to the hardware specification. Although optimization performed at runtime incurs execution-time overhead, this overhead is covered by massive kernel computation or input data in most cases. Experimental results on our benchmarks demonstrate that applications gain a 1.3x speedup on average. In summary, we design and implement an optimization pass for OpenCL that takes the hardware specification of the target platform into account, in a runtime compiler framework based on LLVM.
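The memory-coalescing optimization mentioned above targets a simple access-pattern property: consecutive work-items should touch consecutive addresses, so the hardware can merge their loads into one transaction. A minimal check of that property can be sketched as follows; the flat-address model and 4-byte word size are illustrative assumptions.

```python
# Sketch of the access-pattern property a memory-coalescing pass
# tries to establish for adjacent work-items.

def is_coalesced(addresses, word_size=4):
    """addresses[i] is the byte address accessed by work-item i.
    Coalesced means unit stride: base + i * word_size."""
    return all(addresses[i + 1] - addresses[i] == word_size
               for i in range(len(addresses) - 1))

row_major = [i * 4 for i in range(8)]        # unit stride: coalesced
column_major = [i * 128 for i in range(8)]   # large stride: not coalesced
```

A coalescing transformation reorders the computation (or the data layout) so that the pattern work-items actually produce looks like `row_major` rather than `column_major`.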
APA, Harvard, Vancouver, ISO, and other styles
33

Hong, Wan-Zen, and 洪婉荏. "Development of Automatic Deployment and Dynamic Resource Adjustment for Virtual Machine Management." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/13435496510826130135.

Full text of the source
Abstract:
Master's thesis
National Chung Hsing University
Department of Computer Science and Engineering
Academic year 99 (2010-11)
With the extensive use of virtualization technology in cloud computing environments, in particular IaaS (Infrastructure as a Service), effectively managing a cluster of virtual machines has become a pressing problem. Beyond adopting a public cloud management platform, virtual machine management under the limited resources of a private cloud is a challenge. This thesis focuses on IaaS management issues in the private cloud environment, and provides integrated resource adjustment. Management is performed through the libvirt API, and each physical machine is assumed to be equipped with KVM and QEMU. A menu-style model is developed for virtual machine configuration, so that users save time and make fewer operational mistakes. SNMP is employed to regularly monitor CPU and memory usage on both physical and virtual machines. Based on resource consumption, the system automatically adjusts the memory allocation of each virtual machine: when a machine's memory usage exceeds 80%, extra memory is added if available. If the physical machine cannot supply the additional memory demanded by its virtual machines, migration to another physical machine takes place automatically. Experimental results show that, with the proposed mechanism, the load on virtual and physical machines can be balanced and resources can be utilized efficiently.
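The monitoring policy described above (grow past an 80% usage threshold, otherwise migrate) can be sketched as a small decision function; the step size and return values are hypothetical, not the thesis's actual parameters.

```python
# Sketch of threshold-driven memory adjustment for one VM.

def adjust(vm_used_mb, vm_alloc_mb, host_free_mb, step_mb=256):
    """Returns (new_allocation_mb, action).
    Grow when usage exceeds 80% and the host has memory to spare;
    otherwise flag the VM for migration to another host."""
    usage = vm_used_mb / vm_alloc_mb
    if usage <= 0.8:
        return vm_alloc_mb, "ok"
    if host_free_mb >= step_mb:
        return vm_alloc_mb + step_mb, "grew"
    return vm_alloc_mb, "migrate"
```

In practice the "grew" branch would be carried out via the balloon driver (e.g. libvirt's memory-tuning calls) and the "migrate" branch via live migration; the sketch only captures the decision logic.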
APA, Harvard, Vancouver, ISO, and other styles
34

Ramashekar, Thejas. "Automatic Data Allocation, Buffer Management And Data Movement For Multi-GPU Machines." Thesis, 2013. http://etd.iisc.ernet.in/handle/2005/2627.

Full text of the source
Abstract:
Multi-GPU machines are being increasingly used in high performance computing. These machines are used both as standalone workstations to run computations on medium to large data sizes (tens of gigabytes) and as nodes in CPU-multi-GPU clusters handling very large data sizes (hundreds of gigabytes to a few terabytes). Each GPU in such a machine has its own memory and does not share an address space with either the host CPU or the other GPUs. Hence, applications utilizing multiple GPUs have to manually allocate and manage data on each GPU. A significant body of scientific applications that utilize multi-GPU machines contain computations inside affine loop nests, i.e., loop nests that have affine bounds and affine array access functions. These include stencils, linear-algebra kernels, dynamic programming codes, and data-mining applications. Data allocation, buffer management, and coherency handling are critical steps that need to be performed to run affine applications on multi-GPU machines. Existing works that propose to automate these steps have limitations and inefficiencies in terms of allocation sizes, exploiting reuse, transfer costs, and scalability. An automatic multi-GPU memory manager that can overcome these limitations and enable applications to achieve scalable performance is highly desirable. One technique that has been used in certain memory management contexts in the literature is that of bounding boxes. The bounding box of an array, for a given tile, is the smallest hyper-rectangle that encapsulates all the array elements accessed by that tile. In this thesis, we exploit the potential of bounding boxes for memory management far beyond their current usage in the literature. We propose a scalable and fully automatic data allocation and buffer management scheme for affine loop nests on multi-GPU machines, called the Bounding Box based Memory Manager (BBMM). BBMM is a compiler-assisted runtime memory manager.
At compile time, it uses static analysis techniques to identify the set of bounding boxes accessed by a computation tile. At run time, it uses bounding box set operations, such as union, intersection, difference, and finding subset and superset relations, to compute a set of disjoint bounding boxes from the set of bounding boxes identified at compile time. It also exploits the architectural capability of GPUs to perform fast transfers of rectangular (strided) regions of memory, and hence performs all data transfers in terms of bounding boxes. BBMM uses these techniques to automatically allocate and manage the data required by applications (suitably tiled and parallelized for GPUs). This allows it to (1) allocate only as much data as is required (or close to it) by the computations running on each GPU, (2) efficiently track buffer allocations and hence maximize data reuse across tiles and minimize data transfer overhead, and (3) as a result, enable applications to maximize the utilization of the combined memory of multi-GPU machines. BBMM can work with any choice of parallelizing transformations, computation placement, and scheduling schemes, whether static or dynamic. Experiments run on a system with four GPUs with various scientific programs showed that BBMM is able to reduce data allocations on each GPU by up to 75% compared to current allocation schemes, yield at least 88% of the performance of hand-optimized OpenCL codes, and allow excellent weak scaling.
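The bounding-box set operations the scheme relies on reduce to per-dimension interval arithmetic, since a bounding box is just a product of intervals. The sketch below shows intersection and a subset test; the tuple-of-intervals representation is an illustrative choice, not BBMM's internal format.

```python
# Sketch of bounding-box (hyper-rectangle) operations via
# per-dimension interval arithmetic.

def intersect(a, b):
    """a, b: tuples of per-dimension (lo, hi) inclusive intervals.
    Returns the overlapping box, or None if the boxes are disjoint."""
    box = tuple((max(al, bl), min(ah, bh))
                for (al, ah), (bl, bh) in zip(a, b))
    return box if all(lo <= hi for lo, hi in box) else None

def contains(a, b):
    """True if box a is a superset of box b in every dimension."""
    return all(al <= bl and bh <= ah
               for (al, ah), (bl, bh) in zip(a, b))

tile_a = ((0, 63), (0, 63))     # rows 0-63, cols 0-63
tile_b = ((32, 95), (0, 63))    # overlapping tile
overlap = intersect(tile_a, tile_b)
```

Detecting such overlaps between the boxes of consecutive tiles is what lets a manager of this kind reuse already-resident data instead of re-transferring it.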
APA, Harvard, Vancouver, ISO and other styles
35

Huetter, RJ. "RHmalloc : a very large, highly concurrent dynamic memory manager." Thesis, 2005. http://hdl.handle.net/10453/37375.

Full text of the source
Abstract:
University of Technology, Sydney. Dept. of Software Engineering.
Dynamic memory management (DMM) is a fundamental aspect of computing, directly affecting the capability, performance and reliability of virtually every system in existence today. Yet, oddly, fifty years of DMM research have not taken memory capacity into account, falling significantly behind hardware trends. Comparatively little research work on scalable DMM has been conducted, on the order of ten papers on the topic, all of which focus on CPU scalability only; the largest heap reported in the literature to date is 600MB. By contrast, symmetric multiprocessor (SMP) machines with terabytes of memory are now commercially available. The contribution of our research is the formal exploration, design, construction and proof of a general purpose, high performance dynamic memory manager which scales indefinitely with respect to both CPU and memory: one that can predictably manage a heap of arbitrary size, on any SMP machine with an arbitrary number of CPUs, without a priori knowledge. We begin by recognizing the scattered, inconsistent literature surrounding this topic. Firstly, to ensure clarity, we present a simplified introduction, followed by a catalog of the fundamental techniques. We discuss the melting pot of engineering tradeoffs so as to establish a sound basis to tackle the issue at hand: large scale DMM. We review both the history and the state of the art, from which significant insight into this topic is to be found. We then explore the problem space and suggest a workable solution. Our proposal, known as RHmalloc, is based on the novel perspective that a highly scalable heap can be viewed as an unbounded set of finite-sized sub-heaps, where each sub-heap may be concurrently shared by any number of threads, such that a suitable sub-heap can be found in O(1) time, and an allocation from a suitable sub-heap is also O(1).
Testing the design properties of RHmalloc, we show by extrapolation that RHmalloc will scale to at least 1,024 CPUs and 1PB, and we theoretically prove that DMM scales indefinitely with respect to both CPU and memory. Most importantly, the approach taken for the scalability proof alludes to a general analysis and design technique for systems of this nature.
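The sub-heap perspective can be illustrated with a toy model: the heap is an unbounded list of fixed-size sub-heaps, a suitable sub-heap is found in O(1) through a ready list, and allocation within a sub-heap is an O(1) bump of a cursor. This is not RHmalloc's actual (concurrent, far more involved) design; the sizes, names and ready-list policy are invented here to show the shape of the idea.

```python
# Toy model of a heap as an unbounded set of finite-sized sub-heaps.
# Both steps of an allocation are O(1): pick a sub-heap with room,
# then bump-allocate inside it.

SUBHEAP_SIZE = 1024  # bytes per sub-heap in this toy model

class SubHeap:
    def __init__(self):
        self.used = 0
    def alloc(self, size):
        """Bump-allocate `size` bytes; None if this sub-heap is full."""
        if self.used + size > SUBHEAP_SIZE:
            return None
        offset = self.used
        self.used += size
        return offset

class Heap:
    def __init__(self):
        self.ready = [SubHeap()]   # sub-heaps known to have free space
    def alloc(self, size):
        sh = self.ready[-1]        # O(1) lookup of a suitable sub-heap
        off = sh.alloc(size)
        if off is None:            # sub-heap full: extend the set, O(1)
            sh = SubHeap()
            self.ready.append(sh)
            off = sh.alloc(size)
        return sh, off

heap = Heap()
blocks = [heap.alloc(300) for _ in range(5)]
```

Because the set of sub-heaps is unbounded and each operation touches only one sub-heap, nothing in the structure itself ties allocation cost to total heap size, which is the property the scalability argument above builds on.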
APA, Harvard, Vancouver, ISO and other styles
36

Kim, Yŏng-jin. "Hybrid approaches to solve dynamic fleet management problems." Thesis, 2003. http://wwwlib.umi.com/cr/utexas/fullcit?p3116412.

Full text of the source
APA, Harvard, Vancouver, ISO and other styles
37

Lee, Chang Joo 1975. "DRAM-aware prefetching and cache management." Thesis, 2010. http://hdl.handle.net/2152/ETD-UT-2010-12-2492.

Full text of the source
Abstract:
Main memory system performance is crucial for high performance microprocessors. Even though the peak bandwidth of main memory systems has increased through improvements in the microarchitecture of Dynamic Random Access Memory (DRAM) chips, conventional on-chip memory systems of microprocessors do not fully take advantage of it. This results in underutilization of the DRAM system, in other words, many idle cycles on the DRAM data bus. The main reason for this is that conventional on-chip memory system designs do not fully take into account important DRAM characteristics. Therefore, the high bandwidth of DRAM-based main memory systems cannot be realized and exploited by the processor. This dissertation identifies three major performance-related characteristics that can significantly affect DRAM performance and makes a case for DRAM characteristic-aware on-chip memory system design. We show that on-chip memory resource management policies (such as prefetching, buffer, and cache policies) that are aware of these DRAM characteristics can significantly enhance entire system performance. The key idea of the proposed mechanisms is to send out to the DRAM system useful memory requests that can be serviced with low latency or in parallel with other requests rather than requests that are serviced with high latency or serially. Our evaluations demonstrate that each of the proposed DRAM-aware mechanisms significantly improves performance by increasing DRAM utilization for useful data. We also show that when employed together, the performance benefit of each mechanism is achieved additively: they work synergistically and significantly improve the overall system performance of both single-core and Chip MultiProcessor (CMP) systems.
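The key idea, sending the DRAM requests that can be serviced with low latency first, can be illustrated with a toy single-channel model in which a request to a bank's currently open row (a row-buffer hit) is cheap and any other request forces a costly row change. The cost constants and structure below are invented for the example and are not the dissertation's mechanisms.

```python
# Toy model: prefer requests that hit the open row in their DRAM bank.

ROW_HIT_COST, ROW_CONFLICT_COST = 1, 3  # invented relative latencies

def schedule(requests, open_rows):
    """Reorder requests so row-buffer hits go first.
    requests: list of (bank, row); open_rows: bank -> currently open row."""
    hits = [r for r in requests if open_rows.get(r[0]) == r[1]]
    misses = [r for r in requests if open_rows.get(r[0]) != r[1]]
    return hits + misses

def service_time(ordered, open_rows):
    """Total cost of servicing requests in the given order."""
    rows = dict(open_rows)
    total = 0
    for bank, row in ordered:
        total += ROW_HIT_COST if rows.get(bank) == row else ROW_CONFLICT_COST
        rows[bank] = row  # servicing a request leaves its row open
    return total
```

Even in this tiny model, reordering `[(0, 5), (0, 9), (0, 5)]` with row 5 open turns one of the two row conflicts into a hit, which is the effect the DRAM-aware policies above aim to produce at scale.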
APA, Harvard, Vancouver, ISO and other styles
38

Sartor, Jennifer Bedke. "Exploiting language abstraction to optimize memory efficiency." Thesis, 2010. http://hdl.handle.net/2152/ETD-UT-2010-08-1919.

Full text of the source
Abstract:
The programming language and underlying hardware determine application performance, and both are undergoing revolutionary shifts. As applications have become more sophisticated and capable, programmers have chosen managed languages in many domains for ease of development. These languages abstract memory management from the programmer, which can introduce time and space overhead but also provide opportunities for dynamic optimization. Optimizing memory performance is paramount in part because hardware is reaching physical limits. Recent trends towards chip multiprocessor machines exacerbate the memory system bottleneck because they add cores without adding commensurate bandwidth. Both language and architecture trends add stress to the memory system and degrade application performance. This dissertation exploits the language abstraction to analyze and optimize memory efficiency on emerging hardware. We study the sources of memory inefficiency on two levels: heap data and hardware storage traffic. We design and implement optimizations that change the heap layout of arrays, and use program semantics to eliminate useless memory traffic. These techniques improve memory system efficiency and performance. We first quantitatively characterize the problem by comparing many data compression algorithms and their combinations in a limit study of Java benchmarks. We find that arrays are a dominant source of heap inefficiency. We introduce z-rays, a new array layout design, to bridge the gap between fast access, space efficiency and predictability. Z-rays facilitate compression and offer flexibility as well as time and space efficiency. We find that there is a semantic mismatch between managed languages, with their rapid allocation rates, and current hardware, causing unnecessary and excessive traffic in the memory subsystem.
We take advantage of the garbage collector's identification of dead data regions, communicating information to the caches to eliminate useless traffic to memory. By reducing traffic and bandwidth, we improve performance. We show that the memory abstraction in managed languages is not just a cost to be borne, but an opportunity to alleviate the memory bottleneck. This thesis shows how to exploit this abstraction to improve space and time efficiency and overcome the memory wall. We enhance the productivity and performance of ubiquitous managed languages on current and future architectures.
APA, Harvard, Vancouver, ISO and other styles
39

Jacob, Joseph 1971. "Automatic scheduling and dynamic load sharing of parallel computations on heterogeneous workstation clusters." Thesis, 1995. http://hdl.handle.net/1957/34688.

Full text of the source
Abstract:
Parallel computing on heterogeneous workstation clusters has proved to be a very efficient use of available resources, increasing their overall utilization. However, for it to be a viable alternative to expensive, dedicated parallel machines, a number of key issues need to be resolved. One of the major challenges of heterogeneous computing is coping with the inherent heterogeneity of the system, with workstations available from different vendors at varying processing speeds and capabilities. The existence of multiple jobs and users further complicates the task. The time taken by a parallel job is constrained by the time taken by the slowest or most heavily loaded workstation. Therefore, load sharing of parallel computations is imperative in ensuring good overall utilization of the system. Since load sharing is essentially independent of the particular parallel job being run, the development of program-independent, automatic scheduling and load-sharing strategies has become vital to the efficient use of the heterogeneous cluster. This thesis discusses various prior approaches to load sharing, examines a new strategy developed for heterogeneous workstations, and evaluates its performance.
Graduation date: 1996
APA, Harvard, Vancouver, ISO and other styles
40

Ha, Chi Yuan, and 哈䆊遠. "Energy Saving Designs with Joint Considerations of Non-Volatile Memory Allocation and Dynamic Power Management." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/kyzq6n.

Full text of the source
Abstract:
Master's thesis
Chang Gung University
Department of Computer Science and Information Engineering
105 (ROC calendar, i.e. 2016)
To obtain more computing power from a hardware system, vendors and researchers have focused on increasing the clock rate and including more components in a chip, but these technologies have also led to high energy consumption and overheating problems. When energy saving is considered, the Dynamic Power Management (DPM) technique can temporarily turn off some components in a chip or change the operating modes of some parts of a system. However, the cost of DPM energy saving is the long latency of resuming the system to the active mode, which might violate application performance requirements and can cause extra energy consumption during the resumption period. This work exploits Non-Volatile Memory (NVM) to reduce the system suspend and resume times so as to make our DPM algorithm more practical and effective for energy saving designs on Internet of Things (IoT) and wearable devices. The major challenge of this work is the co-design of task scheduling and memory allocation for energy saving and/or better performance, which renews the system synthesis problem with a more flexible computing and memory architecture. By exploiting application behaviors, this work uses NVM to keep critical data and binaries during system idle periods so as to reduce the system resumption time and further save more energy. Real-time analysis is provided for the performance guarantee of the applications on the target system. Experiments with the NVMain simulator and the WCET benchmark suite were conducted to evaluate the performance of our algorithm. Experimental results show that our solution can save significantly more energy compared to the other solutions.
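The DPM trade-off described above is commonly reasoned about with a break-even time: turning a component off only pays when the idle period is long enough that the energy saved while sleeping exceeds the cost of suspending and resuming. A minimal model, with invented power and energy figures:

```python
# Break-even analysis for a DPM sleep decision (illustrative numbers only).

def break_even_time(p_active, p_sleep, e_transition):
    """Minimum idle duration (s) for which sleeping saves energy.
    p_active, p_sleep: power (W) in the active and sleep states;
    e_transition: total suspend + resume energy cost (J)."""
    return e_transition / (p_active - p_sleep)

def should_sleep(idle_time, p_active, p_sleep, e_transition):
    """True if sleeping for `idle_time` seconds is a net energy win."""
    return idle_time > break_even_time(p_active, p_sleep, e_transition)
```

Under this model, keeping critical state in NVM shortens suspend and resume, shrinking `e_transition` and hence the break-even time, so DPM becomes profitable for shorter idle periods, which matches the motivation of the work above.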
APA, Harvard, Vancouver, ISO and other styles
41

Chandan, G. "Effective Automatic Computation Placement and Data Allocation for Parallelization of Regular Programs." Thesis, 2014. http://hdl.handle.net/2005/3111.

Full text of the source
Abstract:
Scientific applications that operate on large data sets require huge amounts of computation power and memory. These applications are typically run on High Performance Computing (HPC) systems that consist of multiple compute nodes connected over a network interconnect such as InfiniBand. Each compute node has its own memory and does not share the address space with other nodes. A significant amount of work has been done in the past two decades on parallelizing for distributed-memory architectures. A majority of this work went into developing compiler technologies such as High Performance Fortran (HPF) and partitioned global address space (PGAS) languages. However, several steps involved in achieving good performance remained manual. Hence, the approach currently used to obtain the best performance is to rely on highly tuned libraries such as ScaLAPACK. The objective of this work is to improve automatic compiler and runtime support for distributed-memory clusters for regular programs. Regular programs typically use arrays as their main data structure, and array accesses are affine functions of outer loop indices and program parameters. Many scientific applications such as linear-algebra kernels, stencils, partial differential equation solvers, data-mining applications and dynamic programming codes fall in this category. In this work, we propose techniques for finding the computation mapping and data allocation when compiling regular programs for distributed-memory clusters. Techniques for transformation and detection of parallelism, relying on the polyhedral framework, already exist. We propose automatic techniques to determine computation placements for the identified parallelism and the allocation of data. We model the problem of finding a good computation placement as a graph partitioning problem with constraints to minimize both communication volume and load imbalance for the entire program.
We show that our approach to computation mapping is more effective than those that can be developed using vendor-supplied libraries. Our approach to data allocation is driven by tiling of data spaces along with a compiler-assisted runtime scheme to allocate and deallocate tiles on demand and reuse them. Experimental results on some sequences of BLAS calls demonstrate a mean speedup of 1.82× over versions written with ScaLAPACK. Besides enabling weak scaling for distributed memory, data tiling also improves locality for shared-memory parallelization. Experimental results on a 32-core shared-memory SMP system show a mean speedup of 2.67× over code that is not data tiled.
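The graph-partitioning view of computation placement can be made concrete with a small evaluator for the two objectives named above: communication volume (the weight of edges crossing partitions) and load imbalance (the heaviest partition relative to the ideal average). This is an illustrative formulation, not the thesis's actual model.

```python
# Evaluate a computation placement cast as graph partitioning:
# nodes are computation tiles, edge weights are communication volumes,
# and `part` maps each node to its assigned partition (node/GPU/rank).

def cut_volume(edges, part):
    """Total weight of edges whose endpoints land in different partitions.
    edges: iterable of (u, v, weight)."""
    return sum(w for u, v, w in edges if part[u] != part[v])

def imbalance(work, part, nparts):
    """Max partition load divided by the ideal (average) load.
    work: node -> computation weight. 1.0 means perfectly balanced."""
    loads = [0.0] * nparts
    for node, w in work.items():
        loads[part[node]] += w
    return max(loads) / (sum(loads) / nparts)
```

A placement search then looks for a `part` that keeps `cut_volume` low without letting `imbalance` grow, which is exactly the tension a good partitioner has to balance across the entire program.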
APA, Harvard, Vancouver, ISO and other styles
42

Jindal, Prachee. "Compiler Assisted Energy Management For Sensor Network Nodes." Thesis, 2008. http://hdl.handle.net/2005/819.

Full text of the source
Abstract:
Emerging low power, embedded, wireless sensor devices are useful for a wide range of applications, yet have very limited processing, storage and especially energy resources. Sensor networks have a wide variety of applications in medical monitoring, environmental sensing and military surveillance. Due to the large number of sensor nodes that may be deployed and the required long system lifetimes, replacing the battery is not an option. Sensor systems must use the minimal possible energy while operating over a wide range of scenarios. Most of the efforts on energy management in sensor networks have concentrated on minimizing energy consumption in the communication subsystem. Some researchers have also dealt with the issue of minimizing the energy in the computing subsystem of a sensor network node, and some proposals using energy-aware software have been made. Relatively little work has been done on compiler-controlled energy management in sensor networks. In this thesis, we present our investigations into how compiler techniques can be used to minimize CPU energy consumption in sensor network nodes. One energy management technique used effectively in general purpose processors is dynamic voltage scaling (DVS). In this thesis we implement and evaluate a compiler-assisted DVS algorithm and show its usefulness for a small sensor node processor. We were able to achieve an energy saving of 29% with little performance slowdown. Scratchpad memories have been widely used for improving performance. In this thesis we show that if the scratchpad size for the system is chosen carefully, then large energy savings can be achieved by using a compiler-assisted scratchpad allocation policy. With a small, 512-byte scratchpad memory we were able to achieve 50% energy savings. We also studied the behavior of dynamic voltage scaling in the presence of scratchpad memory.
Our results show that in the presence of scratchpad memory, fewer opportunities are found for applying dynamic voltage scaling techniques. The sensor network community lacks a comprehensive benchmark suite, so for our study we also implemented a set of applications representative of the computational workload on sensor network nodes. The techniques studied in this thesis can easily be integrated with existing energy management techniques in sensor networks, yielding additional energy savings.
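Why DVS saves energy can be seen from the standard first-order CMOS model, in which dynamic energy per cycle scales with the square of the supply voltage. The sketch below uses the common simplifying assumption that voltage scales linearly with frequency; it is a back-of-the-envelope model, not the algorithm from the thesis.

```python
# First-order DVS model: stretching a computation to fill its deadline
# lets the processor run at a lower frequency and voltage, and dynamic
# energy per cycle scales with V^2.

V_MAX, F_MAX = 1.0, 1.0  # normalized nominal voltage and frequency

def energy(cycles, freq_scale):
    """Relative dynamic energy for `cycles` cycles at a scaled frequency,
    assuming voltage scales linearly with frequency."""
    v = V_MAX * freq_scale
    return cycles * v * v

def dvs_saving(cycles, deadline_cycles):
    """Fraction of energy saved by running a `cycles`-long region at the
    slowest frequency that still meets a `deadline_cycles` budget."""
    scale = cycles / deadline_cycles
    return 1.0 - energy(cycles, scale) / energy(cycles, 1.0)
```

The model also shows why scratchpad allocation reduces DVS opportunities in the study above: by shrinking `cycles` relative to the deadline it removes stall slack, leaving less room to scale the frequency down.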
APA, Harvard, Vancouver, ISO and other styles
43

Cérat, Benjamin. "Étude de cas sur l’ajout de vecteurs d’enregistrements typés dans Gambit Scheme." Thèse, 2014. http://hdl.handle.net/1866/11984.

Full text of the source
Abstract:
In order to optimize the in-memory representation of Scheme records in the Gambit compiler, we introduce a system of type annotations on record fields, together with flat vectors containing an abbreviated representation of those records. These vectors omit the header and the reference to the type descriptor usually present on each record, and instead use a type tree spanning the whole memory to recover, from an internal pointer, the vector containing a reference. The new functionality is implemented through changes to the Gambit runtime. We add new primitives to the language and modify the existing architecture to handle the new data types in a way that is transparent to the user. To do so, we modify the garbage collector to account for the existence of internal references and of heterogeneous records whose fields may not be aligned to a word and need not be boxed; the type tree must also be updated automatically and systematically to reflect the live vectors. To assess our implementation's performance, we run a series of benchmarks. We measure a major improvement in allocation time and garbage collector behavior for large typed records and for vectors of records, typed or not. Slight overheads are, however, incurred on field accesses and, in the case of record vectors, on accesses to the type descriptor.
APA, Harvard, Vancouver, ISO and other styles
