
A selection of scholarly literature on the topic "Non-uniform memory access"

Author: Grafiati

Published: September 6, 2023


Contents

Journal articles
Dissertations
Book chapters
Conference papers

Browse lists of relevant articles, books, dissertations, conference papers, and other scholarly sources on the topic "Non-uniform memory access".


Journal articles on the topic "Non-uniform memory access"

Lameter, Christoph. "An overview of non-uniform memory access." Communications of the ACM 56, no. 9 (September 2013): 59–64. http://dx.doi.org/10.1145/2500468.2500477.


Lameter, Christoph. "NUMA (Non-Uniform Memory Access): An Overview." Queue 11, no. 7 (July 2013): 40–51. http://dx.doi.org/10.1145/2508834.2513149.


Wang, Rui-bo, Kai Lu, and Xi-cheng Lu. "Aware conflict detection of non-uniform memory access system and prevention for transactional memory." Journal of Central South University 19, no. 8 (August 2012): 2266–71. http://dx.doi.org/10.1007/s11771-012-1270-4.


Motlagh, Bahman S., and Ronald F. DeMara. "Performance of Scalable Shared-Memory Architectures." Journal of Circuits, Systems and Computers 10, no. 01n02 (February 2000): 1–22. http://dx.doi.org/10.1142/s0218126600000068.

Abstract:

Analytical models were developed and simulations of memory latency were performed for Uniform Memory Access (UMA), Non-Uniform Memory Access (NUMA), Local-Remote-Global (LRG), and RCR architectures for hit rates from 0.1 to 0.9 in steps of 0.1, memory access times of 10 to 100 ns, proportions of read/write access from 0.01 to 0.1, and block sizes of 8 to 64 words. The RCR architecture provides favorable performance over UMA and NUMA architectures for all ranges of application and system parameters. RCR outperforms LRG architectures when the hit rates of the processor cache exceed 80% and replicated memory exceed 25%. Thus, inclusion of a small replicated memory at each processor significantly reduces expected access time since all replicated memory hits become independent of global traffic. For configurations of up to 32 processors, results show that latency is further reduced by distinguishing burst-mode transfers between isolated memory accesses and those which are incrementally outside the working set.
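
To make the idea of an analytical latency model concrete, here is a minimal sketch of the textbook weighted-average access time for a cache backed by local and remote memory. It is not the authors' model: the function name, its parameters, and the timing values in the sweep are all assumptions chosen for illustration.

```python
def expected_access_time(hit_rate, cache_ns, local_ns, remote_ns, local_fraction):
    """Textbook expected memory access time in nanoseconds (illustrative only).

    hit_rate:       probability that an access hits the processor cache
    cache_ns:       cache access time
    local_ns:       access time of memory local to the processor
    remote_ns:      access time of remote (global) memory
    local_fraction: share of cache misses served by local memory
    """
    if not 0.0 <= hit_rate <= 1.0 or not 0.0 <= local_fraction <= 1.0:
        raise ValueError("probabilities must lie in [0, 1]")
    # Average cost of a cache miss, weighted by where the miss is served.
    miss_ns = local_fraction * local_ns + (1.0 - local_fraction) * remote_ns
    return hit_rate * cache_ns + (1.0 - hit_rate) * miss_ns


if __name__ == "__main__":
    # Sweep the hit rate from 0.1 to 0.9 in steps of 0.1, as in the abstract.
    # The timing values below are assumptions, not figures from the paper.
    for step in range(1, 10):
        h = step / 10
        print(h, expected_access_time(h, 1.0, 10.0, 100.0, 0.25))
```

Setting `local_fraction` to 1.0 gives a single-memory model, and lowering it shows how quickly remote accesses dominate the average once the cache hit rate drops.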


Nikolopoulos, Dimitrios S., Ernest Artiaga, Eduard Ayguadé, and Jesús Labarta. "Scaling Non-Regular Shared-Memory Codes by Reusing Custom Loop Schedules." Scientific Programming 11, no. 2 (2003): 143–58. http://dx.doi.org/10.1155/2003/379739.

Abstract:

In this paper we explore the idea of customizing and reusing loop schedules to improve the scalability of non-regular numerical codes in shared-memory architectures with non-uniform memory access latency. The main objective is to implicitly set up affinity links between threads and data, by devising loop schedules that achieve balanced work distribution within irregular data spaces and reusing them as much as possible along the execution of the program for better memory access locality. This transformation provides a great deal of flexibility in optimizing locality, without compromising the simplicity of the shared-memory programming paradigm. In particular, the programmer does not need to explicitly distribute data between processors. The paper presents practical examples from real applications and experiments showing the efficiency of the approach.


Denoyelle, Nicolas, Brice Goglin, Aleksandar Ilic, Emmanuel Jeannot, and Leonel Sousa. "Modeling Non-Uniform Memory Access on Large Compute Nodes with the Cache-Aware Roofline Model." IEEE Transactions on Parallel and Distributed Systems 30, no. 6 (June 1, 2019): 1374–89. http://dx.doi.org/10.1109/tpds.2018.2883056.


Priya, Bhukya Krishna, and N. Ramasubramanian. "Improving the Lifetime of Phase Change Memory by Shadow Dynamic Random Access Memory." International Journal of Service Science, Management, Engineering, and Technology 12, no. 2 (March 2021): 154–68. http://dx.doi.org/10.4018/ijssmet.2021030109.

Abstract:

Emerging NVMs are replacing the conventional memory technologies due to their huge cell density and low energy consumption. Restricted writes are one of the major drawbacks to adopting PCM memories in real-time environments. The non-uniform writes and process variations can damage the memory cell with intensive writes, as PCM memory cells have restricted write endurance. To prolong the lifetime of a PCM, an extra DRAM shadow memory has been added to store the writes that come to the PCM and to level out the wearing that occurs on the PCM. An extra address directory stores the address of data written to the DRAM, and a counter is used to count the number of times the blocks are written into. Based upon the counter values, the data is written from DRAM to the PCM. The data is written to the DRAM from the PCM based on the data requirement. Experimental results show a reduction in overall writes in a PCM, which in turn improves the lifetime of a PCM by 5% with less hardware and power overhead.
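
The directory-plus-counter scheme described above can be pictured with a small toy model. This is only a sketch under stated assumptions: the class name `ShadowDram`, the fixed `flush_threshold` policy, and the dictionary-based storage are hypothetical and are not taken from the paper.

```python
class ShadowDram:
    """Toy DRAM shadow buffer that absorbs repeated writes to a PCM block.

    Assumed policy (not from the paper): a block is written back to PCM
    once it has been written flush_threshold times in DRAM.
    """

    def __init__(self, flush_threshold):
        if flush_threshold < 1:
            raise ValueError("flush_threshold must be at least 1")
        self.flush_threshold = flush_threshold
        self.directory = {}   # block address -> [latest data, write count]
        self.pcm = {}         # block address -> data stored in PCM
        self.pcm_writes = 0   # number of writes that actually reached PCM

    def write(self, addr, data):
        """Record a write in DRAM and flush to PCM when the counter is full."""
        entry = self.directory.setdefault(addr, [None, 0])
        entry[0] = data
        entry[1] += 1
        if entry[1] >= self.flush_threshold:
            self._flush(addr)

    def read(self, addr):
        """Return the newest copy: DRAM if buffered, otherwise PCM."""
        if addr in self.directory:
            return self.directory[addr][0]
        return self.pcm.get(addr)

    def _flush(self, addr):
        """Write the buffered block to PCM and drop its directory entry."""
        data, _count = self.directory.pop(addr)
        self.pcm[addr] = data
        self.pcm_writes += 1
```

With a threshold of 4, eight writes to the same block reach PCM only twice, which is the kind of write reduction the abstract refers to.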


Wittig, Robert, Philipp Schulz, Emil Matus, and Gerhard P. Fettweis. "Accurate Estimation of Service Rates in Interleaved Scratchpad Memory Systems." ACM Transactions on Embedded Computing Systems 21, no. 1 (January 31, 2022): 1–15. http://dx.doi.org/10.1145/3457171.

Abstract:

The prototyping of embedded platforms demands rapid exploration of multi-dimensional parameter sets. Especially the design of the memory system is essential to guarantee high utilization while reducing conflicts at the same time. To aid the design process, several probabilistic models to estimate the throughput of interleaved memory systems have been proposed. While accurately estimating the average throughput of the system, these models fail to determine the impact on individual processing elements. To mitigate this divergence, we extend three known models to include non-uniform access probabilities and priorities.


Wang, Qing, Youyou Lu, Junru Li, Minhui Xie, and Jiwu Shu. "Nap: Persistent Memory Indexes for NUMA Architectures." ACM Transactions on Storage 18, no. 1 (February 28, 2022): 1–35. http://dx.doi.org/10.1145/3507922.

Abstract:

We present Nap, a black-box approach that converts concurrent persistent memory (PM) indexes into non-uniform memory access (NUMA)-aware counterparts. Based on the observation that real-world workloads always feature skewed access patterns, Nap introduces a NUMA-aware layer (NAL) on the top of existing concurrent PM indexes, and steers accesses to hot items to this layer. The NAL maintains (1) per-node partial views in PM for serving insert/update/delete operations with failure atomicity and (2) a global view in DRAM for serving lookup operations. The NAL eliminates remote PM accesses to hot items without inducing extra local PM accesses. Moreover, to handle dynamic workloads, Nap adopts a fast NAL switch mechanism. We convert five state-of-the-art PM indexes using Nap. Evaluation on a four-node machine with Optane DC Persistent Memory shows that Nap can improve the throughput by up to 2.3× and 1.56× under write-intensive and read-intensive workloads, respectively.


Știrb, Iulia. "Extending NUMA-BTLP Algorithm with Thread Mapping Based on a Communication Tree." Computers 7, no. 4 (December 3, 2018): 66. http://dx.doi.org/10.3390/computers7040066.

Abstract:

The paper presents a Non-Uniform Memory Access (NUMA)-aware compiler optimization for task-level parallel code. The optimization is based on the Non-Uniform Memory Access - Balanced Task and Loop Parallelism (NUMA-BTLP) algorithm (Ştirb, 2018). The algorithm gets the type of each thread in the source code based on a static analysis of the code. After assigning a type to each thread, NUMA-BTLP (Ştirb, 2018) calls the NUMA-BTDM mapping algorithm (Ştirb, 2016), which uses the PThreads routine pthread_setaffinity_np to set the CPU affinities of the threads (i.e., thread-to-core associations) based on their type. The algorithms perform an improved thread mapping for NUMA systems by mapping threads that share data on the same core(s), allowing fast access to L1 cache data. The paper proves that PThreads-based task-level parallel code which is optimized by NUMA-BTLP (Ştirb, 2018) and NUMA-BTDM (Ştirb, 2016) at compile time runs efficiently in both time and energy on NUMA systems. The results show that the energy is optimized by up to 5% at the same execution time for one of the tested real benchmarks and by up to 15% for another benchmark running in an infinite loop. The algorithms can be used on real-time control systems such as client/server based applications which require efficient access to shared resources. Most often, task parallelism is used in the implementation of the server and loop parallelism is used for the client.
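
The general idea of placing threads that share data on the same core can be sketched as follows. This is not NUMA-BTLP or NUMA-BTDM: the sharing groups are assumed to be given rather than derived by static analysis, the round-robin placement is an assumption, and the helper names `assign_cores` and `pin_current_thread` are hypothetical. The pinning step uses Python's `os.sched_setaffinity`, which is available on Linux only, in place of `pthread_setaffinity_np`.

```python
import os


def assign_cores(sharing_groups, cores):
    """Map each thread to one core so that threads in a group share it.

    sharing_groups: list of lists of thread names that share data
    cores:          list of core ids available for placement
    """
    if not cores:
        raise ValueError("at least one core is required")
    mapping = {}
    for index, group in enumerate(sharing_groups):
        core = cores[index % len(cores)]  # round-robin groups over cores
        for thread_name in group:
            mapping[thread_name] = core
    return mapping


def pin_current_thread(core):
    """Restrict the caller to one core where the OS supports it.

    Returns True if the affinity was set, False otherwise.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False  # platform without this call, e.g. macOS or Windows
    try:
        os.sched_setaffinity(0, {core})  # 0 refers to the caller
    except OSError:
        return False  # core id not available to this process
    return True
```

A worker thread would look up its own name in the mapping returned by `assign_cores` and call `pin_current_thread` with that core before starting its work.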


Dissertations on the topic "Non-uniform memory access"

Alnowaiser, Khaled Abdulrahman. "Garbage collection optimization for non uniform memory access architectures." Thesis, University of Glasgow, 2016. http://theses.gla.ac.uk/7495/.

Abstract:

Cache-coherent non uniform memory access (ccNUMA) architecture is a standard design pattern for contemporary multicore processors, and future generations of architectures are likely to be NUMA. NUMA architectures create new challenges for managed runtime systems. Memory-intensive applications use the system's distributed memory banks to allocate data, and the automatic memory manager collects garbage left in these memory banks. The garbage collector may need to access remote memory banks, which entails access latency overhead and potential bandwidth saturation for the interconnection between memory banks. This dissertation makes five significant contributions to garbage collection on NUMA systems, with a case study implementation using the Hotspot Java Virtual Machine. It empirically studies data locality for a Stop-The-World garbage collector when tracing connected objects in NUMA heaps. First, it identifies a locality richness which exists naturally in connected objects that contain a root object and its reachable set ('rooted sub-graphs'). Second, this dissertation leverages the locality characteristic of rooted sub-graphs to develop a new NUMA-aware garbage collection mechanism. A garbage collector thread processes a local root and its reachable set, which is likely to have a large number of objects in the same NUMA node. Third, a garbage collector thread steals references from sibling threads that run on the same NUMA node to improve data locality. This research evaluates the new NUMA-aware garbage collector using seven benchmarks of an established real-world DaCapo benchmark suite. In addition, evaluation involves a widely used SPECjbb benchmark and Neo4J graph database Java benchmark, as well as an artificial benchmark. The results of the NUMA-aware garbage collector on a multi-hop NUMA architecture show an average of 15% performance improvement. Furthermore, this performance gain is shown to be as a result of an improved NUMA memory access in a ccNUMA system.
Fourth, the existing Hotspot JVM adaptive policy for configuring the number of garbage collection threads is shown to be suboptimal for current NUMA machines. The policy uses outdated assumptions and it generates a constant thread count. In fact, the Hotspot JVM still uses this policy in the production version. This research shows that the optimal number of garbage collection threads is application-specific and configuring the optimal number of garbage collection threads yields better collection throughput than the default policy. Fifth, this dissertation designs and implements a runtime technique, which involves heuristics from dynamic collection behavior to calculate an optimal number of garbage collector threads for each collection cycle. The results show an average of 21% improvements to the garbage collection performance for DaCapo benchmarks.


Book chapters on the topic "Non-uniform memory access"

Steele, Guy L., Xiaowei Shen, Josep Torrellas, Mark Tuckerman, Eric J. Bohm, Laxmikant V. Kalé, Glenn Martyna, et al. "Cm* - The First Non-Uniform Memory Access Architecture." In Encyclopedia of Parallel Computing, 297–303. Boston, MA: Springer US, 2011. http://dx.doi.org/10.1007/978-0-387-09766-4_14.


Rao, Ravishankar, Justin Wenck, Diana Franklin, Rajeevan Amirtharajah, and Venkatesh Akella. "Segmented Bitline Cache: Exploiting Non-uniform Memory Access Patterns." In High Performance Computing - HiPC 2006, 123–34. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11945918_17.


Gerofi, Balazs, Masamichi Takagi, and Yutaka Ishikawa. "Exploiting Hidden Non-uniformity of Uniform Memory Access on Manycore CPUs." In Lecture Notes in Computer Science, 242–53. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-14313-2_21.


Grosso, Roberto, Thomas Ertl, and Rainer Klier. "A Load Balancing Scheme for Parallelizing Hierarchical Splatting on a MPP System with a Non-uniform Memory Access Architecture." In High Performance Computing for Computer Graphics and Visualisation, 125–34. London: Springer London, 1996. http://dx.doi.org/10.1007/978-1-4471-1011-8_9.


Petersen, Wesley, and Peter Arbenz. "SIMD, Single Instruction Multiple Data." In Introduction to Parallel Computing. Oxford University Press, 2004. http://dx.doi.org/10.1093/oso/9780198515760.003.0008.

Abstract:

The single instruction, multiple data (SIMD) mode is the simplest method of parallelism and is now becoming the most common. In most cases this SIMD mode means the same as vectorization. Ten years ago, vector computers were expensive but reasonably simple to program. Today, encouraged by multimedia applications, vector hardware is now commonly available in Intel Pentium III and Pentium 4 PCs, and Apple/Motorola G-4 machines. In this chapter, we will cover both old and new and find that the old paradigms for programming were simpler because CMOS or ECL memories permitted easy non-unit stride memory access. Most of the ideas are the same, so the simpler programming methodology makes it easy to understand the concepts. As PC and Mac compilers improve, perhaps automatic vectorization will become as effective as on the older non-cache machines. In the meantime, on PCs and Macs we will often need to use intrinsics ([23, 22, 51]). It seems at first that the intrinsics keep a programmer close to the hardware, which is not a bad thing, but this is somewhat misleading. Hardware control in this method of programming is only indirect. Actual register assignments are made by the compiler and may not be quite what the programmer wants. The SSE2 or Altivec programming serves to illustrate a form of instruction level parallelism we wish to emphasize. This form, SIMD or vectorization, has single instructions which operate on multiple data. There are variants on this theme which use templates or macros which consist of multiple instructions carefully scheduled to accomplish the same objective, but are not strictly speaking SIMD, for example see Section 1.2.2.1. Intrinsics are C macros which contain one or more SIMD instructions to execute certain operations on multiple data, usually 4-words/time in our case. Data are explicitly declared as __m128 datatypes in the Intel SSE case and as vector variables using the G-4 Altivec. Our examples will show you how this works.
Four basic concepts are important:
Consistent with our notion that examples are the best way to learn, several will be illustrated:
• from linear algebra, the Level 1 basic linear algebra subprograms (BLAS)
  - vector updates (-axpy)
  - reduction operations and linear searches
• recurrence formulae and polynomial evaluations
• uniform random number generation.


Conference papers on the topic "Non-uniform memory access"

Shin, Wongyu, Jeongmin Yang, Jungwhan Choi, and Lee-Sup Kim. "NUAT: A non-uniform access time memory controller." In 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2014. http://dx.doi.org/10.1109/hpca.2014.6835956.


Guo, Xiaomei, and Haiyun Han. "The Research of a Memory Accesses Behavior on Non-Uniform Memory Access Architecture." In 2019 10th International Conference on Information Technology in Medicine and Education (ITME). IEEE, 2019. http://dx.doi.org/10.1109/itme.2019.00174.


Yao, Fan, Guru Venkataramani, and Miloš Doroslovački. "Covert Timing Channels Exploiting Non-Uniform Memory Access based Architectures." In GLSVLSI '17: Great Lakes Symposium on VLSI 2017. New York, NY, USA: ACM, 2017. http://dx.doi.org/10.1145/3060403.3060417.


Guo, Xiaomei. "A novel parallel FDTD algorithm on Non-Uniform Memory Access multiprocessors." In 2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS). IEEE, 2016. http://dx.doi.org/10.1109/icis.2016.7550921.


Guo, Xiaomei, and Haiyun Han. "A good data allocation strategy on non-uniform memory access architecture." In 2017 IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS). IEEE, 2017. http://dx.doi.org/10.1109/icis.2017.7960048.


Guo, Xiaomei, and Haiyun Han. "The Research of Several Situations About Memory Accessing on Non-Uniform Memory Access Architecture." In 2018 IEEE/ACIS 17th International Conference on Computer and Information Science (ICIS). IEEE, 2018. http://dx.doi.org/10.1109/icis.2018.8466393.


Yang, R., J. Antony, and A. P. Rendell. "A Simple Performance Model for Multithreaded Applications Executing on Non-uniform Memory Access Computers." In 2009 11th IEEE International Conference on High Performance Computing and Communications. IEEE, 2009. http://dx.doi.org/10.1109/hpcc.2009.39.


Marchi, Felipe, and Rafael Stubs Parpinelli. "Exploring the Non-Uniform Memory Access Parallel Architecture Applied to the Protein Structure Prediction Problem." In Congresso Brasileiro de Inteligência Computacional. SBIC, 2021. http://dx.doi.org/10.21528/cbic2021-97.

Abstract:

Proteins are base molecules present in live organisms. The study of their structures and functions is of considerable importance for many application fields, particularly for the pharmaceutical area. However, predicting the structure of a protein is considered a complex problem. As optimization methods for this problem have high execution times, a parallel algorithm was proposed. However, just employing parallelization is not enough to guarantee the efficient use of the available computational resources. In this work, the proposed PSP optimizer was executed in a system with NUMA architecture. To demonstrate the effects of this architecture on the execution of an algorithm with a simple parallel model, experiments were carried out. Results show that the improper execution of a parallel algorithm in this architecture may lead to performance loss.


Carothers, Christopher D., Kalyan S. Perumalla, and Richard M. Fujimoto. "The effect of state-saving in optimistic simulation on a cache-coherent non-uniform memory access architecture." In the 31st conference. New York, New York, USA: ACM Press, 1999. http://dx.doi.org/10.1145/324898.325340.


Plauth, Max, Wieland Hagen, Frank Feinbube, Felix Eberhardt, Lena Feinbube, and Andreas Polze. "Parallel Implementation Strategies for Hierarchical Non-uniform Memory Access Systems by Example of the Scale-Invariant Feature Transform Algorithm." In 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE, 2016. http://dx.doi.org/10.1109/ipdpsw.2016.47.
