Academic literature on the topic 'Non-uniform memory access'

This page lists journal articles, dissertations, book chapters, conference papers, and other scholarly sources on the topic 'Non-uniform memory access'.

Journal articles on the topic "Non-uniform memory access"

1

Lameter, Christoph. "An overview of non-uniform memory access." Communications of the ACM 56, no. 9 (September 2013): 59–64. http://dx.doi.org/10.1145/2500468.2500477.

2

Lameter, Christoph. "NUMA (Non-Uniform Memory Access): An Overview." Queue 11, no. 7 (July 2013): 40–51. http://dx.doi.org/10.1145/2508834.2513149.

3

Wang, Rui-bo, Kai Lu, and Xi-cheng Lu. "Aware conflict detection of non-uniform memory access system and prevention for transactional memory." Journal of Central South University 19, no. 8 (August 2012): 2266–71. http://dx.doi.org/10.1007/s11771-012-1270-4.

4

Motlagh, Bahman S., and Ronald F. DeMara. "Performance of Scalable Shared-Memory Architectures." Journal of Circuits, Systems and Computers 10, no. 01n02 (February 2000): 1–22. http://dx.doi.org/10.1142/s0218126600000068.

Abstract:
Analytical models were developed and simulations of memory latency were performed for Uniform Memory Access (UMA), Non-Uniform Memory Access (NUMA), Local-Remote-Global (LRG), and RCR architectures for hit rates from 0.1 to 0.9 in steps of 0.1, memory access times of 10 to 100 ns, proportions of read/write access from 0.01 to 0.1, and block sizes of 8 to 64 words. The RCR architecture performs favorably compared with UMA and NUMA architectures across all ranges of application and system parameters. RCR outperforms LRG architectures when the hit rate of the processor cache exceeds 80% and that of the replicated memory exceeds 25%. Thus, including a small replicated memory at each processor significantly reduces the expected access time, since all replicated-memory hits become independent of global traffic. For configurations of up to 32 processors, results show that latency is further reduced by distinguishing burst-mode transfers between isolated memory accesses and those that are incrementally outside the working set.
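The abstract does not reproduce the paper's analytical models. As a rough illustration of the kind of parameter sweep it describes, the sketch below uses a generic average-access-time formula, not the authors' UMA/NUMA/LRG/RCR models: a cache hit costs `t_cache`, and a miss is served by local memory with probability `local_fraction` and by remote memory otherwise. The function name and all latency values are assumptions chosen for illustration.

```python
def expected_latency(hit_rate, t_cache, t_local, t_remote, local_fraction):
    """Expected time per memory reference (ns) under a simple two-level model.

    Assumed model (not the paper's): hits cost t_cache; misses are served by
    local memory with probability local_fraction, else by remote memory.
    Setting t_local == t_remote gives a uniform (UMA-like) system.
    """
    for p in (hit_rate, local_fraction):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    t_miss = local_fraction * t_local + (1.0 - local_fraction) * t_remote
    return hit_rate * t_cache + (1.0 - hit_rate) * t_miss


if __name__ == "__main__":
    # Sweep hit rates 0.1..0.9 in steps of 0.1, the range used in the study.
    # The 2/50/75/100 ns latencies and 75% locality are illustrative assumptions.
    for i in range(1, 10):
        h = i / 10
        uma = expected_latency(h, 2, 75, 75, 1.0)
        numa = expected_latency(h, 2, 50, 100, 0.75)
        print(f"hit rate {h:.1f}: UMA-like {uma:5.1f} ns, NUMA-like {numa:5.1f} ns")
```

Because the miss term is weighted by `1 - hit_rate`, the gap between the two configurations shrinks as the hit rate rises, which is why cache and replicated-memory hit rates dominate such comparisons.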
5

Nikolopoulos, Dimitrios S., Ernest Artiaga, Eduard Ayguadé, and Jesús Labarta. "Scaling Non-Regular Shared-Memory Codes by Reusing Custom Loop Schedules." Scientific Programming 11, no. 2 (2003): 143–58. http://dx.doi.org/10.1155/2003/379739.

Abstract:
In this paper we explore the idea of customizing and reusing loop schedules to improve the scalability of non-regular numerical codes in shared-memory architectures with non-uniform memory access latency. The main objective is to implicitly set up affinity links between threads and data by devising loop schedules that achieve balanced work distribution within irregular data spaces, and by reusing them as much as possible throughout the execution of the program for better memory access locality. This transformation provides a great deal of flexibility in optimizing locality without compromising the simplicity of the shared-memory programming paradigm. In particular, the programmer does not need to explicitly distribute data between processors. The paper presents practical examples from real applications and experiments showing the efficiency of the approach.
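A minimal sketch of the schedule-reuse idea, assuming per-iteration costs are known in advance (for example, measured in a first pass): `balanced_chunks` computes contiguous iteration ranges of roughly equal cost once, and `cached_schedule` memoizes the result so that every later time step hands each thread the same range. Under a first-touch page placement policy, this tends to keep each thread working on the data it touched first. The function names and the greedy prefix-sum cutting rule are illustrative assumptions, not the paper's scheduler.

```python
from bisect import bisect_left
from functools import lru_cache
from itertools import accumulate


def balanced_chunks(costs, n_threads):
    """Split iterations 0..len(costs)-1 into n_threads contiguous [start, end)
    ranges whose total costs are roughly equal (greedy prefix-sum cuts)."""
    prefix = list(accumulate(costs))
    total = prefix[-1] if prefix else 0
    bounds, start = [], 0
    for t in range(1, n_threads):
        target = total * t / n_threads
        # Cut just after the first iteration whose running cost reaches target.
        cut = min(len(costs), max(start, bisect_left(prefix, target) + 1))
        bounds.append((start, cut))
        start = cut
    bounds.append((start, len(costs)))
    return bounds


@lru_cache(maxsize=None)
def cached_schedule(costs, n_threads):
    """Memoized schedule; costs must be a tuple so it can serve as a cache key."""
    return tuple(balanced_chunks(costs, n_threads))


if __name__ == "__main__":
    costs = (8, 1, 1, 1, 1, 1, 1, 2)  # irregular per-iteration work
    for step in range(3):  # every time step reuses the same schedule object
        print(step, cached_schedule(costs, 2))
```

Computing the partition once and reusing it trades a little adaptivity for stable thread-to-data affinity, which is the trade-off the abstract describes.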
6

Denoyelle, Nicolas, Brice Goglin, Aleksandar Ilic, Emmanuel Jeannot, and Leonel Sousa. "Modeling Non-Uniform Memory Access on Large Compute Nodes with the Cache-Aware Roofline Model." IEEE Transactions on Parallel and Distributed Systems 30, no. 6 (June 1, 2019): 1374–89. http://dx.doi.org/10.1109/tpds.2018.2883056.

7

Priya, Bhukya Krishna, and N. Ramasubramanian. "Improving the Lifetime of Phase Change Memory by Shadow Dynamic Random Access Memory." International Journal of Service Science, Management, Engineering, and Technology 12, no. 2 (March 2021): 154–68. http://dx.doi.org/10.4018/ijssmet.2021030109.

Abstract:
Emerging non-volatile memories (NVMs) are replacing conventional memory technologies because of their high cell density and low energy consumption. Limited write endurance is one of the major obstacles to adopting phase change memory (PCM) in real-time environments: because PCM cells can endure only a restricted number of writes, non-uniform writes and process variations can wear out cells that receive intensive writes. To prolong the lifetime of a PCM, an extra DRAM shadow memory is added to store the writes that come to the PCM and to level out the wear on the PCM. An extra address directory stores the addresses of data written to the DRAM, and a counter records the number of times each block is written to. Data is written back from the DRAM to the PCM based on the counter values, and is brought from the PCM into the DRAM as required. Experimental results show a reduction in overall PCM writes, which in turn improves PCM lifetime by 5% with low hardware and power overhead.
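The abstract describes the mechanism only at a high level. The sketch below is one possible reading with an assumed eviction rule: a small DRAM shadow absorbs writes destined for PCM, an address directory (`counts`) records how often each block is written, and when the shadow is full the least-written block is written back to PCM. The class name, capacity, and eviction policy are hypothetical; the point is only to show how counting writes lets hot blocks stay in DRAM so that their repeated writes never reach the PCM cells.

```python
class ShadowDRAM:
    """Hypothetical DRAM write shadow in front of a PCM (illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity  # number of blocks the DRAM shadow can hold
        self.counts = {}          # address directory: block -> write count
        self.pcm_writes = 0       # writes that actually reached the PCM

    def write(self, block):
        if block in self.counts:  # hit: the write is absorbed by DRAM
            self.counts[block] += 1
            return
        if len(self.counts) >= self.capacity:
            # Assumed policy: evict the least-written (coldest) block to PCM.
            victim = min(self.counts, key=self.counts.get)
            del self.counts[victim]
            self.pcm_writes += 1
        self.counts[block] = 1

    def flush(self):
        """Write every block still held in DRAM back to the PCM."""
        self.pcm_writes += len(self.counts)
        self.counts.clear()


if __name__ == "__main__":
    trace = [0, 0, 0, 0, 1, 2, 0, 0, 3, 0]  # block 0 is write-hot
    shadow = ShadowDRAM(capacity=2)
    for b in trace:
        shadow.write(b)
    shadow.flush()
    print(f"PCM writes: {shadow.pcm_writes} with shadow vs {len(trace)} without")
```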
8

Wittig, Robert, Philipp Schulz, Emil Matus, and Gerhard P. Fettweis. "Accurate Estimation of Service Rates in Interleaved Scratchpad Memory Systems." ACM Transactions on Embedded Computing Systems 21, no. 1 (January 31, 2022): 1–15. http://dx.doi.org/10.1145/3457171.

Abstract:
The prototyping of embedded platforms demands rapid exploration of multi-dimensional parameter sets. The design of the memory system is especially critical to guaranteeing high utilization while reducing conflicts. To aid the design process, several probabilistic models to estimate the throughput of interleaved memory systems have been proposed. While these models accurately estimate the average throughput of the system, they fail to determine the impact on individual processing elements. To mitigate this divergence, we extend three known models to include non-uniform access probabilities and priorities.
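The three extended models are not given in the abstract. As a simpler stand-in (not one of the paper's models), the sketch below computes per-element service rates for an interleaved memory under stated assumptions: in each cycle, processing element (PE) `i` requests bank `j` with probability `p[i][j]`; requests are independent; each bank serves one request per cycle; conflicts are resolved by fixed priority (PE 0 highest); and losing requests are dropped rather than retried. Under these assumptions the per-PE rates always sum to the expected number of busy banks per cycle, so an average-throughput figure alone hides how unevenly service is split between PEs.

```python
from math import prod


def service_rates(p):
    """Per-PE probability of being served in a cycle.

    p[i][j] is the (assumed independent) probability that processing element i
    requests bank j; PE 0 has the highest priority. PE i wins bank j only if
    no higher-priority PE requests the same bank in that cycle.
    """
    n_banks = len(p[0])
    return [
        sum(row[j] * prod(1 - p[k][j] for k in range(i)) for j in range(n_banks))
        for i, row in enumerate(p)
    ]


def total_throughput(p):
    """Expected number of banks that serve a request per cycle."""
    return sum(1 - prod(1 - row[j] for row in p) for j in range(len(p[0])))


if __name__ == "__main__":
    # Two PEs that both favor bank 0: a non-uniform access pattern.
    p = [[0.9, 0.1], [0.9, 0.1]]
    print(service_rates(p), total_throughput(p))
```

In the example, the high-priority PE is always served while the low-priority one is served only about 18% of the time, even though the total throughput (about 1.18 requests per cycle) looks reasonable.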
9

Wang, Qing, Youyou Lu, Junru Li, Minhui Xie, and Jiwu Shu. "Nap: Persistent Memory Indexes for NUMA Architectures." ACM Transactions on Storage 18, no. 1 (February 28, 2022): 1–35. http://dx.doi.org/10.1145/3507922.

Abstract:
We present Nap, a black-box approach that converts concurrent persistent memory (PM) indexes into non-uniform memory access (NUMA)-aware counterparts. Based on the observation that real-world workloads always feature skewed access patterns, Nap introduces a NUMA-aware layer (NAL) on top of existing concurrent PM indexes and steers accesses to hot items to this layer. The NAL maintains (1) per-node partial views in PM for serving insert/update/delete operations with failure atomicity and (2) a global view in DRAM for serving lookup operations. The NAL eliminates remote PM accesses to hot items without inducing extra local PM accesses. Moreover, to handle dynamic workloads, Nap adopts a fast NAL switch mechanism. We convert five state-of-the-art PM indexes using Nap. Evaluation on a four-node machine with Optane DC Persistent Memory shows that Nap can improve throughput by up to 2.3× and 1.56× under write-intensive and read-intensive workloads, respectively.
10

Știrb, Iulia. "Extending NUMA-BTLP Algorithm with Thread Mapping Based on a Communication Tree." Computers 7, no. 4 (December 3, 2018): 66. http://dx.doi.org/10.3390/computers7040066.

Abstract:
The paper presents a Non-Uniform Memory Access (NUMA)-aware compiler optimization for task-level parallel code. The optimization is based on the Non-Uniform Memory Access - Balanced Task and Loop Parallelism (NUMA-BTLP) algorithm (Ştirb, 2018). The algorithm determines the type of each thread in the source code through static analysis of the code. After assigning a type to each thread, NUMA-BTLP (Ştirb, 2018) calls the NUMA-BTDM mapping algorithm (Ştirb, 2016), which uses the Pthreads routine pthread_setaffinity_np to set the CPU affinities of the threads (i.e., thread-to-core associations) based on their type. The algorithms improve thread mapping on NUMA systems by placing threads that share data on the same core(s), allowing fast access to data in the L1 cache. The paper shows that Pthreads-based task-level parallel code optimized at compile time by NUMA-BTLP (Ştirb, 2018) and NUMA-BTDM (Ştirb, 2016) runs time- and energy-efficiently on NUMA systems. The results show that energy consumption is reduced by up to 5% at the same execution time for one of the tested real benchmarks and by up to 15% for another benchmark running in an infinite loop. The algorithms can be used in real-time control systems such as client/server-based applications that require efficient access to shared resources. Most often, task parallelism is used in the implementation of the server and loop parallelism is used for the client.
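The sketch below illustrates only the mapping decision, under an assumption that differs from the paper: instead of NUMA-BTLP's statically derived thread types, it groups threads that (directly or transitively) touch a common data item and assigns each group to one core, round-robin. The resulting thread-to-core table is what a C implementation would then apply with `pthread_setaffinity_np`; the function name and input format are hypothetical.

```python
def map_threads_to_cores(shared, n_cores):
    """Hypothetical thread-to-core mapping.

    shared: dict thread -> set of data items the thread touches.
    Threads that (transitively) share any data item form one group, and each
    group is placed on a single core, assigning groups to cores round-robin.
    """
    parent = {t: t for t in shared}  # union-find forest over threads

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    owner = {}  # data item -> first thread seen touching it
    for t, items in shared.items():
        for d in items:
            if d in owner:
                parent[find(t)] = find(owner[d])  # merge the two groups
            else:
                owner[d] = t

    groups = {}
    for t in shared:
        groups.setdefault(find(t), []).append(t)

    mapping = {}
    for core, members in enumerate(groups.values()):
        for t in members:
            mapping[t] = core % n_cores
    return mapping


if __name__ == "__main__":
    shared = {"a": {"x"}, "b": {"x", "y"}, "c": {"z"}, "d": {"y"}}
    print(map_threads_to_cores(shared, n_cores=4))
```

Co-locating sharers on one core is what allows them to hit in the same L1 cache, at the cost of possible load imbalance when a sharing group is large; a real mapper would have to weigh both.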

Dissertations / Theses on the topic "Non-uniform memory access"

1

Alnowaiser, Khaled Abdulrahman. "Garbage collection optimization for non uniform memory access architectures." Thesis, University of Glasgow, 2016. http://theses.gla.ac.uk/7495/.

Abstract:
Cache-coherent non-uniform memory access (ccNUMA) architecture is a standard design pattern for contemporary multicore processors, and future generations of architectures are likely to be NUMA. NUMA architectures create new challenges for managed runtime systems. Memory-intensive applications use the system's distributed memory banks to allocate data, and the automatic memory manager collects garbage left in these memory banks. The garbage collector may need to access remote memory banks, which entails access latency overhead and potential bandwidth saturation on the interconnect between memory banks. This dissertation makes five significant contributions to garbage collection on NUMA systems, with a case study implementation using the Hotspot Java Virtual Machine. It empirically studies data locality for a Stop-The-World garbage collector when tracing connected objects in NUMA heaps. First, it identifies a rich locality that exists naturally in connected objects consisting of a root object and its reachable set, termed ‘rooted sub-graphs’. Second, this dissertation leverages the locality characteristic of rooted sub-graphs to develop a new NUMA-aware garbage collection mechanism: a garbage collector thread processes a local root and its reachable set, which is likely to have a large number of objects on the same NUMA node. Third, a garbage collector thread steals references from sibling threads that run on the same NUMA node to improve data locality. This research evaluates the new NUMA-aware garbage collector using seven benchmarks from the established real-world DaCapo benchmark suite. The evaluation also includes the widely used SPECjbb benchmark, a Neo4J graph database Java benchmark, and an artificial benchmark. On a multi-hop NUMA architecture, the NUMA-aware garbage collector shows an average performance improvement of 15%.
Furthermore, this performance gain is shown to result from improved NUMA memory access in a ccNUMA system. Fourth, the existing Hotspot JVM adaptive policy for configuring the number of garbage collection threads is shown to be suboptimal for current NUMA machines. The policy relies on outdated assumptions and generates a constant thread count; in fact, the production version of the Hotspot JVM still uses this policy. This research shows that the optimal number of garbage collection threads is application-specific and that configuring the optimal number yields better collection throughput than the default policy. Fifth, this dissertation designs and implements a runtime technique that uses heuristics based on dynamic collection behavior to calculate an optimal number of garbage collector threads for each collection cycle. The results show an average 21% improvement in garbage collection performance for the DaCapo benchmarks.
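A minimal sketch of node-local work stealing between garbage collector threads, assuming each thread owns a deque of pending references and that a thread with no work of its own tries same-node siblings first. Falling back to remote-node victims once local siblings are empty is an assumption added here, as are all names; this is not the Hotspot implementation.

```python
import random
from collections import deque


def steal_order(thief, node_of, rng=random):
    """Victims for GC thread `thief`: same-NUMA-node siblings first, then
    threads on other nodes (each group shuffled to spread contention)."""
    home = node_of[thief]
    local = [t for t in node_of if t != thief and node_of[t] == home]
    remote = [t for t in node_of if node_of[t] != home]
    rng.shuffle(local)
    rng.shuffle(remote)
    return local + remote


def try_steal(thief, queues, node_of, rng=random):
    """Take one reference from the first non-empty victim deque, or None.

    Owners push and pop at the right end; thieves take from the left end,
    as in conventional work-stealing deques."""
    for victim in steal_order(thief, node_of, rng):
        if queues[victim]:
            return queues[victim].popleft()
    return None


if __name__ == "__main__":
    node_of = {0: 0, 1: 0, 2: 1, 3: 1}  # GC thread -> NUMA node
    queues = {0: deque(), 1: deque(["a"]), 2: deque(["b"]), 3: deque()}
    rng = random.Random(42)
    print(try_steal(0, queues, node_of, rng))  # local sibling 1 is tried first
```

Preferring same-node victims means stolen references are more likely to point into memory on the thief's own node, which is the locality benefit the dissertation targets.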

Book chapters on the topic "Non-uniform memory access"

1

Steele, Guy L., Xiaowei Shen, Josep Torrellas, Mark Tuckerman, Eric J. Bohm, Laxmikant V. Kalé, Glenn Martyna, et al. "Cm* - The First Non-Uniform Memory Access Architecture." In Encyclopedia of Parallel Computing, 297–303. Boston, MA: Springer US, 2011. http://dx.doi.org/10.1007/978-0-387-09766-4_14.

2

Rao, Ravishankar, Justin Wenck, Diana Franklin, Rajeevan Amirtharajah, and Venkatesh Akella. "Segmented Bitline Cache: Exploiting Non-uniform Memory Access Patterns." In High Performance Computing - HiPC 2006, 123–34. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11945918_17.

3

Gerofi, Balazs, Masamichi Takagi, and Yutaka Ishikawa. "Exploiting Hidden Non-uniformity of Uniform Memory Access on Manycore CPUs." In Lecture Notes in Computer Science, 242–53. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-14313-2_21.

4

Grosso, Roberto, Thomas Ertl, and Rainer Klier. "A Load Balancing Scheme for Parallelizing Hierarchical Splatting on a MPP System with a Non-uniform Memory Access Architecture." In High Performance Computing for Computer Graphics and Visualisation, 125–34. London: Springer London, 1996. http://dx.doi.org/10.1007/978-1-4471-1011-8_9.

5

Petersen, Wesley, and Peter Arbenz. "SIMD, Single Instruction Multiple Data." In Introduction to Parallel Computing. Oxford University Press, 2004. http://dx.doi.org/10.1093/oso/9780198515760.003.0008.

Abstract:
The single instruction, multiple data (SIMD) mode is the simplest method of parallelism and is now becoming the most common. In most cases this SIMD mode means the same as vectorization. Ten years ago, vector computers were expensive but reasonably simple to program. Today, encouraged by multimedia applications, vector hardware is commonly available in Intel Pentium III and Pentium 4 PCs, and Apple/Motorola G-4 machines. In this chapter, we will cover both old and new and find that the old paradigms for programming were simpler because CMOS or ECL memories permitted easy non-unit stride memory access. Most of the ideas are the same, so the simpler programming methodology makes it easy to understand the concepts. As PC and Mac compilers improve, perhaps automatic vectorization will become as effective as on the older non-cache machines. In the meantime, on PCs and Macs we will often need to use intrinsics ([23, 22, 51]). It seems at first that intrinsics keep a programmer close to the hardware, which is not a bad thing, but this is somewhat misleading. Hardware control in this method of programming is only indirect: actual register assignments are made by the compiler and may not be quite what the programmer wants. SSE2 or Altivec programming serves to illustrate a form of instruction-level parallelism we wish to emphasize. This form, SIMD or vectorization, has single instructions that operate on multiple data. There are variants on this theme which use templates or macros consisting of multiple instructions carefully scheduled to accomplish the same objective, but these are not, strictly speaking, SIMD; for example, see Section 1.2.2.1. Intrinsics are C macros that contain one or more SIMD instructions to execute certain operations on multiple data, usually four words at a time in our case. Data are explicitly declared as __m128 datatypes in the Intel SSE case and as vector variables using the G-4 Altivec. Our examples will show you how this works.
Four basic concepts are important:
Consistent with our notion that examples are the best way to learn, several will be illustrated:
• from linear algebra, the Level 1 basic linear algebra subprograms (BLAS)
  - vector updates (-axpy)
  - reduction operations and linear searches
• recurrence formulae and polynomial evaluations
• uniform random number generation.

Conference papers on the topic "Non-uniform memory access"

1

Shin, Wongyu, Jeongmin Yang, Jungwhan Choi, and Lee-Sup Kim. "NUAT: A non-uniform access time memory controller." In 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2014. http://dx.doi.org/10.1109/hpca.2014.6835956.

2

Guo, Xiaomei, and Haiyun Han. "The Research of a Memory Accesses Behavior on Non-Uniform Memory Access Architecture." In 2019 10th International Conference on Information Technology in Medicine and Education (ITME). IEEE, 2019. http://dx.doi.org/10.1109/itme.2019.00174.

3

Yao, Fan, Guru Venkataramani, and Miloš Doroslovački. "Covert Timing Channels Exploiting Non-Uniform Memory Access based Architectures." In GLSVLSI '17: Great Lakes Symposium on VLSI 2017. New York, NY, USA: ACM, 2017. http://dx.doi.org/10.1145/3060403.3060417.

4

Guo, Xiaomei. "A novel parallel FDTD algorithm on Non-Uniform Memory Access multiprocessors." In 2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS). IEEE, 2016. http://dx.doi.org/10.1109/icis.2016.7550921.

5

Guo, Xiaomei, and Haiyun Han. "A good data allocation strategy on non-uniform memory access architecture." In 2017 IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS). IEEE, 2017. http://dx.doi.org/10.1109/icis.2017.7960048.

6

Guo, Xiaomei, and Haiyun Han. "The Research of Several Situations About Memory Accessing on Non-Uniform Memory Access Architecture." In 2018 IEEE/ACIS 17th International Conference on Computer and Information Science (ICIS). IEEE, 2018. http://dx.doi.org/10.1109/icis.2018.8466393.

7

Yang, R., J. Antony, and A. P. Rendell. "A Simple Performance Model for Multithreaded Applications Executing on Non-uniform Memory Access Computers." In 2009 11th IEEE International Conference on High Performance Computing and Communications. IEEE, 2009. http://dx.doi.org/10.1109/hpcc.2009.39.

8

Marchi, Felipe, and Rafael Stubs Parpinelli. "Exploring the Non-Uniform Memory Access Parallel Architecture Applied to the Protein Structure Prediction Problem." In Congresso Brasileiro de Inteligência Computacional. SBIC, 2021. http://dx.doi.org/10.21528/cbic2021-97.

Abstract:
Proteins are fundamental molecules present in living organisms. The study of their structures and functions is of considerable importance for many application fields, particularly the pharmaceutical area. However, predicting the structure of a protein is considered a complex problem. Because optimization methods for this problem have high execution times, a parallel algorithm was proposed. However, parallelization alone is not enough to guarantee efficient use of the available computational resources. In this work, the proposed protein structure prediction (PSP) optimizer was executed on a system with a NUMA architecture. Experiments were carried out to demonstrate the effects of this architecture on the execution of an algorithm with a simple parallel model. The results show that improper execution of a parallel algorithm on this architecture may lead to performance loss.
9

Carothers, Christopher D., Kalyan S. Perumalla, and Richard M. Fujimoto. "The effect of state-saving in optimistic simulation on a cache-coherent non-uniform memory access architecture." In the 31st conference. New York, New York, USA: ACM Press, 1999. http://dx.doi.org/10.1145/324898.325340.

10

Plauth, Max, Wieland Hagen, Frank Feinbube, Felix Eberhardt, Lena Feinbube, and Andreas Polze. "Parallel Implementation Strategies for Hierarchical Non-uniform Memory Access Systems by Example of the Scale-Invariant Feature Transform Algorithm." In 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE, 2016. http://dx.doi.org/10.1109/ipdpsw.2016.47.
