Selected scientific literature on the topic "Tradeoff bandwidth/memory"

Create an accurate reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the list of current articles, books, theses, conference proceedings, and other scholarly sources relevant to the topic "Tradeoff bandwidth/memory".

Next to each source in the reference list there is an "Add to bibliography" button. Click it and we will automatically generate the bibliographic citation of the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scientific publication in .pdf format and read its abstract online if it is present in the metadata.

Journal articles on the topic "Tradeoff bandwidth/memory"

1

Ahmadi, Mahdieh, James Roberts, Emilio Leonardi, and Ali Movaghar. "Cache Subsidies for an Optimal Memory for Bandwidth Tradeoff in the Access Network". IEEE Journal on Selected Areas in Communications 38, no. 4 (April 2020): 736–49. http://dx.doi.org/10.1109/jsac.2020.2971806.

Full text of the source
Styles: ABNT, Harvard, Vancouver, APA, etc.
2

Martin, Milo M. K., Pacia J. Harper, Daniel J. Sorin, Mark D. Hill, and David A. Wood. "Using destination-set prediction to improve the latency/bandwidth tradeoff in shared-memory multiprocessors". ACM SIGARCH Computer Architecture News 31, no. 2 (May 2003): 206–17. http://dx.doi.org/10.1145/871656.859642.

Full text of the source
Styles: ABNT, Harvard, Vancouver, APA, etc.
3

Kukreja, Navjot, Jan Hückelheim, Mathias Louboutin, John Washbourne, Paul H. J. Kelly, and Gerard J. Gorman. "Lossy checkpoint compression in full waveform inversion: a case study with ZFPv0.5.5 and the overthrust model". Geoscientific Model Development 15, no. 9 (May 12, 2022): 3815–29. http://dx.doi.org/10.5194/gmd-15-3815-2022.

Full text of the source
Abstract:
Abstract. This paper proposes a new method that combines checkpointing methods with error-controlled lossy compression for large-scale high-performance full-waveform inversion (FWI), an inverse problem commonly used in geophysical exploration. This combination can significantly reduce data movement, allowing a reduction in run time as well as peak memory. In the exascale computing era, frequent data transfer (e.g., memory bandwidth, PCIe bandwidth for GPUs, or network) is the performance bottleneck rather than the peak FLOPS of the processing unit. Like many other adjoint-based optimization problems, FWI is costly in terms of the number of floating-point operations, large memory footprint during backpropagation, and data transfer overheads. Past work for adjoint methods has developed checkpointing methods that reduce the peak memory requirements during backpropagation at the cost of additional floating-point computations. Combining this traditional checkpointing with error-controlled lossy compression, we explore the three-way tradeoff between memory, precision, and time to solution. We investigate how approximation errors introduced by lossy compression of the forward solution impact the objective function gradient and final inverted solution. Empirical results from these numerical experiments indicate that high lossy-compression rates (compression factors ranging up to 100) have a relatively minor impact on convergence rates and the quality of the final solution.
Styles: ABNT, Harvard, Vancouver, APA, etc.
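A minimal sketch of the checkpoint-and-recompute pattern described in Kukreja et al.'s abstract, assuming a toy forward recurrence and using float16 storage as a crude stand-in for error-controlled lossy compression (the paper itself applies ZFP inside an FWI solver); all names here are illustrative:

```python
import numpy as np

def forward_step(u):
    # Toy stand-in for one time step of the forward wavefield solve.
    return 0.5 * u + np.sin(u)

def backward_with_checkpoints(u0, n_steps, every=10, compress=True):
    """Keep only every `every`-th forward state (optionally stored in reduced
    precision), then recompute the skipped states during the backward sweep.
    Peak checkpoint memory shrinks roughly `every`-fold, at the price of one
    extra forward sweep plus the compression error."""
    checkpoints = {}
    u = u0.copy()
    for t in range(n_steps):
        if t % every == 0:
            checkpoints[t] = u.astype(np.float16) if compress else u.copy()
        u = forward_step(u)

    for t in reversed(range(n_steps)):
        base = (t // every) * every
        v = checkpoints[base].astype(np.float64)
        for _ in range(t - base):          # regenerate the state at time t
            v = forward_step(v)
        # ... adjoint/gradient update against the recomputed state `v` goes here

if __name__ == "__main__":
    backward_with_checkpoints(np.random.rand(10_000), n_steps=200, every=20)
```

Raising `every` lowers peak memory but increases recomputation and, with lossy storage, the approximation error; this is the three-way memory/precision/time tradeoff the paper quantifies.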
4

Singh, Shikha, Prashant Pandey, Michael A. Bender, Jonathan W. Berry, Martín Farach-Colton, Rob Johnson, Thomas M. Kroeger, and Cynthia A. Phillips. "Timely Reporting of Heavy Hitters Using External Memory". ACM Transactions on Database Systems 46, no. 4 (December 31, 2021): 1–35. http://dx.doi.org/10.1145/3472392.

Full text of the source
Abstract:
Given an input stream S of size N, a ɸ-heavy hitter is an item that occurs at least ɸN times in S. The problem of finding heavy hitters is extensively studied in the database literature. We study a real-time heavy-hitters variant in which an element must be reported shortly after we see its T = ɸN-th occurrence (and hence it becomes a heavy hitter). We call this the Timely Event Detection (TED) Problem. The TED problem models the needs of many real-world monitoring systems, which demand accurate (i.e., no false negatives) and timely reporting of all events from large, high-speed streams with a low reporting threshold (high sensitivity). Like the classic heavy-hitters problem, solving the TED problem without false positives requires large space (Ω(N) words). Thus, in-RAM heavy-hitters algorithms typically sacrifice accuracy (i.e., allow false positives), sensitivity, or timeliness (i.e., use multiple passes). We show how to adapt heavy-hitters algorithms to external memory to solve the TED problem on large high-speed streams while guaranteeing accuracy, sensitivity, and timeliness. Our data structures are limited only by I/O bandwidth (not latency) and support a tunable tradeoff between reporting delay and I/O overhead. With a small bounded reporting delay, our algorithms incur only a logarithmic I/O overhead. We implement and validate our data structures empirically using the Firehose streaming benchmark. Multi-threaded versions of our structures can scale to process 11M observations per second before becoming CPU bound. In comparison, a naive adaptation of the standard heavy-hitters algorithm to external memory would be limited by the storage device's random I/O throughput, i.e., ≈100K observations per second.
Styles: ABNT, Harvard, Vancouver, APA, etc.
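The reporting rule defined in Singh et al.'s abstract (report an item shortly after its ⌈ɸN⌉-th occurrence) is easy to state in code. The sketch below is the exact in-RAM baseline whose memory cost motivates the paper's external-memory structures; it only illustrates the Timely Event Detection condition, not the authors' data structures:

```python
import math
from collections import defaultdict

def timely_heavy_hitters(stream, phi):
    """Report (index, item) the moment an item reaches its ceil(phi*N)-th
    occurrence. Exact counting needs one counter per distinct item, which is
    the in-RAM cost that external-memory TED structures are built to avoid."""
    stream = list(stream)                      # toy version: N is known up front
    threshold = math.ceil(phi * len(stream))
    counts = defaultdict(int)
    reports = []
    for i, item in enumerate(stream):
        counts[item] += 1
        if counts[item] == threshold:          # fires exactly once per heavy hitter
            reports.append((i, item))
    return reports

print(timely_heavy_hitters("abacabadabacabae", phi=0.25))  # [(6, 'a'), (13, 'b')]
```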
5

Gabbay, Freddy, Rotem Lev Aharoni, and Ori Schweitzer. "Deep Neural Network Memory Performance and Throughput Modeling and Simulation Framework". Mathematics 10, no. 21 (November 6, 2022): 4144. http://dx.doi.org/10.3390/math10214144.

Full text of the source
Abstract:
Deep neural networks (DNNs) are widely used in various artificial intelligence applications and platforms, such as sensors in internet of things (IoT) devices, speech and image recognition in mobile systems, and web searching in data centers. While DNNs achieve remarkable prediction accuracy, they introduce major computational and memory bandwidth challenges due to the increasing model complexity and the growing amount of data used for training and inference. These challenges introduce major difficulties not only due to the constraints of system cost, performance, and energy consumption, but also due to limitations in currently available memory bandwidth. The recent advances in semiconductor technologies have further intensified the gap between computational hardware performance and memory systems bandwidth. Consequently, memory systems are, today, a major performance bottleneck for DNN applications. In this paper, we present DRAMA, a deep neural network memory simulator. DRAMA extends the SCALE-Sim simulator for DNN inference on systolic arrays with a detailed, accurate, and extensive modeling and simulation environment of the memory system. DRAMA can simulate in detail the hierarchical main memory components—such as memory channels, modules, ranks, and banks—and related timing parameters. In addition, DRAMA can explore tradeoffs for memory system performance and identify bottlenecks for different DNNs and memory architectures. We demonstrate DRAMA’s capabilities through a set of experimental simulations based on several use cases.
Styles: ABNT, Harvard, Vancouver, APA, etc.
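A quick roofline-style estimate (not part of DRAMA, and with made-up peak figures) illustrates why Gabbay et al. call memory systems the main bottleneck for DNN workloads: a layer can never finish faster than its memory traffic allows.

```python
def layer_time_bound(flops, bytes_moved, peak_flops, peak_bw):
    """Roofline lower bound: execution time is at least the larger of the
    pure-compute time and the pure-memory-traffic time."""
    t_compute = flops / peak_flops
    t_memory = bytes_moved / peak_bw
    regime = "bandwidth-bound" if t_memory > t_compute else "compute-bound"
    return max(t_compute, t_memory), regime

# Hypothetical accelerator: 100 TFLOP/s peak, 1 TB/s of DRAM bandwidth.
# A batch-1 fully connected layer with 8192x8192 fp16 weights moves ~134 MB
# of weights for only ~134 MFLOPs of work, so it is firmly bandwidth-bound.
t, regime = layer_time_bound(flops=2 * 8192 * 8192,
                             bytes_moved=2 * 8192 * 8192,
                             peak_flops=100e12,
                             peak_bw=1e12)
print(f"{t * 1e6:.1f} us lower bound, {regime}")
```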
6

Radwan, Amr, Taghreed Ali Alenezi, Wejdan Alrashdan, and Won-Joo Hwang. "Balancing Tradeoffs in Network Queue Management Problem via Forward–Backward Sweeping with Finite Checkpoints". Symmetry 15, no. 7 (July 10, 2023): 1395. http://dx.doi.org/10.3390/sym15071395.

Full text of the source
Abstract:
Network queue management can be modelled as an optimal control problem aimed at controlling the dropping rate, in which the state and control variables are the instantaneous queue length and the dropping rate, respectively. One way to solve it is with an indirect method, namely forward–backward sweeping based on the Pontryagin minimum principle, to derive the control trajectory of the dropping rate. However, there exist some performance balance issues in the network queue, such as memory usage versus runtime of the algorithm, or dropping rate versus network queue length. Many researchers have exploited symmetry for constrained systems, controllers, and model predictive control problems to achieve an exponential memory reduction and simple, intuitive optimal controllers. In this article, we introduce the integration of the checkpointing method into forward–backward sweeping to address such balancing issues. Specifically, we exploit the revolve algorithm in checkpointing and choose a finite number of checkpoints to reduce the complexity. Both numerical and simulation results in a popular network simulator (ns-2) are provided through two experiments, varying bandwidth and offered load, which solidify our proposal in comparison to other deployed queue management algorithms.
Styles: ABNT, Harvard, Vancouver, APA, etc.
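For a feel of the balance Radwan et al. describe, the memory and recomputation cost of the simplest (uniformly spaced) checkpointing scheme can be tabulated as below; the revolve algorithm used in the paper places checkpoints optimally rather than uniformly, so these numbers are only indicative:

```python
def uniform_checkpoint_cost(n_steps, n_checkpoints):
    """Cost of a forward-backward sweep that stores n_checkpoints evenly spaced
    states and regenerates each segment once during the backward pass:
    returns (peak states held in memory, extra forward steps recomputed)."""
    seg = -(-n_steps // n_checkpoints)           # ceiling division: segment length
    peak_memory = n_checkpoints + seg            # stored checkpoints + one live segment
    extra_forward = n_steps - n_checkpoints      # roughly one additional forward sweep
    return peak_memory, extra_forward

for c in (1, 4, 16, 64):
    mem, extra = uniform_checkpoint_cost(1024, c)
    print(f"checkpoints={c:3d}  peak states={mem:4d}  recomputed steps={extra}")
```

With this single-level scheme peak memory is minimized near c ≈ √n; multi-level schemes such as revolve push it down further at the cost of a controlled amount of extra recomputation.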
7

Biryukov, Alex, and Dmitry Khovratovich. "Equihash: Asymmetric Proof-of-Work Based on the Generalized Birthday Problem". Ledger 2 (April 28, 2017): 1–30. http://dx.doi.org/10.5195/ledger.2017.48.

Full text of the source
Abstract:
Proof-of-work is a central concept in modern cryptocurrencies and denial-of-service protection tools, but the requirement for fast verification has so far made it easy prey for GPU-, ASIC-, and botnet-equipped users. Attempts to rely on memory-intensive computations in order to remedy the disparity between architectures have resulted in slow or broken schemes. In this paper we solve this open problem and show how to construct an asymmetric proof-of-work (PoW) based on a computationally hard problem, which requires a great deal of memory to generate a proof (a "memory-hardness" feature) but is instant to verify. Our primary proposal, Equihash, is a PoW based on the generalized birthday problem and an enhanced Wagner's algorithm for it. We introduce the new technique of algorithm binding to prevent cost amortization and demonstrate that possible parallel implementations are constrained by memory bandwidth. Our scheme has tunable and steep time-space tradeoffs, which impose large computational penalties if less memory is used. Our solution is practical and ready to deploy: a reference implementation of a proof-of-work requiring 700 MB of RAM runs in 15 seconds on a 2.1 GHz CPU, increases the computations by a factor of 1000 if memory is halved, and produces a proof of just 120 bytes.
Styles: ABNT, Harvard, Vancouver, APA, etc.
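A toy single-collision birthday search (a drastic simplification of the Wagner-style k-list algorithm behind Equihash, with illustrative parameters) shows the flavor of the time/memory tradeoff Biryukov and Khovratovich describe: with a table the search costs about 2^(n/2) hash calls and as many table entries, while with little memory it degrades toward 2^n work.

```python
import hashlib
from itertools import count

def h(x: int, bits: int) -> int:
    """First `bits` bits of SHA-256 over the integer's decimal encoding."""
    digest = hashlib.sha256(str(x).encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def birthday_collision(bits: int):
    """Find a, b with h(a) == h(b) using a hash table: about 2^(bits/2) calls
    and 2^(bits/2) stored entries. With only half the memory the search must be
    split into extra passes, the kind of computational penalty a memory-hard
    proof-of-work is designed to impose."""
    seen = {}
    for x in count():
        v = h(x, bits)
        if v in seen:
            return seen[v], x
        seen[v] = x

print(birthday_collision(bits=24))   # expect a collision after a few thousand calls
```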
8

Peterson, Brennan, Michael Kwan, Fred Duewer, Andrew Reid, and Rhiannon Brooks. "Optimizing X-Ray Inspection for Advanced Packaging Applications". International Symposium on Microelectronics 2020, no. 1 (September 1, 2020): 000165–68. http://dx.doi.org/10.4071/2380-4505-2020.1.000165.

Full text of the source
Abstract:
Over the coming decade, advanced packaging will become increasingly critical to performance, cost, and density improvements in advanced electronics. There is both an industry push: cost and performance advances in transistor scaling are increasingly difficult. And there is an industry pull: customization for each market can be done far more quickly by assembling a series of parts in a package, rather than by design and integration into a single device. This isn't a new idea: Gordon Moore said the same in the 1960s. But after decades of increased device-level integration, it is an important change. Figure 1 shows an example (future) device: there are large bumps, hybrid bonds for extreme-bandwidth and low-latency connection to cache memory, TSV-based DRAM, and multiple CPU-to-CPU interconnects. Each of these is a failure point. [Figure 1: The wide variety of interconnects on future advanced packages. Figure 2: The triangle of misery as applied to standard and advanced X-ray imaging (AXI).] Manufacturing will necessarily advance in the packaging arena: pin density and package size will both increase to support the high bandwidth and device integration demands. The downside of multiple-device integration is a higher set of requirements on the reliability of both the individual devices and the fully assembled system. This is an opportunity to take advantage of new strategies and technologies in package inspection. The sampling challenges for both control and inspection for high reliability require systems that can run at 100% coverage and millions of units per year. An overview of reliability sampling challenges as they relate to end-of-line inspection, as well as sampling for both defect type and incidence, is critical to understanding how and what to measure to maximize yield. There are fundamental tradeoffs between speed, resolution, and signal-to-noise ratio that inform a systematic engineering understanding of inspection. Optimizing that tradeoff specifically for semiconductor inspection leads to dedicated tools with extremely high resolution, speed, and low dose. In parallel with the speed requirements, sensitivity and noise immunity can be improved with an understanding of the systematic sources of noise. These can be mitigated and even eliminated with novel algorithms for both image enhancement and defect location.
Styles: ABNT, Harvard, Vancouver, APA, etc.
9

Alqahtani, Fahad, Mohammed Almutairi, and Frederick T. Sheldon. "Cloud Security Using Fine-Grained Efficient Information Flow Tracking". Future Internet 16, no. 4 (March 25, 2024): 110. http://dx.doi.org/10.3390/fi16040110.

Full text of the source
Abstract:
This study provides a comprehensive review and comparative analysis of existing Information Flow Tracking (IFT) tools, which underscores the imperative of mitigating data leakage in complex cloud systems. Traditional methods impose significant overhead on Cloud Service Providers (CSPs) and management activities, prompting the exploration of alternatives such as IFT. By augmenting consumer data subsets with security tags and deploying a network of monitors, IFT facilitates the detection and prevention of data leaks among cloud tenants. The research here has focused on preventing misuse, such as the exfiltration and/or extrusion of sensitive data in the cloud, as well as the role of anonymization. The CloudMonitor framework was envisioned and developed to study and design mechanisms for transparent and efficient IFT (eIFT). The framework enables the experimentation, analysis, and validation of innovative methods for providing greater control to cloud service consumers (CSCs) over their data. Moreover, eIFT enables enhanced visibility to assess data conveyances by third-party services toward avoiding security risks (e.g., data exfiltration). Our implementation and validation of the framework uses both a centralized and a dynamic IFT approach to achieve these goals. We measured the balance between dynamism and granularity of the data being tracked versus efficiency. To establish a security and performance baseline for better defense in depth, this work focuses primarily on unique dynamic IFT tracking capabilities using, e.g., Infrastructure as a Service (IaaS). Consumers and service providers can negotiate specific security enforcement standards using our framework. Thus, this study orchestrates and assesses, using a series of real-world experiments, how distinct monitoring capabilities combine to provide a comparatively higher level of security. Input/output performance was evaluated for execution time and resource utilization using several experiments. The results show that the performance is unaffected by the magnitude of the input/output data that is tracked. In other words, as the volume of data increases, the execution time grows linearly; however, this increase occurs at a rate that is notably slower than what would be anticipated in a strictly proportional relationship. The system achieves an average CPU and memory consumption overhead profile of 8% and 37% while completing all of the validation test runs in less than one second. The results establish a performance efficiency baseline for a better measure and understanding of the cost of preserving confidentiality, integrity, and availability (CIA) for cloud Consumers and Providers (C&P). Consumers can scrutinize the benefits (i.e., security) and the tradeoffs (memory usage, bandwidth, CPU usage, and throughput), so that the cost of ensuring CIA can be established, monitored, and controlled. This work provides the primary use cases, a formula for enforcing the rules of data isolation, a data-tracking policy framework, and the basis for managing confidential data flow and data leak prevention using the CloudMonitor framework.
Styles: ABNT, Harvard, Vancouver, APA, etc.
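The tag-and-propagate mechanism summarized in Alqahtani et al.'s abstract can be illustrated with a generic dynamic taint-tracking sketch (this is not the CloudMonitor API; names and labels are invented for the example): every derived value carries the union of its sources' security tags, and a sink check blocks flows whose tags exceed the destination's clearance.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Tagged:
    """A value plus the security labels of everything it was derived from."""
    value: object
    tags: frozenset = field(default_factory=frozenset)

def combine(op, a: Tagged, b: Tagged) -> Tagged:
    # Propagation rule: a result is tainted by the union of its inputs' tags.
    return Tagged(op(a.value, b.value), a.tags | b.tags)

def send_to(clearance: frozenset, data: Tagged):
    # Sink monitor: refuse the flow if the data carries a label the destination lacks.
    if not data.tags <= clearance:
        raise PermissionError(f"flow blocked, uncovered tags: {set(data.tags - clearance)}")
    print("sent:", data.value)

salary = Tagged(120_000, frozenset({"tenant-A:confidential"}))
bonus = Tagged(10_000)
total = combine(lambda x, y: x + y, salary, bonus)

send_to(frozenset({"tenant-A:confidential"}), total)   # allowed
try:
    send_to(frozenset(), total)                        # untrusted third-party sink: blocked
except PermissionError as err:
    print(err)
```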
10

Yin, Jun, and Marian Verhelst. "CNN-based Robust Sound Source Localization with SRP-PHAT for the Extreme Edge". ACM Transactions on Embedded Computing Systems, March 6, 2023. http://dx.doi.org/10.1145/3586996.

Full text of the source
Abstract:
Robust sound source localization for environments with noise and reverberation increasingly exploits deep neural networks fed with various acoustic features. Yet, state-of-the-art research mainly focuses on optimizing algorithmic accuracy, resulting in huge models that prevent edge-device deployment. The edge, however, calls for real-time, low-footprint acoustic reasoning for applications such as hearing aids and robot interactions. Hence, we set off from a robust CNN-based model using SRP-PHAT features, Cross3D [16], to pursue an efficient yet compact model architecture for the extreme edge. For both the SRP feature representation and the neural network, we propose respectively our scalable LC-SRP-Edge and Cross3D-Edge algorithms, which are optimized towards lower hardware overhead. LC-SRP-Edge halves the complexity and on-chip memory overhead of the sinc interpolation compared to the original LC-SRP [19]. Over multiple SRP resolution cases, Cross3D-Edge saves 10.32–73.71% of the computational complexity and 59.77–94.66% of the neural network weights against the Cross3D baseline. In terms of the accuracy-efficiency tradeoff, the most balanced version (EM) requires only 127.1 MFLOPS of computation, 3.71 MByte/s of bandwidth, and 0.821 MByte of on-chip memory in total, while still retaining competitiveness in state-of-the-art accuracy comparisons. It achieves 8.59 ms/frame end-to-end latency on a Raspberry Pi 4B, which is 7.26x faster than the corresponding baseline.
Styles: ABNT, Harvard, Vancouver, APA, etc.
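For a single microphone pair, the SRP-PHAT features that Cross3D and LC-SRP-Edge build on reduce to the GCC-PHAT cross-correlation; a minimal numpy version on a synthetic delayed signal (not the paper's pipeline or its sinc-interpolated LC-SRP variant) looks like this:

```python
import numpy as np

def gcc_phat(x, y, fs):
    """Estimate the delay of x relative to y: whiten the cross-power spectrum
    so only phase survives (PHAT weighting), inverse-FFT it, and take the peak."""
    n = len(x) + len(y)
    X, Y = np.fft.rfft(x, n=n), np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12           # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    shift = int(np.argmax(np.abs(cc)))
    if shift > n // 2:
        shift -= n                           # map large indices to negative lags
    return shift, shift / fs

fs = 16_000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(fs)
delayed = np.roll(clean, 23)                 # simulate a 23-sample TDOA
print(gcc_phat(delayed, clean, fs))          # expect roughly (23, 0.0014)
```

SRP-PHAT then sums such pairwise correlations over all microphone pairs for each candidate source direction, which is where the compute, bandwidth, and on-chip memory costs quoted in the abstract come from.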

Theses / dissertations on the topic "Tradeoff bandwidth/memory"

1

Benkirane, Nada. "La gestion du trafic dans les réseaux orientés contenus". Electronic Thesis or Diss., Paris 6, 2014. http://www.theses.fr/2014PA066039.

Full text of the source
Abstract:
Content Centric Network (CCN) architecture has been designed to optimize network resources and ensure greater security. The design and the implementation of this architecture are still in their early stages. This thesis makes several proposals for traffic management in the networks of the future. We argue that it is necessary to supplement CCN with mechanisms enabling controlled sharing of network bandwidth by competing flows. Traffic control is necessary to ensure low latency for conversational and streaming flows, and to realize satisfactory bandwidth sharing between elastic flows. These objectives can be realized using per-flow bandwidth sharing. As the bandwidth-sharing algorithms of the IP architecture are not completely satisfactory, we propose Interest Discard as a new mechanism for CCN that optimizes bandwidth utilization. We tested some of the mechanisms using the CCNx prototype software and simulations. Since CCN favors downloading content from several sources, we also evaluate multipath/multisource performance and note the role of cache performance in the choice of the selected paths. In the second part, we evaluate cache performance using a simple approximation for LRU caches that proves highly accurate. As cache performance heavily depends on object popularity and catalog size, we evaluate it using popularity distributions and catalogs representative of current Internet exchanges. For the alpha values considered, we observe that cache sizes must be very large to ensure a significant bandwidth reduction, which can be restrictive for implementing caches in routers. We believe that the way caches are distributed over an architecture determines a bandwidth/memory tradeoff: to decide how caches should be sized and where they should be placed, one must compare the costs of the candidate architectures, and the thesis evaluates these cost differences.
Styles: ABNT, Harvard, Vancouver, APA, etc.
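The "simple approximation for LRU cache performance" mentioned in the abstract is, in this line of work, most likely the characteristic-time (Che) approximation; the sketch below implements that standard formula under an assumed Zipf(α) popularity law with illustrative catalog and cache sizes, and is not code from the thesis:

```python
import numpy as np
from scipy.optimize import brentq

def lru_hit_rate_che(catalog_size, cache_size, alpha):
    """Che approximation: solve for the characteristic time T at which the
    expected number of distinct objects requested equals the cache size,
    then object i hits with probability 1 - exp(-q_i * T)."""
    ranks = np.arange(1, catalog_size + 1)
    q = ranks ** -float(alpha)
    q /= q.sum()                                 # Zipf(alpha) request probabilities

    def excess(T):
        return np.sum(1.0 - np.exp(-q * T)) - cache_size

    T = brentq(excess, 1e-9, 1e12)               # sum_i (1 - e^{-q_i T}) = cache_size
    hit = 1.0 - np.exp(-q * T)
    return float(np.dot(q, hit))                 # overall hit probability

# Illustrative numbers: 1M-object catalog, cache holding 1% of it, alpha = 0.8.
print(lru_hit_rate_che(catalog_size=1_000_000, cache_size=10_000, alpha=0.8))
```

Plugging in realistic catalog sizes shows why the abstract concludes that caches must be very large before the bandwidth saving (the overall hit rate) becomes significant.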

Conference papers on the topic "Tradeoff bandwidth/memory"

1

Roberts, James, and Nada Sbihi. "Exploring the memory-bandwidth tradeoff in an information-centric network". In 2013 25th International Teletraffic Congress (ITC 2013). IEEE, 2013. http://dx.doi.org/10.1109/itc.2013.6662936.

Full text of the source
Styles: ABNT, Harvard, Vancouver, APA, etc.
2

Martin, Milo M. K., Pacia J. Harper, Daniel J. Sorin, Mark D. Hill, and David A. Wood. "Using destination-set prediction to improve the latency/bandwidth tradeoff in shared-memory multiprocessors". In Proceedings of the 30th Annual International Symposium on Computer Architecture (ISCA '03). New York, New York, USA: ACM Press, 2003. http://dx.doi.org/10.1145/859618.859642.

Full text of the source
Styles: ABNT, Harvard, Vancouver, APA, etc.
3

Mok, Fai. "Applications of holographic storage in lithium niobate". In OSA Annual Meeting. Washington, D.C.: Optica Publishing Group, 1992. http://dx.doi.org/10.1364/oam.1992.we1.

Full text of the source
Abstract:
Multiple volume holograms can be recorded in photorefractive crystals. Holograms are typically recorded by exposing the crystals to interference patterns of plane waves and signal beams. Either of the two recording beams can be used to read the recorded holograms. Depending on the choice of the readout beam, a multiple-hologram memory can be configured either as an inner-product computer or as an imagery memory. We show experimental results of both configurations of our multiple-hologram memory. The storage capacity of a multiple-hologram memory is a function of the average diffraction efficiency/available readout energy, bandwidth of the optical system/separation between holograms, and tolerable noise level. In an angle-multiplexed memory, utilizing fractal-space multiplexing can increase the storage bandwidth by over an order of magnitude. Angular separation between holograms and volume holographic cross-talk noise level are both minimal when the nominal included angle between plane waves and signal beams is 90°. The tradeoff of using a 90° geometry is a reduction in the average diffraction efficiency. We attempt to answer some of the questions involved in this tradeoff.
Styles: ABNT, Harvard, Vancouver, APA, etc.
4

Langguth, Johannes, Xing Cai e Mohammed Sourouri. "Memory Bandwidth Contention: Communication vs Computation Tradeoffs in Supercomputers with Multicore Architectures". In 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS). IEEE, 2018. http://dx.doi.org/10.1109/padsw.2018.8644601.

Full text of the source
Styles: ABNT, Harvard, Vancouver, APA, etc.