Academic literature on the topic 'Network-on-chip, Dataflow Computing, Performance, Framework'


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Network-on-chip, Dataflow Computing, Performance, Framework.'


Journal articles on the topic "Network-on-chip, Dataflow Computing, Performance, Framework"

1

Fang, Juan, Sitong Liu, Shijian Liu, Yanjin Cheng, and Lu Yu. "Hybrid Network-on-Chip: An Application-Aware Framework for Big Data." Complexity 2018 (July 30, 2018): 1–11. http://dx.doi.org/10.1155/2018/1040869.

Abstract:
Burst growing IoT and cloud computing demand exascale computing systems with high performance and low power consumption to process massive amounts of data. Modern system platforms based on fundamental requirements encounter a performance gap in chasing exponential growth in data speed and amount. To narrow the gap, a heterogamous design gives us a hint. A network-on-chip (NoC) introduces a packet-switched fabric for on-chip communication and becomes the de facto many-core interconnection mechanism; it refers to a vital shared resource for multifarious applications which will notably affect system energy efficiency. Among all the challenges in NoC, unaware application behaviors bring about considerable congestion, which wastes huge amounts of bandwidth and power consumption on the chip. In this paper, we propose a hybrid NoC framework, combining buffered and bufferless NoCs, to make the NoC framework aware of applications’ performance demands. An optimized congestion control scheme is also devised to satisfy the requirement in energy efficiency and the fairness of big data applications. We use a trace-driven simulator to model big data applications. Compared with the classical buffered NoC, the proposed hybrid NoC is able to significantly improve the performance of mixed applications by 17% on average and 24% at the most, decrease the power consumption by 38%, and improve the fairness by 13.3%.
2

Lin, Yanru, Yanjun Zhang, and Xu Yang. "A Low Memory Requirement MobileNets Accelerator Based on FPGA for Auxiliary Medical Tasks." Bioengineering 10, no. 1 (December 24, 2022): 28. http://dx.doi.org/10.3390/bioengineering10010028.

Abstract:
Convolutional neural networks (CNNs) have been widely applied in the fields of medical tasks because they can achieve high accuracy in many fields using a large number of parameters and operations. However, many applications designed for auxiliary checks or help need to be deployed into portable devices, where the huge number of operations and parameters of a standard CNN can become an obstruction. MobileNet adopts a depthwise separable convolution to replace the standard convolution, which can greatly reduce the number of operations and parameters while maintaining a relatively high accuracy. Such highly structured models are very suitable for FPGA implementation in order to further reduce resource requirements and improve efficiency. Many other implementations focus on performance more than on resource requirements because MobileNets has already reduced both parameters and operations and obtained significant results. However, because many small devices only have limited resources they cannot run MobileNet-like efficient networks in a normal way, and there are still many auxiliary medical applications that require a high-performance network running in real-time to meet the requirements. Hence, we need to figure out a specific accelerator structure to further reduce the memory and other resource requirements while running MobileNet-like efficient networks. In this paper, a MobileNet accelerator is proposed to minimize the on-chip memory capacity and the amount of data that is transferred between on-chip and off-chip memory. We propose two configurable computing modules: Pointwise Convolution Accelerator and Depthwise Convolution Accelerator, to parallelize the network and reduce the memory requirement with a specific dataflow model. At the same time, a new cache usage method is also proposed to further reduce the use of the on-chip memory. 
We implemented the accelerator on Xilinx XC7Z020, deployed MobileNetV2 on it, and achieved 70.94 FPS with 524.25 KB on-chip memory usage under 150 MHz.
3

Sui, Xuefu, Qunbo Lv, Liangjie Zhi, Baoyu Zhu, Yuanbo Yang, Yu Zhang, and Zheng Tan. "A Hardware-Friendly High-Precision CNN Pruning Method and Its FPGA Implementation." Sensors 23, no. 2 (January 11, 2023): 824. http://dx.doi.org/10.3390/s23020824.

Abstract:
To address the problems of large storage requirements, computational pressure, untimely data supply of off-chip memory, and low computational efficiency during hardware deployment due to the large number of convolutional neural network (CNN) parameters, we developed an innovative hardware-friendly CNN pruning method called KRP, which prunes the convolutional kernel on a row scale. A new retraining method based on LR tracking was used to obtain a CNN model with both a high pruning rate and accuracy. Furthermore, we designed a high-performance convolutional computation module on the FPGA platform to help deploy KRP pruning models. The results of comparative experiments on CNNs such as VGG and ResNet showed that KRP has higher accuracy than most pruning methods. At the same time, the KRP method, together with the GSNQ quantization method developed in our previous study, forms a high-precision hardware-friendly network compression framework that can achieve “lossless” CNN compression with a 27× reduction in network model storage. The results of the comparative experiments on the FPGA showed that the KRP pruning method not only requires much less storage space, but also helps to reduce the on-chip hardware resource consumption by more than half and effectively improves the parallelism of the model in FPGAs with a strong hardware-friendly feature. This study provides more ideas for the application of CNNs in the field of edge computing.
4

Chen, Hui, Zihao Zhang, Peng Chen, Xiangzhong Luo, Shiqing Li, and Weichen Liu. "MARCO: A High-performance Task Mapping and Routing Co-optimization Framework for Point-to-Point NoC-based Heterogeneous Computing Systems." ACM Transactions on Embedded Computing Systems 20, no. 5s (October 31, 2021): 1–21. http://dx.doi.org/10.1145/3476985.

Abstract:
Heterogeneous computing systems (HCSs), which consist of various processing elements (PEs) that vary in their processing ability, are usually facilitated by the network-on-chip (NoC) to interconnect its components. The emerging point-to-point NoCs which support single-cycle-multi-hop transmission, reduce or eliminate the latency dependence on distance, addressing the scalability concern raised by high latency for long-distance transmission and enlarging the design space of the routing algorithm to search the non-shortest paths. For such point-to-point NoC-based HCSs, resource management strategies which are managed by compilers, scheduler, or controllers, e.g., mapping and routing, are complicated for the following reasons: (i) Due to the heterogeneity, mapping and routing need to optimize computation and communication concurrently (for homogeneous computing systems, only communication). (ii) Conducting mapping and routing consecutively cannot minimize the schedule length in most cases since the PEs with high processing ability may locate in the crowded area and suffer from high resource contention overhead. (iii) Since changing the mapping selection of one task will reconstruct the whole routing design space, the exploration of mapping and routing design space is challenging. Therefore, in this work, we propose MARCO, the mapping and routing co-optimization framework, to decrease the schedule length of applications on point-to-point NoC-based HCSs. Specifically, we revise the tabu search to explore the design space and evaluate the quality of mapping and routing. The advanced reinforcement learning (RL) algorithm, i.e., advantage actor-critic, is adopted to efficiently compute paths. 
We perform extensive experiments on various real applications, which demonstrates that the MARCO achieves a remarkable performance improvement in terms of schedule length (+44.94% ∼ +50.18%) when compared with the state-of-the-art mapping and routing co-optimization algorithm for homogeneous computing systems. We also compare MARCO with different combinations of state-of-the-art mapping and routing approaches.
5

Ma, Fuqi, Bo Wang, Min Li, Xuzhu Dong, Yifan Mao, Yinyu Zhou, and Hengrui Ma. "Edge Intelligent Perception Method for Power Grid Icing Condition Based on Multi-Scale Feature Fusion Target Detection and Model Quantization." Frontiers in Energy Research 9 (October 4, 2021). http://dx.doi.org/10.3389/fenrg.2021.754335.

Abstract:
Insulator is an important equipment of power transmission line. Insulator icing can seriously affect the stable operation of power transmission line. So insulator icing condition monitoring has great significance of the safety and stability of power system. Therefore, this paper proposes a lightweight intelligent recognition method of insulator icing thickness for front-end ice monitoring device. In this method, the residual network (ResNet) and feature pyramid network (FPN) are fused to construct a multi-scale feature extraction network framework, so that the shallow features and deep features are fused to reduce the information loss and improve the target detection accuracy. Then, the full convolution neural network (FCN) is used to classify and regress the iced insulator, so as to realize the high-precision identification of icing thickness. Finally, the proposed method is compressed by model quantization to reduce the size and parameters of the model for adapting the icing monitoring terminal with limited computing resources, and the performance of the method is verified and compared with other classical method on the edge intelligent chip.
6

Lin, Wei-Ting, Hsiang-Yun Cheng, Chia-Lin Yang, Meng-Yao Lin, Kai Lien, Han-Wen Hu, Hung-Sheng Chang, et al. "DL-RSIM: A Reliability and Deployment Strategy Simulation Framework for ReRAM-based CNN Accelerators." ACM Transactions on Embedded Computing Systems, January 31, 2022. http://dx.doi.org/10.1145/3507639.

Abstract:
Memristor-based deep learning accelerators provide a promising solution to improve the energy efficiency of neuromorphic computing systems. However, the electrical properties and crossbar structure of memristors make these accelerators error-prone. In addition, due to the hardware constraints, the way to deploy neural network models on memristor crossbar arrays affects the computation parallelism and communication overheads. To enable reliable and energy-efficient memristor-based accelerators, a simulation platform is needed to precisely analyze the impact of non-ideal circuit/device properties on the inference accuracy and the influence of different deployment strategies on performance and energy consumption. In this paper, we propose a flexible simulation framework, DL-RSIM, to tackle this challenge. A rich set of reliability impact factors and deployment strategies are explored by DL-RSIM, and it can be incorporated with any deep learning neural networks implemented by TensorFlow. Using several representative convolutional neural networks as case studies, we show that DL-RSIM can guide chip designers to choose a reliability-friendly design option and energy-efficient deployment strategies and develop optimization techniques accordingly.

Dissertations / Theses on the topic "Network-on-chip, Dataflow Computing, Performance, Framework"

1

Mazumdar, Somnath. "An Efficient NoC-based Framework To Improve Dataflow Thread Management At Runtime." Doctoral thesis, Università di Siena, 2017. http://hdl.handle.net/11365/1011261.

Abstract:
This doctoral thesis focuses on how the application threads that are based on dataflow execution model can be managed at Network-on-Chip (NoC) level. The roots of the dataflow execution model date back to the early 1970’s. Applications adhering to such program execution model follow a simple producer-consumer communication scheme for synchronising parallel thread related activities. In dataflow execution environment, a thread can run if and only if all its required inputs are available. Applications running on a large and complex computing environment can significantly benefit from the adoption of dataflow model. In the first part of the thesis, the work is focused on the thread distribution mechanism. It has been shown that how a scalable hash-based thread distribution mechanism can be implemented at the router level with low overheads. To enhance the support further, a tool to monitor the dataflow threads’ status and a simple, functional model is also incorporated into the design. Next, a software defined NoC has been proposed to manage the distribution of dataflow threads by exploiting its reconfigurability. The second part of this work is focused more on NoC microarchitecture level. Traditional 2D-mesh topology is combined with a standard ring, to understand how such hybrid network topology can outperform the traditional topology (such as 2D-mesh). Finally, a mixed-integer linear programming based analytical model has been proposed to verify if the application threads mapped on to the free cores is optimal or not. The proposed mathematical model can be used as a yardstick to verify the solution quality of the newly developed mapping policy. It is not trivial to provide a complete low-level framework for dataflow thread execution for better resource and power management. However, this work could be considered as a primary framework to which improvements could be carried out.

Conference papers on the topic "Network-on-chip, Dataflow Computing, Performance, Framework"

1

Kim, Hanjoon, Seulki Heo, Junghoon Lee, Jaehyuk Huh, and John Kim. "On-Chip Network Evaluation Framework." In 2010 SC - International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2010. http://dx.doi.org/10.1109/sc.2010.35.

2

Li, Yixiao, Yutaka Matsubara, Daniel Olbrys, Kazuhiro Kajio, Takashi Inada, and Hiroaki Takada. "Agile Software Design Verification and Validation (V&V) for Automated Driving." In FISITA World Congress 2021. FISITA, 2021. http://dx.doi.org/10.46720/f2020-ves-017.

Abstract:
Automated Driving System (ADS) generally consists of 3 functions 1) Recognition, 2) Planning, 3) Control. Precise vehicle localization and accurate recognition of objects (vehicle, pedestrian, lane, traffic sign, etc.) are typically based on high-definition dynamic maps and data from multiple sensors (e.g. Camera, LiDAR, Radar). Planners, especially those for optimal path and trajectory, tend to be computationally intensive. Many applications in ADS use machine learning techniques such as DNN (Deep Neural Network), which further increase the demand for computing power. To parallelly process massive tasks and data in real-time, scalable software and high-performance SoC (System on Chip) with many CPUs or processing cores, and hardware accelerators (e.g. GPU, DLA) have been adopted. However, ADS software and SoC hardware architecture are so large and complex that software validation at later testing phase is inefficient and costly. Due to continuous ADS software evolution and iterations, software redesign will occur much more frequently than traditional automotive systems. The productivity of software validation must be improved to avoid the unacceptable bloat of required effort and time. This paper explores how to obtain optimal ADS software scheduling design and how to enable agile ADS software V&V (Verification and Validation) in order to release the product in short development cycle. The proposed agile software V&V framework integrates the design verification with scheduling simulator in PC and the validation with debugging and tracing tools for the hardware target, which is usually an embedded board. We developed utility tools to make the proposed framework seamless and automated. The evaluation results indicate that the proposed framework can efficiently explore the optimal scheduling design (e.g. scheduling policy, thread priority, core affinity) satisfying several non-functional requirements (e.g. response time, CPU utilization) for ADS. 
We also proved that the framework is practical and can be incorporated into agile ADS software development by validating it through the project. Key words: Automated Driving System (ADS), System on Chip (SoC), Deep Neural Network (DNN), Optimal Scheduling Design, Verification and Validation (V&V)