
Journal articles on the topic "Neural Mask Estimation"


Below are the top 50 scholarly journal articles on the topic "Neural Mask Estimation".


1. Om, Chol Nam, Hyok Kwak, Chong Il Kwak, Song Gum Ho, and Hyon Gyong Jang. "Multichannel Speech Enhancement of Target Speaker Based on Wakeup Word Mask Estimation with Deep Neural Network". International Journal of Advanced Networking and Applications 15, no. 01 (2023): 5754–59. http://dx.doi.org/10.35444/ijana.2023.15101.

Abstract:
In this paper, we address a multichannel speech enhancement method based on wakeup word mask estimation using a Deep Neural Network (DNN). The wakeup word is an important clue for identifying the target speaker. We use a DNN to estimate the wakeup word mask and the noise mask and apply them to separate the mixed wakeup word signal into the target speaker's speech and background noise. A Convolutional Recurrent Neural Network (CRNN) is used to exploit both short- and long-term time-frequency dependencies of sequences such as speech signals. Generalized Eigenvector (GEV) beamforming uses the masks to estimate a spatial filter that enhances the target speaker's subsequent speech command and reduces undesirable noise. Experimental results show that the proposed method is more robust to noise and improves the Signal-to-Noise Ratio (SNR) and speech recognition accuracy.
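The mask-to-beamformer step described above follows a standard recipe from the mask-based beamforming literature: the estimated masks weight the spatial covariance matrices of speech and noise, and the GEV filter is the principal generalized eigenvector. A minimal sketch (not the authors' implementation; the STFT layout of (channels, frames, bins) and mask shape of (frames, bins) are assumptions) could look like this:

```python
# Mask-driven GEV beamforming sketch. Assumed shapes: stft (C, T, F) complex,
# speech_mask and noise_mask (T, F) with values in [0, 1].
import numpy as np
from scipy.linalg import eigh

def gev_beamformer(stft, speech_mask, noise_mask):
    """Return one beamforming weight vector per frequency bin."""
    C, T, F = stft.shape
    w = np.zeros((F, C), dtype=complex)
    for f in range(F):
        X = stft[:, :, f]                              # (C, T) snapshots at bin f
        Phi_s = (speech_mask[:, f] * X) @ X.conj().T   # mask-weighted speech covariance
        Phi_n = (noise_mask[:, f] * X) @ X.conj().T    # mask-weighted noise covariance
        Phi_n += 1e-6 * np.eye(C)                      # regularize for invertibility
        # The GEV filter maximizes w^H Phi_s w / w^H Phi_n w;
        # eigh returns eigenvalues in ascending order, so take the last vector.
        _, vecs = eigh(Phi_s, Phi_n)
        w[f] = vecs[:, -1]
    return w
```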
2. Lee, Hyo-Jun, Jong-Hyeon Baek, Young-Kuk Kim, Jun Heon Lee, Myungjae Lee, Wooju Park, Seung Hwan Lee, and Yeong Jun Koh. "BTENet: Back-Fat Thickness Estimation Network for Automated Grading of the Korean Commercial Pig". Electronics 11, no. 9 (April 19, 2022): 1296. http://dx.doi.org/10.3390/electronics11091296.

Abstract:
For the automated grading of the Korean commercial pig, we propose a deep neural network called the back-fat thickness estimation network (BTENet). The proposed BTENet contains segmentation and thickness estimation modules to simultaneously perform back-fat area segmentation and thickness estimation. The segmentation module estimates a back-fat area mask from an input image. From both the input image and the estimated back-fat mask, the thickness estimation module predicts the real back-fat thickness in millimeters by effectively analyzing the back-fat area. To train BTENet, we also built a large-scale pig image dataset called PigBT. Experimental results validate that the proposed BTENet achieves reliable thickness estimation (Pearson's correlation coefficient: 0.915; mean absolute error: 1.275 mm; mean absolute percentage error: 6.4%). We therefore expect that BTENet will accelerate a new phase for the automated grading system of the Korean commercial pig.
3. Bezsonov, Oleksandr, Oleh Lebediev, Valentyn Lebediev, Yuriy Megel, Dmytro Prochukhan, and Oleg Rudenko. "Breed recognition and estimation of live weight of cattle based on methods of machine learning and computer vision". Eastern-European Journal of Enterprise Technologies 6, no. 9 (114) (December 29, 2021): 64–74. http://dx.doi.org/10.15587/1729-4061.2021.247648.

Abstract:
A method of measuring cattle parameters using neural network methods of image processing was proposed. To this end, several neural network models were used: a convolutional artificial neural network and a multilayer perceptron. The first is used to recognize a cow in a photograph and identify its breed, followed by determining its body dimensions using the stereopsis method. The perceptron was used to estimate the cow's weight based on its breed and size information. Mask R-CNN (Mask Regions with CNNs) was chosen as the convolutional network. To refine information on the physical parameters of the animals, a 3D camera (Intel RealSense D435i) was used. Images of cows taken from different angles were used to determine their body parameters using the photogrammetric method. The cow body dimensions were determined by analyzing animal images taken with synchronized cameras from different angles. First, a cow was identified in the photograph and its breed was determined using the Mask R-CNN convolutional neural network. Next, the animal parameters were determined using the stereopsis method. The resulting breed and size data were fed to a predictive model to determine the estimated weight of the animal. In the modeling, the Ayrshire, Holstein, Jersey, and Krasnaya Stepnaya breeds were considered as the cow breeds to be recognized. The use of a pre-trained network, with subsequent training using the SGD algorithm and an Nvidia GeForce 2080 video card, made it possible to significantly speed up the learning process compared to training on a CPU. The results obtained confirm the effectiveness of the proposed method in solving practical problems.
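The stereopsis step rests on the classic triangulation relation Z = fB/d. A short illustrative sketch (not tied to the paper's code; the focal length, baseline, and disparity values are invented):

```python
# Pinhole-stereo relations used to turn pixel measurements into metric sizes.
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Z = f * B / d for a rectified stereo pair."""
    return focal_px * baseline_m / disparity_px

def pixel_span_to_meters(span_px: float, depth_m: float, focal_px: float) -> float:
    """Back-project an image-plane span to a metric length at depth Z."""
    return span_px * depth_m / focal_px

# Example: a body span of 1200 px observed with a 95 px disparity.
z = depth_from_disparity(focal_px=1400.0, baseline_m=0.12, disparity_px=95.0)
print(pixel_span_to_meters(1200.0, z, focal_px=1400.0))   # ~1.5 m
```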
4. Lee, Chang-bok, Han-sung Lee, and Hyun-chong Cho. "Cattle Weight Estimation Using Fully and Weakly Supervised Segmentation from 2D Images". Applied Sciences 13, no. 5 (February 23, 2023): 2896. http://dx.doi.org/10.3390/app13052896.

Abstract:
Weight information is important in cattle breeding because it can measure animal growth and be used to calculate the appropriate amount of daily feed. To estimate the weight, we developed an image-based method that does not stress cattle and requires no manual labor. From a 2D image, a mask was obtained by segmenting the animal and background, and weights were estimated using a deep neural network with residual connections by extracting weight-related features from the segmentation mask. Two image segmentation methods, fully and weakly supervised segmentation, were compared. The fully supervised segmentation method uses a Mask R-CNN model that learns the ground truth mask generated by labeling as the correct answer. The weakly supervised segmentation method uses an activation visualization map that is proposed in this study. The first method creates a more precise mask, but the second method does not require ground truth segmentation labeling. The body weight was estimated using statistical features of the segmented region. In experiments, the following performance results were obtained: a mean average error of 17.31 kg and mean absolute percentage error of 5.52% for fully supervised segmentation, and a mean average error of 35.91 kg and mean absolute percentage error of 10.1% for the weakly supervised segmentation.
5. Guimarães, André, Maria Valério, Beatriz Fidalgo, Raúl Salas-Gonzalez, Carlos Pereira, and Mateus Mendes. "Cork Oak Production Estimation Using a Mask R-CNN". Energies 15, no. 24 (December 17, 2022): 9593. http://dx.doi.org/10.3390/en15249593.

Abstract:
Cork is a versatile natural material. It can be used as an insulator in construction, among many other applications. For good forest management of cork oaks, forest owners need to calculate the volume of cork periodically, which allows them to choose the right time to harvest it. The traditional method is laborious and time-consuming. The present work aims to automate the process of calculating the trunk area of a cork oak from which cork is extracted. Through this calculation, it will be possible to estimate the volume of cork produced before the stripping process. A deep neural network, Mask R-CNN, and a machine learning algorithm are used. A dataset of images of cork oaks was created, where targets of known dimensions were fixed on the trunks. The Mask R-CNN was trained to recognize the targets and cork regions, so the area of cork could be estimated based on the target dimensions. Preliminary results show that the model performs well in the recognition of targets and trunks, registering an mAP@0.7 of 0.96. After obtaining the mask results, three machine learning models were trained to estimate the cork volume based on the area and biometric parameters of the tree. The results showed that a support vector machine produced an average error of 8.75%, which is within the error margins obtained using traditional methods.
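The calibration idea behind the fixed targets is simple scale transfer: because the target's physical area is known, the cm²-per-pixel ratio around it can be applied to the cork mask. A minimal sketch (illustrative only; the numbers are invented):

```python
# Scale transfer from a fiducial target of known size to the cork mask area.
def estimate_cork_area_cm2(cork_area_px: int, target_area_px: int,
                           target_area_cm2: float) -> float:
    cm2_per_px = target_area_cm2 / target_area_px   # calibration from the target
    return cork_area_px * cm2_per_px

# Example: a 20 cm x 20 cm target covering 15,000 px next to a 310,000 px cork mask.
print(estimate_cork_area_cm2(310_000, 15_000, 400.0))   # ~8267 cm^2
```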
6. Hasannezhad, Mojtaba, Zhiheng Ouyang, Wei-Ping Zhu, and Benoit Champagne. "Speech Enhancement With Phase Sensitive Mask Estimation Using a Novel Hybrid Neural Network". IEEE Open Journal of Signal Processing 2 (2021): 136–50. http://dx.doi.org/10.1109/ojsp.2021.3067147.

7. Sivapatham, Shoba, Asutosh Kar, Roshan Bodile, Vladimir Mladenovic, and Pitikhate Sooraksa. "A deep neural network-correlation phase sensitive mask based estimation to improve speech intelligibility". Applied Acoustics 212 (September 2023): 109592. http://dx.doi.org/10.1016/j.apacoust.2023.109592.

8. Osorio, Kavir, Andrés Puerto, Cesar Pedraza, David Jamaica, and Leonardo Rodríguez. "A Deep Learning Approach for Weed Detection in Lettuce Crops Using Multispectral Images". AgriEngineering 2, no. 3 (August 28, 2020): 471–88. http://dx.doi.org/10.3390/agriengineering2030032.

Abstract:
Weed management is one of the most important aspects of crop productivity; knowing the amount and locations of weeds has been a problem that experts have faced for several decades. This paper presents three methods for weed estimation based on deep learning image processing in lettuce crops, compared against visual estimations by experts. The first method is based on support vector machines (SVM) using histograms of oriented gradients (HOG) as the feature descriptor. The second method is based on YOLOv3 (You Only Look Once v3), taking advantage of its robust architecture for object detection, and the third is based on Mask R-CNN (region-based convolutional neural network) in order to obtain an instance segmentation for each individual plant. These methods were complemented with an NDVI (normalized difference vegetation index) background subtractor for removing non-photosynthetic objects. According to the chosen metrics, the machine and deep learning methods achieved F1-scores of 88%, 94%, and 94%, respectively, with regard to crop detection. Subsequently, detected crops were turned into a binary mask and combined with the NDVI background subtractor in order to detect weeds indirectly. Once the weed image was obtained, the coverage percentage of weeds was calculated by classical image processing methods. Finally, these results were compared with the estimations of a set of weed experts through a Bland–Altman plot, intraclass correlation coefficients (ICCs), and Dunn's test to obtain statistical measurements between each pair of estimations (machine–human); we found that these methods improve accuracy in weed coverage estimation and minimize subjectivity in human-estimated data.
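The NDVI background subtractor mentioned above has a standard closed form, NDVI = (NIR − Red) / (NIR + Red). A minimal sketch (the 0.3 threshold is illustrative, not taken from the paper):

```python
# NDVI-based background subtraction for multispectral crop images.
import numpy as np

def ndvi_mask(nir: np.ndarray, red: np.ndarray, threshold: float = 0.3) -> np.ndarray:
    """Keep photosynthetic pixels: NDVI = (NIR - Red) / (NIR + Red)."""
    ndvi = (nir - red) / (nir + red + 1e-8)   # epsilon avoids division by zero
    return ndvi > threshold                   # True where vegetation is present
```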
9. Song, Junho, Yonggu Lee, and Euiseok Hwang. "Time–Frequency Mask Estimation Based on Deep Neural Network for Flexible Load Disaggregation in Buildings". IEEE Transactions on Smart Grid 12, no. 4 (July 2021): 3242–51. http://dx.doi.org/10.1109/tsg.2021.3066547.

10. Rizwan, Tahir, Yunze Cai, Muhammad Ahsan, Noman Sohail, Emad Abouel Nasr, and Haitham A. Mahmoud. "Neural Network Approach for 2-Dimension Person Pose Estimation With Encoded Mask and Keypoint Detection". IEEE Access 8 (2020): 107760–71. http://dx.doi.org/10.1109/access.2020.3001473.

11. Ding, Hongyu, Muhammad Ahsan Latif, Zain Zia, Muhammad Asif Habib, Muhammad Abdul Qayum, and Quancai Jiang. "Facial Mask Detection Using Image Processing with Deep Learning". Mathematical Problems in Engineering 2022 (August 12, 2022): 1–10. http://dx.doi.org/10.1155/2022/8220677.

Abstract:
Coronavirus disease 2019 (COVID-19) has had a significant impact on human life. The novel pandemic forced humans to change their lifestyles. Scientists in many countries have achieved breakthroughs with vaccines, but the face mask remains the main protection for public interaction. In this study, deep neural networks (DNN) were employed to determine whether persons are wearing masks correctly. The faster region-based convolutional neural network (RCNN) model was trained on the data using a graphics processing unit (GPU). To achieve our goals, we used a multiphase detection model: first, to label the face mask, and second, to detect the edge and compute the edge projection for the chosen face region within the face mask. The current findings revealed that faster RCNN was efficient and precise, giving 97% accuracy. The overall loss after 200,000 epochs is 0.0503, with a decreasing trend; as the loss falls, the results become more accurate. As a result, the faster RCNN technique effectively identifies whether a person is wearing a face mask, and the training period was decreased while achieving better accuracy. In the future, a Deep Neural Network (DNN) might first be used to train on the data and then the input dimensions could be compressed to run on low-powered devices, resulting in a lower computational cost. Our proposed system can achieve high face detection accuracy and coarsely obtain face posture estimation based on the specified rule. The faster RCNN learning algorithm returns high precision, and the model's lower computational cost is achieved on the GPU. We use the "label-image" application to label the photographs extracted from the dataset and apply Inception V2 of faster RCNN for face mask detection and classification.
12. Gu, Yanlei, Huiyang Zhang, and Shunsuke Kamijo. "Multi-Person Pose Estimation using an Orientation and Occlusion Aware Deep Learning Network". Sensors 20, no. 6 (March 12, 2020): 1593. http://dx.doi.org/10.3390/s20061593.

Abstract:
Image-based human behavior and activity understanding has been a hot topic in the field of computer vision and multimedia. As an important part of it, skeleton estimation, also called pose estimation, has attracted a great deal of interest. For pose estimation, most deep learning approaches mainly focus on joint features. However, joint features are not sufficient, especially when the image includes multiple people and the pose is occluded or not fully visible. This paper proposes a novel multi-task framework for multi-person pose estimation. The proposed framework is developed based on Mask Region-based Convolutional Neural Networks (R-CNN) and extended to integrate joint features, body boundary, body orientation, and occlusion condition together. In order to further improve the performance of multi-person pose estimation, this paper proposes to organize the different information in serial multi-task models instead of the widely used parallel multi-task network. The proposed models are trained on the public dataset Common Objects in Context (COCO), which is further augmented with ground truths for body orientation and a mutual-occlusion mask. Experiments demonstrate the performance of the proposed method for multi-person pose estimation and body orientation estimation. The proposed method achieves 84.6% Percentage of Correct Keypoints (PCK) and an 83.7% Correct Detection Rate (CDR). Comparisons further illustrate that the proposed model can reduce over-detection compared with other methods.
13. Li, Xinqiang, Xingmian Wang, Yanan Qin, and Jing Li. "SNR Classification Based Multi-Estimator IRM Speech Enhancement Algorithm". Journal of Physics: Conference Series 2173, no. 1 (January 1, 2022): 012086. http://dx.doi.org/10.1088/1742-6596/2173/1/012086.

Abstract:
Deep neural network (DNN)-based ideal ratio mask (IRM) estimation methods are often adopted in speech enhancement tasks. In previous work, IRM estimation was usually realized by a single DNN-based IRM estimator without considering the SNR level, which limited performance in real applications. Therefore, a two-stage speech enhancement method is proposed in this paper. Firstly, a DNN-based SNR classifier is employed to classify the speech frames into three classes according to different SNR thresholds. Secondly, three corresponding DNN-based IRM estimators related to the three SNR classes are trained respectively, from which the amplitude spectrum is corrected. Finally, speech enhancement is realized by applying the IDFT to the corrected speech spectrum combined with the phase information of the noisy speech. Experimental results show that the proposed algorithm performs better in terms of short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ), and segmental signal-to-noise ratio improvement (SSNRI) scores.
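The IRM target that the estimators regress has a standard definition; as a reference point (a common formulation, with beta = 0.5 a frequently used exponent, not necessarily the paper's exact variant):

```python
# Ideal ratio mask from clean-speech and noise magnitude spectra S and N.
import numpy as np

def ideal_ratio_mask(S: np.ndarray, N: np.ndarray, beta: float = 0.5) -> np.ndarray:
    """IRM(t, f) = (S^2 / (S^2 + N^2))^beta, a regression target in [0, 1]."""
    return (S**2 / (S**2 + N**2 + 1e-12)) ** beta
```

Enhancement then multiplies the estimated mask with the noisy amplitude spectrum before the IDFT with the noisy phase, as in the abstract.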
14. Zubov, I. G., and N. A. Obukhova. "Method for Automatic Determination of a 3D Trajectory of Vehicles in a Video Image". Journal of the Russian Universities. Radioelectronics 24, no. 3 (June 24, 2021): 49–59. http://dx.doi.org/10.32603/1993-8985-2021-24-3-49-59.

Abstract:
Introduction. An important part of an automotive unmanned vehicle (UV) control system is the environment analysis module. This module is based on various types of sensors, e.g. video cameras, lidars, and radars. The development of computer and video technologies makes it possible to implement an environment analysis module using a single video camera as a sensor. This approach is expected to reduce the cost of the entire module. The main task in video image processing is to analyse the environment as a 3D scene. The 3D trajectory of an object, which takes into account its dimensions, angle of view, and movement vector, as well as the vehicle pose in a video image, provides sufficient information for assessing the real interaction of objects. A basis for constructing a 3D trajectory is vehicle pose estimation. Aim. To develop an automatic method for estimating vehicle pose based on video data analysis from a single video camera. Materials and methods. An automatic method for vehicle pose estimation from a video image was proposed based on a cascade approach. The method includes vehicle detection, key point determination, segmentation, and vehicle pose estimation. Vehicle detection and determination of its key points were resolved via a neural network. The segmentation of a vehicle video image and the preparation of its mask were implemented by transforming the image into a polar coordinate system and searching for the outer contour using graph theory. Results. The estimation of vehicle pose was implemented by matching the Fourier image of vehicle mask signatures against templates obtained from 3D models. The correctness of the obtained vehicle pose and angle-of-view estimation was confirmed by experiments based on the proposed method. The vehicle pose estimation had an accuracy of 89% on the open Carvana image dataset. Conclusion. A new approach to vehicle pose estimation was proposed, involving a transition from end-to-end learning of neural networks that resolve several problems at once (localization, classification, segmentation, and angle of view) towards cascade analysis of information. The accuracy of end-to-end learning requires large sets of representative data, which complicates the scalability of solutions for road environments in Russia. The proposed method makes it possible to estimate the vehicle pose with a high level of accuracy without large costs for manual data annotation and training.
15. Nirunsin, Surasi, Praween Shinonawanik, Tawan Thintawornkul, and Theeraphong Wongratanaphisan. "Size Estimation of Mango Using Mask-RCNN Object Detection and Stereo Camera for Agricultural Robotics". International Journal of Emerging Technology and Advanced Engineering 12, no. 10 (October 1, 2022): 161–68. http://dx.doi.org/10.46338/ijetae1022_17.

Abstract:
In agricultural robotics, to perform an automatic harvesting task, it is important that the size of the target can be reasonably approximated. Stereo cameras can be used as a sensor to detect the targeted object and approximate its size. In this paper we present two methods for estimating the size of mangoes using data obtained from a stereo camera: a pinhole model method and an ellipsoid model method. An object detection scheme using the Mask-RCNN deep learning neural network on an RGB image, together with point cloud data from the stereo camera, is employed. The pinhole model method assumes a simple triangular projection of the object onto the image plane. It makes use of the bounding boxes predicted by the neural network and the 3-D point cloud data. The ellipsoid model method, on the other hand, uses three-dimensional point cloud data obtained by processing the images from the stereo camera. It assumes that the mango shape can be approximated as an ellipsoid and uses the predicted masked region and its corresponding point cloud data to fit an ellipsoid whose center and axes are extracted to estimate the position and the size of the mango, respectively. In the scope of this study, where the full shape of the mango can be seen in the image view, results from experiments suggest that the simpler pinhole model can effectively estimate the size of the mango with greater accuracy than the ellipsoid model. Keywords: Object Detection; Agricultural Robotics; Artificial Intelligence
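The pinhole method's "triangular projection" reduces to one proportionality: an object spanning w pixels at depth Z subtends w·Z/f meters. A minimal sketch (the values are invented for illustration):

```python
# Pinhole-model size estimate from a predicted bounding box and stereo depth.
def object_width_m(bbox_width_px: float, depth_m: float, focal_px: float) -> float:
    """Triangular projection: width = w_px * Z / f."""
    return bbox_width_px * depth_m / focal_px

# Example: a 110 px wide mango at 0.8 m depth with a 900 px focal length.
print(object_width_m(110.0, 0.8, 900.0))   # ~0.098 m
```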
16. Dai, Yaqiao, Renjiao Yi, Chenyang Zhu, Hongjun He, and Kai Xu. "Multi-Resolution Monocular Depth Map Fusion by Self-Supervised Gradient-Based Composition". Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 1 (June 26, 2023): 488–96. http://dx.doi.org/10.1609/aaai.v37i1.25123.

Abstract:
Monocular depth estimation is a challenging problem on which deep neural networks have demonstrated great potential. However, depth maps predicted by existing deep models usually lack fine-grained details due to convolution operations and down-sampling in networks. We find that increasing input resolution helps preserve more local details, while estimation at low resolution is more accurate globally. Therefore, we propose a novel depth map fusion module to combine the advantages of estimations with multi-resolution inputs. Instead of merging the low- and high-resolution estimations equally, we adopt the core idea of Poisson fusion, trying to implant the gradient domain of the high-resolution depth into the low-resolution depth. While classic Poisson fusion requires a fusion mask as supervision, we propose a self-supervised framework based on guided image filtering. We demonstrate that this gradient-based composition performs much better in noise immunity than the state-of-the-art depth map fusion method. Our lightweight depth fusion is one-shot and runs in real time, making it 80X faster than a state-of-the-art depth fusion method. Quantitative evaluations demonstrate that the proposed method can be integrated into many fully convolutional monocular depth estimation backbones with a significant performance boost, leading to state-of-the-art detail enhancement on depth maps. Code is released at https://github.com/yuinsky/gradient-based-depth-map-fusion.
17. Karageorgos, Konstantinos, Anastasios Dimou, Federico Alvarez, and Petros Daras. "Implicit and Explicit Regularization for Optical Flow Estimation". Sensors 20, no. 14 (July 10, 2020): 3855. http://dx.doi.org/10.3390/s20143855.

Abstract:
In this paper, two novel and practical regularization methods are proposed to improve existing neural network architectures for monocular optical flow estimation. The proposed methods aim to alleviate deficiencies of current methods, such as flow leakage across objects and motion consistency within rigid objects, by exploiting contextual information. More specifically, the first regularization method utilizes semantic information during the training process to explicitly regularize the produced optical flow field. The novelty of this method lies in the use of semantic segmentation masks to teach the network to implicitly identify the semantic edges of an object and better reason about the local motion flow. A novel loss function is introduced that takes into account the objects' boundaries as derived from the semantic segmentation mask to selectively penalize motion inconsistency within an object. The method is architecture-agnostic and can be integrated into any neural network without modifying it or adding complexity at inference. The second regularization method adds spatial awareness to the input data of the network in order to improve training stability and efficiency. The coordinates of each pixel are used as an additional feature, breaking the invariance properties of the neural network architecture. The additional features are shown to implicitly regularize the optical flow estimation, enforcing a consistent flow while improving both the performance and the convergence time. Finally, the combination of both regularization methods further improves the performance of existing cutting-edge architectures in a complementary way, both quantitatively and qualitatively, on popular flow estimation benchmark datasets.
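The second regularizer, pixel coordinates as extra input features, is the CoordConv-style augmentation; a minimal sketch (the (batch, channels, height, width) layout is an assumption):

```python
# Append normalized pixel-coordinate channels to an input tensor.
import torch

def add_coordinate_channels(x: torch.Tensor) -> torch.Tensor:
    b, _, h, w = x.shape
    ys = torch.linspace(-1.0, 1.0, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
    xs = torch.linspace(-1.0, 1.0, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
    # Two extra channels deliberately break translation invariance.
    return torch.cat([x, ys, xs], dim=1)
```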
18. Zyuzin, Vasily, Mikhail Ronkin, Sergey Porshnev, and Alexey Kalmykov. "Automatic Asbestos Control Using Deep Learning Based Computer Vision System". Applied Sciences 11, no. 22 (November 9, 2021): 10532. http://dx.doi.org/10.3390/app112210532.

Abstract:
The paper discusses the results of the research and development of an innovative deep learning-based computer vision system for fully automatic asbestos content (productivity) estimation in rock chunk (stone) veins in an open pit, within a time comparable to the work of specialists (about 10 min per open pit processing place). The discussed system is based on applying instance and semantic segmentation with artificial neural networks. A Mask R-CNN-based network architecture is applied to search images of an open pit for asbestos-containing rock chunks. A U-Net-based network architecture is applied to the segmentation of asbestos veins in the images of selected rock chunks. The designed system automatically searches for and takes images of the asbestos rocks in an open pit in the near-infrared range (NIR) and processes the obtained images. The result of the system's work is the average asbestos content (productivity) estimate for each controlled open pit. Asbestos content is estimated as the graduated average ratio of the vein area to the selected rock chunk area, both determined by the trained neural networks. For both neural network training tasks, training, validation, and test datasets were collected. The designed system demonstrates an error of about 0.4% under different weather conditions in an open pit when the asbestos content is about 1.5–4%. The obtained accuracy is sufficient to use the system as a geological service tool instead of the currently applied visual-based estimations.
19. Park, Gyuseok, and Sangmin Lee. "Environmental Noise Classification Using Convolutional Neural Networks with Input Transform for Hearing Aids". International Journal of Environmental Research and Public Health 17, no. 7 (March 27, 2020): 2270. http://dx.doi.org/10.3390/ijerph17072270.

Abstract:
Hearing aids are essential for people with hearing loss, and noise estimation and classification are some of the most important technologies used in such devices. This paper presents an environmental noise classification algorithm for hearing aids that uses convolutional neural networks (CNNs) and image signals transformed from sound signals. The algorithm was developed using data on ten types of noise acquired from the living environments where such noises occur. Spectrogram images transformed from the sound data are used as the input of the CNNs after the images are processed with a sharpening mask and a median filter. The classification results of the proposed algorithm were compared with those of other noise classification methods. A maximum correct classification accuracy of 99.25% was achieved by the proposed algorithm for a spectrogram time length of 1 s, with the accuracy decreasing as the spectrogram time length increases up to 8 s. For a spectrogram time length of 8 s and using the sharpening mask and median filter, the classification accuracy was 98.73%, which is comparable with the 98.79% achieved by the conventional method for a time length of 1 s. The proposed hearing aid noise classification algorithm thus offers less computational complexity without compromising performance.
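The image-style preprocessing described above (sharpening mask, then median filter) can be reproduced with standard filters; a minimal sketch (sigma, sharpening amount, and window size are illustrative, not the paper's settings):

```python
# Unsharp-mask sharpening followed by median filtering of a 2-D spectrogram.
import numpy as np
from scipy.ndimage import gaussian_filter, median_filter

def preprocess_spectrogram(spec: np.ndarray) -> np.ndarray:
    blurred = gaussian_filter(spec, sigma=1.0)
    sharpened = spec + 1.0 * (spec - blurred)   # unsharp (sharpening) mask
    return median_filter(sharpened, size=3)     # suppress impulsive artifacts
```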
20. Ren, Guanghao, Yun Wang, Zhenyun Shi, Guigang Zhang, Feng Jin, and Jian Wang. "Aero-Engine Remaining Useful Life Estimation Based on CAE-TCN Neural Networks". Applied Sciences 13, no. 1 (December 20, 2022): 17. http://dx.doi.org/10.3390/app13010017.

Abstract:
With the rapid growth of the aviation field, remaining useful life (RUL) estimation of aero-engines has become a focus of the industry. Owing to the shortcomings of existing prediction methods, life prediction has reached a bottleneck. Aiming at the low efficiency of traditional estimation algorithms, a more efficient neural network is proposed by using Convolutional Neural Networks (CNN) to replace Long Short-Term Memory (LSTM). Firstly, multi-sensor degradation information fusion coding is realized with a convolutional autoencoder (CAE). Then, a temporal convolutional network (TCN) is applied to achieve efficient prediction from the obtained degradation code. It does not depend on iteration along time, but learns causality through a mask. Moreover, the data processing is improved to further increase the application efficiency of the algorithm. An ExtraTreesClassifier is applied to recognize when a failure first develops. This step can not only assist labelling, but also realize feature filtering combined with tree model interpretation. For multiple operating conditions, new features are clustered by K-means++ to encode historical condition information. Finally, an experiment is carried out to evaluate the effectiveness on the Commercial Modular Aero-Propulsion System Simulation (CMAPSS) datasets provided by the National Aeronautics and Space Administration (NASA). The results show that the proposed algorithm ensures high-precision prediction and effectively improves efficiency.
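The "causality through a mask" of a TCN is usually realized as causal convolution: left-only padding so that output t sees only inputs up to t. A minimal sketch (the (batch, channels, time) layout is an assumption):

```python
# Causal 1-D convolution, the building block of a temporal convolutional network.
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, kernel: int, dilation: int = 1):
        super().__init__()
        self.pad = (kernel - 1) * dilation        # pad on the left only
        self.conv = nn.Conv1d(in_ch, out_ch, kernel, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = nn.functional.pad(x, (self.pad, 0))   # (left, right) padding of time axis
        return self.conv(x)
```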
21. Syukron Abu Ishaq Alfarozi and Azkario Rizky Pratama. "CNN-Based Model for Copy Detection Pattern Estimation and Authentication". Jurnal Nasional Teknik Elektro dan Teknologi Informasi 12, no. 1 (February 22, 2023): 44–49. http://dx.doi.org/10.22146/jnteti.v12i1.6205.

Abstract:
Counterfeiting has been one of the crimes of the 21st century. One of the methods to combat product counterfeiting is a copy detection pattern (CDP) stamped on the product. A CDP is a copy-sensitive pattern whose quality degrades after the print-and-scan process. The amount of information loss is used to distinguish between original and fake CDPs. This paper proposes a CDP estimation model based on a convolutional neural network (CNN), namely CDP-CNN. CDP-CNN addresses the spatial dependency of the image patch and thus should outperform the state-of-the-art model that uses a multi-layer perceptron (MLP) architecture. The proposed model had an estimation bit error rate (BER) of 9.91% with the batch estimation method. The error rate was 9% lower than the previous method that used an autoencoder MLP model. The proposed model also had fewer parameters than the previous method. The effect of preprocessing, namely the use of an unsharp mask, was tested using a statistical testing method; it showed no significant difference except in the batch estimation scheme, where the unsharp mask filter reduced the error rate by at least 0.5%. In addition, the proposed model was also used for authentication. Authentication using the estimation model produced a good separation of distributions for distinguishing fake and original CDPs. Thus, the CDP can still be used as an authentication method with reliable performance. It supports anti-counterfeiting in product distribution and reduces negative impacts on various sectors of the economy.
22. Lin, Liyu, Chaoran She, Yun Chen, Ziyu Guo, and Xiaoyang Zeng. "TB-NET: A Two-Branch Neural Network for Direction of Arrival Estimation under Model Imperfections". Electronics 11, no. 2 (January 11, 2022): 220. http://dx.doi.org/10.3390/electronics11020220.

Abstract:
For direction of arrival (DoA) estimation, data-driven deep-learning methods have an advantage over model-based methods since they are more robust against model imperfections. Conventionally, networks are based solely on regression or classification, which may lead to unstable training and limited resolution. Alternatively, this paper proposes a two-branch neural network (TB-Net) that combines classification and regression in parallel. The grid-based classification branch is optimized by binary cross-entropy (BCE) loss and provides a mask that indicates the existence of DoAs at predefined grid points. The regression branch refines the DoA estimates by predicting the deviations from the grid points. At the output layer, the outputs of the two branches are combined to obtain the final DoA estimates. To achieve a lightweight model, only convolutional layers are used in the proposed TB-Net. The simulation results demonstrated that, compared with model-based and existing deep-learning methods, the proposed method can achieve higher DoA estimation accuracy in the presence of model imperfections, while having a size of only 1.8 MB.
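How the two branches could be fused at the output layer is a small amount of bookkeeping; a minimal sketch (the uniform 1° grid and the 0.5 detection threshold are assumptions, not the paper's exact choices):

```python
# Combine a grid-classification mask with regressed grid deviations into DoAs.
import numpy as np

def combine_branches(grid_deg: np.ndarray, class_probs: np.ndarray,
                     offsets_deg: np.ndarray, thresh: float = 0.5) -> np.ndarray:
    """DoA = grid angle + regressed deviation, wherever the classifier fires."""
    active = class_probs > thresh
    return grid_deg[active] + offsets_deg[active]

grid = np.arange(-60.0, 61.0, 1.0)            # candidate angles on a 1-degree grid
probs = np.zeros_like(grid); probs[75] = 0.9  # classifier fires near +15 degrees
offs = np.zeros_like(grid); offs[75] = 0.37   # regression refines within the grid cell
print(combine_branches(grid, probs, offs))    # [15.37]
```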
23. Choi, Ho-Hyoung, Hyun-Soo Kang, and Byoung-Ju Yun. "CNN-Based Illumination Estimation with Semantic Information". Applied Sciences 10, no. 14 (July 13, 2020): 4806. http://dx.doi.org/10.3390/app10144806.

Abstract:
For more than a decade, both academia and industry have focused attention on computer vision and, in particular, computational color constancy (CVCC). CVCC is used as a fundamental preprocessing task in a wide range of computer vision applications. While the human visual system (HVS) has the innate ability to perceive constant surface colors of objects under varying illumination spectra, computer vision inherently faces the color constancy challenge. Accordingly, this article proposes a novel convolutional neural network (CNN) architecture based on the residual neural network, which consists of pre-activation, atrous (dilated) convolution, and batch normalization. The proposed network can automatically decide what to learn from input image data and how to pool without supervision. When receiving input image data, the proposed network crops each image into image patches prior to training. Once the network begins learning, local semantic information is automatically extracted from the image patches and fed to its novel pooling layer. As a result of the semantic pooling, a weighted map, or mask, is generated. Simultaneously, the extracted information is estimated and combined to form global information during training. The use of the novel pooling layer enables the proposed network to distinguish between useful data and noisy data, and thus efficiently remove noisy data during learning and evaluation. The main contribution of the proposed network is taking CVCC to higher accuracy and efficiency by adopting the novel pooling method. The experimental results demonstrate that the proposed network outperforms its conventional counterparts in estimation accuracy.
24. Selvaraj, Poovarasan, and E. Chandra. "Ideal ratio mask estimation using supervised DNN approach for target speech signal enhancement". Journal of Intelligent & Fuzzy Systems 42, no. 3 (February 2, 2022): 1869–83. http://dx.doi.org/10.3233/jifs-211236.

Abstract:
The most challenging task in recent Speech Enhancement (SE) systems is to exclude non-stationary noises and additive white Gaussian noise in real-time applications. Several suggested SE techniques were not successful at eliminating noises from speech signals in real-time scenarios because of their high resource utilization. A Sliding Window Empirical Mode Decomposition including a Variant of Variational Model Decomposition and Hurst (SWEMD-VVMDH) technique was therefore developed to reduce the difficulty in real-time applications. However, this is a statistical framework that requires long computation times. Hence, in this article, the SWEMD-VVMDH technique is extended using a Deep Neural Network (DNN) that efficiently learns the speech signals decomposed via SWEMD-VVMDH to achieve SE. At first, the noisy speech signals are decomposed into Intrinsic Mode Functions (IMFs) by the SWEMD Hurst (SWEMDH) technique. Then, Time-Delay Estimation (TDE)-based VVMD is performed on the IMFs to select the most relevant IMFs according to the Hurst exponent and to lessen the low- as well as high-frequency noise elements in the speech signal. For each signal frame, the target features are chosen and fed to the DNN, which learns these features to estimate the Ideal Ratio Mask (IRM) in a supervised manner. The abilities of the DNN are enhanced across categories of background noise and Signal-to-Noise Ratios (SNR) of the speech signals. Also, the noise category dimension and the SNR dimension are chosen for training and testing multiple DNNs, since these dimensions are often taken into account in SE systems. Further, the IRM in each frequency channel for all noisy signal samples is concatenated to reconstruct the noiseless speech signal. At last, the experimental outcomes exhibit considerable improvement in SE under different categories of noises.
25. Hua, Jiang, Tonglin Hao, Liangcai Zeng, and Gui Yu. "An Improved Estimation Algorithm of Space Targets Pose Based on Multi-Modal Feature Fusion". Mathematics 9, no. 17 (August 29, 2021): 2085. http://dx.doi.org/10.3390/math9172085.

Abstract:
Traditional methods for estimating the pose of space targets are based on handcrafted features that match the transformation relationship between the image and the object model. With the explosion of deep learning technology, approaches based on deep neural networks (DNN) have significantly improved the performance of pose estimation. However, current methods still have problems such as complex calculation, low accuracy, and poor real-time performance. Therefore, a new pose estimation algorithm is proposed in this paper. Firstly, the mask image of the target is obtained by an instance segmentation algorithm, and its point cloud image is obtained from a depth map combined with the camera parameters. Then, the correlation among points is established to realize pose prediction based on multi-modal feature fusion. Experimental results on the YCB-Video dataset show that the proposed algorithm can recognize complex images at a speed of about 24 images per second with an accuracy of more than 80%. In conclusion, the proposed algorithm can realize fast pose estimation for complex stacked objects and has strong stability for different objects.
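The depth-map-to-point-cloud step is standard pinhole back-projection; a minimal sketch (intrinsics fx, fy, cx, cy assumed known, zero depth treated as invalid):

```python
# Back-project a depth map into an N x 3 point cloud with pinhole intrinsics.
import numpy as np

def depth_to_point_cloud(depth: np.ndarray, fx: float, fy: float,
                         cx: float, cy: float) -> np.ndarray:
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    pts = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]                        # drop pixels without depth
```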
26. Qin, Mohan, Li Li, and Shoji Makino. "Deep Complex-Valued Neural Network-Based Triple-Path Mask and Steering Vector Estimation for Multichannel Target Speech Separation". Journal of Signal Processing 27, no. 4 (July 1, 2023): 87–91. http://dx.doi.org/10.2299/jsp.27.87.

27. Clarke, Garry K. C., Etienne Berthier, Christian G. Schoof, and Alexander H. Jarosch. "Neural Networks Applied to Estimating Subglacial Topography and Glacier Volume". Journal of Climate 22, no. 8 (April 15, 2009): 2146–60. http://dx.doi.org/10.1175/2008jcli2572.1.

Abstract:
Abstract To predict the rate and consequences of shrinkage of the earth’s mountain glaciers and ice caps, it is necessary to have improved regional-scale models of mountain glaciation and better knowledge of the subglacial topography upon which these models must operate. The problem of estimating glacier ice thickness is addressed by developing an artificial neural network (ANN) approach that uses calculations performed on a digital elevation model (DEM) and on a mask of the present-day ice cover. Because suitable data from real glaciers are lacking, the ANN is trained by substituting the known topography of ice-denuded regions adjacent to the ice-covered regions of interest, and this known topography is hidden by imagining it to be ice-covered. For this training it is assumed that the topography is flooded to various levels by horizontal lake-like glaciers. The validity of this assumption and the estimation skill of the trained ANN is tested by predicting ice thickness for four 50 km × 50 km regions that are currently ice free but that have been partially glaciated using a numerical ice dynamics model. In this manner, predictions of ice thickness based on the neural network can be compared to the modeled ice thickness and the performance of the neural network can be evaluated and improved. From the results, thus far, it is found that ANN depth estimates can yield plausible subglacial topography with a representative rms elevation error of ±70 m and remarkably good estimates of ice volume.
28. Wu, Di, and Aiping Xiao. "Deep Learning-Based Algorithm for Recognizing Tennis Balls". Applied Sciences 12, no. 23 (November 26, 2022): 12116. http://dx.doi.org/10.3390/app122312116.

Abstract:
In this paper, we adjust the hyperparameters of the training model based on gradient estimation theory, optimize the structure of the model based on the loss function theory of the Mask R-CNN convolutional network, and propose a scheme that helps a tennis-picking robot perform target recognition and improves its ability to acquire and analyze image information. Suitable image samples of tennis balls are collected and trained with the Mask R-CNN convolutional network to output an algorithmic model dedicated to recognizing tennis balls; the final values of the various loss functions after gradient descent are recorded, the iteration graph of the model is drawn, and the behavior of the neural network at different iteration levels is observed. Finally, this improved and optimized recognition algorithm is compared with other tennis ball recognition algorithms. The experimental results show that the improved algorithm based on Mask R-CNN recognizes tennis balls with 92% accuracy between iteration levels 30 and 35, offering higher accuracy and recognition distance than other tennis ball recognition algorithms and confirming the feasibility and applicability of the optimized algorithm in this paper.
29. Gundu, Sireesha, and Hussain Syed. "Vision-Based HAR in UAV Videos Using Histograms and Deep Learning Techniques". Sensors 23, no. 5 (February 25, 2023): 2569. http://dx.doi.org/10.3390/s23052569.

Abstract:
Activity recognition for unmanned aerial vehicle (UAV) surveillance is addressed in various computer vision applications such as image retrieval, pose estimation, object detection (in still images, video frames, and videos), face recognition, and video action recognition. In UAV-based surveillance technology, video segments captured from aerial vehicles make it challenging to recognize and distinguish human behavior. In this research, to recognize single and multi-human activities from aerial data, a hybrid model of histogram of oriented gradients (HOG), mask-regional convolutional neural network (Mask-RCNN), and bidirectional long short-term memory (Bi-LSTM) is employed. The HOG algorithm extracts patterns, Mask-RCNN extracts feature maps from the raw aerial image data, and the Bi-LSTM network exploits the temporal relationship between the frames for the underlying action in the scene. The Bi-LSTM network substantially reduces the error rate thanks to its bidirectional processing. This novel architecture generates enhanced segmentation by utilizing histogram gradient-based instance segmentation and improves the accuracy of classifying human activities using the Bi-LSTM approach. Experimental outcomes demonstrate that the proposed model outperforms other state-of-the-art models, achieving 99.25% accuracy on the YouTube-Aerial dataset.
30. Ulutas, Esra Gungor, and Cemil Altin. "Kiwi Fruit Detection with Deep Learning Methods". International Journal of Advanced Natural Sciences and Engineering Researches 7, no. 7 (August 9, 2023): 39–45. http://dx.doi.org/10.59287/ijanser.1333.

Abstract:
The automatic detection of kiwifruit in orchards is a challenging task due to the similarity between the fruit and the complex backgrounds formed by branches and stems. Moreover, the traditional method of hand-picking kiwifruit heavily relies on human labor and affects the overall yield. This study focuses on the fast and accurate detection of kiwifruit in natural orchard environments, which is crucial for yield estimation and cost reduction. Two deep learning methods, Faster Region-based Convolutional Neural Network (Faster R-CNN) and Mask Region-based Convolutional Neural Network (Mask R-CNN), are utilized for kiwifruit detection, and their results are compared. The study begins with obtaining images of kiwi trees from the Güngör farm in Samsun Çarşamba and creating an original dataset. Preprocessing techniques are applied to improve the dataset, followed by detection using the Faster R-CNN method. Different pre-trained architectures like SqueezeNet and MobileNetV3 are used, achieving average precision (mAP) values of 87.4% and 88.8%, respectively. In the second part of the study, kiwifruit images are processed using the ResNet50-based Mask R-CNN method, which achieves a higher mAP value of 98.48%. The experimental results demonstrate the applicability and effectiveness of the proposed deep learning models for real-time kiwifruit detection in orchards. Accurate kiwifruit detection allows farmers to optimize yield prediction, reduce costs, and improve productivity. The application of Faster R-CNN and Mask R-CNN in this study showcases their potential for enhancing the efficiency and accuracy of kiwifruit detection in orchard environments.
31. Tengtrairat, Naruephorn, Wai Lok Woo, Phetcharat Parathai, Damrongsak Rinchumphu, and Chatchawan Chaichana. "Non-Intrusive Fish Weight Estimation in Turbid Water Using Deep Learning and Regression Models". Sensors 22, no. 14 (July 10, 2022): 5161. http://dx.doi.org/10.3390/s22145161.

Abstract:
Underwater fish monitoring is one of the most challenging problems for efficiently feeding and harvesting fish while remaining environmentally friendly. The proposed 2D computer vision method aims to non-intrusively estimate the weight of Tilapia fish in turbid water environments. Additionally, the proposed method avoids the use of high-cost stereo cameras and instead uses only a low-cost video camera to observe underwater life through a single-channel recording. An in-house curated Tilapia-image dataset and a Tilapia-file dataset with various ages of Tilapia are used. The proposed method consists of a Tilapia detection step and a Tilapia weight-estimation step. A Mask Recurrent-Convolutional Neural Network model is first trained for detecting and extracting the image dimensions (i.e., in terms of image pixels) of the fish. Secondly, in the Tilapia weight-estimation step, the proposed method estimates the depth of the fish in the tanks and then converts the Tilapia's extracted image dimensions from pixels to centimeters. Subsequently, the Tilapia's weight is estimated by a trained model based on regression learning. Linear regression, random forest regression, and support vector regression were developed to determine the best model for weight estimation. The experimental results demonstrate that the proposed method yields a Mean Absolute Error of 42.54 g, an R2 of 0.70, and an average weight error of 30.30 (±23.09) grams in a turbid water environment, which shows the practicality of the proposed framework.
32. Costa, Pedro, Asim Smailagic, Jaime Cardoso, and Aurélio Campilho. "Epistemic and Heteroscedastic Uncertainty Estimation in Retinal Blood Vessel Segmentation". U.Porto Journal of Engineering 7, no. 3 (April 30, 2021): 93–100. http://dx.doi.org/10.24840/2183-6493_007.003_0008.

Abstract:
Current state-of-the-art medical image segmentation methods require high-quality datasets to obtain good performance. However, medical specialists often disagree on diagnoses; hence, datasets contain contradictory annotations. This, in turn, leads to difficulties in the optimization process of deep learning models and hinders performance. We propose a method to estimate uncertainty in Convolutional Neural Network (CNN) segmentation models that makes the training of CNNs more robust to contradictory annotations. In this work, we model two types of uncertainty, heteroscedastic and epistemic, without adding any supervisory signal other than the ground-truth segmentation mask. As expected, the uncertainty is higher closer to vessel boundaries and on top of thinner and less visible vessels, where it is more likely for medical specialists to disagree. Therefore, our method is more suitable for learning from datasets created with heterogeneous annotators. We show that there is a correlation between the uncertainty estimated by our method and the disagreement between segmentations provided by two different medical specialists. Furthermore, by explicitly modeling the uncertainty, the Intersection over Union of the segmentation network improves by 5.7 percentage points.
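A widely used way to learn heteroscedastic uncertainty without extra labels (after Kendall and Gal) is to let the network predict a log-variance next to each output; whether this is the exact loss of the paper is an assumption, but it illustrates the mechanism:

```python
# Heteroscedastic regression loss: predicted log-variance gates the residual.
import torch

def heteroscedastic_loss(pred: torch.Tensor, log_var: torch.Tensor,
                         target: torch.Tensor) -> torch.Tensor:
    # Large predicted variance down-weights the residual but is itself penalized,
    # so the network raises uncertainty only where annotations conflict.
    return (0.5 * torch.exp(-log_var) * (pred - target) ** 2 + 0.5 * log_var).mean()
```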
33. Chernyavskiy, A. N., and I. P. Malakhov. "CNN-based visual analysis to study local boiling characteristics". Journal of Physics: Conference Series 2119, no. 1 (December 1, 2021): 012068. http://dx.doi.org/10.1088/1742-6596/2119/1/012068.

Abstract:
Visual analysis allows estimation of various local boiling characteristics, including bubble growth rate, departure diameters and nucleation frequencies, nucleation site density, and the evolution of bubbles and dry spots over time. At the same time, visual determination of these characteristics for large amounts of data requires appropriate software that allows not only determination of bubble locations but also estimation of their sizes from high-speed video. This problem can be solved using an instance segmentation approach based on a convolutional neural network. In the presented work, the Mask R-CNN network architecture was used for estimation of the local boiling characteristics.
34. Islam, Md Mahbubul, and Joong-Hwan Baek. "A Hierarchical Approach toward Prediction of Human Biological Age from Masked Facial Image Leveraging Deep Learning Techniques". Applied Sciences 12, no. 11 (May 24, 2022): 5306. http://dx.doi.org/10.3390/app12115306.

Abstract:
The lifestyle of humans has changed noticeably since the contagious COVID-19 disease struck globally. People should wear a face mask as a protective measure to curb the spread of the contagious disease. Consequently, real-world applications (e.g., electronic customer relationship management) dealing with human ages extracted from face images must migrate to a robust system capable of estimating the age of a person wearing a face mask. In this paper, we propose a hierarchical age estimation model from masked facial images in a group-to-specific manner, rather than a single regression model, because age progression across different age groups is quite dissimilar. Our intention was to squeeze the feature space among limited age classes so that the model could fairly discern age. We generated a synthetic masked face image dataset over the IMDB-WIKI face image dataset to train and validate our proposed model, due to the absence of a benchmark masked face image dataset with real age annotations. We somewhat mitigated the data sparsity problem of the large public IMDB-WIKI dataset using off-the-shelf down-sampling and up-sampling techniques as required. The age estimation task was fully modeled as a deep classification problem, and expected ages were formulated from SoftMax probabilities. We performed the classification task by deploying multiple low-memory, higher-accuracy convolutional neural networks (CNNs). Our proposed hierarchical framework demonstrated a marginal improvement in terms of mean absolute error (MAE) compared to the one-off model approach for masked face real age estimation. Moreover, this research is perhaps the maiden attempt to estimate the real age of a person from his/her masked face image.
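Formulating an expected age from SoftMax probabilities is a one-liner; a minimal sketch (the 1–100 year class range and the random probabilities are placeholders, not the paper's configuration):

```python
# Expected age as the probability-weighted mean of the class bins.
import numpy as np

def expected_age(probs: np.ndarray, ages: np.ndarray) -> float:
    """E[age] = sum_i p_i * age_i over the classification bins."""
    return float(np.dot(probs, ages))

ages = np.arange(1, 101)                   # one class per year of age
probs = np.random.dirichlet(np.ones(100))  # placeholder for SoftMax output
print(expected_age(probs, ages))
```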
35. Xiong, Feng, Chengju Liu, and Qijun Chen. "Region Pixel Voting Network (RPVNet) for 6D Pose Estimation from Monocular Image". Applied Sciences 11, no. 2 (January 14, 2021): 743. http://dx.doi.org/10.3390/app11020743.

Abstract:
Recent studies have shown that deep learning achieves superior results in the task of estimating the 6D pose of a target object from an image. End-to-end techniques use deep networks to predict the pose directly from the image, avoiding the limitations of handcrafted features, but rely on the training dataset to deal with occlusion. Two-stage algorithms alleviate this problem by finding keypoints in the image and then solving the Perspective-n-Point (PnP) problem, avoiding a direct fit of the transformation from image space to 6D-pose space. This paper proposes a novel two-stage method using only local features for pixel voting, called the Region Pixel Voting Network (RPVNet). A front-end network detects the target object and predicts its direction maps, from which the keypoints are recovered by pixel voting using Random Sample Consensus (RANSAC). The backbone, object detection network, and mask prediction network of RPVNet are designed based on Mask R-CNN. A direction map is a vector field in which each point's direction points to its source keypoint. It is shown that predicting an object's keypoints is related to its own pixels and independent of other pixels, which means the influence of occlusion decreases in the object's region. Based on this observation, in RPVNet, local features instead of the whole features, i.e., the output of the backbone, are used by a well-designed Convolutional Neural Network (CNN) to compute direction maps. The local features are extracted from the whole features through RoIAlign, based on the region provided by the detection network. Experiments on the LINEMOD dataset show that RPVNet's average accuracy (86.1%) is almost equal to the state of the art (86.4%) when no occlusion occurs. Meanwhile, results on the Occlusion LINEMOD dataset show that RPVNet outperforms the state of the art (43.7% vs. 40.8%) and is more accurate for small objects in occluded scenes.
APA, Harvard, Vancouver, ISO styles, etc.
36

Wang, Bodi, Guixiong Liu and Junfang Wu. "Blind Deblurring of Saturated Images Based on Optimization and Deep Learning for Dynamic Visual Inspection on the Assembly Line". Symmetry 11, no. 5 (16.05.2019): 678. http://dx.doi.org/10.3390/sym11050678.

Full source text
Abstract:
Image deblurring can improve visual quality and mitigate motion blur for dynamic visual inspection. We propose a method to deblur saturated images for dynamic visual inspection by applying blur kernel estimation and deconvolution modeling. The blur kernel is estimated in a transform domain, whereas the deconvolution model is decoupled into deblurring and denoising stages via variable splitting. The deblurring stage predicts a mask specifying saturated pixels, which are then discarded, and the denoising stage is learned via the fast and flexible denoising network (FFDNet), a convolutional neural network (CNN) trained over a wide range of noise levels. Hence, the proposed deconvolution model combines the benefits of model-based optimization and deep learning. Experiments demonstrate that the proposed method restores visual quality well and outperforms existing approaches with clear score improvements.
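The variable-splitting idea described here can be sketched as a small half-quadratic splitting loop: a masked data-fidelity (deblurring) step that ignores saturated pixels, alternated with a denoising step. In the sketch below, a Gaussian filter stands in for the learned FFDNet denoiser, the step size assumes a normalized blur kernel, and the saturation threshold is an assumed value.

```python
import numpy as np
from scipy.signal import fftconvolve
from scipy.ndimage import gaussian_filter

def deblur_hqs(y, kernel, n_iters=10, rho=0.01, sat_thresh=0.95):
    """Half-quadratic splitting sketch: masked deblurring + denoising."""
    mask = (y < sat_thresh).astype(float)   # discard (near-)saturated pixels
    x, z = y.copy(), y.copy()
    for _ in range(n_iters):
        # Deblurring: gradient steps on ||M (k*x - y)||^2 + rho ||x - z||^2.
        for _ in range(5):
            r = mask * (fftconvolve(x, kernel, mode="same") - y)
            grad = fftconvolve(r, kernel[::-1, ::-1], mode="same") + rho * (x - z)
            x -= 0.5 * grad                 # step size assumes sum(kernel) == 1
        # Denoising: a deep denoiser in the paper; Gaussian here as a stand-in.
        z = gaussian_filter(x, sigma=1.0)
    return np.clip(x, 0.0, 1.0)
```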
APA, Harvard, Vancouver, ISO styles, etc.
37

Li, Yung-Hui, Wenny Ramadha Putri, Muhammad Saqlain Aslam and Ching-Chun Chang. "Robust Iris Segmentation Algorithm in Non-Cooperative Environments Using Interleaved Residual U-Net". Sensors 21, no. 4 (18.02.2021): 1434. http://dx.doi.org/10.3390/s21041434.

Full source text
Abstract:
Iris segmentation plays a critical role in iris recognition systems: accurate recognition requires correct segmentation. However, the efficiency and robustness of traditional iris segmentation methods are severely challenged in non-cooperative environments by unfavorable factors such as occlusion, blur, low resolution, off-axis gaze, motion, and specular reflections, all of which reduce segmentation accuracy. In this paper, we present a novel iris segmentation algorithm that localizes the outer and inner boundaries of the iris. We propose a neural network model called "Interleaved Residual U-Net" (IRUNet) for semantic segmentation and iris mask synthesis. K-means clustering is applied to select a set of saliency points to recover the outer boundary of the iris, whereas the inner boundary is recovered from another set of saliency points on the inner side of the mask. Experimental results demonstrate that the proposed algorithm achieves mean IoU values of 98.9% and 97.7% for inner and outer boundary estimation, respectively, outperforming existing approaches on the challenging CASIA-Iris-Thousand database.
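The boundary-recovery step can be illustrated with a least-squares circle fit over candidate boundary points; the Kasa fit below replaces the paper's K-means saliency-point selection purely for illustration, and the sample points are synthetic.

```python
import numpy as np

def fit_circle(points):
    """Least-squares (Kasa) circle fit to (N, 2) candidate boundary points."""
    x, y = points[:, 0], points[:, 1]
    A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    b = x ** 2 + y ** 2
    (cx, cy, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    return (cx, cy), float(np.sqrt(c + cx ** 2 + cy ** 2))

# Noisy points sampled near a circle of radius 60 centered at (100, 100).
t = np.linspace(0, 2 * np.pi, 50)
pts = np.column_stack([100 + 60 * np.cos(t), 100 + 60 * np.sin(t)])
pts += np.random.default_rng(0).normal(0.0, 1.0, pts.shape)
print(fit_circle(pts))   # center near (100, 100), radius near 60
```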
APA, Harvard, Vancouver, ISO styles, etc.
38

Long, Libo, and Jochen Lang. "Regularization for Unsupervised Learning of Optical Flow". Sensors 23, no. 8 (18.04.2023): 4080. http://dx.doi.org/10.3390/s23084080.

Full source text
Abstract:
Regularization is an important technique for training deep neural networks. In this paper, we propose a novel shared-weight teacher–student strategy and a content-aware regularization (CAR) module. Based on a tiny, learnable, content-aware mask, CAR is randomly applied to some channels of the convolutional layers during training to guide predictions in the shared-weight teacher–student strategy. CAR prevents co-adaptation in unsupervised motion estimation methods. Extensive experiments on optical flow and scene flow estimation show that our method significantly improves the performance of the original networks and surpasses other popular regularization methods. It also surpasses all variants with similar architectures, as well as the supervised PWC-Net, on MPI-Sintel and KITTI. Our method shows strong cross-dataset generalization: trained solely on MPI-Sintel, it outperforms a similarly trained supervised PWC-Net on the KITTI benchmarks by 27.9% and 32.9%. Our method uses fewer parameters, requires less computation, and has faster inference than the original PWC-Net.
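A CAR-like module can be sketched in a few lines of PyTorch: a tiny learnable convolution produces a content-aware mask that is applied to a random subset of channels during training only. The module below is a plausible reading of the abstract; the mask network and the drop ratio are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ContentAwareRegularization(nn.Module):
    """Sketch: learnable content-aware mask on randomly chosen channels."""
    def __init__(self, channels, drop_ratio=0.25):
        super().__init__()
        self.mask_net = nn.Conv2d(channels, 1, kernel_size=3, padding=1)
        self.drop_ratio = drop_ratio

    def forward(self, x):
        if not self.training:       # regularization is active only in training
            return x
        mask = torch.sigmoid(self.mask_net(x))              # (B, 1, H, W)
        n_drop = int(x.shape[1] * self.drop_ratio)
        idx = torch.randperm(x.shape[1], device=x.device)[:n_drop]
        out = x.clone()
        out[:, idx] = x[:, idx] * mask                      # mask chosen channels
        return out
```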
APA, Harvard, Vancouver, ISO styles, etc.
39

Busch, Nils, Andreas Rausch and Thomas Schanze. "Estimation of accuracy loss by training a deep-learning-based cell organelle recognition software using full dataset and a reduced dataset containing only subviral particle distribution information". Current Directions in Biomedical Engineering 7, no. 2 (1.10.2021): 183–86. http://dx.doi.org/10.1515/cdbme-2021-2047.

Full source text
Abstract:
In collaboration with the Institute of Virology, Philipps University, Marburg, a deep-learning-based method that recognizes and classifies cell organelles based on the distribution of subviral particles in fluorescence microscopy images of virus-infected cells has been further developed. In this work, a method for recognizing cell organelles from partial image information is extended. The focus is on quantifying the loss of accuracy when an adapted Mask R-CNN is provided with information about subviral particles only, rather than about all cell organelles. Our results show that the subviral particle distribution carries information about cell morphology, making it usable for cell organelle labelling.
APA, Harvard, Vancouver, ISO styles, etc.
40

Hayatbini, Negin, Bailey Kong, Kuo-lin Hsu, Phu Nguyen, Soroosh Sorooshian, Graeme Stephens, Charless Fowlkes and Ramakrishna Nemani. "Conditional Generative Adversarial Networks (cGANs) for Near Real-Time Precipitation Estimation from Multispectral GOES-16 Satellite Imageries—PERSIANN-cGAN". Remote Sensing 11, no. 19 (20.09.2019): 2193. http://dx.doi.org/10.3390/rs11192193.

Full source text
Abstract:
In this paper, we present a state-of-the-art precipitation estimation framework that leverages advances in satellite remote sensing as well as Deep Learning (DL). The framework takes advantage of the improved spatial, spectral, and temporal resolutions of the Advanced Baseline Imager (ABI) onboard the GOES-16 platform, along with elevation information, to improve precipitation estimates. The procedure first derives a Rain/No Rain (R/NR) binary mask by classifying pixels and then applies regression to estimate the amount of rainfall for rainy pixels. A Fully Convolutional Network is used as the regressor to predict precipitation estimates. The network is trained using non-saturating conditional Generative Adversarial Network (cGAN) and Mean Squared Error (MSE) loss terms to better learn the complex distribution of precipitation in the observed data. Common verification metrics such as Probability Of Detection (POD), False Alarm Ratio (FAR), Critical Success Index (CSI), bias, correlation, and MSE are used to evaluate the accuracy of both the R/NR classification and the real-valued precipitation estimates. Statistics and visualizations of the evaluation measures show improved precipitation retrieval accuracy in the proposed framework compared to baseline models trained using conventional MSE loss terms. This framework is proposed as an augmentation of the PERSIANN-CCS (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks–Cloud Classification System) algorithm for estimating global precipitation.
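The two-step estimation and the combined training objective can be sketched as follows; the pix2pix-style loss weighting and the 0.5 mask threshold are assumptions for illustration, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def generator_loss(disc_fake_logits, pred_rain, true_rain, lam=100.0):
    """Non-saturating adversarial term plus an MSE term on the regression."""
    adv = F.binary_cross_entropy_with_logits(
        disc_fake_logits, torch.ones_like(disc_fake_logits))
    return adv + lam * F.mse_loss(pred_rain, true_rain)

def precip_estimate(rnr_logits, rain_amounts):
    """Apply the Rain/No-Rain binary mask before reporting rain amounts."""
    mask = (torch.sigmoid(rnr_logits) > 0.5).float()
    return mask * rain_amounts
```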
APA, Harvard, Vancouver, ISO styles, etc.
41

Wang, Hongfeng, Jianzhong Wang, Kemeng Bai and Yong Sun. "Centered Multi-Task Generative Adversarial Network for Small Object Detection". Sensors 21, no. 15 (31.07.2021): 5194. http://dx.doi.org/10.3390/s21155194.

Full source text
Abstract:
Despite breakthroughs in the accuracy and efficiency of object detection with deep neural networks, the performance of small object detection remains far from satisfactory. Gaze estimation has advanced significantly with the development of visual sensors, and combining object detection with gaze estimation can significantly improve small object detection. This paper presents a centered multi-task generative adversarial network (CMTGAN) that combines small object detection and gaze estimation. To achieve this, we propose a generative adversarial network (GAN) capable of image super-resolution and two-stage small object detection. We exploit the generator in CMTGAN for image super-resolution and the discriminator for object detection. We introduce an artificial texture loss into the generator to retain the original features of small objects. We also use a centered mask in the generator so that the network focuses on the central part of images, where small objects are more likely to appear in our method. We propose a discriminator with a detection loss for two-stage small object detection, which can be adapted to other GANs for object detection. Compared with existing interpolation methods, the super-resolution images generated by CMTGAN are more explicit and contain more information. Experiments show that our method exhibits better detection performance than mainstream methods.
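One plausible form of the centered mask is a soft radial weighting multiplied into the generator's feature maps; the Gaussian shape and width below are assumptions, since the abstract does not specify the mask's exact form.

```python
import torch

def centered_mask(h, w, sigma_frac=0.5):
    """Soft mask emphasizing the image center (illustrative Gaussian form)."""
    ys = torch.linspace(-1.0, 1.0, h).view(-1, 1)
    xs = torch.linspace(-1.0, 1.0, w).view(1, -1)
    return torch.exp(-(xs ** 2 + ys ** 2) / (2 * sigma_frac ** 2))

# Weight generator features toward the center, where small objects
# are assumed to appear most often.
feats = torch.randn(1, 64, 32, 32)
feats = feats * centered_mask(32, 32)
```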
APA, Harvard, Vancouver, ISO styles, etc.
42

Zheng, Caiwang, Amr Abd-Elrahman, Vance M. Whitaker and Cheryl Dalid. "Deep Learning for Strawberry Canopy Delineation and Biomass Prediction from High-Resolution Images". Plant Phenomics 2022 (12.10.2022): 1–17. http://dx.doi.org/10.34133/2022/9850486.

Full source text
Abstract:
Modeling plant canopy biophysical parameters at the individual plant level remains a major challenge. This study presents a workflow for automatic strawberry canopy delineation and biomass prediction from high-resolution images using deep neural networks. High-resolution (5 mm) RGB orthoimages, near-infrared (NIR) orthoimages, and Digital Surface Models (DSM) generated by Structure from Motion (SfM) were utilized. Mask R-CNN was applied to orthoimages of two band combinations (RGB and RGB-NIR) to identify and delineate strawberry plant canopies. The average detection precision and recall were 97.28% and 99.71% for RGB images and 99.13% and 99.54% for RGB-NIR images, and the mean intersection over union (mIoU) for instance segmentation was 98.32% and 98.45% for RGB and RGB-NIR images, respectively. Based on the center of the canopy mask, we fed the cropped RGB, NIR, DSM, and mask images of individual plants into vanilla deep regression models to predict canopy leaf area and dry biomass, using two networks (VGG-16 and ResNet-50) as backbone architectures for feature extraction. The R² values of the dry biomass models were about 0.76 and 0.79 for the VGG-16 and ResNet-50 networks, respectively; the R² values for leaf area were 0.82 and 0.84. The RMSE values for dry biomass were approximately 8.31 and 8.73 g for VGG-16 and ResNet-50, respectively, and the leaf area RMSE was 0.05 m² for both networks. This work demonstrates the feasibility of deep learning networks for individual strawberry plant extraction and biomass estimation.
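The hand-off from instance segmentation to regression can be sketched as a fixed-size crop centered on each plant's mask centroid, stacking the RGB, NIR, DSM, and mask channels; the crop size is an assumed value, and plants are assumed to lie away from image borders.

```python
import numpy as np

def crop_around_mask(rgb, nir, dsm, mask, size=128):
    """Crop a patch centered on the mask centroid and stack all channels;
    the stacked patch would then feed a regression CNN backbone."""
    ys, xs = np.nonzero(mask)
    cy, cx = int(ys.mean()), int(xs.mean())
    half = size // 2
    sl = (slice(cy - half, cy + half), slice(cx - half, cx + half))
    return np.dstack([rgb[sl], nir[sl][..., None],
                      dsm[sl][..., None], mask[sl][..., None]])
```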
APA, Harvard, Vancouver, ISO styles, etc.
43

Shi, Furong, and Tong Zhang. "A Multi-Task Network with Distance–Mask–Boundary Consistency Constraints for Building Extraction from Aerial Images". Remote Sensing 13, no. 14 (6.07.2021): 2656. http://dx.doi.org/10.3390/rs13142656.

Full source text
Abstract:
Deep-learning technologies, especially convolutional neural networks (CNNs), have achieved great success in building extraction from aerial images. However, shape details are often lost during down-sampling, which results in discontinuous segmentation or inaccurate segmentation boundaries. To compensate for the loss of shape information, two shape-related auxiliary tasks (boundary prediction and distance estimation) are jointly learned with the building segmentation task in our proposed network. Meanwhile, two consistency-constraint losses were designed on top of the multi-task network to exploit the duality between the mask prediction and the two shape-related predictions. Specifically, an atrous spatial pyramid pooling (ASPP) module was appended to the top of the encoder of a U-shaped network to obtain multi-scale features. Based on these features, one regression loss and two classification losses were used for predicting the distance-transform map, the segmentation mask, and the boundary. Two inter-task consistency-loss functions were constructed to ensure consistency between distance maps and masks, and between masks and boundary maps. Experimental results on three public aerial image datasets showed that our method achieves superior performance over recent state-of-the-art models.
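One plausible instance of an inter-task consistency loss is to require the spatial gradient of the predicted mask to agree with the predicted boundary map, as sketched below; this is an illustrative form, not necessarily the loss used in the paper.

```python
import torch
import torch.nn.functional as F

def mask_boundary_consistency(mask_logits, boundary_logits):
    """Penalize disagreement between the mask's spatial gradient and the
    predicted boundary map (inputs: (B, 1, H, W) logits)."""
    m = torch.sigmoid(mask_logits)
    gx = (m[..., :, 1:] - m[..., :, :-1]).abs()
    gy = (m[..., 1:, :] - m[..., :-1, :]).abs()
    edge = F.pad(gx, (0, 1)) + F.pad(gy, (0, 0, 0, 1))   # back to (B, 1, H, W)
    return F.mse_loss(edge.clamp(0, 1), torch.sigmoid(boundary_logits))
```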
APA, Harvard, Vancouver, ISO styles, etc.
44

Javed, Nouman, Prasad N. Paradkar and Asim Bhatti. "Flight behaviour monitoring and quantification of Aedes aegypti using convolution neural network". PLOS ONE 18, no. 7 (20.07.2023): e0284819. http://dx.doi.org/10.1371/journal.pone.0284819.

Full source text
Abstract:
Mosquito-borne diseases impose a heavy burden on public health worldwide. The viruses that cause these diseases affect behavioural traits of mosquitoes, including locomotion and feeding. Understanding these traits can help improve existing epidemiological models and develop effective mosquito traps. However, the flight behaviour of mosquitoes is difficult to study because of their small size, complicated poses, and seemingly random movement patterns. Currently, no open-source tool is available that can detect and track resting or flying mosquitoes. The work presented in this paper provides a detection and trajectory-estimation method using the Mask R-CNN algorithm and spline interpolation, which can efficiently detect mosquitoes and track their trajectories with high accuracy. The method requires no special equipment and works well even with low-resolution videos. Considering the mosquito's size, the detection performance is validated using a tracker error and a custom metric that considers the mean distance between estimated and ground-truth positions, the pooled standard deviation, and average accuracy. The results showed that the proposed method successfully detects and tracks flying (≈96% accuracy) as well as resting (100% accuracy) mosquitoes. Performance can degrade under occlusions and background clutter. Overall, this research serves as an efficient open-source tool to facilitate further examination of mosquito behavioural traits.
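The trajectory-estimation step can be illustrated with cubic-spline interpolation over per-frame detection centers; the function below is a minimal sketch that assumes detections are already associated into one track with strictly increasing frame indices.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def interpolate_trajectory(frames, centers):
    """Fill gaps between detections with cubic splines, one per axis.
    frames: increasing frame indices with a detection; centers: (N, 2)."""
    t = np.asarray(frames, dtype=float)
    sx, sy = CubicSpline(t, centers[:, 0]), CubicSpline(t, centers[:, 1])
    dense = np.arange(t[0], t[-1] + 1)      # every frame in the gap
    return np.column_stack([sx(dense), sy(dense)])

track = interpolate_trajectory([0, 3, 4, 9],
                               np.array([[10, 5], [16, 9], [18, 10], [30, 18]]))
print(track.shape)   # (10, 2): one interpolated position per frame
```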
APA, Harvard, Vancouver, ISO styles, etc.
45

Bentsen, Thomas, Tobias May, Abigail A. Kressner and Torsten Dau. "The benefit of combining a deep neural network architecture with ideal ratio mask estimation in computational speech segregation to improve speech intelligibility". PLOS ONE 13, no. 5 (15.05.2018): e0196924. http://dx.doi.org/10.1371/journal.pone.0196924.

Full source text
APA, Harvard, Vancouver, ISO styles, etc.
46

Yu, Tianze, Luke Bidulka, Martin J. McKeown and Z. Jane Wang. "PA-Tran: Learning to Estimate 3D Hand Pose with Partial Annotation". Sensors 23, no. 3 (31.01.2023): 1555. http://dx.doi.org/10.3390/s23031555.

Full source text
Abstract:
This paper tackles a novel and challenging problem: 3D hand pose estimation (HPE) from a single RGB image with partial annotation. Most HPE methods ignore the fact that keypoints may be only partially visible (e.g., under occlusion). In contrast, we propose a deep-learning framework, PA-Tran, that jointly estimates keypoint status and 3D hand pose from a single RGB image with two dependent branches. The regression branch consists of a Transformer encoder trained to predict a set of target keypoints, given an input set of status, position, and visual-feature embeddings from a convolutional neural network (CNN); the classification branch adopts a CNN for estimating keypoint status. One key idea of PA-Tran is a selective mask training (SMT) objective that uses a binary encoding scheme to represent the status of each keypoint as observed or unobserved during training. By explicitly encoding the label status (observed/unobserved), PA-Tran can efficiently handle the case where only partial annotation is available. Investigating annotation percentages from 50% to 100%, we show that training with partial annotation is more efficient (e.g., achieving the best PA-MPJPE of 6.0 when using about 85% of the annotations). Moreover, we provide two new datasets: APDM-Hand, a synthetic hand dataset with APDM sensor accessories designed for a specific hand task, and PD-APDM-Hand, a real hand dataset collected from Parkinson's Disease (PD) patients with partial annotation. PA-Tran achieves higher estimation accuracy when evaluated on both proposed datasets and on a more general hand dataset.
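The selective-mask idea, computing the pose loss only over keypoints whose status is observed, can be sketched as a masked L2 objective; the tensor shapes and the plain squared error are assumptions for illustration, not the paper's exact loss.

```python
import torch

def masked_keypoint_loss(pred, target, status):
    """Loss over observed keypoints only.
    pred/target: (B, J, 3) joint positions; status: (B, J), 1 = observed."""
    err = ((pred - target) ** 2).sum(-1)            # per-joint squared error
    return (err * status).sum() / status.sum().clamp(min=1)
```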
APA, Harvard, Vancouver, ISO styles, etc.
47

Yeom, Jong-Min, Seonyoung Park, Taebyeong Chae, Jin-Young Kim and Chang Suk Lee. "Spatial Assessment of Solar Radiation by Machine Learning and Deep Neural Network Models Using Data Provided by the COMS MI Geostationary Satellite: A Case Study in South Korea". Sensors 19, no. 9 (5.05.2019): 2082. http://dx.doi.org/10.3390/s19092082.

Full source text
Abstract:
Although data-driven methods, including deep neural networks (DNNs), have been introduced, their spatial characteristics have not been sufficiently assessed when limited ground observations are used as reference. This work examines the feasibility of several machine learning approaches for assessing the spatial distribution of solar radiation based on the Communication, Ocean, and Meteorological Satellite (COMS) Meteorological Imager (MI) geostationary satellite. Four data-driven models were selected (artificial neural network (ANN), random forest (RF), support vector regression (SVR), and DNN) to compare their accuracy and spatial estimation performance. Moreover, we used a physical model to probe the abilities of the data-driven methods, implementing hold-out and k-fold cross-validation based on pyranometers located in South Korea. The analysis showed that RF had the highest predictive accuracy, although the difference between RF and the second-best technique (DNN) was insignificant. Temporal variations in root mean square error (RMSE) depended on the number of data samples, while the physical model was relatively less sensitive; nevertheless, DNN and RF showed less RMSE variability than the others. To examine spatial estimation performance, we mapped solar radiation over South Korea for each model. The data-driven models accurately reproduced the observed spatial cloud patterns, whereas the physical model failed to do so because of cloud mask errors. The models exhibited different spatial retrieval performance according to their training approaches. Overall, the approaches with deeper structures (RF and DNN) best simulated the challenging spatial pattern of thin clouds when using satellite multispectral data.
APA, Harvard, Vancouver, ISO styles, etc.
48

Shin, Yoonsoo, Sekojae Heo, Sehee Han, Junhee Kim and Seunguk Na. "An Image-Based Steel Rebar Size Estimation and Counting Method Using a Convolutional Neural Network Combined with Homography". Buildings 11, no. 10 (9.10.2021): 463. http://dx.doi.org/10.3390/buildings11100463.

Full source text
Abstract:
Conventionally, the number of steel rebars at construction sites is counted manually by workers. This practice is slow, labor-intensive, and error-prone. Consequently, a method for quickly and accurately counting steel rebars with minimal labor is needed to enhance work efficiency and reduce labor costs at construction sites. In this study, the authors developed an automated system to estimate the size and count the number of steel rebars in bale packing using computer vision techniques based on a convolutional neural network (CNN). A dataset containing 622 images of rebars, with a total of 186,522 rebar cross sections and 409 poly tags, was established for segmenting rebars and poly tags in images. The images were collected at a full HD resolution of 1920 × 1080 pixels and then center-cropped to 512 × 512 pixels; data augmentation expanded the training dataset to 4668 images. Based on this dataset, a YOLACT-based steel rebar size estimation and counting model with box and mask mAP above 30 was trained to meet the aims of this study. The proposed method, a CNN model combined with homography, can estimate the size and count the number of steel rebars in an image quickly and accurately, and it can be applied at real construction sites to efficiently manage rebar stock.
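The homography component can be sketched as follows: given four image points of a reference object with known physical size, pixel measurements (such as a rebar diameter) are mapped into metric units. All point coordinates and the 500 mm plate size below are made-up values for illustration.

```python
import numpy as np
import cv2

# Image corners of a reference plate of known physical size (illustrative).
img_pts = np.float32([[100, 120], [620, 110], [640, 610], [90, 630]])
world_pts = np.float32([[0, 0], [500, 0], [500, 500], [0, 500]])  # millimetres
H, _ = cv2.findHomography(img_pts, world_pts)

def pixel_dist_to_mm(p1, p2):
    """Project two pixel endpoints into the reference plane and return
    the metric distance between them (e.g., a rebar diameter)."""
    src = np.float32([p1, p2]).reshape(-1, 1, 2)
    dst = cv2.perspectiveTransform(src, H).reshape(-1, 2)
    return float(np.linalg.norm(dst[0] - dst[1]))

print(pixel_dist_to_mm((300, 300), (316, 300)))   # estimated size in mm
```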
APA, Harvard, Vancouver, ISO styles, etc.
49

Tao, Ye, and Zhihao Ling. "Deep Features Homography Transformation Fusion Network—A Universal Foreground Segmentation Algorithm for PTZ Cameras and a Comparative Study". Sensors 20, no. 12 (17.06.2020): 3420. http://dx.doi.org/10.3390/s20123420.

Full source text
Abstract:
Foreground segmentation is a crucial first step for many video analysis tasks such as action recognition and object tracking. In the past five years, convolutional-neural-network-based foreground segmentation methods have made great breakthroughs. However, most of them focus on stationary cameras and perform poorly on pan–tilt–zoom (PTZ) cameras. In this paper, an end-to-end deep features homography transformation and fusion network based foreground segmentation method (HTFnetSeg) is proposed for surveillance videos recorded by PTZ cameras. At the core of HTFnetSeg is the combination of an unsupervised semantic attention homography estimation network (SAHnet) for frame alignment and a spatially transformed deep features fusion network (STDFFnet) for segmentation. The semantic attention mask in SAHnet encourages the network to focus on background alignment by reducing noise that comes from the foreground. STDFFnet is designed to reuse the deep features extracted during semantic-attention-mask generation by aligning the features rather than only the frames, using a spatial transformation technique to reduce algorithm complexity. Additionally, a conservative strategy is proposed for the motion-map-based post-processing step to further reduce false positives caused by semantic noise. Experiments on both CDnet2014 and Lasiesta show that our method outperforms many state-of-the-art methods, quantitatively and qualitatively.
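Aligning deep features (rather than raw frames) with an estimated homography can be sketched with grid_sample; the sketch assumes the homography is expressed in the normalized [-1, 1] coordinates that grid_sample uses, and that warped points keep a positive homogeneous coordinate.

```python
import torch
import torch.nn.functional as F

def warp_features_with_homography(feat, H_mat):
    """Warp a (B, C, H, W) feature map by a 3x3 homography H_mat so that
    features from the previous frame align with the current one."""
    B, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    pts = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).reshape(-1, 3)
    warped = pts @ H_mat.T
    warped = warped[:, :2] / warped[:, 2:].clamp(min=1e-8)  # assumes w > 0
    grid = warped.reshape(1, h, w, 2).expand(B, -1, -1, -1)
    return F.grid_sample(feat, grid, align_corners=True)
```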
APA, Harvard, Vancouver, ISO styles, etc.
50

Liang, Dongxue, Kyoungju Park and Przemyslaw Krompiec. "Facial Feature Model for a Portrait Video Stylization". Symmetry 10, no. 10 (28.09.2018): 442. http://dx.doi.org/10.3390/sym10100442.

Full source text
Abstract:
With the advent of deep learning methods, portrait video stylization has become more popular. In this paper, we present a robust method for automatically stylizing portrait videos that contain small human faces. By extending Mask R-CNN (Mask Region-based Convolutional Neural Network) with a CNN branch that detects the contour landmarks of the face, we divide the input frame into three regions: the facial-features region, the inner-face region surrounded by 36 face contour landmarks, and the outer-face region. Keeping the facial-features region as it is, we use two different stroke models to render the other two regions. During the non-photorealistic rendering (NPR) of the animation video, we combine deformable strokes with optical flow estimation between adjacent frames to follow the underlying motion coherently. The experimental results demonstrate that our method not only effectively preserves the small and distinct facial features, but also follows the underlying motion coherently.
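The motion-coherence step, advecting stroke anchor points along dense optical flow between adjacent frames, can be sketched with OpenCV's Farneback flow; the flow parameters are common defaults, and stroke points are assumed to stay inside the frame.

```python
import numpy as np
import cv2

def advect_strokes(prev_gray, curr_gray, stroke_pts):
    """Move stroke anchor points along dense optical flow so rendered
    strokes follow the underlying motion between frames."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    idx = np.round(stroke_pts).astype(int)
    return stroke_pts + flow[idx[:, 1], idx[:, 0]]   # flow[y, x] = (dx, dy)
```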
APA, Harvard, Vancouver, ISO styles, etc.