Journal articles on the topic 'Acoustic Scene Analysis'

To see the other types of publications on this topic, follow the link: Acoustic Scene Analysis.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Acoustic Scene Analysis.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Terez, Dmitry. "Acoustic scene analysis using microphone arrays." Journal of the Acoustical Society of America 128, no. 4 (October 2010): 2442. http://dx.doi.org/10.1121/1.3508731.

2

Itatani, Naoya, and Georg M. Klump. "Animal models for auditory streaming." Philosophical Transactions of the Royal Society B: Biological Sciences 372, no. 1714 (February 19, 2017): 20160112. http://dx.doi.org/10.1098/rstb.2016.0112.

Abstract:
Sounds in the natural environment need to be assigned to acoustic sources to evaluate complex auditory scenes. Separating sources will affect the analysis of auditory features of sounds. As the benefits of assigning sounds to specific sources accrue to all species communicating acoustically, the ability for auditory scene analysis is widespread among different animals. Animal studies allow for a deeper insight into the neuronal mechanisms underlying auditory scene analysis. Here, we will review the paradigms applied in the study of auditory scene analysis and streaming of sequential sounds in animal models. We will compare the psychophysical results from the animal studies to the evidence obtained in human psychophysics of auditory streaming, i.e. in a task commonly used for measuring the capability for auditory scene analysis. Furthermore, the neuronal correlates of auditory streaming will be reviewed in different animal models and the observations of the neurons’ response measures will be related to perception. The across-species comparison will reveal whether similar demands in the analysis of acoustic scenes have resulted in similar perceptual and neuronal processing mechanisms in the wide range of species being capable of auditory scene analysis. This article is part of the themed issue ‘Auditory and visual scene analysis’.
3

Park, Sangwook, Woohyun Choi, and Hanseok Ko. "Acoustic scene classification using recurrence quantification analysis." Journal of the Acoustical Society of Korea 35, no. 1 (January 31, 2016): 42–48. http://dx.doi.org/10.7776/ask.2016.35.1.042.

4

Imoto, Keisuke. "Introduction to acoustic event and scene analysis." Acoustical Science and Technology 39, no. 3 (May 1, 2018): 182–88. http://dx.doi.org/10.1250/ast.39.182.

5

Weisser, Adam, Jörg M. Buchholz, Chris Oreinos, Javier Badajoz-Davila, James Galloway, Timothy Beechey, and Gitte Keidser. "The Ambisonic Recordings of Typical Environments (ARTE) Database." Acta Acustica united with Acustica 105, no. 4 (July 1, 2019): 695–713. http://dx.doi.org/10.3813/aaa.919349.

Abstract:
Everyday listening environments are characterized by far more complex spatial, spectral and temporal sound field distributions than the acoustic stimuli that are typically employed in controlled laboratory settings. As such, the reproduction of acoustic listening environments has become important for several research avenues related to sound perception, such as hearing loss rehabilitation, soundscapes, speech communication, auditory scene analysis, automatic scene classification, and room acoustics. However, the recordings of acoustic environments that are used as test material in these research areas are usually designed specifically for one study, or are provided in custom databases that cannot be universally adapted, beyond their original application. In this work we present the Ambisonic Recordings of Typical Environments (ARTE) database, which addresses several research needs simultaneously: realistic audio recordings that can be reproduced in 3D, 2D, or binaurally, with known acoustic properties, including absolute level and room impulse response. Multichannel higher-order ambisonic recordings of 13 realistic typical environments (e.g., office, café, dinner party, train station) were processed, acoustically analyzed, and subjectively evaluated to determine their perceived identity. The recordings are delivered in a generic format that may be reproduced with different hardware setups, and may also be used in binaural or single-channel setups. Room impulse responses, as well as detailed acoustic analyses, of all environments supplement the recordings. The database is made open to the research community with the explicit intention to expand it in the future and include more scenes.
6

Hou, Yuanbo, and Dick Botteldooren. "Artificial intelligence-based collaborative acoustic scene and event classification to support urban soundscape analysis and classification." INTER-NOISE and NOISE-CON Congress and Conference Proceedings 265, no. 1 (February 1, 2023): 6466–73. http://dx.doi.org/10.3397/in_2022_0974.

Abstract:
A human listener embedded in a sonic environment will rely on meaning given to sound events as well as on general acoustic features to analyse and appraise its soundscape. However, currently used measurable indicators for soundscape mainly focus on the latter, and meaning is only included indirectly. Yet, today's artificial intelligence (AI) techniques make it possible to recognise a variety of sounds and thus assign meaning to them. Hence, we propose to combine a model for acoustic event classification trained on the large-scale environmental sound database AudioSet, with a scene classification algorithm that couples direct identification of acoustic features with these recognised sounds for scene recognition. The combined model is trained on TUT2018, a database containing ten everyday scenes. Applying the resulting AI model to the soundscapes of the world database without further training shows that the classification that is obtained correlates with perceived calmness and liveliness evaluated by a test panel. It also makes it possible to unravel why an acoustic environment sounds like a lively square or a calm park by analysing the type of sounds and their occurrence pattern over time. Moreover, disturbance of the acoustic environment that is expected based on visual cues, e.g. by traffic, can easily be recognised.
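As a rough illustration of the coupling described in this abstract, the sketch below feeds the per-clip output of an audio event tagger, together with direct acoustic features, into a scene classifier. Random vectors stand in for AudioSet-style event posteriors and for the acoustic features, and a plain logistic regression stands in for the authors' model; none of this is their actual architecture or training setup.

```python
# Illustrative sketch: event-tagger posteriors + direct acoustic features -> scene classifier.
# All arrays are random stand-ins; 527 matches the AudioSet class count, the rest is assumed.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_clips, n_event_classes, n_acoustic, n_scenes = 400, 527, 64, 10

event_posteriors = rng.random((n_clips, n_event_classes))    # stand-in for tagger outputs
acoustic_feats = rng.normal(size=(n_clips, n_acoustic))      # stand-in for direct acoustic features
scene_labels = rng.integers(0, n_scenes, size=n_clips)       # stand-in for ten everyday scenes

X = np.hstack([event_posteriors, acoustic_feats])            # simple early fusion of both views
scene_clf = LogisticRegression(max_iter=1000).fit(X, scene_labels)
print(scene_clf.score(X, scene_labels))                       # training accuracy on toy data
```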
7

Tang, Zhenyu, Nicholas J. Bryan, Dingzeyu Li, Timothy R. Langlois, and Dinesh Manocha. "Scene-Aware Audio Rendering via Deep Acoustic Analysis." IEEE Transactions on Visualization and Computer Graphics 26, no. 5 (May 2020): 1991–2001. http://dx.doi.org/10.1109/tvcg.2020.2973058.

8

Ellison, William T., Adam S. Frankel, David Zeddies, Kathleen J. Vigness Raposa, and Cheryl Schroeder. "Underwater acoustic scene analysis: Exploration of appropriate metrics." Journal of the Acoustical Society of America 124, no. 4 (October 2008): 2433. http://dx.doi.org/10.1121/1.4782511.

9

Makino, S. "Special Section on Acoustic Scene Analysis and Reproduction." IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E91-A, no. 6 (June 1, 2008): 1301–2. http://dx.doi.org/10.1093/ietfec/e91-a.6.1301.

10

Wang, Mou, Xiao-Lei Zhang, and Susanto Rahardja. "An Unsupervised Deep Learning System for Acoustic Scene Analysis." Applied Sciences 10, no. 6 (March 19, 2020): 2076. http://dx.doi.org/10.3390/app10062076.

Abstract:
Acoustic scene analysis has attracted a lot of attention recently. Existing methods are mostly supervised, which requires well-predefined acoustic scene categories and accurate labels. In practice, there exists a large amount of unlabeled audio data, but labeling large-scale data is not only costly but also time-consuming. Unsupervised acoustic scene analysis on the other hand does not require manual labeling but is known to have significantly lower performance and therefore has not been well explored. In this paper, a new unsupervised method based on deep auto-encoder networks and spectral clustering is proposed. It first extracts a bottleneck feature from the original acoustic feature of audio clips by an auto-encoder network, and then employs spectral clustering to further reduce the noise and unrelated information in the bottleneck feature. Finally, it conducts hierarchical clustering on the low-dimensional output of the spectral clustering. To fully utilize the spatial information of stereo audio, we further apply the binaural representation and conduct joint clustering on that. To the best of our knowledge, this is the first time that a binaural representation is being used in unsupervised learning. Experimental results show that the proposed method outperforms the state-of-the-art competing methods.
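The pipeline sketched in this abstract (auto-encoder bottleneck features, a spectral step for dimension reduction, then hierarchical clustering) can be illustrated roughly as follows. This is not the authors' implementation: the feature matrix is random, the layer sizes and cluster counts are placeholders, and scikit-learn's SpectralEmbedding stands in for the spectral clustering stage.

```python
# Rough sketch of the described pipeline: auto-encoder bottleneck -> spectral embedding
# (as a stand-in for the spectral clustering step) -> hierarchical clustering.
# The features are random, and layer sizes / cluster counts are placeholders.
import numpy as np
import torch
import torch.nn as nn
from sklearn.manifold import SpectralEmbedding
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64)).astype("float32")    # stand-in for per-clip acoustic features

class AutoEncoder(nn.Module):
    def __init__(self, dim_in=64, dim_bottleneck=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim_in, 32), nn.ReLU(),
                                     nn.Linear(32, dim_bottleneck))
        self.decoder = nn.Sequential(nn.Linear(dim_bottleneck, 32), nn.ReLU(),
                                     nn.Linear(32, dim_in))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
data = torch.from_numpy(X)
for _ in range(200):                                 # short reconstruction training loop
    optimizer.zero_grad()
    recon, _ = model(data)
    loss = nn.functional.mse_loss(recon, data)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    _, bottleneck = model(data)                      # bottleneck feature per clip

low_dim = SpectralEmbedding(n_components=4, random_state=0).fit_transform(bottleneck.numpy())
scene_labels = AgglomerativeClustering(n_clusters=5).fit_predict(low_dim)
print(np.bincount(scene_labels))                     # cluster sizes (candidate "scenes")
```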
11

IMOTO, Keisuke, and Suehiro SHIMAUCHI. "Acoustic Scene Analysis Based on Hierarchical Generative Model of Acoustic Event Sequence." IEICE Transactions on Information and Systems E99.D, no. 10 (2016): 2539–49. http://dx.doi.org/10.1587/transinf.2016slp0004.

12

Sakoda, Keishi, and Ichiro Yamada. "Enhanced 3D (three dimensional) acoustic scene analysis based on sound arrival direction for automatic airport noise monitoring." INTER-NOISE and NOISE-CON Congress and Conference Proceedings 265, no. 2 (February 1, 2023): 5095–101. http://dx.doi.org/10.3397/in_2022_0736.

Abstract:
To deepen our understanding of the spatial and temporal distribution of the various sound sources of environmental noise observed by unattended aircraft noise monitors, we have been developing a method of acoustic scene analysis based on information on the 3D sound arrival direction. At the INTER-NOISE 2021 congress last year, we reported the basic idea of this method and some examples of the analysis. The method, which has a static mode and a dynamic mode, is based on the direction of arrival and sound pressure level of the sound in three dimensions at each point in time. Because the moving path was limited to one satisfying specified preconditions, there were restrictions on the airborne sound sources to which the method could be applied. Therefore, we decided to improve the versatility of the dynamic-mode acoustic scene analysis method by adding a sensor that observes the direction of sound arrival. In this paper, we report the results of our experiments and analysis.
13

Weisser, Adam, Jörg M. Buchholz, and Gitte Keidser. "Complex Acoustic Environments: Review, Framework, and Subjective Model." Trends in Hearing 23 (January 2019): 233121651988134. http://dx.doi.org/10.1177/2331216519881346.

Abstract:
The concept of complex acoustic environments has appeared in several unrelated research areas within acoustics in different variations. Based on a review of the usage and evolution of this concept in the literature, a relevant framework was developed, which includes nine broad characteristics that are thought to drive the complexity of acoustic scenes. The framework was then used to study the most relevant characteristics for stimuli of realistic, everyday, acoustic scenes: multiple sources, source diversity, reverberation, and the listener’s task. The effect of these characteristics on perceived scene complexity was then evaluated in an exploratory study that reproduced the same stimuli with a three-dimensional loudspeaker array inside an anechoic chamber. Sixty-five subjects listened to the scenes and for each one had to rate 29 attributes, including complexity, both with and without target speech in the scenes. The data were analyzed using three-way principal component analysis with a (2 × 3 × 2) Tucker3 model in the dimensions of scales (or ratings), scenes, and subjects, explaining 42% of variation in the data. “Comfort” and “variability” were the dominant scale components, which span the perceived complexity. Interaction effects were observed, including the additional task of attending to target speech that shifted the complexity rating closer to the comfort scale. Also, speech contained in the background scenes introduced a second subject component, which suggests that some subjects are more distracted than others by background speech when listening to target speech. The results are interpreted in light of the proposed framework.
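For readers unfamiliar with Tucker3 models, the sketch below shows a three-way decomposition of a ratings × scenes × subjects tensor with TensorLy. The rank (2, 3, 2) follows the abstract; the data are random stand-ins and the scene count is assumed, so this is only a shape-level illustration, not a reanalysis.

```python
# Shape-level sketch of a Tucker3 (three-way PCA) decomposition with TensorLy.
# Random numbers stand in for the ratings; 29 attributes and 65 subjects follow the
# abstract, while the number of scenes (12) is an assumption for illustration.
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

rng = np.random.default_rng(0)
ratings = tl.tensor(rng.normal(size=(29, 12, 65)))   # scales x scenes x subjects

core, factors = tucker(ratings, rank=[2, 3, 2])      # (2 x 3 x 2) Tucker3 model
print(core.shape)                                    # (2, 3, 2) core array
print([f.shape for f in factors])                    # one loading matrix per mode
```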
14

Ahrens, Axel, and Kasper Duemose Lund. "Auditory spatial analysis in reverberant multi-talker environments with congruent and incongruent audio-visual room information." Journal of the Acoustical Society of America 152, no. 3 (September 2022): 1586–94. http://dx.doi.org/10.1121/10.0013991.

Abstract:
In a multi-talker situation, listeners have the challenge of identifying a target speech source out of a mixture of interfering background noises. In the current study, it was investigated how listeners analyze audio-visual scenes with varying complexity in terms of number of talkers and reverberation. The visual information of the room was either congruent with the acoustic room or incongruent. The listeners' task was to locate an ongoing speech source in a mixture of other speech sources. The three-dimensional audio-visual scenarios were presented using a loudspeaker array and virtual reality glasses. It was shown that room reverberation, as well as the number of talkers in a scene, influence the ability to analyze an auditory scene in terms of accuracy and response time. Incongruent visual information of the room did not affect this ability. When few talkers were presented simultaneously, listeners were able to detect a target talker quickly and accurately even in adverse room acoustical conditions. Reverberation started to affect the response time when four or more talkers were presented. The number of talkers became a significant factor for five or more simultaneous talkers.
15

Pelofi, C., V. de Gardelle, P. Egré, and D. Pressnitzer. "Interindividual variability in auditory scene analysis revealed by confidence judgements." Philosophical Transactions of the Royal Society B: Biological Sciences 372, no. 1714 (February 19, 2017): 20160107. http://dx.doi.org/10.1098/rstb.2016.0107.

Abstract:
Because musicians are trained to discern sounds within complex acoustic scenes, such as an orchestra playing, it has been hypothesized that musicianship improves general auditory scene analysis abilities. Here, we compared musicians and non-musicians in a behavioural paradigm using ambiguous stimuli, combining performance, reaction times and confidence measures. We used ‘Shepard tones’, for which listeners may report either an upward or a downward pitch shift for the same ambiguous tone pair. Musicians and non-musicians performed similarly on the pitch-shift direction task. In particular, both groups were at chance for the ambiguous case. However, groups differed in their reaction times and judgements of confidence. Musicians responded to the ambiguous case with long reaction times and low confidence, whereas non-musicians responded with fast reaction times and maximal confidence. In a subsequent experiment, non-musicians displayed reduced confidence for the ambiguous case when pure-tone components of the Shepard complex were made easier to discern. The results suggest an effect of musical training on scene analysis: we speculate that musicians were more likely to discern components within complex auditory scenes, perhaps because of enhanced attentional resolution, and thus discovered the ambiguity. For untrained listeners, stimulus ambiguity was not available to perceptual awareness. This article is part of the themed issue ‘Auditory and visual scene analysis’.
16

Fabry, David, and Jürgen Tchorz. "Results from a new hearing aid using “acoustic scene analysis”." Hearing Journal 58, no. 4 (April 2005): 30–36. http://dx.doi.org/10.1097/01.hj.0000286604.84352.42.

17

Imoto, Keisuke, and Nobutaka Ono. "Acoustic Topic Model for Scene Analysis With Intermittently Missing Observations." IEEE/ACM Transactions on Audio, Speech, and Language Processing 27, no. 2 (February 2019): 367–82. http://dx.doi.org/10.1109/taslp.2018.2879855.

18

Aziz, Sumair, Muhammad Awais, Tallha Akram, Umar Khan, Musaed Alhussein, and Khursheed Aurangzeb. "Automatic Scene Recognition through Acoustic Classification for Behavioral Robotics." Electronics 8, no. 5 (April 30, 2019): 483. http://dx.doi.org/10.3390/electronics8050483.

Abstract:
Classification of complex acoustic scenes under real-time scenarios is an active domain which has engaged several researchers lately from the machine learning community. A variety of techniques have been proposed for acoustic pattern or scene classification, including natural soundscapes such as rain/thunder, and urban soundscapes such as restaurants/streets, etc. In this work, we present a framework for automatic acoustic classification for behavioral robotics. Motivated by several texture classification algorithms used in computer vision, a modified feature descriptor for sound is proposed which incorporates a combination of 1-D local ternary patterns (1D-LTP) and the baseline method of Mel-frequency cepstral coefficients (MFCC). The extracted feature vector is later classified using a multi-class support vector machine (SVM), which is selected as the base classifier. The proposed method is validated on two standard benchmark datasets, i.e., DCASE and RWCP, and achieves accuracies of 97.38% and 94.10%, respectively. A comparative analysis demonstrates that the proposed scheme performs exceptionally well compared to other feature descriptors.
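A minimal sketch of the kind of descriptor described above, combining a 1-D local ternary pattern histogram with MFCC statistics and classifying with an SVM, is given below. The LTP definition, threshold, neighbourhood size and toy data are simplifications chosen for illustration, not the authors' exact descriptor or evaluation.

```python
# Illustrative combination of a simple 1-D local ternary pattern (LTP) histogram with
# MFCC statistics, classified by an SVM. Threshold, neighbourhood size and the toy
# "dataset" are placeholders, not the descriptor or benchmarks from the paper.
import numpy as np
import librosa
from sklearn.svm import SVC

def ltp_histogram(signal, threshold=0.01, n_neighbors=4):
    """Compare each sample with its neighbours, encode {-1, 0, +1} per neighbour as a
    base-3 code, and histogram the codes over the whole signal."""
    n_codes = 3 ** (2 * n_neighbors)
    codes = []
    for i in range(n_neighbors, len(signal) - n_neighbors):
        diffs = np.concatenate([signal[i - n_neighbors:i],
                                signal[i + 1:i + 1 + n_neighbors]]) - signal[i]
        ternary = np.where(diffs > threshold, 2, np.where(diffs < -threshold, 0, 1))
        codes.append(int("".join(map(str, ternary)), 3))
    hist, _ = np.histogram(codes, bins=n_codes, range=(0, n_codes), density=True)
    return hist

def extract_features(y, sr):
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    mfcc_stats = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
    return np.concatenate([mfcc_stats, ltp_histogram(y[::16])])   # subsampled for speed

# Toy two-class "dataset" of random signals, only to show the plumbing end to end.
sr = 16000
X, labels = [], []
for k in range(20):
    y = np.random.randn(sr) * (0.1 if k % 2 else 0.5)
    X.append(extract_features(y, sr))
    labels.append(k % 2)

clf = SVC(kernel="rbf").fit(np.array(X), labels)
print(clf.score(np.array(X), labels))
```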
19

Sakoda, Keishi, Ichiro Yamada, and Kenji Shinohara. "Sound arrival direction and acoustic scene analysis for the monitoring of airport noise." INTER-NOISE and NOISE-CON Congress and Conference Proceedings 263, no. 2 (August 1, 2021): 4581–91. http://dx.doi.org/10.3397/in-2021-2753.

Abstract:
The authors have developed a sound direction detection method based on the cross-correlation method and applied it to automatic monitoring of aircraft noise and identification of sound sources. As aircraft performance improves, noise decreases, and people are interested in and dissatisfied with low-level noise aircraft, especially in urban areas where environmental noise and aircraft noise combine to complicate the acoustic environment. Therefore, it is necessary to monitor and to measure not only aircraft noise but also environmental noise. Since our surveillance is aircraft noise, it is important to analyze noise exposure from acoustic information rather than trucks or images. In this report, we will look back on the development process of this sound direction detection technology, show examples of helicopters and application examples of acoustic scene analysis to high-altitude aircraft, and consider the latest situation realized as acoustic environment monitoring. We believe that this analysis will make it easier to understand the noise exposure situation at the noise monitoring station. It also describes the future outlook for this method.
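The core idea of cross-correlation-based direction detection mentioned in this abstract can be illustrated with a two-microphone sketch: estimate the time difference of arrival from the cross-correlation peak and convert it to an angle. The array geometry, sample rate and simulated delay below are assumptions for illustration, not the monitoring system described in the paper.

```python
# Two-microphone sketch of cross-correlation-based direction-of-arrival estimation.
# Microphone spacing, sample rate and the simulated delay are assumptions for illustration.
import numpy as np

fs = 48000                 # sample rate [Hz]
d = 0.5                    # microphone spacing [m]
c = 343.0                  # speed of sound [m/s]

rng = np.random.default_rng(1)
src = rng.normal(size=fs)                          # broadband source signal
true_delay = 20                                    # samples by which mic B lags mic A
mic_a = src
mic_b = np.concatenate([np.zeros(true_delay), src[:-true_delay]])

corr = np.correlate(mic_b, mic_a, mode="full")     # cross-correlation over all lags
lag = int(np.argmax(corr)) - (len(mic_a) - 1)      # positive lag: B lags A
tau = lag / fs                                     # time difference of arrival [s]

angle = np.degrees(np.arcsin(np.clip(tau * c / d, -1.0, 1.0)))
print(f"estimated lag = {lag} samples, arrival angle ~ {angle:.1f} degrees")
```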
20

Blanchard, T., P. Lecomte, M. Melon, L. Simon, K. Hassan, and R. Nicol. "Experimental acoustic scene analysis using One-Eighth spherical fraction microphone array." Journal of the Acoustical Society of America 151, no. 1 (January 2022): 180–92. http://dx.doi.org/10.1121/10.0009230.

21

Bregman, Albert S. "Issues in the use of acoustic cues for auditory scene analysis." Journal of the Acoustical Society of America 113, no. 4 (April 2003): 2231. http://dx.doi.org/10.1121/1.4780335.

22

Abidin, Shamsiah, Roberto Togneri, and Ferdous Sohel. "Spectrotemporal Analysis Using Local Binary Pattern Variants for Acoustic Scene Classification." IEEE/ACM Transactions on Audio, Speech, and Language Processing 26, no. 11 (November 2018): 2112–21. http://dx.doi.org/10.1109/taslp.2018.2854861.

23

Abinaya, R., et al. "Acoustic based Scene Event Identification Using Deep Learning CNN." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12, no. 5 (April 11, 2021): 1398–405. http://dx.doi.org/10.17762/turcomat.v12i5.2034.

Abstract:
Deep learning is becoming popular for solving classification problems when compared with conventional classifiers. A large number of researchers are exploiting deep learning for sound event detection in environmental scene analysis. In this research, a deep learning convolutional neural network (CNN) classifier is modelled using extracted MFCC features to classify environmental event sounds. The experimental results clearly show that the proposed MFCC-CNN outperforms other existing methods with a high classification accuracy of 90.65%.
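A minimal MFCC-plus-CNN sketch in PyTorch is shown below to make the described pipeline concrete. The architecture, MFCC settings and number of classes are placeholders, not the model evaluated in the paper.

```python
# Minimal MFCC + CNN classifier in PyTorch. Architecture, MFCC settings and the number
# of classes are placeholders; random noise stands in for an environmental audio clip.
import numpy as np
import librosa
import torch
import torch.nn as nn

def mfcc_patch(y, sr, n_mfcc=40, frames=128):
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    m = librosa.util.fix_length(m, size=frames, axis=1)        # pad/trim the time axis
    return torch.from_numpy(m).float().unsqueeze(0)            # (1, n_mfcc, frames)

class SmallCNN(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 10 * 32, n_classes)   # 40x128 -> 10x32 after pooling

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

sr = 16000
clip = np.random.randn(sr * 2)                                 # stand-in for a 2 s clip
x = mfcc_patch(clip, sr).unsqueeze(0)                          # add a batch dimension
logits = SmallCNN()(x)
print(logits.shape)                                            # torch.Size([1, 10])
```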
24

McElveen, J. K., Leonid Krasny, and Scott Nordlund. "Applying matched field array processing and machine learning to computational auditory scene analysis and source separation challenges." Journal of the Acoustical Society of America 151, no. 4 (April 2022): A232. http://dx.doi.org/10.1121/10.0011162.

Abstract:
Matched field processing (MFP) techniques employing physics-based models of acoustic propagation have been successfully and widely applied to underwater target detection and localization, while machine learning (ML) techniques have enabled detection and extraction of patterns in data. Fusing MFP and ML enables the estimation of Green’s Function solutions to the Acoustic Wave Equation for waveguides from data captured in real, reverberant acoustic environments. These Green’s Function estimates can further enable the robust separation of individual sources, even in the presence of multiple loud, interfering, interposed, and competing noise sources. We first introduce MFP and ML and then discuss their application to Computational Auditory Scene Analysis (CASA) and acoustic source separation. Results from a variety of tests using a binaural headset, as well as different wearable and free-standing microphone arrays are then presented to illustrate the effects of the number and placement of sensors on the residual noise floor after separation. Finally, speculations on the similarities between this proprietary approach and the human auditory system’s use of interaural cross-correlation in formulation of acoustic spatial models will be introduced and ideas for further research proposed.
25

McMullin, Margaret A., Nathan C. Higgins, Brian Gygi, Rohit Kumar, Mounya Elhilali, and Joel S. Snyder. "Perception of global properties, objects, and settings in natural auditory scenes." Journal of the Acoustical Society of America 153, no. 3_supplement (March 1, 2023): A329. http://dx.doi.org/10.1121/10.0019028.

Abstract:
Theories of auditory scene analysis suggest our perception of scenes relies on identifying and segregating objects within it. However, a more global process may occur while analyzing scenes, which has been evidenced in the visual domain. In our first experiment, we studied perception of eight global properties (e.g., openness), using a collection of 200 high-quality auditory scenes. Participants showed high agreement on their ratings of global properties. The global properties were explained by a two-factor model. Acoustic features of scenes were explained by a seven-factor model, and linearly predicted the global ratings by different amounts (R-squared = 0.33–0.87), although we also observed nonlinear relationships between acoustical and global variables. A multi-layer neural network trained to recognize auditory objects in everyday soundscapes from YouTube shows high-level embeddings of our 200 scenes are correlated with some global variables at earlier stages of processing than others. In a second experiment, we evaluated participants’ accuracy in identifying the setting of and objects within scenes across three durations (1, 2, and 4 s). Overall, participants performed better on the object identification task, but needed longer duration stimuli to do so. These results suggest object identification may require more processing time and/or attention switching than setting identification.
26

Hajihashemi, Vahid, Abdorreza Alavi Gharahbagh, Pedro Miguel Cruz, Marta Campos Ferreira, José J. M. Machado, and João Manuel R. S. Tavares. "Binaural Acoustic Scene Classification Using Wavelet Scattering, Parallel Ensemble Classifiers and Nonlinear Fusion." Sensors 22, no. 4 (February 16, 2022): 1535. http://dx.doi.org/10.3390/s22041535.

Abstract:
The analysis of ambient sounds can be very useful when developing sound-based intelligent systems. Acoustic scene classification (ASC) is defined as identifying the area in which a sound or clip was recorded among some predefined scenes. ASC has huge potential to be used in urban sound event classification systems. This research presents a hybrid method that includes a novel mathematical fusion step and aims to tackle the challenges of ASC accuracy and the adaptability of current state-of-the-art models. The proposed method uses a stereo signal, two ensemble classifiers (random subspace), and a novel mathematical fusion step. In the proposed method, a stable, invariant signal representation of the stereo signal is built using the Wavelet Scattering Transform (WST). For each mono channel, i.e., left and right, a different random subspace classifier is trained on the WST representation. A novel mathematical formula for the fusion step was developed, its parameters being found using a genetic algorithm. The results on the DCASE 2017 dataset showed that the proposed method achieves higher classification accuracy (about 95%), pushing the boundaries of existing methods.
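The per-channel ensemble idea can be sketched as follows: train one random-subspace ensemble per stereo channel and fuse their class probabilities. Plain random features stand in for the Wavelet Scattering Transform, and a fixed-weight average stands in for the paper's learned mathematical fusion, so this only illustrates the overall structure.

```python
# Structural sketch: one random-subspace ensemble per stereo channel, fused by a fixed
# weighted average of posteriors. Random features stand in for the Wavelet Scattering
# Transform and the fusion is deliberately simplistic.
import numpy as np
from sklearn.ensemble import BaggingClassifier

rng = np.random.default_rng(0)
n_clips, n_feat, n_classes = 200, 120, 4
X_left = rng.normal(size=(n_clips, n_feat))                      # stand-in features, left channel
X_right = X_left + 0.3 * rng.normal(size=(n_clips, n_feat))      # correlated right channel
y = rng.integers(0, n_classes, size=n_clips)

def random_subspace_ensemble():
    # bootstrap=False with bootstrap_features=True approximates the classic random
    # subspace method: each tree sees all clips but only a random half of the features.
    return BaggingClassifier(n_estimators=50, max_features=0.5,
                             bootstrap=False, bootstrap_features=True, random_state=0)

clf_left = random_subspace_ensemble().fit(X_left, y)
clf_right = random_subspace_ensemble().fit(X_right, y)

# Late fusion of the two channels' class posteriors (equal weights here).
proba = 0.5 * clf_left.predict_proba(X_left) + 0.5 * clf_right.predict_proba(X_right)
print((proba.argmax(axis=1) == y).mean())
```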
27

Weller, Tobias, Virginia Best, Jörg M. Buchholz, and Taegan Young. "A Method for Assessing Auditory Spatial Analysis in Reverberant Multitalker Environments." Journal of the American Academy of Audiology 27, no. 07 (July 2016): 601–11. http://dx.doi.org/10.3766/jaaa.15109.

Abstract:
Background: Deficits in spatial hearing can have a negative impact on listeners’ ability to orient in their environment and follow conversations in noisy backgrounds and may exacerbate the experience of hearing loss as a handicap. However, there are no good tools available for reliably capturing the spatial hearing abilities of listeners in complex acoustic environments containing multiple sounds of interest. Purpose: The purpose of this study was to explore a new method to measure auditory spatial analysis in a reverberant multitalker scenario. Research Design: This study was a descriptive case control study. Study Sample: Ten listeners with normal hearing (NH) aged 20–31 yr and 16 listeners with hearing impairment (HI) aged 52–85 yr participated in the study. The latter group had symmetrical sensorineural hearing losses with a four-frequency average hearing loss of 29.7 dB HL. Data Collection and Analysis: A large reverberant room was simulated using a loudspeaker array in an anechoic chamber. In this simulated room, 96 scenes comprising between one and six concurrent talkers at different locations were generated. Listeners were presented with 45-sec samples of each scene, and were required to count, locate, and identify the gender of all talkers, using a graphical user interface on an iPad. Performance was evaluated in terms of correctly counting the sources and accuracy in localizing their direction. Results: Listeners with NH were able to reliably analyze scenes with up to four simultaneous talkers, while most listeners with hearing loss demonstrated errors even with two talkers at a time. Localization performance decreased in both groups with increasing number of talkers and was significantly poorer in listeners with HI. Overall performance was significantly correlated with hearing loss. Conclusions: This new method appears to be useful for estimating spatial abilities in realistic multitalker scenes. The method is sensitive to the number of sources in the scene, and to effects of sensorineural hearing loss. Further work will be needed to compare this method to more traditional single-source localization tests.
28

Kim, Jaehoon, Jeongkyu Oh, and Tae-Young Heo. "Acoustic Scene Classification and Visualization of Beehive Sounds Using Machine Learning Algorithms and Grad-CAM." Mathematical Problems in Engineering 2021 (May 24, 2021): 1–13. http://dx.doi.org/10.1155/2021/5594498.

Abstract:
Honeybees play a crucial role in the agriculture industry because they pollinate approximately 75% of all flowering crops. However, every year, the number of honeybees continues to decrease. Consequently, numerous researchers in various fields have persistently attempted to solve this problem. Acoustic scene classification, using sounds recorded from beehives, is an approach that can be applied to detect changes inside beehives. This method can be used to determine intervals that threaten a beehive. Currently, studies on sound analysis, using deep learning algorithms integrated with various data preprocessing methods that extract features from sound signals, continue to be conducted. However, there is little insight into how deep learning algorithms recognize audio scenes, as demonstrated by studies on image recognition. Therefore, in this study, we used a mel spectrogram, mel-frequency cepstral coefficients (MFCCs), and a constant-Q transform to compare the performance of conventional machine learning models to that of convolutional neural network (CNN) models. We used the support vector machine, random forest, extreme gradient boosting, shallow CNN, and VGG-13 models. Using gradient-weighted class activation mapping (Grad-CAM), we conducted an analysis to determine how the best-performing CNN model recognized audio scenes. The results showed that the VGG-13 model, using MFCCs as input data, demonstrated the best accuracy (91.93%). Additionally, based on the precision, recall, and F1-score for each class, we established that sounds other than those from bees were effectively recognized. Further, we conducted an analysis to determine the MFCCs that are important for classification through the visualizations obtained by applying Grad-CAM to the VGG-13 model. We believe that our findings can be used to develop a monitoring system that can consistently detect abnormal conditions in beehives early by classifying the sounds inside beehives.
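The three input representations compared in this study (mel spectrogram, MFCCs and constant-Q transform) can all be computed with librosa, as in the short sketch below; the parameter choices are illustrative defaults, not those used by the authors.

```python
# Computing the three representations compared in the study with librosa; parameter
# choices below are illustrative defaults, and random noise stands in for a recording.
import numpy as np
import librosa

sr = 22050
y = np.random.randn(sr * 3)                        # stand-in for a 3 s beehive recording

mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
log_mel = librosa.power_to_db(mel)                 # log-scaled mel spectrogram
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
cqt = np.abs(librosa.cqt(y=y, sr=sr, n_bins=84))   # constant-Q magnitude

for name, feat in [("log-mel", log_mel), ("mfcc", mfcc), ("cqt", cqt)]:
    print(name, feat.shape)                        # (bands or coefficients, time frames)
```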
29

Hey, Matthias, Adam A. Hersbach, Thomas Hocke, Stefan J. Mauger, Britta Böhnke, and Alexander Mewes. "Ecological Momentary Assessment to Obtain Signal Processing Technology Preference in Cochlear Implant Users." Journal of Clinical Medicine 11, no. 10 (May 23, 2022): 2941. http://dx.doi.org/10.3390/jcm11102941.

Abstract:
Background: To assess the performance of cochlear implant users, speech comprehension benefits are generally measured in controlled sound room environments of the laboratory. For field-based assessment of preference, questionnaires are generally used. Since questionnaires are typically administered at the end of an experimental period, they can be inaccurate due to retrospective recall. An alternative known as ecological momentary assessment (EMA) has begun to be used for clinical research. The objective of this study was to determine the feasibility of using EMA to obtain in-the-moment responses from cochlear implant users describing their technology preference in specific acoustic listening situations. Methods: Over a two-week period, eleven adult cochlear implant users compared two listening programs containing different sound processing technologies during everyday take-home use. Their task was to compare and vote for their preferred program. Results: A total of 205 votes were collected from acoustic environments that were classified into six listening scenes. The analysis yielded different patterns of voting among the subjects. Two subjects had a consistent preference for one sound processing technology across all acoustic scenes, three subjects changed their preference based on the acoustic scene, and six subjects had no conclusive preference for either technology. Conclusion: Results show that EMA is suitable for quantifying real-world self-reported preference, showing inter-subject variability in different listening environments. However, there is uncertainty that patients will not provide sufficient spontaneous feedback. One improvement for future research is a participant forced prompt to improve response rates.
30

Imoto, Keisuke, and Nobutaka Ono. "Spatial Cepstrum as a Spatial Feature Using a Distributed Microphone Array for Acoustic Scene Analysis." IEEE/ACM Transactions on Audio, Speech, and Language Processing 25, no. 6 (June 2017): 1335–43. http://dx.doi.org/10.1109/taslp.2017.2690559.

31

Reed, Albert, Juhyeon Kim, Thomas Blanford, Adithya Pediredla, Daniel Brown, and Suren Jayasuriya. "Neural Volumetric Reconstruction for Coherent Synthetic Aperture Sonar." ACM Transactions on Graphics 42, no. 4 (July 26, 2023): 1–20. http://dx.doi.org/10.1145/3592141.

Abstract:
Synthetic aperture sonar (SAS) measures a scene from multiple views in order to increase the resolution of reconstructed imagery. Image reconstruction methods for SAS coherently combine measurements to focus acoustic energy onto the scene. However, image formation is typically under-constrained due to a limited number of measurements and bandlimited hardware, which limits the capabilities of existing reconstruction methods. To help meet these challenges, we design an analysis-by-synthesis optimization that leverages recent advances in neural rendering to perform coherent SAS imaging. Our optimization enables us to incorporate physics-based constraints and scene priors into the image formation process. We validate our method on simulation and experimental results captured in both air and water. We demonstrate both quantitatively and qualitatively that our method typically produces reconstructions superior to those of existing approaches. We share code and data for reproducibility.
32

Bayram, Barış, and Gökhan İnce. "An Incremental Class-Learning Approach with Acoustic Novelty Detection for Acoustic Event Recognition." Sensors 21, no. 19 (October 5, 2021): 6622. http://dx.doi.org/10.3390/s21196622.

Abstract:
Acoustic scene analysis (ASA) relies on the dynamic sensing and understanding of stationary and non-stationary sounds from various events, background noises and human actions with objects. However, the spatio-temporal nature of the sound signals may not be stationary, and novel events may exist that eventually deteriorate the performance of the analysis. In this study, a self-learning-based ASA for acoustic event recognition (AER) is presented to detect and incrementally learn novel acoustic events by tackling catastrophic forgetting. The proposed ASA framework comprises six elements: (1) raw acoustic signal pre-processing, (2) low-level and deep audio feature extraction, (3) acoustic novelty detection (AND), (4) acoustic signal augmentations, (5) incremental class-learning (ICL) (of the audio features of the novel events) and (6) AER. The self-learning on different types of audio features extracted from the acoustic signals of various events occurs without human supervision. For the extraction of deep audio representations, in addition to visual geometry group (VGG) and residual neural network (ResNet), time-delay neural network (TDNN) and TDNN based long short-term memory (TDNN–LSTM) networks are pre-trained using a large-scale audio dataset, Google AudioSet. The performances of ICL with AND using Mel-spectrograms, and deep features with TDNNs, VGG, and ResNet from the Mel-spectrograms are validated on benchmark audio datasets such as ESC-10, ESC-50, UrbanSound8K (US8K), and an audio dataset collected by the authors in a real domestic environment.
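The acoustic novelty detection (AND) step of such a framework can be illustrated with a small sketch: fit a one-class model on embeddings of already-learned events and flag outliers as candidate novel classes. The embeddings below are random stand-ins rather than the deep VGG/ResNet/TDNN features used in the paper, and the one-class SVM is only one possible detector.

```python
# Sketch of the acoustic novelty detection (AND) step: fit a one-class model on
# embeddings of known events and flag outliers as novel. The embeddings are random
# stand-ins for the deep features used in the paper.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
known = rng.normal(loc=0.0, size=(300, 128))       # embeddings of already-learned events
unseen = rng.normal(loc=4.0, size=(20, 128))       # embeddings of an unseen event type

detector = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(known)
flags = detector.predict(unseen)                   # -1 = novelty, +1 = known
print("flagged as novel:", int((flags == -1).sum()), "of", len(unseen))
# Clips flagged as novel would then be augmented and added to the recognizer through
# incremental class-learning, as outlined in the abstract.
```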
33

Oreinos, Chris, and Jörg M. Buchholz. "Evaluation of Loudspeaker-Based Virtual Sound Environments for Testing Directional Hearing Aids." Journal of the American Academy of Audiology 27, no. 07 (July 2016): 541–56. http://dx.doi.org/10.3766/jaaa.15094.

Abstract:
Background: Assessments of hearing aid (HA) benefits in the laboratory often do not accurately reflect real-life experience. This may be improved by employing loudspeaker-based virtual sound environments (VSEs) that provide more realistic acoustic scenarios. It is unclear how far the limited accuracy of these VSEs influences measures of subjective performance. Purpose: Verify two common methods for creating VSEs that are to be used for assessing HA outcomes. Research Design: A cocktail-party scene was created inside a meeting room and then reproduced with a 41-channel loudspeaker array inside an anechoic chamber. The reproduced scenes were created either by using room acoustic modeling techniques or microphone array recordings. Study Sample: Participants were 18 listeners with a symmetrical, sloping, mild-to-moderate hearing loss, aged between 66 and 78 yr (mean = 73.8 yr). Data Collection and Analysis: The accuracy of the two VSEs was assessed by comparing the subjective performance measured with two-directional HA algorithms inside all three acoustic environments. The performance was evaluated by using a speech intelligibility test and an acceptable noise level task. Results: The general behavior of the subjective performance seen in the real environment was preserved in the two VSEs for both directional HA algorithms. However, the estimated directional benefits were slightly reduced in the model-based VSE, and further reduced in the recording-based VSE. Conclusions: It can be concluded that the considered VSEs can be used for testing directional HAs, but the provided sensitivity is reduced when compared to a real environment. This can result in an underestimation of the provided directional benefit. However, this minor limitation may be easily outweighed by the high realism of the acoustic scenes that these VSEs can generate, which may result in HA outcome measures with a significantly higher ecological relevance than provided by measures commonly performed in the laboratory or clinic.
34

Fishman, Yonatan I., Christophe Micheyl, and Mitchell Steinschneider. "Neural mechanisms of rhythmic masking release in monkey primary auditory cortex: implications for models of auditory scene analysis." Journal of Neurophysiology 107, no. 9 (May 1, 2012): 2366–82. http://dx.doi.org/10.1152/jn.01010.2011.

Abstract:
The ability to detect and track relevant acoustic signals embedded in a background of other sounds is crucial for hearing in complex acoustic environments. This ability is exemplified by a perceptual phenomenon known as “rhythmic masking release” (RMR). To demonstrate RMR, a sequence of tones forming a target rhythm is intermingled with physically identical “Distracter” sounds that perceptually mask the rhythm. The rhythm can be “released from masking” by adding “Flanker” tones in adjacent frequency channels that are synchronous with the Distracters. RMR represents a special case of auditory stream segregation, whereby the target rhythm is perceptually segregated from the background of Distracters when they are accompanied by the synchronous Flankers. The neural basis of RMR is unknown. Previous studies suggest the involvement of primary auditory cortex (A1) in the perceptual organization of sound patterns. Here, we recorded neural responses to RMR sequences in A1 of awake monkeys in order to identify neural correlates and potential mechanisms of RMR. We also tested whether two current models of stream segregation, when applied to these responses, could account for the perceptual organization of RMR sequences. Results suggest a key role for suppression of Distracter-evoked responses by the simultaneous Flankers in the perceptual restoration of the target rhythm in RMR. Furthermore, predictions of stream segregation models paralleled the psychoacoustics of RMR in humans. These findings reinforce the view that preattentive or “primitive” aspects of auditory scene analysis may be explained by relatively basic neural mechanisms at the cortical level.
35

Plumbley, Mark, and Tuomas Virtanen. "Creating a new research community on detection and classification of acoustic scenes and events: Lessons from the first ten years of DCASE challenges and workshops." INTER-NOISE and NOISE-CON Congress and Conference Proceedings 265, no. 3 (February 1, 2023): 4472–79. http://dx.doi.org/10.3397/in_2022_0643.

Abstract:
Research work on automatic speech recognition and automatic music transcription has been around for several decades, supported by dedicated conferences or conference sessions. However, while individual researchers have been working on recognition of more general environmental sounds, until ten years ago there were no regular workshops or conference sessions where this research, or its researchers, could be found. There was also little available data for researchers to work on or to benchmark their work. In this talk we will outline how a new research community working on Detection and Classification of Acoustic Scenes and Events (DCASE) has grown over the last ten years, from two challenges on acoustic scene classification and sound event detection with a small workshop poster session, to an annual data challenge with six tasks and a dedicated annual workshop, attracting hundreds of delegates and strong industry interest. We will also describe how the analysis methods have evolved, from mel frequency cepstral coefficients (MFCCs) or cochleagrams classified by support vector machines (SVMs) or hidden Markov models (HMMs), to deep learning methods such as transfer learning, transformers, and self-supervised learning. We will finish by suggesting some potential future directions for automatic sound recognition and the DCASE community.
36

Wei, Chong, Dorian Houser, Christine Erbe, Chuang Zhang, Eszter Matrai, James J. Finneran, and Whitlow W. Au. "Does rotation during echolocation increase the acoustic field of view? Comparative numerical models based on CT data of a live versus deceased dolphin." Journal of the Acoustical Society of America 151, no. 4 (April 2022): A107. http://dx.doi.org/10.1121/10.0010799.

Abstract:
Spinning is a natural and common dolphin behavior; however, its role in echolocation is unknown. We used computed tomography (CT) data of a live and a recently deceased bottlenose dolphin together with measurements of the acoustic properties of head tissues to perform acoustic property reconstruction. The anatomical configuration and acoustic properties of the main forehead structures between the live and deceased dolphins were compared. Finite element analysis (FEA) was applied to simulate the generation and propagation of echolocation clicks, to compute their waveforms and spectra in both near- and far-fields, and to derive echolocation beam patterns. Model results from both the live and deceased dolphins were in good agreement with click recordings from live, echolocating individuals. FEA was also used to estimate the acoustic scene experienced by a dolphin rotating 180° about its longitudinal axis to detect fish in the far-field at elevation angles of 0°–20°. The results suggest that the spinning behavior provides a wider insonification area and compensates for the dolphin’s relatively narrow biosonar beam and constraints on the pointing direction that are limited by head movement. The results also have implications for examining the accuracy of FEA in acoustic simulations using freshly deceased specimens.
37

Barker, Jon P. "Evaluation of scene analysis using real and simulated acoustic mixtures: Lessons learnt from the CHiME speech recognition challenges." Journal of the Acoustical Society of America 141, no. 5 (May 2017): 3693. http://dx.doi.org/10.1121/1.4988044.

38

Obrenovic, Zeljko. "Experimental evaluation of user performance in a pursuit tracking task with multimodal feedback." Yugoslav Journal of Operations Research 14, no. 1 (2004): 99–115. http://dx.doi.org/10.2298/yjor0401099o.

Abstract:
In this paper we describe the results of an experimental evaluation of user performance in a pursuit-tracking task with multimodal feedback. Our experimental results indicate that audio can significantly improve the accuracy of pursuit tracking. Experiments with 19 participants have shown that the addition of acoustic modalities reduces the error during pursuit tracking by up to 19%. Moreover, the experiments indicated the existence of perceptual boundaries of multimodal HCI for different scene complexities and target speeds. We have also shown that the most appealing paradigms are not the most effective ones, which necessitates a careful quantitative analysis of proposed multimodal HCI paradigms.
39

Rusk, Zane T., Michelle C. Vigeant, and Matthew Neal. "Free-field perceptual evaluation of virtual acoustic rendering algorithms using two head-related impulse response delay treatment strategies." Journal of the Acoustical Society of America 153, no. 3_supplement (March 1, 2023): A221. http://dx.doi.org/10.1121/10.0018720.

Abstract:
Sound scenes can be auralized over headphones using binaural rendering techniques in conjunction with a set of head-related impulse responses (HRIRs). If the directions of the sound objects to be rendered are known, either virtual loudspeaker or Ambisonic scene-based techniques may be used, each of which introduce spatial and timbral artifacts at lower spatial resolutions. Neal and Zahorik quantitatively evaluated the effect of separately applying the HRIR delays to time-aligned HRIRs for use with virtual loudspeaker array techniques, referred to as the prior HRIR delays treatment strategy (J. Acoust. Soc. Am. 2022). The present work aims to perceptually validate their quantitative results in a listening study. Free-field point sources were binaurally rendered using five different methods: vector-based amplitude panning, Ambisonics panning, direct spherical harmonic transform of the HRIR set, MagLS, and Principal Component-Base Amplitude Panning (Neal and Zahorik, Audio Eng. Soc. AVAR Conference 2022). Renderings were simulated at various directions and spatial resolutions, both with and without the prior HRIR delays treatment strategy and compared to direct HRIR convolution. Broadband noise served as a critical test signal for revealing changes in timbre and localization and a high-resolution binaural head HRIR dataset was used. The subjective data analysis will be presented.
40

Kaya, Emine Merve, and Mounya Elhilali. "Modelling auditory attention." Philosophical Transactions of the Royal Society B: Biological Sciences 372, no. 1714 (February 19, 2017): 20160101. http://dx.doi.org/10.1098/rstb.2016.0101.

Abstract:
Sounds in everyday life seldom appear in isolation. Both humans and machines are constantly flooded with a cacophony of sounds that need to be sorted through and scoured for relevant information—a phenomenon referred to as the ‘cocktail party problem’. A key component in parsing acoustic scenes is the role of attention, which mediates perception and behaviour by focusing both sensory and cognitive resources on pertinent information in the stimulus space. The current article provides a review of modelling studies of auditory attention. The review highlights how the term attention refers to a multitude of behavioural and cognitive processes that can shape sensory processing. Attention can be modulated by ‘bottom-up’ sensory-driven factors, as well as ‘top-down’ task-specific goals, expectations and learned schemas. Essentially, it acts as a selection process or processes that focus both sensory and cognitive resources on the most relevant events in the soundscape; with relevance being dictated by the stimulus itself (e.g. a loud explosion) or by a task at hand (e.g. listen to announcements in a busy airport). Recent computational models of auditory attention provide key insights into its role in facilitating perception in cluttered auditory scenes. This article is part of the themed issue ‘Auditory and visual scene analysis’.
41

Yang, Jing Song, Xiu Ling He, and Li Xin Li. "Multi-Sensor Life Detection Synergy Platform Design." Applied Mechanics and Materials 442 (October 2013): 520–25. http://dx.doi.org/10.4028/www.scientific.net/amm.442.520.

Abstract:
Life detection based on a single type of information source cannot completely meet the needs of earthquake relief. Existing life detection techniques include acoustic wave life detection, optical life detection and radar life detection. The advantages and remaining problems of these three techniques are analyzed, and the advantages and present state of multi-sensor detection synergy are given. We explain the platform structure, the multi-sensor selection strategy, and the information fusion model. Finally, the development direction of life detection based on multi-sensor synergy is forecast. The platform mainly realizes synchronized collection of infrared images and acoustic waves, together with analysis and processing for both audio and video life detection. It makes up for the deficiencies of single-sensor life detectors, can obtain more comprehensive and more accurate information about the rescue scene and living organisms, and is better suited to complex earthquake rescue environments.
42

Rascon, Caleb. "A Corpus-Based Evaluation of Beamforming Techniques and Phase-Based Frequency Masking." Sensors 21, no. 15 (July 23, 2021): 5005. http://dx.doi.org/10.3390/s21155005.

Abstract:
Beamforming is a class of audio array processing techniques used for interference reduction and sound source localization, and as a pre-processing stage for audio event classification and speaker identification. The auditory scene analysis community can benefit from a systematic evaluation and comparison of different beamforming techniques. In this paper, five popular beamforming techniques are evaluated in two different acoustic environments, while varying the number of microphones, the number of interferences, and the direction-of-arrival error, by using the Acoustic Interactions for Robot Audition (AIRA) corpus and a common software framework. Additionally, a highly efficient phase-based frequency masking beamformer is also evaluated, which is shown to outperform all five techniques. Both the evaluation corpus and the beamforming implementations are freely available and provided for experiment repeatability and transparency. Raw results are also provided to the reader as a complement to this work, to facilitate an informed decision of which technique to use. Finally, the insights and tendencies observed from the evaluation results are presented.
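To make the kind of technique being benchmarked concrete, the sketch below implements a basic delay-and-sum beamformer for a simulated uniform linear array. It is not one of the implementations evaluated in the paper; the array geometry, steering angle and signals are all fabricated for illustration.

```python
# Basic delay-and-sum beamformer for a simulated 4-microphone linear array; everything
# here (geometry, look direction, signals) is fabricated for illustration.
import numpy as np

fs, c = 16000, 343.0
n_mics, spacing = 4, 0.05                          # 5 cm inter-microphone spacing
theta = np.deg2rad(30.0)                           # direction the target arrives from

rng = np.random.default_rng(0)
target = rng.normal(size=fs)                       # 1 s broadband target signal

# Simulate per-microphone integer-sample delays for the target, plus sensor noise.
delays = np.arange(n_mics) * spacing * np.sin(theta) / c
mics = np.array([np.roll(target, int(round(d * fs))) + 0.5 * rng.normal(size=fs)
                 for d in delays])

# Delay-and-sum: undo each microphone's steering delay and average the channels.
aligned = [np.roll(m, -int(round(d * fs))) for m, d in zip(mics, delays)]
output = np.mean(aligned, axis=0)

noise_in = mics[0] - target                        # first microphone has zero delay
noise_out = output - target
print(f"input SNR  ~ {10 * np.log10(np.var(target) / np.var(noise_in)):.1f} dB")
print(f"output SNR ~ {10 * np.log10(np.var(target) / np.var(noise_out)):.1f} dB")
```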
43

Kashino, Makio, and Hirohito M. Kondo. "Functional brain networks underlying perceptual switching: auditory streaming and verbal transformations." Philosophical Transactions of the Royal Society B: Biological Sciences 367, no. 1591 (April 5, 2012): 977–87. http://dx.doi.org/10.1098/rstb.2011.0370.

Abstract:
Recent studies have shown that auditory scene analysis involves distributed neural sites below, in, and beyond the auditory cortex (AC). However, it remains unclear what role each site plays and how they interact in the formation and selection of auditory percepts. We addressed this issue through perceptual multistability phenomena, namely, spontaneous perceptual switching in auditory streaming (AS) for a sequence of repeated triplet tones, and perceptual changes for a repeated word, known as verbal transformations (VTs). An event-related fMRI analysis revealed brain activity time-locked to perceptual switching in the cerebellum for AS, in frontal areas for VT, and the AC and thalamus for both. The results suggest that motor-based prediction, produced by neural networks outside the auditory system, plays essential roles in the segmentation of acoustic sequences both in AS and VT. The frequency of perceptual switching was determined by a balance between the activation of two sites, which are proposed to be involved in exploring novel perceptual organization and stabilizing current perceptual organization. The effect of the gene polymorphism of catechol-O-methyltransferase (COMT) on individual variations in switching frequency suggests that the balance of exploration and stabilization is modulated by catecholamines such as dopamine and noradrenalin. These mechanisms would support the noteworthy flexibility of auditory scene analysis.
44

Jenny, Claudia, and Christoph Reuter. "Usability of Individualized Head-Related Transfer Functions in Virtual Reality: Empirical Study With Perceptual Attributes in Sagittal Plane Sound Localization." JMIR Serious Games 8, no. 3 (September 8, 2020): e17576. http://dx.doi.org/10.2196/17576.

Abstract:
Background: In order to present virtual sound sources via headphones spatially, head-related transfer functions (HRTFs) can be applied to audio signals. In this so-called binaural virtual acoustics, the spatial perception may be degraded if the HRTFs deviate from the true HRTFs of the listener. Objective: In this study, participants wearing virtual reality (VR) headsets performed a listening test on the 3D audio perception of virtual audiovisual scenes, thus enabling us to investigate the necessity and influence of the individualization of HRTFs. Two hypotheses were investigated: first, general HRTFs lead to limitations of 3D audio perception in VR and second, the localization model for stationary localization errors is transferable to nonindividualized HRTFs in more complex environments such as VR. Methods: For the evaluation, 39 subjects rated individualized and nonindividualized HRTFs in an audiovisual virtual scene on the basis of 5 perceptual qualities: localizability, front-back position, externalization, tone color, and realism. The VR listening experiment consisted of 2 tests: in the first test, subjects evaluated their own and the general HRTF from the Massachusetts Institute of Technology Knowles Electronics Manikin for Acoustic Research database and in the second test, their own and 2 other nonindividualized HRTFs from the Acoustics Research Institute HRTF database. For the experiment, 2 subject-specific, nonindividualized HRTFs with a minimal and maximal localization error deviation were selected according to the localization model in sagittal planes. Results: With the Wilcoxon signed-rank test for the first test, analysis of variance for the second test, and a sample size of 78, the results were significant in all perceptual qualities, except for the front-back position between own and minimal deviant nonindividualized HRTF (P=.06). Conclusions: Both hypotheses have been accepted. Sounds filtered by individualized HRTFs are considered easier to localize, easier to externalize, more natural in timbre, and thus more realistic compared to sounds filtered by nonindividualized HRTFs.
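Binaural rendering of the kind discussed in this abstract ultimately comes down to convolving a source signal with a left/right HRIR pair. The toy sketch below fabricates an HRIR pair from a simple interaural delay and level difference; it does not use measured or individualized HRTFs.

```python
# Toy binaural rendering: convolve a mono signal with a fabricated left/right HRIR pair
# (a simple interaural time and level difference), not measured or individualized HRTFs.
import numpy as np
from scipy.signal import fftconvolve

fs = 44100
mono = np.random.randn(fs)                         # stand-in for a 1 s source signal

hrir_right = np.zeros(256); hrir_right[0] = 1.0    # source on the right: direct path
hrir_left = np.zeros(256); hrir_left[30] = 0.6     # ~0.68 ms delay and attenuation at the left ear

left = fftconvolve(mono, hrir_left)
right = fftconvolve(mono, hrir_right)
binaural = np.stack([left, right], axis=1)         # (samples, 2) for headphone playback
print(binaural.shape)
```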
APA, Harvard, Vancouver, ISO, and other styles
45

Yu, Boya, Linjie Wen, Jie Bai, and Yuying Chai. "Effect of Road and Railway Sound on Psychological and Physiological Responses in an Office Environment." Buildings 12, no. 1 (December 22, 2021): 6. http://dx.doi.org/10.3390/buildings12010006.

Full text
Abstract:
The present study aims to explore the psychophysiological impact of different traffic sounds in office spaces. In this experiment, 30 subjects were recruited and exposed to different traffic sounds in a virtual reality (VR) office scene. The road traffic sound and three railway sounds (conventional train, high-speed train, and tram), each at three sound levels (45, 55, and 65 dB), were used as the acoustic stimuli. Physiological responses, namely electrodermal activity (EDA) and heart rate (HR), were monitored throughout the experiment. Psychological evaluations under each acoustic stimulus were also measured using scales within the VR system. The results showed that both the psychological and the physiological responses were significantly affected by the traffic sounds. As for psychological responses, considerable adverse effects of traffic sounds were observed, which increased consistently with the sound level. The peak sound level was found to perform better than the equivalent sound level in assessing the psychological impact of traffic sounds. As for the physiological responses, significant effects of both the acoustic factors (sound type and sound level) and the non-acoustic factors (gender and exposure time) were observed. The relationship between sound level and physiological parameters varied among the different sound groups. Variation in sound level hardly affected the participants' HR and EDA during exposure to the conventional train and tram sounds. In contrast, HR and EDA were significantly affected by the levels of the road traffic and high-speed train sounds. Through a correlation analysis, a relatively weak correlation between the psychological evaluations and HR was found.
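As a rough illustration of the correlation analysis mentioned at the end of the abstract, the following Python sketch computes Pearson and Spearman coefficients between per-condition annoyance ratings and heart-rate changes; all values and variable names are made up for illustration and are not data from the study.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical per-condition data: mean annoyance rating (1-5 scale) and mean
# heart-rate change (bpm) for each of 12 sound-type x level conditions.
annoyance = np.array([2.1, 3.4, 4.8, 2.0, 3.1, 4.5, 1.9, 3.0, 4.2, 2.2, 3.5, 4.9])
hr_change = np.array([0.4, 0.9, 1.2, 0.1, 0.3, 0.8, 0.2, 0.5, 0.7, 0.3, 0.6, 1.1])

r, p = pearsonr(annoyance, hr_change)        # linear association
rho, p_s = spearmanr(annoyance, hr_change)   # rank-based association
print(f"Pearson r = {r:.2f} (p = {p:.3f}); Spearman rho = {rho:.2f} (p = {p_s:.3f})")
```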
APA, Harvard, Vancouver, ISO, and other styles
46

Ma, Yuxuan. "Common techniques and deep learning application prospects for sound event detection." Applied and Computational Engineering 6, no. 1 (June 14, 2023): 293–99. http://dx.doi.org/10.54254/2755-2721/6/20230795.

Full text
Abstract:
Modern techniques that employ deep learning for sound event detection (SED) have improved significantly. In this paper, the author reviews the development of deep learning models for SED tasks in recent years, along with the performance advantages and disadvantages observed when different deep learning methods are applied to the same sound event dataset. The paper also introduces a few techniques that effectively increase the precision of sound event detection, and it outlines possible development trends for SED methods by analyzing the entries in the 2016-2017 Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge. Through this analysis, the paper finds that the accuracy with which deep learning models identify target events will continue to improve until they are suitable for industrial and everyday scenarios, so SED remains a valuable research area.
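For readers unfamiliar with the model families discussed in such reviews, the sketch below shows a toy convolutional-recurrent network (CRNN) for frame-level sound event detection in PyTorch, a common baseline architecture in DCASE-style systems; the layer sizes, mel resolution, and number of event classes are arbitrary assumptions, not a reproduction of any specific challenge entry.

```python
import torch
import torch.nn as nn

class SmallCRNN(nn.Module):
    """Toy CRNN for frame-level sound event detection.
    Input: log-mel spectrogram of shape (batch, 1, time, n_mels)."""
    def __init__(self, n_mels=64, n_events=10, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 4)),                       # pool over mel axis only
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 4)),
        )
        self.gru = nn.GRU(64 * (n_mels // 16), hidden,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_events)

    def forward(self, x):
        z = self.conv(x)                                # (B, C, T, M')
        b, c, t, m = z.shape
        z = z.permute(0, 2, 1, 3).reshape(b, t, c * m)  # (B, T, C*M')
        z, _ = self.gru(z)                              # temporal modelling
        return torch.sigmoid(self.head(z))              # per-frame event probabilities

model = SmallCRNN()
logmel = torch.randn(8, 1, 500, 64)                     # dummy batch of 5 s clips
probs = model(logmel)                                   # (8, 500, 10), for BCE training
```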
APA, Harvard, Vancouver, ISO, and other styles
47

Franklin, Clifford A., Letitia J. White, Thomas C. Franklin, and Laura Smith-Olinde. "The Relationship between the Acceptance of Noise and Acoustic Environments in Young Adults with Normal Hearing: A Pilot Study." Journal of the American Academy of Audiology 25, no. 06 (June 2014): 584–91. http://dx.doi.org/10.3766/jaaa.25.6.8.

Full text
Abstract:
Background: The acceptable noise level (ANL) indicates how much background noise a listener is willing to accept while listening to speech. The clinical impact and application of the ANL measure is as a predictor of hearing-aid use. The ANL may also correlate with the percentage of time spent in different listening environments (i.e., quiet, noisy, noisy with speech present, etc.). Information retrieved from data logging could confirm this relationship. Data logging, using sound scene analysis, is a method of monitoring the different characteristics of the listening environments that a hearing-aid user experiences over a period of time. Purpose: The purpose of this study was to determine whether the ANL procedure reflects the proportion of time a person spends in different acoustic environments. Research Design: This was a descriptive quasi-experimental design to collect pilot data, in which participants were asked to maintain their regular daily activities while wearing a data-logging device. Study Sample: After completing the ANL measurement, 29 normal-hearing listeners were provided with a data-logging device and instructed in its proper use. Data Collection/Analysis: ANL measures were obtained along with the percentage of time participants spent in listening environments classified as quiet, speech-in-quiet, speech-in-noise, and noise via the data-logging device. Results: An analysis of variance using a general linear model indicated that listeners with low ANL values spent more time in acoustic environments in which background noise was present than did those with high ANL values; the ANL data did not indicate differences in how much time listeners spent in environments of differing intensities. Conclusions: To some degree, the ANL reflects the acoustic environments and the amount of noise that the listener is willing to accept; data logging illustrates the acoustic environments in which the listener was present. Clinical implications include, but are not limited to, decisions in patient care regarding the need for additional counseling and/or the use of digital noise reduction and directional-microphone technology.
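For orientation, the ANL is conventionally computed as the most comfortable listening level (MCL) minus the highest background noise level (BNL) the listener accepts, and data logging yields the proportion of time spent in each classified environment. The Python sketch below illustrates both computations; the example levels and environment labels are hypothetical.

```python
import numpy as np

def acceptable_noise_level(mcl_db, bnl_db):
    """ANL = most comfortable listening level (MCL) minus the maximum
    acceptable background noise level (BNL), in dB."""
    return mcl_db - bnl_db

def time_in_environments(frame_labels):
    """Summarise data-logging output: percentage of logged frames spent in
    each classified listening environment (labels are illustrative)."""
    labels, counts = np.unique(frame_labels, return_counts=True)
    return dict(zip(labels, 100.0 * counts / counts.sum()))

# Hypothetical listener: MCL of 55 dB HL, noise accepted up to 48 dB HL.
anl = acceptable_noise_level(55.0, 48.0)   # 7 dB; lower values mean more noise accepted
log = ["quiet", "quiet", "speech_in_noise", "noise", "speech_in_quiet", "noise"]
print(anl, time_in_environments(log))
```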
APA, Harvard, Vancouver, ISO, and other styles
48

Yu, Boya, and Yuying Chai. "Psychophysiological responses to traffic noises in urban green spaces." INTER-NOISE and NOISE-CON Congress and Conference Proceedings 265, no. 4 (February 1, 2023): 3864–71. http://dx.doi.org/10.3397/in_2022_0548.

Full text
Abstract:
The present study aims to explore the psychophysiological impact of different traffic sounds in urban green spaces. In the experiment, 30 subjects were recruited and exposed to different traffic sounds in a virtual reality (VR) scene. The road traffic sound and three railway sounds (conventional train, high-speed train, and tram) with three sound levels (45, 55, and 65 dB) were used as the acoustic stimuli. Physiological responses, namely electrodermal activity (EDA) and heart rate (HR), were monitored throughout the experiment. Psychological evaluations under each acoustic stimulus were also measured using scales within the VR system. The results showed that both the psychological and the physiological responses were significantly affected by the traffic sounds. As for psychological responses, considerable adverse effects of traffic sounds were observed, which increased consistently with the sound level. The peak sound level was found to perform better than the equivalent sound level in assessing the psychological impact of traffic sounds. As for the physiological responses, significant effects of both the acoustic factors (sound type and sound level) and the non-acoustic factors (gender and exposure time) were observed. The physiological effect of high-speed train noise was significantly different from those of the other three traffic noises. The relationship between sound level and physiological parameters varied among the different sound groups. Variation in sound level hardly affected the participants' HR and EDA during exposure to the road traffic noise. On the contrary, the physiological responses were significantly affected by the sound level of rail traffic noise. Through a correlation analysis, no linear correlation between the psychological evaluations and HR was found.
APA, Harvard, Vancouver, ISO, and other styles
49

Scharine, Angelique A., and Michael K. McBeath. "Natural Regularity of Correlated Acoustic Frequency and Intensity in Music and Speech: Auditory Scene Analysis Mechanisms Account for Integrality of Pitch and Loudness." Auditory Perception & Cognition 1, no. 3-4 (October 2, 2018): 205–28. http://dx.doi.org/10.1080/25742442.2019.1600935.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Mjali Mbaideen, Adnan, Ashinida Binti Aladdin, Imran Ho-Abdullah, and Mohammad Khawaldah. "Acoustic Prepositional Deletion in the Quran: The Case of إلى , ilā. A Constructional Grammar Approach." International Journal of Applied Linguistics and English Literature 8, no. 3 (May 30, 2019): 55. http://dx.doi.org/10.7575/aiac.ijalel.v.8n.3p.55.

Full text
Abstract:
The purpose of this study is to investigate the linguistic phenomenon of Acoustic Prepositional Deletion (APD) (نزع الخافض سماعيا, nazʿi al-khāfiḍ smaʿyan) in the Quran. It mainly addresses the deletion of the preposition إلى, ilā from some verses of the Quran despite being preceded by an intransitive verb. The study applies the perspective of Cognitive Linguistics (CL) theory and its relevant approaches to the analysis of the data included. Construction Grammar (CxG) is mainly used to examine to what extent the (non)existence of an element (i.e., a preposition) of a particular construction may lead to the alternation of the spatial relationships existing between its elements, and what consequences may appear due to the manipulation of the existing relationships. The study finds that APD results in new partially or totally different, opposite or negative, abstract or manner spatial relationships between the construction entities, which in turn result in a different semantic conceptualization of these relationships. It also finds that the degree of loyalty to the spatial scene in the Target Text (TT) varies from partially loyal to disloyal. This validates Croft's (2001-2017) hypothesis that meaning is both construction and language specific.
APA, Harvard, Vancouver, ISO, and other styles