Journal articles on the topic 'Audio data'

Consult the top 50 journal articles for your research on the topic 'Audio data.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Matsunuma, Yasuhiro. "Audio data processing apparatus and audio data distributing apparatus." Journal of the Acoustical Society of America 124, no. 4 (2008): 1903. http://dx.doi.org/10.1121/1.3001094.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Schuller, Gerald, Matthias Gruhne, and Tobias Friedrich. "Fast Audio Feature Extraction From Compressed Audio Data." IEEE Journal of Selected Topics in Signal Processing 5, no. 6 (October 2011): 1262–71. http://dx.doi.org/10.1109/jstsp.2011.2158802.

3

Wylie, F. "Digital audio data compression." Electronics & Communication Engineering Journal 7, no. 1 (February 1, 1995): 5–10. http://dx.doi.org/10.1049/ecej:19950103.

4

Seok, Jong Won, and Jin Woo Hong. "Audio watermarking for copyright protection of digital audio data." Electronics Letters 37, no. 1 (2001): 60. http://dx.doi.org/10.1049/el:20010029.

5

Patil, Adwait. "Covid Classification Using Audio Data." International Journal for Research in Applied Science and Engineering Technology 9, no. 10 (October 31, 2021): 1633–37. http://dx.doi.org/10.22214/ijraset.2021.38675.

Abstract:
The coronavirus outbreak has affected the entire world adversely. This project was developed to help the general public assess their likelihood of being COVID-positive using only a coughing sound and basic patient data. Audio classification is one of the most interesting applications of deep learning. Like image data, audio data is stored as bits; to analyze it, we used Mel-frequency cepstral coefficients (MFCCs), which make it possible to feed the audio to our neural network. We used Coughvid, a crowdsourced dataset consisting of 27,000 audio files and metadata for the same number of patients, and a 1D convolutional neural network (CNN) to process the audio and metadata. Future work will produce a model that rates how likely a person is to be infected instead of performing binary classification. Keywords: audio classification, Mel-frequency cepstral coefficients, convolutional neural network, deep learning, Coughvid
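The MFCC front end described in this abstract can be sketched with plain NumPy; the sample rate, frame sizes, and filter counts below are illustrative defaults, not values from the paper:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_coeffs=13):
    # Frame the signal and apply a Hann window to each frame
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frames.append(signal[start:start + n_fft] * np.hanning(n_fft))
    frames = np.array(frames)
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Triangular mel filterbank spanning 0 .. sr/2
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[m - 1, k] = (r - k) / max(r - c, 1)
    log_mel = np.log(power @ fbank.T + 1e-10)
    # DCT-II over the mel axis decorrelates the filterbank energies
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), (2 * n + 1)) / (2 * n_mels))
    return log_mel @ dct.T

# Example: one second of a 440 Hz tone yields one MFCC vector per frame
t = np.linspace(0, 1, 16000, endpoint=False)
feats = mfcc(np.sin(2 * np.pi * 440 * t))
```

Real pipelines typically call a library such as librosa for this, but the filterbank-plus-DCT structure that turns raw audio into CNN-ready features is the same.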
6

BASYSTIUK, Oleh, and Nataliia MELNYKOVA. "MULTIMODAL SPEECH RECOGNITION BASED ON AUDIO AND TEXT DATA." Herald of Khmelnytskyi National University. Technical sciences 313, no. 5 (October 27, 2022): 22–25. http://dx.doi.org/10.31891/2307-5732-2022-313-5-22-25.

Abstract:
Machine translation systems, which translate texts from one language to another, simulate the work of a human translator. Their performance depends on the ability to understand the grammar rules of the language. In translation, the basic units are not individual words but word combinations or phraseological units that express different concepts; only by using them can more complex ideas be expressed in the translated text. A key feature of machine translation is that input and output differ in length, and recurrent neural networks provide the ability to work with variable-length input and output. A recurrent neural network (RNN) is a class of artificial neural network with connections between nodes, where a connection may run from a more distant node to a less distant one. These connections allow an RNN to remember and reproduce an entire sequence of reactions to one stimulus. From a programming point of view, such networks are analogous to cyclic execution; from a systems point of view, they are equivalent to a state machine. RNNs are commonly used to process word sequences in natural language processing, where a hidden Markov model (HMM) and an N-gram language model have traditionally been used. Deep learning has completely changed the approach to machine translation: researchers in the field have created simple machine-learning solutions that outperform the best expert systems. This paper reviews the main features of machine translation based on recurrent neural networks and highlights the advantages of RNN systems using the sequence-to-sequence model over statistical translation systems. Two machine translation systems based on the sequence-to-sequence model were built with the Keras and PyTorch machine learning libraries; based on the results obtained, the libraries were analysed and their performance compared.
7

Wu, S., J. Huang, D. Huang, and Y. Q. Shi. "Efficiently Self-Synchronized Audio Watermarking for Assured Audio Data Transmission." IEEE Transactions on Broadcasting 51, no. 1 (March 2005): 69–76. http://dx.doi.org/10.1109/tbc.2004.838265.

8

Struthers, Allan. "Radioactive Decay: Audio Data Collection." PRIMUS 19, no. 4 (June 12, 2009): 388–95. http://dx.doi.org/10.1080/10511970802238829.

9

LIN, RUEI-SHIANG, and LING-HWEI CHEN. "A NEW APPROACH FOR CLASSIFICATION OF GENERIC AUDIO DATA." International Journal of Pattern Recognition and Artificial Intelligence 19, no. 01 (February 2005): 63–78. http://dx.doi.org/10.1142/s0218001405003958.

Abstract:
Existing audio retrieval systems fall into one of two categories: single-domain systems that accept data of only a single type (e.g. speech), and multiple-domain systems that offer content-based retrieval for multiple types of audio data. Since a single-domain system has limited applications, a multiple-domain system is more useful. However, different types of audio data have different properties, which makes a multiple-domain system harder to develop. If audio information can be classified in advance, this problem is solved. In this paper, we propose a real-time classification method that assigns audio signals to several basic types: pure speech, music, song, speech with music background, and speech with environmental-noise background. To make the proposed method robust across a variety of audio sources, we use the Bayesian decision function for a multivariate Gaussian distribution instead of manually adjusting a threshold for each discriminator. The proposed approach can be applied to content-based audio/video retrieval. In the experiment, the efficiency and effectiveness of the method are shown by an accuracy rate of more than 96% for general audio data classification.
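The Bayesian decision function for a multivariate Gaussian that the authors use in place of hand-tuned thresholds amounts to comparing per-class log-likelihoods plus log-priors. A minimal sketch, with made-up 2-D features rather than the paper's audio features:

```python
import numpy as np

def fit_gaussian(X):
    """Per-class parameters: mean vector and (regularized) covariance matrix."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
    return mu, cov

def log_discriminant(x, mu, cov, prior):
    # Bayesian decision function g_i(x) for a multivariate Gaussian class model:
    # log p(x | class) + log p(class), up to a shared constant
    d = x - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * d @ np.linalg.solve(cov, d) - 0.5 * logdet + np.log(prior)

def classify(x, params, priors):
    scores = [log_discriminant(x, mu, cov, p)
              for (mu, cov), p in zip(params, priors)]
    return int(np.argmax(scores))

# Toy example: two well-separated 2-D "audio feature" clusters
rng = np.random.default_rng(0)
speech = rng.normal([0, 0], 0.5, size=(200, 2))
music = rng.normal([5, 5], 0.5, size=(200, 2))
params = [fit_gaussian(speech), fit_gaussian(music)]
label = classify(np.array([4.8, 5.1]), params, priors=[0.5, 0.5])
```

Choosing the class with the largest discriminant replaces a per-discriminator threshold: the decision boundary falls out of the fitted means, covariances, and priors.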
10

Alderete, John, and Monica Davies. "Investigating Perceptual Biases, Data Reliability, and Data Discovery in a Methodology for Collecting Speech Errors From Audio Recordings." Language and Speech 62, no. 2 (April 6, 2018): 281–317. http://dx.doi.org/10.1177/0023830918765012.

Abstract:
This work describes a methodology of collecting speech errors from audio recordings and investigates how some of its assumptions affect data quality and composition. Speech errors of all types (sound, lexical, syntactic, etc.) were collected by eight data collectors from audio recordings of unscripted English speech. Analysis of these errors showed that: (i) different listeners find different errors in the same audio recordings, but (ii) the frequencies of error patterns are similar across listeners; (iii) errors collected “online” using on the spot observational techniques are more likely to be affected by perceptual biases than “offline” errors collected from audio recordings; and (iv) datasets built from audio recordings can be explored and extended in a number of ways that traditional corpus studies cannot be.
11

Premjith B., Neethu Mohan, Prabaharan Poornachandran, and Soman K.P. "Audio Data Authentication with PMU Data and EWT." Procedia Technology 21 (2015): 596–603. http://dx.doi.org/10.1016/j.protcy.2015.10.066.

12

Geiger, Ralf. "Apparatus and method for coding a time-discrete audio signal to obtain coded audio data and for decoding coded audio data." Journal of the Acoustical Society of America 123, no. 3 (2008): 1233. http://dx.doi.org/10.1121/1.2901358.

13

Manoharan, J. Samuel. "Audio Tagging Using CNN Based Audio Neural Networks for Massive Data Processing." December 2021 3, no. 4 (December 24, 2021): 365–74. http://dx.doi.org/10.36548/jaicn.2021.4.008.

Abstract:
Sound event detection, speech emotion classification, music classification, acoustic scene classification, audio tagging, and several other audio pattern recognition applications depend heavily on the growing machine learning technology. Audio pattern recognition problems have also been addressed by neural networks in recent years, but existing systems operate for limited durations on specific datasets. In natural language processing and computer vision, systems pretrained on large datasets have performed well across many tasks; in audio pattern recognition, however, research with large-scale datasets remains limited. In this paper, a large-scale audio dataset is used to train a pretrained audio neural network, which is then transferred to several audio-related tasks. Several convolutional neural networks are used to model the proposed audio neural network, and the computational complexity and performance of the system are analyzed. The waveform and the log-mel spectrogram are used as input features in this architecture. In audio tagging, the proposed system outperforms existing systems with a mean average precision of 0.45. The performance of the proposed model is demonstrated by applying the audio neural network to five specific audio pattern recognition tasks.
14

Kadiri, Sudarsana Reddy, and Paavo Alku. "Subjective Evaluation of Basic Emotions from Audio–Visual Data." Sensors 22, no. 13 (June 29, 2022): 4931. http://dx.doi.org/10.3390/s22134931.

Abstract:
Understanding of the perception of emotions or affective states in humans is important to develop emotion-aware systems that work in realistic scenarios. In this paper, the perception of emotions in naturalistic human interaction (audio–visual data) is studied using perceptual evaluation. For this purpose, a naturalistic audio–visual emotion database collected from TV broadcasts such as soap-operas and movies, called the IIIT-H Audio–Visual Emotion (IIIT-H AVE) database, is used. The database consists of audio-alone, video-alone, and audio–visual data in English. Using data of all three modes, perceptual tests are conducted for four basic emotions (angry, happy, neutral, and sad) based on category labeling and for two dimensions, namely arousal (active or passive) and valence (positive or negative), based on dimensional labeling. The results indicated that the participants’ perception of emotions was remarkably different between the audio-alone, video-alone, and audio–video data. This finding emphasizes the importance of emotion-specific features compared to commonly used features in the development of emotion-aware systems.
15

Wang, Peng, Xia Wang, and Xia Liu. "Selection of Audio Learning Resources Based on Big Data." International Journal of Emerging Technologies in Learning (iJET) 17, no. 06 (March 29, 2022): 23–38. http://dx.doi.org/10.3991/ijet.v17i06.30013.

Abstract:
Currently, audio learning resources account for a large proportion of the total online learning resources. Designing and implementing a method for optimizing and selecting audio learning resources based on big data of education will be of great significance to the recommendation of learning resources. Therefore, this paper studies a method for selecting audio learning resources based on the big data of education, with music learning as an example. First, the audio signals were converted into mel spectrograms, and accordingly, the mel-frequency cepstral coefficient features of audio learning resources were obtained. Then, on the basis of the conventional content-based audio recommendation algorithm, the established interest degree vector of target students with respect to music learning was expanded, and a collaborative filtering hybrid algorithm for audio learning resources that incorporates the interest degrees of neighbouring students was proposed, which effectively improved the accuracy and stability in the prediction of students’ interest in music learning. Finally, the experimental results verified the feasibility and prediction accuracy of the proposed algorithm.
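A user-based collaborative filtering step of the kind the abstract describes, where a student's interest is predicted from the interest degrees of similar students, can be sketched as follows; the rating matrix and neighbourhood size here are invented for illustration, not taken from the paper:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def predict_interest(ratings, student, item, k=2):
    """Predict one student's interest in an audio resource from the
    k most similar students who have rated it (user-based CF)."""
    sims = []
    for other in range(ratings.shape[0]):
        if other != student and ratings[other, item] > 0:
            sims.append((cosine(ratings[student], ratings[other]), other))
    sims.sort(reverse=True)           # most similar neighbours first
    top = sims[:k]
    if not top:
        return 0.0
    num = sum(s * ratings[o, item] for s, o in top)
    den = sum(abs(s) for s, o in top)
    return num / den                  # similarity-weighted average

# Rows: students, columns: audio learning resources (0 = unrated)
R = np.array([[5, 4, 0, 1],
              [4, 5, 4, 1],
              [1, 1, 5, 5],
              [5, 5, 4, 0]], dtype=float)
pred = predict_interest(R, student=0, item=2)
```

A hybrid system like the one described would blend such neighbour-based predictions with content scores derived from the mel-spectrogram features of each resource.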
16

Lee, Jae-Woo. "Design and Construction of Fiber Optical Link Application System for Multi-Video Audio Data Transmission." Journal of the Korea Academia-Industrial cooperation Society 10, no. 10 (October 31, 2009): 2691–95. http://dx.doi.org/10.5762/kais.2009.10.10.2691.

17

Samudra, Yoga. "Mirroring-Based Data Hiding in Audio." International Journal of Intelligent Engineering and Systems 14, no. 5 (October 31, 2021): 550–58. http://dx.doi.org/10.22266/ijies2021.1031.48.

18

Mande, Ameya Ajit. "EMOTION DETECTION USING AUDIO DATA SAMPLES." International Journal of Advanced Research in Computer Science 10, no. 6 (December 20, 2019): 13–20. http://dx.doi.org/10.26483/ijarcs.v10i6.6489.

19

Sakthisudhan, K., S. Gayathri Priya, P. Prabhu, and P. Thangaraj. "Secure Data Transmission Using Audio Steganography." i-manager's Journal on Electronics Engineering 2, no. 3 (May 15, 2012): 1–6. http://dx.doi.org/10.26634/jele.2.3.1763.

20

Guarino, Joe, Wes Orme, and Wayne Fischer. "Audio enhancement of biomechanical impact data." Journal of the Acoustical Society of America 125, no. 4 (April 2009): 2731. http://dx.doi.org/10.1121/1.4784508.

21

Yamasaki, Yoshio, and Itaru Kaneko. "MPEG. 2-2. Audio Data Coding." Journal of the Institute of Television Engineers of Japan 49, no. 4 (1995): 422–30. http://dx.doi.org/10.3169/itej1978.49.422.

22

Tan, E., and B. Vermuelen. "Digital audio tape for data storage." IEEE Spectrum 26, no. 10 (October 1989): 34–38. http://dx.doi.org/10.1109/6.40682.

23

Ikeda, Mikio, Ryouzoh Toyoshima, Kazuya Takeda, and Fumitada Itakura. "Audio data hiding using band elimination." Electronics and Communications in Japan (Part II: Electronics) 86, no. 2 (January 15, 2003): 57–67. http://dx.doi.org/10.1002/ecjb.10120.

24

Chen, Lieu-Hen, Pin-Chieh Cheng, Hao-Ming Hung, Wei-Fen Hsieh, and Yasufumi Takama. "An Audio-Visual Information Visualization System for Time-Varying Big Data." SIJ Transactions on Computer Science Engineering & its Applications (CSEA) 03, no. 05 (October 20, 2015): 13–19. http://dx.doi.org/10.9756/sijcsea/v3i5/03080260402.

25

Xu, Yanping, and Sen Xu. "A Clustering Analysis Method for Massive Music Data." Modern Electronic Technology 5, no. 1 (May 6, 2021): 24. http://dx.doi.org/10.26549/met.v5i1.6763.

Abstract:
Clustering analysis plays a very important role in the field of data mining, image segmentation and pattern recognition. The method of cluster analysis is introduced to analyze NetEYun music data. In addition, different types of music data are clustered to find the commonness among the same kind of music. A music data-oriented clustering analysis method is proposed: Firstly, the audio beat period is calculated by reading the audio file data, and the emotional features of the audio are extracted; Secondly, the audio beat period is calculated by Fourier transform. Finally, a clustering algorithm is designed to obtain the clustering results of music data.
26

Teh, Do-Hui. "Apparatus and method for stereo audio encoding of digital audio signal data." Journal of the Acoustical Society of America 103, no. 1 (January 1998): 21. http://dx.doi.org/10.1121/1.423157.

27

Karamchandani, Sunil H., Krutarth J. Gandhi, Siddharth R. Gosalia, Vinod K. Madan, Shabbir N. Merchant, and Uday B. Desai. "PCA Encrypted Short Acoustic Data Inculcated in Digital Color Images." International Journal of Computers Communications & Control 10, no. 5 (July 1, 2015): 678. http://dx.doi.org/10.15837/ijccc.2015.5.2029.

Abstract:
We propose a generalized algorithm for hiding an audio signal using image steganography. The authors suggest transmitting short audio messages camouflaged in digital images, with Principal Component Analysis (PCA) as the encryption technique. The number of principal components required to represent the audio signal after removing redundancies is governed by the magnitude of the eigenvalues. The technique thus performs the dual task of encryption and compression, shrinking the audio data enough to be buried in the image. A 57 KB audio signal is deciphered from the stego image with a high PSNR of 47.49 and a correspondingly low MSE of 3.3266 × 10^(-6), with an equalized, high-quality audio output. Consistent and comparable experimental results from applying the proposed method across a series of images demonstrate that PCA-based encryption can be adopted as a universal rule for a specific payload and the desired compression ratio.
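The compression side of PCA, projecting audio frames onto the leading eigenvectors and reconstructing from a few coefficients, can be illustrated in NumPy. This is a generic sketch under invented parameters, not the paper's exact encryption pipeline:

```python
import numpy as np

def pca_compress(frames, n_components):
    """Project audio frames onto their top principal components;
    return the compressed coefficients and the reconstruction."""
    mean = frames.mean(axis=0)
    X = frames - mean
    # Eigendecomposition of the frame covariance matrix
    cov = X.T @ X / len(X)
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1][:n_components]
    W = vecs[:, order]                # basis: frame_len x n_components
    coded = X @ W                     # compressed coefficients
    recon = coded @ W.T + mean        # approximate reconstruction
    return coded, recon

# A nearly periodic signal concentrates its energy in few components
rng = np.random.default_rng(1)
t = np.arange(1024) / 8000.0
audio = np.sin(2 * np.pi * 300 * t) + 0.01 * rng.standard_normal(1024)
frames = audio.reshape(-1, 64)        # 16 frames of 64 samples
coded, recon = pca_compress(frames, n_components=8)
mse = float(np.mean((frames - recon) ** 2))
```

Here 64-sample frames are stored as 8 coefficients each (an 8:1 reduction of the mean-removed data) while the reconstruction error stays near the noise floor, which is the redundancy-removal effect the abstract relies on.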
28

Kang, Yu, Tianqiao Liu, Hang Li, Yang Hao, and Wenbiao Ding. "Self-Supervised Audio-and-Text Pre-training with Extremely Low-Resource Parallel Data." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 10875–83. http://dx.doi.org/10.1609/aaai.v36i10.21334.

Abstract:
Multimodal pre-training for audio-and-text has recently been proved to be effective and has significantly improved the performance of many downstream speech understanding tasks. However, these state-of-the-art pre-training audio-text models work well only when provided with large amount of parallel audio-and-text data, which brings challenges on many languages that are rich in unimodal corpora but scarce of parallel cross-modal corpus. In this paper, we investigate whether it is possible to pre-train an audio-text multimodal model with extremely low-resource parallel data and extra non-parallel unimodal data. Our pre-training framework consists of the following components: (1) Intra-modal Denoising Auto-Encoding (IDAE), which is able to reconstruct input text (audio) representations from a noisy version of itself. (2) Cross-modal Denoising Auto-Encoding (CDAE), which is pre-trained to reconstruct the input text (audio), given both a noisy version of the input text (audio) and the corresponding translated noisy audio features (text embeddings). (3) Iterative Denoising Process (IDP), which iteratively translates raw audio (text) and the corresponding text embeddings (audio features) translated from previous iteration into the new less-noisy text embeddings (audio features). We adapt a dual cross-modal Transformer as our backbone model which consists of two unimodal encoders for IDAE and two cross-modal encoders for CDAE and IDP. Our method achieves comparable performance on multiple downstream speech understanding tasks compared with the model pre-trained on fully parallel data, demonstrating the great potential of the proposed method.
29

Budiman, Gelar, Andriyan Bayu Suksmono, and Donny Danudirdjo. "Compressive Sampling with Multiple Bit Spread Spectrum-Based Data Hiding." Applied Sciences 10, no. 12 (June 24, 2020): 4338. http://dx.doi.org/10.3390/app10124338.

Abstract:
We propose a novel data hiding method in an audio host with a compressive sampling technique. An over-complete dictionary represents a group of watermarks; each row of the dictionary is a Hadamard sequence representing multiple bits of the watermark. The singular values of the segment-based host audio, in a diagonal matrix, are multiplied by the over-complete dictionary, producing a smaller matrix, and the watermark is embedded into the compressed audio at the same time. In the detector, we detect the watermark and reconstruct the audio. The proposed method therefore not only hides the information but also compresses the audio host. Its applications include broadcast monitoring and biomedical signal recording: we can mark and secure the signal content by hiding the watermark inside the signal while compressing the signal for memory efficiency. We evaluate the performance in terms of payload, compression ratio, audio quality, and watermark quality. The proposed method can hide the data imperceptibly, in the range of 729–5292 bps, with a compression ratio of 1.47–4.84 and a perfectly detected watermark.
30

BAKIR, Çiğdem. "Compressing English Speech Data with Hybrid Methods without Data Loss." International Journal of Applied Mathematics Electronics and Computers 10, no. 3 (September 30, 2022): 68–75. http://dx.doi.org/10.18100/ijamec.1166951.

Abstract:
Understanding the mechanism of speech formation is of great importance for successful coding of the speech signal. Speech coding is also used in applications ranging from authenticating audio files to connecting speech recordings to a data acquisition device (e.g. a microphone), and it is vital to the acquisition, analysis, and evaluation of sound in forensic investigations of criminal events. To collect, process, analyse, extract, and evaluate speech recorded as audio files, which plays an important role in crime detection, the audio must be compressed without data loss. Since much voice-changing software is available today, the number of recorded speech files and their correct interpretation play an important role in establishing originality. Working with an incomprehensible speech recording may require signal processing, noise removal, and filtering to improve the speech and make it comprehensible; determining whether the recording has been manipulated, whether it is original, and whether material has been added or removed; and coding the sounds, decoding them, and transcribing the decoded speech. This study first describes what speech coding is, its purposes and areas of use, and a classification of speech coding by features and techniques. Speech coding was then performed on an English audio dataset, a real dataset consisting of approximately 100,000 voice recordings. Coding was done using waveform, vocoder, and hybrid methods, and the success of each method was measured on the system we created. Hybrid models gave more successful results than the others. The results obtained will serve as an example for our future work.
31

NOVAMIZANTI, LEDYA, GELAR BUDIMAN, and BHISMA ADI WIBOWO. "Optimasi Sistem Penyembunyian Data pada Audio menggunakan Sub-band Stasioner dan Manipulasi Rata-rata Statistik." ELKOMIKA: Jurnal Teknik Energi Elektrik, Teknik Telekomunikasi, & Teknik Elektronika 6, no. 2 (July 9, 2018): 165. http://dx.doi.org/10.26760/elkomika.v6i2.165.

Abstract:
Copyright infringement of music has become a serious problem for the music industry in Indonesia. Audio watermarking is one solution for protecting the copyright of digital audio against illegal acts: a watermark carrying the owner's identity is hidden inside the audio itself. In this study, the host audio is converted into a one-dimensional matrix for the framing process. The Stationary Wavelet Transform (SWT) is then used to obtain the selected stationary sub-bands into which the watermark is inserted. The Statistical Mean Manipulation (SMM) method calculates the average of the host audio in each frame and performs the bit-insertion process. Optimization is carried out by evaluating the parameters that produce the highest BER after the system is attacked. The optimization yields an audio watermarking system that is robust and resistant to signal interference, with an average BER of 0.113, SNR of 31 dB, ODG of -0.6, and MOS of 4.6. Keywords: audio watermarking, SWT, SMM, optimization
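A toy version of mean-manipulation embedding, without the SWT stage, forces each frame's mean toward +delta or -delta to encode a bit; the frame length and delta below are arbitrary choices for illustration, not the paper's parameters:

```python
import math
import random

def smm_embed(host, bits, frame_len=64, delta=0.05):
    """Embed one bit per frame by forcing the frame mean to +delta (bit 1)
    or -delta (bit 0): a simplified statistical mean manipulation."""
    marked = list(host)
    for i, bit in enumerate(bits):
        start = i * frame_len
        frame = marked[start:start + frame_len]
        target = delta if bit else -delta
        shift = target - sum(frame) / frame_len
        for j in range(frame_len):
            marked[start + j] += shift
    return marked

def smm_extract(marked, n_bits, frame_len=64):
    """Recover each bit from the sign of the frame mean."""
    bits = []
    for i in range(n_bits):
        frame = marked[i * frame_len:(i + 1) * frame_len]
        bits.append(1 if sum(frame) / frame_len > 0 else 0)
    return bits

random.seed(7)
host = [math.sin(2 * math.pi * 440 * n / 8000) + random.gauss(0, 0.01)
        for n in range(512)]
watermark = [1, 0, 1, 1, 0, 0, 1, 0]
marked = smm_embed(host, watermark)
recovered = smm_extract(marked, len(watermark))
```

Because the statistic is a per-frame average, small per-sample attacks tend to cancel out, which is the robustness property the paper optimizes for; applying this in an SWT sub-band rather than the raw waveform is what the abstract actually describes.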
32

Huang, Xinchao, Zihan Liu, Wei Lu, Hongmei Liu, and Shijun Xiang. "Fast and Effective Copy-Move Detection of Digital Audio Based on Auto Segment." International Journal of Digital Crime and Forensics 11, no. 2 (April 2019): 47–62. http://dx.doi.org/10.4018/ijdcf.2019040104.

Abstract:
Detecting digital audio forgeries is a significant research focus in the field of audio forensics. In this article, the authors focus on a special form of digital audio forgery, copy-move, and propose a fast and effective method to detect doctored audio. First, the input audio data is segmented into syllables by voice activity detection and syllable detection. Second, points in the frequency domain are selected as features by applying the discrete Fourier transform (DFT) to each audio segment. The segments are then sorted by these features to obtain a sorted list, and each segment is compared only with a few adjacent segments in the sorted list, which decreases the time complexity. Comparisons with other state-of-the-art methods show that the proposed method can verify the authenticity of the input audio and locate the forged position quickly and effectively.
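The segment-sort-compare strategy in the abstract can be sketched as below, using fixed-length segments in place of the paper's syllable detection; segment length, feature count, and tolerance are illustrative:

```python
import numpy as np

def copy_move_candidates(audio, seg_len=256, n_feat=16, tol=1e-6):
    """Split audio into fixed segments, take low-frequency DFT magnitudes
    as features, sort the segments by feature, and flag near-identical
    neighbours in the sorted list as copy-move candidates."""
    n_seg = len(audio) // seg_len
    segs = audio[:n_seg * seg_len].reshape(n_seg, seg_len)
    feats = np.abs(np.fft.rfft(segs, axis=1))[:, :n_feat]
    order = np.lexsort(feats.T[::-1])        # sort rows lexicographically
    pairs = []
    for a, b in zip(order[:-1], order[1:]):  # only adjacent comparisons
        if np.linalg.norm(feats[a] - feats[b]) < tol:
            pairs.append(tuple(sorted((int(a), int(b)))))
    return pairs

rng = np.random.default_rng(3)
audio = rng.standard_normal(8 * 256)
audio[5 * 256:6 * 256] = audio[1 * 256:2 * 256]  # forge: copy segment 1 -> 5
pairs = copy_move_candidates(audio)
```

Sorting places duplicated segments next to each other, so only O(n) adjacent comparisons are needed instead of all O(n^2) pairs, which is the complexity reduction the paper claims.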
33

Xu, Xin, and Su Mei Xi. "Cross-Media Retrieval Method Based on Space Mapping." Advanced Materials Research 756-759 (September 2013): 1898–902. http://dx.doi.org/10.4028/www.scientific.net/amr.756-759.1898.

Abstract:
This paper puts forward a novel cross-media retrieval approach that can process multimedia data of different modalities and measure cross-media similarity, such as image-audio similarity. Both image and audio data are selected for experiments and comparisons. Given the same visual and auditory features, the new approach outperforms the ICA, PCA, and PLS methods in both precision and recall. Overall cross-media retrieval results between images and audio are very encouraging.
34

Jang, Miso, and Dong-Chul Park. "Application of Classifier Integration Model with Confusion Table to Audio Data Classification." International Journal of Machine Learning and Computing 9, no. 3 (June 2019): 368–73. http://dx.doi.org/10.18178/ijmlc.2019.9.3.812.

35

Leban, Roy. "System and method for communicating audio data signals via an audio communications medium." Journal of the Acoustical Society of America 119, no. 2 (2006): 694. http://dx.doi.org/10.1121/1.2174528.

36

HARAHAP, HANNAN, GELAR BUDIMAN, and LEDYA NOVAMIZANTI. "Implementasi Teknik Watermarking menggunakan FFT dan Spread Spectrum Watermark pada Data Audio Digital." ELKOMIKA: Jurnal Teknik Energi Elektrik, Teknik Telekomunikasi, & Teknik Elektronika 4, no. 1 (May 2, 2018): 98. http://dx.doi.org/10.26760/elkomika.v4i1.98.

Abstract:
The rapid growth of technology and internet use has led to widespread forgery and illegal distribution of digital data, so a technology that can protect the copyright of multimedia data such as audio is badly needed. The technique most commonly used for copyright protection is watermarking, because it meets three main criteria of data security: robustness, imperceptibility, and safety. This research presents a scheme that can protect the copyright of audio data. The method used is the Fast Fourier Transform, which converts the original audio data into the frequency domain before the watermark embedding and extraction processes. The watermark is spread over the most significant components of the magnitude spectrum of the host audio. The watermarking technique in this study achieves a Signal-to-Noise Ratio above 20 dB and a Bit Error Rate below 5%. Keywords: audio watermarking, copyright protection, Fast Fourier Transform, magnitude spectrum
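A generic FFT-domain spread-spectrum embedding, multiplying the most significant magnitude bins by small pseudo-noise perturbations, can be sketched as follows. The alpha value and sequence length are illustrative, and this informed detector assumes access to the original host, which the paper's scheme may not require:

```python
import numpy as np

def fft_ss_embed(host, pn, alpha=0.01):
    """Spread a +/-1 pseudo-noise sequence across the largest-magnitude
    FFT bins of the host, then return to the time domain."""
    spec = np.fft.rfft(host)
    mag, phase = np.abs(spec), np.angle(spec)
    idx = np.argsort(mag)[::-1][:len(pn)]   # most significant bins
    mag[idx] *= (1.0 + alpha * pn)
    return np.fft.irfft(mag * np.exp(1j * phase), n=len(host)), idx

def fft_ss_detect(signal, host, pn, idx, alpha=0.01):
    """Correlate the relative magnitude change against the PN sequence."""
    diff = np.abs(np.fft.rfft(signal))[idx] / np.abs(np.fft.rfft(host))[idx] - 1.0
    return float(np.corrcoef(diff, alpha * pn)[0, 1])

rng = np.random.default_rng(5)
t = np.arange(4096) / 44100.0
host = (np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
        + 0.01 * rng.standard_normal(4096))
pn = rng.choice([-1.0, 1.0], size=64)
marked, idx = fft_ss_embed(host, pn)
score = fft_ss_detect(marked, host, pn, idx)
```

Embedding in the strongest spectral components keeps the perturbation perceptually masked while giving the correlation detector a clean signal, which is the trade-off behind the SNR and BER figures quoted above.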
APA, Harvard, Vancouver, ISO, and other styles
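The embedding step the abstract describes, spreading watermark bits over the largest-magnitude FFT components, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the embedding strength `alpha`, the bit-to-chip mapping, and the bin-selection rule are all assumptions.

```python
import numpy as np

def embed_watermark(audio, bits, alpha=0.05):
    """Spread watermark bits over the largest-magnitude FFT bins.

    `alpha` is the embedding strength; bits {0,1} are mapped to
    chips {-1,+1} that scale the chosen bins slightly up or down.
    """
    spectrum = np.fft.rfft(audio)
    magnitude = np.abs(spectrum)
    # Most significant (largest-magnitude) bins, skipping DC.
    idx = np.argsort(magnitude[1:])[::-1][:len(bits)] + 1
    chips = 2 * np.asarray(bits) - 1
    spectrum[idx] = spectrum[idx] * (1 + alpha * chips)
    # Back to the time domain; the output stays real-valued.
    return np.fft.irfft(spectrum, n=len(audio)), idx
```

Because only a few high-energy bins are scaled by a small factor, the distortion stays far below the host signal's energy, which is how an SNR target like the paper's 20 dB can be met.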
37

Fejfar, Jiří, Jiří Šťastný, Martin Pokorný, Jiří Balej, and Petr Zach. "Analysis of sound data streamed over the network." Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis 61, no. 7 (2013): 2105–10. http://dx.doi.org/10.11118/actaun201361072105.

Full text
Abstract:
In this paper we inspect the difference between an original sound recording and the signal captured after streaming that recording over a network loaded with heavy traffic. Several kinds of failures occur in the captured recording owing to network congestion. We try to find a method to evaluate the correctness of the streamed audio. The usual metrics are based on human perception of the signal, such as “signal is clear, without audible failures”, “signal has some failures but is understandable”, or “signal is inarticulate”. These approaches need to be statistically evaluated over a broad set of respondents, which is time- and resource-consuming. Instead, we propose metrics based on signal properties that allow us to compare the original and captured recordings. In this paper we use the Dynamic Time Warping algorithm (Müller, 2007), commonly used for time-series comparison. Other time-series exploration approaches can be found in (Fejfar, 2011) and (Fejfar, 2012). The data was acquired in our network laboratory, simulating network traffic by downloading files and streaming audio and video simultaneously. Our former experiment inspected Quality of Service (QoS) and its impact on failures in the received audio data stream; this experiment focuses on the comparison of sound recordings rather than network mechanisms. We focus on real-time audio streams such as telephone calls, where it is not possible to stream audio in advance to a “pool”; instead it is necessary to achieve as small a delay as possible between the recording of the speaker's voice and its replay to the listener. We use the RTP protocol for streaming audio.
APA, Harvard, Vancouver, ISO, and other styles
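The Dynamic Time Warping comparison the authors rely on (Müller, 2007) reduces, in its textbook form, to a simple dynamic program over the two sequences. A minimal sketch, assuming raw one-dimensional sequences rather than the frame-level audio features a real comparison would use:

```python
import numpy as np

def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) Dynamic Time Warping distance.

    D[i, j] holds the cheapest cumulative alignment cost of the
    first i elements of `a` with the first j elements of `b`.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the best of the three allowed predecessor moves.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

A captured stream that merely stretches or delays parts of the original aligns at low cost, while dropped or corrupted content raises the distance.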
38

Shen, Jiaxing, Jiannong Cao, Oren Lederman, Shaojie Tang, and Alex “Sandy” Pentland. "User Profiling Based on Nonlinguistic Audio Data." ACM Transactions on Information Systems 40, no. 1 (January 31, 2022): 1–23. http://dx.doi.org/10.1145/3474826.

Full text
Abstract:
User profiling refers to inferring people's attributes of interest (AoIs), such as gender and occupation, which enables applications ranging from personalized services to collective analyses. Massive nonlinguistic audio data brings a novel opportunity for user profiling due to the prevalence of studying spontaneous face-to-face communication. Nonlinguistic audio is coarse-grained audio data without linguistic content; it is collected due to privacy concerns in private situations such as doctor-patient dialogues. This opportunity facilitates optimized organizational management and personalized healthcare, especially for chronic diseases. In this article, we are the first to build a user profiling system that infers gender and personality from nonlinguistic audio. Since linguistic and acoustic features cannot be extracted from such data, we focus on conversational features that reflect the AoIs. We first develop an adaptive voice activity detection algorithm that addresses individual differences in voice and false-positive voice activities caused by people nearby. Second, we propose a gender-assisted multi-task learning method to combat dynamics in human behavior by integrating gender differences and the correlation of personality traits. In an experimental evaluation of 100 people in 273 meetings, we achieved F1-scores of 0.759 for gender identification and 0.652 for personality recognition.
APA, Harvard, Vancouver, ISO, and other styles
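The adaptive voice activity detection step can be illustrated with a simple energy-threshold sketch. The percentile-based noise-floor estimate and the `margin` factor are assumptions here, not the paper's algorithm, which additionally handles individual voice differences and nearby speakers:

```python
import numpy as np

def adaptive_vad(frame_energies, percentile=30, margin=2.0):
    """Flag frames as voiced using a per-recording adaptive threshold.

    The noise floor is estimated from a low percentile of the frame
    energies, so the threshold adapts to each speaker's recording
    rather than relying on one fixed global value.
    """
    noise_floor = np.percentile(frame_energies, percentile)
    return frame_energies > noise_floor * margin
```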
39

Chen, Ke, Xingjian Du, Bilei Zhu, Zejun Ma, Taylor Berg-Kirkpatrick, and Shlomo Dubnov. "Zero-Shot Audio Source Separation through Query-Based Learning from Weakly-Labeled Data." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 4 (June 28, 2022): 4441–49. http://dx.doi.org/10.1609/aaai.v36i4.20366.

Full text
Abstract:
Deep learning techniques for separating audio into different sound sources face several challenges. Standard architectures require training separate models for different types of audio sources. Although some universal separators employ a single model to target multiple sources, they have difficulty generalizing to unseen sources. In this paper, we propose a three-component pipeline to train a universal audio source separator from a large, but weakly-labeled dataset: AudioSet. First, we propose a transformer-based sound event detection system for processing weakly-labeled training data. Second, we devise a query-based audio separation model that leverages this data for model training. Third, we design a latent embedding processor to encode queries that specify audio targets for separation, allowing for zero-shot generalization. Our approach uses a single model for source separation of multiple sound types, and relies solely on weakly-labeled data for training. In addition, the proposed audio separator can be used in a zero-shot setting, learning to separate types of audio sources that were never seen in training. To evaluate the separation performance, we test our model on MUSDB18, while training on the disjoint AudioSet. We further verify the zero-shot performance by conducting another experiment on audio source types that are held-out from training. The model achieves comparable Source-to-Distortion Ratio (SDR) performance to current supervised models in both cases.
APA, Harvard, Vancouver, ISO, and other styles
40

Widyastuti, Nadia. "PENERAPAN MEDIA AUDIO VISUAL DALAM PEMBELAJARAN BAHASA INGGRIS KELAS VII DI SMPN 1 SYAMTALIRA BAYU ACEH UTARA." Hudan Lin Naas: Jurnal Ilmu Sosial dan Humaniora 3, no. 2 (December 14, 2022): 59. http://dx.doi.org/10.28944/hudanlinnaas.v3i2.690.

Full text
Abstract:
This study discusses the application of audio-visual media in English-language instruction, motivated by the continuing advances in educational practice, one of which is the use of audio-visual media as a learning medium: English-language animated videos, short films, and music with English lyrics displayed as text. The study aims to describe the application of audio-visual media in seventh-grade English instruction and to identify the supporting and inhibiting factors in applying audio-visual media to learning. The study uses a descriptive qualitative approach; its subjects are seventh-grade teachers. Data were collected through online interviews conducted via WhatsApp. The results show that English instruction using audio-visual media proceeded according to the lesson plans (RPP) prepared by the teachers, and that audio-visual media improved learning outcomes, with scores rising after its use compared with before; this makes audio-visual media an appropriate medium for instruction.
APA, Harvard, Vancouver, ISO, and other styles
41

MATSUO, Yuichi, and Kazuyo SUEMATSU. "Audio-Visual Technique of Numerical Simulation Data." Journal of the Visualization Society of Japan 20, no. 78 (2000): 197–202. http://dx.doi.org/10.3154/jvs.20.197.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

AliSabir, Firas. "Hiding Encrypted Data in Audio Wave File." International Journal of Computer Applications 91, no. 4 (April 18, 2014): 6–9. http://dx.doi.org/10.5120/15867-4809.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Mudusu, Rambabu, A. Nagesh, and M. Sadanandam. "Enhancing Data Security Using Audio-Video Steganography." International Journal of Engineering & Technology 7, no. 2.20 (April 18, 2018): 276. http://dx.doi.org/10.14419/ijet.v7i2.20.14777.

Full text
Abstract:
Steganography is a technique for hiding secret information such as text, images, or audio behind a cover file. In this paper we propose a combination of image steganography and audio steganography, with face-recognition technology as a tool for authentication. The aim is to hide the secret data behind the audio track and the receiver's face image within a video, since a video is a sequence of still image frames plus audio. In this technique we select a frame of the video in which to hide the receiver's face image, and use the audio to hide the secret data. Suitable algorithms, namely enhanced LSB and RSA, are used to hide the secret text and picture, and the PCA algorithm is used for face recognition. The security and authentication parameters obtained at the receiver and transmitter sides are exactly identical, and hence data security is increased.
APA, Harvard, Vancouver, ISO, and other styles
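The LSB half of the scheme can be illustrated with a plain least-significant-bit embed/extract pair. This is the generic textbook variant, not the paper's "enhanced LSB", and the RSA pre-encryption and PCA face-recognition steps are omitted:

```python
def lsb_embed(samples, message):
    """Hide message bytes in the least significant bit of each sample."""
    bits = [(byte >> i) & 1 for byte in message for i in range(8)]
    if len(bits) > len(samples):
        raise ValueError("cover signal too short for message")
    stego = list(samples)
    for k, bit in enumerate(bits):
        # Clear the sample's LSB, then set it to the message bit.
        stego[k] = (stego[k] & ~1) | bit
    return stego

def lsb_extract(stego, n_bytes):
    """Recover n_bytes previously hidden by lsb_embed."""
    out = bytearray()
    for b in range(n_bytes):
        byte = 0
        for i in range(8):
            byte |= (stego[b * 8 + i] & 1) << i
        out.append(byte)
    return bytes(out)
```

Each carrier sample changes by at most 1, which is why the embedding is imperceptible in audio or image data.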
44

H. Kridalaksana, Awang, Andi Yushika Rangan, and Asfami Ansharie. "ENKRIPSI DATA AUDIO MENGGUNAKAN METODE KRIPTOGRAFI RSA." Sebatik 17, no. 1 (January 1, 2017): 6–10. http://dx.doi.org/10.46984/sebatik.v17i1.79.

Full text
Abstract:
The application of the RSA method to audio data encryption is a study demonstrating that cryptographic methods can be used to solve problems of data confidentiality. The goal of this research is to design and build an application that solves the data-encryption problem of keeping data secret using two keys: a public key for the encryption process and a private key for the decryption process, implemented in the Visual Basic .NET programming language. In this research, data were collected through a literature study. Testing used the White-Box method to test the encryption and decryption code, and the Black-Box method to test whether the application runs with the correct key algorithm and whether the encrypted output resists encryption by other cryptographic methods. The prototype development stages of interface design, implementation, and system testing were followed so that the RSA audio-encryption application was built in a structured way. The application can serve as an alternative medium for data security.
APA, Harvard, Vancouver, ISO, and other styles
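The two-key workflow the abstract describes (a public key to encrypt, a private key to decrypt) can be sketched with textbook RSA applied per sample. The tiny primes are for illustration only; any real use would need large keys and padding:

```python
def rsa_keypair():
    """Textbook RSA key pair from tiny fixed primes (illustration only)."""
    p, q = 61, 53
    n, phi = p * q, (p - 1) * (q - 1)
    e = 17                  # public exponent, coprime with phi
    d = pow(e, -1, phi)     # private exponent via modular inverse (Python 3.8+)
    return (e, n), (d, n)

def rsa_encrypt_audio(samples, public_key):
    """Encrypt each sample value with the public key."""
    e, n = public_key
    return [pow(m, e, n) for m in samples]

def rsa_decrypt_audio(cipher, private_key):
    """Recover the original samples with the private key."""
    d, n = private_key
    return [pow(c, d, n) for c in cipher]
```

Anyone holding the public key can encrypt, but only the private-key holder can invert the exponentiation, matching the scheme's separation of roles.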
45

Toyama, Akira. "Apparatus for reproducing digital audio waveform data." Journal of the Acoustical Society of America 103, no. 1 (January 1998): 17. http://dx.doi.org/10.1121/1.423136.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Levine, Scott N. "A malleable audio representation for data compression." Journal of the Acoustical Society of America 107, no. 5 (May 2000): 2875. http://dx.doi.org/10.1121/1.428679.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Warner, Paul. "System for transmitting data simultaneously with audio." Journal of the Acoustical Society of America 81, no. 1 (January 1987): 212. http://dx.doi.org/10.1121/1.394926.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Colasito, Marco, Jeremy Straub, and Pratap Kotala. "Correlated lip motion and voice audio data." Data in Brief 21 (December 2018): 856–60. http://dx.doi.org/10.1016/j.dib.2018.10.043.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Sophiya, E., and S. Jothilakshmi. "Large scale data based audio scene classification." International Journal of Speech Technology 21, no. 4 (September 4, 2018): 825–36. http://dx.doi.org/10.1007/s10772-018-9552-3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Alhassan, Salamudeen, Mohammed Muniru Iddrisu, and Mohammed Ibrahim Daabo. "Securing audio data using K-shuffle technique." Multimedia Tools and Applications 78, no. 23 (October 10, 2019): 33985–97. http://dx.doi.org/10.1007/s11042-019-08151-6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
