Journal articles on the topic "ASVspoof"

To view other publication types on this topic, follow the link: ASVspoof.


Consult the top 42 journal articles for your research on the topic "ASVspoof".

Next to each work in the list there is an "Add to bibliography" button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the publication in .pdf format and read its abstract online, if these are available in the metadata.

Browse journal articles from a wide range of disciplines and compile your bibliography correctly.

1

Nafees, Muhammad, Abid Rauf, and Rabbia Mahum. "Automatic Spoofing Detection Using Deep Learning." Global Social Sciences Review IX, no. I (March 30, 2024): 111–333. http://dx.doi.org/10.31703/gssr.2024(ix-i).11.

Abstract:
Deepfakes are among the most dangerous side effects of artificial intelligence: AI can clone the voice of any person so convincingly that it is very hard to tell whether a recording is fake or real. The aim of this research is to equip an automatic speaker verification (ASV) system with a spoofing detection component that recognizes false voices efficiently. The goal is to detect subtle audio artefacts with maximum precision and to develop a model that automatically extracts audio features, using the ASVspoof 2019 dataset. To this end, the proposed ML-DL SafetyNet model classifies ASVspoof 2019 utterances as fake or bona fide. The ASVspoof 2019 dataset comprises two partitions, LA and PA. The ML-DL SafetyNet model rests on two complementary pipelines: deep learning and machine learning classifiers. Both techniques performed strongly, achieving an accuracy of 90%.
2

Zhang, Jiachen, Guoqing Tu, Shubo Liu, and Zhaohui Cai. "Audio Anti-Spoofing Based on Audio Feature Fusion." Algorithms 16, no. 7 (June 28, 2023): 317. http://dx.doi.org/10.3390/a16070317.

Abstract:
The rapid development of speech synthesis technology has significantly improved the naturalness and human-likeness of synthetic speech. As the technical barriers for speech synthesis are rapidly lowering, the number of illegal activities such as fraud and extortion is increasing, posing a significant threat to authentication systems, such as automatic speaker verification. This paper proposes an end-to-end speech synthesis detection model based on audio feature fusion in response to the constantly evolving synthesis techniques and to improve the accuracy of detecting synthetic speech. The model uses a pre-trained wav2vec2 model to extract features from raw waveforms and utilizes an audio feature fusion module for back-end classification. The audio feature fusion module aims to improve model accuracy by adequately utilizing the audio features extracted from the front end and fusing the information from timeframes and feature dimensions. Data augmentation techniques are also used to enhance the generalization performance of the model. The model is trained on the training and development sets of the logical access (LA) dataset of the ASVspoof 2019 Challenge, an international standard, and is tested on the logical access (LA) and deep-fake (DF) evaluation datasets of the ASVspoof 2021 Challenge. The equal error rates (EER) on ASVspoof 2021 LA and ASVspoof 2021 DF are 1.18% and 2.62%, respectively, achieving the best results on the DF dataset.
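The fusion of information across timeframes and feature dimensions described in this abstract can be illustrated with a toy sketch; this is not the paper's actual module, and the mean-pooling choice is an illustrative assumption:

```python
import numpy as np

def fuse_time_and_feature(features):
    """Toy two-view fusion of a (time, dim) feature map: pool over the
    time axis and over the feature axis, then concatenate both views.
    The real model fuses learned representations; mean-pooling here is
    only a stand-in."""
    time_view = features.mean(axis=0)   # (dim,) summary across frames
    feat_view = features.mean(axis=1)   # (time,) summary across dimensions
    return np.concatenate([time_view, feat_view])
```

A (50, 8) feature map would yield a 58-dimensional fused vector, one entry per feature dimension plus one per frame.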
3

Faham Ali Zaidi, Syed, and Longting Xu. "Implementation of Multiple Feature Selection Algorithms for Speech Spoofing Detection." Journal of Physics: Conference Series 2224, no. 1 (April 1, 2022): 012119. http://dx.doi.org/10.1088/1742-6596/2224/1/012119.

Abstract:
The ASVspoof challenge series was proposed to take anti-spoofing research for automatic speaker verification (ASV) to a new level. It has been verified that constant Q cepstral coefficients (CQCC) process speech at variable frequencies with adjustable resolution and outperform other commonly used features, while linear frequency cepstral coefficients (LFCC) are effective in high-frequency regions. Feature selection algorithms are introduced to reduce computational complexity and overfitting in spoofed-utterance detection. Specifically, there is a demand for feature selection algorithms that are computationally efficient and sensitive to feature interactions, so that useful features are not falsely excluded during the ranking process. We experiment on the ASVspoof 2019 challenge data to assess spoofing countermeasures, evaluating the proposed algorithms in terms of the equal error rate (EER) and the tandem detection cost function (t-DCF). Experimental results on ASVspoof 2019 physical access with multiple feature selection approaches show a clear improvement over the baseline.
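The equal error rate used throughout these papers can be computed from countermeasure scores in a few lines of NumPy; this is a generic sketch (the score convention and the averaging at the crossing point are assumptions, not the challenge's official scoring code):

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """Equal error rate: the operating point where the false rejection
    rate (bonafide scored below threshold) equals the false acceptance
    rate (spoof scored at or above it). Higher score = more bonafide."""
    scores = np.concatenate([np.asarray(bonafide_scores, float),
                             np.asarray(spoof_scores, float)])
    labels = np.concatenate([np.ones(len(bonafide_scores)),
                             np.zeros(len(spoof_scores))])
    order = np.argsort(scores)          # sweep thresholds in ascending order
    labels = labels[order]
    frr = np.cumsum(labels) / labels.sum()                  # bonafide rejected so far
    far = 1.0 - np.cumsum(1 - labels) / (1 - labels).sum()  # spoof still accepted
    idx = np.argmin(np.abs(frr - far))
    return (frr[idx] + far[idx]) / 2.0
```

Perfectly separated scores give an EER of 0; fully interleaved scores give an EER of 0.5.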
4

Yang, Jichen, Qianhua He, Yongjian Hu, and Weiqiang Pan. "CBC-Based Synthetic Speech Detection." International Journal of Digital Crime and Forensics 11, no. 2 (April 2019): 63–74. http://dx.doi.org/10.4018/ijdcf.2019040105.

Abstract:
In previous studies of synthetic speech detection (SSD), the most widely used features are based on the linear power spectrum. Unlike conventional methods, this article proposes a new feature extraction method for SSD from the octave power spectrum obtained via the constant-Q transform (CQT). By combining CQT, a block transform (BT) and the discrete cosine transform (DCT), a new feature is obtained: constant-Q block coefficients (CBC). Here, CQT transforms speech from the time domain into the frequency domain, BT segments the octave power spectrum into blocks, and DCT extracts the principal information of each block. Experimental results on the ASVspoof 2015 corpus show that CBC is superior, in terms of equal error rate (EER), to other front-end features benchmarked on the ASVspoof 2015 evaluation set.
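The BT and DCT stages can be sketched as follows; the CQT octave power spectrum is assumed to be given, and the block size and coefficient count are hypothetical parameters, not the paper's settings:

```python
import numpy as np
from scipy.fft import dct

def cbc_like_features(octave_power_spec, bins_per_block=12, n_coeffs=6):
    """Block transform + DCT over a CQT octave power spectrum, in the
    spirit of CBC: split the frequency bins into blocks and keep the
    leading DCT coefficients of each block."""
    log_spec = np.log(octave_power_spec + 1e-10)  # compress dynamic range
    n_bins, n_frames = log_spec.shape
    n_blocks = n_bins // bins_per_block
    feats = []
    for b in range(n_blocks):
        block = log_spec[b * bins_per_block:(b + 1) * bins_per_block]
        # DCT along frequency within the block; keep principal coefficients
        feats.append(dct(block, type=2, axis=0, norm='ortho')[:n_coeffs])
    return np.vstack(feats)  # shape: (n_blocks * n_coeffs, n_frames)
```

With 84 CQT bins, 12 bins per block, and 6 coefficients kept, each frame is reduced to 42 coefficients.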
5

Wu, Zhizheng, Junichi Yamagishi, Tomi Kinnunen, Cemal Hanilci, Mohammed Sahidullah, Aleksandr Sizov, Nicholas Evans, Massimiliano Todisco, and Hector Delgado. "ASVspoof: The Automatic Speaker Verification Spoofing and Countermeasures Challenge." IEEE Journal of Selected Topics in Signal Processing 11, no. 4 (June 2017): 588–604. http://dx.doi.org/10.1109/jstsp.2017.2671435.

6

Phapatanaburi, Khomdet, Prawit Buayai, Watcharaphon Naktong, and Jakkree Srinonchat. "Exploiting Magnitude and Phase Aware Deep Neural Network for Replay Attack Detection." ECTI Transactions on Electrical Engineering, Electronics, and Communications 18, no. 2 (August 31, 2020): 89–97. http://dx.doi.org/10.37936/ecti-eec.2020182.240341.

Abstract:
Magnitude and phase aware deep neural networks (MP aware DNN), based on Fast Fourier Transform information, have recently received increasing attention in many speech applications. However, little attention has been paid to replay attack detection as developed for the automatic speaker verification spoofing and countermeasures challenge (ASVspoof 2017). This paper investigates the MP aware DNN as a speech classifier for distinguishing non-replayed (genuine) from replayed speech. In addition, to exploit classifier complementarity for a more reliable detection decision, we propose a novel method that combines the MP aware DNN with a standard replay attack detector (constant Q transform cepstral coefficients with Gaussian mixture model classification: CQCC-based GMM). Experiments are evaluated on ASVspoof 2017 using the standard detection performance measure, the equal error rate (EER). The results show that MP aware DNN based detection outperformed the conventional DNN method using only magnitude or phase features. Moreover, score combination of the CQCC-based GMM with the MP aware DNN achieved a further improvement, indicating that the MP aware DNN is very useful, especially when combined with the CQCC-based GMM, for replay attack detection.
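Score-level combination of two detectors, as described above, often amounts to normalising each system's scores and mixing them; a minimal sketch (the fusion weight and z-normalisation are assumptions, not the paper's exact recipe):

```python
import numpy as np

def fuse_scores(gmm_scores, dnn_scores, w=0.5):
    """Linear score-level fusion of two countermeasure systems after
    per-system z-normalisation; w is a hypothetical mixing weight that
    would normally be tuned on a development set."""
    def znorm(s):
        s = np.asarray(s, dtype=float)
        return (s - s.mean()) / (s.std() + 1e-10)
    return w * znorm(gmm_scores) + (1.0 - w) * znorm(dnn_scores)
```

Normalising first keeps one system's score scale from dominating the fused decision.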
7

Tan, Choon Beng, Mohd Hanafi Ahmad Hijazi, Frazier Kok, Mohd Saberi Mohamad, and Puteri Nor Ellyza Nohuddin. "Artificial speech detection using image-based features and random forest classifier." IAES International Journal of Artificial Intelligence (IJ-AI) 11, no. 1 (March 1, 2022): 161. http://dx.doi.org/10.11591/ijai.v11.i1.pp161-172.

Abstract:
The ASVspoof 2015 Challenge was one of the efforts of the research community in the field of speech processing to foster the development of generalized countermeasures against spoofing attacks. However, most countermeasures submitted to the ASVspoof 2015 Challenge failed to detect the S10 attack effectively, the only attack that was generated using the waveform concatenation approach. Hence, more informative features are needed to detect previously unseen spoofing attacks. This paper presents an approach that uses data transformation techniques to engineer image-based features, together with a random forest classifier, to detect artificial speech. The objectives are two-fold: (i) to extract image-based features from the mel-frequency cepstral coefficient representation of the speech signal, and (ii) to compare the performance of the extracted features and random forest for determining the authenticity of voices against existing approaches. An audio-to-image transformation technique was used to engineer new features for classifying genuine and spoofed voices. An experiment was conducted to find the appropriate combination of engineered features and classifier. Experimental results showed that the proposed approach was able to detect speech synthesis and voice conversion attacks effectively, with an equal error rate of 0.10% and an accuracy of 99.93%.
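The final classification stage, engineered features fed to a random forest, can be sketched with scikit-learn; the random feature vectors below merely stand in for the paper's image-based features:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Stand-in data: in the paper the inputs are image-based features derived
# from an MFCC representation; these random clusters only illustrate the
# genuine-vs-spoof decision stage.
X_genuine = rng.normal(0.0, 1.0, size=(100, 20))
X_spoof = rng.normal(2.0, 1.0, size=(100, 20))
X = np.vstack([X_genuine, X_spoof])
y = np.array([0] * 100 + [1] * 100)   # 0 = genuine, 1 = spoof

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
train_acc = clf.score(X, y)
```

On such well-separated clusters the forest fits the training set almost perfectly; real gains come from the feature engineering, not the classifier alone.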
8

Hu, Chenlei, Ruohua Zhou, and Qingsheng Yuan. "Replay Speech Detection Based on Dual-Input Hierarchical Fusion Network." Applied Sciences 13, no. 9 (April 25, 2023): 5350. http://dx.doi.org/10.3390/app13095350.

Abstract:
Speech anti-spoofing is a crucial aspect of speaker recognition systems and has received a great deal of attention in recent years. Deep neural networks achieve satisfactory results on datasets whose training and testing data follow similar distributions, but their generalization ability is limited on datasets with different distributions. In this paper, we propose a novel dual-input hierarchical fusion network (HFN) to improve the generalization ability of our model. The network has two inputs (the original speech signal and the time-reversed signal), which increases the volume and diversity of the training data. The hierarchical fusion model (HFM) enables a more thorough fusion of information from different input levels and improves model performance by fusing the two inputs after speech feature extraction. We evaluated the results on the ASVspoof 2021 PA (Physical Access) dataset, where the proposed system achieved an Equal Error Rate (EER) of 24.46% and a minimum tandem Detection Cost Function (min t-DCF) of 0.6708 on the test set. Compared with the four baseline systems in the ASVspoof 2021 competition, the proposed system's min t-DCF values were lower by 28.9%, 31.0%, 32.6%, and 32.9%, and its EERs were lower by 35.7%, 38.1%, 45.4%, and 49.7%, respectively.
9

Adiban, Mohammad, Hossein Sameti, and Saeedreza Shehnepoor. "Replay spoofing countermeasure using autoencoder and siamese networks on ASVspoof 2019 challenge." Computer Speech & Language 64 (November 2020): 101105. http://dx.doi.org/10.1016/j.csl.2020.101105.

10

Nautsch, Andreas, Xin Wang, Nicholas Evans, Tomi H. Kinnunen, Ville Vestman, Massimiliano Todisco, Hector Delgado, Md Sahidullah, Junichi Yamagishi, and Kong Aik Lee. "ASVspoof 2019: Spoofing Countermeasures for the Detection of Synthesized, Converted and Replayed Speech." IEEE Transactions on Biometrics, Behavior, and Identity Science 3, no. 2 (April 2021): 252–65. http://dx.doi.org/10.1109/tbiom.2021.3059479.

11

Wang, Xin, Junichi Yamagishi, Massimiliano Todisco, Héctor Delgado, Andreas Nautsch, Nicholas Evans, Md Sahidullah, et al. "ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech." Computer Speech & Language 64 (November 2020): 101114. http://dx.doi.org/10.1016/j.csl.2020.101114.

12

Chettri, Bhusan, Emmanouil Benetos, and Bob L. T. Sturm. "Dataset Artefacts in Anti-Spoofing Systems: A Case Study on the ASVspoof 2017 Benchmark." IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2020): 3018–28. http://dx.doi.org/10.1109/taslp.2020.3036777.

13

Hernández-Nava, Carlos Alberto, Eric Alfredo Rincón-García, Pedro Lara-Velázquez, Sergio Gerardo de-los-Cobos-Silva, Miguel Angel Gutiérrez-Andrade, and Roman Anselmo Mora-Gutiérrez. "Voice spoofing detection using a neural networks assembly considering spectrograms and mel frequency cepstral coefficients." PeerJ Computer Science 9 (December 18, 2023): e1740. http://dx.doi.org/10.7717/peerj-cs.1740.

Abstract:
Nowadays, biometric authentication has gained relevance due to technological advances that have allowed its inclusion in many daily-use devices. However, this same advantage has also brought dangers, as spoofing attacks are now more common. This work addresses the vulnerabilities of automatic speaker verification systems, which are prone to attacks based on new techniques for generating spoofed audio. In this article, we present a countermeasure for these attacks that combines easy-to-implement feature extractors, such as spectrograms and mel frequency cepstral coefficients, with a modular architecture based on deep neural networks. Finally, we evaluate our proposal on the well-known ASVspoof 2017 V2 database; the experiments show that the final architecture obtains the best performance, achieving an equal error rate of 6.66% on the evaluation set.
14

Altuwayjiri, Sarah Mohammed, Ouiem Bchir, and Mohamed Maher Ben Ismail. "Generalized Replay Spoofing Countermeasure Based on Combining Local Subclassification Models." Applied Sciences 12, no. 22 (November 18, 2022): 11742. http://dx.doi.org/10.3390/app122211742.

Abstract:
Automatic speaker verification (ASV) systems play a prominent role in the security field due to the usability of voice biometrics compared to alternative biometric authentication modalities. Nevertheless, ASV systems are susceptible to malicious voice spoofing attacks. In response to such threats, countermeasures have been devised to prevent breaches and ensure the safety of user data by categorizing utterances as either genuine or spoofed. In this paper, we propose a new voice spoofing countermeasure that seeks to improve the generalization of supervised learning models. This is accomplished by alleviating the problem of intraclass variance. Specifically, the proposed approach addresses the generalization challenge by splitting the classification problem into a set of local subproblems in order to simplify the supervised learning task. The system outperformed existing state-of-the-art approaches with an EER of 0.097% on the ASVspoof challenge corpora related to replay spoofing attacks.
15

Aydın, Barış, and Gökay Dişken. "INCREASING ROBUSTNESS OF I-VECTORS VIA MASKING: A CASE STUDY IN SYNTHETIC SPEECH DETECTION." Uludağ University Journal of The Faculty of Engineering 29, no. 1 (March 28, 2024): 191–204. http://dx.doi.org/10.17482/uumfd.1311113.

Abstract:
Ensuring security in speaker recognition systems is crucial. In past years, it has been demonstrated that spoofing attacks can fool these systems. To deal with this issue, spoofed speech detection systems have been developed. While these systems perform well, their effectiveness tends to degrade under noise. Traditional speech enhancement methods are not effective at improving performance; they can even make it worse. In this paper, the performance of a noise mask obtained via a convolutional neural network for reducing noise effects was investigated. The mask is used to suppress noisy regions of spectrograms in order to extract robust i-vectors. The proposed system was tested on the ASVspoof 2015 database with three different noise types and achieved superior performance compared to traditional systems. However, there is a loss of performance on noise types not encountered during the training phase.
16

Kang, Yeajun, Wonwoong Kim, Sejin Lim, Hyunji Kim, and Hwajeong Seo. "DeepDetection: Privacy-Enhanced Deep Voice Detection and User Authentication for Preventing Voice Phishing." Applied Sciences 12, no. 21 (November 2, 2022): 11109. http://dx.doi.org/10.3390/app122111109.

Abstract:
The deep voice detection technology currently being researched risks personal information leakage because the input voice data are stored on the detection server. To overcome this problem, in this paper we propose a novel system (DeepDetection) that can detect deep voices and authenticate users without exposing voice data to the server. Voice phishing prevention is achieved in a two-way approach: primary verification through deep voice detection, and secondary verification through user authentication of whether the sender is the correct sender. Since voice preprocessing is performed on the user's local device, voice data are not stored on the detection server. Thus, we can overcome the security vulnerabilities of existing detection research. We used ASVspoof 2019 and achieved an F1-score of 100% in deep voice detection and an F1-score of 99.05% in user authentication. Additionally, the average EER achieved for user authentication was 0.15. Therefore, this work can be used effectively to prevent deep voice-based phishing.
17

Jiang, Yi, and Dengpan Ye. "Black-Box Adversarial Attacks against Audio Forensics Models." Security and Communication Networks 2022 (January 17, 2022): 1–8. http://dx.doi.org/10.1155/2022/6410478.

Abstract:
Speech synthesis technology has made great progress in recent years and is widely used in the Internet of Things, but it also brings the risk of being abused by criminals. Therefore, a series of studies on audio forensics models has arisen to reduce or eliminate these negative effects. In this paper, we propose a black-box adversarial attack method that relies only on the output scores of audio forensics models. To improve the transferability of adversarial attacks, we utilize the ensemble-model method. A defense method is also designed against our proposed attack method, in view of the huge threat adversarial examples pose to audio forensics models. Our experimental results on four forensics models trained on the LA part of the ASVspoof 2019 dataset show that our attacks can achieve a 99% attack success rate on score-only black-box models, which is competitive with the best white-box attacks, and a 60% attack success rate on decision-only black-box models. Finally, our defense method reduces the attack success rate to 16% while guaranteeing 98% detection accuracy of the forensics models.
18

Li, Jiakang, Xiongwei Zhang, Meng Sun, Xia Zou, and Changyan Zheng. "Attention-Based LSTM Algorithm for Audio Replay Detection in Noisy Environments." Applied Sciences 9, no. 8 (April 13, 2019): 1539. http://dx.doi.org/10.3390/app9081539.

Abstract:
Even though audio replay detection has improved in recent years, its performance is known to severely deteriorate with the existence of strong background noises. Given the fact that different frames of an utterance have different impacts on the performance of spoofing detection, this paper introduces attention-based long short-term memory (LSTM) to extract representative frames for spoofing detection in noisy environments. With this attention mechanism, the specific and representative frame-level features will be automatically selected by adjusting their weights in the framework of attention-based LSTM. The experiments, conducted using the ASVspoof 2017 dataset version 2.0, show that the equal error rate (EER) of the proposed approach was about 13% lower than the constant Q cepstral coefficients-Gaussian mixture model (CQCC-GMM) baseline in noisy environments with four different signal-to-noise ratios (SNR). Meanwhile, the proposed algorithm also improved the performance of traditional LSTM on audio replay detection systems in noisy environments. Experiments using bagging with different frame lengths were also conducted to further improve the proposed approach.
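The attention mechanism described above, which weights frames by their relevance before pooling, reduces to a softmax-weighted sum; a minimal NumPy sketch (the scoring vector w stands in for the learned attention parameters of the LSTM model):

```python
import numpy as np

def attention_pool(frame_feats, w):
    """Soft attention over frames: score each frame against a vector w,
    softmax the scores into weights, and return the weighted sum as an
    utterance-level representation."""
    scores = frame_feats @ w                 # (T,) one score per frame
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ frame_feats             # (D,) pooled feature
```

Frames with higher scores dominate the pooled representation, which is how representative frames are emphasized in noisy utterances.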
19

Zhang, Jinghong, Xiaowei Yi, and Xianfeng Zhao. "One-Class Fake Speech Detection Based on Improved Support Vector Data Description." Security and Communication Networks 2023 (October 4, 2023): 1–10. http://dx.doi.org/10.1155/2023/8830894.

Abstract:
With the development of deep neural synthesis methods, speech forgery techniques based on text-to-speech (TTS) and voice conversion (VC) pose a serious threat to automatic speaker verification (ASV) systems. Some studies show that the attack success rate of deep synthetic speech against ASV systems can reach about 90%. Existing detection methods improve generalization to known forgery methods through large amounts of training data, but their effectiveness and robustness against unknown methods are poor. We propose an anti-spoofing scheme based on one-class classification for detecting unknown synthetic speech. We implement deep support vector data description to capture the features of bona fide speech, and an autoencoder structure is introduced to enhance detection performance. The proposed method is trained only on genuine speech, which reduces reliance on large amounts of fake speech. Our method achieves an equal error rate of 8.10% on the evaluation set of the ASVspoof 2019 challenge and outperforms other state-of-the-art methods. In the generalization test, the proposed method reaches an equal error rate of 15% on the "In-the-wild" dataset and 23% on the FoR dataset, lower than that of other advanced algorithms.
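The one-class idea, fitting a boundary around bona fide speech only and flagging everything outside it, can be illustrated with a classical one-class SVM; the paper uses deep support vector data description, so OneClassSVM and the synthetic embeddings below are stand-ins:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
# Hypothetical embeddings: bona fide speech clusters near the origin,
# synthetic speech drifts away. The detector is fit on bona fide only.
bonafide = rng.normal(0.0, 1.0, size=(200, 8))
spoof = rng.normal(4.0, 1.0, size=(50, 8))

ocsvm = OneClassSVM(kernel='rbf', nu=0.05, gamma='scale').fit(bonafide)
pred = ocsvm.predict(spoof)            # -1 = outlier, i.e. flagged as spoof
spoof_detect_rate = float(np.mean(pred == -1))
```

Because training never sees fake speech, the detector generalizes to synthesis methods that did not exist at training time, which is the scheme's main appeal.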
20

Guo, Jinlin, Yancheng Zhao, and Haoran Wang. "Generalized Spoof Detection and Incremental Algorithm Recognition for Voice Spoofing." Applied Sciences 13, no. 13 (June 30, 2023): 7773. http://dx.doi.org/10.3390/app13137773.

Abstract:
Highly deceptive deepfake technologies have caused much controversy; for example, artificial intelligence-based software can automatically generate nude photos and deepfake images of anyone. This brings considerable threats to both individuals and society. In addition to video and image forgery, audio forgery poses many hazards but lacks sufficient attention. Furthermore, existing works have focused only on voice spoof detection, neglecting the identification of spoofing algorithms. Recognizing the algorithm used to synthesize a spoofed voice is of great value for traceability. This study presents a system combining voice spoof detection and algorithm recognition. The generalizability of the spoof detection model is discussed from the perspective of embedding space and decision boundaries, in order to handle voice spoofing attacks generated by algorithms not present in the training set. This study also presents a method for voice spoofing algorithm recognition based on incremental learning, accounting for data-stream scenarios in which new spoofing algorithms keep appearing in practice. Our experimental results on the LA dataset of ASVspoof show that our system can improve the generalization of spoof detection and identify new voice spoofing algorithms without catastrophic forgetting.
21

Yanagi, Yuta, Ryohei Orihara, Yasuyuki Tahara, Yuichi Sei, Tanel Alumäe, and Akihiko Ohsuga. "The Proposal of Countermeasures for DeepFake Voices on Social Media Considering Waveform and Text Embedding." Annals of Emerging Technologies in Computing 8, no. 2 (April 1, 2024): 15–31. http://dx.doi.org/10.33166/aetic.2024.02.002.

Abstract:
In recent times, advancements in text-to-speech technologies have yielded more natural-sounding voices. However, this has also made it easier to generate malicious fake voices and disseminate false narratives. ASVspoof stands out as a prominent benchmark in the ongoing effort to automatically detect fake voices, thereby playing a crucial role in countering illicit access to biometric systems. Consequently, there is a growing need to broaden our perspectives, particularly when it comes to detecting fake voices on social media platforms. Moreover, existing detection models commonly face challenges related to their generalization performance. This study sheds light on specific instances involving the latest speech generation models. Furthermore, we introduce a novel framework designed to address the nuances of detecting fake voices in the context of social media. This framework considers not only the voice waveform but also the speech content. Our experiments have demonstrated that the proposed framework considerably enhances classification performance, as evidenced by the reduction in equal error rate. This underscores the importance of considering the waveform and the content of the voice when tasked with identifying fake voices and disseminating false claims.
22

Tan, Choon Beng, and Mohd Hanafi Ahmad Hijazi. "A Comparative Evaluation on Data Transformation Approach for Artificial Speech Detection." ITM Web of Conferences 63 (2024): 01012. http://dx.doi.org/10.1051/itmconf/20246301012.

Abstract:
The rise of voice biometrics has transformed user authentication and offered enhanced security and convenience while phasing out less secure methods. Despite these advancements, Automatic Speaker Verification (ASV) systems remain vulnerable to spoofing, particularly with artificial speech generated swiftly using advanced speech synthesis and voice conversion algorithms. A recent data transformation technique achieved an impressive Equal Error Rate (EER) of 1.42% on the ASVspoof 2019 Logical Access Dataset. While this approach predominantly relies on Support Vector Machine (SVM) as the backend classifier for artificial speech detection, it is vital to explore a broader range of classifiers to enhance resilience. This paper addresses this research gap by systematically assessing classifier efficacy in artificial speech detection. The objectives are twofold: first, to evaluate various classifiers, not limited to SVM, and identify those best suited for artificial speech detection; second, to compare this approach's performance with existing methods. The evaluation demonstrated SVM-Polynomial as the top-performing classifier, surpassing the end-to-end learning approach. This work contributes to a deeper understanding of classifier efficacy and equips researchers and practitioners with a diversified toolkit for building robust ASV spoofing detection systems.
23

Gada, Amay, Neel Kothari, Ruhina Karani, Chetashri Badane, Dhruv Gada, and Tanish Patwa. "DR-SASV: A deep and reliable spoof aware speech verification system." International Journal on Information Technologies and Security 15, no. 4 (December 1, 2023): 93–106. http://dx.doi.org/10.59035/ffmb8272.

Abstract:
A spoof-aware speaker verification system is an integrated system that is capable of jointly identifying impostor speakers as well as spoofing attacks from target speakers. This type of system largely helps in protecting sensitive data, mitigating fraud, and reducing theft. Research has recently enhanced the effectiveness of countermeasure systems and automatic speaker verification systems separately to produce low Equal Error Rates (EER) for each system. However, work exploring a combination of both is still scarce. This paper proposes an end-to-end solution to address spoof-aware automatic speaker verification (ASV) by introducing a Deep Reliable Spoof-Aware-Speaker-Verification (DR-SASV) system. The proposed system allows the target audio to pass through a spoof-aware speaker verification model sequentially after applying a convolutional neural network (CNN)-based spoof detection model. The suggested system produces encouraging results after being trained on the ASVspoof 2019 LA dataset. The spoof detection model gives a validation accuracy of 96%, while the transformer-based speech verification model authenticates users with an error rate of 13.74%. The system surpasses other state-of-the-art models and produces an EER score of 10.32%.
24

Shim, Hye-jin, Jee-weon Jung, Ju-ho Kim, and Ha-jin Yu. "Integrated Replay Spoofing-Aware Text-Independent Speaker Verification." Applied Sciences 10, no. 18 (September 10, 2020): 6292. http://dx.doi.org/10.3390/app10186292.

Abstract:
A number of studies have successfully developed speaker verification or presentation attack detection systems. However, studies integrating the two tasks remain at a preliminary stage. In this paper, we propose two approaches for building an integrated system of speaker verification and presentation attack detection: an end-to-end monolithic approach and a back-end modular approach. The first approach simultaneously trains speaker identification, presentation attack detection, and the integrated system using multi-task learning on a common feature. However, through experiments, we hypothesize that the information required for speaker verification and presentation attack detection may differ, because speaker verification systems try to remove device-specific information from speaker embeddings, while presentation attack detection systems exploit such information. Therefore, we propose a back-end modular approach using a separate deep neural network (DNN) for speaker verification and presentation attack detection. This approach has three input components: two speaker embeddings (one each for enrollment and test) and a prediction of presentation attack. Experiments are conducted using the ASVspoof 2017-v2 dataset, which includes official trials on the integration of speaker verification and presentation attack detection. The proposed back-end approach demonstrates a relative improvement of 21.77% in terms of equal error rate for integrated trials compared to a conventional speaker verification system.
25

Евсюков, Михаил Витальевич, Михаил Михайлович Путято, Александр Самвелович Макарян, and Александр Николаевич Черкасов. "Оценка точности субъектозависимого подхода к обнаружению синтезированного голоса." Вестник ВГУ. Серия: Системный анализ и информационные технологии, no. 1 (May 28, 2024): 77–93. http://dx.doi.org/10.17308/sait/1995-5499/2024/1/77-93.

Abstract:
Modern voice-based speaker recognition methods achieve high accuracy on genuine human speech, but their main weakness is vulnerability to spoofing. Current research on spoofing detection for voice-based speaker recognition systems is dominated by speaker-independent systems. Nevertheless, some studies suggest that a speaker-dependent approach to spoofing detection is promising. Its effectiveness, however, had not previously been studied for synthesized-voice detection. The goal of this study is to compare the accuracy of speaker-dependent and speaker-independent synthesized-voice detection systems that use the same voice-feature extraction algorithms and machine learning models. In addition, we evaluate how the training method of the speaker-dependent models, as well as the amount of available speaker training data, affects synthesized-voice detection accuracy. The LA partition of the ASVspoof 2019 dataset was used as the data set, and the LFCC-GMM spoofing detection system served as the experimental subject. Detection accuracy was measured using the equal error rate (EER). We found that using speaker-dependent models of bona fide data substantially improves synthesized-voice detection accuracy without changing the feature extraction algorithms or machine learning models. Moreover, increasing the amount of data used to adapt or train the speaker-dependent bona fide model proved to be an effective way of improving detection accuracy. Applying a speaker-dependent bona fide model trained on 90 recordings of a speaker reduced the EER from 16.86% to 9.71% compared with the speaker-independent system.
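The EER criterion used above can be computed directly from the genuine and spoofed score distributions: it is the operating point where the false-acceptance and false-rejection rates coincide. A minimal numpy sketch, with synthetic Gaussian scores standing in for LFCC-GMM log-likelihood ratios (the score distributions here are invented for illustration):

```python
import numpy as np

def compute_eer(genuine, spoof):
    """Equal error rate: the point where false-accept rate == false-reject rate."""
    thresholds = np.sort(np.concatenate([genuine, spoof]))
    far = np.array([(spoof >= t).mean() for t in thresholds])   # spoof accepted
    frr = np.array([(genuine < t).mean() for t in thresholds])  # genuine rejected
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2

# Toy scores (higher = more likely bona fide).
rng = np.random.default_rng(1)
genuine = rng.normal(2.0, 1.0, 1000)
spoof = rng.normal(-2.0, 1.0, 1000)
eer = compute_eer(genuine, spoof)
```

In practice the EER is read off a DET curve with interpolation; this brute-force scan over candidate thresholds conveys the same idea.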
26

Gomez-Alanis, Alejandro, Jose A. Gonzalez-Lopez, and Antonio M. Peinado. "GANBA: Generative Adversarial Network for Biometric Anti-Spoofing." Applied Sciences 12, no. 3 (January 29, 2022): 1454. http://dx.doi.org/10.3390/app12031454.

Abstract:
Automatic speaker verification (ASV) is a voice biometric technology whose security might be compromised by spoofing attacks. To increase the robustness against spoofing attacks, presentation attack detection (PAD) or anti-spoofing systems for detecting replay, text-to-speech, and voice conversion-based spoofing attacks are being developed. However, it was recently shown that adversarial spoofing attacks may seriously fool anti-spoofing systems. Moreover, the robustness of the whole biometric system (ASV + PAD) against this new type of attack is completely unexplored. In this work, a new generative adversarial network for biometric anti-spoofing (GANBA) is proposed. GANBA has a twofold basis: (1) it jointly employs the anti-spoofing and ASV losses to yield very damaging adversarial spoofing attacks, and (2) it trains the PAD as a discriminator in order to make it more robust against these types of adversarial attacks. The proposed system is able to generate adversarial spoofing attacks which can fool the complete voice biometric system. The resulting PAD discriminators of the proposed GANBA can then be used as a defense technique for detecting both original and adversarial spoofing attacks. The physical access (PA) and logical access (LA) scenarios of the ASVspoof 2019 database were employed to carry out the experiments. The experimental results show that the GANBA attacks are quite effective, outperforming other adversarial techniques when applied in white-box and black-box attack setups. In addition, the resulting PAD discriminators are more robust against both original and adversarial spoofing attacks.
27

Wei, Linqiang, Yanhua Long, Haoran Wei, and Yijie Li. "New Acoustic Features for Synthetic and Replay Spoofing Attack Detection." Symmetry 14, no. 2 (January 29, 2022): 274. http://dx.doi.org/10.3390/sym14020274.

Abstract:
With the rapid development of intelligent speech technologies, automatic speaker verification (ASV) has become one of the most natural and convenient biometric speaker recognition approaches. However, most state-of-the-art ASV systems are vulnerable to spoofing attack techniques, such as speech synthesis, voice conversion, and replay speech. Due to the symmetric distribution characteristic between genuine (true) speech and spoofed (fake) speech pairs, spoofing attack detection is challenging. Many recent research works have been focusing on ASV anti-spoofing solutions. This work investigates two types of new acoustic features to improve the performance of spoofing attack detection. The first type consists of two cepstral coefficient features and one LogSpec feature, which are extracted from the linear prediction (LP) residual signals. The second is a harmonic and noise subband ratio feature, which can reflect the difference in the interaction of the vocal tract and glottal airflow between genuine and spoofed speech. The significance of these new features has been investigated in both the t-distributed stochastic neighbor embedding (t-SNE) space and the binary classification modeling space. Experiments on the ASVspoof 2019 database show that the proposed residual features can achieve from 7% to 51.7% relative equal error rate (EER) reduction on the development and evaluation sets over the best single-system baseline. Furthermore, more than 31.2% relative EER reduction on both the development and evaluation sets shows that the proposed new features contain information largely complementary to the source acoustic features.
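The LP residual from which the first feature set is extracted is obtained by inverse-filtering the frame with its own linear-prediction coefficients. A minimal numpy sketch, assuming the standard autocorrelation method with Levinson-Durbin recursion (the frame, prediction order, and test signal below are illustrative):

```python
import numpy as np

def lpc(x, order):
    """LP coefficients via the autocorrelation method (Levinson-Durbin)."""
    n = len(x)
    r = np.array([x[:n - i] @ x[i:] for i in range(order + 1)])  # autocorrelation
    a = np.zeros(order + 1)
    a[0], e = 1.0, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + a[1:i] @ r[1:i][::-1]) / e   # reflection coefficient
        a[1:i] = a[1:i] + k * a[1:i][::-1]        # update previous coefficients
        a[i] = k
        e *= 1.0 - k * k                          # prediction error update
    return a

# Toy frame: a decaying sinusoid standing in for a voiced speech frame.
t = np.arange(400)
x = np.sin(2 * np.pi * 0.05 * t) * np.exp(-t / 300)

a = lpc(x, order=12)
# Inverse filtering with A(z) yields the LP residual, from which cepstral
# and LogSpec features would then be computed.
residual = np.convolve(a, x)[:len(x)]
```

For a strongly predictable signal like this one, the residual energy is far below the signal energy, which is exactly why residual-domain features isolate source information that the all-pole vocal-tract model does not explain.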
28

Yoon, Sunghyun, and Ha-Jin Yu. "BPCNN: Bi-Point Input for Convolutional Neural Networks in Speaker Spoofing Detection." Sensors 22, no. 12 (June 14, 2022): 4483. http://dx.doi.org/10.3390/s22124483.

Abstract:
We propose a method, called bi-point input, for convolutional neural networks (CNNs) that handle variable-length input features (e.g., speech utterances). Feeding input features into a CNN in mini-batch units requires that all features in each mini-batch have the same shape. A set of variable-length features cannot be directly fed into a CNN because they commonly have different lengths. Feature segmentation is a dominant method for CNNs to handle variable-length features, where each feature is decomposed into fixed-length segments. A CNN receives one segment as an input at one time. However, the CNN can then consider only the information of one segment at a time, not the entire feature. This drawback limits the amount of information available at one time and consequently results in suboptimal solutions. Our proposed method alleviates this problem by increasing the amount of information available at one time. With the proposed method, a CNN receives a pair of two segments obtained from a feature as an input at one time. The two segments generally cover different time ranges and therefore carry different information. We also propose various combination methods and provide rough guidance for setting a proper segment length without evaluation. We evaluate the proposed method on spoofing detection tasks using the ASVspoof 2019 database under various conditions. The experimental results reveal that the proposed method reduces the equal error rate (EER) by approximately 17.2% and 43.8% relative, on average, for the logical access (LA) and physical access (PA) tasks, respectively.
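The bi-point input itself is simple to sketch: each training example is a pair of fixed-length segments drawn from one variable-length feature, so the network sees two (generally different) time ranges of the utterance at once. The numpy sketch below uses invented shapes and uniform random sampling; the paper's actual segment-selection and combination strategies may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def bi_point_pairs(feature, seg_len, n_pairs=4):
    """Sample pairs of fixed-length segments from one variable-length feature.

    Every mini-batch element has the same shape (two seg_len-frame segments),
    while together the two segments cover more of the utterance than a
    single segment would.
    """
    n_frames = feature.shape[0]
    starts = rng.integers(0, n_frames - seg_len + 1, size=(n_pairs, 2))
    return [(feature[s1:s1 + seg_len], feature[s2:s2 + seg_len])
            for s1, s2 in starts]

# Hypothetical 937-frame utterance with 60-dim spectral features.
utt = rng.normal(size=(937, 60))
pairs = bi_point_pairs(utt, seg_len=400)
```

Each pair could then be combined (e.g., concatenated along a channel axis) before being fed to the CNN; the paper compares several such combination methods.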
29

Li, Lanting, Tianliang Lu, Xingbang Ma, Mengjiao Yuan, and Da Wan. "Voice Deepfake Detection Using the Self-Supervised Pre-Training Model HuBERT." Applied Sciences 13, no. 14 (July 22, 2023): 8488. http://dx.doi.org/10.3390/app13148488.

Abstract:
In recent years, voice deepfake technology has developed rapidly, but current detection methods suffer from insufficient generalization and insufficient feature extraction for unknown attacks. This paper presents a forged-speech detection method (HuRawNet2_modified) based on a self-supervised pre-trained model (HuBERT) to address these problems. A combination of impulsive signal-dependent additive noise and additive white Gaussian noise was adopted for data augmentation, and the HuBERT model was fine-tuned on different language databases. On this basis, the size of the extracted feature maps was modified independently by the α-feature map scaling (α-FMS) method, within a modified end-to-end method using the RawNet2 model as the backbone structure. The results showed that the HuBERT model could extract features more comprehensively and accurately. The best evaluation indicators were an equal error rate (EER) of 2.89% and a minimum tandem detection cost function (min t-DCF) of 0.2182 on the database of the ASVspoof 2021 LA challenge, which verified the effectiveness of the proposed detection method. Compared with the baseline systems on the databases of the ASVspoof 2021 LA challenge and FMFCC-A, the values of EER and min t-DCF decreased. The results also showed that the self-supervised pre-trained model with fine-tuning can extract acoustic features across languages, and that detection improves slightly when the pre-training database and the fine-tuning and test databases share the same language.
30

Mewada, Hiren, Jawad F. Al-Asad, Faris A. Almalki, Adil H. Khan, Nouf Abdullah Almujally, Samir El-Nakla, and Qamar Naith. "Gaussian-Filtered High-Frequency-Feature Trained Optimized BiLSTM Network for Spoofed-Speech Classification." Sensors 23, no. 14 (July 24, 2023): 6637. http://dx.doi.org/10.3390/s23146637.

Abstract:
Voice-controlled devices are in demand due to their hands-free controls. However, using voice-controlled devices in sensitive scenarios like smartphone applications and financial transactions requires protection against fraudulent attacks referred to as "speech spoofing". The algorithms used in spoof attacks are practically unknown; hence, further analysis and development of spoof-detection models for improving spoof classification are required. A study of the spoofed-speech spectrum suggests that high-frequency features are able to discriminate genuine speech from spoofed speech well. Typically, linear or triangular filter banks are used to obtain high-frequency features. However, a Gaussian filter can extract more global information than a triangular filter. In addition, MFCC features are preferable among other speech features because of their lower covariance. Therefore, in this study, the use of a Gaussian filter is proposed for the extraction of inverted MFCC (iMFCC) features, providing high-frequency features. Complementary features are integrated with iMFCC to strengthen the features that aid in the discrimination of spoofed speech. Deep learning has been proven to be efficient in classification applications, but the selection of its hyper-parameters and architecture is crucial and directly affects performance. Therefore, a Bayesian algorithm is used to optimize the BiLSTM network. Thus, in this study, we build a high-frequency-based optimized BiLSTM network to classify the spoofed-speech signal, and we present an extensive investigation using the ASVspoof 2017 dataset. The optimized BiLSTM model is successfully trained in the fewest epochs and achieved a 99.58% validation accuracy. The proposed algorithm achieved a 6.58% EER on the evaluation dataset, a relative improvement of 78% over a baseline spoof-identification system.
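The proposed front end replaces triangular filters with Gaussian ones on an inverted mel axis, so the filter bank is densest at high frequencies, where spoofing artifacts are reported to be most discriminative. A rough numpy sketch under simplifying assumptions (a fixed filter width and a plain flip of the mel-spaced centers; the paper's exact filter placement and widths may differ):

```python
import numpy as np

def gaussian_filterbank(n_filters, n_fft, sr, inverted=True):
    """Gaussian filters centered on an (optionally inverted) mel-spaced axis."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    # Mel-spaced center frequencies (dense at low frequencies).
    centers_mel = np.linspace(0.0, hz_to_mel(sr / 2), n_filters + 2)[1:-1]
    centers = mel_to_hz(centers_mel)
    if inverted:
        # Flip the axis so the filters become dense at HIGH frequencies.
        centers = sr / 2 - centers[::-1]
    freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    width = (sr / 2) / n_filters  # illustrative fixed standard deviation
    return np.exp(-0.5 * ((freqs[None, :] - centers[:, None]) / width) ** 2)

fb = gaussian_filterbank(n_filters=26, n_fft=512, sr=16000)
```

Applying this bank to a power spectrum, taking logs, and applying a DCT would give Gaussian-filtered iMFCC-style coefficients.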
31

Chadha, Ankita, Azween Abdullah, and Lorita Angeline. "An improved normalized gain-based score normalization technique for spoof detection algorithm." International journal of electrical and computer engineering systems 13, no. 6 (September 1, 2022): 457–65. http://dx.doi.org/10.32985/ijeces.13.6.5.

Abstract:
A spoof detection algorithm supports the speaker verification system in examining false claims by an imposter through careful analysis of the input test speech. The scores are employed to categorize the genuine and spoofed samples effectively. Under mismatch conditions, the false acceptance ratio increases and can be reduced by appropriate score normalization techniques. In this article, we use the normalized Discounted Cumulative Gain (nDCG) norm derived from ranking the speaker's log-likelihood scores. The proposed scoring technique smoothens the decaying process due to the logarithm, with an added advantage from the ranking. The baseline spoof detection system employs Constant Q-Cepstral Coefficients (CQCC) as the base features with a Gaussian Mixture Model (GMM) based classifier. The scores are computed using the ASVspoof 2019 dataset with and without normalization. The baseline techniques, including Zero normalization (Z-norm) and Test normalization (T-norm), are also considered. The proposed technique is found to perform better, with an improved Equal Error Rate (EER) of 0.35 as against 0.43 for the baseline system (no normalization) with respect to synthetic attacks using development data. Similarly, improvements are seen in the case of replay attacks, with an EER of 7.83 for nDCG-norm and 9.87 with no normalization (no-norm). Furthermore, the tandem Detection Cost Function (t-DCF) scores for synthetic attacks are 0.015 for no-norm and 0.010 for the proposed normalization; for replay attacks, the t-DCF scores are 0.195 for no-norm and 0.17 for the proposed normalization. The system performance is satisfactory when evaluated using evaluation data, with an EER of 8.96 for nDCG-norm as against 9.57 with no-norm for synthetic attacks, and an EER of 9.79 for nDCG-norm as against 11.04 with no-norm for replay attacks. Supporting the EER, the t-DCF for nDCG-norm is 0.1989 and for no-norm is 0.2636 for synthetic attacks, while for replay attacks the t-DCF is 0.2284 for nDCG-norm and 0.2454 for no-norm. The proposed scoring technique is found to increase spoof detection accuracy and the overall accuracy of the speaker verification system.
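As a loose illustration of rank-based score normalization, the sketch below discounts each log-likelihood score by the log of its rank and normalizes by the magnitude of the ideal (sorted) ordering, in the spirit of nDCG. This is a simplified reading of the idea, not the paper's exact formulation, and the toy scores are invented.

```python
import numpy as np

def ndcg_norm(scores):
    """nDCG-style normalization: gain discounted by log2 of rank,
    normalized by the ideal-ordering DCG magnitude (simplified sketch)."""
    order = np.argsort(scores)[::-1]            # best score first
    ranks = np.empty_like(order)
    ranks[order] = np.arange(1, len(scores) + 1)
    gains = scores / np.log2(ranks + 1)         # discounted gain per trial
    ideal = np.sort(scores)[::-1] / np.log2(np.arange(2, len(scores) + 2))
    return gains / np.sum(np.abs(ideal))        # normalize by ideal DCG

scores = np.array([3.1, -0.4, 1.2, 0.8])  # toy log-likelihood scores
normed = ndcg_norm(scores)
```

Because the discount is monotone in rank, the ordering of trials (and hence any rank-based decision) is preserved while the score scale is compressed.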
32

Mahum, Rabbia, Aun Irtaza, Ali Javed, Haitham A. Mahmoud, and Haseeb Hassan. "DeepDet: YAMNet with BottleNeck Attention Module (BAM) TTS synthesis detection." EURASIP Journal on Audio, Speech, and Music Processing 2024, no. 1 (April 1, 2024). http://dx.doi.org/10.1186/s13636-024-00335-9.

Abstract:
Spoofed speech is becoming a big threat to society due to advancements in artificial intelligence techniques. Therefore, there must be an automated spoofing detector that can be integrated into automatic speaker verification (ASV) systems. In this study, we recommend a novel and robust model, named DeepDet, based on a deep-layered architecture, to categorize speech into two classes: spoofed and bonafide. DeepDet is an improved model based on Yet Another Mobile Network (YAMNet), employing a customized MobileNet combined with a bottleneck attention module (BAM). First, we convert audio into mel-spectrograms, which are time-frequency representations on the mel scale. Second, we trained our deep-layered model using the extracted mel-spectrograms on the Logical Access (LA) set of the ASVspoof 2019 dataset, which includes synthesized speech and voice conversions. In the end, we classified the audio utilizing our trained binary classifier. More precisely, we utilized the power of layered architecture and guided attention that can discern spoofed speech from bonafide samples. Our improved model employs depth-wise separable convolutions, which makes it lighter weight than existing techniques. Furthermore, we conducted extensive experiments to assess the performance of the suggested model using the ASVspoof 2019 corpus. We attained an equal error rate (EER) of 0.042% on Logical Access (LA) attacks and 0.43% on Physical Access (PA) attacks. Therefore, the performance of the proposed model is significant on the ASVspoof 2019 dataset and indicates the effectiveness of DeepDet over existing spoofing detectors. Additionally, our proposed model is robust enough to identify unseen spoofed audio and classifies the several attack types accurately.
33

Chakravarty, Nidhi, and Mohit Dua. "Data Augmentation and Hybrid Feature Amalgamation to detect Audio Deep Fake attacks." Physica Scripta, July 24, 2023. http://dx.doi.org/10.1088/1402-4896/acea05.

Abstract:
The ability to distinguish between authentic and fake audio has become increasingly difficult due to the increasing accuracy of text-to-speech models, posing a serious threat to speaker verification systems. Furthermore, audio deepfakes are becoming a more likely source of deception with the development of sophisticated methods for producing synthetic voice. The ASVspoof dataset has recently been used extensively in research on the detection of audio deepfakes, together with a variety of machine and deep learning methods. The work proposed in this paper combines data augmentation techniques with a hybrid feature extraction method at the front end. Two variants of audio augmentation and the Synthetic Minority Oversampling Technique (SMOTE) have been used, combined individually with Mel Frequency Cepstral Coefficients (MFCC), Gammatone Cepstral Coefficients (GTCC), and a hybrid of these two feature extraction methods to implement front-end feature extraction. At the back end, two deep learning models, Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), and two machine learning (ML) classifiers, Random Forest (RF) and Support Vector Machine (SVM), have been used. The ASVspoof 2019 Logical Access (LA) partition has been used for training and evaluation, and the ASVspoof 2021 deepfake partition for testing. After analysing the results, it can be observed that the combination of MFCC+GTCC with SMOTE at the front end and LSTM at the back end has outperformed all other models, with 99% test accuracy and a 1.6% Equal Error Rate (EER) on the deepfake partition. The testing of this best combination has also been done on the DEepfake CROss-lingual (DECRO) dataset. To assess the effectiveness of the proposed model under noisy scenarios, we have analysed our best model under noisy conditions by adding babble noise, street noise, and car noise to the test data.
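SMOTE, one of the augmentation options combined with the MFCC/GTCC front end above, can be re-implemented in a few lines: each synthetic minority sample is an interpolation between a real minority sample and one of its k nearest minority neighbours. A minimal numpy sketch with invented feature dimensions and class sizes (practical work would typically call imbalanced-learn's SMOTE instead):

```python
import numpy as np

rng = np.random.default_rng(2)

def smote(minority, n_new, k=5):
    """SMOTE-style oversampling: interpolate between a random minority
    point and one of its k nearest minority neighbours."""
    out = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        d = np.linalg.norm(minority - minority[i], axis=1)
        nn = np.argsort(d)[1:k + 1]   # nearest neighbours, skipping the point itself
        j = rng.choice(nn)
        lam = rng.random()            # interpolation factor in [0, 1)
        out.append(minority[i] + lam * (minority[j] - minority[i]))
    return np.array(out)

# Hypothetical 20-dim pooled feature vectors (e.g., MFCC+GTCC statistics).
bona = rng.normal(0, 1, size=(100, 20))   # majority class
spoof = rng.normal(3, 1, size=(15, 20))   # minority class
synthetic = smote(spoof, n_new=85)        # balance the two classes
```

Because synthetic points lie on line segments between real minority samples, the oversampled class keeps its original location in feature space rather than simply duplicating examples.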
34

Kamble, Madhu R., Hardik B. Sailor, Hemant A. Patil, and Haizhou Li. "Advances in anti-spoofing: from the perspective of ASVspoof challenges." APSIPA Transactions on Signal and Information Processing 9 (2020). http://dx.doi.org/10.1017/atsip.2019.21.

Abstract:
In recent years, automatic speaker verification (ASV) has been used extensively for voice biometrics. This leads to an increased interest in securing these voice biometric systems for real-world applications. ASV systems are vulnerable to various kinds of spoofing attacks, namely synthetic speech (SS), voice conversion (VC), replay, twins, and impersonation. This paper provides a literature review of ASV spoof detection, novel acoustic feature representations, deep learning, end-to-end systems, etc. Furthermore, the paper also summarizes previous studies of spoofing attacks, with emphasis on SS, VC, and replay, along with recent efforts to develop countermeasures for the spoof speech detection (SSD) task. The limitations and challenges of the SSD task are also presented. While several countermeasures have been reported in the literature, they are mostly validated on a particular database; furthermore, their performance is far from perfect. The security of voice biometric systems against spoofing attacks remains a challenging topic. This paper is based on a tutorial presented at the APSIPA Annual Summit and Conference 2017 to serve as a quick start for those interested in the topic.
35

Cheng, Xingliang, Mingxing Xu, and Thomas Fang Zheng. "A multi-branch ResNet with discriminative features for detection of replay speech signals." APSIPA Transactions on Signal and Information Processing 9 (2020). http://dx.doi.org/10.1017/atsip.2020.26.

Abstract:
Nowadays, the security of ASV systems is increasingly gaining attention. As one of the common spoofing methods, replay attacks are easy to implement but difficult to detect. Many researchers focus on designing various features to detect the distortion of replay attack attempts. Constant-Q cepstral coefficients (CQCC), based on the magnitude of the constant-Q transform (CQT), is one of the striking features in the field of replay detection. However, it ignores phase information, which may also be distorted in the replay processes. In this work, we propose a CQT-based modified group delay feature (CQTMGD) which can capture the phase information of CQT. Furthermore, a multi-branch residual convolution network, ResNeWt, is proposed to distinguish replay attacks from bonafide attempts. We evaluated our proposal in the ASVspoof 2019 physical access dataset. Results show that CQTMGD outperformed the traditional MGD feature, and the fusion with other magnitude-based and phase-based features achieved a further improvement. Our best fusion system achieved 0.0096 min-tDCF and 0.39% EER on the evaluation set and it outperformed all the other state-of-the-art methods in the ASVspoof 2019 physical access challenge.
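Group delay, the phase quantity behind CQTMGD, is the negative frequency derivative of the phase spectrum, and it has a standard FFT identity: tau(k) = Re(X(k) * conj(Y(k))) / |X(k)|^2, where Y is the transform of n*x[n]. The sketch below shows only the plain DFT version as a reference; the paper's contribution is applying a modified form of this idea to the constant-Q transform.

```python
import numpy as np

def group_delay(x, n_fft=512, eps=1e-8):
    """Group delay -d(arg X)/domega via the standard FFT identity."""
    n = np.arange(len(x))
    X = np.fft.rfft(x, n_fft)
    Y = np.fft.rfft(n * x, n_fft)  # transform of the time-weighted signal
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps)

# Sanity check: a pure delay of d samples has group delay ~= d at every bin.
d = 5
x = np.zeros(64)
x[d] = 1.0
tau = group_delay(x)
```

The modified group delay used in practice additionally smooths the magnitude in the denominator and compresses the result with exponents, which tames the spikes that raw group delay exhibits near spectral zeros.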
36

Liu, Xuechen, Xin Wang, Md Sahidullah, Jose Patino, Héctor Delgado, Tomi Kinnunen, Massimiliano Todisco, et al. "ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild." IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023, 1–14. http://dx.doi.org/10.1109/taslp.2023.3285283.

37

Gupta, Priyanka, Hemant A. Patil, and Rodrigo Capobianco Guido. "Vulnerability issues in Automatic Speaker Verification (ASV) systems." EURASIP Journal on Audio, Speech, and Music Processing 2024, no. 1 (February 10, 2024). http://dx.doi.org/10.1186/s13636-024-00328-8.

Abstract:
Claimed identities of speakers can be verified by means of automatic speaker verification (ASV) systems, also known as voice biometric systems. Focusing on security and robustness against spoofing attacks on ASV systems, and observing that the investigation of attackers' perspectives is capable of leading the way to prevent known and unknown threats to ASV systems, several countermeasures (CMs) have been proposed during the ASVspoof 2015, 2017, 2019, and 2021 challenge campaigns organized during INTERSPEECH conferences. Furthermore, there is a recent initiative to organize the ASVspoof 5 challenge with the objectives of collecting massive spoofing/deepfake attack data (phase 1) and designing integrated CM-ASV solutions in which a single classifier performs both ASV and CM (phase 2). To that effect, this paper presents a survey on a diversity of possible strategies and vulnerabilities explored to successfully attack an ASV system, such as target selection, unavailability of global countermeasures to reduce the attacker's chance to explore the weaknesses, state-of-the-art adversarial attacks based on machine learning, and deepfake generation. This paper also covers the possibility of attacks such as hardware attacks on ASV systems. Finally, we also discuss the several technological challenges from the attacker's perspective, which can be exploited to come up with better defence mechanisms for the security of ASV systems.
38

Zhao, Yuanjun, Roberto Togneri, and Victor Sreeram. "Multi-task Learning-Based Spoofing-Robust Automatic Speaker Verification System." Circuits, Systems, and Signal Processing, February 18, 2022. http://dx.doi.org/10.1007/s00034-022-01974-z.

Abstract:
Spoofing attacks posed by artificially generated speech can severely degrade the performance of a speaker verification system. Recently, many anti-spoofing countermeasures have been proposed for detecting varying types of attacks, from synthetic speech to replay presentations. While numerous effective defenses have been reported as standalone anti-spoofing solutions, the integration of speaker verification and spoofing detection systems has obvious benefits. In this paper, we propose a spoofing-robust automatic speaker verification system for diverse attacks based on a multi-task learning architecture. This deep learning-based model is jointly trained with time-frequency representations from utterances to provide recognition decisions for both tasks simultaneously. Compared with other state-of-the-art systems on the ASVspoof 2017 and 2019 corpora, a substantial improvement of the combined system under different spoofing conditions can be obtained.
39

Xie, Dang-en, Hai-na Hu, and Qiang Xu. "Replay attack detection based on deformable convolutional neural network and temporal-frequency attention model." Journal of Intelligent Systems 32, no. 1 (January 1, 2023). http://dx.doi.org/10.1515/jisys-2022-0265.

Abstract:
As an important identity authentication method, speaker verification (SV) has been widely used in many domains, e.g., mobile financial services. At the same time, existing SV systems are insecure under replay spoofing attacks. Toward a more secure and stable SV system, this article proposes a replay attack detection system based on deformable convolutional neural networks (DCNNs) and a time–frequency double-channel attention model. In a DCNN, the positions of elements in the convolutional kernel are not fixed. Instead, they are modified by trainable variables to help the model extract more useful local information from input spectrograms. Meanwhile, a time–frequency domain double-channel attention model is adopted to extract more effective distinctive features and collect valuable information for distinguishing genuine and replay speech. Experimental results on the ASVspoof 2019 dataset show that the proposed model can detect replay attacks accurately.
40

DİŞKEN, Gökay. "Robust Spoofed Speech Detection with Denoised I-vectors." GAZI UNIVERSITY JOURNAL OF SCIENCE, October 6, 2022. http://dx.doi.org/10.35378/gujs.1062788.

Abstract:
Spoofed speech detection has recently been gaining the attention of researchers, as speaker verification has been shown to be vulnerable to spoofing attacks such as voice conversion, speech synthesis, replay, and impersonation. Although various methods have been proposed to detect spoofed speech, their performance decreases dramatically under mismatched conditions due to additive or reverberant noise. Conventional speech enhancement methods fail to recover the performance gap; hence, more advanced techniques seem necessary to solve the noisy spoofed speech detection problem. In this work, a Denoising Autoencoder (DAE) is used to obtain clean estimates of i-vectors from their noisy versions. The ASVspoof 2015 database is used in the experiments with five different noise types, added to the original utterances at 0, 10, and 20 dB signal-to-noise ratios (SNR). The experimental results verified that the DAE provides more robust spoof detection where the conventional methods fail.
41

Kanwal, Tahira, Rabbia Mahum, Abdul Malik AlSalman, Mohamed Sharaf, and Haseeb Hassan. "Fake speech detection using VGGish with attention block." EURASIP Journal on Audio, Speech, and Music Processing 2024, no. 1 (June 26, 2024). http://dx.doi.org/10.1186/s13636-024-00348-4.

Abstract:
While deep learning technologies have made remarkable progress in generating deepfakes, their misuse has become a well-known concern. As a result, the ubiquitous usage of deepfakes for spreading false information poses significant risks to the security and privacy of individuals. The primary objective of audio spoofing detection is to identify audio generated through numerous AI-based techniques. Several techniques for fake audio detection already exist using machine learning algorithms. However, they lack generalization and may not identify all types of AI-synthesized audio, such as replay attacks, voice conversion, and text-to-speech (TTS). In this paper, a deep layered model, i.e., VGGish, along with an attention block, namely the Convolutional Block Attention Module (CBAM), is introduced for spoofing detection. Our suggested model successfully classifies input audio into two classes, fake and real, converting it into mel-spectrograms and extracting the most representative features thanks to the attention block. Our model is a significant technique for audio spoofing detection due to its simple layered architecture, and it captures complex relationships in audio signals through both the spatial and channel attention of the attention module. To evaluate the effectiveness of our model, we have conducted in-depth testing using the ASVspoof 2019 dataset. The proposed technique achieved an EER of 0.52% for Physical Access (PA) attacks and 0.07% for Logical Access (LA) attacks.
42

Mittal, Aakshi, and Mohit Dua. "Static–dynamic features and hybrid deep learning models based spoof detection system for ASV." Complex & Intelligent Systems, November 19, 2021. http://dx.doi.org/10.1007/s40747-021-00565-w.

Abstract:
Detection of spoofing is essential for improving the performance of current Automatic Speaker Verification (ASV) systems. Strengthening both the front-end and back-end parts can build robust ASV systems. First, this paper discusses a performance comparison of static and static–dynamic Constant Q Cepstral Coefficient (CQCC) front-end features using a Long Short Term Memory (LSTM) with Time Distributed Wrappers model at the back end. Second, it performs a comparative analysis of ASV systems built using three deep learning models, LSTM with Time Distributed Wrappers, LSTM, and Convolutional Neural Network, at the back end, with static–dynamic CQCC features at the front end. Third, it discusses the implementation of two spoof detection systems for ASV using the same static–dynamic CQCC features at the front end and different combinations of deep learning models at the back end. Of these two, the first is a voting-protocol-based two-level spoof detection system that uses CNN and LSTM models at the first level and the LSTM with Time Distributed Wrappers model at the second level. The second is a two-level spoof detection system with a user identification and verification protocol, which uses an LSTM model for user identification at the first level and LSTM with Time Distributed Wrappers for verification at the second level. For implementing the proposed work, a variation of the ASVspoof 2019 dataset has been used to introduce all types of spoofing attacks, such as Speech Synthesis (SS), Voice Conversion (VC), and replay, in a single dataset. The results show that, at the front end, static–dynamic CQCC features outperform static CQCC features, and at the back end, a hybrid combination of deep learning models increases the accuracy of spoof detection systems.
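The static-to-dynamic step compared in the paper appends delta (and delta-delta) coefficients, computed by a standard regression over neighbouring frames, to the static CQCC matrix. A minimal numpy sketch with invented frame counts and feature dimensions:

```python
import numpy as np

def deltas(feat, width=2):
    """Regression-based delta (dynamic) features:

    delta[t] = sum_{n=1..N} n * (feat[t+n] - feat[t-n]) / (2 * sum_{n} n^2)
    """
    denom = 2 * sum(n * n for n in range(1, width + 1))
    padded = np.pad(feat, ((width, width), (0, 0)), mode="edge")  # repeat edges
    out = np.zeros_like(feat, dtype=float)
    for n in range(1, width + 1):
        out += n * (padded[width + n: width + n + len(feat)]
                    - padded[width - n: width - n + len(feat)])
    return out / denom

# Toy static CQCC matrix: 200 frames x 30 coefficients.
cqcc = np.random.default_rng(3).normal(size=(200, 30))
d1 = deltas(cqcc)        # dynamic (velocity) features
d2 = deltas(d1)          # acceleration features
static_dynamic = np.concatenate([cqcc, d1, d2], axis=1)  # 90-dim frames
```

Concatenating static, delta, and delta-delta streams in this way is the conventional static–dynamic front end; a constant input correctly produces all-zero deltas.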
