Dissertations / Theses on the topic 'Deep learning for Multimedia Forensics'
Nowroozi, Ehsan. "Machine Learning Techniques for Image Forensics in Adversarial Setting." Doctoral thesis, Università di Siena, 2020. http://hdl.handle.net/11365/1096177.
Stanton, Jamie Alyssa. "Detecting Image Forgery with Color Phenomenology." University of Dayton / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=dayton15574119887572.
Budnik, Mateusz. "Active and deep learning for multimedia." Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAM011.
The main topics of this thesis include the use of active learning-based methods and deep learning in the context of retrieval of multimodal documents. The contributions proposed during this thesis address both these topics. An active learning framework was introduced, which allows for a more efficient annotation of broadcast TV videos thanks to the propagation of labels, the use of multimodal data and selection strategies. Several different scenarios and experiments were considered in the context of person identification in videos, including using different modalities (such as faces, speech segments and overlaid text) and different selection strategies. The whole system was additionally validated in a dry run involving real human annotators. A second major contribution was the investigation and use of deep learning (in particular the convolutional neural network) for video retrieval. A comprehensive study was made using different neural network architectures and training techniques such as fine-tuning or using separate classifiers like SVM. A comparison was made between learned features (the output of neural networks) and engineered features. Despite the lower performance of the engineered features, fusion between these two types of features increases overall performance. Finally, the use of convolutional neural networks for speaker identification using spectrograms is explored. The results are compared to other state-of-the-art speaker identification systems. Different fusion approaches are also tested. The proposed approach obtains results comparable to some of the other tested approaches and offers an increase in performance when fused with the output of the best system.
Ha, Hsin-Yu. "Integrating Deep Learning with Correlation-based Multimedia Semantic Concept Detection." FIU Digital Commons, 2015. http://digitalcommons.fiu.edu/etd/2268.
Vukotic, Verdran. "Deep Neural Architectures for Automatic Representation Learning from Multimedia Multimodal Data." Thesis, Rennes, INSA, 2017. http://www.theses.fr/2017ISAR0015/document.
In this dissertation, the thesis that deep neural networks are suited for analysis of visual, textual and fused visual and textual content is discussed. This work evaluates the ability of deep neural networks to learn automatic multimodal representations in either unsupervised or supervised manners and brings the following main contributions: 1) Recurrent neural networks for spoken language understanding (slot filling): different architectures are compared for this task with the aim of modeling both the input context and output label dependencies. 2) Action prediction from single images: we propose an architecture that allows us to predict human actions from a single image. The architecture is evaluated on videos, using only one frame as input. 3) Bidirectional multimodal encoders: the main contribution of this thesis consists of a neural architecture that translates from one modality to the other and conversely, and offers an improved multimodal representation space where the initially disjoint representations can be translated and fused. This enables improved fusion of multiple modalities. The architecture was extensively studied and evaluated in international benchmarks within the task of video hyperlinking, where it set the state of the art. 4) Generative adversarial networks for multimodal fusion: continuing on the topic of multimodal fusion, we evaluate the possibility of using conditional generative adversarial networks to learn multimodal representations. In addition to providing multimodal representations, generative adversarial networks make it possible to visualize the learned model directly in the image domain.
Hamm, Simon. "Digital Audio Video Assessment: Surface or Deep Learning - An Investigation." RMIT University. Education, 2009. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20091216.154300.
Quan, Weize. "Detection of computer-generated images via deep learning." Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALT076.
With the advances of image editing and generation software tools, it has become easier to tamper with the content of images or create new images, even for novices. These generated images, such as computer graphics (CG) images and colorized images (CI), have high-quality visual realism and can pose serious threats in many important scenarios. For instance, judicial departments need to verify that pictures are not produced by computer graphics rendering technology, and colorized images can cause recognition and monitoring systems to make incorrect decisions. Therefore, the detection of computer-generated images has attracted widespread attention in the multimedia security research community. In this thesis, we study the identification of different computer-generated images, including CG images and CI, namely, identifying whether an image is acquired by a camera or generated by a computer program. The main objective is to design an efficient detector that has high classification accuracy and good generalization capability. Specifically, we consider dataset construction, network architecture, training methodology, visualization and understanding for the considered forensic problems. The main contributions are: (1) a colorized image detection method based on negative sample insertion, (2) a generalization method for colorized image detection, (3) a method for distinguishing natural images (NI) from CG images based on a CNN (Convolutional Neural Network), and (4) a CG image identification method based on the enhancement of feature diversity and adversarial samples.
MIGLIORELLI, LUCIA. "Towards digital patient monitoring: deep learning methods for the analysis of multimedia data from the actual clinical practice." Doctoral thesis, Università Politecnica delle Marche, 2022. http://hdl.handle.net/11566/295052.
Acquiring information on patients' health status from the analysis of video recordings is a crucial opportunity to enhance current clinical assessment and monitoring practices. This PhD thesis proposes four automated systems that analyse multimedia data using deep learning methodologies. These systems have been developed to enrich current assessment modalities (so far based on direct observation of the patient by trained clinicians, coupled with clinical scales often compiled on paper) for three categories of patients: preterm infants, adolescents with autism spectrum syndrome and adults affected by neuropathologies (such as stroke and amyotrophic lateral sclerosis). Each system stems from the clinical need for new tools to treat patients, capable of collecting structured, easily accessible and shareable information. This research will continue to be enhanced to ensure that clinicians have more time to devote to patients and can treat them to the best of their ability.
Dutt, Anuvabh. "Continual learning for image classification." Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAM063.
This thesis deals with deep learning applied to image classification tasks. The primary motivation for the work is to make current deep learning techniques more efficient and to deal with changes in the data distribution. We work in the broad framework of continual learning, with the aim of eventually having machine learning models that can continuously improve. We first look at changes in the label space of a data set, with the data samples themselves remaining the same. We consider a semantic label hierarchy to which the labels belong. We investigate how we can utilise this hierarchy to improve models that were trained on different levels of this hierarchy. The second and third contributions involve continual learning using a generative model. We analyse the usability of samples from a generative model for training good discriminative classifiers. We propose techniques to improve the selection and generation of samples from a generative model. Following this, we observe that continual learning algorithms do undergo some loss in performance when trained on several tasks sequentially. We analyse the training dynamics in this scenario and compare them with training on several tasks simultaneously. We make observations that point to potential difficulties in the learning of models in a continual learning scenario. Finally, we propose a new design template for convolutional networks. This architecture leads to the training of smaller models without compromising performance. In addition, the design lends itself to easy parallelisation, leading to efficient distributed training. In conclusion, we look at two different types of continual learning scenarios. We propose methods that lead to improvements. Our analysis also points to greater issues that may require changes to our current neural network training procedure to overcome.
Darmet, Ludovic. "Vers une approche basée modèle-image flexible et adaptative en criminalistique des images." Thesis, Université Grenoble Alpes, 2020. https://tel.archives-ouvertes.fr/tel-03086427.
Images are nowadays a standard and mature medium of communication. They appear in our day-to-day life and are therefore subject to security concerns. In this work, we study different methods to assess the integrity of images. Given the high volume and versatility of tampering techniques and image sources, our work is driven by the need to develop flexible methods that adapt to the diversity of images. We first focus on manipulation detection through statistical modeling of images. Manipulations are elementary operations such as blurring, noise addition, or compression. In this context, we are more precisely interested in the effects of pre-processing. Because of storage limitations or other reasons, images can be resized or compressed just after their capture. A manipulation would then be applied to an already pre-processed image. We show that pre-resizing of test data induces a drop in performance for detectors trained on full-sized images. Based on these observations, we introduce two methods to counterbalance this performance loss for a classification pipeline based on Gaussian Mixture Models. This pipeline models the local statistics, on patches, of natural images. It allows us to propose adaptation of the models driven by the changes in local statistics. Our first adaptation method is fully unsupervised, while the second, requiring only a few labels, is weakly supervised. Thus, our methods are flexible enough to adapt to the versatility of image sources. Then we move to falsification detection and more precisely to copy-move identification. Copy-move is one of the most common image tampering techniques. A source area is copied into a target area within the same image. The vast majority of existing detectors identify the two zones (source and target) without distinguishing them. In an operational scenario, only the target area is a tampered area and is thus an area of interest. Accordingly, we propose a method to disentangle the two zones. Our method takes advantage of local modeling of statistics in natural images with a Gaussian Mixture Model. The procedure is specific to each image, to avoid the need for a large training dataset and to increase flexibility. Results for all the techniques described above are illustrated on public benchmarks and compared to state-of-the-art methods. We show that the classical pipeline for manipulation detection with Gaussian Mixture Models and an adaptation procedure can surpass the results of recent, fine-tuned deep-learning methods. Our method for source/target disentangling in copy-move also matches or even surpasses the performance of the latest deep-learning methods. We explain the good results of these classical methods against deep learning by their additional flexibility and adaptation abilities. Finally, this thesis took place in the special context of a contest jointly organized by the French National Research Agency and the General Directorate of Armament. We describe in the Appendix the different stages of the contest and the methods we have developed, as well as the lessons we have learned from this experience to move the image forensics domain into the wild.
Zakaria, Ahmad. "Batch steganography and pooled steganalysis in JPEG images." Thesis, Montpellier, 2020. http://www.theses.fr/2020MONTS079.
Abstract: Batch steganography consists of hiding a message by spreading it out in a set of images, while pooled steganalysis consists of analyzing a set of images to conclude whether or not a hidden message is present. There are many strategies for spreading a message and it is reasonable to assume that the steganalyst does not know which one is being used, but it can be assumed that the steganographer uses the same embedding algorithm for all images. In this case, it can be shown that the most appropriate solution for pooled steganalysis is to use a single quantitative detector (i.e. one that predicts the size of the hidden message), to estimate for each image the size of the hidden message (which can be zero if there is none), and to average the sizes (which are finally considered as scores) obtained over all the images. What would be the optimal solution if the steganalyst could now discriminate the spreading strategy among a set of known strategies? Could the steganalyst use a pooled steganalysis algorithm that is better than averaging the scores? Could the steganalyst obtain results close to the so-called "clairvoyant" scenario, where it is assumed that the steganalyst knows the spreading strategy exactly? In this thesis, we try to answer these questions by proposing a pooled steganalysis architecture based on a quantitative image detector and an optimized score pooling function. The first contribution is a study of quantitative steganalysis algorithms in order to decide which one is best suited for pooled steganalysis. For this purpose, we propose to extend this comparison to binary steganalysis algorithms and we propose a methodology to switch from binary steganalysis results to quantitative steganalysis and vice versa. The core of the thesis lies in the second contribution. We study the scenario where the steganalyst does not know the spreading strategy. We then propose an optimized pooling function of the results, based on a set of spreading strategies, which improves the accuracy of the pooled steganalysis compared to a simple average. This pooling function is computed using supervised learning techniques. Experimental results obtained with six different spreading strategies and a state-of-the-art quantitative detector confirm our hypothesis. Our pooling function gives results close to those of a clairvoyant steganalyst who is assumed to know the spreading strategy. Keywords: Multimedia Security, Batch Steganography, Pooled Steganalysis, Machine Learning
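The baseline pooling rule described in this abstract (average the per-image payload estimates of a quantitative detector and treat the mean as the score for the whole set) can be illustrated with a minimal sketch. The function names, the sample scores and the decision threshold below are hypothetical and are not taken from the thesis; the learned pooling function it proposes is not reproduced here.

```python
from statistics import fmean


def pool_by_average(estimated_payloads):
    """Average per-image payload estimates into one score for the image set.

    Each entry is assumed to be the output of a quantitative detector
    (an estimated hidden-message size, near zero for a cover image).
    """
    if not estimated_payloads:
        raise ValueError("need at least one per-image estimate")
    return fmean(estimated_payloads)


def set_is_suspicious(estimated_payloads, threshold):
    """Flag the set when the pooled score exceeds a chosen threshold."""
    return pool_by_average(estimated_payloads) > threshold


# Hypothetical detector outputs: estimates are noisy, so covers are not exactly 0.
cover_set = [0.01, -0.02, 0.00, 0.03]
stego_set = [0.01, 0.40, 0.00, 0.35]  # message spread over two of four images

THRESHOLD = 0.05  # illustrative value, would normally be tuned on labelled sets

print(set_is_suspicious(cover_set, THRESHOLD))
print(set_is_suspicious(stego_set, THRESHOLD))
```

Averaging ignores how the payload is distributed across images, which is the limitation the optimized pooling function in the abstract is meant to address.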
Francis, Danny. "Représentations sémantiques d'images et de vidéos." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS605.
Recent research in Deep Learning has sent the quality of results in multimedia tasks rocketing: thanks to new big datasets of annotated images and videos, Deep Neural Networks (DNN) have outperformed other models in most cases. In this thesis, we aim at developing DNN models for automatically deriving semantic representations of images and videos. In particular, we focus on two main tasks: vision-text matching and automatic image/video captioning. The matching task can be addressed by comparing visual objects and texts in a visual space, a textual space or a multimodal space. Based on recent works on capsule networks, we define two novel models to address the vision-text matching problem: Recurrent Capsule Networks and Gated Recurrent Capsules. In image and video captioning, we have to tackle a challenging task where a visual object has to be analyzed and translated into a textual description in natural language. For that purpose, we propose two novel curriculum learning methods. Moreover, for video captioning, analyzing videos requires not only parsing still images but also drawing correspondences through time. We propose a novel Learned Spatio-Temporal Adaptive Pooling method for video captioning that combines spatial and temporal analysis. Extensive experiments on standard datasets assess the value of our models and methods with respect to existing works.
Mašek, Jan. "Automatické strojové metody získávání znalostí z multimediálních dat." Doctoral thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2016. http://www.nusl.cz/ntk/nusl-256538.
(7534550), David Güera. "Media Forensics Using Machine Learning Approaches." Thesis, 2019.
Ferreira, Sara Cardoso. "A machine learning based digital forensics application to detect tampered multimedia files." Master's thesis, 2021. https://hdl.handle.net/10216/135823.
Ferreira, Sara Cardoso. "A machine learning based digital forensics application to detect tampered multimedia files." Dissertação, 2021. https://hdl.handle.net/10216/135823.
Marco, Godi. "Deep Learning methods for Fashion Multimedia Search and Retrieval." Doctoral thesis, 2021. http://hdl.handle.net/11562/1048933.
Wang, Chien-Yao, and 王建堯. "Deep-Learning-Based Multimedia Processing and Its Applications to Surveillance." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/xu2eq3.
National Central University
Department of Computer Science and Information Engineering
105
Surveillance systems are becoming increasingly important. The share of criminal cases solved with the help of video surveillance rose from 1% in 2007 to 19.83% in the first quarter of 2016. However, traditional surveillance systems rely on manual monitoring; as a result, they are often used only for passive tracing after the fact and cannot effectively prevent accidents or crimes when an emergency occurs. Moreover, surveillance cameras worldwide are expected to produce 30 billion frames per second by 2020, and humans cannot process such a huge amount of data. Therefore, it is important to develop an active, intelligent surveillance system. Recently, deep learning has brought great success in multimedia data analysis; it can effectively and quickly turn large amounts of data into useful information. This dissertation uses deep-learning-based multimedia signal processing to design techniques for intelligent surveillance systems. The sensors suitable for active surveillance systems are cameras and microphones, so this dissertation develops intelligent audio and video analysis techniques. A vision-based surveillance system can clearly observe the occurrence of events, but it often has blind spots and is susceptible to environmental changes. A sound-based surveillance system can capture sound from all directions and then analyze and recognize it. This dissertation develops deep learning techniques for sound event recognition and detection on the audio side, and for image segmentation, action recognition and group proposal on the vision side. For sound event recognition and detection, a new deep neural network system, called the hierarchical-diving deep belief network (HDDBN), is proposed to classify and detect sound events. The proposed system learns several forms of abstract knowledge from the proposed auditory-receptive-field binary pattern (ARFBP) visual audio descriptor, supporting knowledge transfer from previously learned concepts to useful representations. For semantic image segmentation, the proposed hierarchical joint-guided network (HJGN) uses our object boundary prediction hierarchical joint learning convolutional network (OBP-HJLCN) to guide the segmentation results. For action recognition, the proposed motion attention model, called the dynamic tracking attention model (DTAM), not only considers motion information but also performs dynamic tracking of objects in videos. For group proposal, an unsupervised group proposal network (GPN) is developed by combining the proposed objectness map generation network and the proposed object tracklet network.
CEVALLOS, MOREN JESUS FERNANDO. "Deep learning applications over heterogeneous networks: from multimedia to genes." Doctoral thesis, 2022. http://hdl.handle.net/11573/1654723.
(9089423), Daniel Mas Montserrat. "Machine Learning-Based Multimedia Analytics." Thesis, 2020.
Machine learning is widely used to extract meaningful information from video, images, audio, text, and other multimedia data. Through a hierarchical structure, modern neural networks coupled with backpropagation learn to extract information from large amounts of data and to perform specific tasks such as classification or regression. In this thesis, we explore various approaches to multimedia analytics with neural networks. We present several image synthesis and rendering techniques to generate new images for training neural networks. Furthermore, we present multiple neural network architectures and systems for commercial logo detection, 3D pose estimation and tracking, deepfakes detection, and manipulation detection in satellite images.
(9722306), Sri Kalyan Yarlagadda. "IMAGE ANALYSIS FOR SHADOW DETECTION, SATELLITE IMAGE FORENSICS AND EATING SCENE SEGMENTATION AND CLUSTERING." Thesis, 2020.
Khan, Asim. "Automated Detection and Monitoring of Vegetation Through Deep Learning." Thesis, 2022. https://vuir.vu.edu.au/43941/.