Dissertations / Theses: 'Deep learning for Multimedia Forensics'

1

Nowroozi, Ehsan. "Machine Learning Techniques for Image Forensics in Adversarial Setting." Doctoral thesis, Università di Siena, 2020. http://hdl.handle.net/11365/1096177.

Abstract:

The use of machine learning for multimedia forensics is gaining increasing acceptance, driven largely by the capabilities of modern machine learning techniques. By exploiting deep learning tools, new approaches have been proposed whose performance remarkably exceeds that achieved by state-of-the-art methods based on standard machine learning and model-based techniques. However, the inherent vulnerability and fragility of machine learning architectures pose serious new security threats, hindering the use of these tools in security-oriented applications, multimedia forensics among them. Analysing the security of machine learning-based techniques in the presence of an adversary attempting to impede the forensic analysis, and developing new solutions capable of improving the security of such techniques, is therefore of primary importance and has recently marked the birth of a new discipline, named Adversarial Machine Learning. Focusing on image forensics, and image manipulation detection in particular, this thesis contributes to that effort by developing novel techniques for enhancing the security of machine learning-based binary manipulation detectors in several adversarial scenarios. The validity of the proposed solutions has been assessed on several manipulation tasks, ranging from the detection of double compression and contrast adjustment to the detection of geometric transformations and filtering operations.

2

Stanton, Jamie Alyssa. "Detecting Image Forgery with Color Phenomenology." University of Dayton / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=dayton15574119887572.

3

Budnik, Mateusz. "Active and deep learning for multimedia." Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAM011.

Abstract:

The main topics of this thesis are the use of active learning and deep learning methods in the context of retrieval of multimodal documents. The contributions proposed in this thesis address both topics. An active learning framework was introduced that allows more efficient annotation of broadcast TV videos thanks to label propagation, the use of multimodal data, and effective selection strategies. Several scenarios and experiments were considered in the context of person identification in videos, using different modalities (such as faces, speech segments, and overlaid text) and different selection strategies. The whole system was additionally validated in a dry run involving real human annotators. A second major contribution was the investigation and use of deep learning (in particular convolutional neural networks) for video retrieval. A comprehensive study was made using different neural network architectures and training techniques, such as fine-tuning or using separate, more classical classifiers like SVMs. A comparison was made between learned features (the output of neural networks) and engineered features. Despite the lower performance of the engineered features, fusing the two types of features increases overall performance. Finally, the use of a convolutional neural network for speaker identification from spectrograms was explored. The results were compared with those of other state-of-the-art speaker identification systems, and different fusion approaches were also tested. The proposed approach obtains results comparable to those of some of the other tested systems and offers an increase in performance when fused with the output of the best system.

4

Ha, Hsin-Yu. "Integrating Deep Learning with Correlation-based Multimedia Semantic Concept Detection." FIU Digital Commons, 2015. http://digitalcommons.fiu.edu/etd/2268.

Abstract:

Rapid advances in technology have made the explosive growth of multimedia data possible and made such data available to the public. Multimedia data can be defined as a data collection composed of various data types and different representations. Because multimedia data carries rich information, it has been widely adopted in different areas, such as surveillance event detection, medical abnormality detection, and many others. To fulfil the requirements of different applications, it is important to effectively classify multimedia data into semantic concepts across multiple domains. In this dissertation, a correlation-based multimedia semantic concept detection framework is integrated with deep learning. The framework aims to explore implicit and explicit correlations among features and concepts while adopting different Convolutional Neural Network (CNN) architectures accordingly. First, the Feature Correlation Maximum Spanning Tree (FC-MST) is proposed to remove redundant and irrelevant features based on the correlations between the features and positive concepts. FC-MST identifies the effective features and decides the initial layer's dimension in CNNs. Second, the Negative-based Sampling method is proposed to alleviate the data imbalance issue by keeping only the representative negative instances in the training process. To adjust to different sizes of training data, the number of iterations for the CNN is determined adaptively and automatically. Finally, an Indirect Association Rule Mining (IARM) approach and a correlation-based re-ranking method are proposed to reveal the implicit relationships from the correlations among concepts, which are further utilized together with the classification scores to enhance the re-ranking process. The framework is evaluated using two benchmark multimedia data sets, TRECVID and NUS-WIDE, which contain large amounts of multimedia data and various semantic concepts.

5

Vukotic, Verdran. "Deep Neural Architectures for Automatic Representation Learning from Multimedia Multimodal Data." Thesis, Rennes, INSA, 2017. http://www.theses.fr/2017ISAR0015/document.

Abstract:

This dissertation discusses the thesis that deep neural networks are suited to the analysis of visual content, textual content, and fused visual and textual content. The work evaluates the ability of deep neural networks to learn multimodal representations automatically, in either an unsupervised or a supervised manner, and makes the following main contributions: 1) Recurrent neural networks for spoken language understanding (slot filling): different architectures are compared for this task with the aim of modeling both the input context and the output label dependencies. 2) Action prediction from single images: we propose an architecture that allows us to predict human actions from a single image. The architecture is evaluated on videos, using only one frame as input. 3) Bidirectional multimodal encoders: the main contribution of this thesis is a neural architecture that translates from one modality to the other and back, and offers an improved multimodal representation space in which the initially disjoint representations can be translated and fused. This enables improved fusion of multiple modalities. The architecture was extensively studied and evaluated in international benchmarks on the task of video hyperlinking, where it defined the state of the art. 4) Generative adversarial networks for multimodal fusion: continuing on the topic of multimodal fusion, we evaluate the possibility of using conditional generative adversarial networks to learn multimodal representations. In addition to providing multimodal representations, generative adversarial networks make it possible to visualize the learned model directly in the image domain.

6

Hamm, Simon. "Digital Audio Video Assessment: Surface or Deep Learning - An Investigation." RMIT University. Education, 2009. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20091216.154300.

Abstract:

This research investigates an assertion, endorsed by a range of commentators, that multimedia teaching and learning approaches encourage learners to adopt a richer, more creative and deeper level of understanding and participation within the learning environment than traditional teaching and learning methods. The thesis examines this assertion by investigating one type of multimedia activity, defined (for the purposes of this research) as a digital audio video assessment (DAVA). Data was collected within a constructivist epistemology and an interpretative, naturalistic perspective, using a primarily qualitative methodology. Three types of data collection methods were used to collect data from thirteen Diploma of Event Management students from William Angliss TAFE. Firstly, participants completed the Biggs Study Process Questionnaire (2001), which is a predictor of deep and surface learning preference. Each participant then engaged in a semi-structured interview that elicited the participant's self-declared learning preferences and approach to completing the DAVA. These data sources were then compared. Six factors that are critical in informing the way the participants approached the DAVA emerged from the analysis of the data. Based on these findings, it is concluded that the DAVA does not restrict, inhibit or negatively influence a participant's learning preference. Learners with a pre-existing, stable learning preference are likely to adopt a learning approach that is consistent with their preference. Participants whose learning preference is less stable (more flexible) may adopt either a surface or a deep approach depending on the specific task, activity or assessment.

7

Quan, Weize. "Detection of computer-generated images via deep learning." Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALT076.

Abstract:

With advances in image editing and generation software, it has become easier to tamper with the content of images or create new images, even for novices. These generated images, such as computer graphics (CG) images and colorized images (CI), have high-quality visual realism and potentially pose serious threats to many important applications. For instance, judicial departments need to verify that pictures are not produced by computer graphics rendering technology, and colorized images can cause recognition/monitoring systems to produce incorrect decisions. Therefore, the detection of computer-generated images has attracted widespread attention in the multimedia security research community. In this thesis, we study the identification of different types of computer-generated images, including CG images and CI, namely, identifying whether an image was acquired by a camera or generated by a computer program. The main objective is to design an efficient detector with high classification accuracy and good generalization capability. Specifically, we consider dataset construction, network architecture, training methodology, and visualization and understanding for the forensic problems considered. The main contributions are: (1) a colorized image detection method based on negative sample insertion, (2) a generalization improvement method for colorized image detection, (3) a method for distinguishing natural images (NI) from CG images based on a CNN (Convolutional Neural Network), and (4) a CG image identification method based on the enhancement of feature diversity and adversarial samples.

8

Migliorelli, Lucia. "Towards digital patient monitoring: deep learning methods for the analysis of multimedia data from the actual clinical practice." Doctoral thesis, Università Politecnica delle Marche, 2022. http://hdl.handle.net/11566/295052.

Abstract:

Acquiring information on patients' health status from the analysis of video recordings is a crucial opportunity to enhance current clinical assessment and monitoring practices. This PhD thesis proposes four automated systems that analyse multimedia data using deep learning algorithms. These systems have been developed to enrich current assessment modalities, which so far rely on direct observation of the patient by trained clinicians and on the compilation of clinical scales often collected in paper format, for three categories of patients: preterm infants, adolescents with autism spectrum syndrome, and adults affected by neuropathologies (stroke and amyotrophic lateral sclerosis). Each system stems from dialogue with specialists in the field and responds to the clinical need for new tools to treat patients, tools able to collect measurements continuously and in a structured way, in secure, easily accessible systems. This research will continue to be developed to ensure that clinicians, increasingly strained by tight work schedules, have more time to devote to patients, to treat them better and to the best of their ability.

9

Dutt, Anuvabh. "Continual learning for image classification." Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAM063.

Abstract:

This thesis deals with deep learning applied to image classification tasks. The primary motivation for the work is to make current deep learning techniques more efficient and to deal with changes in the data distribution. We work in the broad framework of continual learning, with the aim of eventually having machine learning models that can continuously improve. We first look at changes in the label space of a data set, with the data samples themselves remaining the same. We consider a semantic label hierarchy to which the labels belong, and investigate how this hierarchy can be used to improve models trained on different levels of the hierarchy. The second and third contributions involve continual learning using a generative model. We analyse the usability of samples from a generative model for training good discriminative classifiers, and propose techniques to improve the selection and generation of samples from a generative model. Following this, we observe that continual learning algorithms undergo some loss in performance when trained on several tasks sequentially. We analyse the training dynamics in this scenario and compare them with training on several tasks simultaneously. We make observations that point to potential difficulties in learning models in a continual learning scenario. Finally, we propose a new design template for convolutional networks. This architecture allows smaller models to be trained without compromising performance. In addition, the design lends itself to easy parallelisation, leading to efficient distributed training. In conclusion, we look at two different types of continual learning scenarios and propose methods that lead to improvements. Our analysis also points to larger issues, which may require changes to our current neural network training procedure.

10

Darmet, Ludovic. "Vers une approche basée modèle-image flexible et adaptative en criminalistique des images." Thesis, Université Grenoble Alpes, 2020. https://tel.archives-ouvertes.fr/tel-03086427.

Abstract:

Images are nowadays a standard and mature medium of communication. They appear in our day-to-day life and are therefore subject to concerns about security. In this work, we study different methods to assess the integrity of images. Given the high volume and versatility of tampering techniques and image sources, our work is driven by the necessity to develop flexible methods that adapt to the diversity of images. We first focus on manipulation detection through statistical modeling of images. Manipulations are elementary operations such as blurring, noise addition, or compression. In this context, we are more precisely interested in the effects of pre-processing. Because of storage limitations or other reasons, images can be resized or compressed just after their capture. A manipulation would then be applied to an already pre-processed image. We show that pre-resizing of test data induces a drop in performance for detectors trained on full-sized images. Based on these observations, we introduce two methods to counterbalance this performance loss for a classification pipeline based on Gaussian Mixture Models. This pipeline models the local statistics, on patches, of natural images. It allows us to propose an adaptation of the models driven by the changes in local statistics. Our first adaptation method is fully unsupervised, while the second, requiring only a few labels, is weakly supervised. Our methods are thus flexible enough to adapt to the versatility of image sources. We then move to falsification detection, and more precisely to copy-move identification. Copy-move is one of the most common image tampering techniques: a source area is copied into a target area within the same image. The vast majority of existing detectors identify the two zones (source and target) without distinguishing them. In an operational scenario, only the target area represents a tampered area and is thus the area of interest. Accordingly, we propose a method to disentangle the two zones. Our method takes advantage of local modeling of statistics in natural images with Gaussian Mixture Models. The procedure is specific to each image, which avoids the need for a large training dataset and increases flexibility. Results for all the techniques described above are illustrated on public benchmarks and compared with state-of-the-art methods. We show that the classical pipeline for manipulation detection with Gaussian Mixture Models, combined with our adaptation procedure, can surpass the results of recent fine-tuned deep-learning methods. Our method for source/target disentangling in copy-move also matches or even surpasses the performance of the latest deep-learning methods. We explain the good results of these classical methods against deep learning by their additional flexibility and adaptation abilities. Finally, this thesis took place in the special context of a contest jointly organized by the French National Research Agency and the General Directorate of Armament. We describe in the Appendix the different stages of the contest and the methods we developed, as well as the lessons we learned from this experience in moving the image forensics domain into the wild.

11

Zakaria, Ahmad. "Batch steganography and pooled steganalysis in JPEG images." Thesis, Montpellier, 2020. http://www.theses.fr/2020MONTS079.

Full text

Abstract:

Batch steganography consists of hiding a message by spreading it across a set of images, while pooled steganalysis consists of analyzing a set of images to conclude whether or not a hidden message is present. There are many strategies for spreading a message, and it is reasonable to assume that the steganalyst does not know which one is being used, but it can be assumed that the steganographer uses the same embedding algorithm for all images. In this case, it can be shown that the most appropriate solution for pooled steganalysis is to use a single quantitative detector (i.e. one that predicts the size of the hidden message), to estimate for each image the size of the hidden message (which can be zero if there is none), and to average the sizes (which are ultimately treated as scores) obtained over all the images.

What would be the optimal solution if the steganalyst could now discriminate the spreading strategy among a set of known strategies? Could the steganalyst use a pooled steganalysis algorithm that is better than averaging the scores? Could the steganalyst obtain results close to the so-called "clairvoyant" scenario, where it is assumed that the steganalyst knows the spreading strategy exactly?

In this thesis, we try to answer these questions by proposing a pooled steganalysis architecture based on a quantitative image detector and an optimized score pooling function. The first contribution is a study of quantitative steganalysis algorithms in order to decide which one is best suited for pooled steganalysis. For this purpose, we propose to extend this comparison to binary steganalysis algorithms, and we propose a methodology to convert binary steganalysis results into quantitative steganalysis results and vice versa.

The core of the thesis lies in the second contribution. We study the scenario where the steganalyst does not know the spreading strategy. We then propose an optimized pooling function for the results, based on a set of spreading strategies, which improves the accuracy of pooled steganalysis compared to a simple average. This pooling function is computed using supervised learning techniques. Experimental results obtained with six different spreading strategies and a state-of-the-art quantitative detector confirm our hypothesis. Our pooling function gives results close to those of a clairvoyant steganalyst who is supposed to know the spreading strategy.

Keywords: Multimedia Security, Batch Steganography, Pooled Steganalysis, Machine Learning
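The baseline pooling rule mentioned in this abstract, averaging the per-image payload estimates returned by a single quantitative detector, can be sketched as follows. This is only a minimal illustration under stated assumptions, not the thesis's implementation: the function names, the example detector outputs, and the decision threshold are all hypothetical, and the learned pooling function proposed in the thesis is not reproduced here.

```python
from statistics import mean
from typing import Sequence


def mean_pooling(payload_estimates: Sequence[float]) -> float:
    """Pool per-image payload estimates by averaging them (the baseline rule)."""
    if not payload_estimates:
        raise ValueError("need at least one per-image estimate")
    return mean(payload_estimates)


def bag_is_suspicious(payload_estimates: Sequence[float], threshold: float) -> bool:
    """Flag a whole set of images when the pooled score exceeds a threshold.

    The threshold is a hypothetical parameter; in practice it would be
    chosen on validation data.
    """
    return mean_pooling(payload_estimates) > threshold


# Hypothetical detector outputs: estimated payload per image, arbitrary units.
cover_bag = [0.0, 0.02, 0.0, 0.01]   # no message, only small estimation noise
stego_bag = [0.0, 0.35, 0.0, 0.45]   # message spread over only two images

print(bag_is_suspicious(cover_bag, threshold=0.1))
print(bag_is_suspicious(stego_bag, threshold=0.1))
```

Averaging treats every image equally, which is why a pooling function tuned to the likely spreading strategies may do better when the payload is concentrated in a few images.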

12

Francis, Danny. "Représentations sémantiques d'images et de vidéos." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS605.

Full text

Abstract:

Recent research in deep learning has sharply improved the quality of results in multimedia tasks: thanks to new large datasets of annotated images and videos, Deep Neural Networks (DNNs) have outperformed other models in most cases. In this thesis, we aim at developing DNN models for automatically deriving semantic representations of images and videos. In particular, we focus on two main tasks: vision-text matching and automatic image/video captioning. The matching task can be addressed by comparing visual objects and texts in a visual space, a textual space, or a multimodal space. Based on recent work on capsule networks, we define two novel models to address the vision-text matching problem: Recurrent Capsule Networks and Gated Recurrent Capsules. Image and video captioning is a challenging task in which a visual object has to be analyzed and translated into a textual description in natural language. For that purpose, we propose two novel curriculum learning methods. Moreover, for video captioning, analyzing videos requires not only parsing still images but also drawing correspondences through time. We propose a novel Learned Spatio-Temporal Adaptive Pooling method for video captioning that combines spatial and temporal analysis. Extensive experiments on standard datasets demonstrate the value of our models and methods with respect to existing work.

13

Mašek, Jan. "Automatické strojové metody získávání znalostí z multimediálních dat." Doctoral thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2016. http://www.nusl.cz/ntk/nusl-256538.

Full text

Abstract:

Efficient, high-quality processing of the growing amount of multimedia data is increasingly needed to extract knowledge from this data. The thesis deals with the research, implementation, optimization, and experimental verification of automatic machine learning methods for multimedia data analysis. The approach created achieves higher accuracy than common methods when applied to selected examples. Selected results were published in journals with an impact factor [1, 2]. For these reasons, special parallel computing methods were created in this work. These methods use massively parallel hardware to save electrical energy and computing time and to achieve better results. Computations that usually take days can be completed in minutes using the new optimized methods. The functionality of the created methods was verified on selected problems: artery detection in ultrasound images with subsequent classification of artery disease, building detection in aerial images for obtaining geographical coordinates, detection of the materials contained in meteorites from CT images, processing of huge databases of structured data, classification of metallurgical materials using laser-induced breakdown spectroscopy, and automatic classification of emotions from texts.

14

Güera, David. "Media Forensics Using Machine Learning Approaches." Thesis, 2019.

Find full text

Abstract:

Consumer-grade imaging sensors have become ubiquitous in the past decade. Images and videos collected from such sensors are used by many entities for public and private communications, including publicity, advocacy, disinformation, and deception.

In this thesis, we present tools to extract knowledge from this imagery and to understand it and its provenance. Many images and videos are modified and/or manipulated prior to their public release. We also propose a set of forensic and counter-forensic techniques to determine the integrity of this multimedia content and to modify it in specific ways to deceive adversaries. The presented tools are evaluated using publicly available datasets and independently organized challenges.

15

Ferreira, Sara Cardoso. "A machine learning based digital forensics application to detect tampered multimedia files." Master's thesis, 2021. https://hdl.handle.net/10216/135823.

Full text

16

Ferreira, Sara Cardoso. "A machine learning based digital forensics application to detect tampered multimedia files." Dissertação, 2021. https://hdl.handle.net/10216/135823.

Full text

17

Marco, Godi. "Deep Learning methods for Fashion Multimedia Search and Retrieval." Doctoral thesis, 2021. http://hdl.handle.net/11562/1048933.

Full text

Abstract:

Online fashion shopping is a growing market, and with this growth comes a greater need for techniques that automate an ever-expanding variety of tasks more accurately. Deep learning techniques have been successfully applied to many tasks in the fashion domain, such as classification (recognizing different categories of clothes), recommendation (learning the preferences of a user to make suggestions), and generation (automatically generating or editing clothes). In this thesis we focus on search and retrieval problems in this domain. Tools of this kind can speed up many tasks on both the user side and the industry side. We start by analyzing existing models for fashion feature extraction and showing their shortcomings. The analysis is made using visual summaries, a compact representation of a set of saliency maps that describes the elements that contributed to a classification. We show that texture information is almost ignored in these models even when it should be significant for a particular style. This leads to the second part, where a new kind of texture descriptor is designed, building upon texels, the repeated mid-level elements of textures. With simple statistics on texels, interpretable attributes can be extracted and used to improve feature representations for tasks such as image retrieval and interactive search. An attribute-based descriptor for textures can be plugged into a pre-existing image search framework and easily used by customers who wish to browse a textile catalog, or by designers who wish to choose fabrics for the production of clothes. Navigation in this catalog leverages attributes using relative comparisons for fast exploration of the texture space. We show the advantages of working with texels and how they can be detected using a Mask-RCNN architecture trained on the ElBa dataset, which we introduce in this thesis. It is composed of synthetic images of element-based textures, exploring a wide variety of colors, spatial patterns, and shapes. In the third part, a framework for Street-To-Shop matching is presented. This is an image retrieval problem where the query image is a picture that contains a clothing item and the gallery set is composed of the pictures of the clothes sold in an e-shop. The goal is to find the product in the shop most similar to the one in the picture. Compared to existing approaches, we focus on the less explored Video-To-Shop problem by extending to the time dimension: information is extracted from a video sequence to further improve search results, thanks to an attention mechanism that focuses on the most salient frames. We also design a training procedure that does not require bounding box annotations but still yields higher performance than existing approaches that do require them. The model is trained on the MovingFashion dataset, which we also present in this thesis. This gives the user new ways to browse an online shop, for example by taking pictures of clothes that somebody is wearing or that are seen in a physical shop, and searching for them online automatically. It also has many implications for social media marketing and market research for fashion companies.
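The Video-To-Shop idea described in this abstract, weighting video frames by saliency before matching against a shop gallery, can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the embeddings and saliency scores are hypothetical values that a trained network would normally produce, the attention is reduced to a plain softmax over those scores, and cosine similarity stands in for whatever matching function the actual model uses.

```python
import math
from typing import Dict, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw saliency scores into attention weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_frames(frame_embeddings: Sequence[Sequence[float]],
                     saliency_scores: Sequence[float]) -> List[float]:
    """Attention-weighted average of per-frame embeddings."""
    weights = softmax(saliency_scores)
    dim = len(frame_embeddings[0])
    return [sum(w * frame[d] for w, frame in zip(weights, frame_embeddings))
            for d in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: Sequence[float], gallery: Dict[str, Sequence[float]]) -> str:
    """Return the gallery item whose embedding is most similar to the query."""
    return max(gallery, key=lambda name: cosine(query, gallery[name]))


# Hypothetical data: two frames, the first judged far more salient.
frames = [[1.0, 0.0], [0.0, 1.0]]
saliency = [4.0, 0.0]
shop_gallery = {"shirt": [1.0, 0.0], "skirt": [0.0, 1.0]}

video_embedding = aggregate_frames(frames, saliency)
print(retrieve(video_embedding, shop_gallery))
```

Because the first frame receives almost all of the attention weight, the pooled video embedding stays close to that frame, so a blurry or occluded frame has little influence on the match.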

18

Wang, Chien-Yao, and 王建堯. "Deep-Learning-Based Multimedia Processing and Its Applications to Surveillance." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/xu2eq3.

Full text

Abstract:

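Doctoral dissertation
National Central University
Department of Computer Science and Information Engineering
Academic year 105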
Surveillance systems are becoming increasingly important. The share of criminal cases solved with the help of video surveillance rose from 1% in 2007 to 19.83% in the first quarter of 2016. However, traditional surveillance systems rely on manual monitoring, so they are often used only for passive after-the-fact tracing and cannot effectively prevent accidents or crimes when an emergency occurs. Moreover, surveillance cameras worldwide will reach 30 billion frames per second by 2020, a volume of data that humans cannot handle. Therefore, it is important to develop active, intelligent surveillance systems. Recently, deep learning has brought great success to multimedia data analysis; it can effectively and quickly turn large amounts of data into useful information. This dissertation builds on deep-learning-based multimedia signal processing technology to design methods for intelligent surveillance systems. The sensors suitable for active surveillance systems are cameras and microphones. In this dissertation, we develop intelligent audio and video analysis technology for surveillance based on sound and vision. A vision-based surveillance system can clearly observe the occurrence of events; however, it often has blind spots or is susceptible to environmental changes. A sound-based surveillance system can observe sound from all directions and analyze and recognize it. This dissertation develops deep learning techniques for sound event recognition and detection based on sound, and for image segmentation, action recognition, and group proposal based on vision. For sound event recognition and detection, a new deep neural network system, called the hierarchical-diving deep belief network (HDDBN), is proposed to classify and detect sound events. The proposed system learns several forms of abstract knowledge from the proposed auditory-receptive-field binary pattern (ARFBP) visual audio descriptor that support the transfer of knowledge from previously learned concepts to useful representations. For semantic image segmentation, the proposed hierarchical joint-guided network (HJGN) uses our object boundary prediction hierarchical joint learning convolutional network (OBP-HJLCN) to guide the segmentation results. For action recognition, the proposed motion attention model, called the dynamic tracking attention model (DTAM), not only considers motion information but also performs dynamic tracking of objects in videos. For group proposal, an unsupervised group proposal network (GPN) is developed by combining the proposed objectness map generation network with the proposed object tracklet network.

19

CEVALLOS, MOREN JESUS FERNANDO. "Deep learning applications over heterogeneous networks: from multimedia to genes." Doctoral thesis, 2022. http://hdl.handle.net/11573/1654723.

Full text

Abstract:

This research aimed to investigate the synergies between deep learning and heterogeneous graph-based scenario modeling. The candidate has thoroughly studied the state-of-the-art (SOTA) techniques that combine deep learning with scenarios modeled as heterogeneous graphs (het-graphs). Two main paradigms of collaboration have been identified: 1. Deep learning enhances the scalability and the representation power of graph algorithms and shallow machine learning approaches for graph analysis. 2. Het-graph modeled scenarios help design solution-space exploration biases for deep learning-based optimization algorithms. Moreover, the candidate has chosen two important research fields from industry and academia to identify two open problems where the studied synergies could be helpful. These open problems were: 1. The online optimization of service function chain deployment in virtualized content delivery networks for live streaming. 2. The inference of developmental regulatory mechanisms between genes and cis-regulatory elements. Finally, the candidate demonstrated his proficiency in the research field by applying the synergies identified in the first phase of the research to solve these open problems.

20

Mas Montserrat, Daniel. "Machine Learning-Based Multimedia Analytics." Thesis, 2020.

Find full text

Abstract:

Machine learning is widely used to extract meaningful information from video, images, audio, text, and other multimedia data. Through a hierarchical structure, modern neural networks coupled with backpropagation learn to extract information from large amounts of data and to perform specific tasks such as classification or regression. In this thesis, we explore various approaches to multimedia analytics with neural networks. We present several image synthesis and rendering techniques to generate new images for training neural networks. Furthermore, we present multiple neural network architectures and systems for commercial logo detection, 3D pose estimation and tracking, deepfake detection, and manipulation detection in satellite images.

21

Yarlagadda, Sri Kalyan. "IMAGE ANALYSIS FOR SHADOW DETECTION, SATELLITE IMAGE FORENSICS AND EATING SCENE SEGMENTATION AND CLUSTERING." Thesis, 2020.

Find full text

Abstract:

Recent advances in machine learning have enabled notable progress in many aspects of image analysis. In this thesis, we present three applications that exemplify this progress: shadow detection, satellite image forensics, and eating scene segmentation and clustering. Shadow detection and removal are of great interest to the image processing and image forensics communities. In this thesis, we study automatic shadow detection from two different perspectives. First, we propose automatic methods for detecting and removing shadows in color images. Second, we present machine learning based methods to detect whether shadows have been removed from an image. In the second part of the thesis, we study image forensics for satellite images. Satellite images have been subjected to various kinds of tampering and manipulation due to easy access and the availability of image manipulation tools. In this thesis, we propose methods to automatically detect and localize spliced objects in satellite images. Extracting information from the eating scene captured in images provides a new means of studying the relationship between diet and health. In the third part of the thesis, we propose a class-agnostic food segmentation method that is able to segment foods without knowing the food type, and a method to cluster eating scene images based on the eating environment.

22

Khan, Asim. "Automated Detection and Monitoring of Vegetation Through Deep Learning." Thesis, 2022. https://vuir.vu.edu.au/43941/.

Full text

Abstract:

Healthy vegetation is essential not just for environmental sustainability but also for the development of sustainable and liveable cities. It is undeniable that human activities are altering the vegetation landscape, with harmful implications for the climate. As a result, autonomous detection, health evaluation, and continual monitoring of plants are required to ensure environmental sustainability. This thesis presents research on autonomous vegetation management using recent advances in deep learning. Currently, most towns do not have a system in place for detection and continual vegetation monitoring. On the one hand, a lack of public knowledge and political will could be a factor; on the other hand, no efficient and cost-effective technique for monitoring vegetation health has been established. Data on the health condition of individual plants is essential, since urban trees often develop as stand-alone objects. Manual annotation of these individual trees is a time-consuming, expensive, and inefficient operation that is normally done in person. As a result, skilled manual annotation cannot cover broad areas, and the data it produces is out of date. However, autonomous vegetation management poses a number of challenges due to its multidisciplinary nature. It includes automated detection, health assessment, and monitoring of vegetation and trees by integrating techniques from computer vision, machine learning, and remote sensing. Other challenges include a lack of analysis-ready data and of imaging diversity, as well as dependence on weather variability. With a core focus on the automation of vegetation management using deep learning and transfer learning, this thesis contributes novel techniques for multi-view vegetation detection, robust calculation of the vegetation index, and real-time vegetation health assessment using deep convolutional neural networks (CNNs) and deep learning frameworks. The thesis focuses on four general aspects: a) training CNNs with possibly inaccurate labels and noisy image datasets; b) deriving semantic vegetation segmentation from the ordinal information contained in the image; c) retrieving semantic vegetation indexes from street-level imagery; and d) developing a vegetation health assessment and monitoring system. Firstly, it is essential to detect and segment the vegetation, and then calculate the pixel value of the semantic vegetation index. However, because the images in multi-sensory data are not identical, all image datasets must be registered before being fed into model training. The dataset used for vegetation detection and segmentation was acquired from multiple sensors. The whole dataset was multi-temporal; therefore, it was registered using deep affine features through a convolutional neural network. Secondly, after preparing the dataset, vegetation was segmented using a deep CNN, a fully convolutional network, and U-net. Although the vegetation index indicates the health of a particular area's vegetation when assessing small and large vegetation (trees, shrubs, grass, etc.), the health of large plants, such as trees, is determined by the stem. In contrast, the leaves of small plants are evaluated to decide whether the plants are healthy or unhealthy. Therefore, small plant health was initially assessed through the leaves by training a deep neural network and integrating the trained model into an internet of things (IoT) device such as AWS DeepLens. Another deep CNN was trained to assess the health of large plants and trees such as Eucalyptus. This one could also tell which trees were healthy and which were unhealthy, as well as their geo-location. Thus, we may ultimately analyse the vegetation's health over time on the basis of a semantic vegetation index, computing the index in a time-series fashion. This thesis shows that computer vision, deep learning, and remote sensing approaches can be used to process street-level imagery in different places and cities, to help manage urban forests in new ways, such as biomass surveillance and remote vegetation monitoring.
