Theses on the topic "Multimodal embedding and retrieval"
Create a precise citation in APA, MLA, Chicago, Harvard, and other styles
Consult the 48 best theses for your research on the topic "Multimodal embedding and retrieval".
Next to each source in the list of references there is an "Add to bibliography" button. Press this button, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Explore theses on a wide variety of disciplines and organise your bibliography correctly.
Rubio, Romano Antonio. "Fashion discovery : a computer vision approach". Doctoral thesis, TDX (Tesis Doctorals en Xarxa), 2021. http://hdl.handle.net/10803/672423.
The semantic interpretation of fashion images is undoubtedly one of the most challenging domains for computer vision. Slight variations in color and shape can confer different meanings or interpretations on an image. It is a domain closely tied to subjective human understanding, but also to the interpretation and recognition of scenarios and contexts. Being able to extract fashion-specific information from images and interpret it correctly can be useful in many situations and can help understand the underlying information in an image. Moreover, fashion is one of the most important businesses globally, with an estimated value of three trillion dollars and a constantly growing online market, which increases the interest in image-based algorithms for searching, classifying, or recommending garments. This doctoral thesis aims to solve specific problems related to the processing of data from virtual fashion stores, ranging from the most basic pixel-level information to a more abstract understanding that allows conclusions to be drawn about the garments present in an image, leveraging the multimodality of the available data to develop some of the solutions. The contributions include: a new superpixel extraction method aimed at improving the annotation process for fashion images; the construction of a common space for representing fashion-related images and texts; and the application of that space to the task of identifying the main product in an image showing a set of garments. In summary, fashion is a domain that is complex at many levels in terms of computer vision and machine learning, and developing specific algorithms capable of capturing the essential information from images and texts is not a trivial task. In order to solve some of the challenges it poses, and considering that this is an industrial doctorate, we contribute to the topic with a variety of solutions that can improve the performance of many tasks that are extremely useful for the online fashion industry.
Automatic Control, Robotics and Vision
Engilberge, Martin. "Deep Inside Visual-Semantic Embeddings". Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS150.
Nowadays Artificial Intelligence (AI) is omnipresent in our society. The recent development of learning methods based on deep neural networks, also called "deep learning", has led to significant improvements in visual and textual representation models. In this thesis, we aim to further advance image representation and understanding. Revolving around Visual Semantic Embedding (VSE) approaches, we explore different directions: we present relevant background covering image and text representation and existing multimodal approaches; we propose novel architectures further improving the retrieval capability of VSE; and we extend VSE models to novel applications and leverage embedding models to visually ground semantic concepts. Finally, we delve into the learning process, and in particular the loss function, by learning differentiable approximations of ranking-based metrics.
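The "differentiable approximations of ranking-based metrics" mentioned in this abstract are usually built on top of a hinge-based triplet ranking objective. Below is a minimal PyTorch sketch of such a bidirectional ranking loss for a VSE model; it is illustrative only, not the thesis code, and all names are assumptions:

```python
# Minimal sketch of a bidirectional triplet ranking loss for
# visual-semantic embeddings (illustrative, not the thesis code).
import torch
import torch.nn.functional as F

def bidirectional_ranking_loss(img_emb, txt_emb, margin=0.2):
    # img_emb, txt_emb: (batch, dim) L2-normalised embeddings; row i of
    # each matrix corresponds to a matching image/caption pair.
    scores = img_emb @ txt_emb.t()            # cosine similarity matrix
    positives = scores.diag().view(-1, 1)     # scores of matching pairs
    # hinge costs for caption retrieval (rows) and image retrieval (columns)
    cost_txt = (margin + scores - positives).clamp(min=0)
    cost_img = (margin + scores - positives.t()).clamp(min=0)
    mask = torch.eye(scores.size(0), dtype=torch.bool)  # ignore positives
    return cost_txt.masked_fill(mask, 0).sum() + cost_img.masked_fill(mask, 0).sum()

# toy usage with random, normalised embeddings
img = F.normalize(torch.randn(8, 256), dim=1)
txt = F.normalize(torch.randn(8, 256), dim=1)
print(bidirectional_ranking_loss(img, txt))
```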
Adebayo, Kolawole John <1986>. "Multimodal Legal Information Retrieval". Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amsdottorato.unibo.it/8634/1/ADEBAYO-JOHN-tesi.pdf.
Chen, Jianan. "Deep Learning Based Multimodal Retrieval". Electronic Thesis or Diss., Rennes, INSA, 2023. http://www.theses.fr/2023ISAR0019.
Multimodal tasks play a crucial role in the progression towards achieving general artificial intelligence (AI). The primary goal of multimodal retrieval is to employ machine learning algorithms to extract relevant semantic information, bridging the gap between different modalities such as visual images, linguistic text, and other data sources. It is worth noting that the information entropy associated with heterogeneous data for the same high-level semantics varies significantly, posing a significant challenge for multimodal models. Deep learning-based multimodal network models provide an effective solution to tackle the difficulties arising from substantial differences in information entropy. These models exhibit impressive accuracy and stability in large-scale cross-modal information matching tasks, such as image-text retrieval. Furthermore, they demonstrate strong transfer learning capabilities, enabling a well-trained model from one multimodal task to be fine-tuned and applied to a new multimodal task, even in scenarios involving few-shot or zero-shot learning. In our research, we develop a novel generative multimodal multi-view database specifically designed for the multimodal referential segmentation task. Additionally, we establish a state-of-the-art (SOTA) benchmark and multi-view metric for referring expression segmentation models in the multimodal domain. The results of our comparative experiments are presented visually, providing clear and comprehensive insights
Böckmann, Christine, Jens Biele, Roland Neuber and Jenny Niebsch. "Retrieval of multimodal aerosol size distribution by inversion of multiwavelength data". Universität Potsdam, 1997. http://opus.kobv.de/ubp/volltexte/2007/1436/.
Zhu, Meng. "Cross-modal semantic-associative labelling, indexing and retrieval of multimodal data". Thesis, University of Reading, 2010. http://centaur.reading.ac.uk/24828/.
Kahn, Itamar. "Remembering the past : multimodal imaging of cortical contributions to episodic retrieval". Thesis, Massachusetts Institute of Technology, 2005. http://hdl.handle.net/1721.1/33171.
What is the nature of the neural processes that allow humans to remember past events? The theoretical framework adopted in this thesis builds upon cognitive models that suggest that episodic retrieval can be decomposed into two classes of computations: (1) recovery processes that serve to reactivate stored memories, making information from a past episode readily available, and (2) control processes that serve to guide the retrieval attempt and monitor/evaluate information arising from the recovery processes. A multimodal imaging approach that combined fMRI and MEG was adopted to gain insight into the spatial and temporal brain mechanisms supporting episodic retrieval. Chapter 1 reviews major findings and theories in the episodic retrieval literature grounding the open questions and controversies within the suggested framework. Chapter 2 describes an fMRI and MEG experiment that identified medial temporal cortical structures that signal item memory strength, thus supporting the perception of item familiarity. Chapter 3 describes an fMRI experiment that demonstrated that retrieval of contextual details involves reactivation of neural patterns engaged at encoding.
Further, leveraging this pattern of reactivation, it was demonstrated that false recognition may be accompanied by recollection. The fMRI experiment reported in Chapter 3, when combined with an MEG experiment reported in Chapter 4, directly addressed questions regarding the control processes engaged during episodic retrieval. In particular, Chapter 3 showed that parietal and prefrontal cortices contribute to controlling the act of arriving at a retrieval decision. Chapter 4 then illuminates the temporal characteristics of parietal activation during episodic retrieval, providing novel evidence about the nature of parietal responses and thus constraints on theories of parietal involvement in episodic retrieval. The conducted research targeted distinct aspects of the multi-faceted act of remembering the past. The obtained data contribute to the building of an anatomical and temporal "blueprint" documenting the cascade of neural events that unfold during attempts to remember, as well as when such attempts are met with success or lead to memory errors. In the course of framing this research within the context of cognitive models of retrieval, the obtained neural data reflect back on and constrain these theories of remembering.
Nag Chowdhury, Sreyasi. "Text-image synergy for multimodal retrieval and annotation". Saarbrücken: Saarländische Universitäts- und Landesbibliothek, 2021. http://d-nb.info/1240674139/34.
Luqman, Muhammad Muzzamil. "Fuzzy multilevel graph embedding for recognition, indexing and retrieval of graphic document images". Thesis, Tours, 2012. http://www.theses.fr/2012TOUR4005/document.
This thesis addresses the problem of the lack of efficient computational tools for graph-based structural pattern recognition approaches and proposes to exploit the computational strength of statistical pattern recognition. Its contribution is twofold. The first contribution is a new method of explicit graph embedding. The proposed graph embedding method exploits multilevel analysis of a graph for extracting graph-level information, structural-level information and elementary-level information from graphs. It embeds this information into a numeric feature vector. The method employs fuzzy overlapping trapezoidal intervals for addressing the noise sensitivity of graph representations and for minimizing the information loss while mapping from continuous graph space to discrete vector space. The method has unsupervised learning abilities and is capable of automatically adapting its parameters to the underlying graph dataset. The second contribution is a framework for automatic indexing of graph repositories for graph retrieval and subgraph spotting. This framework exploits explicit graph embedding for representing the cliques of order 2 by numeric feature vectors, together with classification and clustering tools, for automatically indexing a graph repository. It does not require a labeled learning set and can be easily deployed to a range of application domains, offering ease of query by example (QBE) and granularity of focused retrieval.
Lolich, María and Susana Azzollini. "Phenomenological retrieval style of autobiographical memories in a sample of major depressed individuals". Pontificia Universidad Católica del Perú, 2016. http://repositorio.pucp.edu.pe/index/handle/123456789/99894.
The retrieval of autobiographical memories is characterized by distinct phenomenological components. Given the absence of previous work on Spanish-speaking populations, 34 in-depth interviews were conducted with individuals with and without major depressive disorder in the city of Buenos Aires (Argentina). The phenomenological components present in the retrieval of significant autobiographical memories were explored. The data were analyzed qualitatively using Grounded Theory. During the descriptive analysis, seven phenomenological categories emerging from the discourse were detected. From the axial and selective analyses, two discursive axes were identified: rhetorical-propositional and specificity-generality. The implications for affective regulation of adopting an amodal or multimodal style of autobiographical information processing deserve further attention.
Valero-Mas, Jose J. "Towards Interactive Multimodal Music Transcription". Doctoral thesis, Universidad de Alicante, 2017. http://hdl.handle.net/10045/71275.
Quack, Till. "Large scale mining and retrieval of visual data in a multimodal context". Konstanz Hartung-Gorre, 2009. http://d-nb.info/993614620/04.
Saragiotis, Panagiotis. "Cross-modal classification and retrieval of multimodal data using combinations of neural networks". Thesis, University of Surrey, 2006. http://epubs.surrey.ac.uk/843338/.
Fedel, Gabriel de Souza. "Busca multimodal para apoio à pesquisa em biodiversidade". [s.n.], 2011. http://repositorio.unicamp.br/jspui/handle/REPOSIP/275751.
Master's dissertation, Universidade Estadual de Campinas, Instituto de Computação
Abstract: Research on computing applied to biodiversity presents several challenges, ranging from massive volumes of highly heterogeneous data to the variety of user profiles. This kind of scenario requires versatile data retrieval and management tools. Available tools are still limited; most often, they only consider textual data and do not take advantage of the multiple data types available, such as images or sounds. This dissertation discusses issues concerning multimodal queries that involve both text and images as search parameters, for the domain of biodiversity. It presents the specification and implementation of a set of tools to process such queries, which were validated with real data from Unicamp's Zoology Museum. The main contributions also include the construction of a taxonomic ontology that includes species' common names, and support for queries by both researchers and non-experts. Such features extend the scope of queries available in biodiversity information systems. This research is associated with the Bio-CORE project, jointly conducted by researchers in computing and biology, to design and develop computational tools to support research in biodiversity.
Master's degree in Computer Science, area: Databases
Dyar, Samuel S. "A multimodal speech interface for dynamic creation and retrieval of geographical landmarks on a mobile device". Thesis, Massachusetts Institute of Technology, 2010. http://hdl.handle.net/1721.1/62638.
As mobile devices become more powerful, researchers look to develop innovative applications that use new and effective means of input. Furthermore, developers must exploit the device's many capabilities (GPS, camera, touch screen, etc.) in order to make equally powerful applications. This thesis presents the development of a multimodal system that allows users to create and share informative geographical landmarks using Android-powered smart-phones. The content associated with each landmark is dynamically integrated into the system's vocabulary, which allows users to easily use speech to access landmarks by the information related to them. The initial results of releasing the application on the Android Market have been encouraging, but also suggest that improvements need to be made to the system.
Calumby, Rodrigo Tripodi 1985. "Recuperação multimodal de imagens com realimentação de relevância baseada em programação genética". [s.n.], 2010. http://repositorio.unicamp.br/jspui/handle/REPOSIP/275814.
Master's dissertation, Universidade Estadual de Campinas, Instituto de Computação
Abstract: This work presents an approach for multimodal content-based image retrieval with relevance feedback based on genetic programming. We assume that there is textual information (e.g., metadata, textual descriptions) associated with collection images. Furthermore, image content properties (e.g., color and texture) are characterized by image descriptors. Given the information obtained over the relevance feedback iterations, genetic programming is used to create effective combination functions that combine similarities associated with different features. Using these new functions, the different similarities are combined into a unique measure that more properly meets the user's needs. The main contribution of this work is the proposal and implementation of two frameworks. The first one, RFCore, is a generic framework for relevance feedback tasks over digital objects. The second one, MMRF-GP, is a framework for digital object retrieval with relevance feedback based on genetic programming, built on top of RFCore. We validated the proposed multimodal image retrieval approach on two datasets, one from the University of Washington and another from the ImageCLEF Photographic Retrieval Task. Our approach yielded the best results for multimodal image retrieval when compared with single-modality approaches. Furthermore, it achieved better results for visual and multimodal image retrieval than the best submissions to the ImageCLEF Photographic Retrieval Task 2008.
Master's degree in Computer Science, area: Information Retrieval Systems
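As a concrete illustration of the genetic-programming idea in the abstract above, the sketch below evolves expression trees that combine several similarity scores into one ranking function, using relevance-feedback labels as fitness. It is a self-contained toy under assumed data structures, not the RFCore/MMRF-GP implementation:

```python
# Toy genetic programming over similarity-combination functions
# (illustrative sketch, not the thesis frameworks).
import random
import operator
from functools import partial

OPS = {'+': operator.add, '*': operator.mul, 'max': max, 'min': min}

def random_tree(n_feats, depth=3):
    # leaf = one similarity channel; node = operator over two subtrees
    if depth == 0 or random.random() < 0.3:
        return ('feat', random.randrange(n_feats))
    op = random.choice(list(OPS))
    return (op, random_tree(n_feats, depth - 1), random_tree(n_feats, depth - 1))

def evaluate(tree, sims):
    if tree[0] == 'feat':
        return sims[tree[1]]
    return OPS[tree[0]](evaluate(tree[1], sims), evaluate(tree[2], sims))

def fitness(tree, items):
    # items: list of (similarity_vector, is_relevant); higher fitness means
    # relevant items are ranked closer to the top by the combined score
    ranked = sorted(items, key=lambda it: -evaluate(tree, it[0]))
    ranks = [r for r, (_, rel) in enumerate(ranked) if rel]
    return -sum(ranks) / max(len(ranks), 1)

def mutate(tree, n_feats):
    if random.random() < 0.2:
        return random_tree(n_feats, 2)          # replace whole subtree
    if tree[0] == 'feat':
        return tree
    return (tree[0], mutate(tree[1], n_feats), mutate(tree[2], n_feats))

def evolve(items, n_feats, pop_size=50, gens=30):
    pop = [random_tree(n_feats) for _ in range(pop_size)]
    fit = partial(fitness, items=items)
    for _ in range(gens):
        pop.sort(key=fit, reverse=True)
        survivors = pop[:pop_size // 2]
        pop = survivors + [mutate(random.choice(survivors), n_feats)
                           for _ in range(pop_size - len(survivors))]
    return max(pop, key=fit)

# toy feedback round: 3 similarity channels (e.g. colour, texture, text)
data = [([random.random() for _ in range(3)], random.random() < 0.3)
        for _ in range(40)]
print(evolve(data, n_feats=3))
```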
Durak, Nurcan. "Semantic Video Modeling And Retrieval With Visual, Auditory, Textual Sources". Master's thesis, METU, 2004. http://etd.lib.metu.edu.tr/upload/12605438/index.pdf.
Oztarak, Hakan. "Structural And Event Based Multimodal Video Data Modeling". Master's thesis, METU, 2005. http://etd.lib.metu.edu.tr/upload/12606919/index.pdf.
Matera, Tomáš. "Visipedia - Multi-dimensional Object Embedding Based on Perceptual Similarity". Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2014. http://www.nusl.cz/ntk/nusl-236115.
Vukotic, Verdran. "Deep Neural Architectures for Automatic Representation Learning from Multimedia Multimodal Data". Thesis, Rennes, INSA, 2017. http://www.theses.fr/2017ISAR0015/document.
In this dissertation, the thesis that deep neural networks are suited for analysis of visual, textual, and fused visual and textual content is discussed. This work evaluates the ability of deep neural networks to learn multimodal representations automatically, in either unsupervised or supervised manners, and brings the following main contributions: 1) Recurrent neural networks for spoken language understanding (slot filling): different architectures are compared for this task with the aim of modeling both the input context and output label dependencies. 2) Action prediction from single images: we propose an architecture that allows us to predict human actions from a single image. The architecture is evaluated on videos, by utilizing solely one frame as input. 3) Bidirectional multimodal encoders: the main contribution of this thesis consists of a neural architecture that translates from one modality to the other and conversely, and offers an improved multimodal representation space where the initially disjoint representations can be translated and fused. This enables improved multimodal fusion of multiple modalities. The architecture was extensively studied and evaluated in international benchmarks within the task of video hyperlinking, where it defined the state of the art today. 4) Generative adversarial networks for multimodal fusion: continuing on the topic of multimodal fusion, we evaluate the possibility of using conditional generative adversarial networks to learn multimodal representations; in addition to providing multimodal representations, generative adversarial networks permit visualizing the learned model directly in the image domain.
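Contribution 3 above (bidirectional multimodal encoders) can be pictured with a small sketch: two networks translate one modality into the other, and their hidden activations, concatenated, form the fused representation. This is an assumption-laden toy in PyTorch, not the thesis architecture:

```python
# Toy bidirectional multimodal encoder (illustrative sketch).
import torch
import torch.nn as nn

class BiModalTranslator(nn.Module):
    def __init__(self, dim_a=128, dim_b=96, hidden=64):
        super().__init__()
        # modality A -> B and modality B -> A translators
        self.a2b = nn.Sequential(nn.Linear(dim_a, hidden), nn.Tanh(), nn.Linear(hidden, dim_b))
        self.b2a = nn.Sequential(nn.Linear(dim_b, hidden), nn.Tanh(), nn.Linear(hidden, dim_a))

    def forward(self, a, b):
        return self.a2b(a), self.b2a(b)

    def fused(self, a, b):
        # concatenated hidden activations of both translators
        ha = self.a2b[1](self.a2b[0](a))
        hb = self.b2a[1](self.b2a[0](b))
        return torch.cat([ha, hb], dim=-1)

model = BiModalTranslator()
video = torch.randn(4, 128)   # stand-in visual features
text = torch.randn(4, 96)     # stand-in textual features
b_hat, a_hat = model(video, text)
# training signal: each modality must be reconstructable from the other
loss = nn.functional.mse_loss(b_hat, text) + nn.functional.mse_loss(a_hat, video)
print(loss.item(), model.fused(video, text).shape)
```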
Meseguer, Brocal Gabriel. "Multimodal analysis : informed content estimation and audio source separation". Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS111.
This dissertation proposes the study of multimodal learning in the context of musical signals. Throughout, we focus on the interaction between audio signals and text information. Among the many text sources related to music that can be used (e.g. reviews, metadata, or social network feedback), we concentrate on lyrics. The singing voice directly connects the audio signal and the text information in a unique way, combining melody and lyrics, where a linguistic dimension complements the abstraction of musical instruments. Our study focuses on the audio and lyrics interaction for targeting source separation and informed content estimation. Real-world stimuli are produced by complex phenomena and their constant interaction in various domains. Our understanding learns useful abstractions that fuse different modalities into a joint representation. Multimodal learning describes methods that analyse phenomena from different modalities and their interaction in order to tackle complex tasks. This results in better and richer representations that improve the performance of current machine learning methods. To develop our multimodal analysis, we first need to address the lack of data containing singing voice with aligned lyrics. This data is mandatory to develop our ideas. Therefore, we investigate how to create such a dataset automatically, leveraging resources from the World Wide Web. Creating this type of dataset is a challenge in itself that raises many research questions. We are constantly working with the classic "chicken or the egg" problem: acquiring and cleaning this data requires accurate models, but it is difficult to train models without data. We propose to use the teacher-student paradigm to develop a method where dataset creation and model learning are not seen as independent tasks but rather as complementary efforts. In this process, non-expert karaoke time-aligned lyrics and notes describe the lyrics as a sequence of time-aligned notes with their associated textual information. We then link each annotation to the correct audio and globally align the annotations to it. For this purpose, we use the normalized cross-correlation between the voice annotation sequence and the singing voice probability vector, which is obtained automatically using a deep convolutional neural network. Using the collected data we progressively improve that model; every time we have an improved version, we can in turn correct and enhance the data.
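The alignment step described above can be illustrated in a few lines of NumPy: slide the annotation sequence over the singing-voice probability curve and keep the offset that maximizes the normalized cross-correlation. This is a simplified sketch, not the thesis implementation:

```python
# Normalized cross-correlation alignment (illustrative sketch).
import numpy as np

def best_offset(annotation, voice_prob):
    # annotation: 0/1 per frame from karaoke timings;
    # voice_prob: per-frame singing-voice probability (longer signal).
    a = (annotation - annotation.mean()) / (annotation.std() + 1e-8)
    best, best_score = 0, -np.inf
    for off in range(len(voice_prob) - len(annotation) + 1):
        w = voice_prob[off:off + len(annotation)]
        w = (w - w.mean()) / (w.std() + 1e-8)
        score = float(a @ w) / len(a)          # correlation in [-1, 1]
        if score > best_score:
            best, best_score = off, score
    return best, best_score

# toy example: the annotated singing happens at frames 50..70 of the track
prob = np.zeros(200)
prob[50:70] = 0.9
ann = np.r_[np.ones(20), np.zeros(10)]
print(best_offset(ann, prob))   # offset found near 50
```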
Simonetta, Federico. "Music interpretation analysis. A multimodal approach to score-informed resynthesis of piano recordings". Doctoral thesis, Università degli Studi di Milano, 2022. http://hdl.handle.net/2434/918909.
Ekeberg, Tomas. "Flash Diffractive Imaging in Three Dimensions". Doctoral thesis, Uppsala universitet, Molekylär biofysik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-179643.
Bucher, Maxime. "Apprentissage et exploitation de représentations sémantiques pour la classification et la recherche d'images". Thesis, Normandie, 2018. http://www.theses.fr/2018NORMC250/document.
In this thesis, we examine some practical difficulties of deep learning models. Indeed, despite the promising results in computer vision, implementing them in some situations raises some questions. For example, in classification tasks where thousands of categories have to be recognised, it is sometimes difficult to gather enough training data for each category. We propose two new approaches for this learning scenario, called "zero-shot learning". We use semantic information to model classes, which allows us to define models by description, as opposed to modelling from a set of examples. In the first chapter we propose to optimize a metric in order to transform the distribution of the original data and to obtain an optimal attribute distribution. In the following chapter, unlike the standard approaches of the literature that rely on the learning of a common embedding space, we propose to generate visual features from a conditional generator. The artificial examples can be used in addition to real data for learning a discriminant classifier. In the second part of this thesis, we address the question of computational intelligibility for computer vision tasks. Due to the many and complex transformations of deep learning algorithms, it is difficult for a user to interpret the returned prediction. Our proposition is to introduce what we call a "semantic bottleneck" in the processing pipeline, which is a crossing point in which the representation of the image is entirely expressed with natural language, while retaining the efficiency of numerical representations. This semantic bottleneck allows us to detect failure cases in the prediction process so as to accept or reject the decision.
Inagaki, Yasuyoshi, Katsuhiko Toyama, Nobuo Kawaguchi, Shigeki Matsubara, Satoru Matsunaga, 康善 稲垣, 勝彦 外山, 信夫 河口, 茂樹 松原 and 悟 松永. "Sync/Mail : 話し言葉の漸進的変換に基づく即時応答インタフェース". Information Processing Society of Japan (一般社団法人情報処理学会), 1998. http://hdl.handle.net/2237/15382.
Texto completoEl, Mahdaouy Abdelkader. "Accès à l'information dans les grandes collections textuelles en langue arabe". Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAM091/document.
Given the amount of Arabic textual information available on the web, developing effective Information Retrieval Systems (IRS) has become essential to retrieve relevant information. Most current Arabic IRSs are based on the bag-of-words representation, where documents are indexed using surface words, roots, or stems. Two main drawbacks of the latter representation are the ambiguity of Single Word Terms (SWTs) and term mismatch. The aim of this work is to deal with SWT ambiguity and term mismatch. Accordingly, we propose four contributions to improve Arabic content representation, indexing, and retrieval. The first contribution consists of representing Arabic documents using Multi-Word Terms (MWTs). The latter is motivated by the fact that MWTs are more precise representational units and less ambiguous than isolated SWTs. Hence, we propose a hybrid method to extract Arabic MWTs, which combines linguistic and statistical filtering of MWT candidates. The linguistic filter uses POS tagging to identify MWT candidates that fit a set of syntactic patterns and handles the problem of MWT variation. Then, the statistical filter ranks MWT candidates using our proposed association measure that combines contextual information with both termhood and unithood measures. In the second contribution, we explore and evaluate several IR models for ranking documents using both SWTs and MWTs. Additionally, we investigate a wide range of proximity-based IR models for Arabic IR. Then, we introduce a formal condition that IR models should satisfy to deal adequately with term dependencies. The third contribution consists of a method based on distributed representations of words, namely Word Embedding (WE), for Arabic IR. It relies on incorporating WE semantic similarities into existing probabilistic IR models in order to deal with term mismatch. The aim is to allow distinct, but semantically similar, terms to contribute to document scores. The last contribution is a method to incorporate WE similarity into Pseudo-Relevance Feedback (PRF) for Arabic information retrieval. The main idea is to select expansion terms using their distribution in the set of top pseudo-relevant documents along with their similarity to the original query terms. The experimental validation of all the proposed contributions is performed using the standard Arabic TREC 2001/2002 collections.
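The last contribution above combines two signals when selecting expansion terms: frequency in the top pseudo-relevant documents and embedding similarity to the query. Below is a hedged sketch of that selection rule; the toy vectors stand in for trained word embeddings, and the exact scoring formula is an assumption, not the thesis's:

```python
# Embedding-aware pseudo-relevance feedback term selection (sketch).
from collections import Counter
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def expansion_terms(query_terms, top_docs, embeddings, k=5, alpha=0.5):
    # top_docs: token lists from the top-ranked (pseudo-relevant) documents.
    # candidate score = alpha * normalised feedback frequency
    #                 + (1 - alpha) * max similarity to any query term
    counts = Counter(t for doc in top_docs for t in doc)
    max_c = max(counts.values())
    scored = []
    for term, c in counts.items():
        if term in query_terms or term not in embeddings:
            continue
        sim = max(cosine(embeddings[term], embeddings[q])
                  for q in query_terms if q in embeddings)
        scored.append((alpha * c / max_c + (1 - alpha) * sim, term))
    return [t for _, t in sorted(scored, reverse=True)[:k]]

rng = np.random.default_rng(0)
vocab = ["car", "vehicle", "engine", "banana", "road"]
emb = {w: rng.normal(size=16) for w in vocab}
emb["vehicle"] = emb["car"] + 0.1 * rng.normal(size=16)   # near-synonym
docs = [["car", "vehicle", "road"], ["vehicle", "engine", "banana"]]
print(expansion_terms(["car"], docs, emb, k=2))   # "vehicle" ranks first
```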
Couairon, Guillaume. "Text-Based Semantic Image Editing". Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS248.
The aim of this thesis is to propose algorithms for the task of Text-based Image Editing (TIE), which consists in editing digital images according to an instruction formulated in natural language. For instance, given an image of a dog and the query "Change the dog into a cat", we want to produce a novel image where the dog has been replaced by a cat, keeping all other image aspects unchanged (animal color and pose, background). The north-star goal is to enable anyone to edit their images using only queries in natural language. One specificity of text-based image editing is that there is practically no training data to train a supervised algorithm. In this thesis, we propose different solutions for editing images, based on the adaptation of large multimodal models trained on huge datasets. We first study a simplified editing setup, named retrieval-based image editing, which does not require directly modifying the input image. Instead, given the image and modification query, we search a large database for an image that corresponds to the requested edit. We leverage multimodal image/text alignment models trained on web-scale datasets (like CLIP) to perform such transformations without any examples. We also propose the SIMAT framework for evaluating retrieval-based image editing. We then study how to directly modify the input image. We propose FlexIT, a method which iteratively changes the input image until it satisfies an abstract "editing objective" defined in a multimodal embedding space. We introduce a variety of regularization terms to enforce realistic transformations. Next, we focus on diffusion models, which are powerful generative models able to synthesize novel images conditioned on a wide variety of textual prompts. We demonstrate their versatility by proposing DiffEdit, an algorithm which adapts diffusion models for image editing without finetuning. We propose a zero-shot strategy for finding automatically where the initial image should be changed to satisfy the text transformation query. Finally, we study a specific challenge useful in the context of image editing: how to synthesize a novel image by giving as constraint a spatial layout of objects with textual descriptions, a task known as Semantic Image Synthesis. We adopt the same strategy, consisting in adapting diffusion models to solve the task without any example. We propose the ZestGuide algorithm, which leverages the spatio-semantic information encoded in the attention layers of diffusion models.
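Retrieval-based image editing, as described above, can be phrased as simple vector arithmetic in a joint image/text embedding space. The sketch below assumes precomputed, CLIP-style normalised embeddings (random stand-ins here); it is one common formulation of the idea, not necessarily the exact scoring used in the thesis:

```python
# Retrieval-based "editing" via embedding arithmetic (illustrative sketch).
import numpy as np

def normalize(x):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-9)

def retrieve_edit(img_emb, src_txt_emb, tgt_txt_emb, database, lam=1.0):
    # move the image embedding along the text direction "source -> target",
    # then return the nearest database image to the shifted query
    query = normalize(img_emb + lam * (tgt_txt_emb - src_txt_emb))
    scores = database @ query           # database rows are unit-normalised
    return int(np.argmax(scores))

rng = np.random.default_rng(1)
db = normalize(rng.normal(size=(1000, 512)))   # embedded image collection
img = normalize(rng.normal(size=512))          # e.g. a photo of a dog
dog_txt = normalize(rng.normal(size=512))      # embedding of "a dog"
cat_txt = normalize(rng.normal(size=512))      # embedding of "a cat"
print(retrieve_edit(img, dog_txt, cat_txt, db))
```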
Slizovskaia, Olga. "Audio-visual deep learning methods for musical instrument classification and separation". Doctoral thesis, Universitat Pompeu Fabra, 2020. http://hdl.handle.net/10803/669963.
In music perception, we normally receive complementary information through our visual and auditory systems. Moreover, visual perception plays an important role in our overall experience of a musical performance. This relationship between audio and vision has increased interest in machine learning methods capable of combining both modalities for automatic music analysis. This thesis focuses on two main problems: instrument classification and source separation in the context of music videos. For each problem, a multimodal method is developed using deep learning techniques, which allows us to learn an encoded representation for each modality. In addition, for the source separation problem, we also propose two models conditioned on instrument labels and examine the influence of two extra sources of information on separation performance, comparing them against a conventional model. Another important aspect of this work is the exploration of different fusion models that allow better multimodal integration of information sources from related domains.
Nguyen, Nhu Van. "Représentations visuelles de concepts textuels pour la recherche et l'annotation interactives d'images". PhD thesis, Université de La Rochelle, 2011. http://tel.archives-ouvertes.fr/tel-00730707.
Bonardi, Fabien. "Localisation visuelle multimodale visible/infrarouge pour la navigation autonome". Thesis, Normandie, 2017. http://www.theses.fr/2017NORMR028/document.
Texto completoAutonomous navigation field gathers the set of algorithms which automate the moves of a mobile robot. The case study of this thesis focuses on the outdoor localisation issue with additionnal constraints : the use of visual sensors only with variable specifications (geometry, modality, etc) and long-term apparence changes of the surrounding environment. Both types of constraints are still rarely studied in the state of the art. Our main contribution concerns the description and compression steps of the data extracted from images. We developped a method called PHROG which represents data as a visual-words histogram. Obtained results on several images datasets show an improvment of the scenes recognition performance compared to methods from the state of the art. In a context of navigation, acquired images are sequential such that we can envision a filtering method to avoid faulty localisation estimation. Two probabilistic filtering approaches are proposed : a first one defines a simple movement model with a histograms filter and a second one sets up a more complex model using visual odometry and a particules filter
ur Réhman, Shafiq. "Expressing emotions through vibration for perception and control". Doctoral thesis, Umeå universitet, Institutionen för tillämpad fysik och elektronik, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-32990.
Texto completoTaktil Video
Yesiler, M. Furkan. "Data-driven musical version identification: accuracy, scalability and bias perspectives". Doctoral thesis, Universitat Pompeu Fabra, 2022. http://hdl.handle.net/10803/673264.
This thesis develops audio-based musical version identification systems applicable in an industrial setting. The three aspects addressed are therefore accuracy, scalability, and algorithmic bias in version identification systems. We propose a data-driven model that incorporates musical knowledge into its network architecture and training strategy, and we experiment with two approaches. First, we experiment with data-driven fusion methods to combine the information from models that process melodic and harmonic information, achieving an important increase in identification accuracy. Second, we investigate embedding distillation techniques to reduce embedding size, which lowers data storage requirements and, more importantly, retrieval time. Finally, we analyze the algorithmic biases of our systems.
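The embedding-distillation idea mentioned above can be sketched as training a small projection to preserve the pairwise similarities of a larger teacher embedding. An illustrative PyTorch toy under assumed data (random teacher embeddings), not the thesis method:

```python
# Relational embedding distillation: a small student projection learns to
# reproduce the teacher's pairwise cosine similarities (illustrative sketch).
import torch
import torch.nn as nn

teacher_dim, student_dim, n = 512, 64, 2048
teacher = torch.randn(n, teacher_dim)            # stand-in for precomputed embeddings
student = nn.Linear(teacher_dim, student_dim)    # learned compression
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(200):
    idx = torch.randint(0, n, (256,))
    t = teacher[idx]
    s = student(t)
    # match pairwise similarity matrices of teacher and student spaces
    t_sim = nn.functional.normalize(t, dim=1) @ nn.functional.normalize(t, dim=1).t()
    s_sim = nn.functional.normalize(s, dim=1) @ nn.functional.normalize(s, dim=1).t()
    loss = ((t_sim - s_sim) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final relational distillation loss: {loss.item():.4f}")
```

The 8x dimensionality reduction shrinks both storage and nearest-neighbour search time, which is the motivation stated in the abstract.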
Lerner, Paul. "Répondre aux questions visuelles à propos d'entités nommées". Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG074.
Texto completoThis thesis is positioned at the intersection of several research fields, Natural Language Processing, Information Retrieval (IR) and Computer Vision, which have unified around representation learning and pre-training methods. In this context, we have defined and studied a new multimodal task: Knowledge-based Visual Question Answering about Named Entities (KVQAE).In this context, we were particularly interested in cross-modal interactions and different ways of representing named entities. We also focused on data used to train and, more importantly, evaluate Question Answering systems through different metrics.More specifically, we proposed a dataset for this purpose, the first in KVQAE comprising various types of entities. We also defined an experimental framework for dealing with KVQAE in two stages through an unstructured knowledge base and identified IR as the main bottleneck of KVQAE, especially for questions about non-person entities. To improve the IR stage, we studied different multimodal fusion methods, which are pre-trained through an original task: the Multimodal Inverse Cloze Task. We found that these models leveraged a cross-modal interaction that we had not originally considered, and which may address the heterogeneity of visual representations of named entities. These results were strengthened by a study of the CLIP model, which allows this cross-modal interaction to be modeled directly. These experiments were carried out while staying aware of biases present in the dataset or evaluation metrics, especially of textual biases, which affect any multimodal task
Bahceci, Oktay. "Deep Neural Networks for Context Aware Personalized Music Recommendation : A Vector of Curation". Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-210252.
Information filtering and recommender systems have been used and implemented in many different ways on different devices since the dawn of the Internet, and modern approaches rely on machine learning and deep learning in order to create accurate and personalized recommendations for users in a given context. These models require large amounts of data with a variety of features, such as time, location, and user data, in order to find correlations and patterns that classical models such as matrix factorization and collaborative filtering cannot. This thesis work researches, implements, and compares a variety of models focusing on machine learning and deep learning for music recommendation, and does so successfully by representing the recommendation problem as an extreme multi-class classification problem with 100,000 unique classes to choose from. By comparing fourteen different experiments, all models learn features such as time, location, user features, and listening history in order to create context-aware personalized music predictions, and solve the cold-start problem by using users' demographic features. The best model manages to capture the target class in its top-100 recommendation list for more than a third of the unseen data in an offline evaluation, when randomly chosen examples from the unseen following week are evaluated.
Htait, Amal. "Sentiment analysis at the service of book search". Electronic Thesis or Diss., Aix-Marseille, 2019. http://www.theses.fr/2019AIXM0260.
Web technology is constantly growing, and a huge volume of data is generated on the social web, where users exchange a variety of information. Beyond the fact that social web text may be rich in information, writers are often guided by the sentiments reflected in their writing. Based on that observation, locating sentiment in a text can play an important role in information extraction. The purpose of this thesis is to improve the book search and recommendation quality of OpenEdition's multilingual Books platform. The Books platform also offers additional information through user-generated content (e.g. book reviews) connected to the books and rich in emotions expressed in the users' writing. Therefore, the previous analysis, concerning locating sentiment in a text for information extraction, plays an important role in this thesis and can serve the purpose of improving book search quality using the shared user-generated content. Accordingly, we follow a main path in this thesis that combines the sentiment analysis (SA) and information retrieval (IR) fields for the purpose of improving the quality of book search. Two objectives, summarised below, serve the main purpose of the thesis:
• An approach for SA prediction that is easily applicable to different languages and low-cost in time and annotated data.
• New approaches for book search quality improvement, based on employing SA in information filtering, retrieval, and classification.
Guillaumin, Matthieu. "Données multimodales pour l'analyse d'image". PhD thesis, Grenoble, 2010. http://tel.archives-ouvertes.fr/tel-00522278/en/.
Texto completoGuillaumin, Matthieu. "Données multimodales pour l'analyse d'image". Phd thesis, Grenoble, 2010. http://www.theses.fr/2010GRENM048.
This dissertation delves into the use of textual metadata for image understanding. We seek to exploit this additional textual information as weak supervision to improve the learning of recognition models. There is a recent and growing interest in methods that exploit such data because they can potentially alleviate the need for manual annotation, which is a costly and time-consuming process. We focus on two types of visual data with associated textual information. First, we exploit news images that come with descriptive captions to address several face-related tasks, including face verification, which is the task of deciding whether two images depict the same individual, and face naming, the problem of associating faces in a data set to their correct names. Second, we consider data consisting of images with user tags. We explore models for automatically predicting tags for new images, i.e. image auto-annotation, which can also be used for keyword-based image search. We also study a multimodal semi-supervised learning scenario for image categorisation. In this setting, the tags are assumed to be present in both labelled and unlabelled training data, while they are absent from the test data. Our work builds on the observation that most of these tasks can be solved if perfectly adequate similarity measures are used. We therefore introduce novel approaches that involve metric learning, nearest neighbour models and graph-based methods to learn, from the visual and textual data, task-specific similarities. For faces, our similarities focus on the identities of the individuals while, for images, they address more general semantic visual concepts. Experimentally, our approaches achieve state-of-the-art results on several standard and challenging data sets. On both types of data, we clearly show that learning using additional textual information improves the performance of visual recognition systems.
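The nearest-neighbour tag prediction models mentioned above can be illustrated with a weighted k-NN rule: a new image inherits the tags of its visually closest training images, weighted by proximity. A toy NumPy sketch under illustrative assumptions, not the thesis model:

```python
# Weighted k-nearest-neighbour tag prediction (illustrative sketch).
import numpy as np

def predict_tags(test_feat, train_feats, train_tags, k=5, top=3):
    d = np.linalg.norm(train_feats - test_feat, axis=1)
    nn_idx = np.argsort(d)[:k]
    weights = np.exp(-d[nn_idx])             # closer neighbours count more
    scores = {}
    for w, i in zip(weights, nn_idx):
        for tag in train_tags[i]:
            scores[tag] = scores.get(tag, 0.0) + w
    return sorted(scores, key=scores.get, reverse=True)[:top]

rng = np.random.default_rng(2)
feats = rng.normal(size=(100, 32))           # stand-in visual features
tags = [["beach", "sea"] if i % 2 else ["city", "night"] for i in range(100)]
print(predict_tags(feats[0] + 0.01, feats, tags))
```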
Gong, Rong. "Automatic assessment of singing voice pronunciation: a case study with Jingju music". Doctoral thesis, Universitat Pompeu Fabra, 2018. http://hdl.handle.net/10803/664421.
Online learning has dramatically changed music education in the past decade. A growing number of music performance students take part in online music learning courses because of their easy accessibility and freedom from time and space constraints. Singing can be considered the most basic form of performance. Automatic singing voice assessment, as an important task in the Music Information Retrieval (MIR) field, aims to extract musically meaningful information and measure the quality of a student's singing voice. Singing correctness and quality are culture-specific, and their assessment requires culture-aware methodologies. Jingju music (also known as Beijing opera) is one of the most representative Chinese musical traditions and has spread to many parts of the world where Chinese communities exist. Our goal is to tackle unexplored problems in automatic singing voice assessment for jingju music, to make current Eurogenetic assessment approaches more culture-aware, and, at the same time, to develop new assessment approaches that can be generalized to other musical traditions.
Pinho, Eduardo Miguel Coutinho Gomes de. "Multimodal information retrieval in medical imaging archives". Doctoral thesis, 2019. http://hdl.handle.net/10773/29206.
The proliferation of digital medical imaging modalities in hospitals, clinics, and other diagnostic centers has led to the creation of huge data repositories, often not explored to their full extent. Moreover, recent years clearly show a trend of growing data production. It is therefore important to study new ways of indexing, processing, and retrieving medical images for the broad community of radiologists, scientists, and engineers. Content-based image retrieval, which encompasses a wide variety of methods, enables the exploration of the visual information in a medical imaging archive, bringing benefits to physicians and researchers. However, the integration of these solutions into clinical workflows is still rare, and the effectiveness of the most recent medical image retrieval systems can be improved. This thesis proposes solutions and methods for multimodal information retrieval in the context of medical imaging repositories. The main contributions are: a search engine for medical imaging studies supporting multimodal queries in an extensible archive; a framework for automatic image annotation; and an assessment and proposal of representation learning techniques for automatic concept detection in medical images, which show greater potential than the visual feature extraction techniques previously relevant in similar tasks. These contributions seek to reduce the technical and scientific barriers to the development and adoption of modern multimodal medical image retrieval systems, so that they can finally become part of the everyday tools of health professionals, teachers, and researchers.
Doctoral Program in Informatics
Hong, Wei Jhe and 洪煒哲. "An XML-based Metadata Embedding System for Context-aware Image Retrieval". Thesis, 2009. http://ndltd.ncl.edu.tw/handle/y6am66.
Texto completo國立虎尾科技大學
光電與材料科技研究所
97
A wide variety of digital electronic devices are present in modern society, and multimedia information of all kinds has therefore increased significantly. Because of this large volume of multimedia data, many image management programs have appeared. To achieve better search and management efficiency, each of these programs defines its own structure for descriptive information and indexing. However, there is no standard specification shared between different programs, so descriptive information cannot be exchanged between them. This thesis presents a method for embedding image metadata in JPEG image files. The method uses XML image metadata and creates a metadata index to accelerate capture and search. The JPEG standard is chosen because it provides application segments that can be extended for different applications, so the metadata is embedded in a specific application segment, where it can always be located. The metadata is created according to the XML standard, and an XML Schema describes its structure. Because the metadata is embedded in each picture, capturing information and searching would otherwise require a large amount of file access time per picture. To solve this problem, we propose an index of metadata and thumbnails. The metadata index groups descriptions of the same category into different dictionaries, and dictionary lookups are accelerated with a hash function. The index supports consistent metadata modification and quick search. The thumbnail index is created by combining individual thumbnails, reducing the file size to about 20% of the original. To provide more accurate search, the embedded metadata is combined with keywords, MPEG-7 image features, and user-defined information.
Keywords: Content-aware, XML, Embedded Metadata, Dictionary index, Image search
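The embedding mechanism described in this abstract can be sketched in a few lines: write the XML payload into a JPEG application segment right after the SOI marker, then scan the segments to read it back. The APP11 marker (0xFFEB) and the signature string below are arbitrary choices for illustration, not the thesis's actual layout:

```python
# Embedding and extracting XML metadata via a JPEG application segment
# (illustrative sketch; marker and signature are assumptions).
import struct

SIG = b"XMLMETA\x00"

def embed_xml(jpeg_bytes: bytes, xml: str) -> bytes:
    assert jpeg_bytes[:2] == b"\xFF\xD8", "not a JPEG (missing SOI)"
    payload = SIG + xml.encode("utf-8")
    # the segment length field counts itself (2 bytes) plus the payload
    segment = b"\xFF\xEB" + struct.pack(">H", len(payload) + 2) + payload
    return jpeg_bytes[:2] + segment + jpeg_bytes[2:]

def extract_xml(jpeg_bytes: bytes):
    i = 2
    while i + 4 <= len(jpeg_bytes) and jpeg_bytes[i] == 0xFF:
        marker = jpeg_bytes[i + 1]
        (length,) = struct.unpack_from(">H", jpeg_bytes, i + 2)
        body = jpeg_bytes[i + 4:i + 2 + length]
        if marker == 0xEB and body.startswith(SIG):
            return body[len(SIG):].decode("utf-8")
        i += 2 + length                 # jump to the next segment
    return None

minimal = b"\xFF\xD8\xFF\xD9"           # SOI + EOI, a minimal JPEG shell
tagged = embed_xml(minimal, "<photo><place>Taipei</place></photo>")
print(extract_xml(tagged))
```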
"Video2Vec: Learning Semantic Spatio-Temporal Embedding for Video Representations". Master's thesis, 2016. http://hdl.handle.net/2286/R.I.40765.
Duan, Lingyu. "Multimodal mid-level representations for semantic analysis of broadcast video". Thesis, 2008. http://hdl.handle.net/1959.13/25819.
This thesis investigates the problem of seeking multimodal mid-level representations for semantic analysis of broadcast video. The problem is of interest as humans tend to use high-level semantic concepts when querying and browsing ever increasing multimedia databases, yet generic low-level content metadata available from automated processing deals only with representing perceived content, but not its semantics. Multimodal mid-level representations refer to intermediate representations of multimedia signals that make various kinds of knowledge explicit and that expose various kinds of constraints within the context and knowledge assumed by the analysis system. Semantic multimedia analysis tries to establish the links from the feature descriptors and the syntactic elements to the domain semantics. The goal of this thesis is to devise a mid-level representation framework for detecting semantics from broadcast video, using supervised and data-driven approaches to represent domain knowledge in a manner to facilitate inferencing, i.e., answering the questions asked by higher-level analysis. In our framework, we attempt to address three sub-problems: context-dependent feature extraction, semantic video shot classification, and integration of multimodal cues towards semantic analysis. We propose novel models for the representations of low-level multimedia features. We employ dominant modes in the feature space to characterize color and motion in a nonparametric manner. With the combined use of data-driven mode seeking and supervised learning, we are able to capture contextual information of broadcast video and yield semantically meaningful color and motion features. We present the novel concepts of semantic video shot classes towards an effective approach for reverse engineering of the broadcast video capturing and editing processes. Such concepts link the computational representations of low-level multimedia features with video shot size and the main subject within a shot in the broadcast video stream. The linking, subject to the domain constraints, is achieved by statistical learning. We develop solutions for detecting sports events and classifying commercial spots from broadcast video streams. This is realized by integrating multiple modalities, in particular text-based external resources. The alignment across modalities is based on semantic video shot classes. With multimodal mid-level representations, we are able to automatically extract rich semantics from sports programs and commercial spots, with promising accuracies. These findings demonstrate the potential of our framework of constructing mid-level representations to narrow the semantic gap, and it has a broad outlook in adapting to new content domains.
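The "data-driven mode seeking" used above to find dominant color and motion modes is typically a mean-shift procedure; the NumPy sketch below finds the dominant modes of a feature distribution with a Gaussian kernel. It is an illustrative toy, not the thesis's exact estimator:

```python
# Mean-shift mode seeking over feature vectors (illustrative sketch).
import numpy as np

def mean_shift(points, bandwidth=1.0, iters=30):
    modes = points.copy()
    for _ in range(iters):
        for i in range(len(modes)):
            # Gaussian weights of all points around the current estimate
            d2 = ((points - modes[i]) ** 2).sum(axis=1)
            w = np.exp(-d2 / (2 * bandwidth ** 2))
            modes[i] = (w[:, None] * points).sum(axis=0) / w.sum()
    return modes

rng = np.random.default_rng(3)
# two colour-like clusters; their centres are the dominant modes
data = np.vstack([rng.normal(0, 0.3, (50, 3)), rng.normal(2, 0.3, (50, 3))])
converged = mean_shift(data, bandwidth=0.5)
print(np.unique(converged.round(1), axis=0))   # roughly two distinct modes
```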
Lu, Hung-Tsung and 盧宏宗. "Semantic Retrieval of Personal Photos Using Multimodal Deep Autoencoder Fusing Visual and Speech Features". Thesis, 2017. http://ndltd.ncl.edu.tw/handle/58fvxy.
Mourão, André Belchior. "Towards an Architecture for Efficient Distributed Search of Multimodal Information". Doctoral thesis, 2018. http://hdl.handle.net/10362/38850.
Carvalho, José Ricardo de Abreu. "Pesquisa multimodal de imagens em dispositivos móveis". Master's thesis, 2021. http://hdl.handle.net/10400.13/3984.
Despite the evolution in the field of reverse image search, with algorithms becoming more robust and effective, there is still interest in improving search techniques and the user experience when searching for the images the user has in mind. The main goal of this work was to develop an application for mobile devices (smartphones) that allows the user to find images through multimodal inputs. Thus, this dissertation, in addition to proposing image search in different ways (keywords, drawing/sketching, and camera or device images), proposes that users can create an image themselves through drawing, or by editing/changing an existing image, receiving feedback at each change/interaction. Throughout the search experience, users can take the images found (which they find relevant) and improve the search by editing them, steering the results towards what they have in mind. The implementation of this proposal was based on the Google Cloud Vision API, responsible for obtaining the results, and the ATsketchkit framework, which enabled the creation of drawings, on Apple's iOS system. Tests were carried out with a set of users with different levels of experience in image search and different drawing abilities, allowing us to assess preferences among the different input methods, satisfaction with the retrieved images, and the usability of the prototype.
"Representation, Exploration, and Recommendation of Music Playlists". Master's thesis, 2019. http://hdl.handle.net/2286/R.I.54843.
He, Kun. "Learning deep embeddings by learning to rank". Thesis, 2018. https://hdl.handle.net/2144/34773.