Dissertations / Theses on the topic 'Retrieval-based learning'

Consult the top 50 dissertations / theses for your research on the topic 'Retrieval-based learning.'

1

Maleki-Dizaji, Saeedeh. "Evolutionary learning multi-agent based information retrieval systems." Thesis, Sheffield Hallam University, 2003. http://shura.shu.ac.uk/6856/.

Abstract:
The volume and variety of information available on the Internet has grown exponentially, making it difficult for users to obtain information that accurately matches their interests. Several factors affect how accurately retrieved documents match user interests. First, users often do not present queries to information retrieval systems in the form that optimally represents the information they want. Second, the measure of a document's relevance is highly subjective and varies between users. This thesis addresses the problem with an adaptive approach that relies on evolutionary user modelling. The proposed information retrieval system learns user needs from user-provided relevance feedback. The method combines a qualitative feedback measure obtained using fuzzy inference with quantitative feedback based on the fitness measures of evolutionary algorithms (genetic algorithms). Furthermore, the retrieval system follows a multi-agent design in order to handle the complexities of information retrieval, including document indexing, relevance feedback, user modelling, and filtering and ranking of the retrieved documents. The major contribution of this research is the combination of genetic algorithms and fuzzy relevance feedback for modelling adaptive behaviour, which is compared against conventional relevance feedback. Novel genetic algorithm operators are proposed within this textual retrieval context; the encoding and the vector space model for document representation are generalised within the same context.
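As a rough illustration of the adaptive mechanism described above, the sketch below evolves query term weights with a genetic algorithm whose fitness is computed from relevance feedback. The vocabulary, document vectors, and relevance judgments are invented for illustration and do not come from the thesis.

```python
# Sketch: evolving query term weights from relevance feedback with a
# simple genetic algorithm. Documents, judgments, and parameters are
# illustrative assumptions, not the thesis's actual setup.
import random

VOCAB = ["neural", "retrieval", "agent", "fuzzy", "genetic"]

# Toy document vectors (term weights) and user relevance judgments.
DOCS = {
    "d1": [0.9, 0.8, 0.1, 0.0, 0.2],
    "d2": [0.1, 0.9, 0.7, 0.3, 0.0],
    "d3": [0.0, 0.2, 0.1, 0.9, 0.8],
}
RELEVANT = {"d2", "d3"}  # feedback: documents the user marked relevant

def score(query, doc):
    """Dot-product similarity in the vector space model."""
    return sum(q * d for q, d in zip(query, doc))

def fitness(query):
    """Reward queries that rank the relevant documents at the top."""
    ranked = sorted(DOCS, key=lambda d: score(query, DOCS[d]), reverse=True)
    k = len(RELEVANT)
    return sum(1 for d in ranked[:k] if d in RELEVANT) / k

def evolve(pop_size=20, generations=30, mutation=0.2):
    pop = [[random.random() for _ in VOCAB] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(VOCAB))   # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < mutation:          # weight mutation
                child[random.randrange(len(VOCAB))] = random.random()
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print("evolved query weights:", [round(w, 2) for w in best])
```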
2

Wu, Mengjiao. "Retrieval-based Metacognitive Monitoring in Self-regulated Learning." Kent State University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=kent1532049448140424.

3

Chafik, Sanaa. "Machine learning techniques for content-based information retrieval." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLL008/document.

Abstract:
The amount of media data is growing rapidly with the fast growth of the Internet and media resources. Performing an efficient similarity (nearest neighbor) search in such a large collection of data is a very challenging problem that the scientific community has been attempting to tackle. One of the most promising solutions to this fundamental problem is Content-Based Media Retrieval (CBMR) systems: search systems that perform the retrieval task in large media databases based on the content of the data. CBMR systems consist essentially of three major units: a Data Representation unit for feature representation learning, a Multidimensional Indexing unit for structuring the resulting feature space, and a Nearest Neighbor Search unit to perform efficient search. Media data (image, text, audio, video, etc.) can be represented by meaningful numeric information (a multidimensional vector), called a Feature Description, describing the overall content of the input data. The task of the second unit is to structure the resulting feature descriptor space into an index structure, in which the third unit performs effective nearest neighbor search.

In this work, we address the problem of nearest neighbor search by proposing three Content-Based Media Retrieval approaches. All three are unsupervised, and thus can adapt to both labeled and unlabeled real-world datasets. They are based on a hashing indexing scheme to perform effective high-dimensional nearest neighbor search. Unlike most recent hashing approaches, which favor indexing in Hamming space, our proposed methods provide index structures adapted to a real-space mapping. Although Hamming-based hashing methods achieve a good accuracy-speed tradeoff, their accuracy drops owing to information loss during the binarization process. By contrast, real-space hashing approaches provide a more accurate approximation in the mapped real space, as they avoid hard binary approximations.

Our proposed approaches can be classified into shallow and deep approaches. In the former category, we propose two shallow hashing-based approaches, namely "Symmetries of the Cube Locality Sensitive Hashing" (SC-LSH) and "Cluster-based Data Oriented Hashing" (CDOH), based respectively on randomized hashing and shallow learning-to-hash schemes. The SC-LSH method provides a solution to the storage problem faced by most randomized hashing approaches: it uses a semi-random scheme that partially reduces the randomness, and thus the memory footprint, of randomized hashing, while maintaining its efficiency in structuring heterogeneous spaces. The CDOH approach eliminates the randomness effect by combining machine learning techniques with the hashing concept, and outperforms the randomized hashing approaches in terms of computation time, memory space, and search accuracy. The third approach is a deep learning-based hashing scheme, named "Unsupervised Deep Neuron-per-Neuron Hashing" (UDN2H), which indexes individually the output of each neuron of the top layer of a deep unsupervised model, namely a deep autoencoder, with the aim of capturing the high-level individual structure of each neuron output.

Our three approaches, SC-LSH, CDOH and UDN2H, were proposed sequentially as the thesis progressed, with an increasing level of complexity in the developed models and in the effectiveness and performance obtained on large real-world datasets.
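The binary-versus-real hashing trade-off the abstract describes can be seen in a minimal random-projection sketch; the data, code length, and projections below are illustrative assumptions, not the thesis's SC-LSH, CDOH, or UDN2H methods.

```python
# Sketch: random-projection hashing for approximate nearest-neighbor
# search, contrasting binary (Hamming) codes with real-valued codes.
# Data and dimensions are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))        # database of 64-d feature vectors
q = rng.normal(size=64)                # query descriptor
P = rng.normal(size=(64, 16))          # 16 random projection directions

binary_codes = (X @ P > 0).astype(np.uint8)  # binarization loses magnitude
real_codes = X @ P                           # real-valued codes keep it

def hamming_nn(codes, query_code):
    """Nearest neighbor by Hamming distance over binary codes."""
    return np.argmin((codes != query_code).sum(axis=1))

def euclidean_nn(codes, query_code):
    """Nearest neighbor by Euclidean distance over real-valued codes."""
    return np.argmin(np.linalg.norm(codes - query_code, axis=1))

qb = (q @ P > 0).astype(np.uint8)
print("binary-hash NN:", hamming_nn(binary_codes, qb))
print("real-hash   NN:", euclidean_nn(real_codes, q @ P))
print("exact       NN:", np.argmin(np.linalg.norm(X - q, axis=1)))
```

The real-valued codes typically agree with the exact search more often than the binary ones, which is the accuracy gap the abstract attributes to binarization.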
4

Govindarajan, Hariprasath. "Self-Supervised Representation Learning for Content Based Image Retrieval." Thesis, Linköpings universitet, Statistik och maskininlärning, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-166223.

Abstract:
Automotive technologies and fully autonomous driving have seen tremendous growth in recent times and have benefitted from extensive deep learning research. State-of-the-art deep learning methods are largely supervised and require labelled data for training. However, the annotation process for image data is time-consuming and costly in terms of human effort. It is therefore of interest to find informative samples for labelling by Content-Based Image Retrieval (CBIR). Generally, a CBIR method takes a query image as input and returns a set of images that are semantically similar to the query image. The retrieval is achieved by transforming images into feature representations in a latent space, where it is possible to reason about image similarity in terms of image content. In this thesis, a self-supervised method is developed to learn feature representations of road scene images. The method learns feature representations for images by adapting intermediate convolutional features from an existing deep Convolutional Neural Network (CNN). A contrastive approach based on Noise Contrastive Estimation (NCE) is used to train the feature learning model. For complex images like road scenes, where multiple image aspects can occur simultaneously, it is important to embed all the salient image aspects in the feature representation. To achieve this, the output feature representation is obtained as an ensemble of feature embeddings, each learned by focusing on different image aspects, and an attention mechanism is incorporated to encourage each ensemble member to focus on different aspects. For comparison, a self-supervised model without attention is considered, and a simple dimensionality reduction approach using SVD is treated as the baseline. The methods are evaluated on nine evaluation datasets using CBIR performance metrics. The datasets correspond to different image aspects and concern the images at different spatial levels: global, semi-global and local. The feature representations learned by the self-supervised methods are shown to perform better than the SVD approach. Given that no labelled data is required for training, learning representations for road scene images using self-supervised methods appears to be a promising direction. Using multiple query images to emphasise a query intention is investigated, and a clear improvement in CBIR performance is observed. It is inconclusive whether the addition of the attention mechanism impacts CBIR performance; the attention method shows some positive signs in qualitative analysis and performs better than the other methods on one evaluation dataset containing a local aspect. This method for learning feature representations is promising but requires further research involving more diverse and complex image aspects.
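As a sketch of the contrastive training signal mentioned above (not the thesis's exact NCE formulation, ensemble, or attention model), the following computes an InfoNCE-style loss over random stand-in embeddings of two augmented views of the same images.

```python
# Sketch: a contrastive (InfoNCE-style) objective of the kind used for
# self-supervised feature learning. Embeddings are random stand-ins.
import numpy as np

rng = np.random.default_rng(1)

def l2_normalise(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Embeddings of 8 images and of their augmented views (stand-ins for
# the outputs of the feature network on two views of each image).
anchor = l2_normalise(rng.normal(size=(8, 128)))
positive = l2_normalise(anchor + 0.1 * rng.normal(size=(8, 128)))

def info_nce(a, p, temperature=0.1):
    """Each anchor must identify its own positive among all candidates."""
    logits = a @ p.T / temperature                     # pairwise similarity
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))              # positives on diagonal

print(f"contrastive loss: {info_nce(anchor, positive):.4f}")
```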
5

Alzu’bi, Ahmad Gazi Suleiman. "Semantic content-based image retrieval using compact multifeatures and deep learning." Thesis, University of the West of Scotland, 2016. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.738480.

6

Chung, Kien Ping. "Intelligent content-based image retrieval framework based on semi-automated learning and historic profiles." PhD thesis, Murdoch University, 2007. http://wwwlib.murdoch.edu.au/adt/browse/view/adt-MU20070831.123947.

Abstract:
Over the last decade, storage of non-text-based data in databases has become an increasingly important trend in information management. Images in particular have been gaining popularity as an alternative, and sometimes more viable, option for information storage. While this presents a wealth of information, it also creates a great problem in retrieving appropriate and relevant information during searching. This has resulted in an enormous growth of interest, and much active research, into the extraction of relevant information from non-text-based databases. In particular, content-based image retrieval (CBIR) systems have been one of the most active areas of research. The retrieval principle of CBIR systems is based on visual features such as colour, texture, and shape, or on the semantic meaning of the images. To enhance retrieval speed, most CBIR systems pre-process the images stored in the database, because feature extraction algorithms are often computationally expensive. If images are to be retrieved from the World Wide Web (WWW), the raw images have to be downloaded and processed in real time, and feature extraction speed becomes crucial. Ideally, systems should only use those feature extraction algorithms that are best suited for analysing the visual features capturing the common relationship between the images at hand. In this thesis, a feature selection framework based on statistical discriminant analysis is proposed. The framework selects the most appropriate visual feature extraction algorithms using relevance feedback on only the user-labelled samples: a small image sample group is used to analyse the appropriateness of each visual feature, and only the selected features are used for image comparison and ranking. With fewer features, retrieval speed improves, and experimental results show that retrieval accuracy for small sample data also improves. Intelligent E-Business is used as a case study to demonstrate the potential of the framework in image retrieval applications. In addition, an inter-query framework, also based on statistical discriminant analysis, is proposed. A common inter-query approach in CBIR systems is the term-document approach, which treats each image's name or address as a term and the query session as a document. However, this technique scales poorly as the number of stored queries increases, and it is not appropriate for a dynamic image database environment. The proposed inter-query framework instead uses a clustering approach to capture the visual properties common to previously stored queries, so it is not necessary to "memorise" the name or address of the images. To manage the size of the user's profile, the framework also introduces a merging approach that combines clusters that are close by and similar in their characteristics. Experiments have shown that the proposed framework outperforms the short-term learning approach, and it eliminates the burden of the complex database maintenance strategies required by the term-document approach commonly used in inter-query learning. Lastly, the proposed inter-query learning framework is further extended with a new semantic structure, which connects previous queries both visually and semantically. This structure gives the system the ability to retrieve images that are semantically similar yet visually different; an active learning strategy is incorporated for exploring the structure. Experiments have again shown that this extended framework outperforms the previous one.
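A minimal sketch of discriminant-analysis-style feature selection from a few user-labelled samples, assuming invented feature vectors; the thesis's actual statistical procedure may differ.

```python
# Sketch: score each feature by between-class separation over within-class
# spread (a Fisher-style criterion) using a handful of relevance-feedback
# labels, then keep only the most discriminative features for ranking.
import numpy as np

rng = np.random.default_rng(2)
relevant = rng.normal(loc=1.0, size=(5, 10))      # 5 relevant images, 10 features
non_relevant = rng.normal(loc=0.0, size=(5, 10))  # 5 non-relevant images

def fisher_scores(pos, neg):
    """Per-feature discriminability: (mean gap)^2 / summed variances."""
    num = (pos.mean(axis=0) - neg.mean(axis=0)) ** 2
    den = pos.var(axis=0) + neg.var(axis=0) + 1e-12
    return num / den

scores = fisher_scores(relevant, non_relevant)
keep = np.argsort(scores)[::-1][:3]   # retain the 3 most discriminative features
print("selected feature indices:", keep)
```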
7

Wu, Zutao. "Kmer-based sequence representations for fast retrieval and comparison." Thesis, Queensland University of Technology, 2017. https://eprints.qut.edu.au/103083/1/Zutao_Wu_Thesis.pdf.

Abstract:
This thesis presents a study of alignment-free methods for genetic sequence comparison. By using representations based on k-mers (short subsequences of length k), sequence similarity can be measured rapidly and accurately by calculating the distance between these paired representations. This research utilises and adapts conventional methods from information retrieval to generate novel representations for k-mers and sequence fragments. Precision was further improved through the use of machine learning approaches, especially neural networks, to learn relationships between k-mers and to generate enhanced sequence representations. These approaches have applications in large-scale sequence comparison, especially in the analysis of metagenomic samples.
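A minimal sketch of the k-mer idea, assuming toy sequences and k = 3: count overlapping k-mers and compare the resulting profiles with cosine similarity.

```python
# Sketch: alignment-free sequence comparison via k-mer count vectors and
# cosine similarity. Sequences and the choice of k are illustrative.
from collections import Counter
from math import sqrt

def kmer_profile(seq, k=3):
    """Count all overlapping k-mers in a DNA sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def cosine(p, q):
    """Cosine similarity between two sparse k-mer count profiles."""
    shared = set(p) & set(q)
    dot = sum(p[m] * q[m] for m in shared)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

a = kmer_profile("ACGTACGTGACG")
b = kmer_profile("ACGTTCGTGACG")
print("k-mer cosine similarity:", round(cosine(a, b), 3))
```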
8

Shevchuk, Danylo. "Audio Moment Retrieval based on Natural Language Query." Thesis, Blekinge Tekniska Högskola, Institutionen för datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-20094.

Abstract:
Background. Users spend a lot of time searching through media content to find a desired fragment. Most of the time people can describe verbally what they are looking for, but today that description is of little use. Using such a verbal description as a query to search for the right interval in a given audio sample would save people a lot of time.

Objectives. The aim of this thesis is to compare the performance of methods suitable for retrieving desired intervals from audio of arbitrary length using a natural language query. There are two objectives: first, to train models that match a natural language input to a specific interval of a given soundtrack; second, to evaluate the models' performance using conventional metrics.

Methods. A mixed research method is used. Literature on existing methods suitable for audio classification was reviewed, and three models were selected for the experiments: YamNet, AlexNet and ResNet-50. Two experiments were conducted: the first measured the models' performance on classifying audio samples, and the second measured the same models' performance on the audio interval retrieval problem, which uses classification as part of the approach. The steps taken to conduct the experiments, including data collection, data preprocessing, model training and performance evaluation, are reported along with the resulting statistics.

Results. The two tests show which model performs better on the two separate problems: audio classification, and interval retrieval based on a natural language query. The degree (performance-wise) to which a natural language query can be matched to a corresponding interval of audio of arbitrary length was calculated for each of the selected models. The aggregated performance of the models is mostly comparable, with YamNet occasionally outperforming the other two. The average Area Under the Curve and Accuracy for the studied models are (67, 71.62), (68.99, 67.72) and (66.59, 71.93) for YamNet, AlexNet and ResNet-50, respectively.

Conclusions. The tested models were not capable of retrieving intervals from audio of arbitrary length based on a natural language query; however, the degree to which the models are able to retrieve intervals varies depending on the queried keyword and other hyperparameters, such as the threshold used to filter out audio patches that yield too low a probability for the queried class.
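A minimal sketch of the interval-retrieval procedure the abstract describes, patch classification followed by probability thresholding; the classifier below is a random stub standing in for a model such as YamNet, and all parameters are illustrative assumptions.

```python
# Sketch: retrieving time intervals from an audio stream by classifying
# fixed-length patches and thresholding the probability of the queried
# class, then merging overlapping hits into intervals.
import numpy as np

rng = np.random.default_rng(3)

def classify_patch(patch, query_class):
    """Stand-in for a trained classifier returning P(class | patch)."""
    return rng.random()   # hypothetical probability

def retrieve_intervals(audio, patch_len=16000, hop=8000,
                       query_class="dog_bark", threshold=0.7):
    """Slide over the waveform; keep patches scoring above the threshold."""
    starts = range(0, len(audio) - patch_len + 1, hop)
    hits = [s for s in starts
            if classify_patch(audio[s:s + patch_len], query_class) >= threshold]
    intervals, cur = [], None
    for s in hits:                     # merge adjacent/overlapping hits
        if cur and s <= cur[1]:
            cur[1] = s + patch_len
        else:
            if cur:
                intervals.append(tuple(cur))
            cur = [s, s + patch_len]
    if cur:
        intervals.append(tuple(cur))
    return intervals

audio = rng.normal(size=160000)        # ~10 s of fake 16 kHz audio
print("retrieved sample intervals:", retrieve_intervals(audio))
```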
9

Marcén Terraza, Ana Cristina. "Design of a Machine Learning-based Approach for Fragment Retrieval on Models." Doctoral thesis, Universitat Politècnica de València, 2021. http://hdl.handle.net/10251/158617. https://doi.org/10.4995/Thesis/10251/158617.

Abstract:
Machine Learning (ML) is the branch of artificial intelligence that gathers statistical, probabilistic, and optimization algorithms which learn empirically. ML can exploit the knowledge and the experience that have been generated for years to automatically perform different processes, and it has therefore been applied to a wide range of research areas, from medicine to software engineering. In fact, in the software engineering field, up to 80% of a system's lifetime is spent on the maintenance and evolution of the system. The companies that have been developing these software systems for a long time have gathered a huge amount of knowledge and experience, so ML is an attractive solution for reducing their maintenance costs by exploiting the gathered resources. Specifically, Traceability Link Recovery, Bug Localization, and Feature Location are amongst the most common and relevant tasks when maintaining software products. To tackle these tasks, researchers have proposed a number of approaches. However, most research focuses on traditional methods, such as Latent Semantic Indexing, which do not exploit the gathered resources, and most research targets code, neglecting other software artifacts such as models. In this dissertation, we present an ML-based approach for fragment retrieval on models (FRAME). The goal of this approach is to retrieve the model fragment which best realizes a specific query in a model. This allows engineers to retrieve the model fragment that must be traced, fixed, or located for software maintenance. Specifically, the FRAME approach combines evolutionary computation and ML techniques: an evolutionary algorithm is guided by ML to effectively extract model fragments from a model, and these fragments are then assessed through ML techniques. To learn how to assess them, the ML techniques take advantage of the companies' knowledge (previously retrieved model fragments) and experience. Based on what was learned, the ML techniques determine which model fragment best realizes a query. However, model fragments are not understandable to most ML techniques, so the proposed approach encodes them through an ontological evolutionary encoding. In short, the FRAME approach is designed to extract model fragments, encode them, and assess which one best realizes a specific query. The approach has been evaluated on an industrial case from our partner (CAF, an international provider of railway solutions) and compared to the most common and recent approaches. The results show that the FRAME approach achieved the best results for most performance indicators, providing a mean precision of 59.91%, a mean recall of 78.95%, a mean F-measure of 62.50%, and a mean MCC (Matthews correlation coefficient) of 0.64. Leveraging retrieved model fragments, the FRAME approach is less sensitive to tacit knowledge and vocabulary mismatch than approaches based on semantic information. However, the approach is limited by the availability of retrieved model fragments to perform the learning. These aspects are discussed further, after a statistical analysis of the results that assesses the magnitude of the improvement in comparison to the other approaches.
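A toy sketch of the FRAME idea, an evolutionary search over model fragments guided by a learned scorer; the model elements, encoding, and scoring stub below are invented stand-ins, not the thesis's ontological evolutionary encoding or trained assessor.

```python
# Sketch: evolve candidate model fragments; a learned scorer (trivial
# stand-in here) judges how well each fragment realises a query.
import random

MODEL_ELEMENTS = ["Door", "DoorSensor", "Controller", "Brake", "HVAC", "Light"]
QUERY = {"door", "sensor"}   # hypothetical query terms

def encode(fragment):
    """Toy encoding: bag of lower-cased element names."""
    return {e.lower() for e in fragment}

def learned_score(fragment, query):
    """Stand-in for the trained assessor: query coverage by the encoding."""
    enc = encode(fragment)
    return sum(any(term in e for e in enc) for term in query)

def evolve_fragment(generations=40, pop_size=12):
    pop = [random.sample(MODEL_ELEMENTS, k=random.randint(1, 3))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda f: learned_score(f, QUERY), reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        for parent in survivors:
            child = list(parent)
            if random.random() < 0.5:            # mutation: add an element
                child.append(random.choice(MODEL_ELEMENTS))
            elif len(child) > 1:                 # mutation: drop an element
                child.pop(random.randrange(len(child)))
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda f: learned_score(f, QUERY))

print("best fragment for query:", evolve_fragment())
```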
10

Osodo, Jennifer Akinyi. "An extended vector-based information retrieval system to retrieve e-learning content based on learner models." Thesis, University of Sunderland, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.542053.

11

Tang, Siu-shing. "Integrating distance function learning and support vector machine for content-based image retrieval." Thesis, Hong Kong University of Science and Technology, 2006. http://library.ust.hk/cgi/db/thesis.pl?CSED%202006%20TANG.

12

Dey, Sounak. "Mapping between Images and Conceptual Spaces: Sketch-based Image Retrieval." Doctoral thesis, Universitat Autònoma de Barcelona, 2020. http://hdl.handle.net/10803/671082.

Abstract:
The deluge of visual content on the Internet, from user-generated content to commercial image collections, motivates intuitive new methods for searching digital image content: how can we find certain images in a database of millions? Sketch-based image retrieval (SBIR) is an emerging research topic in which a free-hand drawing can be used to visually query photographic images. SBIR is aligned to emerging trends for visual content consumption on mobile touch-screen devices, for which gestural interactions such as sketch are a natural alternative to textual input. This thesis presents several contributions to the SBIR literature. First, we propose a cross-modal learning framework that maps both sketches and text into a joint embedding space invariant to depictive style, while preserving semantics. The resulting embedding enables direct comparison and search between sketches/text and images and is based upon a multi-branch convolutional neural network (CNN) trained using unique training schemes. The deeply learned embedding is shown to yield state-of-the-art retrieval performance on several SBIR benchmarks. Second, we propose an approach for multi-modal image retrieval in multi-labelled images. A multi-modal deep network architecture is formulated to jointly model sketches and text as input query modalities in a common embedding space, which is then further aligned with the image feature space. Our architecture also relies on salient object detection through a supervised LSTM-based visual attention model learned from convolutional features. Both the alignment between the queries and the image and the supervision of the attention on the images are obtained by generalizing the Hungarian Algorithm using different loss functions. This permits encoding the object-based features and their alignment with the query irrespective of the availability of co-occurring objects in the training set. We validate the performance of our approach on standard single/multi-object datasets, showing state-of-the-art performance in every SBIR dataset. Third, we investigate the problem of zero-shot sketch-based image retrieval (ZS-SBIR), where human sketches are used as queries to conduct retrieval of photos from unseen categories. We importantly advance prior art by proposing a novel ZS-SBIR scenario that represents a firm step forward in its practical application. The new setting uniquely recognizes two important yet often neglected challenges of practical ZS-SBIR: (i) the large domain gap between amateur sketch and photo, and (ii) the necessity of moving towards large-scale retrieval. We first contribute to the community a novel ZS-SBIR dataset, QuickDraw-Extended, consisting of 330,000 sketches and 204,000 photos spanning 110 categories. Highly abstract amateur human sketches are purposefully sourced to maximize the domain gap, instead of ones included in existing datasets, which can often be semi-photorealistic. We then formulate a ZS-SBIR framework to jointly model sketches and photos in a common embedding space. A novel strategy to mine the mutual information among domains is specifically engineered to alleviate the domain gap, and external semantic knowledge is further embedded to aid semantic transfer. We show that, rather surprisingly, state-of-the-art retrieval performance on existing datasets can already be achieved using a reduced version of our model. We further demonstrate the superior performance of our full model by comparing with a number of alternatives on the newly proposed dataset.
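A minimal sketch of learning a joint sketch-photo embedding with a triplet-style objective, using random stand-ins for CNN embeddings; the thesis's multi-branch networks, losses, and training schemes are not reproduced.

```python
# Sketch: a triplet objective pulling a sketch towards its matching photo
# and pushing it away from a non-matching one in a shared embedding space.
import numpy as np

rng = np.random.default_rng(4)

def l2n(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

sketch = l2n(rng.normal(size=(6, 64)))                     # sketch embeddings
photo_pos = l2n(sketch + 0.1 * rng.normal(size=(6, 64)))   # matching photos
photo_neg = l2n(rng.normal(size=(6, 64)))                  # non-matching photos

def triplet_loss(a, p, n, margin=0.2):
    """Hinge on (distance to positive) - (distance to negative) + margin."""
    d_pos = np.linalg.norm(a - p, axis=1)
    d_neg = np.linalg.norm(a - n, axis=1)
    return np.maximum(0.0, d_pos - d_neg + margin).mean()

print("triplet loss:", round(float(triplet_loss(sketch, photo_pos, photo_neg)), 4))
```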
13

Mansjur, Dwi Sianto. "Statistical pattern recognition approaches for retrieval-based machine translation systems." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/42821.

Abstract:
This dissertation addresses the problem of Machine Translation (MT), which is defined as an automated translation of a document written in one language (the source language) to another (the target language) by a computer. The MT task requires various types of knowledge of both the source and target language, e.g., linguistic rules and linguistic exceptions. Traditional MT systems rely on an extensive parsing strategy to decode the linguistic rules and use a knowledge base to encode those linguistic exceptions. However, the construction of the knowledge base becomes an issue as the translation system grows. To overcome this difficulty, real translation examples are used instead of a manually-crafted knowledge base. This design strategy is known as the Example-Based Machine Translation (EBMT) principle. Traditional EBMT systems utilize a database of word or phrase translation pairs. The main challenge of this approach is the difficulty of combining the word or phrase translation units into a meaningful and fluent target text. A novel Retrieval-Based Machine Translation (RBMT) system, which uses a sentence-level translation unit, is proposed in this study. An advantage of using the sentence-level translation unit is that the boundary of a sentence is explicitly defined and the semantic, or meaning, is precise in both the source and target language. The main challenge of using a sentential translation unit is the limited coverage, i.e., the difficulty of finding an exact match between a user query and sentences in the source database. Using an electronic dictionary and a topic modeling procedure, we develop a procedure to obtain clusters of sensible variations for each example in the source database. The coverage of our MT system improves because an input query text is matched against a cluster of sensible variations of translation examples instead of being matched against an original source example. In addition, pattern recognition techniques are used to improve the matching procedure, i.e., the design of optimal pattern classifiers and the incorporation of subjective judgments. A high performance statistical pattern classifier is used to identify the target sentences from an input query sentence in our MT system. The proposed classifier is different from the conventional classifier in terms of the way it addresses the generalization capability. A conventional classifier addresses the generalization issue using the parsimony principle and may encounter the possibility of choosing an oversimplified statistical model. The proposed classifier directly addresses the generalization issue in terms of training (empirical) data. Our classifier is expected to generalize better than the conventional classifiers because our classifier is less likely to use over-simplified statistical models based on the available training data. We further improve the matching procedure by the incorporation of subjective judgments. We formulate a novel cost function that combines subjective judgments and the degree of matching between translation examples and an input query. In addition, we provide an optimization strategy for the novel cost function so that the statistical model can be optimized according to the subjective judgments.
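A toy sketch of sentence-level retrieval-based translation: match an input sentence against stored source sentences and return the paired target. The three-pair example base and surface similarity measure are illustrative assumptions; the thesis instead uses statistical pattern classifiers and dictionary/topic-model-derived clusters of sensible variations to widen coverage.

```python
# Sketch: retrieval-based MT with sentence-level translation units.
# The example base and similarity measure are hypothetical stand-ins.
from difflib import SequenceMatcher

EXAMPLE_BASE = [
    ("where is the train station", "où est la gare"),
    ("how much does this cost", "combien ça coûte"),
    ("i would like a coffee", "je voudrais un café"),
]

def similarity(a, b):
    """Surface similarity; a real system would also match against clusters
    of sensible variations of each stored source sentence."""
    return SequenceMatcher(None, a, b).ratio()

def translate(query):
    """Return the target sentence paired with the best-matching source."""
    src, tgt = max(EXAMPLE_BASE, key=lambda pair: similarity(query, pair[0]))
    return tgt, similarity(query, src)

target, conf = translate("where is the station")
print(f"translation: {target!r} (match confidence {conf:.2f})")
```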
APA, Harvard, Vancouver, ISO, and other styles
16

Linckels, Serge. "An e-librarian service : supporting explorative learning by a description logics based semantic retrieval tool." Phd thesis, Universität Potsdam, 2008. http://opus.kobv.de/ubp/volltexte/2008/1745/.

Full text
Abstract:
Although educational content in electronic form is increasing dramatically, its usage in educational environments remains poor, mainly because there is too much unreliable, redundant, and irrelevant information. Finding appropriate answers is a difficult task that relies on the user to filter the pertinent information from the noise. Turning knowledge bases like the online tele-TASK archive into useful educational resources requires identifying correct, reliable, and "machine-understandable" information, as well as developing simple but efficient search tools with the ability to reason over this information. Our vision is to create an E-Librarian Service which is able to retrieve multimedia resources from a knowledge base more efficiently than by browsing through an index or by using a simple keyword search. In our E-Librarian Service, the user can enter his or her question in a very simple and human way: in natural language (NL). Our premise is that more pertinent results would be retrieved if the search engine understood the sense of the user's query. The returned results are then logical consequences of an inference rather than of keyword matching. Our E-Librarian Service does not return the answer to the user's question; rather, it retrieves the most pertinent document(s), in which the user finds the answer to his or her question. Among all the documents that have some information in common with the user query, our E-Librarian Service identifies the most pertinent match(es), keeping in mind that the user expects an exhaustive answer while preferring a concise answer with little or no information overhead. Also, our E-Librarian Service always proposes a solution to the user, even if the system concludes that there is no exhaustive answer. Our E-Librarian Service was implemented prototypically in three different educational tools. The first prototype is CHESt (Computer History Expert System); it has a knowledge base with 300 multimedia clips that cover the main events in computer history. The second prototype is MatES (Mathematics Expert System); it has a knowledge base with 115 clips that cover the topic of fractions in mathematics for secondary school with respect to the official school programme. All clips were recorded mainly by pupils. The third and most advanced prototype is the "Lecture Butler's E-Librarian Service"; it has a Web service interface to conform to a service-oriented architecture (SOA), and was developed in the context of the Web-University project at the Hasso-Plattner-Institute (HPI). Two major experiments in an educational environment, at the Lycée Technique Esch/Alzette in Luxembourg, were conducted to test the pertinence and reliability of our E-Librarian Service as a complement to traditional courses. The first experiment (in 2005) was made with CHESt in different classes and covered a single lesson. The second experiment (in 2006) covered a period of six weeks of intensive use of MatES in one class. There was no classical mathematics lesson in which the teacher gave explanations; the students had to learn in an autonomous and exploratory way, asking questions of the E-Librarian Service just as they would a human teacher.
Tests showed that, among all the documents in a given knowledge base, the E-Librarian Service finds those that semantically best match the user's query. The system is also able to quantify and illustrate for the user the quality and pertinence of the delivered answers. Linguistic information and a given context in the form of an ontology are used for the semantic translation of the user's input into a logical form. The two classroom experiments yielded the following main results. First, pupils generally accept entering whole questions, instead of keywords, when this helps them obtain better search results. Second, and most importantly, school results can improve when pupils use the E-Librarian Service: we measured an overall improvement of 5% in school results, with 50% of the pupils improving their grades, 41% of them considerably. One of the main reasons for these positive results is that the pupils were more motivated and consequently prepared to invest more effort and diligence into learning and acquiring new knowledge.
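The retrieval-as-inference idea can be illustrated with a tiny sketch, under strong simplifications: the real system maps natural-language questions into Description Logics and uses a reasoner, whereas here entailment is reduced to set inclusion over hand-made concept labels, and the ontology and clip annotations are hypothetical.

```python
# A toy illustration of retrieval as logical consequence rather than
# keyword match. Subsumption is reduced to set inclusion over concept
# labels; the ontology and clip index are invented for the example.

# Hypothetical ontology: concept -> its super-concepts.
SUPER = {"transistor": {"hardware"}, "eniac": {"computer", "hardware"},
         "fortran": {"language", "software"}}

def expand(concepts):
    """Close a concept set under the super-concept relation."""
    closed = set(concepts)
    for c in concepts:
        closed |= SUPER.get(c, set())
    return closed

clips = {
    "clip1": expand({"eniac"}),
    "clip2": expand({"fortran"}),
}

def answer(query_concepts):
    """Return clips whose (expanded) annotation entails the query."""
    q = set(query_concepts)
    return [c for c, ann in clips.items() if q <= ann]

print(answer({"hardware"}))  # -> ['clip1']: ENIAC is (inferred) hardware
```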
APA, Harvard, Vancouver, ISO, and other styles
17

Azuaje, Francisco Javier. "An unsupervised neural learning approach to retrieval strategies for case-based reasoning and decision support." Thesis, University of Ulster, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.311877.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Egli, Sebastian [Verfasser], and Jörg [Akademischer Betreuer] Bendix. "Satellite-Based Fog Detection: A Dynamic Retrieval Method for Europe Based on Machine Learning / Sebastian Egli ; Betreuer: Jörg Bendix." Marburg : Philipps-Universität Marburg, 2019. http://d-nb.info/1187443476/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Turner, Sarah J. "Why Do College Students Improve their Learning Performance Across Trials?" Kent State University Honors College / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=ksuhonors1334854997.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Rossi, Alex. "Self-supervised information retrieval: a novel approach based on Deep Metric Learning and Neural Language Models." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.

Find full text
Abstract:
Most existing open-source search engines utilize keyword or tf-idf based techniques to find documents and web pages relevant to an input query. Although these methods, with the help of a page-rank algorithm or knowledge graphs, have proved effective in some cases, they often fail to retrieve relevant instances for more complicated queries that require semantic understanding. In this thesis, a self-supervised information retrieval system based on transformers is employed to build a semantic search engine over the library of the Gruppo Maggioli company. Semantic search, or search with meaning, refers to understanding the query instead of simply finding word matches and, in general, represents knowledge in a way suitable for retrieval. We chose to investigate a new self-supervised strategy for handling the training of unlabeled data, based on the creation of pairs of 'artificial' queries and the respective positive passages. We claim that by removing the reliance on labeled data, we may use the large volume of unlabeled material on the web without being limited to languages or domains where labeled data is abundant.
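A minimal sketch of the pair-creation step is given below, under an assumed heuristic: the "artificial query" is simply a passage's highest-weighted TF-IDF terms. The thesis trains a transformer retriever on such (query, positive passage) pairs; only the self-supervised pair generation is illustrated here.

```python
# A minimal sketch of building (artificial query, positive passage)
# training pairs without labels. Top TF-IDF terms stand in for whatever
# query-generation method the thesis actually uses.
from sklearn.feature_extraction.text import TfidfVectorizer

passages = [
    "the court ruled that the municipal tax exemption applies to schools",
    "new regulations define waste disposal duties for local authorities",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(passages)
terms = vec.get_feature_names_out()

pairs = []
for i, passage in enumerate(passages):
    row = X[i].toarray().ravel()
    top = row.argsort()[-3:][::-1]          # 3 highest-weighted terms
    query = " ".join(terms[j] for j in top)
    pairs.append((query, passage))          # positive training pair

for q, p in pairs:
    print(f"query: {q!r} -> positive: {p[:40]}...")
```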
APA, Harvard, Vancouver, ISO, and other styles
21

Chang, Ran. "Effective Graph-Based Content-Based Image Retrieval Systems for Large-Scale and Small-Scale Image Databases." DigitalCommons@USU, 2013. https://digitalcommons.usu.edu/etd/2123.

Full text
Abstract:
This dissertation proposes two novel manifold graph-based ranking systems for Content-Based Image Retrieval (CBIR). The two proposed systems exploit the synergy between relevance feedback-based transductive short-term learning and semantic feature-based long-term learning to improve retrieval performance. The proposed systems first apply an active learning mechanism to construct users' relevance feedback logs and extract high-level semantic features for each image. They then create manifold graphs incorporating both low-level visual similarity and high-level semantic similarity to achieve a more meaningful structure for the image space. Finally, asymmetric relevance vectors are created to propagate relevance scores from labeled images to unlabeled images via the manifold graphs. Extensive experimental results demonstrate that the two proposed systems outperform other state-of-the-art CBIR systems in the context of both correct and erroneous user feedback.
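The score-propagation step can be sketched with the standard manifold-ranking iteration on a toy affinity graph. Random features and a Gaussian kernel are assumed stand-ins for the combined visual/semantic similarities the dissertation constructs.

```python
# A minimal sketch of manifold ranking: relevance of a labeled image is
# propagated over a normalized affinity graph. Features are random toy
# data, not the low-level/semantic features used in the dissertation.
import numpy as np

rng = np.random.default_rng(0)
feats = rng.random((6, 8))                   # 6 images, 8-dim toy features
d = ((feats[:, None] - feats[None]) ** 2).sum(-1)
W = np.exp(-d / d.mean())                    # Gaussian affinity
np.fill_diagonal(W, 0)
Dinv = np.diag(1 / np.sqrt(W.sum(1)))
S = Dinv @ W @ Dinv                          # symmetrically normalized graph

y = np.zeros(6)
y[0] = 1                                     # image 0 marked relevant
f, alpha = y.copy(), 0.9
for _ in range(50):                          # iterate f = a*S*f + (1-a)*y
    f = alpha * S @ f + (1 - alpha) * y

print(np.argsort(-f))                        # images ranked by propagated score
```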
APA, Harvard, Vancouver, ISO, and other styles
22

Belkacem, Thiziri. "Neural models for information retrieval : towards asymmetry sensitive approaches based on attention models." Thesis, Toulouse 3, 2019. http://www.theses.fr/2019TOU30167.

Full text
Abstract:
This work is situated in the context of information retrieval (IR) using machine learning (ML) and deep learning (DL) techniques. It concerns tasks requiring text matching, such as ad-hoc retrieval, question answering and paraphrase identification. The objective of this thesis is to propose new approaches, using DL methods, for constructing semantic text-matching models that overcome the vocabulary-mismatch problems of the classical bag-of-words (BoW) representations used in traditional IR models. Indeed, traditional text-matching methods are based on the BoW representation, which considers a given text as a set of independent words; matching two text sequences then relies on exact matches between words. The main limitation of this approach is vocabulary mismatch. This problem occurs when the text sequences to be matched do not use the same vocabulary, even if their subjects are related. For example, the query may contain several words that are not necessarily used in the documents of the collection, including the relevant documents. BoW representations also ignore several aspects of a text sequence, such as the structure of the text and the context of words. These characteristics are important and make it possible to differentiate two texts that use the same words but express different information. Another problem in text matching is related to document length. The relevant parts can be distributed in different ways across the documents of a collection. This is especially true of long documents, which tend to cover a large number of topics and use a varied vocabulary; a long document may thus contain several relevant passages that a matching model must capture. Unlike long documents, short documents are likely to concern a specific subject and tend to contain a more restricted vocabulary, so assessing their relevance is in principle simpler than assessing that of longer documents. In this thesis, we propose different contributions, each addressing one of the above issues. First, in order to solve the vocabulary-mismatch problem, we used distributed representations of words (word embeddings) to allow semantic matching between different words. These representations have been used in IR applications where document/query similarity is computed by comparing all the term vectors of the query with all the term vectors of the document, indiscriminately. Unlike the models proposed in the state of the art, we studied the impact of query terms regarding their presence or absence in a document, and we adopted different document/query matching strategies. The intuition is that the absence of query terms from relevant documents is in itself a useful signal to take into account in the matching process. Indeed, these terms do not appear in the documents of the collection for two possible reasons: either their synonyms have been used, or they are not part of the context of the documents in question. The methods we propose make it possible, on the one hand, to perform inexact matching between the document and the query, and on the other hand, to evaluate the impact of the different query terms in the matching process. Although the use of word embeddings allows semantic matching between different text sequences, these representations combined with classical matching models still consider the text as a list of independent elements (a bag of vectors instead of a bag of words).
However, the structure of the text and the order of the words are important: any change in either alters the information expressed. To address this problem, neural models were used in text matching.
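The presence/absence intuition can be sketched as follows, with toy word vectors: a query term found verbatim in the document contributes an exact-match score, while an absent term contributes its best down-weighted semantic match. The embeddings and the down-weighting factor are assumptions for illustration, not the thesis's trained models.

```python
# A minimal sketch of embedding-based matching that treats present and
# absent query terms differently. The 3-d "embeddings" and the
# absent-term weight are invented for the example.
import numpy as np

emb = {  # hypothetical word embeddings
    "car": np.array([1.0, 0.1, 0.0]), "vehicle": np.array([0.9, 0.2, 0.1]),
    "insurance": np.array([0.0, 1.0, 0.2]), "policy": np.array([0.1, 0.9, 0.3]),
}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def score(query, doc, absent_weight=0.7):
    s = 0.0
    for q in query:
        if q in doc:
            s += 1.0                          # exact match signal
        else:                                 # semantic match, down-weighted
            s += absent_weight * max(cos(emb[q], emb[d]) for d in doc)
    return s / len(query)

print(score(["car", "insurance"], ["vehicle", "policy"]))
```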
APA, Harvard, Vancouver, ISO, and other styles
23

Wilhelm-Stein, Thomas. "Information Retrieval in der Lehre." Doctoral thesis, Universitätsbibliothek Chemnitz, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:ch1-qucosa-199778.

Full text
Abstract:
Information retrieval has achieved great significance, particularly in the form of Internet search engines. Retrieval systems are used in a variety of search scenarios, including corporate support databases, but also for the organization of personal e-mail. A current challenge is to determine and predict the performance of the individual components of these retrieval systems, in particular the complex interactions between them. Professionals are needed for the implementation and configuration of retrieval systems and retrieval components. Using the web-based learning application Xtrieval Web Lab, students can gain practical knowledge about the information retrieval process by assembling retrieval components into a retrieval system and evaluating it, without having to use a programming language. Game mechanics guide the students in their discovery process, motivate them, and prevent information overload by partitioning the learning content.
APA, Harvard, Vancouver, ISO, and other styles
24

Yuee, Liu. "Ontology-based image annotation." Thesis, Queensland University of Technology, 2010. https://eprints.qut.edu.au/39611/1/Liu_Yuee_Thesis.pdf.

Full text
Abstract:
With regard to the long-standing problem of the semantic gap between low-level image features and high-level human knowledge, the image retrieval community has recently shifted its emphasis from low-level feature analysis to high-level image semantics extraction. User studies reveal that users tend to seek information using high-level semantics. Therefore, image semantics extraction is of great importance to content-based image retrieval because it allows users to freely express what images they want. Semantic content annotation is the basis for semantic content retrieval. The aim of image annotation is to automatically obtain keywords that can be used to represent the content of images. The major research challenges in image semantic annotation are: what is the basic unit of semantic representation? how can the semantic unit be linked to high-level image knowledge? how can contextual information be stored and utilized for image annotation? In this thesis, Semantic Web technology (i.e. ontology) is introduced to the image semantic annotation problem. The Semantic Web, the next-generation web, aims at making the content of whatever type of media understandable not only to humans but also to machines. Due to the large amounts of multimedia data prevalent on the Web, researchers and industries are beginning to pay more attention to the Multimedia Semantic Web. Semantic Web technology provides a new opportunity for multimedia-based applications, but research in this area is still in its infancy. Whether ontology can be used to improve image annotation, and how best to use ontology in semantic representation and extraction, is still a worthwhile investigation. This thesis deals with the problem of image semantic annotation using ontology and machine learning techniques in four phases, as below. 1) Salient object extraction. A salient object serves as the basic unit in image semantic extraction as it captures the common visual properties of the objects. Image segmentation is often used as the first step for detecting salient objects, but most segmentation algorithms often fail to generate meaningful regions due to over-segmentation and under-segmentation. We develop a new salient object detection algorithm by combining multiple homogeneity criteria in a region-merging framework. 2) Ontology construction. Since real-world objects tend to exist in a context within their environment, contextual information has been increasingly used for improving object recognition. In the ontology construction phase, visual-contextual ontologies are built from a large set of fully segmented and annotated images. The ontologies are composed of several types of concepts (i.e. mid-level and high-level concepts) and domain contextual knowledge. The visual-contextual ontologies stand as a user-friendly interface between low-level features and high-level concepts. 3) Image object annotation. In this phase, each object is labelled with a mid-level concept from the ontologies. First, a set of candidate labels is obtained by training Support Vector Machines with features extracted from salient objects. After that, contextual knowledge contained in the ontologies is used to obtain the final labels by removing ambiguous concepts. 4) Scene semantic annotation. The scene semantic extraction phase determines the scene type using both mid-level concepts and domain contextual knowledge in the ontologies.
Domain contextual knowledge is used to create a scene configuration that describes which objects co-exist with which scene type more frequently. The scene configuration is represented in a probabilistic graph model, and probabilistic inference is employed to calculate the scene type given an annotated image. To evaluate the proposed methods, a series of experiments has been conducted on a large set of fully annotated outdoor scene images. These include a subset of the Corel database, a subset of the LabelMe dataset, the evaluation dataset of localized semantics in images, the spatial context evaluation dataset, and the segmented and annotated IAPR TC-12 benchmark.
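Phase 3 (candidate labels from a classifier, then context-based disambiguation) can be sketched as below. The per-object confidences and the co-occurrence table are hypothetical stand-ins for the SVM outputs and the ontology's contextual knowledge.

```python
# A minimal sketch of context-based label disambiguation: each object's
# classifier candidates are re-scored by how well they co-occur with the
# other objects' best candidates. All numbers are invented.
COOCCUR = {("sky", "sea"): 0.9, ("sky", "car"): 0.3,
           ("cloud", "sea"): 0.8, ("cloud", "car"): 0.2}

def co(a, b):
    return COOCCUR.get((a, b), COOCCUR.get((b, a), 0.5))

# Per-object candidate labels with SVM-like confidences.
candidates = [{"sky": 0.6, "cloud": 0.5}, {"sea": 0.7, "car": 0.6}]

def disambiguate(cands):
    labels = []
    for i, ci in enumerate(cands):
        def context_score(label):
            # combine own confidence with co-occurrence against the
            # other objects' current best candidates
            others = [max(cj, key=cj.get) for j, cj in enumerate(cands) if j != i]
            return ci[label] * sum(co(label, o) for o in others)
        labels.append(max(ci, key=context_score))
    return labels

print(disambiguate(candidates))  # -> ['sky', 'sea']
```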
APA, Harvard, Vancouver, ISO, and other styles
25

Moreux, Jean-Philippe, and Guillaume Chiron. "Image Retrieval in Digital Libraries: A Large Scale Multicollection Experimentation of Machine Learning techniques." Sächsische Landesbibliothek - Staats- und Universitätsbibliothek Dresden, 2017. https://slub.qucosa.de/id/qucosa%3A16444.

Full text
Abstract:
While digital heritage libraries were historically first populated in image mode, they quickly took advantage of OCR technology to index printed collections and consequently improve the scope and performance of the information retrieval services offered to users. But access to iconographic resources has not progressed in the same way, and the latter remain in the shadows: manual indexation that is incomplete and heterogeneous, data silos by iconographic genre, and content-based image retrieval (CBIR) that is still barely operational on heritage collections. Today, however, it would be possible to make better use of these resources, especially by exploiting the enormous volumes of OCR produced during the last two decades (both as a textual descriptor and for the automatic identification of printed illustrations), and thus valorize these engravings, drawings, photographs, maps, etc. for their own value but also as an attractive entry point into the collections, supporting discovery and serendipity from document to document and collection to collection. This article presents an ETL (extract-transform-load) approach to this need, which aims to: identify and extract iconography wherever it may be found, in image collections but also in printed materials (dailies, magazines, monographs); transform, harmonize and enrich the image descriptive metadata (in particular with machine learning classification tools); and load it all into a web app dedicated to image retrieval. The approach is pragmatically dual, since it involves leveraging existing digital resources and (virtually) off-the-shelf technologies.
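The ETL flow itself is simple enough to sketch. The helpers below are hypothetical stand-ins for OCR-based illustration extraction, the machine-learning genre classifier, and the search index.

```python
# A minimal sketch of the extract-transform-load flow over digitized
# documents. All helpers and the sample record are invented for the
# example; only the pipeline shape mirrors the article.
def extract(documents):
    """Yield illustration records found in digitized documents."""
    for doc in documents:
        for img in doc.get("illustrations", []):
            yield {"source": doc["id"], "image": img}

def transform(record, classify):
    record["genre"] = classify(record["image"])   # e.g. map, photo, drawing
    return record

def load(records, index):
    index.extend(records)                          # stand-in for a search index

corpus = [{"id": "daily-1907-04-01", "illustrations": ["img01", "img02"]}]
index = []
load([transform(r, classify=lambda img: "photograph") for r in extract(corpus)],
     index)
print(index)
```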
APA, Harvard, Vancouver, ISO, and other styles
26

Hou, Jun. "Text mining with semantic annotation : using enriched text representation for entity-oriented retrieval, semantic relation identification and text clustering." Thesis, Queensland University of Technology, 2014. https://eprints.qut.edu.au/79206/1/Jun_Hou_Thesis.pdf.

Full text
Abstract:
This project is a step forward in the study of text mining, where enhanced text representation with semantic information plays a significant role. It develops effective methods for entity-oriented retrieval, semantic relation identification and text clustering utilizing semantically annotated data. These methods are based on an enriched text representation generated by introducing semantic information extracted from Wikipedia into the input text data. The proposed methods are evaluated against several state-of-the-art benchmark methods on real-life datasets. In particular, this thesis improves the performance of entity-oriented retrieval, identifies different lexical forms for an entity relation, and handles clustering of documents with multiple feature spaces.
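The enrichment step can be illustrated with a tiny sketch: surface forms found in the text are linked to Wikipedia-style entity labels, which are appended to the token representation before retrieval or clustering. The dictionary linker is an assumed stand-in for the semantic annotation used in the thesis.

```python
# A minimal sketch of entity-enriched text representation. The lexicon
# and its entity labels are hypothetical.
ENTITY_LEXICON = {"big apple": "New_York_City", "jfk": "John_F._Kennedy"}

def enrich(text: str) -> list[str]:
    """Return the tokens of `text` plus any linked entity labels."""
    lowered = text.lower()
    enriched = lowered.split()
    for surface, entity in ENTITY_LEXICON.items():
        if surface in lowered:
            enriched.append(entity)              # add the linked entity
    return enriched

print(enrich("Flights from the Big Apple to JFK"))
```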
APA, Harvard, Vancouver, ISO, and other styles
27

Wiklund-Hörnqvist, Carola. "Brain-based teaching : behavioral and neuro-cognitive evidence for the power of test-enhanced learning." Doctoral thesis, Umeå universitet, Institutionen för psykologi, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-96395.

Full text
Abstract:
A primary goal of education is the acquisition of durable knowledge, which raises the question of which pedagogical methods best facilitate learning. Research in cognitive psychology has demonstrated that repeated testing during the learning phase improves performance on later retention tests compared to restudying the material. This empirical phenomenon is called the testing effect. The testing effect has been shown to be robust across different kinds of material and when compared to different pedagogical methods. Despite the extensive number of published papers on the testing effect, the majority of studies have been conducted in the laboratory. More specifically, few studies have examined the testing effect in authentic settings using course material during the progress of a course. Further, few studies have investigated the beneficial effects of test-enhanced learning by means of neuroimaging methods (e.g. fMRI). The aim of this thesis was to investigate the effects of test-enhanced learning in an authentic educational context, how this relates to individual differences in working memory capacity (WMC; Study I and II), and the changes in brain activity involved in successful repeated testing and long-term retention (Study III). In Study I, we examined whether repeated testing with feedback benefitted learning compared to rereading of introductory psychology key concepts in a sample of undergraduate students. The results revealed that repeated testing with feedback was superior to rereading both immediately after practice and at longer delays, and the benefit held for students irrespective of WMC. In Study II, we investigated test-enhanced learning in relation to the encoding variability hypothesis for the learning of mathematics in a sample of fifth-grade children. Learning was examined in relation to both practiced and transfer tasks. No differences were found for the practiced tasks. Regarding the transfer tasks, the results supported the encoding variability hypothesis, but only at the immediate test. In contrast, when we followed up the durability of learning across time, taking the same questions over and over again during the intervention resulted in better performance than variable encoding. Individual differences in WMC predicted performance on the transfer tasks, but only at the immediate test, regardless of group. Together, the results from Study I and Study II clearly indicate that test-enhanced learning is effective in authentic settings, across age groups, and that it produces transfer. Integrating current findings from cognitive science on test-enhanced learning, using authentic materials and assessments relevant to educational goals, can be done rather easily with computer-based tasks. The observed influence of individual differences in WMC between the studies warrants further study of its specific contribution, in order to optimize the learning procedure. In Study III, we tested complementary hypotheses regarding the mechanisms behind memory retrieval: recurrent retrieval may be efficient because it induces representational consistency or, alternatively, because it induces representational variability - the altering or adding of underlying representations as a function of successful repeated retrieval.
A cluster in the right superior parietal cortex was identified as important for items successfully repeatedly retrieved on Day 1 and also correctly remembered on Day 7, compared to those successfully repeatedly retrieved on Day 1 but forgotten by Day 7. Representational similarity analysis in this region supported the theoretical explanations that emphasize semantic elaboration.
APA, Harvard, Vancouver, ISO, and other styles
28

Kühnlein, Meike [Verfasser], and Thomas [Akademischer Betreuer] Nauss. "A machine learning based 24-h-technique for an area-wide rainfall retrieval using MSG SEVIRI data over Central Europe / Meike Kühnlein. Betreuer: Thomas Nauss." Marburg : Philipps-Universität Marburg, 2014. http://d-nb.info/1064097758/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Minelli, Michele. "Fully homomorphic encryption for machine learning." Thesis, Paris Sciences et Lettres (ComUE), 2018. http://www.theses.fr/2018PSLEE056/document.

Full text
Abstract:
Fully homomorphic encryption enables computation on encrypted data without leaking any information about the underlying data. In short, a party can encrypt some input data, while another party, that does not have access to the decryption key, can blindly perform some computation on this encrypted input. The final result is also encrypted, and it can be recovered only by the party that possesses the secret key. In this thesis, we present new techniques/designs for FHE that are motivated by applications to machine learning, with a particular attention to the problem of homomorphic inference, i.e., the evaluation of already trained cognitive models on encrypted data. First, we propose a novel FHE scheme that is tailored to evaluating neural networks on encrypted inputs. Our scheme achieves complexity that is essentially independent of the number of layers in the network, whereas the efficiency of previously proposed schemes strongly depends on the topology of the network. Second, we present a new technique for achieving circuit privacy for FHE. This allows us to hide the computation that is performed on the encrypted data, as is necessary to protect proprietary machine learning algorithms. Our mechanism incurs very small computational overhead while keeping the same security parameters. Together, these results strengthen the foundations of efficient FHE for machine learning, and pave the way towards practical privacy-preserving deep learning. Finally, we present and implement a protocol based on homomorphic encryption for the problem of private information retrieval, i.e., the scenario where a party wants to query a database held by another party without revealing the query itself
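The core property (computing on ciphertexts) can be demonstrated with a textbook additively homomorphic construction. This is a Paillier-style toy with deliberately insecure parameters, shown only to illustrate the idea; the thesis targets fully homomorphic schemes capable of evaluating whole neural networks.

```python
# A toy additively homomorphic (Paillier-style) sketch: the sum of two
# plaintexts is computed on ciphertexts alone. Insecure toy parameters;
# not the FHE scheme proposed in the thesis.
import math
import random

p, q = 17, 19                       # insecure toy primes
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = math.lcm(p - 1, q - 1)

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)  # decryption constant

def enc(m):
    r = random.choice([x for x in range(2, n) if math.gcd(x, n) == 1])
    return pow(g, m, n2) * pow(r, n, n2) % n2

def dec(c):
    return L(pow(c, lam, n2)) * mu % n

c = enc(12) * enc(30) % n2          # multiplying ciphertexts adds plaintexts
print(dec(c))                       # -> 42
```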
APA, Harvard, Vancouver, ISO, and other styles
30

Lakshmanan, Muthukumar S. "Using effective information searching skills to solve problems." Phd thesis, Australia : Macquarie University, 2009. http://hdl.handle.net/1959.14/42606.

Full text
Abstract:
"2008".
Thesis (PhD)--Macquarie University, Australian Centre for Educational Studies, School of Education, 2009.
Bibliography: p. 268-283.
Introduction -- Review of the literature -- Methods and procedures -- Pre-intervention qualitative data analysis & discussion of findings -- Intervention -- Post-intervention qualitative data analysis & discussions of findings -- Post-intervention quantitative data analysis & discussions of findings -- Conclusions.
Problem-based learning (PBL) is an instructional approach that is organized around the investigation and resolution of problems. Problems are neither uniform nor similar. Jonassen (1998, 2000), in his design theory of problem solving, has categorized problems into two broad types - well-structured and ill-structured. He has also described a host of mediating skills that impact problem-solving outcomes. However, this list of skills is not exhaustive and, in view of the utility of the Internet as an informational repository, this study examined the need for effective information searching skills to be included in this list. -- This study was aimed at examining how students solve well- and ill-structured problems and how different Internet information seeking strategies can be used in problem solving. The study devised and empirically tested the efficacy of an interventionist conceptual model that maps the application of different information seeking techniques to successfully resolving well- and ill-structured problem types. The intervention helps to better understand the influence of information searching skills on problem-solving performance and the various problem-solving strategies students can adopt. The contrasting patterns of navigational path movements taken by students in seeking information to resolve ill- and well-structured problems were also investigated. -- A mixed-methodology research design, involving both quantitative and qualitative approaches, was used in this study. The research site was a polytechnic in Singapore that has implemented problem-based learning in its curriculum design. A first-year class of 25 students was the sample population. Six problems from the curriculum were chosen for this study - three well-structured and three ill-structured. -- The research findings show that information searching skills indeed play an important role in problem solving. The findings affirm the need for students to be systematically instructed in information searching, so that they are aware of the complexities involved in information seeking and can accomplish desired problem-solving goals. This study has also shown that well- and ill-structured problems demand different cognitive and information seeking capabilities. Well-structured problems are easily solved and come with singular correct answers; the information searching necessary for solving them is constrained and readily manageable, so students only have to be acquainted with fundamental information searching skills. On the other hand, ill-structured problems are messy and contain a number of unknown elements, with no easy prototypic solutions. Consequently, the information needs of ill-structured problems are usually complex, multi-disciplinary and expansive, and students have to be trained to apply a more advanced set of information searching skills in resolving them.
Mode of access: World Wide Web.
xiv, 283 p. ill
APA, Harvard, Vancouver, ISO, and other styles
31

Zeng, Kaiman. "Next Generation of Product Search and Discovery." FIU Digital Commons, 2015. http://digitalcommons.fiu.edu/etd/2312.

Full text
Abstract:
Online shopping has become an important part of people's daily life with the rapid development of e-commerce. In some domains, such as books, electronics, and CDs/DVDs, online shopping has surpassed or even replaced traditional shopping. Compared with traditional retailing, e-commerce is information intensive. One of the key factors in succeeding in e-business is facilitating consumers' discovery of products. Conventionally, a product search engine based on keyword search or a category browser is provided to help users find the product information they need. The general goal of a product search system is to enable users to quickly locate information of interest and to minimize users' efforts in search and navigation, and in this process human factors play a significant role. Finding product information can be a tricky task that requires intelligent use of search engines and non-trivial navigation of multilayer categories; it can be frustrating for many users, especially inexperienced ones. This dissertation focuses on developing a new visual product search system that effectively extracts the properties of unstructured products and presents possible items of attraction so that users can quickly locate the ones they would most likely be interested in. We designed and developed a feature extraction algorithm that retains product color and local pattern features, and experimental evaluation on a benchmark dataset demonstrated that it is robust against common geometric and photometric visual distortions. In addition, instead of ignoring product text information, we investigated and developed a ranking model learned via a unified probabilistic hypergraph that is capable of capturing correlations among product visual content and textual content. Moreover, we proposed and designed a fuzzy hierarchical co-clustering algorithm for collaborative filtering product recommendation. Via this method, users can be automatically grouped into different interest communities based on their behaviors, and a customized recommendation can then be performed according to these implicitly detected relations. In summary, the developed search system performs much better in visual unstructured product search when compared with state-of-the-art approaches. With the comprehensive ranking scheme and the collaborative filtering recommendation module, the user's overhead in locating information of value is reduced, and the user's experience of seeking useful product information is optimized.
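A color feature of the kind used as the visual half of such a system can be sketched as a quantized color histogram. This is an assumed illustration of the general idea, not the dissertation's specific color/local-pattern descriptor.

```python
# A minimal sketch of a quantized color-histogram feature for a product
# image, represented here as a toy H x W x 3 array with values in [0, 1].
import numpy as np

def color_histogram(image, bins=4):
    """Return a normalized histogram over bins**3 quantized color cells."""
    quantized = np.minimum((image * bins).astype(int), bins - 1)
    codes = (quantized[..., 0] * bins + quantized[..., 1]) * bins + quantized[..., 2]
    hist = np.bincount(codes.ravel(), minlength=bins ** 3).astype(float)
    return hist / hist.sum()

img = np.random.rand(32, 32, 3)      # stand-in for a product image
print(color_histogram(img).shape)    # -> (64,)
```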
APA, Harvard, Vancouver, ISO, and other styles
32

Eyorokon, Vahid. "Measuring Goal Similarity Using Concept, Context and Task Features." Wright State University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=wright1534084289041091.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Gorisse, David. "Passage à l’échelle des méthodes de recherche sémantique dans les grandes bases d’images." Thesis, Cergy-Pontoise, 2010. http://www.theses.fr/2010CERG0519/document.

Full text
Abstract:
Over the last decade, the digital revolution has made the quantity of digital photos available to everyone grow faster than the processing capacity of computers. Current search tools were designed for small data volumes; their complexity generally prevents searching large corpora with response times acceptable to users. In this thesis, we propose solutions for scaling up content-based image search engines. First, we considered automatic search engines in which images are indexed as global histograms; scalability is obtained by introducing a new index structure, adapted to this context, that performs approximate but more efficient nearest-neighbour searches. Second, we were interested in more sophisticated engines that improve search quality by working with local indexes such as interest points (bags of features). Finally, we proposed a strategy to reduce the computational complexity of interactive search engines, which improve results using labels that users supply to the system during search sessions; our strategy quickly selects the most pertinent images to annotate by optimizing an active learning method.
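The approximate nearest-neighbour idea can be sketched with standard random-projection LSH: only images falling in the query's hash bucket are compared exactly. This is a generic sketch in the spirit of the thesis, not the specific index structure it proposes.

```python
# A minimal sketch of random-projection LSH for approximate NN search
# over global histograms. Data is random toy material.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
data = rng.random((1000, 32))                 # 1000 images, 32-bin histograms
planes = rng.standard_normal((8, 32))         # 8 random hyperplanes -> 8-bit key

def key(v):
    return tuple((planes @ v > 0).astype(int))

buckets = defaultdict(list)
for i, v in enumerate(data):
    buckets[key(v)].append(i)

def ann(query):
    """Search only the query's bucket; fall back to a full scan if empty."""
    cand = buckets.get(key(query), range(len(data)))
    return min(cand, key=lambda i: np.linalg.norm(data[i] - query))

print(ann(data[42]))                           # -> 42 (found in its own bucket)
```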
APA, Harvard, Vancouver, ISO, and other styles
34

Artchounin, Daniel. "Tuning of machine learning algorithms for automatic bug assignment." Thesis, Linköpings universitet, Programvara och system, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-139230.

Full text
Abstract:
In software development projects, bug triage consists mainly of assigning bug reports to software developers or teams (depending on the project). The partial or total automation of this task would have a positive economic impact on many software projects. This thesis introduces a systematic four-step method for finding some of the best configurations of several machine learning algorithms intended to solve the automatic bug assignment problem. These four steps are used, respectively, to select a combination of pre-processing techniques, a bug report representation, and a potential feature selection technique, and to tune several classifiers. The method has been applied to three software projects: 66 066 bug reports of a proprietary project, 24 450 bug reports of Eclipse JDT and 30 358 bug reports of Mozilla Firefox. 619 configurations have been applied and compared on each of these three projects. In production, using the approach introduced in this work on the bug reports of the proprietary project would have increased the accuracy by up to 16.64 percentage points.
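The configuration-search idea can be sketched with a cross-validated grid search over a text-classification pipeline, on invented toy bug reports. Only the shape of the experiment is mirrored here; the thesis compares 619 configurations on large real projects.

```python
# A minimal sketch of jointly tuning a representation choice and a
# classifier hyper-parameter for bug assignment. The reports and team
# labels are toy data.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

reports = ["crash when saving file", "ui button misaligned",
           "npe on save action", "wrong colour in toolbar"] * 5
teams = ["core", "ui", "core", "ui"] * 5

pipe = Pipeline([("tfidf", TfidfVectorizer()),
                 ("clf", LogisticRegression(max_iter=1000))])
grid = {"tfidf__ngram_range": [(1, 1), (1, 2)], "clf__C": [0.1, 1.0, 10.0]}

search = GridSearchCV(pipe, grid, cv=5).fit(reports, teams)
print(search.best_params_, round(search.best_score_, 2))
```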
APA, Harvard, Vancouver, ISO, and other styles
35

Patterson, William Robert David. "Introspective techniques for maintaining retrieval knowledge in case-base reasoning." Thesis, University of Ulster, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.365937.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Soltan-Zadeh, Yasaman. "Improved rule-based document representation and classification using genetic programming." Thesis, Royal Holloway, University of London, 2011. http://repository.royalholloway.ac.uk/items/479a1773-779b-8b24-b334-7ed485311abe/8/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Bergqvist, Martin, and Jim Glansk. "Fördelar med att applicera Collaborative Filtering på Steam : En utforskande studie." Thesis, Högskolan i Borås, Akademin för bibliotek, information, pedagogik och IT, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-14129.

Full text
Abstract:
The use of recommender systems is everywhere. On popular platforms such as Netflix and Amazon, you are always given recommendations on what to consume next, based on your specific profile. This is done by cross-referencing users and products to find probable patterns. The aim of this study was to compare the two main ways of generating recommendations on an unorthodox dataset where "best practice" might not apply. Accordingly, recommendation efficiency was compared between Content-based Filtering and Collaborative Filtering on the gaming platform Steam, in order to establish whether there was potential for a better solution. We approached this by gathering data from Steam, building a baseline Content-based Filtering recommendation engine representative of what is currently used by Steam, and building a competing Collaborative Filtering engine based on a standard implementation. In the course of this study, we found that while Content-based Filtering performance initially grew linearly as the player base of a game increased, Collaborative Filtering's performance grew exponentially from a small player base and then plateaued at a level exceeding the comparison method. The practical consequence of these findings would be the justification to apply Collaborative Filtering even to smaller, more complex sets of data than is normally done; current practice favours Content-based Filtering because it is easier to implement and yields decent results. With our findings showing such a big discrepancy even with basic models, this attitude might well change. Collaborative Filtering has been used sparingly on more multifaceted datasets, but our results show that the potential to exceed Content-based Filtering is rather easily attainable on such datasets as well. This potentially benefits all combined purchase/community platforms, since usage of a purchase can be monitored online, allowing misrepresentational factors to be adjusted as they appear.
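A standard Collaborative Filtering implementation of the kind compared here can be sketched with user-based filtering on a toy play-time matrix: a user's score for an unplayed game is the similarity-weighted average of other users' interactions with it.

```python
# A minimal sketch of user-based collaborative filtering on toy data.
import numpy as np

# rows: users, cols: games; values: normalized play time (0 = unplayed)
R = np.array([[5.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 5.0, 4.0],
              [0.0, 1.0, 4.0, 4.0]])

norms = np.linalg.norm(R, axis=1, keepdims=True)
S = (R / norms) @ (R / norms).T               # user-user cosine similarity
np.fill_diagonal(S, 0)

scores = S @ R / (np.abs(S).sum(1, keepdims=True) + 1e-9)
user = 1
recs = np.argsort(-scores[user] * (R[user] == 0))  # rank unplayed games first
print(recs[0])                                 # top recommendation for user 1
```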
APA, Harvard, Vancouver, ISO, and other styles
38

Désoyer, Adèle. "Appariement de contenus textuels dans le domaine de la presse en ligne : développement et adaptation d'un système de recherche d'information." Thesis, Paris 10, 2017. http://www.theses.fr/2017PA100119/document.

Full text
Abstract:
The goal of this thesis, conducted within an industrial framework, is to pair textual media content. Specifically, the aim is to pair online news articles with relevant videos for which we have a textual description. The main issue is then a matter of textual analysis; no image or spoken-language analysis was undertaken in the present study. The question that arises is how to compare these particular objects, the texts, and what criteria to use in order to estimate their degree of similarity. We consider that one of these criteria is the topical similarity of their content; in other words, two documents have to deal with the same topic to form a relevant pair. This problem falls within the field of information retrieval (IR), which is the main strategy called upon in this research. Furthermore, when dealing with news content, the time dimension is of prime importance. To address this aspect, the field of topic detection and tracking (TDT) is also explored. The pairing system developed in this thesis comprises several complementary steps. In the first step, the system uses natural language processing (NLP) methods to index both articles and videos, in order to go beyond the traditional bag-of-words representation of texts. In the second step, two scores are calculated for an article-video pair: the first reflects their topical similarity and is based on a vector space model; the second expresses their proximity in time, based on an empirical function. At the end of the pipeline, a classification model learned from manually annotated document pairs is used to rank the results. Evaluating the system's performance raised some further questions in this doctoral research. The constraints imposed both by the data and by the specific needs of the partner company led us to adapt the evaluation protocol traditionally used in IR, namely the Cranfield paradigm; we therefore propose an alternative solution for evaluating the system that takes all our constraints into account.
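The two matching signals can be sketched as below: a vector-space topical similarity between an article and each video description, and an empirical temporal-proximity score. A fixed linear blend and the decay constant are assumptions standing in for the trained classifier and the thesis's empirical function.

```python
# A minimal sketch of combining topical and temporal scores for
# article-video pairing. Texts, dates, weights and tau are toy values.
import math
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

article = ("election results announced in paris", 0)     # (text, day)
videos = [("paris election night coverage", 1),
          ("cooking show highlights", 0)]

texts = [article[0]] + [v[0] for v in videos]
X = TfidfVectorizer().fit_transform(texts)
topic = cosine_similarity(X[0], X[1:])[0]      # topical similarity scores

def temporal(d1, d2, tau=3.0):                 # assumed decay over days
    return math.exp(-abs(d1 - d2) / tau)

for (text, day), t in zip(videos, topic):
    score = 0.7 * t + 0.3 * temporal(article[1], day)
    print(round(score, 2), text)
```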
APA, Harvard, Vancouver, ISO, and other styles
39

Bellafqira, Reda. "Chiffrement homomorphe et recherche par le contenu sécurisé de données externalisées et mutualisées : Application à l'imagerie médicale et l'aide au diagnostic." Thesis, Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire, 2017. http://www.theses.fr/2017IMTA0063.

Full text
Abstract:
La mutualisation et l'externalisation de données concernent de nombreux domaines y compris celui de la santé. Au-delà de la réduction des coûts de maintenance, l'intérêt est d'améliorer la prise en charge des patients par le déploiement d'outils d'aide au diagnostic fondés sur la réutilisation des données. Dans un tel environnement, la sécurité des données (confidentialité, intégrité et traçabilité) est un enjeu majeur. C'est dans ce contexte que s'inscrivent ces travaux de thèse. Ils concernent en particulier la sécurisation des techniques de recherche d'images par le contenu (CBIR) et de « machine learning » qui sont au cœur des systèmes d'aide au diagnostic. Ces techniques permettent de trouver des images semblables à une image requête non encore interprétée. L'objectif est de définir des approches capables d'exploiter des données externalisées et sécurisées, et de permettre à un « cloud » de fournir une aide au diagnostic. Plusieurs mécanismes permettent le traitement de données chiffrées, mais la plupart sont dépendants d'interactions entre différentes entités (l'utilisateur, le cloud voire un tiers de confiance) et doivent être combinés judicieusement de manière à ne pas laisser fuir d'information lors d'un traitement. Au cours de ces trois années de thèse, nous nous sommes dans un premier temps intéressés à la sécurisation à l'aide du chiffrement homomorphe, d'un système de CBIR externalisé sous la contrainte d'aucune interaction entre le fournisseur de service et l'utilisateur. Dans un second temps, nous avons développé une approche de « Machine Learning » sécurisée fondée sur le perceptron multicouches, dont la phase d'apprentissage peut être externalisée de manière sûre, l'enjeu étant d'assurer la convergence de cette dernière. L'ensemble des données et des paramètres du modèle sont chiffrés. Du fait que ces systèmes d'aides doivent exploiter des informations issues de plusieurs sources, chacune externalisant ses données chiffrées sous sa propre clef, nous nous sommes intéressés au problème du partage de données chiffrées. Un problème traité par les schémas de « Proxy Re-Encryption » (PRE). Dans ce contexte, nous avons proposé le premier schéma PRE qui permet à la fois le partage et le traitement des données chiffrées. Nous avons également travaillé sur un schéma de tatouage de données chiffrées pour tracer et vérifier l'intégrité des données dans cet environnement partagé. Le message tatoué dans le chiffré est accessible que l'image soit ou non chiffrée et offre plusieurs services de sécurité fondés sur le tatouage.
Cloud computing has emerged as a successful paradigm allowing individuals and companies to store and process large amounts of data without needing to purchase and maintain their own networks and computer systems. In healthcare, for example, different initiatives aim at sharing medical images and Personal Health Records (PHR) between health professionals or hospitals with the help of the cloud. In such an environment, data security (confidentiality, integrity and traceability) is a major issue. This thesis work is set in that context; it concerns in particular securing Content-Based Image Retrieval (CBIR) and machine learning (ML) techniques, which are at the heart of diagnostic decision support systems. These techniques make it possible to find images similar to a query image that has not yet been interpreted. The goal is to define approaches that can exploit secure outsourced data and enable a cloud to provide diagnostic support. Several mechanisms allow the processing of encrypted data, but most depend on interactions between different entities (the user, the cloud or a trusted third party) and must be combined judiciously so as not to leak information. During these three years of thesis work, we initially focused on securing an outsourced CBIR system under the constraint of no interaction between the users and the service provider (cloud). In a second step, we developed a secure machine learning approach based on the multilayer perceptron (MLP), whose learning phase can be outsourced in a secure way, the challenge being to ensure the convergence of the MLP. All the data and parameters of the model are encrypted using homomorphic encryption. Because these systems need to use information from multiple sources, each of which outsources its encrypted data under its own key, we are interested in the problem of sharing encrypted data, a problem addressed by "Proxy Re-Encryption" (PRE) schemes. In this context, we proposed the first PRE scheme that allows both the sharing and the processing of encrypted data. We also worked on a watermarking scheme over encrypted data in order to trace and verify the integrity of data in this shared environment. The embedded message is accessible whether or not the image is encrypted, and provides several security services based on watermarking.
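As an illustration of why homomorphic encryption fits outsourced CBIR, the sketch below computes squared Euclidean distances on a Paillier-encrypted query with the python-paillier (`phe`) library: Paillier supports adding ciphertexts and multiplying them by plaintext scalars, which is exactly what the expansion E(||q - s||^2) = sum_i [E(q_i^2) + E(q_i)*(-2*s_i) + s_i^2] needs. This toy protocol still lets the client decrypt the distances itself, so it is only a well-known building block, not the non-interactive scheme developed in the thesis.

```python
# Toy encrypted-distance CBIR step with Paillier (illustrative only).
# The server holds plaintext features; the client holds the key pair.
from phe import paillier

pub, priv = paillier.generate_paillier_keypair(n_length=1024)

# Client: encrypt the query features and their squares.
query = [0.12, 0.55, 0.33]
enc_q = [pub.encrypt(x) for x in query]
enc_q2 = [pub.encrypt(x * x) for x in query]

# Server: E(||q - s||^2) = sum_i E(q_i^2) + E(q_i) * (-2 s_i) + s_i^2.
def encrypted_sq_distance(enc_q, enc_q2, server_vec):
    acc = None
    for eq, eq2, s in zip(enc_q, enc_q2, server_vec):
        term = eq2 + eq * (-2.0 * s) + s * s
        acc = term if acc is None else acc + term
    return acc

database = [[0.10, 0.60, 0.30], [0.90, 0.05, 0.40]]
enc_dists = [encrypted_sq_distance(enc_q, enc_q2, v) for v in database]

# Client: decrypt the distances and keep the nearest image.
dists = [priv.decrypt(d) for d in enc_dists]
print(min(range(len(dists)), key=dists.__getitem__))
```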
APA, Harvard, Vancouver, ISO, and other styles
40

Pereira, Silvio Moreto. "Caracterização de imagens de úlceras dermatológicas para indexação e recuperação por conteúdo." Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/82/82131/tde-08012013-110054/.

Full text
Abstract:
Úlceras de pele são causadas devido à deficiência na circulação sanguínea. O diagnóstico é feito pela análise visual das regiões afetadas. A quantificação da distribuição de cores da lesão, por meio de técnicas de processamento de imagens pode auxiliar na caracterização e análise da dinâmica do processo patológico e resposta ao tratamento. O processamento de imagens de úlceras dermatológicas envolve etapas relacionadas a segmentação, caracterização e indexação. Esta análise é importante para classificação, recuperação de imagens similares e acompanhamento da evolução de uma lesão. Este trabalho apresenta um estudo sobre técnicas de segmentação e caracterização de imagens coloridas de úlceras de pele, baseadas nos modelos de cores RGB, HSV, L*a*b* e L*u*v*, utilizando suas componentes na extração de informações de textura e cor. Foram utilizadas técnicas de Aprendizado de Máquina e algoritmos matemáticos para a segmentação e extração de atributos, utilizando uma base de dados com 172 imagens. Nos testes de recuperação, foram utilizadas diferentes métricas de distância para avaliação do desempenho e técnicas de seleção de atributos. Os resultados obtidos evidenciam bom potencial para apoio ao diagnóstico e acompanhamento da evolução do tratamento com valores de até 75% de precisão para as técnicas de recuperação, 0,9 de área embaixo da curva receiver-operating-characteristic na classificação e 0,04 de erro médio quadrático entre a composição de cores da imagem segmentada automaticamente e a segmentada manualmente. Nos testes utilizando seleção de atributos, foi observada uma redução nos valores de precisão de recuperação (60%) e valores similares nos testes de classificação (0,85).
Skin ulcers are caused by deficiencies in the bloodstream. Diagnosis is made by visual analysis of the affected area. Quantifying the color distribution of the lesion with image processing techniques can aid in characterizing the pathological process and its response to treatment. The processing of dermatological ulcer images involves steps related to segmentation, characterization and indexing. This analysis is important for classification, retrieval of similar images and tracking the evolution of a lesion. This project presents a study of techniques for segmenting and characterizing color images of dermatological skin ulcers, based on the RGB, HSV, L*a*b* and L*u*v* color models, using their components to extract texture and color information. Machine Learning techniques and mathematical algorithms were used for segmentation and attribute extraction, over a database containing 172 images. In the retrieval tests, different distance metrics and feature selection techniques were used to evaluate performance. The results show good potential to support diagnosis and treatment monitoring, with values of up to 75% precision for the retrieval techniques, 0.9 area under the receiver-operating-characteristic curve in classification, and 0.04 mean squared error between the color composition of the automatically segmented image and the manually segmented image. In the tests using feature selection, a decrease in retrieval precision (60%) and similar classification values (0.85) were observed.
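To make the color-model comparison concrete, here is a hypothetical feature extractor in the spirit of the abstract: per-channel mean and standard deviation in the four color models, followed by nearest-neighbour retrieval. The descriptor layout and the distance metrics are illustrative assumptions, not the thesis's exact attribute set.

```python
# Sketch: per-channel colour statistics in several colour models,
# followed by nearest-neighbour retrieval. Illustrative only.
import numpy as np
from skimage import color

def colour_features(rgb):
    """rgb: float image in [0, 1], shape (H, W, 3). Mean/std per channel
    in RGB, HSV, L*a*b* and L*u*v* -> 24-dimensional descriptor."""
    spaces = [rgb, color.rgb2hsv(rgb), color.rgb2lab(rgb), color.rgb2luv(rgb)]
    feats = []
    for img in spaces:
        for c in range(3):
            feats += [img[..., c].mean(), img[..., c].std()]
    return np.array(feats)

def retrieve(query_feat, db_feats, k=5, metric="euclidean"):
    """db_feats: (n_images, 24) array of precomputed descriptors."""
    if metric == "euclidean":
        d = np.linalg.norm(db_feats - query_feat, axis=1)
    else:  # city-block, one of several metrics worth comparing
        d = np.abs(db_feats - query_feat).sum(axis=1)
    return np.argsort(d)[:k]  # indices of the k most similar images
```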
APA, Harvard, Vancouver, ISO, and other styles
41

Westerdahl, Simon, and Larsson Fredrik Lemón. "Optimization for search engines based on external revision database." Thesis, Högskolan Kristianstad, Fakulteten för naturvetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:hkr:diva-21000.

Full text
Abstract:
The amount of data is continually growing, and the ability to search efficiently through vast amounts of data is almost always sought after. Many technologies and methods exist to find data efficiently in a set, but all of them cost resources such as CPU cycles, memory and storage. In this study a search engine (SE) is optimized using several methods and techniques; the thesis looks into how to optimize an SE that is based on an external revision database. The optimized implementation is compared to a non-optimized implementation when executing a query. An artificial neural network (ANN), trained on a dataset containing three years of normal usage at a company, is used to prioritize within the result set before returning the result to the caller. The new indexing algorithms improve the document space complexity by removing all duplicate documents that add no value. Machine learning (ML) has been used to analyze user behaviour in order to reduce the number of documents that get retrieved by a query.
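The two optimizations named in this abstract, duplicate removal at indexing time and behaviour-based re-ranking, can be sketched as follows. Everything here (the hashing scheme, the document layout, the scoring callable) is an assumption for illustration, not the thesis implementation.

```python
# Sketch of the two optimisations: (1) drop exact-duplicate documents at
# indexing time via a content hash, (2) re-rank query results with a model
# trained on past user behaviour. All details are assumptions.
import hashlib

def dedup(documents):
    """Keep one copy of each distinct document body."""
    seen, unique = set(), []
    for doc in documents:
        key = hashlib.sha256(doc["body"].strip().lower().encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

def rerank(results, score_model):
    """score_model: any callable mapping a result's features to a priority,
    e.g. a small neural network trained on usage logs."""
    return sorted(results, key=lambda r: score_model(r["features"]), reverse=True)
```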
APA, Harvard, Vancouver, ISO, and other styles
42

Kotevska, Olivera. "Learning based event model for knowledge extraction and prediction system in the context of Smart City." Thesis, Université Grenoble Alpes (ComUE), 2018. http://www.theses.fr/2018GREAM005/document.

Full text
Abstract:
Des milliards de «choses» connectées à l’internet constituent les réseaux symbiotiques de périphériques de communication (par exemple, les téléphones, les tablettes, les ordinateurs portables), les appareils intelligents, les objets (par exemple, la maison intelligente, le réfrigérateur, etc.) et des réseaux de personnes comme les réseaux sociaux. La notion de réseaux traditionnels se développe et, à l'avenir, elle ira au-delà, y compris plus d'entités et d'informations. Ces réseaux et ces dispositifs détectent, surveillent et génèrent constamment une grande quantité de données sur tous les aspects de la vie humaine. L'un des principaux défis dans ce domaine est que le réseau se compose de «choses» qui sont hétérogènes à bien des égards, les deux autres, c'est qu'ils changent au fil du temps, et il y a tellement d'entités dans le réseau qui sont essentielles pour identifier le lien entre eux. Dans cette recherche, nous abordons ces problèmes en combinant la théorie et les algorithmes du traitement des événements avec les domaines d'apprentissage par machine. Notre objectif est de proposer une solution possible pour mieux utiliser les informations générées par ces réseaux. Cela aidera à créer des systèmes qui détectent et répondent rapidement aux situations qui se produisent dans la vie urbaine afin qu'une décision intelligente puisse être prise pour les citoyens, les organisations, les entreprises et les administrations municipales. Les médias sociaux sont considérés comme une source d'information sur les situations et les faits liés aux utilisateurs et à leur environnement social. Au début, nous abordons le problème de l'identification de l'opinion publique pour une période donnée (année, mois) afin de mieux comprendre la dynamique de la ville. Pour résoudre ce problème, nous avons proposé un nouvel algorithme pour analyser des données textuelles complexes et bruyantes telles que Twitter-messages-tweets. Cet algorithme permet de catégoriser automatiquement et d'identifier la similarité entre les sujets d'événement en utilisant les techniques de regroupement. Le deuxième défi est de combiner les données du réseau avec diverses propriétés et caractéristiques en format commun qui faciliteront le partage des données entre les services. Pour le résoudre, nous avons créé un modèle d'événement commun qui réduit la complexité de la représentation tout en conservant la quantité maximale d'informations. Ce modèle comporte deux ajouts majeurs : la sémantique et l’évolutivité. La partie sémantique signifie que notre modèle est souligné avec une ontologie de niveau supérieur qui ajoute des capacités d'interopérabilité. Bien que la partie d'évolutivité signifie que la structure du modèle proposé est flexible, ce qui ajoute des fonctionnalités d'extensibilité. Nous avons validé ce modèle en utilisant des modèles d'événements complexes et des techniques d'analyse prédictive. Pour faire face à l'environnement dynamique et aux changements inattendus, nous avons créé un modèle de réseau dynamique et résilient. Il choisit toujours le modèle optimal pour les analyses et s'adapte automatiquement aux modifications en sélectionnant le meilleur modèle. Nous avons utilisé une approche qualitative et quantitative pour une sélection évolutive de flux d'événements, qui réduit la solution pour l'analyse des liens, l’optimale et l’alternative du meilleur modèle.
Billions of “things” connected to the Internet constitute symbiotic networks of communication devices (e.g., phones, tablets, and laptops), smart appliances (e.g., fridges, coffee makers and so forth) and networks of people (e.g., social networks). The concept of traditional networks (e.g., computer networks) is thus expanding and in the future will go beyond them, including more entities and information. These networks and devices are constantly sensing, monitoring and generating a vast amount of data on all aspects of human life. One of the main challenges in this area is that the network consists of “things” which are heterogeneous in many ways; another is that the state of the interconnected objects changes over time; and there are so many entities in the network that it is crucial to identify their interdependencies in order to better monitor and predict the network's behavior. In this research, we address these problems by combining the theory and algorithms of event processing with machine learning. Our goal is to propose a possible solution to better use the information generated by these networks. It will help to create systems that detect and respond promptly to situations occurring in urban life, so that smart decisions can be made for citizens, organizations, companies and city administrations. Social media is treated as a source of information about situations and facts related to the users and their social environment. First, we tackle the problem of identifying public opinion over a given period (year, month) to get a better understanding of city dynamics. To solve this problem, we proposed a new algorithm to analyze complex and noisy textual data such as Twitter messages (tweets). This algorithm permits automatic categorization and similarity identification between event topics by using clustering techniques. The second challenge is combining network data with various properties and characteristics into a common format that facilitates data sharing among services. To solve it, we created a common event model that reduces representation complexity while keeping the maximum amount of information. This model has two major additions: semantics and scalability. The semantic part means that our model is underpinned by an upper-level ontology that adds interoperability capabilities, while the scalability part means that the structure of the proposed model is flexible in adding new entries and features. We validated this model by using complex event patterns and predictive analytics techniques. To deal with the dynamic environment and unexpected changes, we created a dynamic, resilient network model. It always chooses the optimal model for analytics and automatically adapts to changes by selecting the next best model. We used a qualitative and quantitative approach for scalable event stream selection, which narrows down the solution for link analysis and for the optimal and alternative best models. It also identifies relationships between data streams, such as correlation, causality and similarity, in order to find relevant data sources that can act as an alternative data source or complement the analytics process.
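A "common event model" of the kind described, a small semantic core plus an open attribute map, together with a simple correlation test between event streams, might look like the sketch below. The field names and the hourly-profile heuristic are assumptions made for illustration.

```python
# Sketch of a common event model: a small fixed core (semantics) plus an
# open attribute map (scalability), and a correlation check between two
# event streams. Field names are illustrative assumptions.
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional
import numpy as np

@dataclass
class Event:
    source: str                                     # e.g. "twitter", "traffic-sensor"
    event_type: str                                 # term from an upper-level ontology
    timestamp: datetime
    location: Optional[tuple] = None                # (lat, lon) when available
    attributes: dict = field(default_factory=dict)  # open, extensible payload

def hourly_counts(events, hours=24):
    counts = np.zeros(hours)
    for e in events:
        counts[e.timestamp.hour % hours] += 1
    return counts

def stream_correlation(stream_a, stream_b):
    """Pearson correlation of hourly activity profiles; a high value suggests
    one stream could complement or substitute for the other."""
    a, b = hourly_counts(stream_a), hourly_counts(stream_b)
    return float(np.corrcoef(a, b)[0, 1])
```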
APA, Harvard, Vancouver, ISO, and other styles
43

Guillaumin, Matthieu. "Données multimodales pour l'analyse d'image." Phd thesis, Grenoble, 2010. http://tel.archives-ouvertes.fr/tel-00522278/en/.

Full text
Abstract:
This thesis deals with the use of textual metadata for image analysis. We seek to use this additional information as weak supervision for learning visual recognition models. There has been a recent and growing interest in methods capable of exploiting this type of data, since they can potentially remove the need for manual annotation, which is costly in time and resources. We focus our efforts on two types of visual data associated with textual information. First, we use news images accompanied by descriptive captions to tackle several problems related to face recognition. Among these, face verification is the task of deciding whether two images depict the same person, and face naming seeks to associate the faces in a database with their correct names. We then explore models for automatically predicting the relevant labels for images, a problem known as automatic image annotation. These models can also be used to perform keyword-based image search. Finally, we study a semi-supervised multimodal learning scenario for image categorization, in which labels are assumed to be present for the training data, whether manually annotated or not, and absent from the test data. Our work is based on the observation that most of these problems can be solved if perfectly suited similarity measures are used. We therefore propose new approaches that combine metric learning, nearest-neighbour models and graph-based methods to learn, from visual and textual data, visual similarities that are specific to each problem. For faces, our similarities focus on the identity of individuals, while for images they relate to more general semantic concepts. Experimentally, our approaches achieve state-of-the-art performance on several challenging datasets. For both types of data considered, we clearly show that learning benefits from the additional textual information, resulting in improved performance of visual recognition systems.
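The abstract's central claim, that these problems reduce to learning well-adapted similarity measures, can be illustrated with a toy pairwise metric learner. The sketch below is in the spirit of logistic metric learning over labelled face pairs, not the thesis's implementation; the diagonal-metric restriction and all hyperparameters are simplifying assumptions.

```python
# Toy pairwise similarity learning: p(same identity) = sigmoid(b - d_M(x, y))
# with a diagonal metric M, trained by gradient ascent on labelled pairs.
# Simplified illustration only.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_diag_metric(pairs, labels, dim, lr=0.01, epochs=200):
    """pairs: list of (x, y) feature-vector tuples; labels: 1 same person, 0 not."""
    w = np.ones(dim)   # diagonal of M, kept non-negative
    b = 1.0            # decision bias / distance threshold
    for _ in range(epochs):
        for (x, y), t in zip(pairs, labels):
            sq = (x - y) ** 2
            p = sigmoid(b - w @ sq)
            grad = t - p                          # log-likelihood gradient factor
            w = np.maximum(w - lr * grad * sq, 0.0)
            b += lr * grad
    return w, b

def same_person_prob(x, y, w, b):
    return sigmoid(b - w @ ((x - y) ** 2))
```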
APA, Harvard, Vancouver, ISO, and other styles
44

Zhou, Xujuan. "Rough set-based reasoning and pattern mining for information filtering." Thesis, Queensland University of Technology, 2008. https://eprints.qut.edu.au/29350/1/Xujuan_Zhou_Thesis.pdf.

Full text
Abstract:
An information filtering (IF) system monitors an incoming document stream to find the documents that match the information needs specified by the user profiles. Learning to use the user profiles effectively is one of the most challenging tasks when developing an IF system. With the document selection criteria better defined based on the users' needs, filtering large streams of information can be more efficient and effective. To learn the user profiles, term-based approaches have been widely used in the IF community because of their simplicity and directness. Term-based approaches are relatively well established. However, these approaches have problems when dealing with polysemy and synonymy, which often lead to an information overload problem. Recently, pattern-based approaches (or Pattern Taxonomy Models (PTM) [160]) have been proposed for IF by the data mining community. These approaches are better at capturing semantic information and have shown encouraging results for improving the effectiveness of IF systems. On the other hand, pattern discovery from large data streams is not computationally efficient, and these approaches have to deal with low-frequency pattern issues. The measures used by data mining techniques (for example, "support" and "confidence") to learn the profile have turned out to be unsuitable for filtering; they can lead to a mismatch problem. This thesis uses rough set-based (term-based) reasoning and pattern mining as a unified framework for information filtering to overcome the aforementioned problems. The system consists of two stages: a topic filtering stage and a pattern mining stage. The topic filtering stage is intended to minimize information overload by filtering out the most likely irrelevant information based on the user profiles. A novel user-profile learning method and a theoretical model of the threshold setting have been developed using rough set decision theory. The second stage (pattern mining) aims at solving the problem of information mismatch. This stage is precision-oriented. A new document-ranking function has been derived by exploiting the patterns in the pattern taxonomy, and the most likely relevant documents are assigned higher scores by this function. Because a relatively small number of documents remain after the first stage, the computational cost is markedly reduced; at the same time, pattern discovery yields more accurate results. The overall performance of the system was improved significantly. The new two-stage information filtering model has been evaluated by extensive experiments. Tests were based on well-known IR benchmarking processes, using the latest version of the Reuters dataset, namely the Reuters Corpus Volume 1 (RCV1). The performance of the new two-stage model was compared with both term-based and data mining-based IF models. The results demonstrate that the proposed information filtering system significantly outperforms the other IF systems, such as the traditional Rocchio IF model and state-of-the-art term-based models, including BM25, Support Vector Machines (SVM), and the Pattern Taxonomy Model (PTM).
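The two-stage architecture can be pictured with a toy sketch. The following is illustrative only, not the thesis implementation: stage 1 applies a rough-set-style three-way decision with two assumed thresholds, and stage 2 re-ranks the surviving documents by weighted pattern coverage; all scoring functions and threshold values are assumptions.

```python
# Sketch of the two-stage idea: stage 1 removes likely-irrelevant documents
# with two thresholds (a three-way decision: accept, boundary, reject);
# stage 2 re-ranks survivors by overlap with discovered term patterns.

def topic_score(doc_terms, profile_weights):
    """Stage 1 score: weighted overlap with the learned user profile."""
    return sum(profile_weights.get(t, 0.0) for t in doc_terms)

def stage_one(docs, profile_weights, accept=2.0, reject=0.5):
    kept = []
    for doc in docs:
        s = topic_score(doc["terms"], profile_weights)
        if s >= accept:
            kept.append(doc)      # positive region
        elif s > reject:
            kept.append(doc)      # boundary region: defer to stage 2
        # s <= reject: negative region, filtered out
    return kept

def stage_two(docs, patterns):
    """Stage 2: rank by how much discovered pattern weight each doc covers.
    patterns: iterable of (term_tuple, weight) pairs."""
    def pattern_score(doc):
        terms = set(doc["terms"])
        return sum(w for pat, w in patterns if set(pat) <= terms)
    return sorted(docs, key=pattern_score, reverse=True)
```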
APA, Harvard, Vancouver, ISO, and other styles
46

Laurier, Cyril François. "Automatic Classification of musical mood by content-based analysis." Doctoral thesis, Universitat Pompeu Fabra, 2011. http://hdl.handle.net/10803/51582.

Full text
Abstract:
In this work, we focus on automatically classifying music by mood. For this purpose, we propose computational models using information extracted from the audio signal. The foundations of such algorithms are based on techniques from signal processing, machine learning and information retrieval. First, by studying the tagging behavior of a music social network, we find a model to represent mood. Then, we propose a method for automatic music mood classification. We analyze the contributions of audio descriptors and how their values are related to the observed mood. We also propose a multimodal version using lyrics, contributing to the field of text retrieval. Moreover, after showing the relation between mood and genre, we present a new approach using automatic music genre classification. We demonstrate that genre-based mood classifiers give higher accuracies than standard audio models. Finally, we propose a rule extraction technique to make our models explicit.
En esta tesis, nos centramos en la clasificación automática de música a partir de la detección de la emoción que comunica. Primero, estudiamos cómo los miembros de una red social utilizan etiquetas y palabras clave para describir la música y las emociones que evoca, y encontramos un modelo para representar los estados de ánimo. Luego, proponemos un método de clasificación automática de emociones. Analizamos las contribuciones de descriptores de audio y cómo sus valores están relacionados con los estados de ánimo. Proponemos también una versión multimodal de nuestro algoritmo, usando las letras de canciones. Finalmente, después de estudiar la relación entre el estado de ánimo y el género musical, presentamos un método usando la clasificación automática por género. A modo de recapitulación conceptual y algorítmica, proponemos una técnica de extracción de reglas para entender cómo los algoritmos de aprendizaje automático predicen la emoción evocada por la música.
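The genre-based idea, classify genre first and then apply a mood model trained for that genre, can be sketched as below. The MFCC-statistics features and SVM models are assumptions chosen for brevity, not the thesis's exact descriptor set.

```python
# Sketch of genre-conditioned mood classification: a genre classifier routes
# each track to a mood model trained only on that genre. Illustrative only.
import numpy as np
import librosa
from sklearn.svm import SVC

def track_features(path):
    """MFCC mean/std statistics as a simple timbral descriptor (assumed)."""
    y, sr = librosa.load(path, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

class GenreBasedMood:
    def fit(self, X, genres, moods):
        """X: (n, d) feature array; genres, moods: numpy arrays of labels."""
        self.genre_clf = SVC().fit(X, genres)
        self.mood_by_genre = {
            g: SVC().fit(X[genres == g], moods[genres == g])
            for g in np.unique(genres)
        }
        return self

    def predict(self, X):
        g = self.genre_clf.predict(X)
        return np.array([self.mood_by_genre[gi].predict(x[None, :])[0]
                         for gi, x in zip(g, X)])
```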
APA, Harvard, Vancouver, ISO, and other styles
47

Dzhambazov, Georgi. "Knowledge-based probabilistic modeling for tracking lyrics in music audio signals." Doctoral thesis, Universitat Pompeu Fabra, 2017. http://hdl.handle.net/10803/404681.

Full text
Abstract:
This thesis proposes specific signal processing and machine learning methodologies for automatically aligning the lyrics of a song to its corresponding audio recording. The research carried out falls in the broader field of music information retrieval (MIR) and, in this respect, we aim at improving some existing state-of-the-art methodologies by introducing domain-specific knowledge. The goal of this work is to devise models capable of tracking in the music audio signal the sequential aspect of one particular element of lyrics: the phonemes. Music can be understood as comprising different facets, one of which is lyrics. The models we build take into account the complementary context that exists around lyrics, that is, any musical facet complementary to the lyrics themselves. The facets used in this thesis include the structure of the music composition, the structure of a melodic phrase, and the structure of a metrical cycle. From this perspective, we analyse not only the low-level acoustic characteristics, representing the timbre of the phonemes, but also higher-level characteristics in which the complementary context manifests. We propose specific probabilistic models to represent how the transitions between consecutive sung phonemes are conditioned by different facets of complementary context. The complementary context, which we address, unfolds in time according to principles that are particular to a music tradition. To capture these, we created corpora and datasets for two music traditions which have a rich set of such principles: Ottoman Turkish makam and Beijing opera. The datasets and the corpora comprise different data types: audio recordings, music scores, and metadata. From this perspective, the proposed models can take advantage both of the data and of music-domain knowledge of particular musical styles to improve existing baseline approaches. As a baseline, we choose a phonetic recognizer based on hidden Markov models (HMM): a widely-used methodology for tracking phonemes in both singing and speech processing problems. We present refinements in the typical steps of existing phonetic recognizer approaches, tailored to the characteristics of the studied music traditions. On top of the refined baseline, we devise probabilistic models, based on dynamic Bayesian networks (DBN), that represent the relation of phoneme transitions to their complementary context. Two separate models are built for two granularities of complementary context: the structure of a melodic phrase (higher level) and the structure of the metrical cycle (finer level). In one model we exploit the fact that syllable durations depend on their position within a melodic phrase. Information about the melodic phrases is obtained from the score, as well as from music-specific knowledge. Then, in another model, we analyse how vocal note onsets, estimated from audio recordings, influence the transitions between consecutive vowels and consonants. We also propose how to detect the time positions of vocal note onsets in melodic phrases by simultaneously tracking the positions in a metrical cycle (i.e. metrical accents). In order to evaluate the potential of the proposed models, we use lyrics-to-audio alignment as a concrete task. Each model improves the alignment accuracy, compared to the baseline, which is based solely on the acoustics of the phonetic timbre.
This validates our hypothesis that knowledge of complementary context is an important stepping stone for computationally tracking lyrics, especially in the challenging case of singing with instrumental accompaniment. The outcomes of this study are not only theoretical methodologies and data, but also specific software tools that have been integrated into Dunya, a suite of tools built in the context of CompMusic, a project for advancing the computational analysis of the world's music. With this application, we have also shown that the developed methodologies are useful not only for tracking lyrics, but also for other use cases, such as enriched music listening and appreciation, or for educational purposes.
La tesi aquí presentada proposa metodologies d’aprenentatge automàtic i processament de senyal per alinear automàticament el text d’una cançó amb el seu corresponent enregistrament d’àudio. La recerca duta a terme s’engloba en l’ampli camp de l’extracció d’informació musical (Music Information Retrieval o MIR). Dins aquest context la tesi pretén millorar algunes de les metodologies d’última generació del camp introduint coneixement específic de l’àmbit. L’objectiu d’aquest treball és dissenyar models que siguin capaços de detectar en la senyal d’àudio l’aspecte seqüencial d’un element particular dels textos musicals; els fonemes. Podem entendre la música com la composició de diversos elements entre els quals podem trobar el text. Els models que construïm tenen en compte el context complementari del text. El context són tots aquells aspectes musicals que complementen el text, dels quals hem utilitzat en aquest tesi: la estructura de la composició musical, la estructura de les frases melòdiques i els accents rítmics. Des d’aquesta perspectiva analitzem no només les característiques acústiques de baix nivell, que representen el timbre musical dels fonemes, sinó també les característiques d’alt nivell en les quals es fa patent el context complementari. En aquest treball proposem models probabilístics específics que representen com les transicions entre fonemes consecutius de veu cantada es veuen afectats per diversos aspectes del context complementari. El context complementari que tractem aquí es desenvolupa en el temps en funció de les característiques particulars de cada tradició musical. Per tal de modelar aquestes característiques hem creat corpus i conjunts de dades de dues tradicions musicals que presenten una gran riquesa en aquest aspectes; la música de l’opera de Beijing i la música makam turc-otomana. Les dades són de diversos tipus; enregistraments d’àudio, partitures musicals i metadades. Des d’aquesta perspectiva els models proposats poden aprofitar-se tant de les dades en si mateixes com del coneixement específic de la tradició musical per a millorar els resultats de referència actuals. Com a resultat de referència prenem un reconeixedor de fonemes basat en models ocults de Markov (Hidden Markov Models o HMM), una metodologia abastament emprada per a detectar fonemes tant en la veu cantada com en la parlada. Presentem millores en els processos comuns dels reconeixedors de fonemes actuals, ajustant-los a les característiques de les tradicions musicals estudiades. A més de millorar els resultats de referència també dissenyem models probabilístics basats en xarxes dinàmiques de Bayes (Dynamic Bayesian Networks o DBN) que representen la relació entre la transició dels fonemes i el context complementari. Hem creat dos models diferents per dos aspectes del context complementari; la estructura de la frase melòdica (alt nivell) i la estructura mètrica (nivell subtil). En un dels models explotem el fet que la duració de les síl·labes depén de la seva posició en la frase melòdica. Obtenim aquesta informació sobre les frases musical de la partitura i del coneixement específic de la tradició musical. En l’altre model analitzem com els atacs de les notes vocals, estimats directament dels enregistraments d’àudio, influencien les transicions entre vocals i consonants consecutives. A més també proposem com detectar les posicions temporals dels atacs de les notes en les frases melòdiques a base de localitzar simultàniament els accents en un cicle mètric musical.
Per tal d’avaluar el potencial dels mètodes proposats utilitzem la tasca específica d’alineament de text amb àudio. Cada model proposat millora la precisió de l’alineament en comparació als resultats de referència, que es basen exclusivament en les característiques acústiques tímbriques dels fonemes. D’aquesta manera validem la nostra hipòtesi que el coneixement del context complementari ajuda a la detecció automàtica de text musical, especialment en el cas de veu cantada amb acompanyament instrumental. Els resultats d’aquest treball no consisteixen només en metodologies teòriques i dades, sinó també en eines programàtiques específiques que han sigut integrades a Dunya, un paquet d’eines creat en el context del projecte de recerca CompMusic, l’objectiu del qual és promoure l’anàlisi computacional de les músiques del món. Gràcies a aquestes eines demostrem també que les metodologies desenvolupades es poden fer servir per a altres aplicacions en el context de l’educació musical o l’escolta musical enriquida.
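As a complement to the abstract above, the following sketch shows the mechanical core of the HMM-style baseline it mentions: forced alignment of a known phoneme sequence to frame-level acoustic scores with a left-to-right Viterbi pass. The frame scores (`log_probs`) are assumed to come from some acoustic model; the thesis's DBN models with melodic and metrical context are considerably richer than this.

```python
# Sketch of forced alignment: a known phoneme sequence is aligned to audio
# frames with a left-to-right Viterbi pass. log_probs[t, i] is the assumed
# log-likelihood of phoneme i at frame t; each phoneme may stay or advance.
# Assumes at least as many frames as phonemes. Illustrative only.
import numpy as np

def forced_align(log_probs):
    """Returns, per frame, the index of the aligned phoneme."""
    T, N = log_probs.shape
    vit = np.full((T, N), -np.inf)
    back = np.zeros((T, N), dtype=int)
    vit[0, 0] = log_probs[0, 0]          # must start at the first phoneme
    for t in range(1, T):
        for i in range(N):
            stay = vit[t - 1, i]
            move = vit[t - 1, i - 1] if i > 0 else -np.inf
            if stay >= move:
                vit[t, i], back[t, i] = stay + log_probs[t, i], i
            else:
                vit[t, i], back[t, i] = move + log_probs[t, i], i - 1
    path = [N - 1]                        # must end at the last phoneme
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]
```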
APA, Harvard, Vancouver, ISO, and other styles
48

O'Day, Garrett M. "Improving Problem Solving with Retrieval-Based Learning." Thesis, 2019.

Find full text
Abstract:

Recent research asserts that the mnemonic benefits gained from retrieval-based learning vanish for complex materials. Subsequently, it is recommended that students study worked examples when learning about complex, problem-centered tasks. The experiments that have evaluated the effectiveness of studying worked examples tend to overlook the mental processing that students engage in when completing retrieval-based learning activities. In contrast, theories of transfer-appropriate processing emphasize the importance of compatibility between the cognitive processing required by the test and the cognitive processing that is activated during learning. For learners to achieve optimal test performance, according to transfer-appropriate processing, they need to study in such a way that they are engaging in the same mental processing that will be required of them when tested. This idea was used to generate testable predictions that compete against the claim that the retrieval practice effect disappears for complex materials, and these competing predictions were evaluated in three experiments that required students to learn about the Poisson probability distribution.


In Experiment 1, students learned the general procedure for how to solve these problems by either repeatedly recalling the procedural steps or by simply studying them. The retrieval practice condition produced better memory for the procedure on an immediate test compared to the study only condition. In Experiment 2, students engaged in the same learning activities as Experiment 1, but the test focused on their problem-solving ability. Students who practiced retrieval of the procedural steps experienced no benefit on the problem-solving test compared to the study only condition. In Experiment 3, students learned to solve Poisson probability problems by studying four worked examples, by studying one worked example and solving three practice problems, or by studying one worked example and solving three practice problems with feedback. Students were tested on their problem-solving ability one week later. The problem-solving learning activities outperformed the worked example condition on the final problem-solving test. Taken together, the results demonstrate a pronounced retrieval practice effect but only when the retrieval-based learning activities necessitated the same mental processing that was required during the final assessment, providing support for the transfer-appropriate processing account.

APA, Harvard, Vancouver, ISO, and other styles
49

Chou, Yu-Sheng, and 周佑昇. "Personalized Face Retrieval based on Multi-Kernel Learning." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/14222127089871615899.

Full text
Abstract:
Master's thesis
Chung Yuan Christian University
Graduate Institute of Information Engineering
102 (ROC academic year, 2013-2014)
In recent years, attribute-based face image retrieval has become a hot research topic due to the explosive growth of social media. Semantic visual attributes are pre-trained and combined to retrieve specific face images. However, just as an image cannot be completely described by keywords, it is impossible to describe a face image with a limited set of attributes. Therefore, we propose a personalized face image retrieval scheme based on Generalized Multiple Kernel Learning (GMKL). Each face image is first aligned with a Constrained Local Model (CLM), and landmarks are extracted to locate facial parts automatically. The local features extracted from the different facial parts are then modeled as the base kernels in GMKL. After learning the kernel weights from a training set selected by the user, face image retrieval can be achieved without pre-training attributes. Experimental results show that our method is reliable and efficient on the LFW dataset using only tens of training samples.
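To give a flavour of the multi-kernel construction, the sketch below builds one RBF kernel per facial part and combines them with non-negative weights before training a precomputed-kernel SVM. The alignment-based weight rule is a deliberate simplification standing in for GMKL's joint optimization; labels are assumed to be in {-1, +1}.

```python
# Sketch of the multi-kernel idea: one RBF base kernel per facial part,
# combined with learned non-negative weights and fed to a precomputed-kernel
# SVM. The alignment weighting is a simplification of GMKL.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def base_kernels(part_features):
    """part_features: list of (n_samples, d_m) arrays, one per facial part."""
    return [rbf_kernel(X) for X in part_features]

def alignment_weights(kernels, y):
    """Weight each kernel by its alignment with the ideal kernel y y^T."""
    target = np.outer(y, y)
    w = np.array([max((K * target).sum(), 0.0) /
                  (np.linalg.norm(K) * np.linalg.norm(target))
                  for K in kernels])
    return w / w.sum()

def train(part_features, y):
    Ks = base_kernels(part_features)
    w = alignment_weights(Ks, y)
    K = sum(wi * Ki for wi, Ki in zip(w, Ks))
    return SVC(kernel="precomputed").fit(K, y), w
```

Scoring a new face would additionally require the cross-kernels between the test sample and the training samples, combined with the same weights.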
APA, Harvard, Vancouver, ISO, and other styles
50

Hsu, Ching-Yu, and 許靖雨. "Information Retrieval Technology for Internet Based Learning Evaluation." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/02340663199595511023.

Full text
Abstract:
Master's thesis
National Taiwan University
Graduate Institute of Civil Engineering
92 (ROC academic year, 2003-2004)
This study presents an application of Information Retrieval (IR) technology to internet-based learning (IBL) evaluation. The objective is to support IBL, which is becoming increasingly important due to the ubiquity of the internet. The learning evaluation of students is based on an "instruction-assisting knowledge set", which consists of keywords from the learning materials. According to this knowledge set, we use IR technology, such as similarity evaluation and query expansion, to analyze students' performance, and especially the universality of their learning. A two-week case study was conducted on junior and senior students in the Department of Civil Engineering. The results show that the evaluation obtained from the system is in good agreement with that of the instructor, which verifies the applicability of using IR technology to analyze students' behaviors. As a result, instructors could use the analysis to identify students' learning problems and provide appropriate guidance.
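The abstract's core operations, similarity evaluation against a keyword set plus query expansion, can be sketched compactly. The snippet below is a hypothetical illustration: the synonym dictionary, the term weighting and the knowledge-set format are all assumptions, not the thesis's actual design.

```python
# Sketch: expand the instructor's keyword set with synonyms, then score a
# student answer by cosine similarity against the expanded set.
import math
from collections import Counter

def expand(keywords, synonyms):
    """synonyms: dict mapping a keyword to a list of related terms."""
    expanded = set(keywords)
    for k in keywords:
        expanded.update(synonyms.get(k, []))
    return expanded

def cosine(terms_a, terms_b):
    a, b = Counter(terms_a), Counter(terms_b)
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def evaluate(answer_terms, keyword_set, synonyms):
    """Higher scores suggest the answer covers the course keywords better."""
    return cosine(answer_terms, expand(keyword_set, synonyms))
```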
APA, Harvard, Vancouver, ISO, and other styles