Academic literature on the topic 'Retrieved document sets'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Retrieved document sets.'

Journal articles on the topic "Retrieved document sets"

1

Fosci, Paolo, and Giuseppe Psaila. "Towards Flexible Retrieval, Integration and Analysis of JSON Data Sets through Fuzzy Sets: A Case Study." Information 12, no. 7 (June 22, 2021): 258. http://dx.doi.org/10.3390/info12070258.

Abstract:
How to exploit the incredible variety of JSON data sets currently available on the Internet, for example, on Open Data portals? The traditional approach would require getting them from the portals, then storing them into some JSON document store and integrating them within the document store. However, once data are integrated, the lack of a query language that provides flexible querying capabilities could prevent analysts from successfully completing their analysis. In this paper, we show how the J-CO Framework, a novel framework that we developed at the University of Bergamo (Italy) to manage large collections of JSON documents, is a unique and innovative tool that provides analysts with querying capabilities based on fuzzy sets over JSON data sets. Its query language, called J-CO-QL, is continuously evolving to increase potential applications; the most recent extensions give analysts the capability to retrieve data sets directly from web portals as well as constructs to apply fuzzy set theory to JSON documents and to provide analysts with the capability to perform imprecise queries on documents by means of flexible soft conditions. This paper presents a practical case study in which real data sets are retrieved, integrated and analyzed to effectively show the unique and innovative capabilities of the J-CO Framework.
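The idea of a flexible soft condition over JSON documents can be illustrated with a minimal Python sketch. This is not J-CO-QL syntax and not the J-CO Framework's implementation; the trapezoidal membership function, the `soft_select` helper, the `pm10` field and the threshold are all hypothetical choices made for illustration.

```python
import json

def trapezoid(x, a, b, c, d):
    """Membership degree in [0, 1] for a trapezoidal fuzzy set (assumes a < b <= c < d)."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)   # rising edge
    return (d - x) / (d - c)       # falling edge

def soft_select(documents, field, fuzzy_set, threshold):
    """Keep documents whose field belongs to the fuzzy set with degree >= threshold."""
    selected = []
    for doc in documents:
        degree = trapezoid(doc[field], *fuzzy_set)
        if degree >= threshold:
            # Attach the membership degree so later steps can rank or combine it.
            selected.append({**doc, "membership": degree})
    return sorted(selected, key=lambda d: d["membership"], reverse=True)

# Hypothetical JSON data set, as it might be downloaded from an open data portal.
raw = '[{"city": "A", "pm10": 18}, {"city": "B", "pm10": 42}, {"city": "C", "pm10": 65}]'
stations = json.loads(raw)

# Hypothetical fuzzy set "high pollution" over the pm10 field.
HIGH_POLLUTION = (30, 50, 200, 250)
result = soft_select(stations, "pm10", HIGH_POLLUTION, threshold=0.5)
```

Unlike a crisp filter such as `pm10 > 50`, the soft condition also keeps station B with a partial membership degree, which is the kind of imprecise query the abstract describes.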
2

VILLATORO, ESAÚ, ANTONIO JUÁREZ, MANUEL MONTES, LUIS VILLASEÑOR, and L. ENRIQUE SUCAR. "Document ranking refinement using a Markov random field model." Natural Language Engineering 18, no. 2 (March 14, 2012): 155–85. http://dx.doi.org/10.1017/s1351324912000010.

Abstract:
This paper introduces a novel ranking refinement approach based on relevance feedback for the task of document retrieval. We focus on the problem of ranking refinement since recent evaluation results from Information Retrieval (IR) systems indicate that current methods are effective at retrieving most of the relevant documents for different sets of queries, but have severe difficulty generating a pertinent ranking of them. Motivated by these results, we propose a novel method to re-rank the list of documents returned by an IR system. The proposed method is based on a Markov Random Field (MRF) model that classifies the retrieved documents as relevant or irrelevant. The proposed MRF combines: (i) information provided by the base IR system, (ii) similarities among documents in the retrieved list, and (iii) relevance feedback information. Thus, the problem of ranking refinement is reduced to that of minimising an energy function that represents a trade-off between document relevance and inter-document similarity. Experiments were conducted using resources from four different tasks of the Cross Language Evaluation Forum (CLEF) as well as from one task of the Text Retrieval Conference (TREC). The obtained results show the feasibility of the method for re-ranking documents in IR and also show an improvement in mean average precision compared to a state-of-the-art retrieval machine.
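The trade-off between document relevance and inter-document similarity can be sketched as a small energy minimisation. The code below is an illustration only, not the authors' model: the exact form of the energy, the weight `lam`, the use of iterated conditional modes (ICM) as the optimiser, and all scores and similarities are assumptions.

```python
def energy(labels, scores, sim, lam):
    """Assumed energy: unary cost from base IR scores plus a similarity-weighted
    penalty for every pair of documents that receive different labels."""
    unary = sum((1.0 - s) if label == 1 else s for label, s in zip(labels, scores))
    n = len(labels)
    pairwise = sum(sim[i][j]
                   for i in range(n) for j in range(i + 1, n)
                   if labels[i] != labels[j])
    return unary + lam * pairwise

def icm(scores, sim, lam, feedback=frozenset(), max_sweeps=20):
    """Greedy label updates (ICM); documents in `feedback` stay fixed as relevant."""
    labels = [1 if s >= 0.5 else 0 for s in scores]
    for i in feedback:
        labels[i] = 1
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(labels)):
            if i in feedback:
                continue
            best = min((0, 1), key=lambda l: energy(
                labels[:i] + [l] + labels[i + 1:], scores, sim, lam))
            if best != labels[i]:
                labels[i], changed = best, True
        if not changed:
            break
    return labels

# Hypothetical normalised base scores and a symmetric similarity matrix.
scores = [0.9, 0.4, 0.7, 0.45]
sim = [[0.0, 0.9, 0.1, 0.0],
       [0.9, 0.0, 0.1, 0.0],
       [0.1, 0.1, 0.0, 0.1],
       [0.0, 0.0, 0.1, 0.0]]

labels = icm(scores, sim, lam=0.5, feedback={0})
# Re-rank: documents labelled relevant first, ties broken by base score.
reranked = sorted(range(len(scores)), key=lambda i: (-labels[i], -scores[i]))
baseline = sorted(range(len(scores)), key=lambda i: -scores[i])
```

In this toy input, document 1 has a low base score but is highly similar to document 0, which the user marked as relevant, so it is promoted above document 3 in the refined ranking.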
3

Sunita, B., and T. John Peter. "Analysis of Various Multilingual Document Clustering." Journal of Computational and Theoretical Nanoscience 17, no. 9 (July 1, 2020): 3921–26. http://dx.doi.org/10.1166/jctn.2020.8989.

Abstract:
Today’s world is heading towards the data science era. The volume of data is increasing at an exponential rate, and data is produced and circulated all over the world, not only in English but in every regional language too. Because the data are multilingual, it is extremely difficult to manage such a huge amount of varied data. Hence there is scope for research on multilingual document clustering (MDC). Document clustering can be used to retrieve information for a user query. The technique divides a given set of documents into a certain number of clusters. The aim is to create multilingual document clusters that are internally related but substantially different from each other. The main challenge faced while creating an MDC is the quality and stability of the clusters, which change rapidly with the document sets. Multilingual document clusters have to be represented in the form of a matrix, which is done using either the Vector Space Model or the TF–IDF method, where each word is given a value representing a particular document. News articles are retrieved from a relevant news explorer using proper search engines. Words in news articles are represented as one-dimensional mathematical vectors. Keywords in every article are selected using term frequency (tf) and evaluated with inverse document frequency (idf), which uses the discriminative power of keywords over articles. It encourages users to opt for cluster-based browsing, which is acceptable for processing the results. Big Data tools work efficiently in a distributed environment, which gives a significant analysis of the retrieved information. Many document clustering methods work well with a small set of data but fail to deal with a large set of documents. In this paper we concentrate on bringing out the various problems that arise during multilingual document clustering and possible solutions to overcome those problems.
4

Jayasudha, R., S. Subramanian, and L. Sivakumar. "Genetic Algorithm and PSO Based Intelligent Software Reuse." Applied Mechanics and Materials 573 (June 2014): 612–17. http://dx.doi.org/10.4028/www.scientific.net/amm.573.612.

Abstract:
Software reuse can improve the development time, cost and quality of software artifacts. The storage of artifacts plays an important role in the easy retrieval of the needed components according to the requirement. In this paper, an approach is presented for retrieving relevant components from an ontology-based repository. Two well-known evolutionary algorithms, the Genetic Algorithm (GA) and the Particle Swarm Optimization (PSO) algorithm, were used for extraction of the needed components. The two algorithms are first used separately for component retrieval. GA is best suited to component retrieval if the repository has a large number of relevant components. PSO is best suited to component search if the query is highly refined to get more relevant documents, and it is used mainly for query expansion. The two methods are then combined: the retrieved set of components is organized with the help of GA, and PSO is used for the best query expansion. Combining the two methods gives the best precision and retrieval time for different sets of requirement queries.
5

Kumaravel, Girthana, and Swamynathan Sankaranarayanan. "PQPS: Prior-Art Query-Based Patent Summarizer Using RBM and Bi-LSTM." Mobile Information Systems 2021 (December 28, 2021): 1–19. http://dx.doi.org/10.1155/2021/2497770.

Abstract:
A prior-art search on patents ascertains the patentability constraints of the invention through an organized review of prior-art document sources. This search technique poses challenges because of the inherent vocabulary mismatch problem. Manual processing of every retrieved relevant patent in its entirety is a tedious and time-consuming job that demands automated patent summarization for ease of access. This paper employs deep learning models for summarization as they take advantage of the massive dataset present in the patents to improve the summary coherence. This work presents a novel approach of patent summarization named PQPS: prior-art query-based patent summarizer using restricted Boltzmann machine (RBM) and bidirectional long short-term memory (Bi-LSTM) models. The PQPS also addresses the vocabulary mismatch problem through query expansion with knowledge bases such as domain ontology and WordNet. It further enhances the retrieval rate through topic modeling and bibliographic coupling of citations. The experiments analyze various interlinked smart device patent sample sets. The proposed PQPS demonstrates that retrievability increases both in extractive and abstractive summaries.
6

Yogish, Deepa, T. N. Manjunath, and Ravindra S. Hegadi. "Analysis of Vector Space Method in Information Retrieval for Smart Answering System." Journal of Computational and Theoretical Nanoscience 17, no. 9 (July 1, 2020): 4468–72. http://dx.doi.org/10.1166/jctn.2020.9099.

Abstract:
In the world of the internet, searching plays a vital role in retrieving relevant answers to user-specific queries. The most promising application of natural language processing and information retrieval is the question answering system, which directly provides an accurate answer instead of a set of documents. The main objective of information retrieval is to retrieve relevant documents from the huge volume of data sets on the internet using an appropriate model. Many models have been proposed for the retrieval process, such as the Boolean, vector space and probabilistic models. The vector space model (VSM) is the best method in information retrieval for document ranking, with an efficient document representation that combines simplicity and clarity. VSM adopts a similarity function to measure the match between documents and user intent, and assigns scores from largest to smallest. The documents and the query are assigned weights using the term frequency and inverse document frequency method. To retrieve the documents most relevant to the user's query terms, the cosine similarity ranking function is computed between every document and the user query. Documents with higher similarity scores are considered relevant to the query, and they are ranked based on these scores. This paper emphasizes different techniques of information retrieval and argues that the Vector Space Model offers a realistic compromise in IR processing. It allows the best weighting scheme, which ranks the set of documents in order of relevance to the user query.
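The ranking procedure described above (TF–IDF weights, then cosine similarity between the query and every document) can be sketched in a few lines of Python. This is a minimal illustration, not the paper's system: whitespace tokenisation, the `log(N/df) + 1` IDF variant, and the sample documents are assumptions.

```python
import math
from collections import Counter

def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace only.
    return text.lower().split()

def build_idf(docs_tokens):
    """Inverse document frequency per term; the '+ 1' smoothing is one common variant."""
    n = len(docs_tokens)
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}

def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in the collection are dropped."""
    tf = Counter(tokens)
    return {term: tf[term] * idf[term] for term in tf if term in idf}

def cosine(u, v):
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank(query, documents):
    """Return (score, document index) pairs, highest cosine similarity first."""
    docs_tokens = [tokenize(d) for d in documents]
    idf = build_idf(docs_tokens)
    query_vec = tfidf_vector(tokenize(query), idf)
    scored = [(cosine(query_vec, tfidf_vector(tokens, idf)), i)
              for i, tokens in enumerate(docs_tokens)]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))

documents = [
    "vector space model for document ranking",
    "boolean model for retrieval",
    "probabilistic retrieval model",
]
ranking = rank("vector space ranking", documents)
```

Only the first document shares terms with the query, so it is the only one with a non-zero score; the other two tie at zero and keep their original order.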
7

Marijan, Robert, and Robert Leskovar. "A library’s information retrieval system (In)effectiveness: case study." Library Hi Tech 33, no. 3 (September 21, 2015): 369–86. http://dx.doi.org/10.1108/lht-07-2015-0071.

Abstract:
Purpose – The purpose of this paper is to evaluate the effectiveness of the information retrieval component of a daily newspaper publisher’s integrated library system (ILS) in comparison with the open source alternatives and observe the impact of the scale of metadata, generated daily by library administrators, on retrieved result sets. Design/methodology/approach – In Experiment 1, the authors compared the result sets of the information retrieval system (IRS) component of the publisher’s current ILS and the result sets of proposed ones with human-assessed relevance judgment set. In Experiment 2, the authors compared the performance of proposed IRS components with the publisher’s current production IRS, using result sets of current IRS classified as relevant. Both experiments were conducted using standard information retrieval (IR) evaluation methods: precision, recall, precision at k, F-measure, mean average precision and 11-point interpolated average precision. Findings – Results showed that: first, in Experiment 1, the publisher’s current production ILS ranked last of all participating IRSs when compared to a relevance document set classified by the senior library administrator; and second, in Experiment 2, the tested IR components’ request handlers that used only automatically generated metadata performed slightly better than request handlers that used all of the metadata fields. Therefore, regarding the effectiveness of IR, the daily human effort of generating the publisher’s current set of metadata attributes is unjustified. Research limitations/implications – The experiments’ collections contained Slovene language with large number of variations of the forms of nouns, verbs and adjectives. The results could be different if the experiments’ collections contained languages with different grammatical properties. 
Practical implications – The authors have confirmed, using standard IR methods, that the IR component used in the publisher’s current ILS, could be adequately replaced with an open source component. Based on the research, the publisher could incorporate the suggested open source IR components in practice. In the research, the authors have described the methods that can be used by libraries for evaluating the effectiveness of the IR of their ILSs. Originality/value – The paper provides a framework for the evaluation of an ILS’s IR effectiveness for libraries. Based on the evaluation results, the libraries could replace the IR components if their current information system setup allows it.
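Several of the standard evaluation measures named in this abstract (precision, recall, precision at k, F-measure and mean average precision) have short textbook definitions, sketched below in Python. The document identifiers and relevance judgments are made up for illustration and are not the study's data; 11-point interpolated average precision is omitted.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one result list."""
    hits = len(set(retrieved) & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k

def average_precision(retrieved, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """`runs` is a list of (retrieved list, relevant set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Hypothetical result set and human relevance judgments for one query.
retrieved = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d5"}

p, r = precision_recall(retrieved, relevant)
f1 = f_measure(p, r)
p_at_3 = precision_at_k(retrieved, relevant, 3)
ap = average_precision(retrieved, relevant)
```

Here two of the five retrieved documents are relevant, found at ranks 2 and 4, out of three relevant documents in total.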
8

Wang, Yanshan, In-Chan Choi, and Hongfang Liu. "Generalized ensemble model for document ranking in information retrieval." Computer Science and Information Systems 14, no. 1 (2017): 123–51. http://dx.doi.org/10.2298/csis160229042w.

Abstract:
A generalized ensemble model (gEnM) for document ranking is proposed in this paper. The gEnM linearly combines the document retrieval models and tries to retrieve relevant documents at high positions. In order to obtain the optimal linear combination of multiple document retrieval models or rankers, an optimization program is formulated by directly maximizing the mean average precision. Both supervised and unsupervised learning algorithms are presented to solve this program. For the supervised scheme, two approaches are considered based on the data setting, namely batch and online setting. In the batch setting, we propose a revised Newton's algorithm, gEnM.BAT, by approximating the derivative and Hessian matrix. In the online setting, we advocate a stochastic gradient descent (SGD) based algorithm, gEnM.ON. As for the unsupervised scheme, an unsupervised ensemble model (UnsEnM) by iteratively co-learning from each constituent ranker is presented. Experimental study on benchmark data sets verifies the effectiveness of the proposed algorithms. Therefore, with appropriate algorithms, the gEnM is a viable option in diverse practical information retrieval applications.
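The core idea, a linear combination of rankers whose weights are chosen to maximise mean average precision, can be sketched as follows. This is not gEnM.BAT or gEnM.ON: the Newton and SGD solvers are replaced here by a plain grid search over two weights, and the rankers' scores and relevance judgments are hypothetical.

```python
def average_precision(ranking, relevant):
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def combine(score_lists, weights):
    """Rank documents by the weighted sum of the scores given by each ranker."""
    docs = set().union(*score_lists)
    combined = {doc: sum(w * scores.get(doc, 0.0)
                         for w, scores in zip(weights, score_lists))
                for doc in docs}
    # Ties are broken by document id to keep the ranking deterministic.
    return sorted(docs, key=lambda doc: (-combined[doc], doc))

def mean_ap(queries, weights):
    return sum(average_precision(combine(q["scores"], weights), q["relevant"])
               for q in queries) / len(queries)

def grid_search(queries, steps=10):
    """Exhaustive search over weight pairs (w, 1 - w); a stand-in for a real optimiser."""
    candidates = [(k / steps, 1.0 - k / steps) for k in range(steps + 1)]
    return max(candidates, key=lambda w: mean_ap(queries, w))

# Hypothetical scores from two rankers for two queries, with relevance judgments.
queries = [
    {"scores": [{"d1": 0.9, "d2": 0.8, "d3": 0.1},
                {"d1": 0.2, "d2": 0.9, "d3": 0.8}],
     "relevant": {"d1", "d3"}},
    {"scores": [{"d4": 0.3, "d5": 0.7},
                {"d4": 0.9, "d5": 0.1}],
     "relevant": {"d4"}},
]

best_weights = grid_search(queries)
best_map = mean_ap(queries, best_weights)
```

Because the grid includes the weight pairs (1, 0) and (0, 1), the selected combination is never worse on these queries than either ranker used alone.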
9

Nita, Stefania Loredana. "Secure Document Search in Cloud Computing using MapReduce." Scientific Bulletin of Naval Academy XXIII, no. 1 (July 15, 2020): 231–35. http://dx.doi.org/10.21279/1454-864x-20-i1-031.

Abstract:
Nowadays, cloud computing is an important technology that is part of our daily lives. Moving to the cloud brings some benefits: creating new applications, storing large sets of data and processing large amounts of data. Individual users or companies can store their own data in the cloud (e.g. maritime, environmental protection or physics analysis data). Before data is stored in the cloud, it needs to be encrypted in order to keep it confidential. In particular, users can store encrypted documents in the cloud. However, when the owner needs a specific document, they must retrieve all documents from the cloud, decrypt them, choose the desired document, encrypt them again and finally store the encrypted documents back in the cloud. To avoid all these steps, a user can choose to work with searchable encryption. This is an encryption technique in which keywords (or indexes) are associated with encrypted documents, and when the owner needs a document, they only need to search through the keywords and then retrieve the documents associated with the desired keywords. An important programming paradigm for cloud computing is MapReduce, which allows high scalability on a large number of servers in a cluster. Basically, MapReduce works with (key, value) pairs. In this paper, we describe a new technique through which a user can extract encrypted documents stored on cloud servers based on keywords, using searchable encryption and MapReduce.
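A toy version of keyword search over an index of keyed tokens, organised as map and reduce steps over (key, value) pairs, might look like the sketch below. It is not the scheme from the paper: the HMAC-based tokens, the function names and the sample corpus are assumptions, document encryption itself is left out, and the construction is not a secure searchable-encryption scheme (for example, it reveals which documents share a keyword).

```python
import hashlib
import hmac
from collections import defaultdict

def trapdoor(key: bytes, keyword: str) -> str:
    """Keyed token for a keyword; the server sees only this token, not the keyword."""
    return hmac.new(key, keyword.lower().encode("utf-8"), hashlib.sha256).hexdigest()

def map_phase(key, doc_id, keywords):
    """Map step (run by the owner): emit one (token, doc_id) pair per keyword."""
    for keyword in keywords:
        yield trapdoor(key, keyword), doc_id

def reduce_phase(pairs):
    """Reduce step: group document ids by token to form the searchable index."""
    index = defaultdict(set)
    for token, doc_id in pairs:
        index[token].add(doc_id)
    return dict(index)

def search(index, token):
    """Server-side lookup: return the ids of documents associated with the token."""
    return sorted(index.get(token, ()))

# Hypothetical key and corpus; a real key must be random and kept secret.
owner_key = b"demo-key-not-for-production"
corpus = {
    "doc1": ["radar", "navigation"],
    "doc2": ["navigation", "weather"],
    "doc3": ["weather"],
}

pairs = [pair for doc_id, keywords in corpus.items()
         for pair in map_phase(owner_key, doc_id, keywords)]
index = reduce_phase(pairs)
matches = search(index, trapdoor(owner_key, "navigation"))
```

The owner then downloads and decrypts only the documents in `matches`, rather than the whole collection; a token derived with a different key finds nothing.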
10

Bhari, Purushottam, Abhishek Dadhich, and Vikram Khandelwal. "An Approach for Improving Similarity Measure Using Fuzzy Logic." ECS Transactions 107, no. 1 (April 24, 2022): 20213–33. http://dx.doi.org/10.1149/10701.20213ecst.

Abstract:
An information retrieval system stores and indexes documents such that when users submit a query, the system retrieves relevant documents and assigns a score to each one. The higher the score, the more important the document is. IR systems typically yield vast result sets, and users must spend a significant amount of time sifting through them to identify the elements that are genuinely important. Different suggestions from the specialist literature for applying evolutionary computing to information retrieval are reviewed. To do so, the authors looked at a variety of IR issues that have been addressed using evolutionary algorithms. Some of the current approaches are described in detail; for example, when dealing with specialized domain knowledge, the challenge can be addressed by embedding into existing information retrieval systems a knowledge base that describes the relationships between index words. Fuzzy set theory may be used to adapt the knowledge in the bases to cope with the ambiguity that is typical of human knowledge. In this work, a novel way of implementing a similarity measure utilizing fuzzy logic for IR is provided. The suggested similarity metric is based on many IR system attributes that boost IR system performance. This method's strength is that it can extract the majority of a document's characteristics. Fuzzy rules, which translate domain knowledge into fuzzy sets, were also designed to make this most effective. The suggested similarity metric is validated using the CACM and CRAN benchmark datasets.

Dissertations / Theses on the topic "Retrieved document sets"

1

Oyarce, Guillermo Alfredo. "A Study of Graphically Chosen Features for Representation of TREC Topic-Document Sets." Thesis, University of North Texas, 2000. https://digital.library.unt.edu/ark:/67531/metadc2456/.

Abstract:
Document representation is important for computer-based text processing. Good document representations must include at least the most salient concepts of the document. Documents exist in a multidimensional space, which makes it difficult to identify which concepts to include. A current problem is to measure the effectiveness of the different strategies that have been proposed to accomplish this task. As a contribution towards this goal, this dissertation studied the visual inter-document relationship in a dimensionally reduced space. The same treatment was applied to full text and to three document representations. Two of the representations were based on the assumption that the salient features in a document set follow the chi-distribution in the whole document set. The third document representation identified features through a novel method. A Coefficient of Variability was calculated by normalizing the Cartesian distance of the discriminating value in the relevant and the non-relevant document subsets. Also, the local dictionary method was used. Cosine similarity values measured the inter-document distance in the information space and formed a matrix to serve as input to the Multi-Dimensional Scaling (MDS) procedure. A Precision-Recall procedure was averaged across all treatments to statistically compare them. Treatments were not found to be statistically the same and the null hypotheses were rejected.

Books on the topic "Retrieved document sets"

1

Jiang, Dongwei. The methods of analyzing retrieved document sets in information retrieval. 1993.

2

Harabagiu, Sanda, and Dan Moldovan. Question Answering. Edited by Ruslan Mitkov. Oxford University Press, 2012. http://dx.doi.org/10.1093/oxfordhb/9780199276349.013.0031.

Abstract:
Textual Question Answering (QA) identifies the answer to a question in large collections of on-line documents. By providing a small set of exact answers to questions, QA takes a step closer to information retrieval rather than document retrieval. A QA system comprises three modules: a question-processing module, a document-processing module, and an answer extraction and formulation module. Questions may be asked about any topic, in contrast with Information Extraction (IE), which identifies textual information relevant only to a predefined set of events and entities. The natural language processing (NLP) techniques used in open-domain QA systems may range from simple lexical and semantic disambiguation of question stems to complex processing that combines syntactic and semantic features of the questions with pragmatic information derived from the context of candidate answers. This article reviews current research in integrating knowledge-based NLP methods with shallow processing techniques for QA.
3

Introducción a la teoría de conjuntos, los operadores booleanos y la teoría del concepto para profesionales de la información documental. México: Universidad Nacional Autónoma de México, 2017.

4

Johansen, Bruce, and Adebowale Akande, eds. Nationalism: Past as Prologue. Nova Science Publishers, Inc., 2021. http://dx.doi.org/10.52305/aief3847.

Abstract:
Nationalism: Past as Prologue began as a single volume being compiled by Ad Akande, a scholar from South Africa, who proposed it to me as co-author about two years ago. The original idea was to examine how the damaging roots of nationalism have been corroding political systems around the world, and creating dangerous obstacles for necessary international cooperation. Since I (Bruce E. Johansen) have written profusely about climate change (global warming, a.k.a. infrared forcing), I suggested a concerted effort in that direction. This is a worldwide existential threat that affects every living thing on Earth. It often compounds upon itself, so delays in reducing emissions from fossil fuels are shortening the amount of time remaining to eliminate the use of fossil fuels to preserve a livable planet. Nationalism often impedes solutions to this problem (among many others), as nations place their singular needs above the common good. Our initial proposal got around, and abstracts on many subjects arrived. Within a few weeks, we had enough good material for a 100,000-word book. The book then fattened to two moderate volumes and then to four two very hefty tomes. We tried several different titles as good submissions swelled. We also discovered that our best contributors were experts in their fields, which ranged the world. We settled on three stand-alone books: 1/ nationalism and racial justice. Our first volume grew as the growth of Black Lives Matter, following the brutal police killing of George Floyd in Minneapolis, ignited protests over police brutality and other issues during 2020. It is estimated that more people took part in protests of police brutality during the summer of 2020 than in any other series of marches in United States history. This includes upheavals during the 1960s over racial issues and against the war in Southeast Asia (notably Vietnam).
We choose a volume on racism because it is one of nationalism’s main motive forces. This volume provides a worldwide array of work on nationalism’s growth in various countries, usually by authors residing in them, or in the United States with ethnic ties to the nation being examined, often recent immigrants to the United States from them. Our roster of contributors comprises a small United Nations of insightful, well-written research and commentary from Indonesia, New Zealand, Australia, China, India, South Africa, France, Portugal, Estonia, Hungary, Russia, Poland, Kazakhstan, Georgia, and the United States. Volume 2 (this one) describes and analyzes nationalism, by country, around the world, except for the United States; and 3/material directly related to President Donald Trump, and the United States. The first volume is under consideration at the Texas A & M University Press. The other two are under contract to Nova Science Publishers (which includes social sciences). These three volumes may be used individually or as a set. Environmental material is taken up in appropriate places in each of the three books. * * * * * What became the United States of America has been strongly nationalist since the English of present-day Massachusetts and Jamestown first hit North America’s eastern shores. The country propelled itself across North America with the self-serving ideology of “manifest destiny” for four centuries before Donald Trump came along. Anyone who believes that a Trumpian affection for deportation of “illegals” is a new thing ought to take a look at immigration and deportation statistics in Adam Goodman’s The Deportation Machine: America’s Long History of Deporting Immigrants (Princeton University Press, 2020). Between 1920 and 2018, the United States deported 56.3 million people, compared with 51.7 million who were granted legal immigration status during the same dates. Nearly nine of ten deportees were Mexican (Nolan, 2020, 83). 
This kind of nationalism has become an assassin of democracy as well as an impediment to solving global problems. Paul Krugman wrote in the New York Times that “In their 2018 book, How Democracies Die, the political scientists Steven Levitsky and Daniel Ziblatt documented how this process has played out in many countries, from Vladimir Putin’s Russia, to Recep Erdogan’s Turkey, to Viktor Orban’s Hungary. Add to these India’s Narendra Modi, China’s Xi Jinping, and the United States’ Donald Trump, among others. Bit by bit, the guardrails of democracy have been torn down, as institutions meant to serve the public became tools of ruling parties and self-serving ideologies, weaponized to punish and intimidate opposition parties’ opponents. On paper, these countries are still democracies; in practice, they have become one-party regimes….And it’s happening here [the United States] as we speak. If you are not worried about the future of American democracy, you aren’t paying attention” (Krugman, 2019, A-25). We are reminded continuously that the late Carl Sagan, one of our most insightful scientific public intellectuals, had an interesting theory about highly developed civilizations. Given the number of stars and planets that must exist in the vast reaches of the universe, he said, there must be other highly developed and organized forms of life. Distance may keep us from making physical contact, but Sagan said that another reason we may never be on speaking terms with another intelligent race may be (judging from our own example) their penchant for destroying themselves in relatively short order after reaching technological complexity. This book’s chapters, introduction, and conclusion examine the worldwide rise of partisan nationalism and the damage it has wrought on the worldwide pursuit of solutions for issues requiring worldwide scope, such as scientific co-operation, public health and others, mixing analysis of both.
We use both historical description and analysis. This analysis concludes with a description of why we must avoid the isolating nature of nationalism that isolates people and encourages separation if we are to deal with issues of world-wide concern, and to maintain a sustainable, survivable Earth, placing the dominant political movement of our time against the Earth’s existential crises. Our contributors, all experts in their fields, each have assumed responsibility for a country, or two if they are related. This work entwines themes of worldwide concern with the political growth of nationalism because leaders with such a worldview are disinclined to co-operate internationally at a time when nations must find ways to solve common problems, such as the climate crisis. Inability to cooperate at this stage may doom everyone, eventually, to an overheated, stormy future plagued by droughts and deluges portending shortages of food and other essential commodities, meanwhile destroying large coastal urban areas because of rising sea levels. Future historians may look back at our time and wonder why as well as how our world succumbed to isolating nationalism at a time when time was so short for cooperative intervention which is crucial for survival of a sustainable earth. Pride in language and culture is salubrious to individuals’ sense of history and identity. Excess nationalism that prevents international co-operation on harmful worldwide maladies is quite another. As Pope Francis has pointed out: For all of our connectivity due to expansion of social media, ability to communicate can breed contempt as well as mutual trust. “For all our hyper-connectivity,” said Francis, “We witnessed a fragmentation that made it more difficult to resolve problems that affect us all” (Horowitz, 2020, A-12). 
The pope’s encyclical, titled “Brothers All,” also said: “The forces of myopic, extremist, resentful, and aggressive nationalism are on the rise.” The pope’s document also advocates support for migrants, as well as resistance to nationalist and tribal populism. Francis broadened his critique to the role of market capitalism, which, as well as nationalism, has failed the peoples of the world when they need co-operation and solidarity in the face of the world-wide coronavirus pandemic. Humankind needs to unite into “a new sense of the human family [Fratelli Tutti, “Brothers All”], that rejects war at all costs” (Pope, 2020, 6-A). Our journey takes us first to Russia, with the able eye and honed expertise of Richard D. Anderson, Jr., who teaches at UCLA and publishes on the subject of his chapter: “Putin, Russian identity, and Russia’s conduct at home and abroad.” Readers should find Dr. Anderson’s analysis fascinating because Vladimir Putin, the singular leader of Russian foreign and domestic policy these days (and perhaps for the rest of his life, given how malleable Russia’s Constitution has become) may be a short man physically, but has high ambitions. One of these involves restoring the old Russian (and Soviet) empire, which would involve re-subjugating a number of nations that broke off as the old order dissolved about 30 years ago. President (shall we say czar?) Putin also has international ambitions, notably by destabilizing the United States, where election meddling has become a specialty. The sight of Putin and U.S. president Donald Trump, two very rich men (Putin $70-$200 billion; Trump $2.5 billion), nuzzling in friendship would probably set Thomas Jefferson and Vladimir Lenin spinning in their graves. The road of history can take some unanticipated twists and turns. Consider Poland, for which we have an expert native analysis in chapter 2 by Bartosz Hlebowicz, who is a Polish anthropologist and journalist.
His piece is titled “Lawless and Unjust: How to Quickly Make Your Own Country a Puppet State Run by a Group of Hoodlums – the Hopeless Case of Poland (2015–2020).” When I visited Poland to teach and lecture twice between 2006 and 2008, most people seemed to be walking on air induced by freedom to conduct their own affairs to an unusual degree for a state usually squeezed between nationalists in Germany and Russia. What did the Poles then do in a couple of decades? Read Hlebowicz’ chapter and decide. It certainly isn’t soft-bellied liberalism. In Chapter 3, with Bruce E. Johansen, we visit China’s western provinces, the lands of Tibet as well as the Uighurs and other Muslims in the Xinjiang region, who would most assuredly resent being characterized as being possessed by the Chinese of the Han to the east. As a student of Native American history, I had never before thought of the Tibetans and Uighurs as Native peoples struggling against the Independence-minded peoples of a land that is called an adjunct of China on most of our maps. The random act of sitting next to a young woman on an Air India flight out of Hyderabad, bound for New Delhi, taught me that the Tibetans had something to share with the Lakota, the Iroquois, and hundreds of other Native American states and nations in North America. Active resistance to Chinese rule lasted into the mid-nineteenth century, and continues today in a subversive manner, even in song, as I learned in 2018 when I acted as a foreign adjudicator on a Ph.D. dissertation by a Tibetan student at the University of Madras (in the city now called Chennai), in southeastern India, on resistance in song during Tibet’s recent history. Tibet is one of very few places on Earth where a young dissident can get shot to death for singing a song that troubles China’s Quest for Lebensraum. The situation is comparable in the Xinjiang region, where close to a million Muslims have been interned in “reeducation” camps surrounded with brick walls and barbed wire. 
They sing, too. Come with us and hear the music. Back to Europe now, in Chapter 4, to Portugal and Spain, where we find a break in the general pattern of nationalism. Portugal has been more progressive governmentally than most. Spain varies from a liberal majority to military coups, a pattern which has been exported to Latin America. A situation such as this can make use of the term “populism” problematic, because general usage in our time usually ties the word into a right-wing connotative straitjacket. “Populism” can be used to describe progressive (left-wing) insurgencies as well. José Pinto, who is native to Portugal and also researches and writes in Spanish as well as English, in “Populism in Portugal and Spain: a Real Neighbourhood?” provides insight into these historical paradoxes. Hungary shares some historical inclinations with Poland (above). Both emerged from Soviet dominance in an air of developing freedom and multicultural diversity after the Berlin Wall fell and the Soviet Union collapsed. Then, gradually at first, right-wing forces began to tighten up, stripping structures supporting popular freedom from the courts, mass media, and other institutions. In Chapter 5, Bernard Tamas, in “From Youth Movement to Right-Liberal Wing Authoritarianism: The Rise of Fidesz and the Decline of Hungarian Democracy,” puts the renewed growth of political and social repression into a context of worldwide nationalism. Tamas, an associate professor of political science at Valdosta State University, has been a postdoctoral fellow at Harvard University and a Fulbright scholar at the Central European University in Budapest, Hungary. His books include From Dissident to Party Politics: The Struggle for Democracy in Post-Communist Hungary (2007). Bear in mind that not everyone shares Orbán’s vision of what will make this nation great, again. 
On graffiti-covered walls in Budapest, Runes (traditional Hungarian script) have been found that read “Orbán is a motherfucker” (Mikanowski, 2019, 58). Also in Europe, in Chapter 6, Professor Ronan Le Coadic, of the University of Rennes, Rennes, France, asks “Is There a Revival of French Nationalism?” Stating this title in the form of a question is quite appropriate because France’s nationalistic shift has built and ebbed several times during the last few decades. For a time after 2000, it came close to assuming the role of a substantial minority, only to ebb after that. In 2017, the candidate of the National Front reached the second round of the French presidential election. This was the second time this nationalist party reached the second round of the presidential election in the history of the Fifth Republic. In 2002, however, Jean-Marie Le Pen had only obtained 17.79% of the votes, while fifteen years later his daughter, Marine Le Pen, almost doubled her father’s record, reaching 33.90% of the votes cast. Moreover, in the 2019 European elections, the renamed Rassemblement National obtained the largest number of votes of all French political formations and can therefore boast of being “the leading party in France.” The brutality of oppressive nationalism may be expressed in personal relationships, such as child abuse. Indonesia and Aotearoa [the Maoris’ name for New Zealand] hold very different ranks in the United Nations Human Development Programme assessments: Indonesia is classified as a medium development country and Aotearoa New Zealand as a very high development country. In Chapter 7, “Domestic Violence Against Women in Indonesia and Aotearoa New Zealand: Making Sense of Differences and Similarities,” co-authors Mandy Morgan and Dr. Elli N. 
Hayati, from New Zealand and Indonesia respectively, found that despite their socio-economic differences, one in three women in each country experiences physical or sexual intimate partner violence over her lifetime. In this chapter the authors aim to deepen understandings of domestic violence through discussion of the socio-economic and demographic characteristics of their countries to address domestic violence alongside studies of women’s attitudes to gender norms and experiences of intimate partner violence. One of the most surprising and upsetting scholarly journeys that a North American student may take involves Adolf Hitler’s comments on oppression of American Indians and Blacks as he imagined the construction of the Nazi state, a genesis of nationalism that is all but unknown in the United States of America, traced in this volume (Chapter 8) by co-editor Johansen. Beginning in Mein Kampf, during the 1920s, Hitler explicitly used the westward expansion of the United States across North America as a model and justification for Nazi conquest and anticipated colonization by Germans of what the Nazis called the “wild East” – the Slavic nations of Poland, the Baltic states, Ukraine, and Russia, most of which were under control of the Soviet Union. The Volga River (in Russia) was styled by Hitler as the Germans’ Mississippi, and covered wagons were readied for the German “manifest destiny” of imprisoning, eradicating, and replacing peoples the Nazis deemed inferior, all with direct references to events in North America during the previous century. At the same time, with no sense of contradiction, the Nazis partook of a long-standing German romanticism of Native Americans. One of Goebbels’ less propitious schemes was to confer honorary Aryan status on Native American tribes, in the hope that they would rise up against their oppressors. U.S. 
racial attitudes were “evidence [to the Nazis] that America was evolving in the right direction, despite its specious rhetoric about equality.” Ming Xie, originally from Beijing, in the People’s Republic of China, in Chapter 9, “News Coverage and Public Perceptions of the Social Credit System in China,” writes that the State Council of China in 2014 announced “that a nationwide social credit system would be established” in China. “Under this system, individuals, private companies, social organizations, and governmental agencies are assigned a score which will be calculated based on their trustworthiness and daily actions such as transaction history, professional conduct, obedience to law, corruption, tax evasion, and academic plagiarism.” The “nationalism” in this case is that of the state over the individual. China has 1.4 billion people; this system takes their measure for the purpose of state control. Once fully operational, control will be more subtle. Monitoring through modern technology (most often smart phones) will prompt many of the people subject to it to self-censor. Orwell, modernized, might write: “Your smart phone is watching you.” Ming Xie holds two Ph.D.s, one in Public Administration from the University of Nebraska at Omaha and another in Cultural Anthropology from the Chinese Academy of Social Sciences, Beijing, where she also worked for more than 10 years at a national think tank in the same institution. While there she summarized news from non-Chinese sources for senior members of the Chinese Communist Party. Ming is presently an assistant professor at the Department of Political Science and Criminal Justice, West Texas A&M University. 
In Chapter 10, analyzing native peoples and nationhood, Barbara Alice Mann, Professor of Honours at the University of Toledo, in “Divide, et Impera: The Self-Genocide Game,” details ways in which European-American invaders deprive the conquered of their sense of nationhood as part of a subjugation system that amounts to genocide, rubbing out their languages and cultures, and ultimately forcing the native peoples to assimilate on their own, for survival in a culture that is foreign to them. Mann is one of Native American Studies’ most acute critics of conquests’ contradictions, and an author who retrieves Native history with a powerful sense of voice and purpose. She has authored roughly a dozen books and numerous book chapters, among many other works, and has traveled around the world lecturing and publishing on many subjects. Nalanda Roy and S. Mae Pedron in Chapter 11, “Understanding the Face of Humanity: The Rohingya Genocide,” describe one of the largest forced migrations in the history of the human race, the removal of 700,000 to 800,000 Muslims from Buddhist Myanmar to Bangladesh, which itself is already one of the most crowded and impoverished nations on Earth. Bangladesh has about 150 million people packed into an area the size of Nebraska and Iowa (whose population is less than a tenth that of Bangladesh), and it is losing land steadily to rising sea levels and erosion of the Ganges river delta. The Rohingyas’ refugee camp has been squeezed onto a gigantic, eroding, muddy slope that contains nearly no vegetation. However, Bangladesh is majority Muslim, so while the Rohingya may starve, they won’t be shot to death by marauding armies. 
Both authors of this exquisite (and excruciating) account teach at Georgia Southern University in Savannah, Georgia, Roy as an associate professor of International Studies and Asian politics, and Pedron as a graduate student; Roy originally hails from very eastern India, close to both Myanmar and Bangladesh, so he has special insight into the context of one of the most brutal genocides of our time, or any other. This is our case describing the problems that nationalism has and will pose for the sustainability of the Earth as our little blue-and-green orb becomes more crowded over time. The old ways, in which national arguments often end in devastating wars, are obsolete, given that the Earth and all the people, plants, and other animals that it sustains are faced with the existential threat of a climate crisis that within two centuries, more or less, will flood large parts of coastal cities, and endanger many species of plants and animals. To survive, we must listen to the Earth, and observe her travails, because they are increasingly our own.

Book chapters on the topic "Retrieved document sets"

1

Repke, Tim, and Ralf Krestel. "Extraction and Representation of Financial Entities from Text." In Data Science for Economics and Finance, 241–63. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-66891-4_11.

Abstract:
In our modern society, almost all events, processes, and decisions in a corporation are documented by internal written communication, legal filings, or business and financial news. The valuable knowledge in such collections is not directly accessible by computers as they mostly consist of unstructured text. This chapter provides an overview of corpora commonly used in research and highlights related work and state-of-the-art approaches to extract and represent financial entities and relations. The second part of this chapter considers applications based on knowledge graphs of automatically extracted facts. Traditional information retrieval systems typically require the user to have prior knowledge of the data. Suitable visualization techniques can overcome this requirement and enable users to explore large sets of documents. Furthermore, data mining techniques can be used to enrich or filter knowledge graphs. This information can augment source documents and guide exploration processes. Systems for document exploration are tailored to specific tasks, such as investigative work in audits or legal discovery, monitoring compliance, or providing information in a retrieval system to support decisions.
2

Loukachevitch, Natalia, and Boris Dobrov. "RuThes Thesaurus for Natural Language Processing." In The Palgrave Handbook of Digital Russia Studies, 319–34. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-42855-6_18.

Abstract:
This chapter describes the Russian RuThes thesaurus created as a linguistic and terminological resource for automatic document processing. Its structure utilizes two popular paradigms for computer thesauri: concept-based units, a small set of relation types, and rules for including multiword expressions, as in information retrieval thesauri; and language-motivated units, detailed sets of synonyms, and descriptions of ambiguous words, as in WordNet-like thesauri. The development of the RuThes thesaurus has been supported for many years: new concepts, new senses, and multiword expressions found in contemporary texts are introduced regularly. The chapter shows some examples of representing newly appeared concepts related to important internal and international events.
3

Moro, Gianluca, Lorenzo Valgimigli, Alex Rossi, Cristiano Casadei, and Andrea Montefiori. "Self-supervised Information Retrieval Trained from Self-generated Sets of Queries and Relevant Documents." In Similarity Search and Applications, 283–90. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-17849-8_23.

4

Ruecker, Stan. "Rich-Prospect Browsing Interfaces." In Encyclopedia of Multimedia Technology and Networking, Second Edition, 1240–48. IGI Global, 2009. http://dx.doi.org/10.4018/978-1-60566-014-1.ch168.

Abstract:
Everyone who has browsed the Internet is familiar with the problems involved in finding what they want. From the novice to the most sophisticated user, the challenge is the same: how to identify quickly and reliably the precise Web sites or other documents they seek from within an ever-growing collection of several billion possibilities? This is not a new problem. Vannevar Bush, the successful Director of the Office of Scientific Research and Development, which included the Manhattan project, made a famous public call in The Atlantic Monthly in 1945 for the scientific community in peacetime to continue pursuing the style of fruitful collaboration they had experienced during the war (Bush, 1945). Bush advocated this approach to address the central difficulty posed by the proliferation of information beyond what could be managed by any single expert using contemporary methods of document management and retrieval. Bush’s vision is often cited as one of the early visions of the World Wide Web, with professional navigators trailblazing paths through the literature and leaving sets of linked documents behind them for others to follow. Sixty years later, we have the professional indexers behind Google, providing the rest of us with a magic window into the data. We can type a keyword or two, pause for reflection, then hit the “I’m feeling lucky” button and see what happens. Technically, even though it often runs in a browser, this task is “information retrieval.” One of its fundamental tenets is that the user cannot manage the data and needs to be guided and protected through the maze by a variety of information hierarchies, taxonomies, indexes, and keywords. Information retrieval is a complex research domain. The Association for Computing Machinery, arguably the largest professional organization for academic computing scientists, sponsors a periodic contest in information retrieval, where teams compete to see who has the most effective algorithms. 
The contest organizers choose or create a document collection, such as a set of a hundred thousand newspaper articles in English, and contestants demonstrate their software’s ability to find the most documents most accurately. Two of the measures are precision and recall: both of these are ratios, and they pull in opposite directions. Precision is the ratio of the number of documents that have been correctly identified out of the number of documents returned by the search. Recall is the ratio of the number of documents that have been retrieved out of the total number in the collection that should have been retrieved. It is therefore possible to get 100% on precision—just retrieve one document precisely on topic. However, the corresponding recall score would be a disaster. Similarly, an algorithm can score 100% on recall just by retrieving all the documents in the collection. Again, the related precision score would be abysmal. Fortunately, information retrieval is not the only technology available. For collections that only contain thousands of entries, there is no reason why people should not be allowed to simply browse the entire contents, rather than being limited to carrying out searches. Certainly, retrieval can be part of browsing—the two technologies are not mutually exclusive. However, by embedding retrieval within browsing the user gains a significant number of perceptual advantages and new opportunities for actions.
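The trade-off between the two measures described in this abstract can be made concrete with a small sketch. The function name `precision_recall` and the ten-document toy collection are illustrative assumptions, not taken from the cited chapter; the code only restates the two ratios as defined above.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / documents retrieved
    recall    = relevant documents retrieved / relevant documents in the collection
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection of ten documents, four of which are on topic.
collection = [f"doc{i}" for i in range(1, 11)]
relevant = {"doc1", "doc2", "doc3", "doc4"}

# Retrieve a single on-topic document: perfect precision, poor recall.
one_hit = precision_recall(["doc1"], relevant)

# Retrieve the whole collection: perfect recall, poor precision.
everything = precision_recall(collection, relevant)

print(one_hit)      # (1.0, 0.25)
print(everything)   # (0.4, 1.0)
```

The two extreme strategies mirror the ones named in the abstract: returning one precisely relevant document maximizes precision, and returning everything maximizes recall.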
5

C. N., Subalalitha, and Balaji J. "Automatic Query-Focused Summary Generation System for Tourism Discourse Using Rhetorical Structure Theory." In Innovative Perspectives on Tourism Discourse, 201–12. IGI Global, 2018. http://dx.doi.org/10.4018/978-1-5225-2930-9.ch012.

Abstract:
Summary generation systems, when integrated with Information Retrieval (IR), can give the user an idea about the retrieved web pages before the user even opens them. The summary could be generated for a single web page or for a set of web pages retrieved for a given query. When such a system is built for tourism web sites, the user can get a summary of a particular tourist spot or of the tourist spots present in a particular place. This chapter describes such a summary generation system, which is built using Rhetorical Structure Theory (RST). RST is a well-known discourse theory which is used for discourse analysis of text documents. The RST makes use of another semantic representation, namely Universal Networking Language (UNL), to find the coherent text fragments. These coherent text fragments are indexed and linked with an IR system. When a user gives a query, the web pages are retrieved along with single-document and multi-document summaries.
6

Fang, Jianing. "Retrieving HTML and XBRL Data With Microsoft Excel 2010." In Maximizing Social Science Research Through Publicly Accessible Data Sets, 62–82. IGI Global, 2018. http://dx.doi.org/10.4018/978-1-5225-3616-1.ch004.

Abstract:
The Securities and Exchange Commission (SEC) has upgraded the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system, to the Interactive Data Electronic Applications (IDEA) platform, or the Next-Generation EDGAR (New EDGAR). The SEC issued its final mandate for XBRL adoption and the conversion target dates for all firms in January 2009. With this conversion, users can retrieve the financial statement information of listed companies at both the document level and data element level. This chapter reviews the fundamental concepts of XBRL and reports on the current compliance status of the SEC XBRL conversion mandate. The main task is to demonstrate how to retrieve data from the New EDGAR and how to process the data with Microsoft Excel 2010.
7

Dami, Asmae, Mohamed Fakir, and Belaid Bouikhalene. "Information Retrieval (IR) and Extracting Associative Rules." In Business Intelligence, 713–32. IGI Global, 2016. http://dx.doi.org/10.4018/978-1-4666-9562-7.ch037.

Abstract:
This chapter is located at the intersection of two research themes, namely Information Retrieval and Knowledge Discovery from texts (text mining). The purpose of this paper is two-fold. First, it focuses on Information Retrieval (IR), whose purpose is to implement a set of models and systems for selecting a set of documents satisfying user needs in terms of information expressed as a query. An information retrieval system is composed mainly of two processes: the representation process and the retrieval process. The process of representation is called indexing, which allows representation of documents and queries by descriptors, or indexes. These descriptors reflect the contents of documents. The retrieval process consists of the comparison between document representations and the query representation. The second aim of this paper is to discover the relationships between terms (keywords) that are descriptors of documents in a document database. The correlations (relationships) between terms are extracted by using a text mining technique, mainly association rules.
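As a rough illustration of the two processes named in this abstract (indexing, then retrieval by comparing query and document descriptors), plus a term-to-term association rule, here is a minimal sketch. The toy documents, the whitespace tokenization, the overlap-count ranking, and the helper names `build_index`, `retrieve`, and `rule_confidence` are all assumptions made for illustration; the cited chapter's actual models are not described in the abstract.

```python
from collections import defaultdict

# Hypothetical three-document collection.
docs = {
    "d1": "fuzzy query json document",
    "d2": "json document store",
    "d3": "fuzzy set query",
}


def build_index(docs):
    """Indexing: map each descriptor (term) to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


def retrieve(index, query):
    """Retrieval: rank documents by how many query terms they share."""
    scores = defaultdict(int)
    for term in query.split():
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest overlap first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


def rule_confidence(index, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent over the collection:
    the share of documents containing the antecedent that also contain the consequent."""
    with_a = index.get(antecedent, set())
    if not with_a:
        return 0.0
    return len(with_a & index.get(consequent, set())) / len(with_a)


index = build_index(docs)
ranked = retrieve(index, "fuzzy query")
print(ranked)                                      # ['d1', 'd3']
print(rule_confidence(index, "json", "document"))  # 1.0
print(rule_confidence(index, "fuzzy", "json"))     # 0.5
```

A real system would normally add stop-word removal, stemming, and a weighted similarity measure; the overlap count is used here only to keep the comparison step visible.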
8

Chawla, Suruchi. "Web Page Recommender System using hybrid of Genetic Algorithm and Trust for Personalized Web Search." In Research Anthology on Multi-Industry Uses of Genetic Programming and Algorithms, 656–75. IGI Global, 2021. http://dx.doi.org/10.4018/978-1-7998-8048-6.ch034.

Abstract:
The main challenge to effective information retrieval is to optimize the page ranking in order to retrieve relevant documents for user queries. In this article, a method is proposed which uses a hybrid of genetic algorithms (GA) and trust to generate the optimal ranking of trusted clicked URLs for web page recommendations. The trusted web pages are selected based on clustered query sessions for GA-based optimal ranking, in order to move more relevant documents up in the ranking and improve the precision of search results. Thus, the optimal ranking of trusted clicked URLs recommends relevant documents to web users for their search goal and satisfies the information need of the user effectively. The experiment was conducted on a data set captured in three domains (academics, entertainment and sports) to evaluate the performance of GA-based optimal ranking (with and without trust), and the results confirm the improvement in the precision of search results.
9

Khalloufi, Rida, Rachid El Ayachi, Mohamed Biniz, Mohamed Fakir, and Muhammad Sarfraz. "An Approach of Documents Indexing Using Summarization." In Advances in Library and Information Science, 78–86. IGI Global, 2020. http://dx.doi.org/10.4018/978-1-7998-1021-6.ch005.

Abstract:
Document indexing is an active domain that interests many researchers. Generally, it is used in information retrieval systems. Document indexing encompasses a set of approaches that can be applied to index a document using a corpus. This treatment has several advantages, such as accelerating the search process, finding the pertinent content related to a query, and reducing storage space. The use of the entire document in the indexing process affects several parameters, such as indexing time, search time, and storage space. The focus of this chapter is to improve all of the parameters cited above by proposing a new indexing approach. The goal of the proposed approach is to use summarization to minimize the size of documents without affecting their meaning.
10

Lam, S. S., and Samuel P. M. Choi. "Multidimensional Ontology-Based Information Retrieval for Academic Counseling." In Information Retrieval and Management, 1726–44. IGI Global, 2018. http://dx.doi.org/10.4018/978-1-5225-5191-1.ch078.

Abstract:
Conventional information retrieval can only locate documents containing user-specified keywords. Integrating domain ontology with information retrieval extends the keyword-based search to semantic search and thus potentially improves the precision and recall of the document retrieval. In this paper, a set of new multidimensional ontology-based information retrieval algorithms is proposed for searching both specific and related terms. In particular, the relevant data properties of an instance, the relevant concepts, the relevant related concepts, and the related instances of a given user query can be identified from the domain ontology via the multidimensional search. Using the proposed algorithms, an intelligent counselling system which provides 24x7 online academic counselling services is developed. Through an interactive user interface and domain ontology, the system helps students find desired information by reviewing and refining their query. The article also outlines how to enable ontology-based searching for a conventional website.

Conference papers on the topic "Retrieved document sets"

1

Aldous, Kenneth J., and Andrew B. Lintott. "A Web Platform for the Exchange and Transformation of Business Objects." In ASME 2003 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. ASMEDC, 2003. http://dx.doi.org/10.1115/detc2003/cie-48266.

Abstract:
Internet technologies provide a means of implementing data exchange systems incrementally, from connecting two applications on one machine to implementing an Internet-wide information system. Recent advances in XML and related technologies greatly facilitate data extraction and reformatting, thus enabling data from one or more applications to be prepared for use by another. A model, based on these technologies, is described for the exchange of business documents among disparate computer applications used to manage manufacturing operations. The model comprises two parts, the first of which includes a set of code modules, written in Java, for handling requests to retrieve information from one or more Web resources, composing the retrieved data into an XML document and delivering the result. The second part comprises a set of what are termed XSL meta-stylesheets for transforming the retrieved documents into the various forms required by recipient computer applications. Meta-stylesheets are used to transform simple declarative XML documents into more complex stylesheets that add new material to received documents, invert the structure of XML documents and make changes to selected text within the documents.
2

Rooney, Phelim, Christoffer Nilsen-Aas, Haavard Skjerve, Kristin Nedrelid, and Nils Gunnar Viko. "Offshore Monitoring of Snorre A TLP TTR Jumpers." In ASME 2015 34th International Conference on Ocean, Offshore and Arctic Engineering. American Society of Mechanical Engineers, 2015. http://dx.doi.org/10.1115/omae2015-42126.

Abstract:
Four sensors were installed on the Snorre A TLP (Tension Leg Platform) on 16th April 2014 and retrieved on 10th May 2014, to document motions of the vessel, the top tensioned riser (TTR) and the flexible jumper connecting the TTR with the topside piping. The data recorded represents 3828 data sets. The associated significant wave height and peak period are synchronous data extracted from the Miros wave measurement radar and stored in the environmental data base. The SmartMotion riser sensors are certified for service in the Wellbay. The sensors are modelled into the OrcaFlex (1) “calibration” analysis model in order to simulate the motion responses in the same format as recorded offshore (accelerations and rates of rotation), and to carry out verification of the OrcaFlex model by comparing both raw data and filtered/integrated derivatives. This work provides a basis for life extension of the Jumpers and provides valuable feedback to design and analysis of TLP and Spar Jumpers between TTRs and topside Headers.
3

Miers, Glenn, Marek Czernuszenko, and Brian Hughes. "Improved Information Retrieval From Well Related Documents Using Supervised Learning." In SPE Annual Technical Conference and Exhibition. SPE, 2022. http://dx.doi.org/10.2118/210146-ms.

Abstract:
We introduce a system for rapid retrieval of relevant well-related information from a corpus of over 20 million documents. This allows exploration workers to retrieve important business data more quickly. Tracking down all of the information required to make complex business decisions is a time-consuming and error-prone process. This poses a direct risk of expensive miscalculations and missed opportunities. A first version of this system is currently undergoing tests with select users. As the work here represents the first version of the system, it is expected that improvements will be made. This is a system that can be used at enterprise scale to enable searches to more easily yield usable information to workers. This system uses a supervised learning model to identify well-related documents from several categories. Examples of these categories include (but are not limited to) formation evaluation and well completion reports. A machine learning model was trained to classify documents according to input from a well document expert. This input came in the form of a set of labeled documents compiled by said expert. This model was then applied to over 20 million documents that are deemed relevant to the exploration process. The inferred classifications for each document were stored in a search engine in order to facilitate retrieval of documents by each of the labels from above. The benefits of this system are twofold. First, it reduces the number of documents that come back for a given search of a large corpus of documents. Second, it allows users without technical experience in well-related work to more easily find documents.
4

Zhao, Xueliang, Chongyang Tao, Wei Wu, Can Xu, Dongyan Zhao, and Rui Yan. "A Document-grounded Matching Network for Response Selection in Retrieval-based Chatbots." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/756.

Abstract:
We present a document-grounded matching network (DGMN) for response selection that can power a knowledge-aware retrieval-based chatbot system. The challenges of building such a model lie in how to ground conversation contexts with background documents and how to recognize important information in the documents for matching. To overcome the challenges, DGMN fuses information in a document and a context into representations of each other, and dynamically determines whether grounding is necessary and the importance of different parts of the document and the context through hierarchical interaction with a response at the matching step. Empirical studies on two public data sets indicate that DGMN can significantly improve upon state-of-the-art methods and at the same time enjoys good interpretability.
5

Braik, Fahed Mubarak, Abdulla Sulaiman Al Shehhi, Luigi Saputelli, Carlos Mata, Dorzhi Badmaev, Salman Khan, and Fariz Rahman. "Automated Subsurface Knowledge ASK Thamama Retrieval Engine Driven by Conversational Text Analytics and NLP - Lessons Learned in Managing Large Volume of Documents in Abu Dhabi Assets." In SPE Annual Technical Conference and Exhibition. SPE, 2021. http://dx.doi.org/10.2118/206372-ms.

Abstract:
The purpose of this paper is to communicate the experiences in the development of an innovative concept named "ASK Thamama", an automated data and information retrieval engine driven by artificial intelligence techniques including text analytics and natural language processing. ASK is an AI-enabled conversational search engine used to retrieve information from various internal data repositories using natural language queries. The text processing and conversational engine concept is built upon available open-source software, requiring minimal coding of new libraries. A data set of 1000 documents was used to validate key functionalities, with an accuracy of 90% on search queries and specific answers provided for 80% of queries framed as questions. The results are encouraging and demonstrate the value that AI-enabled methodologies can bring to natural language search by enabling automated workflows for data and information retrieval. The developed AI methodology has tremendous potential for integration into an end-to-end knowledge management workflow that turns available document repositories into valuable insights with little to no human intervention.
6

Kovačič, Ivan, David Bajs, and Milan Ojsteršek. "Methodology for the Assessment of the Text Similarity of Documents in the CORE Open Access Data Set of Scholarly Documents." In 7th Student Computer Science Research Conference. University of Maribor Press, 2021. http://dx.doi.org/10.18690/978-961-286-516-0.12.

Abstract:
This paper describes the methodology of data preparation and analysis of the text similarity required for plagiarism detection on the CORE data set. Firstly, we used the CrossREF API and Microsoft Academic Graph data set for metadata enrichment and elimination of duplicates of documents from the CORE 2018 data set. In the second step, we used 4-gram sequences of words from every document and transformed them into SHA-256 hash values. Features retrieved using the hashing algorithm are compared, and the result is a list of documents and the percentages of coverage between pairs of document features. In the third step, called pairwise feature-based exhaustive analysis, pairs of documents are checked using the longest common substring.
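The second step described above (word 4-grams hashed with SHA-256, then compared for coverage) can be illustrated with a small sketch. It is an assumption-laden toy, not the authors' pipeline: whitespace tokenization, lowercasing, and the definition of coverage as the share of one document's hashed 4-grams found in the other are all choices made here for illustration, and the function names `ngram_hashes` and `coverage_percent` are hypothetical.

```python
import hashlib


def ngram_hashes(text: str, n: int = 4) -> set[str]:
    """Hash every word n-gram of the text with SHA-256."""
    words = text.lower().split()  # assumed tokenization: whitespace split
    grams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return {hashlib.sha256(g.encode("utf-8")).hexdigest() for g in grams}


def coverage_percent(doc_a: str, doc_b: str, n: int = 4) -> float:
    """Percentage of doc_a's hashed n-grams that also occur in doc_b."""
    hashes_a = ngram_hashes(doc_a, n)
    hashes_b = ngram_hashes(doc_b, n)
    if not hashes_a:
        return 0.0  # document too short to yield any n-gram
    return 100.0 * len(hashes_a & hashes_b) / len(hashes_a)


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox jumps over a sleeping cat"
print(coverage_percent(doc_a, doc_b))
```

Comparing fixed-length hashes instead of raw word sequences keeps the feature sets compact; pairs with high coverage would then be candidates for the more expensive longest-common-substring check mentioned in the abstract.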
7

Deolalikar, Vinay. "Topological models of document-query sets in retrieval for Enterprise Information Management." In 2014 IEEE International Conference on Big Data (Big Data). IEEE, 2014. http://dx.doi.org/10.1109/bigdata.2014.7004426.

8

KhudaBukhsh, Ashiqur R., Shriphani Palakodety, and Jaime G. Carbonell. "Harnessing Code Switching to Transcend the Linguistic Barrier." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/602.

Abstract:
Code mixing (or code switching) is a common phenomenon observed in social-media content generated by a linguistically diverse user-base. Studies show that in the Indian sub-continent, a substantial fraction of social media posts exhibit code switching. While the difficulties posed by code mixed documents to further downstream analyses are well-understood, lending visibility to code mixed documents under certain scenarios may have utility that has been previously overlooked. For instance, a document written in a mixture of multiple languages can be partially accessible to a wider audience; this could be particularly useful if a considerable fraction of the audience lacks fluency in one of the component languages. In this paper, we provide a systematic approach to sample code mixed documents leveraging a polyglot embedding based method that requires minimal supervision. In the context of the 2019 India-Pakistan conflict triggered by the Pulwama terror attack, we demonstrate an untapped potential of harnessing code mixing for human well-being: starting from an existing hostility diffusing hope speech classifier solely trained on English documents, code mixed documents are utilized to perform cross-lingual sampling and retrieve hope speech content written in a low-resource but widely used language - Romanized Hindi. Our proposed pipeline requires minimal supervision and holds promise in substantially reducing web moderation efforts. A further exploratory study on a new COVID-19 data set introduced in this paper demonstrates the generalizability of our cross-lingual sampling technique.
9

Rockwell, Justin A., Paul Witherell, Rui Fernandes, Ian Grosse, Sundar Krishnamurty, and Jack Wileden. "A Web-Based Environment for Documentation and Sharing of Engineering Design Knowledge." In ASME 2008 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. ASMEDC, 2008. http://dx.doi.org/10.1115/detc2008-50086.

Abstract:
This paper presents the foundation for a collaborative Web-based environment for improving communication by formally defining a platform for documentation and sharing of engineering design knowledge throughout the entire design process. In this work an ontological structure is utilized to concisely define a set of individual engineering concepts. This set of modular ontologies links together to create a flexible, yet consistent, product development knowledge-base. The resulting infrastructure uniquely enables the information stored within the knowledge-base to be readily inspectable and computable, thus allowing for design tools that reason on the information to assist designers and automate design processes. A case study of the structural optimization of a transfer plate for an aerospace circuit breaker is presented to demonstrate the implementation and usefulness of the knowledge framework. The results indicate that the ontological knowledge-base can be used to prompt engineers to document important product development information, increase understanding of the design process, provide a means to intuitively retrieve information, and seamlessly access distributed information.
10

Denli, Huseyin, Hassan A. Chughtai, Brian Hughes, Robert Gistri, and Peng Xu. "Geoscience Language Processing for Exploration." In Abu Dhabi International Petroleum Exhibition & Conference. SPE, 2021. http://dx.doi.org/10.2118/207766-ms.

Abstract:
Deep learning has recently been providing step-change capabilities, particularly using transformer models, for natural language processing applications such as question answering, query-based summarization, and language translation in general-purpose contexts. We have developed a geoscience-specific language processing solution using such models to enable geoscientists to perform rapid, fully quantitative and automated analysis of large corpora of data and gain insights. One of the key transformer-based models is BERT (Bidirectional Encoder Representations from Transformers). It is trained with a large amount of general-purpose text (e.g., Common Crawl). Use of such a model for geoscience applications can face a number of challenges. One is the insignificant presence of geoscience-specific vocabulary in general-purpose contexts (e.g., daily language), and the other is geoscience jargon (domain-specific meanings of words). For example, salt is more likely to be associated with table salt in daily language, but it is used as a subsurface entity within the geosciences. To alleviate such challenges, we retrained a pre-trained BERT model with our 20M internal geoscientific records. We will refer to the retrained model as GeoBERT. We fine-tuned the GeoBERT model for a number of tasks including geoscience question answering and query-based summarization. BERT models are very large in size. For example, BERT-Large has 340M trained parameters. Geoscience language processing with these models, including GeoBERT, could result in substantial latency when the entire database is processed at every call of the model. To address this challenge, we developed a retriever-reader engine consisting of an embedding-based similarity search as a context retrieval step, which helps the solution narrow the context for a given query before processing the context with GeoBERT. We built a solution integrating context-retrieval and GeoBERT models. Benchmarks show that it is effective in helping geologists identify answers and context for given questions. The prototype will also produce a summary at different granularities for a given set of documents. We have also demonstrated that domain-specific GeoBERT outperforms general-purpose BERT for geoscience applications.
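The retrieval half of a retriever-reader engine, narrowing the context by similarity search before a large reader model sees it, might look roughly like the sketch below. Everything here is an assumption for illustration: a bag-of-words count vector stands in for a learned embedding, cosine similarity is used as the similarity measure, the reader (GeoBERT in the abstract) is not modeled at all, and the names `embed`, `cosine`, and `retrieve` are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: word-count vector."""
    return Counter(text.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query.

    Only these passages would be handed to the (expensive) reader
    model, instead of the whole database.
    """
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


passages = [
    "salt domes trap hydrocarbons in the subsurface",
    "table salt is used in cooking",
    "the well was completed in 2019",
]
print(retrieve("salt subsurface", passages, k=1))
```

In practice the passage vectors would be precomputed and stored in a vector index rather than re-embedded on every query; the per-call latency concern raised in the abstract is what motivates that split.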