A selection of scholarly literature on the topic "Corpus de tweets"


Browse the lists of current articles, books, dissertations, conference abstracts, and other scholarly sources on the topic "Corpus de tweets".


Journal articles on the topic "Corpus de tweets":

1

Mitra, Tanushree, and Eric Gilbert. "CREDBANK: A Large-Scale Social Media Corpus With Associated Credibility Annotations." Proceedings of the International AAAI Conference on Web and Social Media 9, no. 1 (August 3, 2021): 258–67. http://dx.doi.org/10.1609/icwsm.v9i1.14625.

Abstract:
Social media has quickly risen to prominence as a news source, yet lingering doubts remain about its ability to spread rumor and misinformation. Systematically studying this phenomenon, however, has been difficult due to the need to collect large-scale, unbiased data along with in-situ judgements of its accuracy. In this paper we present CREDBANK, a corpus designed to bridge this gap by systematically combining machine and human computation. Specifically, CREDBANK is a corpus of tweets, topics, events and associated human credibility judgements. It is based on the real-time tracking of more than 1 billion streaming tweets over a period of more than three months, computational summarizations of those tweets, and intelligent routings of the tweet streams to human annotators — within a few hours of those events unfolding on Twitter. In total CREDBANK comprises more than 60 million tweets grouped into 1049 real-world events, each annotated by 30 human annotators. As an example, with CREDBANK one can quickly calculate that roughly 24% of the events in the global tweet stream are not perceived as credible. We have made CREDBANK publicly available, and hope it will enable new research questions related to online information credibility in fields such as social science, data mining and health.
2

Chen, Lu, Wenbo Wang, Meenakshi Nagarajan, Shaojun Wang, and Amit Sheth. "Extracting Diverse Sentiment Expressions with Target-Dependent Polarity from Twitter." Proceedings of the International AAAI Conference on Web and Social Media 6, no. 1 (August 3, 2021): 50–57. http://dx.doi.org/10.1609/icwsm.v6i1.14252.

Abstract:
The problem of automatic extraction of sentiment expressions from informal text, as in microblogs such as tweets, is a recent area of investigation. Compared to formal text, such as in product reviews or news articles, one of the key challenges lies in the wide diversity and informal nature of sentiment expressions that cannot be trivially enumerated or captured using predefined lexical patterns. In this work, we present an optimization-based approach to automatically extract sentiment expressions for a given target (e.g., movie, or person) from a corpus of unlabeled tweets. Specifically, we make three contributions: (i) we recognize a diverse and richer set of sentiment-bearing expressions in tweets, including formal and slang words/phrases, not limited to pre-specified syntactic patterns; (ii) instead of associating sentiment with an entire tweet, we assess the target-dependent polarity of each sentiment expression, since the polarity of a sentiment expression is determined by the nature of its target; (iii) we provide a novel formulation of assigning polarity to a sentiment expression as a constrained optimization problem over the tweet corpus. Experiments conducted on two domains, tweets mentioning movie and person entities, show that our approach improves accuracy in comparison with several baseline methods, and that the improvement becomes more prominent with increasing corpus sizes.
3

Yang, Yuan-Chi, Mohammed Ali Al-Garadi, Whitney Bremer, Jane M. Zhu, David Grande, and Abeed Sarker. "Developing an Automatic System for Classifying Chatter About Health Services on Twitter: Case Study for Medicaid." Journal of Medical Internet Research 23, no. 5 (May 3, 2021): e26616. http://dx.doi.org/10.2196/26616.

Abstract:
Background: The wide adoption of social media in daily life renders it a rich and effective resource for conducting near real-time assessments of consumers’ perceptions of health services. However, its use in these assessments can be challenging because of the vast amount of data and the diversity of content in social media chatter. Objective: This study aims to develop and evaluate an automatic system involving natural language processing and machine learning to automatically characterize user-posted Twitter data about health services using Medicaid, the single largest source of health coverage in the United States, as an example. Methods: We collected data from Twitter in two ways: via the public streaming application programming interface using Medicaid-related keywords (Corpus 1) and by using the website’s search option for tweets mentioning agency-specific handles (Corpus 2). We manually labeled a sample of tweets in 5 predetermined categories or other and artificially increased the number of training posts from specific low-frequency categories. Using the manually labeled data, we trained and evaluated several supervised learning algorithms, including support vector machine, random forest (RF), naïve Bayes, shallow neural network (NN), k-nearest neighbor, bidirectional long short-term memory, and bidirectional encoder representations from transformers (BERT). We then applied the best-performing classifier to the collected tweets for postclassification analyses to assess the utility of our methods. Results: We manually annotated 11,379 tweets (Corpus 1: 9179; Corpus 2: 2200) and used 7930 (69.7%) for training, 1449 (12.7%) for validation, and 2000 (17.6%) for testing. A classifier based on BERT obtained the highest accuracies (81.7%, Corpus 1; 80.7%, Corpus 2) and F1 scores on consumer feedback (0.58, Corpus 1; 0.90, Corpus 2), outperforming the second best classifiers in terms of accuracy (74.6%, RF on Corpus 1; 69.4%, RF on Corpus 2) and F1 score on consumer feedback (0.44, NN on Corpus 1; 0.82, RF on Corpus 2). Postclassification analyses revealed differing intercorpora distributions of tweet categories, with political (400778/628411, 63.78%) and consumer feedback (15073/27337, 55.14%) tweets being the most frequent for Corpus 1 and Corpus 2, respectively. Conclusions: The broad and variable content of Medicaid-related tweets necessitates automatic categorization to identify topic-relevant posts. Our proposed system presents a feasible solution for automatic categorization and can be deployed and generalized for health service programs other than Medicaid. Annotated data and methods are available for future studies.
4

Al-Twairesh, Nora, Hend Al-Khalifa, AbdulMalik Al-Salman, and Yousef Al-Ohali. "AraSenTi-Tweet: A Corpus for Arabic Sentiment Analysis of Saudi Tweets." Procedia Computer Science 117 (2017): 63–72. http://dx.doi.org/10.1016/j.procs.2017.10.094.

5

Abayomi-Alli, Adebayo, Olusola Abayomi-Alli, Sanjay Misra, and Luis Fernandez-Sanz. "Study of the Yahoo-Yahoo Hash-Tag Tweets Using Sentiment Analysis and Opinion Mining Algorithms." Information 13, no. 3 (March 15, 2022): 152. http://dx.doi.org/10.3390/info13030152.

Abstract:
Mining opinion on social media microblogs presents opportunities to extract meaningful insight from the public on trending issues such as “yahoo-yahoo”, which in Nigeria is synonymous with cybercrime. In this study, content analysis of selected historical tweets from the “yahoo-yahoo” hash-tag was conducted for sentiment and topic modelling. A corpus of 5500 tweets was obtained and pre-processed using a pre-trained tweet tokenizer, while Valence Aware Dictionary for Sentiment Reasoning (VADER), the Liu Hu method, Latent Dirichlet Allocation (LDA), Latent Semantic Indexing (LSI) and Multidimensional Scaling (MDS) graphs were used for sentiment analysis, topic modelling and topic visualization. Results showed the corpus had 173 unique tweet clusters, 5327 duplicate tweets and a frequency of 9555 for “yahoo”. Further validation using the mean sentiment scores of ten volunteers returned R and R² of 0.8038 and 0.6402; 0.5994 and 0.3463; 0.5999 and 0.3586 for Human and VADER; Human and Liu Hu; Liu Hu and VADER sentiment scores, respectively. While VADER outperforms Liu Hu in sentiment analysis, LDA and LSI returned similar results in the topic modelling. The study confirms VADER’s performance on unstructured social media data containing non-English slang, conjunctions, emoticons, etc. and proved that emojis are more representative of sentiments in tweets than the texts.
6

V, Ashwin. "Twitter Tweet Classifier." IAES International Journal of Artificial Intelligence (IJ-AI) 5, no. 1 (March 1, 2016): 41. http://dx.doi.org/10.11591/ijai.v5.i1.pp41-44.

Abstract:
This paper addresses the task of building a classifier that would categorise tweets on Twitter. Microblogging has become a tool of communication for Internet users, who share opinions on different aspects of life. As the popularity of microblogging sites increases, we get closer to the era of Information Explosion. Twitter is the second most used microblogging site and handles more than 500 million tweets every day, which translates to a mind-boggling 5,700 tweets per second. Despite the heavy usage of Twitter, there isn’t any specific classifier for the tweets posted on this site. This research attempts to segregate tweets and classify them into categories like Sports, News, Entertainment, Technology, Music, TV, Meme, etc. Naïve Bayes, a machine learning algorithm, is used to build a classifier which classifies the tweets once trained on the Twitter corpus. With this kind of classifier the user may simply skim the tweets without going through the tedious work of skimming the newsfeed.
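The abstract above names Naïve Bayes as the classification algorithm but gives no implementation details. As a rough illustration only, here is a minimal multinomial Naïve Bayes sketch with add-one smoothing in plain Python. The whitespace tokenizer, the class name `NaiveBayesTweetClassifier`, and the four toy training tweets are all assumptions made for this example; none of them comes from the cited paper.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayesTweetClassifier:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # tweets seen per category
        self.word_counts = defaultdict(Counter)  # token counts per category
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore tokens never seen in training
                count = self.word_counts[label][tok]
                score += math.log(
                    (count + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for illustration only.
train_texts = [
    "great goal in the final match",
    "team wins the league title",
    "new phone launch with faster chip",
    "software update fixes security bug",
]
train_labels = ["Sports", "Sports", "Technology", "Technology"]

clf = NaiveBayesTweetClassifier().fit(train_texts, train_labels)
print(clf.predict("the team scored a late goal"))
print(clf.predict("phone software update"))
```

A real system would need a tweet-aware tokenizer (hashtags, mentions, URLs) and far more training data; the sketch only shows how per-category token counts turn into a prediction.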
7

Park, Jung Ran, and Houda El Mimouni. "Emoticons and non-verbal communications across Arabic, English, and Korean Tweets." Global Knowledge, Memory and Communication 69, no. 8/9 (June 6, 2020): 579–95. http://dx.doi.org/10.1108/gkmc-02-2020-0021.

Abstract:
Purpose: The purpose of this study is to examine how tweeters drawn from three different languages and cultural boundaries manage the lack of contextual cues through an analysis of Arabic, English and Korean tweets. Design/methodology/approach: Data for this study is drawn from a corpus of tweets (n = 1,200) streamed using Python through Twitter API. Using the language information, the authors limited the number of tweets to 400 randomly selected tweets from each language, totaling 1,200 tweets. Final coding taxonomy was derived through interactive processes preceded by literature and a preliminary analysis based on a small subset (n = 150) by isolating nonverbal communication devices and emoticons. Findings: The results of the study show that there is great commonality across these tweets in terms of strategies and creativity in compensating for the constraints imposed by the tweet platform. The language-specific characteristics are also shown in the form of different usage of devices. Research limitations/implications: Emoticon usage indicates that the communication mode influences online social interaction; the restriction of 140 maximum characters seems to engender a frequent usage of emoticons across tweets regardless of language differences. The results of the study bring forth implications into the design of social media technologies that reflect affective aspects of communication and language-/culture-specific traits and characteristics. Originality/value: To the best of the authors’ knowledge, there are no qualitative studies examining paralinguistic nonverbal communication cues in the Twitter platform across language boundaries.
8

Li, Quanzhi, Sameena Shah, Xiaomo Liu, and Armineh Nourbakhsh. "Data Sets: Word Embeddings Learned from Tweets and General Data." Proceedings of the International AAAI Conference on Web and Social Media 11, no. 1 (May 3, 2017): 428–36. http://dx.doi.org/10.1609/icwsm.v11i1.14859.

Abstract:
A word embedding is a low-dimensional, dense and real-valued vector representation of a word. Word embeddings have been used in many NLP tasks. They are usually generated from a large text corpus. The embedding of a word captures both its syntactic and semantic aspects. Tweets are short, noisy and have unique lexical and semantic features that are different from other types of text. Therefore, it is necessary to have word embeddings learned specifically from tweets. In this paper, we present ten word embedding data sets. In addition to the data sets learned from just tweet data, we also built embedding sets from the general data and the combination of tweets and the general data. The general data consist of news articles, Wikipedia data and other web data. These ten embedding models were learned from about 400 million tweets and 7 billion words from the general data. In this paper, we also present two experiments demonstrating how to use the data sets in some NLP tasks, such as tweet sentiment analysis and tweet topic classification tasks.
9

Vieira da Silva, Fernando J., Norton T. Roman, and Ariadne M. B. R. Carvalho. "Stock market tweets annotated with emotions." Corpora 15, no. 3 (November 2020): 343–54. http://dx.doi.org/10.3366/cor.2020.0203.

Abstract:
As stock trading became a popular topic on Twitter, many researchers have proposed different approaches to make predictions on it, relying on the emotions found in messages. However, detailed studies require a reasonably sized corpus with emotions properly annotated. In this work, we introduce a corpus of tweets in Brazilian Portuguese annotated with emotions. Comprising 4,277 tweets, this is, to the best of our knowledge, the largest annotated corpus available in the stock market domain for this language. Amongst its possible uses, the corpus lends itself to the application of machine learning models for automatic emotion identification, as well as to the study of correlations between emotions and stock price movements.
10

McDonald, Graham, Romain Deveaud, Richard McCreadie, Craig Macdonald, and Iadh Ounis. "Tweet Enrichment for Effective Dimensions Classification in Online Reputation Management." Proceedings of the International AAAI Conference on Web and Social Media 9, no. 1 (August 3, 2021): 654–57. http://dx.doi.org/10.1609/icwsm.v9i1.14674.

Abstract:
Online Reputation Management (ORM) is concerned with the monitoring of public opinions on social media for entities such as commercial organisations. In particular, we investigate the task of reputation dimension classification, which aims to classify tweets that mention a business entity into different dimensions (e.g. "financial performance" or "products and services"). However, producing a general reputation dimension classification system that can be used across businesses of different types is challenging, due to the brief nature of tweets and the lack of terms in tweets that relate to specific reputation dimensions. To tackle these issues, we propose a robust and effective tweet enrichment approach that expands tweets with additional discriminative terms from a contemporary Web corpus. Using the RepLab 2014 test collection, we show that our tweet enrichment approach outperforms effective baselines including the top performing submission to RepLab 2014. Moreover, we show that the achieved accuracy scores are very close to the upper bound that our approach could achieve on this collection.

Dissertations on the topic "Corpus de tweets":

1

Doudagiri, Vivek Reddy. "Extracting Temporally-Anchored Knowledge from Tweets." Thesis, University of North Texas, 2018. https://digital.library.unt.edu/ark:/67531/metadc1157588/.

Abstract:
Twitter has quickly become one of the most popular social media sites. It has 313 million monthly active users, and 500 million tweets are published daily. With the massive number of tweets, Twitter users share information about a location along with the temporal awareness. In this work, I focus on tweets where the author of the tweet exclusively mentions a location in the tweet. Natural language processing systems can leverage a wide range of information from the tweets to build applications like recommender systems that predict the location of the author. This kind of system can be used to increase the visibility of the targeted audience and can also provide recommendations for interesting places to visit, hotels to stay at, and restaurants to eat at, as well as targeted on-line advertising and co-traveler matching based on the temporal information extracted from a tweet. In this work I determine if the author of the tweet is present in the mentioned location of the tweet. I also determine if the author is present in the location before tweeting, while tweeting, or after tweeting. I introduce 5 temporal tags (before the tweet but > 24 hours; before the tweet but < 24 hours; while the tweet is posted; after the tweet is posted but < 24 hours; and after the tweet is posted but > 24 hours). The major contributions of this paper are: (1) creation of a corpus of 1062 tweets containing 1200 location named entities, with annotations of whether the author of a tweet is or is not located in the location tweeted about with respect to 5 temporal tags; (2) detailed corpus analysis including real annotation examples and label distributions per temporal tag; (3) detailed inter-annotator agreements, including Cohen's kappa, Krippendorff's alpha and confusion matrices per temporal tag; (4) label distributions and analysis; and (5) supervised learning experiments, along with the results.
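The abstract above reports inter-annotator agreement with Cohen's kappa, among other measures. For readers unfamiliar with the statistic, the following is a minimal sketch of the standard two-annotator formula, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name `cohen_kappa` and the "yes"/"no" labels are invented for illustration and are not taken from the thesis.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical annotations: is the author at the mentioned location?
annotator_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
annotator_2 = ["yes", "no", "no", "no", "yes", "no", "yes", "yes"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(kappa)  # 6 of 8 items agree (0.75), chance agreement is 0.5, so kappa is 0.5
```

Kappa corrects raw agreement for chance, which matters when one label dominates; in a per-tag setup like the one described, it would be computed separately for each temporal tag.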
2

Gauthier, Michaël. "Age, gender, fuck, and twitter : a sociolinguistic analysis of swearing in a corpus of British tweets." Thesis, Lyon, 2017. http://www.theses.fr/2017LYSE2079.

Abstract:
Gender norms pervade many layers of our society, and more or less strongly influence the expectations we may have of others. Among these pre-conceptions, many linguistic patterns have been said to be representative of male or female features, like tag questions, deference, or turn-taking, for example. Of all the gendered linguistic characteristics, the one which may have been the most debated is that of swear words. Swearing is indeed a subject which, even when gender is not concerned, generally provokes many tensions and debates. This is partly due to what swear words are often associated with, that is, what is called “bad language”. Because of a complex interplay between social expectations and power relations, swearing has traditionally been associated with men. These kinds of association led to the creation of pre-conceived ideas stigmatizing women and men who would use a linguistic feature not generally associated with them. These preconceived ideas also fuel societal stereotypes and may impact people’s standards concerning what is desirable from each gender. Moreover, swearing is often considered as an act of power and a way of affirming oneself. Thus, the fact that one gender may be perceived as more frequent users of swear words, or on the other hand as swear word eschewers, may have an impact on other qualities related to power that we would inherently attribute to one gender or the other, whether these differences are real or not. Some studies have shown that contrary to what has long been widely believed, women do not swear less frequently than men, nor do they use a drastically different register. Some even envisioned that the use of “strong” swear words by women would increase in certain contexts, specifically on social media; this seemed especially true for younger generations of users. It was even predicted that “gender equality in swearing or a reversal in gender patterns for strong swearing, will slowly become more widespread, at least in social network sites” (Thelwall, 2008: 102), such that the use of strong swear words among young women will eventually be more frequent than among (young) men. Accordingly, the swearing patterns displayed in 2008 could keep evolving for a certain category of women (especially younger ones), which would correlate with other claims, which stated that computer-mediated communication as a whole could be empowering for women. Thus, the following question arises: has the prediction made by Thelwall in 2008 been fulfilled eight years later, in a society where computer-mediated communication in the context of social media is firmly rooted in people's everyday lives? The aim of this thesis is thus twofold: first, it is to offer a better understanding of the patterns of swear word usage among women and men on social media, and second, it is to show the potential of these media as a source of data for synchronic (and possibly diachronic) sociolinguistic studies on a much larger scale. This study is based specifically on a corpus composed of just over eighteen million tweets issued by roughly 739 000 users. The corpus was populated with tweets by British users of both genders and from different age groups throughout the United Kingdom. Corpus linguistic methodology and tools have been used to address the sociolinguistic issues raised earlier. Also, because Twitter does not provide direct access to the gender or the age of the users, using computer-programming methods has been necessary to be able to study these age and gender differences. This thesis hopes to advance the field of swearing research with regard both to gender and the relatively new context of social media. In so doing, it also aims to further establish the use of social media in linguistic investigation and pave the way for future studies.
3

Miletic, Filip. "An investigation into contact-induced semantic shifts in Quebec English : conciliating corpus-based vector models and variationist sociolinguistic inquiry." Electronic Thesis or Diss., Toulouse 2, 2022. http://www.theses.fr/2022TOU20034.

Abstract:
This dissertation investigates contact-induced semantic shifts in Quebec English, i.e., preexisting English words which are used with a different meaning due to the potential influence of French. I propose a novel approach at the intersection of natural language processing and variationist sociolinguistics, aiming to provide a more comprehensive descriptive account as well as assess the contributions of the implemented methods. In order to conduct computational analyses of semantic variation, I created a corpus containing 78.8 million tweets from Montreal, Toronto, and Vancouver. It was used to implement different types of vector space models, i.e., computational representations of word meaning. Type-level models were used to identify new semantic shifts based on the semantic differences between Montreal and the other two cities. Token-level models were used in finer-grained analyses and allowed their use to be characterized further. Despite promising results, systematic quantitative evaluation and extensive qualitative analyses suggest that these methods are hampered by noise related to their inherent characteristics as well as corpus structure. These large-scale approaches were complemented with finer-grained data collected through sociolinguistic interviews with 15 speakers living in Montreal. Varying correlations between lexical items and a range of sociodemographic factors, coupled with qualitative remarks on their use, point to four distinct patterns of synchronic variation; these in turn reflect potential diachronic processes. Interspeaker variability suggests that the use of semantic shifts is driven by speakers who tend to be younger and proficient in both English and French. The acceptability ratings are weakly correlated with computational variation measures, suggesting that they capture different dimensions of semantic variation. Overall, this dissertation has provided the first systematic description of contact-induced semantic shifts in Quebec English, and highlighted the complementarity of approaches used in different disciplines. These considerations have provided a pathway towards a better-informed use of corpus-based computational methods in studies of sociolinguistic phenomena.
4

Sanagavarapu, Krishna Chaitanya. "Determining Whether and When People Participate in the Events They Tweet About." Thesis, University of North Texas, 2017. https://digital.library.unt.edu/ark:/67531/metadc984235/.

Abstract:
This work describes an approach to determine whether people participate in the events they tweet about. Specifically, we determine whether people are participants in events with respect to the tweet timestamp. We target all events expressed by verbs in tweets, including past and present events and events that may occur in the future. We define event participants as people directly involved in an event regardless of whether they are the agent, recipient or play another role. We present an annotation effort, guidelines and quality analysis with 1,096 event mentions. We discuss the label distributions and event behavior in the annotated corpus. We also explain several features used and a standard supervised machine learning approach to automatically determine if and when the author is a participant of the event in the tweet. We discuss trends in the results obtained and draw important conclusions.
5

Jonsson, Lisbeth. "Literatuur uit de Lage Landen in Zweedse vertaling tussen 1995-2019 : Cultuuroverdracht tussen twee perifere talen." Thesis, Stockholms universitet, Institutionen för slaviska och baltiska språk, finska, nederländska och tyska, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-189242.

Abstract:
This study considers books translated from Dutch into Swedish in the period 1995-2019, an example of culture transfer between two peripheral languages. The focus is a comparison with culture transfer from Dutch to the central languages English, German and French. Major issues are the selection, the transfer and the reception. In the period studied, Dutch is consistently on the top ten list of languages from which books are translated into Swedish, although it contributes less than 1% of the total number of translated books. Titles from nine established authors constitute a third of the total of 137 books translated from Dutch to Swedish. The selection contains many older books of established value, whereas new contemporary writers are scarcely represented. The transfer shows different patterns for Dutch and Flemish authors. About one third of the books appeared in Swedish without earlier translation into one of the central languages. For the other titles, German is most often the first language of translation for Dutch authors and French for Flemish authors. Titles by Dutch authors are most often published by large Swedish publishers and those by Flemish authors most often by small publishers. The reception is studied in 245 articles from the four largest Swedish daily papers using qualitative and quantitative methods. Earlier studies of the reception in the central languages identified specific characteristics as typical of literature in Dutch. Such statements about typical characteristics were very rarely found here. The results indicate that the books are treated as cosmopolitan rather than as representative of a national culture.
6

Rodrigues, Ciro José Ferreira. "Estudo da eficácia do tensoativo sorbitano tween 80 veiculado em nanoemulsão contendo óleo de soja, como inibidor de corrosão." Universidade Federal do Rio Grande do Norte, 2012. http://repositorio.ufrn.br:8080/jspui/handle/123456789/17700.

Abstract:
Previous issue date: 2012-08-30
Conselho Nacional de Desenvolvimento Científico e Tecnológico
Nowadays, the use of chemicals that satisfactorily meet the needs of different sectors of the chemical industry is linked to the consumption of biodegradable materials. In this context, this work considered biotechnological aspects with the objective of developing a more environmentally friendly corrosion inhibitor. To achieve this goal, nanoemulsion-type systems (NE) were obtained from Tween 80, soybean oil and double-distilled water by varying the amount of Tween 80 (9 to 85 ppm), a sorbitan surfactant named polyoxyethylene (20) monooleate. This NE system was analyzed using phase diagrams in which the percentage of the oil phase (commercial soybean oil, codenamed OS) was kept constant. By changing the amount of Tween 80, several polar NE-OS derived systems (O/W-type nanoemulsions) were obtained, characterized through light scattering, conductivity and pH, and further subjected to electrochemical studies. The interfacial behavior of these NE-OS derived systems (codenamed NE-OS1, S2, S3, S4 and S5) as corrosion inhibitors on AISI 1020 carbon steel in saline media (NaCl 3.5%) was evaluated by measurement of Open Circuit Potential (OCP), Polarization Curves (Tafel extrapolation method) and Electrochemical Impedance Spectroscopy (EIS). According to the polarization curve results, the NE-OS1 and NE-OS2 systems were found to be mixed inhibitors with quantitative efficacy (98.6% - 99.7%) for concentrations of Tween 80 ranging between 9 and 85 ppm. According to the EIS technique, however, maximum inhibition efficiency was observed for several of the tested NE-OS samples. In addition to the electrochemical studies, Analysis of Variance (ANOVA) and Principal Component Analysis (PCA) were used for characterization of the tested nanoemulsion systems and for adsorption studies, respectively. These confirmed the results observed in the experimental analyses using diluted NE-OS samples with lower concentrations of Tween 80 (0.5–1.75 ppm).
7

Akhtyrska, Kateryna. "Linguistic expression of irony in social media." Master's thesis, 2014. http://hdl.handle.net/10400.1/8365.

Abstract:
Master's dissertation, Language Sciences, Faculdade de Ciências Humanas e Sociais, Universidade do Algarve, 2014
This research investigates the linguistic expression of irony in social media, focusing in particular on the analysis and description of the linguistic and rhetorical devices used to express irony about human entities in a specific domain, politics. The study aims to distinguish the most representative classes of irony and classify them into subclasses. The investigation is based on a corpus created manually for this purpose, consisting of short messages (tweets) collected from Twitter. The corpus centers on the election campaign of one candidate in the 2012 United States presidential election. The results show that, in political expression on social networks, irony can be conveyed with the help of other linguistic and rhetorical resources. The most representative categories are exclamation (21.6%), rhetorical questions (14%), antiphrasis (11.2%) and metaphor (12.4%). An inter-annotator agreement (IAA) study was carried out for distinct purposes after the annotation surveys were implemented: (i) to validate the reliability of the manual analysis, with α = 0.77, a result described as "highly experimental"; (ii) to examine the linguistic expressions of irony in social media in terms of IAA, yielding values for groups A, B and C of α = 0.38, 0.095 and 0.042; and (iii) to assess the impact of irony in the political domain in user-generated content, yielding α = 0.172 and 73.7% agreement.
Erasmus Mundus
8

Besbes, Mounira. "Mapping the captive body in three twenty-first century women’s writings." Thesis, 2019. http://hdl.handle.net/1866/24628.

Abstract:
In my doctoral project, entitled “Mapping the Captive Body in Three Diasporic Women’s Writings,” I analyze the workings of state power in relation to the body, as illustrated in the works of Edwidge Danticat, Azar Nafisi and Marina Nemat. I explore the different ways state-sponsored violence, dictatorship and patriarchy alter the very constructions of body, mind, voice, and subjectivity. By considering these institutionalized forms of violence and coercion, I demonstrate how physical confinement engenders the captivity of the mind and the de(con)struction of the self. In so doing, I conceptualize captivity as physical, psychological and social. In addition, I contend that the struggle to resist this erasure and reclaim subjectivity and corporeality takes the form of individual, communal, and/or collective action. The first chapter contextualizes and historicizes the studied works within the reign of the two Duvaliers, Khomeini’s dictatorship, and the post-9/11 US immigration policies. It also provides the theoretical framework for this dissertation. The second chapter focuses on Joseph Dantica’s imprisonment and disfranchisement and raises questions about the biopower that defines Krome Detention Center. I demonstrate the way Edwidge Danticat posthumously recovers her uncle’s identity. The third chapter studies female captivity in terms of forced veiling and constant surveillance. I analyze how Nafisi and her students take refuge in and resist through the power of literature. In the fourth chapter, I look at how prison regulates Nemat’s gender and identity. I argue that marital rape, as a gendered political violence, becomes a means through which Nemat’s subjection and domination become possible. The second part of the chapter explores the importance of carceral friendship and the act of writing in defying and resisting erasure.
9

Collins, Brian J. "The United States Air Force and profession: why sixty percent of Air Force general officers are still pilots when pilots comprise just twenty percent of the officer corps." 2006. http://handle.dtic.mil/100.2/ADA462738.


Books on the topic "Corpus de tweets":

1

Enegwea, Gregory. NYSC: Twenty years of national service. Yaba, Lagos: Gabumo Pub. Co. Ltd., 1993.

2

King, Nancy J. Habeas for the twenty-first century: Uses, abuses, and the future of the great writ. Chicago: The University of Chicago Press, 2011.

3

Page, Kathy, and Marlene A. D. Lynne Van Luven. In the flesh: Twenty writers explore the body. [Victoria, B.C.]: Brindle & Glass Pub., 2012.

4

Carmin, Charles L. My twenty & then some. Ellinwood, Kansas: Charles L. Carmin, 2011.

5

Hearn, Chester G. Marines: An illustrated history: the US Marine Corps from 1775 to the twenty-first century. Minneapolis, Minn: Quarto Publishing Group, 2015.

6

United States. Congress. House. Committee on Government Operations. The Peace Corps: Entering its fourth decade of service: twenty-second report by the Committee on Government Operations. Washington: U.S. G.P.O., 1990.

7

LeHockey, John D. Strategic and operational military deception: U.S. Marines and the next twenty years. [Washington, DC]: U.S. Marine Corps, 1990.

8

LeHockey, John D. Strategic and operational military deception: U. S. Marines and the next twenty years. [Washington, DC]: U.S. Marine Corps, 1990.

9

International Conference on English Language Research on Computerized Corpora (2000 Sydney, N.S.W.). New frontiers of corpus research: Papers from the Twenty First International Conference on English Language Research on Computerized Corpora, Sydney 2000. Amsterdam: Rodopi, 2002.

10

Corti, Eugenio. Few returned: Twenty-eight days on the Russian Front, winter 1942-1943. Columbia, Mo: University of Missouri Press, 1997.


Book chapters on the topic "Corpus de tweets":

1

Fafalios, Pavlos, Vasileios Iosifidis, Eirini Ntoutsi, and Stefan Dietze. "TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets." In The Semantic Web, 177–90. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-93417-4_12.

2

dos Santos, Allisfrank, Jorge Daniel Barros Júnior, and Heloisa de Arruda Camargo. "Annotation of a Corpus of Tweets for Sentiment Analysis." In Lecture Notes in Computer Science, 294–302. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-99722-3_30.

3

Navas-Loro, María, Víctor Rodríguez-Doncel, Idafen Santana-Pérez, Alba Fernández-Izquierdo, and Alberto Sánchez. "MAS: A Corpus of Tweets for Marketing in Spanish." In Lecture Notes in Computer Science, 363–75. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-98192-5_53.

4

Siagh, Asma, Fatima Zohra Laallam, and Okba Kazar. "Building a Multilingual Corpus of Tweets Relating to Algerian Higher Education." In Communications in Computer and Information Science, 132–38. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-08277-1_11.

5

Hammer, Hugo Lewi, Anis Yazidi, Aleksander Bai, and Paal Engelstad. "Improving Classification of Tweets Using Linguistic Information from a Large External Corpus." In Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 122–34. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-52569-3_11.

6

Filardo-Llamas, Laura. "Chapter 8. Spain." In Voices of Supporters, 162–86. Amsterdam: John Benjamins Publishing Company, 2023. http://dx.doi.org/10.1075/dapsac.101.c8.

Abstract:
This chapter explores how populist traits permeate Vox’s supporters’ discursive construction of this political party, of the nation, and of other political and social actors. The analysis is based on a corpus of 400 tweets produced during the 2019 European and general elections. The qualitative analysis uses tools from cognitive linguistics, mostly related to positioning and framing, and multimodality. Findings show a nativist trait in the discursive blend established between the political party, its supporters and the nation. The construction of the Other is varied, comprising other political parties and social groups. The existence of Vox is legitimised by supporters as a means for maintaining a Spanish identity both within Spain and Europe. The centrality of the nation is seen both in the textual and visual mode of the tweets, where emojis abound.
7

Bischetti, Luca, and Salvatore Attardo. "From mode adoption to saluting a dead kitten." In Pragmatics & Beyond New Series, 65–86. Amsterdam: John Benjamins Publishing Company, 2023. http://dx.doi.org/10.1075/pbns.335.03bis.

Abstract:
There has been some research on what can be broadly described as humor performance in the reactions to humorous turns (be they punch lines or jab lines). Two areas that have received particular attention are (1) the reactions to irony, which have been shown to range from mode adoption (i.e., when the respondent adopts the ironical mode and responds with irony to the irony) to treating the irony as a “literal” statement and reacting to it while ignoring its ironical intention; and (2) failed humor, in which the range of reactions is much broader, from laughter to open expression of displeasure. Much less has been written on reactions in online discourse. This case study examines one humorous tweet by comedian Ricky Gervais, mocking the Academy Award ceremony (the Oscars), to identify the differences between a corpus of 200+ tweets posted in response to Gervais’ original tweet and previous studies. We will also consider how much the exchange can be quantified as to its viral nature.
8

Yousuf, Rami Naim Mohammed. "COVID-19 Informative Tweets Identification Through Word-by-Word Lexicon Replacement Using Pretrained Biomedical Corpus." In Financial Technology (FinTech), Entrepreneurship, and Business Development, 237–46. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-08087-6_17.

9

Goodwin, Jean, and Ekaterina Bogomoletc. "Critical Questions About Scientific Research Publications in the Online Mask Debate." In The Pandemic of Argumentation, 331–54. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-91017-4_17.

Abstract:
Successful management of sociotechnical issues like those raised by the COVID-19 pandemic requires members of the public to use scientific research in their reasoning. In this study, we explore the nature and extent of the public’s abilities to assess research publications through analyzing a corpus of close to 5K tweets from the early months of the pandemic which mentioned one of six key studies on the then-uncertain topic of the efficacy of face masks. We find that arguers relied on a variety of critical questions to test the adequacy of the research publications to serve as premises in reasoning, their relevance to the issues at hand, and their sufficiency in justifying conclusions. In particular, arguers showed more skill in assessing the authoritativeness of the sources of the publications than in assessing the epistemic qualities of the studies being reported. These results indicate specific areas for interventions to improve reasoning about research publications. Moreover, this study suggests the potential of studying argumentation at the system level in order to document collective preparedness to address sociotechnical issues, i.e., community science literacy.
10

Wehrmeyer, Ella. "Chapter 1. Sign language corpus linguistics." In Advances in Sign Language Corpus Linguistics, 1–27. Amsterdam: John Benjamins Publishing Company, 2023. http://dx.doi.org/10.1075/scl.108.01weh.

Abstract:
Taking the 2004 LREC Workshop on the Representation and Processing of Sign Languages (Streiter & Vettori 2004) as the watershed, this chapter takes stock of the development of sign language corpus linguistics over the past twenty years. After dispelling myths about sign languages, the chapter gives an overview of practices and challenges in building and annotating sign language corpora. The chapter then tracks the historical development of sign language linguistic corpora across the globe, noting the leading role that EU countries still play in this regard. Online archiving and the availability of sign language linguistic corpora to researchers and other users are then discussed. The chapter then explores their contributions in terms of research, education, interpreting, and avatar creation, before reflecting on their role in documenting and preserving sign languages, and their potential benefit for national Deaf communities. Finally, summaries of the chapters that make up this volume are presented.

Conference papers on the topic "Corpus de tweets":

1

Mekki, Jade, Gwénolé Lecorvé, Delphine Battistelli, and Nicolas Béchet. "TREMoLo-Tweets: a Multi-Label Corpus of French Tweets for Language Register Characterization." In International Conference Recent Advances in Natural Language Processing. INCOMA Ltd. Shoumen, BULGARIA, 2021. http://dx.doi.org/10.26615/978-954-452-072-4_108.

2

Xavier, Clarissa Castellã. "Polarity Classification of Traffic Related Tweets." In XV Encontro Nacional de Inteligência Artificial e Computacional. Sociedade Brasileira de Computação - SBC, 2018. http://dx.doi.org/10.5753/eniac.2018.4417.

Abstract:
In this paper we present a study of polarity classification of tweets in the traffic domain. Specifically, we use Portuguese-language data from an account maintained by a traffic management agency. We evaluate the performance of three learning methods: SVM (Support Vector Machine), Naive Bayes and Maximum Entropy. We also explore how the use of a balanced vs. an unbalanced corpus affects the models' behavior. The results show that, in this context, an ML classifier obtains better results than those reported in the literature. In our experiments, SVM trained with a balanced corpus outperforms all tested models, achieving 99% Accuracy, Average Recall and Average Precision.
3

Trye, David, Andreea Calude, Felipe Bravo-Marquez, and Te Taka Keegan. "Māori Loanwords: A Corpus of New Zealand English Tweets." In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop. Stroudsburg, PA, USA: Association for Computational Linguistics, 2019. http://dx.doi.org/10.18653/v1/p19-2018.

4

Antonakaki, Despoina, Dimitris Spiliotopoulos, Christos V. Samaras, Sotiris Ioannidis, and Paraskevi Fragopoulou. "Investigating the complete corpus of referendum and elections tweets." In 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). IEEE, 2016. http://dx.doi.org/10.1109/asonam.2016.7752220.

5

Cruz, Ramon Souza da, Gilberto Nunes Neto, and Rafael Torres Anchiêta. "Detecting Misinformation in Tweets Related to COVID-19." In Encontro Nacional de Inteligência Artificial e Computacional. Sociedade Brasileira de Computação - SBC, 2021. http://dx.doi.org/10.5753/eniac.2021.18260.

Abstract:
The spread of misinformation has caused, and continues to cause, many problems for society, and the World Health Organization (WHO) regards it as an infodemic. The great majority of work on handling misinformation focuses on the English language. To fill this gap, this work investigates strategies based on supervised machine learning for detecting misinformation in tweets written in Portuguese. In addition, a manually annotated corpus was created for this task in order to evaluate the developed approaches and compare them with related work. The results achieved are competitive with related work, indicating that the approach provides an interesting baseline for the constructed corpus.
6

Liu, Junhua, Trisha Singhal, Lucienne T. M. Blessing, Kristin L. Wood, and Kwan Hui Lim. "EPIC30M: An Epidemics Corpus of Over 30 Million Relevant Tweets." In 2020 IEEE International Conference on Big Data (Big Data). IEEE, 2020. http://dx.doi.org/10.1109/bigdata50022.2020.9377739.

7

Milajevs, Dmitrijs. "Toward a Comparable Corpus of Latvian, Russian and English Tweets." In Proceedings of the 10th Workshop on Building and Using Comparable Corpora. Stroudsburg, PA, USA: Association for Computational Linguistics, 2017. http://dx.doi.org/10.18653/v1/w17-2505.

8

Treurniet, Maaske, and Eric Sanders. "Chats, Tweets and SMS in the SoNaR Corpus: Social Media Collection." In Annual International Conference on Language, Literature & Linguistics. Global Science & Technology Forum (GSTF), 2012. http://dx.doi.org/10.5176/2251-3566_l312149.

9

Pano, Toni, and Rasha Kashef. "A Corpus of BTC Tweets in the Era of COVID-19." In 2020 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS). IEEE, 2020. http://dx.doi.org/10.1109/iemtronics51293.2020.9216427.

10

Sarbazi-Azad, Saeed, Ahmad Akbari, and Mohsen Khazeni. "ExaAEC: A New Multi-label Emotion Classification Corpus in Arabic Tweets." In 2021 11th International Conference on Computer Engineering and Knowledge (ICCKE). IEEE, 2021. http://dx.doi.org/10.1109/iccke54056.2021.9721493.


Reports of organizations on the topic "Corpus de tweets":

1

Forkin, Keith A. Proactive Marine Corps Transition Assistance In The Twenty-First Century. Fort Belvoir, VA: Defense Technical Information Center, February 2015. http://dx.doi.org/10.21236/ada620284.

2

Chriscoe, Mackenzie, Rowan Lockwood, Justin Tweet, and Vincent Santucci. Colonial National Historical Park: Paleontological resource inventory (public version). National Park Service, February 2022. http://dx.doi.org/10.36967/nrr-2291851.

Abstract:
Colonial National Historical Park (COLO) in eastern Virginia was established for its historical significance, but significant paleontological resources are also found within its boundaries. The bluffs around Yorktown are composed of sedimentary rocks and deposits of the Yorktown Formation, a marine unit deposited approximately 4.9 to 2.8 million years ago. When the Yorktown Formation was being deposited, the shallow seas were populated by many species of invertebrates, vertebrates, and micro-organisms which have left body fossils and trace fossils behind. Corals, bryozoans, bivalves, gastropods, scaphopods, worms, crabs, ostracodes, echinoids, sharks, bony fishes, whales, and others were abundant. People have long known about the fossils of the Yorktown area. Beginning in the British colonial era, fossiliferous deposits were used to make lime and construct roads, while more consolidated intervals furnished building stone. Large shells were used as plates and dippers. Collection of specimens for study began in the late 17th century, before they were even recognized as fossils. The oldest image of a fossil from North America is of a typical Yorktown Formation shell now known as Chesapecten jeffersonius, probably collected from the Yorktown area and very likely from within what is now COLO. Fossil shells were observed by participants of the 1781 siege of Yorktown, and the landmark known as “Cornwallis Cave” is carved into rock made of shell fragments. Scientific description of Yorktown Formation fossils began in the early 19th century. At least 25 fossil species have been named from specimens known to have been discovered within COLO boundaries, and at least another 96 have been named from specimens potentially discovered within COLO, but with insufficient locality information to be certain. 
At least a dozen external repositories and probably many more have fossils collected from lands now within COLO, but again limited locality information makes it difficult to be sure. This paleontological resource inventory is the first of its kind for Colonial National Historical Park (COLO). Although COLO fossils have been studied as part of the Northeast Coastal Barrier Network (NCBN; Tweet et al. 2014) and, to a lesser extent, as part of a thematic inventory of caves (Santucci et al. 2001), the park had not received a comprehensive paleontological inventory before this report. This inventory allows for a deeper understanding of the park’s paleontological resources and compiles information from historical papers as well as recently completed field work. In summer 2020, researchers went into the field and collected eight bulk samples from three different localities within COLO. These samples will be added to COLO’s museum collections, making their overall collection more robust. In the future, these samples may be used for educational purposes, both for the general public and for employees of the park.
3

Bingham, Sonia, and Craig Young. Sentinel wetlands in Cuyahoga Valley National Park: I. Ecological characterization and management insights, 2008–2018. Edited by Tani Hubbard. National Park Service, February 2023. http://dx.doi.org/10.36967/2296885.

Abstract:
Sentinel wetlands at Cuyahoga Valley National Park (NP) comprise a set of twenty important management areas and reference sites. These wetlands are monitored more closely than other wetlands in the wetlands monitoring program and are the focus of the volunteer monitoring program for water levels. We used the Ohio Rapid Assessment Method (ORAM) to evaluate habitat in the sentinel wetlands. A total of 37 long-term sample plots have been established within these wetlands to monitor biological condition over time using vegetation as an indicator. Vegetation is intensively surveyed using the Vegetation Index of Biotic Integrity (VIBI), where all plant species within the plot are identified to the lowest taxonomic level possible (genus or species). Sample plots were surveyed twice from 2008 to 2018 and the vegetation data were evaluated using five metrics: VIBI, Floristic Quality Assessment Index (FQAI), percent sensitive plant species, percent invasive graminoids, and species richness. These metrics are discussed for each location. This report also highlights relevant land use histories, common native plant species, and invasive species of concern at each wetland. This is the first report in a two-part series, designed to summarize the results from intensive vegetation surveys completed at sentinel wetlands in 2008–2018. Boston Mills, Virginia Kendall Lake, Stumpy Basin, Columbia, and Beaver Marsh are all in excellent condition at one or more plots. They have unique habitats with some specialized plant species. Fawn Pond is in good condition at most plots and scores very high in comparison to other wetlands within the riverine mainstem hydrogeomorphic class. Metric scores across mitigation wetlands were low. Two of the three wetlands (Brookside and Rockside) are not meeting the benchmarks originally established by the United States Army Corps of Engineers and Ohio Environmental Protection Agency. 
Krejci is still a young mitigation site and success will be determined over time. Park-supported invasive species control efforts will be crucial for long-term success of these sites and future mitigation/restoration projects. The wetlands monitored because of proposed ecological restoration projects (Pleasant Valley, Stanford, and Fawn Pond) have extensive invasive plant communities. These restoration sites should be re-evaluated for their feasibility and potential success and given an order of prioritization relative to the newer list of restoration sites. Cuyahoga Valley NP has added many new areas to their list of potential wetland restoration sites after these areas were selected, and there may be better opportunities available based on restoration objectives. Restoration goals should be based on the park's desired future conditions, and mitigation goals of outside partners may not always be in line with those. The multiple VIBI plots dispersed throughout the large wetlands at Cuyahoga Valley NP detected and illuminated spatial patterns in condition. Many individual wetlands had a wide range of VIBI scores within their boundaries, sometimes reflecting localized disturbances, past modifications, and management actions. Most often, these large fluctuations in condition were linked to local invasive plant infestations. These infestations appear to be the most obvious and widespread threat to wetland ecosystems within the park, but also the most controllable threat. Some sensitive species are still present in some of the lowest scoring plots, which indicates that invasive plant species control efforts may pay off immediately with a resurgence of native communities. Invasive plant control at rare habitat sites would have large payoffs over time by protecting some of the park's most unique wetlands. Reference wetlands would also be good demonstration sites for park managers to try to maintain exemplary conditions through active management. 
Through this work, park managers can evaluate the feasibility, effectiveness, and scalability of management practices required to maintain wetland condition.
4

Bingham, Sonia, Craig Young, and Tani Hubbard. Sentinel wetlands in Cuyahoga Valley National Park: II. Condition trends for wetlands of management concern, 2008–2018. National Park Service, 2023. http://dx.doi.org/10.36967/2301705.

Abstract:
Twenty important management areas (wetlands of management concern) and reference wetlands compose the sentinel wetlands at Cuyahoga Valley National Park. These wetlands are monitored more intensively than other wetlands in the program. This is the second report in a two-part series, designed to summarize the results from intensive vegetation surveys completed at sentinel wetlands from 2008 to 2018. The first report (Bingham and Young 2023) characterized the conditions in each wetland and provided baseline reference information for other reports and site-specific projects. In this report, we examine results from five selected metrics more closely within and across three natural wetlands of management concern groups (restoration wetlands, mitigation wetlands, and rare habitat wetlands) using the reference wetlands as overall benchmarks. We used the Ohio Rapid Assessment Method (ORAM) to evaluate habitat in the sentinel wetlands. In addition, a total of 37 long-term sample plots were established within these wetlands to monitor biological conditions over time using vegetation as an indicator. Multiple plots were located in larger wetland complexes to capture spatial differences in condition. Vegetation was intensively surveyed within the plots using the Vegetation Index of Biotic Integrity (VIBI), where all plant species are identified to the lowest taxonomic level possible (genus or species). The sample plots were surveyed twice, and the five evaluation metrics included the VIBI score, Floristic Quality Assessment Index (FQAI), percent sensitive plant species, percent invasive graminoids, and species richness. 
For the analysis, VIBI plot locations were rank ordered based on their 2018 scores, the range and average for each metric were examined across the wetlands of management concern groups and plotted against reference wetlands for comparison, and the two survey years (pre-2015 and 2018) were plotted against each other for substantial changes from the established baseline. Across the sample plot locations, VIBI scores ranged from a low of 7 (Stanford Run SF1) to a high of 91 (Columbia Run 554). The top scoring plots were at four reference wetlands (Stumpy Basin 526, Virginia Kendall Lake 241K, Columbia Run 554, and Boston Mills 683) and one rare habitat wetland (Beaver Marsh BM3). All of these plots fell within an excellent condition range in one or both survey years. They each have unique habitats with some specialized plant species. The majority (24) of the sentinel wetlands plots ranked within the poor or fair ranges. These include the three mitigation wetlands: Brookside 968, Rockside RS2, and Krejci, as well as all plots within the Pleasant Valley and Stanford Run wetlands. Most of the large wetlands had dramatic condition differences within their boundaries, affected by pollution sources, land-use modifications, and/or invasive species in some areas more than others. We documented these wide condition ranges at Fawn Pond, Virginia Kendall Lake, Beaver Marsh and Stumpy Basin, but the most pronounced within-wetland differences were at Virginia Kendall Lake, which had a 58-point difference between the highest and lowest scoring plot. Fawn Pond is in good condition at most plots and scored very high in comparison to other wetlands within the riverine mainstem hydrogeomorphic class. The average and range of most metric scores were notably different across the four different wetlands groups. Average values at rare habitat wetlands plots were similar to reference plots for VIBI and FQAI scores, percent invasive graminoids, and percent sensitive metrics. 
Krejci KR1 and Fawn Pond FP3 had unusually high percent cover of sensitive species (31.0% and 27.9%, respectively) for the mitigation and restoration groupings. However, average overall metric scores across the restoration and mitigation wetlands were generally very low, with Stanford Run being the lowest scoring restoration wetland and Brookside being the lowest scoring mitigation wetland. With restoration efforts completed, the expectation is that mitigation wetlands should be performing much higher. Two of the three mitigation wetlands sites are not meeting the mitigation benchmarks that were created for them by the US Army Corps of Engineers and the Ohio Environmental Protection Agency. Contractor reports state that the wetlands met the criteria within the first five years of establishment. However, upon release from monitoring and maintenance, invasive species have gradually re-established, which has led to condition deterioration over time, and lower metric scores. VIBI scores stayed the same or improved (only slightly in many cases) in the majority of plots (67.6%) between survey years. The Krejci mitigation wetlands had the largest improvement in VIBI scoring. Scores at six plots decreased by at least 10 points from the baseline survey. Two of the park's most beloved wetlands, Beaver Marsh (at one location) and the Stumpy Basin reference plot, had the two most notable declines in VIBI scores. In 2018, 11 plots (29.7%) had greater than 25% invasive graminoid cover (e.g., cattail, common reed grass, reed canary grass) and 18 plots (48.7%) experienced an increase in invasive graminoid cover between survey years. A marked increase (>10% cover) in invasive graminoids was documented at eight locations (Rockside 1079RS2, Beaver Marsh BM5, Fawn Pond FP3 and FP4, Brookside 968, Stumpy Basin SB1, and two other Pleasant Valley plots: 1049 and 969). These trends are likely to continue, and biological conditions are expected to deteriorate at these wetlands in response. 
Regardless of invasive species increases, many of the wetlands showed remarkable resilience over the last decade with fairly stable VIBI categories.
