Journal articles on the topic "Corpus de tweets"

To see other types of publications on this topic, follow the link: Corpus de tweets.

Cite a source in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 journal articles for your research on the topic "Corpus de tweets".

Next to every work in the list of references there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scientific publication in .pdf format and read its abstract online, whenever these are available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Mitra, Tanushree, and Eric Gilbert. "CREDBANK: A Large-Scale Social Media Corpus With Associated Credibility Annotations." Proceedings of the International AAAI Conference on Web and Social Media 9, no. 1 (August 3, 2021): 258–67. http://dx.doi.org/10.1609/icwsm.v9i1.14625.

Abstract:
Social media has quickly risen to prominence as a news source, yet lingering doubts remain about its ability to spread rumor and misinformation. Systematically studying this phenomenon, however, has been difficult due to the need to collect large-scale, unbiased data along with in-situ judgements of its accuracy. In this paper we present CREDBANK, a corpus designed to bridge this gap by systematically combining machine and human computation. Specifically, CREDBANK is a corpus of tweets, topics, events and associated human credibility judgements. It is based on the real-time tracking of more than 1 billion streaming tweets over a period of more than three months, computational summarizations of those tweets, and intelligent routings of the tweet streams to human annotators — within a few hours of those events unfolding on Twitter. In total CREDBANK comprises more than 60 million tweets grouped into 1049 real-world events, each annotated by 30 human annotators. As an example, with CREDBANK one can quickly calculate that roughly 24% of the events in the global tweet stream are not perceived as credible. We have made CREDBANK publicly available, and hope it will enable new research questions related to online information credibility in fields such as social science, data mining and health.
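The "roughly 24%" figure above comes from aggregating the 30 per-event human credibility judgements. A minimal sketch of that kind of aggregation, using an invented event/rating schema (a -2 to +2 scale and far fewer annotators than CREDBANK's 30), not the corpus's actual format:

```python
# Sketch: aggregating per-event credibility ratings, CREDBANK-style.
# Event names and the -2..+2 rating scale are illustrative assumptions.

def share_not_credible(events, threshold=0.0):
    """Fraction of events whose mean annotator rating falls below `threshold`."""
    not_credible = sum(
        1 for ratings in events.values()
        if sum(ratings) / len(ratings) < threshold
    )
    return not_credible / len(events)

events = {
    "event_a": [2, 2, 1, 2, 1],     # broadly judged credible
    "event_b": [-1, -2, 0, -1, -1],  # broadly judged not credible
    "event_c": [1, 0, 1, 2, 1],
    "event_d": [-2, -1, -1, 0, -2],
}

print(share_not_credible(events))  # 0.5 for this toy sample
```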
2

Chen, Lu, Wenbo Wang, Meenakshi Nagarajan, Shaojun Wang, and Amit Sheth. "Extracting Diverse Sentiment Expressions with Target-Dependent Polarity from Twitter." Proceedings of the International AAAI Conference on Web and Social Media 6, no. 1 (August 3, 2021): 50–57. http://dx.doi.org/10.1609/icwsm.v6i1.14252.

Abstract:
The problem of automatic extraction of sentiment expressions from informal text, as in microblogs such as tweets is a recent area of investigation. Compared to formal text, such as in product reviews or news articles, one of the key challenges lies in the wide diversity and informal nature of sentiment expressions that cannot be trivially enumerated or captured using predefined lexical patterns. In this work, we present an optimization-based approach to automatically extract sentiment expressions for a given target (e.g., movie, or person) from a corpus of unlabeled tweets. Specifically, we make three contributions: (i) we recognize a diverse and richer set of sentiment-bearing expressions in tweets, including formal and slang words/phrases, not limited to pre-specified syntactic patterns; (ii) instead of associating sentiment with an entire tweet, we assess the target-dependent polarity of each sentiment expression. The polarity of sentiment expression is determined by the nature of its target; (iii) we provide a novel formulation of assigning polarity to a sentiment expression as a constrained optimization problem over the tweet corpus. Experiments conducted on two domains, tweets mentioning movie and person entities, show that our approach improves accuracy in comparison with several baseline methods, and that the improvement becomes more prominent with increasing corpus sizes.
3

Yang, Yuan-Chi, Mohammed Ali Al-Garadi, Whitney Bremer, Jane M. Zhu, David Grande, and Abeed Sarker. "Developing an Automatic System for Classifying Chatter About Health Services on Twitter: Case Study for Medicaid." Journal of Medical Internet Research 23, no. 5 (May 3, 2021): e26616. http://dx.doi.org/10.2196/26616.

Abstract:
Background The wide adoption of social media in daily life renders it a rich and effective resource for conducting near real-time assessments of consumers’ perceptions of health services. However, its use in these assessments can be challenging because of the vast amount of data and the diversity of content in social media chatter. Objective This study aims to develop and evaluate an automatic system involving natural language processing and machine learning to automatically characterize user-posted Twitter data about health services using Medicaid, the single largest source of health coverage in the United States, as an example. Methods We collected data from Twitter in two ways: via the public streaming application programming interface using Medicaid-related keywords (Corpus 1) and by using the website’s search option for tweets mentioning agency-specific handles (Corpus 2). We manually labeled a sample of tweets in 5 predetermined categories or other and artificially increased the number of training posts from specific low-frequency categories. Using the manually labeled data, we trained and evaluated several supervised learning algorithms, including support vector machine, random forest (RF), naïve Bayes, shallow neural network (NN), k-nearest neighbor, bidirectional long short-term memory, and bidirectional encoder representations from transformers (BERT). We then applied the best-performing classifier to the collected tweets for postclassification analyses to assess the utility of our methods. Results We manually annotated 11,379 tweets (Corpus 1: 9179; Corpus 2: 2200) and used 7930 (69.7%) for training, 1449 (12.7%) for validation, and 2000 (17.6%) for testing. 
A classifier based on BERT obtained the highest accuracies (81.7%, Corpus 1; 80.7%, Corpus 2) and F1 scores on consumer feedback (0.58, Corpus 1; 0.90, Corpus 2), outperforming the second best classifiers in terms of accuracy (74.6%, RF on Corpus 1; 69.4%, RF on Corpus 2) and F1 score on consumer feedback (0.44, NN on Corpus 1; 0.82, RF on Corpus 2). Postclassification analyses revealed differing intercorpora distributions of tweet categories, with political (400778/628411, 63.78%) and consumer feedback (15073/27337, 55.14%) tweets being the most frequent for Corpus 1 and Corpus 2, respectively. Conclusions The broad and variable content of Medicaid-related tweets necessitates automatic categorization to identify topic-relevant posts. Our proposed system presents a feasible solution for automatic categorization and can be deployed and generalized for health service programs other than Medicaid. Annotated data and methods are available for future studies.
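The per-class F1 scores reported above (e.g., 0.58 vs. 0.90 on consumer feedback) combine precision and recall for a single category. A small illustration of that computation, with made-up labels rather than the study's data:

```python
def f1_for_class(y_true, y_pred, cls):
    """Precision, recall and F1 for one class, from parallel label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy gold and predicted labels, not the Medicaid corpora.
y_true = ["feedback", "political", "feedback", "other", "feedback"]
y_pred = ["feedback", "feedback", "feedback", "other", "political"]
print(f1_for_class(y_true, y_pred, "feedback"))  # precision, recall, F1
```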
4

Al-Twairesh, Nora, Hend Al-Khalifa, AbdulMalik Al-Salman, and Yousef Al-Ohali. "AraSenTi-Tweet: A Corpus for Arabic Sentiment Analysis of Saudi Tweets." Procedia Computer Science 117 (2017): 63–72. http://dx.doi.org/10.1016/j.procs.2017.10.094.

5

Abayomi-Alli, Adebayo, Olusola Abayomi-Alli, Sanjay Misra, and Luis Fernandez-Sanz. "Study of the Yahoo-Yahoo Hash-Tag Tweets Using Sentiment Analysis and Opinion Mining Algorithms." Information 13, no. 3 (March 15, 2022): 152. http://dx.doi.org/10.3390/info13030152.

Abstract:
Mining opinion on social media microblogs presents opportunities to extract meaningful insight from the public from trending issues like the “yahoo-yahoo” which in Nigeria, is synonymous to cybercrime. In this study, content analysis of selected historical tweets from “yahoo-yahoo” hash-tag was conducted for sentiment and topic modelling. A corpus of 5500 tweets was obtained and pre-processed using a pre-trained tweet tokenizer while Valence Aware Dictionary for Sentiment Reasoning (VADER), Liu Hu method, Latent Dirichlet Allocation (LDA), Latent Semantic Indexing (LSI) and Multidimensional Scaling (MDS) graphs were used for sentiment analysis, topic modelling and topic visualization. Results showed the corpus had 173 unique tweet clusters, 5327 duplicates tweets and a frequency of 9555 for “yahoo”. Further validation using the mean sentiment scores of ten volunteers returned R and R2 of 0.8038 and 0.6402; 0.5994 and 0.3463; 0.5999 and 0.3586 for Human and VADER; Human and Liu Hu; Liu Hu and VADER sentiment scores, respectively. While VADER outperforms Liu Hu in sentiment analysis, LDA and LSI returned similar results in the topic modelling. The study confirms VADER’s performance on unstructured social media data containing non-English slangs, conjunctions, emoticons, etc. and proved that emojis are more representative of sentiments in tweets than the texts.
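The validation step above correlates mean human sentiment scores with VADER and Liu Hu scores via R and R². A quick sketch of computing Pearson's R over two score lists; the scores here are invented, not the paper's data:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

human = [0.8, -0.4, 0.1, 0.6, -0.7]  # illustrative mean human sentiment scores
vader = [0.7, -0.5, 0.0, 0.4, -0.6]  # illustrative VADER compound scores

r = pearson_r(human, vader)
print(r, r ** 2)  # R and R-squared, as reported in the validation
```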
6

V, Ashwin. "Twitter Tweet Classifier." IAES International Journal of Artificial Intelligence (IJ-AI) 5, no. 1 (March 1, 2016): 41. http://dx.doi.org/10.11591/ijai.v5.i1.pp41-44.

Abstract:
This paper addresses the task of building a classifier that would categorise tweets in Twitter. Microblogging nowadays has become a tool of communication for Internet users. They share opinion on different aspects of life. As the popularity of the microblogging sites increases the closer we get to the era of Information Explosion. Twitter is the second most used microblogging site which handles more than 500 million tweets tweeted every day which translates to a mind-boggling 5,700 tweets per second. Despite the humongous usage of twitter there isn’t any specific classifier for these tweets that are tweeted on this site. This research attempts to segregate tweets and classify them to categories like Sports, News, Entertainment, Technology, Music, TV, Meme, etc. Naïve Bayes, a machine learning algorithm, is used for building a classifier which classifies the tweets when trained with the twitter corpus. With this kind of classifier the user may simply skim the tweets without going through the tedious work of skimming the newsfeed.
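A minimal from-scratch version of the kind of Naïve Bayes tweet classifier the abstract describes; the categories and training tweets below are illustrative, not the paper's corpus or implementation:

```python
import math
from collections import Counter, defaultdict

class NaiveBayesTweetClassifier:
    """Multinomial Naive Bayes with Laplace smoothing (illustrative sketch)."""

    def fit(self, tweets, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tweet, label in zip(tweets, labels):
            for word in tweet.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, tweet):
        def log_score(label):
            # log prior + sum of smoothed log likelihoods
            total = sum(self.word_counts[label].values())
            score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
            for word in tweet.lower().split():
                score += math.log(
                    (self.word_counts[label][word] + 1) / (total + len(self.vocab))
                )
            return score
        return max(self.class_counts, key=log_score)

clf = NaiveBayesTweetClassifier()
clf.fit(
    ["great goal in the match tonight", "the band drops a new album",
     "election results announced today", "amazing concert and setlist"],
    ["Sports", "Music", "News", "Music"],
)
print(clf.predict("what a goal in that match"))  # → Sports
```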
7

Park, Jung Ran, and Houda El Mimouni. "Emoticons and non-verbal communications across Arabic, English, and Korean Tweets." Global Knowledge, Memory and Communication 69, no. 8/9 (June 6, 2020): 579–95. http://dx.doi.org/10.1108/gkmc-02-2020-0021.

Abstract:
Purpose The purpose of this study is to examine how tweeters drawn from three different languages and cultural boundaries manage the lack of contextual cues through an analysis of Arabic, English and Korean tweets. Design/methodology/approach Data for this study is drawn from a corpus of tweets (n = 1,200) streamed using Python through Twitter API. Using the language information, the authors limited the number of tweets to 400 randomly selected tweets from each language, totaling 1,200 tweets. Final coding taxonomy was derived through interactive processes preceded by literature and a preliminary analysis based on a small subset (n = 150) by isolating nonverbal communication devices and emoticons. Findings The results of the study present that there is great commonality across these tweets in terms of strategies and creativity in compensating for the constraints imposed by the tweet platform. The language-specific characteristics are also shown in the form of different usage of devices. Research limitations/implications Emoticon usage indicates that the communication mode influences online social interaction; the restriction of 140 maximum characters seems to engender a frequent usage of emoticons across tweets regardless of language differences. The results of the study bring forth implications into the design of social media technologies that reflect affective aspects of communication and language-/culture-specific traits and characteristics. Originality/value To the best of the authors’ knowledge, there are no qualitative studies examining paralinguistic nonverbal communication cues in the Twitter platform across language boundaries.
8

Li, Quanzhi, Sameena Shah, Xiaomo Liu, and Armineh Nourbakhsh. "Data Sets: Word Embeddings Learned from Tweets and General Data." Proceedings of the International AAAI Conference on Web and Social Media 11, no. 1 (May 3, 2017): 428–36. http://dx.doi.org/10.1609/icwsm.v11i1.14859.

Abstract:
A word embedding is a low-dimensional, dense and real-valued vector representation of a word. Word embeddings have been used in many NLP tasks. They are usually generated from a large text corpus. The embedding of a word captures both its syntactic and semantic aspects. Tweets are short, noisy and have unique lexical and semantic features that are different from other types of text. Therefore, it is necessary to have word embeddings learned specifically from tweets. In this paper, we present ten word embedding data sets. In addition to the data sets learned from just tweet data, we also built embedding sets from the general data and the combination of tweets and the general data. The general data consist of news articles, Wikipedia data and other web data. These ten embedding models were learned from about 400 million tweets and 7 billion words from the general data. In this paper, we also present two experiments demonstrating how to use the data sets in some NLP tasks, such as tweet sentiment analysis and tweet topic classification tasks.
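Once embeddings are trained, similarity between words is typically measured with cosine similarity over their vectors. A toy illustration with invented 4-dimensional vectors (the paper's embeddings are far higher-dimensional and learned from hundreds of millions of tweets):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Invented 4-dimensional "embeddings"; real ones are typically 100-300d.
emb = {
    "happy": [0.9, 0.1, 0.3, 0.0],
    "glad":  [0.8, 0.2, 0.4, 0.1],
    "bank":  [0.1, 0.9, 0.0, 0.7],
}

print(cosine(emb["happy"], emb["glad"]))  # near 1: similar words
print(cosine(emb["happy"], emb["bank"]))  # much lower
```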
9

Vieira da Silva, Fernando J., Norton T. Roman, and Ariadne M. B. R. Carvalho. "Stock market tweets annotated with emotions." Corpora 15, no. 3 (November 2020): 343–54. http://dx.doi.org/10.3366/cor.2020.0203.

Abstract:
As stock trading became a popular topic on Twitter, many researchers have proposed different approaches to make predictions on it, relying on the emotions found in messages. However, detailed studies require a reasonably sized corpus with emotions properly annotated. In this work, we introduce a corpus of tweets in Brazilian Portuguese annotated with emotions. Comprising 4,277 tweets, this is, to the best of our knowledge, the largest annotated corpus available in the stock market domain for this language. Amongst its possible uses, the corpus lends itself to the application of machine learning models for automatic emotion identification, as well as to the study of correlations between emotions and stock price movements.
10

McDonald, Graham, Romain Deveaud, Richard McCreadie, Craig Macdonald, and Iadh Ounis. "Tweet Enrichment for Effective Dimensions Classification in Online Reputation Management." Proceedings of the International AAAI Conference on Web and Social Media 9, no. 1 (August 3, 2021): 654–57. http://dx.doi.org/10.1609/icwsm.v9i1.14674.

Abstract:
Online Reputation Management (ORM) is concerned with the monitoring of public opinions on social media for entities such as commercial organisations. In particular, we investigate the task of reputation dimension classification, which aims to classify tweets that mention a business entity into different dimensions (e.g. "financial performance'' or "products and services''). However, producing a general reputation dimension classification system that can be used across businesses of different types is challenging, due to the brief nature of tweets and the lack of terms in tweets that relate to specific reputation dimensions. To tackle these issues, we propose a robust and effective tweet enrichment approach that expands tweets with additional discriminative terms from a contemporary Web corpus. Using the RepLab 2014 test collection, we show that our tweet enrichment approach outperforms effective baselines including the top performing submission to RepLab 2014. Moreover, we show that the achieved accuracy scores are very close to the upper bound that our approach could achieve on this collection.
11

Smułczyński, Michał. "Microblogging in Denmark and Poland — a contrastive analysis. Part II." Scandinavian Philology 19, no. 2 (2021): 285–312. http://dx.doi.org/10.21638/11701/spbu21.2021.205.

Abstract:
This article is the second part of a comparative study of Danish and Polish tweets, inspired by the anthology “Microblogs global”. The first part of the study deals with the social network and microblogging tool Twitter, including the more technical side of microblogging. The many tweet types and the extensive terminology in the field were thoroughly and conscientiously explained. In addition, the contrasts concerning orthography and spoken language were analysed. For the following description, 320 Danish and 320 Polish tweets were collected from randomly selected profiles belonging to various politicians, journalists, and private individuals posting mainly in Danish/ Polish. The analysis covers the period from 30 March to 6 April 2019, and includes differences in tweets in terms of word preference, e.g. colloquialisms, anglicisms, or profanity, to technical and linguistic reductions and the structure of sentences. The degree of graphostilistics in relation to emojis, emoticons, smileys, and iteration is analyzed. Interaction operators are investigated, as well as functional aspects of tweets, i.e. whether tweets can be categorized as messages, comments, statements, or questions. The greatest contrasts occur in such domains as reduction, graphostylistics, and interaction. While the Polish corpus contains only 15 short forms/composites/abbreviations, the Danish corpus contains 3 times more such words (51 examples). Differences in graphostilistics concern above all the number and types of emojis. In addition, emoticons are repeated or concentrated only in the Danish tweets. The divergences in vocabulary and syntax are almost imperceptible.
12

Slemp, Katie. "Attitudes towards varied inclusive language use in Spanish on Twitter." Working papers in Applied Linguistics and Linguistics at York 1 (September 13, 2021): 60–74. http://dx.doi.org/10.25071/2564-2855.6.

Abstract:
Research into gender-inclusive language in Spanish has demonstrated that inclusive language generally appears in four forms: doublets, -@, -x, and -e. There is little research on language attitudes towards the use of gender-inclusive language in Spanish, although studies exist for other languages. The present study compiled a corpus of published tweets that contained the markers -@, -x, and -e. Based on this data, hypothetical tweets were constructed that fell into four different categories, corresponding to the author of the tweet: business, personal, academic, and political. These hypothetical tweets were built into an attitudes survey that was distributed on Twitter. Findings indicate that language attitudes for each type of inclusive marker and category of tweet are generally positive. Statistical analysis indicates a significant relationship between gender identity and attitudes towards the use of inclusive language in the political category.
13

Maceda, Lany L., Jennifer L. Llovido, and Thelma D. Palaoag. "Corpus Analysis of Earthquake Related Tweets through Topic Modelling." International Journal of Machine Learning and Computing 7, no. 6 (December 2017): 194–97. http://dx.doi.org/10.18178/ijmlc.2017.7.6.645.

14

Roberts, Helen, Bernd Resch, Jon Sadler, Lee Chapman, Andreas Petutschnig, and Stefan Zimmer. "Investigating the Emotional Responses of Individuals to Urban Green Space Using Twitter Data: A Critical Comparison of Three Different Methods of Sentiment Analysis." Urban Planning 3, no. 1 (March 29, 2018): 21–33. http://dx.doi.org/10.17645/up.v3i1.1231.

Abstract:
In urban research, Twitter data have the potential to provide additional information about urban citizens, their activities, mobility patterns and emotion. Extracting the sentiment present in tweets is increasingly recognised as a valuable approach to gathering information on the mood, opinion and emotional responses of individuals in a variety of contexts. This article evaluates the potential of deriving emotional responses of individuals while they experience and interact with urban green space. A corpus of over 10,000 tweets relating to 60 urban green spaces in Birmingham, United Kingdom was analysed for positivity, negativity and specific emotions, using manual, semi-automated and automated methods of sentiment analysis and the outputs of each method compared. Similar numbers of tweets were annotated as positive/neutral/negative by all three methods; however, inter-method consistency in tweet assignment between the methods was low. A comparison of all three methods on the same corpus of tweets, using character emojis as an additional quality control, identifies a number of limitations associated with each approach. The results presented have implications for urban planners in terms of the choices available to identify and analyse the sentiment present in tweets, and the importance of choosing the most appropriate method. Future attempts to develop more reliable and accurate algorithms of sentiment analysis are needed and should focus on semi-automated methods.
15

Shin, Han-Sub, Hyuk-Yoon Kwon, and Seung-Jin Ryu. "A New Text Classification Model Based on Contrastive Word Embedding for Detecting Cybersecurity Intelligence in Twitter." Electronics 9, no. 9 (September 18, 2020): 1527. http://dx.doi.org/10.3390/electronics9091527.

Abstract:
Detecting cybersecurity intelligence (CSI) on social media such as Twitter is crucial because it allows security experts to respond cyber threats in advance. In this paper, we devise a new text classification model based on deep learning to classify CSI-positive and -negative tweets from a collection of tweets. For this, we propose a novel word embedding model, called contrastive word embedding, that enables to maximize the difference between base embedding models. First, we define CSI-positive and -negative corpora, which are used for constructing embedding models. Here, to supplement the imbalance of tweet data sets, we additionally employ the background knowledge for each tweet corpus: (1) CVE data set for CSI-positive corpus and (2) Wikitext data set for CSI-negative corpus. Second, we adopt the deep learning models such as CNN or LSTM to extract adequate feature vectors from the embedding models and integrate the feature vectors into one classifier. To validate the effectiveness of the proposed model, we compare our method with two baseline classification models: (1) a model based on a single embedding model constructed with CSI-positive corpus only and (2) another model with CSI-negative corpus only. As a result, we indicate that the proposed model shows high accuracy, i.e., 0.934 of F1-score and 0.935 of area under the curve (AUC), which improves the baseline models by 1.76∼6.74% of F1-score and by 1.64∼6.98% of AUC.
16

Dynel, Marta. "#HaStatoPutin Affinity Space: From Political Work to Autotelic Humor." Social Media + Society 8, no. 4 (October 2022): 205630512211387. http://dx.doi.org/10.1177/20563051221138760.

Abstract:
This article reports the findings of a diachronic sociopragmatic study on the politically loaded Italian hashtag #HaStatoPutin based on an automatically generated corpus of tweets (N = 31,334), encompassing two datasets from before and after what Putin originally called Russia’s “special military operation” in Ukraine that commenced on 24 February 2022. #HaStatoPutin appeared on Twitter in 2015 to mark tweets criticizing what Italian users considered to be unsupported conspiracy theories targeting Vladimir Putin, viewed by these users as a scapegoat in mainstream political rhetoric spread in the Western world. As this comparative study shows, the emergent applications of the hashtag are hardly affected by the events of 2022, indicating the stability of the expression and the political opinions of polarized Italian society, regardless of the socio-political context. Specifically, four tweet categories, which express tweeters’ political opinions or serve humorous purposes, are identified in both datasets: dissociative echo, counter-criticism, mock conspiracy theories, and metacomments. Given the specificity of #HaStatoPutin, its political contextualization, and applications, with which the users need to be familiar to create and understand tagged tweets, it is proposed that the tweeting practice makes for a “hashtag affinity space” on Twitter.
17

Alruily, Meshrif. "Issues of Dialectal Saudi Twitter Corpus." International Arab Journal of Information Technology 17, no. 3 (May 1, 2019): 367–74. http://dx.doi.org/10.34028/iajit/17/3/10.

Abstract:
Text mining research relies heavily on the availability of a suitable corpus. This paper presents a dialectal Saudi corpus that contains 207452 tweets generated by Saudi Twitter users. In addition, a comparison between the Saudi tweets dataset, Egyptian Twitter corpus and Arabic top news raw corpus (representing Modern Standard Arabic (MSA) in various aspects, such as the differences between formal and colloquial texts was carried out. Moreover, investigation into the issues and phenomena, such as shortening, concatenation, colloquial language, compounding, foreign language, spelling errors and neologisms on this type of dataset was performed.
18

Fernández-Martínez, Nicolás José. "The FGLOCTweet Corpus: An English tweet-based corpus for fine-grained location-detection tasks." Research in Corpus Linguistics 10, no. 1 (2022): 117–33. http://dx.doi.org/10.32714/ricl.10.01.06.

Abstract:
Location detection in social-media microtexts is an important natural language processing task for emergency-based contexts where locative references are identified in text data. Spatial information obtained from texts is essential to understand where an incident happened, where people are in need of help and/or which areas have been affected. This information contributes to raising emergency situation awareness, which is then passed on to emergency responders and competent authorities to act as quickly as possible. Annotated text data are necessary for building and evaluating location-detection systems. The problem is that available corpora of tweets for location-detection tasks are either lacking or, at best, annotated with coarse-grained location types (e.g. cities, towns, countries, some buildings, etc.). To bridge this gap, we present our semi-automatically annotated corpus, the Fine-Grained LOCation Tweet Corpus (FGLOCTweet Corpus), an English tweet-based corpus for fine-grained location-detection tasks, including fine-grained locative references (i.e. geopolitical entities, natural landforms, points of interest and traffic ways) together with their surrounding locative markers (i.e. direction, distance, movement or time). It includes annotated tweet data for training and evaluation purposes, which can be used to advance research in location detection, as well as in the study of the linguistic representation of place or of the microtext genre of social media.
19

Breeze, Ruth. "Angry tweets." Journal of Language Aggression and Conflict 8, no. 1 (February 25, 2020): 118–45. http://dx.doi.org/10.1075/jlac.00033.bre.

Abstract:
The rise of populism has turned researchers’ attention to the importance of affect in politics. This is a corpus-assisted study investigating lexis in the semantic domain of anger and violence in tweets by radical-right campaigner Nigel Farage in comparison with four other prominent British politicians. Both quantitative and qualitative analyses of discourse show that Farage cultivates a particular set of affective-discursive practices, which bring anger into the public sphere and offer a channel to redirect frustrations. Rather than expressing his own emotions, he presents anger as generalised throughout society, and then performs the role of defending ‘ordinary people’ who are the victims of the elites. This enables him to legitimise violent emotions and actions by appealing to the need for self-assertion and self-defence.
20

Baig, Amber, Mutee U. Rahman, Hameedullah Kazi, and Ahsanullah Baloch. "Developing a POS Tagged Corpus of Urdu Tweets." Computers 9, no. 4 (November 7, 2020): 90. http://dx.doi.org/10.3390/computers9040090.

Abstract:
Processing of social media text like tweets is challenging for traditional Natural Language Processing (NLP) tools developed for well-edited text due to the noisy nature of such text. However, demand for tools and resources to correctly process such noisy text has increased in recent years due to the usefulness of such text in various applications. Literature reports various efforts made to develop tools and resources to process such noisy text for various languages, notably, part-of-speech (POS) tagging, an NLP task having a direct effect on the performance of other successive text processing activities. Still, no such attempt has been made to develop a POS tagger for Urdu social media content. Thus, the focus of this paper is on POS tagging of Urdu tweets. We introduce a new tagset for POS-tagging of Urdu tweets along with the POS-tagged Urdu tweets corpus. We also investigated bootstrapping as a potential solution for overcoming the shortage of manually annotated data and present a supervised POS tagger with an accuracy of 93.8% precision, 92.9% recall and 93.3% F-measure.
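A bootstrapping loop like the one the abstract mentions needs a weak seed tagger to label unannotated data. One common baseline is a most-frequent-tag unigram tagger; this sketch uses invented English examples and tags rather than the paper's Urdu tagset:

```python
from collections import Counter, defaultdict

def train_unigram_tagger(tagged_sentences):
    """Most-frequent-tag baseline: the kind of weak seed tagger a
    bootstrapping loop might start from (illustrative only)."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    # For each known word, keep its single most frequent tag.
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag(model, words, default="NOUN"):
    """Tag each word with its learned tag, falling back to `default`."""
    return [(w, model.get(w, default)) for w in words]

train = [
    [("the", "DET"), ("match", "NOUN"), ("ended", "VERB")],
    [("the", "DET"), ("crowd", "NOUN"), ("cheered", "VERB")],
]
model = train_unigram_tagger(train)
print(tag(model, ["the", "match", "cheered"]))
```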
21

Singh, Purva. "Covhindia: Deep Learning Framework for Sentiment Polarity Detection of Covid-19 Tweets in Hindi." International Journal on Natural Language Computing 9, no. 5 (October 30, 2020): 23–34. http://dx.doi.org/10.5121/ijnlc.2020.9502.

Abstract:
On 11th March 2020, the World Health Organization (WHO) declared Corona Virus Disease of 2019 (COVID-19) as a pandemic. Over time, the exponential growth of this disease has highlighted a mixture of sentiments expressed by the general population from various parts of the world speaking varied languages. It is, therefore, essential to analyze the public sentiment during this wave of the pandemic. While much work prevails to determine the sentiment polarity for tweets related to COVID-19, expressed in the English language, we still need to work on public sentiments expressed in languages other than English. This paper proposes a framework, Covhindia, a deep-learning framework that performs sentiment polarity detection of tweets related to COVID-19 posted in the Hindi language on the Twitter platform. The proposed framework leverages machine translation on Hindi tweets and passes the translated data as input to a deep learning model which is trained on an English corpus of COVID-19 tweets posted from India [18]. The paper compares nine deep learning models' performances in classifying the sentiment polarity on an English dataset. Performance comparison of these architectures reveals that the BERT model had the best polarity detection accuracy on the English corpus. As part of testing Covhindia’s accuracy in performing sentiment classification on Hindi tweets, the paper employs a separate dataset developed using a Python library called Tweepy to extract Hindi tweets related to COVID-19. Experimental results reveal that Covhindia achieved state-of-the-art accuracy in classifying COVID-19 tweets posted in the Hindi language. The use of open-source machine translation tools paved the way for leveraging Covhindia for performing multilingual sentiment classification on COVID-19 tweets. For the benefit of the research community, the code and Jupyter Notebooks related to this paper are available on GitHub.
22

Pereira, Márcia Helena de Melo, and Ana Claudia Oliveira Azevedo. "A reelaboração de gêneros em tweets: propósitos comunicativos em 280 caracteres." Fórum Linguístico 19, no. 3 (November 23, 2022): 8232–51. http://dx.doi.org/10.5007/1984-8412.2022.e76925.

Abstract:
According to Bakhtin (2011), speech genres are relatively stable types of utterances that circulate in the different fields of human activity. Over time, especially after the popularization of certain technologies such as the internet, new genres have emerged through the transmutation/re-elaboration of existing genres. The phenomenon of re-elaboration also occurs when a genre undergoes transformations without a new genre arising. This article examines genre re-elaboration in the tweet, a 280-character text published on the social network Twitter, aiming to analyze which strategies are used to re-elaborate genres in a corpus of tweets posted by different users with diverse communicative purposes. The analysis showed that the phenomenon of re-elaboration is productive in tweets, in which genres from various spheres, such as journalism, advertising and everyday life, as well as the tweet itself, are re-elaborated.
23

Tak, Raghu. "A Quantifiable Analysis of Ambivalence in Tweets." International Journal for Research in Applied Science and Engineering Technology 10, no. 4 (April 30, 2022): 691–99. http://dx.doi.org/10.22214/ijraset.2022.41340.

Abstract:
Abstract: Detecting the emotions of the writer in textual content is a persistent requirement in industry. Much research has been done on this topic, but only a handful of studies explore the quantified association of a given text with mixed emotions and emotional ambivalence; our work aims to do so. Emotional ambivalence is a state in which a person feels a blend of both positive and negative emotions. We therefore needed a suitable, concise emotion model capable of being bipolar, so we chose Paul Ekman's emotion model, which focuses on the six primary emotions: anger, fear, joy, sadness, surprise and disgust. In our work we used a lexicon-based approach for emotion recognition. The objective was achieved with the help of a newly created lexicon for bigrams, annotated according to their PMI scores. To test our approach, we also created a subset of the Twitter Emotional Corpus (TEC) by passing it to our annotators. The automatic process through which the original corpus was created introduced some wrong annotations, so manual cross-validation of this corpus was required; we chose three annotators for this purpose and worked further on the uniformly agreed corpus. The output of our approach detects a combination of the emotions felt, together with their respective values. Keywords: Emotion Analysis, Affect Computing, Sentiment Analysis
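The bigram lexicon above is annotated according to PMI scores; the standard pointwise mutual information of a bigram can be computed as follows (the corpus and counting choices here are illustrative, not the authors' exact procedure):

```python
import math
from collections import Counter

def bigram_pmi(corpus_tokens, w1, w2):
    """Pointwise mutual information of the bigram (w1, w2):
    PMI = log2( P(w1, w2) / (P(w1) * P(w2)) ).
    Returns -inf when the bigram never occurs."""
    unigrams = Counter(corpus_tokens)
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    n_uni = len(corpus_tokens)
    n_bi = max(len(corpus_tokens) - 1, 1)
    p_w1 = unigrams[w1] / n_uni
    p_w2 = unigrams[w2] / n_uni
    p_bi = bigrams[(w1, w2)] / n_bi
    if p_bi == 0:
        return float("-inf")
    return math.log2(p_bi / (p_w1 * p_w2))
```

A positive PMI means the two words co-occur more often than chance, which is what makes the score usable as a lexicon annotation.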
24

Weissenbacher, Davy, Abeed Sarker, Ari Klein, Karen O’Connor, Arjun Magge, and Graciela Gonzalez-Hernandez. "Deep neural networks ensemble for detecting medication mentions in tweets." Journal of the American Medical Informatics Association 26, no. 12 (September 27, 2019): 1618–26. http://dx.doi.org/10.1093/jamia/ocz156.

Abstract:
Objective: Twitter posts are now recognized as an important source of patient-generated data, providing unique insights into population health. A fundamental step toward incorporating Twitter data in pharmacoepidemiologic research is to automatically recognize medication mentions in tweets. Given that lexical searches for medication names suffer from low recall due to misspellings or ambiguity with common words, we propose a more advanced method to recognize them. Materials and Methods: We present Kusuri, an Ensemble Learning classifier able to identify tweets mentioning drug products and dietary supplements. Kusuri (薬, “medication” in Japanese) is composed of 2 modules: first, 4 different classifiers (lexicon based, spelling variant based, pattern based, and a weakly trained neural network) are applied in parallel to discover tweets potentially containing medication names; second, an ensemble of deep neural networks encoding morphological, semantic, and long-range dependencies of important words in the tweets makes the final decision. Results: On a class-balanced (50-50) corpus of 15 005 tweets, Kusuri demonstrated performances close to human annotators with an F1 score of 93.7%, the best score achieved thus far on this corpus. On a corpus made of all tweets posted by 112 Twitter users (98 959 tweets, with only 0.26% mentioning medications), Kusuri obtained an F1 score of 78.8%. To the best of our knowledge, Kusuri is the first system to achieve this score on such an extremely imbalanced dataset. Conclusions: The system identifies tweets mentioning drug names with performance high enough to ensure its usefulness, and is ready to be integrated in pharmacovigilance, toxicovigilance, or more generally, public health pipelines that depend on medication name mentions.
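Kusuri's first module runs cheap classifiers in parallel to flag candidate tweets; a deep ensemble then makes the final decision. The sketch below mirrors that filter-then-decide shape with toy detectors (the lexicon, the dosage pattern, and the stage-two scorer are all invented stand-ins, not Kusuri's actual components):

```python
import re

# A tiny illustrative drug lexicon; the real system uses far larger resources.
DRUG_LEXICON = {"ibuprofen", "paracetamol", "insulin"}

def lexicon_match(tweet):
    """Stage-1 detector: exact lexicon lookup over lowercased tokens."""
    return any(w in DRUG_LEXICON for w in re.findall(r"[a-z]+", tweet.lower()))

def pattern_match(tweet):
    """Stage-1 detector: dosage-like patterns, e.g. '200mg'."""
    return re.search(r"\b\d+\s?mg\b", tweet.lower()) is not None

def stage_one(tweet):
    # Parallel detectors: a tweet passes if ANY detector fires (high recall).
    return lexicon_match(tweet) or pattern_match(tweet)

def stage_two(tweet):
    # Stand-in for the neural ensemble's final, higher-precision decision.
    return lexicon_match(tweet)

def mentions_medication(tweet):
    return stage_one(tweet) and stage_two(tweet)
```

The design point is that stage one is tuned for recall (cheap, permissive) and stage two for precision, so the expensive model only sees candidates.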
25

Escamilla, Imelda, Clodoveu A. Davis Jr., Marco Moreno-Ibarra, and Vladimir Luna. "Geocoding of Spatial Relationships Contained in Tweets." International Journal of Knowledge Society Research 7, no. 1 (January 2016): 26–42. http://dx.doi.org/10.4018/ijksr.2016010102.

Abstract:
Human ability to understand approximate references to locations, disambiguated by means of context and reasoning about spatial relationships, is the key to describing spatial environments and to sharing information about them. In this paper, the authors propose an approach to geocoding that takes advantage of the spatial relationships contained in the text of tweets, using semantic and spatial analyses. Microblog text has special characteristics (e.g. slang, abbreviations, acronyms, etc.) and thus represents a special variation of natural language. The main objective of this work is to associate spatial relationships found in text with a spatial footprint, to determine the location of the event described in the tweet. The feasibility of the proposal is demonstrated using a corpus of 200,000 tweets posted in Spanish related to traffic events in Mexico City.
26

Tahir, Bilal, and Muhammad Amir Mehmood. "Anbar: Collection and analysis of a large scale Urdu language Twitter corpus." Journal of Intelligent & Fuzzy Systems 42, no. 5 (March 31, 2022): 4789–800. http://dx.doi.org/10.3233/jifs-219266.

Abstract:
The confluence of high performance computing algorithms and large scale high-quality data has led to the availability of cutting edge tools in computational linguistics. However, these state-of-the-art tools are available only for the major languages of the world. The preparation of large scale high-quality corpora for a low-resource language such as Urdu is a challenging task, as it requires huge computational and human resources. In this paper, we build and analyze a large scale Urdu language Twitter corpus, Anbar. For this purpose, we collect 106.9 million Urdu tweets posted by 1.69 million users during one year (September 2018-August 2019). Our corpus consists of tweets with a rich vocabulary of 3.8 million unique tokens along with 58K hashtags and 62K URLs. Moreover, it contains 75.9 million (71.0%) retweets and 847K geotagged tweets. Furthermore, we examine Anbar using a variety of metrics such as temporal frequency of tweets, vocabulary size, geo-location, user characteristics, and entity distribution. To the best of our knowledge, this is the largest repository of Urdu language tweets for the NLP research community, which can be used for Natural Language Understanding (NLU), social analytics, and fake news detection.
27

Almuqren, Latifah, and Alexandra Cristea. "AraCust: a Saudi Telecom Tweets corpus for sentiment analysis." PeerJ Computer Science 7 (May 20, 2021): e510. http://dx.doi.org/10.7717/peerj-cs.510.

Abstract:
Compared to other languages, Arabic lacks large corpora for Natural Language Processing (Assiri, Emam & Al-Dossari, 2018; Gamal et al., 2019). A number of scholars have depended on translation from one language to another to construct their corpora (Rushdi-Saleh et al., 2011). This paper presents how we have constructed, cleaned, pre-processed, and annotated our 20,000-tweet Gold Standard Corpus (GSC) AraCust, the first Telecom GSC for Arabic Sentiment Analysis (ASA) for Dialectal Arabic (DA). AraCust contains Saudi dialect tweets, processed from a self-collected Arabic tweets dataset, and has been annotated for sentiment analysis, i.e., manually labelled (k=0.60). In addition, we have illustrated AraCust's power by performing an exploratory data analysis of the features sourced from the nature of our corpus, to assist with choosing the right ASA methods for it. To evaluate our Gold Standard Corpus AraCust, we first applied a simple experiment using a supervised classifier, to offer benchmark outcomes for forthcoming works. In addition, we applied the same supervised classifier on a publicly available Arabic dataset created from Twitter, ASTD (Nabil, Aly & Atiya, 2015). The result shows that our dataset AraCust outperforms the ASTD result with 91% accuracy and an 89% F1avg score. The AraCust corpus will be released, together with code useful for its exploration, via GitHub as a part of this submission.
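The manual labelling above is reported with an agreement of k=0.60, a figure typically computed as Cohen's kappa; a minimal sketch for two annotators (the paper may use a different agreement coefficient):

```python
def cohens_kappa(labels_a, labels_b):
    """Inter-annotator agreement between two label sequences:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is agreement expected by chance from the label marginals."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    cats = set(labels_a) | set(labels_b)
    p_e = sum((labels_a.count(c) / n) * (labels_b.count(c) / n) for c in cats)
    if p_e == 1:
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

A kappa around 0.6 is conventionally read as "moderate to substantial" agreement, which is why corpora often report it alongside the raw agreement rate.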
28

Valdez, Danny, and Jennifer B. Unger. "Difficulty Regulating Social Media Content of Age-Restricted Products: Comparing JUUL’s Official Twitter Timeline and Social Media Content About JUUL." JMIR Infodemiology 1, no. 1 (December 7, 2021): e29011. http://dx.doi.org/10.2196/29011.

Abstract:
Background: In 2018, JUUL Labs Inc, a popular e-cigarette manufacturer, announced it would substantially limit its social media presence in compliance with the Food and Drug Administration’s (FDA) call to curb underage e-cigarette use. However, shortly after the announcement, a series of JUUL-related hashtags emerged on various social media platforms, calling the effectiveness of the FDA’s regulations into question. Objective: The purpose of this study is to determine whether hashtags remain a common venue to market age-restricted products on social media. Methods: We used Twitter’s standard application programming interface to download the 3200 most-recent tweets originating from JUUL Labs Inc’s official Twitter account (@JUULVapor), and a series of tweets (n=28,989) from other Twitter users containing either #JUUL or mentioning JUUL in the tweet text. We ran exploratory (10×10) and iterative Latent Dirichlet Allocation (LDA) topic models to compare @JUULVapor’s content versus our hashtag corpus. We qualitatively deliberated topic meanings and substantiated our interpretations with tweets from either corpus. Results: The topic models generated for @JUULVapor’s timeline seemingly alluded to compliance with the FDA’s call to prohibit marketing of age-restricted products on social media. However, the topic models generated for the hashtag corpus of tweets from other Twitter users contained several references to flavors, vaping paraphernalia, and illicit drugs, which may be appealing to younger audiences. Conclusions: Our findings underscore the complicated nature of social media regulation. Although JUUL Labs Inc seemingly complied with the FDA to limit its social media presence, JUUL and other e-cigarette manufacturers are still discussed openly in social media spaces. Much discourse about JUUL and e-cigarettes is spread via hashtags, which allow messages to reach a wide audience quickly. This suggests that social media regulations on manufacturers cannot prevent e-cigarette users, influencers, or marketers from spreading information about e-cigarette attributes that appeal to the youth, such as flavors. Stricter protocols are needed to regulate discourse about age-restricted products on social media.
29

Makowska, Magdalena. "#naukanatwitterze. O multimodalnym designie informacji w dyskursie cyfrowym." Forum Lingwistyczne, no. 7 (November 20, 2020): 89–104. http://dx.doi.org/10.31261/fl.2020.07.07.

Abstract:
The subject of the article is the ways of popularizing scientific and popular science knowledge by using Twitter as a social media platform. The research corpus comprises 100 tweets from institutional and individual senders, including both representatives of the world of science and of the media that aim to popularize scientific knowledge. The medialinguistic analysis, which focused on the structural and functional planes of the studied media texts, found a high representativeness of multimodal tweets. Namely, language and image contextualize each other: the image allows us to fill the semantic gap resulting from the ellipticalness of the verbal layer of a given tweet. Thus, the contribution of multimodal information design to knowledge transfer in digital discourse has been demonstrated.
30

Schaefer, Robin, and Manfred Stede. "Argument Mining on Twitter: A survey." it - Information Technology 63, no. 1 (February 1, 2021): 45–58. http://dx.doi.org/10.1515/itit-2020-0053.

Abstract:
Abstract: In the last decade, the field of argument mining has grown notably. However, only relatively few studies have investigated argumentation in social media and specifically on Twitter. Here, we provide, to our knowledge, the first critical in-depth survey of the state of the art in tweet-based argument mining. We discuss approaches to modelling the structure of arguments in the context of tweet corpus annotation, and we review current progress in the task of detecting argument components and their relations in tweets. We also survey the intersection of argument mining and stance detection, before we conclude with an outlook.
31

Martínez-Cámara, Eugenio, M. Teresa Martín-Valdivia, L. Alfonso Ureña-López, and Ruslan Mitkov. "Polarity classification for Spanish tweets using the COST corpus." Journal of Information Science 41, no. 3 (February 3, 2015): 263–72. http://dx.doi.org/10.1177/0165551514566564.

32

Sulaiman, Hamdun, Muhamad Ryansyah, Kudiantoro Widianto, Sidik Sidik, and Andria Nugraha. "Implementasi Machine Learning Dengan Metode Text Mining Pada Twitter." Infotek : Jurnal Informatika dan Teknologi 7, no. 1 (January 20, 2024): 52–62. http://dx.doi.org/10.29408/jit.v7i1.23734.

Abstract:
Currently, PT. Telkom Indonesia (Indihome) uses social media as a channel for handling customer complaints. Tweets from Indihome customers on Twitter are handled by Indihome's customer service division. The manual categorization that this division performs on every incoming complaint tweet to @indihome makes the process inefficient. The purpose of this research is to provide a solution to the problem of categorizing complaint tweets and to develop tools that can extract the narration of complaint tweets in Indonesian. The research method used is comparative. The gataframework and RapidMiner tools are also used in this research to assist in preprocessing and cleaning the dataset, to help create the corpus and perform sentiment analysis. The total dataset after cleansing and preprocessing is 1,510 tweets. Based on the method proposed in this study, the Support Vector Machine classification algorithm achieved, for the highest category, 82.42% accuracy, 75.33% precision, and 98.75% recall, with an AUC of 0.826.
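The reported accuracy, precision, and recall come from a standard confusion-matrix computation; a minimal sketch on invented labels (the `complain` label name is illustrative, not the study's exact label set):

```python
def binary_metrics(y_true, y_pred, positive="complain"):
    """Accuracy, precision, and recall for one positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall
```

The high recall / lower precision pattern reported in the abstract means the classifier misses few complaints but also flags some non-complaints.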
33

Bel-Enguix, Gemma, Helena Gómez-Adorno, Alejandro Pimentel, Sergio-Luis Ojeda-Trueba, and Brian Aguilar-Vizuet. "Negation Detection on Mexican Spanish Tweets: The T-MexNeg Corpus." Applied Sciences 11, no. 9 (April 25, 2021): 3880. http://dx.doi.org/10.3390/app11093880.

Abstract:
In this paper, we introduce the T-MexNeg corpus of Tweets written in Mexican Spanish. It consists of 13,704 Tweets, of which 4895 contain negation structures. We performed an analysis of negation statements embedded in the language employed on social media. This research paper aims to present the annotation guidelines along with a novel resource targeted at the negation detection task. The corpus was manually annotated with labels of negation cue, scope, and event. We report the analysis of the inter-annotator agreement for all the components of the negation structure. This resource is freely available. Furthermore, we performed various experiments to automatically identify negation using the T-MexNeg corpus and the SFU ReviewSP-NEG corpus for training a machine learning algorithm. By comparing two different methodologies, one based on a dictionary and the other based on the Conditional Random Fields algorithm, we found that the results of negation identification on Twitter are lower when the model is trained on the SFU ReviewSP-NEG Corpus. Therefore, this paper shows the importance of having resources built specifically to deal with social media language.
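A dictionary-based baseline like the one compared above can be sketched as a simple cue lookup (the cue list here is a generic Spanish negation list, not the T-MexNeg annotation scheme, and scope/event detection is omitted):

```python
# Illustrative Spanish negation cues; real lexicons are larger and
# handle multi-word cues and tokenization of punctuation.
NEGATION_CUES = {"no", "nunca", "ni", "nadie", "nada", "sin", "tampoco"}

def find_negation_cues(tweet):
    """Return (token_index, cue) pairs for every cue found in the tweet."""
    tokens = tweet.lower().split()
    return [(i, tok) for i, tok in enumerate(tokens) if tok in NEGATION_CUES]
```

The weakness the paper measures is exactly what this sketch exposes: a plain dictionary finds cues but knows nothing about their scope or the events they negate.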
34

Al-Laith, Ali, Muhammad Shahbaz, Hind F. Alaskar, and Asim Rehmat. "AraSenCorpus: A Semi-Supervised Approach for Sentiment Annotation of a Large Arabic Text Corpus." Applied Sciences 11, no. 5 (March 9, 2021): 2434. http://dx.doi.org/10.3390/app11052434.

Abstract:
At a time when research in the field of sentiment analysis tends to study advanced topics in languages such as English, other languages such as Arabic still suffer from basic problems and challenges, most notably the availability of large corpora. Furthermore, manual annotation is time-consuming and difficult when the corpus is too large. This paper presents a semi-supervised self-learning technique to extend an Arabic sentiment annotated corpus with unlabeled data, named AraSenCorpus. We use a neural network to train a set of models on a manually labeled dataset containing 15,000 tweets. We used these models to extend the corpus to a large Arabic sentiment corpus called “AraSenCorpus”. AraSenCorpus contains 4.5 million tweets and covers both modern standard Arabic and some of the Arabic dialects. The long short-term memory (LSTM) deep learning classifier is used to train and test the final corpus. We evaluate our proposed framework on two external benchmark datasets to ensure the improvement of Arabic sentiment classification. The experimental results show that our corpus outperforms the existing state-of-the-art systems.
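The semi-supervised extension can be sketched as a confidence-thresholded self-training loop; the toy classifier below is an invented stand-in for the neural models trained on the 15,000-tweet seed:

```python
def self_train(seed, unlabelled, classify, threshold=0.9):
    """Extend a labelled corpus by auto-labelling unlabelled texts.
    classify(text) -> (label, confidence); only predictions at or above
    the threshold are added, the rest are discarded."""
    corpus = list(seed)  # list of (text, label) pairs
    for text in unlabelled:
        label, conf = classify(text)
        if conf >= threshold:
            corpus.append((text, label))
    return corpus

def toy_classifier(text):
    # Invented stand-in for the paper's neural models.
    return ("positive", 0.95) if "good" in text else ("negative", 0.5)
```

The threshold is the key knob: too low and label noise accumulates in the extended corpus, too high and little unlabelled data is ever added.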
35

Albu, Elena. "“Tired, emotional and very very happy. Fantastic day #AFC.” The Expression of Emotions on Twitter during the 2014 European Elections." Recherches anglaises et nord-américaines 51, no. 1 (2018): 57–70. http://dx.doi.org/10.3406/ranam.2018.1564.

Abstract:
Expressing and eliciting different emotions is a communicative strategy deliberately exploited by politicians during electoral campaigns. This article aims to explain how the politicians standing in the 2014 European elections expressed their emotions on the Twitter platform. To better understand how emotions are conveyed and interpreted in the digital environment, the following questions are asked: (1) Which emotions are frequently used? (2) How are they distributed? (3) How are these emotions represented at the linguistic level? We use the tools and methods of corpus linguistics, carrying out both corpus-based and corpus-driven analyses. The TXM software is used for concordances and data analysis (Heiden 2010). The corpus analyzed comprises the tweets sent over a four-week period by the British candidates in the European parliamentary elections of May 2014. The results of the analysis indicate that the most represented emotion types are "happiness", "sadness" and "fear", in the form of adjectives and nouns. Tweets containing "happiness" and "sadness" are directly associated with the politicians, expressing their own emotions, while "surprise" and "fear" appear in tweets that constitute virulent attacks on opponents.
36

Kuhaneswaran, Banujan, Banage T. G. S. Kumara, and Incheon Paik. "Strengthening Post-Disaster Management Activities by Rating Social Media Corpus." International Journal of Systems and Service-Oriented Engineering 10, no. 1 (January 2020): 34–50. http://dx.doi.org/10.4018/ijssoe.2020010103.

Abstract:
In times of natural disasters such as floods, tsunamis, earthquakes and landslides, people need information so that relief operations can save many lives. The article explores the implications of using social media in post-disaster management. The approach has three main parts: (1) extraction, (2) classification, and (3) validation. The results show that machine learning algorithms are highly reliable in eliminating non-disaster-related tweets and news posts. The authors believe their model is more reliable because it validates the tweets against news posts, assigning ratings according to their truthfulness.
37

Schneider, Ulrike. "How Trump tweets: A comparative analysis of tweets by US politicians." Research in Corpus Linguistics 9, no. 2 (2021): 34–63. http://dx.doi.org/10.32714/ricl.09.02.03.

Abstract:
This paper analyses tweets sent from Donald Trump’s Twitter account @realDonaldTrump and contextualises them by contrasting them with several genres (i.e. political and ‘average’ Twitter, blogs, expressive writing, novels, The New York Times and natural speech). Taking common claims about Donald Trump’s language as a starting point, the study focusses on commonalities and differences between his tweets and those by other US politicians. Using the sentiment analysis tool Linguistic Inquiry and Word Count (LIWC) and a principal component analysis, I examine a newly compiled 1.5-million-word corpus of tweets sent from US politicians’ accounts between 2009 and 2018, with a special focus on whether Trump’s Twitter voice has linguistic features commonly associated with informality, I-talk, negativity and boasting. The results reveal that all political tweets are grammatically comparatively formal and centre around the topics of achievement, money and power. Trump’s tweets stand out, however, because they are both more negative and more positive than the language in other politicians’ tweets, i.e. his Twitter voice relies far more strongly on adjectives and emotional language.
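LIWC-style analysis, at its core, reports the percentage of a text's tokens that fall into category dictionaries. A toy sketch with invented word lists (LIWC's real dictionaries are licensed, far larger, and use word stems):

```python
# Invented category word lists for illustration only.
CATEGORIES = {
    "positive": {"great", "fantastic", "win"},
    "negative": {"sad", "fake", "bad"},
    "i_talk": {"i", "me", "my"},
}

def liwc_style_scores(text):
    """Percentage of tokens in each category, as LIWC-style output."""
    tokens = text.lower().split()
    n = len(tokens) or 1
    return {cat: 100.0 * sum(t in words for t in tokens) / n
            for cat, words in CATEGORIES.items()}
```

Per-category percentages like these are what feed the principal component analysis used in the study to compare politicians' Twitter voices.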
38

Xu, Xiaoyu, Jeroen Gevers, and Luca Rossi. "“Can I write this is ableist AF in a peer review?”: A corpus-driven analysis of Twitter engagement strategies across disciplinary groups." Ibérica, no. 46 (December 15, 2023): 207–36. http://dx.doi.org/10.17398/2340-2784.46.207.

Abstract:
At a time when scholars are increasingly expected to participate in public knowledge dissemination, social media platforms like Twitter hold great promise for engaging both experts and non-experts. However, it remains unclear in what ways academic tweets are shaped by disciplinary concerns and how this might, in turn, impact audience engagement. Our paper reports an early-stage corpus-driven analysis of 4,000 English tweets from 40 scholars’ Twitter accounts across four disciplinary groups: Arts and Humanities (AH), Social Sciences (SS), Life Sciences (LS), and Physical Sciences (PS). Engagement rates (Tardy, 2023), multimodal elements, tweet types, and interaction markers were quantitatively calculated using corpus and computational methods and qualitatively analysed through close reading. Our findings revealed some disciplinary variation in the corpus: specifically, LS used more multimodal elements than SS on Twitter; SS used fewer interactional markers than LS and PS on Twitter. We further found that LS has the highest number of threads and the longest threads, often to unfold their multimodal information. Despite being the least multimodal and interactive disciplinary group, SS has the highest engagement rate. Our analysis suggests that explicit evaluation and critique play an important role in eliciting responses on Twitter, particularly with regard to current social or political issues, a finding that resonates with previous research on science communication and popularization (Orpin, 2019). The findings can be applied in science communication training to raise disciplinary awareness in shaping one’s social media presence.
39

Papaccio, Mara. "Matteo Salvini auf Twitter: eine Analyse ausgewählter sprachlicher, stilistischer und rhetorischer Strategien." Italienisch 44, no. 87 (September 5, 2022): 64–80. http://dx.doi.org/10.24053/ital-2022-0007.

Abstract:
The present article intends to identify some of Matteo Salvini’s rhetorical strategies on the platform Twitter. The corpus consists of more than 1,800 tweets, posted during three key months of his political activity in 2018: the last month of the political campaign for the general elections on March 4th; the first month of Giuseppe Conte’s cabinet with Salvini as Minister of the Interior and Deputy Prime Minister; and the sixth month of the cabinet, allowing a first evaluation of his work in Parliament. The analysis focuses first on Salvini’s use of Twitter’s typical contextualization tools (hashtag, mention and retweet), revealing a preference for hashtags. These are used mostly to create powerful slogans (#primagliitaliani, #dalleparoleaifatti, #chiudiamoiporti etc.) and to quote himself in short tweets (#Salvini is the most common hashtag in the entire corpus). The second part of the article looks at some aspects of Salvini’s language, which include an informal lexis with traits typical of orality, exclusionary us/them structures (noi/loro), a simple, often paratactic syntax, a strategic use of repetitions (anaphora, accumulatio) and typographical means (uppercase letters, punctuation) that underline the emotions of a «ministro normale che fa cose normali» (Matteo Salvini in a tweet from 21st December 2018).
40

Camargo, Jorge E., Vladimir Vargas-Calderon, Nelson Vargas, and Liliana Calderón-Benavides. "Sentiment polarity classification of tweets using a extended dictionary." Inteligencia Artificial 21, no. 62 (September 7, 2018): 1. http://dx.doi.org/10.4114/intartif.vol21iss62pp1-12.

Abstract:
With the purpose of classifying text based on its sentiment polarity (positive or negative), we proposed an extension of a corpus of 68,000 tweets through the inclusion of word definitions from a dictionary of the Real Academia Española de la Lengua (RAE). A set of 28,000 combinations of 6 Word2Vec and support vector machine parameters was considered in order to evaluate how the inclusion of RAE dictionary definitions would affect classification performance. We found that such a corpus extension significantly improves classification accuracy. Therefore, we conclude that the inclusion of the RAE dictionary enriches the semantic relations learned by Word2Vec, allowing better classification accuracy.
41

Brogueira, Gaspar, Fernando Batista, and Joao P. Carvalho. "A Smart System for Twitter Corpus Collection, Management and Visualization." International Journal of Technology and Human Interaction 13, no. 3 (July 2017): 13–32. http://dx.doi.org/10.4018/ijthi.2017070102.

Abstract:
Social networks have become popular and are now becoming an alternate means of communication, used to share information on various topics, ranging from politics or sports to simple aspects of everyday life. Twitter messages (tweets) are shared in real time and are essentially public, making them a useful source of information for areas such as tourism, marketing, health, and safety. This paper describes an information system that involves the creation and storage of a corpus of tweets, written in European Portuguese and published within the Portuguese territory. The system also involves a REST API that allows access to the stored information, and a web-based dashboard that makes it possible to analyze and visualize indicators concerning the stored data.
42

Baihaqi, Wiga Maulana, Muliasari Pinilih, and Miftakhul Rohmah. "Kombinasi K-Means dan Support Vector Machine (SVM) untuk Memprediksi Unsur Sara pada Tweet." Jurnal Teknologi Informasi dan Ilmu Komputer 7, no. 3 (May 22, 2020): 501. http://dx.doi.org/10.25126/jtiik.2020732126.

Abstract:
<p class="Abstract"><em>Posts shared via Twitter are called tweets (in Indonesian, "kicau"). A post is limited to 140 characters, where a character can be a letter, a number, or a symbol. Opinions are frequently misused on social media: users, consciously or not, create content touching on ethnicity, religion, race (nationality) and intergroup relations, known in Indonesian by the acronym SARA. An analysis is needed that can automatically identify whether a sentence posted on social media contains SARA elements, but no corpus of such sentences exists, nor are there labels marking sentences as SARA or not. This study aims to build a corpus of SARA sentences collected from Twitter, to label each sentence as containing SARA elements or not, and to perform sentiment classification. The k-means algorithm is used for labelling, and a Support Vector Machine (SVM) for classification. K-means yielded 118 SARA-positive and 83 SARA-negative tweets. Classification was evaluated with two validation schemes, 5-fold and 10-fold cross-validation, giving accuracies of 64.18% and 63.68%, respectively. To improve these results, the k-means output was re-validated by a language expert, producing 139 SARA-positive and 62 SARA-negative tweets and raising accuracy to 70.15% and 71.14%. These results show that Twitter can serve as a source for building a corpus of SARA sentences, and that the proposed methods work for labelling and sentiment classification, although accuracy still needs improvement.</em></p>
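The 5-fold versus 10-fold evaluation this abstract describes can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: a majority-class baseline stands in for their SVM, and the data are toy placeholders rather than the 201 labelled tweets.

```python
from random import Random

def k_fold_indices(n, k, seed=0):
    """Split n sample indices into k roughly equal, shuffled folds."""
    idx = list(range(n))
    Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_accuracy(X, y, k, train_fn, predict_fn):
    """Mean held-out accuracy over k folds (the paper compares k=5 and k=10)."""
    scores = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = train_fn([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict_fn(model, X[i]) == y[i] for i in fold)
        scores.append(hits / len(fold))
    return sum(scores) / k

# Majority-class baseline standing in for the SVM classifier.
def train_majority(X, y):
    return max(set(y), key=y.count)

def predict_majority(model, x):
    return model

# Toy placeholder data (1 = SARA-positive, 0 = SARA-negative).
X = list(range(20))
y = [1] * 12 + [0] * 8
acc5 = cross_val_accuracy(X, y, 5, train_majority, predict_majority)
acc10 = cross_val_accuracy(X, y, 10, train_majority, predict_majority)
```

With a larger k, each model trains on more data but each held-out fold is smaller, which is why 5-fold and 10-fold scores (64.18% vs 63.68% in the paper) can differ slightly on the same data.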
43

Knight, Dawn, Svenja Adolphs, and Ronald Carter. "CANELC: constructing an e-language corpus." Corpora 9, no. 1 (May 2014): 29–56. http://dx.doi.org/10.3366/cor.2014.0050.

Abstract:
This paper reports on the construction of the Cambridge and Nottingham e-language Corpus (CANELC).[1] CANELC is a one-million word corpus of digital communication in English, taken from online discussion boards, blogs, tweets, e-mails and Short Message Services (SMS). The paper outlines the approaches used when planning the corpus: obtaining consent, collecting the data and compiling the corpus database. This is followed by a detailed analysis of some of the patterns of language used in the corpus. The analysis includes a discussion of the key words and phrases used, as well as the common themes and semantic associations connected with the data. These discussions form the basis of an investigation into how e-language operates in ways that are both similar to and different from spoken and written records of communication (as evidenced by the British National Corpus, BNC). [1] This corpus has been built as part of a collaborative project between the University of Nottingham and Cambridge University Press, with whom sole copyright of the annotated corpus resides. CANELC comprises one million words of digital English taken from SMS messages, blogs, tweets, discussion board content and private/business e-mails. Plans to extend the corpus are under discussion. The legal dimension to corpus 'ownership' of some forms of unannotated data is a complex one and is under constant review. At present, the annotated corpus is only available to authors and researchers working for CUP and is not more generally available.
44

Albanyan, Abdullah, and Eduardo Blanco. "Pinpointing Fine-Grained Relationships between Hateful Tweets and Replies." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 10418–26. http://dx.doi.org/10.1609/aaai.v36i10.21284.

Abstract:
Recent studies in the hate and counter hate domain have provided the grounds for investigating how to detect this pervasive content in social media. These studies mostly work with synthetic replies to hateful content written by annotators on demand rather than replies written by real users. We argue that working with naturally occurring replies to hateful content is key to study the problem. Building on this motivation, we create a corpus of 5,652 hateful tweets and replies. We analyze their fine-grained relationships by indicating whether the reply (a) is hate or counter hate speech, (b) provides a justification, (c) attacks the author of the tweet, and (d) adds additional hate. We also present linguistic insights into the language people use depending on these fine-grained relationships. Experimental results show improvements (a) taking into account the hateful tweet in addition to the reply and (b) pretraining with related tasks.
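The four fine-grained reply dimensions (a)-(d) in this abstract lend themselves to a small record type. The sketch below uses field names of our own choosing; it is not the authors' released annotation schema.

```python
from dataclasses import dataclass

@dataclass
class ReplyAnnotation:
    """One hateful-tweet/reply pair, along the paper's four dimensions.
    Field names are illustrative, not the dataset's actual column names."""
    is_counter_hate: bool    # (a) reply is counter hate rather than hate
    has_justification: bool  # (b) reply provides a justification
    attacks_author: bool     # (c) reply attacks the tweet's author
    adds_hate: bool          # (d) reply adds additional hate

    def is_constructive(self) -> bool:
        """One possible roll-up: justified counter hate with no added attacks."""
        return (self.is_counter_hate and self.has_justification
                and not self.attacks_author and not self.adds_hate)

ann = ReplyAnnotation(True, True, False, False)
```

Treating the labels as four independent booleans, rather than one multi-class tag, is what makes the relationships "fine-grained": a reply can be counter hate and still attack the author.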
45

Asraoui, Fadi Oukili. "Using the Machine Learning Naive Bayes Algorithms for Sentiment Analysis on Online Product Reviews in the Air of Energy Optimization." E3S Web of Conferences 412 (2023): 01071. http://dx.doi.org/10.1051/e3sconf/202341201071.

Abstract:
The purpose of this study was to explore how consumers perceive two of the leading smartphone brands, Samsung and iPhone, using a corpus of tweets. Our approach involved sifting through the tweets to remove any irrelevant content, followed by a sentiment analysis to gain an overall perspective of how each brand was viewed. Our analysis demonstrated that Samsung received a higher proportion of tweets with negative sentiment as compared to iPhone. Moreover, the most common terms in tweets referring to Samsung reflected negative emotions like “concern,” “issue,” and “trouble,” while tweets about iPhone expressed positive emotions such as “like,” “great,” and “best.” These findings have significant implications for marketing research and offer valuable insights for businesses on how they can utilize social media to enhance their brand reputation and image.
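A minimal lexicon-based tally in the spirit of the brand comparison above. The lexicons reuse the six terms quoted in the abstract; the tweets are invented examples, and this is a sketch, not the paper's Naive Bayes setup.

```python
# Seed lexicons taken from the terms the abstract reports as most common.
POSITIVE = {"like", "great", "best"}
NEGATIVE = {"concern", "issue", "trouble"}

def sentiment_counts(tweets):
    """Count tweets leaning positive vs negative by simple lexicon overlap."""
    pos = neg = 0
    for t in tweets:
        words = set(t.lower().split())
        p, n = len(words & POSITIVE), len(words & NEGATIVE)
        if p > n:
            pos += 1
        elif n > p:
            neg += 1
    return pos, neg

# Invented example tweets for each brand.
samsung = ["battery issue again", "screen trouble and concern", "great camera"]
iphone = ["best phone i like", "great battery", "minor issue"]
```

Running `sentiment_counts` on each list reproduces the qualitative finding (a higher share of negatives for the Samsung examples); a Naive Bayes classifier would instead learn per-word class probabilities from labelled tweets.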
46

Alvi, Arooj. "A Corpus Analysis of Online Education Tweets During Covid-19." Pakistan Social Sciences Review 5, no. III (September 30, 2021): 376–91. http://dx.doi.org/10.35484/pssr.2021(5-iii)28.

47

Tarrade, Louis, Jean-Philippe Magué, and Jean-Pierre Chevrot. "Detecting and categorising lexical innovations in a corpus of tweets." Psychology of Language and Communication 26, no. 1 (January 1, 2022): 313–29. http://dx.doi.org/10.2478/plc-2022-15.

Abstract:
In this paper, we present the methodology we have developed for the detection of lexical innovations, implemented here on a corpus of 650 million French tweets covering a period from 2012 to 2019. Once detected, innovations are categorized as change or buzz according to whether their use has stabilized or dropped over time, and three phases of their dynamics are automatically identified. In order to validate our approach, we further analyse these dynamics by modelling the user network and characterising the speakers using these innovations via network variables. This allows us to propose preliminary observations on the role of individuals in the diffusion process of linguistic innovations, which are in line with Milroy & Milroy's (1997) theories and encourage further investigations.
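The buzz-vs-change distinction, whether usage stabilises or collapses after its peak, can be sketched with a simple threshold rule. The threshold, window, and toy frequency series below are our own choices, not the paper's method.

```python
def categorise(freqs, tail=3, drop_ratio=0.5):
    """Label an innovation 'change' if its late usage stays at or above
    drop_ratio * peak, and 'buzz' if it collapses below that level.
    (Thresholds are illustrative assumptions, not the paper's.)"""
    peak = max(freqs)
    tail_mean = sum(freqs[-tail:]) / tail  # mean frequency in the last periods
    return "change" if tail_mean >= drop_ratio * peak else "buzz"

# Toy monthly frequency series for two hypothetical innovations.
buzz_word = [1, 5, 40, 90, 30, 6, 2, 1]       # spike, then collapse
change_word = [1, 4, 15, 40, 55, 60, 58, 61]  # rise, then plateau
```

Real detection additionally has to segment the trajectory into the three phases the paper identifies (emergence, spread, stabilisation or decline) rather than just inspecting the tail.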
48

Pascual, Daniel, Pilar Mur-Dueñas, and Rosa Lorés. "Looking into international research groups’ digital discursive practices: Criteria and methodological steps taken towards the compilation of the EUROPRO digital corpus." Research in Corpus Linguistics 8, no. 2 (2020): 87–102. http://dx.doi.org/10.32714/ricl.08.02.05.

Abstract:
The EUROPRO digital corpus was designed by the InterGedi research group, based at the University of Zaragoza (Spain). The main focus of InterGedi is the analysis of the textual resources used by international research groups as part of their dissemination and visibility strategies. The corpus comprises a collection of 30 international research project websites funded by the European Horizon2020 Programme (EUROPROwebs corpus). By looking into their websites, 20 projects were observed to maintain a Twitter account and the tweets from these accounts were the basis for the compilation of the EUROPROtweets corpus. This paper delves into the criteria used for the selection of the research project websites and the methodological steps taken to classify, label and tag the verbal component in these websites and tweets. The paper discusses the challenges in the compilation of the corpus because of the dynamic, hypermodal, and hypermedial nature of the digital texts it contains. The paper closes by underlining the potential uses and applications of EUROPRO in order to gain insights into the digital discursive and professional practices used by international research groups to foster their visibility online.
49

Canhasi, Ercan, and Rexhep Shijaku. "Using Twitter to collect a multi-dialectal corpus of Albanian using advanced geotagging and dialect modeling." PLOS ONE 18, no. 11 (November 27, 2023): e0294284. http://dx.doi.org/10.1371/journal.pone.0294284.

Abstract:
In this study, we present the acquisition and categorization of a geographically-informed, multi-dialectal Albanian National Corpus, derived from Twitter data. The primary dialects from three distinct regions—Albania, Kosovo, and North Macedonia—are considered. The assembled publicly available dataset encompasses anonymized user information, user-generated tweets, auxiliary tweet-related data, and annotations corresponding to dialect categories. Utilizing a highly automated scraping approach, we initially identified over 1,000 Twitter users with discernible locations who actively employ at least one of the targeted Albanian dialects. Subsequent data extraction phases yielded an augmentation of the preliminary dataset with an additional 1,500 Twitterers. The study also explores the application of advanced geotagging techniques to expedite corpus generation. Alongside experimentation with diverse classification methodologies, comprehensive feature engineering and feature selection investigations were conducted. A subjective assessment is conducted using human annotators, which demonstrates that humans achieve significantly lower accuracy rates in comparison to machine learning (ML) models. Our findings indicate that machine learning algorithms are proficient in accurately differentiating various Albanian dialects, even when analyzing individual tweets. A meticulous evaluation of the most salient attributes of top-performing algorithms provides insights into the decision-making mechanisms utilized by these models. Remarkably, our investigation revealed numerous dialectal patterns that, despite being familiar to human annotators, have not been widely acknowledged within the broader scientific community.
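One conventional way to separate closely related dialects, in the spirit of the classifiers evaluated here, is character n-gram profiling. The sketch below is not the authors' feature set, and the profile strings are toy placeholders, not real samples of Albanian dialects.

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Character trigram profile, padded with spaces at the edges."""
    text = f" {text.lower()} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def classify(text, profiles):
    """Assign the dialect whose n-gram profile overlaps the tweet most."""
    grams = char_ngrams(text)

    def overlap(dialect):
        return sum(min(c, profiles[dialect][g]) for g, c in grams.items())

    return max(profiles, key=overlap)

# Toy placeholder profiles keyed by hypothetical dialect labels.
profiles = {
    "dialect_a": char_ngrams("po shkoj me shoke"),
    "dialect_b": char_ngrams("duke shkuar me miq"),
}
```

Character n-grams are robust to the noisy spelling of tweets, which may be one reason the paper's ML models outperform human annotators on single tweets.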
50

Haque, Md Enamul, Eddie C. Ling, Aminul Islam, and Mehmet Engin Tozal. "Predicting Domain Specific Personal Attitudes and Sentiment." International Journal of Semantic Computing 14, no. 02 (June 2020): 199–222. http://dx.doi.org/10.1142/s1793351x20400073.

Abstract:
Microblog activity logs are useful to determine user’s interest and sentiment towards specific and broader category of events such as natural disaster and national election. In this paper, we present a corpus model to show how personal attitudes can be predicted from social media or microblog activities for a specific domain of events such as natural disasters. More specifically, given a user’s tweet and an event, the model is used to predict whether the user will be willing to help or show a positive attitude towards that event or similar events in the future. We present a new dataset related to a specific natural disaster event, i.e. Hurricane Harvey, that distinguishes user’s tweets into positive and non-positive attitudes. We build Term Embeddings for Tweet (TEmT) to generate features to model personal attitudes for arbitrary user’s tweets. In addition, we present sentiment analysis on the same disaster event dataset using enhanced feature learning on TEmT generated features by applying Convolutional Neural Network (CNN). Finally, we evaluate the effectiveness of our method by employing multiple classification techniques and comparative methods on the newly created dataset.
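A common baseline for turning term embeddings into tweet-level features, the kind of representation TEmT-style models build on before the CNN stage, is simply averaging word vectors. The vectors below are made-up 3-dimensional toys; this is not the paper's actual TEmT construction.

```python
def tweet_vector(tweet, embeddings, dim=3):
    """Mean of the vectors of in-vocabulary words; zeros if none are known.
    (A generic averaging baseline, not the paper's TEmT features.)"""
    vecs = [embeddings[w] for w in tweet.lower().split() if w in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

# Toy 3-dimensional embeddings for two disaster-related terms.
emb = {
    "flood": [1.0, 0.0, 0.0],
    "help": [0.0, 1.0, 0.0],
}
```

The resulting fixed-length vector can feed any downstream classifier; the paper instead passes its TEmT features through a CNN to learn richer combinations before predicting attitude.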
