Dissertations / Theses: 'Graph, social and multimedia data'

1

Kim, Pilho. "E-model event-based graph data model theory and implementation." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/29608.

Abstract:

Thesis (Ph.D)--Electrical and Computer Engineering, Georgia Institute of Technology, 2010.
Committee Chair: Madisetti, Vijay; Committee Member: Jayant, Nikil; Committee Member: Lee, Chin-Hui; Committee Member: Ramachandran, Umakishore; Committee Member: Yalamanchili, Sudhakar. Part of the SMARTech Electronic Thesis and Dissertation Collection.

2

Wang, Guan. "Graph-Based Approach on Social Data Mining." Thesis, University of Illinois at Chicago, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=3668648.

Abstract:

Powered by big data infrastructures, social network platforms are gathering data on many aspects of our daily lives. The online social world reflects our physical world in increasing detail by collecting people's individual biographies and their various relationships with other people. Although a massive amount of social data has been gathered, an urgent challenge remains unsolved: discovering meaningful knowledge that can empower social platforms to really understand their users from different perspectives.

Motivated by this trend, my research addresses the reasoning and mathematical modeling behind interesting phenomena on social networks. The major goal of my research is to propose a graph-based data mining framework for heterogeneous data sources. The algorithms, by design, use graph structure with heterogeneous link and node features to represent the basic structures of social networks and the phenomena that arise on top of them.

The graph-based heterogeneous mining methodology proves effective on a series of knowledge discovery topics, including network structure and macro social pattern mining such as magnet community detection (87), social influence propagation and social similarity mining (85), and spam detection (86). Future work will consider dynamic relations in social data mining and how graph-based approaches adapt to these new situations.

3

Wong, León Kevin, and Valdivia Diego Eduardo Antonio Rodríguez. "Distributed Social Media System - Multimedia Data Linkage." Bachelor's thesis, Universidad Peruana de Ciencias Aplicadas (UPC), 2014. http://hdl.handle.net/10757/324525.

Abstract:

Online social networks are currently one of the main channels through which large amounts of information are exchanged. On them, users try to reflect their daily activity as posts on their own walls or on those of other users. Images likewise account for a large share of the information about a user's activity, for example a photo in which the user is tagged. These interactions on social networks help build the user's digital identity. The information revealed by image metadata enriches this profile and helps improve results in processes such as data mining and marketing. The goal of this project is to generate a digital profile from the information and activity that a user contributes to a social network, collecting and explicitly displaying various facts that are revealed by exploiting image metadata and the temporal dimension of online activity. This includes extracting, enriching, and encapsulating the data in a proposed ontological model. Experimental results show that, after enrichment, the profile contains approximately four times the initial information, and the precision of the new information is above 75%. Future work points toward detecting the type of relationship between a person and one of their contacts. Another relevant topic to explore is the extraction of a wider range of entities, such as events or an individual's topics of interest, in order to improve the user's digital profile. Finally, data mining in the information extraction process would help target marketing at social network users more effectively, since such advertising could be made more personalized. Keywords: linked data, multimedia information, digital profile, social networks, metadata
Thesis

4

Bracamonte, Nole Teresa Jacqueline. "Improving web multimedia information retrieval using social data." Thesis, Universidad de Chile, 2018. http://repositorio.uchile.cl/handle/2250/168681.

Abstract:

Thesis submitted for the degree of Doctor of Science, Computer Science
Searching for multimedia content is one of the most common tasks users perform on the Web. Web search engines have improved the precision of their multimedia searches and now provide a better user experience. However, these engines still fail to return accurate results for uncommon queries and for queries that refer to abstract concepts. In both scenarios, the main reason is the lack of preliminary information. This thesis focuses on improving multimedia information retrieval on the Web using data generated from the interaction between users and multimedia resources. To that end, it proposes improving multimedia information retrieval from two perspectives: (1) extracting concepts relevant to multimedia resources, and (2) improving multimedia descriptions with user-generated data. In both cases, we propose systems that work independently of the type of multimedia and of the language of the input data. For identifying concepts related to multimedia objects, we developed a system that goes from query-specific search results to the concepts detected for that query. Our approach shows that we can exploit a partial view of a large collection of multimedia documents to detect concepts relevant to a given query. In addition, we designed a user-based evaluation showing that our concept detection algorithm is more robust than similar approaches based on community detection. To improve multimedia description, we developed a system that combines the audio-visual content of multimedia documents with information from their context to improve existing annotations and generate new ones. Specifically, we extract click data from query logs and use the queries as surrogates for manual annotations. After a first inspection, we show that queries provide a concise description of multimedia documents. The main goal of this thesis is to demonstrate the relevance of the context associated with multimedia documents for improving the retrieval of multimedia documents on the Web. We also show that graphs provide a natural way to model multimedia problems.
Fondef D09I-1185, CONICYT-PCHA/Doctorado Nacional/2013-63130260, short-stay support from the Graduate School of the Universidad de Chile, and the Núcleo Milenio CIWS

5

Hassanzadeh, Reza. "Anomaly detection in online social networks : using data-mining techniques and fuzzy logic." Thesis, Queensland University of Technology, 2014. https://eprints.qut.edu.au/78679/1/Reza_Hassanzadeh_Thesis.pdf.

Abstract:

This research is a step forward in improving the accuracy of detecting anomalies in a data graph that represents connectivity between people in an online social network. The proposed hybrid methods are based on fuzzy machine learning techniques that use different types of structural input features. The methods are presented within a multi-layered framework that covers the full set of requirements for finding anomalies in data graphs generated from online social networks, including data modelling and analysis, labelling, and evaluation.

6

Maryokhin, Tymur. "Data dissemination in large-cardinality social graphs." Thesis, Linnéuniversitetet, Institutionen för datavetenskap (DV), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-48268.

Abstract:

Near real-time event streams are a key feature in many popular social media applications. These types of applications allow users to selectively follow event streams to receive a curated list of real-time events from various sources. Due to the emphasis on recency, relevance, personalization of content, and the highly variable cardinality of social subgraphs, it is extremely difficult to implement feed following at the scale of major social media applications. This leads to multiple architectural approaches, but no consensus has been reached as to what is considered to be an idiomatic solution. As of today, there are various theoretical approaches exploiting the dynamic nature of social graphs, but not all of them have been applied in practice. In this paper, large-cardinality graphs are placed in the context of existing research to highlight the exceptional data management challenges that are posed for large-scale real-time social media applications. This work outlines the key characteristics of data dissemination in large-cardinality social graphs, and overviews existing research and state-of-the-art approaches in industry, with the goal of stimulating further research in this direction.

7

Casas, Roma Jordi. "Privacy-preserving and data utility in graph mining." Doctoral thesis, Universitat Autònoma de Barcelona, 2014. http://hdl.handle.net/10803/285566.

Abstract:

In recent years, an explosively growing amount of graph-formatted data has been made publicly available. Embedded within this data is private information about the users who appear in it. Therefore, data owners must respect the privacy of users before releasing datasets to third parties. In this scenario, anonymization processes become an important concern. However, anonymization processes usually introduce some kind of noise in the anonymous data, altering the data and also the results of graph mining processes run on them. Generally, the higher the privacy, the larger the noise. Thus, data utility is an important factor to consider in anonymization processes. The necessary trade-off between data privacy and data utility can be reached by using measures and metrics to guide the anonymization process so that it minimizes information loss and therefore maximizes data utility. In this thesis we have covered the fields of user privacy preservation in social networks and the utility and quality of the released data. A trade-off between both fields is critical to achieving good anonymization methods for the subsequent graph mining processes. Part of this thesis has focused on data utility and information loss. Firstly, we have studied the relation between generic information loss measures and clustering-specific ones, in order to evaluate whether the generic information loss measures are indicative of the usefulness of the data for subsequent data mining processes. We have found a strong correlation between some generic information loss measures (average distance, betweenness centrality, closeness centrality, edge intersection, clustering coefficient and transitivity) and the precision index over the results of several clustering algorithms, demonstrating that these measures are able to predict the perturbation introduced in anonymous data. Secondly, two measures to reduce the information loss in graph modification processes have been presented. The first one, Edge neighbourhood centrality, is based on the information flow through the 1-neighbourhood of a specific edge in the graph. The second one is based on the core number sequence; it better preserves the underlying graph structure and retains more data utility. Through an extensive experimental setup, we have demonstrated that both methods are able to preserve the most important edges in the network, keeping the basic structural and spectral properties close to the original ones. The other important topic of this thesis has been privacy-preserving methods. We have presented our random-based algorithm, which uses the concept of Edge neighbourhood centrality to drive the edge modification process so that it better preserves the most important edges in the graph, achieving lower information loss and higher data utility on the released data. Our method obtains a better trade-off between data utility and data privacy than other methods. Finally, two different approaches for k-degree anonymity on graphs have been developed. First, an algorithm based on evolutionary computing has been presented and tested on different small and medium real networks. Although this method allows us to fulfil the desired privacy level, it has two main drawbacks: the information loss is quite large in some structural properties of the graph, and it is not fast enough to work with large networks. Therefore, a second algorithm has been presented, which uses univariate micro-aggregation to anonymize the degree sequence and reduce the distance from the original one. This method is quasi-optimal and results in lower information loss and better data utility.
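
To make the notion of k-degree anonymity in this abstract concrete, here is a minimal Python sketch. It is not the thesis's algorithm: it assumes the simplest possible fixed-size grouping of the sorted degree sequence, the function names are hypothetical, and it does not check that the resulting sequence can be realized by actual edge modifications.

```python
from collections import Counter


def is_k_degree_anonymous(degrees, k):
    """Return True if every degree value is shared by at least k nodes."""
    return all(count >= k for count in Counter(degrees).values())


def microaggregate_degrees(degrees, k):
    """Simplified univariate micro-aggregation of a degree sequence.

    Nodes are sorted by degree and cut into consecutive groups of at
    least k nodes; every degree in a group is replaced by the rounded
    group mean. Assumption: fixed-size grouping, not an optimal partition.
    """
    n = len(degrees)
    if k < 1 or n < k:
        raise ValueError("need 1 <= k <= number of nodes")
    order = sorted(range(n), key=lambda i: degrees[i])
    result = list(degrees)
    start = 0
    while start < n:
        # The last group absorbs the remainder so no group is smaller than k.
        end = n if n - start < 2 * k else start + k
        group = order[start:end]
        mean = round(sum(degrees[i] for i in group) / len(group))
        for i in group:
            result[i] = mean
        start = end
    return result


original = [1, 1, 2, 3, 3, 4, 7]
anonymized = microaggregate_degrees(original, k=2)
print(is_k_degree_anonymous(original, 2), is_k_degree_anonymous(anonymized, 2))
```

In this toy run the original sequence fails the check because degree 2 occurs only once, while the aggregated sequence passes it. The distance between the two sequences is one rough way to think about the information loss the abstract refers to.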

8

Rossi, Maria. "Graph Mining for Influence Maximization in Social Networks." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLX083/document.

Abstract:

The modern science of graphs has emerged in the last few years as a field of interest and has brought significant advances to our knowledge about networks. Until recently, existing data mining algorithms were designed for structured/relational data, while many datasets require a graph representation, such as social networks, networks generated from textual data, 3D protein structures and chemical compounds. It has therefore become crucially important to be able to extract meaningful information from that kind of data, and to this end graph mining and analysis methods have proven essential. The goal of this thesis is to study problems in the area of graph mining, focusing especially on designing new algorithms and tools related to information spreading and specifically on how to locate influential entities in real-world networks. This task is crucial in many applications such as information diffusion, epidemic control and viral marketing. In the first part of the thesis, we have studied spreading processes in social networks, focusing on finding topological characteristics that rank entities in the network based on their influential capabilities. We have specifically focused on the K-truss decomposition, which is an extension of the core decomposition of the graph. Extensive experimental analysis showed that the nodes that belong to the maximal K-truss subgraph show better spreading behavior than baseline criteria. Such spreaders can influence a greater part of the network during the first steps of a spreading process, and the total fraction of influenced nodes at the end of the epidemic is also greater. We have also observed that the nodes in such dense subgraphs are those achieving the optimal spreading in the network. In the second part of the thesis, we focused on identifying a group of nodes that, by acting together, maximize the expected number of influenced nodes at the end of the spreading process, a problem formally called Influence Maximization (IM). The IM problem is NP-hard, though there are efficient algorithms with approximation guarantees that obtain a solution within 63% of the optimum for some classes of models. As those guarantees rely on a greedy approximation that is computationally expensive, especially for large graphs, we proposed the MATI algorithm, which succeeds in locating the group of users that maximizes the influence while also being scalable. The algorithm takes advantage of the possible paths created in each node's neighborhood to precalculate each node's potential influence, and it produces results of competitive quality compared to those of baseline algorithms such as Greedy, LDAG and SimPath. In the last part of the thesis, we study the privacy implications of sharing metrics that are good indicators of influence in a social network. We have focused on designing an efficient, correct, secure, and privacy-preserving algorithm for computing the k-core metric, which measures the influence of each node of the network. We have specifically adopted a decentralized approach in which the social network is considered as a peer-to-peer (P2P) system. The algorithm is built on the constraint that it should not be possible for a node to reconstruct the graph, partially or entirely, using the information it obtains during execution. While a distributed algorithm that computes the nodes' coreness has already been proposed, it does not take dynamic networks into account. Our main contribution is an incremental algorithm that efficiently solves the core maintenance problem in P2P while limiting the number of messages exchanged and computations. We provide a security and privacy analysis of the solution regarding network de-anonymization, show how it relates to previously defined attack models, and discuss countermeasures.
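
For readers unfamiliar with the k-core metric mentioned in this abstract, the following is a minimal Python sketch of the textbook centralized peeling procedure. It is not the thesis's distributed, incremental, privacy-preserving P2P algorithm. The adjacency-dictionary input format and the function name are assumptions made for the illustration.

```python
def core_numbers(adjacency):
    """Return the core number of every node of an undirected simple graph.

    `adjacency` maps each node to the set of its neighbours. The node with
    the smallest remaining degree is removed repeatedly; its core number is
    the largest such minimum degree seen so far. This simple version is
    quadratic in the number of nodes, which is fine for small examples.
    """
    degree = {node: len(neighbours) for node, neighbours in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neighbour in adjacency[node]:
            if neighbour in remaining:
                degree[neighbour] -= 1
    return core


# Toy graph: a triangle a-b-c with a pendant node d attached to a.
toy_graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(core_numbers(toy_graph))
```

On the toy graph the triangle nodes receive core number 2 and the pendant node receives core number 1, matching the intuition that nodes in denser regions of the network are ranked as more influential.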

9

Zulfiqar, Omer. "Detecting Public Transit Service Disruptions Using Social Media Mining and Graph Convolution." Thesis, Virginia Tech, 2021. http://hdl.handle.net/10919/103745.

Abstract:

In recent years we have seen an increase in the number of public transit service disruptions due to aging infrastructure, system failures and the regular need for maintenance. With the rapid growth in the usage of these transit networks, there has been an increasing need for the timely detection of such disruptions. Any disruption in these transit networks can lead to delays, which can have major implications for daily passengers. Most current disruption detection systems either do not operate in real time or lack transit network coverage. The aim of this thesis was to leverage Twitter data to help detect service disruptions earlier. This work involves developing a pure data mining approach and two approaches that use Graph Neural Networks to identify transit-disruption-related information in Tweets from a live Twitter stream related to the Washington Metropolitan Area Transit Authority (WMATA) metro system. After developing three different models to represent the data corpus (a Dynamic Query Expansion model, a Tweet-GCN and a Tweet-Level GCN), we performed various experiments and benchmark evaluations against existing baseline models to justify the efficacy of our approaches. Given the strong results of the Tweet-GCN and the Tweet-Level GCN, with average accuracies of approximately 87.3% and 89.9% respectively, we conclude that these two graph neural models are not only superior for basic NLP text classification but also outperform other models in identifying transit disruptions.
Master of Science
Millions of people worldwide rely on public transit networks for their daily commutes and day-to-day movements. With the growth in the number of people using the service, there has been an increase in the number of daily passengers affected by service disruptions. This thesis proposes and develops three different approaches to aid in the timely detection of these disruptions. In this work we have developed a pure data mining approach along with two deep learning models that use neural networks and live data from Twitter to identify these disruptions. The data mining approach uses a set of disruption-related input keywords to identify similar keywords within the live Twitter data. By collecting historical data, we were able to create deep learning models that represent the vocabulary of the disruption-related Tweets in the form of a graph. A graph is a collection of data values in which the data points are connected to one another based on their relationships. A longer chain of connections between two words indicates a weaker relationship; a shorter chain indicates a stronger one. In our graph, words with similar contextual meanings are connected to each other over shorter distances than words with different meanings. Finally, we use a neural network as a classifier that scans this graph to learn the semantic relationships within our data. This learned information can then be used to accurately classify the disruption-related Tweets within a pool of random Tweets. Once all the proposed approaches have been developed, a benchmark evaluation is performed against other existing text classification techniques to justify the effectiveness of the approaches. The final results indicate that the proposed graph-based models achieved higher accuracy than the data mining model and also outperformed all the other baseline models. Our Tweet-Level GCN had the highest accuracy, at 89.9%.
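
The idea in this summary that a shorter chain of connections between two words means a stronger relationship can be illustrated with a small Python sketch. It is only a toy: the word graph below is invented for the example, the function name is hypothetical, and the sketch shows plain breadth-first search rather than the thesis's graph convolutional models.

```python
from collections import deque


def chain_length(graph, source, target):
    """Return the number of edges on a shortest path, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, distance = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return distance + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


# Hypothetical word graph: an edge links words that appear in similar contexts.
word_graph = {
    "delay": {"train"},
    "train": {"delay", "station"},
    "station": {"train", "coffee"},
    "coffee": {"station"},
}
print(chain_length(word_graph, "delay", "train"))
print(chain_length(word_graph, "delay", "coffee"))
```

Here "delay" and "train" are one edge apart while "delay" and "coffee" are three edges apart, so under the summary's reading the first pair is the more strongly related one.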

10

Dos, Santos Raimundo Fonseca Jr. "Effective Methods of Semantic Analysis in Spatial Contexts." Diss., Virginia Tech, 2014. http://hdl.handle.net/10919/49697.

Abstract:

With the growing spread of spatial data, exploratory analysis has gained a considerable amount of attention. Particularly in the fields of Information Retrieval and Data Mining, the integration of data points helps uncover interesting patterns not always visible to the naked eye. Social networks often link entities that share places and activities; marketing tools target users based on behavior and preferences; and medical technology combines symptoms to categorize diseases. Many of the current approaches in this field of research depend on semantic analysis, which is well suited to inference and decision making. From a functional point of view, objects can be investigated from spatial and temporal perspectives. The former attempts to verify how proximity relates the objects; the latter adds a measure of coherence by enforcing time ordering. This type of spatio-temporal reasoning examines several aspects of semantic analysis and their characteristics: shared relationships among objects, matches versus mismatches of values, distances among parents and children, and brute-force comparison of attributes. Most of these approaches suffer from the pitfalls of disparate data, often missing true relationships, failing to deal with inexact vocabularies, ignoring missing values, and poorly handling multiple attributes. In addition, the vast majority do not consider the spatio-temporal aspects of the data. This research studies semantic techniques of data analysis in spatial contexts. The proposed solutions represent different methods for relating spatial entities or sequences of entities. They are able to identify relationships that are not explicitly written down. 
Major contributions of this research include (1) a framework that computes a numerical entity similarity, denoted a semantic footprint, composed of spatial, dimensional, and ontological facets; (2) a semantic approach that translates categorical data into a numerical score, which permits ranking and ordering; (3) an extensive study of GML as a representative spatial structure, showing how semantic analysis methods are influenced by its approaches to storage, querying, and parsing; (4) a method to find spatial regions of high entity density based on a clustering coefficient; (5) a ranking strategy based on connectivity strength, which differentiates important relationships from less relevant ones; (6) a distance measure between entity sequences that quantifies the most related streams of information; (7) three distance-based measures (one probabilistic, one based on spatial influence, and one that is spatiological) that quantify the interactions among entities and events; (8) a spatio-temporal method to compute the coherence of a data sequence.
Ph. D.

11

Cimenler, Oguz. "Social Network Analysis of Researchers' Communication and Collaborative Networks Using Self-reported Data." Scholar Commons, 2014. https://scholarcommons.usf.edu/etd/5201.

Abstract:

This research seeks an answer to the following question: what is the relationship between the structure of researchers' communication network and the structure of their collaborative output networks (e.g., co-authored publications, joint grant proposals, and joint patent applications), and what is the impact of these structures on their citation performance and the volume of collaborative research outputs? Three complementary studies are performed to answer this main question, as discussed below. 1. Study I: A frequently used output to measure scientific (or research) collaboration is co-authorship in scholarly publications. Less frequently used are joint grant proposals and patents. Many scholars believe that co-authorship as the sole measure of research collaboration is insufficient because collaboration between researchers might not result in co-authorship. Collaborations involve informal communication (i.e., conversational exchange) between researchers. Using self-reports from 100 tenured/tenure-track faculty in the College of Engineering at the University of South Florida, researchers' networks are constructed from their communication relations and collaborations in three areas: joint publications, joint grant proposals, and joint patents. The data collection 1) provides a rich data set of both researchers' in-progress and completed collaborative outputs, 2) yields a rating from the researchers on the importance of a tie to them, and 3) obtains multiple types of ties between researchers, allowing for the comparison of their multiple networks. Exponential Random Graph Model (ERGM) results show that the more communication researchers have, the more likely they are to produce collaborative outputs. Furthermore, the impact of four demographic attributes (gender, race, department affiliation, and spatial proximity) on collaborative output relations is tested. The results indicate that grant proposals are submitted with mixed-gender teams in the college of engineering.
In addition, researchers of the same race are more likely to publish together. The demographics do not have additional leverage on joint patents. 2. Study II: Previous research shows that researchers' social network metrics obtained from a collaborative output network (e.g., joint publications or co-authorship network) impact their performance as determined by the g-index. This study uses a richer dataset to show that a scholar's performance should be considered with respect to position in multiple networks. Previous research using only the network of researchers' joint publications shows that a researcher's distinct connections to other researchers (i.e., degree centrality), a researcher's number of repeated collaborative outputs (i.e., average tie strength), and a researcher's redundant connections to a group of researchers who are themselves well-connected (i.e., efficiency coefficient) have a positive impact on the researcher's performance, while a researcher's tendency to connect with other researchers who are themselves well-connected (i.e., eigenvector centrality) has a negative impact on the researcher's performance. The findings of this study are similar, except that eigenvector centrality has a positive impact on the performance of scholars. Moreover, the results demonstrate that a researcher's tendency towards dense local neighborhoods (as measured by the local clustering coefficient) and the researchers' demographic attributes such as gender should also be considered when investigating the impact of the social network metrics on the performance of researchers. 3. Study III: This study investigates to what extent researchers' interactions in the early stage of their collaborative network activities impact the number of collaborative outputs produced (e.g., joint publications, joint grant proposals, and joint patents).
Path models using the Partial Least Squares (PLS) method are run to test the extent to which researchers' individual innovativeness, as determined by the specific indicators obtained from their interactions in the early stage of their collaborative network activities, impacts the number of collaborative outputs they produce, taking into account the tie strength of a researcher to other conversational partners (TS). Within a college of engineering, it is found that researchers' individual innovativeness positively impacts the volume of their collaborative outputs. It is observed that TS positively impacts researchers' individual innovativeness, whereas TS negatively impacts researchers' volume of collaborative outputs. Furthermore, TS negatively impacts the relationship between researchers' individual innovativeness and the volume of their collaborative outputs, which is consistent with the 'Strength of Weak Ties' theory. The results of this study contribute to the literature regarding the transformation of tacit knowledge into explicit knowledge in a university context.
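
Two of the network metrics named in this abstract, degree centrality and the local clustering coefficient, can be illustrated with a minimal sketch. The four-researcher network and the function names below are hypothetical and are not taken from the thesis; the formulas are the standard textbook definitions for an undirected, unweighted graph.

```python
from itertools import combinations


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly tied to."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def local_clustering(adj):
    """Share of each node's neighbour pairs that are themselves tied."""
    result = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            result[v] = 0.0  # fewer than two neighbours: no pairs to close
            continue
        links = sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
        result[v] = 2 * links / (k * (k - 1))
    return result


# Hypothetical undirected co-authorship ties among four researchers.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

centrality = degree_centrality(adj)
clustering = local_clustering(adj)
print(centrality)
print(clustering)
```

In this toy network, researcher C is tied to everyone but sits in a sparse neighborhood, which is the kind of contrast between metrics that the abstract argues should be examined jointly.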

12

Fang, Chunsheng. "Novel Frameworks for Mining Heterogeneous and Dynamic Networks." University of Cincinnati / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1321369978.

13

Ruan, Yiye. "Joint Dynamic Online Social Network Analytics Using Network, Content and User Characteristics." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1420765022.

14

Green, Oded. "High performance computing for irregular algorithms and applications with an emphasis on big data analytics." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/51860.

Abstract:

Irregular algorithms such as graph algorithms, sorting, and sparse matrix multiplication present numerous programming challenges, including scalability, load balancing, and efficient memory utilization. In this age of Big Data we face additional challenges, since the data is often streaming at a high velocity and we wish to make near real-time decisions for real-world events. For instance, we may wish to track Twitter for the pandemic spread of a virus. Analyzing such data sets requires combining algorithmic optimizations with the use of massively multithreaded architectures, accelerators such as GPUs, and distributed systems. My research focuses on designing new analytics and algorithms for the continuous monitoring of dynamic social networks. Achieving high performance computing for irregular algorithms such as Social Network Analysis (SNA) is challenging, as the instruction flow is highly data dependent and requires domain expertise. The rapid changes in the underlying network necessitate understanding real-world graph properties such as the small-world property, shrinking network diameter, power-law distribution of edges, and the rate at which updates occur. These properties, with respect to a given analytic, can help design load-balancing techniques, avoid wasteful (redundant) computations, and create streaming algorithms. In the course of my research I have considered several parallel programming paradigms for a wide range of multithreaded platforms: x86, NVIDIA's CUDA, Cray XMT2, SSE-SIMD, and Plurality's HyperCore. These unique programming models require examination of the parallel programming at multiple levels: algorithmic design, cache efficiency, fine-grain parallelism, memory bandwidths, data management, load balancing, scheduling, control flow models and more. This thesis deals with these issues and more.

15

Anderson, Paul. "GeoS: A Service for the Management of Geo-Social Information in a Distributed System." Scholar Commons, 2010. https://scholarcommons.usf.edu/etd/1561.

Abstract:

Applications and services that take advantage of social data usually infer social relationships using information produced only within their own context, using a greatly simplified representation of users' social data. We propose to combine social information from multiple sources into a directed and weighted social multigraph in order to enable novel socially-aware applications and services. We present GeoS, a geo-social data management service which implements a representative set of social inferences and can run on a decentralized system. We demonstrate GeoS' potential for social applications on a collection of social data that combines collocation information and Facebook friendship declarations from 100 students. We demonstrate its performance by testing it both on PlanetLab and a LAN with a realistic workload for a 1000 node graph.

16

Giannini, Andrea. "Social Network Analysis: Architettura Streaming Big Data di Raccolta e Analisi Dati da Twitter." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2022. http://amslaurea.unibo.it/25378/.

Abstract:

In recent years, social media such as Facebook, Twitter, WhatsApp, and YouTube have spread like wildfire. Almost everyone now logs in daily to at least one of them to get information, express opinions, and interact with other users. For this reason they have become essential to companies' marketing departments, being not only an excellent communication channel but also a source of information about customers and potential customers. The thesis focuses on precisely this latter aspect. The Social Network Analysis (SNA) project is intended as a tool for viewing and analyzing entire networks of interaction between users. The goal was to build SNA so that it collects data and updates itself in real time, keeping pace with the latest developments, given how dynamic information on social media is. A project like SNA involves facing several obstacles. Besides building an architecture that can take in a continuous flow of information, one of the most important obstacles is handling the large volume of data. To do so, the project relies on a distributed and easily scalable architecture that includes cluster processing, serverless functions, and NoSQL databases provisioned through Microsoft's cloud service, Azure. In this thesis SNA was designed and implemented on top of Twitter, but the same idea can be applied to many other social media.

17

Charbey, Raphaël. "Sociabilités en ligne, usages et réseaux." Thesis, Paris, ENST, 2018. http://www.theses.fr/2018ENST0049/document.

Abstract:

With the advent of digital technology, researchers can now amass large quantities of data, and online social network platforms are no exception. Sociologists, among others, have seized on these new resources to pursue their investigations into the modalities of interaction between individuals and their impact on the structuring of sociability. Following this path, this thesis analyzes a large number of Facebook accounts, using both the classical tools of data analysis and graph theory, to which it makes methodological contributions. Two main factors encourage the study of online activity and sociability. On the one hand, the considerable time many Internet users devote to this platform justifies sociologists' interest in the exchanges that take shape there. On the other hand, and contrary to what can be observed on other online social network sites, ties between individuals on Facebook are close to offline ones. The first part of the thesis sets out to untangle the multiple facets of what "being on Facebook" means. Distributed around fantasized normative practices, our respondents' uses fluctuate according to whether or not they appropriate the components of the wide variety of communication tools the platform offers. These uses, as we will see, are adopted differently across socio-professional categories and also influence how respondents exchange and interact with their online friends. These modalities are also explored in this work, as is the role of the partner and their place in the relational structure. The second part of the thesis builds a typology of these relational structures, which are called egocentred, that is, taken from the respondent's point of view. This typology of online sociability networks is based on the enumeration of their induced subgraphs, the graphlets, an approach initially developed by bioinformatics researchers; each network is described by the number of times each type of subnetwork appears in it. This approach offers a meso view (between micro and macro) of networks, well suited to highlighting new phenomena in the sociology of networks. With strong multidisciplinary potential, the graphlet methodology itself is also discussed and explored.
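
The idea of describing a network by its graphlet counts can be sketched for the smallest case, induced subgraphs on three nodes. This is an illustrative brute-force enumeration under assumed data, not the procedure used in the thesis; the friend network and the function name are hypothetical, and practical graphlet counters cover larger subgraph sizes with far more efficient algorithms.

```python
from itertools import combinations


def three_node_graphlets(adj):
    """Count connected induced 3-node subgraphs by type."""
    counts = {"path": 0, "triangle": 0}
    for a, b, c in combinations(sorted(adj), 3):
        # Number of ties present among the three chosen nodes.
        n_edges = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        if n_edges == 2:
            counts["path"] += 1      # open path: two ties, one missing
        elif n_edges == 3:
            counts["triangle"] += 1  # closed triad
    return counts


# Hypothetical ties among one respondent's friends (the respondent is left out).
alters = ["a", "b", "c", "d", "e"]
ties = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
adj = {v: set() for v in alters}
for u, v in ties:
    adj[u].add(v)
    adj[v].add(u)

profile = three_node_graphlets(adj)
print(profile)
```

A vector of such counts per respondent is the kind of feature on which a typology of ego networks could then be built, for example by clustering the vectors.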

18

Gabardo, Ademir cristiano. "A heuristic to detect community structures in dynamic complex networks." Universidade Tecnológica Federal do Paraná, 2014. http://repositorio.utfpr.edu.br/jspui/handle/1/970.

Abstract:

Complex networks are ubiquitous: billions of people are connected through social networks, and an equally large number of telecommunication users and devices generate implicit complex networks. Furthermore, many structures can be represented as complex networks, in nature, genetic data, social behavior, financial transactions, and many other domains. Most of these complex networks present communities in their structure. Unveiling these communities is highly relevant in many fields of study. However, depending on several factors, the discovery of these communities can be computationally intensive. Several algorithms for detecting communities in complex networks have been introduced over time, and we review some of them. Our goal in this work is to identify or create an understandable and applicable heuristic to detect communities in complex networks, with a focus on time repetitions and strength measures. This work proposes a semi-supervised clustering approach as a modification of the traditional K-means algorithm, assigning a weight to each dimension of the data in order to obtain a weighted clustering method. As a first case study, databases of companies that have participated in public bids in Paraná state will be analyzed to detect communities that can suggest structures such as cartels. As a second case study, the same methodology will be used to analyze microarray datasets of gene expression, representing the correlation of the genes through a complex network and applying community detection algorithms in order to reveal such correlations between genes.
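
The notion of weighting each data dimension inside K-means can be sketched as follows. This is only an illustration of a feature-weighted distance plugged into the standard Lloyd iteration; the weights, the starting centroids, the toy points, and the function names are assumptions, and the semi-supervised aspect of the thesis's heuristic is not reproduced here.

```python
def weighted_sq_dist(x, c, w):
    """Squared Euclidean distance with one weight per dimension."""
    return sum(wi * (xi - ci) ** 2 for xi, ci, wi in zip(x, c, w))


def weighted_kmeans(points, centroids, weights, iters=20):
    """Lloyd's iteration using the weighted distance for assignment."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centroid under the weighted distance.
        labels = [
            min(range(len(centroids)),
                key=lambda j: weighted_sq_dist(p, centroids[j], weights))
            for p in points
        ]
        # Update step: mean of each cluster (an empty cluster keeps its centroid).
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D points; the weights decide which dimension drives the split.
points = [(0, 0), (0, 10), (1, 0), (1, 10)]
start = [(0, 0), (1, 10)]
by_first, _ = weighted_kmeans(points, start, weights=(1.0, 0.0))
by_second, _ = weighted_kmeans(points, start, weights=(0.0, 1.0))
print(by_first, by_second)
```

The same four points are grouped differently depending on the weights, which is the effect a weighted clustering method relies on when some dimensions are judged more informative than others.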

19

Kourtellis, Nicolas. "On the Design of Socially-Aware Distributed Systems." Scholar Commons, 2012. http://scholarcommons.usf.edu/etd/4107.

Abstract:

Social media services and applications enable billions of users to share an unprecedented amount of social information, which is further augmented by location and collocation information from mobile phones, and can be aggregated to provide an accurate digital representation of the social world. This dissertation argues that extracted social knowledge from this wealth of information can be embedded in the design of novel distributed, socially-aware applications and services, consequently improving system response time, availability and resilience to attacks, and reducing system overhead. To support this thesis, two research avenues are explored. First, this dissertation presents Prometheus, a socially-aware peer-to-peer service that collects social information from multiple sources, maintains it in a decentralized fashion on user-contributed nodes, and exposes it to applications through an interface that implements non-trivial social inferences. The system's socially-aware design leads to multiple system improvements: 1) it increases service availability by allowing users to manage their social information via socially-trusted peers, 2) it improves social inference performance and reduces message overhead by exploiting naturally-formed social groups, and 3) it reduces the opportunity of attackers to influence application requests. These performance improvements are assessed via simulations and a prototype deployment on a local cluster and on a worldwide testbed (PlanetLab) under emulated application workloads. Second, this dissertation defines the projection graph, the result of decentralizing a social graph onto a peer-to-peer system such as Prometheus, and studies the system's network properties and how they can be used to design more efficient socially-aware distributed applications and services. 
In particular: 1) it analytically formulates the relation between centrality metrics such as degree centrality, node betweenness centrality, and edge betweenness centrality in the social graph and in the emerging projection graph, 2) it experimentally demonstrates on real networks that for small groups of users mapped on peers, there is high association of social and projection graph properties, 3) it shows how these properties of the (dynamic) projection graph can be accurately inferred from the properties of the (slower changing) social graph, and 4) it demonstrates with two search application scenarios the usability of the projection graph in designing social search applications and unstructured P2P overlays. These research results lead to the formulation of lessons applicable to the design of socially-aware applications and distributed systems for improved application performance such as social search, data dissemination, data placement and caching, as well as for reduced system communication overhead and increased system resilience to attacks.

20

Fleig, John David. "Citationally Enhanced Semantic Literature Based Discovery." Diss., NSUWorks, 2019. https://nsuworks.nova.edu/gscis_etd/1082.

Abstract:

We are living in the age of information. The ever-increasing flow of data and publications poses a monumental bottleneck to scientific progress because, despite its amazing abilities, the human mind is woefully inadequate at processing such a vast quantity of multidimensional information. The small bits of flotsam and jetsam that we leverage belie the amount of useful information beneath the surface. It is imperative that automated tools exist to better search, retrieve, and summarize this content. Combinations of document indexing and search engines can quickly find a document whose content best matches a query, provided the information is all contained within a single document. But they do not draw connections, make hypotheses, or find knowledge hidden across multiple documents. Literature-based discovery is an approach that can uncover hidden interrelationships between topics by extracting information from existing published scientific literature. The proposed study utilizes a semantic-based approach that builds a graph of related concepts between two user-specified sets of topics using semantic predications. In addition, the study includes properties of bibliographically related documents and statistical properties of concepts to further enhance the quality of the proposed intermediate terms. Our results show an improvement in precision-recall when incorporating citations.
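
The step of linking two sets of topics through intermediate concepts can be sketched with a small graph built from subject-predicate-object triples. Everything below is an assumption made for illustration: the triples are invented examples, the function names are hypothetical, edge direction and predicate type are ignored, and the citation-based and statistical filtering described in the abstract is not modeled.

```python
from collections import defaultdict


def build_graph(predications):
    """Undirected concept graph from (subject, predicate, object) triples."""
    graph = defaultdict(set)
    for subj, _pred, obj in predications:
        graph[subj].add(obj)
        graph[obj].add(subj)
    return graph


def intermediate_terms(graph, start_topics, end_topics):
    """Concepts tied to at least one start topic and at least one end topic."""
    near_start = set().union(*(graph.get(t, set()) for t in start_topics))
    near_end = set().union(*(graph.get(t, set()) for t in end_topics))
    return (near_start & near_end) - set(start_topics) - set(end_topics)


# Invented predications, for illustration only.
predications = [
    ("fish oil", "REDUCES", "blood viscosity"),
    ("blood viscosity", "ASSOCIATED_WITH", "Raynaud disease"),
    ("fish oil", "INHIBITS", "platelet aggregation"),
    ("platelet aggregation", "ASSOCIATED_WITH", "Raynaud disease"),
    ("fish oil", "TREATS", "hypertension"),
]
graph = build_graph(predications)
bridges = intermediate_terms(graph, ["fish oil"], ["Raynaud disease"])
print(sorted(bridges))
```

A concept such as "hypertension" is dropped because it touches only one side, so the candidate list contains only terms that could connect the two topic sets.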

21

Gilbert, Frédéric. "Méthodes et modèles pour la visualisation de grandes masses de données multidimensionnelles nominatives dynamiques." Thesis, Bordeaux 1, 2012. http://www.theses.fr/2012BOR14498/document.

Abstract:

Information visualization has attracted real interest over the past ten years. Recently, with the explosion of communication media, social network analysis has become the subject of much research. In this thesis, we present work on the analysis of dynamic social networks, meaning that we take the temporal aspect of the data into account. We were particularly interested in extracting communities within networks and in their evolution over time. [...]

22

Novosad, Andrej. "Využití metod dolování dat pro analýzu sociálních sítí." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2013. http://www.nusl.cz/ntk/nusl-236424.

Abstract:

The thesis discusses data mining of social media. It introduces the topic of data mining and possible mining methods. The thesis also explores social media and social networks, what they are able to offer, and what problems they bring. The APIs of three social networking sites are examined, along with the opportunities they provide for data mining. Techniques of text mining and document classification are explored. An implementation of a web application that mines data from the social site Twitter using the SVM algorithm is described. The implemented application classifies tweets based on their text, where classes represent the tweets' continents of origin. Several experiments executed both in the RapidMiner software and in the implemented web application are then proposed and their results examined.

23

Hsieh, Liang-Chi, and 謝良奇. "Image Graph Construction and Semantic Annotation for Large-Scale Social Multimedia." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/90088768166155247056.

Abstract:

Ph.D.
National Taiwan University
Graduate Institute of Networking and Multimedia
102
In recent years, camera-equipped mobile devices have come to dominate consumer markets. These devices, together with the emerging trend of multimedia sharing on social networks, have made the scale of multimedia data grow explosively. This raw multimedia data is usually stored without being well organized, which poses a significant challenge to retrieving and using the content later. For large-scale multimedia content, we can explore and leverage hidden relations and semantic meanings to help create useful multimedia applications. In this dissertation, we focus on two problems faced in dealing with large-scale multimedia: data volume and semantics. First, for the data volume problem, in order to improve the navigation and search experience over large-scale image data, we investigate an efficient method to construct image graphs that represent visual and semantic relations between images. We leverage the constructed graphs to build an efficient and scalable group-based image search system. Binary codes are a very compact representation for storing and searching image data. However, how to efficiently index and search very large-scale image collections encoded as longer binary codes is still a challenging problem. We propose a new search framework for very large-scale binary image codes that leverages GPU devices to achieve better performance and storage efficiency than previous work. For the second problem, multimedia semantics, we propose several methods to extract semantics from multimedia content shared on social networks. There exist both visual and semantic relations between images. These relations can be explored to help us better navigate and use image collections. However, current image search systems generally use a multi-page image list to display their search results. The list causes no significant harm when the user's search target is obvious.
However, for queries of higher ambiguity, it is usually difficult for users to find their search targets in such a long image list. Paged image lists also cause browsing problems on mobile devices, because these devices usually have display screens of limited size. Thus, we propose to build a group-based image search system that summarizes image search results in semantic and visual groups. We leverage visual and semantic relations between images to construct image graphs at an offline stage. This design makes the system efficient at responding to online user queries. To scale up to large image collections, we propose to use MapReduce, a modern parallel technology, to solve the scalability issue in this system. Compared with constructing graphs on a single machine, our graph construction method is 69 times faster. To address the data volume problem in processing very large-scale image data, binary codes have recently been recognized as an enabling and promising technique for encoding and searching images. The compact representation of binary codes provides better storage efficiency when dealing with huge image collections. Besides, compared with other image representations, pairwise similarity computation on binary codes is much faster. For example, the similarity comparison between a query and millions of binary codes can be done in less than one second with the very simple baseline method of linear scanning. These advantages make binary codes an important component of applications on very large-scale image data. However, when very large-scale image data (at least 1 billion images) must be encoded as longer binary codes (more than 32 bits), how to efficiently store and search these binary codes is still a challenging problem. We propose a new framework to store and search very large-scale binary codes that leverages GPU devices.
Compared with the multiple hashing index method proposed in previous work, our random-sampling index approaches are simpler and more storage efficient. The framework supports both exact and approximate nearest neighbor search on binary codes. By leveraging the parallel computation of GPUs, we also achieve faster search times than previous work. To further improve the storage efficiency of our index, we propose a compression scheme for binary codes called bit compression. With a GPU-based decompression method, the compressed version of the index does not sacrifice much search performance. Large-scale image data that is not properly annotated hinders image browsing and searching applications. This problem motivates the development of effective automatic image annotation methods. Given an image without textual information, an automatic image annotation method can select the best textual annotations for the image. Prior work in this area mostly focuses on supervised learning approaches. These approaches are not practical because of poor performance, the out-of-vocabulary problem, and the time needed to acquire training data and to learn. Thus, we claim that automatic image annotation by search over user-contributed photo sites (e.g., Flickr) is an alternative solution to this problem. The intuition behind it is to select the most suitable annotations for an unlabeled image from the tags associated with visually similar user-contributed photos. However, the tags are generally few and noisy. To solve this problem, we propose a tag expansion method and use visual and semantic consistency between tag and image. We show that the proposed method significantly outperforms prior work and even provides more diverse annotations. Microblogging, as a new form of communication on the Internet, has recently attracted attention from researchers. Relying on the real-time and conversational properties of microblogging, its users update their statuses and share experiences within their social networks.
Those characteristics also make microblogging an important tool for users to share or discuss real-world events such as earthquakes or sports games. We propose a novel and flexible solution to detect and recognize real-time events from sports games by analyzing the messages posted on microblogging services. We take Twitter as the experimental platform and collect a large-scale dataset of Twitter messages, called tweets, for 18 prominent sports games covering four types of sports in 2011. We also collect the corresponding sports videos for those games. The proposed solution applies moving-threshold burst detection to the volume of tweets to detect highlights in sports games. A tf-idf-based weighting method is applied to the tweets within detected highlights for semantic extraction. According to the experiments we perform on the tweet and video datasets, the proposed methods achieve competent performance in sports event detection and recognition. Besides, our method can find tidbits that are not pre-defined and are difficult to detect with previous methods. Not all images are interesting to people. People are drawn to interesting images and ignore tasteless ones. Image interestingness is no less important than other subjective image properties that have received significant research interest, but it has not been systematically studied before. In this proposal, we focus on visual and social aspects of image interestingness. We rely on crowdsourcing tools to survey human perceptions of these subjective properties and verify the data by analyzing consistency and reliability. We show that people agree when deciding whether an image is interesting or not. We examine the correlation between the social and visual aspects of interestingness and aesthetics.
By exploring the correlation, we find that: (1) Weak correlation between social interestingness and both of visual interestingness and image aesthetics indicates that the images frequently re-shared by people are not necessarily aesthetic or visually interesting. (2) High correlation between image aesthetics and visual interestingness implies aesthetic images are more likely to be visually interesting to people. Then we wonder what features of an image lead to social interestingness, e.g. receiving more likes and shares on social networking sites? We train classifiers to predict visual and social interestingness and investigate the contribution from different image features. We find that social and visual interestingness can be best predicted with color and texture, respectively, providing a way to manipulate social and visual liking of images with image features. Further, we investigate the correlation between social/visual image interestingness and image color. We find that colors with arousal effect show more frequently in images with higher social interestingness. That could be explained by previous studies for activation-related affect of colors and provides useful and important advice when advertising on social networking sites.
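The moving-threshold burst detection that this abstract applies to tweet volume can be illustrated with a minimal sketch. The abstract does not specify the threshold rule, so the version below is an assumption: a time slot counts as a burst when its volume exceeds the mean plus `k` standard deviations of the preceding `window` slots. The function name `detect_bursts`, the parameters `window` and `k`, and the sample counts are all hypothetical.

```python
from statistics import mean, stdev


def detect_bursts(counts, window=5, k=2.0):
    """Return the indices of slots whose count exceeds a moving threshold.

    Assumed rule (not taken from the thesis): the threshold for slot i is
    mean + k * stdev of the previous `window` counts, so it follows the
    baseline volume as it drifts over the course of a game.
    """
    bursts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        threshold = mean(history) + k * stdev(history)
        if counts[i] > threshold:
            bursts.append(i)
    return bursts


# Hypothetical tweets-per-minute series with one spike at index 6.
tweets_per_minute = [10, 12, 11, 9, 10, 11, 60, 12, 10, 11]
print(detect_bursts(tweets_per_minute))  # [6]
```

Because the spike itself enters the history of the following slots, the threshold rises right after a burst, which keeps the slots immediately after a highlight from being flagged again.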

24

Tai, Chih-Hua, and 戴志華. "Graph-based Data Mining for Transactional, Spatial and Social-networking Data." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/02698742304930048631.

Abstract:

Doctoral degree
National Taiwan University
Graduate Institute of Electrical Engineering
99
Data mining is a data- and application-dependent technique that has received significant attention in the last decade. In past years, various techniques have been developed to deal with set or sequence data in business marketing, computer networks, and bioinformatics, to name a few. Many real applications, however, call for new techniques to tackle data with structural information, i.e., graphs. Graph-based data mining, which discovers novel knowledge in graph-represented data, is thus becoming more and more important. In this dissertation, motivated by the fact that graph-based data mining is still in its infancy compared to its wide applications, we address the use of graph-based data mining in realistic problems with three kinds of data complexity.

First, due to the rise of cloud computing, people who lack data mining expertise and/or computational resources can now also benefit from data mining by outsourcing their mining tasks. However, for any outsourcing service, privacy is a major concern. In Chapter 2, we study the problem of privacy protection in outsourcing frequent itemset mining. This problem has two challenges. One is how to protect sensitive information, including the raw data and the frequent itemsets, with reasonable overhead while preserving precise mining results. The other is how to protect against an attacker with related background knowledge such as item support information. To overcome these challenges, we propose k-support anonymity and develop a novel encryption approach that constructs a pseudo taxonomy tree to hide sensitive items. By leveraging the property that only the items at the leaf level of the taxonomy need to appear in the transactions, the storage overhead is limited while privacy protection is maintained.

Second, data collected by sensors can consist of not only geographic attributes but also informative attributes. Since spatial-alone clustering approaches consider only the geographic attributes to identify spatial clusters in data-dense regions, they cannot obtain spatial clusters of informatively similar data points from such data. Therefore, we address the informative spatial data clustering (ISDC) problem in Chapter 3. One of the main challenges in this problem is that geographic and informative attributes represent different concepts and should not be handled in the same way in clustering. To overcome this challenge, we propose Algorithm BiAgree, which introduces a graph structure, named NeiGraph, to integrate informative attributes and geographic attributes in vertices and edges, respectively. Algorithm BiAgree is then able to identify informatively similar regions regardless of the data density by partitioning NeiGraph into informative-consistent connected components. In addition, by maintaining NeiGraph, Algorithm BiAgree also provides an online computing capability that yields high-quality solutions in less computation time.

Finally, as a rapidly growing number of services and applications leverage social network data, there is increasing concern about privacy issues in published social networks. Several recent studies have addressed privacy issues concerning vertex/edge attributes, vertex identity, link disclosure, and so on. However, compared to the rich information inherent in graph data, the privacy issues in publishing social networks have not been fully solved. In Chapter 4, we address a new privacy issue, referred to as community identification. The community identity of an individual is a kind of structural information that indicates the neighborhood or connections of the individual. The community identity can also represent private personal information that is sensitive if made public, such as an online political activity group, online disease support group information, or a friend group in a social network. To protect such information, we propose a new privacy model, named k-structural diversity, and develop an Integer Programming formulation to find the optimal solutions to k-SDA. Moreover, we devise three scalable heuristics to solve large instances of k-SDA from different perspectives.
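To make the NeiGraph idea from Chapter 3 more concrete, here is a minimal sketch of partitioning spatial points into informative-consistent connected components. It is not Algorithm BiAgree itself, whose details the abstract does not give. The sketch assumes that two points are joined by an edge when they lie within a distance `radius` of each other (the geographic attributes) and that an edge is kept only when the points' single numeric reading differs by at most `tol` (the informative attribute). The names `informative_components`, `radius`, and `tol` are hypothetical.

```python
from math import hypot


def informative_components(points, radius, tol):
    """Group points into components of nearby, similar-valued neighbors.

    points: list of (x, y, value) tuples. Two points are linked when they
    are within `radius` of each other AND their values differ by at most
    `tol` (both rules are assumptions made for this illustration).
    Returns a sorted list of index lists, one per connected component.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (xi, yi, vi) in enumerate(points):
        for j in range(i + 1, len(points)):
            xj, yj, vj = points[j]
            near = hypot(xi - xj, yi - yj) <= radius
            similar = abs(vi - vj) <= tol
            if near and similar:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical sensor readings: (x, y, temperature).
sensors = [(0, 0, 20.0), (1, 0, 20.5), (2, 0, 30.0), (3, 0, 30.2), (10, 10, 20.1)]
print(informative_components(sensors, radius=1.5, tol=1.0))
```

In this toy input, sensors 1 and 2 are adjacent in space but fall into different components because their readings differ, which is the behavior the abstract attributes to clustering on informative rather than purely geographic similarity.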

25

Balasundaram, Balabhaskar. "Graph theoretic generalizations of clique: optimization and extensions." 2007. http://hdl.handle.net/1969.1/ETD-TAMU-1539.

Abstract:

This dissertation considers graph theoretic generalizations of the maximum clique problem. Models that were originally proposed in the social network analysis literature are investigated from a mathematical programming perspective for the first time. A social network is usually represented by a graph, and cliques were the first models of "tightly knit groups" in social networks, referred to as cohesive subgroups. Cliques are idealized models, and their overly restrictive nature motivated the development of clique relaxations that relax different aspects of a clique. Identifying large cohesive subgroups in social networks has traditionally been used in criminal network analysis to study organized crimes such as terrorism, narcotics and money laundering. More recent applications are in clustering and data mining of wireless networks and biological networks, as well as graph models of databases and the internet. This research has the potential to impact homeland security, bioinformatics, internet research and the telecommunication industry, among others.

The focus of this dissertation is a degree-based relaxation called k-plex. A distance-based relaxation called k-clique and a diameter-based relaxation called k-club are also investigated. We present the first systematic study of the complexity aspects of these problems and of the application of mathematical programming techniques in solving them. Graph theoretic properties of the models are identified and used in the development of theory and algorithms. Optimization problems associated with the three models are formulated as binary integer programs, and the properties of the associated polytopes are investigated. Facets and valid inequalities are identified based on combinatorial arguments. A branch-and-cut framework is designed and implemented to solve the optimization problems exactly. Specialized preprocessing techniques are developed that, in conjunction with the branch-and-cut algorithm, optimally solve the problems on real-life power law graphs, a general class of graphs that includes social and biological networks. Computational experiments are performed to study the effectiveness of the proposed solution procedures on benchmark instances and real-life instances.

The relationship of these models to the classical maximum clique problem is studied, leading to several interesting observations, including a new compact integer programming formulation. We also prove new continuous non-linear formulations for the classical maximum independent set problem, which maximize continuous functions over the unit hypercube, and characterize their local and global maxima. Finally, clustering and network design extensions of the clique relaxation models are explored.
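The degree-based relaxation at the center of this abstract is easy to state in code. Under the standard definition, a vertex set S is a k-plex if every vertex of S is adjacent to at least |S| - k other vertices of S, so a clique is exactly a 1-plex. The sketch below only checks this property for a given set; it is not the branch-and-cut method of the dissertation. The function name `is_k_plex` and the adjacency-dictionary representation are assumptions chosen for illustration.

```python
def is_k_plex(adj, subset, k):
    """Check the k-plex property for `subset` in an undirected graph.

    adj: dict mapping each vertex to the set of its neighbors (no self-loops).
    A set S is a k-plex if every vertex in S has at least len(S) - k
    neighbors inside S.
    """
    s = set(subset)
    return all(len(adj[v] & s) >= len(s) - k for v in s)


# A 4-cycle: every vertex has two neighbors and misses one other vertex.
cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(is_k_plex(cycle, [1, 2, 3, 4], 1))  # False: not a clique
print(is_k_plex(cycle, [1, 2, 3, 4], 2))  # True: each vertex misses only one other
```

The 4-cycle shows how the relaxation works: it is not a clique, yet it qualifies as a 2-plex because each vertex may fail to be adjacent to one other member in addition to itself.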

26

Lehončák, Michal. "Analýza odvozených sociálních sítí." Master's thesis, 2021. http://www.nusl.cz/ntk/nusl-448611.

Abstract:

Analysis of Inferred Social Networks. While social network analysis (SNA) is not a new branch of science, new methods and approaches appear with increasing frequency thanks to the boom of social media platforms in recent years. However, not all datasets have a network structure visible at first glance. We believe that every reasonably interconnected system of data hides a social network, which can be inferred using specific methods. In this thesis we examine such a social network, inferred from the real-world data of a smaller bank. We also review some of the most commonly used methods in SNA and then apply them to our complex network, expecting to find structures typical of traditional social networks.

27

Měkota, Ondřej. "Predikce spojení v odvozených sociálních sítích." Master's thesis, 2021. http://www.nusl.cz/ntk/nusl-448563.

Abstract:

Social networks can be helpful for the analysis of people's behaviour. An existing social network is rarely available, and its nodes and edges have to be inferred from data that are not necessarily graph-structured. Link prediction can be used either to correct inaccuracies or to forecast links about to appear in the future. In this work, we study the prediction of missing links in a social network inferred from real-world bank data. We review and compare both verified and modern approaches to link prediction. Following the advancements of deep learning in recent years, we primarily focus on graph neural networks and their ability to scale to large networks. We propose an adjustment to an existing graph neural network method and show that its performance is either comparable with or better than that of the original method. The comparison is performed on two social networks inferred from the same data. We show that it is relatively hard to outperform the verified link prediction methods with graph neural networks.
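As a point of reference for the "verified" link prediction methods that the abstract compares against, here is a minimal sketch of three classical neighborhood-based scores. The abstract does not name the baselines it uses, so the choice of common neighbors, the Jaccard coefficient, and the Adamic-Adar index is an assumption, as are the function names and the toy graph.

```python
from math import log


def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree) over shared neighbors; rare neighbors count more."""
    return sum(1.0 / log(len(adj[w])) for w in adj[u] & adj[v] if len(adj[w]) > 1)


# Toy undirected graph as a dict of neighbor sets; the pair (a, d) is not linked.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}
for score in (common_neighbors, jaccard, adamic_adar):
    print(score.__name__, score(graph, "a", "d"))
```

A higher score for an unlinked pair suggests a more likely missing or future link. Such heuristics need only local neighborhoods, which is one reason they remain common baselines for learned models.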

28

Zhao, Tao. "Identification of Online Users' Social Status via Mining User-Generated Data." Doctoral thesis, 2019. http://hdl.handle.net/21.11130/00-1735-0000-0003-C1B1-A.

29

Ketterl, Markus. "Scalable Multimedia Learning: From local eLectures to global Opencast." Doctoral thesis, 2014. https://repositorium.ub.uni-osnabrueck.de/handle/urn:nbn:de:gbv:700-2014032712324.

Abstract:

Universities want to go where the learners are, to share their rich scientific and intellectual knowledge beyond the walls of the academy and to expand the boundaries of the classroom. This desire has become a critical need as the worldwide economy adjusts to globalization and the need for advanced education and training becomes ever more critical. Unfortunately, the work of creating, processing, distributing and using quality multimedia learning content is expensive and technically challenging. The work combines research results, lessons learned and usage findings in the presentation of a fully open-source, scalable lecture capture solution that is useful in the heterogeneous computing landscape of today’s universities and learning institutes. In particular, it addresses the implemented user-facing applications and components, which enable lecturers, faculty and students to record, analyze and subsequently re-use the recorded multimedia learning material in multiple and attractive ways across devices and distribution platforms.

30

Sha, Hao. "SOLVING PREDICTION PROBLEMS FROM TEMPORAL EVENT DATA ON NETWORKS." Thesis, 2021.

Abstract:

Many complex processes can be viewed as sequential events on a network. In this thesis, we study the interplay between a network and the event sequences on it. We first focus on predicting events on a known network. Examples include modeling retweet cascades, forecasting earthquakes, and tracing the source of a pandemic. Specifically, given the network structure, we solve two types of problems: (1) forecasting future events based on historical events, and (2) identifying the initial event(s) based on later observations of the dynamics. The inverse problem of inferring the unknown network topology or links from the events is also of great importance. Examples along this line include constructing influence networks among Twitter users from their tweets, soliciting new members to join an event based on their participation history, and recommending positions to job seekers according to their work experience. Following this direction, we study two types of problems: (1) recovering influence networks, and (2) predicting links between a node and a group of nodes, both from event sequences.

31

Gordon, Jesse. "When data crimes are real crimes: voter surveillance and the Cambridge Analytica conflict." Thesis, 2019. http://hdl.handle.net/1828/11075.

Abstract:

This thesis asks what conditions elevated the Cambridge Analytica (CA) conflict into a sustained and global political issue. Was this a privacy conflict and, if so, how was it framed as such? This work demonstrates that the public outcry over CA formed out of three underlying structural conditions: the rise of the alt-right as an ideology, surveillance capitalism, and a growing and unregulated voter analytics industry. A network of actors seized the momentum of this conflict to drive the message that voter surveillance is a threat to democratic elections. These actors humanized the CA conflict and created a catalyst for large-scale public outrage at these previously ignored structures. Their focus on democratic threat also allowed this conflict to transcend the typical contours of a privacy conflict and demonstrate that the consequences of CA are societal rather than personal. Despite the democratic threat of voter surveillance, Canada and the United States have yet to adequately address its wider implications. Thus, how these systems are used will be a question of central importance in upcoming elections.
Graduate

32

Pitcher, Sandra. "The mass collaboration of digital information : an ethical examination of YouTube and intellectual property rights." Thesis, 2010. http://hdl.handle.net/10413/565.

Abstract:

The Internet has been lauded as an open and free platform from which one is able to engage with and share large amounts of information (Stallman, 1997). As one witnesses the shift from analogue media to digitalism, so too is it possible to note a change in the cultural practices of media consumers. Users of the media can now be viewed as “prosumers”, producing as well as consuming media products (Marshall, 2004). Digital media users have been given the ability to engineer their own unique media experiences, especially within the realms of the Internet. However, this process has seemingly led to mass copyright infringement as Internet users appropriate various movies, music, television programmes, photographs and animations in order to create such an experience. The art of digital mashing, in particular, has been deemed an explicit exploitation of intellectual property rights as it re-cuts, re-mixes and re-broadcasts popular media in a number of alternative ways. YouTube especially has been at the forefront of the copyright furore surrounding digital mash-ups because it allows online users to post and share these video clips freely with other online users. While YouTube claims that it does not promote the illegal use of copyrighted material, it simultaneously acknowledges that it does not actively patrol what is posted on its website. As such, copyright infringement appears rife as users share their own versions of popular media through the art of digital mashing. This dissertation, however, explores the concept that the creation of mash-ups is not undermining intellectual property rights, but instead produces a new avenue from which culture can emerge. 
It highlights how Internet users are utilising the culture which surrounds them in an attempt to navigate the new social structures of the online, subsequently arguing that mash-ups are an important element of defining a new postmodern culture, and that the traditional copyright laws of analogue need to be modified in order to secure the development of new and emerging societal structures.
Thesis (M.A.)-University of KwaZulu-Natal, Pietermaritzburg, 2010.
