Scalable Mining of Complex Social Graphs


The proliferation of user-generated, rich web-based social media data, such as microblogs (e.g., short text messages called Tweets on Twitter), blogs, chat rooms, newsgroups, discussion boards, and web sites, offers unprecedented opportunities for public health informatics. However, to date, most health-related social data has been used to plot prevalence, occurrence, and incidence trends over time, based on relatively simple analysis of keywords in Internet searches, blogs, and microblogs. In contrast, we propose to model the diverse social media data sources as disease-specific, massive, and enriched "social" graphs, enabling the use of both structure and content for deep analysis and mining. A disease-specific social graph (e.g., an H1N1 flu social graph) comprises multi-attributed (numeric, symbolic, and complex-typed) nodes and edges, where the nodes represent the different types of actors and entities in the social network ecosystem, such as users, microblogs, blogs, web sites, and documents, and the edges represent various types of roles and relationships, such as author, follower, commentator, hyperlink, and citation.

Intellectual Merit: Mining complex, multi-attributed, disease-specific social graphs poses significant challenges. There is a crucial need for graph pattern mining and querying tools that can rapidly make sense of and extract knowledge from such massive and enriched social graphs, with potentially millions of nodes and edges. The rich set of attributes on the nodes and edges makes the mining and querying problem harder. Another challenge is that we may need to mine and query approximate patterns, given the underlying uncertainty of the data. We also need dynamic approaches for aggregation, mining, and indexing to account for data changes. The research goal of this project is to develop an integrated framework for mining and querying massive and complex health-related social graphs.
To achieve this objective, we identify the following main aims: i) Construct disease-specific enriched social graphs, focusing on three main diseases: Dengue fever, influenza-like illness (ILI), and breast cancer. ii) Develop an integrated framework for exploratory analysis, mining, and querying of complex social networks. iii) Validate mined patterns and develop interactive visual tools for network exploration.

Broader Impact: Successful completion of this project will yield a generic and scalable framework for the exploratory analysis of complex health-related networks aggregated from social media platforms. The focus on Dengue fever, influenza-like illness, and breast cancer will provide novel insights into public perceptions, responses, and support mechanisms, which can guide how information on health care or pandemics is delivered to the general populace. More broadly, due to the ubiquitous nature of graph data, the proposed tools and techniques are highly relevant and useful for application domains ranging from social network analysis to network biology.

Professor Mohammed J. Zaki is an internationally recognized leader in the area of data mining, and a leading authority on pattern mining, especially graph mining, biological data mining, and high performance data mining. He will closely collaborate and partner with two National Institute of Science and Technology (INCT) Centers in Brazil, namely the INCT for the Web (INWEB) and the INCT for Dengue, which will allow a broad national and international impact, especially for the use of social media for smarter public health.



Start: 2012
End: 2015
Coordinator: Virgilio Augusto Fernandes Almeida
Agency: CNPq
Program: Call 61/2011, Bolsa Pesquisador Visitante Especial (Special Visiting Researcher Fellowship) - CAPES/CNPq/FAPs / Line 2 - Bolsa Pesquisador Visitante Especial
Process number: 401735/2012-5
Status: Closed