A Multi-View Approach for Automatic Quality Assessment in Collaborative Web Documents

The Web 2.0 has brought deep changes to the Internet, as users are now able not only to consume, but also to produce content in a much faster and easier manner, in many cases collaboratively. This change gave rise to new ways of creating knowledge repositories to which anyone can freely contribute. Examples of these repositories include blogs, forums, and collaborative digital libraries, whose collections are maintained by the Web community itself. However, such freedom also raises an important question: given the rhetoric of democratic access to everything, by everyone, at any time, how can a user determine the quality of the information provided? Currently, content generated in a more traditional, centralized manner and published through physical media, such as books or journals, is still naturally seen as being of higher quality and more trustworthy [Dondio et al. 2006]. Nevertheless, the growth and level of dissemination of collaboratively created content is such that mechanisms to assess the quality and trustworthiness of this type of material should be provided. For instance, collaborative efforts such as Wikipedia and Wikia rely on human judgments by specialized editors for quality assessment. However, manual assessment not only does not scale to the current rate of growth of such collections, but is also subject to human bias, which can be influenced by editors' varying backgrounds and expertise, and even by a tendency for abuse. A possible solution to this problem is to automatically estimate the quality of this collaborative content. Accordingly, this research proposal aims at developing methods for the automatic quality assessment of collaborative content, such as collaborative encyclopedias and Q&A forums. In particular, we intend to explore machine learning methods which exploit the idea of the “combination of multiple experts” for quality estimation.
Our hypothesis is that quality is a multifaceted problem in which each facet corresponds to a quality aspect (e.g., readability, style, organizational structure, link/citation coverage, review history) which can be individually analyzed by an automated “expert” (learner), and the “opinions” of these experts can be combined into a final decision about the overall quality of a particular item. Moreover, based on lessons learned in domains for which labels about the quality of certain items can be obtained (e.g., editors' analyses of Wikipedia articles, “best” answers in Q&A forums), we intend to study how to “transfer” the obtained knowledge to other domains for which such labeled data is not easily available, such as the open Web. In particular, the goal is to exploit the quality of web pages with the aim of improving ranking results.
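The hypothesized architecture above can be illustrated with a minimal sketch: each facet-specific learner scores one quality aspect of a document, and a combiner merges their “opinions” into an overall quality estimate. The facet heuristics and the weighted-average combiner below are purely illustrative assumptions, not the proposal's actual method (which would use trained learners per facet).

```python
# Illustrative sketch of the "combination of multiple experts" idea.
# Each expert scores one quality facet in [0, 1]; a simple weighted
# average plays the role of the combiner. All heuristics are toy examples.

def readability_expert(doc: str) -> float:
    """Toy readability score: shorter average sentence length -> higher score."""
    sentences = [s for s in doc.split(".") if s.strip()]
    avg_len = sum(len(s.split()) for s in sentences) / max(len(sentences), 1)
    return max(0.0, min(1.0, 1.0 - avg_len / 40.0))

def structure_expert(doc: str) -> float:
    """Toy organizational-structure score: rewards paragraph breaks."""
    paragraphs = [p for p in doc.split("\n\n") if p.strip()]
    return min(1.0, len(paragraphs) / 5.0)

def combine_experts(doc: str, experts, weights) -> float:
    """Weighted average of the experts' opinions (one possible combiner)."""
    scores = [expert(doc) for expert in experts]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

article = "Short sentences help.\n\nSo do paragraphs. This is a toy example."
quality = combine_experts(article,
                          [readability_expert, structure_expert],
                          [0.6, 0.4])
print(round(quality, 3))
```

In a real system, each expert would be a trained classifier or regressor over features of its facet, and the combiner itself could be learned (e.g., via stacking) rather than fixed weights.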

Students involved: PhD: (1) .

Members: Marcos André Gonçalves – Coordinator.

Number of scientific, technological, and artistic (C, T & A) productions: 1