Submissions/Opening up Wikipedia's data: A lightweight approach to Wikipedia as a platform

From Wikimania 2011 • Haifa, Israel

Presentation Media

No slides known
(please upload your slides and/or add it)



Review no.

128

Title of the submission

Opening up Wikipedia's data: A lightweight approach to Wikipedia as a platform

Type of submission (workshop, tutorial, panel, presentation)

presentation

Author of the submission
  1. Dario Taraborelli
  2. Diederik van Liere
  3. Ryan Lane
E-mail address or username (if username, please confirm email address in Special:Preferences)
  1. dtaraborelli@wikimedia.org
  2. dvanliere@wikimedia.org
  3. rlane@wikimedia.org
Country of origin

USA

Affiliation, if any (organization, company etc.)

Wikimedia Foundation

Personal homepage or blog
  1. http://nitens.org/taraborelli
  2. http://twitter.com/dvanliere
Abstract (please use no less than 300 words to describe your proposal)

There is a final frontier where Wikipedia and its sister projects are not as open as they could, or should, be: non-human interactions. Wikimedia’s infrastructure is designed to make it possible for millions of humans worldwide to freely reuse its contents, but it falls short of providing tools that allow third-party services to easily reuse its data. The goal of this presentation is to start a discussion on what it takes to rethink Wikipedia as a platform (WAAP) to facilitate the reuse of its contents in the form of structured data. In this talk we will focus on two technologies -- Wikilytics and OAuth -- that, combined, could spearhead the creation of an ecosystem of new services based on Wikipedia’s data.

Wikilytics is an analytics platform to answer questions about the different Wikipedia communities. The different Wikipedia communities generate two kinds of data:

  • primary data, which consists of the edits made by editors to the different namespaces;
  • derived data from the edits, such as timestamp, type of editor and other variables.

Wikilytics takes this data and transforms it into an editor-centric dataset that consists of analytic data about editors. By transposing the data from article-centric to editor-centric, Wikilytics makes it easier to answer questions about a given Wikipedia community. Currently, Wikilytics can be used to run queries that create a dataset for analyzing a Wikipedia community. The next step that will help implement the Wikipedia As A Platform vision is to add an API to Wikilytics. Such an API would unlock analytic data about Wikipedia editors that can be used to create new applications focusing on editor reputation, wiki roles and expertise, wiki identities and anything else that the community finds relevant.
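The article-centric to editor-centric transposition can be illustrated with a minimal Python sketch. It is not Wikilytics code: the row schema (`article`, `namespace`, `editor`, `timestamp`), the `EditorRecord` class and the `to_editor_centric` function are all hypothetical names chosen for illustration, and the sample edits are invented.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Iterable, Optional


@dataclass
class EditorRecord:
    """Per-editor summary (hypothetical schema, not the Wikilytics one)."""
    edit_count: int = 0
    first_edit: Optional[datetime] = None
    last_edit: Optional[datetime] = None
    edits_by_namespace: Dict[int, int] = field(default_factory=dict)


def to_editor_centric(edits: Iterable[dict]) -> Dict[str, EditorRecord]:
    """Regroup article-centric edit rows into one record per editor."""
    editors: Dict[str, EditorRecord] = {}
    for edit in edits:
        record = editors.setdefault(edit["editor"], EditorRecord())
        record.edit_count += 1
        ts = edit["timestamp"]
        if record.first_edit is None or ts < record.first_edit:
            record.first_edit = ts
        if record.last_edit is None or ts > record.last_edit:
            record.last_edit = ts
        ns = edit["namespace"]
        record.edits_by_namespace[ns] = record.edits_by_namespace.get(ns, 0) + 1
    return editors


# Invented sample rows: one row per edit, keyed by the article edited.
sample_edits = [
    {"article": "Haifa", "namespace": 0, "editor": "Alice",
     "timestamp": datetime(2011, 1, 5)},
    {"article": "Talk:Haifa", "namespace": 1, "editor": "Alice",
     "timestamp": datetime(2011, 2, 1)},
    {"article": "Haifa", "namespace": 0, "editor": "Bob",
     "timestamp": datetime(2011, 1, 20)},
]

editors = to_editor_centric(sample_edits)
for name, record in sorted(editors.items()):
    print(name, record.edit_count, record.edits_by_namespace)
```

Once the data is keyed by editor, questions such as "how many editors were active in the Talk namespace this year?" become a single pass over the records rather than a scan of every article history.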

OAuth (Open Authorization) is an open standard for authorization that allows users to share private resources stored on one site with another site without handing out their credentials, typically a username and password. Instead of credentials, users hand out tokens to their data hosted by a given service provider. Each token grants access to a specific site (e.g. a video editing site) for specific resources (e.g. just videos from a specific album) and for a defined duration (e.g. the next 2 hours). A user can thus grant a third-party site access to information stored with another service provider without sharing their access permissions or the full extent of their data.

OAuth APIs are a common solution adopted by social media platforms with a large mashup and application ecosystem: using a single Flickr or Facebook login, users of these platforms can access thousands of applications that expose and reuse Flickr or Facebook data in creative ways. Most discussions around OAuth in Wikimedia projects have been hindered by a focus on write permissions. However, introducing simple OAuth read privileges will enable a slate of new apps to come into existence. These apps would focus not on defining alternative ways for people to contribute to Wikipedia’s contents, but on providing support tools that enhance the editing and social activity of contributors and on reusing Wikipedia’s data in novel ways.

Contents generated via third-party services would not be managed by WMF, but the OAuth usage terms will require that data produced by any such service be made available under open licenses to the Wikimedia community and respect the privacy terms established by WMF. Individual editors will have complete control over which OAuth-based applications to enable (and for how long) and which to switch off or ban indefinitely. OAuth implementations for MediaWiki already exist, so no major technical effort would be needed to implement the basic infrastructure.
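The token model described above (a token tied to one consumer, a set of resources and a limited duration, revocable by the user) can be sketched in a few lines of Python. This is only an illustration of the authorization idea, not the OAuth protocol itself: it omits the request signing and the token-exchange handshake, and the names `AccessToken`, `TokenStore`, `reputation-app` and `read:contributions` are hypothetical.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, FrozenSet, Iterable


@dataclass(frozen=True)
class AccessToken:
    """A scoped, time-limited grant (illustrative, not an OAuth wire format)."""
    value: str
    consumer: str                 # the third-party site the token was issued to
    resources: FrozenSet[str]     # what the consumer may access
    expires_at: datetime          # when the grant lapses


class TokenStore:
    """Keeps the grants a user has issued and answers access checks."""

    def __init__(self) -> None:
        self._tokens: Dict[str, AccessToken] = {}

    def grant(self, consumer: str, resources: Iterable[str],
              lifetime: timedelta, now: datetime) -> AccessToken:
        token = AccessToken(secrets.token_urlsafe(16), consumer,
                            frozenset(resources), now + lifetime)
        self._tokens[token.value] = token
        return token

    def revoke(self, value: str) -> None:
        # The user switches an application off; unknown tokens are ignored.
        self._tokens.pop(value, None)

    def is_allowed(self, value: str, consumer: str,
                   resource: str, now: datetime) -> bool:
        token = self._tokens.get(value)
        return (token is not None
                and token.consumer == consumer
                and resource in token.resources
                and now < token.expires_at)


store = TokenStore()
issued_at = datetime(2011, 8, 4, 12, 0)
token = store.grant("reputation-app", {"read:contributions"},
                    timedelta(hours=2), issued_at)

# Within the two-hour window, read access to the granted resource succeeds.
print(store.is_allowed(token.value, "reputation-app",
                       "read:contributions", issued_at + timedelta(hours=1)))
```

Because the token carries only read scopes, a request for a resource outside the grant, a request after expiry, or any request after `revoke` is refused without the user's password ever being shared with the third-party application.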

We submit that the combination of these two technologies, by allowing fine-grained user authorization and by exposing rich editor data, will enable the creation of a potentially large number of consumer services and research tools without the need to change the core technology currently running the project. We will showcase in this presentation several examples of potential applications based on this lightweight approach to WAAP. We will discuss the implications of this model for the future of Wikimedia and how it may help meet some of the strategic goals for the project in the coming years. In particular, we will discuss how WAAP may help unlock the development of innovative technology and decouple it from the maintenance and constant improvement of the main platform used by Wikimedia projects.

Track (People and Community/Knowledge and Collaboration/Infrastructure)

Infrastructure

Will you attend Wikimania if your submission is not accepted?

Yes

Slides or further information (optional)
External references
WAAP proposals from Strategy Wiki
References on OAuth and Wikilytics


Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest. Sign with four tildes. (~~~~).

  1. Phoebe 00:18, 29 April 2011 (UTC)
  2. Erik Zachte 14:13, 29 April 2011 (UTC)
  3. Eloquence 03:38, 1 May 2011 (UTC)
  4. Vibhijain 13:07, 4 May 2011 (UTC)
  5. iopensa 05:26, 6 May 2011 (UTC)
  6. Ryan lane 19:42, 10 May 2011 (UTC)
  7. Mietchen 02:47, 27 May 2011 (UTC)
  8. Nemo 22:46, 12 June 2011 (UTC)
  9. Amir E. Aharoni
  10. Roy Emanuel 07:01, 30 June 2011 (UTC)