Submissions/Anti-Vandalism Research: The Year in Review

From Wikimania 2011 • Haifa, Israel
Anti-Vandalism Research: The Year in Review
Andrew G. West (west.andrew.g on
United States of America
University of Pennsylvania -- Philadelphia, PA
Vandalism has long been an issue for collaborative environments. However, technological advances are making the problem less acute by (1) undoing it automatically, or (2) reducing the human effort required to locate and revert it. The past year (since the last Wikimania, in July 2010) has seen much anti-vandalism research [1-8]. The proposed presentation will summarize these efforts, covering both practical, implemented systems [2, 5, 7] and more academic work [1, 3, 4, 6, 8].

From a practical perspective, discussion will begin with the STiki tool [5, 7]. STiki is a crowd-sourced GUI tool that presents edits from server-side "queues" to human users, streamlining the reversion and warning process. The proposed presentation will demonstrate the STiki software. Initially, STiki contained just one queue (based on machine learning over metadata [8]). However, STiki has since expanded into a general-purpose tool, accumulating nearly 50,000 reverts in the past year and integrating third-party queues and scoring systems.
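As a rough illustration of the queue idea, the following Python sketch models one server-side queue as a priority heap keyed on a vandalism score, so that the most suspicious edit is shown to a reviewer first. The names used here (`ScoredEdit`, `EditQueue`, `rev_id`, `score`) and the sample scores are hypothetical and are not taken from the STiki code base; the score could come from any scoring system.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class ScoredEdit:
    # heapq is a min-heap, so the negated score is stored as the sort key
    # to make the highest-scoring edit pop first.
    sort_key: float
    rev_id: int = field(compare=False)
    score: float = field(compare=False)


class EditQueue:
    """Hypothetical server-side queue: serves the most suspicious edit first."""

    def __init__(self):
        self._heap = []

    def push(self, rev_id, score):
        heapq.heappush(self._heap, ScoredEdit(-score, rev_id, score))

    def pop_most_suspicious(self):
        # Returns None when no edits remain to be reviewed.
        return heapq.heappop(self._heap) if self._heap else None

    def __len__(self):
        return len(self._heap)


# Illustrative (made-up) revision IDs and scores.
queue = EditQueue()
for rev_id, score in [(101, 0.12), (102, 0.87), (103, 0.45)]:
    queue.push(rev_id, score)

first = queue.pop_most_suspicious()
print(first.rev_id, first.score)
```

Under this assumption, supporting a third-party scoring system amounts to adding another queue whose scores come from a different source.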

One third-party system included in STiki is ClueBot-NG [2], which leverages neural-network learning over many edit properties. Where confident, ClueBot-NG automatically reverts edits in a bot-like fashion (with nearly 250,000 reverts in six months). However, in some cases (e.g., confidence below threshold) it instead feeds edits to the STiki interface. This combination of autonomous reversion and prioritized human inspection has proven an effective model on which future efforts intend to build. In fact, the model's success has led to recent proposals to integrate these techniques more tightly into wiki infrastructure. Suggestions include: (1) using vandalism scores to optimize "pending changes" protections, and (2) creating more informative "watchlists". The proposed presentation will report on (and possibly demonstrate) such proposals.
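The split between autonomous reversion and prioritized human inspection can be sketched as a simple triage step. This is a minimal sketch assuming a single confidence threshold; the function name `triage`, the value 0.95, and the sample scores are illustrative assumptions, not ClueBot-NG's actual logic or settings.

```python
def triage(scored_edits, revert_threshold=0.95):
    """Split (rev_id, score) pairs into auto-reverts and a human review list.

    revert_threshold is an illustrative value, not one used by ClueBot-NG.
    """
    auto_revert = []
    human_review = []
    for rev_id, score in scored_edits:
        if score >= revert_threshold:
            # Confident enough to revert without human involvement.
            auto_revert.append(rev_id)
        else:
            # Below threshold: hand the edit to human reviewers.
            human_review.append((rev_id, score))
    # Reviewers see the most suspicious remaining edits first.
    human_review.sort(key=lambda pair: pair[1], reverse=True)
    return auto_revert, human_review


# Illustrative (made-up) revision IDs and scores.
edits = [(201, 0.99), (202, 0.60), (203, 0.10), (204, 0.97), (205, 0.80)]
auto, review = triage(edits)
print(auto)
print(review)
```

Raising the threshold in such a scheme trades fewer automatic reverts (and fewer false positives) for a longer human review list.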

Moving forward, the next generation of on-wiki systems will likely be based on current academic efforts. In mid-2010, a vandalism-detection workshop was organized at the CLEF conference [4]. The associated competition drew a diverse set of anti-vandalism contributions, and a natural-language processing (NLP) approach won. Later, this NLP approach was combined with a reputation-driven system (WikiTrust, which placed second in the competition) and a metadata-driven detector [8]. The resulting joint work [1] examined how the diverse feature sets performed in combination, finding that the varied feature types caught *unique* vandalism and establishing a new performance baseline. The proposed presentation will provide intuition about each sub-system and discuss ongoing efforts to bring the technique online.

Finally, the immediate future promises additional anti-vandalism research. A second edition of the CLEF competition is scheduled for mid-2011 [3] (prior to Wikimania). Uniquely, the 2011 version will include foreign-language edits and will permit *future evidence* to be used to locate vandalism. The latter functionality is especially relevant when creating static copies of Wikipedia content (e.g., the Wikipedia 1.0 Project). The proposed presentation will close by reporting on the results of this competition/workshop.

Wiki Infrastructure and Technology
Yes. Thank you to the committee for a partial scholarship.
[1] B. Adler, L. de Alfaro, S.M. Mola-Velasco, P. Rosso, and A.G. West. Wikipedia vandalism detection: Combining natural language, metadata, and reputation features. In CICLing’11: Proceedings of the 12th International Conference on Intelligent Text Processing and Computational Linguistics, LNCS 6609, pages 277–288, February 2011.

[2] C. Breneman and C. Carter. ClueBot NG.

[3] PAN 2011 Lab Uncovering Plagiarism, Authorship, and Social Software Misuse.

[4] M. Potthast, B. Stein, and T. Holfeld. Overview of the 1st International competition on Wikipedia vandalism detection. In Notebook Papers of CLEF 2010 LABs and Workshops, 2010.

[5] A.G. West. STiki: A vandalism detection tool for Wikipedia.

[6] A.G. West, J. Chang, K. Venkatasubramanian, and I. Lee. Trust in Collaborative Web Applications. To appear in Future Generation Computer Systems, special section on Trusting Software Behavior, Elsevier Press, 2011. (A preliminary version was published as University of Pennsylvania Technical Report MS-CIS-10-33, 2010).

[7] A.G. West. Spatio-Temporal Analysis of Revision Metadata and the STiki Anti-Vandalism Tool. Presented at Wikimania 2010.

[8] A.G. West, S. Kannan, and I. Lee. Detecting Wikipedia vandalism via spatio-temporal analysis of revision metadata. In EUROSEC’10: Proceedings of the Third European Workshop on System Security, pages 22–28, April 2010.

  1. West.andrew.g 03:31, 23 April 2011 (UTC) - By default!
  2. --Tobias 16:53, 24 April 2011 (UTC)
  3. Steven (WMF) 22:35, 25 April 2011 (UTC)
  4. Andrew Garrett
  5. Blahma 12:48, 1 May 2011 (UTC)
  6. DarTar 20:53, 5 May 2011 (UTC)
  7. Vibhijain 07:16, 8 May 2011 (UTC)
  8. M7 22:01, 8 May 2011 (UTC)
  9. It would be especially interesting to know how this can be expanded to other wikis/languages and how this relates to Submissions/Collaborative_Watchlist. Nemo 21:54, 12 June 2011 (UTC)
  10. Maniago 09:34, 9 July 2011 (UTC)
  11. Krinkle 10:29, 5 August 2011 (UTC)