Predictive data mining in a collaborative editing system: the Wikipedia articles for deletion process.

K-REx Repository

Ashok, Ashish Kumar 2011-08-15T14:24:56Z 2011-08-15T14:24:56Z 2011-08-15
dc.description.abstract In this thesis, I examine the Articles for Deletion (AfD) system in Wikipedia, a large-scale collaborative editing project. Articles in Wikipedia can be nominated for deletion by registered users, who are expected to cite criteria for deletion from the Wikipedia deletion policy. For example, an article can be nominated for deletion if it contains copyright violations, vandalism, or advertising or other spam without relevant content. Articles whose subject matter does not meet the notability criteria, or whose content is otherwise unsuitable for an encyclopedia, are also subject to deletion. The AfD page for an article is where Wikipedians (users of Wikipedia) discuss whether the article should be deleted. Listed articles are normally discussed for at least seven days, after which the deletion process proceeds based on community consensus. The page may then be kept, merged or redirected, transwikied (i.e., copied to another Wikimedia project), renamed or moved to another title, userfied (i.e., migrated to a user subpage), or deleted per the deletion policy. Users can vote to keep, delete or merge the nominated article, and these votes can be viewed on the article’s AfD page. However, this polling does not necessarily determine the outcome of the AfD process; in fact, Wikipedia policy specifically stipulates that a vote tally alone should not be considered a sufficient basis for a decision to delete or retain a page. In this research, I apply machine learning methods to determine how the final outcome of an AfD process is affected by factors such as the difference between versions of an article, the number of edits, and the number of disjoint edits (according to some contiguity constraints). My goal is to predict the outcome of an AfD by analyzing the AfD page and the editing history of the article.
The technical objectives are to extract features from the AfD discussion and from the version history, as shown on the edit history page, that capture factors such as those discussed above, can be tested for relevance, and provide a basis for inductive generalization over past AfDs. Applications of such feature analysis include prediction and recommendation, with the performance goal of improving the precision and recall of AfD outcome prediction. en_US
dc.language.iso en_US en_US
dc.publisher Kansas State University en
dc.subject WEKA en_US
dc.subject LibSVM en_US
dc.subject J48 en_US
dc.subject Perceptron en_US
dc.title Predictive data mining in a collaborative editing system: the Wikipedia articles for deletion process. en_US
dc.type Thesis en_US Master of Science en_US
dc.description.level Masters en_US
dc.description.department Department of Computing and Information Sciences en_US
dc.description.advisor William H. Hsu en_US
dc.subject.umi Computer Science (0984) en_US 2011 en_US August en_US
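
The abstract names three edit-history factors (difference between versions, number of edits, number of disjoint edits) and two evaluation measures (precision and recall). The following is a minimal sketch of how such quantities might be computed, not the thesis's actual implementation. The input format (a list of revision texts, oldest first), the function names, the use of Python's `difflib`, and the reading of a "disjoint edit" as one contiguous changed word span between the first and last revision are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def edit_features(revisions):
    """Compute illustrative edit-history features.

    `revisions` is a hypothetical input: a list of article texts, oldest first.
    """
    first = revisions[0].split()
    last = revisions[-1].split()
    matcher = SequenceMatcher(None, first, last, autojunk=False)
    # Assumption: each non-"equal" opcode is one contiguous changed region,
    # used here as a stand-in for a "disjoint edit".
    changed = [op for op in matcher.get_opcodes() if op[0] != "equal"]
    return {
        "num_edits": len(revisions) - 1,
        "version_difference": 1.0 - matcher.ratio(),
        "num_disjoint_edits": len(changed),
    }


def precision_recall(predicted, actual, positive="delete"):
    """Precision and recall of predicted AfD outcomes for one outcome class."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up toy data, for demonstration only.
    print(edit_features(["a b c d", "a b c d e", "a x c d e"]))
    print(precision_recall(["delete", "delete", "keep", "keep"],
                           ["delete", "keep", "delete", "keep"]))
```

Feature dictionaries of this kind could then be exported to a learning tool such as those listed under the subject fields above; the export step is omitted here because the record does not describe it.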
