Itemset size-sensitive interestingness measures for association rule mining and link prediction

dc.contributor.author: Aljandal, Waleed A.
dc.date.accessioned: 2009-02-13T17:20:42Z
dc.date.available: 2009-02-13T17:20:42Z
dc.date.graduationmonth: May
dc.date.issued: 2009-02-13T17:20:42Z
dc.date.published: 2009
dc.description.abstract: Association rule learning is a data mining technique that can capture relationships between pairs of entities in different domains. The goal of this research is to discover factors from data that can improve the precision, recall, and accuracy of association rules found using interestingness measures and frequent itemset mining. Such factors can be calibrated using validation data and applied to rank candidate rules in domain-dependent tasks such as link existence prediction. In addition, I use interestingness measures themselves as numerical features to improve link existence prediction. The focus of this dissertation is on developing and testing an analytical framework for association rule interestingness measures, to make them sensitive to the relative size of itemsets. I survey existing interestingness measures and then introduce adaptive parametric models for normalizing and optimizing these measures, based on the size of itemsets containing a candidate pair of co-occurring entities. The central thesis of this work is that in certain domains, the link strength between entities is related to the rarity of their shared memberships (i.e., the size of itemsets in which they co-occur), and that a data-driven approach can capture such properties by normalizing the quantitative measures used to rank associations. To test this hypothesis under different levels of variability in itemset size, I develop several test bed domains, each containing an association rule mining task and a link existence prediction task. The definitions of itemset membership and link existence in each domain depend on its local semantics.
My primary goals are: to capture quantitative aspects of these local semantics in normalization factors for association rule interestingness measures; to represent these factors as quantitative features for link existence prediction; to apply them to significantly improve precision and recall in several real-world domains; and to build an experimental framework for measuring this improvement, using information theory and classification-based validation.
dc.description.advisor: William H. Hsu
dc.description.degree: Doctor of Philosophy
dc.description.department: Department of Computing and Information Sciences
dc.description.level: Doctoral
dc.identifier.uri: http://hdl.handle.net/2097/1245
dc.language.iso: en_US
dc.publisher: Kansas State University
dc.subject: Data Mining
dc.subject: Association Rule
dc.subject: Interestingness Measures
dc.subject: Link Prediction
dc.subject.umi: Computer Science (0984)
dc.title: Itemset size-sensitive interestingness measures for association rule mining and link prediction
dc.type: Dissertation

Files

Original bundle
Name: WaleedAljandal2009.pdf
Size: 3.33 MB
Format: Adobe Portable Document Format