Itemset size-sensitive interestingness measures for association rule mining and link prediction

K-REx Repository

Aljandal, Waleed A. 2009-02-13T17:20:42Z 2009-02-13T17:20:42Z 2009-02-13T17:20:42Z
dc.description.abstract Association rule learning is a data mining technique that can capture relationships between pairs of entities in different domains. The goal of this research is to discover factors from data that can improve the precision, recall, and accuracy of association rules found using interestingness measures and frequent itemset mining. Such factors can be calibrated using validation data and applied to rank candidate rules in domain-dependent tasks such as link existence prediction. In addition, I use interestingness measures themselves as numerical features to improve link existence prediction. The focus of this dissertation is on developing and testing an analytical framework for association rule interestingness measures, to make them sensitive to the relative size of itemsets. I survey existing interestingness measures and then introduce adaptive parametric models for normalizing and optimizing these measures, based on the size of itemsets containing a candidate pair of co-occurring entities. The central thesis of this work is that in certain domains, the link strength between entities is related to the rarity of their shared memberships (i.e., the size of itemsets in which they co-occur), and that a data-driven approach can capture such properties by normalizing the quantitative measures used to rank associations. To test this hypothesis under different levels of variability in itemset size, I develop several test bed domains, each containing an association rule mining task and a link existence prediction task. The definitions of itemset membership and link existence in each domain depend on its local semantics. 
My primary goals are: to capture quantitative aspects of these local semantics in normalization factors for association rule interestingness measures; to represent these factors as quantitative features for link existence prediction; to apply them to significantly improve precision and recall in several real-world domains; and to build an experimental framework for measuring this improvement, using information theory and classification-based validation. en
dc.language.iso en_US en
dc.publisher Kansas State University en
dc.subject Data Mining en
dc.subject Association Rule en
dc.subject Interestingness Measures en
dc.subject Link Prediction en
dc.title Itemset size-sensitive interestingness measures for association rule mining and link prediction en
dc.type Dissertation en Doctor of Philosophy en
dc.description.level Doctoral en
dc.description.department Department of Computing and Information Sciences en
dc.description.advisor William H. Hsu en
dc.subject.umi Computer Science (0984) en 2009 en May en
