Email and phone number entity search and ranking

Date

2008-12-01T20:41:17Z

Publisher

Kansas State University

Abstract

Entity search has been proposed as a search method for domain-specific Internet applications. It differs from the classical approach used by search engines, which return a "page-view result": a list of the URLs of web pages containing the desired keywords. Entity search returns more structured results that list the specific information a user seeks, such as an email address or a phone number. It provides not only URL links to the targets but also attributes of the target entities (e.g., email address, phone number). Compared to classical search methods, entity search is a more direct and user-friendly way to search a large volume of web documents. After the user submits a query, the extracted entities are ordered by their relevance to the query. While previous work has proposed various complex formulas for entity ranking, it has not been shown whether such complexity is needed. In this research, I explore whether a simpler method can achieve reasonable results. I have designed an entity search and ranking algorithm that uses a formula combining a page's PageRank with an entity's distance to the query keywords to produce a metric for ranking discovered entities. My research goal is to determine whether effective entity ranking can be performed by an algorithm that computes matching scores specific to the entity search domain, and what improvements are necessary to refine the results. My approach takes into account the entity's proximity to the keywords in the query as well as the quality of the page that contains it. I implemented a system based on the algorithm and performed experiments, which show that in most cases the results are consistent with the user's desired outcome.
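
The abstract names the two ingredients of the ranking metric (a page's PageRank and an entity's distance to the query keywords) but does not state the formula. The following is a minimal sketch of how such a combination might look, not the thesis's actual method. It assumes a hypothetical score of PageRank divided by one plus the token distance to the nearest keyword; the names `rank_emails`, `token_distance`, `RankedEntity`, the email pattern, and the precomputed PageRank values passed in are all illustrative assumptions.

```python
import re
from dataclasses import dataclass

# Simplified email pattern, assumed for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass
class RankedEntity:
    entity: str   # extracted email address
    url: str      # page on which it was found
    score: float  # hypothetical ranking score


def token_distance(tokens, entity_index, keywords):
    """Return the smallest token distance from the entity to any query
    keyword on the page, or None if no keyword occurs."""
    positions = [
        i for i, tok in enumerate(tokens)
        if tok.lower().strip(".,;:()") in keywords
    ]
    if not positions:
        return None
    return min(abs(i - entity_index) for i in positions)


def rank_emails(pages, query):
    """Rank email entities found in `pages` against `query`.

    `pages` is an iterable of (url, text, pagerank) tuples; the PageRank
    values are assumed to be computed elsewhere.
    """
    keywords = {word.lower() for word in query.split()}
    results = []
    for url, text, pagerank in pages:
        tokens = text.split()
        for index, token in enumerate(tokens):
            match = EMAIL_RE.search(token)
            if match is None:
                continue
            distance = token_distance(tokens, index, keywords)
            if distance is None:
                continue  # page never mentions the query keywords
            # Assumed combination: page quality damped by keyword distance.
            score = pagerank / (1.0 + distance)
            results.append(RankedEntity(match.group(0), url, score))
    return sorted(results, key=lambda r: r.score, reverse=True)
```

Called with a query such as "Jane Doe", the function returns entities ordered so that an address appearing close to the keywords on a moderately ranked page can outrank an address far from the keywords on a highly ranked page. Other weightings of the two signals are equally plausible under the abstract's description.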

Keywords

Entity Search, Entity Ranking

Graduation Month

December

Degree

Master of Science

Department

Department of Computing and Information Sciences

Major Professor

William H. Hsu

Date

2008

Type

Thesis
