2008-10-01

Effective Pruning Strategies for Sequential Pattern Mining

Effective Pruning Strategies for Sequential Pattern Mining

Xu Yusheng Ma Zhixin Li Lian
School of Information Science and Technology
Lanzhou University, China
e-mail: f xuyusheng, mazhx,lilg@ lzu.edu.cn

Tharam S. Dillon
School of Information System
Curtin University, Perth, Australia
e-mail: tharam.dillon@cbs.curtin.edu.au

Abstract—In this paper, we systematically explore the search
space of frequent sequence mining and present two novel
pruning strategies, SEP (Sequence Extension Pruning) and
IEP (Item Extension Pruning), which can be used in all
Apriori-like sequence mining algorithms or lattice-theoretic
approaches. With a little more memory overhead, proposed
pruning strategies can prune invalidated search space and
decrease the total cost of frequency counting effectively. For
effectiveness testing reason, we optimize SPAM [2] and present
the improved algorithm, SPAMSEPIEP, which uses SEP and
IEP to prune the search space by sharing the frequent 2-
sequences lists. A set of comprehensive performance experiments
study shows that SPAMSEPIEP outperforms SPAM by
a factor of 10 on small datasets and better than 30% to 50%
on reasonably large dataset.

沒有留言: