Mining Sequential Patterns
Abstract
We are given a large database of customer transactions,
where each transaction consists of customer-id,
transaction time, and the items bought in the transaction.
We introduce the problem of mining sequential
pattrriis over such databases. We present three algorithms
to solve this problem, and empirically evaluate
their performance using synthetic data. Two of
the proposed algorithms, AprioriSome and Apriori-
All, have comparable performance, albeit AprioriSome
performs a little better when the minimum number
of customers that must support a sequential pattern
is low. Scale-up experiments show that both AprioriSome
and AprioriAll scale linearly with the number
of customer transactions. They also have excellent
scale-up properties with respect to the number of
transactioiis per customer and the number of items in
a t ransart ion.
References
[l] R. Agrawal, T. Imielinski, and A. Swami. Mining
association rules between sets of items in large
tlatahases. In Proc. of the ACM SIGMOD Conffrritcc
on Management of Data, pages 207-216,
Washington, D.C., May 1993.
[2] 11. Agrawal and R. Srikant. Fast algorithms
for inining aqsociation rules. In Proc. of the
L'L DD Conje re n ce, Santiago, Chile, September
1994. Expanded version available as IBM Rewarcli
Rcyort. R.19839, June 1004.
[3] R. Agrawal and R. Srikant. Mining sequential
patltrns. Research Report RJ 9910, IBM Aliiiatlen
Research Center, San Jose, California, October
1994.
[4] S. Alt.schul, W. Gish, W. Miller, E. Myers, and
D. Lipman. A basic local alignment search tool.
Journal of Molecular Biology, 1990.
[5] A. Califano and I. Rigoutsos. Flash: A fast lookup
algorithm for string homology. In Proc. of the
Is/ litternatzonal Cotitrerelice on Intelligent Sys-
Icitts for Molecular Biology, Bethesda, MD, July
1093.
[6] T. G. Dietterich and R. S. Michalski. Discovering
patterns in sequences of events. Artificial Intelligence,
25:187-232, 1985.
[7] L. Hui. Color set size problem with applications
to string matching. In A. Apostolico,
M. Crochemere, Z. Galil, and U. Manber, editors,
Combinatorial Pattern Matching, LNCS
644, pages 230-243. Springer-Verlag, 1992.
[8] M. Roytberg. A search for common patterns in
Computer Applications in the many sequences.
Biosciences, 8( 1):57-64, 1992.
[9] M. Vingron and P. Argos. A fast and sensitive
multiple sequence alignment algorithm. Computer
Applications in the Biosciences, 5:115-122,
1989.
[lo] J. T.-L. Wang, G.-W. Chirn, T. G. Marr,
B. Shapiro, D. Shasha, and K. Zhang. Combinatorial
pattern discovery for scientific data: Some
preliminary results. In Proc. of the ACM SIGMOD
Conference on Management of Data, Minneapolis,
May 1994.
[ll] M. Waterman, editor. Mathematical Methods for
DNA Sequence Analysis. CRC Press, 1989.
[12] S. Wu and U. Manber. Fast text searching al-
Communications of the ACM, lowing errors.
35(10):83-91, October 1992.
沒有留言:
張貼留言