4.1. ClosedPROWL Algorithm Architecture
Ref
http://thesis.lib.ncu.edu.tw/ETD-db/ETD-search-c/view_etd?URN=91522018
Thesis/Dissertation 91522018: Detailed Record
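Name: 林國瑞 (Kuo-Zui Lin)
Email: kuozui@db.csie.ncu.edu.tw
Department: Computer Science and Information Engineering
Degree: Master
Graduation term: Academic year 92 (ROC calendar), second semester
Title (Chinese): 時序資料庫中緊密頻繁連續事件型樣之有效探勘
Title (English): ClosedPROWL: Efficient Mining of Closed Frequent Continuities in Temporal Databases
File: 91522018.pdf
Notice: The electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.
Access rights: Approved for immediate open access
Language/Pages: Chinese/40
Statistics: This thesis has been viewed 395 times and downloaded 193 times.
Abstract (Chinese, translated): Pattern mining has long been an important topic in data mining. Early research, such as frequent itemset mining, mainly sought associations among items within the same transaction. More recently, to predict and analyze behavioral trends in databases more effectively, researchers have turned to mining inter-transaction associations, which describe relationships among items in different transactions. A continuity is one kind of inter-transaction association pattern; it explicitly describes the relative positions and ordering of different transactions. Because continuities cross the boundaries between transaction records, the number of potential patterns and rules grows sharply, which both lowers the overall efficiency of the algorithm and makes the mining results hard to use. We therefore choose to mine closed frequent continuities. Closed frequent continuities are a representative subset of the frequent continuities: they are comparatively few, and all frequent continuities can be enumerated by expanding them, so they eliminate redundant information without losing completeness. In this thesis we propose an efficient algorithm, ClosedPROWL, which uses the projected window list technique to mine closed frequent continuities. Experimental results show that, on both synthetic and real-world datasets, our algorithm performs and scales better than previous methods.
Abstract (English): Mining frequent patterns in temporal databases is a fundamental problem in data mining. Over the past few years, a considerable number of studies have addressed frequent itemset mining, which considers only relationships among items in the same transaction. Recently, researchers have begun to focus on inter-transaction associations, which describe association relationships among different transactions. A continuity is a kind of inter-transaction association that describes definite temporal relationships among different transactions. Since continuities break the barrier of transactions, the number of potential patterns increases drastically. An alternative is to mine closed frequent continuities. Mining closed frequent patterns has the same power as mining the complete set of frequent patterns, while substantially reducing the number of redundant rules generated and increasing the effectiveness of mining. In this paper, we propose an efficient algorithm, ClosedPROWL, for mining closed frequent continuities using the projected window list technique. Experimental evaluation on both real-world and synthetic datasets shows that our algorithm is more efficient and scalable than previously proposed algorithms.
Keywords: Pattern Mining; Closed Frequent Continuities; Inter-Transaction Association Mining; Data Mining
Table of Contents
Chapter 1 Introduction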
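1.1. Research Motivation and Objectives
1.2. Contributions
1.3. Thesis Organization
Chapter 2 Related Work
2.1. Frequent Episode Mining
2.1.1. WINEPI Algorithm
2.1.2. MINEPI Algorithm
2.2. Periodic Pattern Mining
2.2.1. LSI Algorithm
2.2.2. SMCA Algorithm
2.3. Frequent Continuity Mining
2.3.1. FITI Algorithm
Chapter 3 Problem Definition
Chapter 4 The ClosedPROWL Algorithm
4.1. ClosedPROWL Algorithm Architecture
4.2. Mining Closed Frequent Event Sets
4.3. Encoding Closed Frequent Event Sets and Database Transformation
4.4. Mining Closed Frequent Continuities
4.4.1. Mining Procedure
4.4.2. Search Space Pruning Techniques
4.4.3. Closure Checking Mechanism for Continuities
4.4.4. Worked Example
4.5. Correctness Analysis of the ClosedPROWL Algorithm
Chapter 5 Experimental Results
5.1. Synthetic Data
5.1.1. Data Generator Description
5.1.2. Performance and Scalability Analysis
5.2. Real World Data
Chapter 6 Conclusion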
References
Advisor: 張嘉惠 (Chia-Hui Chang)
Oral defense date: 2004-07-09
Submission date: 2004-07-15