2009-03-04

DM Dataset

http://www.kdnuggets.com/

*
Datasets

*

Datasets for Data Mining

*

KDD Cup and Workshop 2007

Co-organized by ACM SIGKDD and Netflix

To be held at KDD-2007, San Jose, California, Aug 12, 2007

http://www.cs.uic.edu/~liub/Netflix-KDD-Cup-2007.html#download


3. Obtaining the Training Dataset and the Qualifying Answer Sets

The Netflix Prize training dataset is available for download from the Netflix Prize website. You must register separately at that site to download the training dataset, even if you elect not to enter the Netflix Prize contest itself. The format of the training data is described on the Netflix Prize website and in the training dataset file. No additional training data will be provided. The qualifying answer sets can be downloaded from the links on the KDD Cup page given above. The user_ids and movie_ids are taken from the Netflix Prize training dataset.

*

Welcome to the UCI Knowledge Discovery in Databases Archive

This is an online repository of large data sets which encompasses a wide variety of data types, analysis tasks, and application areas. The primary role of this repository is to enable researchers in knowledge discovery and data mining to scale existing and future data analysis algorithms to very large and complex data sets.

Creation of this archive was supported by a grant from the Information and Data Management Program at the National Science Foundation. The archive is intended to serve as a permanent repository of publicly accessible data sets for research in KDD and data mining. It complements the original UCI Machine Learning Archive, which typically focuses on smaller classification-oriented data sets.

In addition to storing data and description files, we also archive task files that describe a specific analysis, such as clustering or regression, for the data sets stored. The call for data sets lists typical data types and tasks of interest.

http://kdd.ics.uci.edu/

*

Welcome to the UC Irvine Machine Learning Repository!

We currently maintain 177 data sets as a service to the machine learning community. You may view all data sets through our searchable interface. Our old web site is still available, for those who prefer the old format. For a general overview of the Repository, please visit our About page. For information about citing data sets in publications, please read our citation policy. If you wish to donate a data set, please consult our donation policy. For any other questions, feel free to contact the Repository librarians. We have also set up a mirror site for the Repository.

http://archive.ics.uci.edu/ml/

*

Frequent Itemset Mining Dataset Repository

The following two datasets were generated using the generator from the IBM Almaden Quest research group. This generator can be downloaded from their website. Another implementation, which can be compiled with g++, can be downloaded from Paolo Palmerini's website.

The following datasets were prepared by Roberto Bayardo from the UCI datasets and PUMSB.

The next dataset was provided by Ferenc Bodon and contains (anonymized) click-stream data from a Hungarian online news portal.

Three datasets that were used for the KDD Cup 2000 are available. They are described in the paper "Real world performance of association rule algorithms" by Zheng, Kohavi and Mason. Before you can download these datasets, you are required to click through an agreement, after which you receive a password that allows you to download them.

The following dataset was donated by Tom Brijs and contains (anonymized) retail market basket data from an anonymous Belgian retail store. The data are provided 'as is'. Any use of the data is allowed as long as proper acknowledgment is given and a copy of the work is provided to Tom Brijs. More details are available on the repository page.

The following dataset was donated by Karolien Geurts and contains (anonymized) traffic accident data. The data are provided 'as is'. Any use of the data is allowed as long as proper acknowledgment is given and a copy of the work is provided to Karolien Geurts. More details are available on the repository page.

The following dataset was donated by Claudio Lucchese, Salvatore Orlando, Raffaele Perego, and Fabrizio Silvestri and was built from a spidered collection of web HTML documents. More details are available on the repository page.
http://fimi.cs.helsinki.fi/data/
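
For anyone who wants to try one of these transaction datasets, here is a minimal Python sketch of loading a file and counting single-item support. It assumes a plain-text layout with one transaction per line and whitespace-separated item identifiers; the excerpt above does not state this, so check it against the file you actually download. The function names and the inline sample are hypothetical and stand in for a real dataset file.

```python
from collections import Counter
from io import StringIO


def read_transactions(lines):
    """Parse one transaction per line (assumed layout: whitespace-separated items)."""
    transactions = []
    for line in lines:
        items = line.split()
        if items:  # skip blank lines
            transactions.append(frozenset(items))
    return transactions


def frequent_items(transactions, min_support):
    """Return the items that occur in at least min_support transactions."""
    counts = Counter(item for t in transactions for item in t)
    return {item: n for item, n in counts.items() if n >= min_support}


# Hypothetical three-transaction sample standing in for a downloaded file.
sample = StringIO("1 2 3\n2 3\n1 3 4\n")
transactions = read_transactions(sample)
print(frequent_items(transactions, min_support=2))
```

To use a real file, pass an open file object to `read_transactions` instead of the `StringIO` sample. This only counts single items; a full frequent itemset miner such as Apriori would extend the counting to larger item combinations.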

*

Gene Expression Omnibus: a gene expression/molecular abundance repository supporting MIAME compliant data submissions, and a curated, online resource for gene expression data browsing, query and retrieval.

Public data
GPL Platforms      5674
GSM Samples      288390
GSE Series        11381
Total            305445


http://www.ncbi.nlm.nih.gov/geo/

***

