Resources: Supplementary Readings
Part I: Reference Materials for CS412
Introduction
- U.
M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy, Advances in Knowledge Discovery and
Data Mining. The MIT Press, 1996.
- J.
Han and M. Kamber. Data Mining:
Concepts and Techniques. Morgan Kaufmann, 2000.
- R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2ed.,
Wiley-Interscience, 2001.
- U.
Fayyad, G. Grinstein, and A. Wierse, Information Visualization in Data Mining
and Knowledge Discovery, Morgan Kaufmann, 2001
- T.
Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data
Mining, Inference, and Prediction, Springer-Verlag, 2001
- I.
H. Witten and E. Frank, Data Mining: Practical Machine Learning
Tools and Techniques with Java Implementations, Morgan Kaufmann, 2001
- V.
Ganti, J. Gehrke, R. Ramakrishnan. Mining very
large databases. COMPUTER, 32(8):38-45, 1999.
- S.
Chaudhuri, U. Dayal, and V. Ganti, Database
Technology for Decision Support Systems. Computer, 34(12):48-55,
Dec. 2001.
Data Preprocessing
- T. Dasu and T. Johnson, Exploratory Data Mining and Data
Cleaning, John Wiley & Sons, Inc., New Jersey, 2003.
- D. Barbará et al. The New Jersey Data Reduction Report. Bulletin of the
Technical Committee on Data Engineering, 20, Dec. 1997, pp. 3-45.
- H. Liu, F. Hussain, C. L. Tan, and M. Dash. Discretization:
An enabling technique. Data Mining and Knowledge Discovery, 6(4):
393-423, 2002.
- V.
Raman and J. M. Hellerstein. Potter's
Wheel: An Interactive Data Cleaning System, Proc. 2001 Int. Conf. on
Very Large Data Bases (VLDB'01), Rome, Italy, pp. 381-390, Sept. 2001.
- H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.-A. Saita. Declarative
Data Cleaning: Language, Model, and Algorithms. Proc. 2001 Int. Conf. on
Very Large Data Bases (VLDB'01), Rome, Italy, pp. 371-380, Sept. 2001.
- D.
Pyle. Data Preparation for Data Mining. Morgan
Kaufmann, 1999.
- T.
Dasu, T. Johnson, S. Muthukrishnan, V.
Shkapenyuk. Mining
Database Structure; Or, How to Build a Data Quality Browser. Proc. 2002
ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'02), Madison, WI, pp. 240-251, June 2002.
Data Warehouse, OLAP, and Data Generalization
- R.
Kimball. The Data Warehouse Toolkit,
2ed, John Wiley & Sons, New York, 2002.
- S.
Chaudhuri, and U. Dayal. An
overview of data warehousing and OLAP technology. ACM SIGMOD
Record, 26(1):65-74, 1997.
- J.
Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F.
Pellow, and H. Pirahesh. Data
cube: A relational aggregation operator generalizing group-by, cross-tab and
sub-totals. Data Mining and Knowledge Discovery, 1(1):29-54, 1997.
- V. Harinarayan, A. Rajaraman, and J. D. Ullman.
Implementing data cubes efficiently. In SIGMOD'96, pp. 205-216,
Montreal, Canada, June 1996.
- S.
Agarwal, R. Agrawal, P. M. Deshpande, A. Gupta, J. F. Naughton, R.
Ramakrishnan, and S. Sarawagi. On the
computation of multidimensional aggregates. In Proc. 1996 Int. Conf.
Very Large Data Bases (VLDB'96), pp. 506-521, Bombay, India, Sept. 1996.
- Y.
Zhao, P. M. Deshpande, and J. F. Naughton. An
array-based algorithm for simultaneous multidimensional aggregates. In
SIGMOD'97, pp. 159-170, Tucson, Arizona, May 1997.
- R. Agrawal, A. Gupta, and S. Sarawagi. Modeling
multidimensional databases. In Proc. 1997 Int. Conf. Data Engineering
(ICDE'97), Birmingham, England, April 1997.
- J. Han, Y. Cai, and N. Cercone. Knowledge
Discovery in Databases: An Attribute-Oriented Approach. In VLDB'92,
pp. 547-559, Vancouver, Canada, August 1992.
- S.
Sarawagi, R. Agrawal, and N. Megiddo.
Discovery-driven exploration of OLAP data cubes. In Proc. Int. Conf. of
Extending Database Technology (EDBT'98), Valencia, Spain, pp. 168-182,
March 1998.
- S. Sarawagi. Explaining
Differences in Multidimensional Aggregates. In Proc. Int. Conf. of Very
Large Data Bases (VLDB'99), pp. 42-53.
- K. A. Ross, D. Srivastava, and D.
Chatziantoniou. Complex aggregation at multiple
granularities. In EDBT'98, pp. 263-277, Valencia, Spain, March 1998.
- K.
Beyer and R. Ramakrishnan. Bottom-up
computation of sparse and iceberg cubes. In SIGMOD'99, pp.
359--370, Philadelphia,
PA, June 1999.
- J. Han. Towards
on-line analytical mining in large databases. ACM SIGMOD Record,
27:97-107, 1998.
- G.
Sathe and S. Sarawagi. Intelligent
Rollups in Multidimensional OLAP Data. In Proc. Int. Conf. of Very
Large Data Bases (VLDB'01), Rome, Italy, pp. 531-540
- J.
Han, J. Pei, G. Dong, and K. Wang. Efficient
computation of iceberg cubes with complex measures. In SIGMOD'01,
pp. 1--12, Santa Barbara,
CA, May 2001.
- G.
Dong, J. Han, J. Lam, J. Pei, and K. Wang. Mining
Multi-Dimensional Constrained Gradients in Data Cubes. In VLDB'01,
Rome, Italy, Sept. 2001.
- W. Wang, H. Lu, J. Feng, and J. X. Yu. Condensed
Cube: An Effective Approach to Reducing Data Cube Size. In Proc. 2002
Int. Conf. Data Engineering (ICDE'02), San Francisco, CA, April 2002.
- L.
V. S. Lakshmanan, J. Pei, and J. Han, Quotient
Cube: How to Summarize the Semantics of a Data Cube, Proc. 2002
Int. Conf. on Very Large Data Bases (VLDB'02), Hong Kong, China, Aug. 2002.
- D.
Xin, J. Han, X. Li, B. W. Wah, “Star-Cubing:
Computing Iceberg Cubes by Top-Down and Bottom-Up Integration”,
Proc. 2003 Int. Conf. on Very Large Data Bases (VLDB'03), Berlin, Germany, Sept. 2003.
- X.
Li, J. Han, and H. Gonzalez, “High-Dimensional
OLAP: A Minimal Cubing Approach”, Proc. 2004 Int. Conf. on Very
Large Data Bases (VLDB'04), Toronto, Canada, Aug. 2004
- Z.
Shao, J. Han, and D. Xin, “MM-Cubing:
Computing Iceberg Cubes by Factorizing the Lattice Space”, Proc. 2004 Int.
Conf. on Scientific and Statistical Database Management (SSDBM'04), Santorini Island, Greece, June 2004
Mining Frequent Patterns and Association Rules in
Large Databases
Basic concepts
- R.
Agrawal, T. Imielinski, and A. Swami.
Mining association rules between sets of items in large databases. SIGMOD'93, 207-216, Washington, D.C. (citeseer)
- H.
Mannila, H. Toivonen, and A. I. Verkamo. Efficient
algorithms for discovering association rules. KDD'94, 181-192, Seattle, WA, July 1994. (citeseer)
Efficient mining algorithms (including
efficient algorithms for mining max
and closed patterns)
- R.
Agrawal and R. Srikant. Fast
algorithms for mining association rules. In VLDB'94, pp. 487-499,
Santiago,
Chile, Sept.
1994.
- Ashoka
Savasere, Edward Omiecinski, Shamkant B. Navathe: An Efficient Algorithm for
Mining Association Rules in Large Databases. VLDB 1995: 432-444. (citeseer)
- J.S.
Park,
M.S. Chen, and P.S. Yu. An effective hash-based algorithm for mining
association rules. SIGMOD'95, San
Jose, CA, May
1995. (citeseer)
- D.W.
Cheung, J. Han, V. Ng, and C.Y. Wong. Maintenance of discovered association
rules in large databases: An incremental updating technique. ICDE'96,
New Orleans, LA. (citeseer)
- T.
Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama. Data mining using
two-dimensional optimized association rules: Scheme, algorithms, and
visualization. SIGMOD'96, Montreal, Canada. (citeseer)
- H.
Toivonen. Sampling large
databases for association rules.
VLDB'96, 134-145, Bombay, India, Sept. 1996. (citeseer)
- J. Han, J. Pei, and Y. Yin. Mining
Frequent Patterns without Candidate Generation. Proc. 2000
ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'00), Dallas, TX, May 2000.
- R.
Agarwal, C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm for
generation of frequent itemsets. In Journal of Parallel and Distributed
Computing, 2000. (citeseer)
- J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang. H-Mine:
Hyper-Structure Mining of Frequent Patterns in Large Databases, Proc.
2001 Int. Conf. on Data Mining (ICDM'01), San Jose, CA, Nov. 2001.
- Zaki and Hsiao. CHARM: An
Efficient Algorithm for Closed Itemset Mining, Proc. 2002
SIAM Int. Conf. Data Mining
(SDM'02), Arlington,
VA, pp. 457-473, April 2002.
- J.
Wang, J. Han, and J. Pei, “CLOSET+:
Searching for the Best Strategies for Mining Frequent Closed Itemsets”,
Proc. 2003 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining
(KDD'03), Washington,
D.C., Aug. 2003.
- Y. Xu, J. X. Yu, G. Liu, H. Lu, From Path Tree
To Frequent Patterns: A Framework for Mining Frequent Patterns, Proc. 2002
Int. Conf. on Data Mining (ICDM'02), Japan, Dec. 2002.
- F. Pan, G. Cong, A. K. H. Tung, J. Yang, and
M. Zaki , CARPENTER:
Finding Closed Patterns in Long Biological Datasets, Proc. 2003 ACM
SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'03), Washington,
D.C., Aug. 2003.
- G. Liu, H. Lu, Y. Xu, J. X. Yu, Ascending
Frequency Ordered Prefix-tree: Efficient Mining of Frequent Patterns,
Proc. 2003 Int. Conf. on Database Systems for Advanced Applications
(DASFAA’03), Kyoto, Japan, March
2003.
- Mohammad El-Hajj and
Osmar R. Zaïane, Inverted Matrix: Efficient Discovery of Frequent Items
in Large Datasets in the Context of Interactive Mining, in
Proc. 2003 Int'l Conf. on Data Mining and Knowledge Discovery (ACM SIGKDD),
Washington, DC, USA, August 24-27, 2003
- G.
Liu, H. Lu, W. Lou, J. X. Yu , On Computing,
Storing and Querying Frequent Patterns, Proc. 2003 ACM SIGKDD Int. Conf.
on Knowledge Discovery and Data Mining (KDD'03), Washington, D.C., Aug. 2003.
- B.
Goethals, M. Zaki: FIMI: Workshop on Frequent Itemset Mining
Implementations (An Introduction). ICDM-FIMI Workshop, Melbourne, Florida, Nov. 2003.
- Gao Cong, Anthony K.H.
Tung, Xin Xu, Feng Pan, Jiong Yang, FARMER:
Finding Interesting Rule Groups in Microarray Datasets, SIGMOD’04
Extension
of the scope: Mining multilevel,
quantitative rules, correlation and causality
- J.
Han and Y. Fu. Discovery of
multiple-level association rules from large databases. In VLDB'95,
pp. 420-431, Zürich,
Switzerland,
Sept. 1995.
- R.
Srikant and R. Agrawal. Mining
generalized association rules. In VLDB'95, pp. 407-419, Zürich, Switzerland, Sept. 1995.
- R.
Srikant and R. Agrawal. Mining
quantitative association rules in large relational tables. In
SIGMOD'96, pp. 1-12, Montreal, Canada, June 1996.
- M.J.
Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast
discovery of association rules. KDD’97. August 1997. (citeseer)
- B.
Lent, A. Swami, and J. Widom. Clustering
association rules. In ICDE'97, pp. 220-231, Birmingham, England, April 1997.
- S. Brin, R. Motwani, and C. Silverstein.
Beyond market
basket: Generalizing association rules to correlations. In
SIGMOD'97, pp. 265-276, Tucson, Arizona, May 1997.
- C.
Silverstein, S. Brin, R. Motwani, and J. Ullman. Scalable
techniques for mining causal structures.
VLDB'98, 594-605, New
York, NY. (citeseer)
- D. Tsur, J. D. Ullman, S. Abiteboul, C. Clifton, R. Motwani, and S. Nestorov.
Query flocks: A generalization of
association-rule mining. SIGMOD'98, 1-12, Seattle, Washington. (citeseer)
- Y. Aumann and Y. Lindell. A Statistical
Theory for Quantitative Association Rules Proc. 1999 Int. Conf.
Knowledge Discovery and Data Mining (KDD'99), San Diego, CA, 261-270, Aug. 1999.
- R.
J. Bayardo. Efficiently mining long patterns from databases. SIGMOD'98, 85-93,
Seattle, Washington. (citeseer)
- J.
Han, J. Wang, Y. Lu, and P. Tzvetkov, “Mining Top-K
Frequent Closed Patterns without Minimum Support”, Proc. 2002 Int.
Conf. on Data Mining (ICDM'02), Maebashi, Japan, Dec. 2002.
- A. Savasere, E. Omiecinski, S. B. Navathe, Mining
for Strong Negative Associations in a Large Database of Customer
Transactions, In ICDE'98, Orlando, Florida, Feb. 1998.
- E.
Omiecinski. Alternative Interest Measures for Mining
Associations, IEEE Trans. Knowledge and Data Engineering,
15(1):57-69, 2003.
- Y.-K. Lee, W.-Y. Kim, Y. D. Cai, and J.
Han, “CoMine:
Efficient Mining of Correlated Patterns”, Proc. 2003 Int. Conf. on Data Mining
(ICDM'03), Melbourne,
FL, Nov. 2003.
- Deepayan Chakrabarti, Spiros Papadimitriou,
Dharmendra Modha, Christos Faloutsos, Fully
Automatic Cross-Associations, Proc. 2004 ACM SIGKDD Int. Conf. on
Knowledge Discovery and Data Mining (KDD'04), Seattle, WA, Aug. 2004, pp. 79-88.
Constraint-based mining:
- R.
Srikant, Q. Vu, and R. Agrawal. Mining association rules with item
constraints. KDD'97, 67-73, Newport
Beach, California,
1997. (citeseer)
- R.
Ng, L. V. S. Lakshmanan, J. Han, and A. Pang. Exploratory
mining and pruning optimizations of constrained associations rules. In
SIGMOD'98, pp. 13-24 Seattle, Washington, June 1998.
- F.
Korn, A. Labrinidis, Y. Kotidis, and C. Faloutsos. Ratio rules: A new paradigm
for fast, quantifiable data mining. VLDB'98, 582-593, New York, NY. (citeseer)
- J.
Han, L. V. S. Lakshmanan, and R. T. Ng. Constraint-based,
multidimensional data mining. COMPUTER, 32(8): 46-50, 1999.
- Edith
Cohen, Mayur Datar, Shinji Fujiwara, Aristides Gionis, Piotr Indyk, Rajeev
Motwani, Jeffrey D. Ullman, Cheng Yang: Finding Interesting
Associations without Support Pruning. In Proc. Int. Conf. on Data
Engineering (ICDE 2000), pp. 489-499, 2000.
- R.
J. Bayardo and R. Agrawal. Mining the most interesting rules. In Proc. 1999
Int. Conf. Knowledge Discovery and Data Mining (KDD'99), pp. 145-154,
San Diego, CA, Aug. 1999. (citeseer)
- N.
Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed
itemsets for association rules. In Proc. 7th Int. Conf. Database Theory
(ICDT'99), pages 398-416, Jerusalem, Israel, Jan. 1999. (citeseer)
- J.
Pei, J. Han, and L. V. S. Lakshmanan. Mining
Frequent Itemsets with Convertible Constraints, Proc. 2001 Int.
Conf. on Data Engineering (ICDE'01), Heidelberg, Germany, April 2001.
- G.
Grahne, L. Lakshmanan, and X. Wang. Efficient mining of constrained correlated
sets. ICDE'00, 512-521, San
Diego, CA, Feb.
2000. (citeseer)
Language
primitives and applications:
- R.
Meo, G. Psaila, and S. Ceri. A new SQL-like
operator for mining association rules. In VLDB'96, pp. 122-133,
Bombay,
India, Sept.
1996.
- T.
Imielinski and A. Virmani. MSQL: a query
language for database mining. Data Mining and Knowledge Discovery, 3(4):
373-408, 1999.
- G. Dong, J. Han, J. Lam, J. Pei, K. Wang,
and W. Zou, “Mining
Constrained Gradients in Multi-Dimensional Databases”, IEEE Transactions
on Knowledge and Data Engineering, 16(5), 2004.
Classification and Prediction
- J.
R. Quinlan. Induction of decision trees. Machine Learning, 1:81-106,
1986.
- T.
M. Mitchell, Machine Learning, McGraw Hill,
1997.
- S.
M. Weiss and N. Indurkhya, Predictive Data Mining, Morgan
Kaufmann, 1998
- J.
Shafer, R. Agrawal, and M. Mehta. SPRINT: A
scalable parallel classifier for data mining. In VLDB'96, pp.
544-555, Bombay,
India, Sept.
1996.
- J. Gehrke, R. Ramakrishnan, V. Ganti. RainForest:
A framework for fast decision tree construction of large datasets. In
VLDB'98, pp. 416-427, New
York, NY, August 1998.
- J. Gehrke, V. Ganti, R. Ramakrishnan, and W.-Y. Loh, BOAT --
Optimistic Decision Tree Construction. In SIGMOD'99, Philadelphia, Pennsylvania, 1999.
- S.
K. Murthy. Automatic
construction of decision trees from data: A multi-disciplinary survey.
Data Mining and Knowledge Discovery, 2(4): 345-389, 1998.
- C.
J. C. Burges. A Tutorial
on Support Vector Machines for Pattern Recognition. Data Mining and
Knowledge Discovery, 2(2): 121-168, 1998.
- B.
Liu, W. Hsu, and Y. Ma. Integrating
Classification and Association Rule Mining. Proc. 1998 Int. Conf.
Knowledge Discovery and Data Mining (KDD'98) New York, NY, Aug. 1998.
- M. Ankerst, M. Ester, and H.-P. Kriegel.
Towards an effective cooperation of the user and the computer for
classification. In Proc. 2000 Int. Conf. Knowledge Discovery and Data
Mining (KDD'00), pages 179-188, Boston, MA,
Aug. 2000. (citeseer)
- W. Li, J. Han, and J. Pei, CMAR: Accurate
and Efficient Classification Based on Multiple Class-Association Rules,
Proc. 2001 Int. Conf. on Data Mining (ICDM'01), San Jose, CA, Nov. 2001.
- X. Yin and J. Han, “CPAR:
Classification based on Predictive Association Rules”, Proc. 2003 SIAM
Int. Conf. on Data Mining (SDM'03), San Francisco, CA, May 2003.
- H.
Yu, J. Yang, and J. Han, “Classifying
Large Data Sets Using SVM with Hierarchical Clusters”, Proc. 2003 ACM
SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'03), Washington,
D.C., Aug. 2003.
- X.
Yin, J. Han, J. Yang, and P. S. Yu, “CrossMine:
Efficient Classification across Multiple Database Relations”, Proc. 2004 Int. Conf. on Data
Engineering
(ICDE'04), Boston, MA, March 2004.
Cluster Analysis
- L.
Kaufman and P. J. Rousseeuw. Finding Groups in Data: an
Introduction to Cluster Analysis. John Wiley & Sons, 1990.
- R.
Ng and J. Han. Efficient and
effective clustering method for spatial data mining. In VLDB'94,
pp. 144-155, Santiago,
Chile, Sept.
1994.
- T.
Zhang, R. Ramakrishnan, and M. Livny. BIRCH: An
efficient data clustering method for very large databases. In
SIGMOD'96, pp. 103-114, Montreal, Canada, June 1996.
- E.
Schikuta. Grid clustering: An efficient hierarchical clustering method for
very large data sets. Proc. 1996 Int. Conf. on Pattern Recognition, 101-105.
(citeseer)
- M. Ester, H.-P. Kriegel, J. Sander, and X. Xu.
A
density-based algorithm for discovering clusters in large spatial
databases. In KDD'96, pp. 226-231, Portland, Oregon, August 1996.
- W. Wang, J. Yang, R. Muntz, STING: A Statistical Information Grid Approach to
Spatial Data Mining, VLDB’97, 1997. (citeseer)
- S.
Guha, R. Rastogi, and K. Shim. CURE: An
efficient clustering algorithm for large databases. In SIGMOD'98,
pp. 73-84, Seattle,
Washington, June 1998.
- S.
Guha, R. Rastogi, and K. Shim. ROCK: A robust
clustering algorithm for categorical attributes. In ICDE'99, pp.
512-521, Sydney,
Australia,
March 1999.
- R.
Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic
subspace clustering of high dimensional data for data mining applications.
In SIGMOD'98, pp. 94-105, Seattle, Washington, June 1998.
- Alexander
Hinneburg, Daniel A. Keim: An Efficient Approach to Clustering in Large
Multimedia Databases with Noise. KDD 1998: 58-65, 1998. (citeseer)
- G.
Sheikholeslami, S. Chatterjee, and A. Zhang. WaveCluster:
A multi-resolution clustering approach for very large spatial databases.
In VLDB'98, pp. 428-439, New
York, NY, August 1998.
- D.
Gibson, J. Kleinberg, and P. Raghavan. Clustering
categorical data: An approach based on dynamic systems. In Proc. VLDB’98. (citeseer)
- G.
Karypis, E.-H. Han, and V. Kumar. CHAMELEON: A
Hierarchical Clustering Algorithm Using Dynamic Modeling. COMPUTER,
32(8): 68-75, 1999.
- Wei Wang, Jiong Yang, Richard Muntz.
STING+: an approach to active spatial data mining. ICDE 99, pp.
116-125. 1999. (citeseer)
- M. Ankerst, M. Breunig, H.-P. Kriegel, and J.
Sander. OPTICS:
Ordering points to identify the clustering structure. In SIGMOD'99,
pp. 49-60, Philadelphia, PA, June 1999.
- V. Ganti, J. Gehrke, R. Ramakrishnan. CACTUS: Clustering Categorical Data Using
Summaries. Proc. 1999 Int. Conf. Knowledge Discovery and Data Mining
(KDD'99), San Diego, CA, 261-270, Aug. 1999. (citeseer) (Journal version: citeseer)
- M. M. Breunig, H.-P. Kriegel, R. Ng, J. Sander.
LOF: Identifying Density-Based Local Outliers. In Proc. ACM SIGMOD Int.
Conf. on Management of Data (SIGMOD 2000), Dallas, TX, 2000, pp. 93-104. (citeseer)
- A.
K. H. Tung, J. Han, L. V. S. Lakshmanan, and R. T. Ng. Constraint-Based
Clustering in Large Databases , Proc. 2001 Int. Conf. on Database
Theory (ICDT'01), London,
U.K., Jan.
2001.
- A.
K. H. Tung, J. Hou, and J. Han. Spatial
Clustering in the Presence of Obstacles , Proc. 2001 Int. Conf. on
Data Engineering (ICDE'01), Heidelberg, Germany, April 2001
- H.
Wang, W. Wang, J. Yang, and P.S. Yu. Clustering by
pattern similarity in large data sets, Proc. the ACM SIGMOD
International Conference on Management of Data (SIGMOD), Madison, Wisconsin, 2002.
- Beil F., Ester M., Xu
X.: "Frequent
Term-Based Text Clustering", Proc. 8th Int. Conf. on Knowledge Discovery
and Data Mining (KDD'02), Edmonton, Alberta,
Canada, 2002.
- L.
Parsons, E. Haque and H. Liu, Subspace
Clustering for High Dimensional Data: A Review , SIGKDD Explorations, Vol.
6(1), June 2004
- Samer Nassar, Jörg Sander, Corrine
Cheng, Incremental
and Effective Data Summarization for Dynamic Hierarchical Clustering,
SIGMOD’04
- Sugato Basu, Mikhail Bilenko, Raymond Mooney, A
Probabilistic Framework for Semi-Supervised Clustering, Proc. 2004 ACM SIGKDD Int. Conf. on
Knowledge Discovery and Data Mining (KDD'04), Seattle, WA, Aug. 2004.
Part II: Reference Materials for CS512
Stream Data Mining
- R. Motwani and P. Raghavan, Randomized Algorithms, Cambridge University Press, 1995.
- S. Babu and J. Widom. Continuous
Queries over Data Streams. SIGMOD Record, pp. 109-120, Sept. 2001.
- B. Babcock, S. Babu, M. Datar, R. Motwani and J. Widom, “Models and
Issues in Data Stream Systems”, Proc. 2002 ACM-SIGACT/SIGART/SIGMOD Int.
Conf. on Principles of Database Systems (PODS'02), Madison, WI, June 2002.
(Conference tutorial)
- M.
Garofalakis, J. Gehrke, R. Rastogi, “Querying
and Mining Data Streams: You Only Get One Look”, Tutorial at
2002 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'02), Madison, WI, June 2002.
- S.
Muthukrishnan, Data streams: algorithms and
applications, Proceedings
of the fourteenth annual ACM-SIAM symposium on Discrete algorithms,
2003.
- Y.
Chen, G. Dong, J. Han, B. W. Wah, and J. Wang, " Multi-Dimensional
Regression Analysis of Time-Series Data Streams '', Proc. 2002 Int. Conf.
on Very Large Data Bases (VLDB'02), Hong Kong, China, Aug. 2002.
- S.
Guha, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering
Data Streams, Proc. IEEE Symposium on Foundations of Computer Science
(FOCS'00), Redondo
Beach, CA, pp.
359-366, 2000
- Stratis Viglas, Jeffrey
Naughton, Rate-Based
Query Optimization for Streaming Information Sources, SIGMOD’02
- Samuel Madden, Mehul
Shah, Joseph Hellerstein, Vijayshankar Raman, Continuously
Adaptive Continuous Queries over Streams, SIGMOD’02
- Alin Dobra, Minos N. Garofalakis, Johannes Gehrke, Rajeev Rastogi,
Processing Complex Aggregate Queries over Data Streams, SIGMOD’02
- Gurmeet Singh Manku,
Rajeev Motwani. Approximate
Frequency Counts over Data Streams, VLDB’02
- Yunyue Zhu, Dennis
Shasha. StatStream: Statistical
Monitoring of Thousands of Data Streams in Real Time, VLDB’02
- J. Gehrke, F. Korn, D. Srivastava.
On
computing correlated aggregates over continuous data streams. Proc.
2001 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'01), Santa Barbara, CA, pp. 13-24, May 2001.
- Geoff Hulten, Laurie
Spencer, Pedro Domingos: Mining
time-changing data streams. KDD
2001: 97-106
- J.
Han, ``Mining
Dynamics of Data Streams in Multidimensional Space '' (in PowerPoint),
ICDM'02 Keynote Speech, Maebashi
City, Japan, Dec. 2002.
- C.
Aggarwal, J. Han, J. Wang, P. S. Yu, “A Framework
for Clustering Evolving Data Streams”, Proc. 2003 Int. Conf.
on Very Large Data Bases (VLDB'03), Berlin, Germany, Sept. 2003.
- H.
Wang, W. Fan, P. S. Yu, and J. Han, “Mining
Concept-Drifting Data Streams using Ensemble Classifiers”, Proc. 2003 ACM
SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'03), Washington,
D.C., Aug. 2003.
- C.
Giannella, J. Han, J. Pei, X. Yan and P.S. Yu, “Mining Frequent
Patterns in Data Streams at Multiple Time Granularities”, H. Kargupta, A.
Joshi, K. Sivakumar, and Y. Yesha (eds.), Next Generation Data Mining, 2003.
- Wei
Fan, Systematic
Data Selection to Mine Concept-Drifting Data Streams, KDD-04.
Mining Time-Series
- R.
Agrawal, K.-I. Lin, H.S. Sawhney, and K. Shim. Fast
similarity search in the presence of noise, scaling, and translation in
time-series databases. In VLDB'95, pp. 490-501, Zurich, Switzerland, Sept. 1995.
- Y.-S. Moon, K.-Y.
Whang, W.-K. Loh. Duality-Based
Subsequence Matching in Time-Series Databases. Proc. 2001 Int. Conf.
Data Engineering (ICDE'01), Heidelberg, Germany, pp. 263-272, April 2001.
- R.
Agrawal, G. Psaila, E. L. Wimmers, and M. Zait. Querying
shapes of histories. In VLDB'95, pp. 502-514, Zürich, Switzerland, Sept. 1995.
- Michail Vlachos, Chris
Meek, Zografoula Vagena, Dimitrios Gunopulos, Identifying Similarities, Periodicities and Bursts for Online
Search Queries, SIGMOD’04, pp. 213-224
Mining Sequential Patterns
- R. Agrawal and R. Srikant. Mining
sequential patterns. In ICDE'95, pp. 3-14, Taipei, Taiwan, March 1995.
- H. Mannila, H. Toivonen, and A. I. Verkamo. Discovery
of Frequent Episodes in Event Sequences. Data Mining and Knowledge
Discovery, 1(3): 259-289, 1997.
- M.
Garofalakis, R. Rastogi, and K. Shim. SPIRIT:
Sequential pattern mining with regular expression constraints. In Proc.
1999 Int. Conf. Very Large Data Bases (VLDB'99), pp. 223-234, Edinburgh,
UK, Sept. 1999.
- J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. PrefixSpan:
Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth,
Proc. 2001 Int. Conf. on Data Engineering (ICDE'01), Heidelberg, Germany, April 2001.
- J.
Han, G. Dong, and Y. Yin. Efficient mining of
partial periodic patterns in time series database. In ICDE'99, pp.
106-115, Sydney,
Australia,
April 1999.
- J. Pei, J. Han, and W. Wang, “Mining Sequential
Patterns with Constraints in Large Databases”, Proc. 2002 Int. Conf.
on Information and Knowledge Management (CIKM'02), Washington, D.C., Nov. 2002.
- X. Yan, J. Han, and R. Afshar, “CloSpan:
Mining Closed Sequential Patterns in Large Datasets”, Proc. 2003
SIAM Int. Conf. on Data Mining (SDM'03), San Francisco, CA, May 2003.
- P. Tzvetkov, X. Yan, and J. Han, “TSP: Mining Top-K
Closed Sequential Patterns”, Proc.
2003 Int. Conf. on Data Mining (ICDM'03), Melbourne, FL, Nov. 2003.
- J.
Wang and J. Han, “BIDE: Efficient Mining
of Frequent Closed Sequences”, Proc.
2004 Int. Conf. on Data Engineering (ICDE'04), Boston, MA,
March 2004.
Spatial, Spatiotemporal, and Multimedia Data Mining
- K.
Koperski and J. Han. Discovery of spatial
association rules in geographic information databases. In Proc. 4th
Int'l Symp. on Large Spatial Databases (SSD'95), pp. 47-66, Portland, Maine, Aug. 1995.
- X.
Zhou, D. Truffet, and J. Han. Efficient polygon
amalgamation methods for spatial OLAP and spatial data mining. In
SSD'99, pp. 167-187, Hong Kong, Aug.
1999.
- J.
Han, R. B. Altman, V. Kumar, H. Mannila and D. Pregibon, “ Emerging Scientific
Applications in Data Mining”, Communications of ACM, 45(8):54-58,
2002.
- Shashi Shekhar and Sanjay Chawla,
Spatial Databases: A Tour, Prentice Hall, 2003 (ISBN 013-017480-7).
Chapter 7: Introduction to Spatial Data Mining.
- S. Shekhar and Y. Huang, Discovering
Spatial Co-location Patterns: A Summary of Results, Proc. 7th
International Symposium on Spatial and Temporal Databases (SSTD'01), L.A., CA,
July 2001.
- S.
Shekhar, C. T. Lu, and P. Zhang, A
Unified Approach to Detecting Spatial Outliers , GeoInformatica, 2003
(A shorter version appeared in SIGKDD 2001).
- S. Shekhar, V. R. Raju, P. Schrater, W. Wu, Spatial
Contextual Classification and Prediction Models for Mining Geospatial
Data, IEEE Trans. on Multimedia Systems, vol. 4, no. 2, June 2002.
- Tom
Barclay, Jim Gray, and Don Slutz,
Microsoft
TerraServer: A Spatial Data Warehouse, 2000 ACM SIGMOD Dallas, TX, pg. 307-318
- Yufei
Tao - City University of Hong Kong, Christos
Faloutsos - Carnegie Mellon University, Dimitris Papadias,
Bin Liu - Hong Kong University of Science, Prediction and Indexing of Moving Objects with Unknown Motion
Patterns. Proc. 2004 ACM-SIGMOD Int. Conf. on Management of Data
(SIGMOD'04), Paris,
France, June
2004
- Yuhan Cai, Raymond
Ng - University
of British Columbia, Vancouver, Canada, Indexing Spatio-Temporal Trajectories with Chebyshev
Polynomials. Proc. 2004 ACM-SIGMOD Int.
Conf. on Management of Data (SIGMOD'04), Paris, France, June 2004
- Man Lung Yiu, Nikos Mamoulis, Clustering
Objects on a Spatial Network, SIGMOD’04
Mining Graphs
and Structured Patterns
- X.
Yan and J. Han, “gSpan:
Graph-Based Substructure Pattern Mining”, Proc. 2002 Int. Conf. on
Data Mining (ICDM'02), Maebashi, Japan, Dec. 2002.
- X.
Yan and J. Han, “CloseGraph:
Mining Closed Frequent Graph Patterns”, Proc. 2003 ACM SIGKDD Int. Conf.
on Knowledge Discovery and Data Mining (KDD'03), Washington, D.C., Aug. 2003.
- G. Jeh and J. Widom, Mining the
Space of Graph Properties, KDD'04, pp. 187-197.
- Christos Faloutsos, Kevin McCurley, and Andrew Tomkins, “Fast
Discovery of 'Connection Subgraphs'”, Proc. 2004 ACM SIGKDD Int. Conf. on
Knowledge Discovery and Data Mining (KDD'04), Seattle, WA, Aug. 2004.
- X.
Yan, P. S. Yu, and J. Han, “Graph Indexing: A
Frequent Structure-based Approach”, Proc. 2004 ACM-SIGMOD Int. Conf. on
Management of Data (SIGMOD'04), Paris, France, June 2004
Biological Data Mining
- J. Yang, P. Yu, W.
Wang, and J. Han, '' Mining Long
Sequential Patterns in a Noisy Environment '', Proc. 2002 ACM-SIGMOD Int.
Conf. on Management of Data (SIGMOD'02), Madison, WI, June 2002.
- H. Wang, W. Wang, J. Yang, and P.S. Yu. Clustering by
pattern similarity in large data sets, Proc. the ACM SIGMOD
International Conference on Management of Data (SIGMOD), Madison, Wisconsin, 2002.
Mining Social Networks
- P. Domingos and M.
Richardson. Mining the
Network Value of Customers, in Proc. 2001 ACM SIGKDD Int. Conf. on
Knowledge Discovery and Data Mining (pp. 57-66), 2001. San Francisco, CA: ACM Press.
- P. Domingos and M.
Richardson. Mining
Knowledge-Sharing Sites for Viral Marketing, Proceedings of the Eighth
International Conference on Knowledge Discovery and Data Mining, 2002.
- D. Kempe, J. Kleinberg, E. Tardos. Maximizing
the Spread of Influence through a Social Network. Proc. 9th ACM SIGKDD
Intl. Conf. on Knowledge Discovery and Data Mining, 2003.
- Deng Cai, Xiaofei He, Ji-Rong Wen and Wei-Ying Ma. Block-level Link Analysis,
The 27th Annual International ACM SIGIR Conference (SIGIR'2004), July 2004.
- Deng Cai, Xiaofei He, Zhiwei Li, Wei-Ying Ma and Ji-Rong Wen. Hierarchical
Clustering of WWW Image Search Results Using Visual, Textual and Link Analysis,
ACM Multimedia 2004, Oct. 2004.
Multi-relational
Data Mining
- S.
Dzeroski. Multi-relational data mining: an introduction.
ACM SIGKDD Explorations, Volume 5, Issue 1, July,
2003.
- X. Yin, J.
Han, J. Yang, and P. S. Yu, “CrossMine: Efficient Classification across Multiple
Database Relations”, Proc. 2004 Int. Conf. on Data
Engineering (ICDE'04), Boston, MA, March 2004
Intrusion Detection and Data Mining
- S.
Mukkamala et al. “Intrusion
detection: support vector machines and neural networks,” in IEEE IJCNN
(May 2002).
- W.
Lee, S. Stolfo, and K. Mok. A data
mining framework for building intrusion detection models. In Information
and System Security, Vol. 3, No. 4, 2000.
- Stefan Axelsson, “Intrusion
Detection Systems: A Taxonomy and Survey”, Technical Report No
99-15, Dept. of Computer Engineering, Chalmers University of Technology, Sweden, March 2000.
- Stefan
Axelsson, “Research
in Intrusion Detection Systems: A Survey”, Technical Report No
98-17, Dept. of Computer
Engineering, Chalmers University of Technology, Sweden, Dec 15, 1998
revised Aug 19, 1999.
- L. Mé and C.
Michel, Intrusion
Detection: A Bibliography - (2001)
- The Snort project,
Snort
User Manual 2.1.1, 2004
Collaborative Filtering and Data Mining
- Badrul M. Sarwar,
George Karypis, Joseph A. Konstan, John Riedl: “Analysis
of recommendation algorithms for e-commerce.” ACM Conference on Electronic
Commerce 2000:158-167
- J.
Breese, D. Heckerman, C. Kadie, “Empirical
Analysis of Predictive Algorithms for Collaborative Filtering”, In Proceedings of Fourteenth
Conference on Uncertainty in Artificial Intelligence, Madison, WI, Morgan Kaufmann, July 1998 (also has some item-item methods)
- B. M. Sarwar, G. Karypis, J. A. Konstan,
and J. Riedl. “Item-based
collaborative filtering recommendation algorithms”. In Proc. of the 10th
International World Wide Web Conference (WWW10), Hong Kong, May 2001.
- Weiyang Lin, Sergio A. Alvarez, and
Carolina Ruiz. “Efficient
adaptive-support association rule mining for recommender systems”, Data
Mining and Knowledge Discovery, 6:83--105, 2002
Web Mining
- S. Chakrabarti, B. E.
Dom, S. R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, D. Gibson, and J.
Kleinberg. Mining the
Web's link structure. COMPUTER, 32(8):60-67, 1999.
- J. M. Kleinberg. Authoritative
Sources in a Hyperlinked Environment. Journal of ACM, 46(5):604-632, 1999.
- H. Yu, J. Han, and K.
C.-C. Chang, " PEBL: Positive
Example Based Learning for Web Page Classification Using SVM '', Proc.
2002 Int. Conf. on Knowledge Discovery in Databases (KDD'02), Edmonton, Canada, July 2002.
- K. Wang, S. Zhou and
S. C. Liew. Building
hierarchical classifiers using class proximity. In VLDB99, Edinburgh, UK, Sept. 1999.
- J. Han, and K. C.-C.
Chang, “Data Mining for
Web Intelligence”, Computer, Nov. 2002
- Corin
R. Anderson, Pedro Domingos, Daniel S. Weld: Personalizing Web Sites
for Mobile Users. In WWW 2001: pages 565-575. 2001.
Data Mining Applications and Trends in Data Mining
- H. Mannila, Theoretical
Frameworks of Data Mining. SIGKDD Explorations , 1(2): 30-32, 2000
- C. Clifton and D.
Marks. Security and
Privacy Implications of Data Mining. In Proc. 1996 SIGMOD'96 Workshop on
Research Issues on Data Mining and Knowledge Discovery (DMKD'96), Montreal, Canada, pp. 15-20, June 1996.
- R. Agrawal and R. Srikant. Privacy-preserving
data mining. In Proc. 2000 ACM-SIGMOD Int. Conf. Management of Data
(SIGMOD'00), pages 439-450, Dallas, TX,
May 2000.
- H.
V. Jagadish, J. Madar, and R. Ng. Semantic
compression and pattern extraction with fascicles. In Proc. 1999 Int.
Conf. Very Large Data Bases (VLDB'99), pages 186-197, Edinburgh, UK, Sept. 1999.
- Qiming Chen, Umesh Dayal, Meichun Hsu, OLAP-based Scalable Profiling of Customer
Behavior, In Proc. 1999 Int. Conf. Data Warehousing and Knowledge
Discovery (DaWaK'99), Italy, 1999.
- Ron Kohavi, Mining E-Commerce Data: The Good, the Bad,
and the Ugly, KDD’2001, 2001.
- S. Hill and F. Provost, The Myth of the Double-Blind
Review? Author Identification Using Only Citations,
SIGKDD Explorations, 5(2), Jan. 2004.
Data
Mining and Software Engineering
- Jeremy Kolter and Marcus A. Maloof, Learning to Detect Malicious Executables in the
Wild, Proc. 2004 ACM SIGKDD Int. Conf. on
Knowledge Discovery and Data Mining (KDD'04), Seattle, WA, Aug. 2004.
Jiawei
Han