Resources:  Supplementary Readings

Part I: Reference Materials for CS412


Introduction

  1. U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy, Advances in Knowledge Discovery and Data Mining. The MIT Press, 1996.
  2. J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2000.
  3. R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed., Wiley-Interscience, 2001.
  4. U. Fayyad, G. Grinstein, and A. Wierse, Information Visualization in Data Mining and Knowledge Discovery, Morgan Kaufmann, 2001
  5. T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer-Verlag, 2001
  6. I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 2001
  7. V. Ganti, J. Gehrke, R. Ramakrishnan. Mining very large databases. COMPUTER, 32(8):38-45, 1999.
  8. S. Chaudhuri, U. Dayal, and V. Ganti, Database Technology for Decision Support Systems. Computer, 34(12):48-55, Dec. 2001.

Data Preprocessing

  1. T. Dasu and T. Johnson, Exploratory Data Mining and Data Cleaning, John Wiley & Sons, Inc., New Jersey, 2003.
  2. D. Barbará et al. The New Jersey Data Reduction Report. Bulletin of the Technical Committee on Data Engineering, 20, Dec. 1997, pp. 3-45.
  3. H. Liu, F. Hussain, C. L. Tan, and M. Dash. Discretization: An enabling technique. Data Mining and Knowledge Discovery, 6(4): 393-423, 2002.
  4. V. Raman and J. M. Hellerstein. Potter's Wheel: An Interactive Data Cleaning System,  Proc. 2001 Int. Conf. on Very Large Data Bases (VLDB'01), Rome, Italy, pp. 381-390, Sept. 2001.
  5. H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.-A. Saita. Declarative Data Cleaning: Language, Model, and Algorithms. Proc. 2001 Int. Conf. on Very Large Data Bases (VLDB'01), Rome, Italy, pp. 371-380, Sept. 2001.
  6. D. Pyle. Data Preparation for Data Mining. Morgan Kaufmann, 1999.
  7. T. Dasu, T. Johnson, S. Muthukrishnan, V. Shkapenyuk.  Mining Database Structure; Or, How to Build a Data Quality Browser. Proc. 2002 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'02), Madison, WI, pp. 240-251, June 2002.

Data Warehouse, OLAP, and Data Generalization

  1. R. Kimball. The Data Warehouse Toolkit, 2nd ed., John Wiley & Sons, New York, 2002.
  2. S. Chaudhuri, and U. Dayal. An overview of data warehousing and OLAP technology. ACM SIGMOD Record, 26(1):65-74, 1997.
  3. J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, and H. Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross-tab and sub-totals. Data Mining and Knowledge Discovery, 1(1):29-54, 1997.
  4. V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing data cubes efficiently. In SIGMOD'96, pp. 205-216, Montreal, Canada, June 1996.
  5. S. Agarwal, R. Agrawal, P. M. Deshpande, A. Gupta, J. F. Naughton, R. Ramakrishnan, and S. Sarawagi. On the computation of multidimensional aggregates. In Proc. 1996 Int. Conf. Very Large Data Bases (VLDB'96), pp. 506-521, Bombay, India, Sept. 1996.
  6. Y. Zhao, P. M. Deshpande, and J. F. Naughton. An array-based algorithm for simultaneous multidimensional aggregates. In SIGMOD'97, pp. 159-170, Tucson, Arizona, May 1997.
  7. R. Agrawal, A. Gupta, and S. Sarawagi. Modeling multidimensional databases. In Proc. 1997 Int. Conf. Data Engineering (ICDE'97), Birmingham, England, April 1997.
  8. J. Han, Y. Cai and N. Cercone. Knowledge Discovery in Databases: An Attribute-Oriented Approach. In VLDB'92, Vancouver, Canada, August 1992, pp. 547-559.
  9. S. Sarawagi, R. Agrawal, and N. Megiddo. Discovery-driven exploration of OLAP data cubes. In Proc. Int. Conf. of Extending Database Technology (EDBT'98), Valencia, Spain, pp. 168-182, March 1998.
  10. S. Sarawagi. Explaining Differences in Multidimensional Aggregates. In Proc. Int. Conf. of Very Large Data Bases (VLDB'99), pp. 42-53.
  11. K. A. Ross, D. Srivastava, and D. Chatziantoniou. Complex aggregation at multiple granularities. In EDBT'98, pp. 263-277, Valencia, Spain, March 1998.
  12. K. Beyer and R. Ramakrishnan. Bottom-up computation of sparse and iceberg cubes. In SIGMOD'99, pp. 359-370, Philadelphia, PA, June 1999.
  13. J. Han. Towards on-line analytical mining in large databases. ACM SIGMOD Record, 27:97-107, 1998.
  14. G. Sathe and S. Sarawagi. Intelligent Rollups in Multidimensional OLAP Data. In Proc. Int. Conf. of Very Large Data Bases (VLDB'01), Rome, Italy, pp. 531-540
  15. J. Han, J. Pei, G. Dong, and K. Wang. Efficient computation of iceberg cubes with complex measures. In SIGMOD'01, pp. 1-12, Santa Barbara, CA, May 2001.
  16. G. Dong, J. Han, J. Lam, J. Pei, and K. Wang. Mining Multi-Dimensional Constrained Gradients in Data Cubes. In VLDB'01, Rome, Italy, Sept. 2001.
  17. W. Wang, H. Lu, J. Feng, and J. X. Yu. Condensed Cube: An Effective Approach to Reducing Data Cube Size. In Proc. 2002 Int. Conf. Data Engineering (ICDE'02), San Francisco, CA, April 2002.
  18. L. V. S. Lakshmanan, J. Pei, and J. Han, Quotient Cube: How to Summarize the Semantics of a Data Cube, Proc. 2002 Int. Conf. on Very Large Data Bases (VLDB'02), Hong Kong, China, Aug. 2002.
  19. D. Xin, J. Han, X. Li, B. W. Wah, “Star-Cubing: Computing Iceberg Cubes by Top-Down and Bottom-Up Integration”, Proc.  2003 Int. Conf. on Very Large Data Bases (VLDB'03), Berlin, Germany, Sept. 2003.
  20. X. Li, J. Han, and H. Gonzalez, “High-Dimensional OLAP: A Minimal Cubing Approach”, Proc. 2004 Int. Conf. on Very Large Data Bases (VLDB'04), Toronto, Canada, Aug. 2004  
  21. Z. Shao, J. Han, and D. Xin, “MM-Cubing: Computing Iceberg Cubes by Factorizing the Lattice Space”, Proc. 2004 Int. Conf. on Scientific and Statistical Database Management (SSDBM'04), Santorini Island, Greece, June 2004

Mining Frequent Patterns and Association Rules in Large Databases

Basic concepts

  1. R. Agrawal, T. Imielinski, and A. Swami.  Mining association rules between sets of items in large databases.  SIGMOD'93, 207-216, Washington, D.C. (citeseer)
  2. H. Mannila, H. Toivonen, and A. I. Verkamo. Efficient algorithms for discovering association rules. KDD'94, 181-192, Seattle, WA, July 1994. (citeseer)

Efficient mining algorithms (including efficient algorithms for mining max and closed patterns)

  1. R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In VLDB'94, pp. 487-499, Santiago, Chile, Sept. 1994.
  2. Ashoka Savasere, Edward Omiecinski, Shamkant B. Navathe: An Efficient Algorithm for Mining Association Rules in Large Databases. VLDB 1995: 432-444. (citeseer)
  3. J.S. Park, M.S. Chen, and P.S. Yu. An effective hash-based algorithm for mining association rules. SIGMOD'95, San Jose, CA, May 1995. (citeseer)
  4. D.W. Cheung, J. Han, V. Ng, and C.Y. Wong. Maintenance of discovered association rules in large databases: An incremental updating technique. ICDE'96, New Orleans,  LA. (citeseer)
  5. T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama. Data mining using two-dimensional optimized association rules: Scheme, algorithms, and visualization. SIGMOD'96, Montreal, Canada. (citeseer)
  6. H. Toivonen.  Sampling large databases for association rules.  VLDB'96, 134-145, Bombay, India, Sept. 1996. (citeseer)
  7. J. Han, J. Pei, and Y. Yin. Mining Frequent Patterns without Candidate Generation. Proc. 2000 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'00), Dallas, TX, May 2000.
  8. R. Agarwal, C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm for generation of frequent itemsets. In Journal of Parallel and Distributed Computing, 2000. (citeseer)
  9. J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang. H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases, Proc. 2001 Int. Conf. on Data Mining (ICDM'01), San Jose, CA, Nov. 2001.
  10. M. J. Zaki and C.-J. Hsiao. CHARM: An Efficient Algorithm for Closed Itemset Mining, Proc. 2002 SIAM Int. Conf. Data Mining (SDM'02), Arlington, VA, pp. 457-473, April 2002.
  11. J. Wang, J. Han, and J. Pei, “CLOSET+: Searching for the Best Strategies for Mining Frequent Closed Itemsets”, Proc. 2003 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'03), Washington, D.C., Aug. 2003.
  12. Y. Xu, J. X. Yu, G. Liu, H. Lu, From Path Tree To Frequent Patterns: A Framework for Mining Frequent Patterns, Proc. 2002 Int. Conf. on Data Mining (ICDM'02), Japan, Dec. 2002.
  13. F. Pan, G. Cong, A. K. H. Tung, J. Yang,  and M. Zaki , CARPENTER: Finding Closed Patterns in Long Biological Datasets,  Proc. 2003 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'03), Washington, D.C., Aug. 2003.
  14. G. Liu, H. Lu, Y. Xu, J. X. Yu, Ascending Frequency Ordered Prefix-tree: Efficient Mining of Frequent Patterns, Proc. 2003 Int. Conf. on Database Systems for Advanced Applications (DASFAA’03), Kyoto, Japan, March 2003.
  15. Mohammad El-Hajj and Osmar R. Zaïane, Inverted Matrix: Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining, Proc. 2003 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'03), Washington, D.C., Aug. 2003.
  16. G. Liu, H. Lu, W. Lou, J. X. Yu , On Computing, Storing and Querying Frequent Patterns, Proc. 2003 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'03), Washington, D.C., Aug. 2003.
  17. B. Goethals, M. Zaki: FIMI: Workshop on Frequent Itemset Mining Implementations (An Introduction). ICDM-FIMI Workshop, Melbourne, Florida, Nov. 2003.
  18. Gao Cong, Anthony K.H. Tung, Xin Xu, Feng Pan, Jiong Yang, FARMER: Finding Interesting Rule Groups in Microarray Datasets, SIGMOD’04

Extension of the scope:  Mining multilevel, quantitative rules, correlation and causality

  1. J. Han and Y. Fu. Discovery of multiple-level association rules from large databases. In VLDB'95, pp. 420-431, Zürich, Switzerland, Sept. 1995.
  2. R. Srikant and R. Agrawal. Mining generalized association rules. In VLDB'95, pp. 407-419, Zürich, Switzerland, Sept. 1995.
  3. R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. In SIGMOD'96, pp. 1-12, Montreal, Canada, June 1996.
  4. M.J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast discovery of association rules. KDD’97. August 1997. (citeseer)
  5. B. Lent, A. Swami, and J. Widom. Clustering association rules. In ICDE'97, pp. 220-231, Birmingham, England, April 1997.
  6. S. Brin, R. Motwani, and C. Silverstein. Beyond market basket: Generalizing association rules to correlations. In SIGMOD'97, pp. 265-276, Tucson, Arizona, May 1997.
  7. C. Silverstein, S. Brin, R. Motwani, and J. Ullman. Scalable techniques for mining causal structures.  VLDB'98, 594-605, New York, NY. (citeseer)
  8. D. Tsur, J. D. Ullman, S. Abiteboul, C. Clifton, R. Motwani, and S. Nestorov. Query flocks: A generalization of association-rule mining. SIGMOD'98, 1-12, Seattle, Washington. (citeseer)
  9. Y. Aumann and Y. Lindell. A Statistical Theory for Quantitative Association Rules Proc. 1999 Int. Conf. Knowledge Discovery and Data Mining (KDD'99), San Diego, CA, 261-270, Aug. 1999.
  10. R. J. Bayardo. Efficiently mining long patterns from databases. SIGMOD'98, 85-93, Seattle, Washington. (citeseer)
  11. J. Han, J. Wang, Y. Lu, and P. Tzvetkov, “Mining Top-K Frequent Closed Patterns without Minimum Support”, Proc. 2002 Int. Conf. on Data Mining (ICDM'02), Maebashi, Japan, Dec. 2002.
  12. A. Savasere, E. Omiecinski, S. B. Navathe, Mining for Strong Negative Associations in a Large Database of Customer Transactions, In ICDE’98, Orlando, Florida, Feb. 1998.
  13. E. Omiecinski. Alternative Interest Measures for Mining Associations, IEEE Trans. Knowledge and Data Engineering, 15(1):57-69, 2003.
  14. Y.-K. Lee, W.-Y. Kim, Y. D. Cai, and J. Han, “CoMine: Efficient Mining of Correlated Patterns”,  Proc.  2003 Int. Conf. on Data Mining (ICDM'03), Melbourne, FL, Nov. 2003.
  15. Deepayan Chakrabarti, Spiros Papadimitriou, Dharmendra Modha, Christos Faloutsos, Fully Automatic Cross-Associations, Proc. 2004 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'04), Seattle, WA, Aug. 2004, pp. 79-88.

Constraint-based mining:

  1. R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. KDD'97, 67-73, Newport Beach, California, 1997. (citeseer)
  2. R. Ng, L. V. S. Lakshmanan, J. Han, and A. Pang. Exploratory mining and pruning optimizations of constrained associations rules. In SIGMOD'98, pp. 13-24 Seattle, Washington, June 1998.
  3. F. Korn, A. Labrinidis, Y. Kotidis, and C. Faloutsos. Ratio rules: A new paradigm for fast, quantifiable data mining. VLDB'98, 582-593, New York, NY. (citeseer)
  4. J. Han, L. V. S. Lakshmanan, and R. T. Ng. Constraint-based, multidimensional data mining. COMPUTER, 32(8): 46-50, 1999.
  5. Edith Cohen, Mayur Datar, Shinji Fujiwara, Aristides Gionis, Piotr Indyk, Rajeev Motwani, Jeffrey D. Ullman, Cheng Yang: Finding Interesting Associations without Support Pruning. In Proc. Int. Conf. on Data Engineering (ICDE 2000), pp. 489-499, 2000.
  6. R. J. Bayardo and R. Agrawal. Mining the most interesting rules. In Proc. 1999 Int. Conf. Knowledge Discovery and Data Mining (KDD'99), pp. 145-154, San Diego, CA, Aug. 1999. (citeseer)
  7. N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In Proc. 7th Int. Conf. Database Theory (ICDT'99), pages 398-416, Jerusalem, Israel, Jan. 1999. (citeseer)
  8. J. Pei, J. Han, and L. V. S. Lakshmanan. Mining Frequent Itemsets with Convertible Constraints, Proc. 2001 Int. Conf. on Data Engineering (ICDE'01), Heidelberg, Germany, April 2001.
  9. G. Grahne, L. Lakshmanan, and X. Wang. Efficient mining of constrained correlated sets. ICDE'00, 512-521, San Diego, CA, Feb. 2000. (citeseer)

Language primitives and applications:

  1. R. Meo, G. Psaila, and S. Ceri. A new SQL-like operator for mining association rules. In VLDB'96, pp. 122-133, Bombay, India, Sept. 1996.
  2. T. Imielinski and A. Virmani.  MSQL: a query language for database mining. Data Mining and Knowledge Discovery, 3(4): 373-408, 1999.
  3. G. Dong, J. Han, J. Lam, J. Pei, K. Wang, and W. Zou, “Mining Constrained Gradients in Multi-Dimensional Databases”, IEEE Transactions on Knowledge and Data Engineering, 16(5), 2004.

Classification and Prediction

  1. J. R. Quinlan. Induction of decision trees. Machine Learning, 1:81-106, 1986.
  2. T. M. Mitchell, Machine Learning, McGraw Hill, 1997.
  3. S. M. Weiss and N. Indurkhya, Predictive Data Mining, Morgan Kaufmann, 1998
  4. J. Shafer, R. Agrawal, and M. Mehta. SPRINT: A scalable parallel classifier for data mining. In VLDB'96, pp. 544-555, Bombay, India, Sept. 1996.
  5. J. Gehrke, R. Ramakrishnan, V. Ganti. RainForest: A framework for fast decision tree construction of large datasets. In VLDB'98, pp. 416-427, New York, NY, August 1998.
  6. J. Gehrke, V. Ganti, R. Ramakrishnan, and W.-Y. Loh, BOAT -- Optimistic Decision Tree Construction. In SIGMOD'99, Philadelphia, Pennsylvania, 1999.
  7. S. K. Murthy. Automatic construction of decision trees from data: A multi-disciplinary survey. Data Mining and Knowledge Discovery, 2(4): 345-389, 1998.
  8. C. J. C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2(2): 121-168, 1998.
  9. B. Liu, W. Hsu, and Y. Ma. Integrating Classification and Association Rule Mining. Proc. 1998 Int. Conf. Knowledge Discovery and Data Mining (KDD'98) New York, NY, Aug. 1998.
  10. M. Ankerst, M. Ester, and H.-P. Kriegel. Towards an effective cooperation of the user and the computer for classification. In Proc. 2000 Int. Conf. Knowledge Discovery and Data Mining (KDD'00), pages 179-188, Boston, MA, Aug. 2000. (citeseer)
  11. W. Li, J. Han, and J. Pei, CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules, Proc. 2001 Int. Conf. on Data Mining (ICDM'01), San Jose, CA, Nov. 2001.
  12. X. Yin and J. Han, “CPAR: Classification based on Predictive Association Rules”, Proc. 2003 SIAM Int. Conf. on Data Mining (SDM'03), San Francisco, CA, May 2003.
  13. H. Yu, J. Yang, and J. Han, “Classifying Large Data Sets Using SVM with Hierarchical Clusters”, Proc. 2003 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'03), Washington, D.C., Aug. 2003.  
  14. X. Yin, J. Han, J. Yang, and P. S. Yu, “CrossMine: Efficient Classification across Multiple Database Relations”, Proc. 2004 Int. Conf. on Data Engineering (ICDE'04), Boston, MA, March 2004.

Cluster Analysis

  1. L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: an Introduction to Cluster Analysis. John Wiley & Sons, 1990.
  2. R. Ng and J. Han. Efficient and effective clustering method for spatial data mining. In VLDB'94, pp. 144-155, Santiago, Chile, Sept. 1994.
  3. T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: An efficient data clustering method for very large databases. In SIGMOD'96, pp. 103-114, Montreal, Canada, June 1996.
  4. E. Schikuta. Grid clustering: An efficient hierarchical clustering method for very large data sets. Proc. 1996 Int. Conf. on Pattern Recognition, 101-105. (citeseer)
  5. M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases. In KDD'96, pp. 226-231, Portland, Oregon, August 1996.
  6. W. Wang, J. Yang, R. Muntz, STING: A Statistical Information Grid Approach to Spatial Data Mining, VLDB’97, 1997. (citeseer)
  7. S. Guha, R. Rastogi, and K. Shim. CURE: An efficient clustering algorithm for large databases. In SIGMOD'98, pp. 73-84, Seattle, Washington, June 1998.
  8. S. Guha, R. Rastogi, and K. Shim. ROCK: A robust clustering algorithm for categorical attributes. In ICDE'99, pp. 512-521, Sydney, Australia, March 1999.
  9. R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. In SIGMOD'98, pp. 94-105, Seattle, Washington, June 1998.
  10. Alexander Hinneburg, Daniel A. Keim: An Efficient Approach to Clustering in Large Multimedia Databases with Noise. KDD 1998: 58-65, 1998. (citeseer)
  11. G. Sheikholeslami, S. Chatterjee, and A. Zhang. WaveCluster: A multi-resolution clustering approach for very large spatial databases. In VLDB'98, pp. 428-439, New York, NY, August 1998.
  12. D. Gibson, J. Kleinberg, and P. Raghavan. Clustering categorical data: An approach based on dynamic systems. In Proc. VLDB’98. (citeseer)
  13. G. Karypis, E.-H. Han, and V. Kumar. CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling. COMPUTER, 32(8): 68-75, 1999.
  14. Wei Wang, Jiong Yang, Richard Muntz. STING+: an approach to active spatial data mining. ICDE 99, pp. 116-125. 1999. (citeseer)
  15. M. Ankerst, M. Breunig, H.-P. Kriegel, and J. Sander. OPTICS: Ordering points to identify the clustering structure. In SIGMOD'99, pp. 49-60, Philadelphia, PA, June 1999.
  16. V. Ganti, J. Gehrke, R. Ramakrishnan. CACTUS: Clustering Categorical Data Using Summaries. Proc. 1999 Int. Conf. Knowledge Discovery and Data Mining (KDD'99), San Diego, CA, 261-270, Aug. 1999. (citeseer) (Journal version: citeseer)
  17. M. M. Breunig, H.-P. Kriegel, R. Ng, J. Sander. LOF: Identifying Density-Based Local Outliers. In Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD 2000), Dallas, TX, 2000, pp. 93-104. (citeseer)
  18. A. K. H. Tung, J. Han, L. V. S. Lakshmanan, and R. T. Ng. Constraint-Based Clustering in Large Databases, Proc. 2001 Int. Conf. on Database Theory (ICDT'01), London, U.K., Jan. 2001.
  19. A. K. H. Tung, J. Hou, and J. Han. Spatial Clustering in the Presence of Obstacles, Proc. 2001 Int. Conf. on Data Engineering (ICDE'01), Heidelberg, Germany, April 2001.
  20. H. Wang, W. Wang, J. Yang, and P.S. Yu. Clustering by pattern similarity in large data sets. Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD), Madison, Wisconsin, 2002.
  21. Beil F., Ester M., Xu X.: "Frequent Term-Based Text Clustering", Proc. 8th Int. Conf. on Knowledge Discovery and Data Mining (KDD'02), Edmonton, Alberta, Canada, 2002.
  22. L. Parsons, E. Haque and H. Liu, Subspace Clustering for High Dimensional Data: A Review , SIGKDD Explorations, Vol. 6(1), June 2004
  23. Samer Nassar, Jörg Sander, Corrine Cheng, Incremental and Effective Data Summarization for Dynamic Hierarchical Clustering, SIGMOD’04
  24. Sugato Basu, Mikhail Bilenko, Raymond Mooney, A Probabilistic Framework for Semi-Supervised Clustering, Proc. 2004 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'04), Seattle, WA, Aug. 2004.

 

Part II: Reference Materials for CS512


Stream Data Mining

  1. R. Motwani and P. Raghavan, Randomized Algorithms, Cambridge University Press, 1995.
  2. S. Babu and J. Widom. Continuous Queries over Data Streams. SIGMOD Record, pp. 109-120, Sept. 2001.
  3. B. Babcock, S. Babu, M. Datar, R. Motwani and J. Widom, “Models and Issues in Data Stream Systems”, Proc. 2002 ACM-SIGACT/SIGART/SIGMOD Int. Conf. on Principles of Database Systems (PODS'02), Madison, WI, June 2002. (Conference tutorial)
  4. M. Garofalakis, J. Gehrke, R. Rastogi, “Querying and Mining Data Streams: You Only Get One Look”,  Tutorial  at 2002 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'02), Madison, WI, June 2002.
  5. S. Muthukrishnan, Data streams: algorithms and applications, Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms, 2003.
  6. Y. Chen, G. Dong, J. Han, B. W. Wah, and J. Wang, “Multi-Dimensional Regression Analysis of Time-Series Data Streams”, Proc. 2002 Int. Conf. on Very Large Data Bases (VLDB'02), Hong Kong, China, Aug. 2002.
  7. S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering Data Streams, Proc. IEEE Symposium on Foundations of Computer Science (FOCS'00), Redondo Beach, CA, pp. 359-366, 2000
  8. Stratis Viglas, Jeffrey Naughton, Rate-Based Query Optimization for Streaming Information Sources, SIGMOD’02
  9. Samuel Madden, Mehul Shah, Joseph Hellerstein, Vijayshankar Raman, Continuously Adaptive Continuous Queries over Streams, SIGMOD’02
  10. Alin Dobra, Minos N. Garofalakis, Johannes Gehrke, Rajeev Rastogi, Processing Complex Aggregate Queries over Data Streams, SIGMOD’02
  11. Gurmeet Singh Manku, Rajeev Motwani. Approximate Frequency Counts over Data Streams, VLDB’02
  12. Yunyue Zhu, Dennis Shasha.  StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time, VLDB’02
  13. J. Gehrke, F. Korn, D. Srivastava.  On computing correlated aggregates over continuous data streams.  Proc. 2001 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'01), Santa Barbara, CA, pp. 13-24, May 2001.
  14. Geoff Hulten, Laurie Spencer, Pedro Domingos: Mining time-changing data streams. KDD 2001: 97-106
  15. J. Han, “Mining Dynamics of Data Streams in Multidimensional Space” (in PowerPoint), ICDM'02 Keynote Speech, Maebashi City, Japan, Dec. 2002.
  16. C. Aggarwal, J. Han, J. Wang, P. S. Yu, “A Framework for Clustering Evolving Data Streams”,  Proc.  2003 Int. Conf. on Very Large Data Bases (VLDB'03), Berlin, Germany, Sept. 2003.
  17. H. Wang, W. Fan, P. S. Yu, and J. Han, “Mining Concept-Drifting Data Streams using Ensemble Classifiers”, Proc. 2003 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'03), Washington, D.C., Aug. 2003. 
  18.  C. Giannella, J. Han, J. Pei, X. Yan and P.S. Yu, “Mining Frequent Patterns in Data Streams at Multiple Time Granularities”, H. Kargupta, A. Joshi, K. Sivakumar, and Y. Yesha (eds.), Next Generation Data Mining, 2003.
  19. Wei Fan, Systematic Data Selection to Mine Concept-Drifting Data Streams, KDD-04.

Mining Time-Series

  1. R. Agrawal, K.-I. Lin, H.S. Sawhney, and K. Shim. Fast similarity search in the presence of noise, scaling, and translation in time-series databases. In VLDB'95, pp. 490-501, Zurich, Switzerland, Sept. 1995.
  2. Y.-S. Moon, K.-Y. Whang, W.-K. Loh. Duality-Based Subsequence Matching in Time-Series Databases. Proc. 2001 Int. Conf. Data Engineering (ICDE'01), Heidelberg, Germany, pp. 263-272, April 2001.
  3. R. Agrawal, G. Psaila, E. L. Wimmers, and M. Zait. Querying shapes of histories. In VLDB'95, pp. 502-514, Zürich, Switzerland, Sept. 1995.
  4. Michail Vlachos, Chris Meek, Zografoula Vagena, Dimitrios Gunopulos, Identifying Similarities, Periodicities and Bursts for Online Search Queries, SIGMOD’04, pp. 213-224

Mining Sequential Patterns

  1. R. Agrawal and R. Srikant. Mining sequential patterns. In ICDE'95, pp. 3-14, Taipei, Taiwan, March 1995.
  2. H. Mannila, H. Toivonen, and A. I. Verkamo. Discovery of Frequent Episodes in Event Sequences. Data Mining and Knowledge Discovery, 1(3): 259-289, 1997.
  3. M. Garofalakis, R. Rastogi, and K. Shim. SPIRIT: Sequential pattern mining with regular expression constraints. In Proc. 1999 Int. Conf. Very Large Data Bases (VLDB'99), pp. 223-234, Edinburgh, UK, Sept. 1999.
  4. J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. Proc. 2001 Int. Conf. on Data Engineering (ICDE'01), Heidelberg, Germany, April 2001.
  5. J. Han, G. Dong, and Y. Yin. Efficient mining of partial periodic patterns in time series database. In ICDE'99, pp. 106-115, Sydney, Australia, March 1999.
  6. J. Pei, J. Han, and W. Wang, “Mining Sequential Patterns with Constraints in Large Databases”, Proc. 2002 Int. Conf. on Information and Knowledge Management (CIKM'02), Washington, D.C., Nov. 2002.
  7. X. Yan, J. Han, and R. Afshar, “CloSpan: Mining Closed Sequential Patterns in Large Datasets”, Proc. 2003 SIAM Int. Conf. on Data Mining (SDM'03), San Francisco, CA, May 2003.
  8. P. Tzvetkov, X. Yan, and J. Han, “TSP: Mining Top-K Closed Sequential Patterns”, Proc. 2003 Int. Conf. on Data Mining (ICDM'03), Melbourne, FL, Nov. 2003.
  9. J. Wang and J. Han, “BIDE: Efficient Mining of Frequent Closed Sequences”, Proc. 2004 Int. Conf. on Data Engineering (ICDE'04), Boston, MA, March 2004.

Spatial, Spatiotemporal, and Multimedia Data Mining

  1. K. Koperski and J. Han. Discovery of spatial association rules in geographic information databases. In Proc. 4th Int'l Symp. on Large Spatial Databases (SSD'95), pp. 47-66, Portland, Maine, Aug. 1995.
  2. X. Zhou, D. Truffet, and J. Han. Efficient polygon amalgamation methods for spatial OLAP and spatial data mining. In SSD'99, pp. 167-187, Hong Kong, Aug. 1999.
  3. J. Han, R. B. Altman, V. Kumar, H. Mannila and D. Pregibon, “Emerging Scientific Applications in Data Mining”, Communications of the ACM, 45(8):54-58, 2002.
  4. Shashi Shekhar and Sanjay Chawla, Spatial Databases: A Tour, Prentice Hall, 2003 (ISBN 013-017480-7). Chapter 7: Introduction to Spatial Data Mining.
  5. S. Shekhar and Y. Huang, Discovering Spatial Co-location Patterns: A Summary of Results, Proc. 7th International Symposium on Spatial and Temporal Databases (SSTD'01), L.A., CA, July 2001.
  6. S. Shekhar, C. T. Lu, and P. Zhang, A Unified Approach to Detecting Spatial Outliers, GeoInformatica, 2003 (A shorter version appeared in SIGKDD 2001).
  7. S. Shekhar, V. R. Raju, P. Schrater, W. Wu, Spatial Contextual Classification and Prediction Models for Mining Geospatial Data, IEEE Trans. on Multimedia Systems, vol. 4, no. 2, June 2002.
  8. Tom Barclay, Jim Gray, and Don Slutz, Microsoft TerraServer: A Spatial Data Warehouse, Proc. 2000 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'00), Dallas, TX, pp. 307-318.
  9. Yufei Tao, Christos Faloutsos, Dimitris Papadias, and Bin Liu, Prediction and Indexing of Moving Objects with Unknown Motion Patterns. Proc. 2004 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'04), Paris, France, June 2004.
  10. Yuhan Cai and Raymond Ng, Indexing Spatio-Temporal Trajectories with Chebyshev Polynomials. Proc. 2004 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'04), Paris, France, June 2004.
  11. Man Lung Yiu, Nikos Mamoulis, Clustering Objects on a Spatial Network, SIGMOD’04

 

Mining Graphs and Structured Patterns

 

  1. X. Yan and J. Han, “gSpan: Graph-Based Substructure Pattern Mining”, Proc. 2002 Int. Conf. on Data Mining (ICDM'02), Maebashi, Japan, Dec. 2002.
  2. X. Yan and J. Han, “CloseGraph: Mining Closed Frequent Graph Patterns”, Proc. 2003 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'03), Washington, D.C., Aug. 2003.
  3. G. Jeh and J. Widom, Mining the Space of Graph Properties, KDD'04, pp. 187-197.
  4. Christos Faloutsos, Kevin McCurley, and Andrew Tomkins, “Fast Discovery of Connection Subgraphs”, Proc. 2004 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'04), Seattle, WA, Aug. 2004.
  5. X. Yan, P. S. Yu, and J. Han, “Graph Indexing: A Frequent Structure-based Approach”, Proc. 2004 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'04), Paris, France, June 2004 

Biological Data Mining

  1. J. Yang, P. Yu, W. Wang, and J. Han, “Mining Long Sequential Patterns in a Noisy Environment”, Proc. 2002 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'02), Madison, WI, June 2002.
  2. H. Wang, W. Wang, J. Yang, and P.S. Yu. Clustering by pattern similarity in large data sets, Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD), Madison, Wisconsin, 2002.

Mining Social Networks

  1. P. Domingos and M. Richardson. Mining the Network Value of Customers, in Proc. 2001 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (pp. 57-66), 2001. San Francisco, CA: ACM Press.
  2. P. Domingos and M. Richardson. Mining Knowledge-Sharing Sites for Viral Marketing, Proceedings of the Eighth International Conference on Knowledge Discovery and Data Mining, 2002.
  3. D. Kempe, J. Kleinberg, E. Tardos. Maximizing the Spread of Influence through a Social Network. Proc. 9th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2003.
  4. Deng Cai, Xiaofei He, Ji-Rong Wen and Wei-Ying Ma. Block-level Link Analysis, The 27th Annual International ACM SIGIR Conference (SIGIR'2004), July 2004.
  5. Deng Cai, Xiaofei He, Zhiwei Li, Wei-Ying Ma and Ji-Rong Wen. Hierarchical Clustering of WWW Image Search Results Using Visual, Textual and Link Analysis, ACM Multimedia 2004, Oct. 2004.

Multi-relational Data Mining

  1. S. Dzeroski.   Multi-relational data mining: an introduction. ACM SIGKDD Explorations, Volume 5, Issue 1, July, 2003.
  2. X. Yin, J. Han, J. Yang, and P. S. Yu, “CrossMine: Efficient Classification across Multiple Database Relations”, Proc. 2004 Int. Conf. on Data Engineering (ICDE'04), Boston, MA, March 2004 

Intrusion Detection and Data Mining

  1. S. Mukkamala et al. “Intrusion detection: support vector machines and neural networks,” in IEEE IJCNN (May 2002).
  2. W. Lee, S. Stolfo, and K. Mok. A data mining framework for building intrusion detection models. In Information and System Security, Vol. 3, No. 4, 2000.
  3. Stefan Axelsson, “Intrusion Detection Systems: A Taxonomy and Survey”, Technical Report No 99-15, Dept. of Computer Engineering, Chalmers University of Technology, Sweden, March 2000.
  4. Stefan Axelsson, “Research in Intrusion Detection Systems: A Survey”, Technical Report No 98-17, Dept. of Computer Engineering, Chalmers University of Technology, Sweden, Dec 15, 1998, revised Aug 19, 1999.
  5. L. Mé and C. Michel, Intrusion Detection: A Bibliography - (2001)  
  6. The Snort project, Snort User Manual 2.1.1, 2004  

Collaborative Filtering and Data Mining

  1. Badrul M. Sarwar, George Karypis, Joseph A. Konstan, John Riedl: “Analysis of recommendation algorithms for e-commerce.” ACM Conference on Electronic Commerce 2000:158-167
  2. J. Breese, D. Heckerman, C. Kadie, “Empirical Analysis of Predictive Algorithms for Collaborative Filtering”,  In Proceedings of Fourteenth Conference on Uncertainty in Artificial Intelligence, Madison, WI, Morgan Kaufmann, July 1998  (also has some item-item methods)
  3. B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl. “Item-based collaborative filtering recommendation algorithms”. In Proc. of the 10th International World Wide Web Conference (WWW10), Hong Kong, May 2001.
  4. Weiyang Lin, Sergio A. Alvarez, and Carolina Ruiz. “Efficient adaptive-support association rule mining for recommender systems”, Data Mining and Knowledge Discovery, 6:83-105, 2002.

Web Mining

  1. S. Chakrabarti, B. E. Dom, S. R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, D. Gibson, and J. Kleinberg. Mining the Web's link structure. COMPUTER, 32(8):60-67, 1999.
  2. J. M. Kleinberg. Authoritative Sources in a Hyperlinked Environment. Journal of ACM, 46(5):604-632, 1999.
  3. H. Yu, J. Han, and K. C.-C. Chang, “PEBL: Positive Example Based Learning for Web Page Classification Using SVM”, Proc. 2002 Int. Conf. on Knowledge Discovery in Databases (KDD'02), Edmonton, Canada, July 2002.
  4. K. Wang, S. Zhou and S. C. Liew. Building hierarchical classifiers using class proximity. In VLDB'99, Edinburgh, UK, Sept. 1999.
  5. J. Han, and K. C.-C. Chang, “Data Mining for Web Intelligence”, Computer, Nov. 2002
  6. Corin R. Anderson, Pedro Domingos, Daniel S. Weld: Personalizing Web Sites for Mobile Users. In WWW 2001: pages 565-575. 2001.

Data Mining Applications and Trends in Data Mining

  1. H. Mannila, Theoretical Frameworks of Data Mining. SIGKDD Explorations , 1(2): 30-32, 2000
  2. C. Clifton and D. Marks. Security and Privacy Implications of Data Mining. In Proc. 1996 SIGMOD'96 Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD'96), Montreal, Canada, pp. 15-20, June 1996.
  3. R. Agrawal and R. Srikant. Privacy-preserving data mining. In Proc. 2000 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'00), pages 439-450, Dallas, TX, May 2000.
  4. H. V. Jagadish, J. Madar, and R. Ng. Semantic compression and pattern extraction with fascicles. In Proc. 1999 Int. Conf. Very Large Data Bases (VLDB'99), pages 186-197, Edinburgh, UK, Sept. 1999.
  5. Qiming Chen, Umesh Dayal, Meichun Hsu, OLAP-based Scalable Profiling of Customer Behavior, In Proc. 1999 Int. Conf. Data Warehousing and Knowledge Discovery (DaWaK'99), Italy, 1999.
  6. Ron Kohavi, Mining E-Commerce Data: The Good, the Bad, and the Ugly, KDD’2001, 2001.
  7. S. Hill and F. Provost, The Myth of the Double-Blind Review? Author Identification Using Only Citations, SIGKDD Explorations, 5(2), Jan. 2004.

Data Mining and Software Engineering

  1. Jeremy Kolter and Marcus A. Maloof, Learning to Detect Malicious Executables in the Wild, Proc. 2004 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'04), Seattle, WA, Aug. 2004.

 


 

 

 

Jiawei Han