COSC 085 / COSC 185

Theoretical Computer Science

 

Data Mining and Pattern Recognition

A new course in the Computer Science Department

for undergraduate (COSC 085) and graduate (COSC 185) students

Prof. Eugene Demidenko

Course features: 

 

Syllabus

1. Introduction to R.

2. Unsupervised learning on the real line. Kernel density estimation.

3. The two types of misclassification error (false positives and false negatives). Sensitivity and specificity.

4. Receiver Operating Characteristic (ROC) curve. Optimal classification.

5. Calculation of empirical and theoretical misclassification errors.

6. Homework #1.

7. Supervised classification, logistic regression.

8. PCA as a dimension reduction technique.

9. Homework #2.

10. Hierarchical cluster analysis for unsupervised learning.

11. K-means cluster analysis.

12. Homework #3.

13. Discriminant analysis.

14. Nonlinear discriminant analysis, support vector machine.

15. Homework #4.

16. Predictive mining, model building. ARIMA model.

17. Text and web mining.

18. Image mining.

19. Homework #5.

20. Final team project (2-3 people on the team).
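
As an illustration of topics 3 to 5 of the syllabus (the two misclassification errors, the ROC curve, and empirical error calculation), here is a minimal sketch for a threshold classifier on the real line. It is written in Python only for illustration; the course itself uses R. The data are synthetic and the means and standard deviations are hypothetical, not taken from any course dataset. The "optimal" threshold here maximizes sensitivity plus specificity, which is one common criterion and not necessarily the one used in the course.

```python
import random

def sensitivity_specificity(pos, neg, threshold):
    """Classify x > threshold as positive; return (sensitivity, specificity)."""
    tp = sum(1 for x in pos if x > threshold)   # positives correctly flagged
    tn = sum(1 for x in neg if x <= threshold)  # negatives correctly rejected
    return tp / len(pos), tn / len(neg)

def roc_points(pos, neg):
    """Empirical ROC curve: (false positive rate, sensitivity) pairs."""
    points = [(1.0, 1.0)]  # threshold below all data: everything is called positive
    for t in sorted(set(pos) | set(neg)):
        sens, spec = sensitivity_specificity(pos, neg, t)
        points.append((1.0 - spec, sens))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

def best_threshold(pos, neg):
    """Observed value that maximizes sensitivity + specificity (assumed criterion)."""
    candidates = sorted(set(pos) | set(neg))
    return max(candidates, key=lambda t: sum(sensitivity_specificity(pos, neg, t)))

# Synthetic one-dimensional data (hypothetical parameters, for illustration only).
random.seed(0)
group_neg = [random.gauss(164, 7) for _ in range(200)]
group_pos = [random.gauss(178, 7) for _ in range(200)]

t_opt = best_threshold(group_pos, group_neg)
sens, spec = sensitivity_specificity(group_pos, group_neg, t_opt)
print("threshold:", round(t_opt, 1))
print("false negative rate:", round(1 - sens, 3))
print("false positive rate:", round(1 - spec, 3))
print("AUC:", round(auc(roc_points(group_pos, group_neg)), 3))
```

The two printed error rates are the empirical misclassification errors at the chosen threshold; moving the threshold trades one against the other along the ROC curve.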

 

Grade distribution: 50% homework, 50% final project

 

Tasks and datasets:

1. Gender identification based on height and weight (gender.jpg).

2. Counterfeit bank notes (Banknote.jpg).

3. Microarray data analysis for cancer survival (Cancer.jpg).

4. S&P 500 stock price analysis (SP500.jpg).

5. Handwritten ZIP code digit recognition (ZIP.jpg).

6. Histology images (Histology.jpg).

7. Language discrimination.

8. Authorship identification (Text.jpg).

 

Sample team projects (Fall 2007):

 

How To Be Rich in Stock Market: A data-mining approach

 

 

 

 

Some literature:

Lavine M. Introduction to Statistical Thought. 2006. Online: http://www.stat.duke.edu/~michael/book.html

 

Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. Data Mining, Inference, and Prediction. Springer, 2001.

 

Rencher AC. Methods of Multivariate Analysis. 2nd edition. Wiley, 2002.

 

Giudici P. Applied Data Mining. Statistical Methods for Business and Industry. Wiley, 2003.

 

Eldén L. Matrix Methods in Data Mining and Pattern Recognition. SIAM, 2007.