COSC 085 / COSC 185
Theoretical Computer Science
Data Mining and Pattern Recognition
A new course in the Computer Science Department
for undergraduate (COSC 085) and graduate (COSC 185) students
Prof. Eugene Demidenko
Course features:
Project-driven, active learning
Self-contained, with minimal prerequisites: linear algebra and some probability/statistics
Programming in Matlab or R
Syllabus
1. Introduction to R.
2. Unsupervised learning on the real line. Kernel density estimation.
3. The two types of misclassification error (false positives and false negatives); sensitivity and specificity.
4. Receiver Operating Characteristic (ROC) curve. Optimal classification.
5. Calculation of empirical and theoretical misclassification errors.
6. Homework #1.
7. Supervised classification, logistic regression.
8. PCA as a dimension reduction technique.
9. Homework #2.
10. Hierarchical cluster analysis for unsupervised learning.
11. K-means cluster analysis.
12. Homework #3.
13. Discriminant analysis.
14. Nonlinear discriminant analysis, support vector machine.
15. Homework #4.
16. Predictive mining, model building. ARIMA model.
17. Text and web mining.
18. Image mining.
19. Homework #5.
20. Final team project (teams of 2-3 people).
Grade distribution: 50% homework, 50% final project
Tasks and datasets:
1. Gender identification based on height and weight (gender.jpg).
2. Counterfeit banknote detection (Banknote.jpg).
3. Microarray data analysis for cancer survival (Cancer.jpg).
4. S&P 500 stock price analysis (SP500.jpg).
5. Handwritten ZIP code digit recognition (ZIP.jpg).
6. Histology images (Histology.jpg).
7. Language discrimination.
8. Authorship identification (Text.jpg).
Sample team projects (Fall 2007):
Some literature: