Information
Information Track
Courses
- Web Mining (Crawling, Information retrieval and extraction)
- Machine Learning (maybe 2, one more application-oriented, and another more 'theoretical')
- Algorithms
- Databases
- Data Mining
- NLP
(Statistics: Pete has proposed creating a statistics course in CS; we should talk to him.)
We also talked about having Linear Algebra as required background--but maybe instead of enforcing this in the track, we should say it is highly recommended. John (Hollerbach) suggested we cover topics like linear algebra in the courses where the knowledge is needed.
[Suresh] I just worry about what happens the first time we introduce SVDs and PCA in a class and someone looks blankly when they see an eigenvalue. Some of these concepts take time to distill into one's brain. Covering the definition of an eigenvalue doesn't give the same intuition.
Proposal
- have 3 required courses: DB, Alg, ML
- let students select "n" of the electives
[Suresh] Let students take any electives they want?
[Hal] then i think the issues on the table are:
course stuff:
- what courses to require as core (PhD-only core; MS: more structured)
- what courses to allow as electives (PhD: any, but recommend some from the track; MS: TBD)
- what to do about remedial courses (give remedial lectures in our courses; crosslist Pete's stats class)
other stuff:
- track name? Information!
- what rules are there reg. # of track faculty per thesis? (3 is standard, but if there are only 3 of us, this is strongly limiting... do we have flexibility here?)
[hal] just a note: mike kirby says the rule is that every committee has to have 2 track faculty and 3 soc faculty (complete overlap is fine).
is there anything else?
i suggest that for general wording, etc., we just steal from gfx.
http://www.cs.utah.edu/research/areas/graphics/graphicsPhD.shtml
we should probably read through and see if there's anything else we don't like there first though.
[Hal] mike jordan's practical machine learning course is at: http://www.cs.berkeley.edu/%7Easimma/294-fall06/
by coincidence, it's actually almost identical to the course i'm teaching now, both in terms of topics covered and depth of presentation.
given this, i'm somewhat reluctant to over-practicalify the ML course. i know that some things in the current course are a bit too math heavy, which i've planned on fixing in subsequent years anyway. but i feel that creating a new "practically-oriented" course seems like overkill. by analogy, one doesn't take a compilers course to learn to use gcc.
[Hal] i've looked through courses in math, ece and i.s. (biz school) to see what might be interesting...here are some notes:
MATH (all potentially relevant are listed):
- 5010 Introduction to Probability
- 5080 Statistical Inference I (sampling, CLT, s.s., point est, optimality)
- 5090 Statistical Inference II (intervals, testing, likelihood, order, rank)
- *5250 Matrix Analysis (lin. tran, eigen, norms, inverses, groups)
- 6010 Linear Models (regression and anova)
- 6020 Multilinear Models (regression and anova)
- *7710 Optimization
- 7870 Methods of Optimization
ECE:
- 5510 Random Processes (prob, large #s, station., ergo, correlation, noise)
- *6520 Info Thy and Coding (info, entropy, s-c models, codes, decoding)
- *6540 Estimation Thy (param est, unbiased, mve, ss, lme, mse, filtering)
- *6551 Survey of Optimization Techniques (nns, grad+hess desc, CG, annealing, etc)
IS (More for MS students):
- 6481 Data Warehousing (scalability, on-line and off-line)
- 6482 Data Mining (str, semi-str, un-str)
Medical Informatics:
- 6010 Foundations of Medical Informatics
- 6020 Foundations of Bioinformatics and Genetic Epidemiology
- 6105 Statistics for Biomedical Informatics
- 6300 Medical Decision-Making
* means not offered recently
within CS, it seems reasonable electives are web mining, data mining, NLP, NLP apps, AI, and then perhaps computer vision and robotics and maybe something else...not sure about the last two though....depends on the student; I don't really care (like suresh) but it would be nice to give a bit of suggested direction.
[Suresh] And don't forget advanced algorithms classes (if I offer a stream algorithms seminar for example).
i'm also curious as to why db should be required and web mining/data mining an elective rather than vice-versa (i don't have a strong preference -- just curious).
[Suresh] I was talking to a colleague of mine at AT&T (divesh), and was describing our track idea to him. The point he made is that for managing large data, if you don't understand at some level how large database systems that use SQL work, how indices work, how relational databases store data, etc., you'll have a hard time coming up with data mining schemes that might be effective for large systems. This is true for "real" applications, rather than toy applications one might code up. For example, given that Weka is a Java toolkit, it's not clear to me how scalable it is. Understanding the mechanics of databases might help in this regard.
[Suresh] Note that it's a good thing to draw in students from other departments. I am particularly thinking of the IS classes: if we can advertise data mining/web mining classes and get students from IS, this really helps our SCH.
MS Program
- 3 required: DB, Alg, ML
- Select 2 from:
- Web Mining (Crawling, Information retrieval and extraction)
- Advanced Databases
- Applications of NLP
- NLP
- Geometry
- cs5630 Scientific Visualization
Other potential electives of interest:
- cs5100 Foundations of Computer Science
- cs5300 Artificial Intelligence
- cs6210 Advanced Scientific Computing I
- cs6230 High Performance Parallel Computing (?)
- cs6490 Network Security
- Data Mining (later)
Questions arising from Track Document
What "prereq" courses should we list in terms of what the students should know? SCI has: Linear algebra, ODE, PDE, Software Practice (CS 3010/5010), Advanced algorithms and data structures (CS 3020/5020) and Intro to SCI (3200). I am leaning toward: prob/stats, CS3010 and CS3020 only... thoughts?
-- (suresh: The course numbers are not quite right: at least the algorithms class should be 4100 (Joe's algo class). I don't know if making 'foundations' a prereq makes sense; i'd much rather have an algorithms prereq)
What is the course number for the algorithms class we require? ML = 6350, DB = 6530, (Suresh: Alg = 6150 (soon))
I have listed the following courses for WITHIN CS as elective: Foundations (6100), AI (6300), Vision (6320), NLP (6340), SciVis (6630), Applications of NLP (????), Web Mining (????), Advanced Databases (????), Geometry (????). Do any of the ???? have numbers that will remain consistent? I can probably just leave the numbers off for these courses if we don't know yet. Am I missing anything? (I also listed the ones just above, from 5100 through data mining; the latter also lacking a number)
(suresh: Geometry will be 61xx: I'll email joe about this.)