publications Hal Daumé III
All of my publications are available here. My Erdős number is at most 4 (me to Suresh Venkatasubramanian to Andy Yao to Paul Erdős).

A listing with summaries is also available. The source (TGZ) files are gzipped tar archives containing the TeX, figures, and required style files.

Thesis

Practical Structured Learning for Natural Language Processing. Ph.D. Thesis. [PDF] [BIB] [TGZ] [html]

Journal Papers

Search-based Structured Prediction. Machine Learning Journal (2009). (With: John Langford and Daniel Marcu) [PDF] [BIB] [TGZ] [html]
A Bayesian Statistics Approach to Multiscale Coarse Graining. Journal of Chemical Physics (2009). (With: Pu Liu, Qiang Shi and Gregory Voth) [BIB] [TGZ]
Domain Adaptation for Statistical Classifiers. Journal of Artificial Intelligence Research (2006). (With: Daniel Marcu) [PDF] [BIB] [TGZ]
Induction of Word and Phrase Alignments for Automatic Document Summarization. Computational Linguistics (2005). (With: Daniel Marcu) [PDF] [BIB] [TGZ]
A Bayesian Model for Supervised Clustering with the Dirichlet Process Prior. Journal of Machine Learning Research (2005). (With: Daniel Marcu) [PDF] [BIB] [TGZ]

Conference Papers

Multi-Label Prediction via Sparse Infinite CCA. Neural Information Processing Systems 2009. (With: Piyush Rai) [BIB] [TGZ]
Markov Random Topic Fields. Association for Computational Linguistics 2009. [BIB] [TGZ]
Bayesian Multitask Learning with Latent Hierarchies. Uncertainty in Artificial Intelligence 2009. [BIB] [TGZ]
Unsupervised Search-based Structured Prediction. International Conference on Machine Learning 2009. [BIB] [TGZ]
Exponential Family Hybrid Semi-Supervised Learning. International Joint Conference on Artificial Intelligence 2009. (With: Arvind Agarwal) [BIB] [TGZ]
Streamed Learning: One-Pass SVMs. International Joint Conference on Artificial Intelligence 2009. (With: Piyush Rai and Suresh Venkatasubramanian) [BIB] [TGZ]
Non-Parametric Bayesian Areal Linguistics. North American Association for Computational Linguistics 2009. [BIB] [TGZ]
Streaming for Large Scale NLP: Language Modeling. North American Association for Computational Linguistics 2009. (With: Amit Goyal and Suresh Venkatasubramanian) [BIB] [TGZ]
The Infinite Hierarchical Factor Regression Model. Neural Information Processing Systems 2008. (With: Piyush Rai) [BIB] [TGZ]
Cross-Task Knowledge-Constrained Self Training. Empirical Methods in NLP 2008. [BIB] [TGZ]
Structure Compilation: Trading Structure for Features. International Conference on Machine Learning 2008. (With: Percy Liang and Dan Klein) [BIB] [TGZ]
Name Translation in Statistical Machine Translation: Learning When to Transliterate. Association for Computational Linguistics 2008. (With: Ulf Hermjakob and Kevin Knight) [BIB] [TGZ]
Bayesian Agglomerative Clustering with Coalescents. Neural Information Processing Systems 2007. (With: Yee Whye Teh and Daniel Roy) [PDF] [BIB] [TGZ]
Frustratingly Easy Domain Adaptation. Association for Computational Linguistics 2007. [PDF] [BIB] [TGZ]
A Bayesian Model for Discovering Typological Implications. Association for Computational Linguistics 2007. (With: Lyle Campbell) [PDF] [BIB] [TGZ] [html]
Fast search for Dirichlet process mixture models. Conference on AI and Statistics (2007). [PDF] [BIB] [TGZ]
Bayesian Query-Focused Summarization. Association for Computational Linguistics 2006. (With: Daniel Marcu) [PDF] [BIB] [TGZ]
A Large-Scale Exploration of Effective Global Features for a Joint Entity Detection and Tracking Model. Human Language Technologies/Empirical Methods in NLP 2005. (With: Daniel Marcu) [PDF] [BIB] [TGZ]
Learning as Search Optimization: Approximate Large Margin Methods for Structured Prediction. International Conference on Machine Learning 2005. (With: Daniel Marcu) [PDF] [BIB] [TGZ]
A Phrase-Based HMM Approach to Document/Abstract Alignment. Empirical Methods in NLP 2004. (With: Daniel Marcu) [PDF] [BIB] [TGZ]
NP Bracketing by Maximum Entropy Tagging and SVM Reranking. Empirical Methods in NLP 2004. (With: Daniel Marcu) [PDF] [BIB] [TGZ]
Web Search Intent Induction via Automatic Query Reformulation. North American Association for Computational Linguistics 2004 Short Paper. (With: Eric Brill) [PDF] [BIB] [TGZ]
The Importance of Lexicalized Syntax Models for Natural Language Generation Tasks. International Conference on Natural Language Generation 2002. (With: Kevin Knight, Irene Langkilde-Geary, Daniel Marcu and Kenji Yamada) [PDF] [BIB] [TGZ]
A Noisy-Channel Model for Document Compression. Association for Computational Linguistics 2002. (With: Daniel Marcu) [PDF] [BIB] [TGZ]
Integrated Information Management: An Interactive, Extensible Architecture for Information Retrieval. Human Language Technologies 2001. (With: Eric Nyberg) [PDF] [BIB] [TGZ]

Workshop Papers

Unsupervised Part of Speech Tagging Without a Lexicon. NIPS Workshop on Grammar Induction, Representation of Language and Language Learning. (With: Adam Teichert) [BIB] [TGZ]
Fast Search for Infinite Latent Feature Models. NIPS Workshop on Non-parametric Bayes. (With: Piyush Rai) [BIB] [TGZ]
Semi-supervised or Semi-unsupervised? NAACL Workshop on Semi-supervised Learning for NLP, 2009. [BIB] [TGZ]
Perceptron-based Coherence Prediction. Chip Multiprocessor Memory Systems and Interconnects, ISCA 2008. (With: Devyani Ghosh and John Carter) [BIB] [TGZ]
Search-Based Structured Prediction as Classification. NIPS 2005 Workshop on Advances in Structured Learning for Text and Speech Processing. (With: John Langford and Daniel Marcu) [PDF] [BIB] [TGZ]
Bayesian Summarization and DUC and a Suggestion for Extrinsic Evaluation. Document Understanding Conference (DUC) 2005. (With: Daniel Marcu) [PDF] [BIB] [TGZ]
Bayesian Multi-Document Summarization at MSE. ACL 2005 Workshop on Multilingual Summarization Evaluation (MSE). (With: Daniel Marcu) [PDF] [BIB] [TGZ]
Supervised clustering with the Dirichlet process. NIPS 2004 Learning With Structured Outputs Workshop. (With: Daniel Marcu) [PDF] [BIB] [TGZ]
Generic Sentence Fusion is an Ill-Defined Summarization Task. Text Summarization Branches Out Workshop (ACL 2004). (With: Daniel Marcu) [PDF] [BIB] [TGZ]
A Tree-Position Kernel for Document Compression. Document Understanding Conference (DUC) 2004. (With: Daniel Marcu) [PDF] [BIB] [TGZ]
GLEANS: A Generator of Logical Extracts and Abstracts for Nice Summaries. Document Understanding Conference (DUC) 2002. (With: Abdesammad Echihabi, Daniel Marcu, Dragos Stefan Munteanu and Radu Soricut) [BIB] [TGZ]

Book Review

Book Review: Automatic Summarization (by Inderjeet Mani). Machine Translation. [PDF]

Unpublished Papers

The following papers are not published anywhere, nor have they been peer reviewed. I put them up because I think (hope!) people might find them useful.
Searn in Practice. (With: John Langford and Daniel Marcu) [PDF] [BIB] [TGZ] [html]
Carefully Approximated Bayes Factors for Feature Selection in MaxEnt Models. [PDF] [BIB] [TGZ]
Notes on CG and LM-BFGS Optimization of Logistic Regression. [PDF] [BIB] [TGZ]
Support Vector Machines for Natural Language Processing. [PDF]
From Zero to Reproducing Kernel Hilbert Spaces in Twelve Pages or Less. [PDF] [BIB] [TGZ]
Yet Another Haskell Tutorial. [PDF] [BIB] [TGZ] [html]
A Phrase-Based HMM. [PDF] [BIB] [TGZ]
Some notes on binning for Good-Turing smoothing.
Asymmetry of Coordination. [PDF] [BIB] [TGZ]
last updated on eight november, two thousand nine; contact me AT hal3 DOT name