about me
cv + bio
publications
research
teaching
students
software
photos
calendar
contact me
links
|
|
All of my publications are available here. My Erdös number is at most 3 (me to Suresh Venkatasubramanian to Andy Yao to Paul Erdös). Click here for a listing without summaries. The SRC files are gzipped tars containing the tex, figures and required style files.
New Papers:
Search-based Structured Prediction (Machine Learning J. 2009)
A Bayesian Statistics Approach to Multiscale Coarse Graining (J. of Chemical Physics 2009)
Extracting Multilingual Topics from Unaligned Corpora (European Conference on Information Retrieval 2010)
Multi-Label Prediction via Sparse Infinite CCA (NIPS 2009)
Markov Random Topic Fields (ACL 2009)
Bayesian Multitask Learning with Latent Hierarchies (UAI 2009)
Unsupervised Search-based Structured Prediction (ICML 2009)
Exponential Family Hybrid Semi-Supervised Learning (IJCAI 2009)
Streamed Learning: One-Pass SVMs (IJCAI 2009)
Non-Parametric Bayesian Areal Linguistics (NAACL 2009)
Streaming for Large Scale NLP: Language Modeling (NAACL 2009)
The Infinite Hierarchical Factor Regression Model (NIPS 2008)
Cross-Task Knowledge-Constrained Self Training (EMNLP 2008)
Structure Compilation: Trading Structure for Features (ICML 2008)
Name Translation in Statistical Machine Translation: Learning When to Transliterate (ACL 2008)
Unsupervised Part of Speech Tagging Without a Lexicon (NIPS Workshop on Grammar Induction, Representation of Language and Language Learning)
Fast Search for Infinite latent Feature Models (NIPS Workshop on Non-parametric Bayes)
Factor Regression Combining Heterogeneous Sources of Information (NIPS Workshop on Learning from Multiple Sources)
Semi-supervised or Semi-unsupervised? (NAACL Workshop on Semi-supervised Learning for NLP, 2009)
Perceptron-based Coherence Prediction (Chip Multiprocessor Memory Systems and Interconnects, ICSA 2008)
Thesis
Journal Papers





|
Search-based Structured Prediction #
Machine Learning Journal (2009)
With:
John Langford and
Daniel Marcu.
We describe an algorithm, Searn for solving structured
prediction problems for cases where neither the loss nor the features
decompose over the structure. This algorithm comes with reduction
guarantees about performance, and strong empirical results in both
standard sequence labeling tasks and a novel document summarization
task.
|



|
A Bayesian Statistics Approach to Multiscale Coarse Graining #
Journal of Chemical Physics (2009)
With:
Pu Liu,
Qiang Shi and
Gregory Voth.
We apply Bayesian analysis to model a multiscale coarse-grained force
field in a thermodynamically consistent way. The robustness and
accuracy of the Bayesian MS-CG algorithm is demonstrated for three
different systems, including simple liquid methanol, polyalanine
peptide solvated in explicit water, and a much more complicated
peptide assembly with 32 NNQQNY hexapeptides.
|




|
Domain Adaptation for Statistical Classifiers #
Journal of Artificial Intelligence Research (2006)
With:
Daniel Marcu.
We address the problem of moving classifiers from one domain to
another (compromising the second "i" in i.i.d.) by treating
the data distribution as a mixture of three sources: in-domain,
out-of-domain and general. Inference is based on conditional EM for
conditional models (maximum entropy) and we get quite good performance
on three natural language tasks.
Note: There are errors in some of the equation derivations; I will put out an official errata soon!
|




|
Induction of Word and Phrase Alignments for Automatic Document Summarization #
Computational Linguistics (2005)
With:
Daniel Marcu.
We develop a model for automatically building word and phrase
alignments for the task of automatic document summarization, based on
the segment HMM (or semi-HMM). This explicates and extends the EMNLP
paper below, with more experiments and some syntactically motived
transition models.
|




|
A Bayesian Model for Supervised Clustering with the Dirichlet Process Prior #
Journal of Machine Learning Research (2005)
With:
Daniel Marcu.
We develop three models for the supervised clustering problem based on
the Dirichlet process as a prior distribution over the potentially
infinite sets encountered in this problem. On three of four data sets, we
achieve quite impressive results. This extends and completes the NIPS
workshop paper below.
|
Conference Papers



|
Extracting Multilingual Topics from Unaligned Corpora #
European Conference on Information Retrieval 2010
With:
Jagadeesh Jagaralamudi.
We show how topic models can be used to model unaligned multilingual
corpora, such as Wikipedia.
|



|
Multi-Label Prediction via Sparse Infinite CCA #
Neural Information Processing Systems 2009
With:
Piyush Rai.
Using an infinte variant of CCA, we propose a non-parametric Bayesian
framework for semi-supervised dimensionality reduction for multitask
learning and multi-label prediction.
|



|
Markov Random Topic Fields #
Association for Computational Linguistics 2009
We show how to integrate topic models in an undirected graph for topic
mining in -- for instance -- scientific publications. We explore
several different model parameterizations.
|



|
Bayesian Multitask Learning with Latent Hierarchies #
Uncertainty in Artificial Intelligence 2009
We show that hierarchical multitask learning can be accomplished even
when the hierarchical structure is not known in advance. We use the
coalescent to make this happen.
|



|
Unsupervised Search-based Structured Prediction #
International Conference on Machine Learning 2009
We extend the Searn algorithm to
handle unsupervised learning; experiments on dependency parsing are
promising and easy to implement.
|



|
Exponential Family Hybrid Semi-Supervised Learning #
International Joint Conference on Artificial Intelligence 2009
With:
Arvind Agarwal.
We generalize the coupled prior approach to hybrid models
probabilistic models to arbitrary exponential family distributions.
Experiments show that this is a good idea.
|



|
Streamed Learning: One-Pass SVMs #
International Joint Conference on Artificial Intelligence 2009
With:
Piyush Rai and
Suresh Venkatasubramanian.
We present a one-pass approach to learning in the support vector
machine framework. Our algorithm leverages the computational geometry
view of SVMs as minimum enclosing balls, plus results on streaming
MEBs.
|



|
Non-Parametric Bayesian Areal Linguistics #
North American Association for Computational Linguistics 2009
We present a model of linguistic phylogenetics that takes into account
areas affects in a non-parametric way. Our model simultaneously
discovers phylogenetic trees, areal clusters and areal features.
|



|
Streaming for Large Scale NLP: Language Modeling #
North American Association for Computational Linguistics 2009
With:
Amit Goyal and
Suresh Venkatasubramanian.
We describe an approach to language modeling based on the streaming
model of algorithms. We show that a single-pass, high efficiency
approximate counting method can lead to tiny language models that
perform as well as forever-to-build entropy-pruned language models.
|



|
The Infinite Hierarchical Factor Regression Model #
Neural Information Processing Systems 2008
With:
Piyush Rai.
We address shortcomings in factor analysis by: (1) we do not assume a
known number of factors; (2) we do not assume factors are independent;
(3) we do not assume all features are relevant to the factor
analysis. We apply this to microarray analysis data.
|



|
Cross-Task Knowledge-Constrained Self Training #
Empirical Methods in NLP 2008
We present a simple algorithm for self-training with knowledge that
tells us when outputs for multiple tasks are "compatible." We can
learn to do NER on a tiny amount of labeled data by this method.
|



|
Structure Compilation: Trading Structure for Features #
International Conference on Machine Learning 2008
With:
Percy Liang and
Dan Klein.
We investigate the trade-off between structured models and
feature-rich models, empirically and theoretically. We also present a
way to trade structure for features using unlabeled data.
|



|
Name Translation in Statistical Machine Translation: Learning When to Transliterate #
Association for Computational Linguistics 2008
With:
Ulf Hermjakob and
Kevin Knight.
We transliterate names in an end-to-end MT system, where names are
explicitly tagged with there to transliterate or not. Our
transliterator outperforms 3 of 4 human translators.
|



|
Bayesian Agglomerative Clustering with Coalescents #
Neural Information Processing Systems 2007
With:
Yee Whye Teh and
Daniel Roy.
We introduce a new Bayesian model for hierarchical clustering based on
a prior over trees called Kingman's coalescent. We develop novel
greedy and sequential Monte Carlo inferences which operate in a
bottom-up agglomerative fashion. We show experimentally the
superiority of our algorithms over others, and demonstrate our
approach in document clustering and phylolinguistics.
|




|
Frustratingly Easy Domain Adaptation #
Association for Computational Linguistics 2007
When both source and target labeled data are available, a very simple
"merge" can lead to state-of-the-art results on the target task. This
is essentially a reduction for domain adaptation and can be
implimented in 10 lines of
Perl. Slides available as
OpenOffice and
PDF.
Note: There is related work in multitask learning by Evgeniou and Pontil.
|





|
A Bayesian Model for Discovering Typological Implications #
Association for Computational Linguistics 2007
With:
Lyle Campbell.
Based on the WALS data, we automatically discover typological
implications of the form "verb-object implies noun-adjective." Many
of our discovered implications are well known; some are not. We
introduce a hierarchical prior to cope with the sampling problem. Slides available as
OpenOffice and
PDF.
|




|
Fast search for Dirichlet process mixture models #
Conference on AI and Statistics (2007)
When using DPs for mixture modeling, one often only cares about
getting the MAP cluster assignment. In such cases, we show that it is
a good idea just to do an efficient search, rather than performing
(expensive) sampling. A surprising result emerges: nearly greedy
search actually works incredibly well! Slides available as
OpenOffice and
PDF.
|




|
Bayesian Query-Focused Summarization #
Association for Computational Linguistics 2006
With:
Daniel Marcu.
We describe the BayeSum system (previously called BQFS in our DUC and
MSE papers) for query-focused sentence extraction in a Bayesian
framework (the model looks like a topic model if you squint a
little). Achieves competitive performance on white-box experiments
and also leads to good systems in DUC. Also, from an IR perspective,
provides a formalism for statistically grounded query expansion.
Slides available as
OpenOffice and
PDF.
|




|
A Large-Scale Exploration of Effective Global Features for a Joint Entity Detection and Tracking Model #
Human Language Technologies/Empirical Methods in NLP 2005
With:
Daniel Marcu.
We apply the LaSO technique to the EDT
problem (entity identification and coreference); doing so allows us to
use many more interesting features than have been previously available
in models for this task, and we get very good results on the ACE
benchmark data (with very efficient algorithms).
|




|
Learning as Search Optimization: Approximate Large Margin Methods for Structured Prediction #
International Conference on Machine Learning 2005
With:
Daniel Marcu.
We describe a machine learning framework for producing structured
outputs (such as sequences, parse trees, sentences, etc.) when search
is intractable. The framework is based on combining online updates
with a generic search algorithm, and leads to very efficient, very
effective models. Slides in OpenOffice or
PDF format.
Note: See Alan Fern's web page for follow-up work.
|



|
A Phrase-Based HMM Approach to Document/Abstract Alignment #
Empirical Methods in NLP 2004
With:
Daniel Marcu.
We present a model called the phrase-based HMM (see under unpublished
work below) for producing word-to-word and phrase-to-phrase alignments
between documents and abstracts. This is in some sense my seminal
accomplishment in summarization, and one of the pieces of work I am
currently most proud of. I have also made available the annotation software and guide. The slides from
an extended version of the talk are available in OpenOffice format, or
for view in your
browser (warning: animations don't work in the HTML version, which
makes some slides impossible to read).
|



|
NP Bracketing by Maximum Entropy Tagging and SVM Reranking #
Empirical Methods in NLP 2004
With:
Daniel Marcu.
NP Bracketing is the task of identifying all base- and embedded-NPs.
We do this task first by ME tagging using an underspecified tagset,
and then SVM-based reranking of hypotheses. We get best performance
to date, and offer advantages over full parsers.
|



|
Web Search Intent Induction via Automatic Query Reformulation #
North American Association for Computational Linguistics 2004 Short Paper
With:
Eric Brill.
We describe an algorithm that uses queries made by other people to
help nail down intents for underspecified queries, and show its
effectiveness in a real system. This is a short version of an
unpublished paper available in
Postscript or
PDF format.
|



|
The Importance of Lexicalized Syntax Models for Natural Language Generation Tasks #
International Conference on Natural Language Generation 2002
With:
Kevin Knight,
Irene Langkilde-Geary,
Daniel Marcu and
Kenji Yamada.
We compare the performance difference of n-gram language models and
syntax-based langauge models on pure natural language generation,
summarization and machine translation. Our results show that
lexicalized syntax-based models greatly improve readability.
|



|
A Noisy-Channel Model for Document Compression #
Association for Computational Linguistics 2002
With:
Daniel Marcu.
We present a document compression system that uses a hierarchical
noisy-channel model of text production based on combining discourse
trees and syntax trees. We apply this system to summarization.
|



|
Integrated Information Management: An Interactive, Extensible Architecture for Information Retrieval #
Human Language Technologies 2001
With:
Eric Nyberg.
We present the Integrated Information Management (IIM) architecture
for component-based development of IR applications, which is general
enough to model different types of IR tasks, beyond indexing and
retrieval.
|
Workshop Papers



|
Unsupervised Part of Speech Tagging Without a Lexicon #
NIPS Workshop on Grammar Induction, Representation of Language and Language Learning
With:
Adam Teichert.
We show how to use simple typological knowledge to improve
unsupervised part of speech tagging. The basic result is that we can
do without seed lexicons if we know a little linguistics.
|



|
Fast Search for Infinite latent Feature Models #
NIPS Workshop on Non-parametric Bayes
With:
Piyush Rai.
We show that, like A* and beam search can solve MAP for Dirichlet
Processes, it can also be applied to the Indian Buffet Process.
|



|
Factor Regression Combining Heterogeneous Sources of Information #
NIPS Workshop on Learning from Multiple Sources
With:
Amrish Kapoor and
Piyush Rai.
We combine text and microarray data in a factor regression model to
achieve higher prediction accuracy.
|



|
Semi-supervised or Semi-unsupervised? #
NAACL Workshop on Semi-supervised Learning for NLP, 2009
Are you doing learning with labeled and unlabeled data in such a way
that it would work with only unlabeled data or such that it would work
with only labeled data? Why not both? This is an invited position
paper for the SSLNLP workshop.
|



|
Perceptron-based Coherence Prediction #
Chip Multiprocessor Memory Systems and Interconnects, ICSA 2008
With:
Devyani Ghosh and
John Carter.
We present a learning solution to the coherence problem in multicore
systems, achieve very high precision and recall on the "will you need
this data" prediction task.
|




|
Search-Based Structured Prediction as Classification #
NIPS 2005 Workshop on Advances in Structured Learning for Text and Speech Processing
With:
John Langford and
Daniel Marcu.
We extend and formalize the LaSO framework developed in the ICML 2005
paper and obtain significantly stronger theoretical guarantees, even for
non-0/1 loss. Relationships to reinforcement learning are also discussed.
|




|
Bayesian Summarization and DUC and a Suggestion for Extrinsic Evaluation #
Document Underanding Conference (DUC) 2005
With:
Daniel Marcu.
Since we submitted essentially the same system to DUC as to MSE, we
briefly describe the minor differences, then spend the rest of the
paper advocating a new evaluation technique, called Filtering by
Summary. This is an extrinsic evaluation that is easy to deploy,
requires little training, and isn't costly.
|




|
Bayesian Multi-Document Summarization at MSE #
ACL 2005 Workshop on Multilingual Summarization Evaluation (MSE)
With:
Daniel Marcu.
We describe our entry into the MSE competition; it is based on our
Bayesian Query-Focused Summarization model (under review), adapted to
the multidocument setting. The model performed very well in the
evaluation, coming in first according to the human evaluation and
fifth or third according to the two automatic evaluations.
|



|
Supervised clustering with the Dirichlet process #
NIPS 2004 Learning With Structured Outputs Workshop
With:
Daniel Marcu.
In problems such as reference matching, identity uncertainty and
coreference resolution, you need to be able to learn to partition sets
based on supervised examples. We present a Bayesian model for
learning how to do this based on the semi-parametric Dirichlet process
prior. A more complete version of this paper is in preparation.
|



|
Generic Sentence Fusion is an Ill-Defined Summarization Task #
Text Summarization Branches Out Workshop (ACL 2004)
With:
Daniel Marcu.
We perform a series of human experiments to assess the legitimacy of
the task of fusing two sentences without context or task. We find
that there is no agreement between humans on this task, even in the
limited case of 2 to 1 sentence fusion.
|



|
A Tree-Position Kernel for Document Compression #
Document Underanding Conference (DUC) 2004
With:
Daniel Marcu.
We introduct the Tree-Position kernel for performing document
compression, in a model which is a generalization of the document
compression model from ACL 2002 (below). Works well by itself, but not
in a full system. The slides from the talk are available as Postscript or PDF.
|


|
GLEANS: A Generator of Logical Extracts and Abstracts for Nice Summaries #
Document Underanding Conference (DUC) 2002
With:
Abdesammad Echihabi,
Daniel Marcu,
Dragos Stefan Munteanu and
Radu Soricut.
We describe a summarization system that functions by mapping documents
into a canonical database-like representation, categorizes the
documents, and generates abstracts and extracts based on generic
templates.
|
Book Review
Unpublished Papers
The following papers are not published anywhere, nor have they been
peer reviewed. I put them up because I think (hope!) people might
find them useful.




|
Searn in Practice #
With:
John Langford and
Daniel Marcu.
We recently introduced an algorithm, Searn, for solving hard
structured prediction problems. This report is designed to showcase
how Searn can be applied to a wide variety of techniques and what
really goes on behind the scenes. We show how to apply Searn to three
common NLP problems: (1) sequence labeling, (2) parsing and (3)
machine translation.
|



|
Carefully Approximated Bayes Factors for Feature Selection in MaxEnt Models #
I describe a method of feature selection in maximum entropy (logistic
regression) models based on the Bayes factor. I approximate data
evidence using the Laplace approximation, and using two reasonable
assumptions manage to get a very efficient procedure that has many
benefits, including automatic stopping.
|



|
Notes on CG and LM-BFGS Optimization of Logistic Regression #
These are notes on the implementation I have
written of conjugate gradient and limited memory BFGS optimization for
logistic regression (aka maximum entropy) classifiers. The notes were
created because it is actually quite difficult to find good references
on efficient implementation of these algorithms, though discussion of
them exists everywhere.
|


|
Support Vector Machines for Natural Language Processing #
These are notes to accompany a lecture I gave on the use of SVMs in an
intro graduate level NLP class. The slides I used can be found here. Both sources contain approximately the same
information, but the notes are much more in-depth and can be
understood without a soundtrack.
|



|
From Zero to Reproducing Kernel Hilbert Spaces in Twelve Pages or Less #
This is a tutorial on the mathematics behind RKHSs and their use in
machine learning (eg., support vector machines and gaussian
processes).
|




|
Yet Another Haskell Tutorial #
A tutorial on the Haskell programming language, designed for people
who have experience programming (just not in a functional language).
|



|
A Phrase-Based HMM #
This describes the derivation of the inference algorithms used in the
EMNLP 2004 document/summary alignment paper above. Unfortunately,
there was no room in that document to explain these in detail, so they
are relegated to this unpublished version.
|

|
Some notes on binning for Good-Turing smoothing #
This reports a small experiment done to explain binning in Good-Turing
smoothing for a lecture on that topic in a graduate intro to NLP
class.
|



|
Asymmetry of Coordination #
A final paper for a graduate linguistics syntax class, examines the
issue of coordination in English and Japanese and proposes a unified
structure for this phenomena. Also reviews several proposals to date.
|
My Co-authors and the percentage of my papers they are on:
Daniel Marcu (35%)
Piyush Rai (9%)
John Langford (5%)
Suresh Venkatasubramanian (3%)
Kevin Knight (3%)
Jagadeesh Jagaralamudi (1%)
Amit Goyal (1%)
Arvind Agarwal (1%)
Gregory Voth (1%)
Pu Liu (1%)
Qiang Shi (1%)
Dan Klein (1%)
Devyani Ghosh (1%)
John Carter (1%)
Percy Liang (1%)
Ulf Hermjakob (1%)
Daniel Roy (1%)
Lyle Campbell (1%)
Yee Whye Teh (1%)
Eric Brill (1%)
Abdesammad Echihabi (1%)
Dragos Stefan Munteanu (1%)
Irene Langkilde-Geary (1%)
Kenji Yamada (1%)
Radu Soricut (1%)
Adam Teichert (1%)
Amrish Kapoor (1%)
Eric Nyberg (1%)
My publishing productivity by year:
2010: 1; 2009: 14; 2008: 5; 2007: 4; 2006: 4; 2005: 7; 2004: 10; 2002: 4; 2001: 1My dozen most popular papers (by download count):
- Yet Another Haskell Tutorial (119048)
- Practical Structured Learning for Natural Language Processing (4448)
- Frustratingly Easy Domain Adaptation (3168)
- Notes on CG and LM-BFGS Optimization of Logistic Regression (2943)
- Fast search for Dirichlet process mixture models (2119)
- Learning as Search Optimization: Approximate Large Margin Methods for Structured Prediction (2094)
- Bayesian Query-Focused Summarization (1853)
- Bayesian Agglomerative Clustering with Coalescents (1820)
- A Bayesian Model for Discovering Typological Implications (1714)
- Searn in Practice (1561)
- Domain Adaptation for Statistical Classifiers (1230)
- A Large-Scale Exploration of Effective Global Features for a Joint Entity Detection and Tracking Model (1182)
|