I am primarily interested in machine learning and have done some work in the area of transfer learning. Currently, I am working on a distributed model for machine learning. Prior to this, I have dabbled in databases, video compression and computer architecture.
I received my Master's [2005-2007] in Computer Science from the Indian Institute of Technology, Kharagpur, and my Bachelor's [1999-2003] in Electronics and Telecommunications Engineering from Jadavpur University, India. In between, I worked for a couple of years on software development for embedded systems. My master's thesis was on motion estimation algorithms for video compression, and my bachelor's thesis involved navigational path planning for mobile robots.
In this work, we propose a fast algorithm for multiple kernel learning (MKL). Our proposed approach builds on the original QCQP formulation of Lanckriet et al. It uses a multiplicative weight update based approximation for the underlying SDP, exploiting a careful reformulation of the MKL problem as well as a novel fast matrix exponentiation routine for QCQP constraints that might be of independent interest. Our method avoids the use of commercial nonlinear solvers, and scales efficiently to much larger data sets than most prior methods can handle. Empirical evaluation on eleven datasets shows that our method is significantly faster than most current MKL methods and compares favorably with a uniform unweighted combination of kernels.
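For intuition, here is a minimal Python sketch of the multiplicative-weight flavor of kernel combination described above. It is illustrative only: the function name, learning rate, and the use of an off-the-shelf SVM solver for the inner problem are my own choices, not the paper's routine (which avoids such solvers).

# Hedged sketch of multiplicative-weight kernel combination (not the paper's exact
# algorithm): weights over base kernels are updated multiplicatively using each
# kernel's contribution to the SVM dual objective, then renormalized.
import numpy as np
from sklearn.svm import SVC

def mwu_kernel_combination(kernels, y, rounds=20, eta=0.2, C=1.0):
    """kernels: list of (n, n) PSD Gram matrices; y: labels in {-1, +1}."""
    m = len(kernels)
    mu = np.full(m, 1.0 / m)                         # kernel weights on the simplex
    for _ in range(rounds):
        K = sum(w * Km for w, Km in zip(mu, kernels))
        svm = SVC(C=C, kernel="precomputed").fit(K, y)
        alpha = np.zeros(len(y))
        alpha[svm.support_] = np.abs(svm.dual_coef_.ravel())
        ay = alpha * y
        # per-kernel quadratic term of the dual objective acts as the "gain"
        gains = np.array([ay @ Km @ ay for Km in kernels])
        mu *= np.exp(eta * gains / (np.abs(gains).max() + 1e-12))
        mu /= mu.sum()
    return mu

# toy usage with two random kernels
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=60))
K1 = X @ X.T                                                      # linear kernel
K2 = np.exp(-0.5 * np.square(X[:, None] - X[None, :]).sum(-1))    # RBF kernel
print(mwu_kernel_combination([K1, K2], y))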
In distributed learning, the goal is to perform a learning task over data distributed across multiple nodes with minimal (expensive) communication. Prior work (Daumé III et al., 2012) proposes a general model that bounds the communication required for learning classifiers while allowing for $\epsilon$ training error on linearly separable data adversarially distributed across nodes.
In this work, we develop key improvements and extensions to this basic model. Our first result is a two-party multiplicative-weight-update based protocol that uses $O(d^2 \log(1/\epsilon))$ words of communication to classify distributed data in arbitrary dimension $d$, $\epsilon$-optimally. This readily extends to classification over $k$ nodes with $O(kd^2 \log(1/\epsilon))$ words of communication. Our proposed protocol is simple to implement and, as demonstrated by our empirical results, is considerably more efficient than the baselines we compare against.
In addition, we illustrate general algorithm design paradigms for efficient learning over distributed data. We show how to solve fixed-dimensional and high-dimensional linear programs efficiently in a distributed setting where the constraints may be spread across nodes. Since many learning problems can be viewed as convex optimization problems whose constraints are generated by individual points, this models many typical distributed learning scenarios. Our techniques make use of a novel connection to multipass streaming, as well as an adaptation of the multiplicative-weight-update framework to the distributed setting. As a consequence, our methods extend to the wide range of problems solvable using these techniques.
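To make the flavor of such a protocol concrete, below is a hedged Python sketch of a two-party multiplicative-weights exchange (hypothetical function names and sample sizes; not the exact protocol or communication accounting from the paper): node A maintains weights over its points and ships a small weighted sample each round, node B returns a classifier trained on its own data plus that sample, and A multiplicatively boosts the points that classifier gets wrong; the final predictor is a vote over the per-round classifiers.

# Hedged sketch of a two-party multiplicative-weights protocol (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

def two_party_mwu(XA, yA, XB, yB, rounds=10, sample_size=20, eta=0.5, seed=0):
    """Labels assumed in {0, 1}; XA/yA live on node A, XB/yB on node B."""
    rng = np.random.default_rng(seed)
    w = np.ones(len(yA))                    # node A's weights over its own points
    classifiers = []
    for _ in range(rounds):
        p = w / w.sum()
        idx = rng.choice(len(yA), size=sample_size, p=p)        # communication: A -> B
        clf = LogisticRegression().fit(
            np.vstack([XB, XA[idx]]), np.concatenate([yB, yA[idx]]))
        classifiers.append(clf)                                  # communication: B -> A
        mistakes = clf.predict(XA) != yA
        w[mistakes] *= np.exp(eta)          # boost misclassified points on node A
    return classifiers

def vote(classifiers, X):
    """Majority vote of the round classifiers (labels in {0, 1})."""
    return (np.mean([clf.predict(X) for clf in classifiers], axis=0) >= 0.5).astype(int)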
We consider the problem of learning classifiers for labeled data that has been distributed across several nodes. Our goal is to find a single classifier, with small approximation error, across all datasets while minimizing the communication between nodes. This setting models real-world communication bottlenecks in the processing of massive distributed datasets. We present several very general sampling-based solutions as well as two-way protocols that have a provable exponential speed-up over any one-way protocol. We focus on core problems for noiseless data distributed across two or more nodes. The techniques we introduce are reminiscent of active learning, but rather than actively probing for labels, the nodes actively communicate with each other, each node learning the important data held by the others.
In this paper, we harness the synergy between two important
learning paradigms, namely, active learning and domain adaptation. We
show how active learning in a target domain can leverage information
from a different but related source domain. Our proposed framework, Active
Learning Domain Adapted (Alda), uses source domain knowledge
to transfer information that facilitates active learning in the target domain.
We propose two variants of Alda: a batch B-Alda and an online
O-Alda. Empirical comparisons with numerous baselines on real-world
datasets establish the efficacy of the proposed methods.
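As a rough illustration of the online variant's spirit (hypothetical names and thresholds; not the paper's O-Alda procedure), the sketch below warm-starts a linear learner on source data and then queries target labels only for points the current model is uncertain about:

# Hedged sketch: source-initialized uncertainty-based active learning on a target stream.
import numpy as np
from sklearn.linear_model import SGDClassifier

def source_initialized_active_learning(Xs, ys, target_stream, margin=0.5):
    clf = SGDClassifier().fit(Xs, ys)           # warm start from the source domain
    labels_queried = 0
    for x, y in target_stream:                  # y is consumed only if we query it
        score = clf.decision_function(x.reshape(1, -1))[0]
        if abs(score) < margin:                 # uncertain -> query the label and update
            labels_queried += 1
            clf.partial_fit(x.reshape(1, -1), [y])
    return clf, labels_queried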
[C] Online Learning of Multiple Tasks and Their Relationships
by Avishek Saha, Piyush Rai, Hal Daumé III, Suresh Venkatasubramanian
at International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.
We propose an Online MultiTask Learning (Omtl) framework which simultaneously
learns the task weight vectors as well as the task relatedness adaptively from the
data. Our work is in contrast with prior work on online multitask learning, which
assumes fixed task relatedness a priori. Furthermore, whereas prior work in such
settings assumes only positively correlated tasks, our framework can capture negative
correlations as well. Our proposed framework learns the task relationship matrix
by framing the objective function as a Bregman divergence minimization problem
for positive definite matrices. Subsequently, we exploit this adaptively learned
task-relationship matrix to select the most informative samples in an online multitask
active learning setting. Experimental results on a number of real-world datasets
and comparisons with numerous baselines establish the efficacy of our proposed approach.
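The sketch below is a loose Python illustration of that idea, with perceptron-style co-updates and a periodically refreshed interaction matrix; it is not the paper's Bregman-divergence update for the relationship matrix, and all names and constants are my own.

# Hedged, illustrative sketch of online multitask learning with an adaptive
# task-interaction matrix (not the paper's exact update rules).
import numpy as np

def online_multitask(stream, num_tasks, dim, lr=0.1, refresh_every=50):
    W = np.zeros((num_tasks, dim))            # one weight vector per task
    A = np.eye(num_tasks)                     # task-interaction matrix
    for step, (t, x, y) in enumerate(stream, 1):   # (task id, features, label in {-1, +1})
        if y * (W[t] @ x) <= 0:               # mistake on task t
            W += lr * np.outer(A[:, t], y * x)     # co-update related tasks via A
        if step % refresh_every == 0:
            # refresh relatedness from observed correlations among task vectors
            C = np.corrcoef(W) if np.abs(W).sum() > 0 else np.eye(num_tasks)
            A = np.nan_to_num(C, nan=0.0) + np.eye(num_tasks)  # keep well-conditioned
    return W, A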
*[C] Co-regularization Based Semi-supervised Domain Adaptation
by Hal Daumé III, Abhishek Kumar, Avishek Saha
at Advances in Neural Information Processing Systems (NIPS), 2010.
This paper presents a co-regularization based approach to
semi-supervised domain adaptation. Our proposed approach (EA++) builds
on the notion of augmented space (introduced in EASYADAPT (EA) [1])
and harnesses unlabeled data in the target domain to further enable the
transfer of information from source to target. This semi-supervised
approach to domain adaptation is extremely simple to implement and can
be applied as a pre-processing step to any supervised learner. Our
theoretical analysis (in terms of Rademacher complexity) of EA and
EA++ shows that the hypothesis class of EA++ has lower complexity
(compared to EA) and hence results in tighter generalization bounds.
Experimental results on sentiment analysis tasks reinforce our
theoretical findings and demonstrate the efficacy of the proposed
method when compared to EA as well as a few other baseline approaches.
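Below is a minimal sketch of the augmented feature space that EA (and hence EA++) builds on: every source example is mapped to (x, x, 0) and every target example to (x, 0, x), so a single linear learner shares the first block across domains while keeping domain-specific blocks. The EA++ treatment of unlabeled target data is omitted here.

# Minimal sketch of the EasyAdapt feature augmentation.
import numpy as np

def easyadapt_augment(X, domain):
    """domain: 'source' or 'target'; X: (n, d) feature matrix."""
    zeros = np.zeros_like(X)
    if domain == "source":
        return np.hstack([X, X, zeros])       # (common, source-specific, target-specific)
    return np.hstack([X, zeros, X])

# usage: augment both domains, stack them, and feed to any supervised learner
Xs_aug = easyadapt_augment(np.random.rand(5, 3), "source")
Xt_aug = easyadapt_augment(np.random.rand(4, 3), "target")
print(Xs_aug.shape, Xt_aug.shape)   # (5, 9) (4, 9)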
In this paper, we propose an online multitask learning framework where the weight
vectors are updated in an adaptive fashion based on inter-task relatedness. Our work
is in contrast with earlier work on online multitask learning,
where the authors use a fixed interaction matrix of tasks to derive (fixed) update rules for all the tasks.
In this work, we propose to update this interaction matrix itself in an adaptive fashion so that the
weight vector updates are no longer fixed but are instead adaptive. Our framework can be extended
to an active learning setting where the informativeness of an incoming instance across all the tasks can be
evaluated using this adaptive interaction matrix. Empirical results on standardized datasets
show improved performance in terms of accuracy, label complexity and number of mistakes made.
*[W] Frustratingly Easy Semi-supervised Domain Adaptation
by Hal Daumé III, Abhishek Kumar, Avishek Saha
at ACL 2010 Workshop on Domain Adaptation for Natural Language Processing (DANLP), 2010.
In this work, we propose a semi-supervised extension to a well-known
supervised domain adaptation approach (EA) (Daumé III, 2007).
Our proposed approach (EA++) builds on the notion of augmented
space (introduced in EA) and harnesses unlabeled data in the target
domain to facilitate the transfer of information from source to target.
This semi-supervised approach to domain adaptation is extremely simple to implement, and can
be applied as a pre-processing step to any supervised learner.
Experimental results on sequential labeling tasks demonstrate the efficacy
of the proposed method.
In this work, we show how active learning in some (target) domain can leverage information
from a different but related (source) domain. We present an algorithm that harnesses
the source domain data to learn the best possible initializer hypothesis for doing active
learning in the target domain, resulting in improved label complexity. We also present a
variant of this algorithm which additionally uses the domain divergence information to selectively
query the most informative points in the target domain, leading to further reductions
in label complexity. Experimental results on a variety of datasets establish the efficacy
of the proposed methods.
We study sequential dependencies that express the semantics of data with ordered domains
and help identify quality problems with such data. Given an interval g, we write {X}->(g){Y} to
denote that the difference between the Y-attribute values of any two consecutive records,
when sorted on X, must be in g. For example, {time}->(0,\infty){sequence number} indicates
that sequence numbers are strictly increasing over time, whereas {sequence number}->(4,5){time}
means that the time "gaps" between consecutive sequence numbers are between 4 and 5.
Sequential dependencies express relationships between ordered attributes,
and identify missing (gaps too large), extraneous (gaps too small) and out-of-order data.
To make sequential dependencies applicable to real-world data, we relax their requirements
and allow them to hold approximately (with some exceptions) and conditionally (on various subsets
of the data). This paper proposes the notion of conditional approximate sequential dependencies
and provides an efficient framework for discovering pattern tableaux, which are compact representations
of the subsets of the data (i.e., ranges of values of the ordered attributes) that satisfy the
underlying dependency. We present analyses of our proposed algorithms, and experiments on real
data demonstrating the efficiency and utility of our framework.
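As a toy illustration of checking a single sequential dependency {X}->(g){Y} (function and key names, and the inclusive gap bounds, are my own simplifications):

# Hedged sketch: sort on X and verify that consecutive Y-gaps fall inside the interval g.
def check_sequential_dependency(rows, x_key, y_key, lo, hi):
    """rows: list of dicts; returns positions (after sorting on x_key) where the
    Y-gap to the previous row falls outside [lo, hi] (inclusive, for simplicity)."""
    ordered = sorted(rows, key=lambda r: r[x_key])
    violations = []
    for i in range(1, len(ordered)):
        gap = ordered[i][y_key] - ordered[i - 1][y_key]
        if not (lo <= gap <= hi):
            violations.append(i)
    return violations

# e.g. {sequence number}->(4,5){time}: time gaps between consecutive sequence numbers
# should be between 4 and 5
rows = [{"seq": 1, "time": 0}, {"seq": 2, "time": 4}, {"seq": 3, "time": 20}]
print(check_sequential_dependency(rows, "seq", "time", 4, 5))   # -> [2]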
When merging data from various sources, it is often the case that small variations in data format
and interpretation cause traditional functional dependencies (FDs) to be violated, without there
being an intrinsic violation of semantics. Examples include differing address formats, or different
reported latitude/longitudes for a given address. In this paper, we define metric functional dependencies,
which strictly generalize traditional FDs by allowing small differences (controlled by a metric) in values
of the consequent attribute of an FD. We present efficient algorithms for the verification problem:
determining whether a given metric FD holds for a given relation. We experimentally demonstrate the
validity and efficiency of our approach on various data sets that lie in multidimensional spaces.
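A minimal sketch of the verification problem for a single metric FD follows; the absolute-difference metric and the per-group max-spread check are my simplifications, and the paper's algorithms are more general and more efficient.

# Hedged sketch: verify a metric FD X -> Y with tolerance delta by grouping on the
# antecedent and bounding the spread of the consequent within each group.
from collections import defaultdict

def verify_metric_fd(rows, x_key, y_key, delta):
    """rows: list of dicts. The metric FD holds if every X-group has Y-spread <= delta."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[x_key]].append(r[y_key])
    return all(max(ys) - min(ys) <= delta for ys in groups.values())

# e.g. the same address reported with slightly different latitudes
rows = [{"addr": "A", "lat": 40.7128}, {"addr": "A", "lat": 40.7130},
        {"addr": "B", "lat": 34.0522}]
print(verify_metric_fd(rows, "addr", "lat", delta=0.001))   # -> True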