The Machine Learning series takes place in the large conference room on the fourth floor of the Computer Science building on Wednesdays at 12:30.
Wed Oct 10, 2012  12:30 PM  CS 402  Miroslav Dudik (Microsoft Research, NYC) 
Tractable market making in combinatorial prediction markets
Prediction markets are emerging as a powerful and accurate method of aggregating information from populations of experts (and non-experts). Traders in prediction markets are incentivized to reveal their information through buying and selling "securities" for events such as "Romney to win Florida". The prices of securities reflect the aggregate belief about the events, and the key challenge is to correctly price the securities. We present a new automated market maker for providing liquidity across multiple logically interrelated securities. Our approach lies somewhere between the industry standard (treating related securities as independent, and thus not transmitting any information from one security to another) and a full combinatorial market maker, for which pricing is computationally intractable. Our market maker, based on convex optimization and constraint generation, is tractable like independent securities, yet propagates some information among related securities like a combinatorial market maker, resulting in more complete information aggregation. Our techniques borrow heavily from variational inference in exponential families. We prove several favorable properties of our scheme and evaluate its information aggregation performance on survey data involving hundreds of thousands of complex predictions about the 2008 U.S. presidential election. Joint work with Sebastien Lahaie and David Pennock.

Bio: Miroslav Dudik joined MSR-NYC in May 2012. His interests are in combining theoretical and applied aspects of machine learning, statistics, convex optimization and algorithms. He received his PhD from Princeton in 2007.
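For intuition about the cost-function market makers this work builds on, a minimal sketch of the standard baseline (Hanson's logarithmic market scoring rule, LMSR) may help; this is the kind of single-market maker the talk's combinatorial scheme generalizes, not the talk's own method, and the liquidity parameter b and share quantities here are illustrative assumptions.

```python
import math

def lmsr_cost(q, b=100.0):
    """Hanson's LMSR cost function C(q) = b * log(sum_i exp(q_i / b))."""
    m = max(x / b for x in q)  # subtract the max for numerical stability
    return b * (m + math.log(sum(math.exp(x / b - m) for x in q)))

def lmsr_prices(q, b=100.0):
    """Instantaneous security prices: the softmax of outstanding shares q."""
    m = max(x / b for x in q)
    exps = [math.exp(x / b - m) for x in q]
    s = sum(exps)
    return [e / s for e in exps]

def trade_cost(q, delta, b=100.0):
    """A trader buying `delta` shares pays C(q + delta) - C(q)."""
    q_new = [qi + di for qi, di in zip(q, delta)]
    return lmsr_cost(q_new, b) - lmsr_cost(q, b)

# Two mutually exclusive outcomes, no shares sold yet: prices start at 0.5 each.
q = [0.0, 0.0]
print(lmsr_prices(q))                        # [0.5, 0.5]
print(round(trade_cost(q, [10.0, 0.0]), 4))  # slightly more than 10 * 0.5
```

Because prices are a softmax of the share vector, buying a security pushes its price up, and prices over mutually exclusive outcomes always sum to one.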

Wed Oct 17, 2012  12:30 PM  CS 402  Tamir Hazan (TTI-Chicago) 
Inference and Learning with Random Maximum A-Posteriori Perturbations
Learning and inference in complex models drives much of the research in machine learning applications, from computer vision and natural language processing to computational biology. The inference problem in such cases involves assessing the weights of possible structures, whether objects, parses, or molecular structures. Although it is often feasible to find only the most likely or maximum a-posteriori (MAP) assignment rather than considering all possible assignments, MAP inference is limited when there are other likely assignments. In a fully probabilistic treatment, all possible alternative assignments are considered, which requires summing over the assignments with their respective weights and is considerably harder (#P-hard vs. NP-hard). The main surprising result of our work is that MAP inference (maximization) can be used to approximate and bound the weighted counting. This leads us to a new approximate inference framework that is based on MAP statistics and thus does not depend on pseudo-probabilities, in contrast to the current framework of Bethe approximations, which lacks statistical meaning. This approach excels in regimes where there are several, but not exponentially many, prominent assignments. This happens, for example, when observations carry strong signals (local evidence) but are also guided by strong consistency constraints (couplings). Joint work with Tommi Jaakkola.

Wed Oct 24, 2012  12:30 PM  CS 402  Shay Cohen (Columbia) 
Consistent and Efficient Algorithms for Latent-Variable PCFGs
In the past few years, there has been increased interest in the machine learning community in spectral algorithms for estimating models with latent variables. Examples include algorithms for estimating mixtures of Gaussians or for estimating the parameters of a hidden Markov model. Until the introduction of spectral algorithms, the EM algorithm had been the mainstay for estimation with latent variables. Still, with EM there is no guarantee of convergence to the global maximum of the likelihood function, and therefore EM generally does not provide consistent estimates of the model parameters. Spectral algorithms, on the other hand, are often shown to be consistent.

In this talk, I present a spectral algorithm for latent-variable PCFGs, a model widely used for parsing in the NLP community. This model augments the nonterminals in a PCFG grammar with latent states. These latent states refine the nonterminal categories in order to capture subtle syntactic nuances in the data. This model has been successfully implemented in state-of-the-art parsers such as the Berkeley parser.

The algorithm developed is considerably faster than EM, as it makes only one pass over the data. Statistics are collected from the data in this pass, and singular value decomposition is performed on matrices containing these statistics. Our algorithm is also provably consistent in the sense that, given enough samples, it will estimate probabilities for test trees close to their true probabilities under the latent-variable PCFG model.

If time permits, I will also present a method aimed at improving the efficiency of parsing with latent-variable PCFGs. This method relies on tensor decomposition of the latent-variable PCFG. The tensor decomposition is approximate, and therefore the new parser is an approximate parser as well. Still, the quality of the approximation can be theoretically guaranteed by inspecting how errors from the approximation propagate in the parse trees. 
Joint work with Michael Collins, Dean Foster, Karl Stratos and Lyle Ungar. 

Wed Nov 14, 2012  12:30 PM  CS 402  Xin Luna Dong 
Truth Finding on the Deep Web
The Web has been changing our lives enormously, and people rely more and more on the Web to fulfill their information needs. Compared with traditional media, information on the Web can be published quickly, but with fewer guarantees on quality and credibility. Indeed, Web sources are of varying quality, sometimes providing conflicting, out-of-date and incomplete data. Sources can also easily copy, reformat and modify data from other sources, propagating erroneous data. In this talk we present a recent study of the truthfulness of Deep Web data in two domains where we believe data quality is important to people's lives: Stock and Flight. We then describe how we can resolve conflicts between different sources by leveraging the accuracy of the sources and the copying relationships between them, using statistical models. We demo our SOLOMON system, which can effectively detect copying between data sources, leverage the results in truth discovery, and provide a user-friendly interface to help users understand the results.

Bio: Dr. Xin Luna Dong is a researcher at AT&T Labs-Research. She received a Ph.D. in Computer Science and Engineering from the University of Washington in 2007, a Master's degree in Computer Science from Peking University in China in 2001, and a Bachelor's degree in Computer Science from Nankai University in China in 1998. Her research interests include databases, information retrieval and machine learning, with an emphasis on data integration, data cleaning, personal information management, and web search. She has led the Solomon project, whose goal is to detect copying between structured sources and to leverage the results in various aspects of data integration, and the Semex personal information management system, which received the Best Demo award (one of the top 3) at Sigmod'05. 
She has co-chaired the Sigmod/PODS PhD Symposium'12, the Sigmod New Researcher Symposium'12, QDB'12, and WebDB'10; has served as a track chair on the program committees of ICDE'13 and CIKM'11; and has served on the program committees of VLDB'13, Sigmod'12, VLDB'12, Sigmod'11, VLDB'11, PVLDB'10, WWW'10, ICDE'10, VLDB'09, etc. 

Wed Nov 7, 2012  12:30 PM  CS 402  Jean Honorio 
Convergence Rates of Biased Stochastic Optimization for Learning Sparse Ising Models
We study the convergence rate of stochastic optimization of exact (NP-hard) objectives, for which only biased estimates of the gradient are available. We motivate this problem in the context of learning the structure and parameters of Ising models. We first provide a convergence-rate analysis of deterministic errors for forward-backward splitting (FBS). We then extend our analysis to biased stochastic errors by first characterizing a family of samplers and providing a high-probability bound that allows understanding not only FBS but also proximal gradient (PG) methods. We derive some interesting conclusions: FBS requires only a logarithmically increasing number of random samples in order to converge (although at a very low rate); the required number of random samples is the same for the deterministic and the biased stochastic setting for FBS and basic PG; accelerated PG is not guaranteed to converge in the biased stochastic setting.

Bio: Jean Honorio received his PhD in Computer Science from Stony Brook University. He received his MSc in Computer Science from The George Washington University in 2006, and his BSc in Systems Engineering from Universidad de Lima, Peru in 1997. His research focuses on tractable learning of graphical model structures from data. He will soon join CSAIL at MIT as a postdoctoral researcher. He was a member of the Image Analysis Lab at Stony Brook University and the Medical Department at Brookhaven National Lab in 2007-2012, the Institute for Computer Graphics at The George Washington University in 2005-2006, and the Institute for System Engineering and Operations Research at Universidad de Lima in 1995-1996. 
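The forward-backward splitting scheme the analysis centers on is easy to state: alternate a gradient step on the smooth part of the objective with a proximal step on the nonsmooth part. A minimal sketch on a toy quadratic-plus-L1 objective (not the paper's Ising objective, and with exact rather than biased gradients) shows the structure:

```python
def soft_threshold(x, t):
    """Proximal operator of t*|.|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def fbs(grad_f, lam, x0, step, n_iters=200):
    """Forward-backward splitting for min_x f(x) + lam * ||x||_1."""
    x = list(x0)
    for _ in range(n_iters):
        # forward (gradient) step on the smooth part f
        g = grad_f(x)
        y = [xi - step * gi for xi, gi in zip(x, g)]
        # backward (proximal) step on the nonsmooth L1 part
        x = [soft_threshold(yi, step * lam) for yi in y]
    return x

# Toy smooth part: f(x) = 0.5 * sum_i (x_i - a_i)^2, so grad f(x) = x - a.
a = [3.0, 0.2, -1.5]
grad_f = lambda x: [xi - ai for xi, ai in zip(x, a)]
x_star = fbs(grad_f, lam=0.5, x0=[0.0, 0.0, 0.0], step=0.5)
print([round(v, 4) for v in x_star])  # soft_threshold(a_i, 0.5): [2.5, 0.0, -1.0]
```

In the setting of the talk, grad_f is only available through biased stochastic samples (e.g., from an MCMC sampler for the Ising model), which is what drives the convergence-rate questions above.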

Wed Dec 12, 2012  12:30 PM  CS 402  Lauren Hannah (Columbia) 
Multivariate convex regression
Shape constraints, like convexity, occur in a variety of inference problems, including geometric-programming-based circuit design, options pricing and value function approximation. Current methods for multivariate regression subject to convexity constraints do not scale to more than a couple of thousand observations. In this talk, we present a new computationally efficient, consistent method for multivariate convex regression using adaptive partitioning with linear partition models. We apply this method to large-scale inference problems and discuss extensions to objective function and constraint approximation in a convex optimization setting.
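An estimator built from linear models on a partition of the input space can be represented as a maximum of affine functions, which is convex by construction. A small sketch (with hypothetical fitted hyperplanes, not output of the talk's fitting procedure) shows the representation and a numerical check of midpoint convexity:

```python
def max_affine(planes, x):
    """Evaluate f(x) = max_k (a_k . x + b_k); convex as a max of affine maps."""
    return max(sum(ai * xi for ai, xi in zip(a, x)) + b for a, b in planes)

# Three hypothetical fitted hyperplanes (a_k, b_k) in 2-D.
planes = [([1.0, 0.0], 0.0), ([-1.0, 0.5], 1.0), ([0.2, 2.0], -0.5)]

# Midpoint convexity check: f((u+v)/2) <= (f(u) + f(v)) / 2.
u, v = [0.3, -1.2], [2.0, 0.7]
mid = [(ui + vi) / 2 for ui, vi in zip(u, v)]
assert max_affine(planes, mid) <= (max_affine(planes, u) + max_affine(planes, v)) / 2
print(round(max_affine(planes, mid), 4))  # 1.15
```

The midpoint inequality holds for any choice of hyperplanes, so a predictor of this form respects the convexity shape constraint regardless of how the individual linear pieces were estimated.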

Wed Mar 7, 2012  12:30 PM  CS 402  David Sontag (NYU) 
Complexity of Inference in Latent Dirichlet Allocation
[Slides]
In this talk, I will discuss recent work on understanding the
computational complexity of probabilistic inference in Latent
Dirichlet Allocation (LDA). Time permitting, I will also discuss my
recent work on using probabilistic models such as LDA for
personalizing web search and for real-time surveillance of patients'
medical records.


Wed Mar 14, 2012  12:30 PM  CS 402  Marc Ratkovic (Politics) 
Achieving Optimal Covariate Balance Under General Treatment Regimes
[Slides]
Balancing covariates across treatment levels provides an effective and increasingly popular strategy for reducing bias in observational studies. Matching procedures, as a means of achieving balance, preprocess the data by identifying a subset of control observations whose background characteristics are similar to those of the treated observations. The proposed method adapts the support vector machine classifier to the matching problem. The method provides a fully automated, nonparametric procedure for identifying a balanced subset that accommodates both binary and continuous treatment regimes. I show that, for the identified subset, the expected treatment level is jointly independent of the pretreatment covariates. Unlike existing methods, the proposed method maximizes balance across all covariates simultaneously, rather than along a summary measure of balance. Two applications, a benchmark dataset measuring the impact of a job training program and survey data with information on medical expenditures and smoking habits, illustrate the method's use and efficacy.


Wed Mar 28, 2012  12:30 PM  CS 402  
TBA


Wed Apr 4, 2012  12:30 PM  CS 402  Ryan Adams (Harvard) 
Tree-Structured Stick-Breaking Processes for Hierarchical Data
The Dirichlet process and related distributions over infinite-dimensional random measures have had a large impact on Bayesian nonparametric modeling, but they can be limited by the lack of structure they place over their random partitions. In this work, we use a recursive stick-breaking process to construct a Bayesian nonparametric prior for inferring hierarchies in data, yielding an exchangeable urn scheme on trees of unbounded width and depth. We use a novel slice-sampling approach based on an auxiliary-variable model to update the structure of the tree, as well as Gibbs moves that take advantage of invariance under size-biased permutation. We use this MCMC sampling scheme to perform hierarchical clustering on a set of 50,000 color images, and also to discover structure from natural language in a corpus of scientific documents. This is joint work with Zoubin Ghahramani (Cambridge) and Michael Jordan (Berkeley). 
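For readers unfamiliar with stick-breaking, the flat (non-recursive) Dirichlet-process construction is a useful starting point: repeatedly break off a Beta-distributed fraction of the remaining stick to obtain the mixture weights. The talk's prior applies this kind of construction recursively along tree branches; the sketch below is only the classical flat version, with an illustrative concentration parameter.

```python
import random

def stick_breaking(alpha, n_atoms, seed=0):
    """First n_atoms weights of a Dirichlet process via stick-breaking:
    nu_k ~ Beta(1, alpha), pi_k = nu_k * prod_{j<k} (1 - nu_j)."""
    random.seed(seed)
    weights, remaining = [], 1.0
    for _ in range(n_atoms):
        nu = random.betavariate(1.0, alpha)
        weights.append(nu * remaining)   # break off a fraction of the stick
        remaining *= 1.0 - nu            # what is left for later atoms
    return weights

w = stick_breaking(alpha=2.0, n_atoms=50)
print(round(sum(w), 4))  # close to, and never exceeding, 1
```

Larger alpha spreads mass over more atoms; the truncated weights always sum to less than one, with the deficit shrinking geometrically in the number of atoms.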

Wed Apr 11, 2012  12:30 PM  CS 402  Bob Wilson (Psychology) 
TBA


Wed Apr 18, 2012  12:30 PM  CS 402  Yael Niv (Neuroscience) 
TBA


Wed May 9, 2012  12:30 PM  CS 402  Kuzman Ganchev (Google) 
Rich Prior Knowledge in Learning for Natural Language Processing
We possess a wealth of prior knowledge about most prediction problems,
and particularly so for many of the fundamental tasks in natural
language processing. Unfortunately, it is often difficult to make use
of this type of information during learning, as it typically does not
come in the form of labeled examples, may be difficult to encode as a
prior on parameters in a Bayesian setting, and may be impossible to
incorporate into a tractable model. Instead, we usually have prior
knowledge about the values of output variables. For example,
linguistic knowledge or an out-of-domain parser may provide the
locations of likely syntactic dependencies for grammar induction.
Motivated by the prospect of being able to naturally leverage such
knowledge, no fewer than five different groups have recently developed
similar, general frameworks for expressing and learning with side
information about output variables. This talk will provide a brief
survey of these frameworks as well as some examples of how they have
been applied.


Wed May 16, 2012  12:30 PM  CS 402  Matthew Hoffman (Columbia) 
The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian
Monte Carlo
Hamiltonian Monte Carlo (HMC) is a Markov Chain Monte Carlo (MCMC)
algorithm that avoids the random walk behavior and sensitivity to
correlations that plague many MCMC methods by taking a series of steps
informed by first-order gradient information. These features allow it
to converge to high-dimensional target distributions much more quickly
than popular methods such as random walk Metropolis or Gibbs sampling.
However, HMC's performance is highly sensitive to two user-specified
parameters: a step size epsilon and a desired number of steps L. In
particular, if L is too small then the algorithm exhibits undesirable
random walk behavior, while if L is too large the algorithm wastes
computation. We present the No-U-Turn Sampler (NUTS), an extension to
HMC that eliminates the need to set a number of steps L. NUTS uses a
recursive algorithm to build a set of likely candidate points that
spans a wide swath of the target distribution, stopping automatically
when it starts to double back and retrace its steps. NUTS is able to
achieve similar performance to a well-tuned standard HMC method,
without requiring user intervention or costly tuning runs. NUTS can
thus be used in applications such as BUGS-style automatic inference
engines that require efficient "turnkey" sampling algorithms.
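To make the role of the two tuning parameters concrete, here is a bare-bones fixed-parameter HMC sampler (the baseline NUTS improves on, not NUTS itself) targeting a 1-D standard normal; the values of epsilon and L are illustrative assumptions.

```python
import math
import random

def hmc_sample(grad_neg_logp, neg_logp, x0, epsilon, L, n_samples, seed=0):
    """Basic 1-D Hamiltonian Monte Carlo with fixed step size and path length."""
    random.seed(seed)
    samples, x = [], x0
    for _ in range(n_samples):
        p = random.gauss(0.0, 1.0)               # resample momentum
        x_new, p_new = x, p
        # leapfrog integration of Hamiltonian dynamics for L steps
        p_new -= 0.5 * epsilon * grad_neg_logp(x_new)
        for i in range(L):
            x_new += epsilon * p_new
            if i < L - 1:
                p_new -= epsilon * grad_neg_logp(x_new)
        p_new -= 0.5 * epsilon * grad_neg_logp(x_new)
        # Metropolis accept/reject on the total energy
        h_old = neg_logp(x) + 0.5 * p * p
        h_new = neg_logp(x_new) + 0.5 * p_new * p_new
        if random.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(x)
    return samples

# Target: standard normal, -log p(x) = x^2 / 2 (up to a constant).
s = hmc_sample(lambda x: x, lambda x: 0.5 * x * x, 0.0,
               epsilon=0.2, L=10, n_samples=5000)
mean = sum(s) / len(s)
var = sum((xi - mean) ** 2 for xi in s) / len(s)
print(round(mean, 3), round(var, 3))  # near 0 and 1
```

Shrinking L toward 1 reduces each iteration to a diffusive random-walk update, while a too-large L wastes gradient evaluations on trajectories that loop back on themselves; NUTS removes exactly this trade-off by growing the trajectory until it detects a U-turn.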
