Meeting the Challenges of High Dimension: Statistical Methodology, Theory and Applications (13 Aug - 26 Oct 2012)
## ~ Abstracts ~
Howard D. Bondell, North Carolina State University, USA For high-dimensional data, particularly when the number of predictors greatly exceeds the sample size, selection of relevant predictors for regression is a challenging problem. Methods such as sure screening, forward selection, or penalized regressions such as LASSO or SCAD are commonly used. Bayesian variable selection methods place prior distributions on the parameters along with a prior over model space, or equivalently, a mixture prior on the parameters having mass at zero. Since exhaustive enumeration is not feasible, posterior model probabilities are often obtained via long MCMC runs. The chosen model can depend heavily on various choices of priors and also on posterior thresholds. Alternatively, we propose placing a conjugate prior only on the full model parameters and using sparse solutions within posterior credible regions to perform selection. These posterior credible regions typically have closed-form representations, and it is shown that these sparse solutions can be computed via existing algorithms. The approach is shown to outperform common methods in the high-dimensional setting, particularly under correlation. By searching for a sparse solution within a joint credible region, consistent model selection is established. Furthermore, it is shown that the simple use of marginal credible intervals can give consistent selection up to the case where the dimension grows exponentially in the sample size.
Peter Buehlmann, ETH Zürich, Switzerland We discuss prediction of causal effects in the high-dimensional setting, with direct implications for designing new intervention experiments. We highlight exciting possibilities and fundamental limitations. Regarding the latter, besides obvious limitations, we largely focus on the so-called strong-faithfulness assumption: it is a necessary condition for popular and computationally efficient algorithms in graphical modeling based on conditional independence testing (the PC-algorithm and versions of it), and we provide new probabilistic bounds for failure of the condition. Maximum-likelihood estimation (talk by Sara van de Geer) is an interesting alternative which does not require strong-faithfulness. In view of uncheckable assumptions, statistical inference cannot be confirmatory and needs to be complemented with experimental validation: we exemplify this in the context of molecular biology for yeast (Saccharomyces cerevisiae) and the model plant Arabidopsis thaliana.
Tony Cai, University of Pennsylvania, USA Covariance structure is of fundamental importance in many areas of statistical inference and a wide range of applications. In this talk I will present recent results on minimax and adaptive estimation of large covariance matrices including bandable covariance matrices, sparse covariance matrices, and sparse precision matrices. I will also discuss related problems such as linear discriminant analysis and testing covariance matrices.
Tony Cai, University of Pennsylvania, USA The coherence of a random matrix, which is defined to be the largest magnitude of the correlations between the columns of the random matrix, is an important quantity for a wide range of applications including high-dimensional statistics and signal processing. In this talk I will discuss the limiting laws of the coherence of an $n\times p$ random matrix for a full range of the dimension p with a special focus on the ultra high-dimensional setting where $p$ can be much larger than $n$. The results show that the limiting behavior of the coherence differs significantly in different regimes and exhibits interesting phase transition phenomena as the dimension $p$ grows as a function of $n$. Applications to statistics and compressed sensing in the ultra high-dimensional setting will also be discussed.
Song Xi Chen, Iowa State University, USA This paper shows that the optimal Gaussian detection boundary attained by the Higher Criticism (HC) test of Donoho and Jin (2004) can be achieved by two alternative tests for high-dimensional non-Gaussian data with unknown covariance when the non-zero means are sparse and faint. The two alternative tests are constructed from $L_1$ and $L_2$ versions of the marginal sample means, thresholding away those dimensions whose sample means are weak and then maximizing over a range of thresholding levels to make the tests adaptive to the unknown signal strength and sparsity. We show that the maximal $L_2$-thresholding test is more powerful than the maximal $L_1$-thresholding test, and both are more powerful than the HC test.
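The thresholding construction described above can be sketched in a few lines. This is a rough numpy illustration of the maximal $L_2$-thresholding idea only: the standardization, the threshold grid, and all names are our assumptions, not the authors' exact statistic or its null centering.

```python
import numpy as np

def max_l2_threshold_stat(X, thresholds):
    """L2-type statistic: sum the squared standardized marginal means
    that survive thresholding, then maximize over a grid of levels."""
    n, p = X.shape
    t = np.sqrt(n) * X.mean(axis=0) / X.std(axis=0, ddof=1)
    return max(np.sum(t[np.abs(t) > lam] ** 2) for lam in thresholds)

rng = np.random.default_rng(0)
p = 500
null_data = rng.normal(size=(100, p))
alt_data = null_data.copy()
alt_data[:, :10] += 0.5                          # sparse, faint mean shifts
grid = np.sqrt(2 * np.log(p)) * np.linspace(0.2, 1.0, 9)  # illustrative grid
s_null = max_l2_threshold_stat(null_data, grid)
s_alt = max_l2_threshold_stat(alt_data, grid)
```

Maximizing over the grid is what makes the test adaptive: no single threshold needs to be tuned to the unknown signal strength.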
Ming-Yen Cheng, National Taiwan University, Taiwan High-dimensional data analysis has been an active area, and the main focuses have been variable selection and dimension reduction. In practice, it often occurs that the variables are located on an unknown, lower-dimensional nonlinear manifold. Under this manifold assumption, one purpose of this paper is regression and gradient estimation on the manifold, and another goal is developing a new tool for manifold learning. Toward the first aim, we suggest directly reducing the dimensionality to the intrinsic dimension of the manifold, and performing the popular local linear regression (LLR) on a tangent plane estimate. An immediate consequence is a dramatic reduction in the computation time when the ambient space dimension is much larger than the intrinsic dimension. We provide rigorous theoretical justification of the convergence of the proposed regression and gradient estimators by carefully analyzing the curvature, boundary, and non-uniform sampling effects. A bandwidth selector that can handle heteroscedastic errors is proposed. Toward the second aim, we carefully analyze the behavior of our regression estimator both in the interior and near the boundary of the manifold, and make explicit its relationship with manifold learning, in particular estimating the Laplace-Beltrami operator of the manifold. In this context, we also make clear that it is important to use a smaller bandwidth in the tangent plane estimation than in the LLR. Simulation studies and the Isomap face data example are used to illustrate the computational speed and estimation accuracy of our methods. This is joint work with Hau-Tieng Wu.
Jeng-Min Chiou, Academia Sinica, Taiwan Motivated by the need for traffic flow prediction in transportation management, we propose a functional data approach to analyze traffic flow patterns and predict the future traffic flow for an up-to-date and partially observed flow trajectory. We approach the problem by sampling traffic flow trajectories from a mixture of stochastic processes, each reflecting a distinct pattern of daily traffic flow trajectories. The proposed functional mixture prediction approach combines functional prediction with probabilistic functional classification to take distinct traffic flow patterns into account. The probabilistic classification procedure, which incorporates functional clustering and discrimination, hinges on subspace projection that lays the groundwork for functional mixture prediction. Although motivated by traffic flow analysis and prediction, the proposed methodology is widely applicable in analysis and prediction of longitudinally recorded functional data.
Dennis Cook, University of Minnesota, USA We will discuss the asymptotic behavior of methods for dimension reduction in high dimensional regressions where prediction is the overarching goal. Some methods can give consistent predictor reductions as the sample size and number of predictors grow in various alignments, particularly in abundant regressions where most predictors contribute some information on the response. Oracle rates are possible. Potential application in chemometrics and biomedical engineering will be mentioned and an example will be given to illustrate the theoretical conclusions.
Aurore Delaigle, University of Melbourne, Australia We consider classification of curves when the training curves are not observed on the same interval. We suggest different types of classifier, depending on whether or not the observable intervals overlap by a significant amount. We study asymptotic properties of our classifier, and illustrate performance in applications to real and simulated data.
Jianqing Fan, Princeton University, USA Estimating the false discovery rate for high-dimensional multiple testing problems based on dependent test statistics is very important to scientific discovery and challenging to statistics. When the covariance matrix is known, Fan, Han and Gu (2012) proposed a principal factor approximation method to deal with an arbitrary dependence structure of the test statistics. They derived an approximate formula for the false discovery proportion (FDP), which depends on the covariance matrix of the test statistics. In many applications, however, the covariance matrix of the test statistics is unknown and has to be estimated. The accuracy of the estimated covariance matrix needed to estimate the FDP accurately will be unveiled in this talk. In particular, it is shown that when the covariance admits an approximate factor structure, an estimate can be constructed to satisfy the required accuracy. The results will be illustrated by both simulation and real data applications. (Joint work with Xu Han and Weijie Gu)
Yingying Fan, University of Southern California, USA This paper is concerned with the selection and estimation of fixed and random effects in linear mixed effects models. We propose a class of nonconcave penalized profile likelihood methods for selecting and estimating important fixed effects. To overcome the difficulty of the unknown covariance matrix of the random effects, we propose to use a proxy matrix in the penalized profile likelihood. We establish conditions on the choice of the proxy matrix and show that the proposed procedure enjoys model selection consistency, where the number of fixed effects is allowed to grow exponentially with the sample size. We further propose a group variable selection strategy to simultaneously select and estimate important random effects, where the unknown covariance matrix of the random effects is replaced with a proxy matrix. We prove that, with the proxy matrix appropriately chosen, the proposed procedure can identify all true random effects with asymptotic probability one, where the dimension of the random effects vector is allowed to increase exponentially with the sample size. Monte Carlo simulation studies are conducted to examine the finite-sample performance of the proposed procedures. We further illustrate the proposed procedures via a real data example. This is joint work with Professor Runze Li.
Frédéric Ferraty, Université de Toulouse, France The high-dimensional setting is a modern and dynamic research area in statistics. It covers numerous situations where the number of explanatory variables is much larger than the sample size. The last fifteen years have been devoted to developing new methodologies (mainly in the linear setting) able to manage such high-dimensional data (HDD). In this talk, two categories of HDD will retain our attention. The first one (HDD-I) concerns the situation where each statistical unit is featured by the discretization of some underlying continuous process. This particular setting corresponds to the observation of a collection of curves, surfaces, etc.; this is what we call functional data. An electricity consumption dataset, aimed at the prediction of daily load-demand curves from daily temperature curves, will illustrate this setting. The second category of HDD (HDD-II) is the more popular situation in which one observes for each unit a large vector derived from a large set of covariates. In this case, we will focus in particular on a genomics dataset. In both HDD categories (I and II), most published works tackle the prediction problem mainly by means of linear modelling. However, it is well known that taking nonlinearities into account may significantly improve the predictive power of statistical methods and may also reveal relevant information allowing a better understanding of the observed phenomenon. This is what we propose to do in both high-dimensional settings. In the case of HDD-I, we estimate the relationship by means of a kernel estimator, and we then propose a methodology allowing us to build a pseudo-confidence area in function space. When considering HDD-II, we tackle the prediction problem by implementing a sparse nonparametric regression. Of course, investigating nonparametric models with HDD is very challenging and we have to progress carefully, step by step. This is why this talk proposes some modest advances. But the obtained results, especially in the sparse nonparametric regression context (where the introduced methodology leads to a more parsimonious model with higher predictive power), are sufficiently convincing and motivating to pursue deeper research in this direction.
Jianhua Guo, Northeast Normal University, China In recent decades, graphical modeling has been one of the most efficient statistical tools for high-dimensional data analysis in various application domains, because graphical modeling not only provides an intuitive probabilistic framework for representing the conditional independence structures of multiple variables, but also has great advantages in reducing the computational complexity of inference by fully using the structural information. There are many useful strategies for using the structural information, such as collapsibility, decomposition and clique tree propagation. In this talk, we will mainly focus on how to use these strategies for structural dimension reduction.
Peter Hall, Saw Swee Hock Professor of Statistics in DSAP, NUS, and University of Melbourne, Australia The lectures will introduce and discuss methodology for functional data analysis, and describe theory that underpins it.
Iain Johnstone, Stanford University, USA Roy's largest root test appears in a variety of multivariate problems, including MANOVA, signal detection in noise, etc. In this work, assuming multivariate Gaussian observations, we derive a simple yet accurate approximation for the distribution of the largest eigenvalue in certain settings of "concentrated non-centrality", in which the signal or difference between groups is concentrated in a single direction. The results allow relatively simple power calculations for Roy's test. (Joint work with Boaz Nadler).
Clifford Lam, London School of Economics, UK We investigate factor modelling when the number of time series grows with the sample size. In particular we focus on the case when the number of time series is at least the same order as the sample size. We introduce a method utilizing the autocorrelations of the time series for estimation of the factor loading matrix and the factor series, which in the end is equivalent to an eigenanalysis of a non-negative definite matrix. Asymptotic properties will be presented, as well as the choice of the number of factors by an eye-ball test. Theory for such an eye-ball test will also be presented. The method will be illustrated with an analysis of a set of macroeconomic data, as well as extensive simulation results. Some new results about standard principal component analysis (PCA) will also be presented, showing that PCA still works when the noise vector in the factor model is cross-sectionally correlated to a certain extent, beyond which consistency of the factor loading matrix is not guaranteed. The method we introduce, however, does not suffer from heavy cross-sectional correlations in the noise. Moreover, it has better performance when the factors split into small categories, where PCA cannot achieve consistent estimation. Time permitting, an improvement of our method will also be introduced, showing that it is possible to achieve the superb performance of PCA under classical settings while performing better under certain conditions.
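The eigenanalysis mentioned above can be made concrete in a toy numpy sketch. The lag range, normalization, and one-factor toy model below are our illustrative assumptions, not the talk's exact estimator: the idea is that lagged autocovariances filter out serially uncorrelated noise, so the leading eigenvectors of an accumulated matrix recover the loading space.

```python
import numpy as np

def estimate_loadings(Y, r, k0=2):
    """Estimate an r-dimensional factor loading space from the top-r
    eigenvectors of M = sum_k S(k) S(k)', where S(k) is the lag-k
    autocovariance estimate; M is non-negative definite by construction."""
    n, p = Y.shape
    Yc = Y - Y.mean(axis=0)
    M = np.zeros((p, p))
    for k in range(1, k0 + 1):
        S = Yc[k:].T @ Yc[:-k] / n        # lag-k autocovariance estimate
        M += S @ S.T
    eigval, eigvec = np.linalg.eigh(M)    # ascending order
    return eigvec[:, ::-1][:, :r]         # leading r eigenvectors

# toy one-factor model: y_t = A f_t + e_t with an AR(1) factor
rng = np.random.default_rng(1)
n, p = 400, 20
A = rng.normal(size=(p, 1))
f = np.zeros(n)
for t in range(1, n):
    f[t] = 0.8 * f[t - 1] + rng.normal()
Y = f[:, None] * A.T + 0.5 * rng.normal(size=(n, p))
L = estimate_loadings(Y, r=1)
# estimated loading direction should be close to A up to sign
corr = abs(A.ravel() @ L.ravel()) / np.linalg.norm(A)
```

Because the white-noise term has (population) zero autocovariance at every nonzero lag, it does not contribute to M, which is the heuristic behind the method's robustness to cross-sectional noise correlation.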
Ann Lee, Carnegie Mellon University, USA Many estimation problems in statistics and machine learning are highly complex, with high-dimensional, non-standard data objects (e.g., images, spectra, entire distributions, etc.) that are not amenable to formal statistical analysis. To utilize such data and make accurate inferences, it is crucial to transform the data into a simpler, reduced form. Spectral kernel methods are non-linear data transformation methods that efficiently reveal the underlying, often lower-dimensional, structure of observable data. Here we focus on one particular technique: diffusion maps, or more generally, spectral connectivity analysis (SCA). We describe its novel use in high-dimensional regression and density estimation via adaptive bases, with applications in astronomy, image analysis and content-based information retrieval.
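A minimal sketch of the diffusion-map construction may help fix ideas: build a Gaussian kernel, row-normalize it into a Markov transition matrix, and embed with the leading non-trivial eigenvectors. This omits the density renormalization and careful eigensolvers used in practice; the bandwidth and toy data are illustrative assumptions.

```python
import numpy as np

def diffusion_map(X, eps, n_coords=2, t=1):
    """Basic diffusion map: Gaussian kernel, row-stochastic normalization,
    embedding by eigenvectors 2..n_coords+1 scaled by eigenvalue powers."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / eps)
    P = K / K.sum(axis=1, keepdims=True)          # Markov transition matrix
    eigval, eigvec = np.linalg.eig(P)
    idx = np.argsort(-eigval.real)
    lam = eigval.real[idx][1:n_coords + 1]        # skip trivial eigenvalue 1
    V = eigvec.real[:, idx][:, 1:n_coords + 1]
    return V * lam ** t

# noisy closed curve in 3-D: the embedding reveals its 1-D circular structure
theta = np.linspace(0, 2 * np.pi, 80, endpoint=False)
X = np.c_[np.cos(theta), np.sin(theta), 0.05 * np.sin(7 * theta)]
coords = diffusion_map(X, eps=0.5)
```

The diffusion coordinates can then serve as an adaptive basis for downstream regression or density estimation, in the spirit of the talk.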
Lexin Li, North Carolina State University, USA Classical regression methods treat covariates as a vector and estimate a corresponding vector of regression coefficients. Modern applications in medical imaging generate covariates of more complex form such as multidimensional arrays (tensors). Traditional statistical and computational methods are proving insufficient for analysis of these high-throughput data due to their ultrahigh dimensionality as well as complex structure. In this article, we propose a new family of tensor regression models that efficiently exploit the special structure of tensor covariates. Under this framework, ultrahigh dimensionality is reduced to a manageable level, resulting in efficient estimation and prediction. A fast and highly scalable estimation algorithm is proposed for maximum likelihood estimation and its associated asymptotic properties are studied. Effectiveness of the new methods is demonstrated on both synthetic and real MRI imaging data.
Bing Li, Pennsylvania State University, USA We give a general formulation of nonlinear sufficient dimension reduction, and explore its ramifications and scope. This formulation subsumes recent work employing reproducing kernel Hilbert spaces, and reveals many parallels between linear and nonlinear sufficient dimension reduction. Using these parallels we analyze the population-level properties of existing methods and develop new ones. We begin at the completely general level of $\sigma$-fields, and proceed to that of measurable and generating classes of functions. This leads to the notions of sufficient, complete and sufficient, and central dimension reduction classes. We show that, when it exists, the complete and sufficient class coincides with the central class, and can be unbiasedly and exhaustively estimated by a generalized sliced inverse regression estimator (GSIR). When completeness does not hold, this estimator captures only part of the central class (i.e., remains unbiased but is no longer exhaustive). However, we show that a generalized sliced average variance estimator (GSAVE) can capture a larger portion of the class. Both estimators require no numerical optimization, because they can be computed by spectral decomposition of linear operators. Finally, we compare our estimators with existing methods by simulation. (Joint work with Kuang-Yao Lee and Francesca Chiaromonte)
Runze Li, Pennsylvania State University, USA Ultra-high dimensional data often display heterogeneity due to either heteroscedastic variance or other forms of non-location-scale covariate effects. To accommodate heterogeneity, we advocate a more general interpretation of sparsity which assumes that only a small number of covariates influence the conditional distribution of the response variable given all candidate covariates; however, the sets of relevant covariates may differ when we consider different segments of the conditional distribution. In this talk, I first introduce recent developments in the methodology and theory of nonconvex penalized quantile linear regression in ultra-high dimensions. I further propose a two-stage feature screening and cleaning procedure to study the estimation of the index parameter in heteroscedastic single-index models with ultrahigh dimensional covariates. Sampling properties of the proposed procedures are studied. Finite sample performance of the proposed procedures is examined by Monte Carlo simulation studies. A real data example is used to illustrate the proposed methodology.
Yufeng Liu, University of North Carolina at Chapel Hill, USA Clustering methods provide a powerful tool for the exploratory analysis of high-dimensional datasets, such as gene expression microarray data. A fundamental statistical issue in clustering is which clusters are "really there," as opposed to being artifacts of natural sampling variation. In this talk, I will present Statistical Significance of Clustering (SigClust) as a cluster evaluation tool. In particular, we define a cluster as data coming from a single Gaussian distribution and formulate the problem of assessing the statistical significance of clustering as a testing procedure. Under this hypothesis testing framework, the cornerstone of our SigClust analysis is accurate estimation of the eigenvalues of the covariance matrix of the null multivariate Gaussian distribution. In this talk, we propose a likelihood-based soft thresholding approach for the estimation of the covariance matrix eigenvalues. Our theoretical work and simulation studies show that our proposed SigClust procedure works remarkably well. Applications to some cancer microarray data examples demonstrate the usefulness of SigClust.
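The single-Gaussian null test described above can be sketched as a small Monte Carlo procedure. This is an illustrative simplification only: we split along the first principal direction rather than running full 2-means, and we plug in raw sample eigenvalues without the talk's likelihood-based soft thresholding, which is precisely the refinement the talk contributes.

```python
import numpy as np

def cluster_index(X):
    """2-means-style cluster index: best within-cluster SS over splits
    along the first principal direction, divided by total SS
    (smaller index = stronger apparent clustering)."""
    Xc = X - X.mean(axis=0)
    u = np.linalg.svd(Xc, full_matrices=False)[2][0]
    order = np.argsort(Xc @ u)
    total = np.sum(Xc ** 2)
    best = total
    for cut in range(1, len(order)):
        g1, g2 = order[:cut], order[cut:]
        wss = (np.sum((X[g1] - X[g1].mean(0)) ** 2)
               + np.sum((X[g2] - X[g2].mean(0)) ** 2))
        best = min(best, wss)
    return best / total

def sigclust_pvalue(X, n_sim=200, seed=0):
    """Monte Carlo p-value against a single-Gaussian null whose covariance
    eigenvalues are taken directly from the sample covariance."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    eigvals = np.clip(np.linalg.eigvalsh(np.cov(X.T)), 0, None)
    ci_obs = cluster_index(X)
    null_ci = [cluster_index(rng.normal(size=(n, p)) * np.sqrt(eigvals))
               for _ in range(n_sim)]
    return float(np.mean([c <= ci_obs for c in null_ci]))

rng = np.random.default_rng(2)
X_two = rng.normal(size=(60, 5))
X_two[:30, 0] += 6                     # two well-separated clusters
X_one = rng.normal(size=(60, 5))       # a single Gaussian
p_two = sigclust_pvalue(X_two, n_sim=100)
p_one = sigclust_pvalue(X_one, n_sim=100)
```

The point of accurate eigenvalue estimation is visible here: the null simulations depend entirely on the plugged-in eigenvalues, so a poor estimate distorts the reference distribution and hence the p-value.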
Jinchi Lv, University of Southern California, USA High-dimensional data analysis has motivated a spectrum of regularization methods for variable selection and sparse modeling, with two popular classes: convex methods and concave methods. A long debate has been on whether one class dominates the other, an important question both in theory and to practitioners. In this paper, we characterize the asymptotic equivalence of regularization methods, with general penalty functions, in a thresholded parameter space under the generalized linear model setting, where the dimensionality can grow exponentially with the sample size. To assess their performance, we establish the oracle inequalities, as in Bickel, Ritov and Tsybakov (2009), of the global optimizer for these methods under various prediction and variable selection losses. These results reveal an interesting phase transition phenomenon. For polynomially growing dimensionality, the $L_1$-regularization method of the Lasso and concave methods are asymptotically equivalent, having the same convergence rates in the oracle inequalities. For exponentially growing dimensionality, concave methods are asymptotically equivalent but have faster convergence rates than the Lasso method. We also establish a stronger property of the oracle risk inequalities of the regularization methods. Our new theoretical results are illustrated and justified by simulation and real data examples. This is joint work with Yingying Fan.
Zongming Ma, University of Pennsylvania, USA We consider the problem of constructing confidence bands for a collection of Lipschitz function classes. It is well known that though there exist estimators that adapt to the unknown smoothness of the underlying function, honest adaptive confidence bands do not exist. In this talk, we propose a relaxation of the usual definition of coverage, and show that there exists a new confidence band procedure that can adapt to a limited collection of Lipschitz classes under this new notion of coverage. This is joint work with Tony Cai and Mark Low.
Geoff McLachlan, University of Queensland, Australia We consider the modelling of multivariate (p-dimensional) continuous data via finite mixtures of normal distributions and of variants of the normal distribution. We focus on the case where p is not small relative to the number of observations n. There has been a proliferation of such applications in practice, and hence increasing attention in statistics to the analysis of complex data in this situation. The normal mixture model is a highly parameterized one, with each component-covariance matrix consisting of p(p+1)/2 distinct parameters in the unrestricted case. Hence some restrictions must be imposed and/or a variable selection method applied beforehand when using normal mixture models if p is not small relative to n. Attention is given to the use of factor models that reduce the number of parameters in the specification of the component-covariance matrices. In some applications the underlying dimension p is extremely large as, for example, in microarray-based genomics and other high-throughput experimental approaches. In most of these applications p is so large that, despite any sparsity that may be present, there needs to be some reduction in the number of variables for it to be computationally feasible to fit the mixture of factor models. We discuss some approaches to this preliminary variable selection problem. The proposed methods are demonstrated in their application to some high-dimensional data sets from the bioinformatics literature. We also consider the use of mixture models with component t-distributions to provide better protection than normal components against outliers, and the use of mixtures with skew normal and skew t-components in situations where there is asymmetry in the clusters (that is, they are not elliptically symmetric).
Geoff McLachlan, University of Queensland, Australia There has been a rapid growth in applications in which the number of experimental units n is comparatively small but the underlying dimension p is extremely large as, for example, in microarray-based genomics and other high-throughput experimental approaches. Hence there has been increasing attention given not only in bioinformatics and machine learning, but also in mainstream statistics, to the analysis of complex data in this situation where n is small relative to p. In this talk, we focus on the clustering of high-dimensional data, using normal and t-mixture models. Their use in this context is not straightforward, as the normal mixture model is a highly parameterized one with each component-covariance matrix consisting of p(p+1)/2 distinct parameters in the unrestricted case. Hence some restrictions must be imposed and/or a variable selection method applied beforehand. We shall focus on the use of factor models that reduce the number of parameters in the specification of the component-covariance matrices. In some applications p is so large that despite any sparsity that may be present there needs to be some reduction in the number of variables for it to be computationally feasible to fit the mixture of factor models. We discuss some approaches to this preliminary variable selection problem. The proposed methods are to be demonstrated in their application to some high-dimensional data sets from the bioinformatics literature.
Boaz Nadler, Weizmann Institute of Science, Israel Phase retrieval, namely the recovery of a signal from the magnitudes of its Fourier transform coefficients, is a problem of fundamental importance in many scientific fields. While in two dimensions phase retrieval typically has a unique solution, in 1-D the phase retrieval problem is often not even well posed, admitting multiple solutions. In this talk I'll present a novel framework for reconstruction of pairs of signals from possibly noisy measurements of both their spectral intensities and their mutual interferences. First, we show that for noise-free measurements of compactly supported signals, this new setup, denoted vectorial phase retrieval, admits a unique solution. We then derive a computationally efficient and statistically robust spectral algorithm to solve the vectorial phase retrieval problem, as well as a model selection criterion to estimate the unknown compact support. We illustrate the reconstruction performance of our algorithm on several simulated signals. We conclude with some as-yet unresolved challenges - mathematical, statistical and computational. Joint work with Oren Raz and Nirit Dudovich (Weizmann Institute of Science).
John Rice, University of California at Berkeley, USA In contrast to targeted searches, synoptic surveys scan large portions of the sky nightly, producing images of a multitude of objects. Looking toward the future, it is planned that beginning in 2020, the Large Synoptic Survey Telescope (LSST) will take an image of the entire available sky every few nights with a 3.2 giga-pixel camera taking exposures every 15 seconds. The survey is expected to produce over one petabyte of data per year, far more than can be reviewed by humans. Managing and effectively data mining the enormous output of the telescope is expected to be the most technically difficult part of the project. Initial computer requirements are estimated at 100 teraflops of computing power and 15 petabytes of storage. These 100 teraflops will largely be used executing algorithms that implement statistical methodology. In this talk I will discuss some issues that have arisen in developing such methodology for smaller scale precursors to LSST. In particular, I will present work on discovering transient objects in real time from digital images and on classifying those objects from noisy, erratically sampled time series. I will discuss approaches to problems of distributional shifts and "errors in variables" that arise in trying to use well characterized time series of long duration to classify new unknown sources, and in trying to extrapolate from one survey to another. My presentation is based on work with collaborators at the Center for Time Domain Informatics at Berkeley - https://sites.google.com/site/cftdinfo/
Philippe Rigollet, Princeton University, USA Sparse Principal Component Analysis (SPCA) is a remarkably useful tool for practitioners who had been relying on ad-hoc thresholding methods. Our analysis aims at providing a framework to test whether the data at hand indeed contain a sparse principal component. More precisely, we propose an optimal test procedure to detect the presence of a sparse principal component in a high-dimensional covariance matrix. Our minimax optimal test is based on a sparse eigenvalue statistic. Alas, computing this test is known to be NP-complete in general, and we describe a computationally efficient alternative test using convex relaxations. Our relaxation is also proved to detect sparse principal components at near-optimal detection levels, and it performs very well on simulated datasets.
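The NP-hard statistic in question can be written down directly. A brute-force numpy version, feasible only for tiny p and k, makes the computational obstruction concrete (the function name and toy inputs are ours; the talk's contribution is the convex relaxation that avoids this enumeration):

```python
import numpy as np
from itertools import combinations

def k_sparse_eigenvalue(S, k):
    """Exact largest k-sparse eigenvalue of a symmetric matrix S:
    maximize the top eigenvalue over all size-k principal submatrices.
    Cost grows like (p choose k), hence the need for relaxations."""
    p = S.shape[0]
    best = -np.inf
    for idx in combinations(range(p), k):
        sub = S[np.ix_(idx, idx)]
        best = max(best, np.linalg.eigvalsh(sub)[-1])
    return best

S = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])
lam2 = k_sparse_eigenvalue(S, 2)   # attained on coordinates {3, 4}: value 5.0
```

For a diagonal matrix the answer is simply the largest diagonal entry among any k coordinates, which makes the toy example easy to check by hand.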
Qiman Shao, The Chinese University of Hong Kong, Hong Kong It has been shown that the Benjamini-Hochberg method of controlling the false discovery rate remains valid under various dependence structures. It is also often assumed that the p-values are known and that the number of true alternative hypotheses is of the same order as the number of tests. However, this idealized assumption is hard to meet in practice because the population distribution is usually unknown and the signals in many applications may be sparse. In this talk we propose a robust control of the false discovery rate under dependence. It not only allows sparse alternatives but is also robust against the tails of the underlying distributions and the dependence structure. Only a finite fourth moment of the null distribution is required to achieve asymptotic-level accuracy of large-scale tests in ultra-high dimensions. The method is applied to gene selection, shape analysis of brain structures and periodic patterns in gene expression data. The method also shows favorable numerical performance on both simulated data and a real breast cancer data set. To get a more accurate approximation of the null distribution, a computationally efficient bootstrap procedure is also developed. This talk is based on joint work with Weidong Liu.
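For reference, the baseline Benjamini-Hochberg step-up rule that these robustness results concern can be stated in a few lines. This is the standard procedure, not the talk's new method; the simulated sparse-signal setting below is illustrative.

```python
import numpy as np
from math import erfc, sqrt

def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up BH rule: reject the k smallest p-values, where k is the
    largest index with p_(k) <= alpha * k / m."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    k = int(np.max(np.nonzero(below)[0]) + 1) if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

rng = np.random.default_rng(3)
z = rng.normal(size=1000)
z[:10] += 5                                            # ten strong, sparse signals
pvals = np.array([erfc(abs(v) / sqrt(2)) for v in z])  # two-sided normal p-values
reject = benjamini_hochberg(pvals, alpha=0.05)
```

The talk's setting removes two idealizations visible here: the p-values are computed from a known null distribution, and the procedure is applied to independent statistics.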
Yee Whye Teh, University College London, UK
In this talk I will present a novel approach to modelling sequence data called the sequence memoizer. As opposed to most other sequence models, our model does not make any Markovian assumptions. Instead, we use a hierarchical Bayesian approach which allows effective sharing of statistical strength across the different parts of the model and effective parameter estimation. To make computations with the model efficient, and to better model the power-law statistics often observed in sequence data arising from data-driven linguistics applications, we use a Bayesian nonparametric prior called the Pitman-Yor process as a building block in the hierarchical model. Computations in the model are very efficient, allowing us to handle the large-scale data present in language modelling and text compression applications, where we show state-of-the-art results. This is joint work with Frank Wood, Jan Gasthaus, Cedric Archambeau and Lancelot James.
Aad van der Vaart, Vrije Universiteit, The Netherlands
We present some (frequentist) properties of the posterior distribution for hierarchical priors that first select a (small) subset of nonzero parameters and next the values of the nonzero parameters. We consider this in particular for the many-normal-means problem and also for the linear regression problem. We review some other (empirical) Bayesian approaches to these problems.
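As a toy illustration of such a hierarchical (spike-and-slab) prior, consider a single observation x ~ N(theta, 1) with prior pi0*delta_0 + (1-pi0)*N(0, tau2); the posterior probability that theta is nonzero has a closed form, since x is marginally N(0, 1) under the spike and N(0, 1 + tau2) under the slab. The function name and default hyperparameters below are illustrative:

```python
import numpy as np

def inclusion_probability(x, pi0=0.5, tau2=3.0):
    """Posterior probability that theta != 0 under the spike-and-slab
    prior pi0 * delta_0 + (1 - pi0) * N(0, tau2), given one
    observation x ~ N(theta, 1)."""
    def normal_pdf(z, var):
        return np.exp(-0.5 * z**2 / var) / np.sqrt(2.0 * np.pi * var)
    slab = (1.0 - pi0) * normal_pdf(x, 1.0 + tau2)   # marginal under slab
    spike = pi0 * normal_pdf(x, 1.0)                 # marginal under spike
    return slab / (spike + slab)
```

Observations near zero favor the spike (theta = 0), while large observations drive the inclusion probability towards one; the talk studies frequentist properties of the full posterior built from priors of this kind.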
Martin Wainwright, University of California at Berkeley, USA
Although the standard formulations of prediction problems involve fully observed and noiseless data drawn in an i.i.d. manner, many applications involve noisy and/or missing data, possibly with dependence as well. We study these issues in the context of high-dimensional sparse linear regression, and propose novel estimators for the cases of noisy, missing, and/or dependent data. Many standard approaches to noisy or missing data, such as those using the EM algorithm, lead to optimization problems that are inherently non-convex, and it is difficult to establish theoretical guarantees for practical algorithms. While our approach also involves optimizing non-convex programs, we are able both to provide non-asymptotic bounds on the error associated with any global optimum and, more surprisingly, to prove that a simple algorithm based on projected gradient descent will converge in polynomial time to a small neighborhood of the set of all global minimizers. We illustrate these theoretical predictions with applications to graphical model estimation and selection. Joint work with Po-Ling Loh. Pre-print: http://arxiv.org/abs/1109.3714
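The projected gradient descent idea can be sketched on the standard, fully observed least-squares objective constrained to an l1 ball; the paper's corrected losses for missing or noisy covariates would replace the gradient computation below, and all names and defaults here are illustrative:

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of v onto the l1 ball of the given radius,
    via the standard sorting-based soft-threshold computation."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    # Largest k (0-based) such that the threshold stays below u[k].
    k = np.nonzero(u * np.arange(1, len(v) + 1) > css - radius)[0][-1]
    tau = (css[k] - radius) / (k + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def projected_gradient_descent(X, y, radius, step, n_iter=500):
    """Minimize (1/2n)||y - X beta||^2 over {beta : ||beta||_1 <= radius}
    by alternating a gradient step with projection onto the l1 ball."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / len(y)
        beta = project_l1_ball(beta - step * grad, radius)
    return beta
```

The striking point of the abstract is that, even though the corrected objective is non-convex, iterates of exactly this kind of scheme provably reach a small neighborhood of all global minimizers.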
Ernst Wit, University of Groningen, The Netherlands
From genetics to finance, from homeland security to epidemiology, the objective of many data-intensive studies is to infer the relationships between the various actors under scrutiny. A graph is one possible way to describe these relationships. In many cases, the data come from large monitoring systems with no prior screening. The actual set of relationships, therefore, tends to be sparse. When data are obtained from noisy measurements of (some of) the nodes in the graph, graphical models present an appealing and insightful way to describe graph-based dependencies between the random variables. Although the parameters are potentially still interesting, the main aim of inference is not their precise estimation but the underlying structure of the graph. The graphical lasso and related methods opened up the field of sparse graphical model inference in high dimensions. We show how extensions of such methods in more structured settings can improve interpretation. Moreover, we show how novel model selection criteria can determine the underlying graph in an efficient way.
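In the low-dimensional regime the graph can be read off by inverting the sample covariance and thresholding partial correlations; the graphical lasso generalizes this to p > n by l1-penalizing the precision matrix. A minimal numpy sketch of the unpenalized version (function name and threshold are illustrative, and the approach is only valid when n >> p so the sample covariance is invertible):

```python
import numpy as np

def partial_correlation_graph(X, threshold=0.1):
    """Estimate a conditional-independence graph by thresholding the
    partial correlations obtained from the inverse sample covariance.
    Returns a boolean adjacency matrix with an empty diagonal."""
    S = np.cov(X, rowvar=False)
    K = np.linalg.inv(S)                 # precision matrix
    d = np.sqrt(np.diag(K))
    pcor = -K / np.outer(d, d)           # partial correlations
    np.fill_diagonal(pcor, 1.0)
    adj = np.abs(pcor) > threshold
    np.fill_diagonal(adj, False)
    return adj
```

A zero in the precision matrix corresponds to conditional independence of the two variables given all the others, which is exactly the "underlying structure of the graph" that the abstract takes as the object of inference.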
Jinghao Xue, University College London, UK
There are two approaches to probabilistic modelling for statistical classification: one is called the "generative" approach, and the other is called the "discriminative" approach. Examples of the generative approach include Gaussian-based linear discriminant analysis and the naïve Bayes classifier; a canonical discriminative method is linear logistic regression. Each approach has its advantages and disadvantages. In order to exploit the best of both worlds, hybrid generative-discriminative methods have been proposed by the statistics and machine-learning communities. The talk will discuss one of our pieces of work in this area. (Joint work with D. Michael Titterington.)
Jean Yang, University of Sydney, Australia
As microarray and other forms of high-throughput data and meta-data become more readily available, there is a growing need to successfully integrate expression and other forms of high-throughput data in practice. In this talk, we examine two biological questions associated with integrating clinical, gene expression and microRNA expression data. Our discussion uses a case study of aggressive metastasised melanoma. We first examine models for integrating data in a prediction context. The second question aims to understand the underlying regulatory mechanisms by incorporating microRNA (miRNA) and gene expression data. One major challenge in the identification of target mRNAs is the need to accommodate the many-to-many mapping between miRNAs and mRNAs: a miRNA can target hundreds of mRNAs, and several miRNAs can target a single mRNA. I will discuss statistical approaches based on multivariate random forests for determining miRNA regulatory modules and their target-mRNA modules.
Ming Yuan, Georgia Institute of Technology, USA
The problem of low-rank estimation naturally arises in a number of functional or relational data analysis settings. We consider a unified framework for these problems and devise a novel penalty function to exploit the low-rank structure in such contexts. The resulting empirical risk minimization estimator can be shown to be optimal under fairly general conditions.
Harrison Huibin Zhou, Yale University, USA
In this talk I will present some results on the estimation of large sparse precision matrices, including rate optimality and statistical inference, and discuss their application to the estimation of latent variable graphical models.
Liping Zhu, Shanghai University of Finance and Economics, China
We provide a novel approach to dimension reduction problems, completely different from those in the existing literature. We cast the dimension reduction problem in a semiparametric estimation framework and derive estimating equations. Viewing the problem from this new angle allows us to derive a rich class of estimators and to obtain the classical dimension reduction techniques as special cases of this class. The semiparametric approach also reveals that, in the inverse regression context, the common assumptions of linearity and/or constant variance on the covariates can be removed while keeping the estimation structure intact, at the cost of performing additional nonparametric regression. The semiparametric estimators without these common assumptions are illustrated through simulation studies and a real data example.
Lixing Zhu, Hong Kong Baptist University, Hong Kong
In this talk, I will introduce a screening-and-cleaning method to select informative covariates in a generalized linear model with longitudinal data. The screening step is based on generalized estimating equations, and the cleaning step uses weighted least squares to define an objective function to which a penalty can be added to further select covariates. The method is applied to a biomedical data set for illustration.
Hui Zou, University of Minnesota, USA
We introduce a constrained empirical loss minimization framework for estimating high-dimensional sparse precision matrices and propose a new loss function, called the D-Trace loss, for that purpose. A novel sparse precision matrix estimator is defined as the minimizer of the L1-penalized D-Trace loss under a positive definite constraint. Under a new irrepresentability condition, the L1-penalized D-Trace estimator has the sparse recovery property. Concrete examples are given to show that the new irrepresentability condition can hold while the irrepresentability condition for the L1-penalized Gaussian likelihood estimator fails. We establish rates of convergence of the new estimator under the element-wise maximum norm, Frobenius norm and operator norm. We develop a very efficient algorithm based on alternating direction methods for computing the proposed estimator. Simulated and real data are used to demonstrate the computational efficiency of our algorithm and the finite-sample performance of the new estimator. It is shown that the L1-penalized D-Trace estimator compares favorably with the L1-penalized Gaussian likelihood estimator, even when the underlying distribution is Gaussian.
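If I recall the definition correctly, the D-Trace loss is L(Theta) = (1/2) tr(Theta Sigma_hat Theta) - tr(Theta), whose population minimizer is the precision matrix; a minimal sketch of the loss itself (the L1-penalized estimator and its alternating-direction solver are well beyond this snippet):

```python
import numpy as np

def d_trace_loss(theta, sigma_hat):
    """D-Trace loss 0.5 * tr(Theta Sigma_hat Theta) - tr(Theta).
    Its gradient 0.5 * (Theta Sigma_hat + Sigma_hat Theta) - I
    vanishes at Theta = inv(Sigma_hat), so the population minimizer
    is the precision matrix."""
    return 0.5 * np.trace(theta @ sigma_hat @ theta) - np.trace(theta)
```

The loss is quadratic in Theta, which is what makes the penalized problem amenable to efficient alternating direction methods, in contrast to the log-determinant term in the penalized Gaussian likelihood.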