Business Administration - Research Publications

Permanent URI for this collection

Search Results

Now showing 1 - 10 of 13
  • Item
    Thumbnail Image
    Mitigating spatial confounding by explicitly correlating Gaussian random fields
    Marques, I ; Kneib, T ; Klein, N (Wiley, 2022-08-01)
  • Item
    Thumbnail Image
    Variational inference and sparsity in high-dimensional deep Gaussian mixture models
    Kock, L ; Klein, N ; Nott, DJ (Springer Science and Business Media LLC, 2022-10-01)
    Abstract Gaussian mixture models are a popular tool for model-based clustering, and mixtures of factor analyzers are Gaussian mixture models having parsimonious factor covariance structure for mixture components. There are several recent extensions of mixture of factor analyzers to deep mixtures, where the Gaussian model for the latent factors is replaced by a mixture of factor analyzers. This construction can be iterated to obtain a model with many layers. These deep models are challenging to fit, and we consider Bayesian inference using sparsity priors to further regularize the estimation. A scalable natural gradient variational inference algorithm is developed for fitting the model, and we suggest computationally efficient approaches to the architecture choice using overfitted mixtures where unnecessary components drop out in the estimation. In a number of simulated and two real examples, we demonstrate the versatility of our approach for high-dimensional problems, and demonstrate that the use of sparsity inducing priors can be helpful for obtaining improved clustering results.
  • Item
    Thumbnail Image
    A non-stationary model for spatially dependent circular response data based on wrapped Gaussian processes
    Marques, I ; Kneib, T ; Klein, N (Springer Science and Business Media LLC, 2022-10-01)
    Abstract Circular data can be found across many areas of science, for instance meteorology (e.g., wind directions), ecology (e.g., animal movement directions), or medicine (e.g., seasonality in disease onset). The special nature of these data means that conventional methods for non-periodic data are no longer valid. In this paper, we consider wrapped Gaussian processes and introduce a spatial model for circular data that allow for non-stationarity in the mean and the covariance structure of Gaussian random fields. We use the empirical equivalence between Gaussian random fields and Gaussian Markov random fields which allows us to considerably reduce computational complexity by exploiting the sparseness of the precision matrix of the associated Gaussian Markov random field. Furthermore, we develop tunable priors, inspired by the penalized complexity prior framework, that shrink the model toward a less flexible base model with stationary mean and covariance function. Posterior estimation is done via Markov chain Monte Carlo simulation. The performance of the model is evaluated in a simulation study. Finally, the model is applied to analyzing wind directions in Germany.
  • Item
    Thumbnail Image
    Is age at menopause decreasing? - The consequences of not completing the generational cohort.
    Martins, R ; Sousa, BD ; Kneib, T ; Hohberg, M ; Klein, N ; Duarte, E ; Rodrigues, V (Springer Science and Business Media LLC, 2022-07-11)
    BACKGROUND: Due to contradictory results in current research, whether age at menopause is increasing or decreasing in Western countries remains an open question, yet worth studying as later ages at menopause are likely to be related to an increased risk of breast cancer. Using data from breast cancer screening programs to study the temporal trend of age at menopause is difficult since especially younger women in the same generational cohort have often not yet reached menopause. Deleting these younger women in a breast cancer risk analyses may bias the results. The aim of this study is therefore to recover missing menopause ages as a covariate by comparing methods for handling missing data. Additionally, the study makes a contribution to understanding the evolution of age at menopause for several generations born in Portugal between 1920 and 1970. METHODS: Data from a breast cancer screening program in Portugal including 278,282 women aged 45-69 and collected between 1990 and 2010 are used to compare two approaches of imputing age at menopause: (i) a multiple imputation methodology based on a truncated distribution but ignoring the mechanism of missingness; (ii) a copula-based multiple imputation method that simultaneously handles the age at menopause and the missing mechanism. The linear predictors considered in both cases have a semiparametric additive structure accommodating linear and non-linear effects defined via splines or Markov random fields smoothers in the case of spatial variables. RESULTS: Both imputation methods unveiled an increasing trend of age at menopause when viewed as a function of the birth year for the youngest generation. This trend is hidden if we model only women with an observed age at menopause. CONCLUSION: When studying age at menopause, missing ages must be recovered with an adequate procedure for incomplete data. Imputing these missing ages avoids excluding the younger generation cohort of the screening program in breast cancer risk analyses and hence reduces the bias stemming from this exclusion. In addition, imputing the not yet observed ages of menopause for mostly younger women is also crucial when studying the time trend of age at menopause otherwise the analysis will be biased.
  • Item
    Thumbnail Image
    Using Background Knowledge from Preceding Studies for Building a Random Forest Prediction Model: A Plasmode Simulation Study.
    Hafermann, L ; Klein, N ; Rauch, G ; Kammer, M ; Heinze, G (MDPI AG, 2022-06-20)
    There is an increasing interest in machine learning (ML) algorithms for predicting patient outcomes, as these methods are designed to automatically discover complex data patterns. For example, the random forest (RF) algorithm is designed to identify relevant predictor variables out of a large set of candidates. In addition, researchers may also use external information for variable selection to improve model interpretability and variable selection accuracy, thereby prediction quality. However, it is unclear to which extent, if at all, RF and ML methods may benefit from external information. In this paper, we examine the usefulness of external information from prior variable selection studies that used traditional statistical modeling approaches such as the Lasso, or suboptimal methods such as univariate selection. We conducted a plasmode simulation study based on subsampling a data set from a pharmacoepidemiologic study with nearly 200,000 individuals, two binary outcomes and 1152 candidate predictor (mainly sparse binary) variables. When the scope of candidate predictors was reduced based on external knowledge RF models achieved better calibration, that is, better agreement of predictions and observed outcome rates. However, prediction quality measured by cross-entropy, AUROC or the Brier score did not improve. We recommend appraising the methodological quality of studies that serve as an external information source for future prediction model development.
  • Item
    No Preview Available
    bamlss: A Lego Toolbox for Flexible Bayesian Regression (and Beyond)
    Umlauf, N ; Klein, N ; Simon, T ; Zeileis, A (JOURNAL STATISTICAL SOFTWARE, 2021-11-01)
  • Item
    Thumbnail Image
    Review of guidance papers on regression modeling in statistical series of medical journals.
    Wallisch, C ; Bach, P ; Hafermann, L ; Klein, N ; Sauerbrei, W ; Steyerberg, EW ; Heinze, G ; Rauch, G ; topic group 2 of the STRATOS initiative, ; Mathes, T (Public Library of Science (PLoS), 2022)
    Although regression models play a central role in the analysis of medical research projects, there still exist many misconceptions on various aspects of modeling leading to faulty analyses. Indeed, the rapidly developing statistical methodology and its recent advances in regression modeling do not seem to be adequately reflected in many medical publications. This problem of knowledge transfer from statistical research to application was identified by some medical journals, which have published series of statistical tutorials and (shorter) papers mainly addressing medical researchers. The aim of this review was to assess the current level of knowledge with regard to regression modeling contained in such statistical papers. We searched for target series by a request to international statistical experts. We identified 23 series including 57 topic-relevant articles. Within each article, two independent raters analyzed the content by investigating 44 predefined aspects on regression modeling. We assessed to what extent the aspects were explained and if examples, software advices, and recommendations for or against specific methods were given. Most series (21/23) included at least one article on multivariable regression. Logistic regression was the most frequently described regression type (19/23), followed by linear regression (18/23), Cox regression and survival models (12/23) and Poisson regression (3/23). Most general aspects on regression modeling, e.g. model assumptions, reporting and interpretation of regression results, were covered. We did not find many misconceptions or misleading recommendations, but we identified relevant gaps, in particular with respect to addressing nonlinear effects of continuous predictors, model specification and variable selection. Specific recommendations on software were rarely given. Statistical guidance should be developed for nonlinear effects, model specification and variable selection to better support medical researchers who perform or interpret regression analyses.
  • Item
    Thumbnail Image
    Statistical model building: Background "knowledge" based on inappropriate preselection causes misspecification
    Hafermann, L ; Becher, H ; Herrmann, C ; Klein, N ; Heinze, G ; Rauch, G (BMC, 2021-09-29)
    BACKGROUND: Statistical model building requires selection of variables for a model depending on the model's aim. In descriptive and explanatory models, a common recommendation often met in the literature is to include all variables in the model which are assumed or known to be associated with the outcome independent of their identification with data driven selection procedures. An open question is, how reliable this assumed "background knowledge" truly is. In fact, "known" predictors might be findings from preceding studies which may also have employed inappropriate model building strategies. METHODS: We conducted a simulation study assessing the influence of treating variables as "known predictors" in model building when in fact this knowledge resulting from preceding studies might be insufficient. Within randomly generated preceding study data sets, model building with variable selection was conducted. A variable was subsequently considered as a "known" predictor if a predefined number of preceding studies identified it as relevant. RESULTS: Even if several preceding studies identified a variable as a "true" predictor, this classification is often false positive. Moreover, variables not identified might still be truly predictive. This especially holds true if the preceding studies employed inappropriate selection methods such as univariable selection. CONCLUSIONS: The source of "background knowledge" should be evaluated with care. Knowledge generated on preceding studies can cause misspecification.
  • Item
    Thumbnail Image
    Bayesian Effect Selection in Structured Additive Distributional Regression Models
    Klein, N ; Carlan, M ; Kneib, T ; Lang, S ; Wagner, H (INT SOC BAYESIAN ANALYSIS, 2021-06-01)
  • Item
    Thumbnail Image
    Multivariate conditional transformation models
    Klein, N ; Hothorn, T ; Barbanti, L ; Kneib, T (WILEY, 2020-12-13)