Business Administration - Research Publications


Search Results

Now showing 1 - 10 of 13
  • Item
    bamlss: A Lego Toolbox for Flexible Bayesian Regression (and Beyond)
    Umlauf, N ; Klein, N ; Simon, T ; Zeileis, A (JOURNAL OF STATISTICAL SOFTWARE, 2021-11-01)
  • Item
    Review of guidance papers on regression modeling in statistical series of medical journals.
    Wallisch, C ; Bach, P ; Hafermann, L ; Klein, N ; Sauerbrei, W ; Steyerberg, EW ; Heinze, G ; Rauch, G ; topic group 2 of the STRATOS initiative ; Mathes, T (Public Library of Science (PLoS), 2022)
    Although regression models play a central role in the analysis of medical research projects, many misconceptions about various aspects of modeling still exist, leading to faulty analyses. Indeed, the rapidly developing statistical methodology and its recent advances in regression modeling do not seem to be adequately reflected in many medical publications. This problem of knowledge transfer from statistical research to application was identified by some medical journals, which have published series of statistical tutorials and (shorter) papers mainly addressing medical researchers. The aim of this review was to assess the current level of knowledge with regard to regression modeling contained in such statistical papers. We searched for target series by sending a request to international statistical experts. We identified 23 series including 57 topic-relevant articles. Within each article, two independent raters analyzed the content by investigating 44 predefined aspects of regression modeling. We assessed to what extent the aspects were explained and whether examples, software advice, and recommendations for or against specific methods were given. Most series (21/23) included at least one article on multivariable regression. Logistic regression was the most frequently described regression type (19/23), followed by linear regression (18/23), Cox regression and survival models (12/23) and Poisson regression (3/23). Most general aspects of regression modeling, e.g. model assumptions and the reporting and interpretation of regression results, were covered. We did not find many misconceptions or misleading recommendations, but we identified relevant gaps, in particular with respect to addressing nonlinear effects of continuous predictors, model specification and variable selection. Specific recommendations on software were rarely given.
Statistical guidance should be developed for nonlinear effects, model specification and variable selection to better support medical researchers who perform or interpret regression analyses.
  • Item
    Statistical model building: Background "knowledge" based on inappropriate preselection causes misspecification
    Hafermann, L ; Becher, H ; Herrmann, C ; Klein, N ; Heinze, G ; Rauch, G (BMC, 2021-09-29)
    BACKGROUND: Statistical model building requires selection of variables for a model depending on the model's aim. In descriptive and explanatory models, a common recommendation in the literature is to include all variables that are assumed or known to be associated with the outcome, independent of whether data-driven selection procedures identify them. An open question is how reliable this assumed "background knowledge" truly is. In fact, "known" predictors might be findings from preceding studies which may also have employed inappropriate model building strategies. METHODS: We conducted a simulation study assessing the influence of treating variables as "known predictors" in model building when in fact this knowledge resulting from preceding studies might be insufficient. Within randomly generated preceding study data sets, model building with variable selection was conducted. A variable was subsequently considered a "known" predictor if a predefined number of preceding studies identified it as relevant. RESULTS: Even if several preceding studies identified a variable as a "true" predictor, this classification is often a false positive. Moreover, variables not identified might still be truly predictive. This holds especially if the preceding studies employed inappropriate selection methods such as univariable selection. CONCLUSIONS: The source of "background knowledge" should be evaluated with care. Knowledge generated from preceding studies can cause misspecification.
  • Item
    Editorial "Joint modeling of longitudinal and time-to-event data and beyond"
    Cadarso-Suarez, C ; Klein, N ; Kneib, T ; Molenberghs, G ; Rizopoulos, D (WILEY, 2017-11-01)
  • Item
    Studying the relationship between a woman's reproductive lifespan and age at menarche using a Bayesian multivariate structured additive distributional regression model
    Duarte, E ; de Sousa, B ; Cadarso-Suarez, C ; Klein, N ; Kneib, T ; Rodrigues, V (WILEY, 2017-11-01)
    Studies addressing breast cancer risk factors have been looking at trends in age at menarche and menopause. These studies point to a downward trend in age at menarche and an upward trend in age at menopause, meaning an increase in a woman's reproductive lifespan. In addition to studying the effect of the year of birth on the expectation of age at menarche and a woman's reproductive lifespan, it is important to understand how a woman's cohort affects the correlation between these two variables. Since the behavior of age at menarche and menopause may vary with the geographic location of a woman's residence, the spatial effect of the municipality where a woman resides needs to be considered. Thus, a Bayesian multivariate structured additive distributional regression model is proposed in order to analyze how a woman's municipality and year of birth affect her age at menarche, her reproductive lifespan, and the correlation of the two. The data consist of 212,517 postmenopausal women, born between 1920 and 1965, who attended the breast cancer screening program in the central region of Portugal.
  • Item
    Boosting joint models for longitudinal and time-to-event data
    Waldmann, E ; Taylor-Robinson, D ; Klein, N ; Kneib, T ; Pressler, T ; Schmid, M ; Mayr, A (WILEY, 2017-11-01)
    Joint models for longitudinal and time-to-event data have gained a lot of attention in the last few years, as they are a helpful technique in clinical studies where longitudinal outcomes are recorded alongside event times. Those two processes are often linked, and the two outcomes should thus be modeled jointly in order to prevent the potential bias introduced by independent modeling. Commonly, joint models are estimated with likelihood-based expectation maximization or Bayesian approaches, using frameworks in which variable selection is problematic and which do not immediately work for high-dimensional data. In this paper, we propose a boosting algorithm that tackles these challenges by simultaneously estimating predictors for joint models and automatically selecting the most influential variables, even in high-dimensional data situations. We analyze the performance of the new algorithm in a simulation study and apply it to the Danish cystic fibrosis registry, which collects longitudinal lung function data on patients with cystic fibrosis together with data regarding the onset of pulmonary infections. This is the first approach to combine state-of-the-art algorithms from the field of machine learning with the model class of joint models, providing a fully data-driven mechanism to select variables and predictor effects in a unified framework of boosting joint models.
  • Item
    Mixed binary-continuous copula regression models with application to adverse birth outcomes
    Klein, N ; Kneib, T ; Marra, G ; Radice, R ; Rokicki, S ; McGovern, ME (Wiley, 2019-02-10)
    Bivariate copula regression allows for the flexible combination of two arbitrary, continuous marginal distributions with regression effects being placed on potentially all parameters of the resulting bivariate joint response distribution. Motivated by the risk factors for adverse birth outcomes, many of which are dichotomous, we consider mixed binary‐continuous responses that extend the bivariate continuous framework to the situation where one response variable is discrete (more precisely, binary) whereas the other response remains continuous. Utilizing the latent continuous representation of binary regression models, we implement a penalized likelihood–based approach for the resulting class of copula regression models and employ it in the context of modeling gestational age and the presence/absence of low birth weight. The analysis demonstrates the advantage of the flexible specification of regression impacts including nonlinear effects of continuous covariates and spatial effects. Our results imply that racial and spatial inequalities in the risk factors for infant mortality are even greater than previously suggested.
  • Item
    Bayesian Effect Selection in Structured Additive Distributional Regression Models
    Klein, N ; Carlan, M ; Kneib, T ; Lang, S ; Wagner, H (INT SOC BAYESIAN ANALYSIS, 2021-06-01)
  • Item
    Multivariate conditional transformation models
    Klein, N ; Hothorn, T ; Barbanti, L ; Kneib, T (WILEY, 2020-12-13)
  • Item
    Bayesian variable selection for non-Gaussian responses: a marginally calibrated copula approach
    Klein, N ; Smith, MS (WILEY, 2020-09-02)
    We propose a new, highly flexible and tractable Bayesian approach to undertake variable selection in non-Gaussian regression models. It uses a copula decomposition for the joint distribution of observations on the dependent variable. This allows the marginal distribution of the dependent variable to be calibrated accurately using a nonparametric or other estimator. The copulas employed are "implicit copulas" constructed from existing hierarchical Bayesian models widely used for variable selection, and we establish some of their properties. Even though the copulas are high dimensional, they can be estimated efficiently and quickly using Markov chain Monte Carlo. A simulation study shows that when the responses are non-Gaussian, the approach selects variables more accurately than contemporary benchmarks. A real data example in the Web Appendix illustrates that accounting for even mild deviations from normality can lead to a substantial increase in accuracy. To illustrate the full potential of our approach, we extend it to spatial variable selection for fMRI. Using real data, we show that our method allows for voxel-specific marginal calibration of the magnetic resonance signal at over 6000 voxels, leading to an increase in the quality of the activation maps.