Business Administration - Research Publications

  • Item
    Is age at menopause decreasing? - The consequences of not completing the generational cohort.
    Martins, R ; Sousa, BD ; Kneib, T ; Hohberg, M ; Klein, N ; Duarte, E ; Rodrigues, V (Springer Science and Business Media LLC, 2022-07-11)
    BACKGROUND: Due to contradictory results in current research, whether age at menopause is increasing or decreasing in Western countries remains an open question. The question is worth studying because later ages at menopause are likely to be related to an increased risk of breast cancer. Using data from breast cancer screening programs to study the temporal trend of age at menopause is difficult because younger women in particular, within the same generational cohort, often have not yet reached menopause. Excluding these younger women from breast cancer risk analyses may bias the results. The aim of this study is therefore to recover missing menopause ages as a covariate by comparing methods for handling missing data. Additionally, the study contributes to understanding the evolution of age at menopause for several generations born in Portugal between 1920 and 1970. METHODS: Data on 278,282 women aged 45-69, collected between 1990 and 2010 by a breast cancer screening program in Portugal, are used to compare two approaches to imputing age at menopause: (i) a multiple imputation method based on a truncated distribution that ignores the missingness mechanism; (ii) a copula-based multiple imputation method that simultaneously handles age at menopause and the missingness mechanism. In both cases, the linear predictors have a semiparametric additive structure accommodating linear and non-linear effects, defined via splines or, for spatial variables, Markov random field smoothers. RESULTS: Both imputation methods revealed an increasing trend in age at menopause as a function of birth year for the youngest generation. This trend is hidden if only women with an observed age at menopause are modeled. CONCLUSION: When studying age at menopause, missing ages must be recovered with an adequate procedure for incomplete data.
    Imputing these missing ages avoids excluding the younger generational cohort of the screening program from breast cancer risk analyses and hence reduces the bias stemming from this exclusion. In addition, imputing the not-yet-observed ages at menopause, mostly of younger women, is also crucial when studying the time trend of age at menopause; otherwise, the analysis will be biased.
  • Item
    Using Background Knowledge from Preceding Studies for Building a Random Forest Prediction Model: A Plasmode Simulation Study.
    Hafermann, L ; Klein, N ; Rauch, G ; Kammer, M ; Heinze, G (MDPI AG, 2022-06-20)
    There is increasing interest in machine learning (ML) algorithms for predicting patient outcomes, as these methods are designed to automatically discover complex data patterns. For example, the random forest (RF) algorithm is designed to identify relevant predictor variables out of a large set of candidates. In addition, researchers may use external information for variable selection to improve model interpretability and variable selection accuracy, and thereby prediction quality. However, it is unclear to what extent, if at all, RF and other ML methods may benefit from external information. In this paper, we examine the usefulness of external information from prior variable selection studies that used traditional statistical modeling approaches such as the Lasso, or suboptimal methods such as univariate selection. We conducted a plasmode simulation study based on subsampling a data set from a pharmacoepidemiologic study with nearly 200,000 individuals, two binary outcomes and 1152 candidate predictor variables (mainly sparse binary). When the set of candidate predictors was reduced based on external knowledge, RF models achieved better calibration, that is, better agreement between predictions and observed outcome rates. However, prediction quality measured by cross-entropy, AUROC or the Brier score did not improve. We recommend appraising the methodological quality of studies that serve as an external information source for future prediction model development.
  • Item
    bamlss: A Lego Toolbox for Flexible Bayesian Regression (and Beyond)
    Umlauf, N ; Klein, N ; Simon, T ; Zeileis, A (JOURNAL STATISTICAL SOFTWARE, 2021-11-01)
  • Item
    Interpretable modelling of retail demand and price elasticity for passenger flights using booking data
    Meyer, JF ; Kauermann, G ; Smith, MS (SAGE PUBLICATIONS LTD, 2022-05-09)
    We propose a model of retail demand for air travel and ticket price elasticity at the daily booking and individual flight level. Daily bookings are modelled as a non-homogeneous Poisson process with respect to the time to departure. The booking intensity is a function of booking and flight level covariates, including non-linear effects modelled semi-parametrically using penalized splines. Customer heterogeneity is incorporated using a finite mixture model, where the latent segments have covariate-dependent probabilities. We fit the model to a unique dataset of over one million daily counts of bookings for 9,602 scheduled flights on a short-haul route over two years. A control variate approach with a strong instrument corrects for a substantial level of price endogeneity. A rich latent segmentation is uncovered, along with strong covariate effects. The calibrated model can be used to quantify demand and price elasticity for different flights booked on different days prior to departure and is a step towards continuous pricing, a major objective of airlines. Because our model is interpretable, forecasts can be created under different scenarios. For instance, although our model is calibrated on data collected prior to COVID-19, many of the empirical insights are likely to remain valid as air travel recovers in the post-COVID-19 period.
  • Item
    How the Linguistic Styles of Donald Trump and Joe Biden Reflect Different Forms of Power
    Körner, R ; Overbeck, JR ; Körner, E ; Schütz, A (SAGE Publications, 2022-01-01)
    Can theories of power be used to explain differences in the linguistic styles of Donald Trump and Joe Biden? We argue that the two candidates possess and use different forms of power, and that this is associated with typical language patterns. Based on their personal history, news reports, and empirical studies, we expect Trump's approach to power to be characterized by coercive power forms and Biden's by collaborative power forms. Using several LIWC categories and the moral foundations dictionary, we analyzed over 500 speeches and 15,000 tweets made during the 2020 election campaign. Biden's speeches could be described as analytical and frequently referred to moral values, whereas Trump's speeches were characterized by a positive emotional tone. In tweets, Biden used more social words and more words related to virtue, honesty, and achievement than Trump did. Trump's coercive power and Biden's collaborative power were more observable in tweets than in speeches, which may reflect the fact that tweets are more spontaneous than speeches.
  • Item
    Review of guidance papers on regression modeling in statistical series of medical journals.
    Wallisch, C ; Bach, P ; Hafermann, L ; Klein, N ; Sauerbrei, W ; Steyerberg, EW ; Heinze, G ; Rauch, G ; topic group 2 of the STRATOS initiative ; Mathes, T (Public Library of Science (PLoS), 2022)
    Although regression models play a central role in the analysis of medical research projects, many misconceptions about various aspects of modeling persist, leading to faulty analyses. Indeed, the rapidly developing statistical methodology and its recent advances in regression modeling do not seem to be adequately reflected in many medical publications. This problem of knowledge transfer from statistical research to application was identified by some medical journals, which have published series of statistical tutorials and (shorter) papers mainly addressing medical researchers. The aim of this review was to assess the current level of knowledge with regard to regression modeling contained in such statistical papers. We searched for target series by sending a request to international statistical experts. We identified 23 series including 57 topic-relevant articles. Within each article, two independent raters analyzed the content by investigating 44 predefined aspects of regression modeling. We assessed to what extent the aspects were explained and whether examples, software advice, and recommendations for or against specific methods were given. Most series (21/23) included at least one article on multivariable regression. Logistic regression was the most frequently described regression type (19/23), followed by linear regression (18/23), Cox regression and survival models (12/23) and Poisson regression (3/23). Most general aspects of regression modeling, e.g., model assumptions and the reporting and interpretation of regression results, were covered. We did not find many misconceptions or misleading recommendations, but we identified relevant gaps, in particular with respect to addressing nonlinear effects of continuous predictors, model specification and variable selection. Specific recommendations on software were rarely given.
Statistical guidance should be developed for nonlinear effects, model specification and variable selection to better support medical researchers who perform or interpret regression analyses.
  • Item
    International survey evidence on user and community co-delivery of prevention activities relevant to public services and outcomes
    Bovaird, T ; Loeffler, E ; Yates, S ; Van Ryzin, G ; Alford, J (ROUTLEDGE JOURNALS, TAYLOR & FRANCIS LTD, 2021-10-17)
  • Item
    More on Convening Technology: Blockchain, Fashion, and the Right to Know
    Lim, K ; Richardson, M ; Teoh, SY ; Seto, W (Wiley, 2022)
    For many years mired in its cryptocurrency history, blockchain held little interest to those working outside the financial world. It now offers the fashion industry and its diverse publics the enticing prospect of a transparent value chain for ethical and sustainable fashion, catering to public demands for a right to know data on authenticity and provenance. Whether this is a feasible prospect remains to be seen. Nevertheless, in staking out its position, blockchain appears to be moving into an interesting phase of its short tumultuous existence. In short, it is taking on the character of a “convening technology” – becoming “the focus of a conversation that can [potentially] address issues far beyond what it may ultimately be able to address itself”, and marshalling “resources, institutions and other forms of power”. A difficulty is reconciling this beneficial function with the need for the so-called “technology of trust” to be trustworthy in practice, requiring at least a minimal governance model.
  • Item
    Exact confidence limits after a group sequential single arm trial
    Lloyd, C (John Wiley and Sons, 2021-05-10)
    Group sequential single arm designs are common in phase II trials as well as in attribute testing and acceptance sampling. After the trial is completed, especially if the recommendation is to proceed to further testing, there is interest in full inference on treatment efficacy. For a binary response, it is possible to construct exact upper and lower confidence limits; the first published method for doing so is that of Jennison and Turnbull (1983). We place their method within the modern theory of exact confidence limits and provide a new general result that ensures that the exact limits are consistent with the test result, an issue that has been largely ignored in the literature. Amongst methods based on the minimal sufficient statistic, we propose two exact methods that outperform Jennison and Turnbull's method across 10 selected designs. We prefer and recommend one of these for practical and theoretical reasons. We also investigate a method based on inverting Fisher's combination test, as well as a pure tie-breaking variant of it. For the range of designs considered, neither of these methods results in large enough improvements in efficiency to justify violating the sufficiency principle. For any nonadaptive sequential design, an R package is provided to select a method and compute the inference from a given realization.