Economics - Theses

    Multivariate Count Regression Models with Applications in Insurance
    Zhang, Pengcheng (2021)
    This thesis proposes several multivariate count regression models for insurance applications. Insurance companies often write multiple lines of business, and the claim counts on these lines are often correlated, so there is a strong incentive to analyze multivariate claim count models. Motivated by real insurance datasets, we construct a number of multivariate count models to handle the various characteristics exhibited in the data. Some distributional properties of the models are examined. Inference is carried out via either the expectation–maximization (EM) algorithm or a Markov chain Monte Carlo (MCMC) algorithm.

    First, we investigate multivariate count modeling with dependence. One way to construct such a model is to apply a copula directly to discrete margins. However, likelihood inference under this construction involves the computation of multidimensional rectangle probabilities, which can be computationally expensive, especially in the elliptical copula case. Another approach is based on the multivariate mixed Poisson model. The crucial step in this method is to find an appropriate multivariate continuous distribution for the mixing parameters, and a copula addresses this issue readily. Under this framework, MCMC is a feasible strategy for inference. The usefulness of our model is then illustrated through a real-life example. Both the in-sample analysis and the out-of-sample prediction demonstrate the superiority of the copula-based mixture over other types of mixture.

    The fact that a large proportion of insurance policyholders make no claims during a one-year period highlights the importance of zero-inflated count models when analyzing the frequency of insurance claims. There is a vast literature on univariate zero-inflated count models, while work on multivariate models is considerably less advanced.
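To make the mixed Poisson construction from the first part concrete, the following is a minimal simulation sketch, not the model fitted in the thesis. It assumes two lines of business, lognormal mixing parameters with mean 1, and a Gaussian copula with correlation `rho` linking them; the function names and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson variate by Knuth's multiplication method (suits small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_claims(n, base_rates, rho, sigma, seed=0):
    """Simulate n policies with two claim counts from a mixed Poisson model.

    Assumption for illustration: the mixing parameters are lognormal with
    mean 1 and are linked by a Gaussian copula with correlation rho.
    """
    rng = random.Random(seed)
    claims = []
    for _ in range(n):
        # Correlated standard normals define the Gaussian copula.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Lognormal mixing parameters, each with mean 1.
        thetas = [math.exp(sigma * z - 0.5 * sigma ** 2) for z in (z1, z2)]
        # Given the mixing parameters, the counts are independent Poisson.
        claims.append(
            tuple(sample_poisson(r * t, rng) for r, t in zip(base_rates, thetas))
        )
    return claims


def sample_correlation(pairs):
    """Pearson correlation between the two claim counts."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs) / n
    var_y = sum((y - mean_y) ** 2 for _, y in pairs) / n
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    dependent = simulate_claims(20000, (2.0, 2.0), rho=0.8, sigma=0.5, seed=1)
    independent = simulate_claims(20000, (2.0, 2.0), rho=0.0, sigma=0.5, seed=1)
    print("correlation with rho=0.8:", sample_correlation(dependent))
    print("correlation with rho=0.0:", sample_correlation(independent))
```

Dependence between the counts enters only through the mixing parameters, which is why the choice of their joint distribution matters; setting `rho` to zero in this sketch yields counts that are overdispersed but uncorrelated.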
In the second part of this thesis, we develop a multivariate zero-inflated hurdle model to describe multivariate count data with extra zeros. Our model offers flexibility in modeling the behavior of individual claim counts while also incorporating a correlation structure between claim counts for different lines of insurance business. We develop an application of the EM algorithm to estimate the parameters of our model. The model is then applied to an automobile insurance portfolio from a major insurance company in Spain. We demonstrate that the multivariate zero-inflated hurdle model outperforms existing multivariate claim count models.

In contrast to the second part, the third part of this thesis investigates a multivariate zero-truncation problem. In the general insurance modeling literature, there has been a lot of work on univariate zero-truncated models, but little has been done in the multivariate zero-truncation case. There are three types of zero-truncation in the multivariate setting: only records with all zeros are missing, zero counts for one or some classes are missing, or zeros are completely missing for all classes. In this chapter, we focus on the first case, the so-called Type I zero-truncation, and develop a new multivariate zero-truncated hurdle model to study it. The key idea in developing such a model is to identify a stochastic representation for the underlying random variables, which enables us to use the EM algorithm to simplify the estimation procedure. This model is used to analyze a health insurance claims dataset that contains claim counts from different categories without common zero observations.

In the process of insurance underwriting, policyholders tend to make untrue statements, especially when they can benefit from doing so, for example by paying a lower premium.
However, it is a demanding task to fully investigate the unobservable misrepresentation status, and models that do not accommodate misrepresentation tend to produce biased estimates. In the fourth part of this thesis, we therefore address the issue of misrepresentation in a multivariate frequency setting. A multivariate Poisson model is developed that incorporates a binary risk factor subject to misrepresentation. The fact that each margin depends on this factor complicates the covariance structure. For inference, the EM algorithm is implemented. Two small simulation studies are then carried out to compare the performance of the model adjusted for misrepresentation with the unadjusted model. We then perform a frequency analysis using real data from the Australian Health Survey 1977-1978, where a binary variable indicating the health score is regarded as the misrepresented factor.

Finally, we present some concluding remarks. The limitations of the research conducted in this thesis are discussed, and several potential future research problems are identified.
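
The bias caused by ignoring misrepresentation, discussed in the fourth part above, can be illustrated with a small simulation. This is a hedged sketch and not the thesis's model or its EM procedure: it assumes Poisson margins with a log-linear effect of a binary risk factor, and one-directional misrepresentation in which a policyholder with true status 1 reports status 0 with probability `misrep_prob`. All names and parameter values are hypothetical.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson variate by Knuth's multiplication method (suits small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def naive_rate_ratios(n, base_rates, effects, prevalence, misrep_prob, seed=0):
    """Estimate each margin's rate ratio from the REPORTED status only.

    The true rate of margin j is base_rates[j] * exp(effects[j] * true_status),
    so the true rate ratio is exp(effects[j]). Some policyholders with true
    status 1 report 0, which contaminates the reported-0 group.
    """
    rng = random.Random(seed)
    m = len(base_rates)
    totals = {0: [0] * m, 1: [0] * m}
    sizes = {0: 0, 1: 0}
    for _ in range(n):
        true_status = 1 if rng.random() < prevalence else 0
        reported = true_status
        if true_status == 1 and rng.random() < misrep_prob:
            reported = 0  # misrepresentation: high-risk reported as low-risk
        sizes[reported] += 1
        for j in range(m):
            rate = base_rates[j] * math.exp(effects[j] * true_status)
            totals[reported][j] += sample_poisson(rate, rng)
    return [
        (totals[1][j] / sizes[1]) / (totals[0][j] / sizes[0]) for j in range(m)
    ]


if __name__ == "__main__":
    effects = [math.log(2.0), math.log(3.0)]
    ratios = naive_rate_ratios(20000, [0.5, 0.3], effects, 0.5, 0.3, seed=1)
    for j, ratio in enumerate(ratios):
        print(f"margin {j}: true ratio {math.exp(effects[j]):.2f}, naive {ratio:.2f}")
```

Because the reported low-risk group contains some high-risk policyholders, its average claim count is inflated and the naive rate ratio falls below the true one in every margin. This is the kind of bias that a model adjusted for misrepresentation is meant to correct.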