Estimation of the Extreme Distribution Model of Economic Losses Due to Outbreaks Using the POT Method with Newton Raphson Iteration

An extreme distribution is the distribution of a random variable restricted to the tail area, where values occur with small probability. This distribution is widely used in various fields, one of which is reinsurance. An outbreak catastrophe is a non-natural disaster that can pose an extreme risk of economic loss to an exposed country. To anticipate this risk, the government of a country can insure it with a reinsurance company, which then links the risk to bonds in the capital market so that new securities are issued, namely outbreak catastrophe bonds. In pricing these bonds, knowledge of the extreme distribution of economic losses due to outbreak catastrophe is indispensable. Therefore, this study aims to determine the extreme distribution model of economic losses due to outbreak catastrophe, using Extreme Value Theory as the approach and Peaks Over Threshold as the method. The threshold value parameter of the model is estimated by the Kurtosis Method, while the other parameters are estimated by the Maximum Likelihood Estimation Method based on Newton-Raphson Iteration. The result is an extreme value distribution model of economic losses due to outbreak catastrophe that reinsurance companies can use as a tool for determining the value of risk in outbreak catastrophe bonds.


Introduction
Extreme events are events that rarely occur but can cause very large losses. Such events can occur in various fields such as health, finance, agriculture, and others. Examples are outbreak catastrophes, monetary crises, the death of the head of a family, crop failure, and so on. One tool that can be used to explain the risk of these events is the extreme distribution. The extreme distribution is the distribution of a random variable limited to values that have a small probability of occurrence. In other words, the determination of risk using this distribution focuses only on the tail area of the distribution. The extreme distribution is widely used in various fields, one of which is reinsurance. To prevent bankruptcy, reinsurance companies must determine the amount of claims so that the probability of a claim occurring is not too great. Thus, the extreme distribution greatly helps reinsurance companies in determining the amount of claims they bear, because the occurrence of a claim is an extreme event with a small probability of occurrence.
An outbreak catastrophe that occurs in a country not only causes health problems but also causes great economic losses. These losses can occur in various sectors, such as tourism, households, corporations, finance, and others (Qiu et al., 2018). To anticipate the risk, the government can insure it with reinsurance companies. Then, to expand their coverage capacity, the reinsurance companies transfer the risk to the capital market (Cox and Pedersen, 2000). One of the securities that can be used as a means of accepting this risk is a bond. These securities are called outbreak catastrophe bonds (Liu et al., 2014). The most important part of pricing these bonds is determining the risk of losses. Therefore, knowledge of this risk is urgently needed, and one way to obtain it is through the extreme distribution (Jockovic, 2012). Zimbidis et al. (2007) determined the extreme distribution of catastrophe losses with the Extreme Value Theory (EVT) Approach, with interest rates determined stochastically through the Cox-Ingersoll-Ross (CIR) Model. Chao and Zou (2018) determined the extreme distribution of catastrophe losses with two trigger events, each obtained by the EVT Approach and then combined through a Copula function. Then, Residori (2019) determined the extreme distribution of catastrophe losses with the EVT Approach through the Block Maxima Method.
Based on the explanation above, this study will focus on determining the extreme distribution of losses due to outbreak catastrophe. The approach and method used to determine this distribution are Extreme Value Theory (EVT) and Peaks Over Threshold (POT), respectively. The quantile model of the distribution will also be estimated by the High Quantile Estimation Model. The EVT approach was chosen because of the extreme nature of outbreak catastrophe events. Then, the POT method was chosen because the classification of extreme data is determined based on data that exceeds a certain threshold value, regardless of the time of catastrophe, so it is very effective to use considering the small number of data.

Methods
The object of research is data on economic losses due to outbreak catastrophe in various countries from 1976 to 2020. The data are collected from various sources such as the World Bank (http://pubdocs.worldbank.org), the Asian Development Bank (https://www.adb.org), and so on. The tail of the data distribution must be fat, which is examined via its kurtosis: the tail is said to be fat if the kurtosis is greater than three, and not fat otherwise. Then, the threshold value parameter is estimated by the Kurtosis Method, while the parameters of the Generalized Pareto Distribution (GPD), which is assumed to fit the resampled data, are estimated by the Maximum Likelihood Estimation Method based on Newton-Raphson Iteration. After that, the assumed fit of the distribution is checked by the Kolmogorov-Smirnov Test. Finally, the extreme distribution model and its quantile model are built with Peaks Over Threshold and High Quantile Estimation, respectively.

Kurtosis Method
The threshold value is the boundary value between extreme data and non-extreme data. If $x_1 \le x_2 \le \dots \le x_n$ are defined as the data sorted from smallest to largest, then the iteration of kurtosis calculations can be done with the following equation:

$$K_m = \frac{\frac{1}{m}\sum_{j=1}^{m}\left(x_j - \bar{x}_m\right)^4}{s_m^4}, \qquad m = n, n-1, n-2, \dots \tag{1}$$

where $n$ represents the number of observed data, $m$ represents the number of members of the subdata $\{x_1, \dots, x_m\}$, $K_m$ represents the kurtosis of the $m$-subdata, $\bar{x}_m$ represents the average of the $m$-subdata, and $s_m$ represents the standard deviation of the $m$-subdata. If an $m$-subdata with $K_m < 3$ has been obtained for the first time, then the iteration is stopped and the largest datum of the $m$-subdata, $x_m$, is selected as the threshold value $u$. After that, the data are resampled by subtracting the threshold value from each extreme datum.
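The stopping rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Scilab code: the function names `kurtosis` and `kurtosis_threshold` are hypothetical, and the population form of the kurtosis (dividing by the subdata size) is an assumption, since the exact normalization used in the study is not recoverable from the text.

```python
from statistics import fmean


def kurtosis(values):
    """Non-excess kurtosis: fourth central moment over squared variance.

    Assumption: population moments (divide by m), not sample-corrected ones.
    """
    m = len(values)
    mean = fmean(values)
    var = sum((v - mean) ** 2 for v in values) / m
    fourth = sum((v - mean) ** 4 for v in values) / m
    return fourth / var ** 2


def kurtosis_threshold(data, cutoff=3.0):
    """Drop the largest datum until the kurtosis first falls below `cutoff`.

    Returns the threshold (largest datum of the retained subdata) and the
    excesses of the removed data over that threshold.
    """
    ordered = sorted(data)
    for m in range(len(ordered), 1, -1):
        if kurtosis(ordered[:m]) < cutoff:
            threshold = ordered[m - 1]
            excesses = [x - threshold for x in ordered[m:]]
            return threshold, excesses
    raise ValueError("kurtosis never fell below the cutoff")


# Toy data with two large outliers (illustrative values, not the study's data).
threshold, excesses = kurtosis_threshold([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50, 200])
print(threshold, excesses)
```

In the toy data the two outliers are removed one at a time, and the remaining ten values have a kurtosis below three, so the largest of them becomes the threshold.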

Generalized Pareto Distribution (GPD)
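According to Jockovic (2012), if $Y \sim GPD(\sigma, \xi)$, where $\sigma > 0$ and $\xi$ represent the scale and shape parameters of the GPD, respectively, then the distribution function can be expressed as follows:

$$G(y) = \begin{cases} 1 - \left(1 + \dfrac{\xi y}{\sigma}\right)^{-1/\xi}, & \xi \neq 0 \\[2mm] 1 - \exp\left(-\dfrac{y}{\sigma}\right), & \xi = 0 \end{cases} \tag{2}$$

while the density function is as follows:

$$g(y) = \begin{cases} \dfrac{1}{\sigma}\left(1 + \dfrac{\xi y}{\sigma}\right)^{-\frac{1}{\xi} - 1}, & \xi \neq 0 \\[2mm] \dfrac{1}{\sigma}\exp\left(-\dfrac{y}{\sigma}\right), & \xi = 0 \end{cases} \tag{3}$$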
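As a concrete illustration, the GPD distribution and density functions can be written as two small Python functions. The names `gpd_cdf` and `gpd_pdf` are hypothetical, and the standard two-parameter form with scale `sigma` and shape `xi` is assumed; the exponential case is used when `xi` equals zero.

```python
import math


def gpd_cdf(y, sigma, xi):
    """GPD distribution function for an excess y >= 0 (scale sigma > 0, shape xi)."""
    if y <= 0:
        return 0.0
    if xi == 0:
        return 1.0 - math.exp(-y / sigma)
    base = 1.0 + xi * y / sigma
    if base <= 0:  # y lies beyond the upper endpoint, possible only when xi < 0
        return 1.0
    return 1.0 - base ** (-1.0 / xi)


def gpd_pdf(y, sigma, xi):
    """GPD density function for an excess y >= 0."""
    if y < 0:
        return 0.0
    if xi == 0:
        return math.exp(-y / sigma) / sigma
    base = 1.0 + xi * y / sigma
    if base <= 0:
        return 0.0
    return base ** (-1.0 / xi - 1.0) / sigma


# Illustrative parameter values only; they are not the estimates of the study.
print(gpd_cdf(2.0, sigma=2.0, xi=0.5), gpd_pdf(2.0, sigma=2.0, xi=0.5))
```

A quick sanity check is that the density should match the numerical derivative of the distribution function, and that a very small shape value should reproduce the exponential case.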

Maximum Likelihood Estimation
Maximum Likelihood Estimation is a method of estimating the parameters of a probability distribution by maximizing the likelihood function. If $X_1, X_2, \dots, X_n$ are defined as random variables which are independent and identically distributed with the probability function $f(x; \theta)$, where $\theta$ is an unknown parameter, then the likelihood function of $\theta$ can be expressed as follows:

$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta) \tag{4}$$

The log-likelihood function is the natural logarithm of equation (4). According to Purba et al. (2017), this function is monotonically related to equation (4), so maximizing it is equivalent to maximizing equation (4). The log-likelihood function is expressed as follows:

$$\ln L(\theta) = \sum_{i=1}^{n} \ln f(x_i; \theta) \tag{5}$$

If the first derivative of equation (5) with respect to $\theta$ exists, then the value $\hat{\theta}$ which maximizes equation (5) is the solution of the following equation:

$$\frac{\partial \ln L(\theta)}{\partial \theta} = 0 \tag{6}$$
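As a short worked example of these three steps, consider the exponential density $f(y; \sigma) = \frac{1}{\sigma} e^{-y/\sigma}$, which is the GPD with shape parameter zero. This case is chosen here only because it can be solved by hand; it is not the model fitted in this study.

$$
\begin{aligned}
L(\sigma) &= \prod_{i=1}^{n} \frac{1}{\sigma} e^{-y_i/\sigma} = \sigma^{-n} \exp\left(-\frac{1}{\sigma}\sum_{i=1}^{n} y_i\right) \\
\ln L(\sigma) &= -n \ln \sigma - \frac{1}{\sigma}\sum_{i=1}^{n} y_i \\
\frac{\partial \ln L(\sigma)}{\partial \sigma} &= -\frac{n}{\sigma} + \frac{1}{\sigma^2}\sum_{i=1}^{n} y_i = 0 \quad\Longrightarrow\quad \hat{\sigma} = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}
\end{aligned}
$$

Here the likelihood equation has an explicit solution. When the shape parameter is unknown and nonzero, no such explicit solution exists, which is why a numerical method is needed.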

Newton-Raphson Iteration
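According to Bakari et al. (2016), Newton-Raphson iteration is a method of numerically estimating the solution that maximizes a function. Let $f(\theta)$ be a function with unknown parameter $\theta$, and let the $(k-1)$-th and $k$-th estimates of $\theta$ be denoted as $\hat{\theta}_{k-1}$ and $\hat{\theta}_k$. If the first and second derivatives of $f(\theta)$ with respect to $\theta$ at $\hat{\theta}_{k-1}$ are known, then the general equation used for finding the optimum solution of $f(\theta)$ is as follows:

$$\hat{\theta}_k = \hat{\theta}_{k-1} - \frac{f'(\hat{\theta}_{k-1})}{f''(\hat{\theta}_{k-1})} \tag{7}$$

where $f'(\hat{\theta}_{k-1})$ and $f''(\hat{\theta}_{k-1})$ represent the first and second derivatives of $f(\theta)$ with respect to $\theta$ at $\hat{\theta}_{k-1}$, respectively. The error value for each iteration, denoted by $\varepsilon_k$, is determined by the following equation:

$$\varepsilon_k = \left|\hat{\theta}_k - \hat{\theta}_{k-1}\right| \tag{8}$$

Let $\varepsilon$ be the error that is tolerated. If the $k$-th iteration with $\varepsilon_k < \varepsilon$ has been obtained for the first time, the iteration is stopped. The value of $\hat{\theta}$ is called the maximum solution if $f''(\hat{\theta}) < 0$.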
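The iteration can be sketched for a one-parameter function as follows. The function name `newton_raphson_max` and the toy objective $f(\theta) = \ln\theta - \theta/2$, whose maximum is at $\theta = 2$, are illustrative choices, and the error is taken as the absolute change between successive estimates, which is an assumption about the exact error measure.

```python
def newton_raphson_max(d1, d2, start, tol=1e-6, max_iter=100):
    """Find a maximizer of f by Newton-Raphson on its first derivative.

    d1, d2: callables returning the first and second derivative of f.
    Returns the estimate and the number of iterations used.
    """
    theta = start
    for k in range(1, max_iter + 1):
        new_theta = theta - d1(theta) / d2(theta)
        error = abs(new_theta - theta)  # assumed error measure
        theta = new_theta
        if error < tol:
            if d2(theta) >= 0:
                raise ValueError("stationary point is not a maximum")
            return theta, k
    raise RuntimeError("Newton-Raphson did not converge")


# Toy objective f(theta) = ln(theta) - theta / 2 with maximum at theta = 2.
theta_hat, iterations = newton_raphson_max(
    d1=lambda t: 1.0 / t - 0.5,
    d2=lambda t: -1.0 / t ** 2,
    start=1.0,
)
print(theta_hat, iterations)
```

The second-derivative check at the end mirrors the condition that the solution is a maximum only when the second derivative is negative there.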

Kolmogorov-Smirnov Test
The Kolmogorov-Smirnov test is a formal test used to check the fit between an empirical distribution and a theoretical distribution. According to Vribik (2020), the test statistic is the largest absolute difference between the two distributions, whose value can be determined by the following equation:

$$D = \max_{1 \le i \le n} D_i \tag{9}$$

where $D_i$ represents the maximum value between $\left|F_n(x_i) - F_0(x_i)\right|$ and $\left|F_n(x_{i-1}) - F_0(x_i)\right|$, with $F_n$ the empirical distribution and $F_0$ the theoretical distribution. At the significance level $\alpha$, $D$ is compared with the critical value $D_\alpha$, which is obtained from the Kolmogorov-Smirnov table according to the number of data and the selected value of $\alpha$. The two-tailed hypotheses of this test are $H_0: F_n(x) = F_0(x)$ and $H_1: F_n(x) \neq F_0(x)$. Reject $H_0$ if $D > D_\alpha$, meaning that at the significance level $\alpha$ the empirical distribution does not fit the specified theoretical distribution; otherwise, $H_0$ is accepted.
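A minimal sketch of the test statistic is given below. The function names are hypothetical. The critical value in `ks_critical_approx` uses the common large-sample approximation $1.358/\sqrt{n}$ for a significance level of 0.05, which is an assumption made here for illustration; the study itself reads the critical value from the Kolmogorov-Smirnov table.

```python
import math


def ks_statistic(sample, cdf):
    """Largest absolute gap between the empirical and a theoretical distribution.

    At each ordered datum the empirical distribution is compared just after
    the jump (i / n) and just before it ((i - 1) / n).
    """
    ordered = sorted(sample)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        theoretical = cdf(x)
        d = max(d, abs(i / n - theoretical), abs((i - 1) / n - theoretical))
    return d


def ks_critical_approx(n):
    """Large-sample approximation of the 5% critical value (assumption)."""
    return 1.358 / math.sqrt(n)


# Toy check against the uniform distribution on (0, 1).
sample = [0.1, 0.4, 0.7]
d = ks_statistic(sample, cdf=lambda x: x)
print(d, d > ks_critical_approx(len(sample)))
```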

Extreme Value Theory (EVT)
According to Gilli and Kellezi (2006), Extreme Value Theory (EVT) is an approach used to determine extreme distributions, so it focuses on determining probabilities in the tail area. According to Jindrova and Pacakova (2016), there are two methods in the EVT approach, namely Block Maxima (BM) and Peaks Over Threshold (POT). The BM method identifies extreme data as the highest value in each period (weekly, monthly, and so on), while the POT method identifies extreme data as values that exceed a certain threshold value, regardless of the time of the event.

Peaks Over Threshold Method
Peaks Over Threshold is a modern method of estimating the extreme distribution in which the extreme values are those that exceed a certain threshold value, regardless of the time of the event. Suppose $X$ is a random variable which represents economic losses due to outbreak catastrophe with distribution $F$, and $u$ represents a large threshold value. If the random variable $Y = X - u$ with $X > u$ is defined, which represents the excess of extreme economic losses due to the outbreak catastrophe over the threshold value, then its distribution $F_u$ can be determined by the following equation:

$$F_u(y) = P(X - u \le y \mid X > u) = \frac{F(u + y) - F(u)}{1 - F(u)} \tag{10}$$

If equation (10) is transformed, then the extreme distribution of $X$ is as follows:

$$F(x) = \left[1 - F(u)\right] F_u(x - u) + F(u), \qquad x > u \tag{11}$$

Based on the Pickands-Balkema-de Haan theorem, if the threshold value $u$ is large, then $F_u(y)$ can be approximated by the GPD, or $F_u(y) \approx G(y)$. Then, Galambos et al. (1993) formulated that the probability that $X$ is greater than the threshold value is $N_u / n$, where $n$ and $N_u$ represent the number of observed data and the number of extreme data, respectively. Based on this, equation (11) can be rewritten as follows:

$$F(x) = 1 - \frac{N_u}{n}\left(1 + \frac{\xi (x - u)}{\sigma}\right)^{-1/\xi} \tag{12}$$
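The step from equation (10) to equation (12) can be spelled out as follows. The notation is the usual POT notation and is assumed to match the authors': $G$ is the GPD distribution function with scale $\sigma$ and shape $\xi \neq 0$, $u$ is the threshold, and $N_u / n$ estimates $1 - F(u)$.

$$
\begin{aligned}
F_u(y) &= \frac{F(u + y) - F(u)}{1 - F(u)} \\
F(x) &= \left[1 - F(u)\right] F_u(x - u) + F(u), \qquad x = u + y > u \\
&\approx \frac{N_u}{n}\, G(x - u) + 1 - \frac{N_u}{n} \\
&= 1 - \frac{N_u}{n}\left[1 - G(x - u)\right] \\
&= 1 - \frac{N_u}{n}\left(1 + \frac{\xi (x - u)}{\sigma}\right)^{-1/\xi}
\end{aligned}
$$

At $x = u$ the last line gives $F(u) = 1 - N_u / n$, so the fitted tail joins the empirical proportion of non-extreme data at the threshold.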

High Quantile Estimation Model
According to Deng et al. (2020), the High Quantile Estimation (HQE) model is a model used to determine the $p$-quantile of the extreme distribution, where $p \in (F(u), 1)$. If $F(x_p) = p$ is defined, then the HQE model of equation (12) can be determined by the following equation:

$$x_p = u + \frac{\sigma}{\xi}\left[\left(\frac{n}{N_u}(1 - p)\right)^{-\xi} - 1\right] \tag{13}$$
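The quantile model is the inverse of the tail distribution, which can be checked numerically with a round trip. In the sketch below the function names `pot_cdf` and `pot_quantile` are hypothetical, and the values of `sigma` and `xi` are placeholders chosen only for illustration; they are not the estimates obtained in this study. The threshold and the counts are the ones reported in the results.

```python
def pot_cdf(x, u, sigma, xi, n, n_u):
    """Tail distribution F(x) for x >= u under the POT model (xi != 0)."""
    if x < u:
        raise ValueError("the tail model is only defined for x >= u")
    return 1.0 - (n_u / n) * (1.0 + xi * (x - u) / sigma) ** (-1.0 / xi)


def pot_quantile(p, u, sigma, xi, n, n_u):
    """p-quantile of the tail distribution, valid for 1 - n_u / n < p < 1."""
    if not (1.0 - n_u / n < p < 1.0):
        raise ValueError("p must lie between F(u) and 1")
    return u + sigma / xi * ((n / n_u * (1.0 - p)) ** (-xi) - 1.0)


# u, n and n_u follow the paper; sigma and xi are hypothetical placeholders.
params = dict(u=664.0, sigma=1000.0, xi=0.5, n=103, n_u=42)
x_p = pot_quantile(0.65, **params)
print(x_p, pot_cdf(x_p, **params))
```

Feeding the quantile back into the tail distribution should return the original probability, which is a convenient check that the two formulas are consistent with each other.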

Results and Discussion
The data must have a fat distribution tail, which can be seen from the kurtosis of the data. The kurtosis is calculated with equation (1), and the result obtained is 10.8989. Since the kurtosis is greater than three, the data distribution is confirmed to be fat-tailed, so the process can continue.

Threshold Value Estimation
The iteration of the kurtosis calculation is done with equation (1), assisted by the Scilab software. Snippets of the iteration results are presented in Table 1. Based on Table 1, it appears that in the 43rd iteration the kurtosis is 2.949846, which is below three for the first time. Therefore, the iteration is stopped and the largest datum of the 61-subdata, USD 664 million, is chosen as the threshold value, so that there are 42 extreme data ($N_u = 42$), namely the 62nd datum to the 103rd datum. After that, the data are resampled. Snippets of the resampled data are presented in Table 2. Based on Table 2, it appears that the smallest datum is USD 36 million, while the largest datum is USD 24,336 million.

GPD Parameter Estimation
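Suppose that $X_1, X_2, \dots, X_n$ are independent and identically distributed random variables, where $X_i$ represents the economic loss due to the $i$-th outbreak catastrophe. If $Y_1, Y_2, \dots, Y_{N_u}$, defined as independent random variables following $GPD(\sigma, \xi)$, are the excesses of the extreme economic losses over the threshold value, then based on equations (4) and (5), the likelihood and log-likelihood functions of $\theta$ can be expressed as follows:

$$L(\theta) = \prod_{i=1}^{N_u} \frac{1}{\sigma}\left(1 + \frac{\xi y_i}{\sigma}\right)^{-\frac{1}{\xi} - 1} \tag{14}$$

$$\ln[L(\theta)] = -N_u \ln \sigma - \left(\frac{1}{\xi} + 1\right)\sum_{i=1}^{N_u} \ln\left(1 + \frac{\xi y_i}{\sigma}\right) \tag{15}$$

where $\theta = [\sigma \;\; \xi]^T$. Based on equation (6), the value of $\theta$ that maximizes equation (15) is the solution of the following equation:

$$g(\theta) = 0 \tag{16}$$

where $g(\theta)$ represents the first partial derivative of $\ln[L(\theta)]$ with respect to $\theta$, whose elements are as follows:

$$\frac{\partial \ln[L(\theta)]}{\partial \sigma} = -\frac{N_u}{\sigma} + \frac{1 + \xi}{\sigma}\sum_{i=1}^{N_u} \frac{y_i}{\sigma + \xi y_i} \tag{17}$$

$$\frac{\partial \ln[L(\theta)]}{\partial \xi} = \frac{1}{\xi^2}\sum_{i=1}^{N_u} \ln\left(1 + \frac{\xi y_i}{\sigma}\right) - \left(\frac{1}{\xi} + 1\right)\sum_{i=1}^{N_u} \frac{y_i}{\sigma + \xi y_i} \tag{18}$$

Setting equations (17) and (18) to zero does not give a closed-form solution, so $\theta$ will be estimated through the Newton-Raphson Iteration. The second partial derivative of $\ln[L(\theta)]$ with respect to $\theta$, denoted $H(\theta)$, is determined first. $H(\theta)$ is stated as follows:

$$H(\theta) = \begin{bmatrix} \dfrac{\partial^2 \ln[L(\theta)]}{\partial \sigma^2} & \dfrac{\partial^2 \ln[L(\theta)]}{\partial \sigma\, \partial \xi} \\[3mm] \dfrac{\partial^2 \ln[L(\theta)]}{\partial \xi\, \partial \sigma} & \dfrac{\partial^2 \ln[L(\theta)]}{\partial \xi^2} \end{bmatrix}$$

Based on equation (7), the Newton-Raphson iteration is done with the following equation:

$$\hat{\theta}_k = \hat{\theta}_{k-1} - H^{-1}(\hat{\theta}_{k-1})\, g(\hat{\theta}_{k-1})$$

Next, the tolerable error value is chosen, namely $\varepsilon = 0.000001$. For the first iteration, the initial estimate vector is determined first. According to Jockovic (2012), the initial estimate vector of the GPD is determined by the following equation:

$$\hat{\sigma}_0 = \frac{\bar{y}}{2}\left(\frac{\bar{y}^2}{s^2} + 1\right), \qquad \hat{\xi}_0 = \frac{1}{2}\left(1 - \frac{\bar{y}^2}{s^2}\right)$$

where $\bar{y}$ and $s$ represent the mean and standard deviation of $Y$, respectively. The iteration is done with the help of Scilab software. Snippets of the iteration results are presented in Table 3. Based on Table 3, it appears that the iteration stops at the 11th iteration with an error of 0.000000055364 and yields the parameter estimators $\hat{\sigma}$ and $\hat{\xi}$. It appears that $H(\hat{\theta})$ is negative definite, so $\hat{\theta}$ is the maximum solution of $\ln[L(\theta)]$.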
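The estimation procedure can be sketched in plain Python as below. This is an illustration under stated assumptions, not the authors' Scilab program: the function names are hypothetical, the second derivatives are derived here by differentiating the GPD log-likelihood, the starting values are moment-based with the population standard deviation, the error is the Euclidean length of the update step, and the demonstration data are synthetic GPD quantiles rather than the loss data of the study.

```python
import math
from statistics import fmean, pstdev


def gpd_score_hessian(y, sigma, xi):
    """Gradient and Hessian entries of the GPD log-likelihood (xi != 0)."""
    if sigma <= 0 or any(sigma + xi * v <= 0 for v in y):
        raise ValueError("parameters outside the admissible region")
    n = len(y)
    a = sum(v / (sigma + xi * v) for v in y)
    b = sum((v / (sigma + xi * v)) ** 2 for v in y)
    c = sum(math.log1p(xi * v / sigma) for v in y)
    d = sum(v * (2 * sigma + xi * v) / (sigma + xi * v) ** 2 for v in y)
    e = sum(v * (sigma - v) / (sigma + xi * v) ** 2 for v in y)
    g_sigma = -n / sigma + (1 + xi) / sigma * a
    g_xi = c / xi ** 2 - (1 / xi + 1) * a
    h_ss = n / sigma ** 2 - (1 + xi) / sigma ** 2 * d
    h_sx = e / sigma
    h_xx = -2 * c / xi ** 3 + 2 * a / xi ** 2 + (1 / xi + 1) * b
    return (g_sigma, g_xi), (h_ss, h_sx, h_xx)


def fit_gpd_newton(y, tol=1e-6, max_iter=100):
    """Newton-Raphson maximum likelihood fit started from moment estimates."""
    ratio = fmean(y) ** 2 / pstdev(y) ** 2
    sigma, xi = 0.5 * fmean(y) * (ratio + 1), 0.5 * (1 - ratio)
    for k in range(1, max_iter + 1):
        (g_s, g_x), (h_ss, h_sx, h_xx) = gpd_score_hessian(y, sigma, xi)
        det = h_ss * h_xx - h_sx ** 2
        # Newton step: inverse of the 2x2 Hessian applied to the gradient.
        step_s = (h_xx * g_s - h_sx * g_x) / det
        step_x = (h_ss * g_x - h_sx * g_s) / det
        sigma, xi = sigma - step_s, xi - step_x
        if math.hypot(step_s, step_x) < tol:  # assumed error measure
            return sigma, xi, k
    raise RuntimeError("Newton-Raphson did not converge")


# Synthetic excesses: GPD quantiles for sigma = 2, xi = 0.2 on a regular grid.
demo = [2 / 0.2 * ((1 - (i - 0.5) / 200) ** -0.2 - 1) for i in range(1, 201)]
sigma_hat, xi_hat, iterations = fit_gpd_newton(demo)
print(sigma_hat, xi_hat, iterations)
```

On the synthetic data the fitted values should land close to the parameters used to generate it. A plain Newton-Raphson step has no safeguard, so a poor starting point can leave the admissible region, in which case the sketch raises an error instead of continuing.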

Fit Test for Empirical Distribution and GPD
The Probability-Probability Plot (P-P Plot) and Quantile-Quantile Plot (Q-Q Plot) of the empirical distribution and the GPD can be seen in Figure 1. Based on Figure 1, it appears that the points on the P-P Plot and Q-Q Plot are scattered around the line with gradient one, so visually the empirical distribution fits the GPD. Formal testing is carried out with the Kolmogorov-Smirnov test. The level of significance chosen is $\alpha = 0.05$. The comparison of the empirical distribution and the GPD used in the Kolmogorov-Smirnov test is visualized with the help of Scilab software, and the result is presented in Figure 2. Based on Figure 2, it appears that the largest difference between the two distributions lies in the interval (10,000, 15,000). The test statistic $D$ is obtained from equation (9). Based on the number of data and the selected significance level, the critical value $D_\alpha$ is obtained from the Kolmogorov-Smirnov table. It appears that $D < D_\alpha$, so the decision taken is to accept $H_0$, which means that the empirical distribution follows the GPD.

Building Extreme Distribution from Economic Losses Due to Outbreak Catastrophe
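Based on equation (12), with $u = 664$, $n = 103$, $N_u = 42$, and the estimators $\hat{\sigma}$ and $\hat{\xi}$, the following is the extreme distribution of economic losses due to outbreak catastrophe:

$$F(x) = 1 - \frac{42}{103}\left(1 + \frac{\hat{\xi}(x - 664)}{\hat{\sigma}}\right)^{-1/\hat{\xi}}, \qquad x > 664 \tag{25}$$

Based on equation (25), if the amount of the bond claim is USD 1,167.26 million, then the probability of the claim, that is, the probability $1 - F(x)$ that the losses exceed this amount, is 0.35. Then, based on equation (13), the quantile model of equation (25) is as follows:

$$x_p = 664 + \frac{\hat{\sigma}}{\hat{\xi}}\left[\left(\frac{103}{42}(1 - p)\right)^{-\hat{\xi}} - 1\right] \tag{26}$$

Based on equation (26), if the probability of a claim is 0.35, that is, $p = 0.65$, then the amount of the bond claim is USD 1,167.26 million.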
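The figures quoted here can be cross-checked against the threshold results. Reading the probability of a claim as the probability that losses exceed the claim amount, the counts $N_u = 42$ and $n = 103$ give the following values at the threshold of USD 664 million:

$$
1 - F(664) = \frac{N_u}{n} = \frac{42}{103} \approx 0.4078, \qquad F(664) = 1 - \frac{42}{103} \approx 0.5922
$$

A claim probability of 0.35 is smaller than 0.4078, so the corresponding claim amount must lie above the threshold, which agrees with the reported USD 1,167.26 million. In quantile terms this is $p = 1 - 0.35 = 0.65$, which lies inside the admissible range $(0.5922, 1)$ of the quantile model.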

Discussion
Reinsurance companies must set the amount of the claim above USD 664 million so that the probability of the claim occurring is not too large. The sponsor, as the insured, must know the relationship between the expected amount of the claim and the probability of its occurrence. Investors also have to think carefully about the risk of losing the principal and the amount of return they might get. For outbreak catastrophe bonds with a principal amount of 1, a coupon rate of 10%, an interest rate of 8%, and a term of one year, these relationships are visualized in a Cartesian diagram made with Scilab software. The results of the visualization are presented in Figure 3. Based on Figure 3, it appears that the greater the claim expected by the sponsor, the smaller the probability of the claim occurring. Therefore, if the sponsor does not want to lose the benefits of the bond, the amount of the claim must be set so that it is not too large and the probability of the claim is not too small. It also appears that the higher the risk of losing the principal, the higher the return that investors may get. The minimum return that investors may get is 11.55%, while the maximum return is 66.15%. Therefore, for investors who are willing to take risks, buying outbreak catastrophe bonds with a higher risk of losing the principal offers a higher potential return.