Probability distributions of COVID-19 tweet posting trends using a nonhomogeneous Poisson process

Social media plays a major role in disseminating information, especially during the COVID-19 pandemic. Observing posts over fixed time intervals makes it possible to estimate the probability of the number of tweets that netizens post on a topic. The nonhomogeneous Poisson process (NHPP) is a Poisson process whose intensity depends on time; the associated exponential distributions have unequal parameter values, and events in disjoint intervals occur independently of each other. The probability that no event occurs in the initial state is one, and the probability that an event occurs in the initial state is zero. This paper uses the NHPP to count and predict the number of tweets containing the keywords coronavirus and COVID-19 over set time intervals each day. The numbers of tweets posted in different intervals of the day do not affect each other and are not the same. The dataset was obtained by crawling COVID-19 tweets three times a day, for 20 minutes each time, over 13 days, giving 39 time intervals. The study yields predictions and probabilities for the number of tweets, reflecting the tendency of netizens to post about the COVID-19 pandemic.


Introduction
The dissemination of information has a positive impact in the midst of the pandemic caused by the spread of the coronavirus throughout the world community (Severo et al., 2020). Information spreads very quickly through social media, especially Twitter, particularly in conversations between netizens. A nonhomogeneous Poisson process (NHPP) has been used to define the retweet hierarchy of an original tweet dataset (Gu and Kurov, 2020), where all retweet processes are placed in a single hierarchical model so that information can be collected to estimate the parameters (Lee and Wilkinson, 2020). In estimation for non-life insurance data, the NHPP is used to analyze data within a certain period of time (Vedyushenko, 2018). The hierarchical NHPP model can also be used to model the dissemination of information on online social media, especially Twitter retweets. The retweets of each original tweet are modeled by an NHPP whose intensity function is the product of a time-dependent component and a component that depends on the number of followers of the original tweet's author.
A Poisson process counts the number of events in a given time interval with parameter $\lambda$. The counts in disjoint intervals do not depend on one another, the process may have stationary increments, and it is closely related to the exponential distribution. Poisson processes are divided into homogeneous and nonhomogeneous types (Ross, 2014).
The Poisson process is widely used in statistics, in predictive calculations, and in many other applied problems. Extensions of the nonhomogeneous case include the nonhomogeneous compound Poisson process, which is described mathematically to anticipate the number of occurrences of events (Grabski, 2019). In modeling ship accidents in the Baltic Sea and its ports, the nonhomogeneous Poisson process plays an important role in providing a model that can anticipate such accidents (Franciszek, 2018). In medicine, seasonal dengue fever events have been modeled with a nonhomogeneous Poisson process that incorporates seasonal factors in the number of cases, with analysis aimed at improving the NHPP intensity function for daily cases (Cifuentes-Amado and Cepeda-Cuervo, 2015). In climatology, NHPP modeling of extreme rainfall is useful for studying the effects of seasonality and trends on extreme daily rainfall events that exceed predetermined threshold values (Ngailo et al., 2016). In geostatistical modeling, space-time data have been analyzed with a nonhomogeneous Poisson process involving two components, a Gaussian spatial component and a component accounting for temporal effects; the objective is to fit the data and identify the areas with the highest levels of pollution, namely the Southwest, Central and Northwest of Mexico City (Morales et al., 2017). To improve the performance of NHPP models in software optimization, studies retain the traditional Poisson basis and present the process in detail to show that the resulting model is effective in improving and optimizing the performance of the traditional NHPP model and its distribution function (Wang et al., 2016; Kim et al., 2010).
Beyond software failure testing, on the hardware development side the NHPP is used to predict when the development testing process should be terminated (Yu et al., 2007). We therefore also apply the NHPP to a tweet dataset. In this study, we model Twitter data crawled with the keywords coronavirus and Covid19 or COVID-19 (referred to collectively as COVID-19), counting the tweets in which each keyword appears within defined time intervals. From the counts obtained, the number of tweets at a given time can be modeled using the nonhomogeneous Poisson process.

Materials
In this study, we used a dataset obtained by crawling tweets about COVID-19 posted by netizens, filtered using predetermined keywords. We used coronavirus and COVID-19 as keywords and accessed the Twitter developer Application Programming Interface (API) to facilitate the crawling process. Data were retrieved in three sessions each day, each lasting 20 minutes (see Table 1). This process was carried out for 13 days.

Counting Process
A stochastic process $\{N(t);\ t \ge 0\}$ is called a counting process if $N(t)$, also written $N_t$, represents the number of events that have occurred up to time $t$ (Osaki, 1992). A counting process satisfies: (i) $N(t) \ge 0$; (ii) $N(t)$ is integer valued; (iii) if $s < t$, then $N(s) \le N(t)$; (iv) for $s < t$, $N(t) - N(s)$ denotes the number of events that occur in the time interval $(s, t]$.
The counting process is called a process with independent increments if the numbers of events that occur in disjoint time intervals are mutually independent (Santitissadeekorn et al., 2020; Grabski, 2019). That is, the number of events that have occurred by time $t$, i.e. $N(t)$, is independent of the number of events between $t$ and $t+s$, i.e. $N(t+s) - N(t)$. The counting process is said to have stationary increments if the distribution of the number of events in a time interval depends only on the length of the interval, not on its location. That is, the number of events in the interval $(t_1 + s, t_2 + s]$ has the same distribution as the number of events in the interval $(t_1, t_2]$, for all $t_1 < t_2$ and $s > 0$ (Kenney and Keeping, 1962).
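As a minimal sketch of the counting-process notation above, the following Python snippet computes $N(t)$ and the increment $N(t) - N(s)$ from a sorted list of event times. The function names and the timestamps are illustrative assumptions, not values from the tweet dataset.

```python
from bisect import bisect_right


def count_up_to(event_times, t):
    """N(t): number of events with timestamp <= t (event_times must be sorted)."""
    return bisect_right(event_times, t)


def increment(event_times, s, t):
    """N(t) - N(s): number of events in the half-open interval (s, t]."""
    if s > t:
        raise ValueError("require s <= t")
    return count_up_to(event_times, t) - count_up_to(event_times, s)


# Hypothetical event times in minutes (illustrative only, not real tweet data).
event_times = sorted([0.5, 1.2, 1.2, 3.7, 4.0, 9.9])

print(count_up_to(event_times, 4.0))     # N(4.0)
print(increment(event_times, 1.2, 4.0))  # events falling in (1.2, 4.0]
```

Because the interval is half-open, an event exactly at time $s$ is excluded from the increment while an event exactly at time $t$ is included.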

Homogeneous Poisson Process
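A counting process $\{N(t);\ t \ge 0\}$ is called a Poisson process with rate parameter $\lambda > 0$ if $N(0) = 0$ and the process has stationary and independent increments. For a stationary Poisson process, for any $s \ge 0$ and $t > 0$,
$$P\big(N(t+s) - N(s) = k\big) = e^{-\lambda t}\,\frac{(\lambda t)^k}{k!}, \qquad k = 0, 1, 2, \ldots,$$
which represents the probability that $k$ events occur in an interval of length $t$; taking $s = 0$ gives the interval $(0, t]$. The Poisson process is a simple stochastic process that is widely used for modeling the times at which arrivals enter a system (Mingola, 2013). A counting process $\{N(t);\ t \ge 0\}$ is said to be a Poisson process with rate (parameter) $\lambda$ if: (i) $N(0) = 0$; (ii) the process has independent increments; (iii) the number of events in any interval of length $t$ is Poisson distributed with mean $\lambda t$, where
$\lambda$ = rate, or the average number of events that occur per unit of time.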
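The homogeneous case can be illustrated with a short Python sketch that evaluates the Poisson probability of $k$ events in an interval of length $t$. The rate and interval length below are hypothetical values chosen for illustration, and the function names are not from the paper.

```python
import math


def poisson_pmf(k, mean):
    """P(N = k) for a Poisson count with the given mean, evaluated in log space."""
    if k < 0:
        return 0.0
    if mean == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def homogeneous_count_pmf(k, rate, t):
    """P(k events in an interval of length t) for a constant rate."""
    return poisson_pmf(k, rate * t)


rate = 2.0  # hypothetical rate: 2 events per minute
t = 3.0     # hypothetical interval length in minutes

probs = [homogeneous_count_pmf(k, rate, t) for k in range(100)]
print(probs[0])    # probability of no events in the interval
print(sum(probs))  # should be very close to 1
```

Working in log space with `math.lgamma` avoids overflow in the factorial when the counts are large, as they are for tweet volumes.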

Nonhomogeneous Poisson Process
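A counting process $\{N(t);\ t \ge 0\}$ is called a nonhomogeneous Poisson process with intensity function $\lambda(t) \ge 0$, $t \ge 0$, if: (i) $P(N(0) = 0) = 1$; (ii) $\{N(t);\ t \ge 0\}$ is a stochastic process with independent increments; (iii) for $t \ge 0$ and $h > 0$,
$$P\big(N(t+h) - N(t) = k\big) = \frac{\left[\int_t^{t+h} \lambda(x)\,dx\right]^k}{k!}\,\exp\!\left(-\int_t^{t+h} \lambda(x)\,dx\right), \qquad k = 0, 1, 2, \ldots$$
From this definition, the one-dimensional distribution of the nonhomogeneous Poisson process is
$$P\big(N(t) = k\big) = \frac{[\Lambda(t)]^k}{k!}\,e^{-\Lambda(t)}, \qquad k = 0, 1, 2, \ldots, \qquad \text{where } \Lambda(t) = \int_0^t \lambda(x)\,dx.$$
Meanwhile, the probability that no event occurs in the initial state is one, and the numbers of events occurring in disjoint time intervals are independent of each other.
The nonhomogeneous Poisson process has expectation and variance
$$E[N(t)] = V[N(t)] = \Lambda(t) = \int_0^t \lambda(x)\,dx,$$
and standard deviation
$$D[N(t)] = \sqrt{\Lambda(t)}.$$
The expected value of the increment $N(t+s) - N(t)$ is
$$\Delta(t; s) = E\big[N(t+s) - N(t)\big] = \int_t^{t+s} \lambda(x)\,dx,$$
and the corresponding standard deviation is
$$\sigma(t; s) = \sqrt{\int_t^{t+s} \lambda(x)\,dx}.$$
A nonhomogeneous Poisson process with constant intensity $\lambda(t) = \lambda$ for each $t \ge 0$ is a regular (homogeneous) Poisson process. The increments of a nonhomogeneous Poisson process are independent, but not necessarily stationary.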
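The expected increment and its standard deviation can be sketched in Python for the special case of a linear intensity. The linear form, the names `slope` and `intercept`, and the numeric values are assumptions made for illustration and are not the fitted values of this study.

```python
import math


def linear_intensity(t, slope, intercept):
    """Assumed intensity function: lambda(t) = slope * t + intercept."""
    return slope * t + intercept


def cumulative_intensity(t, slope, intercept):
    """Integral of the linear intensity from 0 to t."""
    return 0.5 * slope * t * t + intercept * t


def expected_increment(t, s, slope, intercept):
    """Expected number of events in (t, t + s] for the linear intensity."""
    if t < 0 or s < 0:
        raise ValueError("t and s must be non-negative")
    # A linear function is non-negative on an interval if it is at both endpoints.
    if min(linear_intensity(t, slope, intercept),
           linear_intensity(t + s, slope, intercept)) < 0:
        raise ValueError("intensity must be non-negative on the interval")
    return (cumulative_intensity(t + s, slope, intercept)
            - cumulative_intensity(t, slope, intercept))


def increment_std(t, s, slope, intercept):
    """Standard deviation of the increment: square root of its expected value."""
    return math.sqrt(expected_increment(t, s, slope, intercept))


# Hypothetical parameters, chosen only to illustrate the calculation.
slope, intercept = 0.01, 5.0
print(expected_increment(100.0, 20.0, slope, intercept))
print(increment_std(100.0, 20.0, slope, intercept))
```

With `slope = 0` the expected increment reduces to `intercept * s`, which is the homogeneous case with a constant rate.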

Model Parameter Estimation
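Tweets on the topic of coronavirus and COVID-19, collected over 13 days in three time intervals per day, provide information on the number of tweets still being discussed by netizens. Figure 1 shows the number of tweets obtained in each 20-minute interval. The intensity $\lambda(t)$ is then approximated by a simple linear regression function $y = \alpha x + \beta$, whose parameters minimize the objective function (Sumiati et al., 2019):
$$T(\alpha, \beta) = \sum_{i=1}^{n} \big[y_i - (\alpha x_i + \beta)\big]^2.$$
Because $T$ is a quadratic expression in $\alpha$ and $\beta$, setting its partial derivatives to zero gives the values that minimize it, denoted $\hat{\alpha}$ and $\hat{\beta}$ (Kenney and Keeping, 1962):
$$\hat{\alpha} = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}, \qquad \hat{\beta} = \bar{y} - \hat{\alpha}\,\bar{x},$$
where $\bar{x}$ and $\bar{y}$ are the sample means of the $x_i$ and $y_i$.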
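The least-squares step described above can be sketched in plain Python. The function name, the variable names `slope` and `intercept`, and the sample points are assumptions for illustration; the sample points are synthetic and are not the values of Table 2.

```python
def fit_linear_intensity(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.

    x: interval centers, y: empirical intensities (tweets per minute).
    Returns the pair (slope, intercept) minimizing the sum of squared residuals.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Synthetic points lying on a straight line (illustrative only).
centers = [10.0, 30.0, 50.0, 70.0]
intensities = [8.0, 18.0, 28.0, 38.0]
slope, intercept = fit_linear_intensity(centers, intensities)
print(slope, intercept)
```

The centered form used here is algebraically equivalent to the usual normal-equation solution and tends to be less sensitive to rounding when the interval centers are large numbers.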

Results and Discussion
Crawling three times per day with the keywords coronavirus and COVID-19 yields the number of tweets in each session, from which the empirical intensity of tweets per crawling duration is obtained, as shown in Table 2. Equations (11) and (12) are then applied to the data in Table 2 to obtain the linear regression intensity of the tweet dataset. As Figure 1 shows, the linear regression equation $\lambda(x)$ in equation (13) is obtained by plotting the center of each data interval, taken as the independent variable, against the intensity of tweets per minute, taken as the dependent variable. Based on equations (4), (5) and (14), we obtain the one-dimensional distribution of the nonhomogeneous Poisson process for the tweet dataset.
Example. Suppose we want to predict the number of coronavirus and COVID-19 tweets posted on 10 May 2020 between 21:00 and 21:20, which corresponds to the time interval [2920, 2940), and to find the probability that the number of tweets in this interval is not greater than g = 38,000 and not less than h = 36,000. Before calculating the number of tweets for that date, we first determine the parameters t and s: the length of the interval is s = 1 and t = 2920. With equations (8) and (9), we can predict the expected number of tweets in that interval and its standard deviation. The required probability is $P\big(h \le N(t+s) - N(t) \le g\big) = \sum_{k=h}^{g} \frac{\Delta^k}{k!}\,e^{-\Delta}$, where $\Delta = \int_t^{t+s} \lambda(x)\,dx$ is the expected number of tweets in the interval.
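The probability that a count falls between two bounds can be sketched in Python by summing Poisson probabilities in log space. The expected count used below is a hypothetical placeholder, not the value obtained in this study; only the bounds h = 36,000 and g = 38,000 are taken from the example.

```python
import math


def poisson_log_pmf(k, mean):
    """Natural log of P(N = k) for a Poisson count with the given positive mean."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def prob_count_between(h, g, mean):
    """P(h <= N <= g) for a Poisson count with the given positive mean."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    if h > g:
        return 0.0
    h = max(h, 0)
    total = math.fsum(math.exp(poisson_log_pmf(k, mean)) for k in range(h, g + 1))
    return min(1.0, total)  # guard against rounding slightly above 1


# Hypothetical expected number of tweets in the interval (assumption, not a result).
expected_tweets = 37000.0
print(prob_count_between(36000, 38000, expected_tweets))
```

For an expected count of this size the standard deviation is roughly the square root of the mean, so bounds that lie many standard deviations from the mean give a probability close to one.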

Conclusion
Using counting process theory and the concept of the nonhomogeneous Poisson process, it is possible to build a stochastic model of the number of tweets with the keywords coronavirus and COVID-19, obtained by crawling tweet data for a specified duration every day. A counting process with independent increments is a reasonable model for the number of tweets in a given time period. Linear regression is one option for estimating the parameters used to approximate the number of tweets. Because the number of tweets differs in each defined time window (three windows per day) and the windows do not affect each other, the number of tweets can be modeled as a nonhomogeneous Poisson process. The resulting model can be used to count the number of tweets in given intervals and to compute probabilities for prediction at time intervals outside the dataset.