Indonesian Stock Price Prediction using Deep Learning during COVID-19 Financial Crisis

This paper applies a Long Short-Term Memory (LSTM) deep learning model to stock price prediction during the COVID-19 financial crisis. The financial impact of COVID-19 has brought down many of the world's indexes. The crisis is even riskier for an emerging country such as Indonesia, because foreign investors tend to withdraw their investments from emerging countries during financial crises. The application of deep learning to financial time series problems such as stock price prediction has been researched extensively. This study uses the Bidirectional LSTM (BiLSTM) model, a variation of the LSTM model, to predict stock closing prices. The prediction is applied to selected companies from the Indonesian stock market using historical prices. The model is evaluated using the Mean Absolute Percentage Error (MAPE) and the Symmetric Mean Absolute Percentage Error (SMAPE). A graphical comparison of the actual and predicted prices is charted to study the stock price movement. To study the impact of COVID-19 on stock prices, an intervention analysis is conducted along with the Wilcoxon test. The stock price prediction model can forecast stock prices before and during the financial crisis with minimal error. The intervention analysis showed that the health sector experienced a positive effect, while other sectors such as transportation, finance, information technology, and entertainment experienced a negative effect during the COVID-19 financial crisis. Being able to analyze stock price movement helps investors understand the impact of a financial crisis on particular industries and the behavior of certain stocks under such circumstances, which can lead to alternative investment strategies and decisions.


Introduction
The outbreak of the COVID-19 pandemic is a systematic risk that affected many sectors: at the height of the pandemic in March 2020, the Dow Jones Index dropped by 2000 points, the U.S. geopolitical risk index fell by 896 points, and the U.S. economic policy uncertainty index rose by 300 points (Sharif et al., 2020). The impact of COVID-19 is worse for emerging markets than for developed markets. For example, in Indonesia, the IDX composite price fell by 63.49% from January 2020 to its lowest point in March 2020 (IDX, 2020). Wójcik and Ioannou (2020) stated that emerging markets are affected more by the COVID-19 pandemic than developed countries because of the falling prices of exported commodities. Others suggested that although stock markets in emerging markets are often used as a haven, they are also vulnerable to capital outflow during uncertain times and financial crises (Singh, 2020).
During the financial crisis, investors reacted with flight-to-safety behavior and became more risk-averse in their investment choices. One criterion is to invest in the industry sectors that are least impacted. Shen et al. (2020) stated that the tourism, catering, and transportation industries performed the worst among all industries. Therefore, it is imperative to be able to forecast future stock prices and analyze the impact of events such as the COVID-19 pandemic on stock price movement, so that investors, especially in emerging markets, can adjust their investment strategies accordingly (Mahdi and Khaddafi, 2020).
Deep learning applications in the financial sector have been researched heavily, with primary focus areas such as stock price prediction, volatility prediction, financial text analysis, and financial text mining. The adoption of deep learning in financial applications has brought new developments to the finance sector. Its highest adoption is in applications on time series data, which usually involve predicting market movement, predicting future prices, asset allocation, portfolio optimization, and predicting volatility. Sezer et al. (2020) stated that the application of deep learning is categorized into two groups: price prediction and trend prediction. Among the deep learning frameworks used in finance applications, the Recurrent Neural Network (RNN) with LSTM and its hybrid variations are widely used by researchers. In this study, the predicted stock prices are also used to study stock prices during COVID-19 using the Wilcoxon test and Causal Impact.
This paper seeks to give investors a way to analyze stock prices during the COVID-19 pandemic and to inform their investment decisions and strategies. The paper is organized as follows. Section 1 discusses the background and the motivation of the research. Section 2 reviews the previous literature on the impact of COVID-19 on stock prices and the application of deep learning for stock price prediction. Section 3 introduces the research design and the technical aspects of the methodologies. Section 4 presents the results of the research and evaluates the models used in the design. Section 5 presents the conclusion of the research.

The Impact of COVID-19 on Stock Price
COVID-19 is a global pandemic that has impacted many sectors in many parts of the world, such as tourism, transportation, economics, and healthcare. Because COVID-19 is an infectious disease, direct contact between humans was limited during the pandemic to contain its spread (Rahmayani and Oktavilia, 2021). The limitation of human movement had a severe impact on activities that require direct contact. It also affected how businesses operate, which in turn affected their performance. Since business performance was affected, the performance of stock prices around the world was affected as well (Rislawati et al., 2022). Many of the world's indexes plummeted during the pandemic, which caused many investors, especially foreign investors in emerging countries, to withdraw their investments from the capital market (Rahmayani and Oktavilia, 2021). As Indonesia is still regarded as an emerging country, the effect of foreign investors withdrawing their investments is quite severe. Azis et al. (2021) surveyed the Indonesian investor community in Samarinda regarding their perceptions of the impact of COVID-19 and found that there were many buybacks by issuers during the pandemic; however, these did not balance the outflow of funds from foreign investors and the public. Nurhayati et al. (2021) found average returns of -0.075 and -0.0013, which shows the severe impact of the pandemic on the performance of the stock market. They found that almost all stocks underperformed during the pandemic, which raises investment risk. Meanwhile, Zainuri et al. (2021) found that more bad news appeared during the pandemic, which drove the composite price down. To stabilize the composite index, the Indonesian government issued macroeconomic policies to maintain market stability.

Deep Learning Application for Stock Price Prediction
One application of deep learning in the finance sector is stock price prediction. The most popular deep learning model used in the finance sector is the RNN model (Sezer et al., 2020). RNN is a type of deep learning model whose applications mostly involve sequential data, language, and text processing. An RNN takes the current and previous data as its inputs, so the output of the model depends on the previous inputs. Sezer et al. (2020) stated that RNN can process data over long periods, which is why it is preferred for time series forecasting. However, as the data grows larger and covers a longer period, the RNN becomes more complex, because the longer information must be stored, the harder it is for the RNN to learn. The LSTM model was introduced to handle problems in the RNN model such as the vanishing gradient (Gao and Chai, 2018). LSTM is a class of RNN that can process a longer period of data than a standard RNN (Wang et al., 2020) and can handle both short-term and long-term data. Among the RNN variations, LSTM is the most popular because it has a simple model development phase. Compared to other deep learning models or RNN models, LSTM has higher performance, hence its wide usage in journal publications on finance applications (Vo et al., 2019; Sambas et al., 2020). A variation of the LSTM model, the Bidirectional LSTM (BiLSTM) model, is thought to be a better fit for prediction problems because it has a bidirectional flow instead of the unidirectional flow of the LSTM model.

Intervention Analysis
To determine the impact of the financial crisis, an additional data analysis step applies intervention analysis and the Wilcoxon test. The Wilcoxon test compares the values of a variable from two data samples to determine whether the two samples come from different distributions (Caraka et al., 2020). Intervention analysis is used to assess the impact of a certain event or intervention period on a study subject. Caraka et al. (2020) used an intervention model to analyze the impact of COVID-19 on the Indonesia composite index by utilizing the Auto-Regressive Integrated Moving Average (ARIMA) model. Recent developments in intervention analysis also introduced Causal Impact, which is based on a Bayesian structural time-series model and is applicable for analyzing the effect of intervention events in time-series datasets (Brodersen et al., 2015).
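As an illustration of how such a two-sample comparison works, the following is a minimal sketch of a two-sided Wilcoxon rank-sum test using the normal approximation. The paper does not state which variant of the Wilcoxon test is applied, so the rank-sum form, the omission of a tie correction in the variance, the function names, and the example prices are all assumptions made for illustration.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Two-sided rank-sum test via the normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical closing prices, for illustration only.
pre_prices = [101.0, 102.5, 103.0, 104.2, 105.1, 106.3]
post_prices = [95.4, 94.1, 93.8, 92.0, 91.5, 90.2]
z_stat, p_val = wilcoxon_rank_sum(pre_prices, post_prices)
reject_null = p_val < 0.05  # True means the samples likely differ in distribution
```

The normal approximation is only rough for samples this small; an established statistics library would normally be preferred for real analysis.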

Materials
The dataset used for stock price prediction and intervention analysis consists of daily stock prices sourced from Yahoo Finance. The data extracted from Yahoo Finance include the close price, trading volume, open price, high price, low price, and adjusted close price. The closing price is used as the input variable for the stock price prediction process. The predicted closing price is then used as the input variable for the intervention analysis. The analysis covers stocks from sectors that prior research found to be strongly positively or negatively impacted during COVID-19. According to research by He et al. (2020) on the impact of COVID-19 on sectors in China, the information technology and entertainment sectors were strongly positively impacted by COVID-19. Meanwhile, the health and transportation sectors were strongly negatively impacted, and the finance sector was less impacted. The analysis includes 5 stocks from different sectors: BBCA (banking or finance sector), TLKM (information technology sector), BMTR (entertainment sector), GIAA (transportation sector), and KLBF (health sector).

Methods
This research aims to predict the closing price of selected stocks and to use the output of the stock price prediction model as the input for an intervention analysis of the COVID-19 event, consisting of Causal Impact and the Wilcoxon test. The method follows four steps: data source identification, dataset cleaning and preprocessing, stock price prediction with the BiLSTM model, and intervention analysis with Causal Impact and the Wilcoxon test.

BiLSTM
BiLSTM stands for bidirectional LSTM; it combines a forward and a backward LSTM and can therefore process previous and future data at the same time (Vo et al., 2019). The LSTM model has been used in many previous studies for time series forecasting and has outperformed other time series forecasting models. Unlike the unidirectional LSTM model, which only preserves information from the past, the BiLSTM model can preserve information from both the past and the future (Vo et al., 2019). BiLSTM takes a three-dimensional input with the format (T - ∆t - δt, δt, N), where T denotes time, ∆t denotes the time gap, δt denotes the size of the sliding window, and N denotes the number of stocks.
Before the data is transformed into the three-dimensional format, the dataset must be tested for stationarity. The dataset goes through an Augmented Dickey-Fuller (ADF) test, which checks the stationarity of the data. Stationary data reverts to its long-term mean and is not affected by the change in time, whereas non-stationary data has a mean, variance, and covariance that change over time (Shrestha and Bhatta, 2018). Variables that fail the test go through a data transformation process to reduce the non-stationarity present in the data.
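The three-dimensional input format can be illustrated with a short sketch that builds sliding windows from a price table. The function name `make_windows`, the use of nested lists, and the choice of target index (the row `gap` steps after the end of each window, chosen here so that the number of samples matches T - ∆t - δt) are assumptions for illustration and are not taken from the paper.

```python
def make_windows(prices, window, gap):
    """Build (samples, window, n_stocks) inputs and their matching targets.

    prices: list of T rows, each row a list of N closing prices.
    window: sliding-window size (the small delta-t in the text).
    gap:    time gap between the end of a window and its target (capital delta-t).
    """
    total = len(prices)
    n_samples = total - gap - window
    if n_samples <= 0:
        raise ValueError("series too short for the given window and gap")
    inputs, targets = [], []
    for start in range(n_samples):
        end = start + window
        # One sample holds `window` consecutive rows of N prices each.
        inputs.append([list(row) for row in prices[start:end]])
        # Assumed target: the row `gap` steps after the end of the window.
        targets.append(list(prices[end + gap]))
    return inputs, targets


# Hypothetical single-stock series with T = 10 observations.
toy_prices = [[float(t)] for t in range(10)]
X, y = make_windows(toy_prices, window=3, gap=1)
shape = (len(X), len(X[0]), len(X[0][0]))  # (T - gap - window, window, N)
```

With T = 10, a window of 3, and a gap of 1, the sketch yields 6 samples, each holding 3 time steps for 1 stock.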

Intervention Analysis
The output of the stock price prediction model becomes the input for the intervention analysis models. Data preparation for the intervention model includes splitting the dataset into two samples covering different periods: the pre-COVID-19 period and the COVID-19 period. The pre-COVID-19 period is the period before the event selected as the intervention, and the COVID-19 period is the period after it. The intervention date used in this research is March 2nd, 2020, the date of the first reported case of COVID-19 in Indonesia. For Causal Impact, the intervention period lasts from March 3rd, 2020 until June 1st, 2020, roughly three months after the first reported case of COVID-19 in Indonesia.
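The split into the two samples can be sketched as follows. The function names, the exclusion of the intervention date itself from both samples, and the trimming of both samples to a common length are assumptions for illustration; the example dates and prices are hypothetical.

```python
from datetime import date

# First reported case of COVID-19 in Indonesia, as stated in the text.
INTERVENTION_DATE = date(2020, 3, 2)


def split_by_intervention(dates, values, intervention=INTERVENTION_DATE):
    """Return (pre, post) samples strictly before and strictly after the date."""
    pre = [v for d, v in zip(dates, values) if d < intervention]
    post = [v for d, v in zip(dates, values) if d > intervention]
    return pre, post


def match_lengths(pre, post):
    """Trim both samples to the same length, keeping observations nearest the event."""
    n = min(len(pre), len(post))
    if n == 0:
        return [], []
    return pre[-n:], post[:n]


# Hypothetical trading days and predicted closing prices.
days = [date(2020, 2, 26), date(2020, 2, 27), date(2020, 2, 28),
        date(2020, 3, 2), date(2020, 3, 3), date(2020, 3, 4)]
closes = [100.0, 101.0, 102.0, 99.0, 95.0, 93.0]
pre_sample, post_sample = match_lengths(*split_by_intervention(days, closes))
```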

Stock Price Prediction
The ADF test on the stock close price indicates that the variable is not stationary, as the test statistic is higher than the critical value at the 95% confidence level. To make the data more stationary, the close price goes through a data transformation process. The process takes the log value of the data and calculates the moving average of the log value with a sliding window size of 12. To complete the transformation, the moving average is subtracted from the log value. The transformed dataset is put through another ADF test, in which the test statistic is lower than the critical value at the 95% confidence level, indicating stationarity.
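The transformation described above can be sketched in a few lines. The function name and the use of a trailing window that includes the current observation are assumptions; the paper specifies only the log transform, the window size of 12, and the subtraction.

```python
import math


def log_minus_moving_average(prices, window=12):
    """Subtract the trailing moving average of log prices from the log prices.

    The first window - 1 observations are dropped because their moving
    average is undefined.
    """
    logs = [math.log(p) for p in prices]
    transformed = []
    for i in range(window - 1, len(logs)):
        # Trailing window ending at (and including) observation i.
        moving_avg = sum(logs[i - window + 1 : i + 1]) / window
        transformed.append(logs[i] - moving_avg)
    return transformed


# Hypothetical closing prices growing by 1% per day.
toy_close = [1000.0 * (1.01 ** t) for t in range(30)]
detrended = log_minus_moving_average(toy_close, window=12)
```

For a series with a constant growth rate, as in the toy data, the transformed values are constant, which shows how the subtraction removes the trend in the log prices.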
The transformed data is then split into training and test sets with a ratio of 80:20. The data is then normalized using a min-max scaler, which scales the data to a range of zero to one. The BiLSTM model architecture consists of two BiLSTM layers with 50 neurons. The optimizer is Adam, with mean squared error as the loss function. The model is trained with a batch size of 50 for 200 epochs, and the training data is further split 80:20 into training and validation sets. The evaluation metrics for the stock prediction model are MAPE and SMAPE. For both metrics, lower values indicate better model performance. MAPE values are interpreted against reference levels of forecasting accuracy; a MAPE below 4.9% is considered highly accurate forecasting.
The MAPE and SMAPE values reported for the stock price prediction model are averages over the 5 stocks analyzed in this paper. The average MAPE is 1.836% and the average SMAPE is 1.837%, showing low error in the forecasting result. Since the average MAPE is lower than 4.9%, the result is considered highly accurate forecasting. Figure 1 charts the training and prediction line graphs for the 5 stocks. In this chart, validation refers to the actual close price in the testing dataset. The chart shows a minimal gap between the validation and prediction lines, indicating that the model can predict the stock price before COVID-19 (2018-2019) and the impact of COVID-19 on the stock prices during the first quarter of 2020.
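The data preparation and evaluation metrics can be sketched as follows. The function names are hypothetical, fitting the scaler bounds on the training set only is an assumption, and the SMAPE variant shown (absolute error divided by the mean of the absolute actual and predicted values) is one of several definitions in use; the paper does not state which one it applies.

```python
def chronological_split(series, train_ratio=0.8):
    """Split a time-ordered series into train and test parts without shuffling."""
    cut = int(len(series) * train_ratio)
    return series[:cut], series[cut:]


def fit_min_max(train):
    """Return the (min, max) bounds used to scale data to the range [0, 1]."""
    lo, hi = min(train), max(train)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return lo, hi


def min_max_scale(values, lo, hi):
    """Apply min-max scaling; values outside the fitted bounds fall outside [0, 1]."""
    return [(v - lo) / (hi - lo) for v in values]


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be nonzero)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def smape(actual, predicted):
    """Symmetric MAPE, in percent, using the mean of |actual| and |predicted|."""
    errors = [abs(p - a) / ((abs(a) + abs(p)) / 2) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical actual and predicted closing prices.
actual_close = [100.0, 200.0]
predicted_close = [110.0, 190.0]
mape_value = mape(actual_close, predicted_close)
smape_value = smape(actual_close, predicted_close)
highly_accurate = mape_value < 4.9  # threshold quoted in the text
```

In the toy example the MAPE is 7.5%, which would not meet the 4.9% threshold, in contrast to the 1.836% average reported for the five stocks.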

Intervention Analysis
To understand the impact of COVID-19 on stock prices in Indonesia, each stock price is analyzed with an intervention model consisting of the Wilcoxon test and Causal Impact analysis. Table 1 shows the result of the Wilcoxon test for each of the stocks. As stated previously, the intervention date used for the Wilcoxon analysis is the date of the first reported case in Indonesia. The post-period is the period after the intervention date, and the pre-period is the period before the intervention date with the same length of data. The purpose of the Wilcoxon test is to compare the values of a variable from two data samples to determine whether the two samples come from different distributions (Caraka et al., 2020). For all stocks, the null hypothesis is rejected, indicating that the pre-COVID-19 and post-COVID-19 periods come from different distributions. In summary, the stock prices were impacted after the first reported case of COVID-19 in Indonesia.
To further understand the effect of COVID-19 on the stock prices, Causal Impact analysis is conducted. Table 2 shows the summarized report of the Causal Impact analysis. The relative effect shows the impact of the intervention period as a percentage. BMTR, TLKM, BBCA, and GIAA experienced a negative impact from the first reported case of COVID-19 up to three months afterwards. BMTR has the largest negative impact, with an average relative effect of -34.7%. The result of the Causal Impact analysis contrasts with the findings of He et al. (2020) on the impact of COVID-19 on stock prices and sectors in China, where the entertainment and information technology sectors were positively impacted. However, it must be considered that the Causal Impact analysis here covers only one of the many companies in each sector. Further analysis of each sector might yield a different result.
For GIAA, the intervention period has a negative impact with a relative effect of -17.75%; however, it is not statistically significant because the 95% interval of (-38.92%, 4.4%) includes zero. Applying a different intervention period may lead to a different result for the GIAA stock. With the chosen intervention period, no definite conclusion can be drawn about the effect, although the predicted chart itself shows a negative impact during the intervention period. Unlike the other stocks, KLBF experienced a positive impact of 15.03%. This relative effect is statistically significant, with a 95% interval of (5.83%, 24.1%). During the pandemic, health sector companies showed a promising trend that is expected to continue until the end of 2020.
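The way the reported figures are read can be illustrated with a small sketch. It does not implement the Bayesian model behind Causal Impact; it only shows the arithmetic of a relative effect (observed mean against counterfactual mean) and the rule that an effect is treated as significant when its 95% interval excludes zero. The function names and the toy series are assumptions, while the two intervals are the values reported above.

```python
def relative_effect(observed, counterfactual):
    """Percentage difference between the observed mean and the counterfactual mean."""
    mean_obs = sum(observed) / len(observed)
    mean_cf = sum(counterfactual) / len(counterfactual)
    return 100.0 * (mean_obs - mean_cf) / mean_cf


def interval_excludes_zero(lower, upper):
    """An effect is read as significant when its interval does not contain zero."""
    return lower > 0 or upper < 0


# 95% intervals of the relative effect as reported in the text, in percent.
reported_intervals = {"GIAA": (-38.92, 4.4), "KLBF": (5.83, 24.1)}
significance = {
    ticker: interval_excludes_zero(lo, hi)
    for ticker, (lo, hi) in reported_intervals.items()
}

# Hypothetical post-intervention prices and their counterfactual prediction.
toy_effect = relative_effect([80.0, 80.0], [100.0, 100.0])
```

Applied to the reported intervals, the rule marks the KLBF effect as significant and the GIAA effect as not significant, matching the interpretation given above.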

Conclusion
In this paper, a deep learning methodology is used to predict stock prices during the COVID-19 pandemic. To further analyze the impact of COVID-19 on stock prices, intervention analysis is conducted using the Wilcoxon test and the Causal Impact model. The ability to forecast future stock prices and study the effect of certain events gives investors insight into stock price movement during such events, so that they can adjust their strategies accordingly. The study includes 5 stocks representing different sectors in Indonesia. The BiLSTM deep learning model predicts the stock prices well, with low error values. The output of the stock price model is used in the intervention analysis to study the impact of COVID-19 on the stocks. The Wilcoxon test showed that the pre-COVID-19 and post-COVID-19 periods have different distributions for all stocks, indicating a possible impact on the stock prices. Further analysis with Causal Impact showed that the health sector was positively impacted during COVID-19, while the other sectors in the study were negatively impacted, although the effect for GIAA was not statistically significant. A limitation of this paper is that only a few stocks are used to represent each sector. Future research with a deeper analysis of each sector can give an overall perspective on the performance of each sector during COVID-19. Future research could also explore other deep learning methods such as the Gated Recurrent Unit (GRU), or use the Partial Autocorrelation Function (PACF) from ARIMA for data preprocessing, to obtain the best time-series model.