Spatial Clustering-Based Gas Station Location Determination

Cibeber District has no gas stations, even though residents rely heavily on motorized transportation for their mobility. The purpose of this research is to identify potential locations for the construction of gas stations in Cibeber District by combining K-Means clustering with spatial modeling. K-Means clustering in RStudio identified four potential villages: Cikotok, Cibeber, Neglasari, and Wanasari. Spatial modeling produced 862 potential location points across Cibeber District, of which 233 lie within the four potential villages. After optimization with the weighted product method, three best locations were obtained: Kp. Tegalumbu in Wanasari Village, Kp. Nagrak in Cikotok Village, and Kp. Cinangga in Cibeber Village.


Introduction
Accelerated growth in the transportation sector has a huge impact on human life. This growth is visible in the increasing number of motorized vehicles and the growing need for modernization. The availability of an adequate road network, as a structural form of spatial utilization patterns, is closely related to the development potential of a city or local area.
Cibeber District is part of Lebak Regency, Banten Province. It covers an area of 400.96 km2 and consists of 22 villages. It has a population of 57,815 people with a density of 165 (lebakkab.bps.go.id). Cibeber District has no gas stations, even though residents depend heavily on motorized transportation for their mobility. A questionnaire with 200 respondents from the 22 villages showed that 99% own private two-wheeled or four-wheeled vehicles and 96% use them as their main means of transportation. Respondents refuel every 1-3 days (56.2%), every 4-7 days (31.8%), or less often than once a week (11.9%). A total of 98.5% consider a gas station in Cibeber District important: 76.1% cite cost savings, 19.9% need certain fuel types, and 4% cite other reasons. These results indicate that the people of Cibeber District have a strong need for gas stations.
However, because Cibeber District covers a very large area, potential villages for gas station construction must first be selected. K-Means clustering is used to select these villages based on several criteria. K-Means clustering is a non-hierarchical data clustering method that seeks to partition the available data into one or more groups (Aldino et al., 2020).
After the potential villages are obtained, potential locations are selected using spatial modeling with Model Builder. A model is a workflow that chains a sequence of geoprocessing tools, feeding the output of one tool into another as input. Geoprocessing models automate and document spatial analysis and data management processes (Stefanidis et al., 2021).
Therefore, combining a Geographic Information System (GIS) with clustering is expected to inform the public and the government about strategic land for building gas stations. It can also provide input to investors who wish to establish gas stations in Cibeber District, taking into account the requirements and criteria derived from the data, and it can serve as a comparison for research that uses different methods and models.

Material
This research determines the location of gas stations in Cibeber District using K-Means clustering and spatial modeling. The tools used are RStudio for K-Means clustering and ArcGIS for spatial modeling. The data used are population, number of vehicles, number of health facilities, number of schools, the Cibeber District map, a land use map, a land slope map, a river map, and a road map. Cibeber District data can be seen in Table 1.

Methods
The methods used in this research are K-Means clustering, spatial modeling, and weighted product. The stages of using the method can be seen in Figure 2.

K-Means Clustering with RStudio
The K-Means clustering process uses RStudio to generate clusters. K-Means is an iterative clustering algorithm that scholars have considered an effective way to solve urban road planning problems over the past few decades; however, it is difficult to determine the number of clusters, and the algorithm is sensitive to the initialization of the cluster centers (Ran et al., 2021). The data used for K-Means clustering in RStudio are the data in Table 1.

Spatial Modeling using ArcGIS
In ArcGIS, spatial analysis models are built in Model Builder, which connects data and spatial processing tools to handle complex GIS tasks. In Model Builder, input data, output data, and the corresponding spatial processing tools are represented in a visual graphical language (Jin et al., 2023). The researchers therefore built the spatial model in Model Builder to obtain the potential locations of gas stations. The model can be seen in Figure 3.

Weighted Product
The weighted product method is considered more thorough because each alternative is valued twice, and the weighted values yield a ranking order. This makes the ranking more precise, and the method has a faster average execution time (Khrisna et al., 2020). The weighted product method was used to optimize the potential locations obtained from Model Builder. The weighted product stages can be seen in Figure 4.

K-Means Clustering with RStudio
K-Means clustering uses the Cibeber District data on population, number of vehicles, number of schools, and number of health facilities. Processing in RStudio starts with installing the required packages and importing the CSV file that contains the Cibeber District data. The numeric columns are then selected, the data are standardized so that all variables share a common scale, and finally the clustering is run.
K-Means clustering in RStudio produced three clusters. Cluster 1 contains Kujangjaya, Mekarsari, Sukamulya, Warungbanten, Cikadu, West Citorek, Central Citorek, Ciherang, and Situmulya. Cluster 2 contains Sirnagalih, Cihambali, Gunungwangun, Citorek Sabrang, East Citorek, Cisungsang, Kujangsari, Hegarmanah, and Citorek Kidul. Cluster 3 contains the only four villages in Cibeber District with high potential for the construction of gas stations: Neglasari, Wanasari, Cibeber, and Cikotok. The cluster plot can be seen in Figure 5. Villages in Cluster 1 have a high number of residents and vehicles, but the absence of health facilities makes them non-potential villages. Villages in Cluster 2 have fewer residents and vehicles than those in Cluster 1, but some have health facilities. Villages in Cluster 3 fulfill all criteria because they have a high number of residents and vehicles as well as many health facilities and schools. In addition, from a geographical perspective, the four villages in Cluster 3 are traversed by local roads, which affects the determination of gas station locations in Cibeber District.
To map the cluster results, the Cibeber District map is colored using symbology based on the cluster category. Cluster 1 (not potential) is colored red, Cluster 2 (moderately potential) is colored yellow, and Cluster 3 (potential) is colored green. The cluster result map can be seen in Figure 6.
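The standardize-then-cluster steps can be sketched as follows. This is a minimal illustration in Python rather than the RStudio workflow used in the study: the six rows are made-up placeholder villages (Table 1 is not reproduced here), and the function names, the choice of three clusters, and the fixed initial centers are assumptions made only to keep the example deterministic.

```python
import math
import statistics

def standardize(rows):
    """Z-score every column so that all criteria share a common scale."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]  # sample standard deviation
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def kmeans(points, k, init_idx, max_iter=100):
    """Plain Lloyd iteration with caller-chosen initial centers."""
    centers = [list(points[i]) for i in init_idx]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [statistics.mean(col) for col in zip(*members)]
    return labels, centers

# Hypothetical rows: (population, vehicles, schools, health facilities).
villages = [
    (4200, 1500, 9, 3),
    (4000, 1400, 8, 3),
    (3000, 1100, 4, 0),
    (3100, 1150, 5, 0),
    (1500, 500, 3, 1),
    (1400, 450, 2, 1),
]
scaled = standardize(villages)
labels, centers = kmeans(scaled, k=3, init_idx=[0, 2, 4])
print(labels)
```

Without standardization, the population column would dominate the distance computation simply because its numbers are larger, which is why the scaling step precedes the clustering.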

Spatial Modeling with ArcGIS
Spatial processing using Model Builder is carried out to obtain potential locations for gas stations by using several features in it. Spatial model development is based on the Model Builder tool in ArcGIS. Model Builder is used to automate and document selected spatial analysis and data management processes as a collected chain diagram, which is a sequence of geo-processing tools that use the output of one process as input to another (Shokr et al., 2021).

Reclassify Slope (DEM)
The Reclassify tool in ArcGIS is used to reclassify the DEM into different elevation values. The Slope function in the ArcGIS Spatial Analyst toolbox is used to generate slope gradient maps in percent (Seja et al., 2022). The area of each slope class can be seen in Table 2. Based on Table 2, Cibeber District is dominated by land slopes of 8-15% and 15-25%, which means that it is dominated by hills and cliffs.
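As a rough illustration of what slope reclassification does, the sketch below bins slope percentages into classes and counts the cells in each class. Only the 8-15% and 15-25% classes are named in the text; the remaining class breaks (8, 15, 25, 45) and the sample slope values are assumptions for illustration, not the contents of Table 2.

```python
from bisect import bisect_right
from collections import Counter

# Assumed class breaks in percent; only 8-15% and 15-25% come from the text.
BREAKS = [8, 15, 25, 45]
LABELS = ["0-8%", "8-15%", "15-25%", "25-45%", ">45%"]

def slope_class(slope_percent):
    """Return the label of the class whose range contains the slope value.

    A value equal to a break falls into the upper class (8 -> "8-15%").
    """
    return LABELS[bisect_right(BREAKS, slope_percent)]

# Hypothetical slope values (percent) for a handful of raster cells.
sample_slopes = [3.0, 9.5, 12.0, 14.9, 16.0, 20.0, 24.0, 30.0, 50.0]
class_counts = Counter(slope_class(s) for s in sample_slopes)
print(class_counts)
```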

Dissolve Landuse
Dissolve creates new coverage by combining adjacent polygons, lines, or regions that have the same value for a particular item and can produce very large features in the output feature class (Shehab et al., 2021). After the land use shapefile was dissolved, nine land use classes were obtained in Cibeber District. The land use class with the largest area is mixed dryland agriculture with an area of 15,438.27 ha, or 154.38 km2, followed by secondary dryland forest with 12,139.19 ha, and the smallest area is open land with an area of 15.80 ha.

Weighted Overlay Results
Weighted overlay analysis is a technique usually applied in geographic information systems to select a suitable system for a particular use (Zakaria et al., 2022). According to Papadopoulou and Hatzichristos (2019), it requires the analysis of many different factors in raster layers with varying scales of value and relative importance. Here, the weighted overlay combines two layers, land use and slope, each with its own weight. The output assigns every area a value on a scale of 1 to 9 derived from the entered weights, where 9 is the highest value. However, no area has a value of 9, because no land use class is scored 9 and only a limited area meets the highest criteria. The weighted overlay results can be seen in Figure 7.
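The effect described above, where the output never reaches 9 when one layer has no score of 9, can be seen in a small sketch. The cell scores are placeholders, and the weights 0.53 (land use) and 0.47 (slope) are borrowed from the weighted product stage as an assumption; the overlay weights actually entered in ArcGIS are not stated in the text.

```python
def weighted_overlay(layers, weights):
    """Weighted sum of equally sized score grids, rounded to whole numbers."""
    rows, cols = len(layers[0]), len(layers[0][0])
    return [
        [int(sum(w * layer[r][c] for layer, w in zip(layers, weights)) + 0.5)
         for c in range(cols)]
        for r in range(rows)
    ]

# Hypothetical suitability scores on a 1-9 scale; land use tops out at 8.
land_use = [[8, 6, 3],
            [7, 6, 1]]
slope = [[9, 9, 5],
         [7, 9, 2]]

overlay = weighted_overlay([land_use, slope], [0.53, 0.47])
print(overlay)
print(max(max(row) for row in overlay))
```

Even the best cell, with land use 8 and slope 9, gives 0.53 x 8 + 0.47 x 9 = 8.47, which rounds to 8 rather than 9.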

Conditional and Majority Filter Results
The Conditional tool is used to control the desired output value; the higher the selected value, the more suitable the resulting region. For the proposed region, a river threshold value of 100 m is used. A reference stream has been used for the stream network. All steps were executed using the Conditional tool of ArcGIS (Arefin et al., 2020). Because the weighted overlay results range from 1 to 8 and only the highest values are wanted, areas with a value of 7 or above were selected using this tool.
Because the conditional results contain many noise areas, the Majority Filter tool was used to produce a cleaner map. Class-specific sediment distribution maps are automatically filtered in ArcGIS using the "Majority Filter" to remove noise or artifacts from the ensemble model (Galvez et al., 2022). The majority filter uses a 4 × 4 grid cell window to determine the most common (majority) value to replace smaller cells or pixels in the raster image (ESRI 2021). For the number of neighbors to use, the researchers chose eight, and the replacement threshold was set to majority.
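The two steps above can be sketched on a toy grid. This is a simplified illustration and not the ArcGIS implementation: the overlay values are placeholders, a cell is replaced when at least five of its (up to eight) neighbors share one value, neighbors outside the grid are ignored, and the contiguity check that ArcGIS applies is omitted.

```python
from collections import Counter

def conditional(grid, threshold=7):
    """Mark cells at or above the threshold as 1 and all others as 0."""
    return [[1 if v >= threshold else 0 for v in row] for row in grid]

def majority_filter(grid, needed=5):
    """Replace a cell when at least `needed` neighbors share one value."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            neighbors = [
                grid[r + dr][c + dc]
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
            value, count = Counter(neighbors).most_common(1)[0]
            if count >= needed:
                out[r][c] = value
    return out

# Hypothetical weighted overlay values (scale 1 to 8).
overlay = [
    [7, 8, 7, 3, 2],
    [8, 5, 7, 4, 3],
    [7, 7, 8, 3, 2],
    [4, 3, 2, 5, 4],
    [2, 3, 4, 7, 3],
]
mask = conditional(overlay, threshold=7)
filtered = majority_filter(mask)
for row in filtered:
    print(row)
```

In this toy grid, the single low cell inside the suitable block is filled in, and the isolated suitable cell in the bottom row is removed as noise.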

Merge Buffer Result
Merge Buffer is the result of merging the buffered shapefiles, namely the shapefiles of settlements, roads, schools, and health facilities, with the classification in the table. Sisneros-Kidd et al. (2021), for example, used the Merge tool in ArcGIS to combine all cleaned individual visitor traces into one merged shapefile for each sampling location. The merged buffer is then used with the Erase tool to remove the buffered areas from the majority filter result.

Erase Result
The Erase tool is useful for deleting parts of a shapefile and keeping only the parts needed (Apriyanti et al., 2021). Here, it removes the areas covered by the merge buffer shapefile from the majority filter result. The output is a shapefile of potential locations for gas stations. This Erase result contains 862 location points spread across the 22 villages of Cibeber District.

Results of Overlay with Cluster Map
The potential area in the Erase result still covers the whole of Cibeber District and is not yet limited to the cluster of villages suitable for the construction of gas stations. Therefore, the potential map was overlaid with the map of Cluster 3, which consists of four villages. Clipping is done by overlaying a polygon on one or more target features (layers) and extracting only the target feature data located within the area outlined by the clip polygon (Okwudili, 2021). After overlaying the Cluster 3 map, 233 potential location points were obtained.

Near Feature Result
According to the partnership.pertamina.com website, a gas station on a local road requires a land area of at least 1000 m2, with a minimum frontage width of 20 m and a minimum side width of 50 m from the local road. To meet this requirement, the researchers used the Near tool on local roads with a radius of 50 m. The Near tool determines the Euclidean distance between input features and near features; Shawky (2016), for example, used it to determine the distance from each district center to the nearest public health facility.
The Near tool yielded 21 potential location points. Cikotok Village has the most, with 7 points, followed by Neglasari Village with 6 points and Cibeber and Wanasari Villages with 4 points each. The distribution of potential location points in the Cluster 3 villages, as produced by the Near tool, can be seen in Figure 8.
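The distance test behind this step can be sketched as a Euclidean point-to-polyline distance followed by a 50 m cutoff. The road geometry, the candidate coordinates (assumed to be in metres in a projected coordinate system), and the function names are hypothetical; the sketch only mirrors the idea of the Near tool and does not call ArcGIS.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def near(points, road, radius):
    """Return {name: distance} for points within `radius` of the polyline."""
    selected = {}
    for name, p in points.items():
        d = min(point_segment_distance(p, a, b)
                for a, b in zip(road, road[1:]))
        if d <= radius:
            selected[name] = d
    return selected

# Hypothetical local road (polyline vertices) and candidate points, in metres.
local_road = [(0, 0), (100, 0), (100, 100)]
candidates = {"P1": (50, 30), "P2": (50, 80), "P3": (140, 50), "P4": (160, 120)}

selected = near(candidates, local_road, radius=50)
print(selected)
```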

Ground Truthing
Ground truthing is carried out to assist the classification process and to improve the quality and accuracy of the Model Builder results. Candidate areas are ground truth boundaries, and grounding is correct if the most suitable area is the same as ground truthing (Liu et al., 2019). Of the 21 potential location points checked, 17 are suitable for the construction of gas stations and 4 are not. The four points are declared unsuitable because they have a land slope above 15%.

Optimization with Weighted Product
After the weighted product model is designed, the weighted product process is carried out. Hypothesis testing is carried out using the weighted product method so that it can be known whether the results of the test are acceptable and which alternative is best (Fitria et al., 2019). In the first stage, the suitability rating table is determined: 18 data points have a land use value of 6 and a slope value of 9, and 3 data points (locations 3, 14, and 21) have a land use value of 7 and a slope value of 9. Next, the weights are normalized for use in the S vector calculation. Normalization divides the weight of each criterion (land use and DEM) by 10, which gives a weight of 0.53 for land use and 0.47 for DEM.
The S vector value of each data point is then calculated by raising the value of each criterion to the power of its weight and multiplying the results. For the 18 data points with a land use value of 6 and a slope value of 9, the S vector value is 7.259; for the 3 data points with a land use value of 7 and a slope value of 9, it is 7.877. The total of the S vector values over the 21 data points is 154.29. This total is used to calculate vector V: the V value of each data point is its S value divided by the total S. The resulting V value is 0.047046 for the 18 data points and 0.051052 for the 3 data points.
From the calculations above, the highest V vector value is 0.051052, so the best alternative candidates for gas station land are locations 3, 14, and 21. The results of optimization with the weighted product method can be seen in Table 3.
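The calculation above can be reproduced with a short sketch. The ratings (land use 6 or 7, slope 9), the weights (0.53 and 0.47), and the three higher-rated locations (3, 14, and 21) come from the text; the variable and function names are illustrative, and numbering the data points 1 to 21 is an assumption.

```python
import math

WEIGHTS = {"land_use": 0.53, "slope": 0.47}

def s_value(ratings, weights):
    """Weighted product: multiply each rating raised to its weight."""
    return math.prod(ratings[name] ** w for name, w in weights.items())

# 21 candidate points: land use 6 and slope 9, except locations 3, 14, 21.
candidates = {i: {"land_use": 6, "slope": 9} for i in range(1, 22)}
for i in (3, 14, 21):
    candidates[i]["land_use"] = 7

s = {i: s_value(r, WEIGHTS) for i, r in candidates.items()}
total_s = sum(s.values())
v = {i: s_i / total_s for i, s_i in s.items()}  # normalized preference

best_v = max(v.values())
best = sorted(i for i, v_i in v.items() if math.isclose(v_i, best_v))
print(round(s[1], 3), round(s[3], 3), round(total_s, 2))
print(round(v[1], 6), round(v[3], 6), best)
```

Because all 21 points share the same slope rating, the ranking is decided entirely by the land use rating, which is why exactly the three points with a land use value of 7 come out on top.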

Potential Locations to Crowded Points
Based on research conducted by Elsa Juliany with the title "Analysis of the Effect of Product Availability, Facilities, Service Quality, and Location on Consumer Purchasing Decisions in Purchasing Pertalite Type Fuel Oil in Rantauprapat", location selection must also consider crowd factors and ease of access for consumers. Therefore, the locations obtained from the weighted product are reviewed based on their distance from crowd centers such as markets, schools, and settlements. For each location, the nearest market, school, and settlement are selected. The crowd points can be seen in Table 4, which lists the distance from each location to the nearest settlement, school, and market. The distances to the three crowd points are averaged to find the most strategic location, that is, the one closest to the crowd points. The average calculation can be seen below.
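The averaging step can be illustrated as follows. Table 4 is not reproduced in this text, so every distance below is a placeholder value chosen only to show the arithmetic; the ranking it produces says nothing about the actual locations.

```python
from statistics import mean

# Placeholder distances in metres; these are NOT the values from Table 4.
distances_m = {
    "Location 3": {"settlement": 120, "school": 450, "market": 900},
    "Location 14": {"settlement": 80, "school": 300, "market": 1500},
    "Location 21": {"settlement": 200, "school": 600, "market": 400},
}

# Average distance from each location to its three nearest crowd points.
average_m = {loc: mean(d.values()) for loc, d in distances_m.items()}

# A smaller average means the location is closer to the crowd points.
ranked = sorted(average_m, key=average_m.get)
for loc in ranked:
    print(loc, round(average_m[loc], 1))
```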

Conclusion
K-Means clustering in RStudio identified four potential villages for the construction of gas stations: Cikotok, Cibeber, Neglasari, and Wanasari. These four villages have potential because they score highest on the four clustering attributes, especially Cikotok and Cibeber, which are mobility centers with markets and terminals. Spatial modeling, based on the land suitability classification for gas stations and expert weighting of the criteria, produced 862 potential location points in Cibeber District, of which 233 lie within the four potential villages.
However, because gas stations must be located at least along a local road, locations within a 50-meter radius of a local road were selected, which yielded 21 potential locations in the four potential villages. Accuracy was tested with ground truthing by taking coordinate points from the spatial modeling results. After ground truthing, the results were optimized using the weighted product method to find the best land. The criteria used in the weighted product calculation, slope and land use, follow expert input. The three best location points obtained are Kp. Tegalumbu in Wanasari Village, Kp. Nagrak in Cikotok Village, and Kp. Cinangga in Cibeber Village.