Selecting Location of Retail Stores Using Artiﬁcial Neural Networks and Google Places API



Introduction
Selecting the location of a facility or a retail store can, in some cases, be likened to estimating an unknown parameter without any data. Since the retail store does not yet exist, there is no demand or sales data at hand; the decision maker is obliged to use auxiliary data collected by surveys, and/or data collected by similar stores near the candidate location, in order to perform any statistical forecast and/or subjective comparison of attributes. Selecting the location of a warehouse is more deterministic, since the problem can be well modelled and solved using optimization techniques, whereas retail stores are more profit-oriented and future demand should be forecasted before making any decision.
The problem of location selection can be separated into two branches. The first branch is a binary decision of whether to build a facility in a predefined location; the decision maker is responsible for finding the right answer, which is either yes or no. The other branch can be considered an optimization problem, that is, finding the location that minimizes some cost or maximizes the profit. The former requires a single-step decision process, while the latter consists of iterations of the steps given in the former (Church, 1999). Applebaum (1966) reported the results of a survey study of store location research in the United States. Randhawa and West (1995) suggested an integrated approach based on analytical location models and a multi-criteria decision model for the problem of location selection. Yang and Lee (1997) used an analytic hierarchy process for facility location selection based on decision makers' subjective comparisons. Shen and Yu (2009) stated that the attributes used in multi-criteria decision models (such as political environment, proximity to markets and customers, supplier networks, expansion potential, availability of transportation systems and utilities, quality-of-life issues, cultural issues, etc.) are evaluated by human perceptions and judgments, which are not measured numerically. Erbiyik et al. (2012) also applied the analytic hierarchy process to a real-life problem as an example of location selection for retail stores. Yong (2006) applied TOPSIS to determine the location of plants. Other application areas of the location selection problem include international tourist hotels (Chou et al., 2008), restaurants (Tzeng et al., 2002), shopping malls (Cheng et al., 2005), shipyards (Guneri et al., 2009) and large-scale power systems (Nagata et al., 2001), among others.
The use of GIS (Geographical Information Systems) is well established in location selection. Cheng et al. (2007) suggested a GIS approach for the selection of shopping mall locations. Chang et al. (2008) combined GIS with fuzzy multi-criteria decision-making for location selection. Roig-Tierno et al. (2013) suggested a method based on the analytic hierarchy process and GIS data for selecting retail store locations, using parking, accessibility by car, distance to competitors, growth in the area, etc., as criteria in the decision-making procedure. Our method is similar to that work in that we use environmental data collected with the Google Places API. However, we train an artificial neural network to reveal the relations between the environmental data and the financial performance of existing stores, and use it to forecast the financial performance of prospective stores. With these properties, our method can be considered novel.
In this paper, we devise a new method for the location selection problem based on artificial neural networks and geographical data collected using the Google Places API. In Section 2, we give a brief literature overview of artificial neural networks and their use, and introduce the Google Places API and its nearby search utility. We present our method on a real-life example, applied to a retail store chain with 144 stores in Turkey. In Section 3, we present the results. Finally, in Section 4, we discuss the results and conclude.

Suppose the regression model is

Y = Xβ + ε,    (1)

where β is a p-vector of unknown parameters, X is an n × p matrix of independent variables, Y is a vector of the dependent variable, n is the number of observations, p is the number of unknown parameters, i = 1, 2, . . ., n, xi is the ith row of X, and ε is the stochastic error term, which is generally assumed to be normally distributed with zero mean and constant variance. If the model shown in Equation (1) is linear and has a single independent variable, the model turns into the simple linear regression model shown in Equation (2):

Yi = β0 + β1 Xi + εi.    (2)
Since ordinary least squares (OLS) is a distribution-free optimization method, β0 and β1 can be estimated using OLS without a distributional assumption, whereas the maximum likelihood estimator requires knowledge of the distribution. Regardless of the method used, the estimators β̂0 and β̂1 are linear and unbiased estimators of the true values. If the dependent variable Y in Equation (1) is not only a function of a single X but a linear function of more than one independent variable, then the model is called a multiple regression model; a k-variable multiple regression model is shown in Equation (3):

Yi = β0 + β1 X1i + β2 X2i + . . . + βk Xki + εi.    (3)
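As a minimal illustration of OLS estimation for the simple linear model in Equation (2), the following sketch fits the two coefficients on synthetic data; the data and variable names are purely illustrative, not from the paper.

```python
import numpy as np

# Synthetic data for the simple linear model Y = b0 + b1 * X + e,
# with true values b0 = 2 and b1 = 3 (illustrative only).
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=50)
Y = 2.0 + 3.0 * X + rng.normal(0, 0.5, size=50)

# Design matrix with an intercept column, then the least-squares
# solution of A b = Y, i.e. b = (A'A)^{-1} A'Y.
A = np.column_stack([np.ones_like(X), X])
b0_hat, b1_hat = np.linalg.lstsq(A, Y, rcond=None)[0]
```

With moderate noise, the estimates land close to the true coefficients, illustrating the unbiasedness mentioned above.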
If Y is not a vector but a matrix of more than one dependent variable, then the model turns into a multivariate regression model (Bedrick & Tsai, 1994). If the dependent variable Y in Equation (1) takes either 0 or 1 as its value, the model turns into a binary response model. A binary response model is generally used to predict the probability that the dependent variable equals 1 as a function of some observed explanatory variables (Davidson & MacKinnon, 2004). The linear probability, logit and probit models are well-known binary response models in the statistics literature. A logit model can be written as

P(Yi = 1) = 1 / (1 + e^(−xi β)),    (4)

and it is important as a basis for ANNs, since the logistic function is frequently used as the transfer function. Although logistic regression and ANNs have their roots in two different disciplines (statistics and computer science), they share similarities, and many authors have applied and compared these methods in their studies (Dreiseitl & Ohno-Machado, 2002).
If Y is a variable of non-negative integers, or specifically count data, and the link in Equation (1) follows the Poisson distribution, then Y ∼ Poisson(λ), where λ is both the mean and the variance of the distribution, and the model is estimated using Poisson regression (Saffari et al., 2013).
In short, regression estimators have been developed for several measurement types and data structures, as we highlight below:
1) The relation between dependent and independent variables may be either linear or non-linear,
2) Dependent variable(s) may have real, binary or integer values,
3) Data may have a vector of responses rather than a single dependent variable.
Note that none of the regression estimators mentioned above can handle all of these situations, but ANNs are capable of estimating and making predictions on data with all of these properties. In a feed-forward network, the output of a node can be written as

Ŷ = f(Σ wi xi),

where f(•) is the activation function, generally chosen as the sigmoid

f(x) = 1 / (1 + e^(−ax)),

where a is a real constant. Given the residuals ê = Y − Ŷ and the total network error E = (1/2) Σ ê², the solution of the equations ∂E/∂wi = 0 for i = 1, 2, . . ., p minimizes the total network error. Since the derivatives of the sigmoid and tanh functions can be handled easily, they are frequently used. Note that any activation function can be used instead, including y = x, but derivatives are required if a gradient-based algorithm is used for the optimization. Evolutionary algorithms, such as genetic algorithms (Goldberg & Holland, 1988) and differential evolution (Storn & Price, 1997), can also be applied to estimate the unknown parameters wi (Kalderstam et al., 2013; Ilonen et al., 2003).
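A minimal numerical sketch of these ideas follows: a forward pass through a single hidden layer with the sigmoid activation, and one gradient step on the total network error E = (1/2) Σ ê². The data, layer sizes and learning rate are illustrative assumptions, not the network used in the paper.

```python
import numpy as np

def sigmoid(x, a=1.0):
    # logistic activation f(x) = 1 / (1 + exp(-a*x)); a is a real constant
    return 1.0 / (1.0 + np.exp(-a * x))

# Synthetic data, for illustration only: 8 observations, 3 input nodes.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
Y = rng.normal(size=(8, 1))

W1 = rng.normal(scale=0.1, size=(3, 4))   # input -> hidden weights
W2 = rng.normal(scale=0.1, size=(4, 1))   # hidden -> output weights

def forward(X, W1, W2):
    H = sigmoid(X @ W1)   # hidden-layer activations
    return H, H @ W2      # linear output node

# One gradient step on E = (1/2) * sum(e^2), with e = Y - Yhat.
H, Yhat = forward(X, W1, W2)
e = Y - Yhat
E = 0.5 * np.sum(e ** 2)
grad_W2 = -(H.T @ e)                            # dE/dW2
grad_W1 = -(X.T @ ((e @ W2.T) * H * (1 - H)))   # dE/dW1 via the sigmoid derivative
W2 = W2 - 0.01 * grad_W2
W1 = W1 - 0.01 * grad_W1
E_after = 0.5 * np.sum((Y - forward(X, W1, W2)[1]) ** 2)
```

The backpropagated gradient uses the sigmoid derivative f′ = f(1 − f), which is the "easily handled" derivative referred to above; a small step along the negative gradient reduces the network error.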
In most cases, the mathematical model also includes a bias term,

Ŷ = f(w0 · bias + Σ wi xi),

where bias is an input node with the constant value 1. This parameter plays a role similar to the intercept parameter β0 in model (3).

Google Places API
The Google Places API (Note 2) (API: Application Programming Interface) provides location-based geographical data via HTTP (Note 3) over the Internet. More specifically, the API allows developers to build applications with Place Search, Place Details, Place Actions, Place Photos, Place Autocomplete and Query Autocomplete capabilities.
The nearby search functionality in the current version returns a list of items of the supported types given in Table 1.
Using this API, it is possible to fetch the number of items of a given type within a circular area of a given radius. The results are formatted as either JSON (Note 4) or XML (Note 5), both of which are easy to parse.
The free version of the API limits the number of results per request to 60. Results are fetched in pages of 20, so all of the results must be collected iteratively. Since the maximum number of objects returned for each item type is restricted, a small search area should be selected to limit the effects of censored data.
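The fetch-and-iterate procedure can be sketched as follows, using only the standard library. The endpoint and the `location`, `radius`, `type`, `key` and `pagetoken` parameters follow the Places API documentation; `YOUR_API_KEY` is a placeholder, and the `fetch` argument is an assumption of this sketch (injected so the paging logic can be exercised without network access).

```python
import json
import time
import urllib.parse
import urllib.request

PLACES_URL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"

def count_nearby(lat, lng, place_type, api_key, radius=500, fetch=None,
                 max_results=60):
    """Count places of one type within `radius` meters of (lat, lng),
    following next_page_token pagination: results arrive in pages of 20,
    capped at 60 on the free tier."""
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url) as resp:
                return json.loads(resp.read().decode())
    params = {"location": f"{lat},{lng}", "radius": radius,
              "type": place_type, "key": api_key}
    url = PLACES_URL + "?" + urllib.parse.urlencode(params)
    total = 0
    while True:
        data = fetch(url)
        total += len(data.get("results", []))
        token = data.get("next_page_token")
        if not token or total >= max_results:
            return min(total, max_results)
        time.sleep(2)  # the token needs a short delay before it becomes valid
        url = PLACES_URL + "?" + urllib.parse.urlencode(
            {"pagetoken": token, "key": api_key})

# count = count_nearby(41.0, 29.0, "restaurant", "YOUR_API_KEY")
```

A count that reaches the 60-result cap is censored, which is exactly the situation described above for the number of restaurants.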

Proposed Method and an Example
The number of objects fetched using the Google Places API provides rich information about the structure of potential customers in a given area. For instance, the number of schools and universities is a proxy variable for the educational structure of the population, whereas the numbers of bus stations, gas stations, subway stations and taxi stands are proxies for the transportation structure of the given area. Some variables, such as the number of casinos, night clubs and pubs, give an idea of the customer population in certain time intervals. The number of places of worship also reflects the male/female ratio and other properties of the potential population on special dates and predefined days. The total number of banks, ATMs and financial institutions is a symptom of a high daytime population of adults, whereas the number of cafes, shopping malls and restaurants suggests young people on weekdays and families at weekends. Finally, the environmental structure of a certain area directly influences the distribution of the population. If the relation between the environmental structure and the demand of existing retail stores in the same sector is estimated, then the demand of a new store under consideration can be predicted.
Suppose that we have the frequency data of Table 1 for N retail stores in a chain, and a Ranking for each store assigned by a clustering algorithm or by personal judgment, as shown in Table 2. Rankings can be numeric or ordered, as well as non-numeric. In our example, we apply the k-medoids clustering algorithm (Kaufman & Rousseeuw, 2009) to a data set in which the columns are the number of sales slips, total sales, etc., and the rows are retail stores. Another clustering algorithm, such as k-means, could be used instead, but k-medoids is superior as it is more robust and efficient (Reynolds et al., 2006). Our proposed method is based on constructing an ANN that takes the environmental variables as input (Note 6) and the Rankings as output. If the relation between the input and output variables is exposed, we can feed the network with the input values of a new store and evaluate the generated output.
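The clustering step can be sketched as below. This is a simple alternating k-medoids on a precomputed distance matrix, not the full PAM algorithm of Kaufman and Rousseeuw; the deterministic farthest-point seeding and the toy two-group data are assumptions of the sketch.

```python
import numpy as np

def k_medoids(D, k, n_iter=100):
    """Alternating k-medoids on a precomputed distance matrix D (a sketch
    of the idea, not the full PAM algorithm)."""
    n = D.shape[0]
    medoids = [0]
    while len(medoids) < k:
        # farthest-point seeding: next seed is farthest from chosen medoids
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)    # assign to nearest medoid
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if members.size:
                # new medoid: member minimising total distance to its cluster
                costs = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return labels, medoids

# Toy illustration: two well-separated groups of "stores" described by
# two financial indicators (hypothetical numbers).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (5, 2)), rng.normal(10, 0.5, (5, 2))])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
labels, medoids = k_medoids(D, k=2)
```

Because k-medoids works on a distance matrix and picks actual stores as cluster centres, it is less sensitive to outlying sales figures than k-means, which is the robustness advantage cited above.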
Since the input variables take integer values between 0 and 60, no encoding or decoding operations are required.
If variables with different ranges are used, the data should be rescaled in order to map all of the variables into the same space. If the output variable is non-numeric, it should be translated into binary vectors. Gilbert and Troitzsch (1999) stated that variables are best coded in binary, with units assigned to each binary position.
In our example, the output variable holds the segmentation labels A, B, C, D and E as Rankings, which are encoded as binary vectors as shown in Table 4.
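The binary encoding of Table 4 is a one-hot scheme, which can be sketched as follows; the helper names are illustrative.

```python
import numpy as np

# Binary (one-hot) encoding of the Ranking labels, as in Table 4;
# each label gets its own binary position.
LABELS = ["A", "B", "C", "D", "E"]

def encode(ranking):
    vec = np.zeros(len(LABELS))
    vec[LABELS.index(ranking)] = 1.0
    return vec

def decode(vec):
    # network outputs are continuous, so take the largest component
    return LABELS[int(np.argmax(vec))]
```

Decoding by the largest component maps a continuous network output back to one of the five segmentation labels.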

Table 4. Binary encoding of output variable
We train the ANN using 123 retail stores and leave 18 cases for testing, in order to make a decision about opening 3 new stores. Zhang et al. (1999) stated that larger training sets are more capable of revealing clusters in the data; however, if the objective is forecasting or clustering unseen cases, the size of the test set must be enlarged. In many cases, the ratio of test cases ranges from 10% to 30%, and the researcher selects the network that performs best on the testing set (Kaastra & Boyd, 1996). In our network, we set the ratio of the testing set to 13%, which includes the stores that were suggested by firm experts. Throughout the research process, the Rankings of the stores in the testing set were kept unknown to us, for an objective and blind check.
We select 42 input variables out of 96, taking expert advice and performance into consideration; they are marked with a symbol in Table 1. Note that a principal component analysis could be applied to the data for dimensionality reduction; however, the process of interpreting, calculating and extracting linear combinations is not straightforward. There are many suggested rules of thumb for determining the number of hidden nodes, such as n/2, n, n + 1 and 2n + 1, where n is the number of input nodes (Zhang et al., 1999). In our case, none of these suggestions performed well, and we chose a single hidden layer with 120 neurons as the network topology.
Figure 2. Two principal components of input data

In Figure 2, the principal components of the input data are shown. The learning and test sets are plotted using • and symbols, respectively. Figure 2 shows an approximation of the similarities between old and new stores in terms of the input variables, rather than the relation between the inputs and the Rankings. Note that other dimension reduction tools, such as multidimensional scaling, could be used instead. Without a dimension reduction technique, a distance matrix can be calculated and the k nearest stores identified; the median of the Rankings of the selected nearest stores is then the estimated Ranking of the new store. However, these methods only increase human perception of the given input variables; the relation between the geographical data and the financial Rankings is not investigated.
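The distance-matrix alternative mentioned above (k nearest stores, median of their Rankings) can be sketched as follows; the example data and the integer mapping of Rankings (e.g. A..E to 1..5) are assumptions of the sketch.

```python
import numpy as np

def knn_ranking(X_train, rankings, x_new, k=5):
    """Estimate the Ranking of a candidate store as the median Ranking of
    its k nearest existing stores in input-variable space; Rankings are
    assumed to be mapped to integers (e.g. A..E -> 1..5)."""
    d = np.linalg.norm(X_train - x_new, axis=1)   # distances to all stores
    nearest = np.argsort(d)[:k]                   # indices of the k closest
    return int(np.median(np.asarray(rankings)[nearest]))

# Toy example: two groups of existing stores with illustrative inputs.
X_train = np.array([[0, 0], [0, 1], [1, 0],
                    [10, 10], [10, 11], [11, 10]], float)
rankings = [1, 1, 2, 5, 5, 4]
```

As noted in the text, this baseline only exploits similarity in the input space; unlike the ANN, it does not model the relation between the geographical data and the Rankings.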

Results
The success of the proposed method for retail store location selection is evaluated in two stages. First, the selected network is acceptable, as it has a small sum of squared errors over the whole learning dataset and good forecasting accuracy on the out-of-sample data.
The environmental data were fetched using a JavaScript program that uses the Google Places API as a library. The minimum, maximum, Q1, median and Q3 statistics of the 42 variables are shown in Figure 3 (a-d) using boxplots. The radius of the circular areas was set to 500 meters for all stores. This limitation is convenient in most cases, except for the number of restaurants: 43 stores have at least 60 restaurants, and it is impossible to be sure whether they have exactly 60 or more using the free edition of the API. In our study, we kept the radius at 500 meters after considering the descriptive statistics of the other variables. The network was trained on a PC with an Intel i5 CPU, 4 GB of memory and Linux installed. The statistical software R (R Core Team, 2013) and the R package neuralnet (Fritsch et al., 2012) were used for the analysis. The network was trained many times and the best one was reported by the package. The final network was reported in 0.74 seconds with a network error of 0.00009.
The small network error shows that the network fits the training data well, but the forecasting quality of the network should also be investigated. The MAD (mean absolute deviation), MSE (mean squared error) and MEAD (median absolute deviation) statistics are generally calculated for this purpose, and they are shown in Table 5. Table 5 shows that the calculated MAD, MSE and MEAD are quite small when compared to the maximum possible values in the case of the worst forecasting performance.
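These three statistics are straightforward to compute; the helper below is an illustrative sketch, not the code used in the paper.

```python
import numpy as np

def forecast_errors(y_true, y_pred):
    """MAD, MSE and MEAD of the forecast errors: the mean absolute
    deviation, mean squared error and median absolute deviation."""
    e = np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))
    return {"MAD": e.mean(), "MSE": (e ** 2).mean(), "MEAD": np.median(e)}
```

MEAD is robust to a few large misses, so reporting it alongside MAD and MSE shows whether the errors are uniformly small or dominated by outliers.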
Second, the method was simulated on data collected for random locations, and expert reviews were used for testing. The reviews of the experts show that our method produces reasonable estimates of the Rankings of candidate stores. Our method also reveals similarities between candidate and currently available stores.
Finally, the firm applied the algorithm and forecasted the Rankings of three candidate retail stores over several places. After the start-up period, the firm reported that the realized Rankings matched those forecasted by the method.

Discussion
In this paper, we suggest a method for estimating the relation between environmental structures and the financial Rankings of retail stores using ANNs and the Google Places API. The method was applied to a retail store chain (NT Book Stores) with 144 stores, which previously made such decisions using marketing research. The firm reported that neither multi-criteria decision methods nor survey data had worked well in their case.
In statistics, the properties of an estimator should be investigated either theoretically or computationally. Although neural networks show an increasing trend in the statistics literature, the properties of the estimator have not been unveiled because of their black-box and non-linear nature. However, partitioning the data into training and testing sets generally reflects the forecasting performance.
We are aware that testing the usability of such a method is not possible with a single example; however, we have some evidence that the method works well in estimating the financial Rankings of candidate retail stores by introducing only their environmental structures to a previously trained ANN, as we stated in Section 3.
The proposed method is inexpensive, as it replaces the data collection, survey or marketing research stage with downloading and parsing environmental data, which can be considered proxy variables. Note that the reliability of the method increases as the number of existing retail stores increases. If the training set is not large enough to reveal the true relation, the method may fail.
In our example, since the financial Rankings were constructed by applying a clustering algorithm to electronically collected data, they do not contain measurement errors. However, the data collected by the Google Places API are not that reliable in all cases. If the data themselves are used as inputs rather than principal components, the researcher should examine them, as they may contain universities typed as schools, bars typed as restaurants, etc.
Our method forecasts the financial Rankings rather than the demand. The estimated demand is simply the arithmetic mean of the demands of the existing retail stores that fall into the same Ranking. Forecasting the demand directly will be the subject of a future study.
The suggested method can be run in a loop over several circular areas, and the area with the highest Ranking can be reported. This modification turns our method into an optimization problem with an objective of profit maximization. Investigating the properties and measuring the performance of such a window-sliding algorithm will also be the subject of future work.
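The window-sliding idea can be sketched as a grid search over candidate centres. Everything here is hypothetical: `predict_ranking` stands in for the trained network plus the Places API feature fetch, and is replaced by a toy scoring function (smaller score means better Ranking, best near an assumed point).

```python
# Hypothetical sketch of the window-sliding loop over candidate areas.

def predict_ranking(lat, lng):
    # Toy stand-in for the trained network: score ~ 1 (Ranking A)
    # near the assumed point (41.0, 29.0), worse further away.
    return 1 + min(4, 10 * (abs(lat - 41.0) + abs(lng - 29.0)))

def best_location(lat0, lng0, n=11, step=0.01):
    """Scan an n x n grid of candidate centres starting at (lat0, lng0)
    and return (score, lat, lng) of the best-scoring centre."""
    best = None
    for i in range(n):
        for j in range(n):
            lat, lng = lat0 + i * step, lng0 + j * step
            score = predict_ranking(lat, lng)
            if best is None or score < best[0]:
                best = (score, lat, lng)
    return best

best = best_location(40.95, 28.95)
```

In a real run, each grid point would cost one batch of Places API requests, so the grid step and extent trade forecasting resolution against API quota.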

Copyrights
Copyright for this article is retained by the author(s), with first publication rights granted to the journal.
This is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Figure 1. A simple feed-forward neural network diagram

Figure 3. Boxplots of Google Places data

Table 1. Data types supported by the Google Places API