Optimal Combination of Trading Rules Using Neural Networks

A large number of trading rules based on technical analysis of prices are being used by investing community for generating trading signals for short term investments. As profitability of these trading rules vary, it is not easy to judge which particular rule really ‘works’. Instead of a single trading rule, combination of rules are likely to offer the portfolio benefits of better risk adjusted return and hence, an experiment is carried out to combine signals generated from of moving averages of different window size using an artificial neural network. It is observed that the risk adjusted performance measure of the artificial neural network based trading model is better than that of simple ‘Buy and Hold’ strategy.


Introduction
The investors not only look for long term capital gains from the market, but also like to maximize returns exploiting opportunities from short term price movements.A large number of trading rules based on technical analysis of prices are used for generating buy and sell signal for short term investments.From the numerous trading rules being used by the trading community, it is not easy to judge which particular rule really 'works'.
Investment professionals with long years of experience may take very good trading decision form their expertise.But there is no guarantee that such decisions will always work and therefore the use of a systematic procedure to generate trading decisions becomes important.A systematic decision making approach can also help to overcome various limitations that are inherent in human professionals.Further, as different practitioners have different views on the same information set, systematic evaluation methods will reduce personal bias.In the area of investment management, where decisions involve very large amounts of money, sometimes the lifelong savings of the clients, reassurance of the soundness of the investment decision-making process is necessary.
Technical trading rules are increasingly being used in financial markets for over a century ever since it was popularized by Charles Dow in 1900.But analysis of trading rules have started drawing more attention in 1990s and several authors have expressed that financial prices and returns are predictable to some extent, either from their own past or from some other publicly available information.For example, Bessembinder & Chan (1995), Blume, Easley & O.hara (1994), Brock, Lakonishok & Lebaron (1992), Ramazan (1998), Jegadeesh & Titman (1993, 2000), Lo & MacKinlay (1988), Neftci (1991), Ready (1997) test various trading rules based on technical analysis and reported that technical analysis provides information beyond that already incorporated in the current price.However, there cannot be a fixed trading rule as excessive usage of a particular trading rule will reduce efficacy of that rule.If everybody starts using a particular rule, that rule will not work any more and hence, the trading rules need to be continuously upgraded based on changing market dynamics.
Analyses made the literature are based on performance of a specific trading rule used in isolation.Instead of relying on a single trading rule, traders often use a variety of trading rules and sometimes a combination.Combining of rules are likely to offer the portfolio benefits of better risk adjusted return.Markowitz (1952Markowitz ( , 1959) ) has shown that in portfolio context unsystematic risk can be reduced by diversification; possibly similar benefit arises when multiple trading rules are combined for taking a trading decision.

Using Moving Averages for Generating Trading Signal
The moving average (MA) method is one of the most widely used methods of generating trading rules.It includes numeral versions and different levels of complexity.A moving average is an average of observations from several consecutive time periods.To compute a moving average sequence, we compute successive averages of a given number of consecutive observations.The objective underlying the MA method is to smooth out seasonal variation in the data.
The most widely used moving average (MA) is the n-day simple MA given by: , where SMA t is the simple n-day moving average at period t and P i is the closing price for period i.In the simple MA procedure, a buy signal is generated when the closing price rises above the MA and a sell signal is generated when the closing price falls below the MA.If there is a clear trend, this method will work well.If, however, the market move sideways or if there is excessive volatility, there will be a lot of false signals.
A modification of simple moving average is exponential moving average (EMA) that gives more weight to the most recent time periods.It is described recursively as: ). 1 ( .
, where α is a value between 0 and 1.
For example, if α = 0.5, the most recent value P t is given 50% weight and all other past values are given remaining 50% weight.When the computation begins, the current price is set to EMA and as more prices are available, the averaging process is continued.
The exponential moving average performs well for many business applications, usually producing results superior to the moving average.(Krantardzic, 2001) A moving average summarizes the recent past data, further; spotting the change in the trend of data may additionally improve forecasting performances.Some of the measures that compare current price with the moving averages are the difference between the current price and its moving average; MA t -MA (t-k) : the differences between two moving averages, of same window size; (t,m) : the differences between two moving averages, of different window size; and the ratio between the current value and its moving average.
In the present study the current price will be compared with its moving average in ratio format.When current price P t is be more than its Moving average MA t , it is considered as an indication of uptrend and a buy signal is generated and vice versa a sell signal is generated when current price is less than its moving average.Numerically, the ratio The value of the ratio t t MA P will nevertheless depend on the window size of the moving average.A moving average having less window size, say 3 days, will follow current price closely and the ratio t t MA P will change from 'more than 1' to 'less than 1' more frequently generating a large number of buy and sell trading signals.Whereas a moving average having a large window size, 'say 200 days' will generate trading signal less frequently.More trading arising out of less window size may capture the minor movements of prices well, but consequently the transaction costs will also increase.The success of a moving average based trading method clearly depends on selection of a proper window size, but there is no known method to determine the window size.Therefore, an experiment is carried out to combine signals generated from of moving averages of different window size using an artificial neural network.

An Introduction to Artificial Neural Network
The artificial neural network based techniques are an information processing model derived from functioning of human brain.This is a simple information processing device that accepts many inputs, combines them, and produces an output.The basic element of a neural network is a neuron.The output of one neuron becomes input to other neurons.A neural network is a structure of many such neurons connected in a systematic way.In the study, the neural networks used are feed-forward neural networks, where information processing moves only in forward direction as shown in figure -1.
The neurons in the network are arranged in layers.Typically, there is one layer for input neurons (the input layer), one or more layers of internal processing units (the hidden layers), and one layer for output neurons (the output layer).Each layer is either partly or fully interconnected to the preceding layer and the following layer.
The connections between neurons have weights associated with them, which determine the strength of influence of a neuron to other neurons.Information flows from the input layer through the hidden processing layer(s) to the final output layer to generate predictions.The connection weights are determined by a training process, wherein known input and known output data is fed to the network.The network adjusts connected weights so that a relationship between inputs and outputs can be established with certain degree of accuracy.

Designing of Network Structure
There are many parameters to design a feed forward neural network.Decisions regarding number of inputs in the input layer, number of hidden layers and number of neurons in the hidden layers, interconnection of neurons among layers etc. are to be taken.Though some techniques are mentioned in literature for determining these parameters, there is no uniformity.Structure of the network largely remains a design issue and leaves ample scope of innovation to the analyst.

Input Layer
The input layer to the neural network is the medium through which the inputs are presented to the neural network.When a set of input is presented to the input later of the neural network, the inputs are processed and resultant information is passed to the subsequent layer(s).Every input neuron should represent some known variable that has an influence over the output of the neural network.As final output will depend on inputs introduced to the network, the quality and relevance inputs are very important.

Hidden Layers
There are really two decisions to be made with regards to the hidden layers.The first is how many hidden layers to have in the neural network and then how many neurons will be in each of these layers.Neural networks with two or more hidden layers can represent functions with any kind and hence there is no theoretical reason to use neural networks with any more than two hidden layers.
Deciding the number of hidden neurons in layers is an important part of deciding the overall neural network structure.Hidden layers do not directly interact with the external environment but influences the final output.Hence, both the number of hidden layers and number of neurons in each of these hidden layers must be carefully designed.Using lesser number of neurons in the hidden layers will result in under-fitting.Under-fitting occurs when few neurons in the hidden layers are unable detect relationships is complex scenario.On the contrary, using too many neurons in the hidden layers may result in over-fitting.Over-fitting occurs when the neural network has so much information processing capacity that the limited amount of information contained in the training set is not enough to train all of the neurons in the hidden layers.Another problem can occur even when training dataset is very large.A large number of neurons in the hidden layers can increase the training time of the network.Over-fitting with large training data may fit past data very well.The objective of neural network model is to extract general relationship in the data which can be used in new environment.Thus generalization of network relations is more important than over-fitting.
There are few rule-of-thumb methods for deciding structure of hidden layer.These rules-of-thumb are only starting points to consider the initial structure.Ultimately the selection of the architecture of the neural network has to be finalized by experimentation.

•
The number of neurons should be in the range between the size of the input layer and the size of the output layer.

•
The number of neurons may be 2/3 of the input layer size, plus the size of the output layer.

•
The number of neurons should be less than twice the input layer size.
In the present study, a three layered network with: one input layer having three input nodes, one hidden layer having three processing nodes and finally one output layer producing a single output is used, as shown in figure 1.The structure of the network can indeed be varied as per requirement of the analysis.

Output Layer
The output layer of the neural network presents output to the external environment.The output is derived from inputs via complex relationships inbuilt in the neural network structure.

Input-Output Relationships
Data ranges of real-world input parameters vary widely.For example, one variable may have data that ranges between 0 and 1, while another variable can be a five digit value.If both of the variables are used in their natural scale, the second variable is likely to be given much more weight in the model than the first variable, simply because of its original values (and therefore the differences between records).To compensate for this effect of scale, range fields are usually transformed so that they all have the same scale.In the study, range fields are made uniform to have values between -0.50 and +0.5 by using a rescaled sigmoid function.
The activation of each neuron from input later to hidden layer and again from hidden layer to output layer is calculated as , where j a is the activation of neuron j, i is the set of neurons in the preceding layer, ij w is the weight of the connection between neuron i and neuron j, i x is the output of neuron i, ( ) x f is a transfer function used to scale the summation values from -0.5 to +0.5.In the study, we used sigmoid or logistic transfer function to scale the numerical values.The original formula for sigmoid conversion is ( ) x SIGMOID e x f − + = 1 1 which converts any x value within a range of 0 to 1.The sigmoid values are further rescaled by a deducting 0.5 so that values remain within -0.5 to + 0.5.Thus the conversion function is ( ) In the network used (figure -1), x 1 , x 2 and x 3 are the input nodes, y 1 , y 2 and y 3 are nodes in the hidden layer and z is the final output.The activation of each node in hidden layer is calculated by using following rescaled sigmoid function of weighted inputs.
( ) , where ij w is the connection weights between y i and x j nodes.Similarly, the final output z can be estimated using the transformation: The final value (z) is used for generating trading signal.Values of these rescaled sigmoid functions range between -0.5 and +0.5 with mean value of 0. If the value of z is found positive, it is considered as a signal of uptrend and conversely, when the value of z is negative it is taken as a signal of down trend.If buy, hold and sell decisions are represented by +1, 0 and -1 respectively, then final buy and sell decisions are determined using the value of Sign(z).Sign(.)determines the sign of a number: returns 1 if the number is positive, zero (0) if the number is 0, and -1 if the number is negative.

Training
The output value z can be calculated from the inputs (x i ) and connection weights w ij using relationships mentioned in previous section.The values of x i (inputs) are known to us but the values of connection weights (w ij ) are not known.
The training the network is carried out to find out values of w ij , so that these values can be used for generating future signals.The objective is to forecast output z (t+1) , which will match with future actual return r (t+1) .However the future returns can never be accurately predicted and any prediction will always have some error.The purpose is to minimize these errors as much as possible so that the forecast is of some practical use.The total error can be measured by adding absolute errors of each observation ABS(r (t+1) -z (t+1) ).A more acceptable form is based on minimization of Total Squared Error (TSE): Minimize: Σ (r (t+1) -z (t+1) ) 2 The minimization can be done using any commercially available software.In the study, the optimization was carried out using "Solver" add-in available in Microsoft Excel.Solver uses the Generalized Reduced Gradient (GRG2) nonlinear optimization code developed by Leon Lasdon, University of Texas at Austin, and Allan Waren, Cleveland State University.When Solver reaches an acceptable solution, it has minimized the total squared error term TSE by changing value of specified cells (these cells are weights w ij of the network).The values of changed cells are the optimized weights and can be used in predicting trading signals in future.

Data
The study examines the profitability of technical trading rules applied to three Indian Stock Indices for the period 1 st April 1998 to 31 st December 2007, covering a period of 10 years.The daily closing values of following indices are analyzed in the study (details on these indices can be obtained from www.nseindia.co.in).

Converting Indices data to Network Inputs
Inputs to the network must contain information pertaining to output (to predict price movements).A large number of academic studies support usefulness of moving averages for determining trends in stock price series.The following inputs selected in the study compares current price with past moving averages.
The first input compares closing price of the security with its 3 day moving average : x 1 = (p t / Moving average of past 3 days) The second input compares closing price of the security with its 7 day moving average: x 2 = (p t / Moving average of past 7 days) The third input compares closing price of the security with its 30 day moving average: x 3 = (p t / Moving average of past 30 days)

Finding Value of Network Weights
Training of network refers to a method of determining the value of connecting weights of the network based on a certain performance measure, such as cumulative profit.The performance of the trading systems is usually determined by optimizing over past known data, but there is no consensus on how much past data to be used.A common procedure to assess the profitability of technical trading is to choose the optimal parameter using the first part of the available data and then test the parameter upon the remaining data for out-of-sample verification.Out-of-sample verification is an important factor in testing the performance of technical trading strategies due to the danger of data snooping biases.
For each financial series, the training procedure is carried out using the past one year's data.The network weights (w ij ), that have shown the best performance over a year, are used for the out-of-sample trading in the next year.At the end of the next year, new optimal weights for the year are again calculated, and this procedure is repeated during the rest of the sample period.For example, the connection weights used for the year 2000-01 are trained weights that generated the highest cumulative return in the year 1999-00.The new connection weights for 2001-02 are selected using the data for the year 2000-01, and so forth.This procedure ensures that the entire neural network model is adaptive and all the trading results are out-of-sample.

Estimation of Profit (Loss)
Profitability of a trading position depends on the change of market price of the traded security and the position of the trader (either long or short).If trader has taken a long position, he will be benefited by a price rise of the security but will incur loss by a price decrease.Similarly the trader can make profit in a declining market by taking a short position.The trader is presumed to take a long position whenever the network output z (t+1) gives a positive value.Likewise, a short position is taken whenever the value of z (t+1) is negative.The trading decision can be represented by the following dummy variable.
The dummy variable d ( t+1) is equal to one (negative one) when the trader goes long (short) in traded asset.The return of the trader on a particular day can be estimated as follows: r (t+1) = d (t+1) ( P (t+1) -P (t) ), where, r (t+1) denotes the return of the trader resulting from the decision taken (d (t+1) ) at the close of period t, which depends on value of d (t+1) and change in the asset value within period t and (t+1).
When d (t+1) = d (t) , the existing position (long/short position in the asset) is maintained and no new transaction need to be carried out.Hence transaction cost is not applicable.If d (t+1) ≠ d (t) , then the position held is reversed at the close of period (t+1), necessitating two single transaction (closing existing position and opening a new position in opposite direction).Taking transaction cost into account, daily gross profit becomes: where c is transaction cost (in fraction of asset value) of a single transaction and ⏐d (t+1) -d (t) ⏐ denotes absolute value of the difference d (t+1) -d (t) .Total cumulative profit after transaction costs can be obtained adding daily profits.
About 15 years ago, transaction costs in Indian markets used to be very high.But the scenario is changed now.Brokerage rates on Indian bourses have crashed to historic lows due to competition.Apart from competition, sustained reforms in the financial markets have led to lower transaction cost.Brokers can no longer justify higher transaction cost after introduction screen based trading, electronic transfer of shares through depositories, launch of internet driven trading and substantial increase of trading volume.The brokerage rates in the market during the period have plummeted from around 2% for delivery bases transactions to less than 0.05% of turnover for future trades.All profitability calculation in the study is carried out at 0.05% transaction cost analogous to cost applicable in futures market.

Trading Profits
The trading results of using Neural Network model is calculated for these three financial series and the same are compared with the profitability of Buy and Hold strategy.In Buy and Hold Strategy, the security is bought at the start of the study period and sold at the end of the period.No transaction is carried out during the period and no transaction cost is incurred.Whereas in trading model using neural networks, transactions were many causing high transaction cost.The profitability of Neural Network model is compared with Buy & Hold strategy for the selected three series.The results are given in tables 1, 2 & 3.
It may be seen from the tables that total net profit using Neural Network Model is generally higher than that of Buy and Hold Strategy even after transaction costs indicating usefulness of technical trading rules.

Statistical tests
The most widely used risk adjusted investment performance measure is developed by Prof. William F. Sharpe; his measure is not only widely used in academia but also by market practitioners.
The Sharpe Ratio (SR) can be calculated as follows.
Where r i is the return over period i, is the mean and is the sample standard deviation over the n periods observed and r f is the risk-free rate of interest.Originally, the benchmark for the Sharpe Ratio was taken to be a risk-less security, where the differential return is equal to the excess return of the fund over a one-period risk-less rate of interest.The usefulness of the Sharpe Ratio is based on the premise that a differential return represents the result of a zero-investment strategy.But in case of trading in a forward or future contract, one need not finance the asset by making full payment, often such contracts can be purchased by providing a small margin payment or providing some short of guarantee.Therefore traded contracts of stock index futures can be considered as zero-investment strategies.
Sharpe ratio in zero-investment strategies can be calculated omitting risk free rate as follows.
The Sharpe ratio of an investment can be compared with any benchmark by computing the Sharpe Ratio for the benchmark and compare with the investment model.The Sharpe Ratio of Neural Network model and Buy and Hold Strategy are compared for the three financial series in Tables 4, 5 and 6.
In the tables, it can be found that Sharpe Ratio of Neural Network model is higher than that of Buy and Hold Strategy in most of the cases indicating that the model has given better risk adjusted return.The Sharpe Ratio is also directly related to the t-statistic for measuring the statistical significance of the return.The Sharpe Ratio, when multiplied by the square root of 'n' (the number of returns used for the calculation) is equivalent to t-statistic.The t-statistic as defined above for the full financial series is calculated in table -7.Since t-statistics of all the three financial series are not only positive in but also statistically significant at 1% level, use of the artificial neural network based trading model may be considered as a better alternative to Buy and Hold Strategy.

Conclusion
In the study, investment decisions were taken using technical analysis based trading rules and tested on three stock index series in Indian stock market.Instead of relying on a single trading rule, the rules are combined using an artificial neural network model.The theoretical profits from the model are estimated and compared with profits obtainable from "Buy and Hold" strategy.It is observed that the risk-adjusted performance of the Neural Network based trading model is generally better than Buy and Hold strategy.
Like many previous studies, the present study also demonstrates that it is possible to earn positive return by using technical trading rules.However one of the major impediments of trading profit is transaction cost.The study is carried out taking a relatively low transaction cost of 0.05%, usually applicable for futures trading where trades are squared off without delivery.Wherever the delivery is involved, the brokerage fees are significantly higher.Thus investor has to pay more attention in minimizing transaction cost for trading success.In the study, only moving averages are used as inputs to the network.Many other indicators; both technical analysis indicators and fundamental analysis ratios can also be used as inputs to the artificial neural network model to improve the investment performance.Configuration of the neural network model and node relationships can also be altered for further development. 1

Table 2 .
Comparing profits of Neural Network model and Buy and hold strategy(For Junior-Nifty)

Table 4 .
Comparison of Sharpe Ratio (For Nifty)

Table 5 .
Comparison of Sharpe Ratio (For Junior-Nifty)

Table 7 .
Sharpe Ratio and t-statistic of Financial Series * Significant at 1%. Figure 1.Schematic diagram of an Artificial Neural Network