A Bayesian Approach for Large Asset Allocation

The Black-Litterman model combines an investor's personal views with historical data to produce optimal portfolio weights. Andrei and Hsu (2020) reviewed the original Black-Litterman model and recast it in a Bayesian framework for a fixed number of assets, using the idea of Leonard and Hsu (1992) of placing a multivariate normal prior on the logarithm of the covariance matrix. When implemented for a large number of assets, such as all the S&P 500 companies, that approach runs into memory allocation and running time issues. In this paper, we reduce the dimension by considering Bayesian factor models, which makes the asset allocation problem tractable for a large number of assets. In addition, we conduct a sensitivity analysis for the confidence levels that the investor has to input.


Black-Litterman Asset Allocation Model
The Black-Litterman asset allocation model, developed by Black and Litterman in 1992, has been widely used for decades; it is presented in more detail in He and Litterman (2002). Suppose that m assets in the market are considered. The returns of those assets $r = (r_1, r_2, \ldots, r_m)^T$ follow a multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$, that is, $r \sim N_m(\mu, \Sigma)$. Black and Litterman proposed the following CAPM (Capital Asset Pricing Model) prior for the mean of the returns:
$$\mu \sim N_m(\pi, \tau \Sigma),$$
where $\pi$ is the vector of equilibrium risk premiums and $\tau$ indicates the uncertainty of the CAPM prior. In addition to the CAPM prior, they also take the investor's views into consideration. Suppose that the investor has k views. These views can be expressed as
$$P\mu \sim N_k(q_0, \Omega),$$
where the matrix $P$ is $k \times m$, $q_0$ is a $k \times 1$ vector, and $\Omega$ is a $k \times k$ matrix, usually diagonal. Each row of $P$, together with the corresponding entry of $q_0$, represents one personal view.
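For illustration, the following minimal numerical sketch combines the CAPM prior and the views into the classical Black-Litterman estimate of the expected returns. All numbers below are hypothetical, and the formula is the standard one from the Black-Litterman literature, not part of the Bayesian model developed later in this paper.

```python
import numpy as np

# Hypothetical example with m = 3 assets and k = 1 view.
pi = np.array([0.05, 0.04, 0.06])            # equilibrium expected returns (CAPM prior mean)
Sigma = np.array([[0.040, 0.006, 0.010],
                  [0.006, 0.025, 0.004],
                  [0.010, 0.004, 0.050]])     # covariance matrix of the returns
tau = 0.05                                    # uncertainty of the CAPM prior

P = np.array([[1.0, -1.0, 0.0]])              # one view: asset 1 outperforms asset 2
q0 = np.array([0.02])                         # ... by 2%
Omega = np.diag([0.001])                      # confidence (variance) of the view

# Standard Black-Litterman combined estimate of the expected returns:
# mu_BL = [(tau Sigma)^-1 + P' Omega^-1 P]^-1 [(tau Sigma)^-1 pi + P' Omega^-1 q0]
tS_inv = np.linalg.inv(tau * Sigma)
O_inv = np.linalg.inv(Omega)
mu_BL = np.linalg.solve(tS_inv + P.T @ O_inv @ P, tS_inv @ pi + P.T @ O_inv @ q0)
print(mu_BL)
```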

Introduction
Black and Litterman derived the corresponding optimal portfolio weights, which can be computed by plugging all the involved parameters into their probability model. Along the same lines, Andrei and Hsu (2020) considered a Bayesian statistical model for asset allocation with a fixed number of assets. They assumed that the n returns $r_1, r_2, \ldots, r_n$ are independent, where each $r_i$ is a vector containing the returns of the m assets and is normally distributed as in equation (1), for $i \in \{1, 2, \ldots, n\}$. The priors for $\mu$ and $\Sigma$ are assumed to be independent, where the prior on $\mu$ can be expressed through the investor's views as in equation (3). Following Leonard and Hsu (1992), they considered a multivariate normal prior for the logarithm of the covariance matrix, $A = \log(\Sigma)$. The covariance matrix involved in that multivariate normal distribution has size $\frac{1}{2}m(m+1) \times \frac{1}{2}m(m+1)$. Therefore, if one considers the whole S&P 500, this random matrix would take around 106 GB of memory. Because of this issue, we incorporate factors in order to reduce the dimension. Hence, as we will see in the following sections, after applying factor models we place a prior on the covariance matrix of the common factors instead of placing a prior directly on the covariance matrix of the returns. The covariance matrix of the common factors is $q \times q$, where q is the number of common factors, which is much smaller than $m \times m$, the dimension of the covariance matrix of the returns, where m is the number of assets.
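To make the dimension argument concrete, the short back-of-the-envelope computation below (our own sketch; m = 500 and q = 18 are the orders of magnitude used later in the paper) compares the storage required by the prior covariance matrix on the log of the return covariance with that required on the log of the factor covariance.

```python
def prior_cov_gigabytes(dim):
    """Memory (in GB, double precision) of the covariance matrix of the
    multivariate normal prior on the log of a dim x dim covariance matrix."""
    d = dim * (dim + 1) // 2      # free entries of a symmetric dim x dim matrix
    return d * d * 8 / 1e9        # the prior covariance matrix is d x d

m, q = 500, 18                    # number of assets vs. number of common factors
print(prior_cov_gigabytes(m))     # on the order of a hundred gigabytes
print(prior_cov_gigabytes(q))     # well under a megabyte
```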
Our work combines the two Bayesian versions of the Black-Litterman model from Andrei and Hsu (2020) with the work of Lee et al. (2007) and of Lee and Shi (2000).

Assumptions
We consider a factor model on n returns with m assets each and k investor's views:
$$r_i = \mu + \Lambda f_i + e_i, \qquad i \in \{1, 2, \ldots, n\},$$
where we introduced the following notation:
• $\mu$ is the vector of expected returns for each of the m assets;
• $\Lambda$ is the $m \times q$ loading matrix of the factor model;
• $f_i$ is the $q \times 1$ vector of common factors for the $i$th observation;
• $e_i$ are unobserved errors, which satisfy the following assumption:
$$e_i \stackrel{indep.}{\sim} N_m(0, \Psi), \qquad \Psi = \mathrm{diag}(\Psi_1, \ldots, \Psi_m). \quad (1)$$
Hence, from the above factor model, we obtain that the conditional distribution of the returns $r_i$, given the factors $f_i$, is
$$r_i \mid f_i, \mu, \Lambda, \Psi \sim N_m(\mu + \Lambda f_i, \Psi). \quad (2)$$
Next, we consider a two-stage prior for $\Lambda$ and $\Psi$:
$$\Lambda_j \mid \Psi_j \sim N_q(\Lambda_{0j}, \Psi_j H_j), \qquad \Psi_j \sim IG(\alpha_j, \beta_j) \quad \text{for all } j \in \{1, 2, \ldots, m\}.$$
Here, $\Lambda_j^T$ is the $j$th row of $\Lambda$.
Therefore, all the model assumptions are:
• $r_i \mid \mu, \Lambda, f_i, \Psi \stackrel{indep.}{\sim} N_m(\mu + \Lambda f_i, \Psi)$ for $i \in \{1, 2, \ldots, n\}$;
• $f_i \mid \Phi \stackrel{indep.}{\sim} N_q(0, \Phi)$;
• the prior on $\mu$ encodes the investor's views, as in equation (3);
• the prior on $\alpha = \mathrm{Vec}^*(\log(\Phi))$ is the multivariate normal prior of Leonard and Hsu (1992), with hyper-parameters $\sigma_1^2$ and $\sigma_2^2$;
• $\Lambda_j \mid \Psi_j \stackrel{indep.}{\sim} N_q(\Lambda_{0j}, \Psi_j H_j)$ and $\Psi_j \stackrel{indep.}{\sim} IG(\alpha_j, \beta_j)$ for $j \in \{1, 2, \ldots, m\}$.
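A minimal simulation sketch of this generative structure, with hypothetical values for m, q, µ, Λ, Φ and Ψ, may help fix the notation.

```python
import numpy as np

rng = np.random.default_rng(0)
m, q, n = 5, 2, 250                              # assets, common factors, observations

mu = rng.normal(0.0005, 0.0002, size=m)          # expected daily returns
Lambda = rng.normal(0.0, 0.1, size=(m, q))       # m x q factor loading matrix
Phi = np.diag(rng.uniform(0.5, 1.5, size=q))     # covariance of the common factors
Psi = np.diag(rng.uniform(1e-4, 5e-4, size=m))   # diagonal covariance of the errors

# r_i = mu + Lambda f_i + e_i,  f_i ~ N_q(0, Phi),  e_i ~ N_m(0, Psi)
f = rng.multivariate_normal(np.zeros(q), Phi, size=n)   # n x q matrix of factors
e = rng.multivariate_normal(np.zeros(m), Psi, size=n)   # n x m matrix of errors
r = mu + f @ Lambda.T + e                                # n x m matrix of returns
```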

Posterior Distributions
(a) Approximate conditional posterior for $\alpha$ given $\mu$, $\sigma_1^2$, $\sigma_2^2$, $r_1, \ldots, r_n$. Following (Andrei & Hsu, 2020), with the common factors playing the role of the returns, we can approximate the conditional posterior distribution of $\alpha$ given $\sigma_1^2$ and $\sigma_2^2$ by a normal distribution with mean vector $\alpha^*$ and covariance matrix $(Q + G)^{-1}$. Hence, we obtain the expressions (5) and (6) for $\alpha^*$, $G$ and $Q$; the matrix $Q$ is computed in the following way. If we let $e_i$ and $d_i$ be the $i$th normalized eigenvector of $S = \frac{1}{n}\sum_{i=1}^{n}(r_i - \mu)(r_i - \mu)^T$ and its corresponding eigenvalue, respectively, then $h_{ij}$ is obtained from the equation $\mathrm{Vec}^*(\log(\Phi))^T h_{ij} = e_i^T \log(\Phi)\, e_j$ by identifying the coefficients of the entries of the $\log(\Phi)$ matrix. With those $h_{ij}$, we can finally compute $Q$.

(b) Conditional posteriors for $\sigma_1^2$ and $\sigma_2^2$ given $\alpha$. These follow (Andrei & Hsu, 2020), with the number of stocks $m$ replaced by the number of common factors $q$ and with $d = \frac{1}{2}q(q+1)$.

(c) Updated density for $f_i$. Let us now find the updated density for $f_i$ and focus on the term in the exponential; for simplicity of formulas, we drop the $\exp\{\cdot\}$ and the $-\frac{1}{2}$ factor. We will repeatedly make use of the following lemma.

Lemma 1. Let $M$ be a symmetric and invertible matrix. Then, for vectors $x$ and $b$ of matching dimension,
$$x^T M x - 2\, b^T x = (x - M^{-1}b)^T M (x - M^{-1}b) - b^T M^{-1} b.$$

Applying Lemma 1, the term in the exponential of the posterior of $f_i$, multiplied by $-\frac{1}{2}$, splits into two terms, of which only the first depends on $f_i$, and that term is the kernel of a normal distribution. Therefore, we obtain that
$$f_i \mid r_i, \mu, \Lambda, \Psi, \Phi \sim N_q\!\left( \left(\Lambda^T \Psi^{-1} \Lambda + \Phi^{-1}\right)^{-1} \Lambda^T \Psi^{-1} (r_i - \mu),\ \left(\Lambda^T \Psi^{-1} \Lambda + \Phi^{-1}\right)^{-1} \right).$$

(d) Posterior for $\mu$. Next, we are ready to find the posterior for $\mu$. If we focus only on the first exponential, we can apply the usual trick of subtracting and adding $r^*$; then, by using Lemma 1, we obtain that the conditional posterior for $\mu$ is again a normal distribution.

Let us now focus our attention on the last two exponentials. We notice that one sum is over the columns (the one over $i$), while the other is over the rows (the one over $j$). However, we can write the sum over $i$ as a sum over $j$, using the following notation:
• $r_{j\cdot}$ is the $j$th row of the matrix of returns $R = (r_1 \; \cdots \; r_n)$;
• $F = (f_1 \; \cdots \; f_n)$ is the matrix whose columns are the common factors;
• $\mu_j$ is the $j$th entry of the vector of means $\mu$;
• $\Lambda_j$ is the $j$th row of the matrix $\Lambda$.
(e1) Posterior for $\Lambda$. Since we managed to change the summation so that it is with respect to the rows, we can now combine the last two exponentials of the joint posterior density presented above. Only the last two exponentials in equation (8) depend on $\Lambda$, and for simplicity we omit the $-\frac{1}{2}$ factor. Since the only terms that depend on $\Lambda_j$ are the first two, we can focus on them, and they will give us the posterior. Applying Lemma 1 once more, we obtain that the posterior for $\Lambda_j$ is proportional to the kernel of a normal density. That is, given $\Psi_j$, the conditional posterior for $\Lambda_j$ is a normal distribution with covariance matrix $\Psi_j \Omega_j$, whose mean vector and matrix $\Omega_j$ follow from completing the square with Lemma 1.

(e2) Posterior for $\Psi$. If we go back to equation (8) and collect the terms that depend on $\Psi$, we obtain that the conditional posterior for each $\Psi_j$ is an inverse gamma distribution with updated parameters.
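Under the priors stated above, these conditional posteriors are standard conjugate updates, and the sketch below writes them out in code. The function names, the storage of the diagonal $\Psi$ as a vector, and the exact parametrization of the inverse-gamma update (here, the full conditional given $\Lambda_j$) are our own choices and should be checked against the derivations above.

```python
import numpy as np
from numpy.linalg import inv

rng = np.random.default_rng()

def draw_factor(r_i, mu, Lam, Psi_diag, Phi):
    """Draw f_i | r_i, mu, Lambda, Psi, Phi (normal-normal conjugacy)."""
    Psi_inv = np.diag(1.0 / Psi_diag)
    V = inv(Lam.T @ Psi_inv @ Lam + inv(Phi))            # posterior covariance of f_i
    return rng.multivariate_normal(V @ Lam.T @ Psi_inv @ (r_i - mu), V)

def draw_loading_row(y_j, F, Lam0_j, H_j, Psi_j):
    """Draw Lambda_j | Psi_j, ...; y_j = r_{j.} - mu_j (length n) and F is q x n."""
    Omega_j = inv(inv(H_j) + F @ F.T)
    a_j = Omega_j @ (inv(H_j) @ Lam0_j + F @ y_j)        # posterior mean of Lambda_j
    return rng.multivariate_normal(a_j, Psi_j * Omega_j)

def draw_psi_j(y_j, F, Lam_j, Lam0_j, H_j, alpha_j, beta_j):
    """Draw Psi_j | Lambda_j, ... from its inverse-gamma full conditional."""
    n, q = y_j.shape[0], Lam_j.shape[0]
    resid = y_j - F.T @ Lam_j                            # residuals of row j
    shape = alpha_j + 0.5 * (n + q)
    scale = (beta_j + 0.5 * resid @ resid
             + 0.5 * (Lam_j - Lam0_j) @ inv(H_j) @ (Lam_j - Lam0_j))
    return 1.0 / rng.gamma(shape, 1.0 / scale)           # inverse gamma via 1 / Gamma
```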

Implementation
Now that we have derived our posteriors, we are ready to implement them using a Gibbs sampler. We will use a Metropolis-Hastings algorithm for sampling $\alpha$, for which we need both the exact posterior distribution and the approximation obtained with the Volterra integral equation. This approach was introduced by Leonard and Hsu (1992) and Albert et al. (2000) and used in Andrei and Hsu (2020). Given $\sigma_1^2$, $\sigma_2^2$ and $\mu$, the posterior for $\alpha$ can be approximated by a normal density with mean vector $\alpha^*$ and covariance matrix $(Q + G)^{-1}$, where $\alpha^*$, $G$ and $Q$ are defined in (5) and (6), respectively; the exact conditional posterior given $\sigma_1^2$, $\sigma_2^2$ and $\mu$ is available up to a normalizing constant. The Metropolis-Hastings step at each iteration simulates a candidate value from the approximate posterior distribution, $\alpha_{cand} \sim N(\alpha^*, (Q + G)^{-1})$, and accepts it with probability $\min(\rho, 1)$, where $\rho$ is the ratio of the exact conditional posterior to the approximating normal density evaluated at the candidate value, divided by the same ratio evaluated at the current value of $\alpha$.
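A sketch of this independence Metropolis-Hastings step is given below; log_exact_posterior and log_approx_density are stand-ins for the logarithms of the exact conditional posterior of $\alpha$ (up to a constant) and of the approximating normal density.

```python
import numpy as np

def mh_step_alpha(alpha_old, alpha_star, cov_approx,
                  log_exact_posterior, log_approx_density, rng):
    """One independence Metropolis-Hastings update for alpha, proposing from the
    normal approximation N(alpha_star, (Q + G)^-1) of the conditional posterior."""
    alpha_cand = rng.multivariate_normal(alpha_star, cov_approx)
    # log of rho: exact-posterior ratio times the reversed proposal-density ratio
    log_rho = (log_exact_posterior(alpha_cand) - log_exact_posterior(alpha_old)
               + log_approx_density(alpha_old) - log_approx_density(alpha_cand))
    if np.log(rng.uniform()) < min(log_rho, 0.0):
        return alpha_cand            # accept the candidate
    return alpha_old                 # otherwise keep the current value
```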

It is useful at this point to remember that there is a connection between $\pi^*$ and $\pi$, since there is one between $A$ and $\alpha$. Although the Gibbs sampler converges to the same distribution regardless of the starting points, we should try to initialize it with good estimates. We also have to make sure that we specify the hyper-parameters with values that make sense in the real world (a code sketch of these initializations follows the list below):
• $n$ = the number of returns in the historical dataset, that is, the number of daily returns from 1/2/2014 to 12/29/2017.
• $\hat{F}_{init} = (\hat{f}_1 \; \hat{f}_2 \; \cdots \; \hat{f}_n)$, where $\hat{f}_i$, $i \in \{1, 2, \ldots, n\}$, are the common factors obtained by fitting a factor model to the historical dataset, with the number of factors q = 18 determined from a scree plot of the eigenvalues.
• We also have that $f_i \mid \Phi \stackrel{indep.}{\sim} N_q(0, \Phi)$. In order to specify $\hat{\Phi}_{init}$, we take the sample covariance of the common factors found above: $\hat{\Phi}_{init} = \mathrm{Cov}(\hat{f}_i)$.
• We also have the prior on $\alpha = \mathrm{Vec}^*(\log(\Phi))$ with hyper-parameters $\sigma_1^2$ and $\sigma_2^2$.
- In order to initialize $\sigma_1^2$, we take the variance of the first q entries of $\mathrm{Vec}^*(\log(\hat{\Phi}_{init}))$.
- In order to initialize $\sigma_2^2$, we take the variance of the last d − q entries of $\mathrm{Vec}^*(\log(\hat{\Phi}_{init}))$.
• We also remember that $\Lambda_j \mid \Psi_j \stackrel{indep.}{\sim} N_q(\Lambda_{0j}, \Psi_j H_j)$. Since, in general, we do not have any prior information on the factor loadings, we specify the hyper-parameters accordingly:
- We initialize the variance $\Psi_j H_j$ with a large value: $\hat{\Psi}_{j,init} = 1$ and $\hat{H}_{j,init} = 10^{10} I_q$.
• Also, we remember that $\Psi_j \stackrel{indep.}{\sim} IG(\alpha_j, \beta_j)$ for all $j \in \{1, 2, \ldots, m\}$. Similarly to the previous point, in the real world we do not have any prior information on $\Psi_j$, and this should be reflected in our choice of $\alpha_j$ and $\beta_j$. If we let $\alpha_j \to 0$ and $\beta_j \to 0$ in the pdf of the $IG(\alpha_j, \beta_j)$, we obtain an uninformative prior. Therefore, we initialize $\hat{\alpha}_{j,init} = \hat{\beta}_{j,init} = 10^{-10}$.
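A condensed sketch of these initializations is given below. It uses scikit-learn's FactorAnalysis and scipy's matrix logarithm purely for illustration, assumes that the Vec* ordering places the diagonal entries first, and sets the prior means $\Lambda_{0j}$ to zero as an assumption; the original implementation may differ in these details.

```python
import numpy as np
from scipy.linalg import logm
from sklearn.decomposition import FactorAnalysis

def vec_star(A):
    """Half-vectorization of a symmetric matrix: diagonal entries first, then the
    upper-triangular off-diagonal entries (assumed ordering of Vec*)."""
    return np.concatenate([np.diag(A), A[np.triu_indices(A.shape[0], k=1)]])

def initialize(R, q=18):
    """R is the n x m matrix of historical returns (1/2/2014 to 12/29/2017)."""
    fa = FactorAnalysis(n_components=q).fit(R)
    F_init = fa.transform(R)                       # n x q estimated common factors
    Phi_init = np.cov(F_init, rowvar=False)        # initial factor covariance
    alpha_init = vec_star(logm(Phi_init).real)     # initial value of alpha
    sigma1_sq = np.var(alpha_init[:q])             # variance of the first q entries
    sigma2_sq = np.var(alpha_init[q:])             # variance of the last d - q entries
    m = R.shape[1]
    Lambda0 = np.zeros((m, q))                     # assumed vague prior means Lambda_0j
    H_init = 1e10 * np.eye(q)                      # H_j init: very diffuse, 10^10 I_q
    Psi_init = np.ones(m)                          # Psi_j init = 1 for every j
    a_init = b_init = 1e-10                        # near-uninformative IG(alpha_j, beta_j)
    return (F_init, Phi_init, alpha_init, sigma1_sq, sigma2_sq,
            Lambda0, H_init, Psi_init, a_init, b_init)
```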
Using the Metropolis-Hastings step that was just discussed, we arrive at the Gibbs sampler summarized in Algorithm 1. (Algorithm 1, Gibbs Sampler: among its steps, at iteration t + 1 it computes the eigenvalues $d_j^{(t+1)}$ and normalized eigenvectors $e_j^{(t+1)}$ of $S_f^{(t+1)}$, and obtains the vectors $h_{ij}^{(t+1)}$ by identifying the coefficients of the entries of the $\log(\Phi)$ matrix.)
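The schematic below (our own outline, not the paper's exact listing) shows how one iteration of such a Gibbs sampler cycles through the conditional updates derived above; each entry of updates would be one of the conditional draws (for f, µ, Λ, Ψ, α via the Metropolis-Hastings step, Φ, and the σ² hyper-parameters).

```python
def gibbs_sampler(R, init_state, updates, n_burn=10**3, n_iter=10**4):
    """Schematic Gibbs sampler: `updates` maps a parameter name (e.g. "f", "mu",
    "Lambda", "Psi", "alpha", "Phi", "sigmas") to a function (R, state) -> draw
    implementing the corresponding conditional posterior derived above."""
    state = dict(init_state)
    draws = []
    for t in range(n_burn + n_iter):
        for name, update in updates.items():   # one full sweep over the conditionals
            state[name] = update(R, state)
        if t >= n_burn:                        # keep only the post burn-in draws
            draws.append(dict(state))
    return draws
```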

Results-Personal Views on 4 Stocks
We would like to conduct a sensitivity analysis for the confidence in the investor's views (the diagonal elements of the $\Omega$ matrix, denoted by $\omega_i$). We expect the model to behave in a similar manner as before:
• The more confident the investor is in the inputted views, the more closely the model should follow them.
• The less confident the investor is in the inputted views, the more closely the model should follow the historical data.
We choose to have views on the following 4 stocks: AAPL, FB, GOOG and MSFT, and we will consider views on industry sectors in the next section. We use the daily returns from 1/2/2014 to 12/29/2017 and create the corresponding investor inputs (the columns are in the order AAPL, FB, GOOG, MSFT and the rows represent the views). Note that the matrix P in our implementation has many more columns (one for each stock actively traded in the S&P 500), but the vast majority of its entries are 0. Although the memory allocation problem that originally motivated the incorporation of the Bayesian factor model is solved, the version presented in this paper is still computationally expensive, since we have to sample from many distributions. Therefore, an exhaustive search was run in parallel on multiple cores (each core running the Gibbs sampler for one pair $(\omega_1, \omega_2)$; the grid was split into 16 evenly spaced ranges, each running 6 simulations). The burn-in period is $10^3$ and the iteration period is $10^4$.
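As an illustration of how such a search can be parallelized, the sketch below distributes $(\omega_1, \omega_2)$ pairs over worker processes with Python's multiprocessing; run_one_pair is a placeholder for a full Gibbs run at a given pair, and the grid values are hypothetical.

```python
import itertools
import numpy as np
from multiprocessing import Pool

def run_one_pair(omegas):
    """Placeholder for a full Gibbs run with view variances (omega_1, omega_2);
    it should return the resulting distance ||P mu_post - q_0||_2."""
    omega_1, omega_2 = omegas
    return omega_1 + omega_2                       # stand-in for the sampler output

if __name__ == "__main__":
    omega_grid = np.geomspace(1e-6, 1e-1, 16)      # hypothetical grid of confidence levels
    pairs = list(itertools.product(omega_grid, repeat=2))
    with Pool() as pool:                           # one (omega_1, omega_2) pair per task
        distances = pool.map(run_one_pair, pairs)
```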
In the following plot, two of the axes are the two confidence levels ($\omega_1$ and $\omega_2$) and the third is the $\ell_2$ distance $\|P\mu_{post} - q_0\|_2$. As mentioned previously, this distance should go to 0 as $\omega_1$ and $\omega_2$ go to 0, which can easily be observed in Figure 1. Furthermore, similar to the versions introduced previously, as $\omega_1$ and $\omega_2$ increase the distance converges to the same number. Since $\omega_1$ and $\omega_2$ are standard deviations, a high standard deviation represents a lack of confidence in the inputted personal views, so, intuitively, the model should only take the history into consideration. This is precisely how the model behaves. If we consider only the historical returns, the unbiased estimator of $\mu$ is the sample mean of the returns, $\bar{r}$. The distance $\|P\bar{r} - q_0\|_2 = 0.05386381$, which is the level at which the curve in Figure 1 flattens.
One could use this model to hold a portfolio with an initial capital of $100,000 over a testing dataset consisting of the daily returns during January 2018. In order to obtain portfolio weights, we estimate $\Sigma_{post} = \Lambda_{post} \Phi_{post} \Lambda_{post}^T + \Psi_{post}$ from the Gibbs sampler and we try to maximize the portfolio return while minimizing the portfolio risk. Hence, we would like to find $\max_w \; w^T \mu_{post} - \frac{\lambda}{2} w^T \Sigma_{post} w$, where $\lambda$ is the investor's risk aversion coefficient. In his paper, ? suggests that $\lambda = 2.5$ is a reasonable choice for equities. By setting the derivative with respect to $w$ equal to 0 and solving the resulting equation for $w$, we obtain $w^* = \frac{1}{2.5} \Sigma_{post}^{-1} \mu_{post}$. Without considering any fees, the profits on the January 2018 testing dataset, over all the previously mentioned combinations of confidence levels $(\omega_1, \omega_2)$, averaged $40,075.87 with a standard deviation of $14,373.88.
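A sketch of the weight and profit computation is given below; it mirrors the formula above, while the buy-and-hold accounting of the dollar profit over the test month is our own simplification.

```python
import numpy as np

def optimal_weights(mu_post, Sigma_post, risk_aversion=2.5):
    """w* = (1 / lambda) Sigma_post^{-1} mu_post from the first-order condition."""
    return np.linalg.solve(Sigma_post, mu_post) / risk_aversion

def test_period_profit(weights, test_returns, capital=100_000.0):
    """Dollar profit of holding positions capital * w over the daily test returns
    (one row per day, one column per stock); trading fees are ignored."""
    growth = np.prod(1.0 + test_returns, axis=0)   # cumulative gross return per stock
    return capital * weights @ (growth - 1.0)
```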

Results-Personal Views on Industry Sectors
In this section, we present results similar to those in the previous section, using the exact same training and testing datasets. The only change is in the personal views inputted into the model: this time we enter personal views about different industry sectors. The stocks in the S&P 500 are divided into 11 industry sectors, which are described in Appendix 2.6.
In order to have good personal views rather than just random guesses, as we have used so far, we use the weighting recommendations provided by CFRA, an independent fundamental and forensic investment research firm. Within each sector, every stock receives an equal weight; the weights in a sector sum to 1 in absolute value, with positive weights for the sectors expected to outperform and negative weights for the ones expected to under-perform. We will have the following 4 personal views:
(1) Information technology outperforms utilities by 0.2% with confidence level $\omega_1$.
We picked 2 distinct confidence levels for the 4 views simply because we wanted another 3D plot, with two of the axes given by $\omega_1$ and $\omega_2$ and the third by the distance $\|P\mu_{post} - q_0\|_2$. Just as before, the exhaustive search was run in parallel on multiple cores, with the same burn-in and iteration periods.
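For concreteness, one way to build a sector view row of P with the equal-weight convention described above is sketched below; the tickers and sector memberships are only a small hypothetical subset of the S&P 500.

```python
import numpy as np

def sector_view_row(tickers, outperform, underperform):
    """One row of P: equal positive weights (summing to +1) on the outperforming
    sector and equal negative weights (summing to -1) on the under-performing one."""
    row = np.zeros(len(tickers))
    for i, ticker in enumerate(tickers):
        if ticker in outperform:
            row[i] = 1.0 / len(outperform)
        elif ticker in underperform:
            row[i] = -1.0 / len(underperform)
    return row

# Hypothetical sector memberships; the real lists cover all S&P 500 constituents.
info_tech = {"AAPL", "MSFT", "NVDA"}
utilities = {"DUK", "SO"}
tickers = ["AAPL", "MSFT", "NVDA", "DUK", "SO"]
P_row = sector_view_row(tickers, info_tech, utilities)   # view (1) above
q0_entry = 0.002                                          # "outperforms by 0.2%"
```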

Figure 2. Distance when considering industry sectors
Again, the model behaves exactly as our intuition suggests. As $\omega_1$ and $\omega_2$ go to 0, the distance $\|P\mu_{post} - q_0\|_2$ converges to 0. Moreover, for larger values of $\omega_1$ and $\omega_2$ (small confidence in the views), the distance converges to $0.004999748 = \|P\bar{r} - q_0\|_2$. This confirms our intuition that the less confident the investor is in his or her views, the more the model takes the history into consideration.
Moving on to the profits, we used the same starting capital of $100,000, the same testing dataset over January 2018, and the same methodology for computing the portfolio weights. The mean profit over all the simulated pairs $(\omega_1, \omega_2)$ was $37,576.68, with a standard deviation of $5,857.198.