Donald R. Burleson, Ph.D.
Copyright (c) 2005 by Donald R. Burleson. All rights reserved.





                Given a square (n-by-n) matrix A, the zeros λi of the nth-degree polynomial


det(A - λI)   (where I represents the identity matrix of the same size as A)


are the eigenvalues of A, and to each eigenvalue λi there corresponds an eigenvector vi (a vector unique up to a scalar multiple), such that for 1 ≤ i ≤ n,


Avi = λivi .


                That is, the nature of an eigenvalue and its corresponding eigenvector is that multiplying the matrix A onto the vector has the same effect as multiplying the number λi onto the vector.  Also,


A^2 vi = A(Avi) = A(λivi) = λi(Avi) = λi(λivi) = λi^2 vi


and in fact one may easily show by induction on n that


A^n vi = λi^n vi


for all positive integers n.  Thus we have, in this context, the very economical substitution of a mere power of the number λi for the more laboriously computed power A^n of the given matrix, having the same multiplicative effect on the eigenvector.  By extension it has the same multiplicative effect on any vector in the same space, when that vector is written in terms of an eigenbasis {vi}: for w = Σ ki vi, in terms of appropriate basis-representation scalar coefficients ki, we have A^n w = Σ ki λi^n vi, since A represents a linear transformation.
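This substitution of eigenvalue powers for matrix powers is easy to check numerically. The sketch below uses an arbitrary 2×2 matrix and exponent (illustrative choices, not prescribed by the text) and numpy's eigendecomposition:

```python
import numpy as np

# Illustrative 2x2 matrix with distinct eigenvalues (arbitrary choice).
A = np.array([[3.0, 1.0],
              [2.0, 2.0]])

# lam[i] are the eigenvalues, V[:, i] the corresponding eigenvectors.
lam, V = np.linalg.eig(A)

n = 5
# Raise only the numbers lam[i] to the power n, then reassemble:
# A^n = V diag(lam^n) V^(-1).
A_n = V @ np.diag(lam ** n) @ np.linalg.inv(V)

# Same result as multiplying the matrix out n times.
assert np.allclose(A_n, np.linalg.matrix_power(A, n))
```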


                It is natural to consider generalizing the relation A^n vi = λi^n vi to non-integer values of the exponent n.  [Remark: By the method about to be described, for a nonsingular matrix A, putting n = -1 simply produces the multiplicative inverse matrix A^(-1) by a somewhat circuitous route; all other negative-integer values of n are of course powers of A^(-1).]


                Let’s look at the case n = 1/2, so that the relation in question becomes


A^(1/2) vi = λi^(1/2) vi .


The “square root” of the matrix A should reasonably be defined as a matrix M such that MM = A and such that M = A^(1/2) satisfies the above relation for any given eigenvalue and corresponding eigenvector.  And indeed it turns out that the eigenvalues of A provide a way of computing it.


Take for example the 2×2 matrix


            A  =  [ 3  1 ]
                  [ 2  2 ]


The eigenvalues λi are the zeros of the characteristic polynomial of A:



det(A - λI)  =  det [ 3-λ   1  ]  =  (3-λ)(2-λ) - 2(1)  =  λ^2 - 5λ + 4  =  (λ-4)(λ-1)
                    [  2   2-λ ]

so that the spectrum of A, i.e. the set of all eigenvalues of A, is { λ1 = 4, λ2 = 1 }.  Let an eigenvector corresponding to λ1 be (x,y).  By putting


[ 3  1 ] [ x ]  =  4 [ x ]
[ 2  2 ] [ y ]       [ y ]


so that 3x + y = 4x (and equivalently 2x + 2y = 4y), we have y = x, and we may most simply take x = y = 1, so that the corresponding eigenvector (x,y) is v1 = (1,1).


                In like manner for the other eigenvalue λ2 we may derive the eigenvector v2 = (1,-2).


                If we denote the desired square root matrix A^(1/2) as

[ a  b ]
[ c  d ]

with a view to finding the elements a, b, c, and d, we have, from the relation A^(1/2) v1 = λ1^(1/2) v1:


[ a  b ] [ 1 ]  =  2 [ 1 ]
[ c  d ] [ 1 ]       [ 1 ]


which by matrix multiplication implies a + b = 2 and c + d = 2.  Similarly, from A^(1/2) v2 = λ2^(1/2) v2 we have


[ a  b ] [  1 ]  =  1 [  1 ]
[ c  d ] [ -2 ]       [ -2 ]


which implies a - 2b = 1 and c - 2d = -2.  Solving the a,b equations together and the c,d equations together gives a = 5/3, b = 1/3, c = 2/3, and d = 4/3, so that the desired square root matrix is


A^(1/2)  =  [ 5/3  1/3 ]
            [ 2/3  4/3 ]



As a check, since we should have MM = A, we may multiply:


[ 5/3  1/3 ] [ 5/3  1/3 ]  =  [ 3  1 ]  =  A.
[ 2/3  4/3 ] [ 2/3  4/3 ]     [ 2  2 ]
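The same eigenvalue construction can be carried out by a short routine; the helper name below is assumed for illustration, not a standard library function:

```python
import numpy as np

def sqrtm_eig(A):
    """Square root of a matrix with distinct eigenvalues, built by
    replacing each eigenvalue with its square root (illustrative sketch)."""
    lam, V = np.linalg.eig(A)
    # astype(complex) lets negative eigenvalues produce complex roots.
    return V @ np.diag(np.sqrt(lam.astype(complex))) @ np.linalg.inv(V)

# The 2x2 example worked by hand in the text.
A = np.array([[3.0, 1.0],
              [2.0, 2.0]])
M = sqrtm_eig(A)

assert np.allclose(M @ M, A)                       # MM = A
assert np.allclose(M, [[5/3, 1/3], [2/3, 4/3]])    # matches the hand computation
```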
                Different sorts of eigenvalues may produce different sorts of results.  (For present purposes we will look only at scenarios with distinct eigenvalues, though similar results may be pursued in other cases.)  For example, the matrix



[  2  -1 ]
[ -6   1 ]


has eigenvalue spectrum { λ1 = 4, λ2 = -1 } with corresponding eigenvectors v1 = (1,-2) and v2 = (1,3), and since one of the eigenvalues is negative, with λ2^(1/2) = (-1)^(1/2) = i, the square root matrix turns out to have complex entries when we work it out.  One obtains



(1/5) [  6+2i   -2+i ]
      [ -12+6i  4+3i ]


which, as one may verify, checks upon multiplication: squaring this matrix reproduces the original matrix.


                Naturally the same process applies to larger square matrices.  A triangular matrix, for instance, has eigenvalues equal to its diagonal elements, so for a 3×3 triangular matrix C with diagonal entries 1, 9, and 4 the same procedure applies, the needed square roots of the eigenvalues being 1, 3, and 2.  (The only limitation here, with larger matrices, is the difficulty of finding the zeros of the characteristic polynomial; for degree n ≥ 5 they can in general only be approximated.)


                One may of course pursue higher-index “roots” of a square matrix A, e.g. A^(1/3), A^(1/4), etc., by working with the appropriate fractional power.  For example, with n = 1/3, the relation A^(1/3) vi = λi^(1/3) vi yields a cube root matrix, and the check is that multiplying this matrix onto itself three times reproduces A.
                In a similar fashion we may compute noninteger square-matrix powers in general, as they only involve raising each eigenvalue to the power in question and following the procedure shown.


                Remark: By using the identity (where i denotes the imaginary unit) λ^i = (e^(ln λ))^i = e^(i ln λ) = cos[ln(λ)] + i sin[ln(λ)] (where if λ < 0 we may use the principal complex value of the logarithm, i.e. ln(|λ|) + πi), one may extend the determination of A^z to complex powers: A^z = A^(x+iy) = A^x (A^y)^i.  In this connection it is interesting to note that for a nonsingular square matrix A, if we use the indicated method to raise A to the power i and then raise the resulting matrix A^i to the power i, producing (A^i)^i = A^(-1), all in terms of principal complex value, the result is the ordinary inverse of A.  Since the action of raising the nonsingular matrix A to the power i twice in succession has the effect of producing A^(-1), we may call the matrix A^i the semi-inverse of A.  For purposes of raising the eigenvalues to the power i, it is helpful to recall that in principal value 1^i = e^(i ln 1) = 1 and (-1)^i = e^(i·iπ) = e^(-π).
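The semi-inverse remark can be tested numerically. In this sketch, `mat_power` is a hypothetical helper (not a library routine) and the nonsingular 2×2 matrix is an arbitrary example:

```python
import numpy as np

def mat_power(A, z):
    """A raised to the (possibly complex) power z, for a matrix with
    distinct eigenvalues, using principal values of the logarithm.
    (Hypothetical helper for illustration.)"""
    lam, V = np.linalg.eig(A)
    lam_z = np.exp(z * np.log(lam.astype(complex)))  # principal value
    return V @ np.diag(lam_z) @ np.linalg.inv(V)

A = np.array([[3.0, 1.0],
              [2.0, 2.0]])

semi = mat_power(A, 1j)     # the "semi-inverse" A^i
# Raising to the power i twice in succession yields (A^i)^i = A^(-1).
assert np.allclose(mat_power(semi, 1j), np.linalg.inv(A))
```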


                But let us look at an important application of the eigenvalue-based method of raising a square matrix A to real but non-integer powers.




Computing a market-share transition matrix from knowledge of cumulative market-share changes over two or more transition periods


                Consider, by way of quick illustration, a simple economic scenario in which there are two brands A and B of a certain product.  Suppose that over any given “transition” period (a period in which buyers may or may not switch brands, e.g. possibly a week, since many people shop weekly) there is a constant pattern of consumer “defection” or non-defection, defined by the following column-stochastic transition matrix T:


T  =  [ 0.7  0.2 ]
      [ 0.3  0.8 ]
This means that, over one transition period, 70% of A-users will stay with Brand A while 30% will defect to Brand B; and of B-users, 20% will defect to A and 80% will stick with B.  (The first column is the “from A” column; the second column is the “from B” column; the first row is the “to A” row; the second row is the “to B” row.)  If the current market share configuration is 81% for A and 19% for B, the matrix multiplication


[ 0.7  0.2 ] [ 0.81 ]  =  [ 0.605 ]
[ 0.3  0.8 ] [ 0.19 ]     [ 0.395 ]
shows that after one transition period the market shares will be 60.5% for A and 39.5% for B, an improvement for B because B’s retention rate is better than A’s.  (Further transitions will show further improvement for B, but only up to a point.  It can be shown that if the transition pattern T remains steady over sufficient time, the market share sequence (Markov chain) approaches a kind of fixed market share vector (“steady-state” vector) of (0.40, 0.60).  I.e., eventually A will have 40% of the market and B will have 60%.  A’s situation will not deteriorate beyond that point, and B’s situation will not further improve.)
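A minimal numerical sketch of this convergence claim, iterating the transition matrix given above:

```python
import numpy as np

# Column-stochastic transition matrix from the text:
# columns are "from A" / "from B", rows are "to A" / "to B".
T = np.array([[0.7, 0.2],
              [0.3, 0.8]])

share = np.array([0.81, 0.19])   # current shares: 81% A, 19% B

share_1 = T @ share              # shares after one transition period
assert np.allclose(share_1, [0.605, 0.395])

# Iterating long enough approaches the steady-state vector (0.40, 0.60).
for _ in range(60):
    share = T @ share
assert np.allclose(share, [0.40, 0.60])
```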


                Now suppose the matrix


M  =  [ 0.55  0.30 ]
      [ 0.45  0.70 ]
is known to describe the market share changes after two transition periods.  (I.e., we have empirically observed that by final tally after two such periods, 55% of “going in” A-users have stuck with A while 45% have defected to B, and 30% of “going-in” B users have defected to A with 70% staying with B.)  Suppose also that the basic transition matrix T describing the market share changes over one typical transition period is unknown, and this is what we need to find.  (Among other things, it’s useful to recover T if one plans to project market share patterns over several further transition periods.)


                Since M describes the total effect of two standard transitions T, we have TT = M, and we need to compute T = M^(1/2), which by the above-illustrated eigenvalue method turns out to be


T  =  [ 0.7  0.2 ]
      [ 0.3  0.8 ]
                Thus from knowing the cumulative changes over two transition periods, we have been able to recover the transition structure that prevails over each single transition period, always assuming this structure to be constant for the duration in question.  If the recovered matrix structure T is indeed constant, it becomes a predictor of the effects over the next transition period, as well as subsequent ones.
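The recovery of T from M can be sketched with the same eigenvalue-based square root used earlier:

```python
import numpy as np

# Observed cumulative two-period matrix from the text.
M = np.array([[0.55, 0.30],
              [0.45, 0.70]])

# Square root via eigenvalues: both eigenvalues of M are positive here,
# so the recovered one-period matrix T is real.
lam, V = np.linalg.eig(M)
T = (V @ np.diag(np.sqrt(lam)) @ np.linalg.inv(V)).real

assert np.allclose(T @ T, M)                      # two transitions give M
assert np.allclose(T, [[0.7, 0.2], [0.3, 0.8]])   # the one-period matrix
```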


                Similarly, if we knew that a matrix M described cumulative market share changes over three transition periods, we could recover the basic one-period transition matrix by computing M^(1/3), and similarly for higher-order iterations.


                In fact, if we knew, for example, that the market share changes after five periodic iterations were described by the matrix


[ 0.41875  0.3875 ]
[ 0.58125  0.6125 ]
and if for some reason we needed to find out what matrix describes the state of things at the end of the third of those five transition periods (where, again, the basic transition matrix T is unknown), we could use the above eigenvalue method to compute the 3/5 power of that matrix:


[ 0.475  0.35 ]
[ 0.525  0.65 ]
finding that by the end of the third (out of five) transition periods, brand A had a 47.5% brand retention to B’s 65.0%.  In this scenario, if the initial market shares are 81% (A) and 19% (B), we could recover the fact that after three transitions the market share configuration must have been


[ 0.475  0.35 ] [ 0.81 ]  =  [ 0.45125 ]
[ 0.525  0.65 ] [ 0.19 ]     [ 0.54875 ]


i.e., about 45.1% for A and 54.9% for B,
and this from knowing only the “end” conditions.  Clearly, in this manner, with a fixed transition structure T, whether initially known or not, we may recover the accumulated effects of market share changes at any point in the chain.  And of course we may obtain the original basic transition matrix T in similar fashion.
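As a closing sketch, assume a five-period cumulative matrix consistent with the retention figures quoted in the text (a hypothetical example); its 3/5 power, computed through the eigenvalues, gives the three-period picture:

```python
import numpy as np

# Hypothetical five-period cumulative matrix, chosen to be consistent
# with the retention figures quoted in the text.
M5 = np.array([[0.41875, 0.3875],
               [0.58125, 0.6125]])

# The three-period matrix is the 3/5 power, computed via the eigenvalues.
lam, V = np.linalg.eig(M5)
M3 = (V @ np.diag(lam ** (3 / 5)) @ np.linalg.inv(V)).real

assert np.allclose(M3, [[0.475, 0.35], [0.525, 0.65]])
# Shares after three periods, starting from 81% A / 19% B:
assert np.allclose(M3 @ [0.81, 0.19], [0.45125, 0.54875])
```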


                Obviously there are similar applications in any comparable Markov-chain situation.