Thursday, November 24, 2011

Quantum Mechanics

...or: a point of view


Introduction

Quantum Mechanic comes with a strange, non-intuitive way to look at the world. In the following text, I try to summarize one possible point of view on it. It is mostly a summary derived from a paper (XXX find reference) written around 1990. I know that quantum mechanics and related notions like determinism are, still nowadays, open to discussion: basically everybody agrees on the mathematics, but not necessarily on how to interpret the results. So this is just one possible point of view.


Basics

In a rather computer-scientisty simplification, let us assume that "the world" contains a finite number of "particles" that can each be in two possible states. For example, it can be elementary particles whose spin is either "up" or "down"; but the details don't matter here, so it can also stand for, say, an atom that is located either here or there.

Let us assume that there are n such particles. The classical model is to say that the state of the world is encoded by n binary digits; i.e. there are 2^n possible states. We can summarize a complete state with a tuple (a_1, ..., a_n) of boolean values, or encode it all in only one integer i between 0 and 2^n-1.

The quantic model, on the other hand, is more complicated. A "state of the world" is, instead of a tuple of n booleans, now a tuple of 2^n complex numbers. In other words, speaking linear algebra, it is a vector in the complex vector space of 2^n dimensions. Moreover, it is not any vector, but we restrict ourselves to the vectors of norm 1. (In addition, there is the rule that two vectors are considered equivalent if they are multiple of each other by a complex number. This gives us precisely the complex projective space of 2^n-1 dimensions. But it is a bit harder to work with, so I'll work with the vector space instead.)

The classical state i (for i between 0 and 2^n-1) corresponds to the quantum state vector where all components are zero, apart from the i'th component, which is one.

In particular, consider the situation where we have only one particle. In the classical model, the set of states is {0, 1}. In the quantic model, the set of states is the (much larger, because infinite) set of 2-vectors (a, b) of complex numbers of norm 1, under the rule of the projective space. (This is equivalent to taking only the 2-vectors of the shape (1, b), for any complex number b, plus one additional 2-vector (0, 1) that we may intuitively see as (1, infinity) renormalized. In other words this is topologically equivalent to a regular sphere.)

The theory can be developed purely on vectors like these ones, and we get the (simplified) physics of quantum mechanics this way. (The first simplification is obviously that we are in the finite case here, with only two classical states per particle.)


Operations

Now we are going to operate on the state. First we will assume that all operations are reversible. For example, in the classical model, consider the world of two particles (a, b), for a and b in {0, 1}. Assume that b is originally in the state 0, and we want to "measure" the state in which a is --- that is, we want to copy the state of a to b. This can be done using the following function:

               (0, 0)   |---->   (0, 0)
               (1, 0)   |---->   (1, 1)

For completeness, we also have to define what the function would do if we started with a state with b = 1, so let's say:

               (0, 1)   |---->   (0, 1)
               (1, 1)   |---->   (1, 0)

i.e. the function is

               (a, b)   |----> (a, b XOR a).

The idea is that we can measure, in the classical model, if some particle a is in the state 0 or 1 by repeating the operation above a large number of times: we copy the state of a to the first atom of our measuring device, and to the second atom, and to the third one, and so on and so forth, until our complete measuring device ends up as particles that are either all in the state 0 ("the big light of the measuring device is off") or all in the state 1 ("the big light of the measuring device is on").

The equivalent in the quantic model is straightforward: the four classical state (0, 0), (1, 0), (0, 1) and (1, 1) correspond to four quantum states that we are going to write e00, e10, e01 and e11. They make a basis of the 2^2-dimentional vector space. The four relations above give the image of these four basis vectors, so obviously the mapping can be extended as a linear application. We obtain in this way an orthonormal linear application. More generally, any orthonormal linear application can be used and represents some physical operation that we can apply onto the state.

So what do we get if we try, like above, to "measure" the state in which the first particle is, by copying it to a large number of other particles? There are actually more than two cases to consider, because the original particle can be in more than two (non-classical) states. But first, note that the two classical cases behave like before. If we work with 3 particles, we can denote by O1 the operation that copies the state from the first particle to the second and O2 the operation from the second to the third:

                    O1           O2
             e000  |-->  e000   |-->  e000
             e100  |-->  e110   |-->  e111

More generally, if we assume that the state of the first particle can be arbitrary, but the other particles start at their state "0", we get the following transformations, written as 2^3-vectors:

e000  e010  e001  e011
  |e100 |e110 |e101 |e111             e000     e110        e111
  |  |  |  |  |  |  |  |                |        |           |
                               O1
( a, b, 0, 0, 0, 0, 0, 0 )    |-->   (  a, 0, 0, b, 0, 0, 0, 0 )

                               O2
                              |-->   (  a, 0, 0, 0, 0, 0, 0, b )


Combining states

The description given above assumes a closed-world system: we assume that the world contains exactly n particles, and we manipulate its state as a vector of 2^n complex numbers. But there are however two extra things we might want to do: we might want to take two systems, so far independent, and combine them into one bigger system; and we might want to do the reverse, i.e. remove one or some particles from our system. The exact physical sense of these operations is not completely clear, but they are essential in practice. We will consider them separately, because they have very different implications. I will first show the easy part, which is combining.

Consider what it means classically. You have a system of n particles whose state can be written (a_1, ..., a_n), using booleans. You want to combine it with another system (b_1, ..., b_m). The resulting system is obviously (a_1, ..., a_n, b_1, ..., b_m), i.e. it is encoded as n+m booleans.

A similar operation works fine in the quantic model. You have two systems (a_1, ..., a_(2^n)) and (b_1, ..., b_(2^m)). These are vectors of complex numbers of norm 1. You want to get a combined vector of length 2^(n+m). Note that the combined dimension is now the product of the two original dimensions 2^n and 2^m. I believe the operation to be called a tensor product:

(a_1 b_1,  a_1 b_2, ...,  a_1 b_(2^m),  a_2 b_1,  a_2 b_2, ...)

This preserves the property of "being of norm 1" (just compute the norm of the above longish vector to check). It also works as expected for the case where the original vectors were "classical", i.e. they where just (0, 0, .., 1, .., 0): the tensor product of two such vectors is also 0 everywhere except in one place.

Note however the main difference with the classical case: we don't obtain all possible vectors in this way. There are a lot more vectors in a 2^(n+m)-dimension vector space than what you can get as the image of the "combination" function, which starts from a (2^n + 2^m)-dimension vector space. (Or rather, to consider them as projective spaces instead of vector spaces, there is a lot more in 2^(n+m)-1 dimensions than there is in (2^n-1 + 2^m-1) dimensions, but the argument is the same.)


Forgetting

In the section "Operations" above, we ended up in the 2^3-dimension state (a, 0, 0, 0, 0, 0, 0, b). This 8-vector cannot be obtained as the tensor product of a 4-vector and a 2-vector. What does it mean? One answer is that it is impossible to "split up" this system of 3 particles. It is in a different state than any combination of two independent systems of respectively 2 and 1 particles. (This is what is called "entanglement".)

Our perception of reality is limited to systems of many particles --- which is why I talked about the "measuring" operation above. But this same limitation has another consequence: if we can only know the result of a physical experiment by measuring a large number of particles together, it is fine too if that same measure "forgets" to account for a few of them. In other words, when we copy the state of our first particle onto the state of all the particles in the measuring device, we will create many particles in the same state, and observe some or most of them --- but never absolutely all of them.

"Forgetting" some particles is what is classically done with a projection. Let's say you want to forget the last particle in a tuple of booleans (a_1, ..., a_n); then obviously the result is the shorter tuple (a_1, ..., a_(n-1)). How can we similarly "forget" the last particle in the entangled state above? Or, even more practically, if the state (a, 0, 0, ..., 0, b) describes the state of our whole measuring device, then what do we see by looking at it?

As it turns out, this "forgetting" can be written down as some computation rule which has unexpected consequences.

We need to introduce a different notation first. From a vector v of 2^n components, of norm 1, we can build a (2^n)x(2^n) matrix. Let 'v' denote, more precisely, the vector written in one column, and let 'v*' denote the transposed vector written in one row, in which, additionally, each component has been replaced with its complex conjugate. Then we can look at 'v' as a (2^n)x1 matrix and 'v*' as a 1x(2^n) matrix, and perform the matrix product 'v times v*', obtaining a (2^n)x(2^n) matrix.

We cannot obtain all possible matrices this way; the matrices we obtain satisfy this rule (understanding it is not essential here): they are Hermitian positive-semidefinite, and the sum of their eigenvalues (which are necessarily nonnegative real numbers) is 1.

In fact, the matrices we obtain this way happen to have all their eigenvalues being 0, apart from one of them which is 1, and the corresponding eigenvector is the original vector 'v'. We did not do anything apart from changing the representation, so far. (The common terminology is to say it is because we started from a so-called "pure state"; in this post I argue that all quantum states of reality are pure.)

But now we want to define the operation of "forgetting", which should transform a 2^n-vector into a 2^(n-1)-vector. From the starting vector we form the Hermitian matrix H as above. Then we perform a trace-by-block operation: we split the matrix H in 4 equal blocks, we forget the two blocks that are off diagonal, and we add the two blocks that are on the diagonal. In this way we obtain a 2^(n-1) x 2^(n-1) matrix H'. It is still Hermitian positive-semidefinite, and the sum of its eigenvalues is still 1. However, the eigenvalues are no longer necessarily 0, 0, ..., 0, 1. In general we get nonnegative real eigenvalues l_1, ..., l_(2^(n-1)) whose sum is 1, and a corresponding set of eigenvectors v_1, ..., v_(2^(n-1)).

The physical interpretation of this is: when we forget about one particle, then the "world state" in which the remaining particles are is not always one single well-defined answer. The eigenvalues are nonnegative real numbers whose sum is 1, so they can readily be interpreted as "probabilities". Simplifying a bit, we can say that the state can be either v_1 with probability l_1, or v_2 with probability l_2, and so on. (In the terminology above, this is a "non-pure state", or "mixed state".)

Let us see what it gives in our running example. Start from the vector

       ( a )
       ( 0 )
       ( 0 )
   v = ( 0 ).
       ( 0 )
       ( 0 )
       ( 0 )
       ( b )

Write ~a for the complex conjugate of a number a. (Note that a*~a is real and nonnegative, and the norm of a complex number is the square root of a*~a). Then

                ( a*~a 0  0  0  0  0  0 a*~b )
                (   0  0  0  0  0  0  0  0   )
                (   0  0  0  0  0  0  0  0   )
   v times v* = (   0  0  0  0  0  0  0  0   )
                (   0  0  0  0  0  0  0  0   )
                (   0  0  0  0  0  0  0  0   )
                (   0  0  0  0  0  0  0  0   )
                ( b*~a 0  0  0  0  0  0 b*~b )

Perform the trace-by-block once:

    (a*~a 0  0  0 | 0  0  0 a*~b)
    (  0  0  0  0 | 0  0  0  0  )
    (  0  0  0  0 | 0  0  0  0  )              (a*~a 0  0  0  )
    ( _0__0__0__0_|_0__0__0__0_ )    forget    (  0  0  0  0  )
    (  0  0  0  0 | 0  0  0  0  )   |------>   (  0  0  0  0  )
    (  0  0  0  0 | 0  0  0  0  )              (  0  0  0 b*~b)
    (  0  0  0  0 | 0  0  0  0  )
    (b*~a 0  0  0 | 0  0  0 b*~b)

Performing it more times doesn't essentially change the result any more:

    (a*~a 0 | 0  0  )
    ( _0__0_|_0__0_ )    forget    ( a*~a  0   )
    (  0  0 | 0  0  )   |------>   (   0  b*~b )
    (  0  0 | 0 b*~b)

This final matrix is diagonal, so its eigenvalues are a*~a and b*~b and its eigenvectors are (1, 0) and (0, 1). a*~a and b*~b are real, nonnegative numbers, and their sum is 1.

Let me summarize what we did. We started with a single particle in the nonclassical quantum state (a, b). We "measured" it by copying its state to a large number of particles, and then we reduced the result by the "forgetting" operation. The experimental result that we obtain is: with a probability of a*~a we observe the quantum state (1, 0), i.e. the first classical state; and with a probability of b*~b we observe the quantum state (0, 1), i.e. the second classical state.


Conclusion (and handwaving)

Doing this strange trace-by-block reduction is, in fact, really the "correct" way to proceed --- in the sense that it gives the correct answer in practice. Any of the common "strange experiments in quantum physics" can be modelled this way. We start with some particles, perform some operation on them, and at the end we want a "measurable" answer, so we perform the "forgetting" operation to reduce the (immense) dimension of the complex linear space we end up with. We end up with a set of eigenvalues and corresponding eigenvectors. If the experiment that we performed behaves "as classically expected", then the eigenvalues are 0, 0, ..., 0, 1, and there is only one interesting eigenvector. But if we performed one of the "strange" experiments, then what we end up observing is one of the possible eigenvectors, with the probability given by the corresponding eigenvalue. If we repeat the very same experiment again and again, we will end up observing different results, but the probabilities will be as predicted. (Also, note that because of the way we perform the reductions, the eigenvectors themselves correspond to classical states.)

Yet another way to put it is that this reduction is very effective: it allows us to reduce an immense vector space into just one of two answers. Our minds can only make sense of a reduced answer; there is no way we can hope to observe directly the immense vector space. Moreover, in practical everyday experimentations, the reduction gives a 100% chance to one answer, so the world appears deterministic to us.

And in case it doesn't, then I believe that some handwaving argument is possible to explain why we still observe a single classical answer with the given values as probabilities, instead of actually observing the combined answer. The argument goes along the lines of: consider an experiment that ends up with a reduced state that is a family of N classical answers instead of just one. Then if we reason about the answer that we got, then we are actually performing N reasonings in parallel, each one classically based on one of the N answers; both before and after the reasoning our mind itself is in the combination of N classical states. But describing such a situation is a purely theoretical exercice. It is not distinguishable in practice from a situation where only one of the answers showed up.