Yet another derivation of the Born–Oppenheimer approximation
2018-11-02
There are plenty of existing discussions of the Born–Oppenheimer approximation, but none that I've read so far are entirely satisfying.
They tend to use confusing notation, conflate operators with their representations, gloss over some crucial aspects, and so on.
The following is my attempt at a succinct derivation of the approximation that touches on all the important details.
Specifically, here are some of the questions that arose when I was first learning about this (and that I try to answer below):
What is the exact nature of the parametric dependence of the fast wavefunctions χk(x;X) on the slow coordinate X? Do the fast wavefunctions form an orthonormal basis in some way?
How does the wavefunction expansion ∑kφk(X)χk(x;X) differ from a standard basis expansion? Is it a Schmidt decomposition?
How can the kinetic energy operator on the slow space result in a derivative of the fast wavefunctions?
Why do all the surfaces seem to have the same energy?
Hilbert space
Consider a system with degrees of freedom that we will group into "slow" and "fast".
They don't actually need to be slow and fast, but these are the labels we will use.
The prototypical example is a molecule with slow nuclei and fast electrons.
States of this system live in the tensor product Hilbert space H = Hs⊗Hf.
On the slow space, we have the (multivariate) continuous representation ∣X⟩, and on the fast space we have ∣x⟩; together, that's ∣Xx⟩.
This doesn't have to be the position representation, but it almost certainly will be.
Parameterized Hamiltonian
For the time being, we'll keep the Hamiltonian fairly general:
H^=K^s+K^f+V^.
We have kinetic energies for the slow and fast degrees of freedom and a potential energy term that operates on the entire space.
One requirement that we'll impose is that the potential energy operator must be diagonal in the continuous representation we've chosen:
V^∣Xx⟩=V(X,x)∣Xx⟩.
This allows us to express the Hamiltonian as
$$\langle Xx|\hat H = \langle X|\hat K_s \otimes \langle x| + \langle X| \otimes \langle x|\hat K_f + V(X,x)\,\langle Xx|.$$
This isn't quite in the position representation, since we haven't given the kinetic energy operators a form yet.
If they looked like
$$\sum_i \frac{\partial^2}{\partial \star_i^2},$$
we could express the Hamiltonian properly in the continuous representation:
$$\langle Xx|\hat H = \left(-\sum_i \frac{\partial^2}{\partial X_i^2} - \sum_i \frac{\partial^2}{\partial x_i^2} + V(X,x)\right)\langle Xx|.$$
We'll come back to this form later, so you should keep it in mind, but we'll stick to being more general for now.
We define a parametrized potential operator with the following eigenvalue equation:
V^f(X)∣x⟩=V(X,x)∣x⟩.
Using this operator, we construct another Hamiltonian, which we will call the fast Hamiltonian:
H^f(X)=K^f+V^f(X).
The fast Hamiltonian is parameterized by X and acts only on Hf.
Conceptually, this is the Hamiltonian that describes the remaining (fast) system when we freeze out the slow degrees of freedom (by removing K^s) and pin them at a specific position X (by parameterizing V^).
For every X, the operator H^f(X) is a perfectly legitimate Hamiltonian for the fast system.
That means that we could construct the Hamiltonian
K^s+H^f(X),
but this is a useless object!
It describes a complicated system in the fast degrees of freedom and a collection of free particles in the slow degrees of freedom; there is no coupling whatsoever between the two.
What we'll do instead is note that the above definitions allow us to write
⟨Xx∣H^=⟨Xx∣(K^s+H^f(X)).
This may look like we've simply thrown a ⟨Xx∣ onto the Hamiltonian that we've only just ridiculed, but there's a vital difference: the X parameter of H^f(X) depends on the X value in the bra.
This is what gives rise to the coupling between the slow and fast degrees of freedom, and it's at least a little weird to think about.
Parameterized basis
The infinitely many fast Hamiltonians H^f(X) give rise to infinitely many orthonormal bases for Hf.
For any choice of X, the states ∣k;X⟩ satisfy the eigenvalue equation
H^f(X)∣k;X⟩=Ek(X)∣k;X⟩,
where the kets are also parameterized by X.
The wavefunctions for these states are commonly written as
⟨x∣k;X⟩=χk(x;X).
To be perfectly clear, we have defined a basis {∣k;X⟩}k for each value of X.
There is a basis {∣k;X′⟩}k, and another basis {∣k;X′′⟩}k, and so forth; there is nothing we can say in general about the overlap
⟨k′;X′∣k;X⟩
when X′≠X.
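To make the parameterized bases concrete, here is a minimal numerical sketch. The shifted-oscillator potential V(X,x) = (x−X)²/2 (with ℏ = m = 1) is an assumed toy model, not anything from the text: for each pinned X, diagonalizing H^f(X) on a grid yields an orthonormal basis {χk(x;X)}k, and the overlap matrix between the bases at two different values of X is unitary but far from diagonal.

```python
import numpy as np

# Assumed toy model: V(X, x) = (x - X)^2 / 2, with hbar = m = 1.
x = np.linspace(-10.0, 10.0, 401)
dx = x[1] - x[0]

def fast_basis(X):
    n = len(x)
    # Finite-difference fast kinetic energy, -1/2 d^2/dx^2
    K = -0.5 * (np.eye(n, k=1) - 2 * np.eye(n) + np.eye(n, k=-1)) / dx**2
    E, chi = np.linalg.eigh(K + np.diag(0.5 * (x - X)**2))
    return E, chi / np.sqrt(dx)           # columns are chi_k(x; X)

E0, chi_a = fast_basis(0.0)               # the basis {|k; X>}_k at X = 0
E1, chi_b = fast_basis(1.0)               # a different basis at X' = 1
S = chi_a.T @ chi_b * dx                  # overlaps <k; X | k'; X'>
# S is unitary (each set is complete and orthonormal) but not diagonal:
# the bases at different X genuinely differ.
```

Note that for this particular model the spectra at the two X values coincide, since the potential is merely translated; the eigenfunctions, however, are not the same.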
Given a wavefunction ∣ψ⟩∈Hf, we can expand it as
$$|\psi\rangle = \sum_k C_k^\psi(X)\,|k;X\rangle,$$
where the expansion coefficients are given by
$$C_k^\psi(X) = \langle k;X|\psi\rangle,$$
and X is arbitrary.
Since we're not mathematicians, we can (and will) take continuity for granted.
It's fairly safe to assume that the potential V(X,x) varies continuously as X is changed; after all, an arbitrarily large change in the potential when the configuration undergoes an infinitesimal shift would be unphysical.
Hence, the Hamiltonian H^f(X) and its eigenfunctions should also be continuous in the parameter X, as should the expansion coefficients Ckψ(X) for any state.
One wrinkle that we do expect is that funny things can happen at degeneracies.
The adiabatic theorem tells us that if we vary X sufficiently slowly (compared to the gap between Ek(X) and adjacent energies Ek′(X)), then the ordering of the eigenvalues remains the same and we can treat each k as roughly independent.
In that case, we have what looks like multiple hypersurfaces in (X,E) space floating above one another like sheets.
However, if the energies become equal, the gap between them vanishes, so these sheets touch and cease to be independent.
In making the Born–Oppenheimer approximation, we'll be implicitly assuming that this won't happen, so we won't dwell on this.
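For a picture of the sheets, here is a tiny sketch with a hypothetical two-level fast system (a standard avoided-crossing toy model, not something derived above): the adiabatic energies E0(X) and E1(X) form two surfaces whose gap 2√(X²+δ²) is smallest at X = 0 and closes only when the off-diagonal coupling δ vanishes.

```python
import numpy as np

# Hypothetical 2x2 "fast" Hamiltonian in a fixed basis, parameterized by X:
# the diagonal (diabatic) energies +/-X cross at X = 0, but the adiabatic
# eigenvalues E_k(X) avoid each other as long as delta != 0.
def adiabatic_energies(X, delta):
    Hf = np.array([[X, delta],
                   [delta, -X]])
    return np.linalg.eigvalsh(Hf)        # sorted: E_0(X) <= E_1(X)

Xs = np.linspace(-1.0, 1.0, 201)
E = np.array([adiabatic_energies(X, delta=0.1) for X in Xs])
gap = E[:, 1] - E[:, 0]                  # analytically 2*sqrt(X**2 + delta**2)
```

Setting delta to zero makes the two sheets touch at X = 0, which is exactly the degenerate situation the approximation assumes away.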
The ∣X⟩ representation is complete for Hs, and every {∣k;X⟩}k basis is complete for Hf.
Thus, we are free to pick a specific X′ and use the states ∣X⟩⊗∣k;X′⟩ as a basis for H, but this would be silly, since ∣k;X′⟩ is not generally an eigenstate of H^f(X) when X≠X′.
Instead, we want to use the states
∣Xk⟩=∣X⟩⊗∣k;X⟩,
where the same X appears in both kets.
To see that {∣Xk⟩}X,k also forms a basis for H (technically some sort of half-basis, half-representation mutant), we show that the transformation matrix
$$U_{k'k}(X',X;\tilde X) = \left(\langle X'|\otimes\langle k';\tilde X|\right)|Xk\rangle = \langle X'|X\rangle\,\langle k';\tilde X|k;X\rangle$$
is unitary for any fixed X~.
The requirements for this are
$$\begin{aligned}
\int dX \sum_k U_{k'k}(X',X;\tilde X)\,U^*_{k''k}(X'',X;\tilde X)
&= \int dX\,\langle X'|X\rangle\langle X|X''\rangle \sum_k \langle k';\tilde X|k;X\rangle\langle k;X|k'';\tilde X\rangle \\
&= \int dX\,\langle X'|X\rangle\langle X|X''\rangle\,\langle k';\tilde X|k'';\tilde X\rangle \\
&= \langle X'|X''\rangle\,\langle k';\tilde X|k'';\tilde X\rangle \\
&= \delta(X'-X'')\,\delta_{k'k''}
\end{aligned}$$
and
$$\begin{aligned}
\int dX' \sum_{k'} U^*_{k'k''}(X',X'';\tilde X)\,U_{k'k}(X',X;\tilde X)
&= \int dX'\,\langle X''|X'\rangle\langle X'|X\rangle \sum_{k'} \langle k'';X''|k';\tilde X\rangle\langle k';\tilde X|k;X\rangle \\
&= \int dX'\,\langle X''|X'\rangle\langle X'|X\rangle\,\langle k'';X''|k;X\rangle \\
&= \langle X''|X\rangle\,\langle k'';X''|k;X\rangle \\
&= \delta(X''-X)\,\delta_{k''k}.
\end{aligned}$$
In the last step we used the sampling property of the Dirac delta function outside an integral, with the understanding that it only exists inside an integral anyway.
A consequence of the ∣Xk⟩ states forming a complete basis is that a state ∣Ψ⟩∈H has the wavefunction
$$\langle Xx|\Psi\rangle = \int dX' \sum_k \langle Xx|X'k\rangle\langle X'k|\Psi\rangle = \sum_k \langle x|k;X\rangle\langle Xk|\Psi\rangle.$$
Alternatively, we could write this as
$$\Psi(X,x) = \langle Xx|\Psi\rangle = \sum_k \varphi_k^\Psi(X)\,\chi_k(x;X),$$
where
$$\varphi_k^\Psi(X) = \langle Xk|\Psi\rangle.$$
The last of these is a strange animal, simultaneously serving both the roles of a basis expansion coefficient and a wavefunction for the slow space.
Since it only has a single index, this expansion looks suspiciously like a Schmidt decomposition, but is it one?
To express ∣Ψ⟩ in Schmidt form, we would need to be able to write
$$\langle Xx|\Psi\rangle = \sum_j \lambda_j\,\langle X|\varphi_j\rangle\langle x|\chi_j\rangle,$$
where ∣φj⟩ are orthogonal states on Hs and ∣χj⟩ are orthogonal states on Hf.
While our ∣k;X⟩ are orthogonal for a fixed X, they have an explicit dependence on X, which is not allowed.
Additionally,
$$\int dX\,\varphi_{k'}^{\Psi*}(X)\,\varphi_k^\Psi(X) = \int dX\,\langle\Psi|Xk'\rangle\langle Xk|\Psi\rangle \neq \delta_{k'k},$$
so these functions aren't even orthogonal.
Conversely, we also have
$$\langle Xk|\Psi\rangle = \int dX'\int dx\,\langle Xk|X'x\rangle\langle X'x|\Psi\rangle = \int dx\,\langle k;X|x\rangle\langle Xx|\Psi\rangle,$$
or
$$\varphi_k^\Psi(X) = \langle Xk|\Psi\rangle = \int dx\,\chi_k^*(x;X)\,\Psi(X,x).$$
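As a numerical sanity check of this projection formula, we can use an assumed Gaussian fast ground state χ0(x;X) centred at X (a toy model, not anything specific to the text): building Ψ(X,x) = φ(X)χ0(x;X) from a known slow amplitude and then integrating against χ0(x;X) should recover φ(X).

```python
import numpy as np

# Grid for the fast coordinate
x = np.linspace(-10.0, 10.0, 401)
dx = x[1] - x[0]

def chi0(X):
    # Assumed model: normalized fast ground state, a Gaussian pinned at X
    return np.pi**-0.25 * np.exp(-0.5 * (x - X)**2)

def phi(X):
    return np.exp(-X**2)                  # an arbitrary smooth slow amplitude

Xs = np.linspace(-2.0, 2.0, 41)
Psi = np.array([phi(X) * chi0(X) for X in Xs])       # Psi[i, :] = Psi(X_i, x)
# phi_0(X) = ∫ dx chi_0*(x; X) Psi(X, x), done as a Riemann sum
phi0 = np.array([chi0(X) @ Psi[i] * dx for i, X in enumerate(Xs)])
```

Since Ψ here lives entirely on the k = 0 surface, the projection returns the slow amplitude we started from; projecting onto any other χk would give zero.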
Adiabatic representation
Now that we believe that the states ∣Xk⟩ form a basis, we can try to find the matrix elements of the Hamiltonian:
$$\begin{aligned}
\langle X'k'|\hat H|Xk\rangle
&= \int dx\int dx'\,\langle k';X'|x'\rangle\langle X'x'|\hat H|Xx\rangle\langle x|k;X\rangle \\
&= \int dx\int dx'\,\chi_{k'}^*(x';X')\,\langle X'x'|\hat K_s|Xx\rangle\,\chi_k(x;X) + \int dx\int dx'\,\chi_{k'}^*(x';X')\,\langle X'x'|\hat H_f(X')|Xx\rangle\,\chi_k(x;X) \\
&= \langle X'|\hat K_s|X\rangle \int dx\,\chi_{k'}^*(x;X')\,\chi_k(x;X) + E_{k'}(X')\,\langle X'|X\rangle \int dx\,\chi_{k'}^*(x;X')\,\chi_k(x;X) \\
&= \langle X'|\hat K_s|X\rangle\,\langle k';X'|k;X\rangle + E_{k'}(X')\,\delta(X'-X)\,\delta_{k'k}.
\end{aligned}$$
We have a complicated kinetic energy term, but a very diagonal potential energy term.
To proceed, we'll choose the form that was mentioned earlier for the kinetic energy:
$$\langle X|\hat K_s = -\sum_i \frac{\hbar^2}{2M_i}\frac{\partial^2}{\partial X_i^2}\,\langle X|.$$
Then, it follows that
$$\langle X'|\hat K_s|X\rangle = -\sum_i \frac{\hbar^2}{2M_i}\frac{\partial^2}{\partial X_i^2}\,\langle X'|X\rangle = -\sum_i \frac{\hbar^2}{2M_i}\,\delta_i^{(2)}(X'-X),$$
where δi(2)(X′−X) is the second (distributional) derivative of the delta function in the ith direction, which satisfies
$$\int dX'\,\delta_i^{(2)}(X-X')\,f(X') = \frac{\partial^2}{\partial X_i^2}\,f(X).$$
Thus, we can apply the Hamiltonian to a generic state ∣Ψ⟩ as follows:
$$\begin{aligned}
\langle Xk|\hat H|\Psi\rangle
&= \int dX' \sum_{k'} \langle Xk|\hat H|X'k'\rangle\,\langle X'k'|\Psi\rangle \\
&= \int dX' \sum_{k'} \langle X|\hat K_s|X'\rangle\,\langle k;X|k';X'\rangle\,\langle X'k'|\Psi\rangle + \int dX' \sum_{k'} E_k(X)\,\delta(X-X')\,\delta_{kk'}\,\langle X'k'|\Psi\rangle \\
&= \int dx\,\chi_k^*(x;X) \sum_{k'} \int dX'\,\langle X|\hat K_s|X'\rangle\,\chi_{k'}(x;X')\,\varphi_{k'}^\Psi(X') + E_k(X)\,\varphi_k^\Psi(X) \\
&= -\int dx\,\chi_k^*(x;X) \sum_{k'} \sum_i \frac{\hbar^2}{2M_i} \int dX'\,\delta_i^{(2)}(X-X')\,\chi_{k'}(x;X')\,\varphi_{k'}^\Psi(X') + E_k(X)\,\varphi_k^\Psi(X) \\
&= -\int dx\,\chi_k^*(x;X) \sum_{k'} \sum_i \frac{\hbar^2}{2M_i}\,\frac{\partial^2}{\partial X_i^2}\!\left[\chi_{k'}(x;X)\,\varphi_{k'}^\Psi(X)\right] + E_k(X)\,\varphi_k^\Psi(X) \\
&= -\sum_{k'} \sum_i \frac{\hbar^2}{2M_i} \left[\int dx\,\chi_k^*(x;X)\,\frac{\partial^2}{\partial X_i^2}\,\chi_{k'}(x;X)\right]\varphi_{k'}^\Psi(X) \\
&\qquad - \sum_{k'} \sum_i \frac{\hbar^2}{M_i} \left[\int dx\,\chi_k^*(x;X)\,\frac{\partial}{\partial X_i}\,\chi_{k'}(x;X)\right]\frac{\partial}{\partial X_i}\,\varphi_{k'}^\Psi(X) \\
&\qquad - \sum_i \frac{\hbar^2}{2M_i}\,\frac{\partial^2}{\partial X_i^2}\,\varphi_k^\Psi(X) + E_k(X)\,\varphi_k^\Psi(X),
\end{aligned}$$
where the final kinetic term used the orthonormality of the fast eigenfunctions at fixed X.
We have used the product rule, which states that
$$\frac{\partial}{\partial X_i}\left[\chi_{k'}(x;X)\,\varphi_{k'}^\Psi(X)\right] = \left[\frac{\partial}{\partial X_i}\chi_{k'}(x;X)\right]\varphi_{k'}^\Psi(X) + \chi_{k'}(x;X)\left[\frac{\partial}{\partial X_i}\varphi_{k'}^\Psi(X)\right]$$
and consequently
$$\frac{\partial^2}{\partial X_i^2}\left[\chi_{k'}(x;X)\,\varphi_{k'}^\Psi(X)\right] = \left[\frac{\partial^2}{\partial X_i^2}\chi_{k'}(x;X)\right]\varphi_{k'}^\Psi(X) + 2\left[\frac{\partial}{\partial X_i}\chi_{k'}(x;X)\right]\left[\frac{\partial}{\partial X_i}\varphi_{k'}^\Psi(X)\right] + \chi_{k'}(x;X)\left[\frac{\partial^2}{\partial X_i^2}\varphi_{k'}^\Psi(X)\right].$$
More pertinently, we have used the continuously-varying parametric dependence of ∣k;X⟩ on X to allow the kinetic energy operator to take its derivative remotely through the derivative of the delta function.
For convenience, we use the gradient vector ∇ with elements
$$\nabla_i = \frac{\partial}{\partial X_i},$$
so that
$$\nabla^2 = \nabla\cdot\nabla = \sum_i \frac{\partial^2}{\partial X_i^2},$$
and we drop the unitful quantities (the factors of ℏ²/2Mi) to make the expressions below look clean.
If this makes you feel dirty, don't hesitate to pencil them in where appropriate.
With this in mind, we can write
$$\langle Xk|\hat H|\Psi\rangle = \sum_{k'}\left[\left(-\nabla^2 + E_k(X)\right)\delta_{kk'} - 2\,\tau_{kk'}^{(1)}(X)\cdot\nabla - \tau_{kk'}^{(2)}(X)\right]\varphi_{k'}^\Psi(X),$$
where
$$\tau_{kk'}^{(1)}(X) = \int dx\,\chi_k^*(x;X)\,\nabla\chi_{k'}(x;X)$$
and
$$\tau_{kk'}^{(2)}(X) = \int dx\,\chi_k^*(x;X)\,\nabla^2\chi_{k'}(x;X)$$
are the non-adiabatic couplings and the terms containing them are the non-adiabatic coupling terms (NACTs).
Differentiating the orthonormality relation ⟨k;X∣k′;X⟩=δkk′ with respect to X shows that τkk′(1)(X)=−τk′k(1)∗(X), so τkk′,i(1)(X) is a skew-Hermitian matrix (in k and k′).
Its diagonal terms are therefore purely imaginary, and they vanish outright, τkk(1)(X)=0, when the fast eigenfunctions are chosen real (always possible for a real fast Hamiltonian).
Differentiating twice instead gives τkk′(2)(X)−τk′k(2)∗(X)=2∇⋅τkk′(1)(X), so τkk′(2)(X) is a Hermitian matrix whenever the first-order couplings are divergence-free.
In particular, for real eigenfunctions all its diagonal terms τkk(2)(X) are real.
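These properties can be checked numerically with finite differences. The sketch below again assumes the shifted-oscillator toy model from earlier; since the diagonalizer returns eigenvectors with arbitrary signs, we first align them against a reference X, then differentiate and verify that τ⁽¹⁾ is antisymmetric with a vanishing diagonal (the eigenfunctions here are real).

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 401)
dx = x[1] - x[0]

def fast_states(X, nstates=4):
    # Assumed toy model: H_f(X) = -1/2 d^2/dx^2 + (x - X)^2 / 2 on a grid
    n = len(x)
    K = -0.5 * (np.eye(n, k=1) - 2 * np.eye(n) + np.eye(n, k=-1)) / dx**2
    _, chi = np.linalg.eigh(K + np.diag(0.5 * (x - X)**2))
    return chi[:, :nstates] / np.sqrt(dx)            # chi_k(x; X), k = 0..3

X, h = 0.3, 1e-4
ref = fast_states(X)

def aligned(Xp):
    c = fast_states(Xp)
    return c * np.sign(np.sum(c * ref, axis=0))      # fix arbitrary eigenvector signs

dchi = (aligned(X + h) - aligned(X - h)) / (2 * h)   # d/dX chi_k'(x; X)
tau1 = ref.T @ dchi * dx                             # tau1[k, k'] = ∫ chi_k d/dX chi_k'
```

For this model the couplings reduce to derivative matrix elements of the oscillator, so |τ01⁽¹⁾| comes out close to 1/√2.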
Note how the terms in the big square brackets smell like a Hamiltonian for the slow degrees of freedom, parameterized by k and k′, and expressed in the position representation.
If we define a matrix of operators with elements H^kk′ that have the position representation
$$\langle X|\hat H_{kk'} = \left[\left(-\nabla^2 + E_k(X)\right)\delta_{kk'} - 2\,\tau_{kk'}^{(1)}(X)\cdot\nabla - \tau_{kk'}^{(2)}(X)\right]\langle X|,$$
then the overall Hamiltonian H^ looks like a matrix of Hamiltonians for the slow degrees of freedom, indexed by the surfaces.
On the diagonal, we have simply
$$-\nabla^2 + E_k(X) - \tau_{kk}^{(2)}(X),$$
where the last two terms are plain old potentials.
On the off-diagonal, we instead have
$$-2\,\tau_{kk'}^{(1)}(X)\cdot\nabla - \tau_{kk'}^{(2)}(X),$$
which is a bit strange, because instead of a second derivative, it has first derivatives.
Nevertheless, this has the effect of turning the single Schrödinger equation H^∣n⟩=En∣n⟩
into a collection of coupled differential equations, indexed by k:
$$\sum_{k'} D_{kk'}\,\varphi_{k'}^n(X) = E_n\,\varphi_k^n(X),$$
where we have given a name to the differential operator
$$D_{kk'} = \left(-\nabla^2 + E_k(X)\right)\delta_{kk'} - 2\,\tau_{kk'}^{(1)}(X)\cdot\nabla - \tau_{kk'}^{(2)}(X)$$
as a shorthand.
In the matrix picture, this looks like
$$\begin{pmatrix} D_{1,1} & D_{1,2} & \cdots \\ D_{2,1} & D_{2,2} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}
\begin{pmatrix} \varphi_1^n(X) \\ \varphi_2^n(X) \\ \vdots \end{pmatrix}
= E_n
\begin{pmatrix} \varphi_1^n(X) \\ \varphi_2^n(X) \\ \vdots \end{pmatrix}.$$
If one is able to find the functions φkn(X) that simultaneously satisfy these equations, one can then assemble the eigenfunction
$$\langle Xx|n\rangle = \sum_k \varphi_k^n(X)\,\chi_k(x;X)$$
of the full Hamiltonian H^.
Before we continue, a few brief words about the Hamiltonians H^kk′.
It is tempting to say that these are partial matrix elements of H^ in the ∣k;X⟩ basis, but that direction is full of potential pitfalls.
For starters, which basis do we mean?
After all, there is a different one for each X, and no matter which one we pick, it would be a mistake to claim that ⟨k;X∣H^∣k′;X⟩ is the object of interest, since its position representation ⟨X′∣⟨k;X∣H^∣k′;X⟩ is not useful for us.
We could also try
$$\begin{aligned}
\langle Xk|\hat H|k';X\rangle
&= \int dx\,\langle k;X|x\rangle\langle Xx|\hat H|k';X\rangle \\
&= \int dx\,\langle k;X|x\rangle\langle Xx|\hat K_s|k';X\rangle + \int dx\,\langle k;X|x\rangle\langle Xx|\hat H_f(X)|k';X\rangle \\
&= \int dx\,\langle k;X|x\rangle\langle x|k';X\rangle\,\langle X|\hat K_s + \int dx\,\langle k;X|x\rangle\langle x|k';X\rangle\,E_{k'}(X)\,\langle X| \\
&= \langle X|\left(\hat K_s + E_{k'}(X)\right)\delta_{kk'},
\end{aligned}$$
which is definitely not what we wanted.
No, this sort of thinking just will not do.
Born–Oppenheimer approximation
Now that we have the Hamiltonian in the adiabatic representation, all that remains is to assume that the non-adiabatic couplings are sufficiently small that neglecting the NACTs entirely is a good approximation.
This leaves us with a Hamiltonian that is diagonal in surfaces:
$$\langle X|\hat H_{kk'} = \left(-\nabla^2 + E_k(X)\right)\delta_{kk'}\,\langle X|.$$
In other words, each H^kk is the complete Hamiltonian for the slow degrees of freedom on surface k.
It is then clear that the resulting collection of differential equations is completely uncoupled:
$$\left(-\nabla^2 + E_k(X)\right)\varphi_k^n(X) = E_n\,\varphi_k^n(X),$$
and they may be solved independently.
In the matrix picture, that's
$$\begin{pmatrix} D_{1,1} & 0 & \cdots \\ 0 & D_{2,2} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}
\begin{pmatrix} \varphi_1^n(X) \\ \varphi_2^n(X) \\ \vdots \end{pmatrix}
= E_n
\begin{pmatrix} \varphi_1^n(X) \\ \varphi_2^n(X) \\ \vdots \end{pmatrix}.$$
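Each uncoupled equation is now just a one-dimensional Schrödinger equation on its surface. Here is a grid sketch with units dropped as above, for a made-up surface Ek(X) = X²/2 (so the exact levels of this particular equation are √2(n+½)):

```python
import numpy as np

# Solve (-d^2/dX^2 + E_k(X)) phi = E phi on a grid, with units dropped as in
# the text, for a hypothetical surface E_k(X) = X^2 / 2.
Xs = np.linspace(-10.0, 10.0, 801)
dX = Xs[1] - Xs[0]
n = len(Xs)

lap = (np.eye(n, k=1) - 2 * np.eye(n) + np.eye(n, k=-1)) / dX**2
Ek = 0.5 * Xs**2                          # the assumed surface
E, phi = np.linalg.eigh(-lap + np.diag(Ek))
# Analytically this oscillator has levels sqrt(2) * (n + 1/2); the grid
# result should land close to them.
```

Note the kinetic term here is −∇² rather than −½∇², because the unitful factors were dropped earlier; pencil them back in for a real system.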
It's somewhat peculiar that all of these seemingly independent Hamiltonians share the same eigenspectrum!
It's easy to show that τkk′,i(1)(X)=0 for all k and k′ implies that ∇iχk′(x;X)=0.
The integral
$$\int dx\,\chi_k^*(x;X)\,\nabla_i\,\chi_{k'}(x;X)$$
is the overlap between the familiar state χk∗(x;X) and the weird object ∇iχk′(x;X).
Because the χk(x;X) form a complete basis, having ∇iχk′(x;X) be orthogonal to all χk(x;X) implies that ∇iχk′(x;X) is the zero element.
Hence, the Xi derivative of χk′(x;X) is zero; if this is true for all i and all k′, the fast eigenfunctions don't depend on X, so we can write just χk(x).
Now we can quickly show in two related ways that the above conclusion isn't a figment of our imagination:
$$\begin{aligned}
\langle Xk|\hat H|n\rangle
&= -\int dx\,\chi_k^*(x) \sum_{k'} \sum_i \int dX'\,\delta_i^{(2)}(X-X')\,\chi_{k'}(x)\,\varphi_{k'}^n(X') + E_k(X)\,\varphi_k^n(X) \\
&= \left(-\nabla^2 + E_k(X)\right)\varphi_k^n(X) \\
&= E_n\,\langle Xk|n\rangle
\end{aligned}$$
and
$$\begin{aligned}
\langle Xx|\hat H|n\rangle
&= \int dX' \sum_k \langle Xx|\hat H|X'k\rangle\langle X'k|n\rangle \\
&= \sum_k \int dX'\,\langle Xx|\hat K_s|X'k\rangle\langle X'k|n\rangle + \sum_k \int dX'\,\langle Xx|\hat H_f(X)|X'k\rangle\langle X'k|n\rangle \\
&= \sum_k \int dX'\,\langle X|\hat K_s|X'\rangle\langle x|k\rangle\langle X'k|n\rangle + \sum_k E_k(X) \int dX'\,\langle X|X'\rangle\langle x|k\rangle\langle X'k|n\rangle \\
&= -\sum_k \sum_i \int dX'\,\delta_i^{(2)}(X-X')\,\langle x|k\rangle\,\varphi_k^n(X') + \sum_k E_k(X)\,\langle x|k\rangle\,\varphi_k^n(X) \\
&= \sum_k \left(-\nabla^2 + E_k(X)\right)\varphi_k^n(X)\,\chi_k(x) \\
&= E_n \sum_k \varphi_k^n(X)\,\chi_k(x) \\
&= E_n\,\langle Xx|n\rangle.
\end{aligned}$$
In fact, because all the φkn(X) are degenerate, any linear combination
$$\sum_k \beta_k\,\varphi_k^n(X)\,\chi_k(x)$$
is also an eigenstate of H^, including combinations that retain only a single term.
(I think that the above implies that ∇H^f(X)=0, so ∇Ek(X)=0.
Proving this or giving a counterexample is left as an exercise for the reader.)
However, in practice the NACTs don't disappear by themselves.
If that were the case, the word "approximation" wouldn't appear on this page.
Instead, we create a new Hamiltonian H^′ which has the same diagonal elements as H^, but with the couplings artificially set to zero.
In this more realistic scenario, it's not the case that all the surfaces are identical.
Still, because they are not coupled in this approximation, they may be dealt with independently.
When the non-adiabatic couplings are sufficiently strong that they can't be neglected, more sophisticated methods are necessary to treat more than one surface at a time; for example, switching to a diabatic representation.
This is commonly termed going "beyond the Born–Oppenheimer approximation", and is beyond the scope of the present work.
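As a final end-to-end check, the whole scheme can be exercised on a small assumed model of two bilinearly coupled oscillators, H = −ε ∂²/∂X² − ∂²/∂x² + X²/2 + (x−X)²/2, with ε standing in for the inverse slow mass. The sketch diagonalizes the full Hamiltonian on a product grid (built with Kronecker products) and compares its ground energy against the Born–Oppenheimer estimate obtained from the fast surface E0(X):

```python
import numpy as np

# Assumed toy model: two bilinearly coupled oscillators with a heavy slow mode.
# Units mostly dropped as in the text; eps plays the role of 1/(2M).
eps = 0.01

Xs = np.linspace(-1.2, 1.2, 31)
dX = Xs[1] - Xs[0]
xs = np.linspace(-6.5, 6.5, 51)
dx = xs[1] - xs[0]

def lap(n, d):
    # Finite-difference second derivative on an n-point grid with spacing d
    return (np.eye(n, k=1) - 2 * np.eye(n) + np.eye(n, k=-1)) / d**2

Ks = -eps * lap(len(Xs), dX)              # slow kinetic energy, -eps d^2/dX^2
Kf = -lap(len(xs), dx)                    # fast kinetic energy, -d^2/dx^2
V = 0.5 * Xs[:, None]**2 + 0.5 * (xs[None, :] - Xs[:, None])**2

# Full Hamiltonian on the product grid (H_s tensor H_f) via Kronecker products
H = np.kron(Ks, np.eye(len(xs))) + np.kron(np.eye(len(Xs)), Kf) + np.diag(V.ravel())
E_exact = np.linalg.eigvalsh(H)[0]

# Born-Oppenheimer: fast surface E_0(X), then the slow problem on that surface
E0_X = np.array([np.linalg.eigvalsh(Kf + np.diag(0.5 * X**2 + 0.5 * (xs - X)**2))[0]
                 for X in Xs])
E_bo = np.linalg.eigvalsh(Ks + np.diag(E0_X))[0]
```

This model is exactly solvable via normal modes, and for ε = 0.01 the exact ground energy works out to about 0.781, while the BO estimate √2/2 + √(2ε)/2 ≈ 0.778 sits a few parts in a thousand below it; the grids above reproduce that agreement.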