By: Sadik

1 Inverse and Implicit Mapping Theorems: Motivation and Heuristics

Let's start with a motivating example. Say some nefarious agent gives you an equation along the lines of

$$y^5 + yx^2 + 2y + x = 0$$

and asks you to solve for $y$ as a function of $x$. Even though it's hard to write down an explicit formula for $y$ as a function of $x$, i.e. solve the equation, if you were to sit down and numerically find pairs $(x, y)$ satisfying the equation and plot them in the plane (or use Mathematica), you'd see that it looks like $y$ can be written down as a function of $x$, i.e. it passes the so-called "vertical line test". Contrast this to the case of $x^2 + y^2 = 1$, a circle, where you can't write $y$ as a function of $x$. The question then becomes: what conditions do you need on the defining equation to ensure that such an "implicit function" exists? And if it does exist, given that the defining equation is sufficiently regular (i.e. you can differentiate it a bunch of times), can we prove that the implicit function also shares its regularity?

An important thing to note is that all of the results we'll end up obtaining are local results, in the sense that close to a point $(x_0, y_0)$ on the defining curve, under certain conditions, we'll be able to show that you can solve for $y$ as a function of $x$ near this point, but our results won't be able to say anything about solving for $y$ as a function of any $x$. To be concrete, in the above case of the circle $x^2 + y^2 = 1$, we'll be able to show that as long as $y_0 \neq 0$, close to $(x_0, y_0)$ on the circle you can solve for $y$ as a function of $x$, but this clearly doesn't extend to the whole circle.

Moving on from the motivation stage (but still in the pre-proof stage), let's try to figure out what conditions we might need to guarantee the existence of a differentiable implicit function. Generalizing somewhat, say you have a differentiable function $G : \mathbb{R}^2 \to \mathbb{R}$ which defines an equation $G(x, y) = 0$. Let's also say that you can solve for $y$, so there exists a differentiable function $\varphi(x)$ s.t. $G(x, y) = 0$ iff $y = \varphi(x)$. If this were the case, we would have $G(x, \varphi(x)) = 0$. Now, let's try differentiating this equation with respect to $x$. Using the Chain Rule, we'd get

$$\frac{\partial G}{\partial x} + \frac{\partial G}{\partial y}\,\varphi'(x) = 0$$

and if $\frac{\partial G}{\partial y} \neq 0$, we could say

$$\varphi'(x) = -\frac{\partial G/\partial x}{\partial G/\partial y}$$

so it seems like $\frac{\partial G}{\partial y} \neq 0$ is related to the implicit function existing and being differentiable. BUT NOTE: this is not a necessary condition for the existence and differentiability of an implicit function $\varphi(x)$ (as Edwards seems to suggest), as the example $G(x, y) = x^3 - y^3$ shows. (Think about it at the origin.) However, it will be a crucial condition for the theorem we'll state later on.
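As a quick sanity check of that formula, here's a minimal numerical sketch (assuming my reconstruction $G(x, y) = y^5 + yx^2 + 2y + x$ of the motivating equation above; the helper names are mine, not from the lecture). Since this $G$ is strictly increasing in $y$, we can solve for $y$ by bisection and compare a difference quotient for $\varphi'(x)$ against $-\frac{\partial G/\partial x}{\partial G/\partial y}$:

```python
# Sanity check of phi'(x) = -(dG/dx)/(dG/dy), using the (reconstructed)
# motivating equation G(x, y) = y^5 + y*x^2 + 2*y + x = 0.

def G(x, y):
    return y**5 + y * x**2 + 2 * y + x

def solve_for_y(x, lo=-10.0, hi=10.0, tol=1e-12):
    """Bisection: G is strictly increasing in y, so the root is unique."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if G(x, mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

x0 = 1.3
y0 = solve_for_y(x0)         # the implicit phi(x0)

h = 1e-6                     # phi'(x0) via a symmetric difference quotient
numeric = (solve_for_y(x0 + h) - solve_for_y(x0 - h)) / (2 * h)

G_x = 2 * x0 * y0 + 1        # dG/dx = 2xy + 1
G_y = 5 * y0**4 + x0**2 + 2  # dG/dy = 5y^4 + x^2 + 2 > 0, never vanishes
formula = -G_x / G_y

print(numeric, formula)      # should agree to roughly 6 decimal places
```

Tabulating solve_for_y over a range of $x$ values produces exactly the plotted pairs $(x, y)$ described above, which is why the curve passes the vertical line test.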
Still in heuristic mode, let's look at a consequence of being able to solve equations like $G(x, y) = 0$ (given other conditions, of course). If we take $G(x, y) = x - f(y)$ for some function $f$, then finding a $\varphi(x)$ like above s.t. $G(x, y) = 0$ iff $y = \varphi(x)$ would give $G(x, \varphi(x)) = 0 = x - f(\varphi(x))$, i.e. $f(\varphi(x)) = x$, and it becomes apparent that $\varphi$ is $f$'s inverse. Hence our ability to solve implicitly defined equations seems to allow us to find inverses of functions as well. With this connection in mind, let's see what conditions we might need to guarantee the existence/regularity of local inverses to functions.

First, the simple example of $x^2$. This function doesn't have an inverse in any neighborhood around the origin (why?), and its derivative at the origin is $0$. Hmm, coincidence? I think not. So maybe requiring the derivative of the function to be non-zero at a point is important to guaranteeing an inverse function. (Again, this condition isn't necessary for an inverse to exist, as $x^3$ demonstrates near the origin, but if you require the inverse to also be differentiable at the point, the condition becomes necessary.)

But is this enough? That is, if we have a differentiable function with nonzero derivative at a point, is it locally bijective (i.e. does it have a local inverse)? Well, I probably wouldn't have asked these questions if the answer were yes, so in fact the claim is false. An illuminating counterexample (although most surely not the only one) is given by

$$f(x) = x + \operatorname{sgn}(x)\,|x|^{1.5} \sin(1/x)$$

and the trouble is at $x = 0$. If you plot this in Mathematica/Wolfram Alpha (try it!), it doesn't pass the horizontal line test in any neighborhood of the origin. (Or doesn't seem to; I chose this example since its formula is fairly simple, but to construct a rock-solid counterexample it's easier to do something along the lines of what was done in class.) Yet we can compute its derivative at $x = 0$, like we did in an earlier pset for similar functions, and it equals $1$, which is surely non-zero. So what's the issue that's causing the lack of injectivity near the origin? Well, if you compute the derivative of this function (and/or plot it), it isn't continuous at the origin, which suggests that it's the lack of continuity of the derivative that's ruining our chances of getting a local inverse.
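To see the failure concretely, here's a small numerical sketch (the window $(0, 10^{-3}]$ and the grid size are arbitrary choices of mine). A continuous function that is injective on an interval must be strictly monotone, so finding even one "down-step" between consecutive samples, alongside the overall upward drift, rules out injectivity near $0$:

```python
import math

def f(x):
    # the counterexample x + sgn(x)*|x|^1.5 * sin(1/x), with f(0) = 0
    if x == 0.0:
        return 0.0
    return x + math.copysign(abs(x) ** 1.5, x) * math.sin(1.0 / x)

# sample f on a fine grid just to the right of the origin
n = 200_000
xs = [1e-3 * (i + 1) / n for i in range(n)]
ys = [f(x) for x in xs]

# a continuous injection on an interval is strictly monotone,
# so any decreasing step among the samples kills injectivity
down = [(xs[i], xs[i + 1]) for i in range(n - 1) if ys[i + 1] < ys[i]]
print(f"{len(down)} decreasing steps in (0, 1e-3]; e.g. {down[0]}")
```

Shrink the window as much as you like and the same thing happens, which is exactly the claim that no neighborhood of the origin works.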
2 Actual Statement of the Theorems

Ok, now that we've gone through the exploratory/heuristic stages and have an idea of what is needed to guarantee existence/regularity of implicit functions/inverses, I'm going to skip the months/years where we struggle to figure out the precise statements of the theorems and their proofs; instead, here are the basic cases of the Inverse and Implicit Mapping Theorems.

1-D Inverse Function Theorem: Given a function $f : \mathbb{R} \to \mathbb{R}$ that is $C^1$ (i.e. differentiable and the derivative is continuous) in a neighborhood of a point $a \in \mathbb{R}$ AND $f'(a) \neq 0$, there exist neighborhoods $U$ and $V$ of $a$ and $f(a)$ respectively and a $C^1$ function $g$ defined on $V$ s.t. $g(f(x)) = x$ for $x \in U$ and $f(g(y)) = y$ for $y \in V$.

1-equation Implicit Mapping Theorem: Given a $C^1$ function $G : \mathbb{R}^{n+1} \to \mathbb{R}$ in a neighborhood of a point $(x_0, y_0)$ (with $x_0 \in \mathbb{R}^n$, $y_0 \in \mathbb{R}$, and $G(x_0, y_0) = 0$) AND $\frac{\partial G}{\partial y}(x_0, y_0) \neq 0$, then $\exists$ a $C^1$ function $\varphi : \mathbb{R}^n \to \mathbb{R}$ and a neighborhood $U$ of $(x_0, y_0)$ s.t. $G(x, y) = 0$ iff $y = \varphi(x)$ for $(x, y) \in U$.

And for good measure, here are the general, multivariable results.

Inverse Mapping Theorem: Given a mapping $f : \mathbb{R}^n \to \mathbb{R}^n$ that is $C^1$ (i.e. differentiable and the derivative is continuous) in a neighborhood of a point $a \in \mathbb{R}^n$ AND $f'(a)$ is an invertible linear map, there exist neighborhoods $U$ and $V$ of $a$ and $f(a)$ respectively and a $C^1$ mapping $g$ defined on $V$ s.t. $g(f(x)) = x$ for $x \in U$ and $f(g(y)) = y$ for $y \in V$. Note here that derivatives are now linear maps/matrices w.r.t. a given basis, so when we say the derivative is continuous, the simplest way to think about it is that the entries of the derivative matrix each vary continuously with the point the derivative is being evaluated at. (Even though this conceptualization is basis-dependent, it coincides with the more abstract definition.)

Implicit Mapping Theorem: Given a $C^1$ mapping $G : \mathbb{R}^{n+m} \to \mathbb{R}^m$ in a neighborhood of a point $(x_0, y_0)$ (with $x_0 \in \mathbb{R}^n$, $y_0 \in \mathbb{R}^m$, and $G(x_0, y_0) = 0$) AND $\frac{\partial G}{\partial y}(x_0, y_0)$ invertible, then $\exists$ a $C^1$ mapping $\varphi : \mathbb{R}^n \to \mathbb{R}^m$ and a neighborhood $U$ of $(x_0, y_0)$ s.t. $G(x, y) = 0$ iff $y = \varphi(x)$ for $(x, y) \in U$.

A simple way to think about what $\frac{\partial G}{\partial y}$ means here: if you look at the matrix of $G'$, an $m \times (n + m)$ matrix, then $\frac{\partial G}{\partial y}$ corresponds to the matrix formed by the last $m$ columns of $G'$, which makes it an $m \times m$ matrix. (See Edwards for the actual basis-independent definition and how to work with these generalized partials.) And the heuristic I use to remember the dimensions of the spaces (like which one is $m + n$, is the target space $m$ or $n$, what is $\varphi$ mapping between, etc.) is to think of $G$ as defining $m$ equations in your $n + m$ variables, so setting all these equations to $0$ takes out $m$ degrees of freedom, leaving $n$ degrees of freedom. Hence the last $m$ variables should be completely dependent on the first $n$ variables, i.e. $y \in \mathbb{R}^m$ should depend on $x \in \mathbb{R}^n$.

So the proof of the Inverse Mapping Theorem will be given in the next lecture, but it turns out that one can derive the Implicit Mapping Theorem from the Inverse Mapping Theorem AND vice-versa (i.e. we only have to prove one of them later). To see why this is the case, assume that we know the Implicit Mapping Theorem is true. If we let $G(x, y) = x - f(y)$ for $x, y \in \mathbb{R}^n$, and if the conditions of the Inverse Mapping Theorem are satisfied, then $G$ is $C^1$ (since $f$ is) and $\frac{\partial G}{\partial y} = -f'(y)$, which is invertible at $(x_0, y_0)$ (again, by assumption). Hence we can apply the Implicit Mapping Theorem to $G(x, y)$, which gives us locally a $C^1$ mapping $g : \mathbb{R}^n \to \mathbb{R}^n$ s.t. $G(x, y) = 0$ iff $y = g(x)$, i.e. $G(x, g(x)) = x - f(g(x)) = 0$, i.e. $f(g(x)) = x$, giving us $g$ as $f$'s local inverse.

For the other direction, assume that we know the Inverse Mapping Theorem is true. Assuming the conditions of the Implicit Mapping Theorem are satisfied, define the mapping $f(x, y) = (x, G(x, y))$, i.e. $f : \mathbb{R}^{n+m} \to \mathbb{R}^{n+m}$. Since $G$ is $C^1$, it follows that $f$ is a composition of $C^1$ mappings and hence is $C^1$. Computing the derivative of $f$ and writing its matrix in block form, we get

$$f' = \begin{pmatrix} \frac{\partial x}{\partial x} & \frac{\partial x}{\partial y} \\[4pt] \frac{\partial G}{\partial x} & \frac{\partial G}{\partial y} \end{pmatrix} = \begin{pmatrix} I_n & 0_{n,m} \\[4pt] \frac{\partial G}{\partial x} & \frac{\partial G}{\partial y} \end{pmatrix}$$

where $I_n$ is the $n \times n$ identity matrix (again, if you're worried about why I can still do this even though $x$ and $y$ are vectors, check out Edwards, or think about everything componentwise). The determinant of this matrix is equal to $\det(\frac{\partial G}{\partial y})$ (work this out entry-wise using the permutation formula for the determinant if this calculation isn't apparent), so it is non-zero (by assumption again); hence $f'$ is invertible. Applying the Inverse Mapping Theorem to $f$, we get locally a $C^1$ inverse mapping $h(x, y) = (h_1(x, y), h_2(x, y))$ to $f$ (here $h_1$ and $h_2$ are just the "coordinate" mappings of $h$), so that $h(f(x, y)) = (x, y)$. It follows by definition that $h(x, G(x, y)) = (x, y)$. Now, if $G(x, y) = 0$, then $h(x, 0) = (h_1(x, 0), h_2(x, 0)) = (x, y)$, and entrywise we see that $h_1(x, 0) = x$ and $y = h_2(x, 0)$. So if we take $\varphi(x) = h_2(x, 0)$ to be our $C^1$ implicitly defined function, we've shown that $G(x, y) = 0$ implies $y = \varphi(x)$. For the other direction, since $f(h(x, y)) = (x, y)$ as well, we have $(h_1(x, y), G(h_1(x, y), h_2(x, y))) = (x, y)$, so letting $y = 0$ (possible since $f(x_0, y_0) = (x_0, 0)$, so points with $y = 0$ are in a neighborhood of $(x_0, 0)$) and using our previous results, we get $(x, G(x, \varphi(x))) = (x, 0)$, i.e. $G(x, \varphi(x)) = 0$. I.e. if $y = \varphi(x)$, then $G(x, y) = G(x, \varphi(x)) = 0$, and we're done. In the above discussion I ignored a lot of issues about the neighborhoods where things were defined, but if you're interested, Edwards' proofs cover that much better.
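If the determinant step above isn't apparent, here's a quick numerical spot-check (a sketch using NumPy; the random blocks are stand-ins for $\frac{\partial G}{\partial x}$ and $\frac{\partial G}{\partial y}$ at some point, chosen by me for illustration):

```python
import numpy as np

# spot-check: det([[I_n, 0], [G_x, G_y]]) == det(G_y)
rng = np.random.default_rng(0)
n, m = 3, 2

G_x = rng.standard_normal((m, n))  # stand-in for dG/dx (m x n)
G_y = rng.standard_normal((m, m))  # stand-in for dG/dy (m x m)

f_prime = np.block([
    [np.eye(n), np.zeros((n, m))],
    [G_x,       G_y             ],
])

print(np.linalg.det(f_prime), np.linalg.det(G_y))  # equal up to rounding
```

The zero block in the top-right is what makes this work: every nonzero term in the permutation expansion is forced to pick the diagonal of $I_n$ in the first $n$ rows.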
And the main takeaway is that the Inverse and Implicit Mapping Theorems are equivalent (once you know the general forms that work in any dimension).

3 The Banach Contraction Principle

As I said, the Inverse Mapping Theorem will be proven in the next lecture, but here we'll prove a very important, general principle that can be used all over the place to establish the local existence of things (inverses, zeroes, solutions to differential equations, etc.) and that we'll use in the proof of the IMT, namely the Banach Contraction Principle. It usually even gives you an algorithm to compute the thing you're trying to show exists, as we'll see in a minute. I'll give the theorem and proof in their most general setting (the context of metric spaces), but we'll only end up using it in $\mathbb{R}^n$, so don't worry too much if it seems a bit weird at first.

The setup is as follows: you have a complete metric space $X$ and a contraction mapping $\varphi : X \to X$, i.e. a map with the property that $\exists\, k < 1$ s.t. $d(\varphi(x), \varphi(y)) \le k \cdot d(x, y)$ $\forall x, y \in X$ (it contracts the distance between any two points, hence the name). The theorem then states that $\exists$ a unique fixed point $x \in X$, i.e. a point such that $\varphi(x) = x$, and it can be obtained by taking any point $x_0 \in X$, defining a sequence of points $x_{n+1} = \varphi(x_n)$, and then finding the limit of this sequence (it'll turn out to converge). I.e. you take any point in the space and repeatedly apply the contraction map to it.

Before we get to the proof, let me give a few tips on usage. Where we'll be using it, Euclidean spaces, we'll normally have our candidate contraction map $\varphi$ already figured out, based on its fixed-point equation $\varphi(x) = x$ leading to something we desire (an inverse, a root, etc.). The next thing to do is to find a suitable closed set that the map preserves (remember: the subsets of $\mathbb{R}^n$ whose induced metric space is complete are exactly the closed sets). This is a crucial, subtle step, as the theorem only applies if $\varphi$ maps $X$ to itself, so be wary of this. (I forget this step all the time.) To show that a map is a contraction, for our purposes we'll usually have to resort to the Multivariable Mean Value Inequality. If you don't know what this is, that's fine (but check out Edwards' section on it, because it's fairly important to know on at least some level); the idea is similar to the 1-dimensional case. In that case, if you have a differentiable function on $\mathbb{R}$ whose derivative is bounded by a constant $k < 1$ everywhere in the region of interest, the MVT says that $|f(b) - f(a)| = |f'(c)(b - a)| = |f'(c)||b - a| \le k|b - a|$, i.e. it's a contraction. So in general, in our applications, something akin to this will have to be done to show that a map is a contraction. And lastly, if $\varphi$ satisfies the weaker condition that $\exists\, C$ s.t. $d(\varphi(x), \varphi(y)) \le C \cdot d(x, y)$ (we're not requiring $C$ to be less than $1$ here), it is known as Lipschitz continuous. The word "continuous" is in the name because, as you probably already guessed, such mappings are continuous. (Why?) However, the converse is not true, i.e. continuity does not imply Lipschitz continuity. (Can you think of an example of such a function? Don't overthink this one.)
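Before the proof, here's the algorithm from the theorem as a minimal sketch (the example map and the tolerance are my choices, not from the lecture). Take $\varphi = \cos$: it maps the complete set $[0, 1]$ into itself, and $|\cos'(x)| = |\sin(x)| \le \sin(1) < 1$ there, so by the MVT argument above it's a contraction:

```python
import math

def fixed_point(phi, x0, tol=1e-12, max_iter=10_000):
    """Banach iteration: x_{n+1} = phi(x_n), stop once the steps get tiny."""
    x = x0
    for _ in range(max_iter):
        x_next = phi(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence -- is phi really a contraction?")

# cos maps [0, 1] into itself with |cos'| <= sin(1) < 1 there,
# so it has a unique fixed point, reachable from any starting point
x_star = fixed_point(math.cos, 0.5)
print(x_star, math.cos(x_star))  # ~0.73908513, and phi(x*) = x*
```

Note how both hypotheses show up in the setup: the set is complete (closed in $\mathbb{R}$) and the map sends it to itself; drop either one and the iteration can escape the set or fail to settle.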
Ok, here goes the proof. Let's first show that a fixed point of $\varphi$ exists, and later we'll show it's unique. To find it, we execute the algorithm I mentioned earlier: take some point $x_0 \in X$ and define a sequence of points via $x_{n+1} = \varphi(x_n)$. Now we need to worry about whether this sequence actually converges. But since $X$ is complete, this is the same as asking whether it's Cauchy (which is a bit easier b/c we don't need to know what the limit is explicitly). To see that it's Cauchy, we need to get a bound on terms like $d(x_{n+m}, x_n)$. How do we do this? Well, a little trickiness and the Triangle Inequality go a long way. First note that we have the following inequality, via the fact that $\varphi$ is a contraction mapping:

$$d(x_{n+1}, x_n) = d(\varphi(x_n), \varphi(x_{n-1})) \le k \cdot d(x_n, x_{n-1})$$

and doing this trick repeatedly, we see that $d(x_{n+1}, x_n) \le k^n d(x_1, x_0)$. Now, repeated use of the Triangle Inequality tells us that

$$d(x_{n+m}, x_n) \le d(x_{n+m}, x_{n+m-1}) + d(x_{n+m-1}, x_n) \le \cdots \le d(x_{n+m}, x_{n+m-1}) + \cdots + d(x_{n+i}, x_{n+i-1}) + \cdots + d(x_{n+1}, x_n)$$

and so, using the previous inequalities we got,

$$d(x_{n+m}, x_n) \le k^{n+m-1} d(x_1, x_0) + \cdots + k^n d(x_1, x_0) = k^n d(x_1, x_0)\,(1 + k + \cdots + k^{m-1})$$

but the geometric series in the above equation is definitely less than the infinite series $1 + k + k^2 + \cdots = \frac{1}{1-k}$, so we get the inequality

$$d(x_{n+m}, x_n) \le \frac{k^n}{1-k}\, d(x_1, x_0)$$

and this is precisely what we need to show that the sequence is Cauchy, because given any $\varepsilon > 0$, we can choose $N$ so large that $\frac{k^N}{1-k} d(x_1, x_0) < \varepsilon$, in which case $\forall n, m > N$,

$$d(x_n, x_m) \le \frac{k^{\min(n,m)}}{1-k}\, d(x_1, x_0) \le \frac{k^N}{1-k}\, d(x_1, x_0) < \varepsilon$$

since $k^n$ is a decreasing function of $n$.

Ok, so we know that this sequence $x_n$ is Cauchy, and thus converges to a limit, say $x^*$. So what? Well, that limit is probably the fixed point, but how do we check this? We note that

$$\varphi(x^*) = \varphi\Big(\lim_{n\to\infty} x_n\Big) = \lim_{n\to\infty} \varphi(x_n) = \lim_{n\to\infty} x_{n+1} = x^*$$

where we used the fact that $\varphi$ is continuous, and thus respects convergent sequences (i.e. it's sequentially continuous). Nice, we've almost got everything! The final thing to show is that this fixed point is unique. Doing the usual thing, let's say it wasn't unique, so there existed distinct fixed points $x^*$ and $x^{**}$. Hmm, I wonder what would happen if we applied $\varphi$ to them. I mean, it's supposed to bring them closer, but they're fixed points... well, let's see:

$$d(x^*, x^{**}) = d(\varphi(x^*), \varphi(x^{**})) \le k \cdot d(x^*, x^{**})$$

but wait, $k < 1$, and this is telling us that $d(x^*, x^{**}) \le k \cdot d(x^*, x^{**})$, which is only possible if $d(x^*, x^{**}) = 0$, i.e. if $x^* = x^{**}$, contrary to our assumption. Hence the fixed point is unique.

That's all folks. Tune in next time for the proof of the Inverse Mapping Theorem. P.S. If you have any comments/feedback/suggestions on my lecture notes, I'd love to hear about them so I can improve them to help you guys learn better. E-mail would be preferable (smoke signals would not be). Thanks!