Recitation notes


April 4, 2008

1 Norms

1.1 lp norms for p 1 satisfy the triangle inequality (based on Royden)

We wish to show that ∥x + y∥p ∥x∥p + ∥y∥p for p 1.

For p = , this is maxi|x + y |
 i   i maxi(|x |+ |y |)
  i    i maxi|x |
  i + maxi|y|
  i.

For 1 p < , first assume without loss that x,y0 (the inequality is trivial if either vector is zero). Let α = ∥x∥p, and β = ∥y∥p, from which we define x0 = 1αx, and y0 = 1βy. Then:

∥x+ y∥p  =  ∥αx0 + β∥y0∥p            ∥
                   ∥∥--α--    --β--  ∥∥
         =  (α + β)∥α + βx0 + α + βy0∥p
                   (   |                   |) 1p
                    ∑d ||--α--      --β--   ||p
         =  (α + β)    |α + βx0,d + α +β y0,d|
                    k=1

Since f(x ) = |x|p is convex for 1 p < , and noting that αα+β-,αβ+β- 0 and αα+β- + αβ+β- = 1:

                   ( d∑  (                       ) ) 1p
∥x+ y∥   ≤  (α + β)      --α-- |x0,d|p +--β--|y0,d|p
      p              k=1  α + β        α + β
                   (                       ) 1p
         ≤  (α + β)  -α---∥x0∥pp +--β--∥y0∥pp
                     α+ β        α + β

Recalling that ∥x ∥
  0pp = ∥y∥
  0pp = 1:

∥x + y∥p ≤   |α + β|

WLOG But α = ∥x∥p and β = ∥y∥p, so:

∥x + y∥p ≤   ∥x∥p + ∥y∥p

1.2 Norms are convex

A norm, by definition, satisfies the triangle inequality. Suppose that ∥⋅∥ is a norm. Then, if λ ∈[0,1]:

∥λx + (1 - λ)y∥  ≤  ∥λx∥ +∥(1- λ) y∥

                ≤  λ∥x∥ + (1 - λ)∥y∥

2 Hessians

Suppose that f : ℝd ℝ is twice differentiable. The gradient of f is the vector of partial derivatives:

        ⌊     ⌋
          ∂∂xf1
        || ∂∂xf2 ||
▽f  =   ||  ..  ||
        ⌈  .∂f ⌉
          ∂xd

The Hessian is the matrix of partial second derivatives:

         ⌊   ∂22f-  --∂2f--  ⋅⋅⋅  -∂2f--⌋
         |  -∂∂x21f-- ∂x1∂2∂fx2      ∂x∂1∂2xfd-|
▽2f   =  ||  ∂x2∂x1    ∂x22    ⋅⋅⋅  ∂x2∂xd ||
         ||    ...       ...    ...    ...   ||
         ⌈  -∂2f-- --∂2f--       ∂2f  ⌉
            ∂xd∂x1  ∂xd∂x2  ⋅⋅⋅   ∂x2d

Note that since -∂2f-
∂xi∂xj = -∂2f-
∂xj∂xi, this matrix is symmetric. It’s important to realize that, like a derivative (or gradient), the Hessian is a function of the vector x–it takes on different values on different parts of the domain.

A word on notation: 2 is frequently used for the Laplacian operator 2f = i=1d∂∂22xf-
  i, but not in this class.

2.1 Positive definiteness

An eigenvalue is a scalar λ which satisfies the following:

Ax   =  λx

For some x0 (which is called an eigenvector). Note that if the above holds for x, it holds for αx where α0 is an arbitrary scalar. That is, the scale of eigenvectors is irrelevant. A n × n symmetric matrix will always have exactly n eigenvalue/eigenvector pairs, though the eigenvalues are not necessarily distinct. Additionally, the eigenvectors may always be taken to be orthogonal.

For small matrices, we may solve for the eigenvalues & eigenvectors explicitly:

[ 17  - 6 ] [ x1]      [ x1 ]
  - 6  8     x2   =   λ  x2
   [           ]       [    ]
     17x1 - 6x2   =   λ  x1
   [ - 6x1 + 8x2]     [  x2 ]
     (17- λ) x1   =     6x2
      (8- λ)x2          6x1

Soving for x1 gives that x1 = --6-
17-λx2 and x1 = 8-λ
 6x2, so that:

--6---     8--λ-
17- λ  =     6
   36  =   (8 - λ)(17- λ)
    0  =   136- 25λ+ λ2 - 36
            2
    0  =   λ - 2√5λ+-100---
    λ  =   25±---625--400
               √ 2--
           25±---225
       =       2
           25±-15-
       =     2
       ∈   {20,5}

A matrix is positive definite if all of its eigenvalues are strictly positive, and positive semi-definite if they are all nonnegative. Is the above matrix positive definite? (yes)

One simple test for positive (semi-)definiteness is that a matrix A is positive (semi-)definite if xTAx is positive (nonnegative) for all x0.

A twice-differentiable function is convex if its Hessian is positive semidefinite everywhere, and is strictly convex if its Hessian is positive definite.

The definitions of negative (semi-) definite and (strictly) concave are the same, with the signs reversed.

2.2 Quadratic-over-linear is convex (page 73 of Boyd)

Consider f(x,y) = x2
y- for y > 0. Then:

        [ ∂f ]
▽f  =     ∂∂xf
        [ ∂y  ]
           2xy-
    =     - x22
           y

And:

         [  2     2   ]
  2         ∂∂2fx-  ∂∂xf∂y
▽ f  =     -∂2f-  ∂22f
         [ ∂y∂x  ∂ y ]
            2y   - 2yx2
     =     - 2xy2-  2yx32

Which we may write as:

           [   2       ]
▽2f  =   23   y    - xy2
         y [ - xy]  x
     =   2-   y   [ y  - x ]
         y3  - x

By inspection, one eigenvector of [ -y x2y  -xx2y ] is [-yx ], with eigenvalue y2 + x2 0:

[   2       ] [    ]      [    ][        ][    ]
   y    - xy2    y     =     y     y  - x    y
  - xy   x      - x       [- x ]            - x
                      =     y   (y2 + x2)
                           - x

While the other eigenvalue is zero, since [  y2   - xy ]
  - xy   x2 has rank 1 (eigenvector [ x ]
  y). Since y23 0 for y > 0, this gives that:

▽2f   ≽  0

Showing that f is convex. Is it strictly convex? (no)

2.3 Arithmetic-geometric means inequality

Consider f(x ) =  ∏
(  ni=1 xi)1n for x > 0. We may write log f(x) = n1 i=1n log xi. By the concavity of the logarithm:

1 ∑n             ( 1 ∑n   )
n-   log xi ≤   log  n-   xi
  i=1                 i=1

Exponentiating both sides gives:

          1-∑n
f (x) ≤   n    xi
            i=1

Which is the arithmetic-geometric means inequality.

2.4 Geometric mean is concave (page 74 of Boyd)

Again consider f(x ) =  ∏n
(  i=1 xi)1
n for x > 0. Its gradient is:

           (∏n   ) 1n-1 ∏n
-∂f-  =  1-    xi     --i=1xi
∂xk      n  i=1         xk
             ( n   ) 1n
      =  -1-- ∏  x
         nxk  i=1 i

And its Hessian is:

   2       (    )2 ( n∏   ) 1n      ( n∏   ) 1n
  ∂-f-  =    -1--      xi   - --12    xi
  ∂2xk       nxk    i=1       nx k  i=1
           (       )    ( n   )n1
        =    -1 - 1- -1- ∏  xi
             n2   n  x2k  i=1
                      (  n  ) 1n
                  -1--- ∏
        =  (1- n )n2x2k  i=1 xi
                  (     ) 1
-∂2f---     --1---  n∏     n
∂xk∂xj  =   n2xkxj    xi
                    i=1

Which we may write as:

                      ⌊   n-1      1          1  ⌋
             (     ) 1    x21-   -x1x2  ⋅⋅⋅ - x1xn-
 2         1  ∏n     n|| - x12x1   nx-221   ⋅⋅⋅ - x12xn-||
▽ f  =   -n2     xi   ||    ..      ..    ..    ..   ||
              i=1     ⌈   -.1--   -.1-   .   n.-1  ⌉
                        - xnx1  -xnx2  ⋅⋅⋅   x2n
                      ( ⌊ xn2   0   ⋅⋅⋅   0  ⌋  ⌊   12-   x1x-- ⋅⋅⋅ x-1x- ⌋)
             (∏n   ) 1n| |  10  n-12  ⋅⋅⋅   0  |  |  -x11--  1122  ⋅⋅⋅ -11n ||
     =   --1     xi   || ||  .   x.2        .  ||- ||  x2x.1   x2.       x2x.n ||||
          n2  i=1     |( |⌈  ..   ..   ...   ..  |⌉  |⌈   ..     ..   ...   ..  |⌉|)
                           0   0   ⋅⋅⋅  nx-21       x1x- x-1x-  ⋅⋅⋅  x12
                      (                  n    ⌊  1 n1⌋⌊  n1 2⌋T)     n
             (     ) 1                           x1     x1
          -1  ∏n     n||      (-1  1-    -1-)  ||  1x2 ||||  1x2 || ||
     =   -n2     xi   || ndiag  x2 ,x2,...,x2n  - |⌈  ..  |⌉|⌈  ..  |⌉ ||
              i=1     (        1   2             .1-     .1-   )
                                                 xn     xn

Consider vT 2fv:

                  ( ∏n  ) 1n (  ∑n  2   (∑n   )2 )
vT (▽2f )v =  - 1-     xi   (n    vi2 -     vi   )
                n2  i=1         i=1 xi    i=1xi

Note that we may write i=1nv2
xi2i as:

           ∥⌊  v1-⌋ ∥2
           ∥∥   x1v   ∥∥
∑n  vi2     ∥∥||  2x2-|| ∥∥
    x2i  =  ∥∥|⌈  ... |⌉ ∥∥
i=1        ∥∥  -vn   ∥∥
              xn    2

And by the Cauchy-Schwarz inequality ((aT b)2 ∥a∥22∥b∥22) with a = 1 and bi = vi
xi:

(      )
 ∑n  vi 2       ∑n v2i
     xi    ≤  n    x2
 i=1            i=1  i

So that vT(▽2f )v 0, showing that f is concave.