Friday, December 12, 2014

A Tutorial on android.hardware.camera2.*

Introduction

Since android.hardware.Camera is deprecated as of API level 21, we use android.hardware.camera2.*. The official reference is http://developer.android.com/reference/android/hardware/camera2/package-summary.html , but it reads like a dictionary. We need simple tutorials, and there is no good sample code even on http://developer.android.com .


After a long search I found a good one:
http://blog.csdn.net/torvalbill/article/details/40378539 . Based on it, I wrote my own document focused only on the camera preview.

All of my code is here: https://github.com/stereoboy/android_samples

Setting Up All the Related Objects

The setup flow is as follows:

  • CameraManager
    • Select Camera
    • Create CameraDevice
  • CameraDevice
    • Create CaptureRequest
    • Create CameraCaptureSession
  • CaptureRequest, CaptureRequest.Builder
    • Link Surface for Viewing
    • Make CaptureRequest
  • CameraCaptureSession
    • Capture camera images and put the results on the Surface registered in the CaptureRequest.

One reason the related objects feel a little complex is that most of the setup is done asynchronously. For example, to create a CameraDevice, we first register a callback on the CameraManager; when the CameraDevice is created and the callback fires, we receive the CameraDevice inside the callback code, as in the sketch below.
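For concreteness, here is a minimal sketch of that asynchronous hand-off. It is my own illustration, not code from the linked sample: it assumes it runs inside an Activity with an mCameraDevice field, an mBackgroundHandler, and a hypothetical startPreview() helper, and it omits imports and the CAMERA permission check.

CameraManager manager = (CameraManager) getSystemService(Context.CAMERA_SERVICE);
try {
    String cameraId = manager.getCameraIdList()[0];   // simply pick the first camera

    manager.openCamera(cameraId, new CameraDevice.StateCallback() {
        @Override
        public void onOpened(CameraDevice camera) {
            // Only here, inside the callback, do we finally get the CameraDevice.
            mCameraDevice = camera;
            startPreview();   // hypothetical helper that builds the CaptureRequest/session
        }

        @Override
        public void onDisconnected(CameraDevice camera) {
            camera.close();
        }

        @Override
        public void onError(CameraDevice camera, int error) {
            camera.close();
        }
    }, mBackgroundHandler);   // callbacks are delivered on the background handler's thread
} catch (CameraAccessException e) {
    e.printStackTrace();
}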

Repeated Capture by using HandlerThread

The main thread should not be blocked by camera capture, so we set up a background thread using HandlerThread and Handler.

The handler on this background thread repeatedly performs the capture configured through the CaptureRequest and CameraCaptureSession, as shown in the sketch below.
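A minimal sketch of this part (again my own illustration, with imports omitted; surface is assumed to be the preview Surface obtained from a TextureView or SurfaceView, and mCameraDevice comes from the previous sketch):

// 1. Background thread + handler so capture callbacks never block the main thread.
HandlerThread backgroundThread = new HandlerThread("CameraBackground");
backgroundThread.start();
final Handler backgroundHandler = new Handler(backgroundThread.getLooper());

try {
    // 2. Build a preview CaptureRequest that targets the preview Surface.
    final CaptureRequest.Builder builder =
            mCameraDevice.createCaptureRequest(CameraDevice.TEMPLATE_PREVIEW);
    builder.addTarget(surface);

    // 3. Create the CameraCaptureSession (also asynchronous), then start repeating capture.
    mCameraDevice.createCaptureSession(Arrays.asList(surface),
            new CameraCaptureSession.StateCallback() {
                @Override
                public void onConfigured(CameraCaptureSession session) {
                    try {
                        // The repeating request keeps feeding preview frames to the Surface.
                        session.setRepeatingRequest(builder.build(), null, backgroundHandler);
                    } catch (CameraAccessException e) {
                        e.printStackTrace();
                    }
                }

                @Override
                public void onConfigureFailed(CameraCaptureSession session) {
                    // Preview configuration failed; nothing more to do in this sketch.
                }
            }, backgroundHandler);
} catch (CameraAccessException e) {
    e.printStackTrace();
}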


Saturday, November 29, 2014

Dimension Reduction using spectral decomposition (1) - LDA


Introduction

Most texts, including textbooks and web articles, describe FLD (Fisher Linear Discriminant) as LDA rather than classical LDA. However, if you survey academic papers released about ten years ago, you will find a simpler LDA than FLD.

The issues LDA addresses are quite different from PCA's: the concept of a "group" or "class" is added in LDA. PCA deals with individual samples, while LDA focuses on classifying groups of samples.

PCA can be used as a recognition/classification approach by itself, but in practice PCA is used as a dimension-reduction preprocessing step for more complex classification or recognition methods. And part of the strategy used in the PCA process is already baked into LDA.

My web references are:
  1. https://onlinecourses.science.psu.edu/stat557/node/37
  2. http://en.wikipedia.org/wiki/Linear_discriminant_analysis
Before discussing LDA, I will explain the Mahalanobis distance and the multivariate normal distribution, then QDA, a more general version of LDA, and finally LDA itself. Everything in this article is based on a similar process using the covariance matrix.

Unfortunately, classical LDA needs two strict assumptions: identical covariance matrices and multivariate normal distributions. These make it hard to apply LDA to real-world problems. But its strategy is simple, and by working our way toward LDA here, we can understand the relationships between many concepts that look similar but are rarely explained in one place.


Euclidean Distance and Mahalanobis Distance

There are many types of distance metrics between two vectors or feature sets. The Euclidean distance is the simplest:
\begin{equation} D_{Euclidean}(\mathbf{x},\mathbf{y})=\sqrt{(\mathbf{x}-\mathbf{y})^{T}(\mathbf{x}-\mathbf{y})} \end{equation}

As described in the figure above, if we select the Euclidean distance, we do not take the distribution of the sample group and its covariance into account.

 
In the Mahalanobis distance, a normalizing transform onto new bases is applied just before the Euclidean distance calculation.
\begin{equation} D_{Mahalanobis}(\mathbf{x},\mathbf{y})=\sqrt{(\mathbf{x}-\mathbf{y})^{T}\mathbf{\Sigma}^{-1}(\mathbf{x}-\mathbf{y})} \end{equation}


Let's take a closer look at $\Sigma^{-1}$.
We already looked over the covariance matrix $\Sigma$ in the previous article on PCA.

\begin{eqnarray*} \Sigma & = & \left[\begin{array}{ccc} \sigma_{11}^{2} & \cdots & \sigma_{1D}^{2}\\ \vdots & \ddots & \vdots\\ \sigma_{D1}^{2} & \cdots & \sigma_{DD}^{2} \end{array}\right]\\ & \Downarrow & diagonalization\\ & = & Q\varLambda Q^{T}\\ & = & \left[\begin{array}{cccc} \mathbf{e}_{1} & \mathbf{e}_{2} & \cdots & \mathbf{e}_{D}\end{array}\right]\left[\begin{array}{ccc} \lambda_{1} & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \lambda_{D} \end{array}\right]\left[\begin{array}{c} \mathbf{e}_{1}^{T}\\ \mathbf{e}_{2}^{T}\\ \vdots\\ \mathbf{e}_{D}^{T} \end{array}\right] \end{eqnarray*}

$\lambda$s can be replaced with "NEW" variances, $\sigma^2$s.

\begin{eqnarray*} \Sigma & = & Q\varLambda Q^{T}\\ & = & \left[\begin{array}{cccc} \mathbf{e}_{1} & \mathbf{e}_{2} & \cdots & \mathbf{e}_{D}\end{array}\right]\left[\begin{array}{ccc} \sigma_{11}^{2} & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \sigma{}_{DD}^{2} \end{array}\right]\left[\begin{array}{c} \mathbf{e}_{1}^{T}\\ \mathbf{e}_{2}^{T}\\ \vdots\\ \mathbf{e}_{D}^{T} \end{array}\right] \end{eqnarray*}
 
The matrix consisting of the $\mathbf{e}^T_d$s represents a transformation to the new bases $\{\mathbf{e}_d\}$.



\begin{eqnarray*} \Sigma^{-1} & = & Q\varLambda^{-1}Q^{T}\\ & = & \left[\begin{array}{cccc} \mathbf{e}_{1} & \mathbf{e}_{2} & \cdots & \mathbf{e}_{D}\end{array}\right]\left[\begin{array}{ccc} \frac{1}{\sigma{}_{11}^{2}} & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \frac{1}{\sigma{}_{DD}^{2}} \end{array}\right]\left[\begin{array}{c} \mathbf{e}_{1}^{T}\\ \mathbf{e}_{2}^{T}\\ \vdots\\ \mathbf{e}_{D}^{T} \end{array}\right]\\ & = & \left(\left[\begin{array}{cccc} \mathbf{e}_{1} & \mathbf{e}_{2} & \cdots & \mathbf{e}_{D}\end{array}\right]\left[\begin{array}{ccc} \frac{1}{\sigma{}_{11}} & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \frac{1}{\sigma{}_{DD}} \end{array}\right]\right)\left(\left[\begin{array}{ccc} \frac{1}{\sigma{}_{11}} & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \frac{1}{\sigma{}_{DD}} \end{array}\right]\left[\begin{array}{c} \mathbf{e}_{1}^{T}\\ \mathbf{e}_{2}^{T}\\ \vdots\\ \mathbf{e}_{D}^{T} \end{array}\right]\right) \end{eqnarray*}

$\frac{1}{\sigma}$ means unit-variance normalization.
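To make the normalization explicit, the same factorization can be written as a single identity (this follows directly from the decomposition above): the squared Mahalanobis distance is a Euclidean distance computed after rotating to the eigenbasis and rescaling each axis to unit variance.

\begin{eqnarray*} D_{Mahalanobis}^{2}(\mathbf{x},\mathbf{y}) & = & (\mathbf{x}-\mathbf{y})^{T}Q\varLambda^{-1}Q^{T}(\mathbf{x}-\mathbf{y})\\ & = & \left\Vert \varLambda^{-\frac{1}{2}}Q^{T}(\mathbf{x}-\mathbf{y})\right\Vert ^{2} \end{eqnarray*}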


Multivariate Normal Distribution

The multivariate normal distribution (MND) is:

\begin{equation} \mathcal{N}_{D}(\mu,\Sigma)=\frac{1}{\sqrt{(2\pi)^{D}\left|\Sigma\right|}}\exp\left(-\frac{1}{2}(\mathbf{x}-\mu)^{T}\mathbf{\Sigma}^{-1}(\mathbf{x}-\mu)\right) \end{equation}

where $D=\dim\left(\mathbf{x}\right)$.

We can see a squared Mahalanobis distance term: $(\mathbf{x}-\mu)^T\Sigma^{-1}(\mathbf{x}-\mu)$ is the squared Mahalanobis distance from $\mu$. The determinant of $\Sigma$ is the product of the new variances, $\prod_{d}\sigma_d^2$.


QDA (Quadratic Discriminant Analysis)

QDA is a more general form of LDA. For QDA we need an important assumption: every class follows a multivariate normal distribution (MND).



If we want to discriminate between classes $k$ and $h$, we have the following discriminant rule.

\begin{equation} \frac{g_{k}(\mathbf{x})}{g_{h}(\mathbf{x})}>T_{k\text{ against }h} \end{equation} \begin{eqnarray*} g_{k}(\mathbf{x}) & = & p(\mathbf{x}|C_{k})p(C_{k})\\ & = & \frac{1}{\sqrt{(2\pi)^{D}\left|\Sigma_{k}\right|}}\exp\left(-\frac{1}{2}(\mathbf{x}-\mu_{k})^{T}\mathbf{\Sigma}_{k}^{-1}(\mathbf{x}-\mu_{k})\right)p(C_{k}) \end{eqnarray*} \begin{eqnarray*} \frac{g_{k}(\mathbf{x})}{g_{h}(\mathbf{x})} & = & \frac{\frac{1}{\sqrt{(2\pi)^{D}\left|\Sigma_{k}\right|}}\exp\left(-\frac{1}{2}(\mathbf{x}-\mu_{k})^{T}\mathbf{\Sigma}_{k}^{-1}(\mathbf{x}-\mu_{k})\right)p(C_{k})}{\frac{1}{\sqrt{(2\pi)^{D}\left|\Sigma_{h}\right|}}\exp\left(-\frac{1}{2}(\mathbf{x}-\mu_{h})^{T}\mathbf{\Sigma}_{h}^{-1}(\mathbf{x}-\mu_{h})\right)p(C_{h})}\\ & = & \frac{\left|\Sigma_{k}\right|^{-\frac{1}{2}}\exp\left(-\frac{1}{2}(\mathbf{x}-\mu_{k})^{T}\mathbf{\Sigma}_{k}^{-1}(\mathbf{x}-\mu_{k})\right)p(C_{k})}{\left|\Sigma_{h}\right|^{-\frac{1}{2}}\exp\left(-\frac{1}{2}(\mathbf{x}-\mu_{h})^{T}\mathbf{\Sigma}_{h}^{-1}(\mathbf{x}-\mu_{h})\right)p(C_{h})} \end{eqnarray*}

It is based on a simple Bayesian decision rule. Let's refine the formula by taking logs and using the common notation $\pi_{k}=p(C_{k})$.

\begin{eqnarray*} \ln\left(\frac{g_{k}(\mathbf{x})}{g_{h}(\mathbf{x})}\right) & = & -\frac{1}{2}\ln\left|\Sigma_{k}\right|+\left(-\frac{1}{2}(\mathbf{x}-\mu_{k})^{T}\mathbf{\Sigma}_{k}^{-1}(\mathbf{x}-\mu_{k})\right)+\ln\pi_{k}+\frac{1}{2}\ln\left|\Sigma_{h}\right|-\left(-\frac{1}{2}(\mathbf{x}-\mu_{h})^{T}\mathbf{\Sigma}_{h}^{-1}(\mathbf{x}-\mu_{h})\right)-\ln\pi_{h} \end{eqnarray*}

The discriminant function is quadratic in $\mathbf{x}$; this is why its name starts with "Q".
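Equivalently, each class can be given its own quadratic score $\delta_k$, and we pick the class with the largest score; this is just a rearrangement of the formula above into the per-class form commonly used for QDA:

\begin{equation} \delta_{k}(\mathbf{x})=-\frac{1}{2}\ln\left|\Sigma_{k}\right|-\frac{1}{2}(\mathbf{x}-\mu_{k})^{T}\mathbf{\Sigma}_{k}^{-1}(\mathbf{x}-\mu_{k})+\ln\pi_{k},\qquad\hat{C}(\mathbf{x})=\underset{k}{\arg\max}\,\delta_{k}(\mathbf{x}) \end{equation}

Comparing two scores gives back the log-ratio above, since $\ln\left(g_{k}(\mathbf{x})/g_{h}(\mathbf{x})\right)=\delta_{k}(\mathbf{x})-\delta_{h}(\mathbf{x})$.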


LDA (Linear Discriminant Analysis)

LDA needs one more assumption: the $\Sigma$s of all classes are identical.

It looks unrealistic, but let's go ahead.

\begin{eqnarray*} \ln\left(\frac{g_{k}(\mathbf{x})}{g_{h}(\mathbf{x})}\right) & = & -\frac{1}{2}\ln\left|\Sigma\right|+\left(-\frac{1}{2}(\mathbf{x}-\mu_{k})^{T}\mathbf{\Sigma}^{-1}(\mathbf{x}-\mu_{k})\right)+\ln\pi_{k}+\frac{1}{2}\ln\left|\Sigma\right|-\left(-\frac{1}{2}(\mathbf{x}-\mu_{h})^{T}\mathbf{\Sigma}^{-1}(\mathbf{x}-\mu_{h})\right)-\ln\pi_{h}\\ & = & \left(-\frac{1}{2}(\mathbf{x}-\mu_{k})^{T}\mathbf{\Sigma}^{-1}(\mathbf{x}-\mu_{k})\right)+\ln\pi_{k}-\left(-\frac{1}{2}(\mathbf{x}-\mu_{h})^{T}\mathbf{\Sigma}^{-1}(\mathbf{x}-\mu_{h})\right)-\ln\pi_{h}\\ & = & -\frac{1}{2}\left(\mathbf{x}^{T}\mathbf{\Sigma}^{-1}\mathbf{x}-2\mu_{k}^{T}\mathbf{\Sigma}^{-1}\mathbf{x}+\mu_{k}^{T}\mathbf{\Sigma}^{-1}\mu_{k}\right)+\frac{1}{2}\left(\mathbf{x}^{T}\mathbf{\Sigma}^{-1}\mathbf{x}-2\mu_{h}^{T}\mathbf{\Sigma}^{-1}\mathbf{x}+\mu_{h}^{T}\mathbf{\Sigma}^{-1}\mu_{h}\right)+\ln\pi_{k}-\ln\pi_{h}\\ & = & -\frac{1}{2}\left(-2\mu_{k}^{T}\mathbf{\Sigma}^{-1}\mathbf{x}+\mu_{k}^{T}\mathbf{\Sigma}^{-1}\mu_{k}\right)+\frac{1}{2}\left(-2\mu_{h}^{T}\mathbf{\Sigma}^{-1}\mathbf{x}+\mu_{h}^{T}\mathbf{\Sigma}^{-1}\mu_{h}\right)+\ln\pi_{k}-\ln\pi_{h}\\ & = & (\mu_{k}-\mu_{h})^{T}\mathbf{\Sigma}^{-1}\mathbf{x}-\frac{1}{2}(\mu_{k}+\mu_{h})^{T}\mathbf{\Sigma}^{-1}(\mu_{k}-\mu_{h})+\ln\pi_{k}-\ln\pi_{h} \end{eqnarray*}

As you can see, the discriminant function is now linear in $\mathbf{x}$, so the decision boundary changes from a curve to a straight line (hence "Linear").
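Written in the same per-class form as QDA, the linear discriminant score becomes (again just a rearrangement of the last line of the derivation):

\begin{equation} \delta_{k}(\mathbf{x})=\mu_{k}^{T}\mathbf{\Sigma}^{-1}\mathbf{x}-\frac{1}{2}\mu_{k}^{T}\mathbf{\Sigma}^{-1}\mu_{k}+\ln\pi_{k},\qquad\ln\left(\frac{g_{k}(\mathbf{x})}{g_{h}(\mathbf{x})}\right)=\delta_{k}(\mathbf{x})-\delta_{h}(\mathbf{x}) \end{equation}

Each $\delta_{k}$ is linear in $\mathbf{x}$, so every pairwise boundary $\delta_{k}(\mathbf{x})=\delta_{h}(\mathbf{x})$ is a hyperplane.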





Conclusion

Covariance matrices are everywhere. The contents described here are so standard that most researchers and developers never need to check them again. But someone without solid mathematical or statistical fundamentals may fail to catch these common pieces and can easily be confused.

Syntax-Highlighter on blogspot

  • Select "Template" in Setting
  • "Edit HTML"

  • Add the following code in <head> </head>



<link href="http://alexgorbatchev.com/pub/sh/current/styles/shCore.css" rel="stylesheet" type="text/css"></link>
<link href="http://alexgorbatchev.com/pub/sh/current/styles/shThemeEmacs.css" rel="stylesheet" type="text/css"></link>
<script src="http://alexgorbatchev.com/pub/sh/current/scripts/shCore.js" type="text/javascript">
 
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushAS3.js' type='text/javascript'/>
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushBash.js' type='text/javascript'/>
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushColdFusion.js' type='text/javascript'/>
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushCSharp.js' type='text/javascript'/>
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushCpp.js' type='text/javascript'/>
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushCss.js' type='text/javascript'/>
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushDelphi.js' type='text/javascript'/>
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushDiff.js' type='text/javascript'/>
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushErlang.js' type='text/javascript'/>
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushGroovy.js' type='text/javascript'/>
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushJScript.js' type='text/javascript'/>
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushJava.js' type='text/javascript'/>
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushJavaFX.js' type='text/javascript'/>
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushPerl.js' type='text/javascript'/>
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushPhp.js' type='text/javascript'/>
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushPlain.js' type='text/javascript'/>
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushPowerShell.js' type='text/javascript'/>
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushPython.js' type='text/javascript'/>
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushRuby.js' type='text/javascript'/>
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushScala.js' type='text/javascript'/>
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushSql.js' type='text/javascript'/>
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushVb.js' type='text/javascript'/>
<script src='http://alexgorbatchev.com/pub/sh/current/scripts/shBrushXml.js' type='text/javascript'/>
  
<script language='javascript' type='text/javascript'>
 SyntaxHighlighter.config.bloggerMode = true;
 SyntaxHighlighter.all();
</script>
<!-- Syntax Highlighter Additions END -->
  •  Add the following JavaScript code at the end of every post in HTML mode:

<script type="text/javascript">
 SyntaxHighlighter.highlight();
</script>

<script type="text/javascript">
 SyntaxHighlighter.highlight();
</script>

Dimension Reduction using spectral decomposition (0) - PCA


Introduction

I survey most of the dimension-reduction methods based on spectral decomposition, such as PCA (Principal Component Analysis), LDA (Linear Discriminant Analysis), FLD (Fisher Linear Discriminant), MDA (Multivariate Discriminant Analysis), and CCA (Canonical Correlation Analysis). The following description is in chronological order; whenever we move to the next methodology, the problem conditions become more complicated. There are many articles about these methodologies, but I could not find any report that focuses on the relationships between the methods. I want to know why our predecessors developed them, from PCA to CCA and other more complicated methods. My final goal is to set up a unified table of all the methods using the same mathematical notation for easier understanding.



Line Fitting on Sample Points

Before we start to discuss PCA, let's review the typical line-fitting problem using a sum of squared errors.
Most basic machine learning methods are based on minimizing a sum of squared errors.



Let's fit a regression line $\mathbf{x}_{0}+a\mathbf{e}$ (where $\left\Vert \mathbf{e}\right\Vert =1$) to the samples $\mathbf{x}_{n}$.

The objective function $f$ to minimize is: \begin{equation} f(\mathbf{x}_{0},\mathbf{e},a_{0},...a_{N})=\sum_{n}^{N}\left\Vert \mathbf{x}_{n}-(\mathbf{x}_{0}+a_{n}\mathbf{e})\right\Vert ^{2} \end{equation} where $\mathbf{x}_{0}+a_{n}\mathbf{e}$ is the closest point to $\mathbf{x}_{n}$ on the regression line.

This sum of squared errors (SSE) is just one among several candidate cost functions; the L1 norm or other norms could serve as well. If a specific cost function for the error is given, we must use it instead of SSE. The reason most standard methods use SSE is that the formula is easy to handle: SSE is differentiable, so we can solve for its parameters in closed form. (c.f. In logistic regression we must resort to numerical optimization.)

I will check the relationship between the regression line and $\mu$, the mean of the $\mathbf{x}_{n}$.

First, calculate the derivative with respect to $a_{n}$ in terms of $\mathbf{x}_{0},\mathbf{x}_{n},\mathbf{e}$. \begin{equation} f(\mathbf{x}_{0},\mathbf{e},a_{0},...a_{N})=\sum_{n}^{N}\left\Vert a_{n}\mathbf{e}+\mathbf{\mathbf{x}}_{0}-\mathbf{x}_{n}\right\Vert ^{2} \end{equation} \begin{eqnarray*} \frac{\partial f}{\partial a_{n}} & = & \frac{\partial\left(\sum_{n}^{N}\left\Vert a_{n}\mathbf{e}+\mathbf{\mathbf{x}}_{0}-\mathbf{x}_{n}\right\Vert ^{2}\right)}{\partial a_{n}}=\frac{\partial\left(\left\Vert a_{n}\mathbf{e}+\mathbf{\mathbf{x}}_{0}-\mathbf{x}_{n}\right\Vert ^{2}\right)}{\partial a_{n}}\\ & = & \frac{\partial\left(a_{n}^{2}\left\Vert \mathbf{e}\right\Vert ^{2}+2a_{n}\mathbf{e}^{T}(\mathbf{\mathbf{x}}_{0}-\mathbf{x}_{n})+\left\Vert \mathbf{\mathbf{x}}_{0}-\mathbf{x}_{n}\right\Vert ^{2}\right)}{\partial a_{n}}\\ & = & 2a_{n}+2\mathbf{e}^{T}(\mathbf{\mathbf{x}}_{0}-\mathbf{x}_{n}) \end{eqnarray*}

We can get $\underset{a_{n}}{\arg\min}(f)$ by ${\displaystyle \frac{\partial f}{\partial a_{n}}=0}$. \begin{equation} a_{n}=\mathbf{e}^{T}(\mathbf{\mathbf{x}}_{n}-\mathbf{x}_{0}) \end{equation}

This means that $a_{n}$ is the projection of $(\mathbf{\mathbf{x}}_{n}-\mathbf{x}_{0})$ onto $\mathbf{e}$. Most books skip this proof, since it is basic linear algebra.

Now analyze $f$ with respect to $\mathbf{x}_{0}$.

In expanding $f$ I use $a_{n}$ instead of $\mathbf{e}^{T}(\mathbf{\mathbf{x}}_{n}-\mathbf{x}_{0})$ for a moment. It is just for easier calculation. \begin{eqnarray*} f(\mathbf{x}_{0},\mathbf{e}) & = & \sum_{n}^{N}\left\Vert \mathbf{x}_{0}-\mathbf{x}_{n}+a_{n}\mathbf{e}\right\Vert ^{2}\\ & = & \sum_{n}^{N}\left(\left\Vert \mathbf{x}_{0}-\mathbf{x}_{n}\right\Vert ^{2}+2(a_{n}\mathbf{e})^{T}(\mathbf{x}_{0}-\mathbf{x}_{n})+\left\Vert a_{n}\mathbf{e}\right\Vert ^{2}\right)\\ & = & \sum_{n}^{N}\left\Vert \mathbf{x}_{0}-\mathbf{x}_{n}\right\Vert ^{2}+2\sum_{n}^{N}a_{n}\overbrace{\left(\mathbf{e}{}^{T}(\mathbf{x}_{0}-\mathbf{x}_{n})\right)}^{=-a_{n}}+\sum_{n}^{N}a_{n}^{2}\overbrace{\left\Vert \mathbf{e}\right\Vert ^{2}}^{=1}\\ & = & \sum_{n}^{N}\left\Vert \mathbf{x}_{0}\right\Vert ^{2}-2\sum_{n}^{N}\mathbf{x}_{n}^{T}\mathbf{x}_{0}+\sum_{n}^{N}\left\Vert \mathbf{x}_{n}\right\Vert ^{2}-2\sum_{n}^{N}a_{n}(a_{n})+\sum_{n}^{N}a_{n}^{2}\\ & = & \sum_{n}^{N}\left\Vert \mathbf{x}_{0}\right\Vert ^{2}-2\sum_{n}^{N}\mathbf{x}_{n}^{T}\mathbf{x}_{0}+\sum_{n}^{N}\left\Vert \mathbf{x}_{n}\right\Vert ^{2}-\sum_{n}^{N}a_{n}^{2}\\ & = & \sum_{n}^{N}\left\Vert \mathbf{x}_{0}\right\Vert ^{2}-2\sum_{n}^{N}\mathbf{x}_{n}^{T}\mathbf{x}_{0}+\sum_{n}^{N}\left\Vert \mathbf{x}_{n}\right\Vert ^{2}-\sum_{n}^{N}(\mathbf{x}_{0}-\mathbf{x}_{n})^{T}\mathbf{e}\mathbf{e}^{T}(\mathbf{x}_{0}-\mathbf{x}_{n})\\ & = & N\left\Vert \mathbf{x}_{0}\right\Vert ^{2}-2\sum_{n}^{N}\mathbf{x}_{n}^{T}\mathbf{x}_{0}+\sum_{n}^{N}\left\Vert \mathbf{x}_{n}\right\Vert ^{2}-\sum_{n}^{N}\left(\mathbf{x}_{0}^{T}\mathbf{e}\mathbf{e}^{T}\mathbf{x}_{0}-2\mathbf{x}_{n}^{T}\mathbf{e}\mathbf{e}^{T}\mathbf{x}_{0}+\mathbf{x}_{n}^{T}\mathbf{e}\mathbf{e}^{T}\mathbf{x}_{n}\right)\\ & = & N\left\Vert \mathbf{x}_{0}\right\Vert ^{2}-\sum_{n}^{N}\mathbf{x}_{0}^{T}\mathbf{e}\mathbf{e}^{T}\mathbf{x}_{0}-2\sum_{n}^{N}\mathbf{x}_{n}^{T}\mathbf{x}_{0}+2\sum_{n}^{N}\mathbf{x}_{n}^{T}\mathbf{e}\mathbf{e}^{T}\mathbf{x}_{0}+\sum_{n}^{N}\left\Vert \mathbf{x}_{n}\right\Vert ^{2}-\sum_{n}^{N}\mathbf{x}_{n}^{T}\mathbf{e}\mathbf{e}^{T}\mathbf{x}_{n} \end{eqnarray*} Similarly to the case of $a_{n}$, set ${\displaystyle \frac{\partial f}{\partial\mathbf{x}_{0}}=0}$ \begin{equation} \frac{\partial f(\mathbf{x}_{0},\mathbf{e})}{\partial\mathbf{x}_{0}}=2N\mathbf{\mathbf{x}_{0}}-2N\mathbf{e}\mathbf{e}^{T}\mathbf{x}_{0}-2\sum_{n}^{N}\mathbf{x}_{n}+2\mathbf{e}\mathbf{e}^{T}\sum_{n}^{N}\mathbf{x}_{n} \end{equation} We don't need to build a closed form of $\mathbf{x}_{0}$. Just consider it in this implicit form. \begin{eqnarray*} \frac{\partial f(\mathbf{x}_{0},\mathbf{e})}{\partial\mathbf{x}_{0}} & = & 0\\ & = & 2N\mathbf{\mathbf{x}_{0}}-2N\mathbf{e}\mathbf{e}^{T}\mathbf{x}_{0}-2\sum_{n}^{N}\mathbf{x}_{n}+2\mathbf{e}\mathbf{e}^{T}\sum_{n}^{N}\mathbf{x}_{n}\\ & = & 2N\mathbf{\mathbf{x}_{0}}-2\sum_{n}^{N}\mathbf{x}_{n}+2\mathbf{e}\mathbf{e}^{T}\sum_{n}^{N}(\mathbf{x}_{n}-\mathbf{x}_{0})\\ & = & 2N\mathbf{\mathbf{x}_{0}}-2\sum_{n}^{N}\mathbf{x}_{n}+2\mathbf{e}\sum_{n}^{N}a_{n} \end{eqnarray*} \begin{equation} \frac{\partial f}{\partial\mathbf{x}_{0}}=0\Longleftrightarrow\frac{1}{N}\sum_{n}^{N}\mathbf{x}_{n}=\mu=\mathbf{x}_{0}+\mathbf{e}\left(\frac{1}{N}\sum_{n}^{N}a_{n}\right) \end{equation} The meaning of Eq.(5) is that the mean should be on the regression line $\mathbf{x}_{0}+a\mathbf{e}$. It is reasonable intuitively.


This explanation of line fitting is not fully general; it is tailored to make the relationship between PCA and regression easier to describe.


PCA(Principal Component Analysis)

I will explain PCA as an extension of line fitting.
The strategy of PCA is:

Analyze the covariances of the sample vectors and extract a new set of bases that represents the samples in a more "discriminable" way. Then build a smaller basis set by ignoring the less discriminative bases. This type of process is called "dimension reduction" because the number of new bases is smaller than the original dimension.

And "Discriminability" means larger variances.
See the following figure. After getting new basis vectors $\mathbf{e}_1$, $\mathbf{e}_2$, just ignore $\mathbf{e}_2$. Now sample data can be represented values projected on $\mathbf{e}_1$. This process reduces 2D to 1D

It is reasonable that basis $\mathbf{e}_1$ is more discriminative than $\mathbf{e}_2$, since $\mathbf{e}_1$ maximizes the new variance, i.e. the variance of the values projected onto it.
$\mathbf{e}_2$ is, in turn, the most discriminative basis of the subspace orthogonal to $\mathbf{e}_1$. In a reduction from 3D to 2D, $\mathbf{e}_2$ would also be a member of the new basis set.

The new coordinates, the $a$s obtained by projecting onto the new bases, are called "principal components".



Now I will explain the fundamentals of PCA step by step.
The following mathematical description is based on the famous textbook, Richard O. Duda et al., Pattern Classification, 2nd Ed.

Let's think about only one new basis vector for the samples. This is exactly the same as the line fitting explained in the previous section.
Rearrange the main formula of the previous section in terms of $\mathbf{e}$ by setting $\mathbf{x}_0=\mu$. I will explain later why $\mathbf{x}_0=\mu$ is feasible, using the previous analysis; for this step just assume that $\mu$ is the best representative value of the samples' distribution.

\begin{eqnarray*} f(\mathbf{e}) & = & \sum_{n}^{N}\left\Vert a_{n}\mathbf{e}-(\mathbf{x}_{n}-\mu)\right\Vert ^{2}\\ & = & \sum_{n}^{N}\left(\left\Vert a_{n}\mathbf{e}\right\Vert ^{2}-2a_{n}\mathbf{e}^{T}(\mathbf{x}_{n}-\mu)+\left\Vert \mathbf{x}_{n}-\mu\right\Vert ^{2}\right)\\ & = & \sum_{n}^{N}\left(a_{n}^{2}-2a_{n}^{2}+\left\Vert \mathbf{x}_{n}-\mu\right\Vert ^{2}\right)\\ & = & \sum_{n}^{N}\left(-a_{n}^{2}+\left\Vert \mathbf{x}_{n}-\mu\right\Vert ^{2}\right)\\ & = & -\sum_{n}^{N}a_{n}^{2}+\sum_{n}^{N}\left\Vert \mathbf{x}_{n}-\mu\right\Vert ^{2}\\ & = & -\sum_{n}^{N}\mathbf{e}^{T}(\mathbf{x}_{n}-\mu)(\mathbf{x}_{n}-\mu)^{T}\mathbf{e}+\sum_{n}^{N}\left\Vert \mathbf{x}_{n}-\mu\right\Vert ^{2}\\ & = & -N\mathbf{e}^{T}\mathbf{\Sigma}\mathbf{e}+\sum_{n}^{N}\left\Vert \mathbf{x}_{n}-\mu\right\Vert ^{2} \end{eqnarray*}

where $\Sigma$ is the covariance matrix.

To minimize $f$, the constraint $\left\Vert \mathbf{e}\right\Vert =1$ must be taken into account, so $f$ is turned into a Lagrangian $L$ with a Lagrange multiplier $\lambda$.

\begin{equation} L(\mathbf{e},\lambda)=-N\mathbf{e}^{T}\mathbf{\Sigma}\mathbf{e}+\sum_{n}^{N}\left\Vert \mathbf{x}_{n}-\mu\right\Vert ^{2}+\lambda(\mathbf{e}^{T}\mathbf{e}-1) \end{equation} \begin{equation} \frac{\partial L(\mathbf{e},\lambda)}{\partial\mathbf{e}}=-2N\mathbf{\Sigma}\mathbf{e}+2\lambda\mathbf{e}=0\Longleftrightarrow N\mathbf{\Sigma}\mathbf{e}=\lambda\mathbf{e} \end{equation}

This is a typical eigenvalue problem, where $(\mathbf{e},\lambda)$ is an (eigenvector, eigenvalue) pair.

From this we can draw the following conclusions:

1. The $\mathbf{rank}$ of the covariance matrix is the number of eigenvectors with a nonzero eigenvalue, i.e. the number of directions with nonzero variance.
2. Since $\sum_{n}^{N}a_{n}^{2}$ is $N$ times the variance of the $a_{n}$s and $\sum_{n}^{N}a_{n}^{2} = N\mathbf{e}^{T}\mathbf{\Sigma}\mathbf{e}= \mathbf{e}^{T}\lambda\mathbf{e}=\lambda$, a larger $\lambda$ means a larger variance.
3. Since $f = -N\mathbf{e}^{T}\mathbf{\Sigma}\mathbf{e}+\ldots = -\lambda+\text{const}$, the $\mathbf{e}$ with the largest eigenvalue yields the global minimum of $f$; the other eigenvectors are only stationary points of the constrained problem.

Let's expand these analyses to the case of multiple basis vectors,
\begin{eqnarray*} f(\mathbf{e}_{0,}\mathbf{e}_{1},...,\mathbf{e}_{D}) & = & \sum_{n}^{N}\left\Vert \sum_{d}^{D}a_{n,d}\mathbf{e}_{d}-(\mathbf{x}_{n}-\mu)\right\Vert ^{2}\\ & = & \sum_{n}^{N}\left(\left(\sum_{d}^{D}a_{n,d}\mathbf{e}_{d}\right)^{2}-2\sum_{d}^{D}a_{n,d}\mathbf{e}_{d}^{T}(\mathbf{x}_{n}-\mu)+\left\Vert \mathbf{x}_{n}-\mu\right\Vert ^{2}\right)\\ & = & \sum_{n}^{N}\left(\sum_{d}^{D}a_{n,d}^{2}+2\sum_{d\neq d'}^{D}a_{n,d}a_{n,d'}\underbrace{\mathbf{e}_{d}\mathbf{e}_{d'}}_{=0}-2\sum_{d}^{D}a_{n,d}\mathbf{e}_{d}^{T}(\mathbf{x}_{n}-\mu)+\left\Vert \mathbf{x}_{n}-\mu\right\Vert ^{2}\right)\\ & = & \sum_{n}^{N}\left(\sum_{d}^{D}\left(a_{n,d}^{2}-2a_{n,d}^{2}\right)+\left\Vert \mathbf{x}_{n}-\mu\right\Vert ^{2}\right)\\ & = & \sum_{n}^{N}\left(-\sum_{d}^{D}a_{n,d}^{2}+\left\Vert \mathbf{x}_{n}-\mu\right\Vert ^{2}\right)\\ & = & -N\sum_{d}^{D}\mathbf{e}_{d}^{T}\mathbf{\Sigma}\mathbf{e}_{d}+\sum_{n}^{N}\left\Vert \mathbf{x}_{n}-\mu\right\Vert ^{2} \end{eqnarray*}
where $D\leq\mathbf{dim}(\mathbf{x}_n)$.
The process of getting the individual basis vectors is the same as before.

\begin{equation} \frac{\partial L(\mathbf{e}_{d},\lambda)}{\partial\mathbf{e}_{d}}=-2N\mathbf{\Sigma}\mathbf{e}_{d}+2\lambda\mathbf{e}_{d}=0\Longleftrightarrow N\mathbf{\Sigma}\mathbf{e}_{d}=\lambda\mathbf{e}_{d} \end{equation}

Since the eigenvectors of $N\mathbf{\Sigma}$ are orthogonal, they serve as the new basis vectors for PCA.

As mentioned before, a larger $\lambda$ makes $f$ smaller, so we choose the $D$ eigenvectors with the largest eigenvalues.
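As a tiny worked example (the numbers are my own illustration, not from the text), take a 2D covariance matrix and its eigendecomposition:

\begin{eqnarray*} \Sigma & = & \left[\begin{array}{cc} 5 & 2\\ 2 & 2 \end{array}\right],\qquad\lambda_{1}=6,\;\mathbf{e}_{1}=\frac{1}{\sqrt{5}}\left[\begin{array}{c} 2\\ 1 \end{array}\right],\qquad\lambda_{2}=1,\;\mathbf{e}_{2}=\frac{1}{\sqrt{5}}\left[\begin{array}{c} 1\\ -2 \end{array}\right] \end{eqnarray*}

Keeping only $\mathbf{e}_{1}$ maps each 2D sample to the single principal component $a_{n,1}=\mathbf{e}_{1}^{T}(\mathbf{x}_{n}-\mu)$, while retaining $\lambda_{1}/(\lambda_{1}+\lambda_{2})=6/7\approx86\%$ of the total variance.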



Let's go back to why $\mathbf{x}_0$ should be $\mu$. As I explained in the previous section, $\mu$ lies on the line along $\mathbf{e}_0$ (the leading basis). The same formula holds for every $\mathbf{e}_d$:

\begin{equation} \mu=\mathbf{x}_{0}+\frac{1}{N}\sum_{n}^{N}a_{n,d}\mathbf{e}_{d} \end{equation}
$\mu$ must lie on all of the lines along the $\mathbf{e}_d$. If $D>1$, the only point satisfying this is $\mathbf{x}_0=\mu$.


Sunday, August 31, 2014

Testing MathJax




When $a \ne 0$, there are two solutions to \(ax^2 + bx + c = 0\) and they are $$x = {-b \pm \sqrt{b^2-4ac} \over 2a}.$$

Friday, June 06, 2014

[Links] Computer Vision Links & References

Information for Deep Learning

Mathematical Fundamental

Computer Vision, Model, Learning and Inference

Recognizing and Learning Object Categories

David Forsyth's Homepage

Derek Hoiem

 Robert T. Collins

Technische Universitat Munchen

Computer Vision Online

Advanced Robotics Lecture Notes@UC Berkeley 

Monday, March 24, 2014

OpenCL on Ubuntu 12.04 + Intel CPU



Reference


Install

Install OpenCL Driver & related files (intel_sdk_for_ocl_applications_2014_beta_sdk_4.0.1.17537_x64.tgz) downloaded from http://software.intel.com/en-us/vcsource/tools/opencl-sdk-beta



Before installing the packages, the Debian package manager reported that we need to install libnuma1 first.

./intel_sdk_for_ocl_applications_2014_beta_sdk_4.0.1.17537_x64
├── EULA.txt
├── install-cpu.sh
├── opencl-1.2-base-4.0.1.17537-1.x86_64.deb
├── opencl-1.2-devel-4.0.1.17537-1.x86_64.deb
├── opencl-1.2-intel-cpu-4.0.1.17537-1.x86_64.deb
├── opencl-1.2-intel-devel-4.0.1.17537-1.x86_64.deb
├── opencl-1.2-intel-devel-android-4.0.1.17537-1.x86_64.deb
└── uninstall.sh

Why does OpenCL (libOpenCL.so) need NUMA (libnuma)? libnuma1 is a prerequisite package of opencl-1.2-base-4.0.1.17537-1.x86_64.deb. You can check this with "dpkg-deb --control opencl-1.2-base-4.0.1.17537-1.x86_64.deb ." and then "vim control".

Package: opencl-base
Version: 1.2-4.0.1.17537
Depends: bash, dpkg, libnuma1
Section: extras
Maintainer: Intel Corporation
Architecture: all
Priority: optional
Description: OpenCL* (Open Computing Language) is an open standard for general-purpose
  parallel programming. OpenCL* provides a uniform programming environment for software
  developers to write portable code for client computer systems, high-performance computing
  servers, and other compute systems. This package contains OpenCL installable client driver
  loader. The OpenCL* installable client driver (ICD) enables different OpenCL* implementations
  to coexist on the same system. ICD also  enables applications to select between OpenCL*
  implementations at run time.

$ dpkg-deb -x ./opencl-1.2-base-4.0.1.17537-1.x86_64.deb .
$ tree
...
├── opt
│   └── intel
│       └── opencl-1.2-4.0.1.17537
│           └── lib64
│               ├── libOpenCL.so -> libOpenCL.so.1
│               ├── libOpenCL.so.1 -> libOpenCL.so.1.2
│               └── libOpenCL.so.1.2


$ ls -al /usr/lib/x86_64-linux-gnu/libOpenCL.so
lrwxrwxrwx 1 root root 37 Mar 22 23:10 /usr/lib/x86_64-linux-gnu/libOpenCL.so -> /etc/alternatives/opencl-libOpenCL.so

$ ls -al /etc/alternatives/opencl-libOpenCL.so
lrwxrwxrwx 1 root root 52 Mar 22 23:10 /etc/alternatives/opencl-libOpenCL.so -> /opt/intel/opencl-1.2-4.0.1.17537/lib64/libOpenCL.so


Querying Profile by using OpenCL Query functions

My laptop is an MSI GE60 with an Intel Core i7-4800MQ, whose spec is: http://ark.intel.com/products/75128/Intel-Core-i7-4800MQ-Processor-6M-Cache-up-to-3_70-GHz
Unfortunately we cannot use CL_DEVICE_TYPE_GPU yet (as of 2014-03-23).

The following is the result of my own tiny sample (https://github.com/stereoboy/opencl_samples).
num of platform ids: 1
-<0>------------------- Platform ID[0xa79f60]
CL_PLATFORM_PROFILE = FULL_PROFILE
CL_PLATFORM_VERSION = OpenCL 1.2 LINUX
CL_PLATFORM_NAME = Intel(R) OpenCL
CL_PLATFORM_VENDOR = Intel(R) Corporation
CL_PLATFORM_EXTENSIONS = cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_intel_exec_by_local_thread cl_khr_depth_images cl_khr_3d_image_writes cl_khr_fp64
num of device ids: 1
-<0>------------------- Device ID[0xa7abc8]
CL_DEVICE_NAME = Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz
CL_DEVICE_TYPE = 2
CL_DEVICE_VENDOR = Intel(R) Corporation
CL_DEVICE_VERSION = OpenCL 1.2 (Build 83073)
CL_DEVICE_MAX_COMPUTE_UNITS = 8
CL_DEVICE_GLOBAL_MEM_SIZE = 0

Intel OpenCL does not support Ubuntu 12.04 + Intel GPU. Look at the "Supported Hardware" list in the release notes (http://software.intel.com/en-us/articles/intel-sdk-for-opencl-applications-2014-beta-release-notes)

Running OpenCL on NVidia Optimus (incomplete)

https://wiki.ubuntu.com/Bumblebee