Friday, March 22, 2019


CNN contd..
Begin.

> So, before I start understanding Conv Neural Nets in more detail, I thought I'd explore basic Neural Nets a bit more.
> After a bit of googling and YouTube, I came across this site that has one of the best, most comprehensive and detailed descriptions of Deep Neural Nets.
> It is simply mind blowing how they could accommodate all the necessary information. They have everything - best part - they have used a worked example to explain the flow.

https://www.matrices.io/deep-neural-network-from-scratch/

> Yes, it's a huge post, but I believe I can revisit it several times if needed; the way they have described it is beautiful.
> From going through about half of the page contents, I now have a fair understanding of things like -
 - How to represent features as matrices
 - How to feed the features into the neural network
 - How the NN fuses the features with weights, i.e. matrix multiplication of inputs with the weights.
Something like:
for each layer:
   for each neuron in the layer:
     z = 0
     for each output coming from the previous layer (the raw features, if this is the first layer):
       z = z + (that output) * (its weight into this neuron)
     res = some_sort_of_non_linear_function(z) // e.g. ReLU, sigmoid, tanh... they squeeze res into a fixed range, which we can use to set a threshold or cut-off for predicting valid results.
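To check I've actually understood the loop, here's a tiny runnable Python version of the same idea (plain lists, no numpy yet) - the weights and the choice of sigmoid are just made up for illustration:

import math

def sigmoid(z):
    # squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def forward_layer(inputs, weights):
    # weights[i][j] = weight from input i to neuron j of this layer
    outputs = []
    for j in range(len(weights[0])):        # each neuron in the layer
        z = 0.0
        for i in range(len(inputs)):        # each output/feature from the previous layer
            z += inputs[i] * weights[i][j]
        outputs.append(sigmoid(z))          # non-linear squeeze
    return outputs

# made-up example: 2 features going into a layer of 4 neurons
x = [0.8, 0.2]
W = [[0.1, 0.4, -0.3, 0.2],
     [0.7, -0.1, 0.5, 0.3]]
print(forward_layer(x, W))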
____
> Now I cannot afford to run the above loop as-is, as it is computationally expensive.
> Hence the smart people have come up with vectorised computations (using libraries like numpy, which rely on optimized underlying C functions).
> A line in the post below said: 'the vectorized NumPy call wins out by a factor of about 70 times:'

https://realpython.com/numpy-array-programming/

Here, they calculate the maximum profit when you are asked to make one purchase and one sale. They show two versions of the same task - loops and numpy - and the numpy one is super simplified (2 lines!).
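Roughly (this is my own reconstruction of the idea, not their exact code, and the prices are made up):

import numpy as np

prices = np.array([7, 1, 5, 3, 6, 4], dtype=float)

# loop version: try every valid buy day / sell day pair
best = 0.0
for i in range(len(prices)):
    for j in range(i + 1, len(prices)):
        best = max(best, prices[j] - prices[i])

# numpy version: for each day, subtract the cheapest price seen so far
profit = np.max(prices - np.minimum.accumulate(prices))

print(best, profit)   # both print 5.0 here (buy at 1, sell at 6)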
____

So, continuing with NN,
- The output of each neuron is called its activation, produced by the activation function (a non-linear function)
- Realised how the output of one layer's neurons is passed on to the next layer.
- In general, for each layer, the task of multiplying the previous layer's outputs (or the given features, if the previous layer is the input layer) with the weights of all the neurons of this layer can be expressed as one step!
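In numpy terms, it's roughly this (the shapes and numbers are made up, and sigmoid is just one possible choice of non-linearity):

import numpy as np

def sigma(z):                      # one possible non-linearity (sigmoid)
    return 1.0 / (1.0 + np.exp(-z))

x  = np.array([[0.8, 0.2]])        # the input features, shape 1 x 2
W1 = np.random.randn(2, 4)         # layer 1: 2 inputs  -> 4 neurons
W2 = np.random.randn(4, 3)         # layer 2: 4 neurons -> 3 neurons

a1 = sigma(x @ W1)                 # the whole first layer in ONE step (1 x 4)
a2 = sigma(a1 @ W2)                # its output feeds the next layer  (1 x 3)
print(a2)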

Some videos that I saw:

https://www.youtube.com/watch?v=aircAruvnKk

https://www.youtube.com/watch?v=2-Ol7ZB0MmU&t=1062s

There is a lot more stuff I can visit; understanding has been a cumulative process.

Little beastly matrices for math!


From that website (matrices.io),
> I could make sense of what I have written on the LHS: x1 and x2 are the features (in my case, the pixel intensities of the food image at the top-left and bottom-right corners) in the first layer.
> There are 4 neurons in the second layer, with their weights expressed as a matrix.
> The first row corresponds to the weights for feature x1, i.e. its connections to the 4 neurons in that layer.
> The same goes for the second row of the weight matrix, which holds the second feature's weights for all 4 neurons of the second layer.
> On asking numpy to multiply these, we get a variable Z, which we pass to σ, the activation function (ReLU, sigmoid, tanh...).
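Writing that exact layout out in numpy (the numbers here are mine, not the ones from the site):

import numpy as np

# x1, x2 = pixel intensities at the top-left and bottom-right corners (made-up values)
X = np.array([[0.9, 0.1]])                    # shape 1 x 2

# row 1 = x1's weights to the 4 neurons, row 2 = x2's weights to the same 4 neurons
W = np.array([[0.2, -0.5,  0.1, 0.7],
              [0.4,  0.3, -0.2, 0.6]])        # shape 2 x 4

Z = X @ W                                     # shape 1 x 4, one value per neuron
A = np.maximum(0, Z)                          # σ as ReLU here; sigmoid or tanh would slot in the same way
print(Z)
print(A)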

> Now we have values that we can probably use to predict the final class (food ingredient), but what about taking care of correctness?

Error Optimization:


I cannot afford my system detecting the object to be a book when it's actually a potato (but what if it's a book that has a potato pic.. hmm.. for next time)!


> The website had summarized certain things I could actually relate to, like how I compute my error (predicted value - actual value).

> It's the J(W) formula that I have written above - they call it the cost function (it tells how big the error is), and it is what we need to minimize.
> The formula is actually intuitive: y is the actual output class (potato), ŷ is the calculated/predicted value (here, book); the squared term removes the negative sign, and summing things up gives a final accumulated, collective error.
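In code that cost is tiny. I'm writing the plain sum-of-squared-errors version here; the exact J(W) on the site may carry a 1/2 or an average in front, which doesn't change where the minimum is:

import numpy as np

y     = np.array([1.0, 0.0, 0.0])   # actual class, e.g. a made-up one-hot for "potato"
y_hat = np.array([0.2, 0.7, 0.1])   # what the network predicted (leaning towards "book")

J = np.sum((y - y_hat) ** 2)        # square to remove the sign, sum for the collective error
print(J)                            # the bigger this is, the worse the prediction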

> Now, that error function is a convex function, shaped like y = x^2 - a bowl-shaped graph - which means it actually has a minimum value (the bottom of the bowl).

So this is my error function, and right now I am sitting somewhere up on the side of the bowl.

But I want to reach the bottom! Hence I badly want to slide down (gradient descent) this bowl.
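I'll keep the details for the next post, but for a one-dimensional bowl like y = x^2 the "sliding" really is this small (the learning rate and starting point are arbitrary):

x = 5.0                      # start somewhere up the side of the bowl
learning_rate = 0.1
for step in range(50):
    gradient = 2 * x                   # slope of y = x^2 at the current x
    x = x - learning_rate * gradient   # step downhill, against the slope
print(x)                     # ends up very close to 0, the bottom of the bowl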

Takeaways:


> So, a bit more on how I can represent the input features - and the weights on each neuron across layers w.r.t. each input feature - using matrices (each row in a weight matrix represents one feature's connections to all the neurons in that layer).

> The rough algorithmic steps involved.
> Some sort of mathematical description (image).
> How numpy is the boss when it comes to computations.
> It's a book! Not a potato?? No can do! - this is where I cannot afford to overlook error optimization - which I realised is a bowl, and I just need to slide to the bottom, and it's all "downhill" from that point :D

- In the next one I intend to write a bit more about gradient descent (the technique for sliding down the bowl) and backpropagation, which is the technique the NN uses to work out how much each weight contributed to the error.

- Post this, probably try and implement a basic "sub-idea" NN to see stuff in action.
- And yes, I haven't mentioned the 'bias' thing that I have written; shall visit that as well.
- Then visit Conv NN.
- Exams going on, might get delayed by a bit. End.