Monday, March 18, 2019

Begin.

> So understanding how the system interprets a live video stream using OpenCV was a good exercise.
> Now I would want the system to smartly identify objects by itself from the feed; I should not be driving the identification step.
> With that, I came across Convolutional Neural Networks, which can be used to find patterns and identify images (and extend that to a live feed).
> Obviously there is a lot to know before I can proceed with the implementation. (Although in the end it might just be the part where I provide my training data and invoke a few function calls, like every time.)
> I would like to understand it from the ground up.
> Shall visit the necessary things and create the next post.
> Of course I need to keep the main goal at the back of my mind; once I feel I am equipped with the necessary things, I can move on.

End.

Thursday, March 14, 2019



Ideation contd.

Trying to implement:

Selectively ask the computer to track a given area from the real-time video stream.

And it seemed to be working -

> Select a custom area in the live feed by drawing a bounding box; the computer calculates the histogram of that area.
> That area's histogram is compared against the current live feed; the mean shift algorithm gives you a "track window" that specifies the "most relevant" area matching your bounding box.
> Draw a rectangle in place of that "track window" for every frame of your live feed.



> The mean shift algorithm tries to detect a bigger concentration of white points.
Hence one disadvantage: if the object enters the frame from some other position, the algorithm initially fails to detect it, since it fails to find the "new concentration" of white points.
> The mean shift algorithm does not track the size of the object of interest.
Hence I had to follow the CamShift algorithm.



Using CamShift I could keep track of the size of the object as well as its orientation:



1. As you can see, the green box is the ROI that I interactively selected from the live feed window (top second).

2. The black-and-white frame is calculated in real time, using the histogram of the ROI and the back projection function of OpenCV. It is the mask: the white portion indicates the relevant parts of the frame, and the black corresponds to the non-relevant ones.

3. I take the mask from step 2 (obtained from the back projection method) and give it to the CamShift algorithm, which gives me the track window along with its orientation and size. This in turn is passed to a ready-made OpenCV function that calculates the relevant co-ordinates, and finally I draw the corresponding rectangle (the blue box in the frame).


Takeaways :
A lot to take in, but the bottom line: track a given region of interest using its histogram, back projection, and finally the mean shift or CamShift algorithm to get the track window that is drawn on the live feed.

Trying to think about what I could do with the things explored so far:
1. I can probably revisit what I wanted to do in the first place: the system should be able to detect food items.
2. For food items, I need features, so I need to revisit the algorithms that detect features (SIFT, ORB, ...).
3. Without incorporating any intelligence, a brute-force approach could be: I select the ROI, the system matches the "features" of the ROI against a predefined set of features of various food items, and based on the most matches, gives the best result.
Or I can wait for a few more video tutorials where I might encounter something new.

PS: Of course there were lighting issues; I think that is fine at the moment. As of now, I am trying to understand how the thing works in the background.

End.




Begin:

Identification contd.

> Tried using the histogram of a road to extract only the road from a traffic image.
The steps were:

> Get the HSV of both the template and the original image.
> Get the histogram of the template road.
> Use the back projection method of OpenCV to extract only those parts that match the histogram of the template.
> The next important step is to optimise the match, using kernel estimation and thresholding.

a. Kernel estimation, according to my understanding, estimates the density/intensity around a given pixel using a prescribed filter (an ellipse of a given size, a circle of a given size, etc.).
b. Thresholding tells the system to consider all values below a threshold as black and all values above it as white.

The above two filtering steps are applied to improve the mask that selects the road:













The final step is to merge (bitwise AND) the mask with the original image to get the image below on the left.




Takeaways:

1. Using the histogram of a template ROI (region of interest) to filter out specific regions from a given image.
2. The best use case I could think of is to track a given object in real time.
3. Next, I would want to be able to custom-select an area in real time and start tracking the parts of the image that match the given bounding box.


To be contd..


Wednesday, March 13, 2019

Identification:

Begin.

Trying to get closer to what it takes to identify a person or an object.

> Realised how the computer can use histograms to sort of identify things.
> Plotted histograms of the R, G and B channels of an image to see how the different colours are distributed.
> Tried plotting histograms for real-time video; turns out it's a bad idea :D
>
 image example - clearly the green component dominates

> Next, to explore more of mouse events, I want to be able to get a histogram of a custom-selected area from a real-time stream.
> Using multiple flags (top_left, bottom_right - extracting the x and y ranges of the bounding box - slicing).
>

Pretty decent output. In the end, the result was:
a. Draw a bounding box over the live video stream (draw the box using mouse callback events by locating the position of the pointer).
b. Crop out the image inside the bounding box.
c. The cropped image is passed to a function that 'splits' the image into its 3 channels.
d. Flatten each channel and plot the histogram with 256 bins (0-255) for each colour.
As seen in the pic above, the histogram shows the cropped image is dominated by green.


Takeaways:
> A mouse callback event can be registered to track the co-ords of the pointer as well as various events.
> Use the above technique to draw a bounding box, and use that for selection within the image.
> Use the plt.hist() function to plot the colour distributions of the image.

End.

Thursday, March 7, 2019

Begin.

Sub-idea implementation:

> After the Harry Potter Cloak project, I couldn't help but explore more of OpenCV, which is when I encountered mouse events in OpenCV.
> This is when I had my second idea in mind: to be able to accept mouse events and use them for perspective projection. The selected area has a perspective that recedes from the screen, which you "rectify" and project directly onto the screen. A pretty cool idea when you want to scan docs (as seen in the CamScanner app).

The steps were pretty clear and simple:
> Accept the co-ords of the mouse pointer using a callback function, and keep track of the points.
> Colour those locations to make them readable.
> Also explored options where the user can delete the set of points (by pressing the 'd' key; basically you can listen for various events).
> Lastly, ask OpenCV for the perspective projection matrix that transforms the selected area to the screen (the rectangular co-ords which you set at the beginning).
> And do this while capturing your live feed.

The result:


Takeaways:
> Mouse events to keep track of the pointer's co-ords.
> There are several other events that you can listen to (try dot/line trails on the live feed; it was pretty fancy).
> Perspective projection of a custom area as selected by the mouse.

End.

Wednesday, March 6, 2019

Begin.

Ideation:


Started exploring OpenCV (Computer Vision). It is one amazing utility! So what better way to learn than implementing your own ideas, right?

So one use case that is fun and interesting that I could think of is a scenario where the user shows the application what food ingredients he/she has, and the application reverse-engineers the items and presents recipes.


So I have started exploring Open CV through a youtube channel :

https://www.youtube.com/channel/UC5hHNks012Ca2o_MPLRUuJw

The guy is brilliant; he keeps things simple and to the point.

So far, I was able to:
> do some basic stuff and play around with the camera of the system.
> Perspective projection was pretty cool.
> selective masking of colours, displaying only the stuff I wanted.
> Conceptually understood dilation (fills stuff into the masked region), erosion (removes stuff from the masked region) and contour lines (draw boundaries around objects of a given colour - pretty cool).
> While doing so, I came across something that I wanted to try out using the knowledge I had acquired.
> Realised that having a lot of sub-ideas on the way to implementing your bigger idea is a good way to learn better!
> To implement Harry Potter's Invisibility Cloak! :)
> The concept was simple:
        > Select your object of choice, the one you will use as the "Cloak".
        > Record the background when the scene is empty (without you or the "Cloak" in the camera's field of view). Store it.
        > During the live feed, mask out the Cloak object and fill only the masked areas from the static background that you recorded beforehand.

Looked something like this:

was excited! :)

> Went through a video where he uses the SIFT, SURF and ORB algorithms to detect features in an image.
Apparently the above feature detection techniques are only good for image comparisons, it seems, and not for videos, as they take a good amount of computational time.

> I am about to learn mouse tracking; once that is done, I want to be able to:
> take an area as input from the user and project it onto the screen.


Takeaways:
1. Main idea: Show Me Recipes.
2. Sub-ideas: Harry Potter Cloak (done), input an area and project it to the screen.
3. There are a lot of things that I went through, and I don't remember everything; the ones that I remember: dilation, erosion, perspective projection, selective masking (inRange), contour lines, SIFT, SURF, ORB...
4. Of course, I don't need to remember everything at all, but just try and focus on the things that would cater to my "idea". Hence it's important to have an idea in mind when you are learning something.


End.




Begin.

Learning Path:

Hello World,

The main purpose of this blog is to have a one-stop source of reference for all the things that I explore. Not sure if it will turn out to be useful, but hopefully it will give a head start to people who are as clueless as I was when I had to start off.

Main intentions :
1. I need to be able to organise the stuff that I learn (mainly technical stuff) in one place.
2. Any quick tips as to how to, or how not to, do certain things.
PS:
This might not be a place where one could learn stuff, as in there might not be code snippets; instead it is the journey, sort of the path that I took to implement what I wanted to implement.
I have categorised the posts day-wise to your right (in the case of mobile phones, click on "View Web version" of the blog).

End.