Ideation contd.
Trying to implement:
Selectively ask the computer to track a given area from the real-time video stream.
And it seemed to be working -
> Select a custom area in the live feed by drawing a bounding box; the computer calculates the histogram of that area.
> That area's histogram is compared against the current live feed - the mean shift algorithm gives you a "track window" that specifies the "most relevant" area matching your bounding box.
> Draw a rectangle at the position of that "track window" for every frame of your live feed.
> The mean shift algorithm moves towards the biggest concentration of white points.
Hence one disadvantage is that if the object enters the frame from some other position, the algorithm initially fails to detect it, as it cannot find the "new concentration" of white points.
> The mean shift algorithm does not track the size of the object of interest.
Hence I had to move on to the CamShift algorithm.
Using CamShift I could keep track of the object's size as well as its orientation:
1. As you can see, the green box is the ROI that I interactively selected from the live feed window (top second).
2. The black-and-white image is calculated in real time using the histogram of the ROI and the back projection function of OpenCV...it is the mask - the white portion indicates relevant parts of the frame, and black corresponds to the non-relevant ones.
3. I take this mask from step 2 (obtained from the back-projection method) and feed it to the CamShift algorithm, which gives me the track window along with its orientation and size; these in turn are passed to the ready-made OpenCV function that calculates the relevant co-ordinates, and finally I draw the corresponding rectangle (the blue box in the frame).
Takeaways:
A lot to take in, but the bottom line: track a given Region Of Interest using its histogram, back projection, and finally the mean shift OR CamShift algorithm to get the track window that is drawn on the live feed.
Trying to think about what I could do with the things explored so far:
1. I can probably try and revisit what I wanted to do in the first place - the system should be able to detect food items.
2. For food items, I need features - I need to revisit the algos that detect features (SIFT, ORB...).
3. Without incorporating any intelligence, a brute-force approach could be: I select the ROI, the system matches the "features" of the ROI against a predefined set of features of various food items, and based on the most matches, returns the best result.
Or I can wait for a few more video tutorials where I might encounter something new.
PS: Of course there were lighting issues; I think that is fine at the moment. As of now, I am trying to understand how the thing works in the background.
End.