Artificial Intelligence


Using Generative Adversarial Network for Image Generation


A Generative Adversarial Network (GAN) is a class of deep learning algorithm comprising two networks, a generator and a discriminator, which compete against each other. For image generation, the generator's goal is to produce realistic images that the discriminator cannot classify as fake. The discriminator's goal is to tell real images from fake ones. Initially the generator produces essentially blank or noisy images, and it keeps generating better images with each iteration until it starts producing realistic ones. The discriminator takes both real images and the images produced by the generator as input and classifies each one as real or fake, up to the point where the generator's images become hard for the discriminator to tell apart from real ones. The same algorithm is being applied in other domains as well. However, based on my experiments, a lot of optimization is needed for large image sizes. I had to create a custom generator/discriminator network to work with input sizes of 128*128 and 256*256 pixels, and it took a lot of iterations to generate realistic images. The training data consisted of images of Indian birds.
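To make the competition between the two networks concrete, here is a minimal sketch of the two loss values that a GAN typically tries to minimize. It is an illustration only and not the code used in the experiment: the function names are hypothetical, the probabilities are hand-picked, and the generator loss is the commonly used non-saturating form, which is an assumption.

```python
import math


def bce(probability, target):
    """Binary cross-entropy for a single predicted probability."""
    eps = 1e-12  # avoid log(0)
    p = min(max(probability, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(d_real, d_fake):
    """d_real / d_fake: discriminator outputs (probability of 'real')
    for real images and for generated images respectively."""
    real_loss = sum(bce(p, 1.0) for p in d_real) / len(d_real)  # real -> 1
    fake_loss = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)  # fake -> 0
    return real_loss + fake_loss


def generator_loss(d_fake):
    """The generator wants its images to be scored as real (target 1).
    Assumption: non-saturating form of the generator objective."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)


# Hand-picked scores: early on, the discriminator easily spots the fakes.
early_fake_scores = [0.1, 0.2]
# Later, it can no longer tell generated images from real ones.
late_fake_scores = [0.5, 0.5]

print(generator_loss(early_fake_scores))
print(generator_loss(late_fake_scores))
```

The generator loss falls as the discriminator's scores for generated images move towards 0.5, the point where it is effectively guessing.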

Here is a snippet of my talk on GAN at the Eclipse Summit Conference, which demonstrates the experiment.


Building Intelligent Connected Products using Artificial Intelligence, Cognitive and Blockchain



My technical talk at IoTNext, where I talked about applying intelligence at the edge gateway and in the cloud. Topics covered: deep learning and computer vision at the edge gateway for security and surveillance; Cognitive IoT (Cognitive Cricket and Connected Car); and security and trust compliance using Blockchain as a service.


Convolutional Neural Network with Internet of Birds


I am happy to share that we are live with the Internet of Birds platform.

Internet of Birds (IoB) is the first citizen science platform to identify birds from the Indian subcontinent through the power of Artificial Intelligence, Deep Learning and Image Recognition. IoB is a citizen science platform by Accenture Labs in collaboration with BNHS.

Internet of Birds is trained on Indian birds using a Convolutional Neural Network. I will share my findings on building the Internet of Birds platform in a later blog, where I will describe the challenges in building a generic image recognition service. The same learnings can be applied to any use case. The end solution can be accessed as a service over the cloud or at the edge network (for more details, listen to my talk on building connected products: edge gateway and CNN) for real-time decision making using images as one of the context parameters.

The Internet of Birds website can be accessed at –

Here is the YouTube video on IoB –

More news @



Cognitive IoT in Sports


Cognitive Internet of Things is about enabling current IoT technologies with human-like intelligence. The end goal is to provide expert advice based on the domains being targeted.  Cognitive IoT can be applied on the edge gateway or in the cloud as part of the solution.

Let’s see how we can apply Cognitive IoT technologies to the sports domain. In sports, there are three primary use cases:

  • Learning from an expert/coach (or visually) and improving one’s game
  • Personalization – where all information is personalized to improve a player’s game
  • Continuous learning to keep a player improving their game, based on how they are playing according to current and past records.

I will take cricket as an example. (I call this connected cricket.) The real value we want to derive is to enable batsmen to understand their game better, help them master various batting strokes like the cover drive, pull shot etc., and analyse their performance continuously so that they can become expert batsmen. For instance: what should I do to bat like Sachin Tendulkar?

A bowler, in turn, would like to understand how well he is bowling: his speed, his run-up, the way he delivers the ball and his spin variations. All these insights can improve his game continuously (so there is a feedback loop) and show how close his bowling is to that of an expert bowler, maybe someone like Ashwin.

So let’s talk about how you go about realizing it. You need:

  1. Embedded device on cricket ball (without increasing form factor)
  2. Embedded device on cricket bat, pads, gloves
  3. A Connected Stadium.

From an architecture stack perspective, you have a low-powered embedded device installed inside the ball, or embedded as part of the design and manufacturing process. It provides at least a 6-axis combo sensor with accelerometer and gyroscope readings to identify any movement in 3D space. A Motion SDK is installed on top of the device to identify movements in general and communicate the readings to the cloud.

In the cloud, we have the learning model and the training data. Basically, we would ask an expert batsman to bat and play various expert strokes like the cover drive, and record his movements from sensors (bats, pads etc.) as well as visuals (postures etc.). This would be used as the training/test data, and comparisons would be made against it. As we are comparing 3D models, machine learning approaches like dimension reduction (and many newer, innovative approaches) can be employed to compare two motions and predict their similarity. Similar training data is captured from an expert bowler, along with other conceptual information like hand movements, pitch angles etc.
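As a rough sketch of what comparing a player's motion against an expert's could look like, the example below reduces each 6-axis sample to two numbers and scores two strokes with cosine similarity. Everything in it is an assumption for illustration: the function names, the sample values, and the hand-picked reduction (acceleration and rotation magnitudes), which stands in for a learned dimension reduction technique such as PCA. It also assumes both strokes have already been resampled to the same number of readings.

```python
import math


def reduce_sample(sample):
    """Reduce one 6-axis reading (ax, ay, az, gx, gy, gz) to two features:
    acceleration magnitude and rotation magnitude.
    Assumption: a hand-picked reduction, not a learned one."""
    ax, ay, az, gx, gy, gz = sample
    return (math.sqrt(ax * ax + ay * ay + az * az),
            math.sqrt(gx * gx + gy * gy + gz * gz))


def reduce_stroke(stroke):
    """Flatten a stroke (a list of 6-axis readings) into one feature vector."""
    features = []
    for sample in stroke:
        features.extend(reduce_sample(sample))
    return features


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def stroke_similarity(player_stroke, expert_stroke):
    """Similarity score between a player's stroke and the expert's stroke."""
    if len(player_stroke) != len(expert_stroke):
        raise ValueError("strokes must have the same number of readings")
    return cosine_similarity(reduce_stroke(player_stroke),
                             reduce_stroke(expert_stroke))


# Made-up readings: the expert accelerates smoothly through the stroke.
expert = [(0, 0, 1, 0, 0, 0), (1, 0, 1, 0, 2, 0), (3, 0, 1, 0, 5, 0)]
# A player who does the opposite: fast at the start, slow at the end.
player = [(3, 0, 1, 0, 5, 0), (1, 0, 1, 0, 2, 0), (0, 0, 1, 0, 0, 0)]

print(stroke_similarity(expert, expert))
print(stroke_similarity(player, expert))
```

A real system would also need to align the two recordings in time and combine the sensor score with the visual information before giving feedback to the player.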

The feedback is continuously captured and the system provides guidance for improving a player’s game. The player tracks all this information on his mobile and can look at the insights and suggestions on how he can become an expert in his game. For instance, a player can ask the system “what does it take to master a cover drive like Sachin?” The system analyses the motion information from batting strokes (sensors on bats, pads etc.) and the visual information (postures etc.), compares it with an expert model, and provides an accuracy score and suggestions to improve the player’s game. The key here is that the cognitive system understands the domain and is trained on the domain in order to provide expert advice or suggestions.

The same technique and concept can be applied to any game to get cognitive insights. In the future, technology will be a key enabler in sports.

The following is part of my presentation that I delivered at IoTNext. I will update the article with the youtube video once available.


Applying deep learning and computer vision on the Edge Gateway


An edge gateway is about providing real-time decision making at the edge. In this article, I will talk about how to apply computer vision at the edge for various security and surveillance activities. For instance, you might want to detect suspicious activities at an ATM, ranging from simple cases like people entering the ATM wearing helmets to unusual movements. Or you might want to capture and identify objects in remote areas using drones, from crop length to mining activities. In such scenarios, you need real-time decision making at the edge, as it might not make sense to transfer the data to the cloud for processing, whether because of latency, bandwidth or cost, or because there is no or very limited connectivity. In this case you employ intelligence at the edge, on the devices themselves.

In order to build out the solution, you need to employ computer vision algorithms on the edge. You can build this using commercially available APIs or using various open source deep learning frameworks like Theano, TensorFlow, Caffe etc. Deep learning is a branch of machine learning that learns multiple levels of representation through neural networks. In image processing, a neural network automatically infers the rules for recognizing images, instead of you extracting thousands of features by hand to detect them.

Various deep learning architectures are available, like convolutional neural networks and recurrent neural networks, to solve specific problems such as computer vision, natural language processing and speech recognition, and they achieve state-of-the-art results.

For computer vision, we specifically use a CNN to identify images, or make use of a pre-trained instance (like the Inception model released by Google, a 22-layer-deep network that achieves state-of-the-art results for image classification and detection). For a computer, an image is nothing but a vector or set of numbers (pixels). A Convolutional Neural Network learns features automatically: the image is broken into small frames of equal size, each frame is passed through the same small network, and the interesting pieces are extracted (for more details, please refer to how CNN works). The extraction of these interesting pieces (which are again vectors of numbers) is at the heart of how a CNN works. For instance, in the case of a helmet, some parts of the network would learn a round edge, some may learn the glass pane in the front, and so on. The idea is that irrespective of where the object is in the frame, you should be able to identify it.
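The idea of scanning small frames with the same weights can be shown with a tiny sketch. The example below slides one 2x2 filter over a toy image and keeps the strongest response, so the same edge is found wherever it sits in the image. This is an assumption-laden illustration: the filter is hand-written (in a real CNN the filter weights are learned during training), the images are made up, and the function names are hypothetical.

```python
def convolve2d(image, kernel):
    """Slide the kernel over every position of the image (no padding)
    and return the map of responses."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


def max_response(feature_map):
    """Global max pooling: keep only the strongest response."""
    return max(max(row) for row in feature_map)


# Hand-written filter that responds to a dark-to-bright vertical edge.
VERTICAL_EDGE = [[-1, 1],
                 [-1, 1]]


def make_image(edge_column, size=6):
    """Toy image: dark (0) left of edge_column, bright (1) from it onwards."""
    return [[1 if c >= edge_column else 0 for c in range(size)]
            for _ in range(size)]


edge_on_left = make_image(2)
edge_on_right = make_image(4)
no_edge = make_image(0)  # uniformly bright, nothing to detect

for image in (edge_on_left, edge_on_right, no_edge):
    print(max_response(convolve2d(image, VERTICAL_EDGE)))
```

The two images with an edge give the same maximum response even though the edge is in a different place, while the uniform image gives none. A real network stacks many such learned filters in layers.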

Before CNNs were used for image detection, you needed to crop the images and provide the area of interest. For instance, if you were detecting various categories of birds, you would usually crop the bird image, remove any surrounding context like trees, bushes, sea or sky, and provide only the bird. With a CNN, the idea is to train with the uncropped images and let the network figure this out (though cropping may help increase accuracy in some cases). Based on my experiments, the CNN is able to predict the object most of the time in images that have surrounding context, but with lower accuracy. As mentioned, the idea is to identify objects irrespective of where they are found in the image, and I am sure a lot of research is going on to improve CNNs. Having the right training data (images and labels) is a must for training networks with such variations.



On the left-hand side is the stack view of employing a deep learning API on the edge gateway. It consists of the API, the learning model and the classification service, which is used to classify objects. There are a lot of innovations happening to optimize the use of deep learning libraries on the edge.




So how do you go about implementing this? I will talk about one approach to building this out:

  1. Build your own CNN or start with a pre-trained CNN (like an Inception model).
  2. Get the training and test data (images and labels).
  3. Train or retrain (i.e. transfer learning) the network.
  4. Tune the learning rate, batch size etc. for the desired accuracy, and get the trained model.
  5. Use the trained model for classification.
  6. Deploy the TensorFlow API and the trained model on the edge (you can package this in Docker).
  7. Use the classification code on the edge gateway to detect objects.
  7. Use the classification code on the edge gateway to detect objects.
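The flow of the steps above (train, get the trained model, ship it to the edge, classify there) can be sketched without any deep learning library. In the sketch below a nearest-centroid classifier stands in for the CNN, and a JSON string stands in for the trained model file that would be deployed to the gateway. Both are assumptions made to keep the example self-contained, as are the function names, the two-number "image features" and the labels. It is not TensorFlow code.

```python
import json


def train(samples):
    """Steps 3 and 4 (stand-in): learn one centroid per label.
    samples is a list of (feature_vector, label) pairs."""
    sums, counts = {}, {}
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]]
            for label in sums}


def export_model(model):
    """Serialize the trained model so it can be shipped to the edge."""
    return json.dumps(model)


def load_model(payload):
    """Step 6 (stand-in): load the shipped model on the edge gateway."""
    return json.loads(payload)


def classify(model, features):
    """Steps 5 and 7: pick the label whose centroid is closest."""
    def squared_distance(centroid):
        return sum((c - f) ** 2 for c, f in zip(centroid, features))
    return min(model, key=lambda label: squared_distance(model[label]))


# Made-up training data: two features per image, two labels.
training_data = [
    ([1.0, 0.0], "helmet"),
    ([0.9, 0.1], "helmet"),
    ([0.0, 1.0], "no_helmet"),
    ([0.1, 0.9], "no_helmet"),
]

# In the cloud: train and export.
payload = export_model(train(training_data))

# On the edge gateway: load once, then classify locally with no round trip.
edge_model = load_model(payload)
print(classify(edge_model, [0.8, 0.2]))
```

The point of the split is that training happens where compute and data are plentiful, while the edge only needs the exported model and the small amount of classification code.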

The following is part of my presentation that I delivered at IoTNext. I will update the article with the youtube video once available.


Building Cognitive Solutions – A Definitive Handbook


Let me start with an introduction to “Cognitive Computing”. Cognitive computing refers to systems that are designed to make computers think and learn like the human brain. Just as a human mind evolves from newborn to child to adult, learning new information and augmenting what it already knows, a cognitive system learns from the vast amount of information fed to it and from training on a set of information, so that it can understand the context and help in making informed decisions.

For example, in any learning methodology, a human mind learns and understands the context, so it is equipped to answer questions in an examination or interview that it might not have seen before: using its experience and past learning, it can make an informed judgement. Similarly, cognitive systems are modelled to learn from a reference data set (past learning) to help users make informed decisions. A cognitive system can be thought of as a non-programmed system that learns through a set of information, training, interactions and a reference data set.

From a technology perspective, at a very high level, building a cognitive system requires technologies that can understand language, context, entities and relationships (NLP); learning through supervised or unsupervised methods (machine learning); domain adaptation through various techniques; technologies to help source, curate and manage content; runtimes to bring the components together in a loosely coupled manner; and a wide variety of tooling and methodologies for building cognitive applications. I envision most cognitive capabilities being offered as a service over the cloud (a marketplace for cognitive and smart apps), which can be used individually or as a composite to create applications.

The key here is domain adaptability; otherwise we are looking at a general-purpose AI system, which in my view would not provide precise and accurate suggestions or predictions. Most first-generation cognitive services focus on providing an API without any provision to train it or adapt it to use cases. Even where they do provide a provision for training and recognizing new terms, in my experience it doesn’t work out well. For instance, take Google NLP, the Watson NLP APIs or any open source NLP framework like Stanford NLP or Apache OpenNLP: they provide general NLP parsing (based on Wikipedia, they can recognize common terms etc.), but fail to solve any real use case on their own. The point is that you can’t just rely on the bare APIs; you need to build upon them to solve any real use case.

When it comes to enterprise solutions, you are looking for precise suggestions at the top, and most AI engines in the market are actually general-purpose AI, which fails to reach the level of accuracy required from an AI system. Even if you train a general-purpose AI, there are “n” number of factors involved in getting the required level of accuracy. I haven’t seen an AI system or design that is built from the ground up to make it easy for end consumers, enterprises or users to adapt it to their use cases.

Through my upcoming book, “Building Cognitive Solutions – A Definitive Handbook”, I will share my experiences of building a cognitive solution the right way. There are a lot of misconceptions about how to build cognitive applications, and this will be the first practical guide to building cognitive solutions.

I plan to show a general methodology for building cognitive applications and the recipe for building an end-to-end cognitive solution. The book will also cover “deep learning” and new approaches to building cognitive solutions. It will follow the same style as my earlier handbook, “Enterprise IoT – A Definitive Handbook”.

I am looking for contributors/co-authors for my book who are experts in deep learning and would like to contribute and share their knowledge with a wider audience. Kindly reach me at for more details.
