Using Unsupervised AI Techniques To Cluster Images

December 8, 2020 in Computer Science

This project takes an unsupervised learning approach to identifying different cloud types using the VGG-16 model provided in the Keras library. The data for this project was provided by the NASA GLOBE cloud program, which is made up of citizen scientist observers who photograph clouds and then upload and identify their images.

I was inspired to work with the FIRE Cloud Computing stream on this project because of the influence that clouds have on the overall climate of the planet. Better models for identifying which types of clouds appear in images taken from the ground would produce data with numerous applications for future weather models.

During this project, our team encountered several challenges that led to growth in our Python and AI skills. One of these challenges was implementing a TensorBoard for the first time. TensorBoard is a tool that let us visualize the model developed in this project in a 3D space. As this was my team's first time building one, we learned that researching and understanding the API of each library is critical to a successful implementation. I personally learned that it is okay to run into issues, since working through challenges in a project often results in a better final product, a more honest product, and a better understanding of the project overall.

There were several results for this project. Our team set out to accomplish four objectives when developing the project plan for the semester.

Our first objective was to build a scraped and cleaned data structure containing links to all of the images needed to develop our model. This was accomplished using the pandas library in Python and resulted in a data structure that contained image links for every unique cloud type found in the GLOBE dataset. Following this, we developed a function that reads these links, downloads the images from the GLOBE servers, and stores them in our Google Drive workspace for use in later objectives.
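As a rough illustration of this step, here is a minimal sketch of how the link table could be cleaned and the images downloaded. The column names `image_url` and `cloud_type`, the function names, and the file naming scheme are all assumptions made for the example, not the actual GLOBE schema or our project code.

```python
from pathlib import Path
from urllib.request import urlretrieve

import pandas as pd


def unique_image_links(observations, url_col="image_url", label_col="cloud_type"):
    """Return one row per distinct image link, dropping rows with missing data.

    The column names are hypothetical placeholders for the example.
    """
    cleaned = observations[[url_col, label_col]].dropna()
    # Keep only values that look like web links.
    cleaned = cleaned[cleaned[url_col].str.startswith("http")]
    return cleaned.drop_duplicates(subset=url_col).reset_index(drop=True)


def download_images(links, out_dir, url_col="image_url", label_col="cloud_type"):
    """Download every linked image into out_dir, skipping files already present."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, (url, label) in enumerate(zip(links[url_col], links[label_col])):
        target = out / f"{label}_{i:05d}.jpg"  # illustrative naming scheme
        if not target.exists():
            urlretrieve(url, str(target))
```

Skipping files that already exist means the download can be restarted after an interruption without fetching everything again.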

Our second objective was to identify the number of features that we needed to analyze for each image. This was done by selecting 100 quality images, 10 for each cloud type, and sending them through the VGG-16 model. From this, we obtained a feature vector for each image and reduced the dimensionality of these vectors using PCA techniques. The result was a chart (shown in the Prezi) that identified the number of dimensions we needed to keep for each image in order to represent it accurately. We found that 7 dimensions were sufficient for our data.
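One common way to produce this kind of chart is to plot the cumulative explained variance of the principal components and read off how many are needed to pass a chosen threshold. The sketch below does this with NumPy on a random stand-in matrix rather than real VGG-16 features; the 95% threshold, the matrix width, and the function names are assumptions for illustration, and the result will not match the 7 dimensions we found for our data.

```python
import numpy as np


def cumulative_explained_variance(features):
    """Cumulative fraction of variance captured by the first k principal components."""
    centered = features - features.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variance along each principal component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    return np.cumsum(variance) / variance.sum()


def components_needed(features, threshold=0.95):
    """Smallest number of components whose cumulative variance reaches threshold."""
    cumulative = cumulative_explained_variance(features)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative))


# Stand-in for 100 image feature vectors: random data that really
# lives in 3 dimensions, embedded in a 64-dimensional space.
rng = np.random.default_rng(0)
stand_in_features = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 64))
print(components_needed(stand_in_features, threshold=0.999))
```

Plotting the output of `cumulative_explained_variance` against the component index gives the curve; the point where it flattens is the number of dimensions worth keeping.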

Our third objective was to test how well our reduced-dimension images retained their characteristics when passed into the model. We did this by creating several functions that compare a passed-in image to other images with similar feature vectors. From this, we concluded that the model was very accurate at returning images that looked similar to the original image passed in.
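A comparison function of this kind can be sketched as a nearest-neighbor lookup over the reduced feature vectors. The example below ranks images by cosine similarity, which is an assumption chosen for illustration (Euclidean distance would be another reasonable choice); the function name `most_similar` is hypothetical.

```python
import numpy as np


def most_similar(query_index, vectors, top_k=5):
    """Return indices of the top_k vectors most similar to vectors[query_index]."""
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)  # guard against zero-length vectors
    scores = unit @ unit[query_index]             # cosine similarity to the query
    scores[query_index] = -np.inf                 # never return the query itself
    return np.argsort(scores)[::-1][:top_k]


# Tiny example with 2-dimensional vectors.
example = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print(most_similar(0, example, top_k=2))
```

Looking at the images behind the returned indices next to the query image is a quick visual check on whether the reduced vectors still capture what makes two clouds look alike.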

Our final objective was to develop a way to visualize our model. We implemented a TensorBoard to accomplish this. From the TensorBoard implementation, we could see that our model was able to cluster some cloud types, such as the stratus cloud type, but was unable to produce 10 distinct cloud clusters with our current approach.
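One simple way to get feature vectors into this kind of 3D view is to export them as tab-separated files, which the Embedding Projector can load alongside a metadata file of labels. The sketch below shows that export using only the Python standard library; it is not necessarily how our team wired up TensorBoard, and the function name and file names are assumptions.

```python
import csv
from pathlib import Path


def write_projector_files(vectors, labels, out_dir):
    """Write vectors.tsv and metadata.tsv, one row per image, in matching order."""
    if len(vectors) != len(labels):
        raise ValueError("vectors and labels must have the same length")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    vectors_path = out / "vectors.tsv"
    metadata_path = out / "metadata.tsv"
    with vectors_path.open("w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerows(vectors)
    # A single-column metadata file has no header row.
    with metadata_path.open("w", newline="") as f:
        for label in labels:
            f.write(f"{label}\n")
    return vectors_path, metadata_path
```

Coloring the points by the label column in the projector makes it easy to see which cloud types form their own cluster and which overlap.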

In the future, I would like to continue developing our current model. I plan to extend it to use more initial test images than the 100 we elected to use for this project. Further, I would like to test other available models with the larger set of cloud images. My goal is to develop a model accurate enough to identify an image passed in as one of the ten unique cloud types the model was trained to identify.

The FIRE program has helped me develop my career readiness skills in several categories over the three semesters I have participated in it. This semester, the project helped me develop my collaboration skills by working with others to reach a common goal. Without those skills, this project would have taken much longer and would not have been as impactful to the future of the FIRE Cloud Computing program as it is in its current state. Through collaboration, our team was able to come together in a new environment to produce a product that built on the skills we had developed earlier in the FIRE program. As a team, we presented our work at the FIRE Summit and accomplished all of the objectives we set for the semester of research. Without the effort that my teammates and I put into working together, the resulting product would not have been the same. I am very proud to be a part of the FIRE program and pleased with the progress we made toward answering the question of whether unsupervised cloud clustering is a viable option for developing a model to identify clouds by type.