This is a guest post from Scott Miller from Florida State University. Scott helped us beta-test the new deploy API and was kind enough to share his experience in this post. Please check out the API help pages for more details, and happy CoralNetting.
-- the CoralNet team
Introduction
Hello, CoralNet community!
I’m Scott Miller, a PhD candidate at Florida State University under the mentorship of Dr. Andrew Rassweiler. In this blog post I'll discuss using the new CoralNet API to annotate over 175,000 survey images and share the code we developed to use the API. I hope this post highlights the utility of the CoralNet API and provides a jumping-off point on the coding/methodological side for your own research.
Background
We have been working in Moorea, French Polynesia to understand island-scale distributional patterns of fish and how they relate to benthic conditions and fishing pressure.
To address these questions, we developed a method of georeferenced, paired fish and benthic surveys in the shallow lagoons. Briefly, we conduct fish surveys on snorkel for a subset of species while towing a float equipped with a GPS and a downward-facing, time-lapsing GoPro that passively takes thousands of images over the same areas where we survey fish. Our group is currently writing a manuscript that describes these methods in more detail, including the data workflow for pairing the survey data with the GPS track.
Curating and evaluating an automated classifier
Because we are collecting tens to hundreds of thousands of images – far too many to score by hand – CoralNet has been an invaluable resource and is the workhorse behind the annotation of our images. After manually annotating ~8,000 images from around the island to train a classifier on the CoralNet website, we ran internal tests (also part of the upcoming manuscript) to determine whether the CoralNet algorithm performed comparably to humans in estimating the percent cover of our groups of interest at our scale of interest (one minute of fish counting, or roughly 10 m of linear reef). On an image-by-image basis, the automatic annotations were inferior to traditional by-hand annotation. However, at the minute scale, the automated approach was as good as or better than the traditional approach at estimating percent cover for most common substrates and for aggregated groups (such as all live coral or all algae), once we accounted for the much greater number of images the computer vision algorithm could process compared to the highest reasonable number a human could score.
Classify thousands of images automatically using the CoralNet API
Processing large numbers of images would have been slow with CoralNet’s web interface. Luckily, we were given the opportunity to work with the new CoralNet API to automatically process these images. When using the web version of CoralNet, one needs to upload images and metadata, wait for the images to be annotated, then download the resulting percent covers and annotation files – all of which can take a long time and requires a significant amount of human intervention for large datasets such as ours. However, with the API, we upload our images to Dropbox, then run a series of Python scripts in the background throughout the day to generate these annotations and retrieve the data from them with minimal work on our part.
A guide to the CoralNet deploy API
Here you can find a link to a GitHub repository that includes the Python scripts I have been using to interface with the CoralNet API. A brief note: I am an ecologist by trade, so although I have done my best to make the scripts as efficient and well documented as possible, I do not claim they are perfect, and they may need to be modified to fit your specific research purposes. However, they get the job done and should serve as a good starting point if this is your first foray into this kind of work.
Image upload using Dropbox
To begin our automatic annotation workflow, we uploaded images in chunks of about 3,000–10,000 into separate folders on a Dropbox account, with each folder representing a different study site. We then needed to retrieve a Dropbox authorization token so that the scripts could access these folders. In addition to the Dropbox token, we needed our CoralNet authorization token and the URL of the source we would use to score our images. The CoralNet API help page walks through these steps and provides additional useful reading.
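As an illustration, here is a minimal sketch of this step using the official Dropbox Python SDK. The token placeholder, folder path, and file-extension filter are assumptions for the example, not the exact code from our repository; note that temporary links expire after a few hours, so generate them shortly before sending your requests.

```python
# Minimal sketch: list the images in one Dropbox folder and build direct
# download links for them. The token, folder path, and extension filter are
# placeholders -- substitute your own.
import dropbox

DROPBOX_TOKEN = "YOUR_DROPBOX_TOKEN"   # from the Dropbox developer console
FOLDER = "/moorea_site_01"             # hypothetical folder of GoPro images

dbx = dropbox.Dropbox(DROPBOX_TOKEN)

image_urls = []
result = dbx.files_list_folder(FOLDER)
while True:
    for entry in result.entries:
        if entry.name.lower().endswith((".jpg", ".jpeg", ".png")):
            # Temporary links are directly downloadable, which is what
            # CoralNet needs in order to fetch each image.
            image_urls.append(dbx.files_get_temporary_link(entry.path_lower).link)
    if not result.has_more:
        break
    result = dbx.files_list_folder_continue(result.cursor)

print(f"Found {len(image_urls)} images in {FOLDER}")
```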
Our code on github
Once you have completed these steps, you can begin using the Python scripts in our GitHub repository. For each one, you need to define certain variables, including file paths and your authorization tokens. I won’t go into too much detail here because more information can be found in the repository’s readme file and in the comments of the individual scripts, but I will briefly describe the purpose of each script below.
json_generator
The first script to run is “json_generator.py”, which connects to the Dropbox folder you specify, retrieves a list of the images stored there, and then generates a JSON file that will be sent to CoralNet by the subsequent scripts. This file contains the URL of each image and the locations of the points (in pixel coordinates) that you want CoralNet to annotate within each image. It is currently set up to generate 30 stratified random points on our GoPro images, but you can change this to suit your needs.
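Below is a hedged sketch of what that point generation and JSON payload can look like. The image dimensions, the 6 × 5 grid used to get 30 stratified random points, and the placeholder URL are illustrative assumptions; the payload layout follows the structure described in the CoralNet API help pages, so check it against the current documentation and against json_generator.py itself.

```python
# Sketch of stratified random point generation and the deploy request payload.
# Image dimensions, the 6 x 5 grid, and the placeholder URL are illustrative.
import json
import random

IMG_WIDTH, IMG_HEIGHT = 1920, 1080   # assumed GoPro frame size
COLS, ROWS = 6, 5                    # 6 x 5 = 30 stratified random points

def stratified_points(width, height, cols, rows):
    """One random point per grid cell, expressed as pixel row/column."""
    points = []
    for i in range(cols):
        for j in range(rows):
            col = random.randint(i * width // cols, (i + 1) * width // cols - 1)
            row = random.randint(j * height // rows, (j + 1) * height // rows - 1)
            points.append({"row": row, "column": col})
    return points

def build_payload(image_urls):
    """JSON body for one deploy request (CoralNet accepts up to 100 images)."""
    return {
        "data": [
            {
                "type": "image",
                "attributes": {
                    "url": url,
                    "points": stratified_points(IMG_WIDTH, IMG_HEIGHT, COLS, ROWS),
                },
            }
            for url in image_urls
        ]
    }

# Placeholder URL; in practice these come from the Dropbox listing above.
image_urls = ["https://www.dropbox.com/s/example/GOPR0001.JPG?raw=1"]
with open("deploy_request.json", "w") as f:
    json.dump(build_payload(image_urls), f)
```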
coralnet_api_deployer
The next script to run is “coralnet_api_deployer.py”, which sends the requests to CoralNet and retrieves the annotated data. CoralNet currently accepts requests of up to 100 images, so this script breaks the image list into chunks of 100 and sends them to CoralNet sequentially. This can take some time depending on the size of your dataset, but the script prints status updates to the console so you can track its progress.
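For orientation, here is a rough sketch of one request/poll cycle using the requests library. The endpoint path, headers, and completion check reflect my reading of the CoralNet API help pages rather than the exact contents of coralnet_api_deployer.py, and the classifier ID is a placeholder; verify all of these against the current documentation.

```python
# Hedged sketch of one deploy request/poll cycle with the requests library.
# Endpoint path, headers, and completion check are assumptions based on the
# CoralNet API help pages; the classifier ID below is a placeholder.
import json
import time
import requests

CORALNET_TOKEN = "YOUR_CORALNET_TOKEN"  # from your CoralNet account
DEPLOY_URL = "https://coralnet.ucsd.edu/api/classifier/1234/deploy/"  # hypothetical classifier ID

HEADERS = {
    "Authorization": f"Token {CORALNET_TOKEN}",
    "Content-Type": "application/vnd.api+json",
}

def chunks(items, size=100):
    """CoralNet accepts at most 100 images per request, so batch the list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def deploy(payload):
    """Submit one batch of images and poll until the job finishes."""
    resp = requests.post(DEPLOY_URL, data=json.dumps(payload), headers=HEADERS)
    resp.raise_for_status()
    # The response's Location header points at a status endpoint to poll.
    location = resp.headers["Location"]
    status_url = location if location.startswith("http") else "https://coralnet.ucsd.edu" + location
    while True:
        status = requests.get(status_url, headers=HEADERS)
        status.raise_for_status()
        # Assumption: when the job completes, the status request is redirected
        # to the result endpoint, so a redirect means the annotations are back.
        if status.history:
            return status.json()
        print("Job still running, waiting 60 s...")
        time.sleep(60)

# Usage (payloads built as in the json_generator sketch above):
# results = [deploy(build_payload(batch)) for batch in chunks(image_urls)]
```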
json_error_checker
The third script is “json_error_checker.py”, which checks for the errors that occasionally occur in the returned data, then attempts to fix them by sending new requests to CoralNet for the affected images and overwriting the errored data. In my experience, errors are infrequent and minor when they occur, and running this script has always fixed them.
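A simplified version of that error pass might look like the sketch below. The error marker it looks for and the results file name are assumptions; adapt the check to whatever your saved results actually contain (json_error_checker.py has the real logic).

```python
# Sketch of an error re-check pass over saved results. The "errors" field is
# an assumption about the result JSON; see json_error_checker.py for the
# actual logic used in our workflow.
import json

with open("deploy_results.json") as f:   # placeholder file name
    results = json.load(f)

# Collect images whose entries carry an error marker instead of point labels.
errored = [
    img for img in results.get("data", [])
    if "errors" in img or "errors" in img.get("attributes", {})
]

print(f"{len(errored)} images need to be re-requested")
# The URLs of these images would then be bundled into a fresh payload,
# re-sent to CoralNet, and the corrected entries written over the errored ones.
```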
json_parser
The final script “json_parser.py” takes the data generated from the previous scripts, which are in a JSON format, and converts them to a .csv in a format similar to the annotation export from the web version of CoralNet.
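As a rough illustration, that flattening step can be as simple as the sketch below. The field names ("points", "classifications", "label_code") and the file names are assumptions about the result JSON; match them to your own data and to json_parser.py.

```python
# Sketch of flattening deploy results into a CSV with one row per point,
# roughly mirroring CoralNet's web annotation export. Field and file names
# are assumptions for the example.
import csv
import json

with open("deploy_results.json") as f:   # placeholder file name
    results = json.load(f)

with open("annotations.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["image_url", "row", "column", "label"])
    for img in results.get("data", []):
        attrs = img.get("attributes", {})
        for point in attrs.get("points", []):
            # Keep only the top-ranked classification for each point.
            classifications = point.get("classifications") or [{}]
            top = classifications[0]
            writer.writerow([attrs.get("url"),
                             point.get("row"),
                             point.get("column"),
                             top.get("label_code", "")])
```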
Enjoy the automation
Once you run the scripts, you will have the same data you would get by manually uploading the images to the CoralNet website and downloading the annotations, with the added flexibility of using any source you have access to and placing the points wherever you like on the requested images. We have had great success using the CoralNet API, and I hope that this helps you get started on your own projects!