This dialog contains information about the CoralNet automated annotation system. For technical details, refer to "Towards automated annotation of benthic survey images: Variability of human experts and operational modes of automation".
- Basic information
The principal goal of the CoralNet computer-vision backend is to facilitate faster and more accurate point annotations. Here are the basics:
After an initial set of images has been annotated, a first classifier is trained. This classifier then "pre-annotates" all images uploaded to the source, and the status of these images changes from unannotated to unconfirmed.
When a user enters the annotation tool for an unconfirmed image, all point annotations have label suggestions. Each suggestion comes with a posterior probability estimate indicating how "confident" the classifier is that the substrate directly under the point belongs to the respective label.
New classifiers are trained continuously as more images are confirmed by users.
Source admins can set the source "confidence threshold". The default is 100%.
When a user enters the annotation tool, all points for which the classification confidence is higher than the source threshold are automatically confirmed, so the user can focus on the remaining points and work through the images more rapidly.
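The automatic confirmation step can be pictured with a minimal sketch. This is not the CoralNet implementation: the function name auto_confirm, the (label, confidence) tuple layout, and the example labels and scores are assumptions made for illustration, with confidences expressed as fractions in [0, 1] rather than percentages.

```python
def auto_confirm(points, threshold):
    """Split points into (confirmed, needs_review) by classifier confidence.

    points: list of (suggested_label, confidence), confidence in [0, 1].
    threshold: source confidence threshold in [0, 1] (1.0 stands for 100%).
    Both the data layout and the function name are hypothetical.
    """
    # "Higher than the threshold" is read here as a strict comparison.
    confirmed = [p for p in points if p[1] > threshold]
    needs_review = [p for p in points if p[1] <= threshold]
    return confirmed, needs_review


# Made-up suggestions for the points of one image.
points = [("Porites", 0.97), ("Sand", 0.62), ("CCA", 0.88)]
confirmed, needs_review = auto_confirm(points, threshold=0.85)
```

With the strict comparison assumed in this sketch, a threshold of 1.0 confirms nothing automatically, so every point is left for the user.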
- Robot Performance Estimation
The confirmed data is split into eight parts. A classifier is trained on seven of them (7/8 of the data) and evaluated on the remaining part, the "validation set". This procedure is used to generate the confusion matrix and the classifier analytics curve shown on the backend page.
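As a rough illustration of the eight-way split, the sketch below holds out one eighth of the data for validation. How CoralNet actually assigns data to the parts is not stated here; the random shuffle, the fixed seed, and the name split_train_validation are assumptions.

```python
import random


def split_train_validation(annotations, n_parts=8, seed=0):
    """Hold out 1/n_parts of the data for validation, train on the rest.

    The shuffle and the seed are assumptions; the real split rule may differ.
    """
    shuffled = list(annotations)
    random.Random(seed).shuffle(shuffled)
    n_validation = len(shuffled) // n_parts
    validation = shuffled[:n_validation]
    train = shuffled[n_validation:]
    return train, validation


# 80 stand-in items, so the split is exactly 70 / 10.
annotations = list(range(80))
train, validation = split_train_validation(annotations)
```

The classifier would then be trained on train only, and every reported figure would come from validation.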
By inspecting the curves, the source admin can set an appropriate confidence threshold, balancing the amount of manual work against the decrease in performance.
NOTE: Since the classifier is compared to a single set of annotations, and that set is taken to be the "ground truth", the measured performance will always seem to deteriorate as more annotations are done automatically. However, in our experiments, inter- and intra-operator variability is significant, and a well-trained classifier operating at a threshold where around 50% of the points are done automatically may actually increase performance compared to fully manual annotation. We encourage all users to do a thorough inter- and intra-operator assessment before starting large-scale annotation projects.
- Source-specific Robots
All Robots are source-specific. This means that a Robot learns only from confirmed annotations within its source, and only annotates images in that same source. The reason is simple: machine learning across different sources is difficult, and it is not yet clear how to do this efficiently.
- Classifier threshold sweep
The first plot on the backend page is created by trying hundreds of confidence thresholds on the validation set. For each threshold, all annotations for which the classifier confidence is lower than the threshold are discarded. Accuracy is then calculated for the remaining annotations, both at the level of individual labels and at the functional group level. This is plotted along with the fraction of points above the threshold.
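The sweep described above can be sketched as follows. This is an illustration under stated assumptions, not the backend code: the function name threshold_sweep, the (true_label, predicted_label, confidence) layout, the evenly spaced thresholds, and the example labels and group mapping are all hypothetical.

```python
def threshold_sweep(predictions, n_thresholds=100, group_of=None):
    """Return (threshold, accuracy, fraction_kept) for evenly spaced thresholds.

    predictions: non-empty list of (true_label, predicted_label, confidence).
    group_of: optional dict mapping label -> functional group; when given,
              accuracy is computed at the functional group level.
    """
    def key(label):
        return group_of[label] if group_of else label

    rows = []
    for i in range(n_thresholds):
        threshold = i / n_thresholds
        # Discard annotations whose confidence is lower than the threshold.
        kept = [(t, p) for t, p, c in predictions if c >= threshold]
        fraction = len(kept) / len(predictions)
        if kept:
            accuracy = sum(key(t) == key(p) for t, p in kept) / len(kept)
        else:
            accuracy = None  # nothing left to score at this threshold
        rows.append((threshold, accuracy, fraction))
    return rows


# Made-up validation results and a made-up label-to-group mapping.
predictions = [
    ("Porites", "Porites", 0.9),
    ("Porites", "Acropora", 0.4),
    ("Sand", "Sand", 0.7),
    ("Sand", "Porites", 0.2),
]
groups = {"Porites": "Hard coral", "Acropora": "Hard coral", "Sand": "Soft substrate"}
label_rows = threshold_sweep(predictions, n_thresholds=10)
group_rows = threshold_sweep(predictions, n_thresholds=10, group_of=groups)
```

In this toy data, raising the threshold to 0.5 keeps half of the points and removes both mistakes, which is the trade-off the plot is meant to expose.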
- Confusion Matrix
Users can investigate any point on the threshold sweep by selecting a threshold and label mode using the form provided. A confusion matrix is then displayed for the selection.
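To make the link between a chosen threshold and its confusion matrix concrete, here is a small sketch. The function name confusion_matrix, the (true_label, predicted_label, confidence) layout, and the convention that rows are true labels and columns are predicted labels are assumptions; the matrix shown on the backend page may be arranged differently.

```python
from collections import Counter


def confusion_matrix(predictions, threshold, labels):
    """Count (true, predicted) pairs among annotations kept at `threshold`.

    Rows follow the true label and columns the predicted label, both in the
    order given by `labels` (an assumed convention).
    """
    counts = Counter((t, p) for t, p, c in predictions if c >= threshold)
    return [[counts[(t, p)] for p in labels] for t in labels]


# Made-up validation results.
predictions = [
    ("Porites", "Porites", 0.9),
    ("Porites", "Acropora", 0.4),
    ("Sand", "Sand", 0.7),
    ("Sand", "Porites", 0.2),
]
labels = ["Acropora", "Porites", "Sand"]
matrix_all = confusion_matrix(predictions, threshold=0.0, labels=labels)
matrix_half = confusion_matrix(predictions, threshold=0.5, labels=labels)
```

A functional group view could be obtained the same way by mapping each label to its group before counting.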