Biggest Concerns with Automation in Data Annotation and How to Tackle Them


Pandas and gibbons look nothing alike. A sober human would hardly mistake one for the other. Yet to an AI system, they can apparently be identical.

Add some noise to a picture of a panda, and an AI system misidentifies the image as a gibbon, with absolute confidence to boot.

Bloopers such as this caution us not to go overboard with automation, especially for critical tasks like annotation. To be fair, automation has been a boon for annotation and has removed much of the drudgery associated with it. Data annotation has, as a result, become more efficient, more effective, and a lot less tedious. However, there are lingering concerns with automated data annotation.

The concerns are pressing and justified given how pivotal a role data annotation plays in machine learning. And so a deeper dive into them is worthwhile.

Concerns with automated data annotation and their redresses

Automation of data annotation has one primary purpose: to annotate vast amounts of data quickly. This it has achieved remarkably well, though often by trading a small amount of quality for a much larger gain in quantity. But that trade-off is not the only issue; concerns abound with automated data annotation.

Inaccurate annotation

Automated annotation tools learn from the sample datasets they are trained on, and this makes them inherently limited: no sample is ever exhaustive, fully accurate, flawless, or free of bias.

These limitations mean that automated annotation systems struggle with, among other things, intricate details and nuanced contexts. They also struggle with subtle differences, which leads them to misidentify certain characteristics or miss them entirely.

This is problematic because automated systems do not have innate or instinctive guiding principles; all they have are algorithms and data they have been trained with. And so shortcomings in these will inevitably produce poorly annotated data.

There is no silver bullet, but there are effective answers to this issue. Semi-automated annotation is the most reasonable: human annotators manually label a subset of the data, the automation system learns from that subset, and it then propagates the labels to the rest of the dataset.

The quality of the automated annotations can be further improved by implementing active learning. Where there are complexities and uncertainties, humans strategically intervene and correct errors, thus improving the overall quality of the annotated data.
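Below is a minimal sketch of what this semi-automated, active-learning loop can look like in practice. It is only an illustration: the logistic regression model, the margin-based uncertainty score, and the 0.2 threshold are assumptions standing in for whatever model and review policy your own pipeline uses.

```python
# Sketch: semi-automated annotation with uncertainty-based routing to humans.
# Model choice, features, and threshold are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def propagate_labels(X_labeled, y_labeled, X_unlabeled, margin_threshold=0.2):
    """Train on the human-labeled subset, auto-label confident samples,
    and flag uncertain ones for human review."""
    model = LogisticRegression(max_iter=1000)
    model.fit(X_labeled, y_labeled)

    probs = model.predict_proba(X_unlabeled)
    top_two = np.sort(probs, axis=1)[:, -2:]
    margin = top_two[:, 1] - top_two[:, 0]   # small margin = uncertain prediction

    confident = margin >= margin_threshold
    auto_labels = model.predict(X_unlabeled)
    needs_review = np.where(~confident)[0]   # indices to send back to annotators
    return auto_labels, confident, needs_review
```

The samples flagged in `needs_review` are exactly where human intervention adds the most value; once corrected, they can be folded back into the labeled subset and the loop repeated.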

Sensitivity to noise and variability

Automated annotation systems can be sensitive to noise, outliers, or variations in the training datasets. This is due to the systems’ inability to distinguish meaningful patterns from random fluctuations in the data—or sometimes due to their proclivity to find patterns that do not generalize.

This may cause them to identify patterns where there are none or fail to transfer their learning to unfamiliar datasets. And because of these systems' propensity to perpetuate what they have learned, accurate or otherwise, the outcome of their annotation may be tainted. They can also easily be confused by noise, such as shadows, reflections, and occlusions, causing them to mislabel data.

To mitigate this, it is imperative to manually spot-check the automatically labeled data for quality. In addition, automated checks and validation scripts can be implemented to identify and rectify common annotation errors in noisy data.
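As a rough illustration, a validation script for bounding-box annotations might flag labels that are geometrically impossible or empty. The annotation dictionary layout and the minimum-area tolerance below are assumptions; adapt them to whatever format your labeling pipeline emits.

```python
# Sketch: automated sanity checks over bounding-box annotations.
# The dict keys ("bbox", "label") and min_area value are assumed placeholders.
def validate_boxes(annotations, image_width, image_height, min_area=4):
    """Return indices of annotations that should be routed to human review."""
    suspect = []
    for i, ann in enumerate(annotations):
        x1, y1, x2, y2 = ann["bbox"]
        out_of_bounds = x1 < 0 or y1 < 0 or x2 > image_width or y2 > image_height
        degenerate = (x2 - x1) * (y2 - y1) < min_area   # tiny or inverted box
        missing_label = not ann.get("label")
        if out_of_bounds or degenerate or missing_label:
            suspect.append(i)
    return suspect
```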

This would still require human validation, but the scope of the task would be much reduced. The model's uncertainty estimates can also be leveraged to gauge how much confidence and credibility each label deserves. Robust filtering and smoothing of noisy data before annotation would likewise help improve the reliability of automated annotations.
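For the filtering step, even a simple denoising pass can help. The sketch below assumes single-channel (grayscale) image arrays and uses a small median filter from SciPy; the filter size is an assumption to be tuned against your own data.

```python
# Sketch: light denoising of images before they reach the auto-annotation model.
# Assumes grayscale NumPy arrays; filter size is an illustrative default.
import numpy as np
from scipy.ndimage import median_filter

def denoise_batch(images, size=3):
    """Apply a small median filter to suppress salt-and-pepper noise
    while preserving the edges the annotation model relies on."""
    return np.stack([median_filter(img, size=size) for img in images])
```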

Lack of explainability and transparency

What explainability is – Image source: ScienceDirect

For all their progress and pervasiveness, AI systems remain essentially a black box. The lack of transparency in their operations and their inability to explain their decisions are acute concerns.

These facts are often overlooked because the systems are so practical, and because higher predictive accuracy tends to make explanation even harder; so long as the outputs conform to expectations, the argument runs, it does not matter how the conclusions are reached.

The lack of explainability and transparency, however, is a concern for several reasons. Providing context to annotations, handling biases, and addressing edge cases are more challenging as a result.

If, for example, the automatically annotated data turns out to be inaccurate, troubleshooting is knottier because we cannot see how the system arrived at its labels. Opacity also makes it harder to tell whether, or when, human intervention is needed, which can lead to implicit biases being perpetuated or annotations being misinterpreted.

To see through this, or rather to ensure that these concerns do not pass unnoticed, it is incumbent upon us to implement comprehensive validation procedures to assess the reliability and accuracy of the annotated data.

Techniques like fairness-aware learning can attenuate existing biases, and model-agnostic interpretability methods can generate explanations. None of these methods is perfect, and each needs careful oversight, but they help shed some light.
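One widely available model-agnostic option is permutation importance, which works on any fitted estimator without needing access to its internals. The sketch below uses scikit-learn; the fitted `model`, validation split, and `feature_names` are assumed to come from your own pipeline.

```python
# Sketch: model-agnostic interpretability via permutation importance.
# `model`, `X_val`, `y_val`, and `feature_names` are assumed to exist upstream.
from sklearn.inspection import permutation_importance

def explain_annotation_model(model, X_val, y_val, feature_names):
    result = permutation_importance(model, X_val, y_val,
                                    n_repeats=10, random_state=0)
    ranked = sorted(zip(feature_names, result.importances_mean),
                    key=lambda pair: pair[1], reverse=True)
    for name, importance in ranked:
        print(f"{name}: {importance:.4f}")   # which inputs drive the labels most
```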

Bias in automation tools

Automation systems are trained on data collected, generated, and labeled by humans, and that data may carry biases toward certain groups or traits. The problem here is not the models per se; it is that they replicate human biases without judgment and lack any means of identifying and mitigating those biases without human intervention.

The systems may not just reflect real-world biases but exacerbate them. The automatically annotated data could end up skewed, inaccurate, discriminatory, or all three. And whereas humans may be able to rationalize and explain their biases (however irrational the rationalization), the same cannot be expected from automation systems, given their opacity and lack of explainability.

Given the pernicious downstream impacts biases in automation tools can have, it behooves those developing them to use diverse data samples with fair representation of different groups and cases. They also need to widen the pool of labelers, drawing them from different backgrounds, or have multiple annotators annotate the same dataset.

This helps ensure that, for example, Black Americans' speech is not classified as hate speech or as more toxic, as many speech classification systems are wont to do. Those using these tools also need to bring human oversight into the mix to analyze and rectify biases. Debiasing techniques can be applied, and detailed annotation guidelines and thorough annotator training further minimize bias.
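When multiple annotators label the same items, measuring how well they agree is a quick early warning that guidelines are ambiguous or individual bias is leaking into the labels. A minimal sketch using Cohen's kappa from scikit-learn, with made-up example labels:

```python
# Sketch: inter-annotator agreement as a bias/guideline health check.
# The labels below are illustrative; use your own paired annotations.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["toxic", "neutral", "neutral", "toxic", "neutral"]
annotator_b = ["toxic", "neutral", "toxic", "toxic", "neutral"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")   # low agreement suggests the guidelines need work
```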

Debiasing methods – Image source: Otavio Parraga et al.

Vulnerability to adversarial attacks

Machine learning models can outperform humans in certain data annotation tasks, but they are highly susceptible to adversarial attacks and data poisoning, which can make them go haywire. This calls their robustness into question. One instance of this vulnerability is the adversarial example.

The panda-gibbon classification goof we began with is a case in point. Because image classification systems work from pixels rather than a conceptual understanding of the image, noise can befuddle and fool them.

The noise can be as simple as a small color shift, or a rotation, shear, or rescaling of a natural image. It may be inserted maliciously or added unintentionally. Either way, it can compromise the reliability of automation systems and make their annotations less accurate.
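One cheap way to probe this fragility is to apply exactly these benign transformations and check whether the predicted label stays put. The sketch below uses torchvision and assumes a PIL image plus a `predict_label` function from your own pipeline; the transform parameters are illustrative.

```python
# Sketch: robustness probe using small color shifts and affine distortions.
# `predict_label` and the input image are assumed placeholders.
from torchvision import transforms

perturb = transforms.Compose([
    transforms.ColorJitter(brightness=0.2, contrast=0.2),          # small color shift
    transforms.RandomAffine(degrees=10, shear=5, scale=(0.9, 1.1)), # rotate/shear/scale
])

def label_is_stable(predict_label, image, trials=5):
    """Check whether the predicted label survives mild perturbations."""
    reference = predict_label(image)
    return all(predict_label(perturb(image)) == reference for _ in range(trials))
```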

Noise causing an AI system to misidentify a panda | Source: MIT Technology Review

While such attacks will continue to plague automation systems and make them less robust than they could be, certain actions can mitigate their effects. For one, good old manual quality checks need to be employed frequently, and human annotators should be trained to recognize and handle potential adversarial attacks.

Adversarial examples can also be incorporated into the training dataset along with their correct labels, fine-tuning the system to pre-empt such attacks. Additionally, a binary classifier could be constructed that separates, or attempts to separate, adversarial examples from regular ones.
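One common way to generate such examples is the fast gradient sign method (FGSM), sketched below in PyTorch. The `model`, `loss_fn`, and epsilon value are assumptions standing in for your own setup; the perturbed images are returned with pixel values clamped to a valid range so they can be added back to the training set with their correct labels.

```python
# Sketch: FGSM adversarial examples for adversarial training.
# `model` and `loss_fn` are assumed placeholders; epsilon sets perturbation size.
import torch

def fgsm_examples(model, images, labels, loss_fn, epsilon=0.01):
    images = images.clone().detach().requires_grad_(True)
    loss = loss_fn(model(images), labels)
    loss.backward()
    # Step in the direction that most increases the loss, then clamp to [0, 1].
    adversarial = images + epsilon * images.grad.sign()
    return adversarial.clamp(0, 1).detach()
```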

And the system performance needs to be continuously monitored and assessed so that the issues can be addressed immediately.

The best solution around

Automation has helped reduce the time and money involved in data annotation. In some areas, automated systems even outperform humans, not just in speed but in accuracy. But for all their prowess, they are beleaguered by many pressing concerns: they have trouble handling ambiguities and edge cases, generalize poorly, and adapt slowly, to name just a few.

Is there a way around it?

There is no perfectly paved way forward, but there are safe paths. One is to invest heavily in professionals and tools to enhance the accuracy of data annotation. Another, less costly and more viable, path is to entrust annotation to a reliable company with specialized expertise in data annotation services.

Such a company can provide bespoke data annotations with greater accuracy than automated annotation, and often at higher speed. Either way, human involvement is essential to assess and validate the quality of the annotation.

For now, and perhaps for the foreseeable future, human annotations remain more robust and accurate, and humans more reliable than machines, for many obvious reasons. For efficiency, though, human and machine synergy is optimal.

