
MIT Study Finds Labelling Errors In Datasets Used To Test AI


ImageNet/MIT

 


March 29th, 2021 | 13:42

ENGADGET

 

Over three percent of data in the most-cited datasets was deemed inaccurate or mislabeled.

 

A team led by computer scientists from MIT examined ten of the most-cited datasets used to test machine learning systems. They found that around 3.4 percent of the data was inaccurate or mislabeled, which could cause problems in AI systems that use these datasets.

 

The datasets, which have each been cited more than 100,000 times, include text-based ones from newsgroups, Amazon and IMDb. Errors emerged from issues like Amazon product reviews being mislabeled as positive when they were actually negative and vice versa.

 

Some of the image-based errors result from mixing up animal species. Others arose from mislabeling photos with less-prominent objects ("water bottle" instead of the mountain bike it's attached to, for instance). One particularly galling example that emerged was a baby being confused for a nipple.

 

One of the datasets centers around audio from YouTube videos. A clip of a YouTuber talking to the camera for three and a half minutes was labeled as "church bell," even though one could only be heard in the last 30 seconds or so. Another error emerged from a misclassification of a Bruce Springsteen performance as an orchestra.

 

To find possible errors, the researchers used a framework called confident learning, which examines datasets for label noise (examples whose given labels are likely wrong). They validated the candidate mistakes using Mechanical Turk, and found that around 54 percent of the data the algorithm flagged did have incorrect labels. The QuickDraw test set had the most errors, with around 5 million (about 10 percent of the dataset). The team created a website so that anyone can browse the label errors.
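For readers curious how flagging of this kind can work, here is a much-simplified sketch of the idea in Python. It is not the researchers' code: the function name `flag_label_issues`, the toy labels and the toy probabilities are all made up for illustration, and the full method includes steps left out here. The sketch assumes a trained model has already produced a probability for each class on each example. It sets a per-class confidence threshold (the model's average confidence in that class over examples carrying that label) and flags any example the model confidently places in a class other than its given label.

```python
def flag_label_issues(labels, pred_probs):
    """Return indices of examples whose given label looks suspicious.

    labels: list of int class ids (the given, possibly wrong labels)
    pred_probs: list of per-class probability lists from some trained model
    Hypothetical helper for illustration only.
    """
    num_classes = len(pred_probs[0])

    # Per-class threshold: the model's average confidence in class k,
    # taken over the examples that are labeled k.
    thresholds = []
    for k in range(num_classes):
        probs_k = [p[k] for y, p in zip(labels, pred_probs) if y == k]
        thresholds.append(sum(probs_k) / len(probs_k) if probs_k else float("inf"))

    flagged = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes whose probability clears that class's own threshold.
        confident = [k for k in range(num_classes) if p[k] >= thresholds[k]]
        if not confident:
            continue  # the model is not confident about anything here
        best = max(confident, key=lambda k: p[k])
        if best != y:
            flagged.append(i)  # confident in a class other than the label
    return flagged


# Toy review data (invented): 0 = negative, 1 = positive.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],  # labeled negative, but the model strongly says positive
    [0.2, 0.8],
    [0.1, 0.9],
    [0.4, 0.6],
]

flagged = flag_label_issues(labels, pred_probs)
print(flagged)
```

On this toy data only the third review (index 2) is flagged. A flag is just a candidate, which is why the researchers sent theirs to human reviewers for confirmation.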

 

Some of the errors are relatively minor and others seem to be a case of splitting hairs (a closeup of a Mac command key labeled as a "computer keyboard" is still correct). Sometimes, the confident learning approach got it wrong too, like confusing a correctly labeled image of tuning forks for a menorah.

 

If labels are even a little off, that could have huge ramifications for machine learning systems. If an AI system can't tell the difference between a grocery store and a bunch of crabs, it'd be hard to trust it with pouring you a drink.

 


 

Source:
courtesy of ENGADGET

by Kris Holt

 
