@braddwyer This is really scary: one of the most popular open sourced self-driving car datasets has been missing labels for hundreds of pedestrians and dozens of cyclists for years, and nobody has noticed.
This is how people get run over.
@bigdata “If you're using public datasets in your projects, please do your due diligence and check their integrity” t.co/S6bktFWrRR
1 RT, 7 Fav · 2020/02/12 06:49
@kushnerbomb software bugs may kill a lot of people, but they also help a lot of people get to work on time, so it's impossible to say if they're bad or not. t.co/85V2IqLLaO
3 RT, 7 Fav · 2020/02/11 19:37
@dancow daily reminder that the hardest part of data science and machine learning is the data. A self-driving algorithm trained on images that mislabels/doesn't label pedestrians is going to have a bad time with pedestrians
2 RT, 6 Fav · 2020/02/11 18:58
@went1955 Self-driving car dataset missing labels for hundreds of pedestrians. Open source datasets are great, but if the public is going to trust our community with their safety we need to do a better job of ensuring the data we're sharing is complete and accurate t.co/n1xHK0mNZG
@MathieuTriclot A popular self-driving car dataset is missing labels for hundreds of pedestrians: t.co/Fqxy0fmbXu | We did a hand-check of the 15,000 images in the widely used Udacity Dataset 2 and found problems with 4,986 (33%) of them
0 RT, 3 Fav · 2020/02/11 17:21
@matroid This is why an annotation tool has been part of the Matroid product since day one: to inspect and correct such problems before training even begins. t.co/6LiuE7PIJb
0 RT, 2 Fav · 2020/02/12 08:22
GitHub - magenta/ddsp: DDSP: Differentiable Digital Signal Processing