@braddwyer: This is really scary: one of the most popular open sourced self-driving car datasets has been missing labels for hundreds of pedestrians and dozens of cyclists for years, and nobody has noticed.
This is how people get run over.
4 RT, 13 Fav 2020/02/11 14:49
@bigdata: “If you're using public datasets in your projects, please do your due diligence and check their integrity” t.co/S6bktFWrRR
@went1955: Self-driving car dataset missing labels for hundreds of pedestrians. Open source datasets are great, but if the public is going to trust our community with their safety we need to do a better job of ensuring the data we're sharing is complete and accurate t.co/n1xHK0mNZG
2 RT, 5 Fav 2020/02/12 21:08
@dancow: daily reminder that the hardest part of data science and machine learning is the data. A self-driving algorithm trained on images that mislabels/doesn't label pedestrians is going to have a bad time with pedestrians
2 RT, 6 Fav 2020/02/11 18:58
@MathieuTriclot: A popular self-driving car dataset is missing labels for hundreds of pedestrians : t.co/Fqxy0fmbXu | We did a hand-check of the 15,000 images in the widely used Udacity Dataset 2 and found problems with 4,986 (33%) of them
0 RT, 3 Fav 2020/02/11 17:21
ML From Scratch, Part 6: Principal Component Analysis - OranLooney.com