How We’re Solving Data Discovery Challenges at Shopify – Shopify Engineering

Humans generate a lot of data. Every two days we create as much data as we did from the beginning of time until 2003! The International Data Corporation estimates the global datasphere totaled 33 zettabytes (ZB) in 2018, where one zettabyte is one trillion gigabytes. The estimate for 2025 is 175 ZB, an increase of 430%. This growth is challenging organizations across all industries to rethink their data pipelines. Data usage is problem driven: data assets (tables, reports, dashboards, etc.) are aggregated from underlying data assets to support a decision about a particular business problem, feed a machine learning algorithm, or serve as an input to another data asset. This process is repeated many times, sometimes for the same problems, and results in a large number of data assets serving a wide variety of purposes. Data discovery and management is the practice of cataloguing these data assets and all of the applicable metadata that saves time for data professionals, increasing data
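To make the idea of cataloguing assets and their metadata concrete, here is a minimal sketch in Python. It is not Shopify's tool or its schema: the `DataAsset` fields, the `Catalog` class, and the asset names are all hypothetical, chosen only to show how assets built from other assets can be registered, searched by name, and traced back to their sources.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """One catalogued asset (table, report, dashboard, ...). Hypothetical schema."""
    name: str
    kind: str
    owner: str
    # Names of the assets this one is aggregated from.
    upstream: list[str] = field(default_factory=list)


class Catalog:
    """In-memory catalogue keyed by asset name (illustrative only)."""

    def __init__(self) -> None:
        self._assets: dict[str, DataAsset] = {}

    def register(self, asset: DataAsset) -> None:
        self._assets[asset.name] = asset

    def search(self, term: str) -> list[str]:
        """Case-insensitive substring match on asset names."""
        term = term.lower()
        return sorted(n for n in self._assets if term in n.lower())

    def lineage(self, name: str) -> list[str]:
        """Return every transitive upstream asset of `name`, sorted by name."""
        seen: set[str] = set()
        stack = list(self._assets[name].upstream)
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            if current in self._assets:
                stack.extend(self._assets[current].upstream)
        return sorted(seen)


# Hypothetical assets: a raw table, a derived table, and a dashboard on top.
catalog = Catalog()
catalog.register(DataAsset("raw_orders", "table", "data-platform"))
catalog.register(DataAsset("orders_daily", "table", "analytics", ["raw_orders"]))
catalog.register(DataAsset("sales_dashboard", "dashboard", "finance", ["orders_daily"]))
```

Calling `catalog.lineage("sales_dashboard")` walks the upstream links and returns both the derived table and the raw table it depends on. A real catalogue would also need persistence, access control, and richer metadata.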

24 mentions: @fulhack, @ShopifyEng, @ShopifyData, @datiobd, @Ubunta, @MartijnSch, @miry_sof, @theresiatanzil
Date: 2020/08/01 18:53

Referring Tweets

@fulhack Indexing your data sets and documenting it is another example of a problem lots of companies have but there’s no great tool for it, so everyone builds their own. Here’s Shopify’s: t.co/8LvXIkaHqt
@ShopifyData ICYMI checkout our latest blog that dives into the architecture & UX of Artifact, a #datamanagement tool we built to combat our #data discovery challenges at Shopify t.co/lvg1BLgrBv
@ShopifyEng The data discovery issues at Shopify can be categorized into three main challenges: curation, governance, and accessibility. - Ranko Cupovic t.co/rfoAhejqok
@Ubunta Data Discovery Platform 🥇 Data discovery and management is based on - 📍Acquire ✔️Where is the data coming from? ✔️What is the quality of this data? ✔️Who owns the data source? 📍Transform 📜What transformations are being ap…t.co/lbq2x9LmPJ t.co/DBklkKTgCd
