All recent innovation in the data space has taken place in two areas: helping data engineers produce data, and helping data consumers (primarily data analysts and scientists) consume that data. Data warehouses and lakes are flooded with data, but consumers still don’t know what exists and what to trust.

The biggest gap sits not in the production or consumption of data but right between them. Data engineers continually report being bombarded with questions from users while striving to deliver data on time and with high quality. Analysts and data scientists spend a huge amount of time trying to answer questions about the data: what the source of truth is, how it is usually used, how it gets produced, and whether it is the right source for them to use. At Lyft, over 30% of analyst time was wasted on finding and validating trusted data.

This story is not unique to Lyft. Stemma helps users discover and trust data, and helps organizations safeguard the privacy and security of their data subjects.

The data production-consumption gap