A story of sheep and technology – Our investment in Dataloop

Here at NGP Capital, we love data – tons of it! Day to day, we work with our very own home-built data powerhouse, ‘Q’. Like a moth to a flame, we’re drawn to new ways of working with data and, inevitably, to startups innovating in this space.

And so, naturally, we were attracted to the topic of data annotation, or in other words, how to structure data so it can be used by AI to give machines ears, eyes and a brain: ears to hear, for music, voice and video data; eyes to see, for images, text and videos; and a brain to think, to identify and understand the context of the data it is presented with, such as a child sleeping or a dog playing.

This is, of course, not a new topic. We have been covering data annotation and labeling for a while now, particularly in conversations around autonomous vehicles and the huge training data requirements of self-driving cars. Data requirements for automation, it seems, became a large problem overnight, and subsequently several companies raised hundreds of millions of dollars at unicorn valuations.

For cars to drive themselves, for example, the technology requires enough training data for the vehicles to reach a high level of automation. This is especially true of data representing rare real-world situations, or edge cases.

The first generation of data annotation often needed thousands of people to annotate data manually. The business model was based on the complexity, the number of images, and the desired accuracy of the annotation. These first-generation annotation platforms were built for humans to perform the tasks manually, supported by technology. The second generation is about automation and enabling humans to build applications, with a SaaS business model. Part of the automation will also be the separation of annotation tasks into two parts: pattern detection and meaning assignment.

This is where Dataloop comes in. Dataloop is the leader of the second generation of data annotation, and we are excited to lead their $33m Series B together with Alpha Wave Global, with participation from existing investors including Amiti, F2 VC, and OurCrowd.

Product & Technology

Dataloop’s platform makes it possible to scale AI to production. The platform consists of three key elements. First, it enables unstructured data management, meaning that the customer can explore, search, filter, and visualize complex and resource-intensive data. The data management feature provides a source of truth for all training data. Second, Dataloop provides annotation tools for various types of data, including images, video, audio, text, and LIDAR. Dataloop refers to these tools as data applications and will soon release new data applications designed for labeling massive numbers of data items in a few clicks. The annotation tools use AI-assisted labeling based on off-the-shelf models or the customer’s own models. Third, Dataloop’s platform provides data pipelines, which allow customers to build production workflows and automation with a no-code drag-and-drop interface.

Dataloop’s key differentiation is its developer-first platform approach. This includes a very modular design that integrates well with other solutions in the ML stack (in line with the concept of the modern data stack) and allows third-party applications and models to be developed and run on the platform. The data management module is often a key reason why customers choose Dataloop. Dataloop also just released its second-generation data management engine, designed to handle 100 million items in a single dataset with real-time, sub-second query response, which is critical to data engineers working directly with the data.

Why does it matter?

Data management and labeling capabilities are crucial for building new AI-based products and companies. For example, Snowflake states in its 2022 data science report that by 2025, 80% of the world’s data will be unstructured, and that only 0.5% of data is analyzed today. Data preparation alone (data loading, data cleansing, and data visualization) takes 80% of data scientists’ work time (Anaconda), and for each data scientist there are 1-3 data engineers. This has everything we love: a big, growing market with a real pain point, driven by a lack of talent and the importance of automation. Automation equals ROI.

How did it start?

Dataloop was founded in 2017 by Eran Shlomo (CEO), Avi Yashar (CPO), and Nir Buschi (CBO). Before founding Dataloop, Eran and Avi worked together at Intel for many years and collaborated on many innovation projects. The full story is here: https://dataloop.ai/book/intro/.

Both had worked in startups before and really understand both the startup and the big-company world. Nir worked for over 10 years in various business development roles, in addition to founding two technology companies, before meeting Eran and Avi.

Today, the team is so much more than the three founders: over 60 amazing people from a variety of backgrounds work at Dataloop. In fact, Dataloop is probably one of the most diverse companies we have seen in a long time.

From the village hacker to the heart of the startup nation

Eran, CEO of Dataloop, had quite an unusual journey into the Israeli tech ecosystem. He was born into a family of sheep and cattle farmers in rural Israel, in a time before the mainstream adoption of home computers. At the age of 9, Eran was given his first Commodore computer by his mother, who saw that the kid had a knack for computers and spent two years of the family’s savings to get it. Along with the computer came a heavy programming book. By the age of 11, Eran was a self-taught programmer, already coding full applications, and the path from village hacker to the heart of the startup nation was paved. This small, unexpected, but important event in his life shaped the humble tech visionary that we know today, and that we are thrilled to have invested in and to be partnering with.

Read the news on TechCrunch