Hive raises $85M for AI-based APIs to help moderate content, identify objects and more

As content moderation continues to be a critical aspect of how social media platforms work — one that they may be pressured to get right, or at least do better in tackling — a startup that has built a set of data and image models to help with that, along with any other tasks that require automatically detecting objects or text, is announcing a big round of funding.

Hive has raised $85 million in a Series D round of funding that the startup has confirmed values it at $2 billion. The company has built a trove of training data based on crowdsourced contributions from some 2 million people globally, and that data powers a set of APIs that can automatically identify images of objects, words and phrases. The process is used not just in content moderation platforms, but also in building algorithms for autonomous systems, back-office data processing, and more.

“At the heart of what we’re doing is building AI models that can help automate work that used to be manual,” said Kevin Guo, Hive’s co-founder and CEO. “We’ve heard about RPA and other workflow automation, and that is important too but what that has also established is that there are certain things that humans should not have to do that is very structural, but those systems can’t actually address a lot of other work that is unstructured.” Hive’s models help bring structure to that other work, and Guo claims they provide “near human level accuracy.”

The funding is being led by Glynn Capital, with General Catalyst, Tomales Bay Capital, Jericho Capital, Bain & Company and other unnamed investors participating. The company has now raised $121 million in total, making this latest round a particularly big leap.

The company has been somewhat under the radar since it was founded in 2017, in what appears to have been a pivot from founder Kevin Guo’s previous startup, a Q&A platform called Kiwi, which itself grew out of a project from his time at Stanford. But since then it has quietly picked up some interesting customers, including Reddit, Yubo, Chatroulette, Omegle, and Tango, along with NBCUniversal, Interpublic Group, Walmart, Visa, Anheuser-Busch InBev, and more. In all it has some 100 customers and has grown more than 300% in the last year.

Hive got its start in image identification, working with companies building autonomous systems. In fact, if you talk with Guo over Zoom, chances are you’ll get a screenshot of some of that work as a background, with cars darting across the Golden Gate Bridge.

These days, however, most of Hive’s activity (pardon the pun) centers on moderation. Some of that involves images, and some involves text and streamed audio, which is converted into text and then moderated as text. (The autonomous car modelling is still used as a backdrop, I believe, because it’s a little less disturbing than a content moderation image, as you can see below.)

Image Credits: Hive under a CC BY 2.0 license.

In part because it’s a very classic problem that you can imagine will be solved or helped with the use of AI, and in part because it’s such a big issue on the internet today, there are a number of other startups building platforms to help manage online abuse, including harassment, and to help with content moderation.

They include the likes of Sentropy, Block Party, L1ght, and Spectrum Labs, not to mention a lot of tools being built in-house by big technology companies themselves. (Instagram for example launched its latest tools to help users combat abuse in DMs just today: it built the whole thing in-house, the company told me.)

But as Kevin Guo describes it, what has set Hive apart from the crowd has been the crowd, so to speak. Over the last several years, the company has slowly been building up a trove of data by crowdsourcing feedback from some 2 million users, who get paid — either in ‘normal’ money or Bitcoin — to go through various images and items of text in order to identify “abuse” or other things. (Bitcoin started as a fringe offering and now accounts for the majority of how contributors get paid, Guo said.)

That database in turn powers a set of APIs used by Hive’s customers to help them run their own moderation tools, or whatever workflow requires frequent and rapid identification.

Most of the language learning in the system right now is based around English and several other popular global languages such as Spanish and French. Some of the funding will be used to help expand its reach and global coverage, including into a wider set of tongues. This is also leading to a wider set of use cases for the data and technology that Hive has built.

One of these, Guo said, is a new approach to advertising that is based around serving ads associated with something you may have just read or seen on the screen. This is very GDPR-friendly because it involves absolutely no data based on you or your online browsing activities (anonymised or not), and it is picking up traction with brands that initially may have come to Hive for help with IP protection or reputation management, and are now considering how they can use the tool to spread the word about themselves in more effective ways.

The possibilities for how Hive’s AI can be used in the future are part of what attracted the investment today. The focus on how it has been built in the cloud underscores that extensibility.

“Cloud computing has seen tremendous adoption in recent years, but only a small fraction of companies currently leverage cloud-based machine learning solutions,” said Charlie Friedland, principal at Glynn Capital, in a statement. “We believe cloud-hosted machine learning models will represent one of the most significant components of cloud growth in the years to come, and Hive is well-positioned as an early leader in the space.”

It’s notable to me that, for now at least, Hive doesn’t disclose any big technology companies among its customers. That may partly be due to NDAs, but Guo points out that those companies’ in-house activities, which include heavy doses of human involvement, have made them somewhat less willing customers up to now. That could be changing, however, not just because AI tools are improving, but because of the problems that have arisen from some of the current routes, such as the run of controversial stories about social media content moderators and the traumas that they have faced.

In terms of future deals, those might come by way of some of Hive’s strategic backers and strategic partnerships. The company currently works with companies like Cognizant, Comscore and Bain (which is an investor), which in turn provide consulting and services to larger tech companies that have opted to outsource some of their human moderation work. Whether or not those human moderators change their practices, chances are that tech will be playing an increasing role in the bigger process of trying to give more structure both to shaping and adhering to abuse policies.