TipRanks

Decentralizing Blockchain Data with Indexers: Interview with SubQuery’s James Bayly

Blockchains are sometimes described as decentralized but very slow databases that sacrifice much of the performance of modern computing. While the description is somewhat cheeky, it is true that an average bank's database can handle many times more transactions per second than Bitcoin (BTC-USD) or other leading blockchains.

But there's a reason why blockchains are "slow": the benefits of decentralization are hard to overstate. In any case, the technology is advancing rapidly to increase the capacity of blockchain systems.

As the data throughput of blockchain systems grows, we will need better solutions to efficiently read and organize this data. This is where blockchain indexers come in: services that provide an easy-to-query interface to the data stored on blockchains, including historical transactions.

To learn more, we sat down with James Bayly, COO of the blockchain indexing project SubQuery, to talk about the current state of the space and how Bayly sees it evolving. The interview covered how indexed data is made tamper-proof and reliable, who uses it, and how the project plans to out-innovate its competition with the SubQuery Network.

Hey James, nice to have you here! Let's begin with a general overview. What is data indexing in a blockchain setting, and why is it necessary? Why can't we just get this data from the blockchain nodes directly?

JB: One of the critical components of every Web3 decentralized application (dApp) is the indexer. In simple terms, an indexer is software that collects and organizes data stored on the blockchain network into a more performant and available medium, allowing developers to query that data quickly and efficiently. Indexers are essential because they enable dApps to run fast and support the load of millions of users.

The way blockchain data is stored on a network makes it difficult and time-consuming for developers to retrieve. A simple example is reading the last 10 transactions executed by the current user: this is an extremely difficult request for blockchain nodes, as they would need to sequentially scan every single block in search of the data.

Indexers like SubQuery can quickly retrieve data from a blockchain network and save it in a format that is easy to work with, for example, as searchable and sortable fields that let developers quickly find the information they need.
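To illustrate the difference Bayly describes, here is a minimal Python sketch on a toy in-memory chain. Everything in it (the `Tx` and `Block` classes, `last_txs_by_scan`, and `ToyIndexer`) is hypothetical and is not SubQuery's API; it only contrasts a node-style scan over every block with an indexer that processes each block once and keeps a per-address transaction list.

```python
from collections import defaultdict
from dataclasses import dataclass, field


# Hypothetical toy data model; not SubQuery's schema or API.
@dataclass
class Tx:
    tx_hash: str
    sender: str
    receiver: str


@dataclass
class Block:
    height: int
    txs: list[Tx] = field(default_factory=list)


def last_txs_by_scan(chain: list[Block], address: str, n: int = 10) -> list[Tx]:
    """Without an index: walk blocks from newest to oldest until n matches are found."""
    found: list[Tx] = []
    for block in reversed(chain):  # may touch every block in the chain
        for tx in reversed(block.txs):
            if address in (tx.sender, tx.receiver):
                found.append(tx)
                if len(found) == n:
                    return found
    return found


class ToyIndexer:
    """Processes each block once and keeps a per-address list of transactions."""

    def __init__(self) -> None:
        self.by_address: dict[str, list[Tx]] = defaultdict(list)

    def handle_block(self, block: Block) -> None:
        for tx in block.txs:
            self.by_address[tx.sender].append(tx)
            if tx.receiver != tx.sender:  # avoid double-counting self-transfers
                self.by_address[tx.receiver].append(tx)

    def last_txs(self, address: str, n: int = 10) -> list[Tx]:
        # Lists are built in chain order, so the newest transactions are at the end;
        # return them newest first, matching last_txs_by_scan.
        return list(reversed(self.by_address.get(address, [])[-n:]))


if __name__ == "__main__":
    # A toy chain of 1,000 blocks with one transaction each.
    chain = [
        Block(h, [Tx(f"tx{h}", "alice" if h % 2 else "bob", "carol")])
        for h in range(1_000)
    ]
    indexer = ToyIndexer()
    for block in chain:  # a real indexer would run this as new blocks arrive
        indexer.handle_block(block)

    print([tx.tx_hash for tx in indexer.last_txs("alice", 3)])
    # prints ['tx999', 'tx997', 'tx995']
```

The trade-off is the usual one for indexes: the indexer does the per-block work once, ahead of time, so each query becomes a lookup and a slice rather than a walk over the whole chain. A production indexer would also persist these mappings in a database and handle chain reorganizations, which this sketch ignores.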

What are the challenges of making indexed blockchain data available for users? One could imagine that ensuring the data is correct and untampered with is one of the key requirements, right? What are some ways of decentralizing the indexing ecosystem?

JB: Indeed, the indexer is often the last remaining centralized service critical to the performance of a dApp. A decentralized indexer can help to ensure that the data is accurate, unbiased, and accessible to everyone and [that] your dApp is more performant and reliable.

In SubQuery’s case, we solve it by facilitating a network where any participant can join and run decentralized indexing tasks for the network in a trustless but verifiable way. Since your decentralized data is not controlled by a single entity or group, you can verify the accuracy of the data in the indexer and ensure that there is no manipulation or bias. 

What about other providers in the space? How does SubQuery compare to competitors like The Graph, Covalent, Subsquid, and others?

JB: Some indexers, like Covalent and Unmarshal, are general-purpose indexers for standard datasets, e.g., transaction lists and blocks. The problem is that they lack flexibility: you can't add the extra data you need to make your dApp more feature-rich or intuitive.

Other providers, like Subsquid, are centralized, meaning that you rely on a server run by their team and can't easily verify the accuracy of the indexed data.

SubQuery is most similar to The Graph but brings some major improvements, including the ability to make external API calls, import external libraries, and protect against DoS attacks. Additionally, we have no plans to sunset our simplified managed service.

Both SubQuery and The Graph are designed to index data quickly, but analysis shows that SubQuery is 1.85x faster than The Graph for common projects. With faster sync times, developers can iterate faster and bring features to market sooner.

Who is using indexed data today? What about some interesting use cases for utilizing indexed data in the future?

JB: Every Web3 application, website, business intelligence tool, or extension has some need for indexed data today. This is such a critical aspect of application development that most developers need to harness fast and effective indexing in order to build a competitive advantage. 

We have customers who build wallets, run DeFi exchanges, monitor blockchains for events, manage marketplaces for NFTs, and even run AI workloads across chain data. We are also looking outside of Web3 at the newest innovations in Big Data in order to bring their benefits and changes into our industry.

A potentially major use case of data indexing is servicing dedicated decentralized data storage protocols like Filecoin or IPFS. Do you see demand for such a combination now or in the future?

JB: Just as data indexing is one major component of a Web3 dApp, decentralized storage is another. For example, it can host the resources needed for the dApp's front-end interface, allowing users to interact with the application in a completely decentralized way.

We’re working closely with some decentralized storage providers like Crust and IPFS and also use both solutions heavily in our decentralized data infrastructure. Many of our customers want to use a decentralized data indexer to populate and store key data in decentralized data storage — and we’re helping them do just that.

What is the SubQuery Network? You recently announced your launch on Polygon (MATIC-USD). What is the long-term vision for that?

JB: The SubQuery Network indexes and services data to the global community in an incentivized and verifiable way. We’re building the most open, performant, reliable, and scalable data service for dApp developers, and we’re launching on Polygon.

We chose Polygon for a few reasons. First, performance: our plans for the SubQuery Network are vast, so we need a network that scales to serve billions of API calls for data to millions of recipients each day. Second, we wanted a hub with bridges and integrations to all the other chains we support, so that our customers could easily access the network.

Finally, the size and engagement of the community mean there are many developers on Polygon who will benefit from SubQuery's growth in the future.

We are beginning with Kepler Network, the pre-mainnet of the SubQuery Network, which enables users to progressively bootstrap and test the features of the SubQuery Network.

Kepler runs on the same smart contracts that our mainnet will use; the key difference is that certain features will be enabled gradually and brought online in a sustainable way. Once ready, Kepler will morph into the SubQuery Network. We expect the transition to be largely seamless, as the contracts won't change much, and kSQT can be burned for SQT.

From this point, the future of decentralized data indexing will be live.

Reuben Jackson
Reuben is a blockchain security consultant living in NYC. Outside office hours, he has been reporting on the blockchain-crypto space for a few years now.