Mike Cahill is a director of the Pyth Data Association, the Swiss association behind the Pyth network, a decentralized oracle that brings financial data across asset classes on-chain–directly from major trading firms such as Jump, Susquehanna and Virtu. Oracles are data feeds that provide information such as asset prices to blockchain-based financial applications so that they know when to take actions such as executing a trade or liquidating a collateral position.
In this discussion, we cover the role that oracles play in the growth of multiple crypto verticals across DeFi, NFTs and Web3. Cahill also makes his case for why he believes that Pyth is more resilient and trustworthy than other top oracles in crypto today, such as Chainlink.
Forbes: Who are some of your biggest data providers and who are some of the biggest clients or consumers of Pyth data?
Cahill: From a trading firm category, it would be Jump, Jane Street, Susquehanna, DRW, Virtu, and Hudson River. Those are six that immediately come to mind. On the U.S. equities front, CBOE is by far the largest, but there’s MEMX and IEX, as well. In crypto, there’s a whole swath of them from FTX to KuCoin, Huobi and OKEx. On the FX front, we’ve got some like LMAX, which run crypto exchanges but also have got an FX exchange as well.
Forbes: What are the main use cases for oracles today?
Cahill: Within crypto the primary use cases thus far have been in Web3, DeFi or NFTs. I would say more broadly that you don’t really need oracles for NFTs, so that really leaves DeFi. It’s a big use case. When you subdivide the category down even further, there are two classes where there’s a lot of integration, and those are lending protocols and trading. Trading has its own subdivisions, but I’ll start first with the lending protocols. Lending protocols require the mark-to-market value of the collateral at all times in order to assess the viability of the loan. If the collateral level is breached, then a liquidator can liquidate that position on the loan.
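To make the lending case concrete, here is a minimal sketch of the check a lending protocol might run against an oracle price. The `Loan` type, the function names and the 1.5 minimum collateral ratio are hypothetical, chosen only for illustration; they are not taken from Pyth or from any specific protocol.

```python
from dataclasses import dataclass


@dataclass
class Loan:
    collateral_amount: float  # units of the collateral asset, e.g. 1.0 BTC
    debt_value: float         # amount owed, in the quote currency (e.g. USD)


def collateral_ratio(loan: Loan, oracle_price: float) -> float:
    """Mark the collateral to market at the oracle price, then divide by the debt."""
    if loan.debt_value == 0:
        return float("inf")  # nothing owed, so the loan cannot be under-collateralized
    return (loan.collateral_amount * oracle_price) / loan.debt_value


def is_liquidatable(loan: Loan, oracle_price: float, min_ratio: float = 1.5) -> bool:
    """True when the marked-to-market collateral falls below the required ratio.

    min_ratio is an assumed threshold, not a value from any real protocol.
    """
    return collateral_ratio(loan, oracle_price) < min_ratio


loan = Loan(collateral_amount=1.0, debt_value=12_000.0)
print(is_liquidatable(loan, oracle_price=20_000.0))  # collateral worth 20,000 against 12,000 owed
print(is_liquidatable(loan, oracle_price=17_000.0))  # collateral worth 17,000 against 12,000 owed
```

The only external input is the price, which is why a stale or manipulated feed can translate directly into wrongful liquidations.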
On the trading side, as I mentioned, there are a few subdivisions. The first one is any sort of trading contract that has an expiry like a future. But in crypto, we actually see more use cases that look like structured products. They are called decentralized option vaults, and they have an expiry of a certain time, and they’ll look at the Pyth price and say, “Okay, this is the price and we’re going to settle the contract based on the agreed-upon rules.”
The second one is largely exchanges that have a non-fungible derivative that has a cousin that’s a sort of global market. The third one is automated market makers (AMMs), where the price from Pyth is actually used to determine where people exchange, and the liquidity providers are passively getting exposure to that midpoint price from Pyth with potentially some sort of a liquidity or slippage curve on it. That one’s really cool because in version 1 of AMMs like Uniswap they would rely on arbitrageurs to basically trade against the bad price that was the fixed bonding curve. The liquidity providers would all lose money, so much so that they had a name for it: impermanent loss. The idea was to just hold your position long enough and eventually, hopefully, you’ll make more money. But with these virtual AMMs or these oracle-empowered AMMs, we’re seeing a lot more liquidity providing tokens or participants that don’t lose money. So those are the three broad categories and, as I said, there are other people that experiment with all sorts of other things.
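The difference between a fixed bonding curve and an oracle-priced pool can be sketched with a few lines of arithmetic. This is a simplified illustration under stated assumptions: the curve is a fee-free constant-product pool (x * y = k), the oracle-priced pool quotes at the oracle midpoint minus a flat spread, and all function names and numbers are hypothetical.

```python
def constant_product_out(x_reserve: float, y_reserve: float, dx: float) -> float:
    """Y paid out for dx of X by a fee-free pool that keeps x * y constant."""
    k = x_reserve * y_reserve
    return y_reserve - k / (x_reserve + dx)


def oracle_priced_out(oracle_mid: float, dx: float, spread_bps: float = 10.0) -> float:
    """Y paid out for dx of X by a pool that quotes at the oracle mid minus a spread."""
    return dx * oracle_mid * (1 - spread_bps / 10_000)


# Hypothetical pool: 100 BTC and 2,000,000 USD, so the curve implies 20,000 per BTC.
# Assume the wider market has already fallen to 18,000.
market_price = 18_000.0
curve_payout = constant_product_out(100.0, 2_000_000.0, dx=1.0)
oracle_payout = oracle_priced_out(market_price, dx=1.0)

print(f"constant-product pool pays {curve_payout:,.2f} USD for 1 BTC")
print(f"oracle-priced pool pays    {oracle_payout:,.2f} USD for 1 BTC")
```

In this example the curve still pays well above the 18,000 market price, so an arbitrageur profits at the liquidity providers’ expense until the pool catches up, while the oracle-priced pool quotes just under the current market price.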
Forbes: What do you think today’s oracles do well?
Cahill: I’ll categorize Pyth as a first party data provider, and we’re the only one in that category. I think that’s the right model. So let’s just say that when I’m broadly speaking about oracles I’m talking about this model. I think what we do really well is we get a price on chain in a very timely manner, and we’re able to articulate the spectrum of prices that are able to happen at any given point. Oracle attacks typically happen because you’ve got either people manipulating the data feed because there aren’t enough data providers or you can identify that the liquidity on the data provider or the data set is smaller than the liquidity that can be moved. So if I’m getting my oracle price from an exchange that trades $10 million a day, but there’s $100 million of liquidity that’s being pinned to it, I actually can arbitrage that by moving the price deliberately. What I think Pyth does really well is by having now more than 70 data providers, it basically creates so much redundancy that it is really difficult to try and find a collusion mechanism or a way to get people to give you a bad price. They’re all composing at the same time on a public blockchain. There’s no two-sided market where it’s done first, and then it’s published later. I think that’s developed a lot of trust from people. I think the other thing that gets done well is that we have a wide spectrum of assets that are covered. Crypto is well-covered, but also there’s a representation of equities, FX and metals. So I think we do a really good representation of what is possible and what can be used.
Forbes: Can you break down Pyth’s setup, who’s participating, how it works and how it differs from some of the other oracles out there?
Cahill: I’ll start by just describing how I view the Chainlink model, and then we can get into the differences. And while I call it the Chainlink model, it’s also the same model that other oracles have used before. But basically, there’s an incentive mechanism that encourages a group of nodes to go out and fetch data. That’s the simplest way to put it. You could think of them as screen scrapers. They’re going out and creating data from basically the internet, where it’s pretty much publicly available. They take public endpoints, go out and fetch the data, and then they publish it to blockchains. So the underlying philosophy there or hypothesis is that the data from Web2 is robust enough, and that the data for Web3 can just be bridged over by having enough people publicly observe it. You’ve almost crowdsourced a way to choose data. The limitations to that are that you’re getting data from public endpoints and those public endpoints don’t have the same level of reliability that you’ll need for crypto. We know examples of when websites have gone down like Yahoo Finance or TradingView and it’s annoying, but you can switch to another one. Had they been securing the total value locked (TVL) of a lending protocol and the number was gone or null, it would create a big problem. That actually happened when Chainlink stopped pricing Luna during the depegging of TerraUSD (UST) and lending protocol Venus had wrongful liquidations as a consequence.
The other element of it is something that’s not yet a problem today, but we think will become a problem in the future. We use the analogy of digital music because the same thing happened there. Back when Napster or Limewire were available, the large music industry didn’t fully take notice because it was such a nascent thing. But eventually, when they realized that digital music is going to be the way people consume music in the future, there were huge lawsuits. If you’re relying on taking data that is potentially licensed, but not approved for distribution on chain, you’re going to have some serious limitations. So you can’t actually take the price of Apple from Yahoo Finance and publish it on chain. That’s why those oracle models don’t have equities; they are limited to crypto.
The Pyth model is very different. It asks how you can get the data providers themselves to publish directly on chain and not have this middleman node involved. It just wants to get them to run this software that publishes on chain and then use a smart contract that aggregates the data together in full transparency and full public view. So you can see the component prices, you can see the aggregation and look at the math, and it gets published out. And that’s where we’ve got these 75 data providers. They include the largest trading firms in the world; the most recent one that was added was CBOE, which is the third largest exchange group that owns a U.S. equity exchange. They also obviously own the Chicago Board Options Exchange, are the predominant place where you trade the VIX, and now have a digital assets marketplace as well as an FX market. Huge blue chip names like this are using Pyth as a mechanism to be able to participate using secure data in Web3. They’re not using this as a way to replace what they do in Web2, or to change their market data philosophy on how it works on off-chain data. They’re doing it to step into the world of being an oracle data provider and seeing how they can help grow this ecosystem.
Forbes: Tell me a bit more specifically about how it works–the 75 data providers. What are the assets covered?
Cahill: Each data provider will look at their set of data and decide what’s appropriate for them to publish. So there are 75, but they don’t all publish every one of the symbols. There are about 70 crypto symbols, another 20 or so that are in U.S. equities, and then a handful split between FX and metals. So the data providers will derive their update for Pyth from some combination of either their order book, if they’re an exchange, or if they’re a trading firm, usually the last price at which they traded. Then they publish not only one price, they publish a confidence interval around that price. So if they’re a trader, and they’re trading bitcoin, maybe bitcoin has got some huge dispersion and they want to represent that. They could say that they think the price is $20,000 but there’s a likelihood that it could be down to $18,000 or up to $22,000 because the markets are very volatile and some of the exchanges have limited or halted withdrawals and deposits. They may say, “we think that the right price is going to be plus or minus $2,000.” Another exchange may say, “Well, no, we actually published our price at $20,000 and we’re only going to tell you what the top of book is.” So each one of these data sources will publish their price as well as their confidence interval. They publish to Pyth Net, a private instance of the Solana code that only runs Pyth as software. But you can use block explorers to see the full breakdown of every transaction. The data gets aggregated, and the aggregation formula basically looks at all the points that have been sent in (each person sends in three points). They get combined together, and then the aggregate gives you a price as well as the aggregate confidence interval. That’s distributed from Pyth Net to every chain that Wormhole (a crypto bridge) is connected to at the same speeds that Solana processes transactions at (every 400 milliseconds). We’ve just launched on the BNB chain. Venus is our launch partner there.
Then what they’ll do is look at these prices coming in every 400 milliseconds, and then they’ll push it only when they need it. So it’s a cost effective model, but has full transparency.
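The aggregation step described above can be sketched in a few lines. The interview does not spell out the formula, so this sketch rests on assumptions: each publisher’s three points are taken to be price minus confidence, price, and price plus confidence; the aggregate price is the median of all points; and the aggregate confidence is the larger distance from that median to the 25th or 75th percentile. The function name and the sample quotes are hypothetical.

```python
from statistics import median, quantiles


def aggregate(quotes: list[tuple[float, float]]) -> tuple[float, float]:
    """Combine (price, confidence) quotes into one price and one confidence.

    Assumed scheme (an illustration, not a confirmed specification):
    every publisher contributes three points, and the aggregate is
    derived from the pooled points.
    """
    if not quotes:
        raise ValueError("need at least one publisher quote")

    points: list[float] = []
    for price, conf in quotes:
        points.extend([price - conf, price, price + conf])

    agg_price = median(points)
    # Quartiles of the pooled points; "inclusive" treats them as the full data set.
    q1, _, q3 = quantiles(points, n=4, method="inclusive")
    agg_conf = max(agg_price - q1, q3 - agg_price)
    return agg_price, agg_conf


# One trader quoting 20,000 +/- 2,000 and one exchange quoting a tight top of book.
price, conf = aggregate([(20_000.0, 2_000.0), (20_000.0, 10.0)])
print(price, conf)
```

Under these assumptions a single publisher with a very wide interval adds points far from the center but cannot drag the median on its own, which matches the redundancy argument made earlier in the interview.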
Forbes: Which projects are the biggest users?
Cahill: Pyth started a year ago on Solana; it’s only now going cross chain. So if you look at projects that require an oracle on Solana, Pyth covers about 98% of the TVL. And this is a chain where there are plenty of other oracles available like Chainlink or Switchboard. So anyone that uses an oracle there will use Pyth. I’ll give you a few names. On the lending side, the largest lending protocol is Solend. On the trading side you’ve got Mango Markets, which is a perpetuals exchange, and Friktion, which is a decentralized option vault, and one of those virtual AMMs would be Lifinity.
Forbes: What happened during the Mango Markets exploit from Pyth’s perspective?
Cahill: Pyth was not at all involved in the Mango Markets exploit. These exploits tend to go after pools with low-liquidity where they can easily manipulate prices, which is what we saw with Mango.
Forbes: When it comes to oracles there are always questions about bad actors and ways to disincentivize poor behavior like price manipulation. How does Pyth protect against being misappropriated in those ways?
Cahill: Today, it’s based on the fact that each one of these data providers has announced themselves, and they’re all deliberately household names. But what we want to get to is a full staking model. So each data provider will be required to post a stake and then will publish updates to the Pyth contract. So long as the updates are correct, they’ll be eligible for fees and rewards. If the data is incorrect, then they’ll be slashed. This allows us to go beyond household names and basically look at a much wider universe of people to come that you hadn’t heard of perhaps.
Forbes: Have you had any discussions with regulators with regards to the creation of this product?
Cahill: No. I’d say in the course of business, we may have come across regulators. I wouldn’t say this is something that they’ve reached out about necessarily, and it’s certainly not on their radar. It doesn’t feel like an area where there’s tons of interest right now.
Forbes: It sounds like your plan involves expanding out into multiple chains and scaling up as you move beyond Solana, but what else is on your roadmap for the rest of this year?
Cahill: The new chains are the biggest deal for us. We are very dominant on Solana, and a lot of that came about because of the features that Pyth had that other oracles were unable to provide. So our thesis is that we’re going to have similar levels of adoption on other chains. We’re very focused on expanding and making sure that these other chains get first-class service from Pyth. Those will be the EVM-compatible ones, which are all live now, as well as the Move language ones like Aptos and Sui. We’re also finding the future blue chip firms that are building on those ecosystems.
Number two is building out the rest of the stuff from the white paper–having that staking model is an important thing for us to make sure that there are ways for people to participate through things like governance. The third thing is around asset classes: being on Pyth Net enables us to raise the ceiling dramatically. We want to be over 1,000 symbols by this time next year; we’ll probably be mostly financial market data through crypto and in traditional assets. I think that we will most likely also get into where crypto and sports merge. So things like fan tokens are probably within scope. Then if something becomes super interesting and super hot, like let’s say that sports becomes a huge industry within crypto, I think that would be an area that we would try to pivot toward getting into our scope as well.
Forbes: Thank you.