Dominic Weibel
Crypto Researcher
Oracles: Unblinding Blockchains - Part I
Apr 13, 2022
The Oracle problem: bringing real world data to the blockchain
Solving the oracle problem is key to achieving Smart Contract (SC) mass adoption across a wide variety of markets and use cases as Decentralized Finance (DeFi) aims to transform traditional financial products into trustless and transparent protocols that run without intermediaries. Thus, SCs offer a vast potential to redefine how independent entities engage in contractual agreements and exchange value. As the world becomes more digital, there is an ever-expanding reservoir of data and Application Programming Interfaces (APIs) that hold endless possibilities if connected to SCs. Yet, there is one fundamental limitation: blockchains are blind to external data and are not well-suited for subjectivity. Facing the oracle problem, we hit the following boundary conditions:
- Blockchains can’t access external data
- Centralized oracles undermine SC decentralization and pose a security risk
Blockchains aim for decentralization to maintain the integrity of their consensus algorithm and to provide strong guarantees of deterministic computation and data storage, especially in highly decentralized and Sybil-resistant networks. As every node in a blockchain network needs to replay every transaction and end up with the same result, variable data from APIs bears the risk that nodes cannot agree on the current state and therefore break consensus. Hence, to achieve high security and reliability, blockchains are isolated by default and unable to obtain external data. Because execution must be deterministic, SCs have no access to off-chain data. However, many smart contracts and applications require outside data to enable their use cases.
Oracles solve this problem by acting as on-chain APIs between blockchains and the real world, essentially unblinding them to external data. These oracles can be queried to feed information such as prices or weather reports into SCs, making off-chain data available in the on-chain environment. An oracle is composed of a SC and off-chain components that query APIs and periodically update the smart contract’s data via blockchain transactions. As a result, any node replaying transactions uses the same immutable data posted by the oracle. Oracles can also operate bi-directionally, streaming inbound and outbound data. For reasons of data quality, scalability, friction, determinism and attack surface, oracles are not integrated into the base layer of any major blockchain. Instead, they operate as separate networks, which gives them the flexibility to generate determinism from a complex and subjective real-world environment.
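The following is a minimal sketch of the determinism argument, not a model of any real client. The function names, the `eth_usd_cents` key and the price values are hypothetical. It compares three nodes that each call an external API while replaying a transaction with three nodes that read a value an oracle has already posted on-chain.

```python
import hashlib

def state_hash(balance: int) -> str:
    """Hash standing in for a node's view of the resulting contract state."""
    return hashlib.sha256(str(balance).encode()).hexdigest()

def replay_with_api(node_api_price: int) -> str:
    # Each node queries the external API itself while replaying the transaction.
    # Prices are integer cents; the API may answer differently on every call.
    return state_hash(10 * node_api_price)

def replay_with_oracle(chain_storage: dict) -> str:
    # Each node reads the value that an oracle transaction already wrote on-chain.
    return state_hash(10 * chain_storage["eth_usd_cents"])

# Three nodes replay the same transaction at slightly different moments.
api_answers = [300_012, 300_015, 299_998]
hashes_api = {replay_with_api(price) for price in api_answers}

chain_storage = {"eth_usd_cents": 300_012}  # posted once by the oracle
hashes_oracle = {replay_with_oracle(chain_storage) for _ in api_answers}

print(len(hashes_api))     # distinct states when every node queries the API
print(len(hashes_oracle))  # distinct states when every node reads on-chain data
```

In the first case the nodes end up with diverging states, whereas the oracle-posted value gives every node the same input.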
Unlike the oracles of Greek mythology, blockchain oracles do not predict the future but retrieve and feed in information from the past. They facilitate communication between blockchains and any off-chain system, including data providers, web APIs, enterprise backends, cloud providers, IoT devices, e-signatures, payment systems, other blockchains, and more. Moreover, they combine several key on-chain and off-chain functions to collect data from off-chain sources, transfer it on-chain in signed transactions and make it available in SCs:
- Monitor – check for any incoming user or smart contract requests for off-chain data
- Extract – fetch data from one or multiple external systems
- Format – convert inbound data retrieved from external APIs into a blockchain-readable format and/or format outbound blockchain data for an external API
- Validate – generate cryptographic proofs of the oracle performance by data signing, blockchain transaction signing, Transport Layer Security (TLS) signatures, Trusted Execution Environment (TEE) attestations, or zero-knowledge (zk) proofs
- Compute – perform secure off-chain computation for the SC, such as calculating a median from multiple oracle feeds or generating a verifiable random number via Verifiable Random Function (VRF)
- Broadcast (inbound) – sign and broadcast blockchain transactions for SC usage
- Broadcast (outbound) – send data to an external system after SC execution, such as relaying payment instructions to a traditional payment network
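The functions listed above can be sketched as a small pipeline. This is an illustration under stated assumptions rather than the code of any real oracle node: all function names, the `ORACLE_KEY` secret and the three data sources are hypothetical, and an HMAC tag stands in for the asymmetric signatures a real oracle would use.

```python
import hashlib
import hmac
import json
import statistics
from decimal import Decimal

ORACLE_KEY = b"demo-key"  # hypothetical secret; real oracles sign with private keys

def extract(sources):
    # Extract: call every (hypothetical) source and collect the raw answers.
    return [fetch() for fetch in sources]

def to_fixed_point(raw_values, decimals=8):
    # Format: convert strings/floats to integers, since contracts avoid floats.
    return [int(Decimal(str(value)) * 10**decimals) for value in raw_values]

def compute(values):
    # Compute: aggregate off-chain, here with a median.
    return int(statistics.median(values))

def validate(payload: dict) -> str:
    # Validate: attach a tag so the receiver can check who produced the data.
    message = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(ORACLE_KEY, message, hashlib.sha256).hexdigest()

def handle_request(request, sources):
    # Monitor would detect `request` on-chain; here it is passed in directly.
    values = to_fixed_point(extract(sources))
    payload = {"request_id": request["id"], "answer": compute(values)}
    # Broadcast (inbound): a real node would submit this as a signed transaction.
    return {"payload": payload, "tag": validate(payload)}

sources = [lambda: "3000.10", lambda: 3000.30, lambda: "2999.90"]
report = handle_request({"id": 1}, sources)
print(report["payload"]["answer"])  # median price with 8 implied decimals
```

The outbound broadcast step is omitted; it would mirror `handle_request` in the opposite direction.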
The other part of the oracle problem is centralization: a centralized oracle is a single source of truth, which is insecure and, more importantly, negates the decentralized nature of a smart contract. Using a decentralized oracle that pulls from multiple data sources avoids this.
Several methods target the oracle problem by enhancing reliability and security: verifying data sources with digital signatures, eliminating single points of failure through decentralization via multiple data sources and/or multiple oracles, discouraging malicious behavior with staking and reputation systems, and protecting privacy with TEEs and zk proofs.
The Oracle ecosystem
Oracle use cases are manifold and include:
- Stablecoins and synthetic assets: exchange rate between the asset whose price they target and an on-chain source of collateral
- Derivatives and prediction markets: external prices or event outcomes to settle on-chain
- Provenance systems: tracking information of e.g. commodities
- Identity and on-chain reputation systems: require knowledge of governmental records to establish identities
- Randomness for lotteries and games: needed whenever a smart contract requires a randomness feed. On a blockchain, randomness can only be generated deterministically, so an external oracle is used to supply non-deterministic random numbers. Additionally, cryptographic tools like VRF and verifiable delay functions (VDFs) can mitigate predictability or manipulability of the randomness
- Decentralized exchanges: prices from an external oracle to set parameters. Some leverage oracles to provide liquidity near the mid-market price and thereby optimize automated market makers (AMMs)
- Dynamic non-fungible tokens (dNFTs): can be updated based on external data. For example, sports trading cards that depend on the real-time performance of a player.
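To illustrate the randomness use case from the list above, here is a minimal sketch of why "randomness" derived purely from on-chain data is predictable. The function name and the block hash are hypothetical, and the example deliberately shows the naive approach, not a VRF.

```python
import hashlib

def naive_onchain_random(block_hash: str, player: str, sides: int = 6) -> int:
    # Derive a "random" dice roll purely from data that every node can see.
    digest = hashlib.sha256(f"{block_hash}:{player}".encode()).digest()
    return int.from_bytes(digest, "big") % sides + 1

block_hash = "0xabc123"  # hypothetical block hash, visible to everyone

# The contract and an observer run the same deterministic computation.
roll_by_contract = naive_onchain_random(block_hash, "alice")
roll_by_observer = naive_onchain_random(block_hash, "alice")

print(roll_by_contract == roll_by_observer)
```

Because anyone who knows the inputs can reproduce the result, such a value cannot serve as an unpredictable random number, which is the gap that oracle-provided randomness aims to fill.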
Modern oracles are more than data feeds that connect SCs to valuable off-chain information. By design, they responsively fetch external data such as price feeds, deliver it on-chain and act as a filtration system that verifies the trustworthiness of such information. Depending on the property of interest, oracles can be classified by various qualities, see Illustration 1.
Illustration 1: Oracle taxonomy
Software oracles leverage digital sources like websites, APIs or other smart contracts.
Hardware oracles leverage sensors integrated into e.g. IoT devices to track and verify real-world data before sending it on-chain.
Human oracles rely on human input to provide external data.
Inbound and outbound oracles: inbound oracles feed data from external sources to smart contracts, whereas outbound oracles feed data from smart contracts to the real world. Inbound oracles make up the largest share of the market.
Centralized oracles are controlled by a single entity that also acts as the sole provider of information for the smart contract. As with centralized blockchains, relying on one source of information entails significant trust assumptions, which defeats the purpose of using a blockchain, and creates a single point of failure, as the contract depends entirely on the entity controlling the oracle.
Decentralized oracles avoid counterparty risk and minimize trust, as they provide more reliable data sourced from multiple oracles. The smart contract queries multiple oracles to determine the validity and accuracy of the data and thereby reach consensus. Validation of a certain outcome can also take the form of social consensus, which is useful in prediction markets. Importantly, and similar to blockchains, decentralized oracles do not completely eliminate trust but rather distribute it among network participants, which leaves certain attack vectors open. Centralized and decentralized oracles can be further sub-grouped by the way they source data. Oracles sourcing off-chain data are generally slower to react to volatility and typically require a handful of trusted users to push the data on-chain. On-chain data doesn’t require trusted access and reacts fast to volatility, but is therefore more easily manipulated by attackers, which can lead to catastrophic failures; see “DeFi Security Lecture” for more information on the nuances.
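As a rough sketch of how a contract might reach consensus across several oracles, the following example accepts a value only if enough oracles report and a strict majority of them agree closely. The function name, the thresholds and the sample reports are assumptions made for illustration, not the rules of any specific oracle network.

```python
import statistics

def aggregate(reports: dict, min_reports: int = 3, max_spread: float = 0.02):
    """Return the median report, or None if the oracles do not agree closely enough."""
    if len(reports) < min_reports:
        return None  # too few independent oracles answered
    median = statistics.median(reports.values())
    agreeing = [v for v in reports.values() if abs(v - median) / median <= max_spread]
    # Require a strict majority of oracles within max_spread of the median.
    if len(agreeing) * 2 <= len(reports):
        return None
    return median

honest_majority = {"a": 100.0, "b": 101.0, "c": 99.5, "d": 250.0}
split_network = {"a": 100.0, "b": 100.0, "c": 200.0, "d": 200.0}

print(aggregate(honest_majority))  # the outlier "d" does not move the result much
print(aggregate(split_network))    # no majority agrees, so no value is accepted
```

The single faulty report is outvoted, but the sketch also shows the residual trust: if half of the oracles collude, the honest half cannot force a result.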
Design patterns: the most commonly used oracle setups can be categorized as request-response, publish-subscribe, and immediate-read. To keep the scope reasonable, we refer to “A Study of Blockchain Oracles” and “Trustworthy Blockchain Oracles” for more details on design patterns.
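The three patterns can be contrasted in a few lines. This is a simplified, off-chain sketch with hypothetical class and method names; on an actual blockchain each pattern would be implemented with contracts, events and transactions.

```python
class ImmediateRead:
    """Data is stored once and read directly whenever it is needed."""
    def __init__(self):
        self.storage = {}

    def read(self, key):
        return self.storage.get(key)

class PublishSubscribe:
    """The oracle publishes updates; every subscriber is notified of each change."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, value):
        for callback in self.subscribers:
            callback(value)

class RequestResponse:
    """A contract files a request; an off-chain node answers it later via callback."""
    def __init__(self):
        self.pending = []

    def request(self, query, callback):
        self.pending.append((query, callback))

    def fulfil(self, lookup):
        for query, callback in self.pending:
            callback(lookup(query))
        self.pending.clear()

store = ImmediateRead()
store.storage["eth_usd"] = 3000

received = []
feed = PublishSubscribe()
feed.subscribe(received.append)
feed.publish(3001)

answers = []
oracle = RequestResponse()
oracle.request("weather:zurich", answers.append)
oracle.fulfil(lambda query: f"answer to {query}")
```

Immediate-read suits data that rarely changes, publish-subscribe suits feeds that change often, and request-response suits data that is too large or too specific to keep on-chain permanently.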
In Table 1, we showcase some of the most important oracle implementations in order to give a brief understanding of how they realize data selection and aggregation and which mechanisms are used to resolve disputes.
Table 1: Selected Oracle implementations by design choice
With a Total Value Secured (TVS) of $173.75b within the oracle ecosystem, it is of great importance that weaknesses and security risks of oracle networks are eliminated as far as possible while maintaining decentralization and a certain amount of flexibility. For comparison, the Total Value Locked in DeFi is $209.43b at the time of writing. Illustration 2 shows how TVS is allocated among the most important oracle solutions, namely Chainlink, internal oracles, TWAP oracles and Maker’s oracle, most of which act as price oracles with different design approaches.
Internal oracles are platform-specific oracles designed for a unique use case derived from the functionality of the underlying protocol.
TWAP oracles provide price information and source data either off-chain, by taking price data from APIs or exchanges such as Coingecko, Coinbase or Coinmarketcap and bringing it on-chain, or on-chain, by consulting decentralized exchanges such as Uniswap, Bancor or Balancer. TWAP oracles aim to eliminate price oracle manipulation by using a Time Weighted Average Price (TWAP). The price of the last transaction in the previous block is recorded at the beginning of each block, before any transactions take place. This price, weighted by the number of seconds it was in effect, is added to a cumulative price, which therefore holds the sum of prices for every second. Users can calculate an accurate TWAP by dividing the difference between two cumulative values by the time elapsed between them. TWAPs therefore increase the cost of manipulation, which grows linearly with the underlying liquidity and the TWAP’s length.
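The cumulative-price mechanism can be sketched as follows. The class and function names are hypothetical, and the prices and timestamps are made up: a price of 100 is briefly pushed to 200 for 12 seconds within a 120-second window.

```python
class CumulativePriceOracle:
    """Accumulates price * seconds so that a TWAP can be derived from two readings."""

    def __init__(self, price, timestamp):
        self.price = price
        self.last_update = timestamp
        self.cumulative = 0

    def update(self, new_price, timestamp):
        # Credit the old price for the time it was in effect, then switch prices.
        self.cumulative += self.price * (timestamp - self.last_update)
        self.price = new_price
        self.last_update = timestamp

    def snapshot(self, timestamp):
        # Cumulative value as of `timestamp`, without changing any state.
        return self.cumulative + self.price * (timestamp - self.last_update)

def twap(cumulative_start, t_start, cumulative_end, t_end):
    return (cumulative_end - cumulative_start) / (t_end - t_start)

oracle = CumulativePriceOracle(price=100, timestamp=0)
start = oracle.snapshot(0)
oracle.update(200, 60)   # manipulated price appears at t=60
oracle.update(100, 72)   # price returns to normal at t=72
end = oracle.snapshot(120)

average = twap(start, 0, end, 120)
print(average)  # 110.0
```

Although the spot price doubled, the 120-second TWAP moves by only 10%. An attacker would have to hold the distorted price for longer, or against deeper liquidity, to shift the average further.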
Illustration 2: TVS >$1b by Oracles (left), TVS >$1b of various protocols by Chainlink (right)
The Maker oracle secures MakerDAO, a decentralized lending protocol that mints USD-pegged DAI backed by cryptoasset collateral. The oracle module, composed of several whitelisted oracle addresses and an aggregator contract, serves to obtain an accurate real-time price of assets. This price is critical, as it determines whether a Collateralized Debt Position (CDP) has enough collateral locked to avoid liquidation. The oracles send periodic price updates to the independent asset aggregator, which computes the reference median price from the multiple reported medians and updates the platform. This reference price is delayed by the Oracle Security Module before it is finally used by the system. As such, the oracle functions as an autonomous auditor, monitoring for fraudulent activity in real time.
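A median aggregator combined with a delay can be sketched as below. This is a simplified illustration, not MakerDAO’s contract code: the class name `DelayedMedianFeed`, the `poke` method and the one-hour delay are assumptions chosen for the example.

```python
import statistics

class DelayedMedianFeed:
    """Median of reported prices that only takes effect after a fixed delay."""

    def __init__(self, delay_seconds=3600):  # delay length is an assumption
        self.delay = delay_seconds
        self.current = None  # price the system actually uses
        self.queued = None   # (price, time at which it was queued)

    def poke(self, reports, now):
        # Promote the queued price once it has waited out the delay.
        if self.queued is not None and now - self.queued[1] >= self.delay:
            self.current = self.queued[0]
            self.queued = None
        # Queue a new median only when no price is waiting.
        if self.queued is None:
            self.queued = (statistics.median(reports), now)

feed = DelayedMedianFeed(delay_seconds=3600)
feed.poke([100, 101, 99], now=0)       # median 100 is queued, not yet active
feed.poke([500, 500, 500], now=1800)   # ignored: a price is still waiting
feed.poke([102, 100, 101], now=3600)   # 100 becomes active, 101 is queued

print(feed.current, feed.queued)
```

The delay gives observers a window to react to a suspicious queued price before it can affect liquidations.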
Chainlink: since Chainlink, a decentralized oracle network, is by far the dominant oracle solution, we take the opportunity to have a closer look at its features, product range and roadmap. According to the TVS chart, Chainlink secures 64.4% of the total value. Interestingly, Chainlink’s integration also boosted ecosystem growth on Avalanche and Fantom. Its product range includes:
Chainlink Data Feeds use a Volume Weighted Average Price (VWAP) to deliver accurate, real-time financial market data regardless of market volatility. It consists of >910 oracle networks and >1230 integrations. These oracles outperform TWAP oracles when it comes to volatility, weak market coverage, feed diversity or scaling oracle security.
Chainlink VRF (currently only on BSC, Polygon and Ethereum) is a provably fair and verifiable source of randomness designed for SCs. It provides tamper-proof random number generation (RNG), using on-chain block data as an input, for applications that rely on unpredictable outcomes such as GameFi or NFTs. Moreover, each random value comes with a cryptographic proof that is verified on-chain before the smart contract uses it.
Chainlink Keepers (currently only on BSC, Polygon and Ethereum) automate smart contract execution. As smart contracts can’t trigger their own functions at arbitrary times or under arbitrary conditions, state changes only occur when another account initiates a transaction. Chainlink Keepers solve this problem in a trust-minimized and decentralized manner.
Chainlink API calls provide off-chain data via API calls.
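To make the VWAP used by the Data Feeds above more concrete, here is a minimal sketch of a volume-weighted average. The function name and the trade data are hypothetical, and the example does not reproduce Chainlink’s actual aggregation pipeline.

```python
def vwap(trades):
    """Volume-weighted average price over (price, volume) pairs."""
    total_volume = sum(volume for _, volume in trades)
    if total_volume == 0:
        raise ValueError("cannot compute a VWAP without traded volume")
    return sum(price * volume for price, volume in trades) / total_volume

# 90 units traded at 100 on a liquid venue, 10 units at 200 on a thin one.
trades = [(100, 90), (200, 10)]

volume_weighted = vwap(trades)
simple_mean = sum(price for price, _ in trades) / len(trades)

print(volume_weighted)  # 110.0
print(simple_mean)      # 150.0
```

Weighting by volume means that a price printed on a thinly traded venue has far less influence than it would in a simple average.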
Chainlink comes with some trade-offs: the inputs returned by Chainlink nodes cannot be validated trustlessly, as the network relies on centralized verification and dispute resolution rather than a trustless verification mechanism. As Chainlink’s reward and punishment models are decoupled, nodes might deliver inputs into the system upon entry and provide security to reimburse requestors for erroneous information. Moreover, because it serves so many use cases, it is exposed to several attack vectors, such as flooding the network with queries. Some of these flaws are addressed in Chainlink 2.0. We refer to “Understanding oracles: Chainlink 2.0” for an in-depth review. As ambitious as the whitepaper is, though, there might still be issues regarding trust assumptions that have to be addressed. As Eric Wall states: “In Chainlink 2.0, the ‘solution’ to the problem that trusted oracles can collude and feed incorrect oracle answers into the blockchain is to have another group of even more trusted oracles be responsible for punishing the first group.”
Conclusion and Outlook
Smart contracts running on blockchain infrastructure hold immense potential when coupled with real-world data. By design, blockchains operate in isolation to guarantee deterministic execution, scalability, low friction, and low complexity. To unblind blockchains, oracles enable the consumption of external data by providing it in smart contracts that are fed by off-chain and on-chain components. How data finds its way onto the blockchain is a delicate matter: a huge amount of value is secured by oracles, so they need to provide high-quality, reliable data that is resistant to manipulation. Modern decentralized oracles therefore use mechanisms such as TWAP or VWAP to eliminate attack vectors such as price manipulation via flash loans or freeloading and mirroring attacks. Analogous to blockchains, decentralizing oracles is a key aspect of designing proper solutions that minimize trust assumptions and single points of failure. Moving forward, further optimization of decentralized oracles seems promising, yet removing all trust assumptions remains as challenging as solving the blockchain trilemma. In Part II, we will explore the history of oracle exploits and how these events yielded more robust and more trustless oracle designs.
The author thanks Tejaswi Nadahalli and Marcus Dapp for their valuable input.