Bitcoin double-spend prevention bonds on Liquid
A brief history of zero-conf txs and double-spend prevention
Zero-conf transactions have been contentiously discussed by Bitcoin developers ever since the first real-value transactions began to occur on the network. For a long time they were dismissed as dangerous, because the acceptance of a zero-conf transaction opens a receiver up to the risk of a double spend. However, new policy rules like RBF, as well as novel second-layer constructions like turbo channels, have brought them back as a serious topic of conversation among developers. I myself began to seriously ponder them as I reviewed Burak's proposal for a novel second layer protocol which he named Ark. In the Ark protocol, a single pending unconfirmed transaction could represent thousands of actual transactions so increasing trust in just a single unconfirmed transaction would have a large UX benefit for many individual transactions.
I'm not the first person to work on this problem. In 2017, a
paper by Solà et al. proposed a new addition to the Bitcoin
protocol which would allow a sender to convince a receiver that they would not double spend a
certain transaction. However, this proposal had a big drawback: it needed a new
opcode to be added to Bitcoin Script. Thus, it would
require a protocol softfork in order to work. Furthermore, this method would not be compatible
with the simple pubkey-based spending methods (p2wpkh
or taproot key-spends) which are currently
being used.
Despite these drawbacks, I was drawn to the paper's core idea. By adding a protocol operation that supports fixing the "nonce" value in the ECDSA signature that spends a bitcoin transaction, the sender would be incentivized not to spend the same output multiple times. This is because a double spend would actually reveal their private key, which would allow any party to spend the money in question. I won't dive into the specific details in this post. However, if you are curious, give that paper a read.
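To make the key-leak argument concrete, here is a minimal sketch in Python of why re-using an ECDSA nonce for two different messages reveals the private key. It is not the construction from the paper, only the underlying algebra on secp256k1; the helper names (`point_add`, `point_mul`, `sign`, `recover_key`) and the toy key, nonce and messages are made up for illustration.

```python
import hashlib

# secp256k1 domain parameters.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def point_add(a, b):
    """Add two curve points; None represents the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        lam = 3 * x1 * x1 * pow((2 * y1) % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def point_mul(k, pt):
    """Double-and-add scalar multiplication (not constant time)."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, pt)
        pt = point_add(pt, pt)
        k >>= 1
    return result

def sign(d, z, k):
    """Textbook ECDSA signature of message hash z with key d and nonce k."""
    r = point_mul(k, G)[0] % N
    s = pow(k, -1, N) * (z + r * d) % N
    return r, s

def recover_key(z1, s1, z2, s2, r):
    """Recover the private key from two signatures that share a nonce."""
    k = (z1 - z2) * pow((s1 - s2) % N, -1, N) % N
    return (s1 * k - z1) * pow(r, -1, N) % N

def h(msg):
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % N

# Toy values: one key, one fixed nonce, two conflicting "transactions".
d, k = h(b"bond holder key"), h(b"fixed nonce")
z1, z2 = h(b"pay Alice"), h(b"pay Bob")
r, s1 = sign(d, z1, k)
_, s2 = sign(d, z2, k)
leaked = recover_key(z1, s1, z2, s2, r)
```

Both signatures share the same `r`, so anyone who sees them can solve the two signing equations for the nonce and then for the key.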
Elements/Liquid Script
I spent a long time trying to use the properties of the new Schnorr signature scheme in taproot to somehow force nonce re-use without the need for a soft-fork. However, despite my best efforts, it seemed impossible.
Then, I had a jolt of inspiration from my day job. I've spent the past four years working on Blockstream's Elements platform which powers the Liquid Network, and this has made me very familiar with the network's scripting language: Elements Script. I realized that Elements Script, which is actually a superset of Bitcoin Script, is expressive enough to validate Bitcoin transaction signatures and detect a double spend. Furthermore, this would not require any sort of fork on the Liquid Network! In other words, rather than trying to make double spends impossible on Bitcoin, I found a way to detect Bitcoin double spends on Liquid.
Once I realized that the Liquid Network could enforce such a "contract", I put all these ideas together and came up with the following solution. A sender on Bitcoin would create a "bond" smart contract transaction on Liquid, in which he would specify his Bitcoin public key. The money in this "bond" UTXO would be locked up for a certain amount of pre-specified time. Then, if a double spend were to occur on Bitcoin using this public key, any other user would be able to present the two Bitcoin transactions to the smart contract, prove that a double spend had occurred, and then burn the money in the bond. This way the bond creator is discouraged from double spending from this public key, because they stand to lose their locked money.
To summarize: a Bitcoin user who wants to make his Bitcoin transactions trustworthy before confirmation would just need to tie money up in a "bond" UTXO on Liquid, which could be burned if a double spend from their Bitcoin public key were discovered.
The obvious question is: why do we need Liquid? Why can't we build this on Bitcoin itself? The answer to this question lies in the structure of Bitcoin Script. Bitcoin's scripting language lacks certain operations that we would need to validate Bitcoin transactions:
- OP_CAT: a simple opcode which concatenates two byte slices together; this opcode was originally included in the Bitcoin protocol but it was deemed risky and subsequently disabled. This opcode may seem boring at first. However, it can actually enable many interesting things which might surprise you. Check out Andrew Poelstra's musings on this opcode here and here, for more details.
- OP_CHECKSIGFROMSTACK: an opcode that validates a signature for a message on the stack, as opposed to OP_CHECKSIG, which checks a signature for the signature hash of the transaction the input is spent in. This opcode will be key in validating double spends on Bitcoin.
Lastly, if we were able to somehow create the bond on Bitcoin, we'd have to allow the user proving the double spend to take all the money in the bond. This means that a miner can front-run by making a tx with the same witness data but which spends all the money to the miner fee. This might seem fine, since all we want is for the bond holder to lose the money. However, it would mean that any party that can include txs in blocks while bypassing the mempool (i.e. miners) could create bonds and violate them without losing their money. This is not ideal. So we would have to require that the money in the bond be burned, and for that a covenant is required.
There are several ways to construct covenants on Liquid. I won't go into the details of actually constructing the covenants here -- that is out of scope for this post. Further reading on this can be found here and here.
Signature hashes
Now, let's discuss how to actually check whether two transactions double spend the same UTXO using a stack-based "programming" language like Elements Script. Firstly, let's define a double spend. For our purposes, a double spend is simply two transactions which spend the same inputs, but have different outputs.
The direct translation of this idea into code would work as follows: when burning the bond UTXO, the user will have to push two raw bitcoin transactions onto the stack. They would do this by providing them in the spending transaction's witness, and then the script would check for a common input in both transactions which was signed by the public key tied to the bond. That sounds pretty complicated to do in Script...
It turns out that there exists a simpler way. When Bitcoin transactions are signed by a key, it's not actually the full transaction which is signed. Rather, for every signature, an individual "signature hash" (sighash) is constructed. This hash commits to various parts of the transaction along with some parts of the UTXO which the input is spending. Let's take a look at what the signature hash structure looks like for segwit v0 (i.e. pre-taproot) transactions.
The "signature hash" is the double SHA-256 hash of a string which is simply the concatenation of all the following pieces of data:
<tx-version> 4 bytes
the tx version
<prevouts> 32 bytes
hash of all input outpoints, or zero bytes for SIGHASH_ANYONECANPAY
<sequences> 32 bytes
hash of all input sequences, or zero bytes for SIGHASH_ANYONECANPAY, SIGHASH_SINGLE or SIGHASH_NONE
<in-prevout> 36 bytes
the outpoint of the input being signed
<in-script> up to 10,000 bytes
the UTXO scriptCode (scriptPubkey*) of the input being signed
<in-value> 8 bytes
the UTXO value of the input being signed
<in-sequence> 4 bytes
the sequence of the input being signed
<outputs> 32 bytes
for SIGHASH_ALL: hash of all outputs
for SIGHASH_SINGLE: hash of output at index of the input being signed
for SIGHASH_NONE: 32 zero bytes
<locktime> 4 bytes
the tx's locktime value
<sighashtype> 4 bytes
the sighash type used
Note that this data will always have the exact same structure and length, regardless of the sighash type that is used. The only exception lies in the scriptCode field, whose length varies according to the size of the script which is provided to the output being spent. This fixed length property is very convenient when working in a stack-based language.
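As a rough illustration of the layout above, the following Python sketch assembles the data for a segwit v0 input and hashes it with double SHA-256, as specified in BIP143. The function name `segwit_v0_preimage` and all example values are hypothetical, and `script_code` is assumed to be passed in already serialised, including its length prefix.

```python
import hashlib
import struct

def sha256d(data: bytes) -> bytes:
    """Double SHA-256, as used for segwit v0 signature hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def segwit_v0_preimage(version, hash_prevouts, hash_sequences, in_prevout,
                       script_code, in_value, in_sequence, hash_outputs,
                       locktime, sighash_type) -> bytes:
    """Concatenate the sighash fields in the order listed above."""
    assert len(hash_prevouts) == len(hash_sequences) == len(hash_outputs) == 32
    assert len(in_prevout) == 36  # 32-byte txid + 4-byte output index
    return b"".join([
        struct.pack("<I", version),       # <tx-version>
        hash_prevouts,                    # <prevouts>
        hash_sequences,                   # <sequences>
        in_prevout,                       # <in-prevout>
        script_code,                      # <in-script>, the only variable part
        struct.pack("<Q", in_value),      # <in-value>
        struct.pack("<I", in_sequence),   # <in-sequence>
        hash_outputs,                     # <outputs>
        struct.pack("<I", locktime),      # <locktime>
        struct.pack("<I", sighash_type),  # <sighashtype>
    ])

# Example with placeholder hashes and a p2wpkh script code:
# 0x19 76 a9 14 <20-byte pubkey hash> 88 ac  (26 bytes with length prefix).
p2wpkh_script_code = bytes.fromhex("1976a914") + bytes(20) + bytes.fromhex("88ac")
preimage = segwit_v0_preimage(
    version=2, hash_prevouts=bytes(32), hash_sequences=bytes(32),
    in_prevout=bytes(36), script_code=p2wpkh_script_code,
    in_value=50_000, in_sequence=0xFFFFFFFF, hash_outputs=bytes(32),
    locktime=0, sighash_type=1,
)
sighash = sha256d(preimage)
```

Everything except the script code adds up to 156 bytes, so a p2wpkh input always yields 182 bytes of data to hash.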
Now, observe that we can actually detect that a double spend was made by only looking at the raw data that goes into two transactions' signature hashes. A double spend exists between two of these data structures if and only if the following two conditions hold:
- the <in-prevout> value is identical
- the <outputs> value is different
Thus, rather than pushing entire transactions onto the stack in order to prove double spends, we only need to receive the few pieces of data that make up two signature hashes. After checking that the inputs match and the outputs differ, we construct the signature hashes and then use OP_CHECKSIGFROMSTACK to verify that the bond's pubkey did indeed sign both of them.
Now that we understand the data that we will be pushing onto the stack (the two transactions' signature hash data structures), let's dive a bit deeper into the double spend detection algorithm.
First, let's split the signature hash data into the following parts:
<tx-version><prevouts><sequences> exactly 4 + 32 + 32 bytes
<in-prevout> exactly 36 bytes
<in-script><in-value><in-sequence> between 12 and 10,012 bytes*
<outputs> exactly 32 bytes
<locktime><sighashtype> exactly 4 + 4 bytes
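The split above can be sketched in Python as follows. This only models the comparison step on two blobs of signature hash data; the real contract would additionally hash each blob and verify the bond key's signatures with OP_CHECKSIGFROMSTACK. The names `split_preimage`, `is_double_spend` and `fake_preimage` are made up for this example.

```python
HEAD_LEN = 4 + 32 + 32   # <tx-version><prevouts><sequences>
PREVOUT_LEN = 36         # <in-prevout>
OUTPUTS_LEN = 32         # <outputs>
TAIL_LEN = 4 + 4         # <locktime><sighashtype>
MIN_MIDDLE_LEN = 12      # <in-value><in-sequence> with an empty script

def split_preimage(data: bytes):
    """Split sighash data into the five parts, using the fixed sizes."""
    minimum = HEAD_LEN + PREVOUT_LEN + MIN_MIDDLE_LEN + OUTPUTS_LEN + TAIL_LEN
    if len(data) < minimum:
        raise ValueError("sighash data too short")
    head = data[:HEAD_LEN]
    in_prevout = data[HEAD_LEN:HEAD_LEN + PREVOUT_LEN]
    middle = data[HEAD_LEN + PREVOUT_LEN:-(OUTPUTS_LEN + TAIL_LEN)]
    outputs = data[-(OUTPUTS_LEN + TAIL_LEN):-TAIL_LEN]
    tail = data[-TAIL_LEN:]
    return head, in_prevout, middle, outputs, tail

def is_double_spend(data_a: bytes, data_b: bytes) -> bool:
    """Same input spent, different outputs committed to."""
    _, prevout_a, _, outputs_a, _ = split_preimage(data_a)
    _, prevout_b, _, outputs_b, _ = split_preimage(data_b)
    return prevout_a == prevout_b and outputs_a != outputs_b

def fake_preimage(prevout_byte: int, outputs_byte: int) -> bytes:
    """Build placeholder sighash data with a 26-byte script code."""
    return (bytes(HEAD_LEN) + bytes([prevout_byte]) * PREVOUT_LEN
            + bytes(26 + MIN_MIDDLE_LEN)
            + bytes([outputs_byte]) * OUTPUTS_LEN + bytes(TAIL_LEN))

tx_a = fake_preimage(prevout_byte=1, outputs_byte=10)
tx_b = fake_preimage(prevout_byte=1, outputs_byte=11)  # same input, new outputs
tx_c = fake_preimage(prevout_byte=2, outputs_byte=11)  # different input
```

Because only the middle part has a variable length, every other part can be located by counting from the start or from the end of the data.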
Cool. Now we are going somewhere. But are we done?
Pitfalls
Note that any malicious actor can attempt to exploit the smart contract by trying to burn the bond without possessing a valid double spend. In other words, they might try to trick our contract! If we don't carefully check the sizes of the different witness items we are expecting, then a malicious actor could simply make the middle item one byte longer and the last item one byte shorter, which would make the <outputs> items different. This would allow the malicious actor to burn the bond. By carefully checking the sizes of all the items, we can prevent this kind of attack. It is very fortunate that there is only a single item of undefined size (the script code), because if there were two, then we wouldn't have this guarantee and securing our contract would become much harder.
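The attack can be reproduced with a small Python model. It assumes, purely for illustration, that the prover hands over the five parts as separate witness items; `naive_check` and `strict_check` are hypothetical stand-ins for a contract without and with size checks, and the signature verification that the real contract performs on the concatenated data is left out.

```python
def naive_check(items_a, items_b) -> bool:
    """Compare <in-prevout> and <outputs> without checking any sizes."""
    return items_a[1] == items_b[1] and items_a[3] != items_b[3]

def strict_check(items_a, items_b) -> bool:
    """Same comparison, but only after enforcing the expected item sizes."""
    for head, prevout, middle, outputs, tail in (items_a, items_b):
        if (len(head), len(prevout), len(outputs), len(tail)) != (68, 36, 32, 8):
            return False
        if not 12 <= len(middle) <= 10_012:
            return False
    return naive_check(items_a, items_b)

# One single signed transaction: 182 bytes of placeholder sighash data.
data = bytes(i % 256 for i in range(182))

# Honest split of that data.
honest = [data[:68], data[68:104], data[104:-40], data[-40:-8], data[-8:]]

# Malicious split of the very same data: the middle item is one byte
# longer and the last item one byte shorter, so <outputs> shifts by one.
shifted = [data[:68], data[68:104], data[104:-39], data[-39:-7], data[-7:]]

same_sighash_data = b"".join(honest) == b"".join(shifted)
```

Both splits concatenate to identical bytes, so the same signature validates for both, yet without size checks the contract would see two different <outputs> values.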
Lastly, there is one other big problem in our smart contract.
As noted above, Bitcoin consensus rules allow a legal scriptCode to be up to 10,000 bytes long. While a script of this size is unlikely to be created by an average user, a malicious user can quite easily compose a p2wsh output with a witness script that is artificially grown to an arbitrary size.
What does that mean? First of all, individual items on the witness stack have size limits. For standard transactions, the limit is 80 bytes and for consensus the limit is 520 bytes. So if the script code is very large, the <in-script><in-value><in-sequence> witness item could exceed its maximum size. We can solve this easily though, by allowing the user to split up this item into multiple items and then concatenating them together.
There is however another important limit. Any piece of data on the stack during execution is limited to 520 bytes. Because we need to hash the sighash data all together, we also first need to concatenate it all together into a single stack item. This means that there is a maximum size of script codes we can accept in our smart contract: namely 364 bytes. What does this mean? It means that the bond holder can secretly try to use a p2wsh output with a script code larger than 364 bytes, and if he were to double spend that output, it wouldn't be possible for our smart contract to actually validate the double spend.
The result is that users who rely on this bond need to be cautious. While regular p2wpkh outputs have script codes within this limit, whenever they encounter a p2wsh output, they can only rely on the bond if they know that the witness script used is not larger than 364 bytes.
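The 364-byte figure follows directly from the field sizes listed earlier, as this small Python calculation shows. The 26-byte p2wpkh script code length is the standard serialised size including its length prefix; the variable names are only illustrative.

```python
MAX_STACK_ITEM = 520  # maximum size of a single stack element

# Sizes of every fixed-length field in the segwit v0 sighash data.
FIXED_FIELDS = {
    "tx-version": 4,
    "prevouts": 32,
    "sequences": 32,
    "in-prevout": 36,
    "in-value": 8,
    "in-sequence": 4,
    "outputs": 32,
    "locktime": 4,
    "sighashtype": 4,
}

fixed_len = sum(FIXED_FIELDS.values())         # 156 bytes
max_script_code = MAX_STACK_ITEM - fixed_len   # 364 bytes

P2WPKH_SCRIPT_CODE_LEN = 26
p2wpkh_fits = P2WPKH_SCRIPT_CODE_LEN <= max_script_code
```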
What about taproot?
The above setup works for creating bonds that can check transactions using segwit v0 signature hashes, namely for UTXOs with p2wpkh and p2wsh outputs.
But why did I start off with segwit instead of taproot scripts? After all, taproot on Elements introduced some new covenant opcodes in Elements tapscript which greatly simplify the construction of the burn covenant (which enforces that the bond funds are burned) that we briefly discussed above. The reason is that OP_CHECKSIGFROMSTACK in segwit uses the ECDSA signature algorithm, so it verifies ECDSA signatures, while the same opcode in tapscript uses the Schnorr signature algorithm. This means that if we want to validate Bitcoin transactions that use ECDSA signatures, we must write the bond in segwit, while if we want to verify Schnorr signatures, we will have to write the bond using taproot. Unfortunately, this means we cannot construct a single bond that protects both segwit and taproot outputs at once. (Side note: this could be considered an oversight of the taproot upgrade in general, since verifying ECDSA signatures is a functionality which is not available in tapscript but was available before.)
So, to build a double-spend bond that supports taproot on Bitcoin, we will unfortunately need to start over entirely. So let's take a look at the taproot sighash structure (hint: it's not very fun).
<epoch> 1 byte
version number fixed to 0x00
<sighashtype> 1 byte
the sighash type used
<tx-version> 4 bytes
the tx version
<locktime> 4 bytes
the tx's locktime value
if !ANYONECANPAY:
<prevouts> 32 bytes
hash of all input outpoints
<amounts> 32 bytes
hash of all input amounts
<scripts> 32 bytes
hash of all input scriptPubkeys
<sequences> 32 bytes
hash of all input sequences
if SIGHASH_ALL:
<outputs> 32 bytes
hash of all outputs
<spend-type> 1 byte
flags indicating annex or code separator
if ANYONECANPAY:
<in-prevout> 36 bytes
the outpoint of the input being signed
<in-value> 8 bytes
the value of the input being signed
<in-script> variable
the scriptPubkey of the input being signed
<in-sequence> 4 bytes
the sequence of the input being signed
else:
<in-index> 4 bytes
the input index of the input being signed
if annex:
<annex-hash> 32 bytes
hash of the tx annex
if SIGHASH_SINGLE:
<output-hash> 32 bytes
hash of the output at the index of input being signed
if codeseparator:
<cs-hash> 32 bytes
leaf hash of code separator leaf
<key-version> 1 byte
fixed 0x00 version byte
<cs-position> 4 bytes
code separator position
I don't think you need me to explain why I don't consider it fun... not only does it seem way more complex, but it also does not have a fixed structure like the segwit sighash. Like we saw before, whenever certain pieces of information didn't need to be committed to within the segwit sighash, they would be replaced with zero bytes and the entire data structure would always retain the same fixed size (with the script code as the only exception). This is very convenient for us, because it allows us to still perform the same important size checks regardless of the sighash type used. This is no longer the case for taproot.
We can work our way around this though. Let's first talk about another great feature of taproot: MAST. MAST allows us to actually specify multiple different scripts inside a single output and then the user can choose any of these scripts to be executed.
So imagine we don't care about any other sighash type than SIGHASH_ALL. In that case, we can write a smart contract like we did for segwit, but that only works when SIGHASH_ALL is used; and we can have another smart contract that a user can use to prove that the bond holder signed with another sighash type than SIGHASH_ALL. This way, we have effectively made any other sighash type illegal to use for the bond holder, because we will also burn his money if he does. And meanwhile, for our actual double spend detection, we can focus on the simpler case of SIGHASH_ALL. Let's take a look at how the signature hash structure looks then.
<epoch> 1 byte
version number fixed to 0x00
<sighashtype> 1 byte
the sighash type used
<tx-version> 4 bytes
the tx version
<locktime> 4 bytes
the tx's locktime value
<prevouts> 32 bytes
hash of all input outpoints
<amounts> 32 bytes
hash of all input amounts
<scripts> 32 bytes
hash of all input scriptPubkeys
<sequences> 32 bytes
hash of all input sequences
<outputs> 32 bytes
hash of all outputs
<spend-type> 1 byte
flags indicating annex or code separator
<in-index> 4 bytes
the input index of the input being signed
if annex:
<annex-hash> 32 bytes
hash of the tx annex
if codeseparator:
<cs-hash> 32 bytes
leaf hash of code separator leaf
<key-version> 1 byte
fixed 0x00 version byte
<cs-position> 4 bytes
code separator position
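For the simplest case, a key-path spend without annex (so none of the optional trailing fields are present), the reduced structure can be sketched in Python like this. The tagged hash with the tag "TapSighash" follows BIP341; the function names and placeholder values are hypothetical, and the second function only models the check that the extra "wrong sighash type" script would perform on the sighash type byte.

```python
import hashlib
import struct

SIGHASH_ALL = 0x01

def tagged_hash(tag: str, msg: bytes) -> bytes:
    """BIP340-style tagged hash: SHA256(SHA256(tag) || SHA256(tag) || msg)."""
    tag_hash = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(tag_hash + tag_hash + msg).digest()

def taproot_sigmsg_all(hash_type, version, locktime, sha_prevouts, sha_amounts,
                       sha_scripts, sha_sequences, sha_outputs, in_index) -> bytes:
    """Sighash data for a key-path spend without annex, all hashes present."""
    hashes = [sha_prevouts, sha_amounts, sha_scripts, sha_sequences, sha_outputs]
    assert all(len(h) == 32 for h in hashes)
    return b"".join([
        b"\x00",                      # <epoch>
        bytes([hash_type]),           # <sighashtype>
        struct.pack("<I", version),   # <tx-version>
        struct.pack("<I", locktime),  # <locktime>
        *hashes,                      # <prevouts> ... <outputs>
        b"\x00",                      # <spend-type>: no annex, key path
        struct.pack("<I", in_index),  # <in-index>
    ])

def uses_other_sighash_type(sigmsg: bytes) -> bool:
    """True if the sighash type byte is anything other than SIGHASH_ALL."""
    return sigmsg[1] != SIGHASH_ALL

sigmsg = taproot_sigmsg_all(
    hash_type=SIGHASH_ALL, version=2, locktime=0,
    sha_prevouts=bytes(32), sha_amounts=bytes(32), sha_scripts=bytes(32),
    sha_sequences=bytes(32), sha_outputs=bytes(32), in_index=0,
)
sighash = tagged_hash("TapSighash", sigmsg)
```

In this case the data is always 175 bytes long, which restores the fixed-size property that made the segwit version easy to check. Whether the default sighash type (0x00), which commits to the same data as SIGHASH_ALL, should also be accepted is a design decision this sketch leaves open.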
That looks a lot simpler. There is one more problem though... Recall the two parts of the sighash data which we focused on earlier: the input outpoint being spent and the outputs hash. This signature hash data is missing an explicit mention of the input outpoint being spent! It does actually commit to it, though, in the form of the hash of all outpoints, combined with the input index. But remember that we have to take two signature hash data structures and check that they have a specific input in common.
The only way to do that here with taproot is to take all the input outpoints on the stack and pick out the one at the input index. However, this means we need to accept 36 bytes of data for each input in either of the two double-spend transactions. Bitcoin doesn't have a clear limit on the maximum number of inputs, but in practice it seems that it can go up to 24,386. Taproot's input size limits are more relaxed than segwit's, but we certainly can't take 878 KB of data in our witness.
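In Python, the "take all outpoints and pick one" step could look roughly like the sketch below. It relies on the fact that in BIP341 the <prevouts> value is a single SHA-256 over the concatenated 36-byte outpoints; the function name `outpoint_at_index` and the example outpoints are made up.

```python
import hashlib

OUTPOINT_LEN = 36  # 32-byte txid + 4-byte output index

def outpoint_at_index(all_outpoints: bytes, in_index: int,
                      sha_prevouts: bytes) -> bytes:
    """Return the outpoint at in_index after checking the <prevouts> hash."""
    if len(all_outpoints) % OUTPOINT_LEN != 0:
        raise ValueError("outpoint data is not a multiple of 36 bytes")
    if hashlib.sha256(all_outpoints).digest() != sha_prevouts:
        raise ValueError("outpoints do not match the committed hash")
    count = len(all_outpoints) // OUTPOINT_LEN
    if not 0 <= in_index < count:
        raise ValueError("input index out of range")
    start = in_index * OUTPOINT_LEN
    return all_outpoints[start:start + OUTPOINT_LEN]

# Three placeholder outpoints and the hash a signature would commit to.
outpoints = [bytes([i]) * 32 + (0).to_bytes(4, "little") for i in (1, 2, 3)]
blob = b"".join(outpoints)
sha_prevouts = hashlib.sha256(blob).digest()
picked = outpoint_at_index(blob, 1, sha_prevouts)

# Worst case from the text: 24,386 inputs of 36 bytes each.
worst_case_bytes = 24_386 * OUTPOINT_LEN  # 877,896 bytes, roughly 878 KB
```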
This is problematic. A potential solution strategy is to limit the number of inputs the bond holder is allowed to use in his transactions and make it illegal for him to use a higher number. For this we have to write an additional smart contract that takes in a single signature hash data structure and burns the bond if the signature hash represents a transaction with more inputs than the limit.
But how do we learn the number of inputs a transaction had from the signature hash data? It doesn't seem to be committed to in any way. But we actually do commit to it, indirectly. Consider all these hash values that hash information from all inputs: the input amounts, outpoints, scriptPubkeys and sequences. They implicitly commit to the number of items that are hashed. The smallest ones among these items are the sequences of 4 bytes each. If we can prove that the sequences hash is the result of hashing together a certain number of 4-byte values, we have proved that there are this same number of inputs in the transaction. But again, if the transaction has 24,386 inputs, that is still 97.5 KB of witness data, which would violate the consensus limits.
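The direct (and, for large transactions, too expensive) version of this proof is easy to model in Python: reveal every sequence value and check it against the committed hash. As with the outpoints, BIP341 uses a single SHA-256 over the concatenated values; the function name `input_count_from_sequences` is hypothetical.

```python
import hashlib

SEQUENCE_LEN = 4

def input_count_from_sequences(all_sequences: bytes, sha_sequences: bytes):
    """Return the number of inputs proven by the revealed sequences, or None."""
    if len(all_sequences) % SEQUENCE_LEN != 0:
        return None
    if hashlib.sha256(all_sequences).digest() != sha_sequences:
        return None
    return len(all_sequences) // SEQUENCE_LEN

# A transaction with 3 inputs, all using sequence 0xfffffffd.
sequences = (0xFFFFFFFD).to_bytes(4, "little") * 3
sha_sequences = hashlib.sha256(sequences).digest()
proven_inputs = input_count_from_sequences(sequences, sha_sequences)

# Worst case from the text: 24,386 inputs of 4 bytes each.
worst_case_bytes = 24_386 * SEQUENCE_LEN  # 97,544 bytes, roughly 97.5 KB
```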
I spent the past week hiking the mountains of Honduras, where I had plenty of time to break my head over this problem. I didn't want to accept that it's impossible to build the same double spend bond smart contract for taproot just because Pieter Wuille decided to use this clever trick to avoid hashing the same input data twice. Instead, I managed to devise a solution that I'm still working on... Elements tapscript has some new opcodes which support streaming hashes. What this means is that they allow you to hash data piece by piece instead of all at once, because they keep the hash engine's internal state on the stack. This is nice for piecewise hashing of stack data, but we can use the same opcodes to do a clever trick of our own. (Note that these opcodes also help resolve the problem of the script code size we have in our segwit bond!)
Let's say we want the limit of inputs per transaction to be 50 inputs. (This number is arbitrary; it would be worthwhile to do some research into what would be a good limit that makes our bond stay within Liquid standardness and/or consensus limits.) We can prove that the signature hash commits to at least 51 inputs by accepting a hash engine midstate onto the stack, then adding an extra 51 sequences from the stack into the hash and then finalizing the hash and using it as the <sequences> component of the signature hash. This way, we don't need to take all sequence values on the stack to hash them, but we can still prove that at least 51 sequences were hashed in order to produce the hash.
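The midstate trick can be modelled outside of Script with a small pure-Python SHA-256 whose internal state is exposed. This is only a sketch under simplifying assumptions: the midstate must sit on a 64-byte block boundary (so the prover may have to reveal a few more than 51 sequences), the number of bytes already hashed is supplied alongside the midstate, and the names `compress`, `finish_from_midstate` and `proves_too_many_inputs` are made up. It does not model the actual Elements opcodes.

```python
import hashlib
import struct
from math import isqrt

def _primes(n):
    found, c = [], 2
    while len(found) < n:
        if all(c % p for p in found):
            found.append(c)
        c += 1
    return found

def _icbrt(x):
    """Integer cube root by binary search."""
    lo, hi = 0, 1 << ((x.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo

# SHA-256 constants, derived from square and cube roots of the first primes.
_P = _primes(64)
H0 = [isqrt(p << 64) & 0xFFFFFFFF for p in _P[:8]]
K = [_icbrt(p << 96) & 0xFFFFFFFF for p in _P]

def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & 0xFFFFFFFF

def compress(state, block):
    """Apply the SHA-256 compression function to one 64-byte block."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = _rotr(w[i - 15], 7) ^ _rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = _rotr(w[i - 2], 17) ^ _rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & 0xFFFFFFFF)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        big_s1 = _rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[i] + w[i]) & 0xFFFFFFFF
        big_s0 = _rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & 0xFFFFFFFF
        h, g, f, e = g, f, e, (d + t1) & 0xFFFFFFFF
        d, c, b, a = c, b, a, (t1 + t2) & 0xFFFFFFFF
    return [(x + y) & 0xFFFFFFFF for x, y in zip(state, [a, b, c, d, e, f, g, h])]

def finish_from_midstate(midstate, bytes_done, tail):
    """Resume hashing at a block boundary and finalize with `tail`."""
    assert bytes_done % 64 == 0
    padded = tail + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack(">Q", (bytes_done + len(tail)) * 8)
    state = list(midstate)
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])
    return struct.pack(">8I", *state)

def proves_too_many_inputs(sha_sequences, midstate, bytes_done, tail, limit=50):
    """True if more than `limit` 4-byte sequences finish the committed hash."""
    if len(tail) % 4 != 0 or len(tail) // 4 <= limit:
        return False
    return finish_from_midstate(midstate, bytes_done, tail) == sha_sequences

# A transaction with 80 inputs: reveal a midstate after the first 16
# sequences (64 bytes) plus the remaining 64 sequences.
blob = (0xFFFFFFFD).to_bytes(4, "little") * 80
sha_sequences = hashlib.sha256(blob).digest()
midstate = compress(H0, blob[:64])
burnable = proves_too_many_inputs(sha_sequences, midstate, 64, blob[64:])
```

Here the prover reveals 256 bytes of sequences plus a 32-byte midstate instead of all 320 bytes; the saving grows with the number of inputs, while the revealed tail stays roughly constant.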
This trick allows us to effectively set a limit on the number of inputs, which in turn allows us to write the original smart contract in such a way that all the outpoints that need to be put on the stack can fit well within clearly set boundaries.
Conclusion
In theory, the above ideas should allow us to build a taproot version of the bond. The key words there are: in theory.
James Dorfman (a colleague at Blockstream) and I have started a project called doubletake, where we are building an actual implementation of the bonds described in this writing.
Project state
The doubletake project currently has the following features implemented:
- a working segwit-based smart contract to create bonds for segwit-based Bitcoin transactions
- a spec format that uniquely identifies a bond so that the spec can be used across all interactions
- Note that the spec is still not stabilized, though!
- Rust API for constructing bonds, burning bonds and reclaiming bonds after expiry
- a CLI tool for doing the same operations in a terminal
- a WASM FFI for integration in web pages
Some drawbacks of the current implementation:
- As described above, the segwit version of the bond can only protect UTXOs with script codes within the limit of 364 bytes.
- The segwit bond only protects segwit v0 based UTXOs; the consumer of the bond should be aware of the fact that the bond has no effect on either legacy (pre-segwit) UTXOs or taproot ones.
- Using a bond means that RBF can't be used, as this is technically a double spend. We decided not to attempt to implement RBF rules in our smart contract. This limitation means that a party receiving funds spent from a bond-protected UTXO should be aware that the fee-rate of this transaction cannot be raised.
Some work in progress:
- a web app to create and interact with bonds (coming soon at http://zeroconf.me)
- a taproot-supporting version of the smart contract as detailed in this post