Bitcoin double-spend prevention bonds on Liquid

2023-08-25
18 min read
Tags:
bitcoin

A brief history of zero-conf txs and double-spend prevention

Zero-conf transactions have been contentiously discussed by Bitcoin developers ever since the first real-value transactions began to occur on the network. For a long time they were dismissed as dangerous, because the acceptance of a zero-conf transaction opens a receiver up to the risk of a double spend. However, new policy rules like RBF, as well as novel second-layer constructions like turbo channels, have brought them back as a serious topic of conversation among developers. I myself began to seriously ponder them as I reviewed Burak's proposal for a novel second layer protocol which he named Ark. In the Ark protocol, a single pending unconfirmed transaction could represent thousands of actual transactions so increasing trust in just a single unconfirmed transaction would have a large UX benefit for many individual transactions.

I'm not the first person to work on this problem. In 2017, a paper by Solà et al. proposed a new addition to the Bitcoin protocol which would allow a sender to convince a receiver that they would not double spend a certain transaction. However, this proposal had a big drawback: it needed a new opcode to be added to Bitcoin Script. Thus, it would require a protocol softfork in order to work. Furthermore, this method would not be compatible with the simple pubkey-based spending methods (p2wpkh or taproot key-spends) which are currently being used.

Despite these drawbacks, I was drawn to the paper's core idea. By adding a protocol operation that supports fixing the "nonce" value in the ECDSA signature that spends a bitcoin transaction, the sender would be incentivized not to spend the same output multiple times. This is because a double spend would actually reveal their private key, which would allow any party to spend the money in question. I won't dive into the specific details in this post. However, if you are curious, give that paper a read.

Elements/Liquid Script

I spent a long time trying to use the properties of the new Schnorr signature scheme in taproot to somehow force nonce re-use without the need for a soft-fork. However, despite my best efforts, it seemed impossible.

Then, I had a jolt of inspiration from my day job. I've spent the past four years working on Blockstream's Elements platform which powers the Liquid Network, and this has made me very familiar with the networks scripting language: Elements Script. I realized that Elements Script, which is actually a superset of Bitcoin Script, is expressive enough to validate Bitcoin transaction signatures and detect a double spend. Furthermore, this would not require any sort of fork on the Liquid Network! In other words, rather than trying to make double spends impossible on Bitcoin, I found a way to detect Bitcoin double spends on Liquid.

Once I realized that the Liquid Network could enforce such a "contract", I put all these ideas together and came up with the following solution. A sender on Bitcoin would create a "bond" smart contract transaction on Liquid, in which he would specify his Bitcoin public key. The money in this "bond" UTXO would be locked up for a certain amount of pre-specified time. Then, if a double spend were to occur on Bitcoin using this public key, any other user would be able to present the two Bitcoin transactions to the smart contract, prove that a double spend had occurred, and then burn the money in the bond. This way the bond creator is discouraged from double spending from this public key, because they stand to lose their locked money.

To summarize: a Bitcoin user who wants to make his Bitcoin transactions trust-worthy before confirmation would just need to tie money up in a "bond" UTXO on Liquid, which could be burned if a double spend from their Bitcoin public key were discovered.

The obvious question is: why do we need Liquid? Why can't we build this on Bitcoin itself? The answer to this question lies in the structure of Bitcoin Script. Bitcoin's scripting language lacks certain operations that we would need to validate Bitcoin transactions:

OP_CAT: a simple opcode which concatenates two byte slices together; this opcode was originally included in the Bitcoin protocol but it was deemed risky and subsequently disabled. This opcode may seem boring at first. However, it can actually enable many interesting things which might surprise you. Check out Andrew Poelstra's musings on this opcode here and here, for more details.
OP_CHECKSIGFROMSTACK: an opcode that validates a signature for a message on the stack, as opposed to OP_CHECKSIG which checks a signature for the signature hash of the transaction the input is spent in. This opcode will be key in validating double spends on Bitcoin.

Lastly, if we were able to somehow create the bond on Bitcoin, we'd have to allow the user proving the double spend to the bond to take all the money in the bond. This means that a miner can front-run by making a tx with the same witness data but which spends all the money to the miner fee. This might seem fine, all we want is for the bond holder to lose the money. However, this would mean though that any party that can include txs in blocks by bypassing the mempool (i.e. miners) could create bonds and violate them without losing their money. This is not ideal. So ideally we would have to require the money in the bond be burned and for that a covenant is required.

There are several ways to construct covenants on Liquid . I won't go into the details of actually constructing the covenants here -- that is out of scope for this post. Further reading on this can be found here and here.

Signature hashes

Now, let's discuss how to actually check whether two transactions double spend the same UTXO using a stack-based "programming" language like Elements Script. Firstly, let's define a double spend. For our purposes, a double spend is simply two transactions which spend the same inputs, but have different outputs.

The direct translation of this idea into code would work as follows: when burning the bond UTXO, the user will have to push two raw bitcoin transactions onto the stack. They would do this by providing them in the Spending transaction's witness and then the script will perform a check for a common input in both transactions which was signed by the public key tied to the bond. That sounds pretty complicated to do in Script..

It turns out that there exists a simpler way. When Bitcoin transactions are signed by a key, it's not actually the full transaction which is signed. Rather, for every signature, an individual "signature hash" (sighash) is constructed. This hash commits to various parts of the transaction along with some parts of the UTXO which the input is spending. Let's take a look at what the signature hash structure looks like for segwit v0 (i.e. pre-taproot) transactions.

The "signature hash" is the SHA-256 hash of a string which is simply the concatenation of the all the following pieces of data:

<tx-version>   4 bytes
               the tx version
<prevouts>     32 bytes
               hash of all input outpoints, or zero bytes for SIGHASH_ANYONECANSPEND
<sequences>    32 bytes
               hash of all input sequences, or zero bytes for SIGHASH_ANYONECANSPEND
<in-prevout>   36 bytes
               the outpoint of the input being signed
<in-script>    up to 10,000 bytes
               the UTXO scriptCode (scriptPubkey*) of the input being signed
<in-value>     8 bytes
               the UTXO value of the input being signed
<in-sequence>  4 bytes
               the sequence of the input being signed
<outputs>      32 bytes
               for SIGHASH_ALL: hash of all outputs
               for SIGHASH_SINGLE: hash of output at index of the input being signed
               for SIGHASH_NONE: 32 zero bytes
<locktime>     4 bytes
               the tx's locktime value
<sighashtype>  4 bytes
               the sighash type used

Note that this data will always have the exact same structure and length, regardless of the sighash types that is used. The only exception lies in the scriptCode field, whose length varies according to the size of the script which is provided to the output being spent. This fixed length property is very convenient when working in a stack-based language.

Now, observe that we can actually detect that a double spend was made, by only looking at the raw data that goes into two transactions' signatures hashes. A double spend exists between of these data structures if and only if the following two conditions hold:

the <in-prevout> value is identical
the <outputs> value is different

Thus, rather than pushing entire transactions onto the stack in order to prove double spends, we only need to receive a few pieces of data which allows us to check that the key signed two signatures that represent two transactions of which the input matches and that the outputs differ. After checking that the inputs match and the outputs differ, we will construct the signature hash and then use OP_CHECKSIGFROMSTACK to verify that the bond's pubkey did indeed sign these signature hashes.

Now that we understand the data that we will be pushing onto the stack (the two transaction's signature hash data structures), let's dive a bit Deeper into the double spend detection algorithm.

First, let's split the signature hash data into the following parts:

<tx-version><prevouts><sequences>   exactly 4 + 32 + 32 bytes
<in-prevout>                        exactly 36 bytes
<in-script><in-value><in-sequence>  between 12 and 10,012 bytes*
<outputs>                           exactly 32 bytes
<locktime><sighashtype>             exactly 4 + 4 bytes

Cool. Now we are going somewhere. But are we done?

Pitfalls

Note that any malicious actor can attempt to exploit the smart contract by trying to burn the bond without possessing a valid double spend made. In other words, they might try to trick our contract! If we don't carefully check the sizes of the different witness items we are expecting, then a malicious actor could simply make the middle item one byte longer and the last item one byte shorter which would make the <outputs> items different. This would allow the malicious actor to burn the bond. By carefully checking all sizes of the items, we can prevent this kind of attack. It is very fortunate that there is only a single item of undefined size (the script code), because if there were two, Then we wouldn't have this guarantee and securing our contract would become much harder.

Lastly, there is one other big problem in our smart contract.

As noted above, Bitcoin consensus rules allow a legal scriptCode to be up to 10,000 bytes long. While a script of this size is unlikely to be created by an average user, a malicious user can quite easily compose a p2wsh output with a witness script that is artificially grown to an arbitrary size. What does that mean? First of all, individual items on the witness stack have size limits. For standard transactions, the limit is 80 bytes and for consensus the limit is 520 bytes. So if the script code is very large, the <in-script><in-value><in-sequence> witness item could exceed its maximum size. We can solve this easily though, by allowing the user to split up this item into multiple items and then concatinating them together.

There is however another important limit. Any piece of data on the stack during execution is limited to 520 bytes. Because we need to hash the sighash data all together, we also first need to concatinate it all together into a single stack item. This means that there is a maximum size of script codes we can accept in our smart contract: namely 364 bytes. What does this mean? It means that the bond holder can secretly try use a p2wsh output with a script code larger than 364 bytes and if he would double spend that output, it wouldn't be possible for our smart contract to actually validate the double spend.

The result is that users that will rely on this bond, need to be cautious. While regular p2wpkh outputs have script codes within this limit, whenever they encounter a p2wsh output, they can only rely on the bond if they know that the witness script used is not larger than 364 bytes.

What about taproot?

The above setup works for creating bonds that can check on transaction using segwit v0 signature hashes, namely for UTXOs with p2wpkh and p2wsh outputs.

But, why did I start off with segwit instead of taproot scripts, especially since taproot on Elements introduced some new covenant opcodes in Elements tapscript which greatly simplify the construction of the burn covenant (which enforces that the bond funds are burned) which we briefly discussed above. However, OP_CHECKSIGFROMSTACK in segwit uses the ECDSA signature algorithm, so it verifies ECDSA signatures, while the same opcode in tapscript uses the Schnorr signature algorithm. This means that if we want to validate Bitcoin transactions that use ECDSA signatures, we must write the bond in segwit, while if we want to verify Schnorr signatures, we will have to write the bond using taproot. Unfortunately, this means we cannot construct a single bond that protects both segwit and taproot outputs at once. (Side note: this could be considered an oversight of the taproot upgrade in general, since verifying ECDSA signatures is a functionality which is not available in tapscript but was available before.)

So, to build a double-spend bond that supports taproot on Bitcoin, we will unfortunately need to start over entirely. So let's take a look at the taproot sighash structure (hint: it's not very fun).

<epoch>            1 byte
                   version number fixed to 0x00
<sighashtype>      1 byte
                   the sighash type used
<tx-version>       4 bytes
                   the tx version
<locktime>         4 bytes
                   the tx's locktime value

if !ANYONECANPAY:
    <prevouts>     32 bytes
                   hash of all input outpoints
    <amounts>      32 bytes
                   hash of all input amounts
    <scripts>      32 bytes
                   hash of all input scriptPubkeys
    <sequences>    32 bytes
                   hash of all input sequences

if SIGHASH_ALL:
    <outputs>      32 bytes
                   hash of all outputs

<spend-type>       1 byte
                   flags indicating annex or code separator

if ANYONECANPAY:
    <in-prevout>   36 bytes
                   the outpoint of the input being signed
    <in-value>     8 bytes
                   the value of the input being signed
    <in-script>    variable
                   the scriptPubkey of the input being signed
    <in-sequence>  4 bytes
                   the sequence of the input being signed
else:
    <in-index>     4 bytes
                   the input index of the input being signed

if annex:
    <annex-hash>   32 bytes
                   hash of the tx annex

if SIGHASH_SINGLE:
    <output-hash>  32 bytes
                   hash of the output at the index of input being signed

if codeseparator:
    <cs-hash>      32 bytes
                   leaf hash of code separator leaf
    <key-version>  1 bytes
                   fixed 0x00 version byte
    <cs-position>  4 bytes
                   code separator position

I don't think you need me to explain why I don't consider it fun... not only does it seem way more complex, but it also does not have a fixed structure like the segwit sighash. Like we saw before, whenever certain pieces of information didn't need to be committed to within the segwit sighash, they would be replaced with zero bytes and the entire data structure would always retain the same fixed size (with the script code as the only exception). This is very convenient for us, because it allows us to still perform the same important size checks regardless of the sighash type used. This is no longer the case for taproot.

We can work our way around this though. Let's first talk about another great feature of taproot: MAST. MAST allows us to actually specify multiple different scripts inside a single output and then the user can chose any of these scripts to be executed.

So imagine we don't care about any other sighashtype than SIGHASH_ALL. In that case, we can write a smart contract like we did for segwit, but that only works when SIGHASH_ALL is used; and we can have another smart contract that a user can use to proof the bond holder signed with another sighash type than SIGHASH_ALL. This way, we have effectively made any other sighash illegal to use for the bond holder, because we will also burn his money if he does. And meanwhile, for our actual double spend detection, we can focus on the simpler case of SIGHASH_ALL. Let's take a look at how the signature hash structure looks then.

<epoch>            1 byte
                   version number fixed to 0x00
<sighashtype>      1 byte
                   the sighash type used
<tx-version>       4 bytes
                   the tx version
<locktime>         4 bytes
                   the tx's locktime value
<prevouts>         32 bytes
                   hash of all input outpoints
<amounts>          32 bytes
                   hash of all input amounts
<scripts>          32 bytes
                   hash of all input scriptPubkeys
<sequences>        32 bytes
                   hash of all input sequences
<outputs>          32 bytes
                   hash of all outputs
<spend-type>       1 byte
                   flags indicating annex or code separator
<in-index>         4 bytes
                   the input index of the input being signed

if annex:
    <annex-hash>   32 bytes
                   hash of the tx annex

if codeseparator:
    <cs-hash>      32 bytes
                   leaf hash of code separator leaf
    <key-version>  1 bytes
                   fixed 0x00 version byte
    <cs-position>  4 bytes
                   code separator position

That looks a lot simpler. There is one more problem though.. Recall the two parts of the sighash data which we focused on earlier: the input outpoint being spent and the outputs hash. This sighature hash data is missing an explicit mention of the input outpoint being spent! It does actually commit to it, though, in the form of the hash of all outpoints, combined with the input index. But, remember that we have to take two signature hash data structures and check that they have a specific input in common.

The only way to do that here with taproot is to take all the input outpoints on the stack and pick out the one at the input index. However, this means we need to accept 36 bytes of data for each input in either of the two double-spend transactions. Bitcoin doesn't have a clear limit on the maximum number of inputs, so in practice it seems that it can go up to 24,386. Taproot's input size limits are more relaxed than segwit's, but we certainly can't take 878 KB of data in our witness.

This is problematic. A potential solution strategy is to limit the number of inputs the bond holder is allowed to use in his transactions and make it illegal for him to use a higher number. For this we have to write a additional smart contract that takes in a single signature hash data and burns the bond if the sighature hash represents a transaction with more inputs than the limit.

But how do we learn the number of inputs a transaction had from the signature hash data? It doesn't seem to be committed to in any way. But we actually do commit to it, indirectly. Consider all these has values that hash information from all inputs, the input amounts, outpoints, scriptPubkeys and sequences. They implicitly commit the number of items that are hashed. The smallest ones among these items are the sequences of 4 bytes each. If we can prove that the sequences hash is the result of hashing together a certain number of 4-byte values, we proved that there are this same number of inputs in the transaction. But again, if the transaction has 24,386 inputs, that is still 97.5 KB of witness data, which would violate the consensus limits.

I spent the past week hiking the mountains of Honduras, where I had plenty of time breaking my head on this problem. I didn't want to accept that it's impossible to build the same double spend bond smart contract for taproot just because Pieter Wuille decided to use this clever trick to avoid hashing the same input data twice. Instead, I managed to devise a solution that I'm still working on... Elements tapscript has some new opcodes which support streaming hashes. What this means is that they allow you can hash data piece by piece instead of all at once because they keep the hash engine's internal state on the stack. This is nice for piecewise hashing of stack data, but we can use the same opcodes to do a clever trick of our own. (Note that these opcodes also help resolve the problem of the script code size we have in our segwit bond!)

Let's say we want the limit of inputs per transaction to be 50 inputs. (This number is arbitrary, it would be worthwhile to do some research into what would be a good limit that makes our bond stay within Liquid standardness and/or consensus limits.) We can prove that the signature hash commits to at least 51 inputs by accepting a hash engine midstate onto the stack, then adding an extra 51 sequences from the stack into the hash and then finalizing the hash and using it as the <prevouts> component of the signature hash. This way, we don't need to take all sequences values on the stack to hash them, but we can still prove that at least 51 sequences were hashed in order to produce the hash.

This trick allows us to effectively set a limit on the number of inputs, which in turn allows us to write the original smart contract in such a way that all the outpoints that need to be put on the stack can fit well within clearly set boundaries.

Conclusion

In theory, the above ideas should allow us to build a taproot version of the bond. The key words there are: in theory.

James Dorfman (a colleague at Blockstream) and I have started a project called doubletake, where we are building an actual implementation of the bonds described in this writing.

Project state

The doubletake project currently has the following features implemented:

a working segwit-based smart contract to create bonds for segwit-based Bitcoin transactions
a spec format that uniquely identifies a bond so that the spec can be used across all interactions
- Note that the spec is still not stabilized, though!
Rust API for constructing bonds, burn bonds and reclaim bonds after expiry
a CLI tool for doing the same operations in a terminal
a WASM FFI for integration in web pages

Some drawbacks of the current implementation:

Like described above, the segwit version of the bond can only protect UTXOs with scriptcodes within the limit of 364 bytes.
The segwit bond only protects segwit v0 based UTXOs; the consumer of the bond should be aware of the fact that the bond has no effect on either legacy (pre-segwit) UTXOs or taproot ones.
Using a bond means that RBF can't be used, as this is technically a double spend. We decided not to attempt to implement RBF rules in our smart contract. This limitation means that a party receiving funds spent from a bond-protected UTXO should be aware that the fee-rate of this transaction cannot be raised.

Some work in progress:

a web app to create and interact with bonds (coming soon at http://zeroconf.me)
a taproot-supporting version of the smart contract as detailed in this post