Solving the Problem of Uniqueness
There is a solution - but we need to look at one that doesn't work first
Note: This is the second post of a 3-part series. Part 1. Part 3.
In the previous blog post, I presented the problem of uniqueness. As a short recap: there’s no way for singletons to check that a value (e.g., a name) hasn’t been used (e.g., registered) before. In this post, I’ll explain a previously proposed solution - and why it doesn’t work - as well as a solution that does work.
But first, back to the basics: let’s talk about hashing.
Hashing 101
There are many ways to describe hashing - but let me invent a new one based on AI’s rising popularity. Let’s imagine we have a robot that labels things - you give it an object, and it puts a label on it. Let’s also imagine that it’s 2100 - so the robot is powered by a combination of GPT-313, Gemini-7, and Grok-42. At this point, no one has any idea what the labels mean - but they’ve observed the robot puts the same labels for identical objects (i.e., two apples will have the same label). The robot also never uses the same labels for different objects (apples and pears will have different labels).
Hashing is just the digital equivalent of the Labeler 3000 robot:
Hashing algorithms (yes, there’s more than one - although SHA256 is arguably the most popular) take a piece of data (of any size!) and output a fixed-size ‘fingerprint’ of that data. They have two properties that make them special:
Deterministic: Given a piece of data, its hash is the same no matter the time or computer I run the hashing algorithm on. In other words, getting the same input through the same algorithm should produce the same output. Intuitively, it makes sense for a fingerprint to remain the same for a given object.
Not Reversible: Given an output - a hash value - it’s really hard to determine an input string that produces that output. More specifically, if I give you the hash ‘e378e72a75855c08d49b6421b04374a1c16d36f4f1cec3d06a3da38053a8fdcd,’ your best bet to determine what value produces it is to go through many values, hash them, and see if the outputs match (that string is the SHA256 of ‘yak’ - not that it matters).
The latter property leads to the comparison between hashing and a blender being frequently used. Blended apples would look the same (and differently from a blended oranges) - but good luck reversing the blender’s effect on the apple. Algorithms such as SHA256 have been carefully built to act like a digital blender. They take data in batches, and mix it into the output value in a very complicated way.
Now that we got that out of the way, let’s talk about one very cool use of hashing!
Merkle Trees
For every spend, the TibetSwap singleton has to run one of the following puzzles: add liquidity, remove liquidity, swap XCH to CATs, swap CATs to XCH. The goal is to make this mechanism take as little space as possible on-chain in order to minimize transaction cost.
The first place where hashing can be used is to ‘compress’ puzzles. Instead of storing the whole puzzle, we can store a hash (a ‘unique fingerprint’) of each puzzle in a list. When the singleton is spent, a full puzzle is given - we can hash that and see if the fingerprint is in the list of hashes that are allowed to run. It’s like implementing perfect biometric security for access into a building - if you have the wrong fingerprint, you’re not getting in.
However, we can do even better. Merkle Trees let us ‘compress’ the whole list via hashing. Here’s how this would be done for a list with 8 items:
The method to build a Merkle Tree is simple: you take every 2 items from the list and hash them together. That gets you from the bottom level (1, 2, …) to the second level (12, 34, 56, and 78). You can repeat this procedure until you only have one hash left - 12345678 in the image above. This ‘top’ hash is usually called the Merkle root. You have now stored the data in a very clever way: when you want to prove that an item is part of the original list, you only have to provide a few intermediary hashes, like below:
Let’s say you want to prove 3 is in the original list. By providing 4, 12, and 5678, someone can get at the Merkle root (top hash) from ‘3’ - all without knowing or using the whole list! The really elegant thing here is that nodes 1,2,5,6,7,8 are not included in the proof - instead, intermediary values such as 12 and 5678 ‘cover’ more than one node.
Let’s compare this to the ‘naive’ solution of hashing the list and accepting a reveal. In that case, you would have to provide all 8 hash values (1-8) to check if a puzzle is allowed to run. With Merkle Trees, you provide the puzzle hash (3) and 2 intermediary values (12, 5678) to compare against the Merkle Root. More generally, with Merkle Trees, you’re looking at log2(n) items instead of n for proofs of inclusion - which makes them a really great tool!
Previously Proposed Solution
We’re now in a good place to understand the best solution proposed for the problem of uniqueness so far. Back in 2023, Ken (also known as fizpawiz - a great Chialisp developer) published “Decentralized Names on the Chia Public Blockchain”. He suggested that we can use a Merkle Tree of sorted values to prove a name hasn’t been registered before - like the one below:
The idea is very elegant: to prove that 7 hasn’t been registered, prove that 5 and 10 have been registered, and that they’re next to each other in the current Merkle Tree. Because the list is sorted, you know for certain that 7 hasn’t been registered.
This idea has apparently also been proposed as a solution for the name service of Cardano. Unfortunately, it won’t work - while they’re good for proofs-of-inclusion, Merkle Trees are not practical for proofs-of-exclusion.
Let’s first assume we want to keep the new item - 7 - on the same ‘level’ of the tree. After the name is registered, the dApp will have to calculate the Merkle root of the new tree (i.e., one where 7 is part of the list) so future registrations can be processed as well. What intermediary hashes will have to be recomputed after 7 is inserted?
The answer is: most of them. When you’re inserting 7, you’re shifting the nodes on its right by one. Because you always hash nodes in pairs, all pairs on the right will change, and a lot of rehashing will need to be done. In practice, you end up having to reveal a majority of the initial list in many cases. That’s not practical because of size constraints - the more data your transaction contains, the more expensive it is. Additionally, blocks have a limited maximum size - at some point, you won’t be able to register some values because the data that needs to be revealed to register them doesn’t fit in a block.
There is, of course, another alternative to insert 7 - the one which Ken thought of when proposing the solution. What if, instead of keeping everything on the same level, we create a new one? When we insert 7, we can just turn the ‘5’ node into ‘5,7’. This sounds much better, as you only have to recompute log2(n) nodes, and you no longer have to reveal most of the tree.
Indeed, the problem in that scenario is smaller - but it’s still there. If your tree grows differently in different locations, it will become unbalanced. Assume someone also registers 5.1, 5.2, 5.3, 5.4, and so on. When I want to register 6, I’ll have to reveal all those values, plus 5 and 7. The 6 item would be added to a very ‘long’ root in an otherwise pretty shallow tree. This ruins the nice property of Merkle Trees that we’ve talked about - proofs are no longer log2(n). Moreover, this is a shared application - so someone could in theory ‘attack’ the dApp by registering 5.1, 5.2, etc. in order to annoy the person who’ll want to register 6.
There are, of course, other types of trees as well. Specifically, B-trees, which were created for databases, are very good at ‘auto-balancing’ the tree so insertions/deletions are still done in logarithmic time. But they also reveal another weakness: as we add more fixes on top of the tree idea, the code is getting more and more complex. That increases transaction cost, development time, and attack surface - isn’t there a much simpler alternative?
Trees are not the solution
I have myself been stuck down the rabbit hole of trees for a very long time - that’s why I know what a B-tree is in the first place. However, it’s important to keep in mind there are other data structures that could be used in our battle against the problem of uniqueness. So let’s go back to the basics.
We want something that ensures an item is unique - and we’ve seen that sorted lists are great because they allow us to prove uniqueness by looking at two neighboring items only (recall that, if we see 5 next to 10 in an ordered list, we can say for certain 7 hasn’t been registered before). However, we also want to insert new items into the list as easily as possible (ideally, O(1)). Allow me to introduce the data structure that can solve the problem of uniqueness: sorted, doubly-linked lists:
Doubly-linked lists are a known programming data structure with a simple idea: each node (item) also knows - aside from its own value - its neighbors. So, for example, node 5 will also contain the information that its left neighbor is 1 and its neighbor on the right is 10. What’s really cool is that, when you want to insert 7, you only have two update two nodes, which will be his future neighbors: 5 and 10. Specifically, you’ll ‘break’ the link between 5 and 10 by setting 5’s right neighbor to 7 and 10’s left neighbor to 7. Moreover, if you check 5 < 7 < 10, you can verify 7 hasn’t been registered before at the same time!
This is nice and would be a great way to solve the problem of uniqueness - if we properly store the list on-chain. Recall that each node contains 3 values (its own value + its two neighbors), and that we don’t want to have to reveal the whole list every spend. Let me give you a graphical hint on how we can do it:
The essential idea that makes it all work is to have each element be a coin. The registry singleton only interacts with node coins when they’re used - i.e., when they’re the future neighbor of a value being inserted. The user/wallet provides the relevant coins when registering - remember, just looking at two nodes is enough to make sure the value hasn’t been registered before. And that’s how you (simply) solve the problem of uniqueness.
One last thing to note is just how much sense this solution makes. After you register a value - say 20 - you only need to access it when one of its neighbors will change (i.e., to prove a value next to it hasn’t been registered before - and insert it). When it’s not needed, however, the coin containing 20 will remain unspent - which doesn’t increase the cost of a transaction on the underlying registry singleton at all. In other words, proving a value is not part of a list of 100 items is the same as proving uniqueness in a list of 1,000,000 items - which solves a huge concern of dApp developers (“will it scale?” - yes, this part of the dApp will).
Up next
When I presented this solution to other people (I had to make sure I wasn’t crazy), their initial reaction was “Wow! That’s clever!,” followed by “It’s so much simpler than I expected.” In my book, the simpler you can make a dApp, the better.
I started calling the coins that store the list items “slots”. I think slots will have much broader functionality beyond what I initially used them for, as singletons can essentially ‘save data for later.’ For example, slots would give a liquidity farm the ability to not only store the amount of liquidity tokens you added to the farm, but also the cumulative rewards at the time of deposit - making liquidity farms finally possible on Chia. Slots can do that without using a doubly-linked list structure - in a way, they become ‘tickets’ that liquidity depositors can later redeem for their tokens and accrued rewards.
Taking a step back, I do think this concept has lifted the final barrier to more advanced Chia dApps. I’m really excited for what’s next - so join me in Part 3 to see the PoCs I built to test out this new primitive’s capabilities.
Special thanks to stl and (or, maybe, a.k.a. 😉 ) Bob for reviewing this article.
This is a good idea. A lot of the details of implementation have to do with scope, specifically whether values can be proven on chain. If that's a requirement then a capability needs to be given to each coin to be able to report its value, and coins need to remember the values in their adjacent coins rather than coinids because the ids may change as they're used for reporting.
Could the simple text phrase representing the Name be in an NFT Metadata field of the NAME NFT that is minted ? Is is possible that NFT metadata can be used to route the money to the correct address without needing to hash it ? 🙏