Let’s start by saying that I’m an admirer of the blockchain technology and a very early observer of it. Please note I said “early observer”, and not necessarily “early adopter”. When it comes to blockchain, I’m more of a “just in time adopter”, to be honest, and not an early one, because, sometimes, early success might be bad. But that’s another topic.
Beyond the speculative bubble that is engulfing the entire crypto space every 3 years or so (around the “halving” time, for those Bitcoin-savvy), there are a lot of interesting topics to ponder about in this area. So, if you’re here for price predictions, technical analysis (or drawing lines on charts and pretending the reality follows them), well, it’s safe to leave now. But if you’re here for some thought provoking hypothesis related to the overall security of the blockchain, specifically of Bitcoin’s blockchain, grab a coffee. It will take at least 10 minutes.
What Breaking Bitcoin Means?
In my opinion, breaking Bitcoin means proving that access to tokens located at a certain address is unsafe. It doesn’t necessarily mean to break the SHA-256 / RIPEMD-160 algorithms per se (don’t worry if you don’t know what these algorithms are, for now).
In mathematical terms it means to find a repeatable and predictable collision in the address space.
In layman terms, it means finding “the password” or the private key of a Bitcoin address.
Even if this particular event is very difficult to achieve and even if it requires a lot of computing power (the kind that just a nation-state actor can harness), the result will be that Bitcoin is breakable. The value stored in the blockchain is built on top of the trust invested in, and proved by the blockchain. If this trust decreases, or has provable chances to decline, then the trust will decline as well.
It’s important, for me, to start with this definition, because it implies many other components. First of all, it’s not about proving theoretically that cryptography behind it is weak, because it isn’t. I am very well familiar with terms like “untractable problems”, “polinomial times”, “double hashing” and so on and so forth. It’s not about that, because that is solid, it has been audited many times. And second, Bitcoin is not a theoretical product, it’s became a practical asset, one with a market cap around one trillion dollars. Any shakeup to this asset can have massive ripples in many directions. As you will see below, if we can prove that access is unsafe, by using a variety of tools, even if the cryptography is solid, then it is breakable.
So, what would breaking Bitcoin would involve, in a scenario which doesn’t rely solely on breaking the theoretical crypto behind it?
Quantum Computing? Not Yet
First thing that comes to my mind is quantum computing. Because of the way computing is done in these computers, some cryptography problems might be a bit easier to solve than others. Each type of computing machine is better suited for specific tasks. Using traditional computing machines to solve crypto problems isn’t practical, but putting quantum computers at work for that might be relevant.
I don’t think this is an avenue worth checking, at least not in the next 3-5 years. The cost to that is still prohibitive, not only to use it, but to scale it to the amount of work required to tackle such a problem.
So, brute force with the most appropriate tool, quantum computing, is out of reach, at least for now.
Introducing Generative Adversarial Networks (GANs) And A Bit Of Adjustment
If you’re not up to date with AI, specifically with Generative Adversarial Networks, you should change that. This is a very recent area of AI, but it evolves really fast.
Very shortly put, GANs are AI structures comprised of two actors: one that tries to fool, and other that checks the first one for falsified stuff – the latter has access to the “true” objects as well, while the former doesn’t, the forger doesn’t even know what it falsifies. It’s common to use, in the learning process, the terms “art forger” and “art critic”, to distinguish between the two.
The process usually starts with some noise from the part of the “forger” and with answers from the “critic”. The forger tries to guess what exactly should paint, in order to fool the critic. It’s a fascinating process, especially because it may start with just some noise, and get to unbelievable levels of accuracy. One of the most famous results of such an AI model is thispersondoesnotexist.com. Every time you load that page, the model generates a person that doesn’t exist, but who is extremely plausible. On a side note, GANs applicability area is immense and I might write something about this in the future, but for now let’s see how this could be used for testing Bitcoin.
Suppose we want to train a forger to generate not human faces (which are, basically, still collections of bits), but private keys. It will start with just some noise, and then it will continue to talk to the critic, (the other part of the model), which already has a huge repository of real private keys – you can generate millions of that.
Let’s add some restrictions to this, because, like I said, we’re not intending to break the SHA256 protocol itself – I doubt this will work for such a task – but just a subset of it. So, not only the model will be trained on a data set of about 400 millions public / private keys (which is the dimension of the Bitcoin network at this moment), but the results will also be checked against an even smaller subset, namely all the public keys from satoshi era. I believe we’re under 10 millions of keys in the first 2-3 years. This iteration will make things go significantly faster, saving on computing power.
So, the process will go like this:
- generate test data of the size of the current Bitcoin network (400 millions public / private pairs)
- start training the forger until it gets “close enough”, by making it talk to the critic, which will evaluate the forger against its huge set of key pairs
- as the forger gets closer, run the results against the 10 million public keys in the satoshi era, see if any of them “fits”
- optional: run the result on a separate thread also, against all the remaining 390 millions addresses of the network
As you can see, it’s a combination of GANs, social engineering and brute force. The brute force part is still huge, if we want to do this properly, but not outside the reach for some powerful entities, from corporations (Google trained CPT on trillions of parameters) to nation-state actors.
I lack the mathematical skills to properly model this, and that’s one of the reasons I’m putting this out, in the hope that someone better equipped in this area can run some analysis on it, but my intuition (which may be wrong) tells me we have a much higher probability to find a private key in this setup, rather than trying to break the entire algorithm.
Of course, we may also have a much lower probability, that’s also an option.
But I think it’s worth thinking about it, especially about the data set that we use to train the forger.
What If BItcoin Is Unbreakable?
Well, if we find that this is not working, all good, back to our lives.
What If Bitcoin, As It Is Now, Is Breakable?
If this scenario works, then we have a problem. Here’s how I think it can be solved.
If the size of the trainable model is a parameter in the probability of discovering the private keys, then we should adjust the size of this attack surface. In other words, we know that an AI model performs better when there is more data to train on. So, having a fixed repository of 400 millions public addresses that can, theoretically, be used to check a generated private key against them, will make the model perform better. Let’s just lower this attack surface, then. Less data should make the model perform worse.
How can we do this? Apart from increasing the complexity of the generated keys, which is obvious, I think we can also do this by introducing something that I call “permanent limited supply”.
How does this work?
Well, Bitcoin will still have a limited supply, only it will be limited in time. It will still be 21,000,000 tokens, but, and here’s where the part “permanent” comes in, not since the genesis, but in the last, let’s say 10 years. The supply will be kept at 21,000,000 by “moving it” along a time window. As we advance in time, coins outside this window will become obsolete, untransferable. This can be done relatively easy, by allowing coins to be liquid, or “transferable”, only from a certain block height onwards.
That means people who have coins that will “expire” will have to move them to a new address, which will be added to a higher block height, thus escaping obsoletion. This, again, can be done either by users themselves (kinda difficult, but it’s nice to have it as an option) or by a separate, programmable entity. It can be a layer 2 app on top of blockchain, or even a routine added in a hardfork, on-chain.
This will always generate fresh addresses, while obsoleting the expired ones, which means it will put more pressure on the GANs to train for these new addresses (it lowers the quality of the data set by increasing its entropy), and, from a certain point, this will become ineffective / too expensive for the GAN.
Another interesting economical consequence of a permanent limited supply is that fresh Bitcoin will keep being generated, as the unused, obsoleted one is “left behind”. In our current setup, this will be true for a good number of years, since the satoshi era BTC is, allegedly, unusable – Satoshi Nakamoto disappeared (or we will finally find out he didn’t disappear, once he starts moving his coins to higher block heights?). But even as more people will be wary about keeping their Bitcoin “valid”, and will move it “forward”, new BTC should still be generated, to account for the one used in transaction fees.