web3 data permanence

when i put data on the public internet, like assets in an nft, i want the data preserved and available for as long as human beings are using the internet, which is really the definition of permanent we’re talking about when we talk about nfts and web3

here’s juan talking about how, at protocol labs, we think about storage that can persist through the future of human civilization

storage guarantees are something i’ve been thinking about for a long time. back in 2010 i worked on CouchDB and i wrote a browser-compatible version of it called PouchDB. i spent the next 10 years working with and writing bespoke storage systems using whatever ideas were percolating at any given time about how to build such systems

that was all really fun 😃

but i didn’t gain the perspective i needed for web3 until i joined protocol labs and came to understand the full impact of content addressed networks

in IPFS, we use this address called a CID, and it’s quite brilliant in terms of what it can address. not just what it can do now, but into the future. it’s a truly universal content address and the “IPFS network” is simply the means by which you get the data for these addresses. that means it’s not a fixed protocol, it can grow and evolve in the same way new routing protocols emerged as the scale of the internet grew

as such, IPFS has a much wider view, what we should probably call a “global” or “internet scale” view of data addressing, and this has had a large effect on how i think about storage guarantees. because you don’t ask a storage system for data in IPFS, you ask the internet for data, and anyone on the internet can give it to you. that’s what open permissionless protocols look like, that’s web3

in all the systems i previously built, the storage system would return a “guarantee” that data was sufficiently stored in that system. you could then come to understand how safe your data was by understanding what was included in that guarantee (multiple copies, multiple locations, fully consistent or eventually consistent, etc)

when we built web3.storage and nft.storage we built them in a similar way: the services don’t return a successful response until multiple copies of the data are securely stored and made available from multiple provider nodes. i’m not saying this to brag, this is table stakes for a storage system, but it’s important to be clear about where systems built for the future, like IPFS, and traditional systems agree

these services, and other services like Pinata, are all comparable to the cloud services we’ve used on the web for a long time and have come to rely on

but these systems alone no longer satisfy the permanence i want to see for my data in web3, because IPFS has greatly broadened what i consider “permanent”

permanent, to me, means being able to persist beyond the life-cycle of a storage system. even storage systems that make claims to hundreds of years of persistence are relying on optimistic views of their own system that i shouldn’t subscribe to and should really be hedging against

i want my data in several storage systems, and i want to continue to add additional copies into additional storage systems as they arise

until IPFS, there wasn’t really a way to accomplish this, it’s hard to make guarantees that your data will just copy into future systems. but IPFS does this by not binding any of the content discovery or transport protocols to the address format. in other words, an IPFS address can be provided by anyone at any time now or in the future because it doesn’t include or imply a specific provider or even the means by which you would find a provider
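here’s a toy sketch of that idea in python. the address here is just a sha-256 hex digest rather than a real CID, and the providers are hypothetical functions standing in for any storage system or transport, so treat it as an illustration and not how IPFS actually does it. the point is only that the reader checks the bytes against the address, so it doesn’t matter who handed them over

```python
import hashlib


def address_of(data: bytes) -> str:
    """Toy content address: the sha-256 hex digest of the bytes (not a real CID)."""
    return hashlib.sha256(data).hexdigest()


def fetch(address: str, providers) -> bytes:
    """Ask each provider in turn and return the first bytes that match the address.

    `providers` is any iterable of callables taking an address and returning
    bytes (or None). They are hypothetical stand-ins for storage systems.
    """
    for provider in providers:
        try:
            data = provider(address)
        except Exception:
            # this provider is gone or broken, move on to the next one
            continue
        # the address itself tells us whether the bytes are the right ones
        if data is not None and address_of(data) == address:
            return data
    raise LookupError(f"no provider returned valid data for {address}")
```

nothing in `fetch` knows which provider will answer, so a storage system that shows up ten years from now can be added to the list without changing the address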

IPFS addresses are just hashes you download from the internet. there’s a little more going on than that, and what is going on is really cool, but as we already know from decades of open source, the best way to hedge against future changes is to decompose into smaller parts that address individual problems

so IPFS supports all future storage blockchains and transports by not baking one into the address. it also hedges against whatever hashing functions might be used in the future, because the address describes its own hash function
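to make the self-describing part concrete, here’s a rough sketch in python. it’s a simplified take on the multihash layout (a code for the hash function, then the digest length, then the digest), not the real IPFS implementation. the function names are made up for illustration, and it only handles codes and lengths that fit in a single byte, where the real format uses varints

```python
import hashlib

# multicodec codes for two hash functions (both fit in a single byte)
HASHES = {
    0x12: hashlib.sha256,  # sha2-256
    0x13: hashlib.sha512,  # sha2-512
}


def encode_multihash(code: int, data: bytes) -> bytes:
    """Prefix the digest with the hash function code and the digest length."""
    digest = HASHES[code](data).digest()
    return bytes([code, len(digest)]) + digest


def verify(multihash: bytes, data: bytes) -> bool:
    """Read the hash function out of the address itself, then check the data."""
    if len(multihash) < 2:
        return False
    code, length, digest = multihash[0], multihash[1], multihash[2:]
    if code not in HASHES or len(digest) != length:
        return False
    return HASHES[code](data).digest() == digest
```

because `verify` reads the hash function from the first byte, supporting a new function later means adding an entry to the table, and every old address keeps working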

it’s pretty brilliant stuff, and it pre-dates me having worked on it so i’m not just talking up my own work

i want to put my data on every web3 storage system that comes around, but a few of them have started to do this strange thing where they position their system in opposition to IPFS even when it’s easily compatible. since IPFS is not designed to handle provider guarantees (that’s what Filecoin does) it makes a very compelling strawman: you can say it doesn’t do a bunch of things your project does because it’s not supposed to

we all love Filecoin, but i don’t think anyone believes it’ll ever be the only storage chain. that’s why it just provides IPFS-addressed data to protocols designed around IPFS, so that other chains and service providers can all easily provide into the same network and users can move their data around transparently