Understanding substrate storage with Merkle proofs

This article is based on my understanding on how patricia merkle trie is implemented on substrate storage. If you find any misconception , feel free to clarify it on comments. And this article assumes you are familiar with substrate and Rust. And also installed Rust and cloned the substrate-node-template. You can follow this guide for fast setup.

Substrate framework is a gigantic piece of software and an encapsulated complexity design to favor developers to focus on developing their business logic rather than understanding the framework or at least how each piece fits together. Well, this is highly supported by the extensively usage of Rust macros. So things just work because you know they should work, unless you follow Bastian Kocher’s advice:

Well this type of complexity is somewhat described by Vitalik Buterin as it occurs when a system is comprised of sub-system which are internally complex but present a simple interface to the outside.

This is how substrate is, as it was intended to enable developers to focus on their business logic , that is why there are tons of APIs and a bunch of macros to simplify your work but in turns hide the complexity and giving a hard time understanding how each pieces fit together. And that is why we are going to try to understand how storage on pallets work by working with Trie’s raw API, Storage proofs and Polkadot Js tool.

Note this is a work on progress on trying to understand substrate . Some concept will lack clear explanation, but stack-exchange is here for the rescue.

Pallet’s Storage items

Substrate provides 4 storage API’s , StorageValue (for storing single data without specifying a key), StorageMap( acting like a Rust HashMap), StorageDoubleMap( as the name suggest this allows for a single value to have 2 different keys) and CountedMap( a hashmap but having N set of keys).

We are going to focus on StorageMap throughout the article.

// pub struct StorageMap<
	// 	Prefix,
	// 	Hasher,
	// 	Key,
	// 	Value,
	// 	QueryKind = OptionQuery,
	// 	OnEmpty = GetDefault,
	// 	MaxValues = GetDefault,
	// >PhantomData<(Prefix, Hasher, Key, Value, QueryKind, OnEmpty, MaxValues)>;

#[pallet::storage]
pub type Example = StorageMap<_,Blake2_128Concat,Key,Value,ValueQuery>;

We wont cover the documentation of the macro, for reference here it is. An for the macro expansion , cargo-expand command will do.

So from the above code sample the storage item’s key is generated using this instructions

twox_128(pallet_name) + twox_128(storage_name) + Blake2_128concat(key)

The hash of pallet_name and storage_name use a non-cryptographic hasher , and this act as a key prefix. This enables one pallet to have multiple storage instances.

The key and values generated and obtained from the instance are stored inside Trie data structure for easily verification of data inclusion inside blockchain state.

Trie data structure

Substrate implements a Base 16 Merkle Patricia Trie this uses a key as a path to the node containing a value. Shawn Tabrizi did a good job explaining it in as in this article it will focus on proofs.

A Trie comprises of different types of nodes, each contains a piece of information about key slice ( nibble), optional children nodes and a value except for leaf node which only contains a nibble and a value.

Merkle Proofs

This can be better illustrated on a working substrate node. For simplicity we will be using a node-template without any custom pallet implementation, follow this guide for installing and compiling the fresh node. Facing any errors head over stack-exchange.

Start your local node and connect to Polkadot Js. And it will look like this.

Using Merkle proofs we will get a total issuance from the Balances pallet. First, create a new Rust library and add these crates.

use sp_runtime::testing::H256;
use hex_literal::hex;
use sp_runtime::traits::BlakeTwo256;
use sp_trie::{LayoutV0, StorageProof};

hex macro will convert hexadecimal number to an array of bytes, H256 will be used to represent state root type, and BlakeTwo256 will be used as a hasher when initializing TrieDB.

Head over polkadot js → developer →chain state → choose pallet balances and also TotalIssuance and obtain the encoded key.

let balance_total_key = hex!("c2261276cc9d1f8598ea4b6a74b15c2f57c875e4cff74148e4628f264b974c80").to_vec();

This key will act as a path to the desired node and fetch the value.

But we need to construct those nodes and form a partial Trie. As in substrate, there is one state Trie which encodes all storage information, but we need only nodes which are associated by the above key. So go to polkadot js → Developer → RPC → choose state and getReadproof method which takes in key and optional block hash.

After that take the array of proofs returned and each convert into array of bytes using hex macro.

let proof_nodes = vec![

hex!("80aeb98035f7cde2e1a25c6eec689191e08e896725f959a7ef6b0cbb31f516ff48a4b1fa80bdcbffecab78bafa245dc3b3127950dff1fe1dd149e9a996ee027511ffe72b9c8095af6b976dcda7575f3a70d78069e7fc1c1b939f4e74dcb54fd591fa9cbd86fa80a01c55f70abfd0e73c4706ecb909c731ff76c4b5d33211f2f5ccf2010200ff7580b828e0de1a7374ec1d8d6343c88da9c947c03575e80b563961ef45cc55892404806dacde93a8261b37191a4eb13e234784e0b480f7a66fe81987c0020df225c52c80a76457504800f92658d8e66d0e1124e5a3707b78de8d1e04388f62b12574dc8a80cf3779122367a4cb15bf42f3f08fb5a9758931873d8b188c9460bdfb0af7ea6580ae03a0a1554af614c8f399ecc27b3000a22b51a7cf5ea30b119723793318264e80576a54d973aa04351247342d02f7578148c2bc3a13a24b6ea531acdbc96a5f0c").to_vec(),

hex!("9f02261276cc9d1f8598ea4b6a74b15c2f38004c5f008ce9615de0775a82f8a94dc3d285a10401505f0e7b9012096b41c4eb3aaf947f6ea429080000805c8554a5bfdfd147a3ead2c645b463b10107d90df0d9058942901196d162aaee").to_vec(),

 hex!("5f07c875e4cff74148e4628f264b974c80406ff2a5e9ffffff3f0000000000000000").to_vec(),
];

You may get different nodes value but the process is the same. So what are these hexadecimals numbers?

As we stated earlier Trie consists of encoded nodes. Each value consist of different type of serialized node, we cannot get the original structure because parity_scale_codec does not store the context of the serialized data. Below is the overview of these nodes representation

Untitled (1).png

So each one of those numbers represent or atleast resemble the above diagram. But we cannot get a value without constructing a partial Trie. So do the following.

let database = StorageProof::new(proof_nodes).into_memory_db();

we are constructing a memory_db from these storage_proofs. A memory Db act as backend database. This DB implements HashDB trait which takes in a value hashes and returns a key.

So in the backend each node will be hashed as hash( node) and a resulting key will be used to fetch the value. So in sense we have two types of keys , one acting as a path to a node and one for getting value and that one is actually stored in a database.

So after constructing a memory DB, what is remaining is a TrieDB. A TrieDB provides a persistent read access from given key. But also you must provide which key hasher you want to use and a Trie Layout. For now we are using V0.

Note that a Trie structure is virtually constructed image. But it is just and array of arrays.

So the block used to fetch the Read proofs should be used to obtain the state root. Head over Polkadot js → Network → explorer →Block details

let state_root = H256::from(hex!("07a6a5f080a7e087b23839f1b135e37ff2ceb9f5391a26bec5240fbda78d0b4d"));

The storage state root should have a representation of a hash. As a state root result from recursive hashing of all nodes inside the Trie.

So after that construct a Trie DB.

let trie_database = sp_trie::TrieDB::<LayoutV0<BlakeTwo256>>::new(&database, &state_root).unwrap();

After this we are ready to fetch our stored value.

TrieDB has several APIs for interacting with.

let mut value = trie_database.get(&balance_total_key).unwrap().unwrap();

The value returned is encoded value , but we can decode it by using parity scale codec library.

let decoded_value:u64 = parity_scale_codec::Decode::decode(&mut &value[..]).unwrap();

As said earlier you need to explicitly annotate the return value after decoding because parity scale codec does not have the context and the structure of the decoded and encoded value.

You can assert the value and it will be equal to the total balance issued 4,611,686,018,427,387,904.

So I hope you can get a somehow clear picture on how substate storage is . The next article will focus on Child Storage ,verifying of root and looking into light client how it verifies the state inside the blockchain.

For reference

Parity Trie → https://github.com/paritytech/trie