Skip the joins with Semantic ABI
An open-source library for analytics-ready decoding of EVM transactions.
One of the hardest parts of understanding on-chain Web3 usage is just getting the data into the right shape for analysis, be it with SQL in Dune or through clicks in HyperArc. Even after decoding the hex logs and traces of a transaction with their ABIs, you’re left with semi-structured JSON of arbitrary nesting due to struct and array fields.
tl;dr Annotate your ABIs and skip all the SQL and slow runtime joins with this.
Why is Everything a Table?
To handle this semi-structured JSON, most Web3 data tools try to normalize everything into tables. A decoded project like Lens Protocol will result in dozens of tables, one for each event and function.
Even with these normalized to the lowest common denominator of a single table, more complex (but common) fields of struct and array types are still left as hard-to-use “blobs”. This forces things like orders (a struct with nested arrays of more structs) in the Seaport fulfillAvailableOrders call to be jammed into a single column as an array of strings that needs to be parsed out manually later in SQL.
After all this work to normalize, these single tables are not actually useful on their own. The next step is to denormalize and bring them back together through joins and unions so that the final “curated” table is ready for analytics and has all the fields needed to understand the behaviors and users of the project. This is usually done by joining on the hash of the transaction that produced the rows in all these tables, a pretty expensive high-cardinality operation.
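To make the round trip concrete, here is a rough Python sketch of that normalize-then-rejoin pattern. It is not how any particular tool implements it: the transactions, hashes, addresses, and the helper names normalize and join_on_tx_hash are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical decoded transactions; hashes, addresses, and field names are
# made up for illustration.
transactions = [
    {
        "hash": "0xaaa",
        "events": [
            {"name": "OrderFulfilled", "offerer": "0x01", "recipient": "0x02"},
            {"name": "Transfer", "fromAddress": "0x01", "toAddress": "0x02"},
        ],
    },
    {
        "hash": "0xbbb",
        "events": [
            {"name": "OrderFulfilled", "offerer": "0x03", "recipient": "0x04"},
            {"name": "Transfer", "fromAddress": "0x03", "toAddress": "0x04"},
            {"name": "Transfer", "fromAddress": "0x04", "toAddress": "0x03"},
        ],
    },
]


def normalize(transactions):
    """Split every transaction into one table (list of rows) per event name."""
    tables = defaultdict(list)
    for tx in transactions:
        for event in tx["events"]:
            row = {key: value for key, value in event.items() if key != "name"}
            row["tx_hash"] = tx["hash"]
            tables[event["name"]].append(row)
    return dict(tables)


def join_on_tx_hash(left, right, right_prefix):
    """Hash join of two tables on tx_hash, prefixing the right-hand columns."""
    index = defaultdict(list)
    for row in right:
        index[row["tx_hash"]].append(row)
    joined = []
    for left_row in left:
        for right_row in index.get(left_row["tx_hash"], []):
            merged = dict(left_row)
            for key, value in right_row.items():
                if key != "tx_hash":
                    merged[f"{right_prefix}_{key}"] = value
            joined.append(merged)
    return joined


# Break the transactions apart, then stitch them back together by hash.
tables = normalize(transactions)
curated = join_on_tx_hash(tables["OrderFulfilled"], tables["Transfer"], "transfer")
```

Every curated row ends up holding fields that already sat side by side inside the original transaction, which is the circular trip in miniature.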
Kinda went in a circle there...
We broke down a nicely structured transaction into dozens of tables just to painstakingly stitch it back together by transaction to make sense of the transaction.
Skip the Tables
What we really want is to skip the intermediate tables and get to the final dataset directly from the hierarchical logs and traces of a transaction. A smart contract ABI is the schema of this structure, so let’s start there and just give it a bit more meaning.
The Semantic ABI does just this: it allows us to annotate an ABI with additional meaning (semantics) so that we can get to an analytics-ready dataset in a single shot.
Explosive Unions
We often need to compare or aggregate data from different events or functions, which requires us to union and normalize multiple tables. We can do this easily with @isPrimary annotations on multiple items in a Semantic ABI, but let’s make it a little harder with one function containing an array.
Let’s create a single dataset containing orders from both fulfillAvailableAdvancedOrders and fulfillBasicOrder_efficient_6GL6yc traces for all Seaport transactions. The first step is to @explode advancedOrders in fulfillAvailableAdvancedOrders, creating a single row for each item in the array. Explode automatically applies values hierarchically, so fields like fulfilled will be copied to every exploded child row.
All we need to do now is tag both as @isPrimary, which will align basic and exploded fields by name, with manual alignments possible through name in @transform, such as for parameters_orderType. The resulting table will contain all orders from both functions, ready for analytics.
The final Semantic ABI will look something like:
{
  "metadata": {
    "chains": ["ethereum"]
  },
  "abi": [
    {
      "name": "fulfillAvailableAdvancedOrders",
      "@isPrimary": true,
      "@explode": {
        "paths": ["advancedOrders"]
      },
      "inputs": [
        {
          "components": [
            {
              "components": [
                {
                  "internalType": "address",
                  "name": "offerer",
                  "type": "address"
                },
                {
                  "internalType": "enum OrderType",
                  "name": "orderType",
                  "type": "uint8",
                  "@transform": {
                    "name": "parameters_orderType"
                  }
                },
                ...
              ],
              "internalType": "struct OrderParameters",
              "name": "parameters",
              "type": "tuple"
            },
            ...
          ],
          "internalType": "struct AdvancedOrder[]",
          "name": "advancedOrders",
          "type": "tuple[]"
        },
        ...
      ],
      "outputs": [
        {
          "internalType": "bool[]",
          "name": "fulfilled",
          "type": "bool[]"
        },
        ...
      ],
      "stateMutability": "payable",
      "type": "function"
    },
    {
      "name": "fulfillBasicOrder_efficient_6GL6yc",
      "@isPrimary": true,
      "inputs": [
        {
          "components": [
            {
              "internalType": "address payable",
              "name": "offerer",
              "type": "address"
            },
            {
              "internalType": "enum BasicOrderType",
              "name": "basicOrderType",
              "type": "uint8",
              "@transform": {
                "name": "parameters_orderType"
              }
            },
            ...
          ],
          "internalType": "struct BasicOrderParameters",
          "name": "parameters",
          "type": "tuple"
        }
      ],
      "outputs": [
        {
          "internalType": "bool[]",
          "name": "fulfilled",
          "type": "bool[]"
        }
      ],
      "stateMutability": "payable",
      "type": "function"
    }
  ]
}
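For intuition, the explode-then-union behavior described above can be approximated in a few lines of plain Python. This is only a sketch of the idea, not the Semantic ABI library's code or API: the helpers flatten, explode, and to_row are hypothetical, the underscore-joined column names are an assumption, and the decoded call values are invented.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining names with '_' (assumed naming)."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def explode(record, path):
    """Return one record per element of record[path], copying sibling fields to each."""
    siblings = {key: value for key, value in record.items() if key != path}
    return [{**siblings, **element} for element in record[path]]


def to_row(record, renames):
    """Flatten a record and apply manual renames, standing in for name in @transform."""
    return {renames.get(name, name): value for name, value in flatten(record).items()}


# Invented decoded calls shaped like the two functions in the ABI above.
advanced_call = {
    "advancedOrders": [
        {"parameters": {"offerer": "0xA1", "orderType": 0}},
        {"parameters": {"offerer": "0xB2", "orderType": 2}},
    ],
    "fulfilled": [True, False],
}
basic_call = {
    "parameters": {"offerer": "0xC3", "basicOrderType": 1},
    "fulfilled": [True],
}

# One row per advanced order, with fulfilled copied onto every exploded row.
advanced_rows = [to_row(r, {}) for r in explode(advanced_call, "advancedOrders")]
# The basic order needs one manual rename so its order type lines up by name.
basic_rows = [to_row(basic_call, {"parameters_basicOrderType": "parameters_orderType"})]

# The union is a plain concatenation once the column names agree.
orders = advanced_rows + basic_rows
```

Because both sides end up with the same column names, the union needs no further normalization step.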
Dependent Joins
Let’s do some more analysis, on Seaport conduits this time. To do this we will need to join the fulfillAvailableAdvancedOrders trace, which contains the conduit field, with the OrderFulfilled log to validate which orders were actually fulfilled. Since we’re mildly paranoid, let’s also verify this with an actual transfer log.
Instead of complex joins, we simply use the @matches annotation, which will find events, traces, or transfers within the same transaction (much more performant than joins in SQL) to bring together all the fields required for analysis into a single dataset. No further joins required.
Matches are applied sequentially, so we can first match the single fulfill trace with multiple OrderFulfilled events and then match each resulting row with a single expected transfer for verification. We can be explicit with verifications by asserting on expected cardinality with many vs. onlyOne.
Here’s what that Semantic ABI will look like.
{
  "metadata": {
    "chains": ["ethereum"]
  },
  "abi": [
    {
      "name": "fulfillAvailableAdvancedOrders",
      "@isPrimary": true,
      "@matches": [
        {
          "type": "event",
          "signature": "OrderFulfilled(...)[])",
          "prefix": "fulfill",
          "assert": "many",
          "predicates": [
            {
              "type": "equal",
              "source": "recipient",
              "matched": "recipient"
            }
          ]
        },
        {
          "type": "transfer",
          "prefix": "transfer",
          "assert": "onlyOne",
          "predicates": [
            {
              "type": "equal",
              "source": "fulfill_offerer",
              "matched": "fromAddress"
            }
          ]
        }
      ],
      "inputs": [
        ...
        {
          "internalType": "bytes32",
          "name": "fulfillerConduitKey",
          "type": "bytes32"
        },
        ...
      ],
      "outputs": [...],
      "stateMutability": "payable",
      "type": "function"
    },
    {
      "name": "OrderFulfilled",
      "anonymous": false,
      "inputs": [...],
      "type": "event"
    }
  ]
}
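To see why sequential matching inside one transaction replaces the joins, here is a small Python sketch of the idea. It is not the library's implementation: apply_match is a hypothetical helper, the sample trace, events, and transfers are invented, and two behaviors are assumptions on my part, namely that many means "at least one" and that a failed assertion raises an error.

```python
def apply_match(rows, candidates, prefix, predicates, expect):
    """Extend each row with every candidate that satisfies all equality predicates.

    predicates is a list of (source_field, matched_field) pairs.
    expect is "many" (assumed: at least one hit) or "onlyOne" (exactly one hit).
    """
    matched_rows = []
    for row in rows:
        hits = [
            candidate
            for candidate in candidates
            if all(row.get(source) == candidate.get(matched) for source, matched in predicates)
        ]
        if expect == "onlyOne" and len(hits) != 1:
            raise ValueError(f"expected exactly one {prefix} match, found {len(hits)}")
        if expect == "many" and not hits:
            raise ValueError(f"expected at least one {prefix} match, found none")
        for hit in hits:
            prefixed = {f"{prefix}_{key}": value for key, value in hit.items()}
            matched_rows.append({**row, **prefixed})
    return matched_rows


# Invented contents of a single transaction.
trace = {"recipient": "0xR1", "fulfillerConduitKey": "0xkey"}
events = [
    {"offerer": "0xA1", "recipient": "0xR1", "orderHash": "0x01"},
    {"offerer": "0xB2", "recipient": "0xR1", "orderHash": "0x02"},
    {"offerer": "0xC3", "recipient": "0xR9", "orderHash": "0x03"},
]
transfers = [
    {"fromAddress": "0xA1", "toAddress": "0xR1"},
    {"fromAddress": "0xB2", "toAddress": "0xR1"},
    {"fromAddress": "0xZ9", "toAddress": "0xR9"},
]

# Step 1: one trace row fans out to every OrderFulfilled event with the same recipient.
fulfilled = apply_match([trace], events, "fulfill", [("recipient", "recipient")], "many")
# Step 2: each resulting row must pair with exactly one transfer sent by the offerer.
verified = apply_match(fulfilled, transfers, "transfer", [("fulfill_offerer", "fromAddress")], "onlyOne")
```

Since the candidates all come from the same transaction, each lookup scans a handful of items instead of joining whole tables on a high-cardinality hash.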
The Rest
There are a bunch more features, including adding derived fields through expressions, and the library is implemented in both Python and TypeScript. Take a look and contribute here!