Log Format & Storage
This document describes the format and structure of logs generated by the memlogger.
WAL Directory Structure
Overview
$CHAIN_DIR/data/log.wal/
└── node-<node-id>/
    └── <yyyy-mm-dd>/
        ├── seg-NNNNNN.wal.gz
        └── seg-NNNNNN.wal.idx
Components
Node ID Directory
node-e687dd88b46b950a919304190786e03f667347ce/
- Purpose: Isolate logs per node
- Format: node-<hex-node-id>
- Source: Derived from node's validator key
- Benefits:
- Multi-node support on same filesystem
- Clear attribution of logs
- Prevents cross-contamination
Date Directory
2025-11-23/
- Purpose: Daily log rotation
- Format: YYYY-MM-DD
- Timezone: UTC
- Benefits:
- Easy archival by date
- Bounded directory sizes
- Simple retention policies
Segment Files
seg-000001.wal.gz # Compressed log data
seg-000001.wal.idx # Index for seeking
- Purpose: Store compressed logs and enable efficient replay
- Naming: Sequential numbering within each day
- Format: 6-digit zero-padded (000001, 000002, …)
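The naming scheme above is easy to reproduce in code. A minimal Go sketch (segmentPath is a hypothetical helper, not part of the memlogger API):
package main

import (
    "fmt"
    "path/filepath"
    "time"
)

// segmentPath assembles the on-disk location of a segment: node directory,
// UTC date directory, and zero-padded sequence number.
func segmentPath(chainDir, nodeID string, day time.Time, seq int) string {
    return filepath.Join(
        chainDir, "data", "log.wal",
        fmt.Sprintf("node-%s", nodeID),
        day.UTC().Format("2006-01-02"),      // YYYY-MM-DD in UTC
        fmt.Sprintf("seg-%06d.wal.gz", seq), // 6-digit zero-padded
    )
}

func main() {
    fmt.Println(segmentPath("/chain", "e687dd88b46b950a919304190786e03f667347ce", time.Now(), 1))
}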
File Formats
WAL File (.wal.gz)
Encoding: Gzip-compressed JSON lines
Structure:
<gzip header>
<compressed data>
<gzip footer>
Decompressed Content: Newline-delimited JSON (NDJSON)
Each line is a JSON object representing a log entry:
{
"level": "debug",
"ts": "2025-11-23T10:15:30.123456Z",
"msg": "state change",
"module": "store",
"height": 12345,
"store": "bank",
"operation": "write",
"key": "0x12ab...",
"value": "0x34cd..."
}
Index File (.wal.idx)
Purpose: Enable efficient seeking within compressed WAL
Format: Binary format with fixed-size entries
Entry Structure (example):
Offset: 8 bytes (uint64) - Position in .wal.gz file
Timestamp: 8 bytes (int64) - Unix nanoseconds
EventCount: 4 bytes (uint32) - Number of events
Checksum: 4 bytes (uint32) - CRC32 of block
Benefits:
- Fast seeking to specific time ranges
- Validation of data integrity
- Efficient replay without full decompression
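A minimal Go sketch of reading such an index, assuming the example 24-byte layout above with little-endian encoding (the struct and field order are illustrative, not a guaranteed on-disk contract):
package main

import (
    "encoding/binary"
    "fmt"
    "os"
)

// indexEntry mirrors the example 24-byte entry layout shown above.
type indexEntry struct {
    Offset     uint64 // position in the .wal.gz file
    Timestamp  int64  // Unix nanoseconds
    EventCount uint32 // number of events in the block
    Checksum   uint32 // CRC32 of the block
}

// readIndex decodes every fixed-size entry from a .wal.idx file.
func readIndex(path string) ([]indexEntry, error) {
    f, err := os.Open(path)
    if err != nil {
        return nil, err
    }
    defer f.Close()

    var entries []indexEntry
    for {
        var e indexEntry
        if err := binary.Read(f, binary.LittleEndian, &e); err != nil {
            break // io.EOF (or a short final read) ends the loop
        }
        entries = append(entries, e)
    }
    return entries, nil
}

func main() {
    entries, err := readIndex("seg-000001.wal.idx")
    if err != nil {
        panic(err)
    }
    fmt.Printf("index holds %d entries\n", len(entries))
}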
Log Entry Format
Standard Fields
All log entries include these fields:
| Field | Type | Description |
|---|---|---|
| level | string | Log level (debug, info, warn, error) |
| ts | string | ISO 8601 timestamp with nanosecond precision |
| msg | string | Human-readable message |
| module | string | Source module (store, consensus, etc.) |
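For typed access in Go, the standard fields map naturally onto a small struct. A minimal sketch (logEntry is a hypothetical name; event-specific fields such as height or store would be added as optional members):
package main

import (
    "encoding/json"
    "fmt"
)

// logEntry holds the standard fields shared by every entry.
type logEntry struct {
    Level  string `json:"level"`  // debug, info, warn, error
    TS     string `json:"ts"`     // ISO 8601 timestamp
    Msg    string `json:"msg"`    // human-readable message
    Module string `json:"module"` // source module (store, consensus, ...)
}

func main() {
    line := `{"level":"debug","ts":"2025-11-23T10:15:30.123456Z","msg":"state change","module":"store"}`
    var e logEntry
    if err := json.Unmarshal([]byte(line), &e); err != nil {
        panic(err)
    }
    fmt.Printf("%s [%s] %s\n", e.TS, e.Module, e.Msg)
}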
State Change Events
When state changes occur:
{
"level": "debug",
"ts": "2025-11-23T10:15:30.123456Z",
"msg": "store change",
"module": "store",
"height": 12345,
"store": "bank",
"operation": "write",
"key": "62616c616e636573...",
"value": "0a0b3130303030...",
"key_string": "balances/cosmos1...",
"value_decoded": {
"amount": "100000",
"denom": "uatom"
}
}
Additional Fields:
- height: Block height where the change occurred
- store: KV store name (bank, staking, gov, etc.)
- operation: Type of change (write, delete)
- key: Raw key bytes (hex-encoded)
- value: Raw value bytes (hex-encoded)
- key_string: Human-readable key (if decodable)
- value_decoded: Decoded value (if decodable)
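Because key and value are hex-encoded, recovering the raw bytes takes one standard-library call. A small Go sketch (decodeKey is illustrative; treating the result as a string only makes sense when the key is UTF-8 text):
package main

import (
    "encoding/hex"
    "fmt"
)

// decodeKey converts a hex-encoded key field back into raw bytes.
func decodeKey(hexKey string) ([]byte, error) {
    return hex.DecodeString(hexKey)
}

func main() {
    // "62616c616e636573" is hex for the ASCII prefix "balances".
    raw, err := decodeKey("62616c616e636573")
    if err != nil {
        panic(err)
    }
    fmt.Println(string(raw)) // balances
}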
Consensus Events
{
"level": "debug",
"ts": "2025-11-23T10:15:31.234567Z",
"msg": "consensus event",
"module": "consensus",
"height": 12345,
"event": "NewBlock",
"round": 0,
"proposer": "cosmosvalcons1..."
}
Block Commit Events
{
"level": "debug",
"ts": "2025-11-23T10:15:31.345678Z",
"msg": "block committed",
"module": "state",
"height": 12345,
"app_hash": "E3B0C44298FC1C14...",
"num_txs": 42,
"gas_used": 1234567,
"gas_wanted": 2000000
}
Message Filtering
When filter = true, only these message types are logged:
Allowed Messages
- State Changes
  - Store writes
  - Store deletes
  - State merkle updates
- Consensus Events
  - NewBlock
  - NewBlockHeader
  - ValidatorSetUpdates
  - Commit
- ABCI Events
  - BeginBlock
  - EndBlock
  - DeliverTx (with state changes)
- Critical Errors
  - Consensus failures
  - State machine errors
  - Panic/recovery
Filtered Out (when filter=true)
- Module initialization logs
- RPC request/response logs
- P2P connection logs
- Mempool transaction logs (unless committed)
- Routine info/debug messages
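The allow/deny split above can be expressed as a simple predicate. A hedged Go sketch (the module and message names matched here are assumptions for illustration, not the memlogger's exact rules):
package main

import "fmt"

// allowedMessage keeps state changes, consensus/ABCI events, and critical
// errors, and drops routine chatter such as RPC, P2P, and mempool logs.
func allowedMessage(module, msg, level string) bool {
    if level == "error" {
        return true // critical errors always pass
    }
    switch module {
    case "store", "consensus", "state":
        return true // state changes, consensus events, block commits
    }
    switch msg {
    case "BeginBlock", "EndBlock", "DeliverTx":
        return true // ABCI lifecycle events
    }
    return false
}

func main() {
    fmt.Println(allowedMessage("store", "store change", "debug")) // true
    fmt.Println(allowedMessage("p2p", "peer connected", "info"))  // false
}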
Compression
Gzip Configuration
Level: Default (6) - balanced compression vs. speed
Typical Ratios:
Raw JSON: 10.0 MB
Compressed: 0.8 MB
Ratio: 92% reduction
Performance:
- Compression: ~20-50 MB/s (CPU-dependent)
- Decompression: ~100-200 MB/s
- Negligible CPU impact (async operation)
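Writing a segment with these settings needs only the standard library. A minimal Go sketch (writeSegment is a hypothetical helper; gzip.DefaultCompression corresponds to level 6):
package main

import (
    "compress/gzip"
    "os"
)

// writeSegment compresses raw NDJSON bytes into a .wal.gz file.
func writeSegment(path string, ndjson []byte) error {
    f, err := os.Create(path)
    if err != nil {
        return err
    }
    defer f.Close()

    gz, err := gzip.NewWriterLevel(f, gzip.DefaultCompression)
    if err != nil {
        return err
    }
    if _, err := gz.Write(ndjson); err != nil {
        return err
    }
    return gz.Close() // flushes remaining data and the gzip footer
}

func main() {
    if err := writeSegment("seg-000001.wal.gz", []byte(`{"msg":"demo"}`+"\n")); err != nil {
        panic(err)
    }
}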
Why Gzip?
Advantages:
- Universal support
- Good compression ratio
- Fast decompression
- Stream-friendly
- Well-tested
Alternatives Considered:
- LZ4: Faster, but lower compression ratio
- Zstd: Better ratio, but less universal
- Snappy: Fast, but lower compression ratio
Segment Lifecycle
Creation
1. Buffer fills or interval expires
2. Create seg-NNNNNN.wal.gz.tmp
3. Compress and write data
4. Fsync to ensure durability
5. Rename to seg-NNNNNN.wal.gz (atomic)
The .tmp suffix prevents shipping incomplete files.
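The same create, fsync, rename sequence in Go, as a hedged sketch (writeSegmentAtomically is illustrative, not the memlogger's actual code):
package main

import "os"

// writeSegmentAtomically writes compressed bytes to a .tmp file, fsyncs it,
// then renames it into place so shippers never see a partial segment.
func writeSegmentAtomically(finalPath string, compressed []byte) error {
    tmpPath := finalPath + ".tmp"

    f, err := os.Create(tmpPath)
    if err != nil {
        return err
    }
    if _, err := f.Write(compressed); err != nil {
        f.Close()
        return err
    }
    if err := f.Sync(); err != nil { // fsync for durability
        f.Close()
        return err
    }
    if err := f.Close(); err != nil {
        return err
    }
    // Rename is atomic on POSIX filesystems.
    return os.Rename(tmpPath, finalPath)
}

func main() {
    // Placeholder payload; in practice this is the gzip-compressed NDJSON.
    if err := writeSegmentAtomically("seg-000002.wal.gz", []byte("compressed bytes")); err != nil {
        panic(err)
    }
}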
Rotation
New segment created when:
- Previous segment written successfully
- New flush occurs
- Day changes (new date directory)
Archival
After shipping to apphash.io:
- Segments can be safely archived
- Keep local copy for specified retention period
- Compress further for long-term storage (if needed)
Cleanup
# Example: Delete logs older than 7 days
find $CHAIN_DIR/data/log.wal/ -type d -name "20*" -mtime +7 -exec rm -rf {} \;
Reading WAL Files
Manual Inspection
# Decompress and view
zcat $CHAIN_DIR/data/log.wal/node-*/2025-11-23/seg-000001.wal.gz | head -n 10
# Pretty-print JSON
zcat seg-000001.wal.gz | jq '.'
# Filter specific events
zcat seg-000001.wal.gz | jq 'select(.msg == "store change")'
# Count events by type
zcat seg-000001.wal.gz | jq -r '.msg' | sort | uniq -c
Programmatic Access
Go Example:
import (
    "compress/gzip"
    "encoding/json"
    "os"
)

func readWAL(path string) ([]map[string]interface{}, error) {
    f, err := os.Open(path)
    if err != nil {
        return nil, err
    }
    defer f.Close()

    gz, err := gzip.NewReader(f)
    if err != nil {
        return nil, err
    }
    defer gz.Close()

    var entries []map[string]interface{}
    decoder := json.NewDecoder(gz)
    for decoder.More() {
        var entry map[string]interface{}
        if err := decoder.Decode(&entry); err != nil {
            return nil, err
        }
        entries = append(entries, entry)
    }
    return entries, nil
}
Python Example:
import gzip
import json
def read_wal(path):
    entries = []
    with gzip.open(path, 'rt') as f:
        for line in f:
            entry = json.loads(line)
            entries.append(entry)
    return entries

# Usage
entries = read_wal('seg-000001.wal.gz')
for entry in entries:
    if entry['msg'] == 'store change':
        print(f"Height {entry['height']}: {entry['store']}")
Index Usage
Seeking by Time
The index enables efficient time-based queries:
Query: "Find all events at height 12345"
1. Binary search index for target height/timestamp
2. Seek to offset in .wal.gz
3. Decompress from that point
4. Read until height changes
Benefits:
- No need to decompress entire file
- O(log n) search time
- Efficient for large files
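A Go sketch of the lookup, assuming index entries are sorted by timestamp and that each indexed block starts an independently decompressible gzip member (both assumptions; the names are illustrative):
package main

import (
    "compress/gzip"
    "fmt"
    "io"
    "os"
    "sort"
)

// indexEntry matches the example layout shown earlier.
type indexEntry struct {
    Offset     uint64
    Timestamp  int64
    EventCount uint32
    Checksum   uint32
}

// readFrom decompresses the segment starting at the first block whose
// timestamp is at or after target (Unix nanoseconds) and returns the NDJSON
// bytes from that point onward.
func readFrom(segPath string, index []indexEntry, target int64) ([]byte, error) {
    // Binary search over the sorted index: O(log n).
    i := sort.Search(len(index), func(i int) bool {
        return index[i].Timestamp >= target
    })
    if i == len(index) {
        return nil, fmt.Errorf("no block at or after %d", target)
    }

    f, err := os.Open(segPath)
    if err != nil {
        return nil, err
    }
    defer f.Close()

    // Jump straight to the block's byte offset in the compressed file.
    if _, err := f.Seek(int64(index[i].Offset), io.SeekStart); err != nil {
        return nil, err
    }
    gz, err := gzip.NewReader(f)
    if err != nil {
        return nil, err
    }
    defer gz.Close()
    return io.ReadAll(gz)
}

func main() {
    index := []indexEntry{{Offset: 0, Timestamp: 0}}
    data, err := readFrom("seg-000001.wal.gz", index, 0)
    if err != nil {
        panic(err)
    }
    fmt.Printf("read %d decompressed bytes\n", len(data))
}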
Validation
Use index checksums to verify integrity:
1. Read index entry
2. Seek to offset in .wal.gz
3. Read block of data
4. Compute CRC32
5. Compare with index checksum
Detects:
- Corruption
- Truncation
- Tampering
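A hedged Go sketch of that check, assuming the index CRC32 uses the common IEEE polynomial and that a block's length can be derived from the next entry's offset (or end of file for the last block):
package main

import (
    "fmt"
    "hash/crc32"
    "io"
    "os"
)

// verifyBlock re-reads one compressed block from the segment and compares its
// CRC32 against the value stored in the corresponding index entry.
func verifyBlock(segPath string, offset uint64, length int64, wantCRC uint32) error {
    f, err := os.Open(segPath)
    if err != nil {
        return err
    }
    defer f.Close()

    // Read exactly the compressed block the index entry points at.
    block := make([]byte, length)
    if _, err := io.ReadFull(io.NewSectionReader(f, int64(offset), length), block); err != nil {
        return err
    }

    if got := crc32.ChecksumIEEE(block); got != wantCRC {
        return fmt.Errorf("checksum mismatch: got %08x, want %08x", got, wantCRC)
    }
    return nil
}

func main() {
    // Hypothetical values; real offsets and checksums come from the .wal.idx file.
    if err := verifyBlock("seg-000001.wal.gz", 0, 4096, 0xDEADBEEF); err != nil {
        fmt.Println("validation failed:", err)
    }
}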
Storage Requirements
Estimation
Variables:
- Log rate (events/second)
- Average event size
- Compression ratio
- Retention period
Example Calculation:
Events per second: 100
Event size: 500 bytes
Compression ratio: 10:1 (90% reduction)
Per day:
100 events/s × 86,400 s = 8,640,000 events
8,640,000 × 500 bytes = 4.32 GB raw
4.32 GB ÷ 10 = 432 MB compressed
Per month:
432 MB × 30 = 12.96 GB
Per year:
12.96 GB × 12 = 155.5 GB
Optimization Strategies
- Enable Filtering
  filter = true  # Reduces volume by 70-80%
- Adjust Flush Interval
  interval = "5s"  # Larger batches = better compression
- Implement Retention Policy
  # Keep only last 7 days locally
  find $CHAIN_DIR/data/log.wal/ -type d -mtime +7 -exec rm -rf {} \;
- Archive to Object Storage
  - Upload old segments to S3/GCS
  - Delete local copies after successful upload
  - Retrieve on-demand for analysis
Data Retention
Local Retention
Recommended:
Recent: 7-30 days locally
Historical: 90+ days in object storage
Forever: Critical events on apphash.io platform
Shipper Management
The analyzer-shipper handles:
- Checkpointing (tracks what’s been shipped)
- Retry logic (ensures reliable delivery)
- Cleanup (optional, based on configuration)
Security Considerations
Sensitive Data
Logs may contain:
- Transaction details
- Account balances
- Validator information
- Governance proposals
Recommendations:
- Restrict file permissions: chmod 600 *.wal.gz
- Encrypt at rest if required
- Control access to log directory
- Consider PII implications
Access Control
# Recommended permissions
chown $CHAIN_USER:$CHAIN_USER $CHAIN_DIR/data/log.wal
chmod 700 $CHAIN_DIR/data/log.wal
chmod 600 $CHAIN_DIR/data/log.wal/*/*/*
Shipping Security
When shipping to apphash.io:
- Use TLS for transport
- Authenticate with API keys
- Consider VPN/private network
- Monitor for unauthorized access
Troubleshooting
Missing Segments
If segment numbers skip (e.g., seg-000001, seg-000003):
- Segment 000002 likely failed to write
- Check node logs for errors
- Verify disk space and permissions
Corrupted Files
If decompression fails:
# Test file integrity
gunzip -t seg-000001.wal.gz
# Check filesystem
fsck /dev/sdX
Large File Sizes
If segments are unexpectedly large:
- Check if filtering is enabled
- Review log rate (may indicate issue)
- Verify compression is working
- Consider shorter flush interval
Next Steps
- Memlogger Architecture - Understand how logs are generated
- Node Configuration - Configure retention and rotation