When users duplicate records in a database, tracking whether copies have diverged from their source becomes challenging. A content hash provides a simple mechanism to detect when duplicates no longer match their parent.
The Problem
Consider an email_templates table:
CREATE TABLE email_templates (
id INT PRIMARY KEY,
name VARCHAR(255),
subject VARCHAR(255),
body TEXT,
footer TEXT,
hash VARCHAR(64),
parent_id INT NULL
);SQLUser A creates a template (id: 1, parent_id: null). Then User B duplicates this template (id: 2, parent_id: 1). Both records are identical.
User A then updates their template’s subject to “Welcome aboard!”. The parent_id on User B’s record still points to id: 1, but the records no longer match since the subjects are different.
Content Hashing
A hash function converts variable-length input into a fixed-length string. SHA-256 produces a 64-character hexadecimal output. Identical inputs always produce identical hashes; even single character changes produce completely different outputs.
const crypto = require('crypto');
function generateHash(data) {
return crypto
.createHash('sha256')
.update(JSON.stringify(data))
.digest('hex');
}JavaScriptImplementation
Whenever a user creates, or updates a record you take all (or some) of that records data and produce a hash based on it which is stored against the record.
async function updateEmailTemplate(id, updates) {
const hashData = {
subject: updates.subject,
body: updates.body,
footer: updates.footer
};
const hash = generateHash(hashData);
await db.query(
'UPDATE email_templates SET subject = ?, body = ?, footer = ?, hash = ? WHERE id = ?',
[updates.subject, updates.body, updates.footer, hash, id]
);
}JavaScriptDetecting Divergence
Now in our example, since we have two records which are linked through id and parent_id we can now determine whether they have diverged from each other by comparing the hashes. Since User A updated the subject heading, their hash will now be different to User B’s copy.
async function hasDiverged(recordId) {
const record = await db.query(
'SELECT hash, parent_id FROM email_templates WHERE id = ?',
[recordId]
);
if (!record.parent_id) {
return false; // Not a duplicate
}
const parent = await db.query(
'SELECT hash FROM email_templates WHERE id = ?',
[record.parent_id]
);
return record.hash !== parent.hash;
}JavaScriptConsiderations
- Select which fields to include in the hash. Exclude timestamps, usage counts, or other metadata that shouldn’t trigger divergence detection.
- Store the hash in the database rather than computing it on demand to avoid performance penalties when checking multiple records.
- For large text fields, consider hashing normalized versions (trimmed, lowercase) to avoid detecting inconsequential changes.
The hash comparison provides O(1) divergence detection without field-by-field comparisons or storing complete record history.
Leave a Reply