利用 JsonDiffPatch 產生資料異動時的差異內容,縮小儲存容量

當資料有異動 (CUD) 的時候,我會把異動的結果存放在另一張表,稱之為歷程 History 或是快照 Snapshot,讓活動中的資料內容都是有用的,其他的資料表要記錄"當下"的狀態時,則是參考 Snapshot 的資料,例如 Id+Version,這次我要用 JsonDiffPatch 來產生差異資料內容,並且還原出當時的內容,就來看看我怎麼實作。

JsonDiffPath

開發環境

實作

System.Text.Json、Newtonsoft.Json(Json .NET) 基於 benjamine/jsondiffpatch 發展出 JsonDiff 的套件

JsonDiffPatch (benjamine.github.io)

System.Text.Json

weichch/system-text-json-jsondiffpatch: High-performance, low-allocating JSON object diff and patch extension for System.Text.Json. Support generating patch document in RFC 6902 JSON Patch format. (github.com)

Newtonsoft.Json(Json .NET)

wbish/jsondiffpatch.net: JSON object diffs and reversible patching (jsondiffpatch compatible) (github.com)

本篇以 System.Text.Json 為主

安裝套件

dotnet add package SystemTextJson.JsonDiffPatch --version 1.3.1

 

熟悉下 Diff、Patch 的使用方式

比對 Json 物件:Diff

 

還原 Json 物件:Patch

資料結構

Member 資料表存放當前的最新資料

create table public."Member" (
 "Id" text primary key not null,
 "Profile" jsonb,
 "Accounts" jsonb not null,
 "CreatedAt" timestamp with time zone not null,
 "CreatedBy" character varying(50) not null,
 "UpdatedAt" timestamp with time zone not null,
 "UpdatedBy" character varying(50) not null,
 "Version" integer not null
);
create index "IX_Member_Accounts" on "Member" using gin ("Accounts");

當資料發生異動,則將異動結果放到 Snapshot 資料表

 

Snapshot 的設計原則

  • 以 Domain 為主;不是資料表。
  • 紀錄 Domain 當下狀態,它可能會依賴別的 Domain 的 Snapshop Id。

接下來就是存放 Snapshot 的方式,我選用 PostgreSQL,資料表定義如下:

create table public."Snapshot"
(
   "Id"         text                     not null,
   "Version"    integer                  not null,
   "DataType"   text                     not null,
   "Data"       jsonb                    not null,
   "DataFormat" text                     not null,
   "CreatedAt"  timestamp with time zone not null,
   "CreatedBy"  varchar(50)              not null,
   constraint "PK_Snapshot"
       primary key ("Id", "Version")
);
alter table public."Snapshot"
   owner to postgres;
create index "IX_Snapshot_Data"
   on public."Snapshot" using gin ("Data");
  • Data 存放當下資料,第一筆是完整資料。
  • 第二筆,用最新的資料與改變後的資料比對差異結果。
  • 第三筆以此類推。

 

新增資料

第一次新增時,同時新增 Member 以及 Snapshot 資料表

public async Task<(Failure Failure, bool Data)> InsertMemberAsync(InsertMemberRequest request)
{
    var authContext = this._contextGetter.Get();
    var search = $"[{{\"Id\": \"{request.Account.Id}\"}}]";

    await using var dbContext = await this._memberDbContextFactory.CreateDbContextAsync();
    var accountExist = dbContext.Members.AsNoTracking().Any(p => EF.Functions.JsonContains(p.Accounts, search));
    if (accountExist)
    {
        return (new Failure(FailureCode.MemberExist, $"Member({request.Account.Id}) exist"), false);
    }

    var member = new MemberDataEntity
    {
        Id = Guid.NewGuid().ToString(),
        Profile = this._mapper.Map<DB.Profile>(request.Profile),
        Accounts = new List<DB.Account>
        {
            this._mapper.Map<DB.Account>(request.Account)
        },
        CreatedAt = authContext.Now,
        CreatedBy = authContext.UserId,
        UpdatedAt = authContext.Now,
        UpdatedBy = authContext.UserId,
        Version = 1
    };
    var snapshot = new SnapshotDataEntity
    {
        Id = member.Id,
        DataType = typeof(MemberDataEntity).ToString(),
        Data = JsonNode.Parse(JsonSerializer.Serialize(member, this._jsonSerializerOptions)),
        DataFormat = DataFormat.Full.ToString(),
        CreatedAt = authContext.Now,
        CreatedBy = authContext.UserId,
        Version = member.Version
    };
    await dbContext.Members.AddAsync(member);
    await dbContext.Snapshots.AddAsync(snapshot);
    await dbContext.SaveChangesAsync();
    return (null, true);
}

 

 用 API 調用它

[HttpPost(Name = "InsertMember")]
public async Task<ActionResult<MemberResponse>> Post(InsertMemberRequest request)
{
    var insertMemberResult = await this._memberRepository.InsertMemberAsync(request);
    if (insertMemberResult.Failure != null)
    {
        return new ObjectResult(insertMemberResult.Failure)
        {
            ContentTypes = new MediaTypeCollection()
            {
                "application/json"
            },
            StatusCode = (int)HttpStatusCode.BadRequest
        };
    }

    return this.NoContent();
}

 

產生差異

  1. 從資料庫取得最新資料放在 oldMember 變數
  2. 複製 oldMember 狀態,放在 newMember 變數
  3. UpdateMemberRequest 的狀態寫更新到 newMember
  4. 比對兩個變數,若沒有差異就返回不進版
public async Task<(Failure Failure, bool Data)> BindMemberAsync(string currentAccount,
    UpdateMemberRequest request)
{
    var authContext = this._contextGetter.Get();
    var search = $"[{{\"Id\": \"{currentAccount}\"}}]";

    await using var dbContext = await this._memberDbContextFactory.CreateDbContextAsync();
    var oldMember = await dbContext.Members.FirstOrDefaultAsync(p => EF.Functions.JsonContains(p.Accounts, search));
    if (oldMember == null)
    {
        return (new Failure(FailureCode.MemberNotExist, $"Member({currentAccount}) not exist"), false);
    }

    var newMember = this.DeepClone(oldMember);

    this.UpdateAccounts(request.Accounts, newMember);
    this.UpdateProfile(request.Profile, newMember);

    // 比對兩個 member 內容是否有差異
    if (this.Diff(oldMember, newMember).Result == false)
    {
        // 沒有差異,不做任何事
        return (null, true);
    }
}

 

  1. 若有差異,則進版
  2. 再產生差異內容,放到 Snapshot.Data 裡
// 有差異,進版號
newMember.UpdatedAt = authContext.Now;
newMember.UpdatedBy = authContext.UserId;
newMember.Version = oldMember.Version + 1;

// 產生差異內容
var diff = this.Diff(oldMember, newMember).Data;

var snapshot = new SnapshotDataEntity
{
    Id = newMember.Id,
    DataType = typeof(MemberDataEntity).ToString(),
    Data = diff,
    DataFormat = DataFormat.Diff.ToString(),
    CreatedAt = newMember.UpdatedAt,
    CreatedBy = newMember.UpdatedBy,
    Version = newMember.Version
};

// 更新時,可以使用樂觀鎖定,Update Where Version=oldMember.Version,這裡我沒有實作
var entry = dbContext.Members.Entry(oldMember);
entry.CurrentValues.SetValues(newMember);

await dbContext.Snapshots.AddAsync(snapshot);
await dbContext.SaveChangesAsync();

 

比對物件使用 JsonDiffPatcher.Diff 實現

private (JsonNode Data, bool Result) Diff(MemberDataEntity oldMember, MemberDataEntity newMember)
{
    var oldData = JsonSerializer.Serialize(oldMember, this._jsonSerializerOptions);
    var newData = JsonSerializer.Serialize(newMember, this._jsonSerializerOptions);
    var diff = JsonDiffPatcher.Diff(oldData, newData,
        new JsonDiffOptions
        {
            JsonElementComparison = JsonElementComparison.Semantic,
        });

    if (diff == null)
    {
        return (null, false);
    }

    return (diff, true);
}

 

執行結果如下:

第一筆

 

第二筆

 

第三筆

 

還原差異

若帶版號,則使用最新的資料

public async Task<MemberResponse> GetMemberAsync(string account, int? version)
{
    var search = $"'[{{{{\"id\": \"{account}\"}}}}]'";
    await using var dbContext = await this._memberDbContextFactory.CreateDbContextAsync();

    // 讀取最新資料
    if (version.HasValue == false)
    {
        var queryable = from member in dbContext.Members
                        where EF.Functions.JsonContains(member.Accounts, search)
                        select new { member };
        var data = await queryable.AsNoTracking().FirstOrDefaultAsync();
        if (data == null)
        {
            return null;
        }

        return this._mapper.Map<MemberResponse>(data.member);
    }
}

 

反之,則讀取快照資料

public async Task<MemberResponse> GetMemberAsync(string account, int? version)
{
    var search = $"'[{{{{\"id\": \"{account}\"}}}}]'";
    await using var dbContext = await this._memberDbContextFactory.CreateDbContextAsync();

    // 讀取最新資料
    ...

    // 讀取快照
    var query = $@"select * from ""Snapshot"" where ""Data"" -> 'accounts' @> {search}::jsonb";

    var snapshots = await dbContext.Snapshots
            .FromSqlRaw(query)
            .ToListAsync()
        ;

    JsonNode finial = null;

    // 依序合併快照
    foreach (var snapshot in snapshots)
    {
        if (snapshot.Version == 1)
        {
            finial = snapshot.Data;
            continue;
        }

        JsonDiffPatcher.Patch(ref finial, snapshot.Data);
    }

    // search account in JsonNode
    if (finial == null)
    {
        return null;
    }

    var accounts = finial["accounts"]!.AsArray();
    foreach (var item in accounts)
    {
        var id = item["id"]!.GetValue<string>();
        if (id == account)
        {
            var result = finial.Deserialize<MemberResponse>(this._jsonSerializerOptions);
            return result;
        }
    }

    return null;
}

 

還原快照的邏輯,得搭配著異動資料的方法,這篇僅是提供一種思路,可別全部照抄

範例位置

sample.dotblog/Shnapshot/Lab.Snapshot at 1fb3a8e86bf1de3f64222486e1bb695308701629 · yaochangyu/sample.dotblog (github.com)

若有謬誤,煩請告知,新手發帖請多包涵


Microsoft MVP Award 2010~2017 C# 第四季
Microsoft MVP Award 2018~2022 .NET

Image result for microsoft+mvp+logo