Ghee 0.6 - the tastiest way to manage your data

Josh Hansen @me@joshhansen.tech

Hacker-poet. Learn Something cofounder & CTO. Creator of Seattle Poetry Meetup. To me everything is craft; everything is creativity.

Jack of all trades, but master of some. Typescript, Rust, Kotlin, Python—collector of programming languages since age 11. Deep neural networks, reinforcement learning, AI if you must....

Project concepts and progress; kind-hearted smackdowns; quesadilla technique if you're lucky.

Github: https://github.com/joshhansen

LinkedIn: https://www.linkedin.com/in/hansen-josh


Oct 02, 2023

Introducing Ghee 0.6, the latest version of the tastiest way to manage your data!

Ghee is an experiment in leveraging modern filesystem features to implement a data management system, providing a key-value database, Git-style commits, and extensive tools for manipulation of extended attributes (xattrs).
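For a feel of what "xattrs as a key-value store" means, here is a minimal sketch using Python's standard `os.setxattr`, `os.getxattr`, and `os.listxattr` (Linux only). It is not Ghee's implementation or its attribute naming scheme: the `user.demo.` prefix and the `put`/`get`/`keys` helpers are made up for illustration.

```python
import os

# Hypothetical attribute namespace; user-settable xattrs on Linux must start with "user."
PREFIX = "user.demo."


def put(path, key, value):
    """Store a string value under `key` as an extended attribute of `path`."""
    os.setxattr(path, PREFIX + key, value.encode("utf-8"))


def get(path, key):
    """Read the value stored under `key` back as a string."""
    return os.getxattr(path, PREFIX + key).decode("utf-8")


def keys(path):
    """List the keys stored on `path`, with the demo prefix stripped."""
    names = os.listxattr(path)
    return sorted(n[len(PREFIX):] for n in names if n.startswith(PREFIX))
```

Calling `put("person.txt", "age", "46")` and then `get("person.txt", "age")` should round-trip the value, provided the underlying filesystem supports user xattrs; on one that doesn't, the calls raise `OSError`.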

The focus of this release is the introduction of the commit-management subcommands commit, log, restore, and reset. These are modeled after their Git equivalents, but utilize Btrfs copy-on-write filesystem semantics, including read-only snapshots, to efficiently track changes.

Using Btrfs generalizes change tracking, efficiently handling not only text files, but arbitrary binary blobs as well.
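To make the snapshot idea concrete, here is a toy in-memory model of the four subcommands. It is only a sketch under loud assumptions: `ToyRepo` and `Snapshot` are hypothetical names, nothing here touches Btrfs, and the exact semantics of Ghee's `restore` and `reset` may differ. What it does show is why content type doesn't matter: a commit records references to whole blobs, and unchanged blobs are shared between commits rather than copied.

```python
from typing import NamedTuple


class Snapshot(NamedTuple):
    """Immutable record of every file at commit time (stands in for a read-only snapshot)."""
    message: str
    files: tuple  # sorted (path, content) pairs


class ToyRepo:
    def __init__(self):
        self.working = {}   # path -> bytes; the mutable working state
        self.history = []   # snapshots, oldest first

    def commit(self, message):
        """Freeze the working state and return the new commit's index."""
        self.history.append(Snapshot(message, tuple(sorted(self.working.items()))))
        return len(self.history) - 1

    def log(self):
        """Commit messages, newest first."""
        return [snap.message for snap in reversed(self.history)]

    def restore(self, path):
        """Discard uncommitted changes to one path, using the latest commit."""
        head = dict(self.history[-1].files)
        if path in head:
            self.working[path] = head[path]
        else:
            self.working.pop(path, None)

    def reset(self, commit_id):
        """Return to an earlier commit; this toy simply drops the later ones."""
        self.working = dict(self.history[commit_id].files)
        del self.history[commit_id + 1:]
```

After committing a large binary alongside a text file, then editing only the text file and committing again, both snapshots hold the very same `bytes` object for the binary. That sharing is the in-memory analogue of copy-on-write.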

My hope is that this could lead to a version control system that handles large files in an integrated manner. In Git, by contrast, large file support is tacked on separately; Git-LFS, for instance, requires an additional server.

When using Ghee as a database, the commit, restore, and reset commands provide transaction-like functionality, allowing modifications to be built incrementally but finalized all at once - and rolled back when mistakes are made.
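The build-incrementally, finalize-or-roll-back pattern can be sketched in a few lines of Python. This is an illustration only, not how Ghee works internally: the `transaction` helper is hypothetical, and it takes its checkpoint with a full deep copy, where a copy-on-write snapshot would avoid copying unchanged data.

```python
import copy
from contextlib import contextmanager


@contextmanager
def transaction(table):
    """Apply changes to `table` (a dict of records) all at once, or not at all."""
    checkpoint = copy.deepcopy(table)  # plays the role of the last commit
    try:
        yield table                    # caller modifies the table incrementally
    except Exception:
        table.clear()                  # something went wrong: roll back
        table.update(checkpoint)
        raise


people = {"alice": {"age": 30}}

# Succeeds: the change is kept.
with transaction(people) as t:
    t["bob"] = {"age": 50}

# Fails midway: both modifications are undone.
try:
    with transaction(people) as t:
        t["alice"]["age"] = -1
        t["carol"] = {"age": 60}
        raise ValueError("mistake detected")
except ValueError:
    pass
```

After the failed block, `people` contains only the original `alice` record and the `bob` record from the successful block.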

The main open question about what Ghee will be good for is how efficiently it can handle actual database workloads. I suspect the answer will be: not very well, judging by Michael Sproul's experience with Butter DB, which uses a similar architecture.

But for many workflows, it's not necessary to serve large numbers of queries: Ghee would be well-suited for data scientists developing datasets and statistical models, for example.

It will be interesting to see what happens here at the meeting-place of databases, filesystems, and version control systems. Imagine adding WHERE clauses to your merge command, a la ghee merge origin annotations-20231002 -w age>45.
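As a thought experiment, a filtered merge like that might behave as follows. Everything here is hypothetical, since no such command exists yet: `parse_where` and `merge_where` are made-up helpers, records are plain dicts, the clause grammar only covers a single numeric comparison, and "merge" simply means that matching remote records overwrite or extend the local ones.

```python
import operator
import re

# Supported comparison operators in the toy WHERE clause
OPS = {
    ">=": operator.ge, "<=": operator.le, "!=": operator.ne,
    ">": operator.gt, "<": operator.lt, "=": operator.eq,
}


def parse_where(clause):
    """Turn a clause such as 'age>45' into a predicate over record dicts."""
    match = re.fullmatch(r"\s*(\w+)\s*(>=|<=|!=|>|<|=)\s*(-?\d+(?:\.\d+)?)\s*", clause)
    if match is None:
        raise ValueError(f"unsupported WHERE clause: {clause!r}")
    field, op, raw = match.groups()
    compare, value = OPS[op], float(raw)
    # Records lacking the field never match
    return lambda record: field in record and compare(record[field], value)


def merge_where(local, remote, clause):
    """Return local plus only those remote records that satisfy the clause."""
    keep = parse_where(clause)
    merged = dict(local)
    for key, record in remote.items():
        if keep(record):
            merged[key] = record
    return merged
```

With `local = {"alice": {"age": 30}}` and a remote holding an updated `alice` (age 31) plus a new `carol` (age 60), `merge_where(local, remote, "age>45")` brings in `carol` but leaves the local `alice` untouched.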

Now hosted at Codeberg... check it out and, as always, send bug reports my way!

Ghee Codeberg.org