Given frames of a video / gif, recover / estimate the original audio as well as possible
-
Ghee 0.6 – the tastiest way to manage your data
Introducing Ghee 0.6, the latest version of the tastiest way to manage your data!
Ghee is an experiment in leveraging modern filesystem features to implement a data management system, providing a key-value database, Git-style commits, and extensive tools for manipulation of extended attributes (xattrs).
The focus of this release is the introduction of the commit-management subcommands `commit`, `log`, `restore`, and `reset`. These are modeled after their Git equivalents, but utilize Btrfs copy-on-write filesystem semantics, including read-only snapshots, to efficiently track changes.
Using Btrfs generalizes change tracking, efficiently handling not only text files, but arbitrary binary blobs as well.
It is hoped that this could lead to a version control system that handles large files in an integrated manner, whereas large file support in Git is tacked on separately – in the case of Git-LFS, requiring an additional server to implement.
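A read-only snapshot of the kind mentioned above can be taken with the standard `btrfs subvolume snapshot -r` command, provided the source directory is a Btrfs subvolume. The following is a minimal sketch that only builds the argument list rather than running it. How Ghee itself creates and names its snapshots is not described here, so the `snapshot_dir` layout and the `commit_id` value are hypothetical.

```python
import os
import shlex
from typing import List


def snapshot_command(subvolume: str, snapshot_dir: str, commit_id: str) -> List[str]:
    """Build (but do not run) the btrfs command for a read-only snapshot.

    `snapshot_dir` and `commit_id` are illustrative names, not Ghee's layout.
    """
    dest = os.path.join(snapshot_dir, commit_id)
    # `-r` asks btrfs to make the new snapshot read-only.
    return ["btrfs", "subvolume", "snapshot", "-r", subvolume, dest]


if __name__ == "__main__":
    cmd = snapshot_command("/data/people", "/data/.snapshots", "c1")
    # Print the command so it can be inspected before being run by hand.
    print(shlex.join(cmd))
```

Because a snapshot shares unmodified extents with its source, taking one costs roughly the same whether the subvolume holds text files or large binary blobs.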
When using Ghee as a database, the `commit`, `restore`, and `reset` commands provide transaction-like functionality, allowing modifications to be built incrementally but finalized all at once – and rolled back when mistakes are made.
The main question about what Ghee will be good for is how efficiently it can handle actual database workloads. I suspect the answer will be: not very well, judging by Michael Sproul’s experience with Butter DB, which has a similar architecture.
But for many workflows, it’s not necessary to serve large numbers of queries: Ghee would be well-suited for data scientists developing datasets and statistical models, for example.
It will be interesting to see what happens here at the meeting-place of databases, filesystems, and version control systems. Imagine adding `WHERE` clauses to your merge command, a la `ghee merge origin annotations-20231002 -w age>45`.
Now hosted at Codeberg… check it out and, as always, send bug reports my way!
-
Ghee 0.4 – The tastiest way to work with Linux extended attributes (xattrs)
Introducing Ghee 0.4, the newest release of the premier tool for manipulating Linux extended attributes! (0.3 coverage here and on Reddit.)
This latest release adds a Rustyline-based REPL and additional tools for using the filesystem as a relational database. The new `init` subcommand lets you declare the primary key by which a directory (and its subdirectories) are indexed, while `ins` and `del` now allow insertion and deletion of records while keeping related indices up to date. `ls` is helpful in the REPL, showing Ghee’s view of the world.
In addition to direct management of extended attributes, Ghee is designed to implement a relational data model built around xattrs while offloading as much functionality as feasible to the filesystem.
As such, Ghee does nothing special to ensure the integrity of stored data. You are encouraged to layer this not on but under Ghee by your choice of filesystem. For example, ZFS, Btrfs, and Bcachefs all provide checksum-based integrity checks.
Next steps include filling in missing features in existing subcommands and using copy-on-write snapshots to provide a Git-inspired workflow, something like:
- `ghee diff ./people`: show how the `./people` table has changed since the last commit
- `ghee commit ./people -m "message!"`: commit the `./people` table in its current form with message “message!”
- `ghee restore ./people gf037d2c98`: restore the `./people` table to its state in commit gf037d2c98
- `ghee log ./people`: show commit messages for all commits in the `./people` table
As I am a user of Btrfs, CoW-based features will be implemented with Btrfs in mind first. If this proves successful it could be extended to other filesystems.
Of course, I hope it goes without saying that version 0.4 of any project should NOT be used in mission-critical contexts where the cost of data loss would be substantial.
Thanks in advance for any thoughts, questions, or suggestions!
-
Ghee 0.3 – The tastiest way to work with Linux extended attributes (xattrs)
Introducing Ghee 0.3, the newest release of the premier tool for manipulating Linux extended attributes!
Originally known as Hatter and then, regrettably, as Mattress, this tastiest of tools has been redubbed Ghee after the clarified butter popular in Indian cuisine, and as a reference to the Btrfs filesystem, which originally convinced me that much database functionality has now been subsumed by advanced filesystem features.
This new release adds SQL `WHERE`-style predicates to filter by, e.g. `ghee get --where age >= 65 ./people`, and makes `get` recursive by default (the old behavior is still available behind the `--flat` flag).
The idea is for Ghee to implement as much of a relational data model as possible using the filesystem itself as a substrate. Design principles:
- Folders are tables
- Files are records
- Relative paths are primary keys
- Extended attributes are non-primary-key columns
- Enforce schema only when present
- The file contents are user-controlled; only directory structure, filenames, and extended attributes are used by Ghee
- Use of filesystem features should be preferred over implementing features directly in Ghee, e.g. locking, Btrfs subvolumes, snapshots, incremental backup
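To make the principles above concrete, here is a minimal Python sketch of the same data model, not Ghee's actual implementation. It treats a folder as a table keyed by relative path, reads extended attributes as columns, and applies a `WHERE`-style filter. The function names, the assumption that a column `age` is stored as the xattr `user.age`, and the restriction to numeric comparisons are all illustrative.

```python
import operator
import os
from typing import Callable, Dict

Record = Dict[str, str]

# Operators for the hypothetical predicate syntax. Two-character operators
# are listed first so that ">=" is not mistaken for ">" or "=".
OPS = {">=": operator.ge, "<=": operator.le, "!=": operator.ne,
       "=": operator.eq, ">": operator.gt, "<": operator.lt}


def read_xattrs(path: str) -> Record:
    """Return a file's extended attributes, or {} where they are unavailable."""
    if not hasattr(os, "listxattr"):  # os.listxattr only exists on Linux
        return {}
    try:
        return {name: os.getxattr(path, name).decode(errors="replace")
                for name in os.listxattr(path)}
    except OSError:  # missing file, or a filesystem without xattr support
        return {}


def load_table(root: str) -> Dict[str, Record]:
    """Map each file's path relative to `root` (its primary key) to its xattrs."""
    table: Dict[str, Record] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            table[os.path.relpath(path, root)] = read_xattrs(path)
    return table


def parse_where(expr: str) -> Callable[[Record], bool]:
    """Turn a string such as 'age >= 65' into a predicate over one record."""
    for symbol, compare in OPS.items():
        if symbol in expr:
            column, literal = (part.strip() for part in expr.split(symbol, 1))
            attr = "user." + column  # assumed mapping from column to xattr name
            threshold = float(literal)  # numeric comparisons only in this sketch

            def predicate(record: Record) -> bool:
                # Records that lack the column never match.
                return attr in record and compare(float(record[attr]), threshold)

            return predicate
    raise ValueError("no comparison operator found in: " + expr)


if __name__ == "__main__":
    matches = parse_where("age >= 65")
    for key, record in load_table("./people").items():
        if matches(record):
            print(key, record)
```

Since the schema is enforced only when present, the predicate simply skips records that lack the column instead of raising an error.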
Would love to hear any comments. Apologies for the name changes—third time’s the charm, I think this one’ll stick.
-
Mattress 0.2.1 (formerly Hatter)
Mattress is a command line tool for working with Linux extended attributes (xattrs)
Because someone else’s awesome project already occupied the `hatter` crate, I’ve changed the name of my project from “Hatter” to “Mattress” which, weird as it is, has the advantage of actually including “attr” as a substring.
The executable name has correspondingly changed from `htr` to `mtr`.
This version begins the introduction of simple database-like features, implemented using the filesystem and extended attributes as a substrate.
Mattress sees the world in a peculiar way: it interprets a filesystem folder as a database table with one record for each file in the folder, indexed by the “primary key” of the filename.
A nested hierarchy of directories is seen by Mattress as a database table indexed by the compound key corresponding to the nested subpath, and one “record” per file encompassed under the folder recursively.
Consider this folder `./people` of personnel records:

    $ mtr get ./people/*
    ./people/Sandeep user.id 2
    ./people/Sandeep user.name Sandeep
    ./people/Sandeep user.state CA
    ./people/Sofia user.id 1
    ./people/Sofia user.name Sofia
    ./people/Sofia user.state WA
    ./people/Wulfrum user.id 0
    ./people/Wulfrum user.name Wulfrum
    ./people/Wulfrum user.state CA

Suppose we want to index not by the name as now, but by the `id`. We can do this using the new `idx` command.

    $ mtr idx -v -k id ./people ./people:id
    ./people/Sandeep -> ./people:id/2
    ./people/Sofia -> ./people:id/1
    ./people/Wulfrum -> ./people:id/0

The arrows show the hardlinks mapping the original `./people` folder to the indexed view `./people:id`.
We can also index by compound keys, such as here where we index by `(state,id)`:

    $ mtr idx -v -k state -k id ./people ./people:state:id
    ./people/Sandeep -> ./people:state:id/CA/2
    ./people/Sofia -> ./people:state:id/WA/1
    ./people/Wulfrum -> ./people:state:id/CA/0

I have some “magic” planned to speed up the `get` command and ease the ergonomics (letting you reference e.g. `state`, which will be taken from the path rather than from the per-file xattrs). Eventually I’d like to allow for SQL `SELECT`-style conditions, but that’s for another day.
(Note: this project is now known as Ghee.)
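The hardlink-based indexing described in this post can be sketched in a few lines of Python. This is an illustration of the idea rather than Mattress's own code. The `build_index` function and its `attrs_of` callback are hypothetical, and the attributes are supplied from a dictionary here so the example runs on filesystems without xattr support.

```python
import os
import tempfile
from typing import Callable, Dict, List, Sequence


def build_index(table_dir: str, index_dir: str, keys: Sequence[str],
                attrs_of: Callable[[str], Dict[str, str]]) -> List[str]:
    """Hardlink every file in `table_dir` into `index_dir` under its key path.

    `attrs_of` is a hypothetical callback returning a file's attributes; a
    real tool would read them from the file's extended attributes instead.
    """
    created = []
    for name in sorted(os.listdir(table_dir)):
        source = os.path.join(table_dir, name)
        if not os.path.isfile(source):
            continue
        attrs = attrs_of(source)
        # One path component per key column, e.g. CA/2 for the key (state, id).
        target = os.path.join(index_dir, *(attrs[key] for key in keys))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        os.link(source, target)  # both names now refer to the same inode
        created.append(target)
    return created


if __name__ == "__main__":
    people = {"Sandeep": {"id": "2", "state": "CA"},
              "Sofia": {"id": "1", "state": "WA"},
              "Wulfrum": {"id": "0", "state": "CA"}}
    with tempfile.TemporaryDirectory() as root:
        table = os.path.join(root, "people")
        os.mkdir(table)
        for person in people:
            open(os.path.join(table, person), "w").close()
        index = os.path.join(root, "people:state:id")
        links = build_index(table, index, ["state", "id"],
                            lambda path: people[os.path.basename(path)])
        for link in links:
            print(os.path.relpath(link, root))
```

Because the index entries are hardlinks, they occupy no extra data blocks, and the record contents stay reachable through either path. Hardlinks cannot cross filesystems, so the index has to live on the same filesystem as the table.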
-
Hatter: a command line tool for working with Linux extended attributes (xattrs)
In my current, semi-stealth machine learning project, I’m experimenting with using Linux filesystem extended attributes as a sort of “poor man’s” database to store annotations in.
I’m not really sure this is turning out better than using, say, a SQLite database, but it’s been interesting to try.
In the process, I built my own command line tool for manipulating xattrs, which I’m releasing under a GPL3 license, called Hatter. It’s written in Rust since that’s what I’ve mostly been writing the past few months.
There are probably many bugs, but it’s working alright for me, so I figured I’d unleash it on the world. Just… don’t go crazy
https://git.disroot.org/joshhansen/Hatter
(NOTE: This project is now known as Ghee.)
-
Concept update: Calcifer
It seems the visual calorie estimator project I proposed is more or less being done.
Gotta act fast in this business.
They’re charging quite a bit – might still be worth entry.
-
We have liftoff
We’re now beating the random baseline on a 10×10 board, with a greedy algorithm trained only on self-play data:
    Evaluating umpire AIs:
    r wins: 459
    ./ai/agz/6.agz wins: 541
    Draws: 0

The model is a basic convolutional neural network based on the context surrounding the city or unit taking the next action. The weights serialize to 166KB, so easy to deploy with the game.
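As a rough sanity check on the 541 to 459 result, one can ask how likely such a margin would be if the two players were evenly matched. The sketch below is my own back-of-the-envelope check, not part of the Umpire evaluation code. It uses a normal approximation to the binomial and assumes the 1000 games are independent.

```python
import math
from typing import Tuple


def win_rate_z_test(wins: int, losses: int) -> Tuple[float, float]:
    """Return (z, one-sided p-value) for `wins` under a 50/50 null hypothesis."""
    games = wins + losses
    # Under the null, wins ~ Binomial(games, 0.5): mean games/2, variance games/4.
    z = (wins - games / 2) / math.sqrt(games / 4)
    # Upper-tail probability of a standard normal, P(Z >= z).
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    z, p_value = win_rate_z_test(wins=541, losses=459)
    print(f"z = {z:.2f}, one-sided p = {p_value:.4f}")
```

With these counts the z-score comes out at about 2.6, so the margin over the random baseline is unlikely to be noise, though a larger evaluation run would pin the true win rate down more tightly.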
The key to training was to up the number of training instances – “The Unreasonable Effectiveness of Data”, after all. With purely-random algorithms doing the self-play, this can run extremely fast.
The next obstacle will be the transport mechanic: the need for land units in Umpire to board a transport ship to transfer continents. The inability of air and sea units to capture cities means this mechanic must be understood at some level for an AI to operate on the full 180×90 map, where large stretches of ocean divide multiple land masses.
With purely-random players, the likelihood of this mechanic getting triggered and thus entering the training data seems fairly low. But if we throw enough training episodes at it, we’ll see it eventually. This may necessitate multithreading the self-play code to run multiple games simultaneously, and optimizing the game engine for throughput.
-
btrfs
Impressionistically, it’s like git and rsync had a baby, oh, but it’s a filesystem. That’s how it feels with the lightweight snapshots (like branches/tags/commits in git) and the send/receive (like straight up rsync).
If there were a “merge” or “rebase” concept in this world, I think I’d have uses for it.
I’ve been using Duplicity for my off-site backup for a while now. It’s almost obsoleted by btrfs send/receive. (As are most backup tools, at least from the thousand foot view.) All it needs is encryption and compression, with the filesystem already providing the deltas.
-
Javascript, The Bad Parts #4: `sort` broken by default 😿
A REPL snippet’s worth a thousand words:
    > [9,10,11].sort()
    [ 10, 11, 9 ]

The thousand-and-first word is: unconscionable!
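The reason for the result above is that JavaScript's default `sort` converts elements to strings and compares those, so "10" sorts before "9". The Python sketch below mimics that behavior for plain integers to make the mechanism visible. The `js_default_sort` helper is a stand-in of my own and ignores other details of the JavaScript algorithm, such as sorting in place and the handling of `undefined`.

```python
def js_default_sort(values):
    """Mimic JavaScript's default sort for integers: compare string forms."""
    return sorted(values, key=str)


if __name__ == "__main__":
    print(js_default_sort([9, 10, 11]))  # string order: [10, 11, 9]
    print(sorted([9, 10, 11]))           # numeric order: [9, 10, 11]
```

In JavaScript itself, the usual fix is to pass a numeric comparator: `[9,10,11].sort((a, b) => a - b)`.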

