• Ghee 0.3 – The tastiest way to work with Linux extended attributes (xattrs)

    Introducing Ghee 0.3, the newest release of the premier tool for manipulating Linux extended attributes!

    Originally known as Hatter and then, regrettably, as Mattress, this tastiest of tools has been redubbed Ghee after the clarified butter popular in Indian cuisine, and as a reference to the Btrfs filesystem, which originally convinced me that much database functionality has now been subsumed by advanced filesystem features.

    This new release adds SQL WHERE-style predicates to filter by, e.g. ghee get --where age >= 65 ./people, and makes get recursive by default (the old behavior is still available behind the --flat flag).

    The idea is for Ghee to implement as much of a relational data model as possible using the filesystem itself as a substrate. Design principles:

    1. Folders are tables
    2. Files are records
    3. Relative paths are primary keys
    4. Extended attributes are non-primary-key columns
    5. Enforce schema only when present
    6. The file contents are user-controlled; only directory structure, filenames, and extended attributes are used by Ghee
    7. Use of filesystem features should be preferred over implementing features directly in Ghee, e.g. locking, Btrfs subvolumes, snapshots, incremental backup

    Would love to hear any comments. Apologies for the name changes—third time’s the charm, I think this one’ll stick.

  • Mattress 0.2.1 (formerly Hatter)

    Mattress is a command line tool for working with Linux extended attributes (xattrs)

    Because someone else’s awesome project already occupied the hatter crate, I’ve changed the name of my project from “Hatter” to “Mattress” which, weird as it is, has the advantage of actually including “attr” as a substring.

    The executable name has correspondingly changed from htr to mtr.

    This version begins the introduction of simple database-like features, implemented using the filesystem and extended attributes as a substrate.

    Mattress sees the world in a peculiar way: it interprets a filesystem folder as a database table with one record for each file in the folder, indexed by the “primary key” of the filename.

    A nested hierarchy of directories is seen by Mattress as a database table indexed by the compound key corresponding to the nested subpath, and one “record” per file encompassed under the folder recursively.

    Consider this folder ./people of personnel records:

    $ mtr get ./people/*
    ./people/Sandeep        user.id 2
    ./people/Sandeep        user.name       Sandeep
    ./people/Sandeep        user.state      CA
    ./people/Sofia  user.id 1
    ./people/Sofia  user.name       Sofia
    ./people/Sofia  user.state      WA
    ./people/Wulfrum        user.id 0
    ./people/Wulfrum        user.name       Wulfrum
    ./people/Wulfrum        user.state      CA

    Suppose we want to index not by the name as now, but by the id. We can do this using the new idx command.

    $ mtr idx -v -k id ./people ./people:id
    ./people/Sandeep -> ./people:id/2
    ./people/Sofia -> ./people:id/1
    ./people/Wulfrum -> ./people:id/0

    The arrows show the hardlinks mapping the original ./people folder to the indexed view ./people:id.

    We can also index by compound keys, such as here where we index by (state,id):

    $ mtr idx -v -k state -k id ./people ./people:state:id
    ./people/Sandeep -> ./people:state:id/CA/2
    ./people/Sofia -> ./people:state:id/WA/1
    ./people/Wulfrum -> ./people:state:id/CA/0
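
    The path layout visible in these listings can be sketched as a small pure function. This is only an illustration of the naming pattern, not Mattress's actual implementation; the helper indexPath and its parameters are hypothetical, and the record's attribute values are passed in as a plain object rather than read from xattrs.

    ```javascript
    // Hypothetical helper: build the indexed-view path for one record.
    // `table` is the source folder, `keys` the key columns in order, and
    // `attrs` the record's xattr values with the "user." prefix dropped.
    function indexPath(table, keys, attrs) {
      const view = [table, ...keys].join(':');            // e.g. ./people:state:id
      const subpath = keys.map((k) => String(attrs[k]));  // one path component per key
      return [view, ...subpath].join('/');
    }

    const sandeep = { id: 2, name: 'Sandeep', state: 'CA' };
    console.log(indexPath('./people', ['id'], sandeep));          // ./people:id/2
    console.log(indexPath('./people', ['state', 'id'], sandeep)); // ./people:state:id/CA/2
    ```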

    I have some “magic” planned to speed up the get command and ease the ergonomics (letting you reference e.g. state, which will be taken from the path rather than from the per-file xattrs.) Eventually I’d like to allow for SQL SELECT-style conditions, but that’s for another day.

    (Note: this project is now known as Ghee.)

  • Hatter: a command line tool for working with Linux extended attributes (xattrs)

    In my current, semi-stealth machine learning project, I’m experimenting with using Linux filesystem extended attributes as a sort of “poor man’s” database to store annotations in.

    I’m not really sure this is turning out better than using, say, a SQLite database, but it’s been interesting to try.

    In the process, I built my own command line tool for manipulating xattrs, which I’m releasing under a GPL3 license, called Hatter. It’s written in Rust since that’s what I’ve mostly been writing the past few months.

    There are probably many bugs, but it’s working alright for me, so I figured I’d unleash it on the world. Just… don’t go crazy.

    https://git.disroot.org/joshhansen/Hatter

    (NOTE: This project is now known as Ghee.)

  • Concept update: Calcifer

    It seems the visual calorie estimator project I proposed is more or less being done.

    Gotta act fast in this business.

    They’re charging quite a bit – it might still be worth entering the market.

  • We have liftoff

    We’re now beating the random baseline on a 10×10 board, with a greedy algorithm trained only on self-play data:

    Evaluating umpire AIs:

    r wins: 459

    ./ai/agz/6.agz wins: 541

    Draws: 0

    The model is a basic convolutional neural network based on the context surrounding the city or unit taking the next action. The weights serialize to 166KB, so they’re easy to deploy with the game.

    The key to training was to up the number of training instances – “The Unreasonable Effectiveness of Data”, after all. With purely-random algorithms doing the self-play, this can run extremely fast.

    The next obstacle will be the transport mechanic: the need for land units in Umpire to board a transport ship to transfer continents. The inability of air and sea units to capture cities means this mechanic must be understood at some level for an AI to operate on the full 180×90 map, where large stretches of ocean divide multiple land masses.

    With purely-random players, the likelihood of this mechanic getting triggered and thus entering the training data seems fairly low. But if we throw enough training episodes at it, we’ll see it eventually. This may necessitate multithreading the self-play code to run multiple games simultaneously, and optimizing the game engine for throughput.

  • btrfs

    Impressionistically it’s like git and rsync had a baby, oh, but it’s a filesystem. That’s how it feels with the lightweight snapshots (like branches/tags/commits in git) and the send/receive (like straight up rsync).

    If there were a “merge” or “rebase” concept in this world, I think I’d have uses for it.

    I’ve been using Duplicity for my off-site backup for a while now. It’s almost obsoleted by btrfs send/receive. (As are most backup tools, at least from the thousand-foot view.) All it needs is encryption and compression, with the filesystem already providing the deltas.

  • Javascript, The Bad Parts #4: `sort` broken by default 😿

    A REPL snippet’s worth a thousand words:

    > [9,10,11].sort()
    [ 10, 11, 9 ]

    The thousand-and-first word is: unconscionable!
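
    For reference, a minimal sketch of the usual workaround: the default sort compares elements as strings, so passing an explicit numeric comparator gives the order you'd expect. The variable names are just for illustration.

    ```javascript
    // With no comparator, elements are converted to strings and compared
    // by UTF-16 code units, so "10" sorts before "9".
    const lexicographic = [9, 10, 11].sort();

    // An explicit comparator restores numeric ordering.
    const numeric = [9, 10, 11].sort((a, b) => a - b);

    console.log(lexicographic); // [ 10, 11, 9 ]
    console.log(numeric);       // [ 9, 10, 11 ]
    ```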

  • Concept: Music Server / Music Server Protocol v2

    (Updates Concept: Cross-platform music playback daemon)

    Motivation: I’m trying to move away from Spotify, tired of renting my own playlists back month after month. So I’m paying for music downloads online (MP3s and FLACs) and ripping CDs, and generally moving to an old-is-new-again post-Spotify music workflow.

    As Linux is my primary desktop environment, I’m reacquainting myself with the Linux music playback ecosystem for the first time in a decade, more or less. While many of the projects I once relied on (hello, Amarok!) are still around, on the whole the field feels neglected. Even Spotify’s crappy client does get some things right, and usability standards have generally risen since my last outing. But open source offerings haven’t entirely kept up.

    I’m also struck by the lack of common infrastructure between projects. While low-level libraries are shared (e.g. libogg, libpulseaudio), the core file discovery, indexing, tag editing, playback, and playlist management functionality that music players rely on is cobbled together afresh by each project. (As well as transcoding, another common feature.)

    The duplication applies not only to desktop music players (Amarok, Rhythmbox, Clementine, cmus) but also to music servers such as Navidrome or mpd.

    Proposal: define an abstract music playback and management interface, the Music Server Protocol, and implement it definitively in musicd, a comprehensive music server, integrating with existing or newly-built clients.

    musicd could run incidentally (like a language server in the LSP world) or as a persistent daemon, and optionally expose a public network interface.

    The latter point, network transparency, allows such a system to be a true Spotify replacement, serving music to mobile apps from VPS or other servers.

    Multiuser support would be a useful feature, allowing friends and family to integrate their music collections; a lock-based rights management scheme could ensure that music licenses are treated as finite in the shared environment.

    Implementation: I would implement musicd in Rust, interfacing with the same libraries as the existing apps. If there’s a sufficiently-mature Rust codebase that already does what I need, I’d happily fork it. I’m insisting on Rust because I’m more familiar with it than with C++ these days, and because I’d prefer Rust’s rigors in what I hope would become a building-block for many projects.

  • Javascript, The Bad Parts #3: no negative array indexing 😾

    On the petty side—gets frowny cat, rather than horror cat. After a decade of Python, it’s surprising to me that Javascript doesn’t allow arr[-1] and such. Maybe negative indices returning undefined is widely relied upon, leading to compatibility issues? Not sure why it’s not a thing—arr[arr.length - 1] is just no fun.
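
    A small sketch of the behavior in question, plus one partial remedy: runtimes that implement ES2022 provide Array.prototype.at, which does accept negative indices. This assumes such a runtime; bracket syntax itself is unchanged.

    ```javascript
    const arr = ['a', 'b', 'c'];

    // Bracket access with -1 looks up a property named "-1", which doesn't exist.
    const viaBrackets = arr[-1];

    // Array.prototype.at counts back from the end for negative arguments.
    const viaAt = arr.at(-1);

    // The long-hand form works everywhere.
    const viaLength = arr[arr.length - 1];

    console.log(viaBrackets, viaAt, viaLength); // undefined c c
    ```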

  • Javascript, The Bad Parts #2: no custom comparisons in `Set` 🙀

    In Javascript, Set treats objects only by reference, not by value, requiring weird contortions to maintain unique sets of non-primitive values.

    In every other modern language I’ve worked with, the standard library set allows custom comparisons. But not Javascript.

    In Java / Kotlin, HashSet defers to an object’s hashCode and equals methods, which can be overridden to control comparisons. With TreeSet it’s even easier: just pass a Comparator of your own into the constructor.

    Python’s set will handle any hashable value, so overriding __hash__ and __eq__ should do the trick.

    Rust’s HashSet relies on the Eq, Hash, and PartialEq traits—conceptually somewhat similar to Python’s approach, with Rust’s trait implementations and Python’s special methods being two ways to accomplish much the same thing.

    But in Javascript, and by extension Typescript, the standard library provides no facility for useful sets of custom types.
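
    To make the contortion concrete, here is a minimal sketch of one common workaround: de-duplicating through a Map keyed by a string derived from each value. The helper uniqueBy and the key format are hypothetical, and the approach only works when a stable string key can be computed for every item.

    ```javascript
    // Two structurally equal objects are distinct entries in a plain Set.
    const plain = new Set([{ x: 1, y: 2 }, { x: 1, y: 2 }]);

    // Hypothetical helper: de-duplicate by a caller-supplied key function,
    // storing items in a Map keyed by the derived string.
    function uniqueBy(items, keyOf) {
      const byKey = new Map();
      for (const item of items) {
        const key = keyOf(item);
        if (!byKey.has(key)) byKey.set(key, item); // keep the first occurrence
      }
      return [...byKey.values()];
    }

    const points = [{ x: 1, y: 2 }, { x: 1, y: 2 }, { x: 3, y: 4 }];
    const unique = uniqueBy(points, (p) => `${p.x},${p.y}`);

    console.log(plain.size);    // 2
    console.log(unique.length); // 2
    ```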

    https://medium.com/coding-at-dawn/how-to-use-set-to-filter-unique-items-in-javascript-es6-196c55ce924b