• Concept: Rational Reader

    This is a sketch of a solution to Task: text to Bayes rationality.

    The paradigm is Bayesian epistemology. The broader task is to infer a rational worldview from empirical observation. Here, we use a collection of documents as our link to the real world: we observe that somebody created a document like this.

    Roughly speaking, we infer our rational worldview by enforcing Bayes’ rule across all combinations of model weights and observations. The engine of this arrangement is a language model conditioned on propositional knowledge, paired with a knowledge model conditioned on language.

    Preliminaries

    In reality, there are at least two Bayes’ rules: the discrete and the continuous. We use the continuous form:

    $$f_{X|Y=y}(x) = f_{Y|X=x}(y) f_{X}(x) / f_{Y}(y)$$

    where each function is a probability density function / conditional density.
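    As a concrete sanity check, the continuous form can be verified numerically for a family where all four densities are known in closed form. The following toy model (an illustration, not part of the proposal) takes $X \sim N(0,1)$ and $Y \mid X = x \sim N(x, 1)$, for which standard Gaussian algebra gives $Y \sim N(0,2)$ and $X \mid Y = y \sim N(y/2, 1/2)$:

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return exp(-(x - mean) ** 2 / (2 * var)) / sqrt(2 * pi * var)

# Toy joint model: X ~ N(0, 1), Y | X = x ~ N(x, 1).
x, y = 0.3, -1.2

lhs = normal_pdf(x, y / 2, 0.5)        # f_{X|Y=y}(x)
rhs = (normal_pdf(y, x, 1.0)           # f_{Y|X=x}(y)
       * normal_pdf(x, 0.0, 1.0)       # f_X(x)
       / normal_pdf(y, 0.0, 2.0))      # f_Y(y)

assert abs(lhs - rhs) < 1e-12  # Bayes' rule holds for this family
```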

    To make a continuous distribution over something discrete like words, we use a traditional word embedding summed with a positional encoding, then passed through the PDF of a multivariate normal distribution with inferred mean and covariance matrix. (How this interacts with the positional encoding, I’m not yet clear on.)
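    A minimal sketch of this representation, with toy dimensions, a made-up vocabulary, and a fixed (rather than inferred) mean and diagonal covariance:

```python
import math
import random

random.seed(0)
DIM = 4

# Hypothetical toy setup: random word embeddings plus sinusoidal
# positional encodings, scored under a diagonal-covariance normal.
embed = {w: [random.gauss(0, 1) for _ in range(DIM)]
         for w in ["the", "cat", "sat"]}

def pos_enc(pos):
    # Standard sinusoidal positional encoding.
    return [math.sin(pos / 10000 ** (2 * (i // 2) / DIM)) if i % 2 == 0
            else math.cos(pos / 10000 ** (2 * (i // 2) / DIM))
            for i in range(DIM)]

def diag_normal_pdf(v, mean, var):
    # Density of a multivariate normal with diagonal covariance.
    return math.prod(
        math.exp(-(vi - mi) ** 2 / (2 * s)) / math.sqrt(2 * math.pi * s)
        for vi, mi, s in zip(v, mean, var))

# Continuous representation of the word "cat" at position 1 ...
point = [e + p for e, p in zip(embed["cat"], pos_enc(1))]
# ... scored under a mean and covariance that would be inferred in the
# real system but are fixed here for illustration.
density = diag_normal_pdf(point, mean=[0.0] * DIM, var=[1.0] * DIM)
assert density > 0.0
```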

    The multivariate normal is particularly useful because it can be marginalized over any subset of components of the random vector; this follows from the fact that any affine transformation of a multivariate normal random vector (in particular, projection onto a subset of its coordinates) is itself multivariate normal.
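    A numerical illustration: for a bivariate normal, dropping one component of the mean and the corresponding row and column of the covariance gives the marginal density. The sketch below (toy parameters, assumed for illustration) checks this by crude numeric integration:

```python
from math import exp, pi, sqrt

# Bivariate normal; marginalizing out x2 should leave the univariate
# N(mu1, s11) density over x1. Parameters are arbitrary toy values.
mu1, mu2 = 0.5, -1.0
s11, s22, s12 = 1.0, 2.0, 0.6  # covariance entries

def joint_pdf(x1, x2):
    """Density of the full bivariate normal."""
    det = s11 * s22 - s12 ** 2
    quad = (s22 * (x1 - mu1) ** 2
            - 2 * s12 * (x1 - mu1) * (x2 - mu2)
            + s11 * (x2 - mu2) ** 2)
    return exp(-quad / (2 * det)) / (2 * pi * sqrt(det))

def marginal_pdf(x1):
    """Marginalization keeps the matching sub-vector and sub-matrix."""
    return exp(-(x1 - mu1) ** 2 / (2 * s11)) / sqrt(2 * pi * s11)

x1 = 0.2
dx = 0.01  # crude numeric integration over x2 as a sanity check
integral = sum(joint_pdf(x1, -20.0 + k * dx) * dx for k in range(4000))
assert abs(integral - marginal_pdf(x1)) < 1e-4
```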

    Distributions of interest

    There are five:

    • $P(\vec{w})$—a general language model. This decomposes by the chain rule as $P(\vec{w}) = \Pi_{i} P(w_i | \vec{w}_{j < i})$.

      Implementation: unclear; we need a probabilistic language model; can we get a probabilistic interpretation of a transformer?
    • $P(K)$—a general knowledge model. How likely, a priori, is a belief or statement to be true?

      Implementation: a multivariate normal would be a starting point
    • $P(\vec{w} | K)$—the knowledge-conditional language model. This is the probability of a document $\vec{w}$ given some assertion $K$ about the state of the world, the nature of reality, or whatever. $K$ may make claims about a subset of reality; the world is a complex place, so it’s helpful to be able to discuss parts of it rather than always the whole. This is enabled by the marginalizability of the multivariate normal as discussed above. Of course, by the chain rule this decomposes to $\Pi_{i} P(w_i | \vec{w}_{j < i}, K)$.

      Implementation: uncertain; a multivariate normal parameterized by a transformer with $K$ as input?
    • $P(K | \vec{w})$—the language-conditional knowledge model. Given a word and its context, how likely is an assertion about our model to be true?

      Implementation: uncertain; another probabilistic transformer? A multivariate normal whose parameters are a function of $\vec{w}$, perhaps the output of a transformer?
    • $P(K|J)$ where $K$ and $J$ are disjoint propositions—a hypotheticals model. What does assuming part of our model say about the rest of our model?

      Implementation: multivariate normal parameterized by output of a transformer
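    To make the “multivariate normal parameterized by a transformer” idea concrete, here is a toy sketch of $P(K | \vec{w})$ in which a fixed random linear map stands in for the transformer. Everything here, dimensions included, is an illustrative assumption:

```python
import math
import random

random.seed(1)
D_W, D_K = 3, 2  # toy dimensions for word-context and knowledge vectors

# Stand-in for the transformer: fixed random linear maps from the
# word-context vector to the mean and log-variances of a diagonal
# multivariate normal over K. Purely illustrative.
W_mean = [[random.gauss(0, 0.5) for _ in range(D_W)] for _ in range(D_K)]
W_lvar = [[random.gauss(0, 0.5) for _ in range(D_W)] for _ in range(D_K)]

def p_k_given_w(k, w):
    """Density of P(K | w): a normal whose parameters are a function of w."""
    mean = [sum(a * b for a, b in zip(row, w)) for row in W_mean]
    var = [math.exp(sum(a * b for a, b in zip(row, w))) for row in W_lvar]
    return math.prod(
        math.exp(-(ki - mi) ** 2 / (2 * vi)) / math.sqrt(2 * math.pi * vi)
        for ki, mi, vi in zip(k, mean, var))

density = p_k_given_w(k=[0.1, -0.4], w=[1.0, 0.0, -1.0])
assert density > 0.0
```

    In the real system the linear maps would be replaced by a learned network, but the interface (context in, normal parameters out) would be the same.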

    Training Procedure

    Randomly sample word-with-context $\vec{w}$ and knowledge vector $\vec{k}$. Randomly partition $\vec{k}$ into disjoint vectors $\vec{q}$ and $\vec{r}$. Compute the gradient of the loss:

    $$\mathfrak{L}_{Int} = [P(\vec{q} | \vec{r}) - P(\vec{r} | \vec{q}) P(\vec{q}) / P(\vec{r})]^2$$

    $$\mathfrak{L}_{Obs} = [P(\vec{w} | \vec{k}) - P(\vec{k} | \vec{w}) P(\vec{w}) / P(\vec{k})]^2$$

    $$\mathfrak{L} = \mathfrak{L}_{Int} + \mathfrak{L}_{Obs}$$

    and feed it to your favorite optimizer.

    The first part critically evaluates the interrelationship of model components. The second part critically evaluates the explanatory power of the model relative to empirical observation.
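    The whole procedure can be sketched with scalar stand-ins for the learned densities. The fixed Gaussians below are deliberately not Bayes-consistent, so the loss is positive and an optimizer would have something to minimize; all of them are illustrative placeholders, not the proposed transformer-parameterized models:

```python
from math import exp, pi, sqrt

def npdf(x, mean, var):
    """Univariate normal density."""
    return exp(-(x - mean) ** 2 / (2 * var)) / sqrt(2 * pi * var)

# Scalar stand-ins for the learned densities. These fixed Gaussians are
# deliberately NOT Bayes-consistent, so the loss comes out positive.
p_q  = lambda q: npdf(q, 0.0, 1.0)
p_r  = lambda r: npdf(r, 0.0, 1.5)
p_qr = lambda q, r: npdf(q, 0.3 * r, 0.8)  # P(q | r)
p_rq = lambda r, q: npdf(r, 0.5 * q, 1.0)  # P(r | q)
p_w  = lambda w: npdf(w, 0.0, 1.0)
p_k  = lambda k: npdf(k, 0.0, 1.0)
p_wk = lambda w, k: npdf(w, 0.2 * k, 0.9)  # P(w | k)
p_kw = lambda k, w: npdf(k, 0.4 * w, 0.7)  # P(k | w)

def bayes_residual(cond_ab, cond_ba, p_a, p_b, a, b):
    """Squared violation of Bayes' rule at a single sample point."""
    return (cond_ab(a, b) - cond_ba(b, a) * p_a(a) / p_b(b)) ** 2

q, r, w, k = 0.4, -0.2, 1.1, 0.3  # one randomly sampled "minibatch"
loss_int = bayes_residual(p_qr, p_rq, p_q, p_r, q, r)
loss_obs = bayes_residual(p_wk, p_kw, p_w, p_k, w, k)
loss = loss_int + loss_obs  # gradient of this goes to the optimizer

assert loss > 0.0  # the toy densities violate Bayes' rule
```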

  • Task: text to Bayes rationality

    Language models, at the core, are very stupid, blindly predicting the next word given the preceding words. This leaves them profoundly vulnerable to the biases and inaccuracies of the training data.

    Human annotations are applied late in the game to reduce spurious, hallucinatory, extremist, discriminatory, and other undesired outputs, but this after-the-model reshaping is symptomatic of the fact that there is no critical thinking in the language model proper. It can’t assess the reasonableness of the text it’s generating; only the likelihood that the words would be spoken. Of course, some things are so widely accepted that they go without saying; and others, like propaganda, are repeated precisely because of their untruth.

    This task is to distill from biased textual inputs a rational world-model, so far as it is implicit in the training data.

    What makes a model rational?

    A “rational” model does its best to be both internally consistent and to comport with empirical observations. If this consistency respects Bayes’ rule, it embodies Bayesian epistemology; we here term such a model Bayes rational.

    The first requirement of Bayes rationality, internal consistency, can be seen as adherence to Bayes’ rule between all pairs of model parameters:

    $$P(\theta_i | \theta_j) = P(\theta_j | \theta_i) P(\theta_i) / P(\theta_j)$$

    This can be enforced by minimizing the loss function:

    $$\mathfrak{L}_{Int} \equiv \sum_{i} \sum_{j \neq i} [P(\theta_i | \theta_j) - P(\theta_j | \theta_i) P(\theta_i) / P(\theta_j)]^2$$
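    As a sanity check on this loss, consider a toy discrete joint over two binary “parameters”: conditionals derived from a single joint satisfy Bayes’ rule exactly, so $\mathfrak{L}_{Int}$ vanishes, while perturbing one conditional makes it positive:

```python
# A 2x2 joint over binary "parameters" theta1, theta2. Conditionals and
# marginals derived from one joint satisfy Bayes' rule, so L_Int vanishes.
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

p1 = {a: sum(p for (i, j), p in joint.items() if i == a) for a in (0, 1)}
p2 = {b: sum(p for (i, j), p in joint.items() if j == b) for b in (0, 1)}
cond12 = {(a, b): joint[a, b] / p2[b]        # P(theta1 = a | theta2 = b)
          for a in (0, 1) for b in (0, 1)}
cond21 = {(b, a): joint[a, b] / p1[a]        # P(theta2 = b | theta1 = a)
          for a in (0, 1) for b in (0, 1)}

def l_int():
    """Sum of squared Bayes-rule violations over all parameter pairs."""
    return sum((cond12[a, b] - cond21[b, a] * p1[a] / p2[b]) ** 2
               for a in (0, 1) for b in (0, 1))

consistent_loss = l_int()
assert consistent_loss < 1e-12   # derived from one joint: no violation

cond12[0, 0] += 0.05             # break consistency by hand ...
perturbed_loss = l_int()
assert perturbed_loss > 0.0      # ... and the loss becomes positive
```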

    But having internal consistency is not enough to render a worldview rational. Theory must also relate to observation in a manner consistent with Bayes’ rule. Observation consistency applies Bayes’ rule to every model parameter-observation pair:

    $$P(x_i | \theta_j) = P(\theta_j | x_i) P(x_i) / P(\theta_j)$$

    This relationship is captured by the loss:

    $$\mathfrak{L}_{Obs} \equiv \sum_i \sum_j [P(x_i | \theta_j) - P(\theta_j | x_i) P(x_i) / P(\theta_j)]^2$$

    The combination of internal rational consistency and rational consistency with observation is embodied by a Bayes rationality loss:

    $$\mathfrak{L} = \mathfrak{L}_{Int} + \mathfrak{L}_{Obs}$$

    The task of extracting a Bayes rational model of the world is complete to the extent that this loss is minimized.

    Art by emma marie andersson under a Creative Commons license.

  • Dataset: Wikimedia Commons Image Super-Resolution

    TODO full description

    TODO make available on torrent?

    A large dataset of lossless images from Wikimedia Commons, with 25%, 50%, and 128×128 downscales, plus train and validation splits for an image super-resolution task.

  • Task: image to video

    Given a single frame, produce the video that would follow it, as in a movie

    https://paperswithcode.com/task/image-to-video

  • Task: audio restoration

    Given frames of a video / gif, recover / estimate the original audio as well as possible

  • Ghee 0.6 – the tastiest way to manage your data

    Introducing Ghee 0.6, the latest version of the tastiest way to manage your data!

    Ghee is an experiment in leveraging modern filesystem features to implement a data management system, providing a key-value database, Git-style commits, and extensive tools for manipulation of extended attributes (xattrs).

    The focus of this release is the introduction of the commit-management subcommands commit, log, restore, and reset. These are modeled after their Git equivalents, but utilize Btrfs copy-on-write filesystem semantics, including read-only snapshots, to efficiently track changes.

    Using Btrfs generalizes change tracking, efficiently handling not only text files, but arbitrary binary blobs as well.

    It is hoped that this could lead to a version control system that handles large files in an integrated manner, whereas large file support in Git is tacked on separately – in the case of Git-LFS, requiring an additional server to implement.

    When using Ghee as a database, the commit, restore, and reset commands provide transaction-like functionality, allowing modifications to be built incrementally but finalized all at once – and rolled back when mistakes are made.

    The main question about what Ghee will be good for is how efficiently it can handle actual database workloads. I suspect the answer will be: not very well, based on Michael Sproul’s experience with Butter DB, which is built on a similar architecture.

    But for many workflows, it’s not necessary to serve large numbers of queries: Ghee would be well-suited for data scientists developing datasets and statistical models, for example.

    It will be interesting to see what happens here at the meeting-place of databases, filesystems, and version control systems. Imagine adding WHERE clauses to your merge command, à la ghee merge origin annotations-20231002 -w age>45.

    Now hosted at Codeberg… check it out and, as always, send bug reports my way!

  • Ghee 0.4 – The tastiest way to work with Linux extended attributes (xattrs)

    Introducing Ghee 0.4, the newest release of the premier tool for manipulating Linux extended attributes!

    This latest release adds a Rustyline-based REPL and additional tools for using the filesystem as a relational database. The new init subcommand lets you declare the primary key by which a directory (and its subdirectories) are indexed, while ins and del now allow insertion and deletion of records while keeping related indices up to date. ls is helpful in the REPL, showing Ghee’s view of the world.

    In addition to direct management of extended attributes, Ghee is designed to implement a relational data model built around xattrs while offloading as much functionality as feasible to the filesystem.

    As such, Ghee does nothing special to ensure the integrity of stored data. You are encouraged to layer this not on but under Ghee by your choice of filesystem. For example, ZFS, Btrfs, and Bcachefs all provide checksum-based integrity checks.

    Next steps include filling in missing features in existing subcommands and using copy-on-write snapshots to provide a Git-inspired workflow, something like:

    • ghee diff ./people: show how the ./people table has changed since last commit
    • ghee commit ./people -m "message!": commit the ./people table in its current form with message “message!”
    • ghee restore ./people gf037d2c98: restore the ./people table to its state in commit gf037d2c98
    • ghee log ./people: show commit messages for all commits in the ./people table.

    As I am a user of Btrfs, CoW-based features will be implemented with Btrfs in mind first. If this proves successful it could be extended to other filesystems.

    Of course, I hope it goes without saying that version 0.4 of any project should NOT be used in mission-critical contexts where the cost of data loss would be substantial.

    Thanks in advance for any thoughts, questions, or suggestions!