Diffusion Local Time is a functional timepiece that explores surrealism in generative AI. It was on display in the Art Hack Day exhibition in San Francisco.
In this installation, the underlying technology and the politics remained the same (a Latent Consistency Model ControlNet running locally), but the local computer was upgraded to a high-end desktop, for sub-minute render latency. With the higher compute budget, the resolution increased to 1440x810. For a more cohesive visual, the random seed used to generate images changed only once an hour, so most of the digits kept the same boulders and sky halos from minute to minute. Further, the number of clock faces was reduced to one, the desert night, to avoid large luminance shifts from afternoon cliffs to desert night to desert sunrise.
The reception was widely positive: visitors remarked on how calming it was (under a thousandth the framerate of other video art) and how beautiful the images were, and wondered why there were so many Milky Ways.
Diffusion Local Time is a timepiece with a Generative AI display. A Raspberry Pi locally generates and displays pareidolic clock faces every several minutes, using open-source code, open-source typography, and freely-available models.
The clock faces are generated from four text prompts that default to California landscapes, and the prompts are easily field-serviceable for new clock faces, like “kittens in the park”, for a 3PM viewing.
Using a latent consistency model derived from Stable Diffusion 1.5 and the Monster Labs QR Code Monster ControlNet, a Raspberry Pi 4 can generate a 480x360 image in 9.5 minutes, suitable for a residential display. The newer Raspberry Pi 5 can generate a 480x360 image in under 6 minutes. On a Mac Studio with an M1 Ultra chip, the ControlNet pipeline takes 1.1 seconds on the GPU.
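For reference, a minimal sketch of this pipeline with Huggingface Diffusers, assuming the LCM-LoRA distillation of Stable Diffusion 1.5 and a pre-rendered image of the clock digits as the control image (the model ids are the public ones; the control image path and prompt are illustrative):

```python
import torch
from diffusers import ControlNetModel, LCMScheduler, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# QR Code Monster ControlNet on top of Stable Diffusion 1.5, in float32 for CPU.
controlnet = ControlNetModel.from_pretrained(
    "monster-labs/control_v1p_sd15_qrcode_monster", torch_dtype=torch.float32
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float32
)
# Latent consistency distillation: very few steps, low guidance.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

control = load_image("digits-0830.png")  # hypothetical pre-rendered clock digits
image = pipe(
    "a desert night in California, boulders, the Milky Way",
    image=control,
    num_inference_steps=4,
    guidance_scale=1.0,
    controlnet_conditioning_scale=1.2,  # see the legibility discussion below
).images[0]
image.save("clock.png")
```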
Diffusion Local Time was designed to use a greyscale 4:3 e-ink HDMI monitor, and can easily be adapted to other dimensions and other display technologies.
While the defaults in software and hardware for Diffusion Local Time prioritize fast and cheap at the expense of image quality, 22.1 seconds of runtime on the GPU of an M1 Ultra yields a much more detailed and beautiful result:
The Raspberry Pi is a well-tested deployment target, and 8 GB of memory is enough to run the Huggingface Diffusers code unchanged.1 But math takes time, and adding up smaller numbers is faster. A Raspberry Pi has no GPU to run 16-bit floating point math, so the easiest way to reduce precision is quantized 8-bit integer math. That is 4x less data to read from main memory into each layer: on the Pi 4 this gave a 10% speedup, 480 to 430 seconds, even with the overhead of dequantizing back into floating point, but on the Pi 5 it led to no time savings, potentially because of improved memory bandwidth, so it is off by default.
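A sketch of one way to do this with PyTorch's dynamic quantization, which stores linear-layer weights as int8 and dequantizes them on the fly (whether this matches the timepiece's exact quantization path is an assumption):

```python
import torch

# Quantize the UNet's linear layers to int8: weights are read from memory
# at a quarter the size, then dequantized to float for each matmul.
pipe.unet = torch.ao.quantization.quantize_dynamic(
    pipe.unet, {torch.nn.Linear}, dtype=torch.qint8
)
```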
The image on the bottom is in 8-bit precision, while the one on top is in full 32-bit precision.
This is an especially egregious example of a common phenomenon, quantization or not. Not all of the results are physically plausible: water reflections often obey the control image over optics, sea foam and surf often obey the control image over wave dynamics, rocks often obey the control image over gravity. As with ChatGPT and all current language models, the output of these generative models is something that seems statistically plausible without actual fact, and we do the work of upholding both sides of the conversation in our dialogue.
This is challenging as an artistic project, because the tuning knob for how much the control image affects image synthesis is a real-valued number, not a measure of the legibility of the control image: a starry image of a lake on a hot desert night tends to have more contrast than redwoods, so the conditioning scale differs, whereas the artistic intent is equivalent legibility among all clock faces.
The images on the bottom come from just 1GB of statistical models.
These models are powerful. The imminent danger is how they can facilitate generating misinformation and erode the idea of consensus reality, which damages our collective ability to fight many of our modern problems.2 An urgent danger is their capacity to supplant paid stock art, built on the work of photographers and artists, often the very people who currently rely on sales of that work to sustain themselves. Another urgent danger is the many biases and problems in the training data. Society is generally unprepared for the effects of this technology.
This artwork tries to grapple with these problems. There is minimal harm in a landscape altered with pillars and cairns and seafoam and extra Milky Ways. There are few pareidolic artists to displace, and every image that Diffusion Local Time generates starts with a different random seed, based on creation time. The biases of mainstream photography of the American Southwest are unfortunately replicated in this work (for example, these landscapes are gorgeous for forest bathing, but people have lived there for millennia, and these landscapes rarely include people), and counteracting this is an ongoing effort.
The tooling that made this artwork is powerful, and part of the point of this work is to raise awareness of its potential for creative use and malignant abuse. The “local” in Diffusion Local Time, especially on a Raspberry Pi, was a choice to indicate the broad distribution of this power: not only mediated by paid internet services, but local, unmoderated, even at sub-$100 price points. This power is widely available, in such abundance that it is usable for artistic purposes, and we deserve to be appropriately cautious.
In a world where most people’s primary interaction with a timepiece is through a mobile phone, which defaults to a 3 or 4 digit time display (8:30 instead of 🕣), a numerical display of time is common, but also ridiculous, in exactly the sense of “ape-descended life forms … so amazingly primitive that they still think digital wristwatches are a pretty neat idea”: to use the explanation of Douglas Adams himself, in rejecting an editorial change from digital wristwatches to cellular phones:
there is something inherently ridiculous about digital watches, and not about cellular phones. Now this is obviously a matter of opinion, but I think it’s worth explaining. Digital watches came along at a time that, in other areas, we were trying to find ways of translating purely numeric data into graphic form so that the information leapt easily to the eye. For instance, we noticed that pie charts and bar graphs often told us more about the relationships between things than tables of numbers did. So we worked hard to make our computers capable of translating numbers into graphic displays. At the same time, we each had the world’s most perfect pie chart machines strapped to our wrists, which we could read at a glance, and we suddenly got terribly excited at the idea of translating them back into numeric data, simply because we suddenly had the technology to do it. So digital watches were mere technological toys rather than significant improvements on anything that went before. I don’t happen to think that’s true of cellular comms technology. So that’s why I think that digital watches (which people still do wear) are inherently ridiculous, whereas cell phones are steps along the way to more universal communications. They may seem clumsy and old-fashioned in twenty years time because they will have been replaced by far more sophisticated pieces of technology that can do the job better, but they will not, I think, seem inherently ridiculous. 3
Let me know what you think! leebutterman@gmail.com
By replacing the new default scaled dot product attention with the existing Sliced Attention Processor, runtime goes down from 9.5 minutes to 6 minutes. Interestingly, the default attention processor changed during development of this timepiece, which surfaced this regression on this relatively rare deployment platform for large attention-based models. ↩
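In current Diffusers, opting back into sliced attention is one call, reusing the pipe object from the sketches above:

```python
# Re-enable the sliced attention processor, which was faster on the Pi.
pipe.enable_attention_slicing()
```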
If we cannot come together for a problem that is recent, and has impact within days, and has a clearly known inexpensive solution (the US government spent $32B in the three decades before the pandemic through March 2022, about the cost of a low-end phone for each adult in the United States), how will we come together for a problem that is hundreds of years old, impacts systems with huge inertia, and has many unknown solutions that will be extremely expensive for each of us individually? ↩
That quote was from 1992, and in under twenty years there was an iPhone with an app store in roughly its modern shape, while many fewer people (especially as a percentage!) wear a digital wristwatch whose primary function is to keep time. ↩
QR Code ControlNet + Stable Diffusion, a hyperlegible font from the Braille Institute, a g5.xlarge that generates an image in 5.7 seconds, and some prompts.
A few months ago nhciao posted several pieces of art that scanned as QR codes.
Recently, Angry Penguin made a Huggingface space that implemented that workflow for arbitrary prompts and images, with fully available source.
The luminance controlnet model and the upscaler are both publicly available on Huggingface, so it is possible to run those locally with only the cost of keeping the box running.
The numbers on the clock face need to be legible, and typographers have investigated this exact problem! This is the Atkinson Hyperlegible font, which was designed for letterform visibility, and its numbers are very legible as well. Even at the 1024x1024 final resolution the forms stay legible, filtered through the various prompts.
The relationship that one has with a clock depends on how fast it goes. Running this ControlNet on a Mac Studio in CPU mode (I could not get MPS to be stable) takes a few minutes (at residential energy prices). Running on a g5.xlarge on AWS takes 5.78 seconds in float16 precision (at $1/hr). This latency is low enough to generate several types of images for a few instances of each clock face per minute.
detail of a new hieronymus bosch painting
Hieronymus Bosch artwork can be blobby and weird with light figures on a dark background, and the images often fit the control image and are similarly spooky at first glance.
beautiful detail of a train map
The train maps can be fascinating to look at, sometimes installed in an actual train, sometimes a low depth-of-field close up, always nonsensical after thorough observation. Need to work on this more. (Very open to suggestions!)
watercolor of a leafy pedestrian mall at golden hour with multiracial genderqueer joggers and bicyclists and wheelchair users talking and laughing
This comes from the sort of city I want to live in: leafy, close-knit, welcoming, without danger from wheeled vehicles.
There are current living artists who are making art in these styles and they would love you to support their work and their vision! (Send me more suggestions!)
All of the pieces come together in https://github.com/lsb/stable-diffusion-clock. The viewer could probably be improved from a static HTML page that refreshes itself every few seconds. The images are mostly deterministic: their seed comes from the epoch time at generation.
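A sketch of that determinism, assuming the seed is simply the epoch time at generation, and reusing the pipe, prompt, and control image from the sketches above:

```python
import time
import torch

# Derive the seed from the creation time, so re-renders are reproducible.
seed = int(time.time())
generator = torch.Generator().manual_seed(seed)
image = pipe(prompt, image=control, generator=generator).images[0]
```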
Apologies to people using this as a clock who are not also in Pacific time.
Some open questions I’ve been thinking of:
Tell me what you think! leebutterman@gmail.com
This is a browser-based search engine for Wikipedia, where you can search for “the reddish tall trees on the san francisco coast” and find results like “Sequoia sempervirens” (a name of a redwood tree). The browser downloads the database, and search happens offline. The database of two million Wikipedia pages with their titles is roughly a 100MB download, and the final results appear in under 50 milliseconds. This uses sentence transformers to embed documents, product quantization to compress embeddings, pq.js to run distance computation in the browser, and transformers.js to run sentence transformers in the browser for queries.
Yes.
Search over millions of documents happens in real time, completely offline. Results stream back every 10ms on a mobile device, and search results update gradually as the database is sequentially scanned.
The distance computation over 2M embeddings takes 250ms in total, over 20 iterations, and a faceted top-10 computation takes 8ms. To display intermediate results, we run batches of 100k distance computations at a time, then compute the top-k and repaint after a (30ms) timer runs out.
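The browser code is JavaScript, but the scan structure is easy to sketch in numpy: score one batch, fold it into a running top-k, and hand control back to the UI between batches (the distances() helper is the product-quantization sketch further down):

```python
import numpy as np

def streaming_topk(codes, table, k=10, batch=100_000):
    """Scan PQ codes in batches, yielding the running top-k after each batch."""
    best_d = np.full(k, np.inf)
    best_i = np.full(k, -1)
    for start in range(0, len(codes), batch):
        d = distances(codes[start:start + batch], table)
        cand_d = np.concatenate([best_d, d])
        cand_i = np.concatenate([best_i, np.arange(start, start + len(d))])
        keep = np.argpartition(cand_d, k)[:k]  # cheap partial sort
        best_d, best_i = cand_d[keep], cand_i[keep]
        yield best_i[np.argsort(best_d)]  # intermediate result; the UI repaints here
```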
We order embeddings by compressed page size: more information-dense pages are the first to be analyzed and returned in a top-10 ranking, and might be more useful in a search result. Note that the search results continue to stream in and update the top results, but most of the lower-page-size pages do not rank in the top 10, so the search appears faster than if we did not update the UI until everything returned.
70% of the final search results were in the first 670K embeddings, which in total rendered in 116 milliseconds (note the topk timing at the bottom left, which counts distance calculations as positive times and topk calculations as negative times):
Note that changing the facet for the onomatopoeia search (changing the first letter of the page to return) avoided running a new embedding, and returned in under 25ms. Changing the number of results from top 10 to top 20 or top 100 is similarly instantaneous.
The database is small enough to support casual use cases of up to a million embeddings without special treatment.
Note that, for high performance, we use Arrow instead of JSON. Arrow can store our 8-bit integer product quantization arrays compactly, and Arrow can store an array of strings as an array of indexes into one buffer, which is a significant savings over a million Javascript string objects.
There is no GPU acceleration, only WebAssembly, so far. ONNX is a convenient compile target. WebGPU is still very new, and is an eagerly-anticipated future direction.
There are a lot of sentence transformers to choose from! There is a leaderboard of sentence embeddings: https://huggingface.co/blog/mteb
The all-MiniLM-L6-v2 model has reasonable performance (https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) and is small and available in ONNX weights (https://huggingface.co/Xenova/all-MiniLM-L6-v2/) for transformers.js (https://github.com/xenova/transformers.js).
6M pages * 384-dimension embeddings * 32-bit floats is over 9GB. Even a million embeddings in float16 precision is 800MB. This is too large for casual usage.
As a first approximation, to choose the top million, one approach is to choose the pages with the most information: compress each page and count the bytes that come out. Lists will be overrepresented (lists are less compressible than general text), and there is no appreciation of the link structure of webpages, but it is cheap to compute and easy to start with.
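A sketch of that heuristic, with gzip standing in for whatever compressor one prefers (the pages structure is hypothetical):

```python
import gzip

def information_density(text: str) -> int:
    # Compressed byte count as a cheap proxy for information content.
    return len(gzip.compress(text.encode("utf-8")))

# pages: hypothetical list of (title, text) tuples for all 6M articles.
top_million = sorted(pages, key=lambda p: information_density(p[1]), reverse=True)[:1_000_000]
```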
FAISS (https://faiss.ai) is a highly popular embedding search engine serverside, with a lot of tuning knobs for creating different styles of search indices. Autofaiss (https://github.com/criteo/autofaiss) will usually recommend using Product Quantization, after creating IVF indices or HNSW indices (Pinecone has a great intro to vector indexing https://www.pinecone.io/learn/vector-indexes/).
Product quantization is exceptionally simple to implement: creating a ‘distance table’ is under 5 lines of numpy and using that to find distances is a one-liner.
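A sketch of both pieces, assuming 48 subquantizers with 256 centroids each (matching the 48-dimensional codes mentioned later):

```python
import numpy as np

def distance_table(query, codebook):
    # query: (48, subdim), the query split into subvectors;
    # codebook: (48, 256, subdim). Squared L2 distance to every centroid.
    return ((codebook - query[:, None, :]) ** 2).sum(axis=-1)  # (48, 256)

def distances(codes, table):
    # codes: (n_docs, 48) uint8. Look up and sum each document's 48 table entries.
    return table[np.arange(codes.shape[1]), codes].sum(axis=1)  # (n_docs,)
```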
Oftentimes you will want to search in some product subcategories, like finding only PDFs in a web search, or results in ancient Latin. Splitting the distance computation from the top-10 ranking allows us to fudge the distances in flight before ranking. For million-scale search, this is highly feasible. In this search of Wikipedia, there is one search facet: the first character of the page. Because the top-k ranking is separate from the distance computation, we can avoid recomputing query embeddings and distances to explore different facet values in real time.
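A sketch of that fudging, masking out pages whose first letter does not match before the top-k pass, so distances never need recomputing:

```python
import numpy as np

def faceted_topk(dists, first_letters, facet, k=10):
    # Push non-matching pages to infinity before ranking.
    masked = np.where(first_letters == facet, dists, np.inf)
    idx = np.argpartition(masked, k)[:k]
    return idx[np.argsort(masked[idx])]
```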
ONNX has a specific opcode that does exactly the product quantization step! That opcode is GatherElements. Unfortunately, the PyTorch ONNX export does not use this special opcode for the model as written. Thankfully, there is abundant support for reading and writing ONNX outside of a PyTorch compilation step.
A useful graphical editing tool for ONNX is ONNX-modifier, at https://github.com/ZhangGe6/onnx-modifier , which presents a friendly interface to add elements into the dataflow graph of any exported ONNX model.
By taking the multiple ONNX operations that the PyTorch model compiles into, and replacing them with this single opcode, distance computation is roughly 4x faster.
As mentioned, the Arrow format is much more compact, in memory and on disk, for storing the embeddings and the metadata (page titles).
Because the Arrow array format only stores one-dimensional data, and we have 48 dimensions of embedding data that we do not want to wrap in another data format, we need two separate schemas: one for the metadata (a hundred thousand rows each), and one for the embeddings (a hundred thousand × 48 rows each), and we reshape the embeddings at load time.
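A sketch with pyarrow, assuming uint8 PQ codes and 48 subquantizers (the field names are illustrative):

```python
import numpy as np
import pyarrow as pa

titles = pa.table({"title": ["Sequoia sempervirens", "Onomatopoeia"]})  # metadata schema
flat_codes = np.random.randint(0, 256, size=100_000 * 48, dtype=np.uint8)
codes = pa.table({"code": flat_codes})  # embeddings schema, one row per code

# At load time, reshape the flat column back into (n_docs, 48).
pq_codes = codes["code"].to_numpy().reshape(-1, 48)
```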
Storing the product quantization codebook in JSON is under 1.0MB, so it is less crucial to optimize this part.
Lots of the library functions in the full Wikipedia search app should migrate into reusable pq.js components. A lot of the ONNX shapes are pre-baked, so it would be useful to support different quantization levels and different embedding dimensions. Give a shout!
Try it out at https://huggingface.co/lsb/wav2vec2-base-pemlsb-la!
This is the first Latin speech recognition system! It is powered by a new voice dataset of 88.3 hours of Latin speech, mostly synthetically generated from Poeta Ex Machina. The self-supervision of using speech synthesis to train speech recognition (like SynthASR) offers a few exciting new directions (like examining the inductive biases of neural language models with artificial language).
Modern deep neural network statistical modeling relies on fewer hand-engineered features and larger piles of data. Self-supervised learning is increasingly useful in many applications, where a task can be framed as learning mechanically-generated labels. These labels are usually generated at much lower cost and much greater scale than human-generated labels. Self-supervised learning often amounts to learning the inverse of a mechanical process: image recoloring for black-and-white photographs is learned as the inverse of stripping images of their color. Super-resolution is learned as the inverse of downsampling images. Language modeling is learned as the inverse of deleting a word in a sequence (at the end (‘causal’) or in the middle (‘masked’)). A self-supervised speech recognition approach would be to start with a pile of text, generate synthetic speech, and learn to recognize human speech based on that synthetic speech, similar to SynthASR.
Many speech recognition systems rely on meticulously labeled sound files, with accurate timing data for each letter. The relatively-new wav2vec uses (text, sound) pairs without timing labels, which allows it to consume much more data (per $ of acquired sample data, like Common Voice). However, spoken Latin is rare, and much more challenging1 to acquire than (say) Spanish or Japanese, so this self-supervised approach is crucial.
So: can Latin speech recognition learn from Latin speech synthesis? We can first create a dataset of Latin text, and then we can create a dataset of that text synthesized into speech, and we can try.
High-quality synthetic Latin speech in a classical pronunciation comes from Poeta ex Machina. Poeta ex Machina has a full database of scansions of single words, and we can use it to synthesize lines of (for example) Vergil for a multi-word corpus; Vergil’s extant works are all in dactylic hexameter, comprising over 21 hours of text.
ancient-latin-passages
The ancient-latin-passages dataset is a compendium of 19MB of Latin text, written roughly between 50BC and 150AD, from a wide variety of Classical authors on the Latin Library. This dataset was used to create poetaexmachina-mp3-recitations, and we can synthesize much more poetry and add to that dataset. This is publicly available at https://huggingface.co/datasets/lsb/ancient-latin-passages .
poetaexmachina-mp3-recitations
All of poetaexmachina-mp3-recitations is divided into three parts: the 1-grams, individual words from Poeta ex Machina’s internal database of word scansions, comprising 66.9 hours of recited speech; the lines of dactylic hexameter, all from Vergil, comprising 21.4 hours of recited speech; and recitations from yours truly of Cicero and Catullus, comprising half a minute of recited speech. This is publicly available at https://github.com/lsb/poetaexmachina-mp3-recitations , with one recitation per text file + mp3 file.
In contrast to older speech recognition systems that require speech waveforms expensively annotated with timing data per letter, wav2vec2 is designed to learn timing data from unannotated pairs of an entire waveform and an entire text (usually under 10 seconds of audio).
The community and infrastructure around wav2vec2 mean that there are many wav2vec2 models trained on various modern languages. We can take a large pre-trained model whose training data is close to the target data distribution, and use it as a foundational starting point, instead of starting training from scratch. Poeta ex Machina uses an Italian voice, partly for its phonetic inventory (English, for instance, does not have sufficient phonetic inventory: we believe that ancient Latin trilled or flapped its Rs (medi(us)-dies = meridies, like British English rhyming edible with terrible)), partly for sentimental/aesthetic reasons (would Spanish work? Russian? Xhosa?). For similar phonetic and sentimental reasons, and availability, we use a wav2vec2 model trained on the Italian dataset of Vox Populi, and fine-tune from there. Informal test results found that the word error rate improved faster when fine-tuning from this Italian-trained model, compared to the English-trained model. An obvious future direction is starting from other initial monolingual models, or multilingual models. Another obvious future direction is upgrading from the 5-gram post-processing model to other text models (transformers? sub-word tokenization strategies?).
We can make our prediction task by normalizing the orthography of the Latin text: stripping punctuation and macrons, normalizing letters invented after 500AD (“j”, “v”, “w”) by substituting “j” with “i” and “v” with “u”, and using only lower case.
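A sketch of that normalization (the post does not spell out a “w” substitution, so “w” is left alone here):

```python
import re
import unicodedata

def normalize_latin(text: str) -> str:
    # Lowercase, strip macrons (combining marks), fold post-classical letters.
    text = unicodedata.normalize("NFD", text.lower())
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = text.replace("j", "i").replace("v", "u")
    return re.sub(r"[^a-z ]", "", text)  # strip punctuation

assert normalize_latin("Jūlius veniet.") == "iulius ueniet"
```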
Wav2vec2 uses Connectionist Temporal Classification to infer its transcription: at each 20ms timestep we predict a token, either a letter or a special character like a break, and merge identical predictions between breaks. The Huggingface wav2vec2 library has built-in support for an additional 1-to-5-gram language model, for post-processing the audio predictions with a stochastic 🦜. Tuning the post-processing model is very much an open question, especially for Latin.
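A toy illustration of the CTC collapse rule, assuming “|” as the blank token:

```python
def ctc_collapse(tokens, blank="|"):
    out = []
    for t in tokens:
        if out and out[-1] == t:
            continue  # merge identical adjacent predictions
        out.append(t)
    return "".join(t for t in out if t != blank)  # drop the blanks

assert ctc_collapse(list("aa|rr|mm|aa")) == "arma"
```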
At the end of initial training, the word error rate was 4.13% on the validation set of data, only slightly more than 1 in every 25 words incorrect.
Given the rate of improvement when including even a small amount of human-generated training data, this is very much a work in progress, especially when experimenting with data augmentation.
There are examples of using artificial languages to examine the inductive biases of neural language models (https://arxiv.org/pdf/2106.01044.pdf), and using artifically generated speech can be similarly useful here. By varying the voice pitch or timbre, or experimenting with background acoustics, or by introducing speech disfluencies, it would be possible to compare the inductive biases of speech recognition for different types of speakers (and then (ideally!) engineer those away). Using generated speech takes away one variable for trans-linguistic comparison of the model (“how well does this perform against English versus against Polish/Sanskrit/Tagalog/Toki Pona/etc”).
The careful observer will ask: why does one need speech recognition at all, if spoken Latin is very rare? I have a truly marvelous rationale which this endnote is too space-constrained to contain. ↩
Foretell the future through the text of a book in a few easy steps!
People have used Vergil’s Aeneid, the Bible, and many other books for exactly this purpose.
We will use a similar concept: instead of divination through books, we will perform divination through fingerprints: dactyloglyphomancy.
From https://text.bargains/amulet/:
An amulet is a kind of poem that depends on language, code, and luck. To qualify, a poem must satisfy these criteria:
Its complete Unicode text is 64 bytes or less.
The hexadecimal SHA-256 hash of the text includes four or more 8s in a row.
…
And, while this isn’t part of the formal definition, it’s important to say that an amulet of any rarity should be judged by its overall effect, with consideration for both its linguistic and typographic qualities. In particular, an amulet’s whitespace, punctuation, and diacritics should all be “load bearing”.
This is presumably so that an amulet will not be stuffed with zero-width nonjoiners, or be something like “lol butts 61140978758” (02c9ecef4bfda53a315201bcb728128888888888eed3b65d7bc0bcf5dae0ec2e) or just “2021-10-31T09:52:20.358328” (e61d881e1299dd7927c588888888884ac755686ace0a92012c89b5b6f46494c0).
Turn this on its head! Choose a phrase of importance, ask a question in the form of a meaningful hexadecimal string, find which emoji you can interpolate into the phrase to unlock that hexadecimal string in the fingerprint, and declare the choice of emoji significant to the question you have asked.
First, we choose an Oblique Strategy.
Bridges:
-build
-burn
Then, we ask a question.
What will be relevant for 2020, and 2021?
So we will search for a hash that matches 2020 and 2021.
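A minimal single-threaded sketch of the search (the real run used several parallel Rust threads; the emoji pool here is illustrative, and a real search needs a much larger pool and some luck):

```python
import hashlib
from itertools import product

EMOJI = ["💕", "📱", "🔭", "🦖", "⛺", "🌘", "🌄"]  # hypothetical candidate pool

def divine(template, targets):
    # Try every emoji assignment until every target hex string appears in the digest.
    for combo in product(EMOJI, repeat=template.count("{}")):
        text = template.format(*combo)
        digest = hashlib.sha256(text.encode()).hexdigest()
        if all(t in digest for t in targets):
            return text, digest
    return None

print(divine("Bridges:\n-build {}\n-burn {}", ["2020", "2021"]))
```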
Then we run several parallel threads in Rust to discover the hash ed7f23f8436df202021e1fce3d2f42627d5a7f1a01a23557c8c05ff6d1063e16, which corresponds to
Bridges:
-build 💕
-burn 📱
Thus, the theme for 2020 and 2021 is build bridges of 💕, and burn bridges with one’s 📱. Fair enough.
Next, we choose another Oblique Strategy, for areas of inquiry.
Consult other sources: promising, unpromising
Then, we ask for near-term suggestions: search for 202106.
With our resulting hash e68419dd6ba7e410a9c21509b6522f747b202106b41ace99623a800d9c87a5fa we find
Consult other sources: 🔭 promising, 🦖 unpromising
advising astronomy/astrology and ornithology/palaeontology.
Perhaps we require more timeless advice about defense against harmful desires. We can interpolate into Jenny Holzer’s famous
💆 PROTECT 🌄 ME FROM WHAT I 🏍 WANT 🧞
and perceive massages and mountain sunrises as a guard against motorcycles and bottled spirits, in an amulet that is legendary (2b40eb46e603c68049485e8a3df8888888862eebfab56ea00553305f506ec819).
We can use older poetry as well. Sappho’s “Some men say an army on horseback” is beautiful, and Anne Carson’s translation begins:
Some men say an army of horse and some men say an army on foot
and some men say an army of ships is the most beautiful thing
on the black earth. But I say it is
what you love.
The original Greek for the last “it is what you love” is κῆν’ ὄττω τις ἔραται which we can interpolate into
κῆν’ ὄττω ⛺ τις 🌘 ἔραται 🌄
for a timely amulet celebrating camping and mountain sunrises and night skies (bed5d962d2d2db72c4acf46799620210601e582e3bb65ff84698d5e5ee019915).
Try out https://github.com/lsb/dactyloglyphomancy yourself and divine new insights about your future!
In 407 BCE, about 300 years after writing came back to Ancient Greece, two sheets of papyrus to write an expense ledger cost 1⅓ drakhmai, roughly six grams of silver. We currently have “ledger”-sized paper (~A3 paper), and two sheets of normal-weight (80 gsm) ledger paper weigh ~20g. Papyrus cost a third of its weight in silver!
But! As for the ancient basket of goods, the going salary of an architect was a drakhma a day: a day's pay for your writing material! Architects nowadays can work at $50/hr, for (say) a $500/day salary, and gold is ~$50/g nowadays: at $500 for 20g of paper, that is comparable to modern copy paper being worth half its weight in gold.
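The arithmetic, spelled out, using only the figures above:

```python
papyrus_silver_g = 6          # 1⅓ drakhmai of silver for two sheets
sheets_g = 20                 # two sheets of 80 gsm ledger paper
print(papyrus_silver_g / sheets_g)       # 0.3: a third of its weight in silver

day_pay = 500                 # modern architect, $/day, one day's pay per two sheets
gold_per_g = 50               # $/g
print(day_pay / sheets_g / gold_per_g)   # 0.5: half its weight in gold
```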
(Sources say that this high cost came from Egyptian monopoly pricing, with exorbitant taxes at every step, and it’s understandable why: it’s hard to administer an empire keeping debt records only in oral-formulaic poetry, or on wax tablets, or clay, or on the beach like Archimedes, and Egypt, on the decline after fighting off the Sea Peoples, probably didn’t want the competition.)
In late medieval England, in 1379, 126 books cost £113, and each £ was roughly 250g of silver; hardcovers these days weigh around half a kilo, so the books were roughly worth half their weight in silver. Nowadays, you can upload a 500-page pdf and get a ream of A4 paper printed as an individually-produced bound hardcover for $24, which is 2.5kg; silver is ~$1/g, so roughly $25 instead of $2500, only 100x cheaper now compared to several decades before movable type.
The copy-paper-worth-half-its-weight-in-gold metaphor got me thinking about other examples of 25000x price improvement. The hard drive space to store one gigabyte of information cost $100k in 1985, $1k in 1995, and $1 in 2005. Tech twenty years in the future would literally have been worth its weight in gold, had it been available. (Also, it took a year or two to hand-copy a bible; a $500 printer these days can print a page a second, and finish a thousand pages of a bible in twenty minutes, a 25000x time reduction.)
(a quick pechakucha talk I gave regarding ancient Greece losing writing was well received)
UNIX, since the 1970s, has had an internal notion of time that is the number of seconds after 1 Jan 1970 UTC.
This is often expressed as an integer, a signed integer. Many other APIs specify fractional time, also as integers: clock_getres expresses seconds and nanoseconds as 32-bit integers, Java expresses time in milliseconds as a 64-bit integer, a Date in JavaScript internally keeps track of milliseconds since 1970, and PHP returns time in microseconds. Ruby keeps Time as nanoseconds and uses arbitrary-precision integers.
Instead of inventing a complex data structure yourself, use one implemented in hardware: the 64-bit float!
The float64 format has a sign bit, 11 exponent bits (representing exponents from ≈-1000 to ≈1000), and 52 explicit mantissa bits (representing a mantissa with precision of ≈ a quintillionth), as visualized by User:Codekaizen:
such that 1620620620 (in May 2021) is represented as 0b0100000111011000001001100010110101010011000000000000000000000000, or 0x41D8262D53000000.
The next largest floating point number is 0x41D8262D53000001, or 1620620620 + 2⁻²². This is a granularity of a quarter of a microsecond. Instead of many different APIs to try to represent fractional time, keep time as a float64, to adequately represent time with granularity of well under a microsecond for the next several decades, and only compute on this representation of epoch time.
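You can check both claims in a couple of lines of Python:

```python
import math
import struct

# The big-endian bit pattern of 1620620620.0 as a float64.
bits = struct.unpack(">Q", struct.pack(">d", 1620620620.0))[0]
assert hex(bits) == "0x41d8262d53000000"

# The unit in the last place at this magnitude: 2**-22 ≈ 0.24 microseconds.
assert math.ulp(1620620620.0) == 2**-22
```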
Part of the problem of storing time as a 32-bit signed integer number of seconds after 1 Jan 1970: we have no more integers after 19 Jan 2038 that fit in 32 bits!
Signed integers roll over and turn negative when they overflow their current precision. Floats merely get half as precise when they overflow their current precision.
In 2038, float64s that represent time will degrade to a granularity of half a microsecond.
On 7 February 2106, when seconds after 1970 will exceed 2³², the floating point representation will have the precision of one microsecond, and maintain exactly the same bit structure.
At the extinction of the dinosaurs, 65 million years ago, when the epoch time was negative 2 quadrillion (-2051244000000000 for 65Mya), the precision is a quarter of a second.
Even through the 90s, long after many system calls became formalized, floating point math was much more expensive than integer math. Also, while some of the earliest computers had floating-point support (C has a float and a double, because it initially ran on a computer that did!), there was no standard for what you could expect from a “float” or a “double”: K&R C explicitly warns you that a “double” could be 72 bits, and only in 1984 was there a floating point standard that people could ask for by name (IEEE-754), at which point many system APIs had settled.
Floating point, especially when you least expect it, can be surprising: 0.1 (as expressed in the base 2 of a float64) + 0.2 (as expressed in the base 2 of a float64) generally equals 0.30000000000000004 (both 0.1 and 0.2, in float64 representations, are almost 2⁻⁵⁷ greater than their exact base 10 representations).
For this reason, financial computations in floating point are strongly discouraged.
Time is not money!
Whereas money can be contractually expressed as hundredths or millionths of a base currency ($, €, et cetera), time is not exact! Facebook increased the accuracy of their computers’ time from milliseconds to within hundreds of microseconds and it was a big deal.
Whereas you can reasonably divide a financial sum 3 ways, and you want to ensure that the parts sum to the whole, you will generally not be multiplying the time after 1970 by a number and making sense out of it, because 1970 is just an arbitrary zero-point.
Generally, to compute durations, you will be performing arithmetic on times. On computers that can adjust the system clock multiple microseconds at a time, sub-microsecond precision is entirely sufficient.
Furthermore, float64s are entirely adequate for storing both the number of seconds after 1970 and the number of seconds of a particular duration, and smaller numbers get finer granularity: the step size at one is a billionth of the step size at a billion, so continuing to compute in float64 is a great idea, no type conversions required.
Time stored as a float64 makes a lot of sense, especially when used in a fixed-length id!
Let us say that you want (probably) unique ids, which you can sort lexicographically (run through sort) and get a rough ordering in time.
The big-endian representation of float64 supports this sort order: recall that 1620620620 (May 2021) in a float64 is 0x41D8262D53000000, and 0x41D8262D53000001 is 1620620620 + 2⁻²². All positive numbers sort in ascending order, as do all negative numbers.
When time is accurate to hundreds of microseconds, time storage at sub-microsecond precision is entirely adequate.
If you use all 128 bits of the UUID, disregarding UUID’s backwards compatibility built in for 1980s computers, you have 4M different float64s per second, and you have 64 full bits of randomness.
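A sketch of such an id, using the big-endian float64 bytes followed by 8 random bytes, hex-encoded for sortability:

```python
import os
import struct
import time

def float64_random64_id() -> str:
    # Big-endian float64 sorts lexicographically for positive times,
    # so these hex ids sort roughly by creation time.
    return (struct.pack(">d", time.time()) + os.urandom(8)).hex()

print(float64_random64_id())
```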
Based on the math powering the Birthday Problem, for a 50% chance that two 64-bit random strings are equal, you would need roughly 5 billion 64-bit random strings, every quarter of a microsecond.
If you are okay with a quarter of a percent chance of any of these float64+random64 UUIDs colliding in twenty years, then the probability of collision per timeslice needs to be one in a quintillion, 10⁻¹⁸: (1-10⁻¹⁸)^(4000000 * 86400 * 365 * 20) ≈ 99.75% , which is to say, the odds of not colliding per timeslice, 1-10⁻¹⁸, multiplied together for the timeslices in a second for the seconds in a day for the days in a year for twenty years.
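Checking that arithmetic (using exp and log1p, since 1 - 10⁻¹⁸ rounds to exactly 1.0 in float64, which is itself a nice illustration of the precision discussion above):

```python
import math

ticks = 4_000_000 * 86400 * 365 * 20   # quarter-microsecond slices in twenty years
p_per_tick = 1e-18                     # per-timeslice collision probability
print(math.exp(ticks * math.log1p(-p_per_tick)))  # ≈ 0.99748
```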
If you are making 6 of these UUIDs every quarter-microsecond, the space to store only the ids is 16 bytes/id * 6 ids/tick * 4M ticks/s * 86400 s/day * 30 day/month ≈ one petabyte per month, only for UUIDs.
If these UUIDs are connected to event data, and your event data is at least 10x the size of the id of the event, that is over 2PB/week.
Most use cases do not have 2PB/week of new data! Using this float64+random64 scheme is entirely enough to identify most types of events as they happen, with a very low chance of collision.
The float64 corresponding to the current epoch time will have its highest-order byte equal to 0x41, from 2 Jan 1970 until 16 Mar 2242. If we only store the lower 56 bits, we can have 8 more bits of randomness per timeslice.
The number of random72s that we can make every quarter-microsecond tick to retain the odds of collision at 10⁻¹⁸ is 97: √(2 × 2⁷² × -ln(1-10⁻¹⁸)) ≈ 97.
This is sixteen times as many as the float64+random64 scheme, so this corresponds to at least 30PB/week of event data. This is over an exabyte a year, well over $20M in storage costs alone.
Kudos to Evan Wallace’s Float Toy for visualizations of the binary float16/float32/float64 formats! Kudos to Bartek Szopka’s ieee-754-visualization for a slightly more math-oriented approach!
Computing with a float64 is cheap, you get sub-microsecond precision nowadays, and you don’t need to pre-coordinate about milliseconds versus microseconds versus (second, nanosecond) pairs et cetera: as long as you’re not counting individual nanoseconds you should be great.
Also obviously store your human times as ISO 8601 strings (among many other reasons: the list of time zones is unbounded).