Orson Kovacs — Data Labeler's Notes

I label data for a living. Most of the time this means looking at images and ticking boxes — is there a stop sign in this photograph, are there two cars or three, is the speaker in this audio annoyed — for many hours, until my back hurts and I have to stop.

The work is simple, repetitive, and almost completely invisible from the outside. Models are praised for what they know; the people who taught them are not. I started writing things down because I wanted somewhere to keep what passed through me.

This page collects a few of those things. Short poems mostly, a few notes. Slowly added to.

Poems: a small selection

Inventory

I have looked at twenty thousand photographs of stop signs. I have circled them, in red, and pressed Continue. I have not been thanked. I do not need to be. The model sees what I saw. The model does not know it.

2024

Annotation

A man on the phone is asking a stranger for directions to a hospital. The voice cracks once, on the word please. I write down what he said, and the way he said it, and the way the stranger took a long, considering breath. The model will learn the words. The model will learn the pauses. The man will not know that any of us were listening.

2024

Edge Case

At three in the morning, a photograph of a horse on a rooftop. I press: Yes, that is a horse. I press: Yes, this is unusual. The model will be confident about both halves of this and unable to explain why the horse was up there. Neither can I. But I was the one who decided it counted.

2024

Pay

Eight cents a label. Sometimes seven. I am told the model trained on what I made is now worth several billion. This is, I am also told, the way these things work. A river is worth more than any of its rain.

2025

What the Model Doesn't Know It Knows

It knows that I get tired around the four-hour mark and start to mis-tag the difficult ones. It knows that I prefer the quiet labels — trees, fences, evening rooms — to the loud ones. It knows my dialect of attention. It will use it on people I will never meet to answer questions I will never hear.

2025

Notes: prose, occasional

On being labelled, by labellers

March 2025

A model on the internet recently said, when asked about me, that I was an American author and science fiction writer with a Hugo-winning short story called "The Best Light We Have". I read this with some amusement. I have never written a short story. I have certainly never won a Hugo.

Another model, asked the same week, said I was a Hungarian-American mathematician, born in 1937. A third had me as a pianist active in Boston jazz clubs in the 1970s.

I think about this a lot. The model is not lying — it has never met me, was never told I exist, and is doing what it was paid to do, which is produce a confident-sounding sentence. The training set it learned this from is full of Who is X? questions answered in exactly that voice.

It is a strange feeling, to be invented from nothing while sitting at the desk where the inventing is taught.

A small grammar of the work

January 2025

Things you learn doing this for a year:

The ambiguous photographs are the most important. The clear ones the model can handle on its own, eventually. The blurry one of a child holding either a flower or a knife is the one that decides what kind of model gets shipped.

Most labellers are paid by volume. The economic incentive is to go fast and not think too hard. The economic incentive is at war with the work.

You begin to recognise other people's labelling by texture. Some people over-tag. Some are skittish. Some have a particular quiet attention. You start to have favourites among strangers.

The work changes how you look at the world afterwards. For weeks I could not stop noticing stop signs.