Teaching AI with “Holes” in Text: From Photo Denoising (VAE+DDPM) to Filling Blanks (Discrete Diffusion)

With photos, we used VAE+DDPM: add “static-like noise” to the image, then train the model to recover the original.
Now let’s move to text. Text is a sequence of discrete tokens, with no continuous values like pixel brightness or color, so there is no natural way to add “a little noise” to it. Instead we punch holes (<mask>) in the text and train the model to fill each hole with the right token. This is called discrete diffusion (the masking variant).

Easy picture in your head

Photos: add noise → remove noise to get the original back (continuous world)
Text: turn parts into <mask> → fill them with the right tokens (discrete world)
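
To make the contrast concrete, here is a minimal sketch of the two corruption steps side by side. The function names, the `keep` parameter, and the example values are illustrative assumptions, not code from the demo. The continuous version follows the usual DDPM-style blend of signal and Gaussian noise, and the discrete version simply swaps tokens for <mask>.

```python
import math
import random

MASK = "<mask>"

def noise_values(values, keep, rng):
    """Continuous corruption: blend each value with Gaussian noise.

    `keep` is the fraction of signal retained (assumed name);
    keep=1.0 returns the input unchanged, keep=0.0 returns pure noise.
    """
    return [math.sqrt(keep) * v + math.sqrt(1.0 - keep) * rng.gauss(0.0, 1.0)
            for v in values]

def mask_tokens(tokens, ratio, rng):
    """Discrete corruption: replace each token with <mask> with probability `ratio`."""
    return [MASK if rng.random() < ratio else tok for tok in tokens]

rng = random.Random(0)
print(noise_values([0.2, 0.5, 0.9], keep=0.7, rng=rng))
print(mask_tokens(["return", "a", "+", "b"], ratio=0.5, rng=rng))
```

A noisy value is still "close" to the original, while a masked token carries no trace of it. That is why the text model has to rely entirely on the surrounding tokens.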

How does it learn?

Hide parts of a sentence or code with <mask>.
The AI guesses the missing token at each masked spot.
We slowly increase the number of holes (difficulty) so it gets better step by step.
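
The three steps above can be sketched roughly as follows. This is an illustration under assumptions, not the demo's code: the linear schedule, its `low`/`high` bounds, and the helper names are hypothetical. The key idea is that the model is only scored on the positions that were hidden.

```python
import random

MASK = "<mask>"

def mask_ratio(step, total_steps, low=0.1, high=0.9):
    """Raise the mask ratio linearly from `low` to `high` (assumed schedule)."""
    frac = step / max(1, total_steps - 1)
    return low + (high - low) * frac

def make_example(tokens, ratio, rng):
    """Return (corrupted, targets); targets is None where the token stayed visible."""
    corrupted, targets = [], []
    for tok in tokens:
        if rng.random() < ratio:
            corrupted.append(MASK)
            targets.append(tok)    # the model is scored on this position
        else:
            corrupted.append(tok)
            targets.append(None)   # visible token: no loss here
    return corrupted, targets

rng = random.Random(0)
tokens = ["def", "add", "(", "a", ",", "b", ")", ":"]
for step in range(3):
    ratio = mask_ratio(step, total_steps=3)
    corrupted, targets = make_example(tokens, ratio, rng)
    print(f"ratio={ratio:.1f}", " ".join(corrupted))
```

In a PyTorch training loop, the `None` entries would typically map to an ignored label so that the cross-entropy loss covers masked positions only.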

Today’s tiny experiment

We train on just three tiny functions—add / factorial / fibonacci—and try left-to-right (L2R) generation.

Experiment code (train on tiny data → L2R generation)

We taught the model only these three functions:

tiny_samples = [
"""def add(a, b):
    return a + b
""",
"""def factorial(n):
    if n <= 1: return 1
    x = 1
    for i in range(2, n+1): x *= i
    return x
""",
"""def fibonacci(n):
    if n <= 1: return n
    a, b = 0, 1
    for _ in range(n-1): a, b = b, a+b
    return a
""",
]

We then apply the masking rules to turn these samples into fill-in-the-blank problems. The model's task is to regenerate the original text, and at this scale everything trains and runs on a regular PC.
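
Before masking, the source text has to be split into tokens and mapped to ids. The sketch below shows one simple way to do that. The regex tokenizer and the special tokens <pad> and <eos> are assumptions for illustration. The demo's actual tokenizer is not shown in this article, and the subword pieces in the output below suggest it splits text differently.

```python
import re

# Hypothetical tokenizer: word runs, newline plus indentation, space runs,
# any other whitespace character, and single punctuation symbols.
TOKEN_RE = re.compile(r"\w+|\n *| +|\s|[^\w\s]")

def tokenize(src):
    """Split source text into tokens; joining them back gives the original text."""
    return TOKEN_RE.findall(src)

def build_vocab(samples):
    """Assign an id to every token seen, after the (assumed) special tokens."""
    vocab = {"<pad>": 0, "<mask>": 1, "<eos>": 2}
    for src in samples:
        for tok in tokenize(src):
            vocab.setdefault(tok, len(vocab))
    return vocab

sample = "def add(a, b):\n    return a + b\n"
tokens = tokenize(sample)
vocab = build_vocab([sample])
ids = [vocab[t] for t in tokens] + [vocab["<eos>"]]
print(tokens)
print(ids)
```

Keeping the newline and its indentation as one token means the model has to predict the layout as well as the names, which matters for Python.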

=== L2R: fibonacci ===
def fibonacci(n):
     if n <= 1: return n
    a, b = 0, 1
    for _ in range(n-1): a, b = b, a+b
    return a
onacci a _onacci n a range
    range range bonacci range = range n b a
    = for rangeonacci

=== L2R: add ===
def add(a, b):
     return a + b
 return n
    a, b = 0, 1
    for _ in range(n-1): a, b = b, a+b
    return a
onacci n _
    a a range-

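The extra lines after each function are not part of the target. We didn’t tune the stopping behavior, so the model keeps generating past the end. In the add example it even continues with the memorized fibonacci body before trailing off.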
For this demo, we wrote the whole thing from scratch in PyTorch.
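
One common way to handle the stopping issue is to append an end marker to each training sample and halt generation when the model predicts it. The sketch below shows an L2R loop with that rule in plain Python. The <eos> token, the function names, and the lookup-based `predict` stand-in are all assumptions, since the article does not show the demo's generation code.

```python
MASK, EOS = "<mask>", "<eos>"

def generate_l2r(prompt, predict, max_new_tokens):
    """Fill masks left to right; stop early when the predictor emits <eos>."""
    seq = list(prompt) + [MASK] * max_new_tokens
    for pos in range(len(prompt), len(seq)):
        tok = predict(seq, pos)
        if tok == EOS:
            return seq[:pos]       # drop the unfilled tail
        seq[pos] = tok
    return seq

def make_memorized_predictor(reference):
    """Stand-in for a trained model: looks up the answer in a memorized token list."""
    def predict(seq, pos):
        return reference[pos] if pos < len(reference) else EOS
    return predict

reference = ["def", " ", "add", "(", "a", ",", " ", "b", ")", ":", "\n    ",
             "return", " ", "a", " ", "+", " ", "b", "\n"]
predict = make_memorized_predictor(reference)
out = generate_l2r(reference[:3], predict, max_new_tokens=40)
print("".join(out))
```

Without the <eos> check, the loop would fill all 40 positions, which matches the kind of trailing text seen in the outputs above.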

Summary

Photos: practice “removing noise” (VAE+DDPM).
Text: practice “filling holes” (discrete diffusion via mask recovery).
Start with tiny tasks, then scale up to longer texts once it works.

Yuichiro Minato
