Teaching AI with “Holes” in Text: From Photo Denoising (VAE+DDPM) to Filling Blanks (Discrete Diffusion)
With photos, we’ve used VAE+DDPM to add “static-like noise” and train the model to bring the image back.
Now let’s move to text. Text doesn’t have sound or color, so we make holes (<mask>
) instead of adding noise, and practice filling the holes with the right tokens. This is called discrete diffusion (masking).
Easy picture in your head
- Photos: add noise → remove noise to get the original back (continuous world)
- Text: turn parts into
<mask>
→ fill them with the right tokens (discrete world)
How does it learn?
- Hide parts of a sentence or code with
<mask>
. - The AI guesses the missing token at each masked spot.
- We slowly increase the number of holes (difficulty) so it gets better step by step.
Today’s tiny experiment
We train on just three tiny functions—add
/ factorial
/ fibonacci
—and try left-to-right (L2R) generation.
Experiment code (train on tiny data → L2R generation)
We taught the model only these three functions:
tiny_samples = [
"""def add(a, b):
return a + b
""",
"""def factorial(n):
if n <= 1: return 1
x = 1
for i in range(2, n+1): x *= i
return x
""",
"""def fibonacci(n):
if n <= 1: return n
a, b = 0, 1
for _ in range(n-1): a, b = b, a+b
return a
""",
]
We then apply masking rules to turn them into fill-in-the-blank problems. Generating to restore the original is simple enough to run on a regular PC.
=== L2R: fibonacci ===
def fibonacci(n):
if n <= 1: return n
a, b = 0, 1
for _ in range(n-1): a, b = b, a+b
return a
onacci a _onacci n a range
range range bonacci range = range n b a
= for rangeonacci
=== L2R: add ===
def add(a, b):
return a + b
return n
a, b = 0, 1
for _ in range(n-1): a, b = b, a+b
return a
onacci n _
a a range-
The extra lines at the end aren’t needed—we didn’t fine-tune the stopping behavior, so it keeps generating a bit too much.
For this demo, we wrote the whole thing from scratch in PyTorch.
Summary
- Photos: practice “removing noise” (VAE+DDPM).
- Text: practice “filling holes” (discrete diffusion via mask recovery).
- Start with tiny tasks, then scale up to longer texts once it works.