Teaching AI with “Holes” in Text: From Photo Denoising (VAE+DDPM) to Filling Blanks (Discrete Diffusion)

Yuichiro Minato

2025/08/12 12:44

For photos, we used VAE+DDPM: add “static-like noise” to an image and train the model to remove it and bring the image back.
Now let’s move to text. Text is a sequence of discrete tokens, so there is no natural way to add a “little” noise to a word. Instead, we punch holes (<mask>) into the text and practice filling each hole with the right token. This is called discrete diffusion (here, the masking variant).

A simple picture to keep in mind

  • Photos: add noise → remove noise to get the original back (continuous world)
  • Text: turn parts into <mask> → fill them with the right tokens (discrete world)
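To make the contrast concrete, here is a minimal sketch of the two “corruption” steps side by side. It is not the post’s code: the function names are hypothetical, the image side uses a simplified DDPM-style blend where the noise level t directly sets the mix (real DDPM uses a learned-free noise schedule over many steps), and the text side masks each token independently with probability t.

```python
import random

MASK = "<mask>"

def add_gaussian_noise(pixels, t, rng):
    """Continuous world (simplified DDPM-style blend, an assumption):
    t = 0 keeps the pixels clean, t = 1 replaces them with pure Gaussian noise."""
    return [((1 - t) ** 0.5) * p + (t ** 0.5) * rng.gauss(0.0, 1.0) for p in pixels]

def add_masks(tokens, t, rng):
    """Discrete world: replace each token with <mask> independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]

rng = random.Random(0)
print(add_gaussian_noise([0.2, 0.8, 0.5], t=0.3, rng=rng))
print(add_masks("def add ( a , b ) :".split(), t=0.5, rng=rng))
```

In both cases a larger t means a harder restoration task; the difference is that a pixel can be nudged a little, while a token is either intact or a hole.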

How does it learn?

  1. Hide parts of a sentence or code with <mask>.
  2. The AI guesses the missing token at each masked spot.
  3. We slowly increase the number of holes (difficulty) so it gets better step by step.
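The three steps above can be sketched without any deep-learning library. This is a hypothetical illustration, not the post’s PyTorch code: `mask_rate` assumes a linear curriculum for “slowly increase the number of holes”, `make_example` hides tokens (step 1), and `masked_loss` scores the guesses (step 2) only at the masked positions, using made-up probability dictionaries in place of a real model’s output.

```python
import math
import random

MASK = "<mask>"

def mask_rate(step, total_steps, start=0.1, end=1.0):
    """Hypothetical curriculum: the fraction of holes grows linearly from start to end."""
    frac = min(step / max(total_steps - 1, 1), 1.0)
    return start + (end - start) * frac

def make_example(tokens, rate, rng):
    """Step 1: hide tokens. Returns the corrupted input and {position: original token}."""
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < rate:
            corrupted.append(MASK)
            targets[i] = tok
        else:
            corrupted.append(tok)
    return corrupted, targets

def masked_loss(predicted_probs, targets):
    """Step 2 scoring: mean negative log-likelihood over masked positions only.
    predicted_probs[i] is a dict {token: probability} for position i."""
    if not targets:
        return 0.0
    nll = [-math.log(predicted_probs[i].get(tok, 1e-9)) for i, tok in targets.items()]
    return sum(nll) / len(nll)

rng = random.Random(0)
tokens = "return a + b".split()
for step in (0, 500, 999):
    rate = mask_rate(step, total_steps=1000)
    corrupted, targets = make_example(tokens, rate, rng)
    print(f"step {step}: rate={rate:.2f}", corrupted, targets)
```

Unmasked positions contribute nothing to the loss, so the model is only graded on the holes it actually has to fill.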

Today’s tiny experiment

We train on just three tiny functions (add, factorial, and fibonacci) and then try left-to-right (L2R) generation.
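One way to read “L2R generation” in a masking model: start with a prompt followed by a row of <mask> tokens, then fill one hole per step, always the leftmost one. The sketch below assumes that reading; `predict(seq, i)` is a hypothetical stand-in for the trained model, which sees the whole current sequence (including the remaining holes) and returns its best token for position i.

```python
MASK = "<mask>"

def generate_l2r(prompt_tokens, length, predict):
    """Left-to-right unmasking: append `length` holes after the prompt,
    then fill exactly one hole per step, from left to right."""
    seq = list(prompt_tokens) + [MASK] * length
    for i in range(len(prompt_tokens), len(seq)):
        seq[i] = predict(seq, i)  # the model sees everything, including later holes
    return seq

# Toy "model" (hypothetical): it has memorized one function perfectly.
answer = "def add ( a , b ) :".split()

def toy_predict(seq, i):
    assert seq[i] == MASK  # we only ever ask about a hole
    return answer[i]

print(generate_l2r(answer[:2], len(answer) - 2, toy_predict))
```

Other unmasking orders (for example, filling the most confident hole first) fit the same loop by changing which index is chosen at each step.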

Experiment code (train on tiny data → L2R generation)

We taught the model only these three functions:

tiny_samples = [
"""def add(a, b):
    return a + b
""",
"""def factorial(n):
    if n <= 1: return 1
    x = 1
    for i in range(2, n+1): x *= i
    return x
""",
"""def fibonacci(n):
    if n <= 1: return n
    a, b = 0, 1
    for _ in range(n-1): a, b = b, a+b
    return a
""",
]

We then apply masking rules to turn these into fill-in-the-blank problems. Training the model to restore the originals, and then generating with it, is light enough to run on an ordinary PC.
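The post does not show its tokenizer or masking rules, so the following is only a minimal sketch under stated assumptions. `tokenize` is a hypothetical lossless tokenizer (words, whitespace runs, single punctuation marks), and `make_problems` applies a hypothetical rule: never hide whitespace, so the code layout survives, and hide other tokens with a few different probabilities to get easier and harder versions of the same function.

```python
import random
import re

MASK = "<mask>"

def tokenize(text):
    """Hypothetical lossless tokenizer: identifiers/numbers, whitespace runs,
    and single punctuation marks. Joining the tokens gives back the text."""
    return re.findall(r"\w+|\s+|[^\w\s]", text)

def make_problems(text, rates=(0.25, 0.5, 0.75), seed=0):
    """Hypothetical masking rule: hide each non-whitespace token with probability
    `rate`. Returns a list of (question, answer) token lists."""
    rng = random.Random(seed)
    answer = tokenize(text)
    problems = []
    for rate in rates:
        question = [MASK if (not tok.isspace() and rng.random() < rate) else tok
                    for tok in answer]
        problems.append((question, answer))
    return problems

sample = "def add(a, b):\n    return a + b\n"
for question, answer in make_problems(sample):
    print("".join(question))
```

Keeping whitespace visible is a design choice for this sketch: it makes indentation a hint rather than something the model must also guess.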

=== L2R: fibonacci ===
def fibonacci(n):
     if n <= 1: return n
    a, b = 0, 1
    for _ in range(n-1): a, b = b, a+b
    return a
onacci a _onacci n a range
    range range bonacci range = range n b a
    = for rangeonacci

=== L2R: add ===
def add(a, b):
     return a + b
 return n
    a, b = 0, 1
    for _ in range(n-1): a, b = b, a+b
    return a
onacci n _
    a a range-

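The extra lines at the end are not part of the functions. We didn’t tune the stopping behavior, so the model keeps generating past the end of each function; in the add output, it even continues with lines from the fibonacci body.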
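One common way to teach stopping, sketched here as an assumption rather than what the post did, is to add an explicit end-of-sequence token. During training, each sample is padded with `<eos>` up to the canvas length, so “stop here” becomes one more blank to fill; during generation, everything after the first `<eos>` is dropped. `EOS`, `pad_with_eos`, and `truncate_at_eos` are hypothetical names.

```python
EOS = "<eos>"

def pad_with_eos(tokens, canvas_len):
    """Training side: pad with <eos> up to the canvas length (always at least one),
    so the model learns that the positions after the function are <eos>."""
    pad = max(canvas_len - len(tokens), 1)
    return list(tokens) + [EOS] * pad

def truncate_at_eos(generated):
    """Generation side: keep only the tokens before the first <eos>."""
    return list(generated[: generated.index(EOS)]) if EOS in generated else list(generated)

print(pad_with_eos(["return", "a", "+", "b"], 6))
print(truncate_at_eos(["return", "a", "+", "b", EOS, "return", "n"]))
```

The truncation alone would also hide the extra lines above, but only if the model actually learned to emit `<eos>` where the function ends.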
For this demo, we wrote the whole thing from scratch in PyTorch.

Summary

  • Photos: practice “removing noise” (VAE+DDPM).
  • Text: practice “filling holes” (discrete diffusion via mask recovery).
  • Start with tiny tasks, then scale up to longer texts once it works.

© 2025, blueqat Inc. All rights reserved