The teacher and the student
Our approach centers on a technique called knowledge distillation, which uses a “teacher–student” model training strategy. We start with a “teacher” — a large, powerful, pre-trained generative model that is an expert at producing the desired visual effect but is far too slow for real-time use. The type of teacher model varies depending on the goal. Initially, we used a custom-trained StyleGAN2 model, trained on our curated dataset for real-time facial effects. This model could be paired with tools like StyleCLIP, which allowed it to manipulate facial features based on text descriptions. This provided a strong foundation. As our project advanced, we transitioned to more sophisticated generative models such as Google DeepMind’s Imagen. This strategic shift significantly enhanced our capabilities, enabling higher-fidelity and more diverse imagery, greater artistic control, and a broader range of styles for our on-device generative AI effects.
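The core of knowledge distillation is simple: freeze the teacher, then train the student to reproduce the teacher's outputs. The sketch below shows a minimal PyTorch training loop under stated assumptions — both networks here are tiny stand-in convnets (the real teacher would be a StyleGAN2- or Imagen-class model), and the pixel-wise L1 loss is just one plausible choice of distillation objective.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: the real teacher and student are far larger;
# tiny conv nets keep this sketch runnable.
teacher = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(8, 3, 3, padding=1))
student = nn.Sequential(nn.Conv2d(3, 4, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(4, 3, 3, padding=1))

# The teacher is frozen: it only provides training targets.
teacher.eval()
for p in teacher.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()  # pixel-wise loss; perceptual terms could be added

for step in range(5):  # a few steps on random "images" for illustration
    x = torch.rand(2, 3, 32, 32)
    with torch.no_grad():
        target = teacher(x)          # teacher output is the label
    pred = student(x)                # student tries to match it
    loss = loss_fn(pred, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the teacher generates the targets, no hand-labeled data is needed beyond the input images themselves; the student learns to approximate the teacher's image-to-image mapping directly.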
The “student” is the model that ultimately runs on the user’s device. It needs to be small, fast, and efficient. We designed a student model with a UNet-based architecture, which is well suited to image-to-image tasks. It uses a MobileNet backbone as its encoder, a design known for its performance on mobile devices, paired with a decoder built from MobileNet blocks.
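To make the shape of that architecture concrete, here is a hedged miniature in PyTorch: a UNet whose encoder and decoder are built from MobileNet-style depthwise-separable convolution blocks, with skip connections joining matching resolutions. The layer widths, depth, and `TinyUNet`/`mobile_block` names are illustrative assumptions, not the production model.

```python
import torch
import torch.nn as nn

def mobile_block(cin, cout, stride=1):
    # Depthwise-separable convolution, the building block MobileNet is known
    # for: a per-channel 3x3 depthwise conv, then a 1x1 pointwise conv.
    return nn.Sequential(
        nn.Conv2d(cin, cin, 3, stride=stride, padding=1, groups=cin, bias=False),
        nn.BatchNorm2d(cin), nn.ReLU6(inplace=True),
        nn.Conv2d(cin, cout, 1, bias=False),
        nn.BatchNorm2d(cout), nn.ReLU6(inplace=True),
    )

class TinyUNet(nn.Module):
    # Hypothetical miniature of the student: a MobileNet-style encoder,
    # a decoder from the same blocks, and UNet skip connections.
    def __init__(self):
        super().__init__()
        self.enc1 = mobile_block(3, 16)
        self.enc2 = mobile_block(16, 32, stride=2)   # downsample 2x
        self.enc3 = mobile_block(32, 64, stride=2)   # downsample 2x
        self.up = nn.Upsample(scale_factor=2, mode="bilinear",
                              align_corners=False)
        self.dec2 = mobile_block(64 + 32, 32)        # + skip from enc2
        self.dec1 = mobile_block(32 + 16, 16)        # + skip from enc1
        self.out = nn.Conv2d(16, 3, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        e3 = self.enc3(e2)
        d2 = self.dec2(torch.cat([self.up(e3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up(d2), e1], dim=1))
        return self.out(d1)

net = TinyUNet()
y = net(torch.rand(1, 3, 64, 64))  # image in, image out, same spatial size
```

The depthwise-separable blocks are the key efficiency choice: they use far fewer multiply-adds and parameters than standard convolutions of the same width, which is what makes this kind of encoder practical on mobile hardware.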