Friday, August 29, 2025

Unlocking data synthesis with a conditional generator

Experiments

We conducted experiments on four datasets: three correspond to downstream generative tasks and one to a classification task. Generative tasks are typically more challenging than classification tasks. This is because generative tasks are evaluated by next-token prediction accuracy, which requires the synthetic data to preserve fine-grained textual information from the private data. In contrast, classification tasks only require maintaining the co-occurrence patterns between labels and words in the private data.

The three generative tasks are chosen to cover a diverse set of practical scenarios: PubMed (medical paper abstracts), Chatbot Arena (human-to-machine interactions), and Multi-Session Chat (human-to-human daily dialogues). To evaluate the quality of the generated synthetic data, we followed the setup of Aug-PE: we train a small downstream language model on the synthetic data and then compute the next-token prediction accuracy on the real test data.
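The evaluation loop for the generative tasks can be illustrated with a toy sketch. It is not the Aug-PE setup itself: a bigram count model stands in for the small downstream language model, whitespace splitting stands in for a real tokenizer, and the example sentences and function names (`train_bigram`, `next_token_accuracy`) are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(texts):
    """Count next-token frequencies per token (toy stand-in for a small LM)."""
    counts = defaultdict(Counter)
    for text in texts:
        tokens = text.lower().split()  # assumption: whitespace tokenization
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_token_accuracy(model, texts):
    """Fraction of positions where the top prediction equals the true next token."""
    correct = 0
    total = 0
    for text in texts:
        tokens = text.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            total += 1
            if prev in model:  # unseen context counts as a miss
                predicted = model[prev].most_common(1)[0][0]
                correct += int(predicted == nxt)
    return correct / total if total else 0.0


# Hypothetical data: train on synthetic text, evaluate on held-out real text.
synthetic_train = [
    "the patient was treated with aspirin",
    "the patient was discharged",
]
real_test = ["the patient was admitted"]

model = train_bigram(synthetic_train)
accuracy = next_token_accuracy(model, real_test)
print(f"next-token accuracy: {accuracy:.3f}")
```

The point of the sketch is the split: the model only ever sees synthetic text during training, so its accuracy on real text reflects how much fine-grained textual information the synthetic data preserved.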

The classification task is performed on the OpenReview (academic paper reviews) dataset. To evaluate the quality of the generated synthetic data, we train a downstream classifier on the synthetic data and compute the classification accuracy on the real test data.
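The same train-on-synthetic, test-on-real pattern applies to the classification task. The sketch below uses a small naive Bayes classifier, which relies only on label-word co-occurrence counts. The classifier choice, the `accept`/`reject` labels, and the example reviews are assumptions for illustration, not the classifier or label set used in the experiments.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Collect label counts and per-label word counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    n_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(count / n_examples)
        for word in text.lower().split():
            smoothed = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def classification_accuracy(model, examples):
    """Share of examples whose predicted label matches the true label."""
    hits = sum(predict(model, text) == label for text, label in examples)
    return hits / len(examples)


# Hypothetical data: synthetic reviews for training, real reviews for testing.
synthetic_train = [
    ("strong results and clear writing", "accept"),
    ("novel idea with solid experiments", "accept"),
    ("weak baselines and unclear writing", "reject"),
    ("limited novelty and missing experiments", "reject"),
]
real_test = [
    ("clear and solid results", "accept"),
    ("unclear and weak baselines", "reject"),
]

model = train_naive_bayes(synthetic_train)
test_accuracy = classification_accuracy(model, real_test)
print(f"classification accuracy: {test_accuracy:.2f}")
```

Because naive Bayes depends only on how often each word appears with each label, it does well whenever the synthetic data keeps those co-occurrence patterns, even if the synthetic sentences differ in wording from the real ones.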

To mitigate concerns regarding data contamination, we carefully analyzed our selected datasets. Our analysis showed no overlap between our pre-training data and the downstream datasets.
