VITS2: IMPROVING QUALITY AND EFFICIENCY OF SINGLE-STAGE TEXT-TO-SPEECH WITH ADVERSARIAL LEARNING AND ARCHITECTURE DESIGN
Single-stage text-to-speech models have been actively studied recently, and their results have outperformed two-stage pipeline systems.