Retentive Network: A Successor to Transformer for Large Language Models
In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance.