SILO LANGUAGE MODELS: ISOLATING LEGAL RISK IN A NONPARAMETRIC DATASTORE

SILO is built by (1) training a parametric LM on the Open License Corpus (OLC), a new corpus we curate with 228B tokens of public domain and permissively licensed text, and (2) augmenting it with a more general and easily modifiable nonparametric datastore (e.g., containing copyrighted books or news) that is only queried during inference.
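
To make the two-part construction concrete, the following is a minimal sketch of one way a parametric next-token distribution could be combined with a datastore that is consulted only at inference time. The text above does not specify the retrieval mechanism, so the nearest-neighbor lookup and linear interpolation used here are assumptions for illustration, as are all names (`Datastore`, `remove_source`, `interpolate`, `lam`) and the toy two-dimensional context vectors.

```python
import math
from collections import defaultdict


class Datastore:
    """Toy nonparametric datastore of (context vector, next token, source) entries.

    Hypothetical structure for illustration; not the authors' implementation.
    """

    def __init__(self):
        self.entries = []

    def add(self, key, next_token, source):
        self.entries.append((tuple(key), next_token, source))

    def remove_source(self, source):
        # Dropping a source only edits the datastore; the LM is never retrained.
        self.entries = [e for e in self.entries if e[2] != source]

    def knn_distribution(self, query, k=2, temperature=1.0):
        """Return a next-token distribution from the k nearest stored contexts."""
        if not self.entries:
            return {}
        scored = sorted(
            ((sum((q - x) ** 2 for q, x in zip(query, key)), tok)
             for key, tok, _ in self.entries),
            key=lambda pair: pair[0],
        )[:k]
        weights = defaultdict(float)
        for dist, tok in scored:
            # Closer neighbors receive exponentially larger weight.
            weights[tok] += math.exp(-dist / temperature)
        total = sum(weights.values())
        return {tok: w / total for tok, w in weights.items()}


def interpolate(p_lm, p_knn, lam=0.5):
    """Mix the parametric and retrieved distributions (assumed linear mixing)."""
    if not p_knn:
        # Empty datastore: fall back to the parametric LM alone.
        return dict(p_lm)
    vocab = set(p_lm) | set(p_knn)
    return {t: (1 - lam) * p_lm.get(t, 0.0) + lam * p_knn.get(t, 0.0)
            for t in vocab}


# Usage: the parametric distribution stands in for an LM trained on OLC only.
store = Datastore()
store.add([0.0, 0.0], "wizard", source="book_a")
store.add([5.0, 5.0], "senator", source="news_b")

p_lm = {"wizard": 0.1, "senator": 0.1, "the": 0.8}
p = interpolate(p_lm, store.knn_distribution([0.0, 0.0], k=1), lam=0.5)

# Removing a source changes predictions immediately, with no retraining.
store.remove_source("book_a")
p_after = interpolate(p_lm, store.knn_distribution([0.0, 0.0], k=1), lam=0.5)
```

The design point this sketch is meant to show is the separation of concerns: the parametric distribution never depends on the datastore contents, so adding or removing a source is a data edit rather than a training run.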