UNIVTG: TOWARDS UNIFIED VIDEO-LANGUAGE TEMPORAL GROUNDING

MOST METHODS IN THIS DIRECTION DEVELOP TASK-SPECIFIC MODELS TRAINED WITH TYPE-SPECIFIC LABELS, SUCH AS MOMENT RETRIEVAL (TIME INTERVAL) AND HIGHLIGHT DETECTION (WORTHINESS CURVE), WHICH LIMITS THEIR ABILITY TO GENERALIZE TO VARIOUS VTG TASKS AND LABELS.