Extract the itut_t35 caption data buffer from the existing caption data and repackage it in an AV1 metadata OBU instead of an AVC/HEVC SEI message.
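
For reference, the byte layout of such a metadata OBU can be sketched as follows. This is an illustration, not the code in this change: the function names are hypothetical, and it assumes the T.35 buffer already starts with the country code byte and that the OBU carries a size field and no extension header. The constants OBU_METADATA = 5 and METADATA_TYPE_ITUT_T35 = 4 come from the AV1 bitstream specification.

```python
OBU_METADATA = 5            # obu_type value for metadata OBUs (AV1 spec)
METADATA_TYPE_ITUT_T35 = 4  # metadata_type value for ITU-T T.35 payloads (AV1 spec)


def leb128(value):
    """Encode a non-negative integer as unsigned LEB128, as used by obu_size."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def build_itut_t35_metadata_obu(t35_payload):
    """Wrap a T.35 buffer (hypothetical helper) in a metadata OBU.

    Assumes t35_payload begins with itu_t_t35_country_code.
    """
    # metadata_type, then the T.35 bytes, then trailing bits (a 1 bit padded
    # with zeros to the byte boundary, i.e. 0x80 when already byte aligned).
    body = leb128(METADATA_TYPE_ITUT_T35) + bytes(t35_payload) + b"\x80"
    # obu_header: forbidden bit 0, obu_type in bits 6..3, no extension,
    # obu_has_size_field set (bit 1).
    header = bytes([(OBU_METADATA << 3) | 0x02])
    return header + leb128(len(body)) + body
```

For example, passing the eight bytes `B5 00 31 'G' 'A' '9' '4' 03` (a made-up caption header with no cc data) yields a 12-byte OBU starting with `2A 0A 04`.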
One notable difference from the AVC/HEVC code is that this code also inserts the METADATA and SEQUENCE_HEADER OBUs into new_packet; without them, the resulting video file would not play.
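
One way to check which OBUs a packet carries is to walk it and collect the OBU types. The sketch below is a hypothetical debugging helper, not part of this change; it assumes every OBU in the packet has its size field set, and the sample packet in the usage note is made up for illustration (including the order of its OBUs).

```python
OBU_SEQUENCE_HEADER = 1  # obu_type values from the AV1 spec
OBU_METADATA = 5


def read_leb128(data, pos):
    """Decode an unsigned LEB128 value at pos; return (value, next position)."""
    value = 0
    for i in range(8):  # the AV1 spec limits leb128 values to 8 bytes
        byte = data[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, pos + i + 1
    raise ValueError("leb128 value longer than 8 bytes")


def list_obu_types(packet):
    """Return the obu_type of each OBU in packet, in order (hypothetical helper)."""
    types = []
    pos = 0
    while pos < len(packet):
        header = packet[pos]
        obu_type = (header >> 3) & 0x0F
        has_extension = bool(header & 0x04)
        has_size = bool(header & 0x02)
        if not has_size:
            raise ValueError("OBU without obu_size is not handled in this sketch")
        pos += 2 if has_extension else 1  # skip header (and extension byte)
        size, pos = read_leb128(packet, pos)
        types.append(obu_type)
        pos += size  # skip the OBU payload
    return types
```

With a made-up packet such as `bytes([0x0A, 0x01, 0x00, 0x2A, 0x02, 0x04, 0x80, 0x32, 0x00])`, the helper reports a sequence header, a metadata OBU, and a frame OBU, so `OBU_SEQUENCE_HEADER in list_obu_types(packet)` is a quick presence check.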