
Time cost of feature preprocessing #7

@xyy12345423


Hello, dear author. In the video feature extraction part of the paper, you wrote: "This involves scene detection, object detection (Ren et al., 2015), face detection (Zhang et al., 2017), face tracking, and audio-visual active speaker detection (Tao et al., 2021), as described in (Zhang et al., 2022a). This process can generate more than 1,000K high-quality keyframes with speaker bounding boxes in approximately 5 days. Next, we use these annotated RoIs and employ the instance segmentation method, Mask R-CNN (He et al., 2017), pre-trained on the COCO (Lin et al., 2014) dataset to extract visual features." Could you tell me roughly how many days the subsequent speaker feature extraction on those 1,000K keyframes took? Also, what GPU model, and how many GPUs, did you use for the whole pipeline (scene detection, object detection, face tracking, feature extraction, and so on)?
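For reference, the figures in that passage imply an average throughput for the keyframe-annotation stage. This is only a rough sketch, assuming "1,000K" means one million keyframes and that "approximately 5 days" is continuous wall-clock time. The paper does not confirm either assumption.

```python
# Back-of-envelope throughput implied by the figures quoted from the paper.
# Assumptions (not confirmed by the paper): "1,000K" = 1,000,000 keyframes,
# and "approximately 5 days" = continuous wall-clock time.
KEYFRAMES = 1_000_000
DAYS = 5
SECONDS_PER_DAY = 24 * 60 * 60


def keyframes_per_second(keyframes: int, days: float) -> float:
    """Average number of annotated keyframes produced per second over the run."""
    return keyframes / (days * SECONDS_PER_DAY)


rate = keyframes_per_second(KEYFRAMES, DAYS)
print(f"{rate:.2f} keyframes/s")
```

This average only covers the stages up to keyframe annotation. It says nothing about the later Mask R-CNN feature-extraction step, which is why I am asking about that step separately.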
