We show that SGs can encode surgical scenes in a human-readable format. We propose a novel pre-training step that encodes global and local information from (image, mask, SG) triplets. The learned ...