Virtual tours with mixed content types (image, video, text)

I have not been able to add text to an image hotspot without hiding it in the hover text. Ideally, I'd like the hotspot to hold something like a small iframe, so that I can have, e.g., a text description, an image, some more text, and then a short example video.

I haven't seen this requested. Is there a way to do this already? Am I missing something obvious?
