Timing of audio and hotspots in interactive videos: desktop vs. mobile

  1. I have a set of interactive videos in which a red dot appears on an object (say, an apple). When the user clicks or taps the dot, the corresponding word (e.g., "apple") is played aloud. On mobile devices the timing is off: the audio starts a fraction of a second before the red dot appears on the object, so the word is cut in two by the click instead of being pronounced in full after the hotspot is pressed. You hear "a...", then the click, then "...pple", instead of hearing "apple" after clicking. The problem seems to be limited to mobile devices (I tried both iOS and Android, and it appeared on both). It did not occur when I tested the videos in desktop browsers such as Firefox, Chrome, and Vivaldi.
  2. WordPress 6.3.1
  3. The problem occurs only on mobile. The desktop version behaves the way I designed the videos.
  4. Multiple browsers
  5. H5P 1.15.6
  6. https://derjapossible.com/lessons/exercice-1-practice-listening/
  7. No errors
  8. No errors
  9. N/A
  10. No changes were made
  11. No changes were made
H5P file: 
Content types: