I was wondering if the conversation would get to discussing this.
I was curious some time ago and did some testing and some math.
In testing I can easily see the difference between 24p and 30p, both on a 60p display and on a display set to the source's native frame rate. The difference is obvious, and the look of 30p is quite distasteful to me regardless of the display's refresh rate.
24p on a 60p display does indeed introduce jitter in the timing of the frames (assuming each displayed frame is the "nearest" source frame and not one synthesised from multiple frames in the source material). At higher display frame rates the jitter gets smaller, and 120p is an exact multiple of 24p, so the jitter of 24p will be eliminated or drastically reduced as display frame rates go up.
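Here's a rough sketch of that jitter, in case anyone wants to play with the numbers. It assumes a simple sample-and-hold model where each refresh shows the newest source frame whose start time has already passed; that is my simplification, not necessarily what any particular TV or player does. The function name `hold_counts` is just made up for illustration.

```python
def hold_counts(source_fps, display_fps):
    """Number of display refreshes each source frame stays on screen, over one second.

    Assumption: at refresh k the display shows the newest source frame
    whose start time has already passed (integer floor, no frame blending).
    """
    counts = [0] * source_fps
    for k in range(display_fps):
        frame = (k * source_fps) // display_fps
        counts[frame] += 1
    return counts


print(hold_counts(24, 60)[:6])   # [3, 2, 3, 2, 3, 2]
print(hold_counts(24, 120)[:6])  # [5, 5, 5, 5, 5, 5]
```

Under this model, 24p on 60p alternates between holding a frame for 3 refreshes and 2 refreshes (uneven timing), while on 120p every frame is held for exactly 5 refreshes.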
In the math I did, I was surprised to see that capture frame rates are remarkably well preserved even when the footage is put through timelines and displays running at different frame rates.
Assuming I didn't screw up the logic, here's what you see when watching 24p source material on a 30p display. The timing is all over the place, but for whatever reason both 24p on a 24p display and the result below are still preferable to 30p for me.
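For anyone who wants to check the logic, this is roughly how the 24p-on-30p case can be worked out. It is a sketch under the assumption that each refresh shows the source frame closest in time (ties rounded up); `nearest_frame` and `timing_errors_ms` are names I made up for this post.

```python
from fractions import Fraction


def nearest_frame(k, source_fps, display_fps):
    """Index of the source frame closest in time to display refresh k (ties round up)."""
    return (2 * k * source_fps + display_fps) // (2 * display_fps)


def timing_errors_ms(source_fps, display_fps, refreshes):
    """Display time minus capture time of the frame shown, in ms, for each refresh."""
    errors = []
    for k in range(refreshes):
        f = nearest_frame(k, source_fps, display_fps)
        errors.append((Fraction(k, display_fps) - Fraction(f, source_fps)) * 1000)
    return errors


for k, err in enumerate(timing_errors_ms(24, 30, 6)):
    print(k, nearest_frame(k, 24, 30), f"{float(err):+.1f} ms")
```

With these assumptions the source frames shown are 0, 1, 2, 2, 3, 4 and the timing error cycles through 0, -8.3, -16.7, +16.7 and +8.3 ms before repeating, so one frame in every five refreshes is doubled and the rest land slightly early or late.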
What becomes interesting is when we shoot 30p, put it on a 24p timeline, and then display it on a 30p display:
Apart from a doubled-up frame every so often (because the timeline only has 24 frames per second to choose from), the 30p cadence is completely resurrected!
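A minimal sketch of that round trip, assuming nearest-frame selection at both conversion steps (ties rounded up) and no frame blending. The helper names are hypothetical, and a real NLE or TV may pick frames differently.

```python
def nearest_frame(k, from_fps, to_fps):
    """Index of the from_fps frame closest in time to frame k of a to_fps stream (ties round up)."""
    return (2 * k * from_fps + to_fps) // (2 * to_fps)


def round_trip(source_fps, timeline_fps, display_fps, refreshes):
    """Source frame index that ends up on screen at each display refresh."""
    shown = []
    for k in range(refreshes):
        t = nearest_frame(k, timeline_fps, display_fps)  # display refresh -> timeline frame
        s = nearest_frame(t, source_fps, timeline_fps)   # timeline frame -> source frame
        shown.append(s)
    return shown


shown = round_trip(30, 24, 30, 30)
print(shown[:10])                                # [0, 1, 3, 3, 4, 5, 6, 8, 8, 9]
print(sum(s == k for k, s in enumerate(shown)))  # 24
```

Under these assumptions, 24 of the 30 refreshes in a second show exactly the frame a native 30p playback would have shown; the remaining refreshes show a neighbouring frame, which is where the doubled-up frames come from.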
I have wondered whether apps like Netflix on smart TVs actually change the TV's frame rate to match the source material, or whether they just run the TV at some fixed fps and pick the nearest frame to display. I have been meaning to test my TV with my phone: recording the screen in 240fps slow motion, then reviewing the footage and counting the frames, is pretty straightforward.
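If I get around to it, the counting could be turned into hold times with something like the sketch below. The run lengths are made-up illustrative numbers, not measurements, and the function names are hypothetical. The idea is that a 24p frame held evenly should last about 41.7 ms (10 captured frames at 240fps), while a 3:2 cadence on a 60Hz panel should alternate between 50 ms and 33.3 ms (12 and 8 captured frames).

```python
def hold_times_ms(run_lengths, capture_fps=240):
    """Convert counts of consecutive identical captured frames into on-screen hold times (ms)."""
    return [n * 1000 / capture_fps for n in run_lengths]


def is_even_cadence(run_lengths, tolerance=1):
    """True if all holds match within `tolerance` captured frames (allows for counting error)."""
    return max(run_lengths) - min(run_lengths) <= tolerance


# Hypothetical counts, for illustration only:
native_24 = [10, 10, 10, 10]      # what an even 24p cadence would look like
pulldown_on_60 = [12, 8, 12, 8]   # what 24p on a fixed 60Hz refresh would look like

print(hold_times_ms(native_24), is_even_cadence(native_24))
print(hold_times_ms(pulldown_on_60), is_even_cadence(pulldown_on_60))
```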
TL;DR:
24p is far superior to 30p/60p regardless of display refresh rate (for me anyway)
When displays move to faster refresh rates the jitter from 24p sources will be reduced / eliminated
Frame rate conversions can involve interesting time-aliasing effects, where the time resolution of some frame rates can pass through almost completely intact