The math is easy to explain.
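First, 4:2:0 means the chroma channels are stored at half the luma resolution in both directions.
So if we downsample the image to half the resolution in both directions, the luma ends up at the same resolution as the chroma already was. Every output pixel then has its own chroma sample, hence 4:4:4.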
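The plane sizes can be checked with a few lines of Python. This is only a sketch: the 3840x2160 frame size is an assumed example, not something stated above, and `chroma_plane_size` is a hypothetical helper name.

```python
def chroma_plane_size(width, height):
    """Size of each chroma plane in a 4:2:0 frame: half the luma size in both directions."""
    return width // 2, height // 2

# Assumed example frame size (UHD), chosen only for illustration.
src_w, src_h = 3840, 2160
chroma_size = chroma_plane_size(src_w, src_h)

# Luma plane size after halving the resolution in both directions.
luma_size_after = (src_w // 2, src_h // 2)

# If the two sizes match, every output pixel has its own chroma sample (4:4:4).
is_444 = chroma_size == luma_size_after
print(chroma_size, luma_size_after, is_444)
```

The chroma planes are left untouched here; only the luma is reduced to meet them.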
Second, the luminance originally has full resolution and is 8 bit. 8-bit precision means there are 256 possible values, and since each value is an integer, the rounding error is at most 0.5. Averaging 4 pixels gives results in steps of 1/4 instead of steps of 1, so the granularity is 4 times finer and the corresponding rounding error is 1/8. To store those quarter steps as integers, we multiply by 4, which adds 2 bits, hence 10 bit.
Another way of saying it: each pixel has 256 possible values (0 to 255), so the sum of 4 pixels ranges from 0 to 1020, which needs 10 bits to store.
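The summing argument can be tried on a tiny example. The following is a minimal sketch, assuming the luma plane is a plain list of rows with even dimensions; `sum_2x2` is a hypothetical helper, not the routine any particular scaler uses.

```python
def sum_2x2(luma):
    """Sum each 2x2 block of an 8-bit luma plane (list of rows) into one sample."""
    h, w = len(luma), len(luma[0])
    return [
        [luma[y][x] + luma[y][x + 1] + luma[y + 1][x] + luma[y + 1][x + 1]
         for x in range(0, w - 1, 2)]
        for y in range(0, h - 1, 2)
    ]

# Made-up 8-bit samples: one all-white block and one dark block.
luma = [
    [255, 255, 10, 11],
    [255, 255, 10, 12],
]
out = sum_2x2(luma)

# Largest possible sum of four 8-bit samples and the bits needed to hold it.
max_sum = 4 * 255
bits_needed = max_sum.bit_length()
print(out, max_sum, bits_needed)
```

The second block sums to 43, which is an average of 10.75. An 8-bit output would have to round that to 11, while the 10-bit sum keeps the quarter step.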