Use inline assembly on x86_64 for both speed and accuracy.
Sacrifice speed for accuracy on other processors.
Continue to use the original implementation for ARM on Windows.
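When no 128-bit type is available, one way to trade speed for accuracy is to build the full 128-bit product from 32-bit halves and then run a shift-and-subtract long division. The sketch below assumes that technique. The function names are hypothetical, and this is not necessarily the fallback this change adds. It also assumes the quotient fits in 64 bits. If it does not (including div == 0), the sketch saturates to UINT64_MAX; that choice is made here for illustration only.

```c
#include <stdint.h>

/* Hypothetical helper: 64x64 -> 128-bit multiply using 32-bit limbs. */
static void mul64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
	uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
	uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
	uint64_t p0 = a_lo * b_lo;
	uint64_t p1 = a_lo * b_hi;
	uint64_t p2 = a_hi * b_lo;
	uint64_t p3 = a_hi * b_hi;
	/* Middle column: at most 3 * (2^32 - 1), so it cannot overflow. */
	uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

	*lo = (mid << 32) | (uint32_t)p0;
	*hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}

/* Hypothetical helper: divide hi:lo by div, one bit at a time.
 * Requires hi < div so the quotient fits in 64 bits. */
static uint64_t div128by64(uint64_t hi, uint64_t lo, uint64_t div)
{
	uint64_t rem = hi, quot = 0;

	for (int i = 63; i >= 0; i--) {
		uint64_t carry = rem >> 63; /* bit shifted out of rem */
		rem = (rem << 1) | ((lo >> i) & 1);
		quot <<= 1;
		/* If a bit was shifted out, the true remainder is >= 2^64 > div;
		 * the wrapping subtraction still yields the correct value. */
		if (carry || rem >= div) {
			rem -= div;
			quot |= 1;
		}
	}
	return quot;
}

/* Hypothetical portable num * mul / div without intermediate overflow. */
static uint64_t mul_div64_portable(uint64_t num, uint64_t mul, uint64_t div)
{
	uint64_t hi, lo;

	mul64x64(num, mul, &hi, &lo);
	if (hi >= div) /* quotient would not fit (or div == 0) */
		return UINT64_MAX;
	return div128by64(hi, lo, div);
}
```

The loop runs 64 iterations per call, which is why this path is slower than a single hardware multiply and divide, but it truncates exactly like integer division of the true product.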
As os_gettime_ns() grows large, the current scaling methods, which mostly
rely on casting to uint64_t, may overflow. Sweep the code and use
util_mul_div64() where applicable.
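To illustrate the overflow, consider converting a nanosecond timestamp into audio frames at a hypothetical 48 kHz sample rate. With plain uint64_t arithmetic the intermediate product ns * 48000 wraps once ns exceeds UINT64_MAX / 48000, roughly 3.8e14 ns or about 4.4 days. The names below are hypothetical, and mul_div64_wide() is not util_mul_div64(). It only shows the same idea of keeping the full product before dividing, using the GCC/Clang unsigned __int128 extension.

```c
#include <stdint.h>

/* Illustrative constants; not taken from the original change. */
#define SAMPLE_RATE 48000u
#define NSEC_PER_SEC 1000000000u

/* Naive scaling: ns * SAMPLE_RATE wraps modulo 2^64 for large ns. */
static uint64_t ns_to_frames_naive(uint64_t ns)
{
	return ns * SAMPLE_RATE / NSEC_PER_SEC;
}

/* Hypothetical widened helper: 128-bit intermediate (GCC/Clang only). */
static uint64_t mul_div64_wide(uint64_t num, uint64_t mul, uint64_t div)
{
	return (uint64_t)((unsigned __int128)num * mul / div);
}

static uint64_t ns_to_frames(uint64_t ns)
{
	return mul_div64_wide(ns, SAMPLE_RATE, NSEC_PER_SEC);
}
```

For one second (1000000000 ns) both versions return 48000. For 1000000000000000 ns (about 11.6 days) ns_to_frames() returns 48000000000, while the naive version has already wrapped and returns a much smaller value.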
Signed-off-by: Hans Petter Selasky <hps@selasky.org>