time rules. not only in our life-cycle but also in the software life-cycle.
time and time-ordering play an important role in distributed systems.
timestamp-based storage is crucial when you want to maintain an order of events, messages, requests, responses, or anything else in a system.
for example:
- time plays an important role in unique ID generators like twitter snowflake, and in databases like google spanner.
- we require time to determine cache entries and cache evictions.
- we require time to log details and to keep the log entries in order.
- for performance measurements, for profiling.
- to detect failures, timeouts, retries, etc.
why is this actually an issue? time is just time, right? your computer, my computer, that guy's mobile, everything shows the same time. we maintain it and it is a done deal. but, NO. there is a lot going on in terms of time in the systems around us. it is hard to keep time synchronized across machines.
there are two types of clocks:
- physical clocks: number of seconds elapsed.
- logical clocks: count events, ex: messages sent till that point.
physical clocks in most computers use quartz. the resonant frequency of the crystal drives the ticks that determine the current time. atomic clocks use caesium-133 and rely on quantum mechanics. there are also clocks that use gps, which receive time signals from multiple satellites carrying atomic clocks.
clock in digital electronics (oscillators) != clock in distributed systems (timestamps)
in distributed systems or in general software applications, you may have seen that we store timestamps like 1757997453907.
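a quick sketch of what such a number means, assuming it is milliseconds since the unix epoch (1 Jan 1970, 00:00:00 UTC). python is used for the sketches in this post, and the helper name `from_epoch_millis` is made up for illustration:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def from_epoch_millis(ms: int) -> datetime:
    """convert milliseconds since the unix epoch to an aware UTC datetime."""
    # integer timedelta arithmetic avoids float rounding on the millisecond part
    return EPOCH + timedelta(milliseconds=ms)

ts = from_epoch_millis(1757997453907)
print(ts.isoformat(timespec="milliseconds"))  # 2025-09-16T04:37:33.907+00:00
```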
UTC is coordinated universal time. this is the combination of:
- greenwich mean time (GMT, originally solar time) based on astronomy.
- international atomic time (TAI) based on quantum mechanics. 1 day is 24 x 60 x 60 x 9192631770 periods of caesium-133's resonant frequency.
these two don't match exactly. why are they off?
- earth's rotation speed isn't constant, and astronomical time is based on earth's rotation.
- since earth's rotation is slightly irregular, we need to account for it.
so, universally we use UTC, which is TAI with small corrections that keep it in line with GMT.
timezones and daylight savings time are all offsets on UTC.
the correction applied to TAI is called a leap second:
- this may or may not happen in a given year.
- it can be a negative leap second: on 30 June or 31 December, the clock jumps directly from 23:59:58 to 00:00:00, skipping one second.
- or it can be nothing. the clock just runs as usual.
- it can be a positive leap second: on 30 June or 31 December, the clock goes from 23:59:59 to 23:59:60, and only then to 00:00:00, adding one extra second.
computers represent timestamps in two common ways:
- unix time: number of seconds since 1 Jan 1970, 00:00:00 UTC, called the epoch. leap seconds are not counted here.
- iso 8601: year, month, day, hour, minute, second, and a timezone offset relative to UTC. ex: 2025-09-15T14:30:00+05:30.
year, month, day: 2025-09-15
separator b/w date and time: T
time in 24-H format: 14:30:00
time zone offset from UTC: +05:30 means 5 hours 30 minutes ahead of UTC, which is IST (Indian Standard Time)
the conversion between these two requires the gregorian calendar, which takes leap years into account. but what about past and future leap seconds? SOFTWARE DOESN'T CONSIDER THEM.
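here is a small sketch of that conversion using python's standard `datetime` module. the helper names `iso_to_unix` and `unix_to_iso` are made up for illustration. the last part uses the positive leap second that was inserted at the end of 31 December 2016 to show that unix time simply doesn't see it:

```python
from datetime import datetime, timezone

def iso_to_unix(iso: str) -> int:
    """parse an iso 8601 string with a UTC offset and return unix seconds."""
    return int(datetime.fromisoformat(iso).timestamp())

def unix_to_iso(seconds: int) -> str:
    """format unix seconds as an iso 8601 string in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()

print(iso_to_unix("2025-09-15T14:30:00+05:30"))  # 1757926800
print(unix_to_iso(1757926800))                   # 2025-09-15T09:00:00+00:00

# 23:59:60 was inserted between these two instants, so 2 real seconds
# elapsed, but the unix timestamps differ by only 1
before = iso_to_unix("2016-12-31T23:59:59+00:00")
after = iso_to_unix("2017-01-01T00:00:00+00:00")
print(after - before)  # 1
```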
but in distributed systems, we do need to handle them, because we often require sub-second accuracy. this inaccuracy can lead to deadlocks and system breakage in a few cases.
hence, a common solution is to smear (spread) the leap second across a longer time period. to absorb a single leap second, we run the clock a very tiny bit slower or a very tiny bit faster during that period.
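a minimal sketch of the smearing idea, assuming a linear smear over a 24-hour window. the window length and the names `SMEAR_WINDOW` and `smeared_offset` are assumptions for illustration, not how any particular provider does it:

```python
SMEAR_WINDOW = 86_400  # assumed smear window: 24 hours, in seconds

def smeared_offset(elapsed: float, window: float = SMEAR_WINDOW) -> float:
    """fraction of the one leap second already absorbed,
    `elapsed` seconds after the smear window started."""
    if elapsed <= 0:
        return 0.0
    if elapsed >= window:
        return 1.0
    return elapsed / window

# how much each second is stretched during the window, in parts per million
rate_ppm = 1 / SMEAR_WINDOW * 1e6
print(round(rate_ppm, 1))      # 11.6
print(smeared_offset(43_200))  # 0.5 (half the leap second absorbed at half time)
```

with these numbers the clock runs only about 11.6 ppm off its normal rate, so no timestamp ever repeats or jumps.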
clock synchronization:
computers have their own quartz clocks from which they get their time. two computers' time can't be the same. (rhyming :), read time as 'tiiaamm') due to clock drift, the clock error gradually increases.
the difference between two clocks at a given moment is called clock skew. the solution is to periodically get the current time from an external server which has an atomic clock or a gps receiver.
protocols like NTP (network time protocol) do this synchronization.
- ntp takes into account the network delay from client to server, the processing time on the server, and the delay of the response.
- from all this, it estimates the clock skew.
- if clock skew < 125 ms, it slews the clock. it speeds the clock up or slows it down by up to 500 ppm (parts per million), which brings the clock back in sync within about 5 minutes.
- if clock skew >= 125 ms and < 1000 s, it steps the clock. it sets the clock to the correct time directly.
- if clock skew >= 1000 s, it decides something is badly wrong and does nothing. it hands the wand to a human.
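the steps above can be sketched roughly as follows. this is a simplified illustration, not a real ntp client: the function names are made up, and the four timestamps are the usual client send / server receive / server send / client receive readings.

```python
from typing import Tuple

def estimate_skew(t1: float, t2: float, t3: float, t4: float) -> Tuple[float, float]:
    """return (round-trip network delay, estimated clock skew) in seconds.

    t1: request sent      (client clock)
    t2: request received  (server clock)
    t3: response sent     (server clock)
    t4: response received (client clock)
    """
    delay = (t4 - t1) - (t3 - t2)  # total time on the wire, both directions
    # assume the response took half of that; positive skew = client is behind
    skew = (t3 + delay / 2) - t4
    return delay, skew

def ntp_action(skew: float) -> str:
    """pick the correction using the thresholds listed above."""
    magnitude = abs(skew)
    if magnitude < 0.125:
        return "slew"
    if magnitude < 1000:
        return "step"
    return "panic"  # leave it to a human

# client clock is 0.5 s behind, each network hop takes 0.25 s
delay, skew = estimate_skew(t1=100.0, t2=100.75, t3=101.25, t4=101.0)
print(delay, skew, ntp_action(skew))  # 0.5 0.5 step

# worst-case slew: 125 ms = 125000 us, corrected at 500 us per second
slew_seconds = 125_000 // 500
print(slew_seconds)  # 250
```

the 250 seconds in the last line is where the "about 5 minutes" figure comes from.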
hence, any system that relies on clock synchronization needs to account for this clock skew.
generally, System.currentTimeMillis() or clock_gettime(CLOCK_REALTIME) read the time-of-day clock, which is unix time: time since the epoch, Jan 1, 1970.
this clock gets adjusted by ntp when it corrects clock skew. hence, if you compute elapsed time from this kind of clock, you need to be careful, because the result can even be negative when ntp steps the clock in between. but these timestamps are useful across distributed nodes, as all nodes count from the same epoch -> timestamps can be compared across nodes.
System.nanoTime() or clock_gettime(CLOCK_MONOTONIC) read the monotonic clock. a monotonic clock gives time since an arbitrary point, ex: when the system booted up.
this clock is never stepped by ntp, so it always moves forward (ntp may still slew its rate slightly). because it counts from an arbitrary point, its value means nothing on another node. it is useful when elapsed time has to be measured on a single node.
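a small sketch of the difference, using python's `time.time()` and `time.monotonic()` as stand-ins for the time-of-day and monotonic calls mentioned above:

```python
import time

wall_start = time.time()       # time-of-day clock, seconds since the unix epoch
mono_start = time.monotonic()  # monotonic clock, seconds since an arbitrary point

time.sleep(0.05)  # the work being measured

# can be wrong, even negative, if the clock is stepped during the work
wall_elapsed = time.time() - wall_start
# never negative, safe for measuring durations on this node
mono_elapsed = time.monotonic() - mono_start

print(f"time-of-day: {wall_elapsed:.3f}s, monotonic: {mono_elapsed:.3f}s")
```

on a healthy machine both numbers come out almost the same. the difference only shows when the time-of-day clock is adjusted in the middle of the measurement.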
now, coming back to ordering - in software applications, we often have to keep things in order. the best example here is messages. messages need to be ordered. suppose A sends a message m1 and B responds to it with a message m2. now, C reads both of these, and C should see m1 before m2. obviously.
how can we achieve this?
a simple approach is to attach a timestamp to every message coming from the different nodes in a distributed system. but we have just discussed that physical timestamps are unreliable in distributed systems, especially in time-sensitive environments.
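to see how this goes wrong, here is a toy sketch with made-up numbers: A's clock is assumed to run 300 ms ahead, and B replies 100 ms after A sends.

```python
SKEW_A_MS = 300  # assumed: A's clock runs 300 ms ahead of true time
SKEW_B_MS = 0    # assumed: B's clock is exact

def stamp(true_time_ms: int, skew_ms: int) -> int:
    """the timestamp a node attaches, as read from its own skewed clock."""
    return true_time_ms + skew_ms

m1 = ("m1", stamp(10_000, SKEW_A_MS))  # A sends m1 at true time 10.000 s
m2 = ("m2", stamp(10_100, SKEW_B_MS))  # B replies with m2 100 ms later

# C sorts what it received by the attached physical timestamps
ordered = sorted([m1, m2], key=lambda m: m[1])
print([name for name, _ in ordered])  # ['m2', 'm1']
```

C ends up showing the reply before the message it replies to, even though both clocks are off by well under a second.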
here comes the happens-before relation.
- it is a logical concept: instead of physical timestamps, we attach logical timestamps to the messages.
- if two events happen on the same node, the order is clear from that node's own execution order. if A comes first, we write A -> B (A happens before B).
- if an event on one node affects an event on another node (ex: one node sends a message and another node receives it), then the send happens before the receive.
- transitivity applies: A -> B and B -> C leads to A -> C. (-> means happens before).
here, logical timestamps can be lamport clocks (leslie lamport is the one who defined this relation) or vector clocks (you often see them when reconciling inconsistent writes in distributed systems). CRDTs also show up in this space. the point is just that you need some way to tell the order.
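a minimal sketch of a lamport clock, assuming the usual rules: bump the counter on every local event and every send, and on receive take the max of the local and the received counter, plus one. the class and method names are made up for illustration.

```python
class LamportClock:
    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """a local event happened on this node."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """sending counts as an event; the returned value travels with the message."""
        return self.tick()

    def receive(self, msg_time: int) -> int:
        """jump past whatever the sender had seen, then count the receive."""
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b, c = LamportClock(), LamportClock(), LamportClock()
m1_sent = a.send()            # 1
m1_recv = b.receive(m1_sent)  # 2
m2_sent = b.send()            # 3
unrelated = c.tick()          # 1, an event on C that knows nothing about m1 or m2

print(m1_sent, m1_recv, m2_sent, unrelated)  # 1 2 3 1
```

m1 gets a smaller timestamp than m2, as it should. but the unrelated event on C also has a smaller timestamp than m2 without having happened before it, so a smaller lamport timestamp alone proves nothing.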
- lamport clocks can't tell whether two events are concurrent. they only guarantee one direction.
- if A -> B, then the lamport timestamp of A is less than the lamport timestamp of B.
- but if one lamport timestamp is less than another, we can't tell whether the first event happened before the other or they are just concurrent/incomparable.
- vector clocks have a two-way relation with happens-before: A -> B exactly when A's vector timestamp is less than B's. so they can tell concurrency as well as happens-before.
if there is some happens-before relation between A and B, then either A happens before B (A -> B) or B happens before A (B -> A).
if there is no relation at all, then we say that A is concurrent with B. A || B.
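a rough sketch of vector clocks, assuming each timestamp is a map from node name to a counter, with missing entries treated as 0. the function names are made up for illustration.

```python
from typing import Dict

VectorClock = Dict[str, int]  # node name -> number of events seen from that node

def increment(vc: VectorClock, node: str) -> VectorClock:
    """copy of vc with the node's own counter bumped (local event or send)."""
    new = dict(vc)
    new[node] = new.get(node, 0) + 1
    return new

def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """element-wise max; a receiver does this before bumping its own counter."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}

def happens_before(a: VectorClock, b: VectorClock) -> bool:
    """true when a <= b in every entry and a < b in at least one."""
    nodes = a.keys() | b.keys()
    no_greater = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    some_less = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    return no_greater and some_less

def compare(a: VectorClock, b: VectorClock) -> str:
    if happens_before(a, b):
        return "A -> B"
    if happens_before(b, a):
        return "B -> A"
    if all(a.get(n, 0) == b.get(n, 0) for n in a.keys() | b.keys()):
        return "same event"
    return "A || B"

m1_sent = increment({}, "A")                  # {"A": 1}
m1_recv = increment(merge({}, m1_sent), "B")  # {"A": 1, "B": 1}
m2_sent = increment(m1_recv, "B")             # {"A": 1, "B": 2}
other = increment({}, "C")                    # {"C": 1}, unrelated event on C

print(compare(m1_sent, m2_sent))  # A -> B
print(compare(m2_sent, m1_sent))  # B -> A
print(compare(m1_sent, other))    # A || B
```

unlike the single counter of a lamport clock, the per-node counters keep enough information to tell "happened before" apart from "concurrent".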
causality:
- this happens-before relation directly captures the concept of potential causality.
- if A -> B, then A could have caused B.
- note that happens-before relation != guaranteed causality, but only potential causality.
now, this becomes the central point of any distributed system. time-ordering of events is important to reconcile or avoid inconsistent states.
References
- https://youtu.be/FQ_2N3AQu0M?si=LKS4CERFFGZtseCS
- https://en.wikipedia.org/wiki/Quartz
- https://www.exhypothesi.com/clocks-and-causality/