Consider the case of an event with 50% probability, such as a coin toss or an incoming random bit. This serves as a nice unit for entropy. Let's call it one bit of entropy. Then $n$ incoming independent random bits should have $n$ bits of entropy. Given that $n$ incoming independent random bits have a $2^{-n}$ probability for each outcome, this motivates defining the self-information of an outcome $\omega$ to be $-\log_2 P(\omega)$; indeed, $-\log_2 2^{-n} = n$, so each such outcome carries $n$ bits, as desired.
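As a quick numerical check of this definition (a minimal sketch; the function name `self_information` is just a label chosen here, not taken from the text), a fair coin toss should yield one bit and a string of $n$ independent random bits should yield $n$ bits:

```python
import math

def self_information(p: float) -> float:
    """Self-information -log2 P(omega), in bits, of an outcome with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return -math.log2(p)

# A fair coin toss carries exactly one bit of self-information.
print(self_information(0.5))      # 1.0

# Each outcome of n independent random bits has probability 2**-n,
# so its self-information is n bits.
n = 8
print(self_information(2 ** -n))  # 8.0
```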