Algorithms to Live By

By Brian Christian and Tom Griffiths

Sep 13, 2020

Summary & Takeaways

Algorithms to Live By showed how common solutions from computer science could be applied to everyday problems. I especially enjoyed it because even though I have been in software development for the past 8+ years, I never had formal training in computer science. This has always been a sore point for me and is one of the reasons I hate engineering interviews so much because I thought I would do so poorly on those questions. Despite trying many other methods to learn about these algorithms, nothing really clicked for me. That was until I picked up this book and started reading it. Things finally started clicking for me and now I feel much better about my knowledge in some of the common algorithms used in computer science, particularly sorting algorithms and Big O notation.

Notes & Quotes

Optimal Stopping

Look-Then-Leap Rule (or the 37% rule)

The way it works is that you specify a specific amount of time to explore potential options (it could also be done by specifying the # of options instead) where no matter what, you don’t choose any of those options. Then after you’ve surpassed your predetermined amount of time (or #) you chose the first option that is better than the best option you already have seen. The line for how many options or how much time until you start to look for the best option turns out to be about 37% for anything with more than about 3 options to consider.

So if you are considering choosing something from a pool of applicants like in dating or job seeking, then once you’ve seen 37% of all applicants (or determine an amount of time to dedicate and take 37% of that) you should choose the next applicant that is better than all the others you’ve seen so far.

Don’t look back

Even if it is possible to reconsider prior offers or options, don’t do it. If they weren’t good enough for you then, they’re not going to be good enough for you now.

Explore vs Exploit

The interval matters

When trying to decide between trying something new and sticking to what you know is best, the time interval matters. You should explore when you have time to use the knowledge you gain and exploit when you’re ready to use that information.

Exploring in itself is valuable

Exploration is valuable itself since trying new things increases our chances of finding the best.

Trying and failing

“To try and fail is at least to learn; to fail to try is to suffer the inestimable loss of what might have been.”

Upper Confidence Bound

We should seek out things and people with the highest potential. As we gain more information about them, we get a more accurate picture of its potential. But we should always consider the possible potential and air towards optimism until we discover otherwise.

Handling a fast moving world

When things are constantly changing and moving, we should choose to explore more than exploit.

“To live in a restless world requires a certain restlessness in oneself. So long as things continue to change, you must never fully cease exploring.”

Lean towards the new, especially early in life

“Over a lifetime, you’re going to make a lot of decisions. And it’s actually rational to emphasize exploration---the new rather than the best, the exciting rather than the safe, the random rather than the considered---for many of those choices, particularly earlier in life.”

Sorting

Scale is the enemy of sorting

The more things you need to sort the more effort and the longer it is going to take.

Mergesort is the answer

Compared to other sorting algorithms like Bubble Sort and Insertion Sort, Mergesort does not have a quadratic runtime---O(n2). Instead, its runtime is linerarithmic---O(n log n).

Mergesort works by splitting items up into n groups (where n is the number of items). Next, you merge consecutive groups by sorting them. You continue to do that until you are left with one group.

Bubble Sort works by going through a list starting from the beginning and sorting neighboring items. You continue to do this until the entire list is sorted. Its runtime is O(n2).

Insertion Sort works by going through a list starting from the beginning and inserting the item in a sorted list one at a time. Its runtime is O(n2).

Err on the side of messiness

Sorting something you will never search is a waste of time and effort. So it depends on how much searching you are going to be doing to evaluate whether or not it is worth taking the effort to sort. Also, when searching because really easy to do, sorting is less valuable.

Simple sorting algorithms are usually best

Looking at nature as an example, we can see that simple sorting mechanisms work best—like animal hierarchies. There is less fighting in animal hierarchies that are very clear—like where the bigger the fish, the more dominant it is.

The rat race might not be that bad

Despite us all hating the rat race where we compete with one another endlessly, it’s much better than constantly fighting which takes place in other animal species. Having ways to sort ourselves out that everyone—or at least most—can agree on saves a lot of bloodshed.

Caching

Least Recently Used

The best strategy for setting yourself up to easily retrieve the information you are most likely to need next is to always put the item you used last at the top of the list.

Key to good human memory

It’s to predict which items are most likely to be needed in the future. This is where the LRU algorithm works well too.

Scheduling

You must make your goals clear

Your optimal schedule depends on your goals, so in order to understand how you should schedule yourself, you need to understand your goals.

Earliest Due Date

If you’re most concerned about reducing lateness, then the best strategy is to do the task that is due soonest.

Moore’s Algorithm

If you want to reduce lateness but don’t think you’re going to get to all your tasks in time, you should start out by using the Earliest Due Date method but as soon as it looks like you won’t get to the next task, pause and look at the schedule you set out and get rid of the biggest one—the one that would take the most amount of time to complete.

Shortest Processing Time

If you want to reduce the overall amount of completion time it takes to do a set of tasks then you should always do the quickest task you can.

Priority inheritance

If there is a low-priority task that is blocking a high-priority task then that low-priority task assumes the priority level of the high-priority task until it is completed and/or no-longer blocking.

Best general purpose approach to scheduling

The best general purpose approach is to use the weighted version of Shortest Processing Time. The way it works is that each time a new piece of work comes in, divide its importance by the amount of time it will take to complete. If that is higher than the task you’re currently on, switch to the new one.

Beware of large switching costs

The problem of having any schedule that requires lots of switching between tasks is the cost of switching. Highly creatively tasks like writing or programming require a person to keep in their mind the entire state of the system and therefore has high switching costs—where it takes awhile just to get oneself up to speed again before really getting into the work.

One remedy to the cost of context switching is to set a minimum amount of time required to spend on any one task.

Note: This is actually a similar solution to that recommended in Peopleware—where the issue of context switching comes up when offices have interruptions. They measured how much time each day employees thought they got of uninterrupted quiet time and suggested that there be some amount to aim for for a productive environment.

Be as minimally responsive as possible

Another way to counteract the costs of context switching is to determine the minimum amount of responsiveness you require—in work, in personal life, etc.—and be no more responsive than that. The rest of the time you should stay on a single task as long as possible.

The benefits of meetings

One of the benefits of scheduled meetings is that they are scheduled and can reduce the amount of disruptions otherwise, which therefore reduces the amount of context switching, which can boost productivity.

Bayes’s Rule

Laplace’s Law

If you want to figure out what the odds are of you winning something, without knowing the contests specifics—like the total amount of entrants and winning drawings—one way to guess at it is to use Laplace’s Law, which says to take the number of possible winning drawings w in n attempts, then simply add one to wins and divide that by the number of attempts plus two, or w+1/n+2.

Bayes’s Rule

Bayes’s Rule is the idea that you start with some probability of an outcome happening and as you gather more data and evidence you update your probability of that outcome happening given the new information, but still considering the initial probability.

To combine preexisting beliefs with observed evidence we can just simply multiply their probabilities together.

Power law vs normal distribution

Things in life typically fall into two categories: things that cluster around some kind of “natural” value, and things that don’t. The normal distribution is for things that cluster and the power law explains a lot of the things that don’t.

One of the quirks in the observation of events that follow a normal distribution is that things we see as happening early—ex. young people dying—we see as unexpected, but things that we see as happening late—ex. old person dying—we don’t see as unexpected. Things happening later than expected tend to make us think they are “overdue” and therefore the longer we wait for it to happen, the more we expect them to happen.

Understanding the distribution can help us

We can better understand the odds we face in things if we understand what distribution the outcome falls into.

Overfitting

A better fit to the data is not always better

One counterintuitive point about building models from data is that the more features of the data you include in your model, often the worse the model becomes at predicting future data. Adding many factors will make the model fit to the data you already have very well but it won’t necessarily help you to predict future values. Therefore, considering less features of the data can be better at predicting future outcomes.

Danger of what you decide to measure

One of the side effects of choosing what you measure in a business—things like KPIs or OKRs—is that a business will end up maximizing for whatever you choose to measure. This can be a good thing—if chosen carefully—but it can also lead to unexpected results too.

How to overcome overfitting

One way to overcome the challenge of overfitting is by penalizing complexity. A method used to do this is called Regularization, which adds more information to your data that penalizes complexity. By using regularization, complex models need to do a really good job in explaining the data to justify the complexity.

One algorithm for regularizing data is called the Lasso and it works by reducing the weights of factors that have little impact in explaining the data---to close to zero---and over emphasizing the ones that have a big impact. This has the effect of reducing a model that considers 9 factors to one that may only consider 2 or 3.

Nature as a model in dealing with overfitting

Metabolism acts as a sort of brake on complexity in nature. The more complex a structure in nature, the more calories it requires. So something like the human brain has to be efficient and be worth the added burden of calories required to keep it functioning. Therefore, a larger and more complex brain might not actually be more beneficial and we’ve likely ended up with a pretty optimal sized brain.

Even brains themselves use penalties for complexity in order to function more optimally. For instance, brains try to minimize the number of neurons that fire at any given moment.

Note: This also is true in memory, where the more complex a topic or sentence or word is, the harder it is for us to remember. This is why things are often reduced to their simplest terms—elevator pitches—in order to be more rememberable.

Slow it down

Another way you can reduce the complexity of a model is by reducing the speed with which it adapts new information.

Note: Again, nature can be a good model for how to use this concept. In evolution, species adopt new traits very slowly because it makes them more robust and therefore more likely to survive and reproduce. Organisms aren’t defined by their current environments but rather their histories. If they instead tried to adopt to every new environment very quickly, it would make them more vulnerable to future changes in their environment, or overfitted to the current environment. Being constrained by the past makes us less than perfect for the present but more robust for the future we don’t know is coming.

Relaxation

The idea around relaxation is that some problems are too complex to have a perfect solution so instead of trying to solve for the perfect solution, we should instead relax the features of the problem and solve that instead. Most of the time doing this will get us very close to the optimal strategy anyway.

Solve an easier problem

Sometimes the best approach to solving a challenging problem is to solve an easier version of it. The solution to the easier problem could be a starting point in solving the bigger challenge.

Lagrangian Relaxation

The idea of Lagrangian Relaxation is that you create a scoring system that takes into account the rules of the game. For example, if trying to optimize for the best seating arrangement at a wedding, you could relax the restrictions around the maximum people allowed at a table and instead just use a penalty for space per person.

Monte Carlo Method

Instead of calculating the exact probabilities of something using complex math, you can instead just evaluate lots of sample outcomes. The more trials you run, the closer you can get to the actually mathematical probabilities, without having to do much math.

“A close examination of random samples can be one of the most effective means of making sense of something too complex to be comprehended directly.

Note: I used this method to take a closer look at poker. The results turned out to be very accurate and where there was a mathematical probability, it often was almost an exact match just based on using 1,000 or 10,000 simulations.

Randomness

Stimulate creativity using randomness

One way to stimulate creativity is to introduce a random element.

Note: This is similar to the concept I learned in a sketch comedy writing class. The concept is that you try to list out as many people, places, and things you can think of and then go through the random lists and see if you can make any funny concepts by connecting them.

Sometimes randomness is needed to get unstuck

One of the ways people get stuck in work or in life is that they reach a “local maxima” rather than the overall maximum. In computer science, the way to overcome getting stuck at a local maxima is to add randomness. In life and work therefore, we may benefit by adding some randomness to make sure we aren’t stuck at local maximas.

Note: Chris Dixon wrote about this problem and solution as it applies to a career in his post Climbing the Wrong Hill.

Networking

Dynamic hierarchies

Companies typically fall into flat or tall hierarchies but instead, they might be better off considering dynamic ones. The example is using the Additive Increase, Multiplicative Decrease strategy where instead of advancing traditionally by “climbing the corporate ladder”---where everyone ends up doing something they are bad at (the Peter Principle)---employees advance slowly when they are doing well and get dropped more when they underperform to put them back to a place they should ideally be very good at.

Feedback is key in communication

There is no such thing as one-way transmission in networking. Consistent feedback is needed otherwise communication will be lost.

This plays out in even things we think of as forms of one-way communication like storytelling or plays. Try telling a story to people who aren’t listening, it is a terrible experience. It only works well when others are giving you feedback and listening to what you have to say. This is important to remember in the other direction as well. Be a good listener, it makes it more enjoyable to talk with you.

Fixing latency will improve services

In today’s world, where everyone is constantly connected, it is more important than ever to reduce latency to as small as possible. No one enjoys being on a phone call where you hear someone that is delayed or playing an online game where things buffer. Fixing latency should be top of mind for engineers of the future and a host of new businesses and applications might come about by taking advantage of new low latencies.

Game Theory

Complete anarchy vs Perfect design

It’s been shown that systems of complete anarchy only do about 33% worse than perfect top-down coordination. This might explain why decentralization can work so well despite there not being any one entity coordinating everyone and everything.

It should be noted that this does work in both directions. For example, traffic might only improve by 33% from what it is today, even with every car being an autonomous vehicle and coordinating with each other. Though maybe the 33% only applies to human designed systems?

Change the game rather than the strategy

Sometimes we play in games where there is no great strategy for any of the players—something like Prisoner’s Dilemma. If we find ourselves in this kind of a game, it might be better if we instead try to change the game we are playing than seek out the best strategy.

This also works in the opposite direction if we are trying to figure out what rules of a game will give us our desired behavior.

Note: When it comes to governments, arguably the United States and Singapore do this sort of design the best. Singapore in particular seemed to consider the design of the system very closely at its founding when coming up with the rules they eventually put in place.

The unlimited vacation problem

Note: One relevant problem that I’ve come across that relates to game theory in startups is the “unlimited vacation” policy. This often causes employees to actually take less vacation than they would if it was scheduled because they are trying to show how much they care relative to others.
A common approach for a solution to this problem is to pay employees to take a vacation. However, the problem isn’t that vacations aren’t attractive to most employees, the problem is that everyone wants to take slightly less vacation than their peers, which still leads to a bad equilibrium. Instead, a better strategy might be to punish those that don’t take a vacation, perhaps by fining them some amount of salary and requiring a minimum number of days to take rather than maximum.

Emotion as mechanism design

Our emotions can actually serve as a sort of mechanism design. Revenge almost never works out for the person who seeks it, however, people still often seek it out and because of this, the system as a whole is better off (if others know that someone might seek revenge, they may be less likely to cheat them).

Note: This is somewhat similar to Charlie Munger’s quip about the Navy’s system of leadership, where the system on an individual is very unfair—captains may lose their command if someone else underneath them makes a mistake—but overall the system creates more fairness—less likely to have issues in leadership and passing the blame.

The problem/advantage of the crowd

There are examples of a group of people all behaving completely rationally and perfectly appropriately, yet still falling prey to infinite misinformation, otherwise known as an “information cascade”.

People often discount their own information when they see others acting a particular way. They often think that the others have some information that they themselves don’t have and therefore end up dismissing or devaluing their own information. When many people in a group do this, it ends up negatively affecting the groups ability because everyone discounts the true information they have and instead base it on what they think others know that they don’t, which is just what the others are doing. This goes to show how one ill formed opinion in a group can have drastic negative impacts on the group as a whole.

A lesson to learn here is that we should allows consider our own information, regardless of others’ opinions and further, we should consider why the group may be acting a particular way or hold particular opinions.

The fault sometimes lies not with the players but with the game.

We should always be wary of cases where public information seems to exceed private information, where we know more about what people are doing than why they are doing it, where you are more concerned with your judgements fitting the consensus than fitting the facts. In cases where you look to others to set the course, realize that they might be doing the exact same thing.

These problems arise when we misinterpret what others think based on what they do.

A way to address this issue is to do the work and to trust it our ability to do the work well. Then we don’t need to think about others.

Conclusion

Computation is bad

One of the counterintuitive positions in computer science is the view that computation is bad. Great algorithms minimize the labor of thought, not maximize it. We should consider this in our own lives and seek to reduce the amount of complexity and computation we’re required to do when looking for solutions to our problems.

This helps explain the paradox of choice. When faced with more and more information, we get paralyzed and can’t make decisions. It also speaks to the concept of having limited willpower throughout the day. We should save our willpower for the things that matter to us most, not for things that are trivial. Good algorithms can help us do this.

“In the hard cases, the best algorithms are all about doing what makes the most sense in the least amount of time, which by no means involves giving careful consideration to every factor and pursuing every computation to the end.”

Reduce tension, friction, and mental labor

“One of the chief goals of design ought to be protecting people from unnecessary tension, friction, and mental labor. (This is not just an abstract concern; when mall parking becomes a source of stress, for instance, shoppers may spend less money and return less frequently.)”

Note: This is a principle I should consider when building products. Think about the burden a product may place on the user and minimize it as best as we can. Also think about the second and third order effects that burden can have on other things. We should always strive to make products that reduce the burden on people, not ones that increase it.

Book Marginalia