Sunday, October 24, 2021

Discovery Processes

Introduction

A Discovery Process is a piece of work that takes approximately zero time to do the second time one has to do it. In other words: something that is extremely easy once you know how to do it. Modern mental work abounds with discovery processes, such as finding specific information, finding out what is causing a problem, and solving abstract problems. This is a natural consequence of the computer and the internet: if the result of one's work can be copied without cost, then it only needs to be done once, ever. Since we only ever need to estimate how long a specific Discovery Process will take before it has been completed, that is, when we have exactly zero data points on it, the time it will take is inherently difficult to estimate. This creates problems for those of us who want to produce good work reliably, as well as for those who depend on it. In this post, I will analyse some consequences of the abundance of discovery processes for modern organisations and for individuals. 

Examples

Let's first home in on the definition so that we know what we are talking about. A simple test of whether a given task is a discovery process is to ask what would happen if the work's deliverable were lost. Suppose a craftsman makes a beautiful table. The table itself is the deliverable of the work. If the table is stolen or destroyed, replacing it will take almost as much time as making the first table. I say almost as much time, because I'm counting on some learning on the part of the craftsman [1]. The bigger the "design" part of the table-making is, the less it matters that a single table is lost. Compare with a program: most of the work in programming is finding out the right way to do something. If the code is lost (immediately after the coding is complete), it will not take nearly as much time to rewrite it. The higher-level the programming language, the more this applies. Somewhere in between we find classic engineering, where a prototype or production process is developed and set up. Having to start that work over would be frustrating, but it would probably take at most 50% of the time the second time around. This pattern (carpenter, industrial engineer, programmer) gives us a clue to how work will develop in the future: all work that is not about maintaining relationships or signalling will become more and more like an ideal discovery process. If this is really true, it seems extremely important to start making sense of the properties of discovery processes. 

Estimating remaining time

I propose a simple model for estimating the remaining time of a discovery process: if one has worked X hours on it, then on average it will take another X hours to finish. A bug that has evaded the programmer's attacks for a day will on average take another day to resolve. A conjecture that remains unproven despite 10,000 mathematician-years will on average take another 10,000 mathematician-years to finally crack (though I expect that many years from a single mathematician will pay off much better than a few years each from a great many, who will have to redo a lot of thought-work between each other). 
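For the mathematically inclined, here is a minimal simulation sketch of one prior that has exactly this property (my own illustration, not something claimed in the post): if the total duration of a task follows a Pareto distribution with shape parameter 2, then the expected remaining time, given that t hours have already been spent, is t.

```python
import random

# Illustration only (an assumption of mine, not the author's model): a Pareto
# distribution with shape alpha = 2 has the property that the expected
# remaining duration equals the time already spent.

def pareto(alpha: float, x_min: float) -> float:
    # Inverse-CDF sampling: F(x) = 1 - (x_min / x)**alpha for x >= x_min.
    return x_min / (1.0 - random.random()) ** (1.0 / alpha)

def mean_remaining(elapsed_hours: float, n: int = 500_000) -> float:
    # Conditioning a Pareto on T > t yields another Pareto with scale t,
    # so we can sample the conditional distribution directly.
    totals = (pareto(alpha=2.0, x_min=elapsed_hours) for _ in range(n))
    return sum(totals) / n - elapsed_hours

for hours in (1, 4, 24):
    # Each estimate should come out close to the elapsed time itself
    # (with some noise, since the distribution is heavy-tailed).
    print(f"worked {hours} h -> roughly {mean_remaining(hours):.1f} h remaining")
```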

Caveats

When does this not apply? It does not apply if we know that the right answer hides in one of a finite number of places, and we have to do some nonzero amount of work for each place (such as compiling, restarting, or moving a lot of data). Only in such rare cases do we find ourselves saying something like: I have worked on this problem for 4 hours, but now I am certain that it will not take more than another 2 hours. These cases are genuinely rare, and they are actually not good signs. A team lead who hears that a piece of work is certain to finish in a known amount of time is probably happy to have a solid figure for once, but they shouldn't be! The fact that one's organisation can know exactly how to do something but not actually be able to do it (yet) means that the work is not progressing at the speed of thought! It means that the bottleneck in development is not the most expensive part (developers' brains) but a relatively cheap part (computers) [2]. When I hear that a software project will take another 2 years, it makes me think that either their organisation is bottlenecked by the wrong things, or that they cannot be certain that the project is even possible. 

Another example is when we need to locate a specific piece of information in a book. After the "shortcut" approaches have been tried, such as grepping for keywords, googling for a quote, or trying to narrow down the search by looking at the table of contents, we know that finding the information (or realizing that the information is not in the book) can take no longer than reading the entire book from cover to cover. However, this is significantly longer than trying all the shortcut approaches, so we're better off starting with those. 

Consequences of having a job dominated by Discovery Processes

  • Constant uncertainty about when tasks will be finished. This is actually a good thing for the slacking employee: uncertainty about the hardness of tasks makes it difficult for the employer to know how to price them. At the other extreme, highly repetitive work, the employer has very good information about how long tasks take and can therefore set the pay right at the market price for the class of workers who can perform the task. The information asymmetry should make us expect that companies are very hesitant to start paying for new software projects, especially ones that sound a bit weird. The result is largely the state of software consulting that we see today: well-paid consultants working less-than-optimally on software projects that seem safe and doable to a big organization. 
  • Divide and conquer is golden. Any insight about how a discovery process may be divided into independently solvable parts can drastically reduce the time it takes to solve it (see the sketch after this list). 
  • Knowledge is valuable. Having a large toolbox of general solution methods to be thrown at any new discovery process can make a developer much more productive.
  • Experience is valuable. Having heuristics for discovery processes in one's field can help with prioritizing solution methods. 
  • Curiosity is valuable. Being mentally malleable enough to reshape one's mental models on the fly when working on a new discovery process can make up for lack of experience. 
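As a concrete illustration of the divide-and-conquer point above, here is a minimal sketch (a hypothetical example of mine, not from the post) of bisecting over an ordered list of changes to find the one that broke something, in the spirit of git bisect. Splitting the search space in half at every step turns a linear scan into a logarithmic one.

```python
# Hypothetical illustration: find the first "bad" change by bisection.

def first_bad(changes, is_bad):
    """Index of the first change for which is_bad() returns True,
    assuming every change after it is also bad (as with a regression)."""
    lo, hi = 0, len(changes) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(changes[mid]):
            hi = mid          # the culprit is at mid or earlier
        else:
            lo = mid + 1      # the culprit is after mid
    return lo

changes = list(range(1000))
# About 10 checks instead of up to 1000 with a linear scan:
print(first_bad(changes, is_bad=lambda c: c >= 613))   # -> 613
```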

[1] This gives us an alternative definition for an (ideal) discovery process: it is work that has an infinitely steep learning curve. 
[2] This does not apply for industries that are limited by their tools, such as large-scale machine learning, super computing, or astronomy. 

Saturday, July 17, 2021

History of the Electric Scooter

At this point, in July 2021, it has become clear that the electric scooter is here to stay. I will refer to the vehicle as a 'scooter' in this post, for brevity. It seems likely that this shorter name will become more common, as the vehicle itself becomes more common relative to the lightweight moped vehicle that is already called a scooter. 

A scooter.


Initial wave

The scooter had its breakthrough in 2017. The first successful commercial application was scooter sharing. The business idea was straightforward: buy a few hundred scooters from China and place them, without permission, in a city one night. The user downloaded an app on their phone which could be used to unlock a scooter. When finished, the user could initially park the scooter anywhere. The service quickly became popular among users; the author can't recall a single city where it flopped. Early companies were Lime and Spin in California. Copycats soon followed: Bird (California), Tier (Berlin), and Voi and Moow (Stockholm). There were many more in other cities. 

Business model

The economics were somewhat uncertain, however. A scooter purchased from China cost $300-$500. Initially, the price was $1 for unlocking and $0.15-$0.3 per minute of subsequent usage. Vandalism and theft of the scooters quickly became rampant. The author remembers seeing a figure that the average lifetime of a scooter was only 28 days. With an average travel time of 10 minutes, this would mean that each scooter had to be rented about 5 times per day throughout its lifetime just to recoup its purchase price. The companies also had to charge the scooters. Lime did this using gig-workers called 'juicers'. According to the Lime homepage, a juicer was paid $5 for a full charge. Since a full charge was enough for about 60 minutes of usage, about 40% of the minute-fee was eaten up by the juicers. Business models diverged after a while. Lime introduced longer-range scooters, reducing the cost of juicers. Voi introduced a 30-day unlimited ridership pass in the summer of 2020, costing $60 for Stockholm and less for smaller Swedish cities. There were also scooters without the unlocking fee but with a higher per-minute fee, intended for short trips. 
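A back-of-the-envelope sketch of the figures above, using midpoint assumptions of mine taken from the quoted ranges (a $400 scooter, a 28-day lifetime, $1 plus $0.20 per minute pricing, 10-minute rides, and $5 per full charge lasting about 60 minutes of riding):

```python
# Assumed midpoint figures, taken from the ranges quoted in the text.
scooter_cost = 400.0            # USD purchase price
lifetime_days = 28              # average lifetime in the field
unlock_fee, per_minute = 1.0, 0.20
ride_minutes = 10

revenue_per_ride = unlock_fee + per_minute * ride_minutes             # $3.00
rides_per_day = scooter_cost / lifetime_days / revenue_per_ride
print(f"rides per day just to recoup the hardware: {rides_per_day:.1f}")   # ~4.8

juicer_fee, minutes_per_charge = 5.0, 60
juicer_cost_per_minute = juicer_fee / minutes_per_charge               # ~$0.083
print(f"share of the minute fee paid to juicers: {juicer_cost_per_minute / per_minute:.0%}")  # ~42%
```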

Regulation catches up

There were many complaints from other road users. Mark Wagenbuur of BicycleDutch, a YouTube channel, complained in a video that the scooters were encroaching on space meant for bicycles. Anecdotally, the scooters were used rather recklessly, particularly by teenagers. They were not only used for simple transportation, but also for urban 'sport'. Parked scooters were often in the way on sidewalks. Another common complaint concerned safety. Most fatal accidents involved collisions with motor vehicles, and were similar in kind to fatal bicycle accidents, i.e. occurring when both the motor vehicle and the scooter are travelling in the same direction but the motor vehicle makes a turn across the lane of the scooter. In Ontario, scooters and other similar vehicles were illegal even before the scooters appeared, due to a preexisting blanket law. In May 2021, the Toronto city council voted unanimously to uphold the ban. Stockholm, as of spring 2020, had introduced geographic restrictions on riding, speed, and parking. The speed of the scooter on a flat surface was limited to about 25 km/h from the beginning, due to technical constraints. It was, however, possible to reach a higher speed going downhill, but the user had to promise, before starting a ride, not to ride the scooter downhill (a promise that, of course, was not kept). As of summer 2020, the scooter would automatically brake when going downhill or being kicked forward, limiting the speed to about 22 km/h. The United Kingdom established public tenders for the right to do scooter sharing in several cities. In November 2020, the tender was awarded to three operators. 

Prior systems

Bikesharing had been proposed and tried several times since 1965. In European cities, the concept took hold as municipality-supported initiatives to reduce car traffic, starting around 2010. These systems featured fixed locations for parking the bicycles. Per day, a seasonal pass usually cost less than a tenth of a single-day pass, implicitly subsidizing commuters at the expense of tourists. In Malmö, a day pass cost 72 SEK and a 365-day pass cost 250 SEK in 2018. This system was not a complete failure, and may very well have encouraged a few people to leave the car behind. However, it suffered from being overused on some routes and underused in other places. The user had to worry about not being able to park their bicycle when arriving at their intended destination. There were also several private bikesharing companies, such as Donkey Republic from Copenhagen, started in 2014. Its business model was identical to that of the later scooter companies, except for the vehicle being a bicycle. It was also much less successful, despite having much lower prices, with 30 minutes costing €2.2, compared to €7 for a scooter (assuming €1 for unlocking and €0.2 per minute). The un-electrified scooter had long been present on the market, under the name 'kickbike', but it was used mainly as a toy vehicle for kids and for sport by teenagers, similar to the skateboard.

Normalization

The scooter sharing led to social normalization of riding a scooter in public. I saw the first privately owned scooters in Vienna in April 2019, ridden by geeks. As of July 2021, they are a common sight on bike lanes, and on sidewalks and in car lanes too. People (always teenagers) even ride their privately owned scooters inside supermarkets and the metro. I expect scooters to be banned from such places soon. This would present a problem for scooter owners, since it is not obvious how to lock up a scooter when going indoors. Special scooter parking spaces with locks may appear, and perhaps future models will feature better physical security. A standard scooter costs $300-$700, which is about the same price as a bicycle. 

Related vehicles

A recent addition to the ecosystem of vehicles is the scooter with a small seat at the back of the standing area. Another type of vehicle that has become popular is the 'fatbike', which looks like a crossbreed between a scooter and a motorcycle. It is electric and has very wide wheels. A benefit of this type of bike seems to be that it can climb sidewalk edges comfortably. The fatbike is the SUV of the bicycle world, a heavier vehicle that has the advantage in a collision with a regular bike. The standard electric bicycle has also become popular in recent years, especially among the elderly. An electric bicycle costs about twice as much as a regular bike. 

Friday, July 16, 2021

Media-driven political change: centralized vs decentralized

I'll outline why the 20th century centralized model actually has some advantages over the 21st century decentralized model. I'll also present an idea for what can be done within a decentralized model to improve it.

Some political topics are hot for years. Tax. Immigration. Climate change. They are necessarily controversial: the only way a topic that many people pay a lot of attention to can stay that way is if there are heated opinions that contradict each other. More marginal topics tend to be asymmetric: on one side is a highly opinionated small group; on the other side is a large group that can't be bothered (perhaps because they are busy caring about their own marginal topics?). A marginal political issue has three possible outcomes. One, the small opinionated group runs out of steam, and the issue dies. Two, the small group manages to push their issue onto the agenda of the majority, which yields. Three, another small group forms around the opposing view of the first, and the topic has become controversial. Now let's see how the media affects the small group's chances of pushing their issue onto the majority agenda. 

Centralized news: the issue can be known by insiders (journalists) for a long time but not published, due to lack of interest from the public. Lots of fuel, no spark. Suddenly, an igniting publication leads to a lot of material being published in a seemingly coordinated effort. The issue occupies people's attention for a short time, and is relatively likely to lead to change, owing to the gatekeepers' discomfort at being scrutinized. The journalists involved have much to cover, so they may be relatively objective on most topics, but also less well-read.

Decentralized model: issues are immediately published. Over time, Schelling points for complaints are established. It is difficult for the gripers to prove objectivity, and outsiders tend to assume that they are one-sided. Even if an opposing gripe group exists: the average of two extremes is not necessarily good policy. Some of the people involved have really seen a lot of data and may have come to a radical solution; others who propose radical solutions are just biased. The point is that it's difficult for outsiders to tell who is who. 

What can we do to make the decentralized model better, relative to the centralized? The main scarce resource is the attention and understanding of the majority (the majority is not stupid, they may just be busy with their own hobbyhorses). Given that public attention for your topic will be short, intense, and initially very uninformed, some actions follow:

* Produce content that summarizes the topic and position to outsiders, even when almost no-one is paying attention. This content will help people quickly understand your case once the topic becomes hot. 
* It is important to post often. Outsiders who are looking into your subject are pretty much by definition people who care about recent events. The most recent post should not be older than a couple of weeks. It could be a good idea to keep an explanatory post in storage, to be published at the right time, or kept as a pinned post.
* Make it very easy for outsiders to find out what your policy suggestions are. Most of the griping during the culture wars has been of the form "You should care more about group X", which doesn't fit this format.
* Since there is a very high bit cost for including exceptions and modifications to your suggested policy, the best strategy may be one of Max-Min. That is, assume that the policy will be misunderstood, perhaps deliberately, so pick a suggested policy which will be beneficial even in its dumbest implementation. 

When is it possible to predict the future?

In a world where the important data is widely available, it is possible to predict the future to the extent that change is limited by processes that are less smart than individual people. The proof is straightforward: if individuals are not operationally constrained by dumber processes than themselves, then they can create value by changing things using the available data. If however, individual freedom is constrained by dumb processes, then there might be a situation where everyone knows what the future will be like, yet no one has the power to change it. 

What processes are dumber than people? Evolution [1], top-down organisations where insights from low-level employees can't propagate up to management, and markets with high barriers to entry. Such a world will seem to be changing slowly from day to day, even if a lot is changing over the years.

In a world with a very efficient market for starting companies, there is only a thin sliver of opportunities that are imaginable, but unexploited. Such a world will seem to be changing very fast from day to day, even if the fundamental values are not being improved much over the years. 

To summarize: 'dumb' change is more predictable than 'smart' change. However, if we want to tell whether things are improving, we should simply look at what is happening to the things we care about, rather than trying to gauge it from how fast the world seems to be changing.

[1] Is outsmarted by e.g. selective breeding. 

Tuesday, June 1, 2021

Unwritten rules

Unwritten rules are unwritten by design. Their purpose is not to enforce coordination for the positive effects of coordination, but to distinguish who is popular. Popularity is a zero-sum game, so making the unwritten rules accessible will create an incentive to define new unwritten rules. 

Monday, May 24, 2021

The Amnesia Test

The Redo Test asks how much it would cost to redo a task, given that everything except the worker's memory of doing the task is lost. The Amnesia Test uses the opposite scenario: what if the worker forgets everything they learned from doing the task, but everything else is unaffected? How much would it cost to recover the worker's memory, given that the other results of the work remain? The Amnesia Cost Ratio (ACR) measures this as a percentage of the cost of doing the task the first time. An ACR of 100% means that nothing is useful without the worker's mental state; an ACR of 0% means that the worker can walk away (or be kicked out) without any loss of value, for the given task. 

Some examples:

1) You are a junior engineer working for a company that produces doohickeys. In evaluating a new prototype doohickey, a senior colleague asks you to check whether the prototype manifests "overextended flimming". You start writing a script to provoke overextended flimming, but get interrupted by a more pressing task. When you get time to work on the flimming again, you look at your script and draw a complete blank as to what you were trying to do. It is faster to just rewrite the script from scratch, rather than to salvage your old half-finished idea. The ACR for this (partially completed) task is 100%. 

2) You are the best welder in the doohickey factory, with 15 years of experience. Every year, you have become 10% faster, which means that you can now weld a doohickey about 5 times faster than a new recruit. One day, you are hit by a tram, causing you to forget everything you knew about welding, so you have to start learning again. Let's assume that wages for doohickey-welders are always the same, and that learning saturates after 15 years. How many more doohickeys would you have welded in the following 15 years if you hadn't been hit by the tram? About twice as many. The ACR for this task is 50%. 

Number of iterations until saturation | Improvement per iteration | ACR
15                                    | 10%                       | 50%
1                                     | 30%                       | 30%
2                                     | 30%                       | 40%
5                                     | 30%                       | 61%
100                                   | 2.25%                     | 61%
10                                    | 2.25%                     | 12%
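The table can be reproduced (up to rounding) with a short calculation. Here is a minimal sketch, under my reading of the welding example: "g% faster per iteration" means the time per unit shrinks by a factor (1 - g) each iteration, and output is compared over a second period of the same length as the learning period.

```python
# Sketch of the ACR behind the table above (an interpretation, not a quote).

def acr(n_iterations: int, improvement: float) -> float:
    """Fraction of output over the next n_iterations that is lost if the
    worker has to climb the learning curve again from scratch."""
    rate = lambda k: (1.0 - improvement) ** -k        # output rate after k iterations
    with_memory = n_iterations * rate(n_iterations)   # stays at the saturated rate
    relearning = sum(rate(k) for k in range(n_iterations))
    return 1.0 - relearning / with_memory

for n, g in [(15, 0.10), (1, 0.30), (2, 0.30), (5, 0.30), (100, 0.0225), (10, 0.0225)]:
    print(f"n = {n:3d}, improvement = {g:.2%}, ACR = {acr(n, g):.0%}")
```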

Difference between KCR and ACR

One important difference between the KCR and the ACR is how the incentives of employer and employee are aligned. The employer wants a high KCR, since it means that the capital is safe against loss of physical inventory or disk crashes. The employees also want a high KCR, since it means that they have increased their human capital; they could now do similar tasks faster. There is, however, a conflict between employers and employees when it comes to the ACR. The employees want a high ACR, since it gives them leverage: they could leave and take value with them out of the company. The employer, conversely, wants a low ACR. The ideal situation for an employer would be a company whose main capital is an idea or a culture. Once understood, the idea or the culture can be used to create value for the company, but an employee cannot remove it by quitting. The issue with having an idea or a culture as the main capital is that a competitor might try to copy you. 

Consequences of the Redo Test

A task with a low Knowledge Cost Ratio (KCR) is by definition less robust to events that destroy all the results. The KCR decreases as the task is done more times, which decreases the robustness to unexpected loss. This gives a rationale for knowledge workers' reluctance to do repetitive tasks. Since the KCR decreases with the number of times the task has been done, we should expect that different people do the task later on. In the extreme case, the task gets automated. 

Some activities such as advertising and quality control are expensive and do not produce a physical product, which makes them superficially similar to knowledge work. However, they are not knowledge work according to the redo test.

Another consequence appears if you introduce a procedure that requires artifacts from knowledge work to be saved, or else the work counts for naught. In that case, a lot of work may have to be redone unnecessarily.

The Redo Test

What is knowledge work? The Redo Test provides one answer, using a thought experiment. 

Suppose you have just completed some work. Now suppose that every result of this work disappeared, except your memory of it (including episodic memory, procedural memory, etc.). How much would it now cost to redo the work a second time, as a percentage of what it cost the first time? Let's call this the Redo Cost Ratio (RCR). Let's also define the Knowledge Cost Ratio (KCR) as 100% minus the RCR. A KCR of 0% means that there is no knowledge element to the result of the work; a KCR of 100% means that the only valuable result of the work is knowledge. 
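As a minimal sketch of the two ratios just defined (with hypothetical numbers of my own):

```python
def rcr(redo_cost: float, original_cost: float) -> float:
    """Redo Cost Ratio: cost of redoing the work, relative to the first time."""
    return redo_cost / original_cost

def kcr(redo_cost: float, original_cost: float) -> float:
    """Knowledge Cost Ratio: share of the original cost that the worker's
    memory saves when the work has to be redone."""
    return 1.0 - rcr(redo_cost, original_cost)

# A 25-minute task where only a 5-minute part has to be redone
# (compare example 2 below) gives a KCR of 80%.
print(kcr(redo_cost=5, original_cost=25))   # -> 0.8
```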

Some examples:

1) You make a soup. You have made this soup a hundred times before, so you always do it in exactly 90 minutes. When you are on your way to serve the soup, you stumble and the soup spills all over the floor. Redoing the soup would take 90 minutes, so the RCR is 100%, and therefore the KCR is 0%. 

2) You find a bug in your software. You debug for 20 minutes, and finally find the cause. You implement a fix, which takes 5 minutes, but the source code for the fix is lost when you forget to save. Luckily, you still remember the cause of the bug, so you only have to rewrite the 5 minute fix. The KCR is 80%. 

3) You publish your first app to the App Store. You try 10 different advertising tactics, before one succeeds, and the app finally takes off. All of a sudden, your app is banned from the App Store due to an algorithmic mistake, with no ability to appeal. You have to republish the app under a new name and logo. The second time, it only takes a combination of 3 advertising strategies before the app takes off. The KCR is 70%. 

4) You represent a large company that releases one of your products in a new country. You spend $1 billion setting up: recruiting marketing and salespeople, renovating office space, etc. You then spend $1 billion per year for 9 years building and maintaining your brand using TV and magazine ads. Suddenly, an unfounded rumor causes your brand to become permanently socially undesirable in that particular country. You still have the people and the real estate, but you have to spend another $9 billion over 9 years to build the brand of one of your other products. The KCR is 10%. You have not performed knowledge work, but signalling work. Edit: depending on how you define it, the KCR can also be 0%. Every result of the work should be erased except the memory, and that includes the hires and the real estate. 

5) You inspect a toy for safety hazards. The toy passes all your tests, but you lose the quality assurance documentation files on the computer. Your knowledge that the toy passed the tests is worth nothing if you can't present the files to the meta-inspector. The KCR is 0%. You have not performed knowledge work, but bureaucratic work. 

6) You are a junior engineer working for a company that produces doohickeys. In evaluating a new prototype doohickey, a senior colleague asks you to check whether the prototype manifests "overextended flimming". You take the prototype out for a spin, but forget to press 'record session'. The flimming is perfectly normal, however, and you tell the senior engineer as much. When you admit that you forgot to record the session, your colleague answers "That's OK, at least we know that we don't have to prioritize the flimming". The KCR is 100%. 


Thursday, April 29, 2021

Which Engineering Project Gets Done?

The one that pleases all essential stakeholders for the lowest cost. 

Suppose you are an executive with real power over starting new projects, i.e. you have an R&D budget. If you are in the middle of the organization, then you are held responsible by your higher-ups, and to a lesser degree by neighbouring teams. If you are at the very top of your organization, then you are held responsible by your shareholders, the media, regulators, etc. So it's safe to say that if you ever find yourself in a situation where you have a lot of money and talent to direct towards a new project, then there will be a lot of people who have a say in what you do with those resources. 

Every stakeholder who has the possibility to veto your project creates a constraint for the final product. Typically, this will create an overdetermined problem: the stakeholders' wishes are so contradictory that a universally pleasing solution is impossible, and some tradeoffs have to be made. This is the difference between engineering and art: engineering deals with overdetermined design problems, art deals with underdetermined design problems. 

The over-determinedness means that if there happens to exist a solution that pleases all stakeholders and is within the budget, then there isn't much wiggle room to experiment. 

Another consequence of this is that decreasing costs have a direct impact on which projects are at all possible. 

Thursday, January 7, 2021

Internal and External Explanations

I would like to make a distinction between two kinds of explanations of a thing. The thing can be either physical or abstract. An Internal explanation tells us how the thing works. This can be a mathematical definition, a blueprint, program code, or a flowchart. In theory, this is a complete description of the thing. It is also often the shortest way of explaining it (but not always). When we want to figure out something really nontrivial about the thing, we almost always have to look at the internal explanation. The internal explanation is the one that should be remembered. There is definitely a lot of power in getting into the habit of checking internal explanations, as many people are reluctant to do so. 

However, the internal explanation is not very good for transferring knowledge. It may be just a human thing, or perhaps it is a universal quality of knowledge, that the shortest explanations aren't very enlightening on their own. We often need some context for the knowledge to really stick. That is why we need External explanations. The external explanation tells us how the thing connects to the rest of our world. It can tell us which problem the thing is meant to solve. It can reveal why the assumptions used are exactly the assumptions needed. A person whose external explanation is vague and unsure has perhaps not thought the thing through enough, and will not be taken seriously. A person who gives long external explanations, however, is often extremely appreciated by their audience. I can't say the same about a person who presents long internal explanations. So there is also a lot of power in mastering the external explanation, for both personal and social reasons. 

I have presented a distinction between two kinds of explanations, and what they are good for. I hope these concepts are useful to you. 

Friday, January 1, 2021

Biasology

Independent thinking can't be imitated with an attitude. For people who need to think professionally, doing one's own thinking is cognitively cheaper than constructing a patchwork of attitudes and counter-attitudes. It is wise to copy others' data and arguments, but not their attitudes.

One of the things that most stuck with me in 2020 was a quote from Stefan Schubert's twitter (2020-03-16):

"The coronavirus crisis displays the limits of "biasology" - arguments for over- or under-reactions by reference to biases.
It's too easy to make up a just-so-story we're biased in this or that direction. You should primarily look at the object-level facts about the virus."[1]

References:

[1]: https://twitter.com/stefanfschubert/status/1239518892621471744?lang=en