by Brian Tomasik
First published: 8 Jul. 2017; last update: 18 Jul. 2017

Summary

This piece enumerates some means by which writing (and other media) can make a positive impact on the world, such as expanding humanity's knowledge and boosting more important ideas into broader awareness. Thinking about the meta-level question of how writing makes an impact might help to clarify in what circumstances object-level writing is valuable. I discuss the question of whether preserving and making accessible existing content is a more leveraged way to have an impact than generating new content.

Introduction

Many people who want to improve the world do so through writing of various sorts, including academic research, popularizations of technical material, blog posts, Facebook comments, and so on. Motivations for writing may include activism, expanding knowledge, getting paid, getting tenure, signaling status, curiosity, and artistic fulfillment. There are different ways in which writing can be useful to the world, discussed below.

Of course, writing can also be harmful to the world. Obvious examples include hate speech and spreading false rumors. Speeding up certain kinds of technological progress might also sometimes be net harmful. And promoting one set of values might be harmful to an opposed set of values. In this piece, I ignore these ways in which writing might be disvaluable. The items I discuss should mostly apply, in reverse, to the question of how writing can be harmful.

Ways writing has impact

1. Replacing outdated writing

A relatively clean case of writing's positive impact is when a new piece of writing replaces an old one by updating it. This might include correcting errors or updating software documentation to reflect a newer version of the software. The impact of these kinds of changes is fairly straightforward because the new written document is generally a strict improvement on the old one, and since no one needs to read the old document anymore (except for historians or people doing debugging), there's not a significant increase in the total amount of material that people have to read.

2. Making objective progress on a question

On some scientific questions, it's relatively clear when progress has been made. For example, if someone proves an important conjecture, this new proof makes forward progress in mathematics. Similarly, when relativity displaced Newtonian mechanics, forward progress was made in physics. These new discoveries require new papers to be written and don't necessarily replace old papers, but there's still a fairly clear sense in which "progress" is being made.

3. Documenting more details of phenomena

Ernest Rutherford famously said: "All science is either physics or stamp collecting." The distinction between physics and stamp collecting is fuzzy. For example, physicists "collect" new physical constants, new proofs, new equations, etc. And non-physicists formulate general theories that apply at higher levels of organization. But there's something to the physics-vs.-stamp collecting continuum. For example, some biologists spend a lot of time documenting the enormous variety of life forms that Earth contains. Sometimes these discoveries force updates to broad principles. But often, discovering new species, morphological traits, and behaviors is "merely interesting" but not transformative from the standpoint of general biological theory.

Many forms of writing can roughly be seen as stamp collecting. For example, if a person writes her memoirs, these experiences are likely to be similar in kind to the experiences of many past people who have also written memoirs. There's an increase in the total amount of written data in the world, but there are less likely to be updates to some fundamental principles that everyone in a field knows about, in contrast to relativity replacing Newtonian physics.

4. Speculations and half-baked ideas

Speculations about physics, musings on philosophy, and so on may be written about in blog posts and email discussions, even though these ideas haven't (yet) been widely accepted by their respective communities. Such writing can form the basis for further thinking, may elicit peer commentary, and so on.

5. Clarifying one's own thoughts

Sometimes writing is useful as a way to think in a more organized manner about a topic. (In fact, that's a main reason I wrote the current piece, and it's a big motivation for many of the other pieces on my site.) In this case, writing is kind of an extreme version of "active reading" or what we might analogously call "active thinking". Writing can expand one's effective memory capacity and organize thoughts into a structure that makes them easier to analyze. The written document also provides a record of one's thoughts for later reference by oneself, even if no one else ever finds it useful.

6. Summarizing and enhancing accessibility

Given that there's too much information in the world (or even in a single topic area) for a single person to read, value can be created merely by summarizing existing content rather than making new discoveries. This is why academic review articles exist. Wikipedia, textbooks, popularizations, some news articles, and other such writings perform a similar function.

Technologies like Google, Ctrl+f text searches, natural language processing, etc. also make a huge difference in this regard. (In college, one reason I was interested in text-processing artificial intelligence was that I regretted not being able to read everything ever written and wondered if computers could help solve that problem.)

7. Signal boosting

Everyone thinks they’re right. We do, too. So we have some temptation to take our own favorite current models of AI Safety strategy and to try to get everyone else to shut up about their models and believe ours instead.

This understandably popular activity is often called “signal boosting”, “raising awareness”, or doing “outreach”.

Many forms of writing are signal boosting, such as op-eds for news publications that don't contain any novel insights, Facebook posts that share a link, online petitions, etc. If your message really is better (relative to your values) than other messages, then signal boosting can be fairly important (relative to your values).

8. Socializing

Some of the value of Facebook discussions, online forums, etc. is for participants to get to know one another. These social connections may bear later fruits if the connected people collaborate in the future, feel more welcome within a community, and so on.

9. Throw-away messages

An email discussion about where to meet for lunch is an example of writing that generally only has immediate value for getting a task done. Occasionally such writing may be useful later if you want to remember what you did in the past, or perhaps if a biography will be written about you.

10. Other instrumental benefits

Writing might be useful to signal your skills, to generate ad revenue for your website, as a form of recreation or personal fulfillment, and so on.

Expanding humanity's knowledge vs. limited reading bandwidth

Matt Might's "The illustrated guide to a Ph.D." depicts a common way to think about the value of new research: it (ever so slightly) expands the total size of humanity's collective knowledge.

One problem, however, is that people have limited time in which to read published research. This is especially easy to see if we imagine that, say, humanity's population remained constant while humans kept producing academic research for 1 billion years. Eventually there would be so many papers written that even if every human read different papers than every other human, most past research would remain forever unread by any living person. A less extreme version of the same point already applies in the present. This is especially so for the reading diets of non-specialists. For example, if a new biography of Benjamin Franklin comes out, people may read that and will not have time for the old ones. Or if a biography of Steve Jobs comes out, people may read that and won't have time to read a biography of Thomas Edison.

In some cases, it may be sufficient that only a few people read something, and those few people can alert others if what's written contains a huge new insight. For example, if someone publishes One Weird Trick to cure obesity in an obscure medical journal, and if it actually works, the rest of the world will probably soon hear about it (probably through online ads). This is similar to the way in which most of the inputs received by our bodily senses don't rise to the level of conscious awareness, but the important signals do. (There are some exceptions to this generalization both for vision processing and for academic papers, such as when aggregate summary statistics are created based on the full set of inputs/publications (Cohen et al. 2016). For example, citation counts aggregate information from roughly all published papers.)

In other cases, "collecting stamps" of knowledge may add value if the "stamps" can be organized cleanly into a database that can be easily accessed when needed. Having more items in the database adds to the value of the database without necessarily overloading users.

To some degree, search engines are like "databases" for messy text content, and if there are more total articles in the world on a given topic, you're in general more likely to get a relevant result at the top of Google's "ten blue links". That said, Google rankings aren't perfect. Thus, one could argue that writing a new, poor-quality article on a given topic is actually net bad, because it increases the probability that Google will return your suboptimal article, and people will waste time reading it when they could have been reading other things. (I hope that isn't true for the article you're reading now....)

Especially in fields where the value of new insights isn't objective and where new insights are unlikely to spread rapidly on their own, signal boosting may actually have significant value as a way to combat limited reading bandwidth. Boosting the good ideas over the bad ones helps address the problem that people can't read everything for themselves to come to the right conclusions on their own.

Old content vs. new content

In a comment that helped to inspire the current piece, Pablo Stafforini wrote:

It astonishes me how little effort is put in preserving existing content relative to creating new content. This may also have implications for [effective altruists]. Years ago, James and I spent a few hours restoring the old Felicifia forum, which had gone offline. Thanks to this effort, the product of hundreds or thousands of hours of work was recovered.

"A penny saved is a penny earned", and to some degree, "a paper preserved is a paper written". You could probably preserve orders of magnitude more papers, blog posts, forum discussions, etc. in your life than you could write yourself.

There are some ways in which a paper preserved is not a paper written, as we can see from my list of ways writing has value. For example, some of the purpose of Internet discussion forums is socializing and networking rather than just generating written content. We might picture the forum's content as an exoskeleton to be cast off once it's no longer needed, with the text left to be devoured by scavenging historians. But insofar as forum participants believe they're actually "making progress" in generating ideas, and insofar as these ideas aren't saved elsewhere, then it is a big loss for forum discussions to go offline. And the same applies to most other web content.

One example of a tragic loss of content was the website of Paul Almond, which contained dozens of insightful articles. I've tried to make the content available by linking to Internet Archive versions of the essays, but this isn't much of a solution because few people know to look for the articles on my website. Many potential readers would probably have discovered Almond's articles through Google, and when the content isn't available on a non-Internet Archive website anymore, those potential readers are lost. This is because, as far as I can tell, Internet Archive pages aren't indexed by Google. I've verified that most of Almond's articles can't be found through Googling quotes from them. (A few of the articles have extant copies on atheist forums.)

The Paul Almond case is hardly exceptional. Link rot bedevils a significant fraction of web hyperlinks. Kille (2015):

A 2013 study in BMC Bioinformatics looked at the lifespan of links in the scientific literature — a place where link persistence is crucial to public knowledge. The scholars, Jason Hennessey and Steven Xijin Ge of South Dakota State University, analyzed nearly 15,000 links in abstracts from Thomson Reuters’ Web of Science citation index. They found that the median lifespan of Web pages was 9.3 years, and just 62% were archived. Even the websites of major corporations that should know better — including Adobe, IBM, and Intel — can be littered with broken links.

A 2014 Harvard Law School study looks at the legal implications of Internet link decay, and finds reasons for alarm. The authors, Jonathan Zittrain, Kendra Albert and Lawrence Lessig, determined that approximately 50% of the URLs in U.S. Supreme Court opinions no longer link to the original information. They also found that in a selection of legal journals published between 1999 and 2011, more than 70% of the links no longer functioned as intended.

Link rot is annoying but tolerable when the web page merely moves to a new address and is still available on Google. A more serious problem is when the site disappears completely so that Google no longer returns its pages. In that case, most potential readers will never discover the content, even if it's still available on Internet Archive. (And some sites don't allow Internet Archive backups.)

As someone who generates a lot of new content myself, I've been pondering what to make of the argument that one can preserve far more than one can write. I find the point fairly compelling and intend to think more about it.

I think many of my writings can be justified under the "Clarifying one's own thoughts" point above. My website is often more like a public notebook of thoughts and links than a collection of polished, publication-worthy articles. That said, one could argue that I would achieve more by preserving or otherwise making accessible others' content than by reading and thinking about object-level issues.

Another argument against spending my life as an archivist is that I have a particular set of values, and I want to promote those, rather than promoting the collection of all things that have been written. But even if this is true, perhaps I should selectively preserve writings that I agree with, of which there are still vast quantities.

If one were to preserve existing content rather than creating it anew, how would one best do this? Internet Archive performs a valuable portion of this work, and one could donate to them. However, as far as I can tell, Internet Archive backups of other web sites don't show up in Google search results, which means the 404'ed webpage content is still effectively lost unless you already have a link to the page. I wonder if this is because, when a website is still online, Google shouldn't return the Internet Archive version over the live version of the page. But maybe Google could allow Internet Archive links for pages that no longer exist on the web? Apart from hoping for changes from Google, one can address the problem on a per-website basis by putting the content back online on its own site. This requires manual effort and paying for a domain and hosting. Also, it seems hard to scale because you presumably need permission to put someone else's content back online on a site that you own.

Archive Team is a collection of people who are frustrated by the loss of content from the web. Their site offers a number of archiving tips and describes their ongoing projects.

In addition to preserving existing content, one could make a case for work that helps people better find relevant content, such as through improving search engines, Reddit voting functionality, and so on.

It's not surprising that most people focus on creating new content, because most selfish motives require writing new things: getting tenure, adding to your CV, becoming a public intellectual, selling books, earning ad revenue on your website, and gaining status with one's peers. Learning and writing about interesting topics is also often more fun than the "boring" and unglamorous work of preserving others' content.

Wanted: A general overview of impact

One reply to the argument that preserving existing writings is easier than creating new writings could be that humanity already has more than enough writings, and what we need now are more people to digest and then act upon what those writings say. This is probably sometimes true, especially for topics that have been thoroughly studied. But in this case, the implication isn't that new writings on the topic are important; rather, movement building around the issue is important.

In general, it would help to have a more comprehensive framework for what kinds of actions have the most impact depending on how neglected various activities are in a particular domain. Under what conditions is it optimal to do each of the following?

• Do original research
• Summarize existing research
• Preserve content on the web against being lost
• Increase accessibility and searchability of existing content
• Identify policy conclusions relative to your values from existing research
• Build a movement of people who can take action on the topic
• Do various other activities.

The option of preserving existing content provides a baseline for how important original writing has to be before it can be claimed to be an optimal use of resources. (Note: I write many clearly suboptimal articles, including memoirs on my personal website, but I recognize that doing so is a form of selfishness.)

Another possible reply is that most existing web content is so worthless that preserving it isn't very useful. But even if 9,999 out of 10,000 web pages are useless (which I doubt), it may be easier to back up 10,000 web pages than to write 1 new article.[1] Of course, maybe putting those 10,000 pages back online is harder than writing 1 new article, especially if you don't have permission to put the content back online.

Archive Team's "Deathwatch" page gives a sample of what kinds of content are being lost from the web on a large scale. There are of course millions of smaller sites also being lost, including several that I used to visit in the past before they died.

Footnotes

1. By "web page" I mean an individual HTML file, not an entire website.

A typical web page might be ~100 KB uncompressed. So 10,000 web pages would only require ~1 GB of storage uncompressed. Dropbox Plus gives you 1 TB of storage for ~$100 per year. So you can store 10,000 web pages for ~$0.10 per year.
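The arithmetic in this footnote can be reproduced with a few lines of Python. This is only a rough sketch: the page size, plan size, and price are the approximate figures given above, decimal units (1 GB = 1,000,000 KB) are assumed, and the function and constant names are made up for illustration.

```python
# Rough figures from the footnote; all are approximations, not exact prices.
PAGE_SIZE_KB = 100              # assumed size of one uncompressed web page
PLAN_STORAGE_TB = 1             # storage included in the plan
PLAN_PRICE_USD_PER_YEAR = 100   # approximate yearly price of the plan

# Decimal unit conversions (an assumption; binary units would differ slightly).
KB_PER_GB = 1_000_000
GB_PER_TB = 1_000


def storage_gb(num_pages, page_size_kb=PAGE_SIZE_KB):
    """Storage needed for num_pages pages, in gigabytes."""
    return num_pages * page_size_kb / KB_PER_GB


def yearly_cost_usd(num_pages):
    """Pro-rated yearly cost of storing num_pages pages on the plan."""
    price_per_gb = PLAN_PRICE_USD_PER_YEAR / (PLAN_STORAGE_TB * GB_PER_TB)
    return storage_gb(num_pages) * price_per_gb


pages = 10_000
print(f"{storage_gb(pages):.1f} GB, ${yearly_cost_usd(pages):.2f} per year")
```

Changing `PAGE_SIZE_KB` or the plan constants shows how sensitive the ~$0.10 estimate is to those assumptions. The estimate covers storage only, not the effort of crawling the pages or putting them back online.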