identities and remove others’ posts. This meant that, in effect, a bot could be used to censor other users by deleting their content from the web. Once the secret was out, users and organizations began cancelling other users’ posts. For example, a bot called CancelBunny began deleting mentions of the Church of Scientology on Usenet, claiming they violated copyright. A representative from the Church itself said that it had contacted technologists to “remove the infringing materials from the Net,” and a team of digital investigators traced CancelBunny back to a Scientologist’s Usenet account (Grossman, 1995). The incident drew ire from Usenet enthusiasts and inspired hacktivists like the Cult of the Dead Cow (cDc) to declare an online “war” on the Church, feeling that the attempt at automated censorship violated the free speech ethos of Usenet (Swamp Ratte, 1995). Another malicious cancelbot “attack,” launched by a user in Oklahoma, deleted 25,536 messages on Usenet (Woodford, 2005, p. 135). Some modern governments use automation in similar ways, and for similar purposes, as these cancelbots and annoybots: affecting the visibility of certain messages and indirectly censoring speech online (M. Roberts, 2020; Stukal et al., 2020).
Another prolific account on Usenet, Serdar Argic, posted political screeds on dozens of different newsgroups with astonishing frequency and volume. These posts cast doubt on Turkey’s role in the Armenian Genocide in the early twentieth century and criticized Armenian users. Usenet enthusiasts still debate today whether Argic’s posts were actually automated, but their high volume and apparently canned responses to keywords such as “Turkey” in any context (even in posts referring to the food) seem to point toward automation.
Over time, more advanced social Usenet bots began to emerge. One of these was Mark V. Shaney, a bot designed by two Bell Laboratories researchers that made its own posts and conversed with human users. Shaney used Markov chains, a probabilistic language-generation technique that strings sentences together by choosing each word based on which words most often follow the one before it. The name Mark V. Shaney was actually a pun on the term Markov chain (Leonard, 1997, p. 49). The Markov chain technique is still widely used today in modern natural language processing (NLP) applications (Jurafsky & Martin, 2018, pp. 157–160; Markov, 1913).
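The technique is simple enough to sketch in a few lines. Below is a minimal, first-order, word-level Markov chain generator in Python; the toy corpus and function names are illustrative assumptions, not Mark V. Shaney’s actual code, which was trained on real Usenet posts.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words observed to follow it in the corpus."""
    words = text.split()
    chain = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return chain

def generate(chain, start, length=20):
    """Walk the chain, picking each next word in proportion to how often it
    followed the previous word in the training text."""
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break
        word = random.choice(followers)
        output.append(word)
    return " ".join(output)

# Toy corpus for illustration only; Mark V. Shaney learned from real Usenet posts.
corpus = "the bot reads the posts and the bot writes new posts in the same breezy style"
print(generate(build_chain(corpus), start="the"))
```

Even this toy version hints at why Shaney’s posts read as plausible but slightly off: each word is chosen only from words seen to follow the previous one, with no memory of the sentence as a whole.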
Bots proliferate on Internet Relay Chat
Like Usenet, Internet Relay Chat (IRC) was one of the most important early environments for bot development. IRC was a proto-chatroom – a place where users could interact, chat, and share files online. IRC emerged in 1988, nine years after Usenet first appeared, coded by Finnish computer researcher Jarkko Oikarinen. Oikarinen made the code open-source, enabling anyone with the technical know-how and desire to host an IRC server. Along with the code, Oikarinen also included guidelines for building an “automaton,” or an autonomous agent that could help provide services in IRC channels (Leonard, 1997, pp. 62–63).
The arc of bot usage and evolution in IRC is similar to that of Usenet. At first, bots played an infrastructural role; then, tech-savvy users began to entertain themselves by building their own bots for fun, and nefarious users began using bots as a disruptive tool; in response, annoyed server operators and white-hat bot-builders in the community built new bots to solve the problems the troublesome bots created (Leonard, 1997; Ohno, 2018).
Just as with Usenet, early bots in IRC channels played an infrastructural role, helping with basic routine maintenance tasks. For instance, in IRC’s initial design a channel needed at least one user present to stay open: if every user logged out of a channel, it would close and cease to exist. Eventually, “Eggdrop” bots were created to solve this problem. Users deployed these bots to stay logged into IRC channels at all times, keeping them open even when all human users were logged out (such as at night, when they were sleeping). Bots were easy to build in the IRC framework, and users quickly began designing new bots for other purposes: bots that would greet newcomers in the chat, spellcheck typing, or provide an interface for playing games like Jeopardy! or Hunt the Wumpus in IRC.
Given the ease of developing bots in IRC and the technical skill of many early users, this environment was the perfect incubator for bot evolution. Good and bad IRC bots proliferated in the years to come. For example, Eggdrop bots became more useful, not only keeping IRC channels open when no human users were logged in but also managing permissions on IRC channels. On the malicious side, hackers and troublemakers, often working in groups, would use collidebots and clonebots to hijack IRC channels by knocking human users off them, and annoybots began flooding channels with text, making normal conversation impossible (Abu Rajab et al., 2006; Leonard, 1997). In response, other users designed channel-protection bots to defend IRC channels from annoybots. In IRC, bots were both heroic helpers and hacker villains – digital Lokis that played both roles. This dual nature of bots persists to this day on the modern internet, on platforms like Reddit, where bots play both helpful and contested roles (Massanari, 2016).
Bots and online gaming on MUD environments
In addition to Usenet and IRC, computer games were also a hotbed of early bot development. From 1979 on, chatbots were relatively popular in online gaming environments known as MUDs (“multi-user domains” or “multi-user dungeons”). MUDs gained their name from the fact that multiple users could log into the same game environment at the same time and play together. Unlike console games, MUDs were text-based and entirely without graphics,5 due to early computers’ limited memory and processing power, making them an ideal environment for typed bot interaction. These games often had automated non-player characters (NPCs) that helped move gameplay along, providing players with necessary information and services. MUDs remained popular into the 1990s, and users increasingly programmed and forked their own bots as the genre matured (Abokhodair et al., 2015; Leonard, 1997).
ELIZA, the original chatbot from the 1960s, served as a prototype and inspiration for most MUD chatbots. One of the big 1990s breakthroughs for MUD bots was a chatbot named Julia. Julia was part of an entire family of bots, the Maas-Neotek family, written by Carnegie Mellon University graduate student Michael “Fuzzy” Mauldin for TinyMUD environments. Julia, a chatbot based on ELIZA’s code, inspired MUD enthusiasts to build on the publicly available Maas-Neotek code and hack together their own bot variants (Foner, 1993; Julia’s Home Page, 1994; Leonard, 1997, pp. 40–42). Bots became legion in TinyMUDs – at one point, a popular TinyMUD that simulated a virtual city, PointMOOt, had a population that was over 50 percent bots (Leonard, 1996) – and they were an essential part of the appeal for both players and developers.
Bots and the World Wide Web
As we have seen, early internet environments such as Usenet, IRC, and MUDs were the first wave of bot development, driving bot evolution from the 1970s through the 1990s. The next stage of bot advancement came with the advent of the World Wide Web in 1991.
Crawlers, web-indexing bots
The World Wide Web became widely available in the early 1990s, growing exponentially more complex and difficult to navigate as it gained more and more users. Gradually, people began to realize that there was simply too much information on the web for humans to navigate easily. It was clear to companies and researchers at the forefront of computing that they needed to develop a tool to help humans make sense of the vast web. Bots came to fill this void, playing a new infrastructural role as an intermediary between humans and the internet itself. Computer programs were developed to move from webpage to webpage and analyze and organize the content (“indexing”) so that it was easily searchable. These bots were often called “crawlers” or “spiders,”6 since they “crawled” across the web to gather information. Without bots visiting sites on the internet and taking notes on their content, humans simply couldn’t know what websites were online. This fact is as true today as it was back then.
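To make the crawl-and-index loop concrete, here is a minimal sketch in Python using only the standard library. It is not any particular search engine’s crawler: the seed URL is a placeholder, the “index” is just a dictionary of raw HTML, and a real crawler would also respect robots.txt, rate limits, and deduplication rules.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collect the href targets of <a> tags from one HTML page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_url, max_pages=10):
    """Breadth-first crawl: fetch a page, record it, queue the links it contains."""
    index = {}                      # url -> raw HTML, standing in for a real search index
    queue = deque([seed_url])
    seen = {seed_url}
    while queue and len(index) < max_pages:
        url = queue.popleft()
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", errors="replace")
        except Exception:
            continue                # skip pages that fail to load
        index[url] = html
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return index

# "https://example.com" is a placeholder seed; a production crawler would be far
# more careful about politeness and scale.
pages = crawl("https://example.com", max_pages=5)
print(list(pages))
```

Modern search-engine crawlers elaborate on this same loop with ranking, scheduling, and politeness policies, but the fetch-parse-enqueue cycle is the core of what “crawling” means.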
The basic logic that drives crawlers is very simple. At their base, websites are text files. These text files are written using hypertext markup language (HTML), a standardized format