Outsmarting AI. Brennan Pursell
all industries transform themselves with AI.”[11]
Unstructured data gets more of the media attention, because people are more impressed when a machine can identify objects in pictures and respond to written text and spoken language in a human-like way. Unstructured data includes digitized photographs and video, audio recordings, and many kinds of documents that people can readily understand much faster and more thoroughly than a computer. It took years and many millions of images and dollars to get AI algorithms to identify cats in photos with a high degree of reliability. Your average three-year-old child would get it right for the rest of her life after one or two encounters.
Back to data in general. It is the source of your organization’s knowledge, not the AI. It is a key part of your institution’s historical record, actually. Your data is a cross between a gold mine and a swamp. Working through it can really pay off, but you have to be very, very careful. Always maintain a healthy, skeptical attitude toward your data. How accurate is it, really? How was it compiled? Are there obvious or hidden biases that might skew your analysis and its conclusions?
Historical data presents a big problem for AI applications in the US criminal justice system. More often than not, inmates with lighter skin in the past have received parole at a much higher rate than those whose skin is darker, and it is no surprise that AI predictive models recently put these results into use in similar decisions.[12] To what extent does data about past court decisions reflect poor legal representation in court or base prejudice instead of actual guilt or innocence? There is no easy answer to this loaded question. Drug enforcement efforts quite frequently concentrate on neighborhoods with darker-skinned, poorer inhabitants, although drug trade and use are relatively color-blind and prevalent among all socioeconomic classes. When data collected from these efforts is used to predict the best time and place for the next drug bust, it is virtually guaranteed to continue the trend. The computer certainly hasn’t the foggiest idea that anything could be wrong in the data’s origin or derivation.
The problem is nearly universal. In the United Kingdom, a model was used in Durham, in northern England, to predict whether a person released from prison would commit another crime—until people noticed that to a high degree, it correlated repeat offenders with those who live in poorer neighborhoods. Authorities then removed residential address data from the system, and the resulting predictions became more accurate.[13]
People are just people. We are sometimes rational and sometimes not, depending on the circumstances. If only our legal systems were as rational and reliable as mathematical analysis. The two meet in data and the analysis of it. Our great, shared challenge in this age of AI is to ensure that the machines serve the people, and not the reverse.
Let’s return to our key term definitions for AI software.
Rules in software say what your organization will do with the outputs. You set the operational rules. You can have an AI system perform numerous tasks, but you and your organization, not the machine, are responsible for the results. Your rules must keep your organization’s actions compliant with the law.
Will your chatbot or texting app use or suggest words associated with hate speech, however popular in use? Will you grant or deny credit to an individual? Will you interview this or that person for a possible job? Will you grant an employee a promotion or not? Will you place this order or not? Will you call the customer about a suspected case of credit card fraud? Will you dispatch a squad car at that time and place? Your organization is completely responsible for the data source—did you obtain it legally?—for the data classification, the procedures, and the rules. The law bans discrimination against people based on race, religion, gender, sexual orientation, etc., even if you made no such data entries. Other laws protect personal privacy about certain topics. (Joshua will explore this topic further in chapter 6.)
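To make the idea of an operational rule concrete, here is a minimal sketch in Python. It assumes a hypothetical fraud model that returns a score between 0 and 1. The threshold, the function name, and the action labels are illustrative assumptions, not taken from any real system.

```python
# Hypothetical operational rule: people, not the model, decide what
# happens with a score. The threshold value below is an assumption.
FRAUD_CALL_THRESHOLD = 0.8

def fraud_action(score: float, threshold: float = FRAUD_CALL_THRESHOLD) -> str:
    """Map a model's fraud score (0.0 to 1.0) to an action chosen by people."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= threshold:
        return "call_customer"
    return "no_action"

print(fraud_action(0.93))
print(fraud_action(0.12))
```

The model only supplies the number. The threshold and the action attached to it are choices your organization makes and answers for.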
Scorecards set up the different factors that contribute to a complex prediction, such as the likelihood that someone will develop lung cancer. Algorithms working through masses of historical examples assign points to each factor, and the points accumulate into a final score. Age and the incidence of the disease in family history tend to count more than gender, smoking more than income level or education. Because the computer calculates without thinking rationally, it can point out statistical relationships among the factors without any prior expectations. It may detect possible connections that health experts had not thought of. A computer does not bother with the distinction between cause and coincidence, so some of the correlations might prove medically absurd. (Recall the statistical connection between a person’s IQ and their shoe size!)
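A scorecard can be sketched in a few lines of Python. The factor names and point values below are invented for illustration and are not medical data. In a real system an algorithm would derive the points from historical examples rather than have them typed in by hand.

```python
# Hypothetical scorecard: every point value here is a made-up number.
POINTS = {
    "age_over_60": 30,
    "family_history": 25,
    "smoker": 40,
    "male": 5,
    "low_income": 5,
}

def total_score(person: dict) -> int:
    """Add up the points for every factor that applies to this person."""
    return sum(points for factor, points in POINTS.items()
               if person.get(factor, False))

patient = {"age_over_60": True, "smoker": True, "low_income": False}
print(total_score(patient))  # 30 + 40 = 70
```

The final score is only a sum of weighted factors. It says nothing about which factors cause the disease and which merely coincide with it.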
Decision trees are used to model predictions based on answers to a series of simple classification questions. Using a breast cancer example, routine mammogram results divide patients into two groups. Those with no abnormalities are classified as negative; the others go on to the next question, which is also based on the mammogram. Were the mammogram results suspicious? The “no” answers are set aside, and the “yes” answers move on to the biopsy. The biopsy may show a nonmalignant cyst or, in the case of a malignant result, lead to a recommendation for surgery and chemotherapy. To a certain extent, medical professionals are trained to think in decision trees, and so should AI systems in the same field, no?
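The mammogram example can be written as a small decision tree in Python. This is a sketch of the three questions described above and nothing more. The function name, parameter names, and result labels are assumptions made for illustration.

```python
def classify(abnormal: bool, suspicious: bool = False,
             malignant: bool = False) -> str:
    """Walk the three yes/no questions from the mammogram example."""
    # Question 1: did the routine mammogram show any abnormality?
    if not abnormal:
        return "negative"
    # Question 2: were the mammogram results suspicious?
    if not suspicious:
        return "set aside"
    # Question 3: suspicious cases go to biopsy. Was the result malignant?
    if not malignant:
        return "nonmalignant cyst"
    return "recommend surgery and chemotherapy"

print(classify(abnormal=False))
print(classify(abnormal=True, suspicious=True, malignant=True))
```

Each question splits the cases into two branches, and a case stops at the first branch that gives it a final label.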
In AI, decision trees can manage multiple sources of data and run constantly. They can “decide” whether a vehicle accelerates, cruises, or brakes; turns left, turns right, or heads straight; remains squarely in its lane or heads to the side of the road to take evasive action.
Neural networks are a key component of most AI systems, but the term is fundamentally misleading. Recall the table about the differences between human brains and computers (see introduction). Neural networks are computerized functions performed by software on hardware, nothing else. They take digitized data, make many calculations quick as lightning, and end in a number. A real neuron is a living human cell that receives and sends electrochemical signals in the body. Biologists still do not fully understand how a neuron works, when it comes down to it. To compare a neuron to an electric transistor that is either on or off (reads either 1 or 0) is wildly misleading. But there is no point in trying to change the name now.
The best way to explain what a neural network in AI does is by way of example. Think of everything that can go into determining the actual price of a house at sale.[14] Square footage, lot size, year built, address, zip code, number of bedrooms, number of bathrooms, basement, floors, attached/detached garage, pool, facade type, siding type, window quality, inside flooring, family size, school quality, local tax burden, recent sale price of a house nearby, and so forth. Those are the inputs. For simplicity’s sake, let’s say there are thirty of them in a single column of thirty rows.
The neural network sets up an inner “hidden layer” of calculations (imagine another column of, say, ten rows). In each row, every one of the original inputs is “weighted,” or multiplied by a parameter (a number you can change), and the weighted inputs are combined into a single, new output number. Think of the hidden layer as a stack of ten functions that push the house price up or down. One could be for, say, “curb appeal,” another for “in-demand trend,” another for “family fit,” another for “convenience,” etc. All thirty input data points are included, each weighted differently by a set parameter, in each of the ten processed inputs in this first hidden layer.
The next layer does the same as the first, adjusting the numbers from the first hidden layer further, bringing them closer to a final recommended price. The final output is the price number. The input data, the inner “hidden” layers, and the final output make up the neural network.
[Figure: Neural Network. Source: Image by Brennan Pursell]
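The house price example can be sketched in plain Python: thirty inputs, two hidden layers of ten rows each, and one output number. All the weights and inputs here are random, made-up values, so the resulting "price" is meaningless. A real network would learn its weights from many recorded sales. The cutoff at zero inside each hidden layer (known as ReLU) is a common addition that the description above does not mention, and it is included here as an assumption.

```python
import random

random.seed(0)  # fixed seed so the made-up numbers are repeatable

def random_weights(n_out, n_in):
    """One row of n_in weights for each of the n_out outputs."""
    return [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]

def dense(inputs, weights, activate=True):
    """Each output is the sum of all inputs, each multiplied by its weight."""
    outputs = []
    for row in weights:
        total = sum(w * x for w, x in zip(row, inputs))
        # Assumption: hidden layers replace negative totals with zero (ReLU).
        outputs.append(max(0.0, total) if activate else total)
    return outputs

inputs = [random.random() for _ in range(30)]  # 30 made-up house features
w1 = random_weights(10, 30)     # first hidden layer: 10 rows of 30 weights
w2 = random_weights(10, 10)     # second hidden layer: 10 rows of 10 weights
w_out = random_weights(1, 10)   # output layer: 1 row of 10 weights

hidden1 = dense(inputs, w1)
hidden2 = dense(hidden1, w2)
price = dense(hidden2, w_out, activate=False)[0]
print(len(hidden1), len(hidden2), price)
```

Every number in the first hidden layer depends on all thirty inputs, which matches the description of each input being weighted differently in each of the ten rows.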
Although the statistical calculations linking the layers can become very complex, to say the least, the computer performs them accurately, except where bugs intervene—and they can be fiendishly difficult to detect. Given enough data entries and enough hidden layers, neural networks can produce some very accurate calculations. Neural networks can have one, few, or many hidden layers. The more there are, the “deeper” the neural network. “Deep” networks usually work better