The Math Behind AI

The first article of this series (Mind the Machines) explained why law firms and law departments should familiarize themselves with machine learning algorithms. While discussing data sets collectible by firms or departments and software resources for the computations on those sets, that article left for later the topic of how machine learning software actually "learns." Magic may be what many people think is the legerdemain of machine learning, but underneath the hood is not magic—it is math.

"Wait," you might be protesting, "I'm a lawyer, and math is a foreign language! Real lawyers manage concepts, clients, legal problems and other lawyers, and let techie geeks crunch the numbers." Those who stop reading here say, "Let's preserve my comfort zone of the supremacy of text and intuition over numbers and probabilities."

Understanding enough about the burgeoning world of artificial intelligence and what computer programs can deliver, that is to say how the software tools actually function, is not too detailed, too geeky, or too incomprehensible a task. Lawyers will think more strategically and manage others more effectively when they have a grounding in today's technology. Innumeracy and tech naiveté are unworthy of management lawyers who aspire to lead; the march of machine learning is afoot.

In the innards of their computer code, algorithms—a fancy term for the steps the computer has been told to take when given data—burrow through and emerge with clusters or classifications or estimated predictions. Sophisticated mathematical calculations take place, out of sight of the user for the most part, but deserve to be understood at a basic level. The prospects for how artificial intelligence will affect the practice of law, not to mention the alarming predictions—"50,000 lawyers to lose jobs to Watson by 2020!!"—can be evaluated better by those who have a sense of what the software can and can't do.

Machine learning algorithms deliver the most when the data they parse is numeric. As we mentioned in the previous article, if we want to predict the billable hours that individual associates are likely to record next year (or whether they will decamp for another job), we would assemble HR and financial data on their years out of law school, their years with the law firm, their billable hours in each of the previous two or three years, the number of lawyers in their primary practice group, and other numeric facts about them.

Enter the math embedded in machine learning software! Let's touch on the range of mathematical tools on which algorithms such as naïve Bayes, neural nets, multiple regression and support vector machines rely.


Statistics

Statisticians have long known how to take a data set and create the "best fit" line for it. When legal managers choose multiple regression, they draw upon a powerful suite of tools that can estimate an output from new information. With the best fit line or related output calculated from a training set of our associate data, we gain a formula. Into that formula we can plug information about an associate who was not in the training set and obtain an estimate of that associate's likely billable hours (or probability of staying with the firm). The range of regression applications in law firms and law departments is enormous.
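For readers who want to see a best fit line computed, here is a minimal sketch in Python. It is a one-variable simplification of multiple regression, and the training numbers (years with the firm paired with billable hours) are invented purely for illustration, as are the function names.

```python
# Hypothetical training set, invented for illustration:
# (years with the firm, billable hours last year).
training = [(1, 1650), (2, 1820), (3, 1880), (4, 2010), (5, 2090)]

def fit_line(points):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Slope = how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x):
    """Plug a new value into the fitted formula."""
    return intercept + slope * x

intercept, slope = fit_line(training)
# Estimate for an associate with six years at the firm (not in the training set).
estimate = predict(intercept, slope, 6)
print(f"hours = {intercept:.0f} + {slope:.0f} * years; estimate for year 6: {estimate:.0f}")
```

Multiple regression follows the same idea with several predictors at once, which is where statistical software earns its keep.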

Matrix Algebra

Matrix algebra carries out operations on numeric matrices, which are rectangular sets of numbers such as those in a spreadsheet. More significantly for machine learning algorithms, this field of mathematics can convert complex, hyper-dimensional sets of data (each dimension is a different bit of information about the client or matter or associate or practice group) into simpler sets that are more tractable for software. This is at the root of what is known as "principal components analysis" and other tools in the machine learning workshop. Decomposing large data sets into key variables makes sense, and only computers can whisk through the complicated mathematical transformations of large data sets.
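To make the idea of reducing dimensions concrete, the sketch below runs a stripped-down principal components analysis on two columns of invented associate data and collapses them into one. It assumes only two variables so the matrix arithmetic can be written out by hand; real software handles many columns and usually standardizes them first. All names and numbers are hypothetical.

```python
import math

# Hypothetical data, invented for illustration: one row per associate,
# (years with the firm, billable hours in hundreds).
rows = [(1.0, 17.0), (2.0, 18.5), (3.0, 19.0), (4.0, 20.5), (5.0, 21.0)]

def center(data):
    """Subtract each column's mean so the data is centered on zero."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    return [[v - m for v, m in zip(row, means)] for row in data]

def covariance_2d(centered):
    """Entries a, b, c of the sample covariance matrix [[a, b], [b, c]]."""
    n = len(centered)
    a = sum(r[0] * r[0] for r in centered) / (n - 1)
    b = sum(r[0] * r[1] for r in centered) / (n - 1)
    c = sum(r[1] * r[1] for r in centered) / (n - 1)
    return a, b, c

def top_component(a, b, c):
    """Largest eigenvalue and unit eigenvector of symmetric [[a, b], [b, c]]."""
    mid = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    eigenvalue = mid + spread
    if b == 0:
        # Already uncorrelated: the larger-variance axis is the component.
        vec = (1.0, 0.0) if a >= c else (0.0, 1.0)
    else:
        vec = (b, eigenvalue - a)
    length = math.hypot(vec[0], vec[1])
    return eigenvalue, (vec[0] / length, vec[1] / length)

centered = center(rows)
a, b, c = covariance_2d(centered)
eigenvalue, direction = top_component(a, b, c)

# Each associate is now described by one number instead of two.
scores = [r[0] * direction[0] + r[1] * direction[1] for r in centered]
# Share of the total variance that the single component retains.
explained = eigenvalue / (a + c)
print(f"one component keeps {explained:.1%} of the variance")
```

When two columns move together as closely as these do, a single combined variable carries nearly all of the information, which is the sense in which the data set becomes "simpler."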

Similarity Measures

Similarity measures are carried out constantly by machine learning software. If you think about points on a Cartesian plot with an x-axis and a y-axis, it helps you visualize how software can measure the distance between any two of those points. Now, extend your thought experiment to three dimensions, four dimensions and more. Humans fail early, but it is easy for computers to calculate the Euclidean distance or the Manhattan distance or other measures of distance between any of the points. Or they can find clusters of points and calculate the center of the cluster—the centroid—and instantly figure out the distance between centroids.
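The distance measures just described take only a few lines each. The following is a sketch with made-up points; the function names are our own, and the points stand in for whatever numeric facts a firm might collect.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points of any dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Distance travelled along the axes, like walking city blocks."""
    return sum(abs(a - b) for a, b in zip(p, q))

def centroid(points):
    """Center of a cluster: the average of each coordinate."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

# Two hypothetical points in four dimensions.
p = (0, 0, 0, 0)
q = (1, 2, 2, 4)
print(euclidean(p, q), manhattan(p, q))

# Two hypothetical clusters and the distance between their centroids.
cluster_a = [(0, 0), (2, 0), (1, 3)]
cluster_b = [(4, 5), (6, 5), (5, 8)]
gap = euclidean(centroid(cluster_a), centroid(cluster_b))
print(centroid(cluster_a), centroid(cluster_b), gap)
```

The same functions work unchanged in ten or a hundred dimensions, which is exactly where human intuition gives out.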


Probability

Another mathematical tool is probability. Naïve Bayes algorithms, for example, use probability to figure out the most likely outcome given various conditions. More fundamentally, as new information comes in, Bayesian analysis updates the probability. With probability calculations, as with any inference learning, more data is generally better.
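Here is a small sketch of how a Bayesian update works, using Bayes' rule on invented numbers. We assume, purely for illustration, that 20 percent of associates leave in a given year, that 60 percent of those who leave showed a drop in billable hours beforehand, and that 20 percent of those who stay showed the same drop.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule: revised probability of a hypothesis after seeing evidence."""
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1 - prior)
    return numerator / denominator

# Hypothetical figures, invented for illustration.
prior_leaving = 0.20          # chance an associate leaves, before any evidence
p_drop_if_leaving = 0.60      # chance of an hours drop among those who leave
p_drop_if_staying = 0.20      # chance of an hours drop among those who stay

# First piece of evidence: the associate's hours dropped this quarter.
after_one = bayes_update(prior_leaving, p_drop_if_leaving, p_drop_if_staying)

# Second piece of evidence: hours dropped again the next quarter.
# Reusing the same likelihoods treats the two quarters as independent,
# which is the "naive" assumption in naive Bayes.
after_two = bayes_update(after_one, p_drop_if_leaving, p_drop_if_staying)

print(f"{prior_leaving:.2f} -> {after_one:.2f} -> {after_two:.2f}")
```

Each new observation becomes the starting point for the next calculation, which is how the estimate sharpens as data accumulates.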


Calculus

As for calculus, machine learning methods "learn" by measuring how far their estimates are from known answers (the training set) and working out which adjustment will shrink that gap. When the software is training itself on part of the data, it knows the answers and it keeps tweaking the mathematical dials until it figures out how close it can get to them. One common step involves tangents; a tangent is a straight line that touches a curve at a single point. The slope of this tangent line (how much a movement in one direction changes the position in another direction) tells the software which way to turn each dial, and by how much. A key concept here is the forbiddingly named "gradient descent."
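A bare-bones sketch of gradient descent shows the dial-tweaking in action. It assumes a model with a single dial, a weight w in the formula y = w * x, and invented training data in which the true answer is w = 2. The learning rate and number of steps are arbitrary choices for the illustration.

```python
# Hypothetical training data, invented for illustration: y is exactly 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

def mean_squared_error(w):
    """How far the predictions w * x are from the known answers, on average."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def slope_of_error(w):
    """Slope of the tangent to the error curve at w (the gradient)."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.0               # start with the dial at an arbitrary setting
learning_rate = 0.01  # how far to turn the dial on each step
for step in range(200):
    # Move downhill: against the slope, in proportion to its steepness.
    w -= learning_rate * slope_of_error(w)

print(f"learned w = {w:.4f}, remaining error = {mean_squared_error(w):.6f}")
```

Real models have thousands or millions of dials rather than one, but each is adjusted by the same rule: find the slope, then step downhill.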

When you have a sense of the calculations that underlie machine learning algorithms, you also have a better grasp of why it is important to scrub the data you feed in. For example, missing data can confound a machine learning algorithm, although there are many ways to identify holes in the data and even to impute values into them or otherwise handle them. And, we should emphasize, the data does not have to be numeric. Machine learning algorithms can handle what are called factors, such as practice group names, because essentially the algorithms turn those factors into numbers.
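Two of the scrubbing steps mentioned above can be sketched briefly: filling a hole with the average of the known values (one of several imputation methods) and turning a factor into numbers with what is commonly called one-hot encoding. The records and field names below are invented for illustration.

```python
# Hypothetical records, invented for illustration; None marks a missing value.
records = [
    {"practice_group": "Litigation", "hours": 1900},
    {"practice_group": "Tax", "hours": None},
    {"practice_group": "Corporate", "hours": 2100},
    {"practice_group": "Litigation", "hours": 2000},
]

def impute_mean(rows, field):
    """Fill missing values of a numeric field with the mean of the known ones."""
    known = [r[field] for r in rows if r[field] is not None]
    mean = sum(known) / len(known)
    return [dict(r, **{field: mean if r[field] is None else r[field]}) for r in rows]

def one_hot(rows, field):
    """Replace a factor with one 0/1 column per distinct level."""
    levels = sorted({r[field] for r in rows})
    encoded = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != field}
        for level in levels:
            new[f"{field}={level}"] = 1 if r[field] == level else 0
        encoded.append(new)
    return encoded

cleaned = impute_mean(records, "hours")
encoded = one_hot(cleaned, "practice_group")
for row in encoded:
    print(row)
```

After these two steps every value in the table is a number, which is the form the algorithms described earlier require.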

Machine learning algorithms do not develop concepts and abstract ideas or create anything new, but they can discern patterns in data at a speed that humans cannot match. Text mining, by the way, manipulates words as numbers and relies on statistics, probabilities, and similarity measures to deliver insights into text documents.
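As a small illustration of words becoming numbers, the sketch below counts the words in three invented snippets and compares the counts with cosine similarity, one common similarity measure for text. The snippets and function names are hypothetical, and real text mining tools do considerably more cleanup than splitting on spaces.

```python
import math
from collections import Counter

def word_counts(text):
    """Turn a document into numbers: how often each word appears."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """1.0 means identical word proportions; 0.0 means no words in common."""
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

# Hypothetical snippets, invented for illustration.
doc1 = word_counts("the lease requires notice of termination")
doc2 = word_counts("notice of termination under the lease")
doc3 = word_counts("the merger agreement closing conditions")

lease_pair = cosine_similarity(doc1, doc2)
mixed_pair = cosine_similarity(doc1, doc3)
print(f"lease vs lease: {lease_pair:.2f}; lease vs merger: {mixed_pair:.2f}")
```

The two lease snippets score far closer to each other than either does to the merger snippet, without the software understanding a word of them.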

Rather than passing on the seeming complexity of mathematics that may be unfamiliar to you, think of it as an entrée into an exciting field of leading-edge technology, a different way of thinking, and a set of powerful software tools. The old canard about people going to law school because they were not good at mathematics should not deter open-minded and strategically thinking lawyers from recognizing that what computers do is in essence manipulate numbers, and that the future of machine learning for the legal industry depends on clever applications of mathematics.

Rees Morrison is a principal with Altman Weil.  One of his specialties is data analytics for law firms and corporate law departments.  Contact him at

This article originally appeared in Law Technology News, December 2016.   Copyright 2016. ALM Media Properties, LLC. All rights reserved.
