Products & services
Law Practice / Professional Services
Medium (100-1000 employees)
Williams Mullen, a full-service corporate law firm that specializes in corporate law, litigation, finance, and real estate, realized that having attorneys comb through millions of documents took time away from actually doing work for their clients. This led the company to research a machine learning solution. Once they realized that a lot of their issues were really classification problems, they decided to explore further with ML.NET.
When it comes to tech, attorneys live primarily in two applications: Word and Outlook. This translates to a rather large amount of unstructured data in the form of Word documents, PDFs, and e-mails that get placed into a document management system, which can contain decades' worth of digital information. This becomes a challenge when attorneys want to find specific information in these documents, which they do by searching document metadata where information is often missing, incorrect, or outdated.
Due to this manual process, Williams Mullen found that millions of documents had issues that prevented the documents from being easily searchable, which was wasting attorney time and contributing to lost revenue.
The legal industry is rather Microsoft-centric when it comes to technology choices. Williams Mullen is no different; their developers are big C# users. They started looking into machine learning solutions around the same time that ML.NET was announced, so it was a natural fit to start using ML.NET for their classification scenario.
Through their research, Williams Mullen found that 20% of the documents in their system (that is, millions of documents) had issues that prevented the documents from being easily searchable, which was wasting attorney time and contributing to lost revenue. If not for ML.NET, the company either would not have undertaken the project at all, or they would have had to pull people off actual work to manually fix these issues, which could have led to hundreds of thousands of dollars of cost for the project.
"With ML.NET, we're able to train the model and then immediately test it inside of our code. This makes shipping new changes faster because all the tooling was together in one place."
The architecture is just two .NET Core console applications and a database. One console app pulls down the training data, prepares the data, and trains the model. The other console app pulls down the data to be classified, loads the model, classifies the data, and writes the results back to the database.
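The two-app flow described above can be sketched in a few lines. This is only an illustration in plain Python, not the firm's .NET Core code and not the ML.NET API: the `documents` table, its `content` and `doc_type` columns, the JSON model file, and the word-count "model" are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import json
import sqlite3
from collections import Counter, defaultdict


def train_app(conn, model_path):
    """App 1: pull labeled rows, prepare them, train, and save the model."""
    rows = conn.execute(
        "SELECT content, doc_type FROM documents WHERE doc_type IS NOT NULL"
    ).fetchall()
    # Hypothetical stand-in model: per-label word counts (label -> word -> count).
    word_counts = defaultdict(Counter)
    for content, label in rows:
        word_counts[label].update(content.lower().split())
    with open(model_path, "w", encoding="utf-8") as f:
        json.dump(word_counts, f)


def scoring_app(conn, model_path):
    """App 2: load the model, classify unlabeled rows, write results back."""
    with open(model_path, encoding="utf-8") as f:
        model = json.load(f)
    if not model:
        return  # nothing was learned, so there is nothing to predict
    rows = conn.execute(
        "SELECT id, content FROM documents WHERE doc_type IS NULL"
    ).fetchall()
    for doc_id, content in rows:
        words = content.lower().split()
        # Score each label by how often it saw these words during training.
        scores = {
            label: sum(counts.get(w, 0) for w in words)
            for label, counts in model.items()
        }
        best = max(scores, key=scores.get)
        conn.execute(
            "UPDATE documents SET doc_type = ? WHERE id = ?", (best, doc_id)
        )
    conn.commit()
```

Keeping training and scoring as separate programs that share only the database and a saved model file means the model can be retrained and swapped without touching the code that applies it.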
"The joy of this project was sort of how simple it was to get going so we didn't really need anything more complex. I mean the training app was a total of 200 lines of code, with comments, logging, etc. and the app to run the model was even smaller. The largest part of the whole thing was the transformation code which came in at 13 lines of code."
The training data, which is around 2 million documents, came from the law firm's document management system. The data itself includes content from the document, the title, the author, the recipient (for e-mails), and other bits of metadata depending on which fields are being cleaned up.
Williams Mullen has tried several different data transformations and training algorithms over a couple of different applications. Data transformations include NormalizeText, TokenizeWords, RemoveDefaultStopWords, OneHotHashEncoding, FeaturizeText, ExtractWordEmbeddings, and ProduceNGrams. For training, they primarily use the StochasticDualCoordinateAscent and OneVersusAll classification algorithms.
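To give a feel for what text transformations of this kind do, here is a minimal sketch in plain Python of a normalize, tokenize, stop-word removal, n-gram, and hashing chain. It is an assumption-laden illustration only: the function names merely echo the ML.NET transform names, the stop-word list is a tiny made-up sample, and the exact behavior and defaults of the real ML.NET transforms may differ.

```python
import re
import zlib

# Tiny illustrative stop-word list (hypothetical); real libraries ship larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is"}


def normalize_text(text):
    """Lowercase the text and replace punctuation with spaces."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())


def tokenize_words(text):
    """Split normalized text into word tokens on whitespace."""
    return text.split()


def remove_stop_words(tokens):
    """Drop very common words that carry little signal for classification."""
    return [t for t in tokens if t not in STOP_WORDS]


def produce_ngrams(tokens, n=2):
    """Return every 1-gram up to n-gram, with words joined by '|'."""
    grams = []
    for size in range(1, n + 1):
        for i in range(len(tokens) - size + 1):
            grams.append("|".join(tokens[i:i + size]))
    return grams


def hash_features(grams, num_bits=10):
    """Hashing trick: count n-grams into a fixed vector of 2**num_bits slots."""
    vector = [0] * (1 << num_bits)
    for gram in grams:
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        vector[zlib.crc32(gram.encode("utf-8")) % len(vector)] += 1
    return vector


def featurize(text, n=2, num_bits=10):
    """Chain the steps: raw text in, fixed-length numeric feature vector out."""
    tokens = remove_stop_words(tokenize_words(normalize_text(text)))
    return hash_features(produce_ngrams(tokens, n), num_bits)
```

The resulting fixed-length vector is the kind of numeric input a multiclass trainer consumes; a one-versus-all setup then trains one binary classifier per label and picks the label whose classifier scores highest.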
Using ML.NET's data transformations and algorithms to create a machine learning solution, Williams Mullen has been able to make millions of documents more searchable, which in turn has helped make their attorneys more productive.