Mar 17, 2024 5 min read AI

Run, Rabbit, Run

“The first thing we do, let’s kill all the lawyers.”

-Dick the Butcher, Henry VI Part II, Act IV, Scene II

Ranking among the oldest professions in the world, lawyering dates back to between 1 and 1.5 million years ago and has always been a good example of what we call “knowledge work,” or work which consists primarily of processing not tangible materials but intangible information. The advent of the digital revolution over the past 50 years has seen lawyers transition to manipulating that same intangible information in the form of bits on a computer screen. The exponential increase in computing power is now heralding the next stage of the digital revolution, the age of AI-driven productivity, which is set to transform knowledge-based professions yet again. In many ways, a lawyer’s job is one of translation: they help clients make sense of dense language. Lawyers have consistently maintained their economic value for the last 1-1.5 million years because legal verbiage is too difficult for the layperson (read: most people) to understand. Lawyers, in this sense, straddle two worlds, one in which there is a general consensus about what words mean, and one in which people are paid to make precise arguments about what words mean and don’t mean. We founded Contract Rabbit in order to bridge the gap between these worlds and empower technical and non-technical users alike with the latest in AI and NLP technology.

The past 18 months have seen a frothy wave of legal tech products brought to market that leverage generative AI/LLMs in various implementations. While these products perform cute parlor tricks, such as writing a song that combines the lyrical styles of Shakespeare and Taylor Swift, they suffer from several structural flaws that make them uniquely ill-suited to performing the work of lawyers. We aim to change that.

LLMs at their core generate the most likely sequence of words based on a very large training corpus and a long, linear context of surrounding text. They offer tremendous value for some human-in-the-loop activities, like generating a code snippet for a known, solved problem without needing to open Google or Stack Overflow. However, just because a sequence of generated text is likely does not mean it is correct. In fact, non-curated generative models can give an operator a seductive but dangerously misleading sense of validation and confidence in their outputs. Another dangerous element of LLMs is the concept of temperature, or the explicit incorporation of randomness into the model’s generation process in order to imitate creativity (and produce the “wow” factor of some of the parlor trick applications); at any setting above zero, output can differ between runs of the same prompt. You can guess why that is not a great property for legal applications that crave consistency, where precedent documents, court decisions and statutes are always taken into consideration during analysis.
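To make the temperature point concrete, here is a minimal sketch of how a single next word might be chosen. It is a toy illustration, not how any particular product works: the candidate words, their scores, and the function names `softmax_with_temperature` and `pick_next_word` are all made up for this example, and a real model scores tens of thousands of candidates at every step.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pick_next_word(words, logits, temperature, rng):
    """Greedy pick at temperature 0, otherwise a weighted random draw."""
    if temperature == 0:
        return words[logits.index(max(logits))]
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(words, weights=probs, k=1)[0]

# Hypothetical candidates for the blank in "The tenant ___ pay rent."
WORDS = ["shall", "may", "must", "should"]
LOGITS = [2.0, 1.5, 1.2, 0.3]  # made-up scores, not taken from a real model

# Temperature 0: the same word on every run, whatever the random seed.
greedy = [pick_next_word(WORDS, LOGITS, 0, random.Random(seed)) for seed in range(5)]

# Temperature 1.0: the word can change from one run to the next.
sampled = [pick_next_word(WORDS, LOGITS, 1.0, random.Random(seed)) for seed in range(200)]

print(set(greedy))
print(sorted(set(sampled)))
```

In a contract, “shall” and “may” are not interchangeable, which is why run-to-run variation of this kind matters more in legal drafting than in songwriting.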

But perhaps the biggest “frothy wave” problem is that products that leverage third-party GenAI/LLMs treat data privacy with utter indifference. While there are assurances that user inputs are protected and not used for training (and hence not available to leak into output generated for other users, for example through a "jailbreak" prompt), the recent spate of legal action against companies such as OpenAI calls into question how seriously user privacy and copyright law are taken. Indeed, these lawsuits have brought damning evidence to the public eye of how little these products care about keeping your secrets secret, in some cases showing how the models generate word-for-word copies of texts they have been trained on. These products pose a systemic risk to the privacy of user data, especially given the consequences of uploading material that contains highly sensitive information (as is often found in legal documents) into an LLM. Legal tech vendors whose products wrap external AI services such as ChatGPT are delegating data security to an additional actor, one whose stewardship of privacy policies is outside the vendor’s control. There is no guarantee that once sensitive information is pushed into the gaping maw of an LLM, it won’t get spewed back up in response to a query from a completely different user. In domains where data privacy matters, this is laughably unacceptable.

While LLMs will continue to provide value in certain applications, careful curation of automated, machine learning-based content generation is supremely important in industries where the particulars of that content carry high stakes. LLMs can get you 80% of the way there for most tasks, and for most tasks 80% is just fine. But in the law, where precision is next to godliness, and where the smallest difference in how a phrase is worded can bankrupt a company or subject someone to ruinous levels of liability, 80% is also laughably unacceptable.

For all the excitement surrounding GenAI/LLM products, their limitations are not difficult to understand. After all, most SaaS products are, for purposes of expediency, built on top of a pre-existing technology stack that is domain-agnostic or not designed to understand any domain in particular. When these generalist technologies are tasked with understanding often-nuanced domain-specific problems, they fail. Cloaking older tech solutions with a shiny new wrapper is good enough for offerings that leverage third-party GenAI/LLMs, but ultimately these products are simply incapable of performing at the level of precision that legal (and any other highly specific) domains demand.

Contract Rabbit is not setting out to build a shiny new wrapper on an already existing tech platform. We are designing and building our machine learning-enabled language processing and generation to be tailored to the domain-specific characteristics of the legal field. This provides a more sophisticated and fine-tuned experience than what GenAI/LLM-wrapped solutions can offer, penetrating far deeper into the issues that bear upon legal analysis. Our products are native to a suite of patent-pending software dreamed up, designed, and built solely by our team of visionary developers, and can be used to reimagine the legal thought process from its most elemental form. We offer legal tech products that do not leverage third-party GenAI/LLMs, but rather leverage the power of our proprietary domain-specific language model, or “small” language model. Thus, we can deliver better solutions for our users, while not exposing their data to a third party that may misuse it.

At heart, Contract Rabbit is a tech company. We love building AI solutions to solve important problems in society. We are currently building products to solve problems in the legal tech space, but the unique angle from which we tackle these problems allows us to take a wider view of our software’s scope of potential applications. We are lucky to find ourselves just on the cusp of a generation of AI that will radically alter the way human beings communicate. Thus, our goal is to use AI to improve the way people use language, first in the narrow domain of legal services and then, potentially, in any domain that relies primarily on language to move information around.

A London cabbie in the 1970s knew the map of the city by heart. “The Knowledge” was considered a valuable asset among the city’s cab drivers, as there was no other way to navigate London’s complex network of streets and alleys than to memorize it. Brain-imaging studies of cab drivers who had obtained “The Knowledge” have reported an enlarged posterior hippocampus, a structural change that drivers navigating by software have far less reason to develop. As this kind of rote information migrates from being stored in and used by human brains to being stored on servers and used by algorithms, the databases of information that AIs have to draw upon will only get larger. We are currently in the middle of a global phenomenon whereby information is flowing from human brains to artificial memory architectures. By enhancing the quality of these artificial memory architectures with domain-specific language models, we can create a generation of AI that can replicate the precision and accuracy of a human brain. Apophenia will flourish in this brave new world that has such rabbits in it, where all users will need to do is give a simple input and then let the rabbit run.