The Magic of Data Driven Regulation

An inaugural presentation of the Allens Hub at the University of New South Wales.

I had the good fortune of meeting Mireille Hildebrandt in 2008 while we were both at the London School of Economics presenting at an ID Systems Conference. And while on a recent trip to Sydney I learnt unexpectedly from Lyria Bennett Moses that Mireille would be presenting, so I stayed behind another few days. It was worth listening to her methodical presentation live!


My notes were comprehensive but may not be altogether accurate or organised as presented (I take full responsibility for inaccuracies). Sadly there were likely another 20 or so slides that were glossed over all too quickly toward the end of the talk, but this only means Mireille will have to come out to Australia again for a part 2. I learnt some very profound things and had several ka-ching moments throughout the night. This is what a brilliant academic can do: take the audience on a long metaphor and then come in with the practice. As a philosopher Mireille has a big advantage over your standard lawyer, and her sources demonstrated her mastery of her craft: we were spoiled with references to mathematicians, anthropologists, computer scientists, technology lawyers and more. Thank you Mireille! We look forward to the book.

Katina’s Notes

Mireille has a full-time affiliation with Vrije Universiteit Brussel and several other minor affiliations.

Interfacing law and technology

Lawyer and philosopher; part-time chair with CS; lawyers begin to interact with CS

Legal tech

Responses from lawyers regarding legal tech

Some of the responses include:

·       AI in the law is nonsense, not feasible, waste of time

o   Old logic

·       AI in law will democratize the provision of legal services

o   Apps. Landlord vs big company

o   Find the app and get a prediction of whether you will win a case or not

·       AI in law will solve many legal problems caused by text-driven complexity

o   Text is naturally ambiguous

o   Legislation

o   Enormous; so much text; especially with international jurisdictions; the contradictions are too much. Without AI it is impossible?

·       AI in law will solve some problems and create new ones

·       AI will depend on how we will develop “legal tech” and by whom?

o   Lawyers, CS, consultants, policymakers?

o   And for WHOM?

§  Who is paying for this?

o   What are we optimizing for?

·       Legal text is about reasoning and meaning, but to a machine legal text is just “data” that you are trying to correlate, and what you are asking the algorithm to do is “optimize”. The question is what are you optimizing for? If you have no answer to this the machine will not learn anything.

Should we understand how the technology of legal tech works?

·       We can drive a car without knowing how the engine works?

o   Remember Pirsig’s “Zen and the Art of Motorcycle Maintenance”

·       We act on doctor’s advice though we don’t know how they get their diagnosis?

o   Note that doctors generally employ ‘diagnosis by treatment’

o   Doctor has potential diagnosis or none- and he will try something- and then start to figure out what is wrong with you

§  We expect doctor knows and we don’t have to get involved

§  There is trust in a doctor’s knowledge and that the ‘engine is built well’

·       Can we trust legal technology?



·       We must not believe

o   “prestidigitator enables him to do things that are not noticed by those whom he is engaged in fooling” (John Dewey, 1939)

·       Some people claim a trade-off between interpretability and accuracy:

o   The less “ordinary folk” understand AI, the better it functions

§  If we have this legal tech we can predict the case solution

§  Dangerous way of going about things

§  Some people, not necessarily within CS, claim there is an accuracy issue

§  As systems become more difficult to understand they will become more accurate

§  Don’t try to make them understandable because it downgrades their functionality

§  If a then b does not imply if b then a…

§  If the developer says they do not understand what the AI is doing, this raises the question of what “accuracy” actually means. Is this truly accuracy?

·       Some people claim that blockchain applications support trustless transactions

o   Trust is displaced from institutions, and put in technology instead

§  Trust in the miners… people who write the code for smart contracts

·       This is the lure of magic:

o   We don’t have to understand the engine to drive, or the diagnosis to trust the doctor, so we don’t have to understand the code to use it

·       In anthropology:

o   Mistaken attribution of causality (the rain dance)

o   Raising fear and inviting subservience (the power of the priest)

o   We are now warned of an arms race in AI and asked to submit, e.g. our data

§  Behavioural

·       Such magic is not reserved for “primitive society”

o   All types of society are vulnerable to such thinking

o   All types of society found ways to fact-check and to call-to-account

§  To call the priests to account, the board of governors etc… requires the below…

o   This requires resilience, patience and a serious effort to understand

§  Lawyers need to understand this tech

o   And a well designed system of checks and balances (e.g. re car and doctor)

§  That is why we can afford to say as a user- I don’t need to know more about it.

o   We call it Rule of Law: legality, auditability and contestability

Counting as a human being in the era of Computational Law (COHUBICOL)

·       “Not everything that can be counted counts, and not everything that counts can be counted.” (William Cameron)

·       To count, to calculate, to compute:

o   Incomputability from a CS perspective

§  Maths and CS term

o   Gödel, Wolpert, Mitchell

§  Moving from axioms to deductively drawing conclusions

§  Mathematical proof: the “no free lunch” theorem

o   Inferences from that data to predict new data

§  There is one thing that limits machine learning from predictions, the simple fact that you can only train on present and historical data

·       You cannot train on future data…

·       This is what limits machine learning

·       The mathematical assumptions of ML are incorrect but productive

o   They can do great work but are incorrect

·       To count, to qualify, to matter:

o   Incomputability from an anthropological perspective about how people interact

o   G.H. Mead, Arendt, Plessner, Ricoeur

o   The co-constitution of self, mind and society

o   “Imagine that I talk to a small child of 1 to 1.5 years old who is learning to speak. I tell the child ‘you are Charlotte and I am Mireille’, and the child will say the opposite… and then we say it again… ‘you and I’… and then the child grasps that ‘she is you to me’.” This is decentering: looking back at yourself from a perspective of being human

§  We are trying to make computer a different thing

o   When I say “I” to constitution of self, we are not born into a self. We are developing into a self because we are being addressed by others.

§  System of “law” to address others.

·       Law co-constitutes us in our expectations

o   Descartes: I think therefore I am

§  He did not get it…

§  “Being profiled”; you think therefore I am.

§  The constitution of the self

o   Legal protection

§  Because we are being profiled, we get better results from search engine

§  It is not because we have computers we are constituted by things, it is not because of machines…

§  Currently through “human interactions”

§  Something changes when the emancipation comes from machines and not other human beings.

·       To what extent is being profiled by machines different from being profiled by other people, by constitutions, or by the law, and what changes when computational systems are profiling us in the background?

§  Against being “overdetermined”

·       By gaze of the other (Mead)

·       Soi-meme…

·       How state sees us

·       So what happens if something like the law, which lives in natural language (NL) and text, comes to be based in legal tech?


Data-driven legal tech:

·       eDiscovery

·       Argumentation mining

·       Prediction of judgements

·       So when I have a particular legal case I can look for things that count in the law

·       Most dangerous type of AI is the prediction of judgements

o   Code driven legal tech

o   Code driven legislation (policy articulated in code), contracts (in fintech, transfer of assets), decisions (public administration), connected with blockchain

o   Instead of writing a contract in NL, you translate the content into CODE, and make that code “Self-Executing”

o   Can do the same with regulation; could have a policy written into code

o   With every move you make, the text is executed autonomically
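The idea of translating a contract clause into self-executing code can be sketched in a few lines. This is only an illustration in plain Python, not a real smart-contract platform: the `RentalDeposit` class, its fields and the refund rule are all hypothetical. It shows how the code simply executes once the coded condition is met, with no step at which the wording can be interpreted or contested.

```python
from dataclasses import dataclass, field


@dataclass
class RentalDeposit:
    """Hypothetical clause: refund the deposit unless damage is reported."""
    deposit: int  # amount held, in cents
    balances: dict = field(default_factory=lambda: {"tenant": 0, "landlord": 0})
    settled: bool = False

    def settle(self, damage_reported: bool) -> str:
        # The rule executes as written; neither party gets to argue about
        # what "damage" should mean in this particular situation.
        if self.settled:
            raise RuntimeError("already executed; the transfer cannot be undone")
        recipient = "landlord" if damage_reported else "tenant"
        self.balances[recipient] += self.deposit
        self.settled = True
        return recipient


contract = RentalDeposit(deposit=100_000)
recipient = contract.settle(damage_reported=True)
print(recipient, contract.balances)
```

A second call to `settle` raises an error, which mirrors the point about irreversible execution: whatever contestation is possible has to happen outside the code.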

Machine learning

o   What is A/B Testing?

o   On all the web sites we like and use, there is A/B testing happening continuously

o   A small change that is tested.

§  Do you like version A or version B?

§  Software-based

§  Software automatically calculates which attracts the most favourable behavior (click-based behavior; or purchasing behavior; preferred?)

§  Which site has better outcome, that is the site that has better implementation

§  Continuous process

§  Changing buttons; +3 days keep up with competitors
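The comparison the software makes between version A and version B could look something like the sketch below: a standard two-proportion z-test written with the Python standard library only. The function name and the visitor and click counts are made up for illustration; real A/B testing tools may use different statistics.

```python
import math


def two_proportion_z_test(clicks_a, visits_a, clicks_b, visits_b):
    """Return (z, two-sided p-value) for the difference in click rates."""
    rate_a = clicks_a / visits_a
    rate_b = clicks_b / visits_b
    # Pooled click rate under the assumption that A and B perform the same.
    pooled = (clicks_a + clicks_b) / (visits_a + visits_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visits_a + 1 / visits_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 2400 visitors saw each version.
z, p = two_proportion_z_test(clicks_a=120, visits_a=2400,
                             clicks_b=156, visits_b=2400)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Note that the test only says which button attracted more clicks in this sample. What counts as the "better" outcome (clicks, purchases, something else) is decided by whoever sets up the experiment.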

o   We are continually being nudged by certain things running in the background

o   Online environment is “Ad revenue”

o   Optimize web site means put content that attracts more $

o   We are surrounded by online environments that have ad driven content

§  This microtargeting does not really work because human behavior is far too complex and far too smart

§  From side of CS, there is an urge to think it works

§  What is the statistical relevance?

·       Because moving too fast, there is a lure to do “p-hacking”

o   As soon as you see a significant result that you find favourable, you STOP; this is methodologically unsound but people continue buying into it

§  Procter & Gamble withdrew their budget, and this year it said “it didn’t cost us anything”
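Why stopping as soon as the result looks favourable is unsound can be shown with a small simulation. This is a sketch under stated assumptions: both versions have exactly the same true click rate (so every "significant" result is a false alarm), and the click rate, sample size and number of peeks are arbitrary choices of mine, not figures from the talk.

```python
import math
import random


def p_value(clicks_a, clicks_b, n):
    """Two-sided p-value for equal click rates, n visitors per version."""
    pooled = (clicks_a + clicks_b) / (2 * n)
    if pooled in (0, 1):
        return 1.0
    std_err = math.sqrt(pooled * (1 - pooled) * 2 / n)
    z = (clicks_b - clicks_a) / n / std_err
    return math.erfc(abs(z) / math.sqrt(2))


def run_experiment(rng, peeks, total=2000, rate=0.1, alpha=0.05):
    """Return True if any interim look comes out 'significant'."""
    batch = total // peeks
    clicks_a = clicks_b = n = 0
    for _ in range(peeks):
        clicks_a += sum(rng.random() < rate for _ in range(batch))
        clicks_b += sum(rng.random() < rate for _ in range(batch))
        n += batch
        if p_value(clicks_a, clicks_b, n) < alpha:
            return True  # stop early and declare a winner
    return False


def false_positive_rate(peeks, trials=300, seed=0):
    rng = random.Random(seed)
    return sum(run_experiment(rng, peeks) for _ in range(trials)) / trials


single = false_positive_rate(peeks=1)    # look once, at the end
peeking = false_positive_rate(peeks=20)  # look 20 times, stop when it suits
print("one look at the end:", single)
print("peek 20 times:      ", peeking)
```

With a single look the share of false alarms stays near the nominal 5%, while repeated peeking with early stopping pushes it well above that, even though there is no real difference between A and B.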

o   Lawyers must not make that same mistake

o   We don’t want “Crappy Machine Learning” giving verdicts

o   Tom Mitchell:

§  A computer program is said to learn from experience E, with respect to some class of tasks T (prediction of judgments)…

o   ML is often parasitic on human domain expertise: what CS calls “ground truth”

o   The politics are in who gets to determine E, T, and P

o   The ethics are in how they are determined

§  Law concerns the contestability
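In Mitchell's definition, E is the experience (the training data), T the class of tasks and P the performance measure against which improvement is judged. The sketch below, with entirely hypothetical toy data and function names of my own, illustrates why it matters who picks P: the same two predictors rank differently depending on whether P is overall accuracy or the share of actual violations that are caught.

```python
# Hypothetical past cases as (feature, outcome) pairs; outcome 1 = violation.
# E = these labelled cases, T = predict the outcome, P = the chosen measure.
experience = (
    [("a", 0)] * 80 + [("a", 1)] * 5 + [("b", 0)] * 10 + [("b", 1)] * 5
)


def always_no_violation(feature):
    """Ignore the case entirely and predict the most common outcome."""
    return 0


def by_feature(feature):
    """Predict a violation whenever the case has feature 'b'."""
    return 1 if feature == "b" else 0


def accuracy(predict, cases):
    """P, option 1: share of all cases predicted correctly."""
    return sum(predict(x) == y for x, y in cases) / len(cases)


def recall_of_violations(predict, cases):
    """P, option 2: share of actual violations that are predicted."""
    positives = [(x, y) for x, y in cases if y == 1]
    return sum(predict(x) == 1 for x, _ in positives) / len(positives)


for predictor in (always_no_violation, by_feature):
    print(predictor.__name__,
          accuracy(predictor, experience),
          recall_of_violations(predictor, experience))
```

On this toy data the predictor that never finds a violation wins on accuracy and loses on recall, so the choice of P decides which one counts as having "learned" better.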

o   Are you optimizing to cut out racism or for banal statistics… triggered a whole discussion on “fairness” from statistical point of view

o   But law is about contestability? How can we make law this way and make sense in criticizing it

o   The company that has the proprietary software to do this (how they handle the stats): the statistical error for black people is to their detriment, and for white people it is beneficial.

o   But what we have to do is to “compensate” and understand the error rates
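Understanding the error rates means, at a minimum, breaking them down per group rather than reporting one overall figure. The sketch below does this for invented records; the group labels, field layout and counts are hypothetical and not taken from any real risk-assessment system.

```python
from collections import Counter

# Hypothetical records: (group, predicted_high_risk, reoffended).
records = (
    [("group_1", True, False)] * 40 + [("group_1", False, False)] * 60
    + [("group_1", True, True)] * 70 + [("group_1", False, True)] * 30
    + [("group_2", True, False)] * 20 + [("group_2", False, False)] * 80
    + [("group_2", True, True)] * 50 + [("group_2", False, True)] * 50
)


def error_rates(records):
    """Return false positive and false negative rates for each group."""
    counts = Counter(records)
    rates = {}
    for group in sorted({g for g, _, _ in records}):
        fp = counts[(group, True, False)]   # flagged, did not reoffend
        tn = counts[(group, False, False)]
        fn = counts[(group, False, True)]   # not flagged, did reoffend
        tp = counts[(group, True, True)]
        rates[group] = {
            "false_positive_rate": fp / (fp + tn),
            "false_negative_rate": fn / (fn + tp),
        }
    return rates


for group, rates in error_rates(records).items():
    print(group, rates)
```

In this made-up example group_1 is wrongly flagged twice as often as group_2, while group_2 is more often wrongly cleared: errors that fall to the detriment of one group and to the benefit of the other. A lawyer who wants to contest such a system needs exactly this kind of breakdown.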

o   “AI Program able to predict human rights trials with 79% accuracy”

o   Is it in use?

o   “Assumption: text extracted from published judgements are a proxy for applications lodged with the Court”

§  But not accurate. They only used cases in English. And not everything published. Doesn’t have all relevant data.

§  Get to the “low hanging fruit” but why?

·       So don’t suggest you have it right for 79% of cases

·       Problem: as authors state, facts may be articulated by court to fit the conclusion

o   As selected and rendered by courts as they have found in their mind

o   Cases held inadmissible or struck out beforehand are not reported, which entails that a text-based predictive analysis of these cases is not possible

o   The experience is reduced

o   Why? Admissible cases = low hanging fruit

o   Problem: the applications that are not reported would make a difference, which now remains invisible

o   Data on cases related to art. 3, 6, 8 ECHR (privacy, torture…)

o   Why? Because they provided the most data to be scraped, and sufficient cases for each

o   Problem: here you are framing things as if all cases are either art. 3, 6, or 8, while the rest remain invisible (e.g. art. 5, 7, 8, 10, 14)… but the way it works is that the articles are all interlinked, and they say x, y, z is obvious. Yet you are presenting these as independent variables.

o   Dataset = publicly available

o   Need to distinguish it is EXPLORATORY RESEARCH

§  You don’t have the full dataset

§  For each article: all cases (apart from non-English judgements)

§  Equal amount of violation and non-violation cases

§  Text extraction using regular expressions, excluding operative provisions (of, and on, and the)

o   Circumstances and topics are best predictors, combined works best

o   Law has the lowest performance

§  Discussion: facts are more important than the law

§  Legal formalism and realism: evidence that legal realism is realistic

o   This is nonsense

§  Facts have been framed in a way that they are “Determined” but you don’t have the facts…

§  In a lot of the cases there is no law section due to an inadmissibility judgement

o   So sometimes there is no law


o   I’ve asked lawyers and look the program is better! NO!



Text-driven interpretation:

o   Only a good lawyer does close reading and bounded rationality

o   Integrity of law and logical coherence

o   Treating like cases alike

o   Legal certainty as contestability

o   Ambiguity tells us how text will affect life and opens for contestation and argumentation… this is very different to code law

Data- and code-driven interpretation:

o   These systems do “remote reading”

o   With new technologies we can read remotely, instead of reading a canonical text the way we do… the machine can read everything and draw inferences

o   Remote reading based on NLP

o   Coherence based on the approximation of a mathematical target function

o   Input and output- mathematical function (not reasoning), so there is an assumption that with machine learning our world (and our legal world), is ruled with mathematical functions

o   Does getting hold of a maths function make someone a good lawyer? No!

o   But if we outsource tasks to these technologies, we do so on the assumption that there is a mathematical “target function”, whether in legal tech based on “predictive accuracy” or in blockchain-type legal tech

Text-driven normativity and legal protection

o   Human language is ambiguous but not endlessly flexible … lawyers are constrained by legal norms … a higher court can impose its interpretation on them

o   Suspension of judgement, constraints upon personal opinion

o   Practice and effective legal remedies with institutional checks and balances

Data-and-code-driven normativity and legal protection

o   Either freezing the future by making predictions based on historical data

o   Or by way of deterministic self-executing code

o   Contesting statistics and contesting execution of irreversible code

Lawyers will need a good grip on statistics in order to contest them


Learners and decisional algorithms

o   The learner…

o   Once the system has learned then you can translate that output to another algorithm

o   Cases with these 4 characteristics will always be judged like that

§  Then develop another algorithm—this causes violation

Illusion of legal certainty
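The step from a learner to a decisional algorithm can be sketched as follows. The learner here is deliberately trivial (the most common outcome per combination of four yes/no characteristics), and the cases, names and outcomes are all hypothetical. The point is only to show how a statistical summary of past cases turns into a fixed rule.

```python
from collections import Counter, defaultdict

# Hypothetical historical cases: four yes/no characteristics and an outcome.
history = [
    ((True, True, False, True), "violation"),
    ((True, True, False, True), "violation"),
    ((True, True, False, True), "no violation"),
    ((False, True, False, False), "no violation"),
    ((False, True, False, False), "no violation"),
]


def learn(cases):
    """Learner: find the most common outcome for each combination."""
    outcomes = defaultdict(Counter)
    for characteristics, outcome in cases:
        outcomes[characteristics][outcome] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in outcomes.items()}


# Decisional algorithm: the learned table, now applied as a fixed rule.
frozen_rule = learn(history)


def decide(characteristics):
    return frozen_rule[characteristics]


print(decide((True, True, False, True)))
```

One of the three past cases with the first combination went the other way, but the frozen rule no longer shows that. A combination never seen before simply raises a `KeyError`, which is a reminder that the certainty on offer only covers the past.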



Legal protection by design vs legal design

o   Can you use it to enforce compliance?

o   If you reorganize legislation and contracts, where non-compliance is ruled out, then it is technical management, and administration

o   Legal protection by design

o   We cannot think of legal protection in same way as always done

o   We have no tools to protect ourselves as lawyers and people we defend

§  If you develop and employ legal technology, then legal protection must be embedded in the design of that technology

§  Democratic legitimization (representation, deliberation, participation)

§  Enabling the contestation into the design of these systems

o   Legal protection impact assessment

The legal tech is NOT magic (not like the car or medical science)

It does not deserve blind trust

Law shapes the checks and balances that enable trust in engines and medicine