Transcript: Artificial Intelligence Is Here Series: Deciding When and How to use AI in Government
[The Webcast Webdiffusion logo appears.
Neil Bouwer appears via webcam. As he speaks, a chyron appears reading "Neil Bouwer. Canada School of Public Service."]
Hello, everybody. Welcome to the Canada School of Public Service. My name is Neil Bouwer. I'm the vice president of the Innovation and Skills Development Branch here at the school, and I'll be your moderator for today's event. So, thank you for making the time today. We're really pleased to have you here with us. Before we go any further, I do want to acknowledge that I'm broadcasting from Ottawa, notwithstanding my virtual background. That puts me on the traditional and unceded territory of the Anishinaabe people. And so, I think it's right that we take a little bit of time to reflect on that. And for those of you that are not in the National Capital Region and are in other parts of the country, you'll be on traditional territories of other first peoples. And I invite you to take a moment to think about the First Nations and Indigenous peoples who first occupied, and continue to occupy, those lands. Today's event, for those of you who don't know, is the third instalment in our Artificial Intelligence Is Here series. So welcome to it. This series is a partnership between the Canada School of Public Service and the Schwartz Reisman Institute for Technology and Society, which is a research and solutions hub based at the University of Toronto. And it's dedicated to ensuring that technologies like AI are safe, responsible and are being harnessed for the public good. So, we're really pleased to be bringing this collaboration to you all. So far in the series, we've provided participants with an overview of the AI landscape, including how AI is likely to transform governments and government decision-making and issues around citizen consent. Today, we're turning our attention to the question of when and how AI should or could or maybe is being used in a government context. For today's event, we're going to start with a lecture from two experts from the Schwartz Reisman Institute. The first is Peter Loewen.
He's the director of the University of Toronto's Munk School of Global Affairs and Public Policy, as well as being an associate director of the Schwartz Reisman Institute. Peter is going to talk about some of the factors that should be considered in deciding when and how AI should be used in government. After that, we're very pleased to hear from Gillian Hadfield. She's the director and chair in technology at the Schwartz Reisman Institute, and she'll be talking about some of the tools and regulatory frameworks that governments around the world are starting to introduce in relation to AI in government. I can't wait to hear from both of them. Following that lecture, we're going to have a panel with two really esteemed panellists.
The first is Cary Coglianese, who's joining us from the University of Pennsylvania, where he holds the title of Edward B. Shils Professor of Law and Professor of Political Science, as well as being the director of the Penn Program on Regulation. The second panellist is Dr. Daniel Ho, who joins us from Stanford University. Dr. Ho holds a number of distinguished titles at Stanford, including the William Benjamin Scott and Luna M. Scott Professor of Law, Professor of Political Science, a senior fellow for the Stanford Institute for Economic Policy Research, associate director of the Stanford Institute for Human-Centered Artificial Intelligence and director of the Regulation, Evaluation and Governance Lab. Just before we get started, a few administrative notes that I need to mention. This is a great event, I don't want you to miss any of it. To optimize your viewing, we recommend you disconnect from your VPN or use a personal device to make sure that you get the best bandwidth possible for the event. And if you do have technical issues, we recommend that you re-launch the webcast link that was sent to you in your email. You are going to be invited to submit questions throughout the event. Please, do so. So even while the lectures are playing and while the panel is discussing, please send your questions in. They will come to us through the Collaborate video interface. So, all you need to do is look in the top right-hand corner of your screen. There's a click-- "raise hand" button there and you can enter your question anonymously and we'll be monitoring those questions, and I'll be posing those questions for you to our invited guests. So please, take advantage of that. There is also simultaneous translation for those of you that are joining the webcast. You can choose French or English, the official language of your choice for the translation. So, without further ado, let's move to the video on deciding when and how to use AI in government.
[A purple title card with a starburst reads "Artificial Intelligence Is Here Series". Words appear. "Deciding When and How to Use AI in Government". Peter Loewen stands in front of a blue background. Slides appear to the left of him. The first is a 3D model of a human brain. A chyron reads "Peter Loewen. Associate Director, Schwartz Reisman Institute. Director, Munk School of Global Affairs and Public Policy, University of Toronto"]
Hello, my name is Peter Loewen. I'm a professor at the University of Toronto and associate director of the Schwartz Reisman Institute and the director of the Munk School of Global Affairs and Public Policy.
[Words appear on the slide. "How are governments using AI and machine learning?"]
In this short talk, I want to explore with you some of the ways in which governments are employing artificial intelligence and machine learning. As our earlier sessions have explored, artificial intelligence, or AI, is already in use all around us, from powering our phones and cars to making recommendations about music, from financial instruments to underwriting new forms of military conflict.
[The slide reads "The rate at which AI is employed is increasing dramatically".]
The combination of ubiquitous data, high powered and high dimensional computing and computation, and commercial usefulness means that the rate at which AI is being employed is increasing dramatically.
The positive effects of this are massive. If you want to best think about how the use of AI in government can be done, then it's probably helpful to narrow our scope at least a bit.
So, I want to share with you over the next few minutes four ways in which AI is being deployed in governments around the world.
[Words appear over an image of a crosswalk. "Four ways AI is being deployed."]
And I also want to give you a certain classification of these uses.
So, let's think about a simple two by two table that can organize the way AI is being used. Now, after we explore these different uses, we'll revisit some of the general principles of when AI might be deployed in government.
[A title slide briefly fills the screen that reads "Dimensions of Use".]
If we want to organize our understanding of how AI is being deployed in governments, let's think about two different dimensions.
The first dimension is a citizen or bureaucrat dimension. The basic question here is, is the AI being used or experienced by a citizen or is it being used by a bureaucrat? Put just a little bit differently, is the AI being used for the public or is it being used internally by the government? Is it front counter or back office? Second, is the AI being used as a replacement or as an enhancement?
On this dimension, we're interested in whether the AI is standing in for something that would previously be done by a person, or whether it's enhancing an already existing process to make it more efficient, accurate, faster, easier.
In short, is AI replacing a human or enhancing a human? When we put these two dimensions together, we get four different scenarios about how governments can be using AI, and we could find ready examples in each of these. So let's imagine them together. Let's walk through them.
[A title slide fills the screen that reads: "AI for public-facing uses".]
Let's take first the case of government using AI for interacting with citizens, and in a fashion that replaces a human in the process.
[A slide displays an article headline reading "Automation in government: Harnessing technology to transform customer experience. September 28, 2020."]
A recent case study by McKinsey and Company highlights the use of an automated chatbot by the German Federal Employment Agency. Germans seeking employment insurance or assistance, or employers looking to coordinate benefits, now have the option of interacting not with a human, but with a chatbot.
[An image of a text message that reads "Hi! How can I help you?"]
Using a likely combination of natural language processing and reinforcement learning, combined with well-designed surveys, the bot is able to process even complex cases for employment insurance.
Previously, it might have been a human bureaucrat reviewing an application or interviewing a citizen and then making a judgment about their suitability or eligibility for the benefit.
Now that process is all automated, but rather than an impersonal website of questions and dropdown menus, the citizen has a more humanlike experience of conversing with the bot.
Users, for what it's worth, report positive experiences, perhaps because they don't have to share difficult or even embarrassing information about their employment situation with another human. And as McKinsey reports, the chat AI is able to convey a sense of both efficiency and care. What are the upshots here?
Well, more people can be served perhaps more effectively and certainly more efficiently, more quickly.
And there's little to no limit on the number of individuals who can be attended to by this AI chat function at any point in time. Think about that in an economy where we expect much more volatility in work and in the need for short term tide-overs and income security. What a benefit it is to have a machine that can do this without hitting a limit.
Importantly, and this moves us into our next cell, a case where we're using AI to enhance a process, this need not come at the cost of the complete or even partial replacement of people. Instead, we could imagine the second scenario in which the AI enhances decision making by a bureaucrat. Indeed, imagine the chatbot identifies cases in which it feels that a human judgment is required.
In those instances, cases can have their uncertainties identified and then be handed off to a government employee who can consider this case and apply human judgment.
Isn't it a potentially better use of resources if we take those who would have previously interacted with all those citizens and redeploy them to cases which require more complex or vexing interactions or more judgment? Or maybe just more empathy?
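The hand-off described here, where the automated system flags uncertain cases and passes them to a government employee, can be sketched in a few lines. This is only an illustrative sketch and not the German agency's system: the `CaseAssessment` fields, the `route_case` function and the 0.8 confidence threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical cut-off: below this confidence, a person reviews the case.
HANDOFF_THRESHOLD = 0.8


@dataclass
class CaseAssessment:
    """What an automated system might record about one application (illustrative)."""
    case_id: str
    confidence: float  # 0.0 to 1.0, how sure the system is of its own answer
    uncertainties: List[str] = field(default_factory=list)  # open questions it flagged


def route_case(assessment: CaseAssessment, threshold: float = HANDOFF_THRESHOLD) -> str:
    """Return "human_review" when the system is unsure, otherwise "automated"."""
    # Any flagged uncertainty, or low confidence, sends the case to an employee.
    if assessment.uncertainties or assessment.confidence < threshold:
        return "human_review"
    return "automated"


if __name__ == "__main__":
    routine = CaseAssessment("A-1", confidence=0.95)
    unclear = CaseAssessment("A-2", confidence=0.95,
                             uncertainties=["employment dates conflict"])
    print(routine.case_id, route_case(routine))
    print(unclear.case_id, route_case(unclear))
```

The point of the design is that the flagged uncertainties travel with the case, so the employee who picks it up can see why it was handed off.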
[A title slide reads "AI for administrative uses".]
Artificial intelligence applications are not being deployed by governments only or even principally in citizen-facing work. Instead, they're also being widely deployed and used to support so-called back-office functions of governments. But as when they are deployed in citizen-facing work, they can be used as a substitute for human actions or as a complement to them. Let's begin with the case where an AI is enhancing a human function.
[A screenshot of an article from the Yale Journal on Regulation titled "Algorithmic Accountability in the Administrative State".]
In their review of the use of AI within American federal government administrative agencies, Daniel Ho and David Engstrom, both Stanford professors, note that the capacity of the state to enforce its laws is a central measure of the capacity of the administrative state. A state that can't enforce its laws isn't much of a state at all. This is perhaps especially true in the case of the regulation of markets.
[The slide reads "AI for regulatory oversight".]
As Ho and Engstrom document, the Securities and Exchange Commission, the principal regulator of capital markets in the United States, now deploys a large suite of algorithmic tools to help shrink the haystack in its search for malfeasance. This includes a dashboard which captures more than 200 possible indicators of fraud within tax returns.
The goal here is not to immediately identify audit targets, but to equip auditors to get deeper, more careful and more focused examinations of potential tax fraud cases.
[The slide reads "AI can help investigators uncover patterns of data."]
So here we have an AI, helping investigators wade through masses of data, identifying cases and uncovering patterns they might not have otherwise seen. This is enhancing their work, not replacing it.
[A photo of the U.S. Securities and Exchange Commission building. Text appears. "Advanced Relational Trading Enforcement Metrics Investigation System (ARTEMIS)". After a brief pause, text is added: "Abnormal Trading and Link Analysis System (ATLAS)".]
In related domains, the SEC has developed tools to detect securities fraud. The Advanced Relational Trading Enforcement Metrics Investigation System, ARTEMIS, and the Abnormal Trading and Link Analysis System, ATLAS, are both designed to use large amounts of data, run through high dimensional models to help detect cases of insider trading. Importantly, these are not completely automated processes. In the case of ARTEMIS, it depends first on potential cases being identified by officials and then uses the AI to estimate the likelihood of fraud. This is an example of an AI technology that helps to augment a human once they've already made a judgment about a potential fraud case. These are not perfect systems.
They may be slow to capture fast changing versions of fraud. They need to be constantly calibrated against real cases to establish ground truths, and they rely on officials who are sufficiently fluent in data heavy approaches to enforcement to be able to understand and effectively interact with the AI. But when these conditions are met, they provide an incredibly powerful set of tools for putting the law into action.
[The slide reads "AI for policy modelling".]
Now, finally, let's consider a case in which AI is being used in a back-office function, but in a way which by design replaces a human in a process. You know, government consultations and policy modelling provide a useful example. When considering and designing new policies, governments will often want to understand whether citizens will support the policies and, if adopted, how these policies may change behaviours of citizens in response to them. We have, over a long period of time, developed behavioural models to understand how citizens will respond to a change in policy, whether it's a change in a tax rule, a new health program or new credit for educational or vocational retraining. But as advanced as our measurement models, our econometric models, might be for understanding the effects of policy changes, they do not compare to the power of higher order models working across multiple dimensions, pulling data from thousands of sources. Provided government can create a data environment in which large amounts of data can be linked and analyzed, then the best advances in data analytics, including AI, can be brought to bear on the toughest policy problems. We can look at policies on more dimensions than we could ever imagine using an AI rather than a human.
In smaller applications, this will supplement the work of analysts, but at scale, it holds out the promise of almost completely replacing the often-subjective work of human analysts in determining whether a program has worked or not.
[The slide reads "AI can consider the effects of a policy across a limitless number of dimensions."]
An AI can consider the effects of a policy across a limitless number of dimensions, and then collect these into a summary judgment about the efficacy of a policy. A human cannot.
Humans are not completely out of the loop here. They'll need to make the final recommendations about whether a program has worked well enough to be renewed, for example. But all the work of analyzing the effects and efficacy of a program can be left to higher order models informed by AI and machine learning.
[Words appear as he speaks. "Four policy challenges: 1. Volume of decisions. 2. Consistency of decisions. 3. Fairness of decisions. 4. Learning from decisions."]
In a previous talk, we outlined four policy challenges that those in government face. First, people in government are asked to make a large number of decisions. Second, decisions are to be made in a way that is consistent with policy goals and objectives. Third, decisions, especially those that have bearing on the public and most especially those that have bearing on individuals, should be made in a way that is procedurally fair and consistent with democratic norms. Fourth, we should learn from the decisions that we make. All the applications of AI mentioned here can help us overcome these challenges, both in citizen-facing work and the important work that happens in back offices.
[Peter and the slide fade out. A purple title card with a starburst reads "Artificial Intelligence Is Here Series". Words appear reading "Tools and regulatory frameworks for using AI in government".
Gillian Hadfield stands in front of a blue background. Slides appear to the left of her. The first is a photo of a glowing model of a human brain in profile, resembling a circuit board. A chyron reads "Gillian Hadfield. Professor of Law and Economics, University of Toronto. Director and Chair, Schwartz Reisman Institute"]
Governments around the world are beginning to explore how they might use AI to deliver public services. Some are already using AI.
[Words appear over the image. "How can we govern the use of AI and machine learning?"]
Some police departments and immigration offices are using facial recognition technology. Some courts and criminal justice systems are using AI assisted systems for deciding who gets bail or probation. In this video, I'll give you an overview of the emerging tools and frameworks for governing the use of AI in government.
[A title slide briefly fills the screen that reads "Emerging tools and frameworks for governing the use of AI".]
Canada was possibly the first country to introduce formal policy on the use of AI in government.
[Words appear over an image of the Canadian parliament buildings reading "Directive on Automated Decision-Making (2019)"]
Canada's Directive on Automated Decision-Making was introduced in April 2019 with a goal of reducing risks to citizens and institutions and ensuring that the use of AI contributes to efficient, accurate, consistent and interpretable decisions consistent with Canadian law.
[A new slide reads "Canada: Key Requirements". As she speaks, her points appear on the slide.]
The directive established five key requirements. First, government entities planning to deploy an automated decision-making system are required to conduct an algorithmic impact assessment.
[A screenshot of an online questionnaire.]
This is a self-guided questionnaire intended to guide government officials through a review of the motivation for the use of AI, evaluate its risk profile, and flag potential difficulties in explainability, data sources, quality and risks. The assessment generates impact and mitigation scores, and it's intended to inform thoughtful choices around the use of AI.
Second, the directive has transparency requirements: ensuring that notice is given that an automated decision-making system is being used, providing meaningful explanations to people affected by the decision, making source code available in most cases, and documenting decisions.
Third, information about the use of an automated decision-making system is made publicly available.
Fourth, the directive requires steps to be taken to ensure that data is relevant, accurate, and up to date, and collected in accordance with privacy law, and that there is appropriate opportunity for human intervention.
And fifth, the directive requires that people affected by an automated decision have recourse options to challenge the decision.
[The next slide features the British parliament buildings. Text over-top reads "United Kingdom: Guidelines for AI Procurement (2020)"]
Since 2019, numerous other jurisdictions have enacted similar types of frameworks and requirements. The UK introduced guidelines for AI procurement in June of 2020 with an emphasis on the use of diverse multidisciplinary teams to contribute expertise in different subject matters, like health or data ethics, systems engineering, visualisation.
[Words on the slide read "UK: Key Requirements". Her points appear on the slide as she speaks.]
The UK guidelines call for conducting data assessment before beginning the procurement process and early attention to the risks and benefits of using AI.
They urge government agencies to engage early on with the market, to ensure that the systems designed meet government needs and to ensure suppliers have considered the risks of bias, explainability, and unintended consequences.
They also urge officials to develop a concrete plan for oversight and governance throughout the life cycle, beyond the procurement decision, including compliance with data ethics guidelines.
[A photo of an office building. Text reads "US Government Accountability Office (GAO)". Words appear over the image reading "Accountability Framework for Federal Agencies (2021)"]
And in 2021, the US Government Accountability Office, an independent agency sometimes referred to as a congressional watchdog, published a detailed accountability framework for federal agencies.
[The slide fills the screen. It has four quadrants of text of various colours, connecting to a diagram of a DNA strand. The slide is titled "Artificial Intelligence (AI) Accountability Framework".]
The GAO's framework is built on four complementary principles: governance, data, performance and monitoring. These general principles are explored in terms of key practices, key questions and audit procedures.
[The top-right quadrant, coloured green, is titled "Governance". The text under it reads "Promote accountability by establishing processes to manage, operate and oversee implementation." Two bullet points read "Governance at the Organizational Level" and "Governance at the System Level".]
The governance principle calls on agencies to define clear goals, roles, and responsibilities, to engage stakeholders to ensure compliance with relevant laws, regulations, standards and guidance, and to foster trust and competence.
[The top-left quadrant, coloured blue, is titled "Data". The text under it reads "Ensure quality, reliability, and representativeness of data sources and processing." The two bullet points read "Data Used to Develop an AI Model" and "Data Used to Operate an AI System".]
The data principle encourages documentation and assessment of the origin, reliability, interconnectedness, security, privacy and potential biases in the data used to build and operate an AI system.
[The lower-right quadrant, coloured brown, is titled "Performance". The text underneath reads "Produce results that are consistent with program objectives." Two bullet points read "Performance at the Component Level" and "Performance at the System Level".]
The performance principle seeks to ensure that metrics and procedures are in place to evaluate how well an AI system is meeting a program's objectives, including the avoidance of bias and to enable human supervision.
[The lower-left quadrant, coloured red, is titled "Monitoring". The text underneath reads "Ensure reliability and relevance over time." Two bullet points read "Continuous Monitoring of Performance" and "Assessing Sustainment and Expanded Use".]
Last, the monitoring principle calls for continuous or routine monitoring of a system.
[A diagram of three circles in a row, connected by a line. Each contains icons and has text underneath. The slide briefly fills the screen and each segment of the diagram is highlighted in turn.]
The GAO's document provides a great framing of the path we're on with AI. They lay out what they see as three waves.
[The first circle on the left, coloured yellow, is labelled "First Wave". The icon is a hand holding a smartphone with the word "Tax" on the screen. The text underneath reads "Expert knowledge or criteria and logical reasoning. The first wave of AI is represented by expert knowledge or criteria developed in law or other authoritative sources and encoded into a computer program in the form of an expert system. Example: online tax preparation."]
The first wave hit a few decades ago with the emergence of expert systems used for things like online tax preparation.
[The middle circle, coloured blue, is labelled "Second Wave". The icon is an icon of a person with a square on their head next to a screen with a picture of them on it and a checkmark. The text underneath reads "Machine/Statistical Learning. Second-wave AI technology is based on machine learning, or statistical learning, and includes voice recognition, natural-language processing, and computer-vision technologies, among others. Example: Face-recognition technology."]
The second is hitting now, with machine learning, including systems like voice and facial recognition and natural language processing, which allows computers to read and interact with human written text. The second wave systems are relatively narrow task-based ones. They are built to do a specific thing like match faces to an existing database of facial images, or determine which of a prepared set of answers to supply to a query about taxes entered into a chat box online.
[The far-right circle, coloured purple, is labelled "Third Wave". The icon resembles a ship with lines denoting a wifi signal above it. The text below reads "Contextual Adaptation. Third-wave AI technology combines the strengths of first- and second-wave AI, and is also capable of contextual sophistication, abstraction, and explanation. Example: Autonomous ships."]
The third wave the GAO identifies moves beyond today's narrow task-based systems to what they call contextual adaptation: systems that can process a much broader understanding of contexts and goals to make increasingly sophisticated and general choices. The example they give is of autonomous ships.
[A circle is divided diagonally into four sections, labelled design, development, deployment and continuous monitoring.]
The GAO document also provides some helpful graphics with an overview of the phases of an AI lifecycle from design through development, deployment, and continuous monitoring.
[Four people icons inside a microchip are surrounded by eight circles. Clockwise from the top, they're labelled "Software Developer", "Privacy and Security Expert", "Risk Management Professional", "User", "Individual affected by the AI system", "Civil liberties advocate", "Legal counsel", and "Data scientist".]
And another that helps organize thinking about who the stakeholders involved in the AI system are, from software developers to privacy and security experts to users, individuals affected by the system, advocates and more. It's a really helpful tool for organizing government thinking about this complex new terrain.
Countries are not the only jurisdictions getting into the AI procurement and regulation domain. As of 2021, several U.S. states had introduced legislation to govern some aspects of the use of AI in government.
[A San Francisco Chronicle article headline reads "Oakland bans use of facial recognition technology, citing bias concerns". A Vox headline reads "Illinois says you should know if AI is grading your online job interviews"]
From outright bans on the use of facial recognition by police departments to requirements that the use of AI by governments be cataloged and monitored. And entire multi-country regions are working to coordinate on AI policy.
[A photo of the European Parliament Building. Text appears over it reading "Artificial Intelligence Act (2021)".]
One of the most active jurisdictions on data and AI regulation, the European Union, as of 2021, had yet to move beyond general non-binding ethics guidelines for trustworthy AI, specifically addressed to procurement. But, in the spring of 2021, the EU introduced one of the first comprehensive legislative frameworks for the use of AI, and it will apply to private and public sector entities in the same way. In its current form, the Artificial Intelligence Act would prohibit certain AI applications outright and impose obligations in respect of others, depending on the level of risk to European citizens' health, safety, and fundamental rights. The Act divides AI systems into three categories.
[The slide reads "Unacceptable risk".]
The first is unacceptable risk systems. These are ones that do things like manipulate individuals through subliminal techniques, or are used for social scoring, or perhaps for real time remote biometric identification by law enforcement.
[An image of a BMO TransUnion Credit Score with the words "High Risk". A Forbes article headline reads "How AI Is Impacting Operations At LinkedIn". An Independent headline reads "Eurostar Trials Touchless Biometric Identification System Using Facial Recognition".]
The second category is high risk systems. These are ones that evaluate things like consumer creditworthiness, or are used in recruiting or employee management, for biometric identification in non-public spaces, in safety critical systems, or in the administration of justice.
[A screenshot of a customer service chat window. Words over top read "Limited and Minimal Risk". The image switches to a man scanning a box in a warehouse with a barcode scanner.]
Then there are limited and minimal risk AI systems, which include many of the AI applications currently used throughout the business world, such as AI chatbots and AI powered inventory management.
[A new slide is titled "EU: Artificial Intelligence Act". An icon of a stop light is red. Text below reads "Unacceptable risk systems would be banned in the EU."]
Systems in the unacceptable risk category would no longer be permitted in the European Union if this act passes.
[The stop light changes to amber. The text under it changes to "High risk systems would require significant oversight".]
High risk systems would be subject to the largest set of requirements, including human oversight, transparency, cybersecurity, risk management, data quality monitoring and reporting obligations. Organizations would be required to conduct conformity assessments for all high-risk AI systems to verify whether they meet applicable regulations and other relevant standards.
[The stop light changes to green. The text under it changes to "Minimal risk systems would have fewer requirements."]
Systems seen as posing minimal risk would have significantly fewer requirements, primarily in the form of specific transparency obligations.
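The three categories and their consequences, as summarized in this talk, can be laid out as a simple lookup. This is a minimal sketch of the talk's summary only, not the text of the proposed Act: the short use-case labels, the `RiskTier` enum and the `treatment_for` function are hypothetical names invented for the illustration.

```python
from enum import Enum


class RiskTier(Enum):
    UNACCEPTABLE = "unacceptable risk"
    HIGH = "high risk"
    LIMITED_OR_MINIMAL = "limited and minimal risk"


# Example uses mentioned in the talk, under hypothetical short labels.
EXAMPLE_USES = {
    "subliminal_manipulation": RiskTier.UNACCEPTABLE,
    "social_scoring": RiskTier.UNACCEPTABLE,
    "creditworthiness_evaluation": RiskTier.HIGH,
    "recruiting": RiskTier.HIGH,
    "administration_of_justice": RiskTier.HIGH,
    "customer_service_chatbot": RiskTier.LIMITED_OR_MINIMAL,
    "inventory_management": RiskTier.LIMITED_OR_MINIMAL,
}

# Consequences per tier, paraphrasing the slides in the talk.
TREATMENT = {
    RiskTier.UNACCEPTABLE: "would be banned in the EU",
    RiskTier.HIGH: "would require significant oversight, including conformity assessments",
    RiskTier.LIMITED_OR_MINIMAL: "would have fewer requirements, mainly transparency obligations",
}


def treatment_for(use: str) -> str:
    """Look up the tier for a labelled use and describe how it would be treated."""
    tier = EXAMPLE_USES[use]  # raises KeyError for a use not listed above
    return f"{tier.value}: {TREATMENT[tier]}"


if __name__ == "__main__":
    for use in ("social_scoring", "recruiting", "customer_service_chatbot"):
        print(use, "->", treatment_for(use))
```

In practice, deciding which tier a real system falls into is the hard part; the lookup only shows how obligations follow once that judgment has been made.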
As with its major effort in data regulation, which was the General Data Protection Regulation, or GDPR, and which is responsible for the burst of cookie approvals you now have to click on several times a day, the EU's effort could potentially shape how many countries around the world set AI policy.
[Text appears that reads "Emerging Procurement Frameworks are Ambitious".]
A couple of things seem clear from looking at this landscape of emerging procurement frameworks. One is that they are very ambitious, as they probably should be. We all want pretty much the same things from our AI in government. We want it to be fair and unbiased, transparent, reliable, efficient, accountable, and to be built using the right kind of data and to be monitored and overseen by humans all along the way. But at the same time that these frameworks identify these ambitious goals, it's also pretty clear that a lot of government officials asked to implement this vision will be wondering where to start.
[Text reads "Where to Start?"]
Our frameworks and guidelines are, for now, just that. They set out fairly general and, let's face it, vague principles. The EU framework, which as of 2021 was still just proposed legislation, provides somewhat more concrete structure, but it's still dominated by fairly general principles.
[A stacked bar graph titled "Country and Regional AI Laws and Regulation".]
In this sense, government procurement guidelines are like the hundreds of sets of guidelines and principles that we have seen emerge over the past several years in industry and corporate codes of conduct.
[Words appear that read "How do we translate principles into practice?"]
The real challenge for procurement policy we now face is like the challenge we face in the private sector and civil society. How do we translate principles into practice?
["Lessons in moving from principles to practice" An ACLU headline reads "New York City Takes on Algorithmic Discrimination".]
Here's one cautionary tale about the challenge of translating ambitious visions and principles into practice. New York City was an early mover in the AI governance world, establishing a task force in 2018 to make recommendations about how the city should manage automated decision-making systems. The task force drew on a diverse group of representatives from government and partners from the private sector, non-profits, advocacy groups, and research communities.
[A Vox headline reads "New York City couldn't pry open its own black box algorithms. So now what?"]
They issued their report in November 2019, but many felt the effort didn't deliver on its goals.
[A Fast Company headline reads "The First Effort to Regulate AI was a Spectacular Failure." Another headline reads "NYC's algorithm task force was 'a waste,' member says".]
In fact, one member called it a spectacular failure and another called it a waste and a sad precedent. What went wrong?
Critics felt like it never got past broad generalities and did not produce the disclosure many sought of where algorithms were in use in city government already. But the biggest challenge was perhaps the threshold one.
[A screenshot of an NYPD police report.]
What is an automated decision system? Does an automated procedure designed by humans and written on paper, old-fashioned style, like the NYPD Patrol Guide count?
[A screenshot of an Excel spreadsheet.]
City officials worried that every Excel spreadsheet used to automate the millions of decisions reached by all of its many offices would get swept into a new regulatory oversight regime.
But other stakeholders, notably civil rights advocates and researchers, didn't want to exclude everything except fancy machine learning systems from oversight because, as one put it, some of the most powerful forms of automation still run on Excel or in simple scripts. You don't need a multimillion-dollar natural language model to make a dangerous system that makes decisions without human oversight and that has the power to change people's lives.
The devil is in the details. And any concrete efforts for the use of AI in government will require getting into many, many details.
[The slide reads "Cost of Effort can discourage engagement from the start."]
There's also the problem that asking government agencies to engage in extensive assessment and documentation and consultation can discourage engagement with this challenging agenda from the get-go.
Canada has a cautionary tale in this regard.
[A Globe and Mail headline reads "National Defence skirted federal rules in using artificial intelligence, privacy commissioner says".]
Although Canada was the first country to require algorithmic impact assessments for federal procurement, in 2021 the Canadian Privacy Commissioner reported that the Department of National Defence tested the use of AI without conducting and publishing the assessment. In fact, a newspaper report found that as of February 2021, only one government agency had conducted an AIA, and that was the government office that produced the requirement.
[The slide reads "Where to go next?"]
Perhaps that's because answering questions like, "Are clients in this line of business particularly vulnerable?" and "Are the stakes of the decision very high?" is perhaps easier said than done. Or at least, they're not meaningfully answered yes or no. Now, of course, the assessment is only intended to prompt decision making. But for many in government, it's not clear where to go next. Now that's a challenge we all need to be working on together.
[Words read "Possible Tools" Bullet points appear as she reads them off.]
Some of the tools I'm developing with colleagues around the globe include certification systems and metrics, automated monitoring, and assessment tools, and what I call regulatory technology, like AI systems that can identify potential biases in an automated decision system, or computer software that can achieve compliance with requirements for privacy and security.
[A purple line points from the list to the text "Public-private partnerships"]
Building those technologies will require public-private partnerships to create the incentives to make the investment these technologies will require. There's a clear need and role for governments to help foster these techniques to help solve the responsible AI governance challenge, not only in government procurement, but in the private sector and civil society more generally.
And ultimately, we'll need to make sure that the determination of what we want from our AI systems is securely under democratic oversight.
[Words appear over the image. "Governments play an essential role in translating principles to practice."]
Governments, as representatives of citizens and residents, are the ones that will need to translate broadly stated principles into concrete standards and metrics to ensure that we achieve the ambition of AI systems in government that are fair, responsible, effective, and appropriately deployed.
[She smiles and fades out. A purple title card with a starburst reads "Artificial Intelligence Is Here Series" Neil reappears, filling the screen.]
Welcome back. Thank you to Gillian and to Peter for those opening remarks, those are really interesting frameworks to help us think about the use of AI in government. It's great to be able to reflect on that, and I think it helps set a good context. So now, as promised, we've got two experts in the field who are going to help us to talk through some of the issues and-- and help us to further develop some of these ideas. We're also going to take your questions, so please use the chat function to get your questions in early.
[He's joined in the video chat by two more participants. As each of them speaks, their chat panel briefly fills the screen.]
So we have here Cary Coglianese and Dan Ho, both professors in their own right. I gave a little bit more introduction at the beginning, but maybe the first thing we could do is just really turn to each one of you and ask you to more properly introduce yourself, if there's anything else you'd like to say about the work that you're doing and the context you come from. And also, just your initial reactions to what Gillian and Peter laid out. Maybe, Daniel, I'll start with you if you don't mind. Just a word of introduction and a little bit of reaction to what we've just heard.
[Daniel speaks, filling the frame. Shortly after, a chyron appears in the lower-left that reads "Daniel E. Ho. Stanford University."]
Sure, I'm a professor at Stanford, where I'm also associate director of Stanford's Institute for Human-Centered AI, and I direct Stanford's RegLab, which actually engages in the kind of public-private partnership that Gillian was alluding to. We partner with government agencies around these kinds of demonstration projects for how to build responsible AI through these academic-agency partnerships. So I thought the opening remarks were really interesting. Maybe just to give a little bit of context in how it relates to some of the work that Peter, for instance, had mentioned: with colleagues David Engstrom, [inaudible] and Cathy Sharkey, we had crafted a report that looked at the use of AI by the top one hundred and forty federal regulatory agencies in the US context, and roughly half of them were really earnestly experimenting with AI. But the Stanford engineers that were on this pretty large research team could only rate about 12% of those use cases as being high in sophistication. And to me, one of the real takeaways of that study was how critical it was to have personnel that could really interrogate the use of these kinds of systems. I'm happy to talk more about some of the demonstration projects we've done with the Stanford RegLab, but maybe just to give you one example of how important the personnel side of this is: when the Securities and Exchange Commission, as Peter noted, built out a number of these prototypes, what was very interesting to note was that ultimately the machine learning team had to deliver the output of those scores to line-level prosecutors who were building cases, and it was really the attorneys within that agency that were the folks interrogating the reliability of those scores, that were saying, "Hey, listen, you can't just tell me what the risk score is associated with this particular file of insider trading. I need to ultimately be able to build this case and make the argument to a judge. So you need to tell me what exactly triggered those risk scores and why I should be looking at that case." And that, I think, is so much of the future of really how we have accountable development in this kind of space, particularly in the public sector.
Fascinating. Well, we're going to want to come back to the question of explicability and transparency, especially for regulators who, you know, get litigated on every kind of decision they make. So absolutely fascinating. Cary, over to you. Just a word of introduction and any reactions to-- to the lecture.
[Cary speaks. A purple chyron appears in the lower-left reading "Cary Coglianese. University of Pennsylvania."]
Sure. And Neil, nice to be here. Thanks for the invitation. I'm Cary Coglianese. See, I'm on the faculty at the University of Pennsylvania Law School, where I direct the Penn Program on Regulation, and I've been working in the area of administrative law and regulation throughout my career. I've been doing a number of projects related to artificial intelligence and its use by regulators more recently. But it really grows out of actually a longstanding interest in the connection between information technology and the regulatory process. I mean, which began actually, maybe I shouldn't admit my age, but you know, before even we had websites for federal agencies and sort of just moving up into the 1.0 era of IT and regulation. So this is now well beyond that. But-- but still very much in that same line. And I'm interested, certainly, to be here today and happy to talk in terms of building on what the opening presentations were. I would, I would say, to piggyback a little bit on what Dan-- where Dan left off and-- and where Gillian left off, with these public-private partnerships, which is quite frankly the reality for most government agencies, they'll have to have some kind of capacity, usually coming in from the private sector. I think one of the really important areas to be aware of in setting up those partnerships is just the procurement process and setting up a contracting relationship with private sector consultants such that the government will be able to have some access to information to answer some of these questions about explicability and transparency. There was, you know, in the United States, litigation that emerged in Texas, in the federal court in Texas, over the use of some algorithms to assess teachers and their performance in the Houston Independent School District. And this went to litigation, and the teachers were claiming due process violations: we really need to understand how these algorithms are working, how they're rating our-- our-- colleagues. And-- and yet, the private sector firm would not share anything. They were even in negotiation over in-camera investigation of the algorithm, and the-- the-- the private contractor was resistant to it. That was, I think, in that particular case, really instrumental in the judge ultimately coming down with a decision on a preliminary issue, but an important preliminary issue for the teachers, and then that led to ultimately a settlement. But-- but it was largely because, I think, the-- the school district really hadn't contractually arranged with the consultant to make sure that they could get access to, in litigation at least, key information that would allow them to defend themselves. So, I think you have to go into these public-private partnerships with eyes wide open, anticipating not just the more immediate needs of getting the project done, but also thinking ahead to, well, what happens if there's some controversy, or a lawsuit even, over this project? Will we be able to-- to, you know, overcome the claim of the private parties to some kind of trademark or IP protection for their-- their algorithm and their data and their-- their work?
Fascinating. I mean-- -Go ahead.
I'm just going to respond to that a little bit. I think Cary is absolutely right. The procurement lever here is really important for the public sector. Just to give another example, in addition to the Texas one that Cary mentions, Customs and Border Protection had procured a number of services for biometric technology to be used for-- for border entry. And in one of the iris scanning sort of applications, there were pretty significant errors, and the contractor refused, kind of claiming intellectual property protections, really to divulge what the source of those errors was.
That's obviously going to be unacceptable in the kind of public sector context where there is a kind of responsibility under typical administrative law principles to explain. And-- and of course, there are the sort of due process principles that loom over much of administrative law that Cary mentioned. So I think getting that form of procurement right is important, and not thinking about procurement as a one time-- sort of one-shot form of contracting, because a lot of these AI systems evolve over time. As more data comes in, you have to update the systems, integrate new streams of data, and all of that should really be built into the procurement contract so that you're not left with a kind of one-time model that becomes really hard to adapt as context changes.
So can I ask you guys a naive question, then? I mean, you can interrogate a decision maker. They're answerable to a political committee or oversight committee or others, but how do you interrogate an algorithm? How do you test that an algorithm is being done in a fair manner? Which of you would like to start?
Well, I mean, first of all, I mean, there's just access to information about the algorithm, how it is structured. First of all, what's the-- what's the objective function that's established that it's trying to accomplish? How is that defined mathematically? So you know that-- that's-- that gives-- gives you some insight into this. I mean, interrogation, you know, is-- is the word you use. I guess I would first start by saying just information disclosure. So knowing what it's trying to do, how-- how it's actually going about trying to optimize that objective function. What are the underlying data sources that are being used? What were the efforts to verify and validate the efficacy of-- of the algorithm? So there's documentation, there's output tests, there's ways of calculating errors. I mean, the underlying information surrounding it can be released. In some instances, as you know, Neil, under the Canadian AI directive, there's impact assessment that, uh, that provides a way of disclosing information about how the algorithm is structured and what it's doing.
Yeah, just to build on that a little bit, there's sort of the evaluation of the data on which an algorithm is trained, and the model, which principally is about the objective or the outcome of interest, the features that are being used and the kind of modelling approach that is being deployed. And so that's all part of the process of algorithmic design. But then there's a suite of all sorts of interesting-- useful tools to understand the potential weaknesses. Right now, we still live in the age of fairly narrow AI, but that can be brittle. So there are kinds of-- sort of ablation techniques and-- and tests one can do to see, for instance, whether a computer vision model is really just learning from the background or whether it's learning the thing that you actually care about.
So, for instance, a really simple example of this that Pietro Perona, I think, gave was: does a computer vision model really learn to understand what a cow is? And one way to test that is to take the image of a cow and put it against the background of a beach. And if it doesn't recognize it as a cow, it's really learning a lot of shortcuts that are suggestive of its inability to kind of transfer into different domains. That's really important in the governmental context. We showed in a paper that came out in Nature Medicine with James Zou and co-authors that most of the ML-enabled devices currently approved by the Food and Drug Administration are evaluated only on a single hospital context, and that-- that performance degrades really significantly when that model is ported over to other hospitals. So getting that form of evaluation right is really important. The last thing I should-- I should note in response to your question is, Neil, you had mentioned sort of the question of bias, which is, of course, central to understanding the reliability of-- of these kinds of approaches. And so an absolutely key approach here is to test for the performance across demographic subgroups. I'd also actually love to talk a little bit further about some of the notions of evaluation that Cary had already mentioned when mentioning the Algorithmic Impact Assessment, but we can bracket that as our discussion continues.
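As a rough illustration of what testing performance across demographic subgroups can look like in practice, here is a minimal Python sketch. It is not drawn from any system discussed in the session: the function name `accuracy_by_group`, the group labels "A" and "B", and the toy data are all hypothetical, and accuracy is only one of many metrics an evaluator might compare.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy} for predictions split by subgroup label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        if truth == pred:
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}


# Hypothetical toy data: true outcomes, model predictions, and a
# subgroup label for each case. Real evaluations would use held-out data.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

per_group = accuracy_by_group(y_true, y_pred, groups)

# A large gap between the best- and worst-served groups is a signal
# that the system needs closer scrutiny before deployment.
gap = max(per_group.values()) - min(per_group.values())
print(per_group, gap)
```

In this made-up example the overall accuracy looks acceptable, but splitting by group shows the model is right far less often for one group than for the other, which an aggregate score alone would hide.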
Okay, fantastic. Well, um, the one question I want to get to, and this is switching gears a little bit, but getting back to this idea of the future of work and of AI replacing versus augmenting humans, you know, there's lots of studies that say X number of jobs will be impacted either completely or somewhat by automation. I'd be interested to hear from each of you any views that you have about how we are, but maybe how we ought to be, positioning AI relative to the future of work and the work that-- that employees do. Maybe there, Dan, I'll start with you and then Cary.
Sure, in the report I'd mentioned at the outset, when we looked at public sector applications across the top 140 federal regulatory agencies, we actually saw very few instances of true displacement of humans in a system. So that category that Peter had, I think the one example he had mentioned was chatbots. Even there, I would query whether the chatbot is simply increasing touchpoints with an agency or truly displacing kind of call-in lines and whether there aren't complementarities between those two systems. And I guess that leads me to kind of the main point on this front, which is that I think right now we're still looking at an age where what is really centrally important is how to think about building out these use cases in a way to augment human capacity, not replace humans. And so most of these decision tools are sort of the things that aid in the exercise of human discretion. Still, we're not seeing, for instance, in the adjudicatory contexts, machine learning systems that completely displace the exercise of judgment. It's usually some information that is displayed to the decision maker to be able to make a more informed decision.
Well, you know, let me say that, you know, in some of the work that I've done in thinking through the legal issues about public sector use of AI, I have sort of imagined a future in which we might actually have machines taking over and substituting for humans. I think we're closer to that, in some areas, in some domains, than in others. But we're going to get there, I think, eventually and a lot more. So, yeah, I think I agree completely with Dan that this is not an immediate issue, but I do think that long term, and that long term could be 10 years, 15 years, even, we will start to see the replacement of humans in a lot of tasks in the public sector. I think that that obviously may, for many people who are participating in this-- in this discussion today, seem a little unsettling, perhaps. But I also want to suggest that it gives us an opportunity. It gives us an opportunity to kind of replace humans in doing a lot of drudgery work. And maybe repositioning this-- repositioning humans to make greater touchpoints, is the word Dan-- Dan put with the chatbots. But actually making government actually more responsive and more empathic overall is an opportunity we have, actually, if we can substitute AI, you know, for-- for handling that drudgery work. We actually have a great deal of opportunity to make government seem more responsive to people than it currently is. And let us be clear, right. You know, we can imagine this future of a government by robot and it may seem sterile and-- and really uncaring. But the reality is today we have often a pretty sterile and uncaring set of bureaucracies in many countries around the world. The citizens' interaction with government offices is often not a very [chuckles] nurturing, empathic-- empathic one.
And so I think what we should do, the real challenge going forward is to think about ways of swapping out and using machines where-- where they can do things well and saving humans and maybe even bolstering our efforts for humans to do what humans can do really well, which is to listen and engage, and show empathy.
If I could sort of just add to that, one of the, to me, most interesting use cases of AI in the federal government, and a kind of really amazing story of public sector innovation in the US context, comes from the Social Security Administration, which, I think, highlights some of the points that Cary is making. We have to talk about robot judges, but in actuality, the way that machine learning is being deployed is really sort of as a way to assist adjudicators. And there was a guy by the name of Gerald Ray, a co-author of mine; I've written a little bit about this. But he, at the time that he was basically heading up the Appeals Council over at the Social Security Administration, realized that this was an agency under extreme stress. There's 1600 judges. They've got a half a million cases that they're processing. There are well documented instances dating back 50 years of the arbitrariness of decision makers. Professor Jerry Mashaw wrote that essentially it matters more who the adjudicator is than what the facts of the case are. Gerald Ray sort of realized that the agency was at this point where it was really just producing written decisions and realized that it wasn't investing in the kind of data infrastructure that could really be a kind of game changer for helping the agency understand where errors were being made. And he made a series of really important investments to capture that information. He ultimately built a team that developed a prototype of a natural language processing system called the Insight System that now allows adjudicators to upload draft decisions. And it will process one of roughly 30 quality flags to understand whether or not there are potential errors in decision making. It doesn't change anything about the decision. It flags those for the adjudicator. So judges can know, have I cited a provision of the US code that doesn't exist?
Or is there potential inconsistency between the functional impairment that I identified in the statement of facts and the ultimate legal conclusion in the case? And that, I think, highlights some of what Cary was mentioning, which is this ability to augment human decision makers, but also relieve them of some of the challenges of high volume case adjudication. And therein is the really interesting kind of potential here for AI to actually go back to the roots of the procedural due process doctrine, where if you could improve accuracy through these kinds of decision making tools, you may actually start to revisit some of the US Supreme Court's anchoring of due process in accuracy alone, and actually free up judges to have more time for hearings, to actually convey legal principles and engage with claimants in a way that right now the system is very strained to do, and to actually potentially create a kind of adjudicatory system that does not feel as stale and unsympathetic as it may currently seem to many claimants.
So it sounds like people don't need to worry about algorithms taking their jobs, but the people that understand algorithms are going to be more effective and efficient, probably, than those that don't. So one lesson I'm taking from what both of you are saying is that we should all learn to use these tools, much like an accountant would have learned to use Excel or a calculator. We should be using these tools to enhance our-- enhance our-- the quality and number of decision making. That's fascinating. Both of you are legal experts. You've spoken about the legal system. There is a question from the audience about the question of-- can you give a short overview of the challenges faced when using AI in the court system? The tribunal system, or the legal system. So I know you've touched on this already, but maybe I could just invite you both. And maybe, Cary, I would start with you. If you could just highlight for us a few considerations when using AI in the court and legal system.
Sure. Let me just build into that by, um... adding to the last point that Dan was making about using AI to augment and improve adjudicatory processes in the administrative state. You know, one pathway forward is, as I was indicating, you know, take care of some of that drudgery work to augment human decision making and free up time to engage with the public. But there's also just the possibility that the public really just wants an answer. You know, we have some administrative tribunals in the United States where people are waiting years for a decision. And it may be that just by automating some of these adjudicatory processes, we can get faster decisions and that will be really satisfactory and satisfying to most members of the public. And I just point-- want to point to a private sector analog, if you will, to adjudication. A dispute resolution tool that PayPal and eBay have developed that's fully automated, and there is no human, you know, providing any empathy at all. But nevertheless walks-- when people have a dispute over a contract that they've engaged in with some other private party via eBay, they can go through this automated dispute resolution process and resolve the matter pretty quickly without any humans getting involved at all. And what eBay, at least, reports is that the customers who end up having a dispute and having that dispute resolved through this automated process, well, they have a greater tendency to want to come back and do more business with eBay than someone who never had a dispute at all. So it's possible both, it seems to me, to improve the efficiency through automating processes as well as to free up human space for other tasks as well. In terms of, you know, what are the challenges of using, uh, AI in the adjudicatory context in tribunals? Well, I should say that, you know, a paper that I have coming out looks at AI in adjudication and administration.
And what we see in the court system in the US, at least, is that there's some building blocks moving toward the possibility of online dispute resolution. But-- but we're not really there quite yet. And what are the building blocks? I mean, we've got a lot of digitization of court records, for example, that is necessary. So, you know, what are the challenges? And one thing that's needed is you need data. And so if-- if you still have courts that are, you know, filing everything in paper, well, you know, you're going to need to get that-- that digitized. So that's one thing. The other thing, too, is that in adjudication, there are, you know, really kind of sui generis type of decisions, you know, one-of-a-kind sorts of scenarios, that have to be made that, I think, will inherently be limiting for any AI tool to-- to really address. What AI will be capable of doing is giving you some good answers when it comes to questions that are repeated and where there can be large data sets developed. But as to whether, you know, Tom Smith was at the-- the bank at the time the bank was robbed, you know, that-- that kind of individualized, very personalized decision that ultimately rests upon an adjudicator is probably more of a one-of-a-kind decision that will-- Now, in building the case for that, will there be possibilities for using machine learning to analyze the DNA or evidence that was found and make some judgments about how likely it is that-- that evidence belongs to the defendant? Those kind of sort of building block pieces for a case can be there. But I think that integration in these sui generis situations will always need some-- ultimately, I think, some human judgment to-- to be made.
Yeah, I think Cary is right there, particularly for forms of agency adjudication that have been known to take years and years to come to a conclusion.
So in the veterans adjudicatory system, in the US context, it can take five or more years from the time that a notice of appeal is filed until you have an appeal fully resolved, and there are just inherent costs to that kind of delay. The vast majority of veterans claims in that kind of a system are about claims for disability compensation for injuries sustained in service. And so the very demand for a written explanation that the American system sets up can itself be costly to claimants, and one has to trade that off against a more expeditious system to resolve these kinds of appeals. And the-- the question that was asked was about some of the challenges with the use of AI in-- in the legal system. And I think that's still something that we're very much wrestling with. So I'll just mention a few of the kind of main sort of areas of contestation currently. And I think the one interesting case here is the Wisconsin Supreme Court's case that contemplated whether the use of a proprietary risk assessment algorithm in the criminal justice system violated constitutional rights and the right to a kind of individualized decision. And so some of the main kind of areas of concern are about due process and what kind of information should be disclosed to litigants to be able to successfully defend themselves against something that was influenced by an algorithmic decision-- decision tool. The other main question that arises is a question of kind of equal protection and other sources of anti-discrimination law, for instance, for criminal risk assessment scores. One question is to what extent can gender be used in these kinds of tools, given that there are apparent known differences in the likelihood of recidivating across gender lines?
I guess the last point I would make on-- on this front is that as these kinds of decision tools are used, and I think they're much more likely to be used in forms of agency adjudication because the case volume is just so much higher, there's much more information to-- to kind of contend with. One of the things that we-- that these systems will start to uncover is that the kinds of human labels that systems are often trained on, for instance, to make a decision about whether this is a likely grant of disability benefits, may turn out to be wrong. And so one of the really cutting edge challenges is going to be how to build living systems that enable you to properly update, reweight and secure higher quality labels, as the very decision making tool that you're incorporating into an adjudicatory system is telling you about some of the weaknesses of the existing human adjudicatory system.
You know, there's also a possibility, to follow up on that, that over time, we'll lose the human ability. You know, a little bit like how people have a harder time reading maps and following directions now that they could just rely on Google to tell them exactly where to go. So we do have to realize that as we develop some of these systems, if they are starting to take over from humans, you know, we may lose some human capacity. I think, you know, there may be some domains in adjudication in particular, where we would want to keep on, just, you know, sort of for quality control purposes, maintain a degree of expertise, maybe a smaller cadre of-- of judges who are, you know, maintaining their expertise and providing continued validation to-- to the automated system.
Those are fascinating points. I love the point that, you know, normally we think about humans teaching algorithms and making algorithms better, but the idea that the experimentation of algorithms might make humans learn something is fascinating. And then also this idea that we have a duty to understand the basics and not let algorithms sort of take over in decision making and create a kind of collective memory loss about that. Those are fascinating. I also want to ask you both about experimentation. So you've seen a lot of solutions. Some of these are, you know, sound like they're actually put into practice. They've already gone to the courts and been litigated, which is amazing. But there's also the question of experimentation and learning in this area. So I would love to hear from each of you just in terms of what kind of set up or approach that you think public servants should take or create to experiment with these kinds of algorithms and to learn about them. And maybe a little bit what you've seen as a best practice or, for example, where you say, yeah, that's-- that's a really excellent approach so, Dan, maybe I'll start with you given your survey and then, Cary, turn to you.
Sure. I think when we kind of engaged with the many officials around kind of how they were approaching the R&D process here, one theme that came out was not just the lack of kind of capacity in many regulatory agencies, but also the real challenge of identifying skill sets that are at the intersection of domain knowledge and technical knowledge. And that is, in a sense, what was so ingenious about the SSA sort of form of innovation is that it came out of a constraint, which is that Gerald Ray was not allowed to hire anyone but attorneys [inaudible] counsel, and anyone who was a software engineer would have to be hired under IT. So he happened to just identify individuals who also had an aptitude for data science and software engineering, had them actually adjudicate cases for years until they were promoted and could be assigned to other duties under their job description. And so it turned out to be that you had this really unique blend of expertise that knew the innards of-- of kind of the adjudicatory system so that the main developer of the Insight system, Kurt Glaze, simply started out with the kinds of flags that he wanted to have available for himself in the shoes of an adjudicator that he had literally occupied for three years. I think that's one of the-- the areas where you're likely to-- to have a really affirmative culture to-- to be able to-- to pilot, prototype and test different potential models. I guess in terms of experimentation, the other thing I would add here, which I know is not necessarily the sort of R&D version of the question that you asked about, Neil, is that I do think it is absolutely critical for us to think about the evaluation standards of how these systems operate.
And by that, I don't just mean a narrow technical sense of evaluation, which is what we often focus on when we develop ML systems, but actually a rigorous evaluation of how these kinds of systems operate in practice in the human institutional context in which they are deployed. And we see this over and over again that, you know, when you give a judge a risk score, it doesn't directly translate into a change of decisions. We have pretty decent evidence these days that judges actually, well, under certain circumstances, overrule recommendations by algorithms. And so I think understanding the actual impact on human decision making will be crucial as these kinds of tools are stood up.
That's really playing the long game like having those folks learn the-- the whole process with the data science stuff that sounds like it takes years to do. I don't know, Cary, if there's anything quicker that you could recommend.
Well, certainly, I think, you know, where you-- where you can hire data scientists directly, you should, I mean, there's no need necessarily to have to do that kind of a workaround if you don't have to, right? And I think what Dan is saying is that really that was the only option available at the time. But-- but clearly, by the way, it highlights a really important message that I hope everybody here comes away with. And that is, even though we're talking about technology and advances in-- it's artificial intelligence that we're emphasizing, this is not fundamentally a challenge for technology. Yeah, I mean, you do need the data scientists. You do need the computing capability. You do need the data availability. Maybe you need identifying numbers to match data sets. I mean, there are a lot of technical aspects to it, but fundamentally to do this well, it's a human challenge and you need to build up the human capability within your government office, or department, or agency, or ministry if you're going to really do this well. So, by the way, also another point that I hope people can come away with is not only familiarity with the Canadian directive, but there are a slew of best practice principles and the like that are out there. The OECD has a set of best practice principles. Dan was involved in a process that the US Administrative Conference of the United States developed a statement on best practices all geared toward government use, so people should know about that. As far as evaluation goes, you know, I think there's value in what I would call kind of algorithmic shadowing. And that is setting up structures, so you continue with the status quo system as it is, but you build on this capability in the background separately to be seeing how your algorithmic system would be operating to the extent that it is, you know, you're aiming for it to become something that maybe could substitute for human decision making.
Before you do that, you do this shadowing process and you compare and see how well that system works against the status quo. You know, there are problems with AI systems. There are problems with bias. There are mistakes that AI systems make. What we hope for is that if we're going to shift to an AI driven government, increasingly, that it's performing better than what we have right now, because quite frankly, as we've already kind of indicated with some of the delays and so forth that are in the current system, the human system isn't perfect either, right? And a lot of like, quite frankly, a lot of the concerns about bias and discrimination from algorithms owes itself to these algorithms being trained on data that are human created and that have human biases built-- built into them. Anyway, all of this is to say that comparative question is really key. Yeah, AI compared to what-- it's got problems, but compared to what? And if you can do a kind of a shadowing process, you can really make that comparison explicit in an evaluation framework and be able-- I think we owe it to the public, quite frankly, if we are going to automate certain governmental processes to be able to demonstrate, yeah, we actually have some improvement from it. That's-- that's really critical.
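The shadowing idea described here can be illustrated with a short sketch. It is only an illustration under stated assumptions, not a description of any agency's actual system: the record fields, the `reviewed_outcome` label (a hypothetical later quality review used as the reference answer) and the function name are all invented for the example. The human decision stays the decision of record, the model's output is only logged, and the two are compared afterwards.

```python
from dataclasses import dataclass


@dataclass
class ShadowRecord:
    """One case processed under the status quo with the model running in the background."""
    case_id: str
    human_decision: str    # decision of record under the existing process
    model_decision: str    # logged only, never acted on during shadowing
    reviewed_outcome: str  # hypothetical reference label from a later quality review


def shadow_report(records):
    """Compare the shadow model against the status quo on the same cases."""
    if not records:
        raise ValueError("need at least one shadowed case")
    n = len(records)
    agreement = sum(r.human_decision == r.model_decision for r in records) / n
    human_accuracy = sum(r.human_decision == r.reviewed_outcome for r in records) / n
    model_accuracy = sum(r.model_decision == r.reviewed_outcome for r in records) / n
    return {
        "cases": n,
        "agreement": agreement,
        "human_accuracy": human_accuracy,
        "model_accuracy": model_accuracy,
    }


# Made-up example data, for illustration only.
records = [
    ShadowRecord("c1", "grant", "grant", "grant"),
    ShadowRecord("c2", "deny", "grant", "grant"),
    ShadowRecord("c3", "deny", "deny", "deny"),
    ShadowRecord("c4", "grant", "deny", "grant"),
]
report = shadow_report(records)
```

A report like this makes the "compared to what?" question explicit: both the model and the existing human process are scored against the same reference on the same cases before anything is automated.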
David Engstrom, sorry, just to kind of respond to that a little bit, David Engstrom and I call that sort of a human alongside the loop approach, where right now you wouldn't have a major tech platform that would change the font of its website without doing some A/B testing on it. And that's essentially what you need in these kinds of systems. You've got a new adjudicatory tool. You want to kind of create a comparison group of cases that are adjudicated in the conventional way and cases that are adjudicated, augmented with this tool and see whether you're spotting any differences, whether it's improving along the dimensions that you care about. Case processing time, reduced sort of drafting efforts back and forth between the supervisor and line attorneys and that kind of stuff. The one thing I wanted to briefly respond to Neil, too, is that I wasn't suggesting that the only way to do this is actually the SSA kind of long game. I think, actually, what the SSA innovation case study teaches us is what some of the existing barriers are that really do need to be broken to enable more responsible forms of AI innovation. It's really, you know, truly a bureaucratic problem in the classic sense to have different divisions and you're not allowed to-- you can only hire the data scientist out of one [inaudible] and not, you know, in the Appeals Council, where the person might actually interact with the domain experts. And so the broader lesson there is that you really do need to have these kinds of interdisciplinary teams that bring hybrid knowledge that crosses technical and substantive domains to build out these kinds of tools responsibly. It's going to be a disaster if it's seen as, "Oh, the technical team will develop something in a year, they'll deliver us something and we'll adopt it into decision making."
Really, what the SSA case study teaches us is that much more of this is about iterative continuous development to understand what is technically feasible, what problem-- what substantive problem is worth solving, and where the kind of sweet spot of innovation lies.
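As a rough illustration of the comparison-group idea in this answer, the sketch below randomly assigns cases to a conventional arm or a tool-assisted arm and then checks whether the difference in an outcome such as processing time could plausibly be chance. Everything in it is an assumption made for the example: the arm names, the function names, the made-up numbers, and the choice of a simple permutation test. The speakers do not say what analysis was actually used.

```python
import random
from statistics import mean


def assign_arms(case_ids, seed=0):
    """Randomly assign each case to the conventional or tool-assisted arm."""
    rng = random.Random(seed)
    return {cid: rng.choice(["conventional", "tool_assisted"]) for cid in case_ids}


def permutation_test(control, treated, n_permutations=2000, seed=0):
    """Return (difference in means, two-sided permutation p-value)."""
    rng = random.Random(seed)
    observed = mean(treated) - mean(control)
    pooled = list(control) + list(treated)
    n_control = len(control)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the pooled outcomes to simulate "the tool made no difference".
        rng.shuffle(pooled)
        diff = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations


# Made-up processing times in days, for illustration only.
arms = assign_arms([f"case-{i}" for i in range(12)])
conventional_days = [10, 12, 11, 13, 12, 11]
tool_assisted_days = [7, 8, 6, 9, 7, 8]
difference, p_value = permutation_test(conventional_days, tool_assisted_days)
```

The same comparison can be repeated for each dimension that matters, for example drafting rounds between supervisor and line attorney, rather than processing time alone.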
Just to come back to an earlier point, too, about contractors, just-- I think that this should dissuade anybody from thinking you can just sort of call up a contractor, get a contract made and in-- and then within a couple of months, have one of these systems in place. To do it well and to be responsible does take really some tremendous thoughtfulness.
Well, I absolutely love these points and it pains me to think that the government is sort of functionally divided and that we know we need these interdisciplinary, agile teams to make progress. I'm going to draw from another question from you-- from our participants. And it's a question about backlogs. And if so, the question is-- is when AI is used to address backlogs versus traditional ways of addressing backlogs, which might be actual regulatory changes or administrative policy changes or other changes. Could this lead to a risk that we're not going to identify the pain points in our policies and our program design and our program delivery? So sort of paving the cow path through AI and not actually reforming the system? I don't know which of you would like to jump on that first.
I'm happy to take a first crack at that because I think it goes back, I think, to the discussion we had, Neil, on your first question about how we interrogate the use of these kinds of tools. That is, what's the objective that you're trying to solve? And it's absolutely the case that one could start off with the objective of just kind of, "Oh, let's reduce the-- the backlog." And one might naively kind of build out a system that would predict the expected processing time for a case and try to, you know, have an operations research type implementation where you route cases differently, potentially to-- to improve case processing times. But I think the question is a really good one, because if you've got that kind of team with blended expertise, that may not be the highest priority problem that is worth solving. And let me give you an example of it. We at the Stanford RegLab collaborated with Santa Clara County's public health department around COVID-19 contact tracing. Within a matter of weeks, the Santa Clara County Public Health Department, which is the first along with five other Bay Area counties to issue a shelter in place order in the country, had stood up within weeks, a unit of nearly a thousand contact tracers who are supposed to call upon a diagnosis and help people in the isolation and quarantine process. One of the very real challenges there was, that it was not possible when the system was stood up, given the current software limitations, to actually match the language skill sets of contact tracers with the likely languages of people being routed in through the system. That was a really significant challenge in terms of health equity because while roughly a quarter of Santa Clara County's population identifies as Latinx, over 50 percent of cases at baseline were for individuals who are Latinx, and there are really significant perceived language challenges.
What we didn't do in that design process was simply try to think, "Okay, we've got a thousand contact tracers. Let's minimize the processing time or predict which cases are easy to process and do." What we really focused on was health equity in and of itself. And so, what we did is we took the laboratory information as it came in, matched it with other administrative data, including census data, to predict the incoming language needs of each case. And then simply reroute them towards the bilingual contact tracers. The results were pretty stunning. We ran this in a randomized controlled trial and reduced case processing time by 14 hours and improved patient engagement as measured by the number of contacts that people offered during their interview. Now, one could have approached this purely as how do we reduce the case processing time? But really, I think, that part of the design process that was so important here was identifying precisely what I think the questioner noted as the pain point in policy or program delivery. And the pain point was particularly the language mismatch that many of the contact-- the contact tracers were facing, such that the folks who were hired on as bilingual contact tracers weren't actually able to utilize the skill set that they uniquely brought to the table.
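The routing step described in this answer, predicting each case's likely language need and sending it to a bilingual tracer, can be sketched roughly as follows. This is a hypothetical illustration only: the transcript does not describe the actual model or routing rule, so the `p_spanish` score (assumed to come from some upstream prediction), the threshold, and the capacity limit are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Case:
    case_id: str
    # Hypothetical predicted probability that the case needs a Spanish-speaking
    # tracer, assumed to be produced upstream from lab and administrative data.
    p_spanish: float


def route_cases(cases, bilingual_capacity, threshold=0.5):
    """Send the cases most likely to need a bilingual tracer to that queue first."""
    assignments = {}
    remaining = bilingual_capacity
    # Highest predicted need first, so limited bilingual capacity goes where it matters most.
    for case in sorted(cases, key=lambda c: c.p_spanish, reverse=True):
        if case.p_spanish >= threshold and remaining > 0:
            assignments[case.case_id] = "bilingual"
            remaining -= 1
        else:
            assignments[case.case_id] = "general"
    return assignments


# Made-up scores, for illustration only.
incoming = [Case("a", 0.9), Case("b", 0.2), Case("c", 0.7), Case("d", 0.6)]
assignments = route_cases(incoming, bilingual_capacity=2)
```

The design choice worth noticing is the objective: the rule targets the language mismatch itself rather than predicted processing time, which is the distinction the speaker draws.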
I just would say that, you know, some of our ways of dealing with clearing backlogs without AI can also miss those pain points too, right? I mean, we can just say, "Oh, let's just clear this, let's give everybody, you know, amnesty or whatever." It might be a blanket sort of clean the slate to clear the backlog approach. We'll also often not identify those pain points, either. I think we should assume, basically, that if you are going to be automating and creating an artificial intelligence system, you are going to have the ability to process things faster. I think that should be a given, and Dan's point is spot on, that the goal, then, shouldn't be clearing the backlog, but achieving the outcomes that we care about. And, you know, more accurately, or at least as accurately as is under the current system and not assume that it's going to be faster and cheaper because you're eliminating a lot of humans and a lot of processing time. But, you know, by the way, it also makes a point, I think, that's also worth highlighting here, is making sure that people understand that our current processes, they're algorithms too. They may not be machine learning algorithms, but, you know, the procedures that administrative agencies have to follow, those checklists, those are all algorithms, too. So our choice fundamentally in this space is one between a certain kind of algorithms that we're operating by today, versus some other kinds of algorithms that are going to be, I think, faster and cheaper. Then the question is, can we also achieve the substantive goals and the process fairness considerations that we want as well? And-- and in some cases, it's not going to be the case, but in, I think, many cases going forward, it's quite promising that we will improve our governmental performance through AI.
Well, I really like that both of you have emphasized sort of the algorithm's performance relative to what and the importance of setting up experiments and evaluations and to critically analyze the contribution and preferably around a narrow-- a narrow band rather than a vast question. I'd like to just combine a couple of questions that have come in, both of which are, you know, you've mentioned a few resources. So it's questions about those resources. So one question is "What are some of the leading think tanks, cities or universities that we should look at if we want to learn more about the use of adjudication around the world?" And the second question was "Where can we go to read more about each of your work?" So I'd like to give you just a couple of minutes to talk about some of your favourite references or sources of information. And let me also say that we were going to capture all of these links. And after this event, we'll put them in the email summary that is emailed to you all so that you can get those. So Cary, maybe I could start with you and just ask you if you could give us a couple of minutes of your favourite resources and then turn it over to Dan.
There's really, you know, I can say, no-- no shortage of-- of good outlets. The OECD is working on these issues. Obviously, and Dan's going to talk about what his group is doing at Stanford. They're doing a lot. Gary Marchant's group down at Arizona State are doing a lot about the soft law and governance issues. The University of Toronto's got a good group of folks. So there's-- And of course, you know, my own institution. We've got a very interdisciplinary group of computer scientists, engineers, data scientists and lawyers. We all interact as well. If you want to find out some of the work that I'm doing, go to the Penn Program on Regulation. That's www.pennreg.org. And yeah, I've put a number of papers related to artificial intelligence in the law that I'm happy to pass along, links to or whatever, if you want to pass those along to folks.
Fantastic. We absolutely will, Cary. Anything you give us, we'll pass along in the summary email. Dan?
Yeah, there's many wonderful institutions that are working around this space. If you're interested in some of the work that we're doing here at Stanford, the website for the RegLab is reglab.stanford.edu, and you can find there, for instance, a chapter that documents what the Social Security Administration did. We have the pretty lengthy report that we did for the Administrative Conference of the United States that covers the use of AI across federal regulatory agencies, but has several chapters, actually, on forms of adjudication. So, for instance, at least under US administrative law, informal adjudication, which are not sort of the-- the quasi-tribunal kind of proceedings are the vast majority of-- of-- of adjudications. So we have a chapter that goes in-depth into the tools that the Patent and Trademark Office has been trying to-- to build out to help its patent examiners and trademark examiners improve the examination process. I think some of these-- I think what I also heard in the question is, is the future a sense of how we do education in this space rather than having kind of siloed forms of technical education? How do we actually improve that? And I think Stanford's Institute for Human-Centered Artificial Intelligence is very much trying to do that. We have a series of courses that really bring together engineers, social scientists, and lawyers to work together on these kinds of problems. So you can find that at Stanford HAI and there's a series of courses that are listed there and the administrative conference report that we wrote was actually one of those products. We have around 30 students, half computer scientists and half law students who are working together to really understand the technical side and the legal side of each of these particular use cases.
Fantastic. And for the federal public servants, I should also mention that the Canada School of Public Service has a portfolio of regulatory AI projects that we would love to share with you, and we'll include a link for that in the follow up as well. So we're getting to the end of the-- of the session. And so I just wanted to really ask each of you to leave us with a core or a key message if you could leave us with one message. What-- What would that message be? So, Dan, maybe I'll start with you just if you want to leave all of our participants with one message.
I think there is tremendous potential for the use of AI in the public sector. Right now, we're living in a state of the world where there's a really sharp gap between where the private sector is and where the public sector is. So I could not imagine a more important enterprise of how to figure out exactly the kinds of questions that are being raised by this audience of how to actually build these kinds of teams within the public sector, where the gains in terms of social welfare can be so much more dramatic than at least some of the current private sector applications. As, I think, Jeff Hammerbacher put it, right now, we live in a state of the world where we're taking the world's brightest minds and having them think through the problem of how to make people click on more ads and at least the mission of the RegLab and that many of us are working on is how to think about that kind of responsible deployment within the public sector.
I think if we look at the histories of different technologies and how they've interfaced with the legal system, we see some parallels with what we're living through today with artificial intelligence. If you go back and actually look at how some of the courts responded to basic DNA typing early on, they were very skeptical of it and resistant to it. Same thing is true if you look at breathalyzers or speed-- automatic speed detector systems, a lot of resistance at first. And yet, today, these technologies are considered best practices state of the art, if you will, even. And I think I draw from the experiences of some of those technologies that have gone from a period of skepticism, and failures, and problems to acceptance and even, you know, being treated as the gold standard in adjudication today. I draw on that, and I would leave everybody with a hopeful vision for artificial intelligence, too. I mean, I think there's a lot of skeptics out there. There's a lot of critics. I think that criticism is really important and really good. That's one of the ways that these tools will get better. But there's a lot of what I think of as sort of a negative rights approach to AI right now, a lot of movement, political movement, to resist the use of AI by governments to give people a right to a human decision. Human decisions aren't always all they're cracked up to be, right? We, as humans, have our own failings, and biases, and problems. So I'm hopeful that actually we can start to build tools that are smarter than we are, that can augment what we're doing, maybe even take over for some of the things that we're doing where at work that can do it better and we can solve problems better. The public demands that. Maybe we'll find a time 15, 20, 30, 40, 50 years from now perhaps when instead of seeing a negative rights approach to AI, we'll actually see a positive rights approach. People will demand it. They'll expect it. Well, why don't we have AI solving this?
Why do I have a human making this critical decision that's going to affect me in my life? I want a good, reliable AI system that is, I think, the hopeful vision going forward. It's not going to be easy getting there. It's going to require more than just technology. It's going to require good human-- responsible human decision making about how to design, develop, evaluate, and validate, and continuously monitor, and improve AI systems going forward. But I think we can get there. I really do.
Thank you, both. This is really provocative and interesting and optimistic about the role of AI and one thing is for sure, I think we all need to be able to understand this discussion and to learn together to come into this, this new world of using AI based tools and approaches to inform our judgment to create, as Dan said, public value for everyone. So, Cary and Dan, thank you for this most fascinating discussion. Thanks as well to Peter and Gillian for their introductory remarks. This has been a really interesting discussion. We've brought this to you from the Canada School of Public Service in conjunction with the Schwartz Reisman Institute at the University of Toronto. They are a leader in this area, as was mentioned, and we'll continue to bring content to you together in this area. In the meantime, if you want to learn more about artificial intelligence, please check out the Digital Academy's resources on the website. Certainly, we want to bring you more events like these, where we can have amazing presenters and experts in their field speaking to us and discussing with us.
[A small chyron appears in the bottom-left that reads "canada.ca/school-ecole"]
So thank you as well for your questions. Also, your feedback is very important to us. We will be sending you an electronic evaluation. Please fill it out. We look forward to seeing you again next time. Check out our website at the Canada School of Public Service to look at the latest learning offerings and events. The next event in this series is on Monday, January 17. The topic is How AI is Transforming the Economy. And we will have registration details up soon and we hope to see you at that event. So once again, thank you very much for your participation today. Have a great week, everybody.
Thanks so much.
[The chat fades out, replaced by the Webcast Webdiffusion logo. The Canada logo appears.]