Hacking for Good: Unmasking Bias in AI Models at Def Con 2023
As an economic developer I don't usually find myself trying to hack AI models. But thanks to a recent partnership between Microsoft and Black Tech Street (BTS), with support from Tulsa Innovation Labs, that is exactly what I found myself doing in Las Vegas last week. Let me explain.
I took up the gracious offer by BTS to join their contingent of more than 50 people attending the Def Con, the world’s largest gathering of hackers. As a member of the BTS delegation and Red team, our job was to prompt AI models to act harmfully.
The models are built to screen for hate speech and harmful language while also trained to detect a multi-layered iteration of racism, bias and bigotry that is subtly embedded in human systems and the training data of the AI model. This was in partnership with the White House Office of Science and Technology Policy and SeedAI.
Preparing for Def Con was weird. I spent days trying to figure out how I would secure my devices from getting hacked and considered buying a burner phone a couple of times. It’s custom at Def Con is to embarrass the “sheep” by hacking them and posting their username and password on a large interactive display board, known as the “wall of sheep.” Likewise, cash is always King at Def Con because anything that can be hacked will be hacked, so I had a wad of cash with me at all times.
I immediately plunged into the action at the AI village where my fellow delegates brainstormed and created interesting prompts to trip up the AI model. While I tried and failed a few times, I could hear others in the back of the room cheering their success.
Our jobs were to find flaws in the AI model, which is an interesting way to generate feedback to redoubt this nascent technology. I tried my best to prompt the model to retrieve personal data, but it did a good job of resisting my prompts.
Red teaming is a common method for revealing vulnerabilities in software, and I think that this approach of massifying and diversifying the prompts will prove effective and efficient in revealing the embedded bias in the AM models’ data. The potential for artificial intelligence to transform various aspects of human life is immense. However, the impact depends on intentional design that prioritizes human wellbeing. If properly designed, AI can contribute to an equitable redesign of human systems. Conversely, poor design can lead to significant harm due to foundational bias.
A significant takeaway from this experience is the need to mainstream, diversify, scale, and incentivize efforts to expose vulnerabilities in AI models. Fundamentally, it’s GIGO: Garbage In, Garbage Out. If the dataset we use to train AI models reflects the historic injustice that minority groups have suffered in our society, then we have just successfully bequeathed our discriminatory practices to AI.
This is a clear and present danger for AI and it underscores why BTS had one of the largest delegations at Def Con this year. It is also why TIL is building our cyber and data community in the shadow of the historic Greenwood neighborhood and “Black Wall Street.” Our approach is to build along with the whole community, not for the community.
Ladies and gentlemen, I returned home from the conference with my dignity intact! My name was not one of the 2,000 people whose hacked details were displayed on the “wall of sheep.”
Download file