Demystifying the Generative AI Boogeyman

By Patricia Cartes Andrés & Lucía Gamboa, August 2023

Frankenstein was first published in 1818. In it, Mary Shelley imagines a scientist who creates life and is subsequently horrified by the product of his work. Victor Frankenstein goes on to fear the destruction of humanity if he caves in and creates a companion for his monster. Surely they would procreate and end the human race, his reasoning goes.

A hostile Artificial Intelligence (AI) takeover of humanity is a relatively common theme in pop culture. Over the last few months, we have been consuming narratives about the impending end of humanity and, conversely, about AI’s transformative capacity to improve our lives. It’s challenging to stay calm and collected when you are facing hurricane-force winds. As the global AI frenzy gives way to more reasoned and nuanced debate, it can be helpful to look back at how online safety, policy, and integrity specialists have applied Large Language Models (LLMs) in their fight against abuse. Over time, practitioners like us have gained an in-depth perspective on these nascent technologies and their applicability to the world of Internet safety.

We are part of a broader community of professionals who have seen the evolution of the industry and adaptability in tackling profoundly challenging online problems. In the same way that AI is not new, the use of AI and Machine Learning (ML) for detection and enforcement in an online safety context is not a novel concept either.

Patricia: operating in the trenches of nascent trust & safety teams 

I started my career in trust and safety before technology companies (eBay, in particular) coined the catch-all name for teams operating on the frontlines of the effort to moderate content and promote safety online. In 2006, at Google, we called ourselves the “Search Quality and Webspam” team. In 2009, at Facebook, we were known as “User Operations.” At Twitter, there were different streams of this same work, such as “User Safety Policy”, “Legal Policy”, or “Product Trust.” Eventually they would consolidate and embrace their common DNA as Twitter Trust & Safety.

The field developed along similar lines across content-rich companies. These new platforms enabled their users to upload content at lightning speed and without geographic or cultural barriers. As text and rich media began to find their way into profile photos, direct messages, and other shared spaces, teams around the world grappled with questions of what content should be allowed or taken down. These philosophical debates still lie at the foundation of the modern Internet.

Back then, organizations such as the Trust & Safety Professional Association didn’t exist, and content moderators often worked in company silos. We struggled with the challenges inherent in reviewing hundreds, if not thousands, of pieces of content daily. I will never forget my days moderating harmful content, and the frustration I would feel every time I had to review the same piece of illegal or abusive content over and over again. Before the introduction of new technologies and industry collaboration, content moderation was highly manual. It could feel like a never-ending game of whack-a-mole as bad-faith actors became increasingly determined to re-upload content we had removed. My team and I would talk endlessly about technical solutions to prevent illegal imagery, particularly the most egregious cases, such as child sexual exploitation, from being uploaded across user-generated content platforms. To make the work effective and scalable, we clearly needed a technological breakthrough, and thankfully one soon arrived.

PhotoDNA was created in 2009 by Dartmouth College Professor Hany Farid in partnership with Microsoft. At its inception, it was a hash-matching technology that computed a signature for online photos that had been deemed the “worst of the worst” by the National Center for Missing and Exploited Children (NCMEC). This digital fingerprint would enable companies deploying the technology to locate any matching photo wherever it had been distributed on the Internet, even if the photo had been altered. All of a sudden, we trust & safety specialists weren’t alone in our fight to combat child sexual exploitation online. For example, if I processed and reported an image to NCMEC and they incorporated its hash into the shared PhotoDNA database, counterparts of mine at Google, Twitter, Flickr, and other companies would not have to review that same content. In effect, they could remove the illegal content automatically. Over the next decade we would see machine learning models transform the way we detected and dealt with harmful and illegal content across the Internet.
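PhotoDNA itself is proprietary, so the sketch below is only a simplified illustration of the hash-and-match idea, assuming the open-source Pillow and imagehash libraries and a hypothetical list of previously reported hashes; it is not how PhotoDNA actually computes its signatures.

```python
# Simplified illustration of hash-based matching. This uses an open-source
# perceptual hash (imagehash), NOT PhotoDNA's proprietary, far more robust
# signature. File names and the hash list are hypothetical placeholders.
from PIL import Image
import imagehash

# Hypothetical list of hashes for previously reported images.
known_hashes = [imagehash.phash(Image.open("reported_image.jpg"))]

def matches_known_content(path: str, max_distance: int = 8) -> bool:
    """Return True if an uploaded image is close to a known hash."""
    upload_hash = imagehash.phash(Image.open(path))
    # Subtracting two ImageHash objects yields their Hamming distance.
    return any(upload_hash - known < max_distance for known in known_hashes)

if matches_known_content("new_upload.jpg"):
    print("Match found: block the upload and escalate for review.")
```

Comparing by Hamming distance rather than exact equality is what lets a match survive minor alterations such as resizing or re-encoding, which is the property that made this approach so useful for moderation teams.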

Lucía: protecting against foreign interference and misinformation

My tech policy journey started with a bang. I joined Twitter in 2018 amid the Cambridge Analytica scandal. Although that particular scandal focused on another company, the belief that online ads had been the main vehicle for influencing the 2016 U.S. elections became increasingly prominent in political circles in the wake of Facebook’s crisis. The U.S. midterm elections were fast approaching. Companies were rushing to establish advertising transparency centers in anticipation of potential regulations, although those proposals never became law except in a few U.S. states. Companies also faced challenges in verifying the identities of advertisers. At the time, we were authenticating advertisers by mailing physical letters to confirm their identity and location, but it was slow and cumbersome work. Something had to change.

Better alternatives and complementary solutions emerged. I drafted one of the industry's earliest policies on state-media advertising. The policy prohibited the paid promotion of state-backed content where states exercised control over editorial decisions through financial resources or political pressure. The need for this policy became apparent during the 2019-2020 Hong Kong protests, when state media presented a narrative at odds with events on the ground. The approach was later expanded to include labels on state-media accounts, giving people context and allowing them to make informed judgments about the content shared from those accounts.

Civic integrity became the term for the need to protect conversations from interference and manipulation, particularly during elections, and mature programs and teams emerged across the industry. The ML algorithms that had been used to detect harmful content could, in theory, also help identify potential misinformation for labeling. Furthermore, with increased funding for machine learning and ethics teams within social media companies, it became possible to test hypotheses about biases in timeline algorithms, and to push content moderation toward a context-led paradigm rather than a removal-only paradigm.

Fast forward to today, and we now have third-party AI prototypes that can serve as a one-stop shop for knowledge on policy, community guidelines, and integrity tools. We are starting to see LLMs that enable new and growing platforms to launch globally robust policies from the outset and assist with enforcement. Open-source AI moderation models also help identify and flag instances of moderation evasion across platforms, stopping repeat bad actors from returning to services once their accounts are suspended. On the account verification side, while limitations persist and there are still policy trade-offs to consider, the market is now saturated with established companies offering more scalable solutions.
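To make that concrete, here is a minimal sketch assuming the Hugging Face transformers library and the publicly available unitary/toxic-bert classifier (one open-source option among many); the helper function and review threshold are hypothetical illustrations, not any platform's actual pipeline.

```python
# Minimal sketch: scoring user-generated text with an open-source
# moderation classifier. "unitary/toxic-bert" is one public example;
# comparable models could be swapped in the same way.
from transformers import pipeline

classifier = pipeline("text-classification", model="unitary/toxic-bert")

def needs_review(text: str, threshold: float = 0.8) -> bool:
    """Route a post to human review if its top toxicity score is high."""
    prediction = classifier(text)[0]  # e.g. {"label": "toxic", "score": 0.02}
    return prediction["score"] >= threshold

print(needs_review("Have a wonderful day!"))  # expected: False
```

In practice, scores like these typically feed routing and prioritization decisions for human reviewers rather than fully automatic removals.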

We have come a long way since the early days of content moderation, including in providing a more context-rich timeline. Trust & safety professionals, as well as academic experts advising on the evolution of content moderation and online safety, have collectively contributed case studies, escalation guidelines, research papers, bespoke technological solutions, third-party engagement, and government engagement. These efforts have been essential reminders of the need for a collaborative, multistakeholder approach to Internet governance.

Right now, the challenge is AI moving from Internet safety work streams into the consumer layer of the Internet. The overarching concern is that the core technological development of AI is outpacing the creation of robust mitigation systems and safeguards. How do we build scalable AI solutions to oversee AI? What ethical concerns does this raise? As at every other moment in the development of the modern Internet, we must anticipate that bad actors, and potentially the models themselves, will misuse and abuse these technologies to gain an upper hand. This is a challenging task, particularly when safety experts at new AI platforms warn us that we have not yet seen or fully comprehended the potential misuse of their products.

Automated troll bots can create harmful posts across a variety of platforms at a speed that would overwhelm any victim, and the ensuing hostile online environment is particularly difficult to manage. Algorithms can learn granular-level data about victims and generate highly targeted messages and content. Bad-faith actors can game safety protections and increase the virality of harmful content. Doxxing, swatting, online shaming campaigns, and sending unwanted deliveries to an address can all be automated. This makes highly personalized, but less prosecutable, forms of automated harassment more likely, requiring updates to criminal statutes and new frameworks to protect citizens.

GenAI models that generate fake news already exist. Deepfakes and vishing (voice phishing) have been used to influence the electorate and to disrupt democratic processes. On a recent joint webinar, the non-profit Thorn and the Tech Coalition, a global alliance to combat child sexual abuse online, warned of synthetic child sexual abuse imagery overwhelming trust & safety teams and law enforcement. They also raised concerns about whether existing legal frameworks are fit for purpose.

Conclusion

We can’t underplay the risks, but we must fight the temptation to be apathetic or despondent. After all, it’s up to us to lead the way, as we did in the early days of online content moderation. In that spirit of optimism, just look at these examples of established and collaborative forums that are already diving into these knotty issues and producing new ideas and recommendations:

  • Witness, a Brooklyn-based non-profit organization, is a leader in using media to support frontline defenders in their fight for human rights. Since 2018, they have been working on a framework that proposes holistic provenance solutions to combat the damage of deepfakes.

  • Partnership on AI has championed safety-critical AI frameworks and brought vital stakeholders to the table to develop programs on inclusive research and design, media integrity, fairness, transparency, and platform accountability.

  • The Undersight Board, a generative-AI policy testing system, helps policy managers, moderation leads, and operations teams to generate, test, and deploy localized inputs for trust and safety policy decision-making. 

We have always said we must develop multistakeholder frameworks to enable collaboration among civil society, the AI sector, platforms, academics, and governments. We have no choice. Governance of AI is not going to take care of itself. It requires commitment and steely determination. The good news is that the building blocks have been in place since the earliest days of online content moderation, although Claude or GPT-4 may have some good ideas on how to tackle these challenges, too!

To paraphrase Witness’ CEO Sam Gregory, our words create many of the harms we fear and it’s time to de-escalate the inflammatory rhetoric: “Let’s prepare, not panic.”

Join the conversation on LinkedIn and X. Tell us what you think: info@blueowlgrp.com
