An Indigenous Perspective on Generative AI

Justin Hendrix / Jan 29, 2023

Audio of this conversation is available via your favorite podcast service.

Earlier this month, Getty Images, one of the world’s most prominent suppliers of editorial photography, stock images, and other forms of media, announced that it had commenced legal proceedings in the High Court of Justice in London against Stability AI, a British startup firm that says it builds AI solutions using "collective intelligence," claiming Stability AI infringed on Getty’s intellectual property rights by including content owned or represented by Getty Images in its training data. Getty says Stability AI unlawfully copied and processed millions of images protected by copyright and the associated metadata owned or represented by Getty Images without a license, which the company says is to the detriment of the content’s creators. The notion at the heart of Getty’s assertion, that generative AI tools like Stable Diffusion and OpenAI’s DALL-E 2 are in fact exploiting the creators of the images their models are trained on, could have significant implications for the field.

Earlier this month I attended a symposium on Existing Law and Extended Reality, hosted at Stanford Law School. There, I met today’s guest, Michael Running Wolf, who brings a unique perspective to questions related to AI and ownership, as a former Amazon software engineer, a PhD student in computer science at McGill University, and as a Northern Cheyenne man intent on preserving the language and culture of native people.

What follows is a lightly edited transcript of the discussion.

Justin Hendrix:

Can you tell me just a little bit about the work you're doing, training Lakota youth? I read a little bit about that, and find it fascinating and would love to hear more.

Michael Running Wolf:

Yeah, maybe we start at the beginning. So I began this AI journey when I was working in industry, when I was at Amazon, and I really wanted to enable these technologies for Indigenous peoples. And as part of that, I was searching for colleagues and peers, or even just research analogous to my interests, specifically voice AI for Indigenous peoples. And I literally couldn't find anyone. And so working with the organizers of NeurIPS, which is a large AI conference, I created an affinity group. It's associated with NeurIPS, fiscally sponsored as a nonprofit. And as part of that, I came to realize how bad the statistics are. There's the Taulbee Survey, the CRA survey, and it checks how many computer scientists and data scientists there are in North America. They survey universities in the United States and Mexico and Canada, I believe. And it's the largest survey in North America on the subject.

And the stats are really bad. We only produce one or two PhD students per year, if we're lucky. And it's actually a recent phenomenon, so there may actually only be six or seven PhD students in computer science or related fields in all of North America who are Indigenous, whether First Nations, Indigenous peoples of Mexico, or Native Americans from the United States. And I started to think, "How do I address this? How do we as a community address this?" So I had friends in the ML space and we got to thinking. And one of my friends, his name is Mason Grimshaw, MIT graduate super-genius, had this idea: why don't we teach Lakota youth artificial intelligence and data science so that they can go on and join a career in AI? And that's where it began. It was an idea, it happened on a Tuesday. I guess he told me his idea.

And it was one of those conversations where it was a wild idea. He thought it was a wild idea, and he didn't think it was feasible. This is something we might do in 10 years. And I said, "I happen to be having a phone call tomorrow with one of my partner foundations, the McGovern Foundation. Do you mind if I mention this?" And he says, "Sure." I said, "Okay." So I talked to the McGovern Foundation. They had funded my AI work previously, and we were talking about other partnership opportunities we could collaborate on. And I mentioned this idea, and they loved it. They said, "Give us a pitch by Monday, and we'll see where we go from there." And so that weekend, that half week, over four days, we put it together. And it came together, surprisingly. It was very expensive. We anticipated the cost would be very high, and we were surprised that they were willing to foot the bill.

Justin Hendrix:

Can you tell me something about the students? What's it look like? What's the program look like for them?

Michael Running Wolf:

The long-term objective of the Lakota AI Code Camp is to produce more researchers, and produce more master's graduate students and more PhD students. And it has to start at high school. There's just this huge gap between high school and undergraduate. And so the students we are targeting are high schoolers, high school age in the United States, between 13 and 18 years old. And they come from South Dakota. We focus in on Lakota, my father's tribe. And so there are many different types of Lakota, different languages, different cultures, that made up the former Lakota Nation. And they make up North and South Dakota and Nebraska and Wyoming. But the terrain originally, before colonization, was from Alberta, Canada, all the way down to Oklahoma. Between the Rockies and the Missouri, they had this large nation-state that was Lakota, the Lakota Nation.

Justin Hendrix:

So when I look up your name, and I look at some of the concepts and ideas that are associated with the various kind of media appearances and interviews that are out there on the web, there are lots of phrases that come up. Things like indigenous data sovereignty, indigenous AI. And I gather that it's not just about Indigenous folks doing artificial intelligence research or concerning themselves with data sovereignty, but this is about a perspective on these issues which is peculiar. Can you tell me what those types of terms mean to you? What is the Indigenous perspective on artificial intelligence?

Michael Running Wolf:

Indigenous data sovereignty is a response to colonization and colonial behaviors, specifically in research. So my people, my mother's tribe, Cheyenne, and my dad's tribe, Lakota, they've been long researched by anthropologists and by linguists and by media and whatnot. And I have aunts and uncles who have been consultants on movies that take place in the West, because my tribes are the stereotypical feather Indian. When you think of these old Western movies, that was us. Literally, often we were in the background as extras, because they filmed this stuff in Montana and the Dakotas. And there's always been this anthropological and social interest in, and romanticization of, the Indians by the West. And when I say the West, I mean everyone: Spain, France, Germans, Russians, even the English. There's a whole subgenre of films made mid-century, in the '60s and '70s, in the former Soviet bloc about Native Americans.

They would dress up, put shoe polish on themselves and run around as Lakota. And that interest in us has unfortunately also resulted in the commoditization of our knowledge systems. And so since colonization started in 1492, the Indigenous peoples of North America have been subject to intellectual property theft. So you have this whole host, everyone in the Americas, North and South, has stolen property in museums. So it started with our gold: the Mayan gold, the Aztec gold, and Native American war bonnets are all in museums and in Europe. And then also, with our land and also our knowledge systems. And one of the consequences, though, is that we're having our identity taken from us, and we're no longer allowed to carry on our practices. And it was only in the last 30 to 40 years, optimistically, that we were really allowed to be ourselves.

And in the meantime, we had anthropologists and linguists taking our identity, taking recordings of us, pictures of us, audio from us and commercializing it, and creating a huge subgenre of media based around our identities. And so now we're in this situation where a lot of our intellectual property is owned by outsiders. Never mind the land loss, and never mind the material artifacts like hide scrapers and culturally significant items that exist in museums. We're now also facing this problem with our language data and our stories and our heritage. And it's a consequence of Western economic biases. It's hard to say, because yes, there were very obviously evil people in our history, the military that tried to kill us and the doctors who sterilized our women, but there was also this economic pressure to exploit us and commoditize us. And so this is really a response, it's been a general movement. I'm not the only one, obviously. I was inspired by intellectual thinkers in New Zealand and the Maori and also my peers, Lakota peers.

Justin Hendrix:

When I spoke to you where we met at a conference at Stanford that was focused on existing law and extended reality, you mentioned the Maori people and the idea that perhaps they're a little further along, as you put it, in their thinking about these issues. What did you mean by that?

Michael Running Wolf:

When I say further along, they just generally are. And when I ask them, people who are advocates for language and technology in New Zealand, they always, in that Commonwealth King's English way, get kind of shy, but they don't realize how fortunate they've been. And so what I mean is that in North America, a lot of languages very nearly went extinct. And that's actually a rude way to say it. Indigenous peoples prefer to say "go to sleep," because some languages have gone to sleep and have been reawakened recently. So we were facing a crisis of not being able to remember our own mother tongue in North America, broadly. And this was in the 1970s and 1980s, 1990s, there's this catastrophic loss of language. And we were five years away from a lot of the communities losing their languages.

And this had happened already to the Maori by this point. And they had worked out strategies and cultural behaviors to protect themselves, because they were facing the same crisis that we were. And they had this idea, for instance, of a language nest, creating this safe space where the language is spoken, and then immersion camps and education are adopted. And this idea of language reclamation spread into the Pacific Northwest. It got into Hawaii, and through Hawaii, through various language conferences, people meet people. These ideas made it to North America. And so famously in Montana, the Blackfeet language community saw these ideas that originated in New Zealand and implemented them. And so it's quite common in language advocacy that you'd hear discussions of how we're setting up these different programs, a holistic approach, creating language nests, and it's quite amazing that they've been on the forefront of this strategy to protect the language.

Because it's also inspired us in North America. And they've also faced, earlier than us, the same problems of their language data being stolen and being commoditized. Because when you think who are the biggest badasses of the Pacific, the Maori, right? They have a fearsome reputation, they never actually surrendered. A lot of us didn't surrender. And them in particular, they actually very nearly defeated colonization by the British. And they have this fearsome reputation, and that's commoditized. You see all the media, the tribal tattoos, the surfer culture, it very much originated within Polynesian societies and, in particular, the Maori. And there are other Polynesian communities that we also appropriate and exploit. And so they've been on the forefront, similar to cultures like the Lakota, and they've had to build institutions to resist that. And for instance, what inspired a lot of my AI research were Maori data scientists at the media company called Te Hiku, who decided, "We're going to build our own AI. We don't want to be beholden to a corporation in building this technology."

Justin Hendrix:

So I understand part of your project has been about language preservation, and about thinking about how to preserve and advance culture in new technology. Natural language AI is one, of course. XR, virtual reality, another. And yet this panel I saw you on, I thought it was fascinating in that it was really asking the question, what's new here as we enter this phase of generative AI tools like ChatGPT or diffusion models like the stuff that Stability AI is working on, or other similar types of projects, as we move into a world where some folks want to see us engage with information and entertainment in some kind of virtual environment or 3D environment? What are your concerns as we move in that direction? Are the types of issues you're raising here likely to be exacerbated in these environments? The commoditization of culture?

Michael Running Wolf:

Yeah. And maybe just to step back a little bit, my goal is to build virtual space in the metaverse or whatever we're calling it, and be able to have meaningful educational experiences through XR and through AI, and specifically voice AI, being able to walk into a Lakota village, and being able to speak and adhere to cultural norms in a simulated, safe environment. And so I see there's incredible potential with XR technology. And as you said, there's risk. I'm embracing artificial intelligence and embracing XR. And with new economies and new technology, there's always new risk. And whereas before, like I said, we were concerned about our material culture like spears or arrowheads or traditional beading techniques being exploited, similarly, we're probably going to face the same phenomenon in XR. Material objects are going to be scanned through photogrammetry, or created, that are basically ripoffs of Indigenous cultures to be exploited.

It's happening in NFTs. A few months ago, I discovered there's a Lakota BAPE. And it's not the big fancy one; there's a Bathing Ape NFT that's a knockoff of the Bored Ape. And they wear this big Lakota headdress, and it's like the Lakota gorilla, and I mentioned this within the NFT Reddit community. And the response is usually just, "Oh, we're trying to honor you." But it's the opposite for us, because what we're seeing is you're taking a caricature of our culture, parading that around without permission. And it creates an un-inclusive environment where Indigenous peoples are not going to be willing to participate in that, because this is a clown, basically. And that's not going to allow us really to participate much in the metaverse. And so that's one thing.

And the other thing, the more sinister side, is exploitation, where you're just going to start seeing our culture being sold as being representative of our work. We actually have federal law restricting commerce of Indigenous-made, Native American-made arts and crafts, I think the Native American Arts and Crafts Act. Or the American Indian Arts and Crafts Act, I believe is what it's called. And so it's illegal to market material as handmade by Native Americans when it's not actually.

And I could see something similar happening within the metaverse, where we have people who are infatuated with our culture creating sacred artifacts in 3D representation in there. And we're approaching this space with AI. And I think you referred to this panel, and I think on that panel, some of the panelists. And I'm a fan of this technology broadly, I'm just outlining the potential risk for Indigenous people. Just that it's going to be possible, similar to these generative AIs for images, where you can ask the AI, "Create for me a 3D representation of a Lakota teepee." And the AI will generate a 3D object. We're close. We're probably one or two papers away from where you can just ask an AI to generate arbitrary 3D objects with limited data.

Justin Hendrix:

I've done this, just to see what it looks like. I've gone to Stable Diffusion to their web playground and just said, "Show me a Lakota picnic." And it immediately returns something that looks almost like Lakota art.

Michael Running Wolf:

Yeah. And the problem is that our artists, artisans struggle. I have uncles and family in the art space, and my parents, take for example, they do silversmithing. And so my mom does the design. She comes up with an idea, the traditional design, and my dad implements it in silversmithing. And it's hard work. It takes hours, it takes weeks to come up with a new design and implement it and create it. And they know they have to keep coming up with new designs, because within months, weeks, there's going to be a Chinese knockoff. Someone's going to take the idea, cast it, which is pretty easy, and send it off to a factory, where they'll just pump them out and completely flood the market with this interesting idea. And it's happened to my dad, it happens to him pretty regularly.

It happens to all artisans. So if you have a product that's simple and popular, a Chinese factory is going to be building it in the end. And not to pick on Chinese factories, it could just as easily be a Vietnamese factory; a boiler room factory out there is pumping out Native American art and trying to pass it off as Indigenous art. But it's so pervasive, and you could see that that's going to happen in the XR space. And I think it's going to be worse with generative AI, because you won't actually need to expend any intellectual energy and effort. Because a knockoff does take tooling, it takes a factory, it takes skill. It's not going to take any of that. In the metaverse with AI, you can just say, "Create for me a Lakota hide scraper." And how do you compete as a creator in that space?

Because that's how in theory, the next economy is the creator economy. But if AI just completely is able to stamp it out perfectly, human, we have no chance. And the core conceit problem here isn't the fact that the AI exists, it's because they're stealing data to create these AIs. If we had the ability to enforce our intellectual property rights, either through having some sort of legal entity who owned the data, or even as artisans being able to stop it, this wouldn't be a problem, actually. The whole reason why you can have Stable Diffusion, DALL-E, or I think there's the other one, the many other AIs out there, is because they scrape the internet. These AIs have seen Lakota artwork. These AIs have seen Lakota artisanal work in pictures, and that's why they can do it. If they don't see it, they can't do it.

And so I'm actually excited about this Getty Images lawsuit against these large AIs, because that could set precedent to help us Indigenous creators to protect our work. And so what's the solution? And maybe to back up a little bit, it was like, yes, this is a problem, the technology, and what do we do about it? Because it can seem very overwhelming. But the simple solution is just allowing us to opt out, or even better, allow us to opt in. Allow artisans broadly, either Indigenous or otherwise, to be able to opt in and say, "You can't scan our work. And you can't portray generative art." Because I'm pretty sure the Dali family, the estate of the artist Dali, has an opinion over this matter, because a lot of his work is still under copyright.

Justin Hendrix:

I have seen, of course, the press statement from Getty Images, who is, in fact, suing Stability AI over alleged copyright violation. And they say that Stability AI "unlawfully copied and processed millions of images protected by copyright," in order to train its software. One of the concepts that was discussed on the panel that I saw you first on at Stanford was this idea of abundance. The idea that in these virtual environments or with these AI tools that can create approximations of or synthesis of certain aesthetics or objects or create essentially out of millions of data points, something that is perhaps like something else but is fundamentally a new representation. That is a kind of abundance that what you've got here is a sort of almost, I don't know, something that really challenges our notion of property rights altogether. I found your pushback on that concept of abundance in this regard as very interesting. Because that thought about abundance, I mean, that's what seems to be underlying the mentality of some of these Silicon Valley founders. You hear that word frequently.

Michael Running Wolf:

Yeah, I think when we think of abundance, particularly in the context of the panel at Stanford, my fellow panelists kept bringing up this notion of the Star Trek replicator. You walk over there and you say, "Earl Grey, hot," and it spits out a cup of hot Earl Grey. Or you ask it for lunch. "I want a nice traditional tamale, a bean tamale" or something. Actually, that'd be my favorite. That's what I probably would ask for all the time. Because it's really hard to get a good tamale in Canada, where I am, in Vancouver. And that's a really alluring idea. But if you start thinking about that realistically, how do you actually implement that, you begin to realize there's an underlying foundation of exploitation. And so this idea is basically you would be in the metaverse, and through whatever API, or maybe it's a service you have to pay for. Because the replicator is not free.

It takes electricity. E = mc². That means that there's a ton of electricity that has to be used to generate that cup of tea, on the order of magnitude of a jillion, billions and billions of watts. Now, the idea in the metaverse, trying to translate this idea from the panel, is that you could just ask the AI to please create for me a Barbie doll house with a Ken car, right? And poof, magically the AI generates a unique representation, using the data it's trained on, of the Barbie house and a Ken car. And when we think about abundance, what is it going to take to do all that? And we've kind of been talking about data sovereignty. So we have to be mindful of the underlying exploitation when we are creating these systems that are seen as beneficial. So take the Indigenous perspective: what you're doing here is taking Indigenous data, scanning it into a system without permission, and then exploiting our intellectual property to generate.
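As a rough illustration of the replicator's energy bill mentioned above (an editor's back-of-the-envelope sketch, not a figure from the conversation), assume the cup of tea has a mass of about 0.25 kg and take the speed of light as 3.00 × 10^8 m/s. Both values are assumptions chosen for round numbers. Note that the result is an energy, measured in joules or kilowatt-hours; watts measure power.

```latex
\begin{align*}
E &= mc^2 \\
  &= (0.25\ \text{kg}) \times (3.00 \times 10^{8}\ \text{m/s})^2 \\
  &= 2.25 \times 10^{16}\ \text{J} \\
  &\approx 6.25 \times 10^{9}\ \text{kWh}
  \qquad (1\ \text{kWh} = 3.6 \times 10^{6}\ \text{J})
\end{align*}
```

Under those assumptions, conjuring one cup of tea from pure energy would take on the order of billions of kilowatt-hours, which is the sense in which the replicator is not free.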

So it's not going to stop at Barbie doll houses, it's going to be: create a DALL-E representation of a Lakota teepee, create a headdress as if it was painted by a modern artist, let's say the Dilbert cartoonist. And if you really think about it, to enable that system and to make it effective and cheap, you're going to have to break a lot of, never mind copyright issues, trademark issues and other IP issues. You're exploiting people and their intellectual works without permission to make it cheap or free. Right now, the AI companies, Stable Diffusion and DALL-E, they're basically just charging you for electricity to generate. These are commercial products. You go over there, you pay 15 bucks, you pay 10 bucks, whatever, and it gives you the X amount of images you can generate. And you're basically just paying for electricity. And what they're not doing, which is what Getty is concerned about, is paying for intellectual property rights.

"You have our images in there, you're using these to generate." And so maybe to summarize all this, the issue with AI is that it's not generating new content, it's actually a facade. These AIs are stochastic parrots. And briefly, stochastic is just a mathematical term for a random process with a measure of predictability. So they're essentially just a statistical parrot. They see data, and they're really good at replicating it at will. And they're only able to replicate what they see. They're not actually able to synthesize new information. What they're doing is a really good job of seeming like humans generating, say, new art. And every pixel that's being generated by these image AIs originates in a previously seen artwork. So what they do is they basically create a layer of noise and then construct art, assembling other art pieces they've seen together. And so it's actually not unique. It's a pretend intelligence. There's just a set pattern that's been baked into the AI, and it's just going to reproduce it. And that reproduction is based upon previous data it's seen.

Justin Hendrix:

The replicator metaphor, which as you say was much discussed in the panel, was interesting. Because I think what it was trying to suggest is suddenly, you've made a great number of things available to people. Experiences, cultural experiences, information, emotional interaction, et cetera, available in a way that almost completely reduces the cost out of it. And the notion, I suppose, in some corners of Silicon Valley is that's a good thing. We're going to permit humans to mix and match ideas and cultural artifacts, and all of the things that humanity has essentially produced and placed onto the internet.

We're going to be able to cut it up, and remake and redo and advance the culture in that way. I mean, part of me might argue that at what point do we stop remunerating a culture or a set of craftsmen for their work? Are there ancient cultures that we can regard as totally in the public domain, or how does that work in your mind? I mean, because what I hear you saying is maybe two things at once. You're thinking about your father and the specific craft that he has created that is his product, right? But there's also something larger here, which is this cultural practice or the aesthetic that's produced by an Indigenous culture, and you see that as worthy of protecting as well.

Michael Running Wolf:

Yeah, I think from a high level, no, I agree. That's a great synthesis, by the way. You read a lot, I imagine. On a broad level, currently these AIs are virtually free, really cheap. You're essentially just paying for electricity. That's the cost associated with these. But that's not true. We have this abundance of art all of a sudden. We can create replicas of manga by dead artisans. We can create new works by Dali, through DALL-E the AI. And we're only paying for electricity. And we're seeing that's the concept of abundance right now. This is so cheap, we've never been able to create such elevated works. That's the abundance that we see. But there's no free lunch. A great deal of energy and effort went into the construction of this art, a great deal of energy and effort went into digitizing this art. It's not trivial to take an image.

I know, because I help artist friends try to get their work on canvas into a digital form so they can create printed lithographs. It's difficult work, taking art and digitizing it, even if you didn't create it. And so what these AIs aren't doing is remunerating all the energy and effort that went into creating the original art in the first place. And without that original art, the AI can't create anything. If you removed all the Getty images and removed all the artwork by Dali, it would not be able to recreate Dali-like art. And same thing with Indigenous art. If we removed all the data from Lakota art from these AIs, they would not be able to create it. And so that in my mind means there's a dependency upon the intellectual property right of that content. And we're not remunerating because yes, the AI is creating a stochastic parrot version of art. But without the underlying energy and effort that went into the original data, it could not do it at all.

And so in a better world or a better scenario, which I hope is what's on the horizon, artists are remunerated for their inclusion in these generative arts. Otherwise it would be IP theft. And I don't think we're going to stop it, because Stable Diffusion open sources their AI models. And I think the cat's out of the bag. The technology, the strategy is too easily replicable. But we are going to exist in a metaverse, in a commercial system that must adhere to modern copyright policy. And so there would always be the fringe. You can still go to the back alleys of New York, I was there recently, and buy a CD or DVD of Beyoncé's. Or buy the knockoff, it's always going to be there in the fringes. I don't think there's anything to stop that.

Justin Hendrix:

100%.

Michael Running Wolf:

Yeah, we can't stop it. But I think we could at least mitigate this by having policy frameworks that remunerate the art, because it does take energy and effort. It's a lie to say that it only costs electricity to generate the art. That's a lie. Stable Diffusion could not do this if they didn't have the ability to scan the intellectual property of the internet. And that is worth something.

Justin Hendrix:

Can you imagine using these technologies in a way that perhaps some fractional part of whatever profit that Stability AI gets or OpenAI gets from a particular work goes towards the individuals whose copyrighted works are perhaps included in that data set? I mean, it sounds like a good idea, but I don't know if it's even practical. How would they ever be able to figure that out on some level?

Michael Running Wolf:

Actually, you can in AI. So what I'm thinking is, I think the model already exists somewhat in, say, streaming from Netflix. You pay your 9 or $10, now it's 15 or whatever it is, and they can track what you stream exactly. And then the artists get remunerated. Residuals, I think, is what it's called. And in AI, I think it's easy. You can just say, "Create Lakota art of a spear." And so if there's some sort of entity, now there's actually a kind of kink in this problem, though, but let's just make it easier. DALL-E and Disney, Mickey Mouse. "Create a Mickey Mouse rowing a boat with a corgi." Right off the bat, you can see they want Mickey Mouse. So obviously, Disney needs to get a cut. And so the user pays some licensing fee, some convenience fee to the AI, and then you can give residuals to Disney.
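The residuals idea described above can be sketched in a few lines of Python. This is an editor's illustration under stated assumptions, not a method described in the conversation: the function name `split_residuals`, the rights-holder names, and the attribution weights are all hypothetical, and how such weights would be measured for a real generative model is left open.

```python
def split_residuals(fee_cents: int, weights: dict[str, float]) -> dict[str, int]:
    """Split a licensing fee (in cents) pro rata across rights holders.

    `weights` maps each rights holder to a hypothetical attribution weight,
    e.g. how strongly a prompt or output draws on that holder's work.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    # Round each share down to whole cents.
    shares = {holder: int(fee_cents * w / total) for holder, w in weights.items()}

    # Give any cents lost to rounding to the holder with the largest weight,
    # so the shares always add up to the fee that was paid.
    remainder = fee_cents - sum(shares.values())
    top_holder = max(weights, key=weights.get)
    shares[top_holder] += remainder
    return shares


# Hypothetical example: a $1.00 fee for the Mickey Mouse and corgi prompt.
payout = split_residuals(100, {"Disney": 3.0, "corgi_photographer": 1.0})
print(payout)
```

The design choice here is the simplest one available, a proportional split in whole cents; a real scheme would also need an agreed way to produce the weights and to identify the rights holders in the first place.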

Furthermore, AI actually isn't anonymous. It's entirely possible to reverse engineer the original data used to create an AI. And this is actually a privacy problem. It's been shown that you can reverse engineer personally identifying information from AI models, and you can actually re-derive the original data used to train that model. And so it actually would be entirely feasible to audit these models. And Disney can say, "Hey, in your model, you might block Mickey Mouse from being the prompt to generate new art, but we can audit your model and see that you have our artwork in there, our copyright and our trademark, whatever intellectual property rights, as part of your model." So it's actually feasible. I don't think we're necessarily in a new world. I think it's still tractable, until modern AI figures out a way to completely anonymize training data. It's totally doable right now to keep track.

Justin Hendrix:

And perhaps some of those other technologies like the NFTs or maybe blockchain applications could potentially serve as a solution there. But I guess you're talking about more electricity.

Michael Running Wolf:

Yes. Well, and that's another thing too. Yes, if you only pay for electricity, well, we have global warming. Electricity is a problem now. But to the point of NFTs, I would just say no comment. I'm not a huge fan of NFTs, but I don't want to turn off your audience.

Justin Hendrix:

I don't think you'd have to be so concerned about that. This is a generally skeptical environment for some of these things. So do you have conversations with policymakers, or have policymakers shown an interest in your ideas in this regard, either in the US or in Canada?

Michael Running Wolf:

I do not, actually. I do have a short circuit, though. My wife actually is on AI policy and XR policy for Canada. She's the famous one actually, my wife Caroline Running Wolf. She is on panels around future technology and how it affects Indigenous peoples. But I am not, I'm talking to cool people like you. That's the extent of it.

Justin Hendrix:

And tell me a little bit, I know that you're early in pursuing your doctorate, or at least I understand that to be the case. Will you pursue these types of questions in your doctorate alongside work on computer science? Do you have a research agenda in mind yet, or is it a little too early?

Michael Running Wolf:

We've been talking about the negatives of AI, and I really think that's such a minor issue with proper policy and proper ethical behavior. It's not an issue. These aren't technical problems. These are just social problems, and of course, practically intractable in the United States. So I think what I want to do is use my research as a model for future research, in particular ethical research with regard to Indigenous peoples. And so for example, what I'm working on right now is building automatic speech recognition for Indigenous languages. And one of the key problems we have is that we just don't have enough data. No tribe has enough data to build ASR on their own, with a handful of exceptions. Maori notably, Nacotchtank, Cherokee, and probably some Iroquoian languages or Algonquin languages have enough data to build their own automatic speech recognition. That is, being able to say a word, which turns into an MP3, and then converting it to text.

It's currently an unsolved problem in North America for languages of North America. There haven't been any fundamental breakthroughs in science to enable it, certainly not broadly. And so my goal is to generally enable it for Native Americans and First Nations in Canada. And I also want to do it ethically. I keep talking about remuneration. I believe the data that I'm using to accomplish my scientific goals is fundamentally not mine. I believe that it is entirely reasonable for the communities I'm working with to ask me to pay them to use their data. They're not, because they also have a similar vision as I do. But when we work with corporations, nonprofits, or in this case McGill University, we do make sure that the community retains full ownership and full ability to veto usage of the data. And I think that's the model I want to approach.

And it makes me feel tired, but I'm probably one day going to run a nonprofit, some sort of data co-op. Because as we're assembling data, like I said at the top of this discussion, we don't have enough of it. So what do you do? We're working with the Wakashan languages community, for example. Not one of them has enough data, but if you take related languages, and there's a whole community in the Northwest, on Vancouver Island and in Washington State, and if you combine their phonetic data and morphological data, they're close enough that it helps us deal with this data problem. And as we're collecting communities and integrating the data into AI, we're creating a risk. Because we have a lot of data, and I foresee that we need some sort of neutral entity where I am simply a client in this model, where I license the data to accomplish my scientific goals.

And I can see that other researchers also need that. And this would be beneficial for science. Because it's difficult to get clean Indigenous data. There's a reason why this is an unsolved problem, because we have a data problem. And number two, we also need a way for communities to safely get into relationships with researchers that can be beneficial to both. So the community will always have veto power. And how do you do that?

I think it's going to be a data co-op where you have a third-party entity that acts as the copyright enforcer, and any profits and benefits go to the community, minus some overhead, of course. And so that's what I see going on in the future, and how my research can contribute to that vision. And I'm not the only one. This is not a unique idea. This idea of having a data co-op is something that is coming out of library science and of course elements within AI, because we do have this data ownership issue. It is a concern that AI can scan a lot of data. How do we band together to make sure that we data owners, copyright owners, are able to enforce our rights cohesively?

Justin Hendrix:

So some of the applications that you can imagine from the data sets, the language data sets that you create and the, I suppose, ability to do speech recognition. This is not just preservation, it's also for applications allowing people to engage with machines and with one another in their language.

Michael Running Wolf:

Yeah, I think the initial goal, the fundamental goal, is to create an AI API or an SDK that you could use in the Metaverse, and live within a language playground. Where you'd be able to, within an appropriately licensed, safe environment, say, "Take me to a Lakota village and teach me how to speak to a grandma." And AI can ethically generate a properly licensed Lakota village 3D space, and generate an avatar for you with which to communicate. And that's what I foresee, being able to do all of that interaction within Lakota or Cheyenne, or in Makah or Kwak'wala, the languages I'm working with. Enabling this technology broadly, and doing it in a way that's safe for everyone involved, is my ultimate vision here.

Justin Hendrix:

So perhaps an alternative vision of abundance.

Michael Running Wolf:

Yeah, I think we're going to unfortunately end up here anyways. We're going to have, again, not picking on Disney, Disneyland and Meta space where you go in there and say, "I want to talk to Goofy, who's wearing the cowboy hat." And it's going to generate, and you're going to have to pay some fee to get in. I see it's going to happen for copyright or entities who are able to enforce their IP, but how do we do that for smaller communities like the Indigenous?

Justin Hendrix:

Michael, it's been great talking to you. Hope to catch up with you perhaps a little further into this PhD, and see where you've taken it.

Michael Running Wolf:

Awesome. And thank you for the opportunity.

Authors

Justin Hendrix
Justin Hendrix is CEO and Editor of Tech Policy Press, a nonprofit media venture concerned with the intersection of technology and democracy. Previously, he was Executive Director of NYC Media Lab. He spent over a decade at The Economist in roles including Vice President, Business Development & Inno...