Ethical AI: Legal & Ethical Considerations for Writers & Publishers


Will they, or won’t they? That’s the question many writers and publishers face right now—whether or not they’re willing to publish AI content or use AI tools in their businesses. They have legal and ethical AI concerns. And that's fair.

Are there ways to engage in ethical AI use in your freelance writing business?

Absolutely.

But to do that, you first have to be aware of the legal and ethical issues facing the AI space and how these tools are used.

That’s what we’ll explore today—key ethical issues in AI, and some of the earliest legal cases facing big players in the field.

Note: I am not a lawyer. I don’t even play one on the internet. Nothing in this article should be construed as legal advice. If you have questions about the legality of particular AI content or usage, please consult an attorney familiar with this industry and the rules in your location.

In our intro to AI content for freelance writers, we looked at what plagiarism is (and isn’t). In short, plagiarism is claiming someone’s words, ideas, or works as your own without proper attribution.

This is different from copyright infringement.

Plagiarized work can be copyright infringement if, for example, your re-wording of someone else’s work was done without consent, and that work is covered by copyright. In this case, you could have an infringing derivative work.

Remember, only the copyright holder has the right to authorize a derivative work. You can’t simply re-write someone else’s article and claim it as your own.

At the same time, you can cite the original author and still engage in copyright infringement.

For example, if you re-write another author’s article (or a significant portion of it), you’re not necessarily off the hook for copyright infringement just because you credit the original author as your "inspiration."

Where AI Content Tools Come into Play

Plagiarism and copyright infringement are both potential issues with any AI-generated content.

Why?

These tools, like ChatGPT, are language models trained on existing content and data.

If you ask an AI tool to write an article on [topic], it pulls information on that topic from existing content used in its training. If your AI tool has internet access, it might pull more recent information. But it still comes from other original authors.

AI doesn’t just “hallucinate” the content it generates (which we’ll talk about more shortly). It can also hallucinate sources.

Tools like ChatGPT provide you with information and text, but they can’t always tell you where that information came from or what their sources are.

Sometimes that text compiles information from multiple sources (possibly OK, possibly plagiarism, and possibly copyright infringement depending on length and scope).

Sometimes that AI content is simply the re-wording of one article or other source (plagiarism and potential copyright infringement).

And sometimes AI-generated content will contain directly-copied text from another source (the most obvious case of plagiarism, and possible copyright infringement depending on the source and content).

At the current time, it isn’t possible to publish AI-generated content or copy without a significant risk of plagiarism (unless you supply significant seed data, which we'll get to).

What You Can Do:

Here's how you can avoid producing plagiarized or copyright-infringing content with AI writing tools.

  • Never publish AI-generated content as it’s delivered.
  • Don’t assume revising or re-wording that AI content is enough. It can still be both plagiarism and copyright infringement.
  • Don’t assume plagiarism checkers will protect you. They’re only designed to find directly-copied text, and plagiarism and copyright violations go beyond that (see the sketch after this list for why).
  • Vet any statistics or factual information in AI-generated content, and cite your sources even when the AI tool cannot provide them. If you can’t find sources to back those statistics or facts up, don’t include them in your piece.
  • Use AI more in your ideation, outlining, and revision process than in writing initial drafts.
  • Instead of using AI tools as sources themselves, use them to help you find and summarize actual reputable sources to speed up your research process.
  • Only use AI content as-is if that content is generated wholly from public domain content or your own content that you own the copyright to. Remember, publicly-accessible does not mean something is in the “public domain.”
  • As a general writing principle, don’t write content based on a single source, AI or otherwise.
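To see why a clean plagiarism-checker report proves so little, here's a minimal sketch (in Python, with made-up sentences) of how basic checkers work: they look for overlapping runs of identical words, often called "shingles." Real checkers are more sophisticated, but they share the same blind spot.

```python
# A minimal sketch of the "shingling" approach basic plagiarism checkers
# use: compare overlapping runs of consecutive words. The sentences below
# are illustrative.
def shingles(text, size=5):
    """Return the set of consecutive `size`-word sequences in `text`."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

original = "The quick brown fox jumps over the lazy dog near the river bank"
copied = "Notably, the quick brown fox jumps over the lazy dog every day"
reworded = "A fast auburn fox leaps above a sleepy hound beside the water"

for label, candidate in [("copied", copied), ("reworded", reworded)]:
    overlap = shingles(original) & shingles(candidate)
    print(f"{label}: {len(overlap)} shared 5-word shingles")

# Output: the directly-copied text shares shingles and gets flagged;
# the re-worded text shares none, even though it lifts the same content.
```

The re-worded sentence comes back "clean," yet publishing it could still be plagiarism and, potentially, an infringing derivative work.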

The Potential for Bias in Generative AI

The thing about AI tools is they’re only as good as their training data.

That training data comes from humans—original works, data sets, and the decisions as to what’s included in that training data.

Humans have biases.

Even if those creating and training AI tools have the best of intentions, we all have biases.

These biases aren’t always intentionally malicious. We might not even be aware of them. But they creep into our choice of language, into our research, and therefore into our data.

This data can then be used to train AI models, making them inherently biased as a result.

Example: Biases in Medical Research and Care

A common example is bias in medical research and the resulting data. This, in turn, can lead to biases in the medical care different patients receive.

For example, for a long time in the US, women of child-bearing age were excluded from many drug trials, even in cases where there was little chance of them getting pregnant during those trials.

As a result, the data didn’t directly represent them. Yet it would influence their treatment.

Similar issues occur when racial or ethnic groups are under-represented in research. Data that doesn’t represent those groups can be used to treat (or not treat) them.

Current healthcare technology already relies on algorithms affected by racial biases, which can determine whether or not patients get the care they need.

We’ve seen similar issues with facial recognition algorithms and racial and gender bias, with significantly higher error rates when identifying or differentiating dark-skinned women.

Again, these tools are only as good as the data they’re trained on, which is only as unbiased as the people deciding which training data to use.

Then there’s the issue of publication bias.

What is Publication Bias?

Publication bias is the act of publishing, or not publishing, the results of studies based on whether or not the results were expected or desired.

In other words, when you look at published research around a particular topic, you’re only seeing what researchers (or those funding them) want you to see.

Sometimes flaws are discovered in a study, and the results are rightly not published. Publishing them would mislead the public.

But there are also cases where results aren’t what people wanted, so the decision is made not to submit the results for peer-review and publication. Even if the results might be valid.

For example, let’s say a study is conducted around a particular drug. It’s funded by a pharmaceutical company that makes this drug. If the study shows the drug doesn’t have much of an effect on the targeted condition, they might choose to simply not publish those results. Yet they will publish other studies showing more favorable results.

That’s publication bias in a nutshell.

This is why you can’t assume quantity equals accuracy when looking at published research volume around particular results.

But that’s not all.

Publication bias isn’t only relevant when it comes to published research.

It can also impact news reporting and other publishing formats: what is and isn’t covered (think fear-based stories, sensationalism, clickbait, etc.), how stories are placed or promoted (which could impact whether or not they’re pulled into AI training data), commercial influences on publishing decisions (sponsored content, for example), and so much more.

As a freelance writer, it’s vital you understand this when conducting research. If you plan to use AI for your research or writing, you don’t know where its training data came from.

You can’t be aware of all biases built into any given tool—publication bias, racial bias, gender bias, geographical bias, or otherwise. And those biases will be reflected in any AI content you or your clients publish.

What You Can Do:

Here are some things you can do to minimize bias and improve the accuracy of your end results when working with AI.

  • Don’t rely exclusively on AI writing tools for research.
  • If you ask an AI tool for information or background, verify it independently.
  • Always look into funding sources or other potential conflicts of interest before taking data and studies at face value, regardless of where or how you find them.
  • For research purposes, only use AI tools with current training data. Preferably focus on tools with internet access, and require source citations with your prompts. (A sample prompt follows this list.)
  • Look carefully for instances of potential bias in any content you publish with the help of AI. You'll want to crank up those critical thinking skills any time you work with AI content or AI-assisted research.
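If you're wondering what requiring citations looks like in practice, here's a minimal sketch of a reusable research prompt. The exact wording is my own illustration; adapt it to your niche and whichever tool you use.

```python
# A minimal sketch of a research prompt that demands sources up front.
# The wording and sample question are illustrative.
CITATION_PROMPT = """\
Answer the question below. For every factual claim, include the
publication name and a URL I can check myself. If you cannot point
to a real, checkable source for a claim, write "no source found"
instead of guessing.

Question: {question}
"""

question = "What percentage of US adults listened to an audiobook last year?"
print(CITATION_PROMPT.format(question=question))  # paste the output into your AI tool
```

Even then, treat every citation as unverified until you've opened it yourself. As discussed later in this post, AI tools can hallucinate sources too.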

Ethical AI In the Disinformation Age

Another big ethical concern when it comes to AI content revolves around accuracy.

The biggest concern is that these tools could be used to generate mass-scale disinformation campaigns – email copy, social media updates, etc. AI tools can even be used to quickly disseminate that disinformation.

This isn’t new. Bots have been used on social media for years to publish false information, run attack campaigns, and more generally incite anger and backlash over outright lies.

The issue is that tools of disinformation have become far more accessible.

And while these companies might have terms of use that prohibit this, there’s currently no reliable way to enforce them, whether by banning accounts or even identifying this kind of AI-generated content.

That might change.

But right now, these tools can be used maliciously with little to no recourse.

That could be a big problem during elections. Healthcare disinformation could directly hurt people, or worse.

But while I expect to see much more of this with AI content tools being so openly accessible, I suspect the biggest misuse will be commercial – more fake reviews and testimonials, dishonest copy to push sales of dangerous or useless products, and more email scams just to start.

AI Hallucinations and the Spread of Misinformation

Disinformation isn’t the only accuracy concern when it comes to tools like ChatGPT.

Due to some of the problems we already talked about surrounding plagiarism, AI content can also spread misinformation due to poor sourcing.

For example, let’s say there are inaccuracies in some of the AI’s training data.

Maybe the publication posted a correction at some point. But that initial inaccurate information was already used to train the language model.

Now there’s a chance that information will be output into text generated by the AI if a user asks for content on that topic. If the user publishes it, they would in turn publish that misinformation without realizing it.

Then there are the hallucinations.

What is AI Hallucination?

AI hallucination is when an AI model makes something up with no factual basis.

Yes. Tools like ChatGPT can essentially lie to you.

They’ll make up facts when they don’t have the answers you ask them for.

They can also make up sources for those fake facts.

AI tools can even hallucinate about people. Maybe even you.

As I noted in a previous post on AI content, ChatGPT credited me with co-founding a website I would never want to associate with. Not only did that association repulse me on a personal level, but it would violate my professional ethics in an irreparable way. (And no, asking it now won’t tell you which site it is because ChatGPT’s responses for the same prompt continuously change.)

Not fun, but not as bad as what ChatGPT hallucinated about the first guy to sue OpenAI for ChatGPT’s alleged defamation. ChatGPT claimed he was being investigated for embezzlement, which was reportedly untrue.

Please. Pretty pretty please. Never use a tool like ChatGPT to conduct even the most basic background research on any person.

Now, to be clear, some AI writing tools are better about this than others.

For example, while GPT-3.5 is notorious for hallucinating false “facts,” GPT-4 is much better.

There’s still a risk, and it still depends on how you use it, but having internet access (currently only through plugins) means GPT-4 can cite, and link you to, actual source material so you can more easily verify what it claims.

What You Can Do:

Here's how you can publish more reliable content even when using AI tools in some part of your process.

  • Obviously, don’t use AI writing tools to intentionally spread lies and disinformation.
  • Verify all facts in content created by AI tools.
  • When using internet-connected versions like GPT-4 (with plugins enabled), ask for source citations and links. (A quick link-checking sketch follows this list.)
  • Don’t trust any background information ChatGPT provides on a person.
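And because hallucinated sources are so common, it's worth confirming that cited links even exist before you lean on them. Here's a minimal sketch using Python's third-party requests package; the URLs are placeholders for whatever links the AI handed you.

```python
# A quick sanity check on links an AI tool cites as "sources." The URLs
# below are placeholders. Assumes the requests package is installed
# (pip install requests).
import requests

cited_urls = [
    "https://example.com/some-study",
    "https://example.org/industry-report",
]

for url in cited_urls:
    try:
        resp = requests.head(url, allow_redirects=True, timeout=10)
        status = resp.status_code
    except requests.RequestException as exc:
        status = f"request failed ({exc.__class__.__name__})"
    print(f"{url} -> {status}")

# A 200 means the page exists, not that it supports the claim. Open each
# live link and confirm it actually says what the AI said it says.
```

A live page doesn't prove the claim, but a dead or invented one is an immediate red flag.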

Data Privacy Concerns

Another ethical and legal issue to be aware of is data privacy.

There are a couple of key concerns here:

Data Used in Training

There currently isn’t a lot of transparency around AI training data. Some of your own content could be included without you knowing.

Training data like the text of laws and court opinions might be in the public domain and be fine in this case. But there are already lawsuits claiming OpenAI used entire books as training material by accessing illegal copies online.

AI tech companies are also licensing some content from third-party sites and publishers.

Here’s the thing though. Not all publishers buy the copyright to a writer’s submissions. Many purchase things like First North American Rights, exclusive digital rights, etc. But without owning the copyright, they wouldn’t have the right to authorize derivative works – essentially what AI writing tools generate.

Think about sites like social networks. You can have ChatGPT write your social media posts. It knows how to write them, and which formats work well on each service, because it has access to that data. And when you use these sites, you grant them and their users certain licenses to use and share that content.

Twitter, for example, had licensed its data to OpenAI for training for a while.

Elon Musk later called out Microsoft for using the data allegedly without permission, only to announce his own AI project where he plans to use Twitter’s data for training. (Have you backed up your data and moved along yet?)

Did you realize your social media posts could be used that way? That these sites could not only hand over your data to third parties, but that those third parties would then potentially be able to create derivative works based on your style on that platform?

Maybe it’s time to re-read the terms of use and privacy policies of the platforms you post to. And the contracts you sign with clients.

Be Careful How You Seed Your Prompts

While we’ll cover prompt engineering in a later post in this AI series, there’s one concept you should understand up front: prompt seeding.

Earlier I suggested only publishing AI content if it’s based entirely on your own existing work. This avoids plagiarism and copyright issues.

But how do you do this?

You use seed data.

What is AI Seed Data?

AI seed data is the data or other input you provide an AI tool to give it the background information it needs to respond to your prompt.

In other words, rather than saying “Write me an article about [topic],” you might ask your AI writing tool to summarize a chapter of a book you wrote, then turn it into a blog post.

Or you might paste in one of your blog posts (or seed the tool with a URL to the live post if it has internet access) and ask it to write social media posts based on it.

This would also include feeding the AI multiple articles of yours, the points you want to make, maybe a completed outline, and specific stories you want an article to share prior to asking it for the post draft.

AI seed data can also include something like a .pdf you want it to summarize (GPT-4 can do this with plugins), or a .csv file that you want the tool to analyze for you (GPT-4 can do this via its code interpreter).
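To make this concrete, here's a minimal sketch of the second example above: seeding a model with one of your own posts and asking for social media updates based on it. It assumes OpenAI's Python package (v1 or later) and an OPENAI_API_KEY environment variable; the file name is a placeholder, and pasting your post directly into a chat interface works just as well if you don't code.

```python
# A minimal sketch of prompt seeding via OpenAI's Python package (v1+).
# Assumes OPENAI_API_KEY is set in your environment; the file name and
# model name are illustrative.
from openai import OpenAI

client = OpenAI()

# Seed data: a post you wrote and hold the copyright to.
my_post = open("my_published_post.txt", encoding="utf-8").read()

response = client.chat.completions.create(
    model="gpt-4",  # swap in whatever model you actually use
    messages=[
        {
            "role": "system",
            "content": (
                "Using ONLY the article provided by the user, write three "
                "short social media posts promoting it. Do not add facts "
                "that are not in the article."
            ),
        },
        {"role": "user", "content": my_post},
    ],
)

print(response.choices[0].message.content)
```

Because the output derives from your own work, you sidestep the plagiarism and copyright risks of letting the model pull from its training data.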

Seed Data and Privacy

AI seed data can be game-changing. For example, you can use it to:

  • train AI to write in your voice or style;
  • analyze and organize reports (think website analytics, SEO keyword research, content marketing competitive data, and so much more);
  • repurpose your existing content (create emails, social media posts, YouTube scripts, and more to promote your latest articles);
  • help your AI tool give you what you really want (such as seeding your prompt with keyword examples before asking for more ideas, or giving it a list of title templates or formats it can use to give you post ideas);
  • summarize long documents to provide key points (or help you find exactly what you’re looking for in them).

There’s a lot you can do when you take this approach to prompt engineering. It lets you use AI tools to make the most of what you already have, and at speeds you couldn’t reach if you did these things manually.

Where’s the data privacy concern?

Each AI tool is going to have its own terms of use and rules regarding data privacy for what you enter or upload into its system.

This means content you own could be saved and used as future training data. Or it could be used to target advertising toward you.

This is a problem if you don’t want others receiving plagiarized versions of your work. It’s a problem if you don’t want to be bombarded with more targeted ads. It’s an even bigger problem if you upload information you shouldn’t.

For example, don’t share:

  • private communications;
  • contact information you don’t have consent to disseminate (like pasting in an email list);
  • other personal information, including health information (I’ve seen people suggest using ChatGPT as a private journal; don’t do that.);
  • any confidential work information;
  • unpublished work you don’t want used in training (like a book manuscript nearing publication).

Basically, if you don’t want the company to store it and potentially use it, don’t use it as seed data.

Data Options for ChatGPT

If you were one of the many writers to test out ChatGPT, consider checking your data settings.

To do this:

  1. Log into your OpenAI account and go to the ChatGPT interface.
  2. Click the three vertically-aligned dots near the bottom left of your screen (by your account info).
  3. Click “Settings.”
  4. Then click “Data controls” on the left of the window.
  5. Click the slider to turn off chat history and training (if it was turned on).

According to the company’s data control information, you’ll need to do this on any device where you use ChatGPT.

With history and training off, ChatGPT will still keep new chats for 30 days, but it won’t use them for training. As far as I’m aware, at this point you cannot completely disable chat storage. The company’s data controls page says the 30-day retention allows time to review cases of possible abuse.

What You Can Do:

Here are some ways you can protect yourself, your clients, and your data.

  • Review the terms of use and privacy policies for every AI tool you use.
  • Don’t upload or paste in sensitive seed data with your prompts.
  • Don’t violate the privacy of others by pasting or uploading their data or content without consent.
  • If you need to upload competitive reports for analysis, anonymize the data first (e.g., changing your competitors’ names to something generic – and make sure you also anonymize any URLs so you don’t give their site information away unintentionally). See the sketch after this list for one simple way to do this.
  • Check settings for all of your AI tools and opt in, or out, of data sharing based on the kind of data you intend to seed your prompts with, and how comfortable you are with it being stored and used.
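For the anonymization step, here's a minimal sketch of one simple approach: a find-and-replace pass over a CSV report before you upload it. All names and file names are illustrative stand-ins for whatever your report actually holds.

```python
# A minimal sketch of scrubbing competitor names and domains from a CSV
# report before uploading it as seed data. Every name here is illustrative.
import csv

aliases = {
    "Acme Writing Co": "Competitor A",
    "acmewriting.com": "competitor-a.example",
}

with open("keyword_report.csv", newline="", encoding="utf-8") as src, \
     open("keyword_report_anon.csv", "w", newline="", encoding="utf-8") as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        cleaned = []
        for cell in row:
            for real, alias in aliases.items():
                cell = cell.replace(real, alias)  # also catches URLs in text
            cleaned.append(cell)
        writer.writerow(cleaned)

# Skim the output before uploading: a simple find-and-replace can miss
# variants like "Acme" on its own or shortened/tracking URLs.
```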

These are just some of the legal and ethical AI issues you might run into when using these new tools. You need to decide for yourself what you’re OK with, what the legal risks are where you live, and whether or not certain types of AI content could damage your professional reputation.

AI writing tools have the potential to help you make more of your past work (like updating older articles or repurposing them). They can make you more productive in the ideation and outlining process, and even save time on marketing so you have more time to focus on writing.

But these same tools are going to be used by bad actors. And others will cause unintentional harm because they didn’t think through the consequences of certain use cases (like publishing one-prompt articles because they don’t fully understand what plagiarism is).

By making yourself aware of these issues before getting too deep into using AI tools, you can avoid being someone in that latter group. You can keep your data safe. And you can avoid publishing, or providing clients with, unintended misinformation.

Do you have more tips on the responsible use of AI writing tools? Make sure you follow All Freelance Writing on LinkedIn for more coverage of AI content ethical issues and legal developments as they come up. Later this week we’ll look at one more example there—the issue of disclosure of AI content.

