Generative AI in ROS 2 Codebases? Oh My!

Hi,

As part of my Community TSC Representative commitments, I wanted to address the topic of policies around Generative AI in the ROS Community Code of Conduct (or, I suppose, some other venue?). This is partly for obvious legal reasons, but also to create a uniform process across the ROS community for how we deal with generative AI software, so we can streamline contribution in a complex new world.

Also, some maintainers (including myself) have already had to engage with users creating numerous superfluous pull requests adding random AI generated code into random spots, with AI generated PR text that makes no sense. While the legality of code lifted from the projects these models were trained on is still unsettled law, setting some policies so we can identify and track such code, and respect each other’s time, seems like a good place to start regardless.

My aim with this post is to provide my first draft of such a policy, get others’ feedback, and adjust it into a proposal to present to the TSC in December. Please take a look below and let me know what I missed, along with any questions, disagreements, etc.!


Generative AI Software Policy

In order to track and manage generative AI produced software, the following policies are put in place:

  • To respect maintainers’ time and efforts, Generative AI must not be used in Pull Request or comment message bodies. Please state the software’s goals and respond to your fellow developers in your own words. Violations may be subject to summary closure of pull requests.

  • A pull request containing Generative AI derived software must explicitly state that AI was used to create some proportion of the software.

  • A comment must be added around code created by Generative AI indicating that it was generated by such. Do not present AI-created code as original work if it has not been substantively modified or combined with other developer contributions.

  • All code generated using AI should be closely scrutinized and tested for accuracy, understandability, and efficiency by a developer before submitting it for community review.

Due to the current (November 2023) legal ambiguity of generative AI-based software, such as that generated by ChatGPT, it is the responsibility of a developer representing an organization to ensure that their contributions do not violate copyright or software licenses. Software uniquely generated by AI is not currently subject to copyright, but software that has been copied by generative AI tools may be subject to the original author’s copyright and licensing terms.
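
To illustrate the comment-annotation requirement above, a marked-up block could look something like the following; the exact marker wording and the helper function itself are only a hypothetical sketch, not a prescribed format:

#include <algorithm>

// BEGIN AI-generated (ChatGPT, Nov 2023): clamp a value to an inclusive range.
// Reviewed, edited, and tested by the submitting developer before this PR.
template <typename T>
T clamp_value(const T& value, const T& low, const T& high) {
    return std::max(low, std::min(value, high));
}
// END AI-generated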

4 Likes

Point of order here.

  • A comment must be added around code created by Generative AI indicating that it was generated by such. Do not present AI-created code as original work if it has not been substantively modified or combined with other developer contributions.
  • All code generated using AI should be closely scrutinized and tested for accuracy, understandability, and efficiency by a developer before submitting it for community review.

Does this also apply to documentation? While I am very skeptical of AI-generated code, I am far less skeptical of documentation generated by LLMs. Given how bad some humans are at documentation, LLMs might actually be a good thing for it (e.g. “ChatGPT, here is a list of steps and a finished example; write good reStructuredText describing that process”).

3 Likes

Thanks for the initiative! I myself reviewed an MSc thesis partly written by ChatGPT, and I understand your frustration about the “too easily generated” content.

This would be hard. Or, it needs a better definition of “AI generated code”. I’m now beta-testing CLion with ChatGPT integration, and both of my main uses are hard to imagine working under this rule:

  • Better Intellisense (i.e. line completion). It just helps with these little things like suggesting parameters to a function call on a single line. Does this even count as AI generated content? Because it’s always the developer who chooses from the multitude of suggestions. If the rule had to be followed, almost all SLOC would need the AI generated comment.
  • Generating documentation comment stubs. I usually write an API and when I’m happy with it, I’ll let AI do the boring stuff, generating Doxygen annotations etc for the whole header file. And then I go through the generated sections and edit where appropriate. But I’d say about 70% of the generated docs stay as given by ChatGPT.
1 Like

I specifically call out software in the policy name since that’s the big IP issue. Is documentation IP something we should also expand the scope to cover? I didn’t originally think to include it, and it might fall under a different set of rules and/or legal concerns.

If it’s the output of generative AI, not basic autocompletion, yes.

Throwing my 2 cents in here and echoing @peci1’s sentiment: ChatGPT and GitHub Copilot are pretty well integrated into my workflow at this point. I use GPT frequently to start my ideas, help me through difficult code, and help me fix bugs. With Copilot, it definitely generates a lot of boilerplate and non-boilerplate code that I usually modify. I think an example of something from me that followed that policy would look like this:

AI Generated

template <typename T>
void sortAndFilter(std::vector<T>& vec1, std::vector<T>& vec2, std::function<bool(const T&)> lambda) {
    // Remove elements from vec1 that don't satisfy the lambda
    vec1.erase(std::remove_if(vec1.begin(), vec1.end(), 
                              [&lambda](const T& item) { return !lambda(item); }), 
               vec1.end());

    // Remove elements from vec2 that don't have a corresponding element in vec1
    vec2.erase(std::remove_if(vec2.begin(), vec2.end(), 
                              [&vec1](const T& item) { return std::find(vec1.begin(), vec1.end(), item) == vec1.end(); }),
               vec2.end());

    // Sort the remaining elements
    std::sort(vec1.begin(), vec1.end());
    std::sort(vec2.begin(), vec2.end());
}

Edited by me, where every line from the AI has a // above it:

/////////////////////
template <typename T, typename F>
/////           ////////////////////////////////////////////
void filter_sort(std::vector<T>& vec1, std::vector<T>& vec2, F predicate) {
    //////////////////////////////////
    // Remove elements from vec1 that fail the predicate
    //////////////////////////////////////////////////////         ///////////////////////////         ////////////////
    vec1.erase(std::remove_if(vec1.begin(), vec1.end(), [&predicate](const T& item) { return !predicate(item); }), vec1.end());

    //////////////////////////////////
    // Remove elements from vec2 that pass the predicate
    //////////////////////////////////////////////////              ////////////
    vec2.erase(std::remove_if(vec2.begin(), vec2.end(), predicate), vec2.end());

    //////////////////////////////
    // Sort the remaining elements
    ////////////////////////////////////
    std::sort(vec1.begin(), vec1.end());
    ////////////////////////////////////
    std::sort(vec2.begin(), vec2.end());
}

I understand the desire to have a policy that rejects all the junk pull requests out there and discourages plagiarism, but it also has the potential to discourage many useful contributions.

To be compliant, I’d have to use a different editor at this point.

1 Like

In some editors, it starts to be less and less obvious what is “standard” autocomplete and what is “AI-generated”.

I personally feel there’s a big difference between @griswaldbrooks’ snippet and my most usual case:

const auto duration = params->getParam(/* AI starts here with ctrl+shift */ "duration", ros::Duration(1.0));

I.e. in this case there is only one correct way of typing what was intended. If I had to write it myself, the line would be letter-by-letter the same. Here, AI just saves time, it does not invent anything.

That’s the motivation for this work: we need rules around it. We cannot operate in the wild west; we need to know what we’re putting into the projects should lawsuits come down saying that all of your work generated with Copilot is illegal. Being able to track it is critical.

Sure. I completely understand that. I just think that

might not be what we need

I’m open to other suggestions about how we track AI generated contributions. What else would you have in mind? We may need to know in the future which functions / lines were generated by something like ChatGPT if there turns out to be serious licensing issues. That will need to be searchable.

Ensuring we have traceability for legal compliance is something that, as open-source practitioners, we need to be keenly aware of with generative AI, but also with general software licensing, etc. It’s not a new category of compliance, but it’s a new vector on it that we need to consider.

This is something that folks and their organizations really need to take a look at internally, and come to a conscious decision about whether they’re willing to risk potential lawsuits given its questionable legal status. In the meantime, I think tracking where this code exists is a minimum bar, so that we can achieve compliance if it ever becomes necessary.

I think the two main areas of concern @smac mentions in the OP are not strictly unique to AI-generated contributions, so it’s useful to consider how we’d handle them without LLMs as part of the equation.

Bad-faith contributions to open-source repos:

[…] some maintainers (including myself) have already had to engage with users creating numerous superfluous pull requests adding random AI generated code into random spots, with AI generated PR text that makes no sense.

What would we do if someone submits a low-quality PR the old-fashioned pre-LLM way? Depending on the project, maybe a first-time submitter gets some polite but firm advice about the contributor guidelines, while a repeat offender gets put on PR probation for a few months.

There is some precedent for this. In 2020 there was a minor crisis in several open-source projects on GitHub because users were submitting low-content PRs in order to get a free T-shirt from DigitalOcean’s Hacktoberfest event, wasting a lot of maintainer effort. I think the main outcome from this was that GitHub expanded the repo moderation tools to enable more granular control of who can submit PRs.

The main issue with LLMs here is that they make it easier to create submissions that look correct and have the same texture as a genuine manually-written change, which means it takes more time and effort for the maintainer to figure out what’s going on and reject the PR. One way to deal with this is to give much less leeway when judging whether someone is making a bad PR for good-hearted reasons or for lazy or malicious ones, and to take firm and visible action against the latter category.

Legal liability from PRs that are not novel contributions:

Due to the current (November 2023) legal ambiguity of generative AI-based software, such as that generated by ChatGPT, it is the responsibility of a developer representing an organization to ensure that their contributions do not violate copyright or software licenses. Software uniquely generated by AI is not currently subject to copyright, but software that has been copied by generative AI tools may be subject to the original author’s copyright and licensing terms.

This sort of thing would fall under the certifications within the Developer Certificate of Origin, which is a required sign-off for every commit to the core ROS repos.

In theory, developers who sign off their commits are already certifying that they review each set of changes for license compatibility, copyright compliance, etc.
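
For anyone unfamiliar with the mechanics: the sign-off is just a trailer appended to the commit message, usually added with git commit -s. A hypothetical example (the name and message are placeholders):

git commit -s -m "Fix costmap update race condition"

# the resulting commit message ends with a trailer like:
#   Signed-off-by: Jane Developer <jane@example.org>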

In practice, very few people or organizations seriously examine where their code is coming from. There’s a lot of code out there that was manually copy-pasted from StackOverflow or some dev’s blog.

I don’t know offhand if there are examples of situations where the DCO has been put to the legal test (beyond the SCO-Linux disputes, which are the reason the DCO exists to begin with). In such a case I bet the DCO, in conjunction with git-blame, would primarily facilitate finger-pointing to limit the scope of liability to an individual rather than the whole organization (i.e., “That’s him, officer! That’s the guy who lied about checking his code for GPL contamination!”).

The challenge AI brings in here is that it obfuscates the origin of the code it’s producing, and people who are using it generally aren’t thinking about how it works or where the code is coming from. IMO this is an issue of community awareness and perception: the companies that make these systems want people to think of them as your little buddy who writes code for you, not as a big pattern-matching engine with an undisclosed amount of incompatibly-licensed code in its training set.

A self-certification requirement like the one proposed in the OP (“Do not present AI-created code as original work if it has not been substantively modified or combined with other developer contributions”) is in line with existing policies, but it would be important to discuss how we’d determine whether a contribution was submitted in violation of this policy.

Resolving this in a more conclusive way than requiring/trusting individual developers to honestly and thoroughly self-certify probably requires a way to independently inspect the provenance of a given submission, which seems extremely challenging to implement in a consistent and reliable way.

6 Likes

First of all, thank you @smac for deciding to engage with this very hard problem.

Just my 2c, but I agree with @peci1’s view that more and more tools will integrate LLM-generated outputs, or “AI” if you wish.
I can see the legal concern, but I can also see a future in which most if not all PRs are flagged as generated by AI, with comments like // this block/line of code is generated by an AI everywhere in the codebase. I’m not sure I like that future.

Also, thank you very much @schornakj for making the core concern clear. While I agree something has to be done to prevent both rogue PRs and copyright infringement (which, I agree, are much more likely to happen with outputs from generative AIs), I don’t think this (@smac’s) proposal is the right way to do it.

The biggest thing I’m looking to address in that respect is making sure that we have a strict policy that Generative AI can be used in code, not in PR descriptions or developer discussions. If a user can’t tell you in their own words what the PR is doing, it’s not worth maintainers’ time to review it or thoughtfully respond to it.

That is one major element of a policy I feel strongly about. How we track generative AI software is definitely up for healthy debate, but we should respect each other enough to respond human-to-human when talking about complicated technical things found in robotics software infrastructure. Personally, even if the ROS community doesn’t set this as a policy, I will in Nav2 since I find it wildly disrespectful. However, I think that is a good general policy with all things AI; use it to automate tasks and fix problems, but not in discussions with each other.

You wouldn’t use ChatGPT to text your significant other, would you? I know I haven’t bought you dinner recently, but we are volunteers or professionals trying to make the world a better place :wink:

I think it’s more than you think, especially for larger organizations, but that’s neither here nor there and I don’t want to get sidetracked. But you can see lots of evidence of this, like Amazon’s push for Quality Levels and Bosch’s presentation about licensing at ROSCon. Much of Samsung’s Open Source team was focused on compliance and tooling around compliance. I have an entire book about software licenses, terms, and important notes that I read while there as pseudo-required reading.

Any other suggested way? My aim is to be able to track where generative AI was used so that, in some notional future where we need to remove it, we know what is fine and what potentially needs careful review. Perhaps use Git to add some remark about it in the commit message, and set a policy of keeping all AI-generated code in commits separate from the rest? Or a required remark in the PR template that generative AI was used across the PR (though that doesn’t isolate specific lines / functions)?

I like the machine-oriented comment living in the git history, mostly because I already feel like I have to look at too many comments written for machines when reading code (can we all agree that Doxygen comments are 90% noise and there has to be a better way?).

You wouldn’t use ChatGPT to text your significant other, would you?

After that South Park episode where Cartman did that, my wife did it for a week or so. It was a funny bit, but it quickly got old. One of our premises in this conversation is that this generative AI stuff is genuinely helpful. On that front, I’d be okay with banning it for now and waiting for larger companies that have a financial interest in us using generative AI to figure out how to build good tooling to satisfy legal concerns. And if it is just a fad and not all that useful, there is no harm in banning it.

We practically ban all sorts of other helpful technology in ROS because something is missing for us to be able to accept those kinds of contributions. In this case, the thing missing is a tooling and legal precedent for accepting code contributions created with AI tooling.

I’m not proposing an outright ban - though if we can’t find any way to track it that makes folks happy, that may be a practical solution which shifts liability back onto the user more squarely. That isn’t, however, my first or second choice.

While I totally understand (especially for the maintainer’s sake) and want to agree, I just want to be cautious about simply banning things. I’d prefer to prohibit its usage as a guideline, but not as strict law.

  • discussions: what about using it as a tool to generate useful information?
    • e.g. as an advanced grep
  • PRs: what about simple, small PRs that could reasonably be generated from a branch name and diff? I.e., when technology and tools evolve to the point that there’s not much difference between such a PR and a human-written one.
    • e.g. a PR that removes trailing whitespaces

I hate being that guy who just opposes an idea without any alternatives, but I really don’t have any good ideas. Maybe it’s because I see it as a new tool just like a typewriter or computer, or I just haven’t seen enough of its dark sides. (Although there were times I yelled at my colleagues not to blindly copy-paste generated code.)

I may be the weirdo here, but I think it’s too soon to correlate “using generative AI” and “potentially bad code”. Sure that may be the majority of bad PRs now, but it might also be the majority of good PRs in the near future.

This research (it was conducted by GitHub, so it should be taken with a pile of salt) basically states that better code is generated faster with the aid of a generative AI. I wouldn’t be surprised if more papers state so.

I personally prefer that, maybe with some guidelines for reducing bad PRs made without malicious intent.
e.g.

# PR template
- [ ] I have used a generative AI (e.g. GitHub Copilot, ChatGPT)
  - [ ] I have read the [legal and technical risks](a link to a page explaining it) of using a generative AI, and made sure to avoid them

As for those lying or checking those boxes without reading or thinking, I don’t think a policy will stop them from making rogue or bad PRs anyway.

Here’s my take. This is not a final policy.

Generative AI in PRs

Overview

In general, generative AI is a benefit to making better PRs (both the submitted content and the PR description). However, there are two potential liabilities:

  1. Legality of the submitted content is not clear. If the generative AI was trained on copyrighted data, the resulting output may fall under copyright, but also may not. There is as yet no case law nor legislation settling this point anywhere in the world. This makes it difficult to claim it complies with the project’s chosen open-source license.
  2. The generated content may not do what is expected, because the PR’s submitter does not fully understand it.

We can have separate policies for use of generative AI in source code and in documentation.

Source code PRs

Generative AI is now used for things as small as auto-completing a function’s parameters. This makes it problematic to ask the submitter to put comments before every use of generative AI in a submission.

We can consider the following requirements of source code PRs.

  1. The PR title and description must be entirely hand-written. Using generative AI to produce the title and/or description indicates that the submitter may not understand their own PR.
  2. The PR description is required to list the provenance of all code in the PR. This applies not just to generative AI-produced code but also code copied from Stack Exchange, someone’s blog, etc.
  3. For a PR that includes source code from generative AI, the PR description must list approximately what proportion of the changed lines of the PR contain generated code, what generative AI models/tools were used to produce them, and for what purpose. For example:
    1. if CLion’s AI-based autocomplete is used, that should be listed as “CLion autocomplete - completing function parameters - all function calls”;
    2. if ChatGPT was used to debug some hand-written code, that should be listed as “ChatGPT - debugging of hand-written code - two lines of algorithm”; and
    3. if ChatGPT was used to generate an entire algorithm implementation, that should be listed as “ChatGPT - generated algorithm implementation - 75% of lines”.
  4. For each git commit in the merged PR, if the content of that commit contains AI-generated source code, the commit message must note this.
  5. All source code derived from generative AI must be fully tested with 100% line coverage as a minimum, and ideally full MC/DC coverage. Tests will ideally be hand-written, or at least hand-designed.
  6. If a maintainer suspects that a PR may have been substantially generated by an AI-based source code generating tool, or suspects that the submitter has not been honest or accurate in the PR description about what was generated by AI, for what purpose, and what proportion of the PR, the maintainer has the right to request a walkthrough of the PR’s code by the submitter using a teleconference tool.
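
As a concrete sketch of requirements 2 and 3, a provenance section in a PR description might look something like the following; the tools, proportions, and line counts are invented purely for illustration:

## Provenance
- Hand-written: core planner changes (~60% of changed lines)
- GitHub Copilot - line completion of function parameters - throughout
- ChatGPT - debugging of hand-written code - two lines of the algorithm
- ChatGPT - generated initial unit test skeletons - ~25% of changed lines
- Copied from Stack Overflow (CC BY-SA, linked in a code comment) - 3 lines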

Documentation PRs

Generative AI shows much more potential for documentation, as it can help resolve the long-standing problems of developers not documenting their code, and of there being too few documentation submissions to keep the project’s long-form documentation up-to-date and complete.

However, generative AI still suffers from the same copyright risks for documentation as it does for source code.

We can consider the following requirements of documentation PRs.

  1. The PR title and description must be entirely hand-written, with the exception of generating a list of functions/features documented as part of a larger PR description. Using generative AI to produce the title and/or description indicates that the submitter may not understand their own PR.
  2. The PR description is required to list the provenance of all content of the PR.
  3. For a PR that includes documentation from generative AI, the PR description must list approximately what proportion of the changed lines in the PR contain generated documentation, what generative AI models/tools were used to produce them, and for what purpose.
  4. For each git commit in the merged PR, if the content of that commit contains AI-generated content, the commit message must note this.
  5. The submitter must acknowledge that they have completely read, tested, and verified the documentation changes as well as the complete page(s) being modified.
  6. If a maintainer suspects that a PR may have been substantially generated by an AI-based documentation generating tool, or suspects that the submitter has not been honest or accurate in the PR description about what was generated by AI, for what purpose, and what proportion of the PR, the maintainer has the right to request a walkthrough of the PR’s documentation changes by the submitter using a teleconference tool.
3 Likes

In general, I really like your approach! I’d just add a requirement that for a docs PR, the submitter has manually read all the generated docs and verified their accuracy. This is important because otherwise the docs won’t fit the code, which is worse than no docs.

1 Like

That is quite a strict, or at least hard-to-follow, policy. Let me apologize in advance for the nitpicking.

While I agree on the overview:


Feedback on suggested rules on source code PRs

As I said in my previous post, I don’t really like the hard “must”. I’d agree if it had an “In general” beforehand.

Nothing to disagree on!

I think this is a bit too much. Ultimately the goal is to reduce the 2 risks posed in the overview, and I struggle to understand why this level of detail is needed.

Really? Some people commit very often, especially during development. This will really disturb their workflow. And if the answer is to squash the commits, then what’s the difference between squashing and only noting it in the PR?

Again, I feel “All” is too strong. Sorry for nitpicking, but what about some obvious AI-assisted autocompletes? What’s the difference between that and an experienced programmer typing from habit?
Also, what’s the definition of “100% line coverage”? Does manual testing count? I’ve never had good experiences with projects chasing line coverage, especially 100%.

No objections on that.


Feedback on suggested rules on documentation PRs

Same as the source code version, I feel “must” is too strong. Especially for a documentation PR, I can see some being just like the one below, which I think is 100% fine to auto-generate.

Add documentation for class/functions in file x

List of class/functions with documentation added:
...

No objections!

Same problems with the source code version.

Same as the source code version, no objections.


Just for one last note: if we value the maintainers’ time and sanity, the last thing we want is to have to suspect that the PR writer is lying (maybe because of too many procedures), or (god forbid!) to hold a video call.
Thus I think making life easier for the people writing PRs is just as important as making life easier for the maintainers. (Although I might be too biased :wink:)


This is a very good point!
Now that I think of it, there’s also the problem of non-native (or non-fluent) English speakers.
More often than not, their only choice is a translation service, which isn’t really helpful.
That’ll be a discussion for another thread, but I think we should keep in mind that not everyone is good at English. (Although, like it or not, there’s always the option to ignore those people.)

1 Like

I think it would be wise if we settled on a standard syntax for this requirement. This would allow us to cherry-pick AI-generated commits and flag them for review. Perhaps a quick acronym would be sufficient (e.g. Contains AI Generated Code – CAIGC).
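
As a concrete sketch of what that could look like, commits touching AI-generated code could carry a fixed trailer; the key name and tool list below are only one hypothetical convention, but anything consistent would make such commits easy to find later:

git commit -m "Add trajectory smoothing utility

AI-Generated: partial (GitHub Copilot, ChatGPT)"

# later, list every commit that declared AI involvement:
git log --grep="AI-Generated:"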

@gbiggs I think there is a subtlety in all of this that we need to discuss / clarify. What about code that is AI generated, but then hand tweaked / reviewed?

I think the most common use case with AI is that the generated code gets about 80% of the way there but then a human has to step in and further refine the results. For example, I could see a prompt like, “Write me a ROS 2 Humble Python node that subscribes to topic A of type X and publishes topic B of type Y” working reasonably well. In this case the human would then write a function that translates type “X” to type “Y”. Do we consider this AI generated code, or does it qualify as human generated, or is it a third distinct thing? In this case what would be the expectation for the series of commits / annotations? Would the appropriate behavior be to first commit the raw AI generated code and then commit the delta from the raw code to the final product?

I think a good exercise would be for us to run through a couple of examples and model the behavior that we expect from contributors.
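
One way to model the “80% generated, then refined” case, assuming a commit-annotation convention like the one discussed above, would be to split the raw generation from the human refinement; the file name, messages, and trailer here are purely illustrative:

# Commit 1: the raw generated node, clearly labeled
git add type_bridge_node.py
git commit -m "Add raw AI-generated subscriber/publisher node

AI-Generated: full (ChatGPT)"

# Commit 2: the human-written refinement on top of it
git add type_bridge_node.py
git commit -m "Hand-write the X-to-Y conversion and clean up the generated node"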

2 Likes