Interim policy on the use of generative AI in OSRF projects

There is a desire to use generative AI tools as part of producing contributions to OSRF projects. However, we request that developers please refrain from using generative AI tools (such as ChatGPT, Gemini, and GitHub Copilot) in submissions to OSRF projects, including ROS 2, Gazebo, and Open-RMF. The OSRA Technical Governance Committee has created a Technical Committee to investigate the potential risks of these tools; this TC plans to release a comprehensive formal policy of the OSRF on contributors’ use of generative AI by the end of 2024.

If you have comments or feedback on this subject, please feel free to discuss them in this ROS Discourse thread. We will consider them, along with other relevant opinions (particularly legal ones), while developing these guidelines.

If you would like to share feedback privately, please message @gbiggs (OSRA TGC chair) and @nuclearsandwich (chair of the Technical Committee on Generative AI) on Discourse.

3 Likes

What about JetBrains’ “Full Line Code Completion”?

It runs completely locally, and regarding training data, they explicitly say:

The backbone of full line code completion is a programming-language specific language model, which is trained in house using a dataset of open-source code with permissive licenses.

In music copyright law, there’s a minimum length of sample, a “substantial part”, below which use can’t be considered to infringe copyright. I’d guess that finishing the half-written line of code you have is almost deterministic (given the frameworks you use and the variable/function names you chose before), so it might be pretty difficult to find, let alone prove, any kind of copyright infringement.
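As a hedged illustration of that point (rclpy and std_msgs are the standard ROS 2 Python APIs, but the node, topic, and variable names here are made up): given this much context, nearly any completion engine will finish the last line the same way, because the API leaves almost no room for variation.

```python
import rclpy
from rclpy.node import Node
from std_msgs.msg import String

class Talker(Node):
    def __init__(self):
        super().__init__('talker')
        # Half-written input:  self.pub = self.create_pub...
        # Near-inevitable completion: the message type, topic name, and QoS
        # depth are the only free choices, and all are constrained by the
        # surrounding code and imports.
        self.pub = self.create_publisher(String, 'chatter', 10)
```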

My small tangential grievance is the shutting down of ROS Answers. Stack Overflow is a for-profit site which directly uses answers for LLM and GenAI training.

It was nice to have a place where you could ask ROS-related questions and have them answered by actual humans.

Is there a possibility of resurrection, given Stack Overflow’s progressively more closed and anti-user tendencies?

This is nonsense, political noise indicating OSRF has lost focus.

We need the DDS issues addressed before everyone stops using ROS.

A dead project has no political weight to throw around.

3 Likes

I’d like to make sure we consider users’ personal projects that may create generative AIs in the future. The policy that’s put in place might need to ensure the training sets are cited, as well as the algorithms.

It seems to me this policy only considers OSRF-managed repos. User-managed repos are free to do whatever they want, right (assuming they comply with the standard licensing terms)?

I think this was directed at my comment. I meant to say: if contributors make a generative AI and then use it to code, for example a personal (from-scratch) Copilot, even then we should consider the training set’s origin, not just the algorithm’s source.

I personally do not use generative AI to make contributions… maybe just not yet.

But I wanted to share my opinion about this, because I believe this decision will have a significant impact on the entire community over the next several years.

Technical Committee to investigate the potential risks with regard to these tools; this TC plans to release a comprehensive formal policy of the OSRF on contributors’ use of Generative AI by the end of 2024.

Really appreciate this effort. We will wait for the official policy announcement from the TC.

The following is what I think about the generative AI question; happy to get some feedback!

Concerns

  • Using AI to generate content can still raise legal concerns about the copyrighted content the AI has been trained on, especially publicly available information.

  • This could lead to accidental copyright infringement or license incompatibility. (training code may be copyleft or otherwise incompatible)

  • Could conflict with individual LLMs’ policies and terms. (model restrictions)

  • Some companies might not be willing to use open source that is known to include code that could have been generated by AI. (usage concerns)

Question

I believe that, even without generative AI, there is (and always has been) the risk that a contributor will copy incompatibly licensed materials they are not allowed to use…?

Why do we need to make an exceptional rule for generative AI, which from my perspective is just one tool among many?

I think the developer’s responsibility does not change at all, and we are responsible for everything we sign off on as individuals?

Options?

  • Restricted (Conservative): Don’t accept any contributions generated by AI. (current plan?)

  • By use case: Only allow AI for some specific use cases, such as debugging or analysis.

  • Model-based: Only accept contributions generated by an AI that was trained locally on consistently licensed materials.

  • Developer Responsibility: In other words, trust the developer, with a guideline? (A rough sketch of what that could look like follows this list.)
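To make that last option concrete, here is one hedged sketch of a “developer responsibility” sign-off in a commit message. The Signed-off-by trailer is the existing DCO convention (added by `git commit -s`); the Generated-by trailer is purely hypothetical wording, not an existing standard.

```
Fix off-by-one error in trajectory sampling

Generated-by: GitHub Copilot (suggestions reviewed and edited by hand)
Signed-off-by: Jane Developer <jane@example.com>
```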

Just FYI: the Linux Foundation Guidance Regarding Use of Generative AI Tools goes with Developer Responsibility at this moment.

I may be totally wrong on some points and missing legal parts, but I would love to hear some feedback.

thanks,

Tomoya

7 Likes

This is a really weird hill for OSRA to die on.

AI-assisted code generation is clearly the future of software development, and it’s ironic to see OSRA, an organization at the forefront of AI and robotics, being one of the first to reject AI. I’m not saying there are zero concerns with AI-generated code, but I strongly disagree with the overcautious approach being taken here.

A better approach would be to wait and see what other, much larger open source projects do. For example, the Linux Foundation AI Policy previously mentioned by @tomoyafujita (which also covers the Cloud Native Computing Foundation, Kubernetes, etc.):

We have received numerous questions from our project communities about contributing AI-generated code to Linux Foundation projects.

Open source software has thrived for decades based on the merits of each technical contribution that is openly contributed to and reviewed by community peers. Development and review of code generated by AI tools should be treated no differently.

Code or other content generated in whole or in part using AI tools can be contributed to Linux Foundation projects. However, there are some unique considerations related to AI generated content that developers should factor into their contributions.

  1. Contributors should ensure that the terms and conditions of the generative AI tool do not place any contractual restrictions on how the tool’s output can be used that are inconsistent with the project’s open source software license, the project’s intellectual property policies, or the Open Source Definition.
  2. If any pre-existing copyrighted materials (including pre-existing open source code) authored or owned by third parties are included in the AI tool’s output, prior to contributing such output to the project, the Contributor should confirm that they have permission from the third party owners–such as in the form of an open source license or public domain declaration that complies with the project’s licensing policies–to use and modify such pre-existing materials and contribute them to the project. Additionally, the contributor should provide notice and attribution of such third party rights, along with information about the applicable license terms, with their contribution.

Some tools provide features that can assist contributors. For example, some tools provide a feature that suppresses responses that are similar to third party materials in the AI tool’s output, or a feature that flags similarity between copyrighted training data or other materials owned by third parties and the AI tool’s output and provides information about the licensing terms that apply to such third party materials.

Individual Linux Foundation projects may develop their own project-specific guidance and recommendations regarding AI-generated content. Similarly, organizations that employ open source developers may have more stringent guidelines related to use of AI for software development. Contributors should comply with their employer’s policies when contributing.

4 Likes

This does a lot of heavy lifting here. Do you (or any of us not currently employed by the largest of companies) have access to the appropriate tooling to determine if there are copyright issues with the outputs of the models you’re attempting to contribute back to open source? As far as my understanding goes, there are only a few tools that do this; they cost millions a year for an organization to acquire, and they produce an immense number of false positives that need to be sifted through.

I’d argue no, and the LF, in setting up a ‘pass-the-buck’ policy, knows this as well.

It’s worth noting that OSRF’s policy is interim while we determine what a medium-term policy should look like. There are several major open source organizations that have outright banned the use of generative AI (among others with similar bans or softer but firm discouragements), and they/we have not gone so far as of yet.

If there were to be serious copyright issues due to the outputs of commonly used GenAI models, and no policy or knowledge about what code might be tainted were in place, that could be the end of ROS and hundreds of other open-source communities: it would be virtually impossible to continue without stripping back years of work and redoing it, because you wouldn’t know which external contributions were legally dubious.

This is an important topic for our age and needs to be considered by all organizations, large and small, open-source and proprietary commercial. The legal ramifications of a laissez-faire attitude are enormous and many large players are setting up their own policies internally due to this as well.

I wouldn’t recommend folks jump to too-harsh conclusions here until a full policy is fleshed out and proposed. I don’t think anyone’s on Team Outright Ban, but some thought and guardrails need to be put in place.

For example, in Nav2, for the last ~year or so we’ve required users to indicate in our required PR template whether they used generative AI, so we know when it was used: a very minimal, very non-intrusive first step toward some traceability. This has caused zero friction and gives us some important information (which we will likely need to build on in the future).
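For anyone wanting to copy the idea, here is a minimal sketch of such a template section. The wording is illustrative, written for a standard GitHub `.github/PULL_REQUEST_TEMPLATE.md`, and is not Nav2’s actual template.

```markdown
## Generative AI disclosure

- [ ] I did not use generative AI tools for this contribution.
- [ ] I used generative AI tools; which tool(s) and where:
      <!-- e.g. Copilot autocompletion in planner_server.cpp -->
```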

Edit: Note that neither Nav2 nor my thoughts here are the official policy or a project of OSRF; these are just my thoughts and current directives on the matter as an active project maintainer who also has to think about this.

1 Like

I will have a fuller reply for all of you later today, but for now I want to emphasise as strongly as I can the word “INTERIM” in the title of the post.

1 Like

This word is one of the more difficult ones for non-native speakers. For those (including me) who are unsure, you can imagine “temporary” or “non-final” instead.

NB: I’ve used the help of some AI to write this post. It is called a thesaurus (or at least it was in the pre-AI era). It corrected typos I pointed to. I don’t know how this AI works or what data it was trained on (if it was trained at all), and it doesn’t seem to have any licensing policy, as it is part of the preinstalled keyboard app on my phone, which definitely did not show any EULA to me when I activated the phone. Now, what if I used this phone keyboard to author a PR in GitHub Mobile?

1 Like

I was indeed also going to suggest changing “interim” to “temporary”, because it’s not necessarily clear what the former means, and it’s easy to just ignore it and read “something-something policy / ban on the use of generative AI”.

While I completely agree this is an important topic, and I also agree those problems are currently turning people away from ROS: a) the work being done on rmw_zenoh is going to improve that situation, and b) unfortunately, software is considered (almost) equal to a book, a movie, a play or a television series, at least as far as (copyright) law is concerned.

This means you have to make absolutely sure you’re not (inadvertently) copying someone’s work without permission, or making something heavily based on someone else’s work without stating so, because otherwise copyright law provides some quite powerful (and awful) tools to whoever believes you infringed on their copyright (ie: violated it).

Especially for open-source software, which already has a hard time when it comes to things like liability and licensing, this is a real challenge, especially in (semi-)corporate environments (ie: companies) – which are exactly the kinds of environments in which ROS would really like to see as much adoption as possible.

As @smac mentions, it’s really difficult to give any guarantees about that when using generative AI right now, especially as a hobbyist, small company or anyone who doesn’t have a legal team at their disposal.

To be honest, I believe we’re already too late: lots of lines/functions that were auto-completed by tools like Copilot, or that were heavily based on what such tools suggested, have already been merged. Not just into ROS, but also into other projects.

But it does make sense to at least take pause, acknowledge this is an issue, and set up some rules. At the very least, no one can accuse you/us of not having done anything at all.


Edit: the Linux Foundation AI Policy is rather interesting. IANAL, but it seems to me it basically says something like:

If you use AI and contribute something based on whatever it says/does/generates for you, you are responsible for figuring out whether you are legally allowed to contribute that to any of our projects.

Oh and you are also legally responsible/liable if this (ie: whether you were allowed to do this) doesn’t turn out to be the case in the future, once we figure this whole generative AI situation out.

This does indeed seem to push the problem onto contributors, as @smac already noticed.

That’s like the Submission of Contributions section in the Apache 2.0 license and/or a CLA, but with the words “generative AI” added.
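For reference, that section of Apache-2.0 says: “Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions.”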

Would any of us here know exactly whether they comply with such a policy when using generative AI tools like Copilot? @amacneil @Shannon_Barber?

2 Likes