Interim policy on the use of generative AI in OSRF projects

There is a desire to use generative AI tools as part of producing contributions to OSRF projects. However, we request that developers please refrain from using generative AI tools (such as ChatGPT, Gemini, and GitHub Copilot) in submissions to OSRF projects, including ROS 2, Gazebo, and Open-RMF. The OSRA Technical Governance Committee has created a Technical Committee to investigate the potential risks posed by these tools; this TC plans to release a comprehensive formal OSRF policy on contributors’ use of generative AI by the end of 2024.

If you have comments or feedback on this subject, please feel free to discuss them in this ROS Discourse thread, and we will consider them, along with other relevant opinions (particularly legal ones), while developing these guidelines.

If you would like to share feedback privately, please message @gbiggs (OSRA TGC chair) and @nuclearsandwich (chair of the Technical Committee on Generative AI) on Discourse.

4 Likes

What about JetBrains “Full Line Code Completion”?

It runs completely locally, and regarding training data, they explicitly say:

The backbone of full line code completion is a programming-language specific language model, which is trained in house using a dataset of open-source code with permissive licenses.

In music copyright law, there is a notion of a minimum length of sample, or “substantial part”, below which copying is not considered to infringe copyright. I would guess that finishing the half-written line of code you have is almost deterministic (given the frameworks you use and the variable/function names you chose beforehand), so it might be pretty difficult to find or prove any kind of copyright infringement.

1 Like

My small tangential grievance was the shutting down of ROS Answers. Stack Overflow is a for-profit site which directly uses answers for LLM and GenAI training.

It was nice to have a place where you could ask ROS-related questions and have them answered by actual humans.

Is there a possibility of resurrection, given Stack Overflow’s progressively more closed and anti-user tendencies?

This is nonsense, political noise indicating OSRF has lost focus.

We need the DDS issues addressed before everyone stops using ROS.

A dead project has no political weight to throw around.

4 Likes

I’d like to make sure we consider users’ personal projects that may create generative AIs in the future. The policy that’s put in place might need to ensure that the training sets are accounted for, as well as the algorithms.

It seems to me this policy only considers OSRF-managed repos. User-managed repos are free to do whatever they want, right (assuming they comply with the standard licensing terms)?

I think this was directed at my comment. I meant to say: if contributors make a generative AI themselves and then use it to code (for example, a personal, from-scratch Copilot), even then we should consider the training set’s origin, not just the algorithm’s source.

I personally do not use generative AI to make contributions… maybe just not yet.

But I wanted to share my opinion about this, because I believe this decision will have a significant impact on the entire community over the next several years.

The OSRA Technical Governance Committee has created a Technical Committee to investigate the potential risks posed by these tools; this TC plans to release a comprehensive formal OSRF policy on contributors’ use of generative AI by the end of 2024.

I really appreciate this effort. We will wait for the official policy announcement from the TC.

The following are my thoughts on the generative AI question; happy to receive feedback!

Concerns

  • Using AI to generate content can still raise legal concerns about the copyrighted content the AI was trained on, especially publicly available information.

  • This could lead to accidental copyright infringement or license incompatibility. (Training code may be copyleft or otherwise incompatible.)

  • It could conflict with an individual LLM’s policies and terms of use. (Model restrictions.)

  • Some companies might not be willing to use open-source software known to include code that could have been generated by AI. (Usage concerns.)

Question

I believe that, even without generative AI, there is (and has been) the risk that a contributor will copy incompatibly licensed materials they are not allowed to use…?

Why do we need to make an exceptional condition for generative AI, which from my perspective is just one tool among many?

I think the developer’s responsibility does not change at all, and we are responsible for everything we sign off on as individuals?

Options?

  • Restricted (Conservative): Don’t accept any contributions generated by AI. (current plan?)

  • By use case: Only allow some specific use cases, such as debugging or analysis.

  • Model-based: Only accept contributions generated by AI that was trained locally on license-consistent materials.

  • Developer responsibility: In other words, trust the developer, with guidelines?

Just FYI: the Linux Foundation Guidance Regarding Use of Generative AI Tools goes with developer responsibility at the moment.

I may be totally wrong on some points and missing the legal aspects, but I would love to hear some feedback.

thanks,

Tomoya

11 Likes

This is a really weird hill for OSRA to die on.

AI-assisted code generation is clearly the future of software development, and it’s ironic to see OSRA, an organization at the forefront of AI and robotics, being one of the first to reject AI. I’m not saying there are zero concerns with AI-generated code, but I strongly disagree with the overcautious approach being taken here.

A better approach would be to wait and see what other, much larger open source projects do. For example, the Linux Foundation AI Policy previously mentioned by @tomoyafujita (which also covers the Cloud Native Computing Foundation, Kubernetes, etc.):

We have received numerous questions from our project communities about contributing AI-generated code to Linux Foundation projects.

Open source software has thrived for decades based on the merits of each technical contribution that is openly contributed to and reviewed by community peers. Development and review of code generated by AI tools should be treated no differently.

Code or other content generated in whole or in part using AI tools can be contributed to Linux Foundation projects. However, there are some unique considerations related to AI generated content that developers should factor into their contributions.

  1. Contributors should ensure that the terms and conditions of the generative AI tool do not place any contractual restrictions on how the tool’s output can be used that are inconsistent with the project’s open source software license, the project’s intellectual property policies, or the Open Source Definition.
  2. If any pre-existing copyrighted materials (including pre-existing open source code) authored or owned by third parties are included in the AI tool’s output, prior to contributing such output to the project, the Contributor should confirm that they have permission from the third party owners (such as in the form of an open source license or public domain declaration that complies with the project’s licensing policies) to use and modify such pre-existing materials and contribute them to the project. Additionally, the contributor should provide notice and attribution of such third party rights, along with information about the applicable license terms, with their contribution.

Some tools provide features that can assist contributors. For example, some tools provide a feature that suppresses responses that are similar to third party materials in the AI tool’s output, or a feature that flags similarity between copyrighted training data or other materials owned by third parties and the AI tool’s output and provides information about the licensing terms that apply to such third party materials.

Individual Linux Foundation projects may develop their own project-specific guidance and recommendations regarding AI-generated content. Similarly, organizations that employ open source developers may have more stringent guidelines related to use of AI for software development. Contributors should comply with their employer’s policies when contributing.

5 Likes

This does a lot of heavy lifting here. Do you (or any of us not currently employed by the largest of companies) have access to the appropriate tooling to determine whether there are copyright issues with the outputs of the models you’re attempting to contribute back to open source? As far as my understanding goes, there are only a few tools that do this, they cost an organization millions a year to acquire, and they produce an immense number of false positives that need to be sifted through.

I’d argue no, and the LF, in setting up a ‘pass-the-buck’ policy, knows this as well.

It’s worth noting that OSRF’s policy is interim while determining what a medium-term policy should look like. There are several major open source organizations that have outright banned the use of generative AI (among others with similar bans or softer but firm discouragements), and we have not gone that far as of yet.

If there were to be serious copyright issues due to the outputs of commonly used GenAI models, and if no policy or knowledge about which code might be tainted were in place, that could be the end of ROS and hundreds of other open-source communities: it would be virtually impossible to continue without stripping back years of work and redoing it, since you wouldn’t know which external contributions were legally dubious.

This is an important topic for our age and needs to be considered by all organizations, large and small, open-source and proprietary commercial alike. The legal ramifications of a laissez-faire attitude are enormous, and many large players are setting up their own internal policies for this reason as well.

I wouldn’t recommend folks jump to overly harsh conclusions here until a full policy is fleshed out and proposed. I don’t think anyone is on Team Outright Ban, but some thought and guardrails need to be put in place.

For example, in Nav2 for the last year or so, we’ve required contributors to indicate in our PR template whether they used generative AI, as a very minimal, very non-intrusive first step toward some traceability. This has caused zero friction and gives us some important information (which we will likely need to build on in the future). A hypothetical sketch of what such a template section could look like is shown below.
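This is illustrative only, not Nav2’s actual template text; the file name and wording below are assumptions:

```markdown
<!-- .github/pull_request_template.md (hypothetical example) -->
## Description
<!-- Summarize the change and why it is needed. -->

## Generative AI disclosure
<!-- Check one. This gives maintainers basic traceability. -->
- [ ] No generative AI tools were used in this contribution.
- [ ] Generative AI tools were used (list the tools and how they were
      used, e.g., autocomplete, test generation, documentation):
```

Because GitHub pre-fills every new PR description from this file, the disclosure is captured without any extra process for contributors.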

Edit: Note that Nav2’s practice and my comments here aren’t official OSRF policy; they are just my views and current directives on the matter as an active project maintainer who also has to think about this.

6 Likes

I will have a fuller reply for all of you later today, but for now I want to emphasise as strongly as I can the word “INTERIM” in the title of the post.

2 Likes

This word is one of the more difficult ones for non-native speakers. For those (including me) who are unsure, you can read it as “temporary” or “non-final” instead.

NB: I’ve used the help of some AI to write this post. It is called a thesaurus (or at least it was in the pre-AI era). It corrected the typos I pointed to. I don’t know how this AI works or what data it was trained on (if it was trained at all), and it doesn’t seem to have any licensing policy, as it is part of the preinstalled keyboard app on my phone, which definitely did not show any EULA to me when I activated the phone. Now, what if I used this phone keyboard to author a PR in GitHub Mobile?

3 Likes

I was indeed also going to suggest changing interim to temporary, because it’s not necessarily clear what the former means and it’s easy to just ignore it and read “something-something policy / ban on the use of generative AI”.

While I completely agree this is an important topic, and I also agree those problems are currently turning people away from ROS: a) the work being done on rmw_zenoh is going to improve that situation, and b) unfortunately, software is considered (almost) equal to a book, a movie, a play, or a television series, at least as far as copyright law is concerned.

This means you have to make absolutely sure you’re not (inadvertently) copying someone’s work without permission, or making something heavily based on someone else’s work without stating so, because otherwise copyright law provides some quite powerful (and awful) tools to whoever believes you infringed on (ie: violated) their copyright.

Especially for open-source software, which already has a hard time when it comes to things like liability and licensing, this is a real challenge, particularly in (semi-)corporate environments (ie: companies), which are exactly the kinds of environments in which ROS would really like to see as much adoption as possible.

As @smac mentions, it’s really difficult to give any guarantees about that when using generative AI right now, especially as a hobbyist, small company or anyone who doesn’t have a legal team at their disposal.

To be honest, I believe we’re already too late: lots of lines and functions that were auto-completed by tools like Copilot, or heavily based on what such tools suggested, have already been merged. Not just in ROS, but in other projects as well.

But it does make sense to at least take pause, acknowledge this is an issue, and set up some rules. At the very least, no one can accuse you/us of not having done anything at all.


Edit: the Linux Foundation AI Policy is rather interesting. IANAL, but it seems to me it basically says something like:

If you use AI and contribute something based on whatever it says/does/generates for you, you are responsible for figuring out whether you are legally allowed to contribute that to any of our projects.

Oh and you are also legally responsible/liable if this (ie: whether you were allowed to do this) doesn’t turn out to be the case in the future, once we figure this whole generative AI situation out.

This does indeed seem to push the problem onto contributors, as @smac already noticed.

That’s like the Submission of Contributions section in the Apache 2.0 license and/or a CLA, but with the words “generative AI” added.

Would any of us here know exactly whether they comply with such a policy when using generative AI tools like Copilot? @amacneil @Shannon_Barber?

4 Likes

Thank you to all those who gave well-thought-out feedback. This will be very helpful to the TC when it forms the long-term policy during the remainder of 2024.

I’d like to address some specific concerns that have been raised.

The linked blog post says:

The backbone of full line code completion is a programming-language specific language model, which is trained in house using a dataset of open-source code with permissive licenses.

Unfortunately, it doesn’t say which licenses, or whether you can limit it to a model trained on one specific license. This is a (potential) problem because, while a piece of code copyrighted by someone else may be under a permissive license, if that license is different from the license of the code you use it in, then changing the license of the original code without permission from the original creator is a copyright violation.

Now, full line code completion works on single lines only, and is advertised as saving about 20% of keystrokes. This implies that it only provides very small chunks of code, which would likely be too short and/or generic to provably violate a copyright. That’s a good thing, for as long as it stays that way. The same argument doesn’t apply to using JetBrains’ full AI tool, or to tools from other companies, such as Copilot, that can generate an entire source file for you.

This policy, and the work the TC is doing, is specifically about tools that use generative AI models to generate source code which is then proposed for inclusion in one of the OSRF’s projects. It is not relevant to people who are creating generative AI models; there are different concerns there, such as the privacy of personally-identifying information found in training set data, which the TC will not be addressing.

These are some of the very valid concerns which led to the creation of a TC to put together a formal policy for the OSRF’s projects.

Each contributor’s responsibility does not change, and as always contributors are first in the line of responsibility for ensuring that they have the necessary and correct permission to contribute source code, documentation, artwork, or whatever else they are contributing, under the license it will carry as part of the project. I agree that there are likely instances in our codebases of source code that has been copied without permission - I can recall dealing with at least one such case in the past couple of years. However, the difference with generative AI-powered code-completion and code-generation tools is that now anyone can easily end up trying to contribute code in violation of copyright law, potentially without even knowing it, due to the ease of use of these tools and their increasingly common integration into popular editors and IDEs. This makes a formal policy for the use of these tools necessary.

I’d also like to note that the Developer Certificate of Origin sign-off line in each commit message, which most of our projects require, has never been tested in a court of law. No one actually knows for sure that it can bind someone as a signed contract would, and thus whether it would indemnify the project, other maintainers/contributors, or the OSRF against being sued.
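For readers unfamiliar with it, the DCO sign-off is simply a plain-text trailer on each commit message, which `git commit -s` appends automatically. A minimal sketch follows; the commit subject, name, and email are placeholders:

```text
$ git commit -s -m "Fix costmap resolution handling"

# The resulting commit message ends with a trailer of the form:
#
#   Fix costmap resolution handling
#
#   Signed-off-by: Jane Developer <jane@example.com>
#
# By adding it, the contributor asserts the statements of the
# Developer Certificate of Origin (https://developercertificate.org/).
```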

The TC has already put together a comparison table of the stances being taken by various open-source software organisations and projects. The OSRF is not the only organisation that has taken this stance, and others have taken different stances, including some that are completely opposite to the Linux Foundation’s. NetBSD, Gentoo, and SUSE have all decided to go with complete and outright bans on code contributions from generative AI-powered tools.

The Linux Foundation has gone with “developer responsibility” based on its analysis of the risks and the resources it has available to deal with the potential consequences of those risks. I think it’s important to remember that the OSRF is much smaller than the Linux Foundation and does not have anywhere near the same level of resources to handle the potential consequences. We don’t have lawyers on staff, and we would not survive long if the OSRF were to be sued by a much larger organisation.

The OSRF is not able to simply copy another organisation’s policy, not even the Linux Foundation’s. Each organisation has to evaluate the known risks, the potential consequences, and how these relate to its own projects, user base, and the resources available should consequences arise. This is why the TC is taking time to produce a policy rather than making a rash decision to copy something that sounds good from another organisation.

I encourage you to re-read the interim policy slowly and understand it fully (including that it is interim) before leaping to the conclusion that the OSRA/OSRF is “rejecting AI” or treating this interim policy as a do-or-die, never-to-be-changed decision. The TC has already identified that even its “final” policy proposal will likely need to be continually modified over the years, particularly as case law develops.

As an OSRA Silver Member, Foxglove has Silver Member representatives on the TGC. I encourage you to let those representatives know Foxglove’s thoughts and position on generative AI code generation via your own company’s OSRA representative, so that they can be recorded and properly taken into consideration by the TGC.

I respect all of the non-native English speakers in our community for the effort they put in to participate despite having to use a language they may not be fluent in. I’m sorry that you found some of the wording difficult. However, I am not in a position to consider the English-language capability of every potential reader. All I can do is use clear language that is as unambiguous as possible. I encourage anyone who feels they don’t properly understand a word in an official statement to treat it as a learning experience and consult a dictionary, as I do when I interact with the Japanese ROS community.


Again, thank you all for the well-thought-out feedback. It will be very helpful to the members of the TC.

4 Likes

The biggest threat to ROS is not generative AI but this governance.

We are witnessing that changing the last letter of the name has no effect. I do not think the small number of people who govern Open Robotics can offer any solution, now or in the future, to any of the long-pressing problems of ROS. Neither Zenoh nor any other technical breakthrough could solve this either. Please do not deceive yourselves.

Anyway, @gbiggs:

  1. Could you please let the community know who this Technical Governance Committee is? By name, please. Where are the meeting minutes required by your charter (Article 5.9)? Who is directing this soon-to-be-forever policy?
  2. Similarly, who is on this Technical Committee on Generative AI that will formulate a policy for such a complicated problem with such limited resources? Why does @nuclearsandwich lead it? He might be a great engineer, but what is his proficiency in such legal issues? I want to hear his opinion, then.

Again, the biggest threat to ROS is not generative AI but this governance. You have already lost the new generation, and you are losing the hard-earned trust of the previous one.

3 Likes

What exactly are the DDS issues that people are complaining about? For me, FastDDS and Cyclone DDS just work. I’m not handling tons of camera data or something challenging, though.

While that is a nice soundbite, is your proposition that we shouldn’t have any governance over the projects? If you agree with the general need for governance structures, then it’s reasonable to believe that the structures in place are here for the projects’ and community’s benefit (since none of us are paying money to fund OSRA / ROS for the purpose of destroying it).

The TGC voted on a resolution to form this committee to look into this subject and make a proposal for a policy that the Foundation’s projects will align with (assuming it’s accepted).

I think a more productive conversation would be to express your views on the subject at hand (the use of generative AI in contributions to the Foundation’s open-source projects) so that your opinions and insights can be used in the formulation of the TBD policy. That is what this thread is intended to provide a platform for, not general anti-governance soundbites.

  • What are the specific objections to having a Foundation policy on Generative AI, given that many other organizations, open-source foundations, and companies have such policies in place?
  • What do you think are reasonable guidelines, best practices, and advice to give?
  • What concerns do you have about a policy or about the use of Generative AI in the community?

This is off topic here, but it’s a good point. We should make these available, and I’m sure it’s the intent of folks to do so. There may be a fruitful separate discussion thread to be had about some of the items in the Charter that are not yet being fulfilled, including minutes and current voting committee membership. There have only been about three meetings, so I’m sure it’s just missing due to getting everything spun up.

@gbiggs I know calling you ‘too busy’ is an understatement, but posting the minutes feels like a pre-ROSCon priority from my perspective.

2 Likes

My proposition is very clear here.

And I asked a few simple questions of the chairs of the TGC and the TC. Please let them respond. If you want to answer, my suggestion for you is to invent a spokesperson position in the OSRA, like our developer advocate; then you can appoint yourself.

Otherwise I don’t care about your Generative AI directives.

I sit on both the TGC and the TC and am adequately informed and participating enough to be able to respond to the subjects I replied to.

It sounds like you’re yielding your opportunity to have your opinion considered, then. I think my questions pose a reasonable starting framework for a productive discussion, rather than governance bashing.

For others - @amacneil in particular, who had some clear thoughts - the committee and I would love to understand some of your concerns! The questions above are posed to you all.

1 Like

Yes, you’re right. I’ve been letting the perfect be the enemy of the good on a few things. Getting the minutes up has been a priority this week, and I’m happy to say they are now publicly available.

3 Likes