There is a desire to use generative AI tools as part of producing contributions to OSRF projects. However, we request that developers please refrain from using generative AI tools (such as ChatGPT, Gemini, and GitHub Copilot) in submissions to OSRF projects, including ROS 2, Gazebo, and Open-RMF. The OSRA Technical Governance Committee has created a Technical Committee to investigate the potential risks with regard to these tools; this TC plans to release a comprehensive formal policy of the OSRF on contributors' use of generative AI by the end of 2024.
If you have comments or feedback on this subject, please feel free to discuss them in this ROS Discourse thread, and we will consider them, along with other relevant opinions (particularly legal ones), while developing these guidelines.
If you would like to share feedback privately please message @gbiggs (OSRA TGC chair) and @nuclearsandwich (chair of the Technical Committee on Generative AI) on Discourse.
It runs completely locally, and regarding training data, they explicitly say:
The backbone of full line code completion is a programming-language specific language model, which is trained in house using a dataset of open-source code with permissive licenses.
In music copyright law, there's a minimum length of copyrightable sample, or "substantial part", below which nothing can be considered to infringe copyright. I guess that finishing the half-written line of code you have is almost deterministic (given the frameworks you use and the variable/function names you chose before), so it might be pretty difficult to find or prove any kind of copyright infringement.
My small tangential grievance is the shutdown of ROS Answers. Stack Overflow is a for-profit site that directly uses answers for LLM and GenAI training.
It was nice to have a place where you could ask ROS-related questions and have them answered by actual humans.
Is there a possibility of resurrection, given Stack Overflow's progressively more closed and anti-user tendencies?
I'd like to make sure we consider users' personal projects that may create generative AIs in the future. The policy that's put in place might need to ensure the training sets are cited, as well as the algorithms.
It seems to me this policy only considers OSRF-managed repos. User-managed repos are free to do whatever they want, right (assuming they comply with the standard licensing terms)?
I think this was directed at my comment. What I mean to say is: if contributors make a generative AI and then use it to code, for example a personal (from-scratch) Copilot, even then we should consider the training set's origin, not just the algorithm's source.
I personally do not use generative AI to make contributions… maybe just not yet.
But I wanted to share my opinion about this, because I believe this decision will have a significant impact on the entire community over the next several years.
Technical Committee to investigate the potential risks with regard to these tools; this TC plans to release a comprehensive formal policy of the OSRF on contributors' use of Generative AI by the end of 2024.
Really appreciate this effort. We will wait for the official policy announcement from TC.
The following are my thoughts about the generative AI question; happy to have some feedback!
Concerns
Using AI to generate content can still raise legal concerns about the copyrighted content the AI has been trained on, especially public information.
This could mean accidental copyright infringement or license incompatibility. (Training code may be copyleft or otherwise incompatible.)
It could conflict with an individual LLM's policies and terms. (Model restrictions.)
Some companies might not be willing to use open source that is known to include code that could have been generated by AI. (Usage concerns.)
Question
I believe that, even without generative AI, there is (and has been) the risk that a contributor will copy incompatibly licensed materials they are not allowed to…
Why do we need to make an exception for generative AI tools, which from my perspective are just one tool among many?
I think the developer's responsibility does not change at all; we are responsible, as individuals, for everything we sign.
Options?
Restricted (conservative): Don't accept any contributions generated by AI. (Current plan?)
By use case: Only allow some specific use cases, such as debugging or analysis.
Model-based: Only accept contributions generated by an AI that is trained locally on consistent materials.
Developer responsibility: In other words, trust the developer, with guidelines?
AI-assisted code generation is clearly the future of software development, and it's ironic to see OSRA, an organization at the forefront of AI and robotics, being one of the first to reject AI. I'm not saying there are zero concerns with AI-generated code, but I strongly disagree with the overcautious approach being taken here.
A better approach would be to wait and see what other much larger open source projects do. For example, the Linux Foundation AI Policy previously mentioned by @tomoyafujita (which also covers the Cloud Native Computing Foundation, Kubernetes etc):
We have received numerous questions from our project communities about contributing AI-generated code to Linux Foundation projects.
Open source software has thrived for decades based on the merits of each technical contribution that is openly contributed to and reviewed by community peers. Development and review of code generated by AI tools should be treated no differently.
Code or other content generated in whole or in part using AI tools can be contributed to Linux Foundation projects. However, there are some unique considerations related to AI generated content that developers should factor into their contributions.
Contributors should ensure that the terms and conditions of the generative AI tool do not place any contractual restrictions on how the tool's output can be used that are inconsistent with the project's open source software license, the project's intellectual property policies, or the Open Source Definition.
If any pre-existing copyrighted materials (including pre-existing open source code) authored or owned by third parties are included in the AI tool's output, then prior to contributing such output to the project, the contributor should confirm that they have permission from the third party owners (such as in the form of an open source license or public domain declaration that complies with the project's licensing policies) to use and modify such pre-existing materials and contribute them to the project. Additionally, the contributor should provide notice and attribution of such third party rights, along with information about the applicable license terms, with their contribution.
Some tools provide features that can assist contributors. For example, some tools provide a feature that suppresses responses that are similar to third party materials in the AI tool's output, or a feature that flags similarity between copyrighted training data or other materials owned by third parties and the AI tool's output and provides information about the licensing terms that apply to such third party materials.
Individual Linux Foundation projects may develop their own project-specific guidance and recommendations regarding AI-generated content. Similarly, organizations that employ open source developers may have more stringent guidelines related to use of AI for software development. Contributors should comply with their employer's policies when contributing.
This does a lot of heavy lifting here. Do you (or any of us not currently employed by the largest of companies) have access to the appropriate tooling to determine whether there are copyright issues with the outputs of the models you're attempting to contribute back to open source? As far as my understanding goes, there are only a few tools that do this, they cost an organization millions a year to acquire, and they produce an immense number of false positives that need to be sifted through.
I'd argue no, and the LF, in setting up a "pass-the-buck" policy, knows this as well.
It's worth noting that OSRF's policy is an interim one while it determines what a medium-term policy should look like. Several major open source organizations have outright banned the use of generative AI (among others with similar bans or softer but firm discouragements), and they/we have not gone that far as of yet.
If there were to be serious copyright issues due to the outputs of commonly used GenAI models, and if no policy or knowledge about which code might be tainted is in place, that could be the end of ROS and hundreds of other open-source communities: it would be virtually impossible to continue without stripping back years of work and redoing it, since you wouldn't know which external contributions were legally dubious.
This is an important topic for our age and needs to be considered by all organizations, large and small, open-source and proprietary commercial. The legal ramifications of a laissez-faire attitude are enormous and many large players are setting up their own policies internally due to this as well.
I wouldn't recommend folks jump to too-harsh conclusions here until a full policy is fleshed out and proposed. I don't think anyone's on Team Outright Ban, but some thought and guardrails need to be put in place.
For example, in Nav2 for the last ~year or so, we've required users to indicate in our required PR template whether they used generative AI, so we know when it was used; a very minimal, very non-intrusive first step toward some traceability. This has caused zero friction and gives us some important information (which we will likely need to build on in the future).
Edit: Note that Nav2 and my thoughts here aren't the official policy or project of OSRF, just my thoughts and current directives on the matter as an active project maintainer who also has to think about this.
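For anyone wanting to try the same minimal-disclosure approach in their own project, here is a sketch of what such a PR-template item could look like. The wording below is hypothetical and is not quoted from Nav2's actual template; only the general idea (a required AI-usage disclosure in the PR template) comes from the post above.

```markdown
<!-- .github/PULL_REQUEST_TEMPLATE.md (hypothetical excerpt) -->

## Generative AI disclosure

- [ ] I did **not** use generative AI tools for this contribution.
- [ ] I used generative AI tools. If checked, briefly describe which
      tools and how they were used (e.g. "Copilot line completions
      in the planner plugin"):
```

Because the template is filled in on every PR, maintainers get a lightweight, searchable record of where AI assistance was involved without adding any review friction.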
I will have a fuller reply for all of you later today, but for now I want to emphasise as strongly as I can the word "INTERIM" in the title of the post.
This word is one of the more difficult ones for non-native speakers. For those (including me) who are unsure, you can think of it as temporary or non-final.
NB: I've used the help of some AI to write this post. It is called a thesaurus (or at least it was in the pre-AI era). It corrected typos I pointed to. I don't know how this AI works or what data it was trained on (if it was trained at all), and it doesn't seem to have any licensing policy, as it is part of the preinstalled keyboard app on my phone, which definitely did not show any EULA when I activated the phone. Now, what if I used this phone keyboard to author a PR in GitHub mobile?
I was indeed also going to suggest changing interim to temporary, because it's not necessarily clear what the former means and it's easy to just ignore it and read "something-something policy / ban on the use of generative AI".
While I completely agree this is an important topic, and I also agree those problems are currently turning people away from ROS: a) the work done on rmw_zenoh is going to improve this situation, and b) unfortunately software is considered (almost) equal to a book, a movie, a play or a television series, at least as far as (copyright) law is concerned.
This means you have to make absolutely sure you're not (inadvertently) copying someone's work without permission, or making something heavily based on someone else's work without stating that, because otherwise copyright law provides some quite powerful tools to whoever believes you infringed on their copyright (i.e., violated it).
Especially for open-source software, which already has a hard time when it comes to things like liability and licensing, this is a real challenge, especially in (semi-)corporate environments (i.e., companies), which is exactly the kind of environment in which ROS would really like to see as much adoption as possible.
As @smac mentions, it's really difficult to give any guarantees about that when using generative AI right now, especially as a hobbyist, small company, or anyone who doesn't have a legal team at their disposal.
To be honest, I believe we're already too late, and lots of lines/functions that were auto-completed by tools like Copilot, or perhaps heavily based on what such tools suggested, have already been merged. Not just in ROS, but also in other projects.
But it does make sense to at least pause, acknowledge this is an issue, and set up some rules. At the very least no one can accuse you/us of not having done anything at all.
Edit: the Linux Foundation AI Policy is rather interesting. IANAL, but it seems to me it basically says something like:
If you use AI and contribute something based on whatever it says/does/generates for you, you are responsible for figuring out whether you are legally allowed to contribute that to any of our projects.
Oh, and you are also legally responsible/liable if this (i.e., whether you were allowed to do this) doesn't turn out to be the case in the future, once we figure this whole generative AI situation out.
This does indeed seem to push the problem onto contributors, as @smac already noticed.
That's like the Submission of Contributions section in the Apache 2.0 license and/or a CLA, but with the words "generative AI" added.
Would any of us here know exactly whether they comply with such a policy when using generative AI tools like Copilot? @amacneil @Shannon_Barber?