Rules for using LLMs in academic research
Here are 4 rules for using ChatGPT and similar tools in academic research.
➤ Why is this important?
Using ChatGPT and other large language models (LLMs) without proper guidelines is dangerous. The risks in the context of academic research include:
👉Plagiarism
👉Spreading misinformation
👉Inability to reproduce scientific results
👉Inability to scrutinize papers properly
➤ The background for this entry
👉On January 24, 2023, Nature set two rules for using large language models (LLMs), such as ChatGPT, in research. Their rules are helpful but incomplete.
👉The standards that Nature sets are likely to influence global expectations on what it means to conduct research responsibly in the age of ChatGPT and other large language models.
👉Therefore, I wrote a proposal for more comprehensive rules. The final list is below and in the attached PDF. It incorporates public feedback from conversations here and here.
👉I also sent my proposal to Nature, as a correspondence, but they didn't want to publish it.
➤ Here is my proposal:
1. Authorship
LLMs cannot be credited as authors of papers, because authors must be accountable for their papers and an LLM cannot be held accountable.
(One of Nature’s existing rules.)
2. Transparency
Papers must document the use of LLMs. The documentation should include the following, which are important for critical scrutiny and reproducibility. (Expanding Nature’s transparency rule.)
2.1 State which LLM was used
2.2 Disclose conflicts of interest
2.3 Explain how the LLM was used in detail (including prompts)
2.4 Explain which portions of the paper the LLM affects and how
2.5 Discuss relevant limitations of the LLM
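One way to make items 2.1 to 2.5 concrete is to keep the disclosure as a small structured record that can be attached to a paper as supplementary material. The sketch below is only an illustration: the class name `LLMUsageDisclosure`, its field names, and the example values are all hypothetical and are not part of Nature's rules or of any existing standard.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class LLMUsageDisclosure:
    """Hypothetical record covering items 2.1 to 2.5 of the proposal."""
    model_name: str                 # 2.1 which LLM was used
    conflicts_of_interest: str      # 2.2 conflicts of interest, or "None declared"
    usage_description: str          # 2.3 how the LLM was used
    prompts: list[str] = field(default_factory=list)                 # 2.3 prompts
    affected_sections: dict[str, str] = field(default_factory=dict)  # 2.4 section -> how it was affected
    limitations: list[str] = field(default_factory=list)             # 2.5 relevant limitations

    def missing_items(self) -> list[str]:
        """Return the names of fields that were left empty."""
        return [name for name, value in asdict(self).items() if not value]

    def to_json(self) -> str:
        """Serialize the record so it can accompany the paper."""
        return json.dumps(asdict(self), indent=2)


# Example values are made up for illustration only.
disclosure = LLMUsageDisclosure(
    model_name="ExampleLLM v1",
    conflicts_of_interest="None declared",
    usage_description="Used to suggest alternative phrasings for the abstract.",
    prompts=["Rewrite this abstract more concisely: ..."],
    affected_sections={"Abstract": "Wording edited after LLM suggestions"},
)
print(disclosure.missing_items())
print(disclosure.to_json())
```

A record like this lets authors or reviewers see at a glance which parts of the disclosure are still empty, here by calling `missing_items()`.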
3. Fact-checking
Papers should state whether and how they check for LLM misinformation (e.g., whether and how the authors used manual or automatic techniques).
4. Anti-plagiarism
Papers should credit the rightful owners of ideas presented by LLMs.
If the authors can’t find the owner or if the idea seems to have been generated by an LLM, the authors should do the following:
4.1 State that they couldn’t find an owner
4.2 Explain what they did to locate an owner (e.g., whether and how they used manual or automatic techniques)
4.3 Explain what they did to generate the idea through LLMs (e.g., which prompts they used)
➤ Thank you to everyone who contributed. Additional comments are welcome!
➤ Join the conversation here.