Abstract: The wide availability of free data has been critical to the success of large language models (LLMs). As LLMs have come into widespread use, growing concerns have been raised about the ...