Plagiarism Detection Against Generative AI for Code Snippet Questions

How do we detect code generated via generative AI? 
 
At Mercer | Mettl, we have created a sophisticated algorithm that harnesses the power of artificial intelligence (AI) to identify instances of AI-generated code. 
 
This feature has been specifically designed to assess the likelihood that the code submitted by a candidate was generated by AI rather than written by a human. While various techniques exist to identify AI-generated text in natural language, detecting AI-generated code presents a more significant challenge. Unlike natural language, where some tools can gauge perplexity and burstiness to flag AI-generated text, differentiating between AI-generated and human-written code is considerably more difficult: AI models have reached a level of sophistication that allows them to closely replicate the structure, syntax, and patterns of human coding.

"AI Sense" is an experimental feature that displays a probability score in a candidate's report, providing an indication of the likelihood that the candidate's code was generated using AI or AI tools such as ChatGPT, Bard, or GitHub Copilot.

 

At Mercer | Mettl, we believe it is incorrect to rely solely on the calculated plagiarism percentage. These scores are useful for assessing the relative likelihood that the code was generated using AI; however, they are not proof of plagiarism, only an indication of similarity. A manual review of the code is still required for verification. Additionally, plagiarism scores are not intended to be used as the sole basis for selecting or rejecting a candidate. Other parameters, such as test cases, proctoring data, and browsing tolerance, should also be taken into account.
 

How are “AI Sense” scores categorized? 
 
Currently, a candidate's “AI Sense” score is labelled as one of the following:

  1. If the AI Sense score is >= 80%, there is a very high probability that this code was generated by AI.
  2. If the AI Sense score is < 80% and >= 60%, there is a high probability that this code was generated by AI.
  3. If the AI Sense score is < 60% and >= 30%, there is a moderate probability that this code was generated by AI.
  4. If the AI Sense score is < 30% and > 0%, there is a low probability that this code was generated by AI.

The AI Sense score will be marked as N/A in the following situations:

  1. The candidate has scored zero or fewer marks.
  2. AI Sense does not support the language the candidate has coded in.
  3. The AI Sense score is 0%.
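The banding rules above can be sketched as a simple threshold function. This is an illustrative example only, not Mercer | Mettl's actual implementation; the function and label names are hypothetical.

```python
def ai_sense_label(score: float) -> str:
    """Map an AI Sense score (0-100) to its documented probability band.

    Hypothetical sketch of the published thresholds; a score of 0%
    (or below) is reported as N/A, per the rules above.
    """
    if score >= 80:
        return "very high"
    if score >= 60:
        return "high"
    if score >= 30:
        return "moderate"
    if score > 0:
        return "low"
    return "N/A"
```

For example, a candidate scoring 72% would fall in the "high probability" band, while a candidate scoring exactly 0% would be reported as N/A.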

 
Support for programming languages 
 
As an experimental feature, we currently support the calculation of the “AI Sense” score for the following programming languages: C, C18, CPP, CPP17, CSHARP, CSHARP12, JAVA7, JAVA8, JAVA11, JAVA17, JAVA21, JAVASCRIPT8, JAVASCRIPT19, PHP, PHP8, PYTHON2, PYTHON3, TYPESCRIPT.

For all other languages, the “AI Sense” score is marked as N/A.
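The language rule can be expressed as a simple set-membership check. This is an illustrative sketch, not the product's actual code; only the language identifiers listed above are taken from this document.

```python
# Language identifiers as listed in the documentation above.
SUPPORTED_LANGUAGES = {
    "C", "C18", "CPP", "CPP17", "CSHARP", "CSHARP12",
    "JAVA7", "JAVA8", "JAVA11", "JAVA17", "JAVA21",
    "JAVASCRIPT8", "JAVASCRIPT19", "PHP", "PHP8",
    "PYTHON2", "PYTHON3", "TYPESCRIPT",
}

def supports_ai_sense(language: str) -> bool:
    """Return True if an AI Sense score is calculated for this language.

    For any language outside the supported set, the report shows N/A.
    """
    return language.upper() in SUPPORTED_LANGUAGES
```

For instance, a submission in PYTHON3 would receive an AI Sense score, while a submission in an unlisted language such as Ruby would be marked N/A.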