Plagiarism Detection Against Generative AI for Code Snippet Questions

How do we detect code generated via generative AI? 
 
At Mercer | Mettl, we have created a sophisticated algorithm that harnesses the power of artificial intelligence (AI) to identify instances of AI-generated code. 
 
This feature has been specifically designed to assess the likelihood that the code submitted by a candidate was generated by AI rather than written by a human. While various techniques exist to identify AI-generated text in natural language, detecting AI-generated code presents a more significant challenge. Unlike natural language, where some tools can gauge perplexity and burstiness to flag AI-generated text, differentiating between AI-generated and human-written code is considerably more difficult: AI models have reached a level of sophistication that allows them to closely replicate the structure, syntax, and patterns of human coding.

"AI Sense" is an experimental feature that displays a probability score in a candidate's report, providing an indication of the likelihood that the candidate's code was generated using AI or AI tools such as ChatGPT, Bard, or GitHub Copilot.

 

At Mercer | Mettl, we believe it is incorrect to rely solely on the calculated plagiarism percentage. These scores are useful for assessing the relative likelihood that the code was generated using AI; however, they are not proof of plagiarism, only an indication of similarity. A manual review of the code is still required for verification. Additionally, plagiarism scores are not intended to be used as the sole basis for selecting or rejecting a candidate. Other parameters, such as test cases, proctoring data, and browsing tolerance, should also be taken into account.
 

How are “AI Sense” scores categorized? 
 
Currently, a candidate's “AI Sense” score is labelled as one of the following:

  1. If the AI Sense score is >= 80%, there is a very high probability that this code was generated by AI.
  2. If the AI Sense score is < 80% and >= 60%, there is a high probability that this code was generated by AI.
  3. If the AI Sense score is < 60% and >= 30%, there is a moderate probability that this code was generated by AI.
  4. If the AI Sense score is < 30% and > 0%, there is a low probability that this code was generated by AI.

The AI Sense score will be marked as N/A in the following situations:

  1. The candidate has scored zero or fewer marks.
  2. AI Sense does not support the language the candidate has coded in.
  3. The AI Sense score is 0%.
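The banding rules above can be sketched as a simple threshold function. This is an illustrative example only, not Mercer | Mettl's actual implementation; the function and label names are hypothetical.

```python
def ai_sense_label(score: float) -> str:
    """Map an AI Sense score (0-100) to its documented probability band.

    Hypothetical sketch of the published thresholds; a score of 0%
    (or below) is reported as N/A, per the rules above.
    """
    if score >= 80:
        return "very high"
    if score >= 60:
        return "high"
    if score >= 30:
        return "moderate"
    if score > 0:
        return "low"
    return "N/A"
```

For example, a candidate scoring 72% would fall in the "high probability" band, while a candidate scoring exactly 0% would be reported as N/A.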

 
Support for programming languages 
 
As an experimental feature, we currently support the calculation of the “AI Sense” score for the following programming languages: C, C18, CPP, CPP17, CSHARP, CSHARP12, JAVA7, JAVA8, JAVA11, JAVA17, JAVA21, JAVASCRIPT8, JAVASCRIPT19, PHP, PHP8, PYTHON2, PYTHON3, TYPESCRIPT.

For all other languages, the “AI Sense” score is marked as N/A.
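The language rule can be expressed as a simple set-membership check. This is an illustrative sketch, not the product's actual code; only the language identifiers listed above are taken from this document.

```python
# Language identifiers as listed in the documentation above.
SUPPORTED_LANGUAGES = {
    "C", "C18", "CPP", "CPP17", "CSHARP", "CSHARP12",
    "JAVA7", "JAVA8", "JAVA11", "JAVA17", "JAVA21",
    "JAVASCRIPT8", "JAVASCRIPT19", "PHP", "PHP8",
    "PYTHON2", "PYTHON3", "TYPESCRIPT",
}

def supports_ai_sense(language: str) -> bool:
    """Return True if an AI Sense score is calculated for this language.

    For any language outside the supported set, the report shows N/A.
    """
    return language.upper() in SUPPORTED_LANGUAGES
```

For instance, a submission in PYTHON3 would receive an AI Sense score, while a submission in an unlisted language such as Ruby would be marked N/A.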