Malicious Python package uses Unicode support to avoid detection

13.04.2023
Cybersecurity specialists at Phylum have published a report describing how a malicious Python package on PyPI used Unicode as an obfuscation technique to evade detection while stealing and exfiltrating developer account credentials and other sensitive data from compromised devices. The malicious package, named "onyxproxy", was uploaded to PyPI on 15 March 2023 and was downloaded 183 times before it was removed. Its code is written with Unicode variants of ordinary characters, including bold and italic letterforms, which remain readable to a person and are still accepted by the Python interpreter.

Unicode was created to ensure interoperability and consistent representation of text across languages and platforms. It is a comprehensive character encoding standard that unifies many older character sets and encoding schemes and covers well over 100,000 characters across a wide range of scripts. The onyxproxy package contains a "setup.py" script with thousands of suspicious code strings written in a mix of Unicode characters. The obfuscation relies on homoglyphs, that is, distinct Unicode characters that look like the same letter, such as "look" and "𝗹𝗼𝗼𝗸", to hide the code's true intent among otherwise innocent-looking functions and variables. Python accepts such names because it normalizes identifiers to NFKC form when it parses source code, so the styled spelling and the plain ASCII spelling refer to the same name.
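The way the interpreter treats look-alike identifiers can be illustrated with a minimal sketch. It is not taken from the onyxproxy code: the helper `to_math_bold` and the variable name `value` are hypothetical and exist only for this demonstration, and the Mathematical Bold alphabet is just one of several styled alphabets in Unicode.

```python
import unicodedata

# Code point of MATHEMATICAL BOLD SMALL A; the bold lowercase letters
# a-z occupy 26 consecutive code points starting here.
MATH_BOLD_SMALL_A = 0x1D41A


def to_math_bold(word: str) -> str:
    """Hypothetical helper: respell lowercase ASCII letters in Mathematical Bold."""
    return "".join(
        chr(MATH_BOLD_SMALL_A + ord(ch) - ord("a")) if "a" <= ch <= "z" else ch
        for ch in word
    )


fancy = to_math_bold("value")

# As raw text, the styled spelling is a different string from the ASCII one.
same_text = fancy == "value"

# NFKC normalization folds the styled letters back to plain ASCII.
normalized = unicodedata.normalize("NFKC", fancy)

# Python applies NFKC to identifiers while parsing, so assigning to the
# styled name binds the plain ASCII name in the namespace.
namespace = {}
exec(f"{fancy} = 41 + 1", namespace)
bound_names = [name for name in namespace if not name.startswith("__")]

print(same_text, normalized, bound_names, namespace["value"])
```

The comparison on raw text fails while the interpreter binds the ordinary name, which is the gap this kind of obfuscation exploits.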

This development demonstrates the continued efforts of threat actors to find techniques that slip past defenses based on string matching, in this case by using look-alike Unicode characters to hide malware. The Phylum specialist wrote: "But whoever this author copied this obfuscated code from is clever enough to know how to use the internals of the Python interpreter to generate a new kind of obfuscated code, one that is somewhat readable without revealing too much of what exactly the code is trying to steal."
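Why string matching falls short here, and one possible countermeasure, can be sketched as follows. The watchlist `SUSPICIOUS_NAMES`, the helper functions, and the sample line are all assumptions made for illustration; they do not describe Phylum's tooling or the actual payload.

```python
import unicodedata

# Hypothetical watchlist that a simple string-matching scanner might use.
SUSPICIOUS_NAMES = ("exec", "eval", "__import__")

MATH_BOLD_SMALL_A = 0x1D41A  # MATHEMATICAL BOLD SMALL A


def to_math_bold(word: str) -> str:
    """Hypothetical helper: respell lowercase ASCII letters in Mathematical Bold."""
    return "".join(
        chr(MATH_BOLD_SMALL_A + ord(ch) - ord("a")) if "a" <= ch <= "z" else ch
        for ch in word
    )


def naive_hits(source: str) -> list[str]:
    """Plain substring search over the raw source text."""
    return [name for name in SUSPICIOUS_NAMES if name in source]


def normalized_hits(source: str) -> list[str]:
    """Apply NFKC first, so styled letters fold back to ASCII before matching."""
    return naive_hits(unicodedata.normalize("NFKC", source))


# A sample line in which the function name is spelled with bold letters.
sample = f"{to_math_bold('exec')}(payload)"

print(naive_hits(sample))       # the raw text contains no ASCII "exec"
print(normalized_hits(sample))  # normalization exposes the call
```

Normalizing the whole file also rewrites string literals and comments, so a production scanner would more likely tokenize the source and normalize identifiers only. Flagging any non-ASCII identifier in a `setup.py` is another simple heuristic.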

Earlier, in November 2021, two Cambridge University researchers, Nicholas Boucher and Ross Anderson, presented a theoretical attack called Trojan Source, which uses Unicode control characters to inject vulnerabilities into source code. As with the Phylum findings, the resulting code makes it difficult for a human reviewer to detect the underlying intent. The report has been confirmed as authentic, which adds one more front to the challenges facing cybersecurity teams. Defenders must implement more robust detection mechanisms against these emerging threats.
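One example of such a detection mechanism is a check for the bidirectional control characters that Trojan Source relies on. The sketch below is a minimal illustration under stated assumptions: the function name `find_bidi_controls` and the sample snippet are hypothetical, and the character set covers only the embedding, override, and isolate controls (U+202A to U+202E and U+2066 to U+2069).

```python
import unicodedata

# Bidirectional embedding/override controls (U+202A..U+202E) and
# isolate controls (U+2066..U+2069).
BIDI_CONTROLS = {
    chr(cp) for cp in (*range(0x202A, 0x202F), *range(0x2066, 0x206A))
}


def find_bidi_controls(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, character name) for each bidi control found."""
    findings = []
    for line_no, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                findings.append((line_no, col, unicodedata.name(ch)))
    return findings


# Hypothetical sample: control characters hidden inside a string literal
# can make an editor display the line differently from how it is parsed.
sample = (
    "access = 'user'\n"
    "if access != 'admin\u202e \u2066# check':\n"
    "    pass\n"
)

for line_no, col, name in find_bidi_controls(sample):
    print(f"line {line_no}, column {col}: {name}")
```

A check like this could run in a pre-commit hook or a CI step, rejecting files that contain such characters unless they are explicitly allowed.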