Unveiling the Digital Curtain: Reverse Engineering GenAI to Expose the Social Media Clone Army!

augmentedrobot
3 min read · Sep 24, 2023

Be it an article I read in the local newspaper, a post on LinkedIn or Facebook or even a friendly email from the local sports club, they all sound the same.

They sound the same because the sender has generated the content with an AI system, probably using a generic prompt that leads to a generic output.

The system (trained to be very clean, kind and polite) makes content that is repetitive in nature, clinical and sometimes very shallow and cold.

The system sounds like a well-polished politician running a highly calculated political campaign. Nothing is out of order, and everything is scripted to death.

I can understand the appeal of sounding more professional in some settings, but having one’s grammar corrected and having one’s entire personality erased are two very different things.

It doesn’t take much experimentation to find an epidemic of phrases that have been mass-generated and reused in posts all over social media.

If you ask ChatGPT, for example, to write a post for LinkedIn, it will give you this:

If you then take those #hashtags, search them on social media and look at the posts that are connected to them, the EXACT phrasing (as mentioned above) appears in an abundance of posts.

Sure, there are only so many ways you can express yourself when announcing a new job, but the similarities are too striking to just be coincidental.

This type of “finding a post generated by autoregressive genAI” reverse engineering can be done for almost anything.
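The phrase hunt described above can be roughly automated. Below is a minimal sketch, assuming you have already collected the text of some posts yourself (for example by searching the hashtags by hand). The function names `ngrams` and `shared_phrases` and the sample posts are made up for illustration; nothing here talks to a social media API.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Return the set of n-word phrases in a text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9'’]+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_phrases(posts, n=4, min_posts=2):
    """Count in how many posts each n-word phrase appears; keep the repeated ones."""
    counts = Counter()
    for post in posts:
        # ngrams() returns a set, so each post counts a phrase at most once.
        counts.update(ngrams(post, n))
    return {phrase: c for phrase, c in counts.items() if c >= min_posts}


# Invented sample posts, not real data.
sample_posts = [
    "I'm thrilled to announce that I have joined Acme!",
    "Big news: I'm thrilled to announce that I am starting at Globex.",
    "Started a new job today, wish me luck.",
]

for phrase, count in sorted(shared_phrases(sample_posts).items()):
    print(count, phrase)
```

A phrase that shows up in many unrelated posts is only a hint, not proof: people also copy each other, and some announcements are formulaic by nature.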

As a bot engineer, I’ve created enough autoregressive text-based clone-armies to know how they work, and how to spot repetitive patterns in the digital world.

As much as we would like to think that what these systems generate is unique, it is in fact very standardised when it comes to generic things such as phrases and expressions (e.g. in emails or posts). Given a similar generic prompt, the system gives everyone close to the most probable output, because that is what autoregressive language models are for!
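To make the "most probable output" idea concrete, here is a toy sketch. The probability table `TOY_MODEL` is entirely invented and is nothing like a real language model; it only shows what happens when every step picks the likeliest next word (greedy decoding). Real chat systems usually sample with some randomness, so their outputs vary more than this, but they still cluster around the high-probability phrasing.

```python
# Hypothetical next-word probabilities, invented purely for illustration.
TOY_MODEL = {
    "<start>": {"I'm": 0.6, "Today": 0.3, "So": 0.1},
    "I'm": {"thrilled": 0.7, "happy": 0.2, "nervous": 0.1},
    "thrilled": {"to": 0.9, "about": 0.1},
    "to": {"announce": 0.8, "share": 0.2},
    "announce": {"<end>": 1.0},
}


def greedy_decode(model, max_tokens=10):
    """Build a sentence by always taking the most probable next word."""
    token = "<start>"
    words = []
    for _ in range(max_tokens):
        choices = model.get(token)
        if not choices:  # no known continuation for this word
            break
        token = max(choices, key=choices.get)
        if token == "<end>":
            break
        words.append(token)
    return " ".join(words)


# Same "prompt", same answer, no matter who asks.
print(greedy_decode(TOY_MODEL))
print(greedy_decode(TOY_MODEL))
```

Everyone who starts from the same generic prompt walks down the same well-trodden path, which is why the less likely words ("nervous", "So") rarely make it into a post.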

Here are some examples. I erased the info identifying the individuals, but the standard formula of phrasing was VERY SIMILAR and the hashtags are very likely provided by the system. Of course, I can never be 100% sure.

Some more hallmarks of generated posts:

Emojis like this at the start of the post 🚀

Lists of very contrived questions in the post, for example:
🤖 What are your thoughts on x?
📊 What potential do you see for x?
🧩 What challenges and ethical considerations should we address in x?

And a generic invitation to discuss, e.g.:
I’m excited to hear your insights! Share your thoughts in the comments below and let’s shape the future of healthcare together. 👇
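The hallmarks listed above can be turned into a rough checklist in code. This is a heuristic sketch only: the function `hallmarks` and the phrase list `INVITE_PHRASES` are my own illustrative picks, and treating the Unicode category "So" (symbol, other) as "emoji" is an approximation that covers the emojis shown here but not every emoji. Plenty of humans write like this too, so a hit is a hint and never a verdict.

```python
import unicodedata

# Illustrative phrases taken from the example invite above; extend as needed.
INVITE_PHRASES = (
    "share your thoughts",
    "in the comments below",
    "excited to hear your insights",
)


def is_emoji(ch):
    """Approximate emoji check: most pictographic emojis are category 'So'."""
    return unicodedata.category(ch) == "So"


def hallmarks(post):
    """Return which of the three hallmarks a post shows."""
    lines = [ln.strip() for ln in post.splitlines() if ln.strip()]
    lowered = post.lower()
    # Lines that open with an emoji and end with a question mark.
    emoji_questions = sum(
        1 for ln in lines if is_emoji(ln[0]) and ln.endswith("?")
    )
    return {
        "emoji_at_start": bool(lines) and is_emoji(lines[0][0]),
        "emoji_question_list": emoji_questions >= 2,
        "generic_invite": any(p in lowered for p in INVITE_PHRASES),
    }


# Invented sample post, not real data.
sample = (
    "🚀 Big news!\n"
    "🤖 What are your thoughts on x?\n"
    "📊 What potential do you see for x?\n"
    "Share your thoughts in the comments below 👇"
)
print(hallmarks(sample))
```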

This was bound to happen, and it will become a larger part of what we see in digital and physical media, in the movies we watch and the art we see.

I’ve even seen it happen to obituaries, but that is for another post.

In order to save time and money, we settled for the clone army. The question is: how long until the rebellion?
