Monday 8 July 2024

Synthetic, fake or just sh*t?


Fur, leather, meat - when you don’t want to wear or eat the real thing, there are alternatives. These are often described as “synthetic”, meaning that they’ve been artificially produced in order to closely imitate the natural product.

As well as the “artificial/not natural” connotations, “synthetic” also has associations of insincerity or affectedness. Fake. But fake can be a positive too, especially if presented as a clever alternative to using up natural resources, be they animal, vegetable or mineral.

I’m a bit slow off the mark, I admit. I only consciously heard the phrase “synthetic data” a couple of months ago in a seminar about “The Future of Measurement.” It was used in this context for data generated by AI to patch holes. This made me a little uneasy, but I brushed it off - after all, we’ve been patching holes in data via statistical modelling and analysis for as long as I’ve been in this business, and no doubt before that.

But the mention sparked a memory from another seminar, or something I read in the marketing press: that market research organisations such as Kantar are busy with R&D on “synthetic samples” which can generate “human-style responses.”

Does the euphemistic “synthetic sample” really mean fake people?

A year ago, Kantar were still moderately cautious about “synthetic samples”. While AI has some great applications in market research (coding open-ended responses is an obvious example), the article points out some of the shortcomings of using AI as a substitute for human respondents. For example, look at the differences here:


I’m not surprised that the AI is more enthusiastic than real people about statements that sound AI-generated. Who in their right mind would agree to gobbledegook such as “my product is a way for me to bond/connect with others who share my passion”? Particularly if it’s bog cleaner or something.

The article concludes that: 

Our conclusion is that right now, synthetic sample currently has biases, lacks variation and nuance in both qual and quant analysis. On its own, as it stands, it’s just not good enough to use as a supplement for human sample.

And Kantar advise a blended approach based on real people and supplemented with AI.

Fast-forward a year and Kantar are far more gung-ho about it all. They’ve launched an AI Lab and appointed a Chief AI Scientist. And there’s a new GenAI marketing assistant, too.

Competitive pressure, the race to be first, client demands for faster, faster - or genuine innovation and leadership? Who knows - but Kantar are not alone. Mark Ritson has written enthusiastically about his chums at an outfit called Evidenza.AI.

Evidenza say: “we survey AI copies of your customers to build finance-friendly sales and marketing plans ... we generate hundreds of synthetic customers based on your product category ... we test your messaging with synthetic customers.”

Meaning, I guess, use AI to generate marketing communication and throw it to the customer copies to get a tick on that box. No messy humans involved.

Self-fulfilling prophecy? Can we look forward to synthetic sales, too?

Cory Doctorow takes it a few leaps further in his brilliantly-titled critical piece, The Coprophagic AI crisis. From warnings about botshit (“inaccurate or fabricated content shat out at scale”) and human-created content sinking in the cesspit (“As the web becomes an anaerobic lagoon for botshit, the quantum of human-generated ‘content’ in any internet core sample is dwindling to homeopathic levels.”) he goes on to consider the consequences should AI Search really take off:

The question is, why the fuck would anyone write the web if the only “person” who can find what they write is an AI’s crawler, which ingests the writing for its own training, but has no interest in steering readers to see what you’ve written? If AI search ever becomes a thing, the open web will become an AI CAFO and search crawlers will increasingly end up imbibing the contents of its manure lagoon.

Food for thought, and I feel distinctly queasy.

It’s another example of a contained system or black box that’s easy to control. Like home-grown problem: solution advertising.

And the answer is to get out of the system, go back to first principles, get inspiration from the internot. And remember that we are responsible for the data we produce and how it’s used. This stuff does matter.

