In May 2022, as part of my ongoing quest to 'work smarter not harder', I delved into the world of AI copywriting, specifically using Writesonic and Jasper to write a few LinkedIn posts.

I had had access to the OpenAI playground for a while, but had only tinkered with a few experiments. My goal was to ramp up content production and social media presence for the startup we were working on at the time, and I needed some support to streamline my content creation process. I also wanted to discover how well the tools could help produce more effective and personalized content for marketing campaigns.

Initially, the writing tools produced OK content, and I found that supplying detailed prompts dramatically improved the output.

However, the AI-generated content still felt dry and lacked a distinct tone. I felt it was easy to spot that the posts were AI-written, although some guidance on style did help. That led me to explore the possibility of fine-tuning GPT-3 to improve the writing style.

The idea was simple: train the LLM on specific content, infusing it with a unique tone of voice. I believed that fine-tuning the model on corporate blogs would help AI-generated articles match the stylistic preferences of those blogs more closely.

It took some time, but I eventually came back around to the idea of fine-tuning in Feb 2023 (I've been busy!).

I conducted two fine-tuning experiments - one with ~300 articles from a charity website, and another with nearly 1,000 articles from a marketing thought leader's blog.

I used blog titles as prompts and their corresponding content as completions for the training data.
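For anyone curious what that looks like in practice, here is a minimal sketch of turning title/content pairs into the JSONL prompt/completion format used by the GPT-3 fine-tuning endpoint at the time. The separator, the stop marker, the helper names and the sample articles are all assumptions for illustration, not the exact script I ran.

```python
import json

# Assumed conventions: a fixed separator closing every prompt, and a stop
# marker closing every completion. The exact strings are hypothetical.
PROMPT_SEPARATOR = "\n\n###\n\n"
STOP_SEQUENCE = " END"


def to_training_record(title, body):
    """Map one article to a prompt/completion pair (title -> content)."""
    return {
        "prompt": title.strip() + PROMPT_SEPARATOR,
        # Completions start with a space so tokenization matches normal text.
        "completion": " " + body.strip() + STOP_SEQUENCE,
    }


def to_jsonl(articles):
    """Serialize articles as JSON Lines: one training record per line."""
    return "\n".join(
        json.dumps(to_training_record(a["title"], a["body"])) for a in articles
    )


# Made-up sample data standing in for the scraped blog posts.
sample_articles = [
    {"title": "Why tone of voice matters", "body": "Your brand is how you sound."},
    {"title": "Five ways to plan content", "body": "Start with the audience."},
]

jsonl_text = to_jsonl(sample_articles)
print(jsonl_text)
```

Each line of the resulting file is one training example, with the blog title as the prompt and the article body as the completion.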

And although the fine-tuning process was successful (I did create two new models to use for content creation purposes), the outcomes fell well short of my expectations.

The detailed prompts that had worked well earlier didn't yield the desired topics in the output, possibly because the model had overfitted to the prompt element of the fine-tuning data: it had only ever seen short blog titles as prompts.

Moreover, I hadn't cleaned the data, leading to irrelevant content, such as event promotions and software references, in the outputs.

While the larger dataset did capture the writing tone, the output was a random mix of content that wasn't usable in its current form.

Additionally, the cost of fine-tuning GPT-3 was not insignificant: training on the marketing thought leader's blog cost $90, and each query on that model cost $0.12 per 1,000 tokens, roughly 60x more expensive than using GPT-3.5 (gpt-3.5-turbo at $0.002 per 1,000 tokens). It's definitely possible to get better results by improving the fine-tuning, but the costs will mount up quickly.
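To give a feel for how those costs add up, here is a rough back-of-the-envelope sketch. The $90 training cost and $0.12 per 1,000 tokens come from my experiment; the baseline price for gpt-3.5-turbo, the article count and the article length are assumptions picked purely for illustration.

```python
TRAINING_COST = 90.00            # one-off cost of the fine-tuning run (from my experiment)
FINE_TUNED_PRICE_PER_1K = 0.12   # price per 1,000 tokens on the fine-tuned model
BASELINE_PRICE_PER_1K = 0.002    # assumption: gpt-3.5-turbo list price at the time


def total_cost(n_articles, tokens_per_article, price_per_1k, training_cost=0.0):
    """Total spend: one-off training cost plus per-token usage cost."""
    usage = n_articles * tokens_per_article / 1000 * price_per_1k
    return training_cost + usage


# Hypothetical workload: 100 articles of about 1,000 tokens each.
fine_tuned = total_cost(100, 1000, FINE_TUNED_PRICE_PER_1K, TRAINING_COST)
baseline = total_cost(100, 1000, BASELINE_PRICE_PER_1K)

print(f"Fine-tuned model: ${fine_tuned:.2f}")
print(f"Baseline model:   ${baseline:.2f}")
```

Every extra fine-tuning run to improve the results adds another training charge on top, which is why iterating gets expensive.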

Interestingly, this exploration happened just before GPT-4's launch, which isn't available for fine-tuning yet. It'll be interesting to see what evolves there.

One possible approach to customising AI-generated content is using embeddings to deliver custom knowledge, potentially creating bespoke chatbots for businesses. However, it's unclear whether this will solve the tone of voice challenge.

To refine this evaluation further, I would:

  • Clean the training data, removing irrelevant content.
  • Reverse engineer the posts by using GPT to summarize each article, then use those summaries as the prompts in the training data, in the form "write a blog post about [summary]."
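
As a rough sketch of how those two steps might fit together: the promotional markers, the function names and the stand-in summarizer below are all hypothetical. In a real run the `summarize` argument would be a call to GPT, which I've left out so the example runs on its own.

```python
# Hypothetical phrases that flag promotional or off-topic articles.
PROMO_MARKERS = ("register now", "book your ticket", "free trial")


def is_relevant(body):
    """Keep an article only if it contains none of the promo markers."""
    lowered = body.lower()
    return not any(marker in lowered for marker in PROMO_MARKERS)


def build_records(articles, summarize):
    """Clean the data, then use a summary of each article as its prompt."""
    records = []
    for article in articles:
        if not is_relevant(article["body"]):
            continue  # step 1: drop irrelevant content
        summary = summarize(article["body"])  # step 2: reverse engineer the prompt
        records.append({
            "prompt": f"Write a blog post about {summary}",
            "completion": " " + article["body"].strip(),
        })
    return records


def first_sentence(text):
    """Stand-in summarizer: a real version would ask GPT for a summary."""
    return text.split(".")[0].strip()


# Made-up sample data.
sample_articles = [
    {"title": "Storytelling", "body": "Storytelling in fundraising. It builds trust."},
    {"title": "Our event", "body": "Register now for our annual conference."},
]

records = build_records(sample_articles, first_sentence)
print(records)
```

The point of the summary-based prompt is that the training prompts would then look like the detailed prompts I'd actually want to write later, rather than bare titles.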

While my initial attempt at fine-tuning GPT-3 didn't yield the desired results, the lessons learned were invaluable. And in the fast-changing world of AI, there are plenty of further experiments to be made.