Introducing Eliza Ng, my generated content sandbox and blogging bot
By Richard Audette, firstname.lastname@example.org
I’ve created a blogging bot I’ve named Eliza Ng, as a sandbox for playing with generated content and tooling. Check it out!
Who is Eliza Ng?
One of my earliest computing experiences was interacting with a Commodore 64 version of Eliza, a psychotherapist chatbot first developed in the mid-1960s at MIT. When I started building my bot persona, I decided to call it Eliza Next Generation, in honour of this early chatbot.
A blogging bot is a nice, small project where I could learn about the capabilities and limitations of current tools.
How does it work?
When I first started building out the site, I would just enter the prompts, like “Write a blog post about how bananas may go extinct”. This worked well, as ChatGPT has been trained on a diverse range of material, and it has all the information it needs to build out a short blog post.
This was just a first step toward automating the publishing of posts. I also wanted my bot to come up with things to write about, since I didn’t want to be coming up with fresh topics every day myself.
For my first attempt, I developed a script that would identify key points from the top technology article in The Guardian, and then write about it. Check out Reinventing Twitter for an example. The writing itself is fine, but ChatGPT has inserted an untruth: Elon Musk did not buy Twitter in October 2020. I presume that, because ChatGPT isn’t trained on current events, it makes up facts when it has none to draw on.
This approach also produced a narrow range of topics. While I was working on this, Elon Musk was consistently the subject of the most-read Technology article on The Guardian, and I didn’t want my bot to build an Elon Musk blog. Although this did accomplish my goal of creating a bot that could write a blog post every day, it felt a bit pointless: what’s the point of creating a mediocre version of an article from The Guardian, full of untruths?
My current approach is to draw content from a popular technology news discussion site called Hacker News. On this site, users submit, upvote, downvote, and discuss stories on a variety of topics. I settled on the following logic:
- Find the most commented story in the last 22 hours, to ensure content is popular and fresh
- Capture all comments on the story that are longer than 250 characters, to eliminate short responses that add little to the discussion
- Have ChatGPT write a blog post about the collection of comments
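The three steps above can be sketched roughly as follows. This is a minimal illustration and not the code from my repository: the function names and the prompt wording are made up for the example, and it assumes each story is a dictionary shaped like a Hacker News API item, with a Unix `time` and a `descendants` comment count. Fetching the stories and calling ChatGPT are left out.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE_HOURS = 22        # "last 22 hours" from the list above
MIN_COMMENT_CHARS = 250   # comments must be longer than this


def pick_story(stories, now):
    """Return the most-commented story newer than MAX_AGE_HOURS, or None."""
    cutoff = now - timedelta(hours=MAX_AGE_HOURS)
    fresh = [
        s for s in stories
        if datetime.fromtimestamp(s["time"], tz=timezone.utc) >= cutoff
    ]
    # Assumption: "descendants" holds the total comment count for a story.
    return max(fresh, key=lambda s: s.get("descendants", 0), default=None)


def keep_long_comments(comments):
    """Drop short replies that add little to the discussion."""
    return [c for c in comments if len(c) > MIN_COMMENT_CHARS]


def build_prompt(title, comments):
    """Assemble a prompt for the language model (hypothetical wording)."""
    joined = "\n\n".join(comments)
    return (
        f'Write a blog post about the following discussion of "{title}":'
        f"\n\n{joined}"
    )
```

The resulting prompt string would then be sent to ChatGPT. A very busy discussion may also need trimming to fit the model's input limit, which this sketch does not attempt.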
I’m using Hugo to build the website, and I am hosting it on an Amazon S3 bucket. Although the process for building the post is automated, I currently review every post before it is published to the web. I’m not worried about accuracy or errors, but I don’t want to publish anything offensive, which could happen. One day, the bot created an article on gun violence. There was nothing particularly offensive about it, but it’s not a topic I want covered by the bot. After my review, the article is published as-is, and the bot announces the post on Mastodon and Twitter.
If you are interested in details or would like to set up your own bot, you can download the code I’m using here: https://github.com/raudette/bloggingbot
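One way to fit a manual review step into a Hugo site is to write each generated post as a draft, since Hugo leaves out pages marked `draft: true` unless it is asked to build drafts. The sketch below shows that idea only; it is an assumption about how such a workflow could look, not a description of my actual setup, and `render_hugo_post` is a hypothetical helper.

```python
import json
from datetime import datetime, timezone


def render_hugo_post(title, body, when, draft=True):
    """Return Markdown with YAML front matter for a Hugo content file."""
    front_matter = "\n".join([
        "---",
        # json.dumps gives a double-quoted, escaped string that YAML accepts.
        f"title: {json.dumps(title)}",
        f"date: {when.isoformat()}",
        f"draft: {'true' if draft else 'false'}",
        "---",
    ])
    return f"{front_matter}\n\n{body.strip()}\n"


if __name__ == "__main__":
    print(render_hugo_post(
        "How bananas may go extinct",
        "Generated article text goes here.",
        datetime.now(timezone.utc),
    ))
```

After reviewing a post, changing `draft` to `false` and rebuilding the site would make it eligible for publishing.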
- The automated image prompt generation has to be improved. I am currently just providing Dall-E the blog title, and the results have been mediocre.
- Improve the article prompt. The current writing style doesn’t suit me - there are far too many exclamation points. Key information is regularly omitted. For example, the article about the music discovery site called Maroofy didn’t include the link to Maroofy.
- Gather topics and content from more sources. Using the discussion surrounding a popular, current topic on a forum generates interesting results that, I would argue, add value. Great internet communities, like Hacker News, provide a variety of topics to write about, and the discussion provides interesting content. ChatGPT is great at creating a coherent summary of a few hundred words that captures key points from a threaded discussion with hundreds of comments. I’m considering looking at popular Reddit forums next.
Since the dawn of writing, creating content has always taken more time and effort than consuming it. Generated content changes this balance. There has always been content of limited value (like spam), but it has not always been as easy to generate or as hard to filter through. Communities, forums, and publications will need to learn how to efficiently filter and manage generated content – through editors, moderators, voting systems, policy and other processes. Many are currently updating their policies: Medium.com is requiring transparency and disclosure, and, ironically, a home automation forum has banned generated content outright.
That being said, there will be great use cases for generated content, and perhaps generating summaries of discussions will be one of them. I follow a number of forums, like Hacker News, and I’ll read through all the comments associated with an interesting post, but I only have so much time available to read. As tools like ChatGPT can create summaries, it might be interesting to generate custom newsletters based on forums of personal interest: for example, an article summarizing a month’s worth of Toronto cycling related posts on Twitter. For the time being, I will not be changing my reading or writing habits, but I will continue to look for new and interesting applications of this technology.