So over the last few weeks I’ve made some more improvements to the overall system, and built up a totally new channel with a few changes. Thought I would discuss what I’ve done and learned here.
I really wasn’t happy with the videos; they seemed a bit “empty”. With a bit of work, I was able to modify the encoding to use a multi-layer approach. Now each “slide” contains, bottom to top:
- A static zoomed, blurred, and darkened version of the image, scaled to 100% screen coverage
- The slow zoom-in of the image
- The overlay caption
Without that background I was seeing some strange artifacts where the zoomed image wouldn’t increase in size, except behind the text. I’m pretty sure it’s an internal moviepy bug, but this was an easy way to fix it that led to a better video anyway.
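As a sketch of how that bottom background layer can be built, here is a minimal version using Pillow (the function name, frame size, and blur/darken amounts are my own illustration; the actual pipeline composites the layers in moviepy):

```python
from PIL import Image, ImageFilter, ImageEnhance

def make_background(img: Image.Image, size=(1080, 1920)) -> Image.Image:
    """Zoom, blur, and darken an image so it fills the whole frame."""
    w, h = size
    # Zoom: scale until the image covers the full frame, then center-crop.
    scale = max(w / img.width, h / img.height)
    bg = img.resize((int(img.width * scale + 0.5), int(img.height * scale + 0.5)))
    left = (bg.width - w) // 2
    top = (bg.height - h) // 2
    bg = bg.crop((left, top, left + w, top + h))
    # Blur and darken so the zooming foreground image and caption stay readable.
    bg = bg.filter(ImageFilter.GaussianBlur(radius=25))
    bg = ImageEnhance.Brightness(bg).enhance(0.4)
    return bg

# Usage with an in-memory placeholder; a real slide would load the photo instead.
src = Image.new("RGB", (1600, 900), (200, 120, 40))
background = make_background(src)
print(background.size)  # (1080, 1920)
```

Rendering this once per slide as a static layer also sidesteps re-blurring every frame, which matters a lot on slow hardware.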
Now that the system was mostly working, I decided to copy-paste it into a slightly new format. Behold:
This channel uses much of the same technology with a few changes:
- ChatGPT generally chooses a topic itself. I simply give it a theme of “fable”, “dream”, “nightmare”, “myth”, etc., and let it do the rest.
- All of the imagery is generated with DALL-E, OpenAI’s image generation system.
- Some encoding tweaks to cross-fade between slides. Given the “dream” nature of the channel, that just seemed more appropriate.
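The cross-fade between slides is, per frame, just a linear alpha blend between the outgoing and incoming clip. A minimal sketch of that math with NumPy (the function name, frame shapes, and fade length are made up for illustration; in moviepy this is typically achieved with `crossfadein` on each clip plus a negative `padding` in `concatenate_videoclips`):

```python
import numpy as np

def crossfade_frame(outgoing: np.ndarray, incoming: np.ndarray,
                    t: float, duration: float) -> np.ndarray:
    """Blend two frames; t runs from 0 to duration over the transition."""
    alpha = min(max(t / duration, 0.0), 1.0)  # incoming clip's opacity
    blended = (1.0 - alpha) * outgoing.astype(np.float64) \
        + alpha * incoming.astype(np.float64)
    return blended.astype(np.uint8)

# Two tiny solid-color "slides": one dark, one bright.
a = np.full((4, 4, 3), 40, dtype=np.uint8)
b = np.full((4, 4, 3), 200, dtype=np.uint8)
mid = crossfade_frame(a, b, t=0.5, duration=1.0)
print(mid[0, 0])  # [120 120 120]
```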
It took a bit of tweaking, but it’s starting to generate some truly impressive stuff. I was really impressed with this one:
Prior to this weekend, I was running everything on a Raspberry Pi. It was OK, but had a lot of problems.
- The microSD storage was pretty slow.
- A 55s video took about an hour to encode.
- My Nagios & SNMP monitoring kept throwing warnings as I was maxing out memory, and occasionally the encode would get oom_killed.
So I found a super-cheap micro PC on Amazon: a CyberGeek Intel quad-core with 16 GB of RAM for $129 on sale (up a bit now). It comes with Ubuntu Linux pre-installed and has been a HUGE upgrade. Shorts now encode in about 5 minutes, and I don’t have nearly as many compiler errors from outdated Raspberry Pi distributions.
Previously, I was using YAKE to convert the script captions into search queries for images. That worked OK, but led to some really bizarre videos when an individual caption lacked enough context on its own. When the script used a pronoun in a sentence like “It has several beautiful vistas”, YAKE had no idea what “it” referred to, so you wound up with really random results.
With a bit of prompt engineering, I got ChatGPT to return a result array pairing each individual sentence with a recommended image search query. With that, results have improved significantly. It still occasionally does weird things, but far less often.
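A hedged sketch of that approach, assuming a chat-completion-style API: the prompt wording, JSON shape, and example reply below are my own illustration, not the exact prompt the channel uses. The part worth testing locally is just requesting and parsing a structured result:

```python
import json

# Hypothetical prompt; the key idea is forcing a machine-readable reply
# with pronouns already resolved into concrete search terms.
PROMPT_TEMPLATE = (
    "Split the following script into individual sentences. Return a JSON array "
    "of objects with keys 'sentence' and 'image_query', where 'image_query' is "
    "a concrete image-search query with all pronouns resolved.\n\n"
    "Script:\n{script}"
)

def build_prompt(script: str) -> str:
    return PROMPT_TEMPLATE.format(script=script)

def parse_slides(raw: str) -> list[dict]:
    """Parse the model's JSON reply into (sentence, image_query) slide specs."""
    slides = json.loads(raw)
    # Drop entries missing either field rather than rendering a broken slide.
    return [s for s in slides if s.get("sentence") and s.get("image_query")]

# Example reply a model might return for a travel script (fabricated here):
reply = """[
  {"sentence": "It has several beautiful vistas.",
   "image_query": "scenic mountain vistas in Banff National Park"},
  {"sentence": "", "image_query": "incomplete entry, dropped"}
]"""
slides = parse_slides(reply)
print(len(slides))  # 1
```

The `reply` string would come back from a single chat-completion call, so the whole script is in context when each query is written, which is exactly what fixes the pronoun problem.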
So this is where I could use some advice: what next? I have 2 channels running, one procedural (travel) and one experimental (dreaming), posting fully automated 55s shorts multiple times a day. What do I do next?