Editorial Notes
This has been a crazy week. One thing I've been contemplating for some time now is branching out into other media, and more particularly putting together a week-in-review type piece, which I'm now calling Data Bytes. Of course, I would end up doing this during one of the most tumultuous weeks in recent months.
Data Bytes is a weekly review of significant events across tech and science. It makes heavy use of AI throughout production, from imagery to voice synthesis to video generation, though the writing and editorial content is mine. It is, not surprisingly, something of an experiment, and feedback is always welcome.
In the News This Week
OpenAI CEO Sam Altman Dismissed
Significant Resignations at OpenAI Following CEO's Dismissal
GPT-4 Outperforms Lawyers in Ethics Exam
Kyutai: New French AI Research Lab
Toku's AI Revolution in Heart Condition Prediction
Microsoft Ignite 2023: AI Product Announcements
Meta Advances Towards AI-Generated Movies
Microsoft Copilot: AI Integration in Windows 10
Pippin Title: AI Streamlines Property Title Navigation
How This Was Made
The creation of a news review show such as this involves a number of pieces, many still manual, and frankly should dispel the myth that we’re reaching a stage where everything is push-button easy. What we have reached is the point where one person can readily put together what is (I hope) compelling content quickly, to the extent that a determined person should be able to build such content three to five times a week.
AI informs just about everything in this process, which can be broken down as follows:
I put together a list of 5-10 topics that I want to cover (making use of GPT-4 to help determine what those are), usually drawing from the top ten current stories in the news.
From this, a script is developed with short titles, longer titles, and a longer description that makes up the bulk of each article, typically 100 words or so. I also extract links to supporting articles, though they weren’t used in this version.
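To keep those pieces organized, it helps to treat each story as a small record. Here is a minimal sketch in Python; the class and field names are my own illustration, not part of any tool:

```python
from dataclasses import dataclass, field

@dataclass
class StoryEntry:
    """One story in the weekly script (hypothetical field names)."""
    short_title: str                  # headline shown on screen
    long_title: str                   # fuller title used in the narration
    description: str                  # the ~100-word body of the article
    links: list = field(default_factory=list)  # supporting articles (unused this week)

# One of this week's stories, with a placeholder description
story = StoryEntry(
    short_title="Kyutai",
    long_title="Kyutai: New French AI Research Lab",
    description="(~100-word summary goes here)",
)
```

Keeping titles and description in one structure makes it easy to regenerate the narration script or the on-screen titles independently later on.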
The script is then fed into Natural Reader’s text-to-speech AI. I use their commercial version because I like how the voices sound and because I have better control over vocal patterns, inflections, and spacing. My voice tends to be rather nasal, and I find it challenging to find a proper “quiet room” environment for recording. Using a text-to-speech voice also means that if I need to change the content (and I do, quite frequently), it’s much easier to just re-run the script in question. This usually generates a single sound file, which I’ll later cut up as needed.
The illustrations generally use Stable Diffusion (I like working with Easy Diffusion) or DALL·E 3, typically creating images at 1024x1024 resolution. The square aspect ratio works best for me, as I can take the images and generate output appropriate for both desktop and mobile (Instagram style). I will typically generate several illustrations for each article. While I occasionally use the text description of the article as my prompt, I usually supplement it with relevant terms to better shape the output.
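That prompt-shaping step can be sketched as a simple helper; the function name and the style terms here are illustrative, not part of Stable Diffusion or Easy Diffusion:

```python
def build_prompt(description: str, extra_terms: list) -> str:
    """Combine an article's text description with supplementary
    terms to better shape the generated image."""
    # Keep only the first ~40 words so the prompt stays focused
    seed_text = " ".join(description.split()[:40])
    return ", ".join([seed_text] + extra_terms)

# Hypothetical example: steer toward a square, illustration-style image
prompt = build_prompt(
    "A new AI research lab opens its doors in Paris.",
    ["digital illustration", "square composition", "high detail"],
)
```

The same helper can be re-run with different term lists to produce the several variant illustrations per article mentioned above.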
The next part is a bit cheesy, but it works for now: I use the Pika Labs Discord app to add animations to the illustrations. This is a hit-or-miss operation, especially as you can only generate a few seconds of video this way. Combining videos in various ways to extend this is possible, but the workflow is still awkward. I’m looking forward to true tweening, where you pass keyframes for the video you’re producing and let the system interpolate between frames. That capability is beginning to emerge, and it should be available in more professional tooling within three to four months.
I currently use commercial music tracks. Yes, AI can be used to generate these tracks, but even the best sounds like your typical grade school band trying to play Elgar or Rachmaninoff. Don’t go there. Just … don’t.
Once I have these video snippets, I pull them into Movavi, a decent (if not fantastic) video editing suite. This is where I sequence the videos, add titles and transitions, sync the soundtracks, and equalize the audio.
The final process involves editing and reviewing, and that process of tweaking can take a while. The above video is 4 1/2 minutes long and took me about fourteen hours to complete. You can produce video faster, but a lot of producing good video really does come down to making iterative choices and decisions, some of which will pan out while others won’t.
I expect my workflow to change dramatically as tools evolve over the next year, and I also believe that longer-form video content is going to explode in that time.
Contact
If you have something you feel is newsworthy, please contact me at kurt.cagle@gmail.com, or get on my Calendly calendar: https://calendly.com/semantical.
In Media Res,
Kurt Cagle

