When I presented my blogging bot here a couple months ago, a friend suggested I try to create generated video content – check out the linked video to see what I’ve been able to do so far: https://youtu.be/WkfGq42OyBI
It’s not quite there – the body movements are random, the eyes aren’t focused on the camera, some of the mouth movements don’t seem to match the face. I’ve added a street scene in the background to make it a bit more interesting.
Like the blogging bot, the script is generated by an OpenAI model, seeded with comments scraped from an internet forum. The audio is created with a text-to-speech tool from ElevenLabs. I experimented with a few different tools for creating an animated avatar, and settled on working with Nvidia’s Audio2Face animation model in combination with Metahuman technology in Unreal Engine, an environment popular in game development.
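For the curious, the script and narration steps boil down to two API calls – roughly like the sketch below, which is not the bot’s actual code (the prompt, voice ID, and API keys are placeholders):

import openai
import requests

# Rough sketch only: the prompt, voice ID and API keys are placeholders, not the real values
openai.api_key = "OPENAI_API_KEY"
comments = ["forum comment one...", "forum comment two..."]  # scraped forum comments go here

# Generate a short script from the scraped comments
completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": "Write a short video script based on these forum comments:\n" + "\n".join(comments),
    }],
)
script = completion["choices"][0]["message"]["content"]

# Turn the script into narration with ElevenLabs' text-to-speech endpoint
response = requests.post(
    "https://api.elevenlabs.io/v1/text-to-speech/VOICE_ID",
    headers={"xi-api-key": "ELEVENLABS_API_KEY"},
    json={"text": script},
)
with open("narration.mp3", "wb") as f:
    f.write(response.content)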
Unlike my blogging bot, the process for creating this video is not automated. The tooling, at least for someone with my experience and resources, does not seem to lend itself to automation. It looks like this could change in the very near future – Nvidia has announced, but has not yet released, its Omniverse Avatar Cloud Engine (ACE), which looks like it could facilitate the creation of generated content. If anyone from Nvidia is reading – I’d love early access.
The Guardian reported earlier this week that Kuwait News has introduced a generated news-presenting avatar. I could envision services like LinkedIn experimenting with an avatar that presents our personal news feed as generated video. It remains to be seen whether this new generation of avatars will fare better than earlier attempts like Microsoft Office’s Clippy!
In January, I created a blogging bot as a sandbox for playing with generated content and tooling. The bot identifies a popular post on an internet forum, collects some of the comments, submits them to ChatGPT to write an article, and then posts it to the web, the Fediverse, and Twitter. Read about it here: https://articles.hotelexistence.ca/posts/bloggingbot/, and then check out the bot at https://www.eliza-ng.me/. Source code: https://github.com/raudette/bloggingbot. It reached an interesting milestone this week – one of its posts was organically starred on Mastodon.
I recently participated in GeekWeek 7.5, the Canadian Centre for Cyber Security’s (CCCS) annual cybersecurity workshop. I was assigned to a team of peers from banking, telecom, government and academia. We were to work together on analyzing how Bluetooth item trackers (e.g., Apple AirTags, Tile) can be covertly used for malicious purposes, and on developing processes and tools to detect them.
It was my first time attending the event, and I wasn’t sure what to expect. Here’s how it worked. The CCCS builds teams from the pool of GeekWeek applicants, based on interests and skills identified by the applicant in the application process. Leading up to the event, CCCS appoints a team lead, who defines goals for the team.
Testing a homemade AirTag clone
I was appointed as a team co-lead and assigned a team. My co-lead and I were to define a challenge in the IoT space. Inspired by recent headlines concerning stalking facilitated by AirTags, my co-lead suggested analyzing how item trackers can be covertly used for malicious purposes, and developing processes and tools to detect them.
“Can you find my stealth AirTag clone in this data?” we asked the data scientist on our team. 🍰 <piece of cake>, he responded.
As various news outlets reported about the malicious use of these trackers, manufacturers implemented anti-stalking measures which were quickly defeated by research efforts such as find-you. The find-you tracker avoids detection by rotating through identifiers – to an iPhone, it appears no differently from walking past other people with iPhones.
The team built find-you stealth trackers which rotated through 4 keys and went to work. We wanted to see if we could develop a technique for detecting and identifying the trackers. Data was collected by stalking ourselves, walking through busy urban areas with find-you trackers, and logging all Apple Find My Bluetooth advertisements.
Our hypothesis was that we could identify the find-you tracker based on signal strength patterns. We believed that the signal strength of the find-you tracker we were wearing would be consistent over time, as its location relative to our data logger wouldn’t change, while the signal strength of other Find My devices would vary as we walked past other pedestrians carrying them.
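In other words, the detection logic amounts to grouping the logged advertisements by identifier and flagging the ones whose signal strength barely varies. A simplified sketch – the thresholds here are illustrative, not values we carefully tuned:

import statistics
from collections import defaultdict

def flag_stealth_candidates(log, min_samples=20, max_stddev=4.0):
    """log: iterable of (timestamp, identifier, rssi) tuples from the Bluetooth capture.

    Returns identifiers whose signal strength stays unusually stable, strongest first.
    """
    rssi_by_id = defaultdict(list)
    for _, identifier, rssi in log:
        rssi_by_id[identifier].append(rssi)

    candidates = []
    for identifier, rssis in rssi_by_id.items():
        if len(rssis) < min_samples:
            continue  # passers-by are only seen for a handful of advertisements
        if statistics.stdev(rssis) <= max_stddev:
            candidates.append((identifier, statistics.mean(rssis)))

    # a tracker carried on the subject should also be among the strongest signals
    return sorted(candidates, key=lambda c: c[1], reverse=True)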
We then modified our stealth tracker so that it (the scheduling logic is sketched after this list):
Rotates keys on a random interval, between 30 and 60 seconds (find-you rotates every 30 seconds)
Transmits at random power levels, in an attempt to avoid detection based on a consistent signal strength.
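The scheduling logic, sketched in Python rather than the ESP32 firmware we actually flashed (set_advertised_key and set_tx_power are hypothetical stand-ins for the radio calls):

import itertools
import random
import time

def run_stealth_tag(keys, power_levels, set_advertised_key, set_tx_power):
    """Cycle through the key set on a random 30-60 s interval at a random TX power."""
    for key in itertools.cycle(keys):
        set_advertised_key(key)                    # rotate to the next identifier
        set_tx_power(random.choice(power_levels))  # randomize transmit power
        time.sleep(random.uniform(30, 60))         # random interval between rotations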
In our testing, even with these changes, we were still able to identify stealth tags in an area with other Apple Find My devices. Even at the ESP32’s lowest power level, the stealth tracker could be identified, as its signal strength was still higher than that of other devices in the vicinity of the tracked subject.
Further areas for exploration would be reducing the signal strength with shielding material or some other means, and adding an accelerometer so that the device only transmits Bluetooth advertisements while the tracked subject is in motion.
Building a Clone Tile Tracker
The team was inspired by Seemoo Lab‘s work on the AirTag. Could we do similar things with the Tile tracker?
We figured out some things, but we also got stumped. Tile uses the Bluetooth MAC address to identify a Tile. We got our clone to the point where we could take the MAC address of a Tile registered to our account, load our firmware with that MAC onto an ESP32, walk around with the Tile app, and it would track the clone.
But it seemed that if we sent our Tile’s MAC address to someone else to load onto their ESP32 and track with their Tile app, it wouldn’t report the location. And although Tile reports thousands of users in my area, even a genuine Tile didn’t seem to get picked up by other Tile users as we walked through office food courts, malls, or transit hubs. As a result, at the end of two weeks, many questions remained unanswered.
The experience of stepping away from the day-to-day, learning about different technologies and building things was very similar to my previous experiences participating in corporate hackathons.
What made GeekWeek exciting for me was leading a team of people from different companies, with different professional backgrounds. At a corporate hackathon, everyone already knows their teammates and what they’re capable of; they share the same corporate culture, typically work on the same software stack, and have the same collaboration tools pre-loaded on their laptops. At GeekWeek, it was interesting just breaking down the problem and finding out who had what tools, what experience, and who was best suited to take on each task. It was also interesting to hear about everyone’s professional work – some were really pushing boundaries in their fields.
I wanted to build my own vision model for a few reasons:
I wanted to learn how
In my limited experience with OpenALPR, it looked like it was missing some license plates that seemed fairly readable to my eyes – could I possibly do better by training my own model?
Because of the way it is built, I knew I wouldn’t be able to get OpenALPR to run faster on my Pi – I wouldn’t be able to speed it up by offloading image processing to a VPU like the Myriad X in my Oak-D camera.
The gen2-license-plate-recognition example provided by Luxonis, built from Intel’s Model Zoo, does not work well with Ontario license plates
The first step was building a library of images to train a model with. I sorted through hundreds of images I’d taken on rides in September, and selected 65 where the photos were clear and there were license plates in the frame. As this was my first attempt, I wasn’t going to worry about sub-optimal situations (out of focus, low light, over-exposed, etc.). I then had to annotate the images – draw boxes around the license plates in the photos and “tag” them as plates. I looked at a couple of tools – I started with Microsoft’s VoTT, but ended up using labelimg. Labelimg was efficient, with good keyboard shortcuts, and used the mouse scroll wheel to control zoom, which was great for labeling small details in larger photos.
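If you export the annotations in labelimg’s YOLO format, each image gets a small text file with one line per box – the class index followed by the box centre, width, and height, all normalized to the image size, e.g. 0 0.512 0.634 0.087 0.041 (the numbers here are made up for illustration).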
I then tried one tutorial after another, and struggled to get them to work. Many examples were set up to run on Google Colab. I found that when I followed these instructions and got to the part where I was actually training the model, Colab would time out. Colab is only intended for short interactive sessions – perhaps it didn’t work for me because I was training on higher-resolution images, which take more computing time.
What I ended up doing was manually running the steps from the Train_YoloV3.ipynb notebook from pysource straight in the console. As my home PCs don’t have dedicated GPUs, I set up a p3.2xlarge Amazon EC2 instance to run the training. If memory serves, training against those 65 images, using the settings from that tutorial, took a couple of hours.
I took the model I created from my September rides and then tested it against images from my October rides – I was surprised how well it worked.
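One lightweight way to run a trained Darknet .cfg/.weights pair against new photos is OpenCV’s dnn module – a sketch, not necessarily the exact workflow I used (the file paths are placeholders):

import cv2

# Load the trained YOLOv3 network (paths are placeholders)
net = cv2.dnn.readNetFromDarknet("yolov3_plates.cfg", "yolov3_plates_final.weights")

image = cv2.imread("october_ride_001.jpg")
height, width = image.shape[:2]

blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())

# Draw boxes for confident detections (single "plate" class; non-maximum suppression omitted)
for output in outputs:
    for detection in output:
        confidence = float(detection[5:].max())
        if confidence < 0.5:
            continue
        cx, cy, bw, bh = detection[0:4] * [width, height, width, height]
        x, y = int(cx - bw / 2), int(cy - bh / 2)
        cv2.rectangle(image, (x, y), (x + int(bw), y + int(bh)), (0, 255, 0), 2)

cv2.imwrite("october_ride_001_annotated.jpg", image)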
My Yolov3 model running on Oak-D
Since training that model, I’ve been on the lookout for an Nvidia video card I can use for training at home. It’s hard to know for sure, but it seems it wouldn’t take long to recoup the cost of a GPU versus training on an EC2 instance in the cloud, and I can always resell the GPU. I’ve tried a few times with the fastest CPU I have in the house (a Ryzen 3400G), and it just doesn’t seem feasible. I haven’t found a cheap GPU option, and prices have only gone higher since I started looking in November.
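As a rough illustration, assuming on-demand pricing of about US$3 per hour for a p3.2xlarge, a used GPU in the few-hundred-dollar range would pay for itself after something like 100–200 hours of training time – and that’s before counting its resale value.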
I don’t have usable code or a useful model to share yet – at this point, I’m mostly learning and trying to figure out the process.
I wanted a hard copy of an eBook I had that is out of print. There are many resources out there for binding books; many recommend using acid-free PVA glue. I can’t speak to how it compares to other glues, but “Aleene’s Tacky Glue” is a PVA glue, available acid-free, which I found at craft stores in my area.
This post will focus on prepping an eBook for print. US Letter is the common paper size here, but it’s too big for a book, so I decided to print 4 pages per sheet – 2 pages per side, each 5.5″ wide by 8.5″ tall.
First, I loaded the book into Calibre, opened it, and printed it to PDF. For this exercise, I used Ian Fleming’s Casino Royale, which is out of copyright in Canada.
Calibre Print to PDF
Next, I had to re-arrange the pages. If we just print 2 pages per side, duplex, page 4 will end up on the back of page 1. We want page 2 on the back of page 1 – so we want to reorder the PDF following the pattern 1, 3, 4, 2, 5, 7, 8, 6… This LibreOffice spreadsheet might help: Pages.ods
Pages have to be re-ordered for regular duplex printing – page 2 has to be on the back of page 1!
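The reordering can also be scripted – a minimal sketch with pypdf (filenames are placeholders), padding the page count to a multiple of 4 with blanks:

from pypdf import PdfReader, PdfWriter

reader = PdfReader("casino-royale.pdf")   # the PDF exported from Calibre
writer = PdfWriter()

pages = list(reader.pages)
while len(pages) % 4:      # pad to a multiple of 4 with blank pages
    pages.append(None)

for i in range(0, len(pages), 4):
    for offset in (0, 2, 3, 1):   # the 1, 3, 4, 2 pattern within each group of 4
        page = pages[i + offset]
        if page is None:
            writer.add_blank_page()   # uses the size of the previously added page
        else:
            writer.add_page(page)

with open("collated.pdf", "wb") as f:
    writer.write(f)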
Next, I used a tool called pdfjam to fit 2 pages per side: pdfjam collated.pdf -o collated-2perpagealternate.pdf --nup 2x1 --landscape
I sent this PDF to my local printer, and had them cut the pages in half for me. With this output, I bound the book, roughly following a YouTube tutorial. My book turned out OK, but it feels like it would take a few more attempts to get a book as sturdy as a commercially bound one.
One of the features I have in mind for my bicycle dashcam is license plate recognition. In parts 1, 2 and 3, I experimented with the OpenALPR license plate recognition library and a couple of different Pi cameras. I encountered a few challenges:
Image quality challenges: out-of-focus images, warped images due to the “rolling shutter” of the Pi camera
Field of view: capturing more than just the license plate
Speed: Only able to process 1 image every 8 seconds on my Pi 3
I acquired the Luxonis Oak-D AI-accelerated camera because its different image sensors could potentially address my image quality challenges, its stereo vision/depth sensing provides interesting capabilities, and its AI acceleration could increase the processing speed. This spring, I mounted it to my bike and started capturing images on my rides.
I had issues with my Pi 3 – it would stop running reliably after a minute or two – I suspect it had been damaged by vibration from previous rides, being strapped to my bike rack. I acquired a new Pi 4, and was up and running again.
Initially, with the Oak-D setup, I had many of the same image quality problems I’d had with the Pi 1 and 2 cameras – lots of out-of-focus images, as the camera kept trying to refocus, which is a hard problem when taking photos of moving traffic from a bumpy bicycle ride. My application would also crash – this turned out to be a buffering issue: I was writing more data to my USB thumb drive than it could handle. I ended up getting acceptable results by reducing my capture speed to 2 fps, recording at 4056×3040, turning autofocus off, locking the focus at its 120 setting, and setting the scene mode to sports in the DepthAI API, as follows:
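A sketch of just the ColorCamera configuration (pipeline outputs and encoding omitted):

import depthai as dai

pipeline = dai.Pipeline()

cam = pipeline.create(dai.node.ColorCamera)
cam.setResolution(dai.ColorCameraProperties.SensorResolution.THE_12_MP)  # 4056x3040
cam.setFps(2)  # reduce capture speed so the USB thumb drive can keep up

# Turn continuous autofocus off, lock the lens position, and use the sports scene mode
cam.initialControl.setAutoFocusMode(dai.CameraControl.AutoFocusMode.OFF)
cam.initialControl.setManualFocus(120)
cam.initialControl.setSceneMode(dai.CameraControl.SceneMode.SPORTS)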
With these settings, images are focused in the narrow range where it’s possible to read a license plate – when cars are too far back, the plates are impossible to read anyway, so it doesn’t matter if they’re out of focus. Luxonis will soon launch a model with fixed-focus cameras, which should further improve image quality in high-vibration environments. I hope to try this out in the future.
I wanted to build a library of images I could later use to test against various machine vision models, and potentially to train my own. I posted the question on the Luxonis Discord channel – their team directed me to their gen2-record-replay code sample. This code allows you to record imagery and later play it back against a model – it was exactly what I needed. So I started to collect imagery on my next few rides.
Ever since Espressif’s ESP8266 Wi-Fi-capable microcontroller was launched, I’ve been thinking about all the possibilities for low-cost network-connected devices. Nothing world-changing, but I have used it to build a data-logging CO2 monitor and a device to control my old TV with Alexa.
A day after the board arrived, Hackaday published an article about a re-creation of Google Chrome’s T-Rex game for this very TTGO dev board. Getting that loaded onto the board seemed like a good test. I downloaded TRexTTGOdisplay, installed Lilygo’s TFT_eSPI driver, compiled and…
undefined reference to `TFT_eSprite::pushToSprite(TFT_eSprite*, int, int, unsigned short)'
Hmm. I search around, and I see a hint in the comments of the author’s YouTube video: “You will need to update tft library”. I find the source of the TFT_eSPI library, review it a bit, and see that it is designed for a number of microcontrollers and screen controllers – so I copy User_Setup_Select.h from the Lilygo repository to Bodmer’s most recent TFT_eSPI library. For anyone doing this now, this will fix the TFT_eSprite::pushToSprite issue and just work… but I got:
TFT_eSPI/TFT_eSPI.cpp: In member function 'virtual void TFT_eSPI::drawPixel(int32_t, int32_t, uint32_t)':
TFT_eSPI/TFT_eSPI.cpp:3289:21: error: 'SPI_X' was not declared in this scope
    while (spi_get_hw(SPI_X)->sr & SPI_SSPSR_BSY_BITS) {};
I take a look at what’s happening around line 3289 in TFT_eSPI.cpp, and it appears to be optimization code for the RP2040 – it shouldn’t be compiled in… Taking a look at line 3285:
// Temporary solution is to include the RP2040 optimised code here
#elif (defined (ARDUINO_ARCH_RP2040) || !defined (ARDUINO_ARCH_MBED)) && !defined(TFT_PARALLEL_8_BIT)
See that exclamation point? Everywhere else in the code where there are RP2040 optimizations, I see:
// Temporary solution is to include the RP2040 optimised code here
#elif (defined (ARDUINO_ARCH_RP2040) || defined (ARDUINO_ARCH_MBED)) && !defined(TFT_PARALLEL_8_BIT)
I was hoping for another successful contribution to open source, but I was beaten to the punch – if I had started this project a day later, it just would have worked with the latest TFT_eSPI library. In any case, the important thing is, I got TRexTTGOdisplay running. My next project for this dev board will be a little internet connected dashboard.
I continue to experiment with how a dashcam can assist urban cyclists. This time, I’ve started a fresh design with a different idea, a new camera, new models, and new code, which I’m submitting as an entry for the Toronto ♥️’s Bikes Make-a-Thon.
I enjoy biking from my home in North York, near Mel Lastman Square, to my office near Union Station during the week. The most harrowing part of this ride is the Yonge-401 interchange, which requires two lane changes with fast-moving traffic from the 401 on- and off-ramps.
Yonge – Hwy 401 Interchange
As cyclists in the city, we all have “scary spots” like these on our routes. I would like to present you with a Smart Dashcam for Bicycles as a tool for these challenges. A dashcam could:
My prototype consists of a laptop, a USB AI-accelerated camera from Luxonis mounted to my bicycle seat post, and my smartphone as a display. It’s a few hundred lines of Python code built on a freely available AI vehicle recognition model from the Intel Open Model Zoo, and on the license plate recognition and MJPEG video streaming sample code that Luxonis supplies with the OAK-D camera. I tether the laptop to the smartphone over Wi-Fi, and I use an iOS app called IPCams to view the video stream.
Bicycle Dashcam with Smartphone Display
Vehicles are recognized and identified, the video is streamed over Wi-Fi to the smartphone, and a caution alert is added to the video when a vehicle is detected.
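The alerting logic itself is simple: once the detection network returns bounding boxes, the frame is annotated before it’s streamed. A simplified sketch – the detection tuple format here is an assumption, not the exact structure from the Luxonis sample:

import cv2

def annotate_frame(frame, detections, conf_threshold=0.5):
    """Draw detected vehicles and overlay a caution banner before streaming the frame."""
    h, w = frame.shape[:2]
    vehicle_seen = False

    for confidence, xmin, ymin, xmax, ymax in detections:  # normalized coordinates assumed
        if confidence < conf_threshold:
            continue
        vehicle_seen = True
        top_left = (int(xmin * w), int(ymin * h))
        bottom_right = (int(xmax * w), int(ymax * h))
        cv2.rectangle(frame, top_left, bottom_right, (0, 0, 255), 2)

    if vehicle_seen:
        cv2.putText(frame, "CAUTION: VEHICLE", (10, 40),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 0, 255), 3)
    return frame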
In this proof of concept, the dashcam is just a fancy, complicated, expensive rear-view mirror. A final version would expand on this functionality by integrating features such as:
Sounding an audible alert when danger is detected
Recording the speed and proximity of the cars around you
Integrated GPS
Cloud and social features for sharing data with the city and fellow cyclists
A display readable by car drivers, e.g.: “Driver ABCD1234, your current speed is 45”. Like a mobile Toronto Watch Your Speed program sign. Would a driver allow a cyclist more space if they were aware their actions were being logged?