Enjoying our lemonade: How my team came out ahead from a 4-week outage

I work on a web-based product that supports hundreds of thousands of paying subscribers. Our servers were first built out in 2014, before I joined the team. By 2018, our environments were starting to look dated, with some components approaching end of life. I submitted an intake request with our shared-services infrastructure team to modernize them and assess a move to Azure, but because everything was still supported and we had no new requirements, we couldn’t make a case; other lines of business had much more pressing concerns.

In March of 2020, an incident shut down our development environments for an indeterminate amount of time. We were stuck: no API gateway, no environments for QA, nowhere to deploy and test code, no way to progress the product. The team would be idle until our infrastructure team restored our environments, and other lines of business were ahead of us in the restoration queue. We did have access to a company-funded Azure sandbox, which had been acquired for developing independent proofs of concept.

No environments, no way to progress the product, an idle team, and access to an Azure sandbox. Although there was no approved infrastructure project to migrate our product to Azure, our team decided to migrate our .NET web app to Azure PaaS, in the hope that one day the project would be green-lit.

Four weeks elapse. Our pre-production environments are restored, and we’re now in a position to continue regular product progression. We’ve made great progress, but we’re not quite running on Azure PaaS, and a number of business-driven priorities are four weeks behind. Do we stop and resume work as we had before the incident? If we resume progression work, how do we keep this Azure PaaS code branch current?

We estimated we needed another 2 weeks to get the app fully functional on Azure PaaS. We met with all stakeholders.

Here’s what we decided to do:

  • Complete the Azure PaaS migration work (it did take another 2 weeks!)
  • Replace our on-premise Dev environment with the Azure PaaS environment. Going forward, all deployments would start with our new Azure PaaS dev environment, and then progress to our regular on-premise QA environment. This ensured our Azure PaaS work didn’t rust, sitting unused, and allowed us to continue stable development on our on-premise infrastructure.

About 7 weeks after the incident, we resumed our regular product progression cadence.

Fast forward to July 1, 2020, and, unexpectedly, our line of business is sold to another company. We need to plan for a data center migration to the acquiring firm’s data center. The initial plan is to “lift and shift” – move the production on-premise virtual machines and configuration. We propose to the new management team that we instead move the application to the Azure cloud. A couple of meetings later, this becomes the plan.

The migration date is set for November 2020, and it is executed more smoothly than any of us expected, with disruption limited to a single planned service window of a few hours.

That was almost three years ago. For our use case, the Azure environment has proven to be far more reliable and cost-effective than our previous on-premise data center. This was all enabled by a four-week dev environment outage. We made lemonade with our lemons!

Starting rootless containers at boot with Podman

I’m building out a new server at home, and decided to try out Podman instead of Docker for running containers. Everything is a bit different. I wanted some containers to start on boot, as I had previously set up with Docker. I found an article that got me most of the way there, but it was missing a few key things for rootless containers. Here’s how I got a Vaultwarden container I had set up, named vaultwarden, starting at boot for user username and group groupname on an Ubuntu / systemd based system.

Create a service file at /etc/systemd/system/vaultwarden.pod.service:

[Unit]
Description=Vaultwarden/Bitwarden Server (Rust Edition)
Documentation=https://github.com/dani-garcia/vaultwarden
Wants=syslog.service

[Service]
Restart=on-failure
ExecStart=/usr/bin/podman start -a vaultwarden
ExecStop=/usr/bin/podman stop vaultwarden
User=username
Group=groupname

[Install]
WantedBy=multi-user.target

The following command allows the container to run when the user isn’t logged in (further details):

sudo loginctl enable-linger username

Reload the daemon:

sudo systemctl daemon-reload
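
If the unit isn’t enabled yet, enable it so systemd pulls it in at boot (this uses the unit name from the file created above):

sudo systemctl enable vaultwarden.pod.service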

And, your rootless Podman container should run at boot.

Python gps / gpsd KeyError

I have been adding GPS logging to my Bicycle Dashcam. All of the example Python code (eg: https://gpsd.gitlab.io/gpsd/gpsd-client-example-code.html) I’ve found online throws this error after about a minute:

Exception has occurred: KeyError
'el'
File "example2.py", line 12, in
while 0 == session.read():
KeyError: 'el'

I’m not sure if this is a problem unique to my GPS receiver, but it regularly sends reports that are not “class”:”TPV” (eg: “class”:”SKY”) – TPV is the report with the coordinates, and whenever a non-TPV report arrives, all the sample code throws a KeyError.

I’ve found the easiest way to fix this is to add a try/except inside the forever loop, catch the KeyError exception, and pass. Here is an adaptation of code I found on Stack Overflow that solves the problem – I’ve submitted it as an improvement, hopefully they accept my edits!

import threading
import time
from gps import *

class GpsPoller(threading.Thread):

    def __init__(self):
        threading.Thread.__init__(self)
        self.session = gps(mode=WATCH_ENABLE)
        self.current_value = None

    def get_current_value(self):
        return self.current_value

    def run(self):
        while True:  # the try sits INSIDE the forever loop, so pass from the except resumes collecting data
            try:
                self.current_value = self.session.next()
                time.sleep(0.2)  # tune this, you might not get values that quickly
            except KeyError:  # catch the KeyError and return to the forever loop
                pass
            except StopIteration:
                pass

if __name__ == '__main__':
    gpsp = GpsPoller()
    gpsp.start()
    # gpsp now polls every 0.2 seconds for new data, storing it in self.current_value
    while True:
        # In the main thread, every 5 seconds print the current value
        time.sleep(5)
        value = gpsp.get_current_value()
        print(value)
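
The value returned by get_current_value() is simply the last report received from gpsd, and it may or may not be a position fix. Here’s a minimal sketch of pulling coordinates out of it, assuming the gps module’s usual dictwrapper reports (check the report class before reading fields):

# Only read coordinates when the last report is a TPV (position) report
value = gpsp.get_current_value()
if value is not None and value['class'] == 'TPV':
    lat = getattr(value, 'lat', 0.0)
    lon = getattr(value, 'lon', 0.0)
    print(lat, lon)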

Update (May 6, 2023): I found a better solution: the gpsd-py3 library works great.

Creating Generated Video

When I presented my blogging bot here a couple months ago, a friend suggested I try to create generated video content – check out the linked video to see what I’ve been able to do so far:
https://youtu.be/WkfGq42OyBI

It’s not quite there – the body movements are random, the eyes aren’t focused on the camera, some of the mouth movements don’t seem to match the face. I’ve added a street scene in the background to make it a bit more interesting.

Like the blogging bot, the script is generated by an OpenAI model, seeded with comments scraped from an internet forum. The audio is created with a text-to-speech tool from ElevenLabs. I experimented with a few different tools for creating an animated avatar, and settled on working with Nvidia’s Audio2Face animation model in combination with Metahuman technology in Unreal Engine, an environment popular in game development.

Unlike my blogging bot, the process for creating this video is not automated. The tooling, at least for someone with my experience and resources, does not seem to lend itself to automation. It looks like this could change in the very near future – Nvidia has announced, but has not yet released, its Omniverse Avatar Cloud Engine (ACE), which looks like it could facilitate the creation of generated content. If anyone from Nvidia is reading – I’d love early access.

The Guardian reported earlier this week that Kuwait News has introduced a generated news-presenting avatar. I could envision services like LinkedIn experimenting with using an avatar to present our personal news feed as generated video. It remains to be seen if this new generation of avatars will see greater success than earlier attempts like Microsoft Office’s Clippy!

Tools Used

  • OpenAI (script generation)
  • ElevenLabs (text-to-speech)
  • Nvidia Audio2Face
  • Metahuman in Unreal Engine

How to

I followed these tutorials to create the video:


Introducing Eliza NG, my generated content sandbox and blogging bot

In January, I created a blogging bot as a sandbox for playing with generated content and tooling. The bot identifies a popular post on an internet forum, collects some of the comments, submits them to ChatGPT to write an article, and then posts it to the web, Fediverse, and Twitter. Read about it here: https://articles.hotelexistence.ca/posts/bloggingbot/, and then check out the bot at https://www.eliza-ng.me/. Source code: https://github.com/raudette/bloggingbot. It reached an interesting milestone this week – one of its posts was organically starred on Mastodon.

Who watches the certificate authorities?

In November, I was following a news story the Washington Post broke about Trustcor, a Certificate Authority with root certificates pre-loaded in all popular browsers. A few academics identified some problematic relationships and facts about Trustcor that put into question the process used by Chrome, Firefox, and Safari to determine which companies can issue the certificates that our browsers use to indicate a connection is secure. From the article:

The website lists a contact phone number in Panama, which has been disconnected, and one in Toronto, where a message had not been returned after more than a week. The email contact form on the site doesn’t work. The physical address in Toronto given in its auditor’s report, 371 Front St. West, houses a UPS Store mail drop.

Would this company pass your audit? I was familiar with the role played by root certificates in browsers, and understood there was an assessment process for inclusion completed by the teams that build browsers. I get that a process can’t catch everything, but I was surprised at how flagrant this evidence of untrustworthy behaviour was.

I was pleased to see in the Post’s follow up story that Mozilla and Microsoft are removing Trustcor’s root cert from their browsers. This follow-up article also presents the possibility that the root cert was leveraged by Packet Forensics for man-in-the-middle interception of encrypted traffic:

Packet Forensics first drew attention from privacy advocates a dozen years ago.

In 2010, researcher Chris Soghoian attended an invitation-only industry conference nicknamed the Wiretapper’s Ball and obtained a Packet Forensics brochure aimed at law enforcement and intelligence agency customers.

The brochure was for a piece of hardware to help buyers read web traffic that parties thought was secure. But it wasn’t.

“IP communication dictates the need to examine encrypted traffic at will,” the brochure read, according to a report in Wired. “Your investigative staff will collect its best evidence while users are lulled into a false sense of security afforded by web, email or VOIP encryption,” the brochure added.

Researchers thought at the time that the most likely way the box was being used was with a certificate issued by an authority for money or under a court order that would guarantee the authenticity of an impostor communications site.

They did not conclude that an entire certificate authority itself might be compromised.

Can you imagine the potential for man-in-the-middle abuse by ISPs and VPN service providers if they had root certificates in our browsers (examples of untrustworthy behaviour by VPN providers)? Given the challenges with this model, it’s interesting we haven’t seen more evidence of widespread abuse:

  • Was this situation isolated and unique?
  • Why doesn’t every malicious actor get a root certificate?
  • Are there limited incentives to intercepting traffic?
  • Perhaps regular law/ethics are sufficient to deter intercepting traffic?
  • Is this happening everywhere all the time, and there’s limited impact?
  • Perhaps most traffic isn’t of value, so traffic interception is limited and targeted?

I recommend reading Delegating trust is really, really, really hard, a reflection on the matter by Cory Doctorow. Hopefully, the Trustcor story brings greater attention to the issue, and more attention will be paid by development teams & users to the root certificates pre-loaded in their browsers.

Hours of Fun Creating Visual Art with Prompts

“koala bear eating eggs Benedict in a bistro” is the prompt I entered into OpenAI’s DALL·E system to generate this image. I have been reading articles about AI image generation since DALL·E 2 launched earlier this year, and have been experimenting with it hands-on since I received access earlier this week. It is lots of fun, but doesn’t always generate the results you might expect. I’ve been trying to describe to it artist Henri Julien’s La Chasse-galerie, a drawing of 8 voyageurs flying in a canoe at night, and DALL·E struggles with the flying canoe. For each prompt, DALL·E initially creates 4 image variations – I’ve selected the most interesting one for each below.

La Chasse-galerie by Henri Julien
4 men in a flying canoe at night, generated by DALL·E
4 men paddling a flying canoe through the sky at night, generated by DALL·E
4 men in a canoe in the sky at night, generated by DALL·E
lumberjacks flying in a canoe past the moon, by DALL·E. I like how they are chopping the tree while flying the canoe.

Generating Images on your PC

There are models you can run on your home PC. I’ve checked out min-dalle (used by https://www.craiyon.com/) and stable-diffusion (demo). They can both create imagery that roughly matches my prompts, which is amazing. I found that the output from min-dalle was a bit crude, with distorted features. The output I’ve generated from stable-diffusion is much better.
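
If you want to try Stable Diffusion locally, here’s a minimal sketch using the Hugging Face diffusers library. This isn’t necessarily the setup I used (the model name and the CUDA GPU are my assumptions), but it shows how little code is involved:

# Minimal local Stable Diffusion sketch (assumes the diffusers library,
# the runwayml/stable-diffusion-v1-5 weights, and a CUDA-capable GPU)
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

image = pipe("a painting of a panda climbing a skyscraper").images[0]
image.save("panda.png")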

For me, quality really makes a difference in how much I get out of the tool. Generating low-quality images isn’t as much fun. Perhaps this is a reflection of my artistic skills – I can make a crappy drawing of a panda climbing with a ballpoint pen while watching a PowerPoint in a random meeting. But I personally don’t have the skill to render my ideas as well as DALL·E – maybe that’s what makes it so fascinating.

A painting of a panda climbing a skyscraper, generated by DALL·E mini
A painting of a panda climbing a skyscraper, generated by stable-diffusion
A painting of a panda climbing a skyscraper, generated by DALL·E

Not all fun and games

As much fun as this is, there is a lot of controversy about the implications of making tools like this accessible.

NBC News actually has a pretty good article about the biases exhibited by DALL·E. For example, ‘a flight attendant’ only produces images of women. In this respect, it is very much like other tools currently in the marketplace – I was curious, and searched Getty Images for ‘flight attendant’ stock photos, and found only women in the top results. Image generation tools continue to propagate the bias problems we see everywhere.

An Atlantic columnist raises some other interesting points about the backlash he faced when he used generated images in his newsletter, as opposed to professional illustration that you might see in a magazine feature. Here are some further thoughts on the ideas he presented:

  • Using a computer program to illustrate stories takes away work that would go to a paid artist: “AI art does seem like a thing that will devalue art in the long run.” I almost wonder here if the computer program just becomes another tool in the artist’s arsenal. Is the value of an illustrator in the mechanics of creating an image, or in visually conveying an idea? What if we think of a program like DALL-E as a creative tool, like Photoshop? Did Photoshop devalue photography?
  • There is no compensation or disclosure to the artists who created the imagery used to train these art tools – “DALL-E is trained on the creative work of countless artists, and so there’s a legitimate argument to be made that it is essentially laundering human creativity in some way for commercial product.” – At primary school age, my children brought home art created using pointillism techniques, without compensating or disclosing inspiration from Georges Seurat and Paul Signac. Popular image editing packages have had Van Gogh filters for years. We all learn and build on the work of those who preceded us. Once a style or idea is presented to the world, who owns it? Is the use case of training an algorithm a separate right, different from training an artist?

There are also challenges with image generation tools facilitating the creation of offensive imagery and disinformation, making it easier to cause harm than it is today with existing tools like Photoshop.

These tools will continue to progress, and will create change in ways, and on a scale, that are hard to predict. It remains to be seen if we collectively decide to put new controls in place to address these challenges. In the meantime, I’ll be generating imagery of pandas and koalas in urban environments.

CRTC publishes Rogers’ Response

I have been in IT “war room” type situations a number of times, working to get service to a production system restored. It was with professional interest that I followed the Rogers outage on July 8th, 2022.

For anyone looking for more information than what was covered in the media, Rogers’ response to the CRTC’s questions about the incident was published and can be downloaded from the CRTC (the .DOCX link on the July 22, 2022 post).

Rogers’ response has been redacted, and is light on specifics (eg: no information about their network). However, there were some interesting details, such as how Rogers issues alternate mobile SIMs from competing mobile carriers to some of its technical teams to maintain contact in the event of an outage like this one.

Photo: Diefenbunker Situation Room, CC BY-SA 3.0, by Wikipedia User Z22

Exploring Bluetooth Trackers at GeekWeek 7.5

I recently participated in GeekWeek 7.5, the Canadian Centre for Cyber Security’s (CCCS) annual cybersecurity workshop. I was assigned to a team of peers in banking, telecom, government and academia. We were to work together on analyzing how Bluetooth item trackers (eg: Apple AirTags, Tile) can be covertly used for malicious purposes, and developing processes and tools to detect them.

It was my first time attending the event, and I wasn’t sure what to expect. Here’s how it worked. The CCCS builds teams from the pool of GeekWeek applicants, based on interests and skills identified by the applicant in the application process. Leading up to the event, CCCS appoints a team lead, who defines goals for the team.

A map with a drive to Vaughan that was tracked at multiple points.
Testing a homemade AirTag clone

I was appointed as a team co-lead and assigned a team. My co-lead and I were to define a challenge in the IoT space. Inspired by recent headlines concerning stalking facilitated by AirTags, my co-lead suggested analyzing how item trackers can be covertly used for malicious purposes, and developing processes and tools to detect them.

The team worked in 5 streams:

Collecting Data

An existing baseline data collection tool was modified to log Bluetooth LE data for further analysis.
https://github.com/raudette/geekweek-7.5_1.3_loggingble
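
Our modified tool is in the repo above. As an illustration of what “logging Bluetooth LE data” means here, this is a minimal sketch (not the team’s tool) using the Python bleak library to record the address, signal strength, and Apple manufacturer payload of every advertisement heard:

# Minimal BLE advertisement logger (illustrative only; uses the bleak library)
import asyncio
import csv
import time

from bleak import BleakScanner

APPLE_COMPANY_ID = 0x004C  # Apple's Bluetooth SIG company identifier

async def main():
    with open("ble_log.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "address", "rssi", "apple_payload_hex"])

        def on_advertisement(device, adv):
            # adv.rssi needs a recent bleak release; older versions expose it on the device
            payload = adv.manufacturer_data.get(APPLE_COMPANY_ID)
            writer.writerow([
                time.time(),
                device.address,
                adv.rssi,
                payload.hex() if payload else "",
            ])

        async with BleakScanner(on_advertisement):
            await asyncio.sleep(60)  # log for a minute

asyncio.run(main())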

Detect Stealth AirTag Clones

“Can you find my stealth AirTag clone in this data?” we asked the data scientist on our team.
🍰 <piece of cake>, he responded.

As various news outlets reported on the malicious use of these trackers, manufacturers implemented anti-stalking measures, which were quickly defeated by research efforts such as find-you. The find-you tracker avoids detection by rotating through identifiers – to an iPhone, it looks no different from walking past other people carrying iPhones.

The team built find-you stealth trackers which rotated through 4 keys and went to work. We wanted to see if we could develop a technique for detecting and identifying the trackers. Data was collected by stalking ourselves, walking through busy urban areas with find-you trackers, and logging all Apple Find My Bluetooth advertisements.

Our hypothesis was that we could identify the find-you tracker based on signal strength patterns. We believed that the signal strength of the find-you tracker we were wearing would be consistent over time, as its location relative to our data logger wouldn’t change, and other Find My devices would vary, as we walked past other pedestrians carrying devices.

A team member assessed the data and built a model for identifying the keys based on signal strength.
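
The sketch below isn’t that model, just an illustration of the idea: assuming a log of (timestamp, key, rssi) records like the ones we collected, a tracker that travels with the subject shows up in nearly every time window with a tight RSSI spread, while devices carried by passers-by come and go. The thresholds here are made up:

# Flag Find My keys that are "always there" with a stable signal strength.
# Thresholds are illustrative; the team member's real model was built from the data.
import statistics
from collections import defaultdict

def flag_suspect_keys(records, window=30, presence_ratio=0.8, max_stdev=6.0):
    """records: iterable of (timestamp, key, rssi) tuples."""
    records = sorted(records)
    if not records:
        return []
    start, end = records[0][0], records[-1][0]
    total_windows = max(1, int((end - start) // window) + 1)

    windows_seen = defaultdict(set)   # key -> set of window indexes it appeared in
    rssi_values = defaultdict(list)   # key -> all observed RSSI values

    for ts, key, rssi in records:
        windows_seen[key].add(int((ts - start) // window))
        rssi_values[key].append(rssi)

    suspects = []
    for key, seen in windows_seen.items():
        ratio = len(seen) / total_windows          # how often the key is present
        spread = statistics.pstdev(rssi_values[key])  # how stable its signal is
        if ratio >= presence_ratio and spread <= max_stdev:
            suspects.append((key, ratio, spread))
    return suspects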

Stealth Tag Easily Identified

The stealth tag, with its four keys, could easily be picked out in the data.

We experimented with randomizing the transmit power and time between key rotations of the stealth tracker.

Stealth Tag with Varying Transmit Power

Even at its lowest transmit power, the stealth tag’s four keys could still easily be picked out in the data.

Enhancing Stealth AirTag Clones

We created a “SneakyAirTag”, which tried to increase the stealthiness of Positive Security’s find-you tag (https://github.com/positive-security/find-you), which itself was built on the excellent work of https://github.com/seemoo-lab/openhaystack.

It tries to improve the stealthiness as follows:

  • Rotates keys on a random interval, between 30 and 60 seconds (Find-You rotates every 30 seconds)
  • Transmits at random power levels, in an attempt to avoid detection based on a consistent signal strength.

In our testing, even with these changes, we were still able to identify stealth tags in an area with other Apple Find My devices. Even at the ESP32’s lowest power level, the stealth tracker can be identified, as its signal strength is higher than that of other devices in the vicinity of the tracked subject.

Further areas for exploration would be reducing signal strength with shielding material or some other means, and adding an accelerometer such that the device only transmitted Bluetooth advertisements when the target subject is in motion.

Building a Clone Tile Tracker

The team was inspired by Seemoo Lab’s work on the AirTag. Could we do similar things with the Tile tracker?

And we figured out some things, but got stumped. Tile uses the Bluetooth MAC address to track a Tile. We got our clone Tile to the point where we could take the MAC address of a Tile registered to our account, load our firmware with that MAC onto an ESP32, and walk around with our Tile app, and it would track it.

But it seemed that if we sent our Tile’s MAC address to someone else to load onto their ESP32 and track with their Tile app, it wouldn’t report the location. And although Tile reports thousands of users in my area, even a genuine Tile didn’t seem to get picked up by other Tile users walking through office food courts, malls, or transit hubs. As a result, at the end of two weeks, many questions remained unanswered.

We also experimented with altering transmit power and rotating through MAC addresses as a means of avoiding detection. Our work can be found here:
https://github.com/raudette/geekweek-7.5_1.3_tileclone

Blueprints

One of our team members had access to a few HackRF One Software Defined Radios (SDRs), and set about learning how to use them. They wanted to duplicate the results of a paper that demonstrated that, with an SDR, one can identify individual Bluetooth devices based on differences in how they transmit that result from minute manufacturing imperfections. The team member called their Bluetooth device fingerprint a Blueprint, and documented their work here: https://github.com/raudette/geekweek-7.5_1.3_blueprint

Overall Experience

The experience of stepping away from the day-to-day, learning about different technologies and building things was very similar to my previous experiences participating in corporate hackathons.

What made GeekWeek exciting for me was leading a team of people from different companies, with different professional backgrounds. At a corporate hackathon, everyone already knows their teammates, what they’re capable of, they share the same corporate culture, they typically work on the same software stack, and have the same collaboration tools pre-loaded on their laptops. At GeekWeek, it was interesting just breaking down the problem, and finding out who had what tools, what experience, and was best suited to take on a task. Also, it was interesting to hear about everyone’s professional work – some were really pushing boundaries in their fields.

I hope to participate again in the future!

Virtual Hackintosh

Ever since I first read about Hackintoshes, I’ve thought about building one. A friend of mine edits all of his video on a purpose-built Hackintosh. I never did build one – for myself, I like to run Linux, I don’t really need a Mac for anything, and I find that off-lease corporate-grade laptops are the best value in computing. But every once in a while, I have something I want to build on my iPhone, and a Mac is like a dongle that makes it possible.

Simple things, like finding out why the site I’ve built with the HTML5 geolocation API will work on my PC, but not on my iPhone, are just not possible without a Mac. To solve these types of problems in the past, I’ve just borrowed a Mac from the office.

Recently, another project came up, and I decided to try to build a virtual Hackintosh with OSX-KVM – a package that simplifies running OSX in a virtual machine. Years ago, I tried running OSX in VMware Player – this would have been on an Intel Core 2 system at the time – and in my opinion it was interesting, but unusable. My experience with OSX-KVM has been much better.

Just following the instructions, within a couple of hours I had OSX Big Sur running in a VM. OSX-KVM does almost everything, including downloading the OS from Apple. My PC is pretty basic – a Ryzen 3400G, SSD, 16 GB of RAM. I assigned the VM a couple of CPU cores and 12 GB of RAM, and it’s usable.

In a few hours of dabbling around, I’ve come across four issues:

  1. First, I couldn’t “pass through” my iPhone from the host PC to the guest OSX OS. The best overview of this issue I could find is logged on Github. There are configuration related workarounds, but I decided the easiest way to solve it was to acquire a PCIe USB controller and allocate it to the VM. PCIe cards with the FL1100 and AS3142 chipsets are said to work with OSX – I ended up buying this AS3142 card, as it was the one I could get my hands on quickest, and it seems to work fine – I can see my iPhone in OSX now, and run the dev tools for Safari iOS.
  2. Second, I can’t log into Apple ID, and as a result, I can’t compile iOS apps in Xcode. It looks like this is solvable.
  3. Chrome doesn’t display properly. I wonder if this is related to graphics acceleration. I don’t need Chrome to work, but it’s a reminder that this exercise is really just a big hack – I don’t think I would count on this to do any real work.
  4. Finally, I seem to lose keyboard/mouse control of the VM if it goes to sleep, and I can’t seem to get it back. I’m sure this is solvable, but I addressed it by turning sleep mode off.

Presumably, Hackintoshes will eventually stop working as Apple moves its platform to ARM processors, but for now, it’s definitely worth trying out.

Update (2022/03/03):
The Apple ID issue was addressed by following the previously linked instructions, and the Chrome issue was resolved by turning off GPU acceleration (--disable-gpu).
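
For reference, one way to pass that flag from the macOS terminal (assuming a standard Chrome install, and that Chrome isn’t already running) is:

open -a "Google Chrome" --args --disable-gpu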

Update (2022/03/26): I needed Bluetooth. I bought an Asus BT-400 adapter, and it seems to work fine. AirDrop functionality doesn’t seem to work – I haven’t investigated. And – not a Hackintosh issue, but a Mac issue – the CH9102F serial chip on one of my ESP32 microcontroller boards isn’t natively supported by Big Sur, so I found a third-party driver. Apple Maps doesn’t work, as it requires a GPU, and the OpenHaystack project that I’m playing with requires Maps.
