Category Archives: work

CRTC publishes Rogers’ Response

I have been in IT “war room” type situations a number of times, working to get service to a production system restored. It was with professional interest that I followed the Rogers outage on July 8th, 2022.

For anyone looking for more information than what was covered in the media, Rogers’ response to the CRTC’s questions about the incident was published and can be downloaded from the CRTC (the .DOCX link on the July 22, 2022 post).

Rogers’ response has been redacted, and is light on specifics (e.g., no information about their network). However, there were some interesting details, such as how Rogers issues alternate mobile SIMs from competing mobile carriers to some of its technical teams to maintain contact in the event of an outage like this one.

Photo: Diefenbunker Situation Room, CC BY-SA 3.0, by Wikipedia User Z22

The Pigeon Tunnel

My manager has been on secondment to another team for the past 8 months. He stepped into a recent team meeting, where we were re-visiting challenges with our release process, and re-starting an initiative that had been displaced by other priorities.

As he stepped out, he joked (I paraphrase): “Good to see nothing has changed while I’ve been away”

In the next couple of months, my manager will return to this team, back to where he started, back to re-visit familiar challenges.

This reminded me of the following introduction to John le Carré’s The Pigeon Tunnel, a collection of stories about the author’s time in MI6 and about his unreliable father:
“There is scarcely a book of mine that didn’t have The Pigeon Tunnel at some time or another as its working title. Its origin is easily explained. I was in my mid-teens when my father decided to take me on one of his gambling sprees to Monte Carlo. Close by the old casino stood the sporting club, and at its base lay a stretch of lawn and a shooting range looking out to sea. Under the lawn ran small, parallel tunnels that emerged in a row at the sea’s edge. Into them were inserted live pigeons that had been hatched and trapped on the casino roof. Their job was to flutter their way along the pitch-dark tunnel until they emerged in the Mediterranean sky as targets for well-lunched sporting gentlemen who were standing or lying in wait with their shotguns. Pigeons who were missed or merely winged then did what pigeons do. They returned to the place of their birth on the casino roof, where the same traps awaited them.

Quite why this image has haunted me for so long is something the reader is perhaps better able to judge than I am.”

There seem to be challenges that don’t progress in 8 months. Is this exercise futile, like the Monte Carlo pigeons? Are these tough problems we avoid, or constraints we work in?

Or, perhaps it’s why we’re here – these things are tough, and that’s why we’ve got a very skilled team consistently delivering within this environment. In this environment, over the past 8 months, we’ve completely refreshed our application’s UI, moved to the Angular framework, re-branded, and delivered it to > 100,000 customers. Our release process may not be as efficient as we’d like, there are problems we had a year ago that we haven’t resolved, and we may defer initiatives, but it’s all about playing the best hand possible with the cards we have been dealt: we are not standing still.

Ask not if our product uses Apache Struts, but…

When it was revealed that the massive Equifax breach in 2017 was attributed to their failure to patch a component in their system known as ‘Apache Struts’, everyone was reaching out to their development teams and asking: “Do we use Apache Struts? Is it patched?”

And I found it interesting. In my opinion, the wrong question was being asked.

What they should be asking us (and what we should be doing) is:

  • Do we know what libraries our application is using?
  • Do we have a process for checking if security vulnerabilities have been disclosed in the libraries we use?
  • Are all the libraries we are using currently supported?
  • Are we using current, patched versions?
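
These questions can be turned into a routine check. Here is a minimal sketch in Python, assuming a hand-maintained inventory of libraries and a list of disclosed advisories. The library names, the versions, and the `Advisory`, `parse_version` and `audit` helpers are all hypothetical, made up for illustration and not tied to any real vulnerability feed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    """A disclosed vulnerability (hypothetical data, for illustration only)."""
    library: str
    fixed_in: str  # first version that contains the fix


def parse_version(text):
    """Turn '2.3.32' into (2, 3, 32) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def audit(inventory, advisories, supported):
    """Return (library, finding) pairs for anything that needs attention.

    inventory  -- dict of library name -> version in use
    advisories -- list of Advisory records
    supported  -- set of library names still maintained upstream
    """
    findings = []
    for name, version in sorted(inventory.items()):
        if name not in supported:
            findings.append((name, "no longer supported"))
        for advisory in advisories:
            if (advisory.library == name
                    and parse_version(version) < parse_version(advisory.fixed_in)):
                findings.append((name, f"upgrade to {advisory.fixed_in} or later"))
    return findings


# Hypothetical inventory: these names and versions are made up.
inventory = {"webframework": "2.3.5", "jsonparser": "1.9.0", "oldlogger": "0.4.2"}
advisories = [Advisory("webframework", "2.3.32"), Advisory("jsonparser", "1.8.1")]
supported = {"webframework", "jsonparser"}

for name, finding in audit(inventory, advisories, supported):
    print(f"{name}: {finding}")
```

Versions are compared as tuples of integers because a plain string comparison would rank "2.3.5" above "2.3.32". In practice the inventory would more likely come from a build tool's dependency report than from a hand-written dictionary.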

There has been an interesting news story recently about how a specific company was a target of a cyber-attack through a library it used. A malicious actor planted a back door in a library it was known to use – some good assessments of the incident have been posted on Ars Technica and Linux Weekly News.

Most development teams don’t have the capacity to audit the source code of all the libraries they use. Further, it would seem that in this instance, the malicious code would have passed a cursory review. At this point, our best defense is to be aware of this possibility when assessing a library, and to ensure that it has an active community and is well supported before incorporating it. Once a library has been incorporated, we should track its development for updates.

Paul Allen, the NBA’s Portland Trail Blazers, and building a team

When Paul Allen, the co-founder of Microsoft, died in October, I decided to read his 2011 memoir, Idea Man. I thoroughly enjoyed the book.

Seven years after co-founding Microsoft with childhood friend Bill Gates, Paul was diagnosed with Hodgkin’s lymphoma. He left Microsoft, already a very wealthy individual. A few years after successful treatment and recovery, he bought the Portland Trail Blazers NBA basketball franchise.

In 1994, Paul Allen hired Bob Whitsitt as the Trail Blazers’ general manager, to rebuild the team. Whitsitt focused solely on basketball skills in his hiring.

What follows is a cautionary tale for anyone building a team – skill is critical, but it’s important to consider team fit, character, balance, diversity, resiliency, empathy, etc… The following passage from the book describes one of those worst-case scenarios of a team built solely on skill:
Whitsitt proceeded to overhaul our aging roster as he’d done in Seattle, drafting young athletes with upside and adding big-name veterans.

He openly professed that he cared only about talent, to the exclusion of character and other intangibles. “I didn’t take chemistry in college,” he told the media. With enough physical ability on the floor, team cohesion would take care of itself. It was a risky assumption for a sport in which five men share one ball.

With hindsight, Whitsitt temporarily staved off decline by using my wallet to load up on pricey long-term contracts, players who were available because they were overpaid or had off-court issues or both.

When you come so close to winning a championship, as we had in the early nineties, it makes you that much hungrier because you know what the Finals taste like. It was the same for Whitsitt, who was desperate to validate his approach with a title. We were perpetually one big-salaried veteran away from contention, and our payroll ballooned. Deep down I knew that something was wrong. In the playoffs, when the pressure peaks and higher-caliber opponents target your weaknesses, a player’s makeup is revealed in performance. In the 2000 Western Conference Finals against the Lakers, we fell behind three games to one and then fought back to earn a deciding seventh game. Up fifteen points in the final quarter, it looked as though we were headed to the NBA Finals against Indiana, whom I thought we could beat. When I watch my team in the playoffs, I get superstitious; I try not to think about how much I want to win. Whatever happens, I’ll be fine with it. The players tried their best. But in that fourth quarter, I succumbed. I couldn’t deny it. I really wanted to beat the Lakers.

Within minutes, the Blazers unraveled. We missed thirteen consecutive shots. Our players suddenly looked as though they’d met for the first time that morning. The coup de grace came when Shaquille O’Neal dunked an alley-oop from Kobe Bryant with forty seconds left.

That seventh game exposed us as a team without leadership or discipline. I’ll never forget the feeling I had when we boarded our plane, still festooned with BEAT LA stickers, and headed home, our season done. It was a crushing defeat, and it took me a long, long time to get over it.

IN 2002, EIGHT years after Whitsitt’s arrival, we fell into the abyss. We led the league in payroll at $106 million, $44 million more than the championship Lakers. We were $65 million over the salary cap and $50 million over the league’s new luxury tax threshold, which had been designed to level the playing field for small-market teams like ours. Our player salaries cost us an outrageous $156 million, all for a medium-to-good fifty-win team that would lose yet again in the first round of the playoffs.

Off the court, it was worse, as the Trail Blazers became exhibit A for all that was wrong with professional sports. I found myself reeling from one lowlight to the next.

November 9, 2002: Bonzi Wells is suspended for spitting on the Spurs Danny Ferry.

November 22: Co-captains Damon Stoudamire and Rasheed Wallace, on their way home from a game in Seattle, are pulled over and cited for possession of marijuana. To settle the case, both agree to attend drug counseling sessions.

November 25: Ruben Patterson is arrested for felony domestic abuse. His wife later asks prosecutors not to pursue charges.

January 15, 2003: Rasheed is suspended for threatening a referee.

April 3: Zach Randolph is suspended after sucker punching Ruben in the face during practice and fracturing his eye socket.

The fans who felt so close to the Drexler-Kersey-Porter Blazers were disenchanted. Our attendance suffered, and our TV ratings fell by half. The wayward players showed little remorse. Bonzi Wells told Sports Illustrated: We’re not really going to worry about what the hell [the fans] think about us. You could see why parents weren’t rushing out to buy Bonzi or Rasheed jerseys for their kids.

One day I said to Whitsitt, “What’s it like in the locker room? How is the team reacting to the latest incident?”

And he said, “Well, Paul, half our guys are normal and half our guys are crazy. The good guys are all freaked out, but the crazy guys are crazy, so they’re fine.”

I’d heard enough. A team might be able to absorb one erratic personality, but who could win with a group that was half crazy? Three days after our season ended, I fired Whitsitt and gave his successor, Steve Patterson, a mandate to clean house. We traded established starters like Rasheed and Bonzi for forty cents on the dollar while letting bad contracts expire. The win-now regime had stunted younger talents like Jermaine O’Neal (who blossomed into a six-time all-star after being moved to Indiana), and our cupboard was bare. In 2004, the Blazers missed the playoffs for the first time in twenty-one years.

And then we sank even lower. An internal investigator came to me with a report on Qyntel Woods: “We think there may be dogfighting at Qyntel’s house.”

Dogfighting? I couldn’t believe what I was hearing.

A few days later: “We think there may be some dogs buried in his yard.”

Buried in his yard?

And a day or two after that: “There’s a room in his house where we hear the walls are covered with blood.”

Blood on the walls?

I was shocked and mortified. Qyntel eventually pleaded guilty to animal abuse and got eighty hours of community service. We suspended and then released him three months later.

The next year we touched bottom. With a record of 21-61, the Trail Blazers were indisputably the worst team in the league. Though things were quieter off the court, I had a new challenge: how to pay for my team’s home court.

As we discovered too late, the financial formula was fatally flawed. Add a local downturn and an unpopular losing team, and we had a perfect storm of red ink and disaffection. The Blazers were getting booed at home, once unthinkable in Portland. Our season ticket holders were canceling in waves amid calls for a boycott, despite our explicit efforts to rebuild and start over. All told, I’d invested more than half a billion dollars in the franchise, at a huge net loss. Something had to give.

Reflections on Workplace Hackathons

I recently participated in a workplace hackathon.

Here are some reflections on the experience:

  • It’s hard to completely step away from day-to-day work for 2 days. Release schedules set long ago don’t change, we choose to keep customer-facing meetings, and incidents still need to get addressed
  • Conversely, much can be deferred (many emails can wait!), and a lot can still be accomplished in two part days
  • Two days of development and a 7 minute presentation are great constraints which force many decisions

It’s amazing how much you can get done when:

  • We start “fresh”, no legacy code, no figuring out what a predecessor was trying to do or why something was built a given way
  • We pick our preferred tools and infrastructure

There is such a big gap between a Proof of Concept and working production code:

  • We didn’t worry about production readiness (performance, scalability, stability, security)
  • We focused on prototyping ideas, as opposed to working on functional integration of all systems
  • We only focussed on the main flow; we didn’t worry about handling less-used flows or exception handling

Even without regular constraints, some things are still hard:

  • Firewalls make it challenging to build a project which uses both internal systems (even pre-prod environments) and external APIs
  • Data projects work best with production data, which is rarely possible. It would be really cool to have legal and security teams participate to make quick decisions on what we can and can’t do with production data for a demo.

And finally:

  • For someone who’s not a developer like myself – this is a fun opportunity to write some code – it’s fun to build
  • It’s also fun to play with new tools – last year, we played around with the Amazon Alexa API – when else would we set aside time to do this?
  • It’s a great opportunity to present to an engaged audience
  • The event generates such a positive atmosphere
  • The competition and feedback are immediate, which is awesome
  • A working prototype that can be demonstrated in minutes is critical.

Why do simple things take so long?

“I was in Boston recently and visited Old Ironsides at its berth, coincidentally at a time when the ship was being painted. I chatted with one of the supervisors and asked him about the length of the government specifications for this particular job. He said it numbered two hundred pages and laughed in embarrassment when I told him to take a look at the glass display case showing the original specification to build the ship in 1776, which was all of three pages.” – Ben Rich, from Skunk Works: A Personal Memoir of My Years at Lockheed

Why does it take a 200 page specification to re-paint a ship which was built from a 3 page specification? How many kinds of paint were there in 1776? How many colours were available? Presumably, when the ship was built, most of the requirements were implicit, just understood and assumed by the builder.

This is partially addressed by one of the values of the Agile Manifesto, which emphasizes working software over comprehensive documentation. This can be achieved by having small, self-organizing teams that understand the domain well enough that they can identify the next task and have what they need to take action. They know the ship has to be painted, they know what paint is required, and they know how and when to apply it.

This can be hard to scale. Our development team knows our web stack, and our business domain, really well. We have great turnaround times on turning ideas into functional code delivered to production. But if we need something beyond code – let’s say, we’ve got to introduce a new brand, which requires a new domain – we’re not in a position to assess the interaction between our DDoS-protection vendor, network routing, load balancer, and server setup, so we have to engage external teams.

Similarly, I’ve encountered these issues when working on projects with Canada’s “Big Five” banks. I’ve seen projects held up for months by small things, such as making small changes to a file transfer – something that ultimately takes the right person a few minutes to implement – once the right person is found, allocated over competing priorities, and changes documented. What is operationally very efficient for the bank is less nimble for changes.

What has changed since the days of a 3 page warship specification? How do we efficiently use 30 minutes a year of an SFTP administrator’s time, when that administrator has no context on the broader project? How do we get that 30 minutes allocated at the right time? When using niche skillsets across a project, how do we avoid a 200 page spec? Who has the skill to build a 200 page spec across diverse skill sets when we need something new? How long does that take?

Why do simple things take so long?

Malcolm Gladwell and Adam Grant on Teams

One of the benefits of commuting across the top of Toronto on North America’s busiest highway is I have lots of time to listen to my favorite podcasts.

One series I follow is called Revisionist History, a podcast hosted by Malcolm Gladwell, an author known for writing about research in social sciences, often presented alongside observations and stories. This season has started with a discussion between Malcolm Gladwell and Adam Grant, where, among a number of topics, they talked about the impact of teams on individual contributions and outcomes.

I have not fact-checked anything they have stated, and I don’t know enough about basketball, hospitals, or flying airliners to comment, but some of the ideas are interesting to think about.

Here are a few interesting excerpts from the transcript:

On the impact of moving from one team to another:
“One of the best guards in the game this season was this kind of Victor Oladipo on Oklahoma City, and was considered a disaster, and he simply moves teams to a new environment with a presumably better coach, he’s no longer playing with Russell Westbrook who’s probably a very difficult person to play with, and simply by moving teams he went from being someone who was widely considered to be a bust, someone would be washing out of the league soon or a mediocre player, into this suddenly a superstar who’s kind of playing transcendently”

“The best coach in the league is probably Brad Stevens of the Boston Celtics … every time a very promising player is traded from the Boston Celtics they turn out to be terrible … Jae Crowder is a good example. Everyone’s like… ‘oh god they traded Jae Crowder wow i don’t know if they can survive without Jae Crowder’. Jae Crowder goes to Cleveland Cavaliers, …people realize oh Jae Crowder is actually not any good he just was good on Boston”

“You then begin to wonder, how many players on basketball teams who we consider mediocre are actually really good but just in the wrong environment? Is Victor Oladipo is he an exception or is he part of a larger trend? And I am increasingly of the opinion that there must be lots of Victor Oladipo’s out there, I think there are, and I think they’re not just in basketball”

On the effect of moving from one team to another:
“…is I have a different team who knows my strengths and weaknesses, and we’ve developed a set of effective routines, and that kind of suggests that performance and skill and expertise is team specific, it’s contact specific”
This I’ve experienced in recreational sports – a less athletically capable, but experienced team, can often defeat a more athletic team, just by knowing where team mates will be, how fast they can run, how far they can pass, etc…

“…over seventy five percent of airline accidents happened the first time the crews flying together, and the evidence goes so far on this that NASA did a simulation showing that if you had a crew that was well rested flying together for the first time they made more errors than a sleep deprived that just pulled an all nighter but flown together before”

On building teams:
“…you hire individuals, reward individuals, promote individuals, … What if … you hired entire teams but you didn’t just do that, you promoted teams, rewarded teams”

Experimenting At Work

A co-worker recently came back from training, and shared some of the techniques that were presented.

One that sounded interesting was, in planning, to try swapping roles to elicit different ideas. For example, have a developer act as product owner, and speak to priorities, and have a product owner speak to effort and commitment. Role playing is more common when identifying personas to help define user stories, but I hadn’t heard of anyone doing this within a scrum team – we typically go into a planning session as our assigned roles. This is also similar to the ideas presented by Edward de Bono in Six Thinking Hats, where he suggests you try to place yourself in a specific mode of thinking (e.g., emotional, creative) to approach problems from a different perspective.

And this idea of role playing got me thinking of a scenario playing technique that I’d heard about on the Freakonomics podcast, where a psychologist proposes conducting a “pre-mortem” before starting a project. Post-mortems are pretty common – particularly when something “bad” happened (which is interesting on its own – as if there were only lessons to be learned when something bad happens). In a pre-mortem, before the project starts, the team meets, and pretends that the effort has been a failure, and spends 5 minutes writing down all the reasons they can imagine why the project failed. The idea is there is now “prospective hindsight”, to adjust for over-confidence at the beginning of a project, and provide input into the planning process. It’s a different way of collecting input for a risk register.

Along with other ideas circulated on LinkedIn news feeds and blogs, does anyone ever try to shake things up and try different techniques at work? Ever schedule an outdoor walking meeting? Should we wait for the “best of” processes to get collected and codified, and then formally adopt them (PMP, Scrum, etc…)? Have you introduced anything to your teams after reading about it or talking with peers? Did it stick?

Antifragile – Hidden Benefit of Chaotic Systems

Although not related to IT, there are ideas worth considering as we think about our systems in the following book:
Antifragile: Things That Gain from Disorder by Nassim Nicholas Taleb

“Just as human bones get stronger when subjected to stress and tension, and rumors or riots intensify when someone tries to repress them, many things in life benefit from stress, disorder, volatility, and turmoil. What Taleb has identified and calls “antifragile” is that category of things that not only gain from chaos but need it in order to survive and flourish.”

The idea is that a bunch of un-aligned, disorganized systems suffer from a bunch of small, recoverable failures which make the whole more resilient. Whereas large, organized, homogeneous systems may suffer from fewer small failures, they are susceptible to larger failures which can lead to catastrophe.

I have seen some evidence of these patterns at work. I worked for years on a product which required an Intel server running Red Hat acting as a proxy, an Intel server running Windows handling connectivity with other systems, a Sun server running the SunONE application server, and a Sun server running Oracle (all before Oracle bought Sun!). When I worked in support, a “Severity 1” server down alert might mean an issue with the application server, and a single client would be out of service.

In 2012, significant upgrades were made to our infrastructure. All of those Intel servers for all of our lenders were consolidated onto VMware clusters. All of our Sun servers were consolidated onto larger Sun servers. Significant savings were realized in infrastructure expenses, and systems became easier to manage. The number of incidents decreased.

But as we consolidated our infrastructure, an outage now had much greater scale. A “Severity 1” server down alert now meant that multiple customers were out of service. As we consolidated our servers, we also consolidated our incidents. A Sev 1 became bigger and more complex. If we were using the number of Sev 1 incidents as a performance metric, were we counting the same thing?
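
One way to see why the raw count can mislead is to weight each incident by the number of customers it took out of service. Here is a small sketch in Python, using entirely made-up incident data and a hypothetical `customer_outages` helper; none of the numbers come from our actual incident history.

```python
def customer_outages(incidents):
    """Sum the customers out of service across a list of Sev 1 incidents."""
    return sum(incidents)


# Hypothetical data: each number is the customers affected by one Sev 1.
before_consolidation = [1, 1, 1, 1, 1, 1]  # six incidents, one customer each
after_consolidation = [8, 8]               # two incidents, eight customers each

for label, incidents in [("before", before_consolidation),
                         ("after", after_consolidation)]:
    print(label,
          "incidents:", len(incidents),
          "customer outages:", customer_outages(incidents))
```

With these invented numbers, the incident count falls from six to two while the number of customer outages rises from six to sixteen, so the two metrics point in opposite directions.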

As we look to the cloud, the potential scale is even bigger.

What happens when all applications are hosted by Amazon AWS, Microsoft Azure and Google Cloud? When every server runs Linux on Intel?

Given the choice, I don’t think anyone wants to manage an impossible patchwork of thousands of systems that no one understands, unsupported by vendors, with different versions of everything. However, the dangers of homogeneous systems should be considered as we design and assess our systems – there can be strength in disorder!

Hiring for Potential and Building The Amiga Team

I spent a good portion of my childhood in front of a Commodore Amiga 500, an amazing home computer for the late 1980s. I purchased mine used, after having saved months of hard-earned income delivering newspapers.

When author Brian Bagnall created a Kickstarter campaign in 2015 to fund Commodore: The Amiga Years, a book about the history of the Amiga, I backed it. As Kickstarter projects go, 2 years later, I received it (now you can buy it on Amazon).

The Amiga was a really neat computer with great capabilities for its price point, much of it enabled by a number of custom chips. The design of these chips was led by Jay Miner, a former Atari engineer. I was surprised to learn that for one of the chips, Jay Miner hired Glenn Keller, an oceanographic engineer visiting California looking for work in submarine design, with no prior experience in chip design.

From The Amiga Years:
The engineer who would end up designing the detailed logic in Portia seemed like an unlikely candidate to design a disk controller and audio engine, considering he had no prior experience with either and didn’t even use computers.

In 1971, MIT accepted his application and he embarked on a masters in ocean engineering, graduating in 1976. As an oceanic engineer, Keller hoped to design everything from submersible craft to exotic instruments used in ocean exploration. “I’m the guy that builds all those weird things that the oceanographers use, and ships, and stuff,” he says.
When the oil crisis hit in 1973, Western powers began looking for alternative sources of energy. One of those potential sources was the power of ocean waves. The project caught Keller’s eye while he was attending MIT, and in 1977 he moved to Scotland to work for Stephen Salter, the inventor of “Salter’s duck”, a bobbing device that converted wave energy into electrical power.
The British government created the UK Wave Energy program and in turn, the University of Edinburgh received funds for the program. This resulted in them hiring Keller to work for the university.
The experience allowed Keller to develop skills in areas of analog electronics (with the study of waves playing an important role), digital electronics, and working with large water tanks to experiment with waves. “That resulted in some actual power generated from ocean waves,” he says. “It was a lot of fun.”
In March 1982, with oil prices returning to normal, the UK government shut down the Wave Energy program and Keller returned to the United States ready to continue his career in oceanographic engineering. He soon landed in California, where much of the development of submersibles was occurring. “I was up in the North Bay looking for oceanography jobs and ocean engineering jobs,” he recalls.

Soon, Keller was boarding a train for what would become a life changing experience. When he exited the train he was greeted by Jay Miner, wearing one of his trademark Hawaiian T-shirts. “I go to Sunnyvale, I show up at the train station, and there is this guy in a Lincoln Continental with a little dog sticking out,” laughs Keller.

One doubt Keller had was his lack of experience in the computer industry, or with personal computers of any sort. This was 1983, after all, and millions of personal computers had already permeated homes across North America. “I had done programming but I didn’t understand the world of personal computers or indeed the world of Silicon Valley,” he explains. “I hadn’t been there.”
Once at Koll Oakmead Park, Miner brought him into the shared office space with the whiteboards and block diagrams. Although Miner hoped the proposed system would have a great impact on Keller, he failed to get it. “I didn’t really understand why the architecture was so great in a general sense, because I didn’t know that much about where computers were at that point,” says Keller.
Instead, he hoped his diverse electronics background would give him enough skills for the job. “I had done a lot of electronics but no chips,” he says. “But I liked Jay and I always liked pretty colored wires. I had done a lot of different kinds of electronics. Being in ocean engineering, you do everything: digital, analog, interfaces, all that stuff. Even software. You do the whole thing. So I had a pretty broad base even though I hadn’t done chip design.”
Decades later, Keller sounds mystified as to why Miner would hire an oceanographic engineer into a computer company. “He hired me for some reason,” he says, musing the reason might be because, “I guessed correctly the difference between a flip flop and a latch.”
Most likely, Miner knew all he needed was an engineer with a good understanding of both analog and digital electronics for Portia. He could bridge the gap of chip design by mentoring a junior engineer.

A great story about a successful hire based on an assessment of someone’s potential to learn and grow.

Incidentally, in my high school years, that Amiga 500 landed me my first part-time job at Dantek Computers, a small store that assembled IBM PC clones. By this time, around 1994, the Amiga was obsolete, and parent company Commodore was bankrupt. At my interview, Dan of Dantek looked at my resume, saw “Amiga”, and said in French:
“Amiga – ça c’est un signe de bon goût” (“Amiga – now that’s a sign of good taste”). I started the next Thursday at 4 PM – I worked there after school for 2 years, and saved enough to pay for a good chunk of my engineering degree.