Author Archives: greg

The 6 Signs of a Tipping Point in Software Engineering Organizations

Like most people, I haven’t spent my career with a single team or company. I recently realized that I have observed patterns across teams, organizations, and companies over the last 20 years in software, but a list of warning signs isn’t readily available.

The Tower of Pisa began to lean during construction in the 12th century, due to soft ground which could not properly support the structure’s weight. It worsened through the completion of construction in the 14th century. By 1990, the tilt had reached 5.5 degrees.

As I went through my notes of the peaks and valleys of my career, these are the things that really stood out to me.

Over the course of many months, a foundation that was once stable can start to lean after each misstep – over time and many faults, it creates a leaning tower that people see each day when they come into work.

1. Consistent Feedback that Management isn’t Listening to Feedback

From my experience, when this happens over ~2 years, it’s a sign that something is wrong.

If employees don’t trust the feedback system in place, it’s hard to create a flywheel to improve everything around them. One consistent thing that I noticed within organizations that fail is that employees don’t believe meaningful action will be taken from the surveys that they pour their feedback into. While there are several reasons for this, one key DevOps area that can be borrowed here is to make work visible.

Make action items public for the effort your leadership team is dedicating to making things better. Overlay feedback on monthly retrospectives to get continual feedback from your teams.

2. Last Minute Budget Cancellations

I’ve typically seen Q4 Travel and Education (T&E) budgets shrink to zero as the company panics to make its most important quarter successful. While the T&E budget usually doesn’t represent a significant portion of the Profit and Loss (P&L) statement, when the company issues guidance out of the blue to cancel all travel unless it is approved by a VP, things aren’t looking good. If this happens in successive quarters, or every year for 2 years, it’s a symptom of poor budgeting.

In tech companies, the largest swath of Operational Expenditures (OpEx) is with People, running infrastructure, and licences whereas Capital Expenditure (CapEx) is with buildings and infrastructure procurement. If software is only using 30% of the available hardware capacity, there’s a 70% gap in Operational Costs just sitting on the datacenter floor.

This is what reactive finance looks like

3. Significant Attrition

When a Vice President, a Senior Director, Senior Managers, and Senior Engineers leave an organization within a few months, you likely have a confluence of problems that has finally reached a tipping point.

With stocks down post-COVID, employees no longer have RSU handcuffs that allowed them to deal with constant heartache at work. This normalizes all other tech companies and allows the veil to be lifted to find greener pastures where they might be valued and treated better. Now, human kindness and emotional quotients take priority over a devalued company.

For us, we lost our senior engineer tech lead, followed by our director, then another director, then a few talented engineers – all within 6 months. This started a downward spiral because we were also blocked from backfilling some of these positions: the culture shifted, morale declined, burnout was high, and empathy went away. Fast-forward another year, and there was significant attrition at all levels. Attrition numbers usually aren’t shared in order to protect the company, so most people won’t even realize it’s bad until it’s too late.

Another late-breaking “solution” that didn’t work here is the concept of a Stay Interview – for us, this was too little, too late because RSU value dropped, the organizational track record wasn’t positive, and asking people why they would leave shows that the leadership team lacks context. Invest in people from start to finish, and not when things are in shambles.

The positive side of a forest fire is that the forest usually grows back more dense. In the case of software engineering teams, I haven’t yet seen this.

4. Doing Scrum Wrong

  1. Excessively long agile meetings with more than 20 😱 people
  2. Agile without outcomes
  3. Project Managers that only focus on getting Stories closed and not the context of the work
  4. Months without (start, stop, continue) retrospectives
  5. Retrospective feedback, when given, isn’t acted upon
  6. Teams not tracking their work
  7. Tracking defects in Excel or Slack to avoid being tracked in Jira/GitHub
  8. An excessively long backlog (e.g. 50-200 Stories) that never diminishes

When this has occurred, I’ve noticed that engineers don’t feel empowered to change the status quo because the people who run the meetings aren’t able to take stock and auto-correct with feedback. This means that 6 months go by, until an official performance cycle when feedback is gathered, before a correction is made.

The best tool we have to correct errant agile is a focused and honored retrospective, performed at least every month, with clear action items and clear progress made towards them.

5. Lack of Standard Hiring Bar / Backbone

If your organization struggles to define the roles of a Software Engineer, it will struggle with hiring the right people for the job. For us, we had engineers who did Data Analytics, Data Engineering, and Network Engineering, managed Airflow jobs, performed System Administrator roles, or acted as a Scrum Master, and others who wrote code – we didn’t have a common definition of what it meant to be a Software Engineer, nor did each manager have a common way to ensure we were hiring top talent with a diverse candidate pool (ideally n > 5).

This creates a pay gap where people are paid for the time they work and the criticality of their role (e.g. managing a batch cluster is more critical than writing code for an internal tool), but not the work that they do. There were times when someone would be under-performing as a software engineer, but would be saved from a “does not meet” rating, because they were available 24×7 and responsive on Slack (so they were more DevOps than Software Engineer). When you don’t have consistent expectations for roles, and managers aren’t calibrated to those expectations, merit increases seem random. Other companies work to calibrate managers and provide guidance that engineers must at least meet the expectations on every axis of their growth framework (see Square’s, for example).

Stacking another level on an already skewed tower, some of the organizations I’ve worked in were risk averse. So much so that people were managed out by transferring them to other teams, when it was obvious they were underperforming overall. This allows a manager to pass the buck and receive a backfill, instead of solving the root cause of the problem. Worse, when an underperformer left, we had no control in place to prevent them from rejoining another team within the same company and, worst of all, receiving a promotion when they did.

Non-standard practices create inconsistent pizzas with olives only on one side

6. Referring to Employees as Fungible

The tough years came when we started to determine how many people could be replaced in San Jose with people in Chennai, with a focus on complete cost savings without regard for people. The going rate was about 3 software engineers in Chennai for 1 in the Bay Area. Finance’s perception was that a software engineer in one location at one level is equivalent in another location.

What they failed to realize is:

  1. The emotional toll that layoffs have on an organization
  2. Late night meetings for teams become a new normal that no one wants
  3. Performance and potential differ greatly between engineers and regions: a senior software engineer with 1 month of domain experience is not the same as a senior engineer with 4 years of domain experience
  4. Team meetups and bonding are restricted (see travel budget cancellations above)
  5. Work becomes transactional, and you lose connectivity between people and their work
  6. You aren’t fixing the aforementioned upstream problems of: (1) improving organizations based on feedback, (2) last minute budget cancellations, (3) significant attrition, (4) doing scrum wrong, (5) the lack of a standard hiring bar – this just magnifies the problem by three.

I want to be short and succinct with these opinions I write, so we’ll stop here for now. In the future, I’ll be writing about these other areas:

  • Forced Curve Calibrations
  • Promotions Limited by Available Budget
  • Solving Failures with More Compliance Training
  • Not Trusting Employees to Take on a Challenge
  • Delayed Merit Increases
  • Constant Layoffs
  • Continual Ask for Managers to Calibrate Employees
  • How to Create Spaghetti Software with Top-Down Feature Requests
  • Focus on Moving Employees to Low Cost Areas
  • No Significant Achievements, but a lot of PowerPoints
  • The Long Jumper Problem: Human Tracking of Key Results

Not all of Pisa is leaning – just the tower. If you work in a place that’s been leaning for a couple of years, there are other options out there.

Enterprise Engineering Teams: Why You Need a Strong One

How do you know if you have an enterprise engineering team? If you do, how do you know if you have a strong product, team, and happy customers?

Most companies aren’t doing enough with their enterprise tools.

They usually have more junior engineers, creating buggy products running old platforms and technology. It takes months to create a tool, and when it’s released it’s not user friendly and wasn’t built with the customer in mind.

They may rely on contractors that have limited company perspective and lack long-term vision to build these applications. What you really want is a solid engineering team delighting employees with the tools they use: it’s a retention tool at minimum and a productivity driver on average.

I wanted to be extremely pragmatic with this article, forcing you to retrospect and introspect about your company, noting that the grass can be greener if you look outside.

A quick test

You have an enterprise engineering team if:
1. You have a CIO organization: typically there is a software engineering team for HR functions, an employee phonebook system, and other internal tools to support employees.
2. You have an internal phonebook system that is built (e.g. Amazon Phonetool) and not purchased (e.g. WorkDay, Gusto, Friday, etc.)

Enterprise tools should stand the test of time, but not be built with ancient technology

General features might include

1. Employee directory: Allow employees to search for organizations, people, teams
2. News: Learn about recent events
3. Events: See a calendar and subscribe to upcoming events like a brainstorm or an ERG meeting
4. Policies: HR, Security, etc
5. Thank you system: Applaud someone for helping out
6. Search capabilities: Search documents, news, people, etc.
7. Site-specific details: (e.g. Austin office details and perks) – how to get a badge, main contacts, coffee shop hours, etc.

There are some more uncommon features to look for as well

1. Internal job portal: Allow employees to find jobs that are open
2. Employee transfer system: Allow employees to move into other roles easily
3. Badge system: Reward employees with internal currency and digital badges of honor for various things
4. Active Incident display: All tools used by employees display the current state of the production, sandbox, and developer environments along with any current Incident that may impact Customers or their work
5. Integrated financials and system health dashboards: At a glance, give people a general feeling for how the company is doing without triggering any deeper insider trading restrictions
6. Ideas and patent portal: Keep track of awesome ideas that can turn into the next great product feature, allow employees to engage with others to build a business plan, go-to-market, and write code – then submit it for legal review to check for patentability to secure that intellectual property!
7. Promotion and rating systems for managers: If you are fortunate to have a standard promotion and calibration process across the company, this tool should allow for common practices to be implemented like anonymization, timed discussions of each candidate to remove bias, notes, assigned reviewers, voting, feedback back to the candidate, and decisions/ratings.

How do I know if my company is doing a good job with enterprise tooling?

Employees move from team to team easily
  • +5: As an employee, when I want to move teams, I can mark in the system that I’d like to be considered by other managers for their teams
  • +5: I can move an employee to another manager in 10 minutes
  • +5: It’s easy to copy/paste usernames and first/last names because managers have to do this several times a day
  • -5: I can move an employee in 1 day
  • -10: I can move an employee in 5 days
  • -10: As an employee, I have to hunt for internal jobs on my own, alone in the forest, and wait weeks for managers to respond, which makes me feel undervalued

As a manager, I love hiring because it’s easy
  • +5: I can see how many positions, and at what level, I need to hire for
  • +5: Within a few seconds, I can review a candidate’s resume and take notes that my entire hiring group, including peer managers and senior engineers who help with hiring, can see
  • -5: I can’t easily filter and mark candidates for the next phase of the interview pipeline
  • -10: I have had my positions taken away from me due to reorganizations and hiring freezes within 2 months of each other, wiping out any hiring pipelines and upsetting candidates, which tarnishes the company brand
  • -20: A manager can hire someone without ensuring a significant cohort of people has been considered, so there is no check that proper diversity, inclusion, equity, and belonging practices were followed (see How to Actually Hire for Diversity)

I can easily recruit people who have been referred by others
  • +10: When you mark someone as a referral, you can easily follow them through the interview process and you’re consulted by the hiring committees for what you know about the candidate
  • -5: When you refer someone, it goes into a pile where you can never tell if the hiring manager has seen it, and there is no follow-up because it can take months to close a position even after someone is hired for that role
  • -5: As a hiring manager, you believe that referrals have the same value as organic applicants applying online because referrals are sent by employees who don’t even know the person they are referring

We have a system that handles promotion panels and performance reviews
  • +10: Across the company, performance reviews are conducted in similar ways for engineering teams
  • -5: Employees know they can get a better performance review in another organization
  • -5: Employees are promoted faster in other organizations due to the lack of a standard performance and promotion process
  • -10: Engineers who work harder, not smarter, working long hours due to poor planning and unrealistic deadlines, are rewarded more and promoted due to fear of attrition

I can tell if there is a current Incident with our customer-facing products across all platforms I use
  • +5: Our culture is such that people can easily find an active Incident for a product and can easily join in to see how things are going, no matter if they are call center teammates, managers, directors, engineers, or project managers
  • +5: We hold the bar on post-mortems, where we ask folks to leave if they have not prepared properly for their Incident review
  • +10: The same Incident doesn’t happen again, nor do similar ones, because proper testing, canary rollouts, and documentation are put in place

We have a centralized place to keep track of ideas, including patentable ideas, so we can innovate on our products easily
  • +5: When I have an idea for a new feature, product, or internal process, I can collaborate with others to push it into production and/or receive a patent for it
  • +5: Our code base is built so that teams can easily collaborate on other products and features, even if they aren’t on the same team, because we all have the same production push processes across the entire company and a culture of keeping good documentation. Examples: Facebook Live and Timeline

Standard things like 1:1s and performance conversations are consistent across the entire company
  • +10: Notes from 1:1s, performance reviews, and calibrations are all kept in the same place and travel as people change teams. Managers have consistent, impactful 1:1s with their teams that don’t require threats from their VP because they are unable to hold a high bar for their leadership team
  • -10: PowerPoint templates are sent for various initiatives without any go-to-market plan or follow-up to ensure consistency

A rubric to use to score your company: Enterprise Tools

Engineering teams need even more from enterprise tooling

Aside from enterprise tools for the entire company, engineers also need high-quality, robust, easy-to-use tools to make their lives easier. Engineering teams can make up more than 30% of large tech companies.

Craftsmanship in enterprise tools with scale in mind should not be underestimated
  1. Data exploration with Alation and Data Explorer
  2. Data querying with Trino
  3. Data analysis
  4. Configuration like Remote Config
  5. Keys and encryption like with Google Cloud Key Management
  6. Alerting and monitoring with DataDog or Splunk
  7. Build, test, deployment, verification of code like Semaphore CI
  8. Experimentation systems that allow for ease of A/B testing and multi-armed bandits, like Optimizely
  9. A rollout system allows for different rollout strategies like Argo

How do I know if my company is doing a good job with tooling for engineers?

I don’t need to wait for a license to be able to analyze data
  • +5: I can analyze data in under 10 minutes
  • +10: Everyone, no matter the role (TPM, PM, EM, SWE), is empowered to query data on their own
  • -5: I need to request a license to get access to data
  • -10: I have to ask someone else to access data for me because it’s too complicated

I can figure out what column names mean along with the data inside of them
  • +5: The columns and values inside of the database are well commented so I can determine what they mean
  • -5: I have to query another system to figure out what columns and values mean
  • -10: I have to ping on Slack to ask people how to interpret tables and data

Our Incident management system is tied into all of our tools, so I know when there is an issue with the tool I’m using
  • +5: I don’t have to go anywhere to realize there is currently an issue with an enterprise tool
  • -5: I have to ping a help channel on Slack to see if there is an Incident going on

Our key management system for storing sensitive data is stable, reliable, and easy to use
  • +5: The key management system just works and I understand how to use it
  • +5: I can easily write code to access passcodes, PGP keys, etc. There are mock frameworks that allow us to unit and functional test them.
  • -5: The key management system I use is confusing
  • -10: Our enterprise tools use a different key management system than production (or none at all because you copy/paste configuration files in production for enterprise tools)

The configuration system is state of the art
  • +5: I can target a subset of users (e.g. employees) easily with my configurations
  • +5: Employees can opt in easily to dogfood new features
  • +5: A feature flag can be disabled when an alert is triggered for a feature that may have a production problem
  • +5: Our internal tools use the same configuration platform as our production system to allow for dogfooding and internal testing of new configuration features

Observability is fast and insightful
  • +5: I can set up an alert to page my oncall team in under 5 minutes
  • +5: It takes a few seconds to get metrics and log data
  • -5: It takes a few minutes to get log data
  • +5: I can comment and mention other people on a dashboard to collaborate directly within our tools

Oncall teams are linked to data, products, code
  • +5: It’s obvious who owns a specific piece of code
  • +5: It’s obvious who owns a database table or data pipeline
  • +5: It’s obvious which oncall team owns a specific product feature

Code rollouts are slick
  • +5: I only need to click one button to roll out my code to production after it has been reviewed
  • +5: When it gets late in the day, or on Friday when you should never push code, I’m cautioned against being stupid
  • +5: I can easily test my feature within an employee base before rolling to production
  • +5: As code passes different gates, our team is notified via Slack chat and mobile pings
  • +5: Functional tests with 90% coverage run on production as code is rolling out
  • +5: Engineering teams are expected not only to write unit and functional tests, but also to use these same tests to test production rollouts
  • -5: I have to click multiple buttons and get multiple approvals to push code, which takes over 5 minutes to complete after my code review already gets an LGTM! 🙌 comment

I can do most everything from my IDE
  • +10: I can save the code in my IDE and immediately test it on a staging environment
  • +10: I can save code in my IDE and another developer on their laptop can use what I’ve done on my staging environment
  • -5: I have to constantly create a test environment to test my code
  • -5: My test environment is broken most of the time
  • -5: I can’t reproduce something from production on my test environment because the environments are not even close to parity
  • -5: Functional testing is not standard across the company, so I can’t easily look at another code base and write a functional test for it

A rubric to use to score your company: Engineering Enterprise Tools

So…how did your company score?

Max points: 220
Min points: -180

Add up all of the pluses and minuses; the best possible score is 220 and the worst is -180. Where did you land?
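If you’d rather not do the math by hand, here is a minimal Python sketch of the tally. It assumes you have written down the point value of every rubric line that applies to your company. The function names `tally` and `position_in_range` are made up for this illustration and aren’t part of any tool mentioned in this post; the only numbers taken from the rubric are the 220 and -180 bounds.

```python
# Illustrative helpers for tallying the two rubrics above.
# The function names are hypothetical; only the bounds come from the post.
MAX_POINTS = 220   # best possible score: every "+" line applies
MIN_POINTS = -180  # worst possible score: every "-" line applies


def tally(selected_points):
    """Add up the point values of the rubric lines that apply to you."""
    total = sum(selected_points)
    if not MIN_POINTS <= total <= MAX_POINTS:
        raise ValueError(f"score {total} is outside the rubric range")
    return total


def position_in_range(total):
    """Map a score onto 0.0 (worst possible) through 1.0 (best possible)."""
    return (total - MIN_POINTS) / (MAX_POINTS - MIN_POINTS)


if __name__ == "__main__":
    # Made-up example: three +5 lines and one +10 apply, plus one -5 and one -10.
    my_lines = [5, 5, 5, 10, -5, -10]
    score = tally(my_lines)
    print(f"score: {score}, position in range: {position_in_range(score):.1%}")
```

The second helper is only there because a raw score is hard to read when the scale isn’t centered on zero; it shows how far you are from the worst case relative to the best case.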

What actionable steps can you take to help repair these problems upstream, as a system? As a next step, you might try using an Ishikawa or Five Whys to get to the root cause.

How much do operational cost and employee morale factor into these focus areas? Use this to create a business plan and speak with your leadership about your recommendations for change: even something small can go a long way.

If you need more inspiration, check out these podcasts:

All of the content from this blog is my personal opinion and does not reflect the opinion of any company I have worked for.

Flowers in France

The other side: 4 reasons why the grass is greener

I recently left my job of 10 years at PayPal (5 as an engineer, 5 as an engineering manager) to join Meta. One month into my new role seems like a great time to check-in and reflect on the past.

Greener, cuter, grass

I want to take this time to update my fellow confidants, friends, and the world, on why this job is 100x better (with a caveat that I’m still in a honeymoon phase and have recency bias). I’m going to only talk about four key areas that are most salient for me right now: engineering tools, management expectations and accountability, human-first relationships, and an early emphasis on scaling.


Food Options Flying Singapore Airlines from SFO to SIN (SQ SFO SIN)

Changi Airport (SIN) from SFO

Before you get to experience the beautiful waterfall from Terminal 1 of the Changi Airport and the amazing and diverse food scene in Singapore, you’ll be on a 16 hour flight from SFO to SIN. While you’re trying to decide between economy and premium economy, it might help to have some insight into the differences in value.

The Jewel of Changi Airport in Terminal 1

30 days in Italy

For my sabbatical we ventured all around Italy.  Here’s a summary of the places of joy that were experienced during this journey.

Journey through Italy – April 21st to May 20th 2018
  • Rome – 5 nights
  • Vernazza – 2 nights
    • Monterosso
    • Corniglia
    • Manarola
    • Riomaggiore
  • Giais – 2 nights
    • Venice

The top 5 things to do in Florence

Start here – Google Maps List: Florence 2018

I’ve put together a Google Maps list of all of the highlights from my trip in May 2018 where I stayed in Florence for a week!  I hope you find this information as deliciously helpful as I did!

1. Eat (gelato, pizza, panini, Florentine steak)

The highlight of my trip was when I spent a day wandering from gelato shop to gelato shop.  In all, after the day was over, I had visited 7 gelato shops out of the 20 that I targeted.  The following day I rounded out the top 10 by visiting 3 more.  Each gelato shop is a little different – some offer very rich chocolate, some have pieces of nuts or fruit.  Check out my Google Maps list for some of my favorites including some from the pictures below!


Remembering Gary Wheeler

Gary was the captain of many pool teams and was a father for everyone he knew.  He was known for his never-ending stories, his jaw-dropping precision shots on the pool table, and his mentorship both in and outside of the game.

There are so, so many words to describe Gary, but I wanted to extract three from my heart:

1. Passionate: Gary wanted to win, but he also wanted to have fun.  Fun usually was more important than winning.  If he wasn’t pulling a prank, he was making a joke – his laugh was contagious, but his smile was heart-grabbing.

2. Heartfelt: Gary saw the best in people and always wanted the best for his friends and family.  As a captain, when the game was on the line, he would always want to make sure everyone had a chance to play. He always put himself last and others first.

3. Strong: This was felt in his handshake, in his steadiness, and after every accident that caused him bodily harm (like landing upside down in a sandrail and compressing his spine and his coal mine accident).  He always got up and kept going.

Gary’s eternal souvenirs are plentiful.

His shouts of “you dirty son of a bitch” after you played a defensive shot on him and his facial expressions always told you what was on his mind. He reminds me of how I’ll be when I’m older, but I have a lot to learn until then.

The most profound lesson from Gary came when he called a timeout at the Southwest Challenge in Las Vegas with Tashana, walked up to the table, picked up the cue ball, and committed a foul as a coach, giving the opponent ball-in-hand.  I will never forget the look on his face, and I will never, ever pick up a cue ball during a timeout unless I’m sure I can.

I am thankful to have these memories of Gary, and I am eternally grateful to Tashana and Moses for introducing me to this great man, a winner at life.

A Quick Trip to Versailles from Paris

versailles door and ceiling
Gold! So much gold and beauty.

After living in Paris for the past 6 months, we consider ourselves to be Versailles experts. We normally target Saturday or Sunday with an arrival time of 9am.

Breakfast – First Things First

If you aren’t morning people like us, it’s probably best to start your day off with some coffee and amazing pastries from Eric Kayser.  Even though it is a chain restaurant in Paris, it is still one of the best places for your morning croissant, brioche, pain au chocolat and café.
