Monthly Archives: July 2022

Enterprise Engineering Teams: Why You Need a Strong One

How do you know if you have an enterprise engineering team? If you do, how do you know if you have a strong product, a strong team, and happy customers?

Most companies aren’t doing enough with their enterprise tools.

These teams are usually staffed with more junior engineers who build buggy products on old platforms and technology. It takes months to create a tool, and when it’s released it’s not user-friendly and wasn’t built with the customer in mind.

They may rely on contractors who have limited company perspective and lack the long-term vision to build these applications. What you really want is a solid engineering team delighting employees with the tools they use: it’s a retention tool at minimum and a productivity driver on average.

I wanted to be extremely pragmatic with this article, forcing you to retrospect and introspect about your company, noting that the grass can be greener if you look outside.

A quick test

You have an enterprise engineering team if:
1. You have a CIO organization: typically there is a software engineering team for HR functions, an employee phonebook system, and other internal tools to support employees.
2. You have an internal phonebook system that is built (e.g. Amazon Phonetool) and not purchased (e.g. Workday, Gusto, Friday, etc.).

Enterprise tools should stand the test of time, but not be built with ancient technology

General features might include

1. Employee directory: Allow employees to search for organizations, people, teams
2. News: Learn about recent events
3. Events: See a calendar and subscribe to upcoming events like a brainstorm or an ERG meeting
4. Policies: HR, Security, etc.
5. Thank you system: Applaud someone for helping out
6. Search capabilities: Search documents, news, people, etc.
7. Site-specific details (e.g. Austin office details and perks): how to get a badge, main contacts, coffee shop hours, etc.

There are some more uncommon features to look for as well

1. Internal job portal: Allow employees to find jobs that are open
2. Employee transfer system: Allow employees to move into other roles easily
3. Badge system: Reward employees with internal currency and digital badges of honor for various things
4. Active Incident display: All tools used by employees display the current state of the production, sandbox, and developer environments, along with any current Incident that may impact customers or their work
5. Integrated financials and system health dashboards: At a glance, give people a general feeling for how the company is doing without triggering any deeper insider trading restrictions
6. Ideas and patent portal: Keep track of awesome ideas that can turn into the next great product feature, allow employees to engage with others to build a business plan, plan a go-to-market, and write code, then submit it for legal review to check for patentability and secure that intellectual property!
7. Promotion and rating systems for managers: If you are fortunate to have a standard promotion and calibration process across the company, this tool should allow for common practices to be implemented like anonymization, timed discussions of each candidate to remove bias, notes, assigned reviewers, voting, feedback back to the candidate, and decisions/ratings.

How do I know if my company is doing a good job with enterprise tooling?

Employees move from team to team easily
+5: As an employee, when I want to move teams, I can mark in the system that I’d like to be considered by other managers for their teams
+5: I can move an employee to another manager in 10 minutes
+5: It’s easy to copy/paste usernames and first/last names because managers have to do this several times a day
-5: I can move an employee in 1 day
-10: I can move an employee in 5 days
-10: As an employee, I have to hunt for internal jobs on my own, alone in the forest, and wait weeks for managers to respond to me, which makes me feel undervalued

As a manager, I love hiring because it’s easy
+5: I can see how many positions I need to hire for, and at what level
+5: Within a few seconds, I can review a candidate’s resume and take notes that my entire hiring group, including peer managers and senior engineers who help with hiring, can see
-5: I can’t easily filter and mark candidates for the next phase of the interview pipeline
-10: I have had my positions taken away from me due to reorganizations and hiring freezes within 2 months of each other, wiping out any hiring pipelines, upsetting candidates, and tarnishing the company brand
-20: A manager can hire someone without ensuring that a significant cohort of people has been considered and that proper diversity, inclusion, equity, and belonging practices were followed (see How to Actually Hire for Diversity)

I can easily recruit people who have been referred by others
+10: When you mark someone as a referral, you can easily follow them through the interview process, and the hiring committee consults you for what you know about the candidate
-5: When you refer someone, it goes into a pile where you can never tell if the hiring manager has seen it; there is no follow-up, and the hiring process can take months to close a position even after someone is hired for that role
-5: As a hiring manager, you believe that referrals have the same value as organic applicants applying online because referrals are sent by employees who don’t even know the person they are referring

We have a system that handles promotion panels and performance reviews
+10: Across the company, performance reviews are conducted in similar ways for engineering teams
-5: Employees know they can get a better performance review in another organization
-5: Employees are promoted faster in other organizations because there is no standard performance and promotion process
-10: Engineers who work harder, not smarter, putting in long hours due to poor planning and unrealistic deadlines, are rewarded more and promoted due to fear of attrition

I can tell if there is a current Incident with our customer-facing products across all platforms I use
+5: Our culture is such that people can easily find an active Incident for a product and join in to see how things are going, whether they are call center teammates, managers, directors, engineers, or project managers
+5: We hold the bar on post-mortems: we ask folks to leave if they have not prepared properly for their Incident review
+10: The same Incident doesn’t happen again, nor do similar ones, because proper testing, canary rollouts, and documentation are put in place

We have a centralized place to keep track of ideas, including patentable ideas, so we can innovate on our products easily
+5: When I have an idea for a new feature, product, or internal process, I can collaborate with others to push it into production and/or receive a patent for it
+5: Our code base is built so that teams can easily collaborate on other products and features, even if they aren’t on the same team, because we all have the same production push processes across the entire company and a culture of keeping good documentation.
Examples: Facebook Live and Timeline

Standard things like 1:1s and performance conversations are consistent across the entire company
+10: Notes from 1:1s, performance reviews, and calibrations are all kept in the same place and travel as people change teams. Managers have consistent, impactful 1:1s with their teams that don’t require threats from their VP because they are unable to hold a high bar for their leadership team
-10: PowerPoint templates are sent for various initiatives without any go-to-market plan or follow-up to ensure consistency
A rubric to use to score your company: Enterprise Tools

Engineering teams need even more from enterprise tooling

Aside from enterprise tools for the entire company, engineers also need high-quality, robust, easy-to-use tools to make their lives easier. Engineering teams can make up more than 30% of a large tech company.

Craftsmanship in enterprise tools with scale in mind should not be underestimated
  1. Data exploration with Alation and Data Explorer
  2. Data querying with Trino
  3. Data analysis
  4. Configuration like Remote Config
  5. Keys and encryption like with Google Cloud Key Management
  6. Alerting and monitoring with Datadog or Splunk
  7. Build, test, deployment, and verification of code, like with Semaphore CI
  8. Experimentation systems that make A/B testing and multi-armed bandit experiments easy, like Optimizely
  9. A rollout system that allows for different rollout strategies, like Argo

How do I know if my company is doing a good job with tooling for engineers?

I don’t need to wait for a license to be able to analyze data
+5: I can analyze data in under 10 minutes
+10: We are empowered, no matter the role (TPM, PM, EM, SWE), to query data on our own
-5: I need to request a license to get access to data
-10: I have to ask someone else to access data for me because it’s too complicated

I can figure out what column names mean along with the data inside of them
+5: The columns and values inside of the database are well commented so I can determine what they mean
-5: I have to query another system to figure out what columns and values mean
-10: I have to ping on Slack to ask people how to interpret tables and data

Our Incident management system is tied into all of our tools, so I know when there is an issue with the tool I’m using
+5: I don’t have to go anywhere to realize there is currently an issue with an enterprise tool
-5: I have to ping a help channel on Slack to see if there is an Incident going on

Our key management system for storing sensitive data is stable, reliable, and easy to use
+5: The key management system just works and I understand how to use it
+5: I can easily write code to access passcodes, PGP keys, etc. There are mock frameworks that allow us to unit and functional test them.
-5: The key management system I use is confusing
-10: Our enterprise tools use a different key management system than production (or none at all because you copy/paste configuration files in production for enterprise tools)

The configuration system is state of the art
+5: I can target a subset of users (e.g. employees) easily with my configurations
+5: Employees can easily opt in to dogfood new features
+5: A feature flag can be disabled when an alert fires for a feature that may have a production problem
+5: Our internal tools use the same configuration platform as our production system to allow for dogfooding and internal testing of new configuration features

Observability is fast and insightful
+5: I can set up an alert to page my oncall team in under 5 minutes
+5: It takes a few seconds to get metrics and log data
-5: It takes a few minutes to get log data
+5: I can comment and mention other people on a dashboard to collaborate directly within our tools

Oncall teams are linked to data, products, code
+5: It’s obvious who owns a specific piece of code
+5: It’s obvious who owns a database table or data pipeline
+5: It’s obvious which oncall team owns a specific product feature

Code rollouts are slick
+5: I only need to click one button to roll out my code to production after it has been reviewed
+5: When it gets late in the day, or on Friday when you should never push code, I’m cautioned against being stupid
+5: I can easily test my feature within an employee base before rolling to production
+5: As code passes different gates, our team is notified via Slack chat and mobile pings
+5: Functional tests with 90% coverage run on production as code is rolling out
+5: Engineering teams are expected not only to write unit and functional tests, but also to use these same tests to verify production rollouts
-5: I have to click multiple buttons and get multiple approvals to push code, which takes over 5 minutes to complete, even after my code review already got an LGTM! 🙌 comment

I can do most everything from my IDE
+10: I can save the code in my IDE and immediately test it on a staging environment
+10: I can save code in my IDE and another developer on their laptop can use what I’ve done on my staging environment
-5: I have to constantly create a test environment to test my code
-5: My test environment is broken most of the time
-5: I can’t reproduce something from production on my test environment because the environments are not even close to parity
-5: Functional testing is not standard across the company; I can’t easily look at another code base and write a functional test for it
A rubric to use to score your company: Engineering Enterprise Tools
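
To make the configuration items in the rubric concrete (targeting employees, opt-in dogfooding, and switching a flag off when an alert fires), here is a minimal Python sketch. Everything in it is an assumption for illustration: the FeatureFlag class, its fields, and the flag and alert names are hypothetical and are not the API of Remote Config or any other real product.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlag:
    """Hypothetical in-memory flag; a real system would persist and distribute this state."""

    name: str
    enabled: bool = True
    employees_only: bool = True  # target a subset of users (employees)
    opted_in: set[str] = field(default_factory=set)  # employees dogfooding the feature
    kill_on_alerts: set[str] = field(default_factory=set)  # alerts that disable the flag

    def opt_in(self, username: str) -> None:
        """Let an employee volunteer to dogfood the feature."""
        self.opted_in.add(username)

    def is_on_for(self, username: str, is_employee: bool) -> bool:
        """Decide whether this user should see the feature."""
        if not self.enabled:
            return False
        if self.employees_only:
            return is_employee and username in self.opted_in
        return True

    def handle_alert(self, alert_name: str) -> None:
        """Kill switch: turn the flag off if a linked alert fires."""
        if alert_name in self.kill_on_alerts:
            self.enabled = False


if __name__ == "__main__":
    flag = FeatureFlag("new_checkout", kill_on_alerts={"checkout_error_rate"})
    flag.opt_in("alice")
    print(flag.is_on_for("alice", is_employee=True))
    flag.handle_alert("checkout_error_rate")
    print(flag.is_on_for("alice", is_employee=True))
```

The point of the sketch is that targeting, opt-in, and the kill switch live in one place, so internal tools and production can share the same behavior.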

So…how did your company score?

Max points: 220
Min points: -180

Add up all of the pluses and minuses. The best possible score is 220 and the worst is -180. Where did you land?
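
If you would rather not do the sum by hand, the tally can be sketched in a few lines of Python. This is only an illustration: the function names and the sample answers are made up, and the only numbers taken from the rubrics are the 220 and -180 bounds.

```python
# Bounds for the two rubrics combined, as stated in the article.
MAX_POINTS = 220
MIN_POINTS = -180


def tally(selected_points: list[int]) -> int:
    """Sum the point values of every rubric statement that applies to your company."""
    total = sum(selected_points)
    if not MIN_POINTS <= total <= MAX_POINTS:
        raise ValueError(f"total {total} is outside the rubric range")
    return total


def position(total: int) -> float:
    """Map a total onto 0.0 (worst possible) through 1.0 (best possible)."""
    return (total - MIN_POINTS) / (MAX_POINTS - MIN_POINTS)


if __name__ == "__main__":
    # Hypothetical answers: each entry is the value of one statement that applied.
    answers = [5, 5, -5, 10, -10, 5, 5, -5]
    total = tally(answers)
    print(f"Total: {total}, position between worst and best: {position(total):.2f}")
```

Normalizing against both bounds makes it easier to compare scores when different people only answer some of the statements.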

What actionable steps can you take to help repair these problems upstream, as a system? As a next step, you might try using an Ishikawa diagram or the Five Whys to get to the root cause.

How much do operational cost and employee morale factor into these focus areas? Use this to create a business plan and speak with your leadership about your recommendations for change: even something small can go a long way.

If you need more inspiration, check out these podcasts:

All of the content from this blog is my personal opinion and does not reflect the opinion of any company I have worked for.

Flowers in France

The other side: 4 reasons why the grass is greener

I recently left my job of 10 years at PayPal (5 as an engineer, 5 as an engineering manager) to join Meta. One month into my new role seems like a great time to check in and reflect on the past.

Greener, cuter, grass

I want to take this time to update my fellow confidants, friends, and the world, on why this job is 100x better (with a caveat that I’m still in a honeymoon phase and have recency bias). I’m going to only talk about four key areas that are most salient for me right now: engineering tools, management expectations and accountability, human-first relationships, and an early emphasis on scaling.
