19 August 2022

Data Navigators

Rich with historical trivia, linking to great practical insights, and sublime section headings. A must skim for any slow Friday - Data Workers Daily

Where am I?

Throughout history, this question has usually been resolved somewhat approximately, but always with a strong desire for better accuracy.

From the perspective of explorers deep in the fog of discovering new worlds, each incremental step in increasingly accurate methods of navigation was a new opportunity, much like an entrepreneur sees the incremental tech capabilities as a wonder of possibilities.

Let us pick up the thread of increasing accuracy at the very interesting inflection point of celestial navigation.

Sextant_(PSF).png (2320×2090) | Tattoo | Pinterest — Pinterest captures the gist of how it works

Using the angle of stars to the horizon, coupled with some maths and reference charts, one can approximate latitude, with fair accuracy.

Lines of latitude (remember, steps on a La(t)dder) are the ones that go across the map. They tell you how far North or South you are.
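To make "some maths" a little less hand-wavy, here is a minimal sketch of the classic noon sight, assuming a northern-hemisphere observer with the Sun due south at local noon. The function name and the example numbers are mine, purely for illustration. The real procedure also corrects for refraction, height of eye and so on, which is where the reference charts come in.

```python
def latitude_from_noon_sight(sun_altitude_deg, sun_declination_deg):
    """Rough latitude (degrees north) from the Sun's noon altitude.

    Assumes the observer is north of the Sun and ignores every
    instrument and atmospheric correction a real navigator applies.
    """
    zenith_distance_deg = 90.0 - sun_altitude_deg
    return zenith_distance_deg + sun_declination_deg


# Hypothetical sight: at an equinox (declination 0), Sun measured 40 degrees up.
print(latitude_from_noon_sight(40.0, 0.0))
```

The declination for the day is the bit you look up in a table. Everything else is subtraction, which is why latitude was the easy half of the problem.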

Figure 1 - Latitude and Atlantic Ocean — Encyclopedia Britannica

Longitude (the down ones), however, could only be very roughly estimated, due to the lack of effective seagoing clocks and a more complicated process.

For most of history, Marine Navigators could at best get to the latitude they knew their destination was on, and sail along that latitude until they bumped into their destination (Land Ho!), or whatever was between them and the destination.

Often this was something hard and sharp.

Crossing the Atlantic Ocean between Europe and the Caribbean (see figure 1) was pretty attainable, with a great heuristic:

Sail South until the butter melts, and then follow the Sun.

The butter melting is the approximate Latitude that you are aiming for in the Caribbean. Following the Sun is heading West. When you spot land, you have arrived!

Beyond that, things get far trickier.

Rounding Cape Horn or the Cape of Storms is a formidable challenge, even today. Without accurate East/West determination, you were left with far fewer heuristics and some mediocre workarounds:

As the Crow Flies – When lost or unsure of their position in coastal waters, ships would release a caged crow. The crow would fly straight towards the nearest land thus giving the vessel some sort of a navigational fix. 1

The race to map the world meant that enhancing location accuracy was as valuable then as it is now (maps, the new oil).

James Cook, without much sensitivity for how it would be interpreted after the fact, hammered around the earth, charting the world with only a rough idea of his longitude. Here he is running aground on the Great Barrier Reef, presumably due to an intern’s PR.

And so, a 1700s British government-sponsored technical arms race was established, matched only by the Data Stack sparring of 2021 in terms of strong narratives. The desired outcome - a better clock.

The marine chronometer (a much better clock) was conceived of and improved to such a degree that it enabled navigators to know where they were.

The chronometer enabled better longitude determination. With an extreme emphasis on better (just a tiny snippet of the history of navigation, which may or may not be fascinating, I won’t impose).
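Why a clock? The Earth turns 360 degrees in 24 hours, so every hour of difference between your local noon and the time at a reference meridian is 15 degrees of longitude. A rough sketch of that arithmetic follows. The function names are my own, it uses Greenwich as the reference, and it ignores the equation of time and other corrections.

```python
import math

DEGREES_PER_HOUR = 360.0 / 24.0   # the Earth's rotation rate
KM_PER_DEGREE_AT_EQUATOR = 111.32  # approximate


def longitude_from_local_noon(greenwich_hours_at_local_noon):
    """Longitude in degrees (east positive, west negative).

    The input is what a chronometer keeping Greenwich time reads
    at the moment the Sun is highest where you are.
    """
    return (12.0 - greenwich_hours_at_local_noon) * DEGREES_PER_HOUR


def east_west_error_km(clock_error_seconds, latitude_deg):
    """How far east or west a wrong clock puts you."""
    error_deg = clock_error_seconds / 3600.0 * DEGREES_PER_HOUR
    return error_deg * KM_PER_DEGREE_AT_EQUATOR * math.cos(math.radians(latitude_deg))


# Hypothetical reading: the chronometer says 15:00 at local noon.
print(longitude_from_local_noon(15.0))   # 45 degrees west
print(east_west_error_km(4.0, 0.0))      # cost of a clock that is 4 seconds out
```

A clock that drifts four seconds costs you nearly two kilometres at the equator, and a clock that drifts a few minutes over a long voyage costs you the ship. Hence the emphasis on better.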

The point, and why I find this a relevant thing to idle upon, is that this approximate location resolution process still exists. When you learn to navigate a modern sailboat, you are instructed to determine your location by taking 3 compass bearings. Plotted on the chart, they form a triangle, and there is a fair chance that you are, or at least were, in that triangle.

https://i1.wp.com/www.paddlinglight.com/pl/wp-content/uploads/2011/02/fix-example.jpg?fit=620%2C412&ssl=1 — 3 lines are better than 2. You are possibly within that red triangle.
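For the curious, the triangle can be sketched in a few lines, assuming a flat chart (x is east, y is north), bearings in degrees clockwise from north, and landmarks whose positions you trust. The landmark coordinates and bearings below are made up for illustration, and the function names are mine.

```python
import math
from itertools import combinations


def bearing_line(landmark, bearing_deg):
    """Line through a charted landmark along the bearing taken to it."""
    b = math.radians(bearing_deg)
    return landmark, (math.sin(b), math.cos(b))


def intersect(line_a, line_b):
    """Point where two infinite lines cross (raises if they are parallel)."""
    (ax, ay), (adx, ady) = line_a
    (bx, by), (bdx, bdy) = line_b
    det = adx * bdy - ady * bdx
    if abs(det) < 1e-12:
        raise ValueError("bearings are parallel, no fix possible")
    s = ((bx - ax) * bdy - (by - ay) * bdx) / det
    return (ax + s * adx, ay + s * ady)


def fix_triangle(sightings):
    """Corners of the triangle formed by three (landmark, bearing) sightings."""
    lines = [bearing_line(landmark, bearing) for landmark, bearing in sightings]
    return [intersect(a, b) for a, b in combinations(lines, 2)]


def centroid(points):
    """Middle of the triangle, a common single-point estimate."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


# Hypothetical chart: the boat is really at (0, 0).
perfect = [((0, 10), 0.0), ((10, 0), 90.0), ((-10, -10), 225.0)]
wobbly = [((0, 10), 2.0), ((10, 0), 88.0), ((-10, -10), 227.0)]

print(fix_triangle(perfect))   # three lines meet at one point
print(fix_triangle(wobbly))    # two degrees of error each: now it is a triangle
print(centroid(fix_triangle(wobbly)))
```

With perfect bearings the triangle collapses to a point. With two degrees of wobble on each bearing you get a triangle, and in this made-up example the true position is not even inside it. "Possibly" is doing honest work in that caption.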

One does this despite having a GPS, for obvious reasons to anyone who has ever relied on any highly available service - backup and validate.

And no, the numbers don’t match. Find peace therein.

The point is that within the context of “where am I?” there is always a certain uncertainty around how well all the data feeds are working. Quite literally in the rules of the ocean, you need to continuously consult “all available means” to assure yourself that you know where you are. Granted, the rule is for avoiding a collision, but the point is that you may not presume to trust a single data point from a single system.

The link to the data

…is still coming, first another bit of history.

Sir Francis Chichester was another straightforward British navigator type with an entrepreneurial streak, having gone to New Zealand to set up a forestry startup with NZ Combinator. He later picked up an interest in flying, and went to the UK to buy a plane and fly it BACK to New Zealand over a few months - a new Tesla the modern equivalent I presume.

The plane spent a fair bit of its time upside down, and this was not the most spectacular nor dangerous of the calamities he suffered in his plane.

Through this long and dangerous trip (England to New Zealand in 1929 in the above single-seater float plane), he encountered a few setbacks in terms of knowing where he was.

Once in NZ, he decided to tackle crossing the Tasman Sea, thought to be a bad idea at the time because navigation would be an issue. He needed to stop halfway across to refuel, and the tolerance for missing the refuelling stop at Lord Howe Island was near zero. If he wasn’t able to land there, he’d certainly be beyond rescue or recovery.

He was suddenly outrunning the metrics layer of a previous generation (neat), so he needed to develop his own:

The challenge of the Tasman remained and Chichester realised that he could reach Australia if he fitted [the plane] with floats to alight on the sea and refuel at Norfolk Island and Lord Howe Island.
The real problem now was to find these tiny spots in the sea. The only method of position-fixing available was to take sunshots with a sextant - not easy when you’re alone in the cramped vibrating cockpit - then laboriously work out the fix with pencil and paper.
He decided to use the principle of ‘off-course navigation’ i.e. deliberately aiming for a point to one side of the island. When this point was reached there would be no doubt which way to make a 90º turn for the final leg to the island. To reduce potential errors in calculation, Chichester worked out a series of examples based on his estimate of the sun’s position at the time of his expected arrival at critical points on his course.

Notice the dog-leg about a third of the way across (East to West) — More pics

The primary issue was around the speed at which he moved, but through a clever application of logic and tolerance for approximations, he managed (quite literally barely based on the reading of his book) to make it across.
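The "off-course navigation" trick in the quote above rewards a quick sketch. If your heading could be wrong by a few degrees either way, aiming straight at the island leaves you not knowing which side you missed on. Aim off by more than your worst-case error and the island is always on the same side. The distances and error figures below are my own illustrative assumptions, not Chichester's numbers, and the geometry is flat and simplified.

```python
import math


def lateral_miss_km(distance_km, heading_error_deg):
    """Sideways miss at the end of a leg flown with a constant heading error."""
    return distance_km * math.tan(math.radians(heading_error_deg))


def aim_off_plan(distance_km, max_heading_error_deg, margin_km=0.0):
    """Deliberate offset that keeps the island on a known side.

    Returns the offset to aim for and the shortest and longest
    final leg you should expect after the turn.
    """
    worst_miss = lateral_miss_km(distance_km, max_heading_error_deg)
    offset = worst_miss + margin_km
    return {
        "offset_km": offset,
        "final_leg_min_km": offset - worst_miss,
        "final_leg_max_km": offset + worst_miss,
    }


# Hypothetical leg: 800 km of open water, heading trusted to within 5 degrees.
print(aim_off_plan(800.0, 5.0, margin_km=10.0))
```

The trade is explicit: you accept a longer trip in exchange for removing the one question you cannot afford to get wrong, which way to turn.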

He then went on to become the Chief Data Officer for the Royal Air Force flying school (training WW2 pilots), a prolific seller of maps, as well as a solo sailor (which is where he became famous).

The point here is that he was moving too fast for existing methods. The dynamic was changing too quickly for existing technologies. SQL Server SSIS was no longer sufficient. It may have been sufficient if you were doing milk runs in a steamship from Bristol to Londonderry (or a mid-tier bank with a CIO intent on minimal risk), but certainly not if you were trying to set a navigation record across the Tasman Sea.

The Data Navigator

This is the link: Data + Navigation

Today, the problem of longitude has been solved to centimetre accuracy through satellite navigation2. However, the reason for all the nautical-themed trivia is that these navigators demonstrate an excellent ability to manage and thrive with uncertainty.

Both Chichester and Cook knew the limitations of the BI tools of their time. They supplemented these with experience and intelligence to surmount what were pretty tough odds, to get to wherever they felt needed arriving.

For clarity, most navigators are indeed supporting roles to the Captain. The Navigator would use their combined technical skills, intuition and experience (collectively, wisdom) to give the Captain guidance on where and when. The Captain would then integrate all insights into their resource planning (a startup on the sea).

What is clear is that the Navigator didn’t rely on navigation aids as a crutch, but rather as means to better outcomes through a clear understanding of the weaknesses and an attempt at improving them.

So what is all of this leading to?

First, the Data Navigator is a useful analogy for startups and their data abuse.

When exploring, there is a benefit to be had from pushing into the unknown, and new methods are often required.

Second, bad data causes problems. Often it’s not the data, but rather the navigator.

Third, the navigator provides context which can be used as part of a higher purpose: alignment, motivation and maybe even consensus.

Three thoughts, three sections. Back to the sea, from whence we came.

1 - Exploration as a startup analogy

Navigation is well applied to the data team when the team is supporting the exploration of new worlds.

What the navigator does is supplement and confirm a mental model that someone has built about how the world works.

The map is not the territory

The idea maze maps nicely from historical exploration to startups - barely anything is known and there is a commercial framework based on discovery leading to reward. A startup is entirely lost, by definition, in the idea maze:

A good founder is capable of anticipating which turns lead to treasure and which lead to certain death. A bad founder is just running to the entrance of (say) the “movies/music/filesharing/P2P” maze … without any sense for the history of the industry, the players in the maze, the casualties of the past, and the technologies that are likely to move walls and change assumptions.

The navigator has a theory about where they are and where they are going. This information complements the plan.

The startup has a plan of action and looks for confirmation or invalidation.

A navigator needs to understand their science in addition to the history and futures of their space. This becomes acutely more important when there is some form of race, as was typically the case with all historic navigation.

Innovation mixed in with luck and intuition

A great supplement to this is that the navigator doesn’t rely entirely on data. They use it to complement their intuition.

In a rapidly developing environment, the intuition delivered by a navigator gives an indication of which inputs can be trusted over others, what to prioritise and how to simplify the decision space.3

I think the Data Navigator concept echoes the purpose of data within a risky environment, where there is upside to getting it right.

2 - When it goes wrong

There are many documented situations where, among other failings, huge ships just run aground.

He left her on autopilot, but strong currents overnight pushed the ship to the north and east and the chief officer altered her course towards the north. When Captain Rugiati awoke he saw that the Scilly Isles were unexpectedly off his port, not starboard bow4

The issue arises when the Captain receives information from the navigator that undermines the model. That number doesn’t look right. That island doesn’t look right.

Twitter bots, Substack daily views. These issues are easily overlooked until they blow up.

It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.
— ?

Dealing with these is the actual skill of the navigator. Being sure of what they know, and balancing this with what they don’t.

Navigators provide the best supplement to decision making: trading off speed against accuracy, relying on intuition, and overcoming ingrained preconceptions and bias.

Speed not haste

Accuracy trades off against speed almost directly.

Crucially, this needs to be seen in the appropriate context. Some data teams function to ensure accuracy. In that context, the Navigator mindset is likely misapplied, and it’s probably better to rely on an auditor/historian mindset5.

Startups require speed, due to competition. Accounting requires accuracy, due to compliance.

Speed and hurry should not be seen as excuses for bad data. As with navigation, so with data, there are pretty firm fundamentals that are easy to achieve, can be effectively relied upon and supplement measurement and improvement of the basics.6

"To err is human, but to persist [in error] is diabolical."
— Latinum Sayinum

As a navigation route starts to mature and grow in traffic, the balance favours accuracy, to drive efficiency. A similar dynamic applies when a startup becomes an established scale-up.

Hierarchy

It should be noted that the Nautical Navigator operates in a strictly hierarchical environment. There are no committees at sea: the Captain has the final word, typically for good reason. The shipping disaster earlier shows how this becomes an issue, as described in Black Box Thinking:

When we are confronted with evidence that challenges our deeply held beliefs we are more likely to reframe the evidence than we are to alter our beliefs. We simply invent new reasons, new justifications, new explanations. Sometimes we ignore the evidence altogether.

Many maritime disasters, when deconstructed, relate to power dynamic issues, indicating mental inflexibility around what was true. Running your ship into a reef or ignoring a warning about a storm are great examples of ignoring the data.

This is because new insights are disruptive and hard to integrate. People are happier with consensus than they are with uncertainty, even when the uncertainty might save them.

Organisations think they want more insights and innovation.
They are deluding themselves.
Organisations suppress insights for reasons that are locked inside their corporate DNA: organisations prize predictability and they recoil from errors.
— Gary Klein

Granted, the above line was in the context of innovation, not avoiding disaster, but the same “don’t rock the boat” mentality underlies both the aversion to innovation and the failure to change course.

My anecdote - If you are on a UK train and someone is in your seat, and has the same seat booked, then at least one of you is on the wrong train - an uncomfortable insight.

The hardest thing for the navigator to overcome is an unwanted insight.

3 - Purpose

To bring things closer to the point then. A data team ultimately brings a shared context to an organisation. We are here, and aiming there.

As we know, the shared context is abstract, flexible, and transient. All the data does is add a foundation to the abstraction of reality we use to support and enhance our shared context. The data tends to be the one part that is consistent, reliable and infallible.

This has an underlying seemingly heretical belief: a single source of truth is an abstraction.

Single source of truth is a fairytale; data teams help reconcile this untruth.

What it looks like is shared context, with firmly drawn lines, in pen, on paper, that indicate that the boundary of the territory is exactly HERE.

Trails

And so what the data team does is simplify, create some abstractions, and identify options, and trails:

Complete freedom is not what a trail offers. Quite the opposite; a trail is a tactful reduction of options.
― Robert Moor, On Trails: An Exploration

This is to simplify and create shorthands, narratives, and reasonable decisions.

However, the Data Navigator needs to keep this simplification in mind, always apply their experience and skill to avoid disaster, and ensure the drift from reality to map isn’t too severe:

The map maker, the surveyor, the compass reader, the ship’s crew, and everyone who had some part in the final pen-to-paper determination knows that there are many sources of uncertainty. Each may stretch the truth in its own limited way, and together they may leave the pen-on-paper determination slightly but disastrously off.

“We had 424,000 daily active users yesterday.” The pessimetricist thinks — hopefully, he does not say this — “Actually, you had in excess of 424,000 HTTP requests from devices associated at least temporarily with unique user accounts registered in your internal systems over a 24-hour time period that survived a number of arbitrary assumptions in your data processing systems that passed muster six months ago but which haven’t been re-evaluated meaningfully since.”
— Stephen Bailey

But pen to paper it is, and from that moment onwards, and until a better map is created, that is the single source of truth, the true story.7

Substack doesn’t embed replies, so I took a screenshot

Certainty Matters

If collecting, storing, transforming, distributing, consuming and then sharing data each carry between a 0.1% and 1% chance of error, then that error accumulates, and in some cases is amplified. Apply a narrative, and suddenly the truth could be anything.
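To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes each of the six stages fails independently, which is my simplification, and it only covers accumulation, not amplification. The stage names are taken from the list above.

```python
STAGES = ["collect", "store", "transform", "distribute", "consume", "share"]


def chance_of_any_error(stage_error_rates):
    """Probability that at least one stage goes wrong, assuming independence."""
    chance_all_fine = 1.0
    for rate in stage_error_rates:
        chance_all_fine *= (1.0 - rate)
    return 1.0 - chance_all_fine


best_case = chance_of_any_error([0.001] * len(STAGES))   # every stage at 0.1%
worst_case = chance_of_any_error([0.01] * len(STAGES))   # every stage at 1%

print(f"best case:  {best_case:.2%}")
print(f"worst case: {worst_case:.2%}")
```

Six stages at 0.1% each come out at roughly 0.6% overall. Six stages at 1% each come out at nearly 6%, which is around one number in seventeen being off before anyone has applied a narrative to it.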

Is the data correct? What is correct? What is?

This doesn’t land well with people on the receiving end of data systems and old maps: accountants doing financial reconciliation on data warehouses, say, or someone who has smashed their fibreglass boat into an uncharted granite rock.8

When a data team gives an indication of uncertainty, this causes thoughts along the lines of YOU CANNOT BE SAYING WHAT YOU ARE SAYING.

The map, still, is not the territory. It is part of building consensus and alignment.

In the same boat

Context isn’t enough, one needs consensus:

Data professionals can build consensus as the company becomes more diverse. Data systems can establish methods for understanding the world even as it becomes more complex. [My emphasis]
— Stephen Bailey again! Perennial Truth Architectures

What that means is that, using context (meaning), the navigator and the team need to build from that shared context towards consensus (an opinion or position reached by a group as a whole).

Without consensus, we get stuck in the wrong quadrant of a 2x2 matrix, where autonomy and indirection lead to the night watch sailing the boat in one direction and then the day watch on handover reversing course and backtracking. Consensus is king.9

Last Words

To wrap things up, the story here is as follows:

Help your team/company by navigating. Reconciling uncertainty is your special responsibility.
Hierarchy and inertia are cultural issues that flummox good intentions and amplify bad data. A purely technical orientation will only get the message so far; a voice and an opinion are necessary to succeed.
Build a shared context, and use it to reach consensus.

Navigation [Data] is easy.
If it wasn't, they wouldn't be able to teach it to Sailors [Business People].
— James Lawrence

Cook Inc team building off-site — Australia

The best list of nautical phrases, some need a fact-check (Windfall)

Accuracy is still not “solved” if you need sub-centimetre precision. Precision is probably another post or line of thinking

Excellent podcast on why product teams need to take more risk, and a great perspective on data vs intuition 20VC: Startups Fail Because They Do Not Take Enough Risk, Why A/B Testing is Inefficient and Slows You Down

So much detail around SS Torrey Canyon running aground: summary details, legal proceedings, excellent podcast

Once we go beyond generalists and into specialists, we start to see the need for all kinds of data archetypes. A few that I’ve plucked from the mind space, in order of likely usefulness from startup to enterprise

Data Navigators - less worried about the truth, more concerned about the objective
Data Plumbers - quality, speed, reliability
Data Journalists - truth-seeking, relentless, individual
Data Librarians - availability, discoverability, comprehensive
Citizen navigators - business users who can navigate data without engineering skills

I’d like to think further on how to matrix these against the Pioneer - Settle - Plan concept.

I deliberately didn’t include considerations around mistakes - which while they should be expected, are their own category.

When asked if I would task the embedding of “data-stack analytics” into a web app, my first question is: how important is accuracy? This is probably too inflammatory, but it ultimately comes to an important point - it is much easier to constrain the possibilities of embedding analytics if the system is one coherent stack, same DB, same framework, same developer.

Introduce an entirely different stack, with different latencies, different processes, a DIFFERENT TEAM, well then the chance for amplifying error increases, just like if you subcontract the printing of your maps to the lowest bidder and they distort the scaling inadvertently to get it to fit into their printer.

Substack had this issue, where they were double counting the readers of posts; presumably the root cause is related.

Crazy replay of this boat running aground, to be fair to that rock, it’s actually an island!

dbt captures an important subtlety on the road to consensus - power. As we’ve seen (in boats above, in varying political systems and in companies we’ve worked for), consensus can be reached in varying ways.

[There are] two paths to growth in an organization that represent two different approaches to truth: the path of power, in which the word of God [CEO] comes down and slowly diverges via apostles [organizational hierarchy]; and the path of consensus, in which multiple humans converge on a truth based on shared principles