"A mathematician is a man who is willing to assume anything except responsibility." (Theodore von Karman)
"Rapid, effective decision making under conditions of uncertainty
whilst retaining Meaningful Human Control (MHC)" is the sort of mantra
associated with Human Machine Teaming (HMT). A purely mathematical
approach to risk and uncertainty is unlikely to match the needs of
real-world operation, as Wall St. has discovered.
So, during the design of a system where the data are potentially incomplete, uncertain, contradictory, etc., how does the designer offer assurance that data quality is being addressed in an appropriate manner? Or are we doomed to crafted systems, accepted on the basis of "trust me"?
Not all forms of
uncertainty should be treated in the same way; this applies to data
fusion, say, and most other tasks. It is my impression that the
literature on data quality and information quality is not being used
widely in the AI, ML, HMT community just now - I'd be delighted to be
corrected on that.
ISO/IEC 25012:2008, "Software Engineering – Software Product Quality
Requirements and Evaluation (SQuaRE) – Data Quality Model",
categorises quality attributes into fifteen characteristics from two
different perspectives: inherent and system-dependent. This
framework may or may not be appropriate to all applications of HMT but
it makes the point that there is more than just "uncertainty". Richard Y
Wang has proposed that "incorporating
quality information explicitly in the development of information
systems can be surprisingly useful" in the context of military image
HMT takes place in the context of Organisational Information Processing.
The good news is that this is quite well-developed for flows within an
organisation (less so for dealing with an opposing organisation). The
bad news is that Weick
is hard work. The key term is equivocality, and I suggest that the HMT
community use it as an umbrella term, embracing 'uncertainty' and other
such parameters. Media richness theory helps.
"A man's gotta know his limitations" (Clint Eastwood). "So does a robot" (BSJ)
A key driver for data quality management is whether a system (or
agent etc.) assumes an open world or a closed one. Closed world
processing has to know the fine details e.g. how a Google self-driving
car interacts with a cyclist
on a fixed-wheel bicycle. By contrast, GeckoSystems takes an open
world approach to 'sense and avoid' and doesn't have to know these fine
details. It would seem that closed world processing needs explicit
treatment of data quality to avoid brittleness.
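To make the contrast concrete, here is a minimal Python sketch. Everything in it is a hypothetical illustration: the `KNOWN_BEHAVIOURS` table, the labels, the thresholds and the three functions are my assumptions, and none of it describes how Google's or GeckoSystems' products actually work. It only shows why a closed-world design is brittle unless it carries some explicit measure of data quality, and why an open-world 'sense and avoid' rule can get by without the fine details.

```python
# Hypothetical table of object classes a closed-world planner was designed for.
KNOWN_BEHAVIOURS = {
    "pedestrian": "yield",
    "car": "follow",
    "cyclist": "overtake_with_clearance",
}


def closed_world_action(label):
    """Closed world: assumes every object it meets is already in the table."""
    # An unlisted label raises KeyError. That failure is the brittleness.
    return KNOWN_BEHAVIOURS[label]


def closed_world_action_with_quality(label, confidence, threshold=0.8):
    """Closed world plus explicit data quality: do not act on unknown or weak labels."""
    # 'confidence' and 'threshold' are assumed quality parameters for illustration.
    if label not in KNOWN_BEHAVIOURS or confidence < threshold:
        return "hand_over_to_human"
    return KNOWN_BEHAVIOURS[label]


def open_world_action(range_m, closing_speed_mps, min_time_s=2.0):
    """Open world: no label needed, just avoid anything that is closing too fast."""
    if closing_speed_mps <= 0:
        return "continue"  # the object is not getting closer
    time_to_contact_s = range_m / closing_speed_mps
    return "avoid" if time_to_contact_s < min_time_s else "continue"
```

A cyclist balancing on a fixed-wheel bicycle might arrive with an unfamiliar label or a low confidence. The first function fails outright, the second hands over to the human, and the third never needed the label in the first place.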
Time flies like an arrow, fruit flies like a banana.
At some point, the parameters acquire meaning, or semantic values.
"We won’t be surfing with search engines any more. We’ll be trawling
with engines of meaning." (Bruce Sterling). The parameters may be
classified on the basis of a folksonomy, or the results of knowledge
elicitation. So far as I can see, the Semantic Revolution has a way to run before achieving dependable performance. Roger Schank has been fairly blunt about the present state of the art. Semantic parameters are likely to have contextual sensitivity, which may be hard to characterise.
If a system is to support human decision making, then it may need to provide information well beyond that required analytically for the derivation of a mathematical solution. Accordingly, the system may need to manage data about the quality of processing.
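As a rough sketch of what "data about the quality of processing" might look like, the Python below carries quality tags alongside each value and has a fusion step report what it did, not just the answer. The `Tagged` record, its field names, the `fuse_mean` function and the five-second staleness limit are all assumptions made up for illustration, and a simple mean stands in for whatever fusion method a real system would use.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tagged:
    """A value plus hypothetical quality tags carried alongside it."""
    value: Optional[float]   # None means the reading is missing
    source: str              # where the value came from
    age_s: float             # how old the reading is, in seconds
    notes: List[str] = field(default_factory=list)  # what processing did to it


def fuse_mean(readings, max_age_s=5.0):
    """Average the usable readings and record what was done along the way."""
    fresh = [r for r in readings if r.value is not None and r.age_s <= max_age_s]
    notes = []
    dropped = len(readings) - len(fresh)
    if dropped:
        notes.append(f"{dropped} of {len(readings)} readings dropped (missing or stale)")
    if not fresh:
        notes.append("no usable input")
        return Tagged(None, "fusion", 0.0, notes)
    values = [r.value for r in fresh]
    notes.append(f"spread of inputs: {max(values) - min(values):.1f}")
    # The fused value is only as fresh as its oldest contributing reading.
    return Tagged(sum(values) / len(values), "fusion",
                  max(r.age_s for r in fresh), notes)


readings = [
    Tagged(10.0, "radar", 1.0),
    Tagged(12.0, "lidar", 2.0),
    Tagged(50.0, "gps", 30.0),   # stale, so it is excluded
]
result = fuse_mean(readings)
print(result.value, result.notes)
```

The analytic answer is just the mean. The notes are what a human decision maker would need in order to judge how far to trust it.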
For robotic state estimation, the user may need more than a point best estimate. Confidence estimates may need to be expressed in operational terms, rather than mathematical ones. Indeed, the HMT may need to reason about uncertainty as much as under uncertainty.
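One way to picture "operational terms" is sketched below in Python. It assumes a one-dimensional Gaussian error on the estimated distance to an obstacle, which is a simplification chosen for illustration. The function names, the advice thresholds (1% and 20%) and the wording of the report are hypothetical, not taken from any fielded system.

```python
import math


def prob_within(distance_mean_m, distance_sigma_m, limit_m):
    """P(true distance < limit), assuming a Gaussian error model."""
    z = (limit_m - distance_mean_m) / (distance_sigma_m * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def operational_report(distance_mean_m, distance_sigma_m, safety_limit_m):
    """Turn a mean and standard deviation into a statement an operator can act on."""
    p = prob_within(distance_mean_m, distance_sigma_m, safety_limit_m)
    # Thresholds below are illustrative assumptions, not validated safety criteria.
    if p < 0.01:
        advice = "clear to proceed"
    elif p < 0.2:
        advice = "proceed with caution"
    else:
        advice = "stop and ask the operator"
    return (f"About {distance_mean_m:.0f} m to obstacle; roughly {p:.0%} chance "
            f"it is inside the {safety_limit_m:.0f} m safety limit: {advice}.")


print(operational_report(20.0, 2.0, 5.0))
print(operational_report(10.0, 2.0, 10.0))
```

The mathematics is the same in both forms. The difference is that the report is phrased around the decision the human has to make, and the thresholds themselves become something the team can reason about.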
This post is scrappy and home-brewed. Suggestions for improvement are welcome. If I am anywhere near right, then the state of the art needs advancing quite swiftly. As a customer, I wouldn't know how to gain assurance that the management of data quality would support safe and effective operation, and as a project manager, I wouldn't know how to offer such assurance.
Update: This is nice on unknown unknowns and approaches to uncertainty in data: