Sunday, 20 December 2015

Human Machine Teaming - Data Quality Management

"A mathematician is a man who is willing to assume anything except responsibility." (Theodore von Karman)

"Rapid, effective decision making under conditions of uncertainty whilst retaining Meaningful Human Control (MHC)" is the sort of mantra associated with Human Machine Teaming (HMT). A purely mathematical approach to risk and uncertainty is unlikely to match the needs of real-world operation, as Wall St. has discovered.

So, during the design of a system where the data are potentially incomplete, uncertain, contradictory etc., how does the designer offer assurance that data quality is being addressed in an appropriate manner? Or are we doomed to craft systems on the basis of "trust me"?

Not all forms of uncertainty should be treated in the same way; this applies to data fusion, say, and most other tasks. It is my impression that the literature on data quality and information quality is not being used widely in the AI, ML, HMT community just now - I'd be delighted to be corrected on that.

ISO/IEC 25012:2008, "Software Engineering – Software Product Quality Requirements and Evaluation (SQuaRE) – Data Quality Model", categorises quality attributes into fifteen characteristics from two perspectives: inherent and system-dependent. This framework may or may not be appropriate to all applications of HMT, but it makes the point that there is more than just "uncertainty". Richard Y Wang has proposed that "incorporating quality information explicitly in the development of information systems can be surprisingly useful" in the context of military image recognition.
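To make "incorporating quality information explicitly" a little more concrete, here is a minimal sketch in Python. It is my own illustration, not anything taken from the standard or from Wang's work. The record `QualifiedReading`, its field names, units and thresholds are all hypothetical. The three attributes are a small subset of the characteristics that, as I read it, the standard treats as inherent (accuracy, completeness, currentness).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualifiedReading:
    """A value that travels with explicit quality metadata (hypothetical schema)."""
    value: float
    accuracy_m: float    # assumed: 1-sigma error in metres
    completeness: float  # assumed: fraction of expected fields present, 0..1
    age_s: float         # assumed: seconds since the measurement was taken


def quality_issues(r, max_error_m=5.0, min_completeness=0.9, max_age_s=2.0):
    """Name each quality problem separately instead of blending them into one score."""
    issues = []
    if r.accuracy_m > max_error_m:
        issues.append("accuracy")
    if r.completeness < min_completeness:
        issues.append("completeness")
    if r.age_s > max_age_s:
        issues.append("currentness")
    return issues


# A reading that is accurate enough, but partial and stale.
reading = QualifiedReading(value=120.0, accuracy_m=1.5, completeness=0.6, age_s=10.0)
print(quality_issues(reading))
```

The point of returning names rather than a single number is that a stale reading and a partial reading call for different responses, which a lumped "uncertainty" figure would hide.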

HMT takes place in the context of Organisational Information Processing. The good news is that this is quite well-developed for flows within an organisation (less so for dealing with an opposing organisation). The bad news is that Weick is hard work. The key term is equivocality, and I suggest that the HMT community use it as an umbrella term, embracing 'uncertainty' and other such parameters. Media richness theory helps.

"A man's gotta know his limitations" (Clint Eastwood). "So does a robot" (BSJ)

A key driver for data quality management is whether a system (or agent etc.) assumes an open world or a closed one. Closed world processing has to know the fine details, e.g. how a Google self-driving car interacts with a cyclist on a fixed-wheel bicycle. By contrast, GeckoSystems takes an open world approach to 'sense and avoid' and doesn't have to know these fine details. It would seem that closed world processing needs explicit treatment of data quality to avoid brittleness.
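The distinction can be sketched in a few lines of Python. This uses the terms in their usual knowledge-representation sense, which may be narrower than the way either of the systems mentioned above actually works. I am not describing their implementations. The knowledge base and labels are invented for illustration.

```python
from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"


# Hypothetical knowledge base of things the system has been told are obstacles.
KNOWN_OBSTACLE_TYPES = {"car", "pedestrian", "cyclist"}


def closed_world_is_obstacle(label):
    # Closed world: anything absent from the knowledge base is taken to be false.
    return Answer.YES if label in KNOWN_OBSTACLE_TYPES else Answer.NO


def open_world_is_obstacle(label):
    # Open world: absence from the knowledge base means "not known", not "no".
    return Answer.YES if label in KNOWN_OBSTACLE_TYPES else Answer.UNKNOWN


label = "cyclist doing a track stand"  # a case nobody wrote down in advance
print(closed_world_is_obstacle(label).value)
print(open_world_is_obstacle(label).value)
```

The closed world version gives a confident "no" for a case it has never seen, which is the brittleness in question. The open world version at least admits it does not know, leaving room for a cautious default.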

Time flies like an arrow, fruit flies like a banana.

At some point, the parameters acquire meaning, or semantic values. "We won’t be surfing with search engines any more. We’ll be trawling with engines of meaning." (Bruce Sterling). The parameters may be classified on the basis of a folksonomy, or the results of knowledge elicitation. So far as I can see, the Semantic Revolution has a way to run before achieving dependable performance. Roger Schank has been fairly blunt about the present state of the art. Semantic parameters are likely to have contextual sensitivity, which may be hard to characterise.

If a system is to support human decision making, then it may need to provide information well beyond that required analytically for the derivation of a mathematical solution. Accordingly, the system may need to manage data about the quality of processing. For robotic state estimation, the user may need more than a point best estimate. Confidence estimates may need to be expressed in operational terms, rather than mathematical ones. Indeed, the HMT may need to reason about uncertainty as much as under uncertainty.
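As a rough sketch of what "operational terms" might mean, suppose a robot's estimate of the distance to an obstacle is Gaussian. That assumption is mine, as are the function names, the 5 m safe distance and the probability thresholds. The same point estimate can then lead to quite different advice depending on its spread.

```python
import math


def prob_closer_than(mean_m, sigma_m, threshold_m):
    """P(distance < threshold) for a Gaussian estimate of distance."""
    z = (threshold_m - mean_m) / (sigma_m * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def operational_advice(mean_m, sigma_m, safe_m=5.0):
    """Turn a mean and standard deviation into a word an operator can act on."""
    p = prob_closer_than(mean_m, sigma_m, safe_m)
    if p < 0.01:      # assumed threshold
        return "clear"
    if p < 0.20:      # assumed threshold
        return "caution"
    return "stop"


# Same best estimate (8 m), three different levels of confidence.
for sigma in (0.5, 2.0, 4.0):
    print(sigma, operational_advice(8.0, sigma))
```

A display that showed only "8 m" would treat all three cases alike. Here the spread changes the advice from "clear" through "caution" to "stop".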

This post is scrappy and home-brewed. Suggestions for improvement are welcome. If I am anywhere near right, then the state of the art needs advancing quite swiftly. As a customer I wouldn't know how to gain assurance that the management of data quality would support safe and effective operation, and as a project manager, I wouldn't know how to offer such assurance.

Update: This is nice on unknown unknowns and approaches to uncertainty in data:
