Scientist or Engineer: Who Cares About Intelligence Measures?

This 1997 paper explores one possible rationale behind attempts to measure intelligence or cognitive ability empirically.

It might be thought that a good general measure of intelligence would be indispensable, for instance, for quantifying the progress of research and development programmes aimed at understanding cognitively sophisticated natural systems and applying their underlying functional principles to develop more capable artificial systems. A good intelligence measure, so the story goes, would underwrite substantive comparisons between Nature’s systems and our own, permitting quantitative analysis of the success of such R&D programmes. The story has a nicely scientific ring to it. But this paper argues that general intelligence measures are largely irrelevant to the ‘R’ component of R&D and much too broad for the ‘D’ component. I believe a better R question is whether the research is good science, while a better D question is whether a given artificial system has applied R insights in such a way as to better meet a particular (rather than general) set of design goals. This paper describes a candidate R&D methodology intended to reflect this belief.

1. Amazing men (and women) and their flying machines

Suppose an imaginary early twentieth century version of BT, the well-diversified British Aeroplanes and Telecoms, embarked on a project to build better aircraft based on biological principles. We might imagine that back in those days, anyone who wanted to could strap a rocket on their back and ‘fly’, and over time rockets had become increasingly large and powerful. But the trouble was that being rocketed from point A to point B wasn’t really flexible enough to allow customers to do what they wanted to do efficiently, such as changing direction in mid-air or even arriving at their final destination in the right number of pieces. Technology for such niceties lagged far behind the technology for building bigger and more powerful rockets. The BAT team set out to create a whole new paradigm for flight, one which took inspiration from real flying creatures in the biological world and which might not involve rockets at all.

The plot is packed with breathtaking twists and turns, but only one small part of the BAT story concerns us just now. In particular, consider how the BAT team might measure their progress at building better flying machines and, indeed, how we might measure the capabilities of different aircraft today. At least two distinct approaches immediately present themselves.

First, we might ask how capable machines or animals are when it comes to the broad task of flying. Which is better at flying, a bat or BAT machine revision 12? Is an F-16 or a paraglider the superior flying apparatus? How about a dragonfly or a space shuttle? A microlight or a flying squirrel? One line of thought in defence of these sorts of questions would be that even though each of these systems is meant to operate in a completely different environment, there must be some common measure which reflects, however crudely, their basic capabilities with respect to the principal subject of research interest: flight. That measure might be complicated and multidimensional, but surely there must be one. Somehow we must be able, so the line goes, to justify such intuitively appealing assertions as ‘a dirigible is more capable of flying than a flea’. Without such a common measure, one might argue, we would have no empirically justifiable handle on whether the BAT team really were building better flying machines.

The trouble is that each of these example questions is difficult even to make sense of, let alone answer.

A second kind of approach eschews the catch-all concept of ‘flying’ altogether and focuses instead on the particular jobs which flying systems do, asking how capable they are when it comes to doing those jobs. Which is better for going faster than Mach 2, an F-16 or a flea? Which is more manoeuvrable at subsonic velocities, a dragonfly or a paraglider? How reliably can a bat fly in the dark, and how does its ability compare with that of a microlight, suitably piloted? The advocate of these sorts of questions would argue that even though they do not permit large scale comparisons between, for instance, the general flying abilities of BAT machine revision 12 and a dragonfly, they allow for a better empirical understanding of how capable systems are when it comes to performing the tasks we really care about: not flying in some all-encompassing general sense, but flying manoeuvrably, or quickly, or upside down.

The advantage of the second approach is that each of these questions admits of a comparatively straightforward answer.
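The contrast between the two approaches can be made concrete with a small sketch (illustrative only: the systems, task names, and figures below are invented for the purpose of the example, not real measurements). Task-specific questions reduce to well-defined comparisons, whereas a single ‘flying ability’ scalar would require an arbitrary weighting of incommensurable tasks.

```python
# Illustrative sketch: task-specific comparisons admit straightforward answers.
# All systems, task names, and figures here are invented for illustration.

capabilities = {
    "F-16":      {"max_speed_mach": 2.0,     "subsonic_turn_rate_deg_s": 20,   "fly_in_dark": True},
    "dragonfly": {"max_speed_mach": 0.00004, "subsonic_turn_rate_deg_s": 2000, "fly_in_dark": False},
    "bat":       {"max_speed_mach": 0.00005, "subsonic_turn_rate_deg_s": 1500, "fly_in_dark": True},
}

def better_at(task, a, b):
    """Compare two systems on one well-defined task: an answerable question."""
    return a if capabilities[a][task] > capabilities[b][task] else b

print(better_at("max_speed_mach", "F-16", "dragonfly"))            # F-16
print(better_at("subsonic_turn_rate_deg_s", "F-16", "dragonfly"))  # dragonfly

# By contrast, a single 'general flying ability' score would need some
# weighting over these incommensurable tasks, and any choice of weights
# is arbitrary -- which is precisely the paper's complaint about
# general intelligence measures.
```

Nothing in the sketch depends on the particular numbers; the point is structural: each per-task comparison is decidable, while no principled function collapses the table into one ranking.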

2. Engineering, science, and bad engineering that sounds like science

The relevant analogy between the BAT R&D programme and the project to build more capable computers by appealing to biological principles is intended to be straightforward. But while the analogy does (hopefully) expose one or two aspects of the relationship between an R&D programme and ways of measuring such a programme’s progress, it is too simple to capture the details of that relationship. In particular, the analogy only reveals the difficulties which may crop up if we ask the wrong sorts of questions when trying to evaluate engineering success (the D part of R&D). It says little about the R part of R&D.

Generalising wildly, one might say that R is science, that D is engineering, and that one should evaluate the progress of each with measures suited to their respective natures. Thus, relevant R questions should indicate scientific success: Are these really general principles of biological cognition? Are these hypothesised general principles of biological cognition consistent with the available neurophysiological evidence? What features of natural cognition do these general principles allow us to predict and/or explain? What empirical evidence would tend to confirm these hypotheses, and what would lead us to reject them? In general, R questions ask how well we understand the object of study and how we could improve that understanding.

Likewise, relevant D questions should indicate engineering success: Does artificial system revision 12 efficiently implement this particular general principle of biological cognition? Does this implementation of that biological principle successfully meet the design criteria we have set? What other biological principles might be added to this implementation to meet those design criteria more successfully? In general, D questions ask how well a particular system does something we’d like it to do and how we could make another system that does it even better.

The question of how intelligent a system is, like the question of how good one is at flying, fits neither category very well and is probably best understood as bad engineering that sounds like science. It is an engineering question, because it asks in a rough way how well a particular system lives up to our ideal. (That is, we would like to have machines that fly, and we would like to have machines that are intelligent.) But it sounds like science, because it looks like we are specifying a well-defined and empirically meaningful property of a system and then setting out to measure it. However, as I have argued in the short companion paper ‘“Intelligence” is a Bad Word’, ‘intelligence’ in any general sense is neither a well-defined nor an empirically meaningful property of a system. And as I have argued in ‘Information Theoretically Attractive General Measures of Processing Capability, and Why They are Undecidable’, aiming instead for a well-defined notion of information processing capability means living within absolute logical barriers which severely constrain the empirical verifiability of claims about information processing.

3. An alternative methodology

Taking the view that we are really interested in particular abilities rather than general measures and that R&D can be split roughly into science and engineering, I suggest the following as one very coarse-grained way to look at the underlying methodology of an R&D programme to build better computers using biological principles. (Because I never can remember the difference between a diamond and an oval in flow-chart speak, I’ve gone for the old 1-2-3 list…)

Step 1: Formulate a testable hypothesis stating a general principle of biological cognition.

This may be empirically driven, in which case the aim is to extract an inductive rule from a body of data, or theory driven, in which case the aim is to fit an existing partially formed cognitive theory to empirical data.

Step 2: Evaluate: is this general principle consistent with available evidence?

Is this general principle consistent with the available neurophysiological, psychological, and behavioural data, plus any established results from real robotic tests?

- If YES, proceed to Step 3; otherwise, go back to Step 1.

Step 3: Implement this general principle in a particular robotic or other artificial system.

In other words, shift from scientist to engineer.

Step 4: Evaluate: does this robotic system display behaviours akin to those which the general principle hypothesised in Step 1 is intended to explain?

- If YES, this is progress! Go back to Step 1 or Step 3, depending on how satisfied we are with the answer to Step 4 and how much we value the relevant behaviours for solving particular problems. Otherwise go on to Step 5.

Step 5: Evaluate: was the lack of appropriate behaviour at Step 4 the result of an engineering failure (bad implementation) or was it evidence to reject the hypothesis formulated at Step 1?

- If engineering failure, then go back to Step 3; otherwise, go back to Step 1.
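The five steps above amount to a control loop, and can be sketched as one (a minimal sketch: every function passed in is a hypothetical placeholder for real scientific and engineering judgement, and only the control flow mirrors the text).

```python
# Minimal sketch of the 5-step loop described above. The callables are
# hypothetical stand-ins for real scientific and engineering work; only
# the branching structure follows the text. For simplicity the sketch
# stops at the first success, whereas the text allows a successful Step 4
# to feed back into Step 1 or Step 3.

def rd_loop(formulate, consistent_with_evidence, implement,
            displays_target_behaviour, engineering_failure,
            max_iterations=100):
    for _ in range(max_iterations):
        hypothesis = formulate()                      # Step 1: general principle
        if not consistent_with_evidence(hypothesis):  # Step 2: check evidence
            continue                                  # ...back to Step 1
        while True:
            system = implement(hypothesis)            # Step 3: scientist -> engineer
            if displays_target_behaviour(system):     # Step 4: behavioural test
                return hypothesis, system             # progress!
            if engineering_failure(system):           # Step 5: blame the build?
                continue                              # ...back to Step 3
            break                                     # ...reject: back to Step 1
    return None
```

Note that nothing in the loop evaluates ‘general intelligence’: every test is either a consistency check against evidence (Step 2) or a check for particular behaviours (Step 4), which is the point made explicitly below.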

Obviously this simplifies the process hugely. Among other little catches, the loop between Steps 3, 4, and 5 is apt to be complicated by the fact that in attempting to implement cognitively sophisticated systems, we may not know in advance what components can be used to implement a given general principle. Science may overtake engineering at those stages when it becomes necessary simply to put things together and see what happens, not because we’re testing for the successful implementation of a general principle, but because we’re testing what the available components can do in the first place. (Some available components, such as VLSI arrays of artificial neurons, have become so complex that we don’t know in advance what can be done with them.) Note also that many general principles will survive Steps 1 and 2 without necessarily reflecting anything about the way the real world works: we might formulate many hypotheses which are consistent with existing data but which are in fact false. However, designing new experiments to test hypotheses’ plausibility in the biological world (as distinct from the artificially designed world) is the job of the natural scientist, not the job of those using biological inspirations to build better computers. Finally, the 5-step plan really should be a 7-step plan for a better fit with BT culture.

But these nuances aside, my purpose in describing the methodology explicitly, even very loosely and at a coarse-grained level, is to show that no step requires questions to be answered about general intelligence or general processing capability. Progress is measured straightforwardly at Step 4 by how well an R&D programme has extracted principles from biological cognition to build better robots or other systems.

Concocting a general intelligence measure is thus by no means indispensable when it comes to gauging the progress of R&D programmes to build more capable computers based on biological principles of cognition. It appears instead to be a poorly formed question of engineering masquerading in scientific garb. Simplifying greatly one last time, I would suggest that arguing over intelligence is much like arguing over life: we don’t (or shouldn’t) care so much about the semantic question of whether a system ought to be called ‘intelligent’ or ‘alive’, but we do (or should) care about whether it does the things that ‘intelligent’ or ‘living’ things do.

This article was written by Dr Greg Mulhauser.
