In my Assignment 2 tutorial my tutor Wendy suggested four particular projects to look at as part of my continuing quest to pin down what aspects of memory interest me. To these I added a fifth, Bate’s Bungled Memories, largely because I came across it at around the same time as the tutorial and in my head I bracket it with the other four. Over this and related blog posts I will look at each of these projects in turn to identify points of potential inspiration and/or to assist me in refining aspects of memory that interest me most.

“The cat sits on the bed”, Pedagogies of vision in human and machine learning (2016) by Nicolas Malevé

This is an article written by Malevé in response to a TED Talk by Professor Fei-Fei Li, director of the Stanford Artificial Intelligence Lab, titled ‘How we’re teaching computers to understand pictures’.

On the surface it’s a curious recommendation by Wendy, as it’s not a photographic project per se, and its relevance to memory wasn’t immediately obvious. However, the notes that accompanied Wendy’s research recommendations shed some light on why she chose this particular text:

“There are several strategies which spring to mind when thinking about remembering (and its mirror image, forgetting). The use of photography itself as an aide memoire or tool for remembering is the most obvious one. It is to the photograph itself that we look most often as an aide in remembering. Then we employ the other senses – smell, touch, taste and hearing (or sound). These other senses are of course more difficult to embody in photography, where we can often only illustrate them (i.e. your graphic interpretation of sound waves into visual form).” (McMurdo 2018)

To quickly summarise the TED Talk before looking at Malevé’s response:

Professor Li explains and illustrates how studying the way humans learn to visually recognise their surroundings led to a breakthrough in visual machine learning – in a nutshell, getting a computer to say what it sees. To achieve this, Li and her team employed thousands of people over several years to categorise millions of photographs, feeding the algorithm with both quality and quantity of labelled images until the machines could recognise and describe basic scenes (e.g. “a cat lying on a bed”).

Malevé’s review of the TED Talk starts by remarking on the similarities between humans and machines in this visual recognition context – but soon moves on to pointing out the significant differences. Firstly, human visual learning is individual while machine learning is crowdsourced (this potentially has parallels with the comparison of individual memory vs collective memory).

Something that he doesn’t mention but sprang to my mind is that the machines inevitably worked only with photographs – two-dimensional rectangular slices of mostly human-selected moments (some were images taken by machines in the first place, e.g. Google Street View). The human visual learning apparatus, by comparison, takes in absolutely everything it sees, in three dimensions and however trivial or repetitive, to build its visual memory bank. We may think that we’re teaching machines what the world looks like, but currently we are actually teaching machines what photographs look like.

More significantly, Malevé makes a point that links to Wendy’s notes above on the limitations of visual-only inputs:

“The computer vision algorithm’s impressive ability to track car plates and brands is counter-balanced by its inability to understand the subtleties of social life and basic human emotions. […] A boy patting an elephant and a man standing next to it do not describe exactly the same thing. What has been lost in translation is what grounds perception in a body: touch and affect.” (Malevé 2016)

Another example given is that whilst a three-year-old child described ‘people going on a big airplane’, the machine simply saw ‘a large airplane on a runway’.

[Image: the ‘big airplane’ photograph from Li’s TED Talk]

The point here is that even at three years old a human subject visually prioritises other humans, even when they take up a minority of the scene. The machine misses that human-centric interpretation. Taking the idea further, a specific individual might describe this photo as ‘the last time I saw my family’. Context is key.

His closing point links back to the title of the piece and concerns the ‘learning’ part of ‘machine learning’. He makes an interesting observation (a side-point for my purposes, but interesting nonetheless) that we currently put more effort into teaching machines how to look at images than we do humans (emphasis in original):

“This vast operation of photographic learning is happening outside of the institutions of education: on our phones, tablets, and computers. And it is not about training students in the art of visual literacy, but machines.” (Malevé 2016)

Main takeaways

As noted above, I wasn’t initially sure that this text related much to memory at all, but further thought has deepened my understanding. It’s still a tangential connection, I think, but an important one. For me Malevé’s key observation is that the purely visual process is a limited one – that simply seeing is not understanding in any meaningful sense. Context, emotions, prior knowledge, memories, biases and more can augment a visual stimulus significantly, rendering the purely visual reading somewhat primitive.

Looking at this from the viewpoint of photography and memory, the critical lesson for me is that looking at a photograph is only a useful aide-memoire if the viewing experience can also take into account a wider web of context, sensations, emotions etc. How the photographer can engineer such wider context in support of an intended message is an interesting challenge.

Relating this train of thought specifically to the lost gloves series that I am working on for Assignment 3, this article has made me realise that there is a risk (hopefully mitigated by my use of text, to be explored later) that a viewer would read the photographs much as a machine would – “that’s a black glove lying on a pavement” – rather than discerning any deeper signification. A viewer, like a machine, might not understand that the aforementioned black glove reminds me of some significant memory lapse in my past.


Malevé, N. (2016) “The cat sits on the bed”, Pedagogies of vision in human and machine learning (accessed 05/06/2018)