Is Google Really a Neural Network?
by Phil Marks
Firstly, what do we mean by Google in this context? For the purposes of this discussion, we
refer to Google as the combination of a search engine and a near-instantaneous result set drawn from website and blog
resources worldwide. The last numbers I saw (Feb 2010) estimated 750 million websites worldwide, plus 200
million blogs. There are of course other domains which Google also scans. Other figures suggest 25 billion indexed
webpages (Netcraft, March 2009).
Here, I use the term neural network not in the strict AI sense, but in a more general
sense.
Now, consider the human brain as I understand it (a very simple model). It has a set of
data inputs (visual, auditory, chemical – taste and smell, pressure – touch, thermal, and inertial – the ear canals;
these are the ones we know of) and a memory structure. Data input is held in short term memory, where it becomes
information (i.e. brain processing adds context); it is then sorted, filtered and moved to long term memory. In this
model, both short and long term memory take the form of synapses (junctions between brain cells). More input in a
given memory area strengthens the relevant synapses. We know that as we age, the more salient memories (stronger
synapses from earlier in our lives) are easier to retrieve, and short term memory becomes less efficient. Our ability
to build new synapses falls off with age in most people. Autonomic responses (e.g. breathing) use ‘hard wired’
circuitry in the brainstem, which is a very primitive part of the brain structure.
So, consider Google to have a set of data inputs – primarily the bot/crawler data
gathering, but also input about the ‘popularity’ of web pages as gathered through use of its search engine by
users. The data from these bots about a given web page (for example, keyword relevance of content and the number of
external links to the page) is converted into Google’s proprietary and secret page rank scores, which provide a
‘salience’ for the analogous, or proxy, Google synapse. The proxy synapse is simply (I assume, as I am not privy to
Google’s design) a database row for the website/page, with the aforementioned data items (including the page
rank/scoring factors) in the columns, along with site map entries, site refresh rate and search history
information.
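To make the ‘proxy synapse’ idea a little more concrete, here is a minimal sketch in Python of what such a
database row might look like. Every field name, and the scoring formula in particular, is my own assumption
for illustration only; Google’s actual schema and ranking calculations are not public, and this is not a
claim about how they work.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import List


@dataclass
class ProxySynapse:
    """One hypothetical 'row' per web page. All field names are assumptions."""
    url: str
    keyword_relevance: float = 0.0   # assumed 0..1 score from crawler content analysis
    external_links: int = 0          # inbound links found by the crawler
    search_clicks: int = 0           # 'popularity' gathered from search engine users
    sitemap_entries: List[str] = field(default_factory=list)
    refresh_rate: timedelta = timedelta(days=7)  # how often the site is re-crawled

    def salience(self) -> float:
        """Toy salience score. This is NOT Google's formula, only a stand-in."""
        return self.keyword_relevance * (1 + self.external_links) + 0.1 * self.search_clicks

    def reinforce(self, new_links: int = 0, new_clicks: int = 0) -> None:
        """More input strengthens the synapse, as in the simple brain model."""
        self.external_links += new_links
        self.search_clicks += new_clicks


# Usage: further crawler and search input raises the salience of the same page.
page = ProxySynapse(url="http://example.com/", keyword_relevance=0.5, external_links=3)
before = page.salience()
page.reinforce(new_links=2, new_clicks=10)
after = page.salience()
```

The point of the sketch is only that repeated input raises the stored score for a page, which is the sense in
which the row behaves like a strengthened synapse.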
Of course the analogy with the human brain breaks down over time, as we would not expect
the Google model to suffer from a capacity limitation or from a constraint imposed by ‘technology’ (as happens with
the brain when we age and the synapse-building processes become less efficient).
So, what use is this analogy to us? Well, consider how we might wish to add to human brain
capacity and extend its efficiency – we are getting into William Gibson territory now (he was the author who
coined the term ‘cyberspace’). Why plug additional memory chips into the brain, when all that is needed is a
connection to Google? Science fiction? I don’t think it is that far away (less than 50 years). The potential social
consequences are quite frightening to consider!
(c) 2010 Phil Marks, PDQ Project Management, www.projectpdq.com