Sunday 31 January 2016

Google's Knowledge Graph a total fruitcake

Many thanks to Emily Scott who alerted me on Twitter to a priceless example of Google Knowledge Graph getting it totally wrong.

For those of you who don't know what the Knowledge Graph is, it is the box that sometimes appears on the right-hand side of your results, which pulls together information on your topic from a variety of sources. To quote Search Engine Land it is "a system that Google launched in May 2012 that understands facts about people, places and things and how these entities are all connected". The problem is that Google quite often gets it wrong, although usually it is just one fact that is incorrect. One of the better-known examples is when Google decided that the American author Robert Greene was born in 1959 but died in 1592. It had confused him with the 16th century English writer of the same name. As I always say in my Google workshops, never trust the information that appears in the Knowledge Graph. The data comes from different sources that may be referring to entities that are not related at all.

The example that Emily encountered, though, is in a league of its own. She was searching for frugivores (fruit eaters) and this is what Google's Knowledge Graph suggested:



As far as I am aware, fruit is not the preferred food of wolves, cats or lions. Clicking on the "View 45+more" option for representative species we see that Google is under the impression that cheetahs, killer whales, polar bears and leopards are also frugivores.



I'll allow raccoons although I wouldn't say that fruit is their preferred food. But, hey, what do I know about raccoons other than that my US friends tell me the little s***s raid trash cans and will eat anything they can get their paws on.

No doubt someone has already reported the error via the feedback link and someone at Google is busy correcting it. Enjoy and take screenshots while it is still there.

Friday 15 January 2016

Guardian's top search tips for Google not quite tiptop

I have just been alerted by fellow search expert Alison McNab to an article by Samuel Gibbs (@SamuelGibbs) in the Guardian on top search tips for Google. I had to double check the date of the article because, although it is OK for the most part, it has got a few things wrong: one of the commands was withdrawn some time ago, and it has missed what I consider to be one of the most important Google search options.

So let's have a look at the tips one by one.

1. Exact phrase

Yes, placing double quote marks around words usually makes Google search for the exact phrase. However, Google does sometimes ignore the quote marks.

2. Exclude terms

Yes, preceding a term with a minus sign will exclude documents containing that term.

3. Either OR

Yes, the OR command does work when searching on alternative terms - most of the time. Make sure the OR is in capital letters.

4. Synonym search

Tilde symbol (~) for a synonym search? No! Google withdrew it over two years ago because not many people used it. Google now looks for synonyms by default. If you precede a term with a tilde Google ignores it and carries on as normal. I've just tried several searches with and without the tilde and got exactly the same results.

5. Search within a site

Yes. The site: command is one of the most powerful advanced search commands and enables you to search within a single site, for example site:www.gov.uk, or a type of site, for example site:ac.uk for UK academic sites.

6. The power of the asterisk

Yes, the asterisk can stand in for one or more terms between two words, for example solar * panels. No, it is not a truncation symbol.

The example given by The Guardian is a search on architect*, which finds "architect, but also architectural, architecture, architected, architecting and any other word which starts with architect." As with synonyms, Google searches for variations on a word by default.

I ran a search on phenobarb* expecting Google to pick up references to phenobarbitone. It picked up 76,000 results including phenobarbital but there was no mention of phenobarbitone in the first 100 documents. Phenobarb without the asterisk picked up exactly the same results. Excluding phenobarbitone by using the minus sign gave me 70,000 results. A search on phenobarbitone, with and without the asterisk, came up with 241,000 results.

7. Searching between two values

Yes. The number range search does work and is great for searching within a range of values or years.  For example:

chocolate consumption forecasts 2016..2020

top 10..100 UK car insurance companies

toblerone 1..5 kg

8. Search for word in the body, title or URL of a page

This covers the commands intext:, intitle: and inurl:. All correct, but intext: is especially useful in that it forces Google to include that term in the search. It is invaluable if you find Google dropping key terms from your strategy, which it does if you are likely to retrieve zero results or it thinks the number of results is too low.

9. Search for related sites

The related: command looks for similar sites, for example related:theguardian.com finds other news organisations. It works but only shows you 20-30 sites. Worthwhile using, though, if you want to broaden your search to other but similar organisations and only have one or two to start with.

10. Combine them

I wholeheartedly agree with this one. Once you have a few advanced commands under your belt you can really start to focus your search and retrieve more relevant results.
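
For anyone who likes to script their searches, here is a rough sketch of how the commands above can be strung together into one query. The function build_query and its parameter names are my own invention for illustration, the excluded term "recipes" is a made-up example, and the https://www.google.com/search?q= address is simply the usual form of a Google results URL rather than any official API.

```python
from urllib.parse import urlencode


def build_query(terms="", phrase=None, any_of=(), must_include=(),
                number_range=None, site=None, exclude=()):
    """Assemble a query string from the commands covered in tips 1-8.

    Hypothetical helper: the parameter names are illustrative only.
    """
    parts = []
    if terms:
        parts.append(terms)
    if phrase:
        parts.append(f'"{phrase}"')            # tip 1: exact phrase
    if any_of:
        parts.append(" OR ".join(any_of))      # tip 3: OR in capital letters
    parts.extend(f"intext:{t}" for t in must_include)  # tip 8: force a term
    if number_range:
        low, high = number_range
        parts.append(f"{low}..{high}")         # tip 7: number range
    if site:
        parts.append(f"site:{site}")           # tip 5: search within a site
    parts.extend(f"-{t}" for t in exclude)     # tip 2: exclude terms
    return " ".join(parts)


query = build_query(
    terms="chocolate consumption forecasts",
    number_range=(2016, 2020),
    site="ac.uk",
    exclude=["recipes"],  # made-up example term
)
print(query)
# Assumed URL form for opening the search in a browser.
print("https://www.google.com/search?" + urlencode({"q": query}))
```

The first line printed is the text you would type into the search box; the second is the same search as a link.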

What's missing?

I'm surprised that filetype: was not included. It is nearly always on the list of top tips that my advanced search workshop participants suggest at the end of the day. It's a quick and easy way of finding presentations (filetype:ppt, filetype:pptx), government documents and research papers (filetype:pdf) and datasets (filetype:xls, filetype:xlsx, filetype:csv).
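
If you run the same topic against several file formats on a regular basis, a few lines of script can generate the searches for you. This is only a sketch: the FILETYPES groupings mirror the examples above, while the function name filetype_queries and the sample topic are hypothetical.

```python
# File extensions grouped by the kind of document they usually hold,
# taken from the examples in the text.
FILETYPES = {
    "presentations": ["ppt", "pptx"],
    "documents": ["pdf"],
    "datasets": ["xls", "xlsx", "csv"],
}


def filetype_queries(topic, kind):
    """Return one query per file extension for the given kind of document."""
    return [f"{topic} filetype:{ext}" for ext in FILETYPES[kind]]


# Hypothetical topic, one search per spreadsheet or CSV format.
for q in filetype_queries("chocolate consumption", "datasets"):
    print(q)
```

Each generated line is a separate search, so there is no space between filetype: and the extension.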

The major omission for me, though, is Verbatim. It is different from the rest in that it is not a command that you can type in. You have to run your search first. From the menu at the top of the results select 'Search tools', followed by 'All results' and then 'Verbatim'. Use this when Google is wreaking havoc on your search by leaving out terms and using weird and wonderful terms that have nothing to do with your subject. Verbatim will search on all of your terms without dropping any or looking for variations and synonyms.

If you are interested in learning more about advanced search in Google and other search tools, some of my past presentations and fact sheets are available at http://rba.co.uk/as/.