Monday, 21 February 2011

Update on coots vs. lions

If you have landed on this page thinking that this is a post about your favourite football or rugby team, please note that this is an update on my earlier article 'Google decides that coots are really lions' (http://www.rba.co.uk/wordpress/2011/02/12/google-decides-that-coots-are-really-lions/). It has nothing to do with sporting activities unless you count trying to work out what Google is doing with your search! The original post was about how and why Google decided that a search on coots mating behaviour should really have been lions mating behaviour.

The first response to my posting was a comment from Arthur Weiss (http://www.rba.co.uk/wordpress/2011/02/12/google-decides-that-coots-are-really-lions/comment-page-1/#comment-14207).
He suggested that Google was treating coots and lions as synonyms (both are living creatures). I thought that was pushing synonyms too far even for Google. (Sorry, Arthur).

I then had two comments in quick succession from Susanna Winter via Twitter (@Mrs_Figaro). The first is at (http://twitter.com/Mrs_Figaro/statuses/36714410223341568):


Moving coots from the beginning to the end of the strategy resulted in an exact match and not a single lion in sight:


Changing the order of the search terms is a trick I often use to change the order of my results or bring up pages that might be buried in the hundreds or thousands, but I have never seen a change as dramatic as this.
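
If you want to try every ordering of a short strategy rather than moving terms by hand, a minimal Python sketch along these lines would list them. The function name reorderings is my own invention for illustration; it has nothing to do with how Google itself processes a search.

```python
from itertools import permutations


def reorderings(strategy: str) -> list[str]:
    """List every distinct ordering of the terms in a search strategy.

    'reorderings' is a hypothetical helper name used only for this sketch.
    """
    terms = strategy.split()
    seen: list[str] = []
    for order in permutations(terms):
        candidate = " ".join(order)
        # Repeated terms (e.g. 'coots coots') produce duplicate orderings.
        if candidate not in seen:
            seen.append(candidate)
    return seen


for variant in reorderings("coots mating behaviour"):
    print(variant)
```

Each line printed is a strategy you can paste into the search box. Keep the strategies short: the number of orderings grows very quickly with the number of terms.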

Susanna's search strategy 'coots feeding behaviour', which came up with an exact match, muddied the waters even more. Perhaps there is a search frequency algorithm coming into play? Are there more searches for lions mating behaviour than for coots, but not lions feeding behaviour? I am not convinced that this explains Google's insistence on looking for lions rather than our animal of choice. Susanna's next tweet suggests what is going on (http://twitter.com/Mrs_Figaro/statuses/36715389190676480):


What you see is:


So Arthur was on the right track. (My apologies, Arthur). What probably happened with our search is, as Susanna said, that Google first assumed a typo and then did a synonym search on cats. What puzzles me, though, is how Google arrived at cats from coots. Surely coyotes or goats would be nearer when it comes to typographical errors?

I have two final variations on our search to confuse you even further.

The first is repeating coots at the start of the strategy. An exact match:


Now move one of the 'coots' to the end of the strategy and Google asks "Did you mean lions mating behaviour coots":


I give up!

Friday, 18 February 2011

Google includes your social circle in search

Google has been including search results from your social circle for quite a while. If you are logged in to a Google account it sometimes includes a box either at the bottom or in the middle of your search results page with a couple of "Results from people in your social circle":


Click on the link and it will show you more from your circle.

If you are using Google.com and open up the search options in the side bar to the left of your results there is a "Social" option that will do exactly the same thing.


Want to know who is in your social circle? Head straight to your Google dashboard at http://www.google.com/dashboard and scroll down to Social Circle. This not only tells you which of your social networks Google is using but also lists who.


Any social networks, such as Twitter, that you have mentioned in, for example, your Blogger profile or your general Google profile will be included, as will contacts in Google Reader, Google Buzz, Google Contacts and Picasa. In addition to your direct connections Google also searches the content of secondary connections that are publicly associated with your direct connections. The A-Z reveals all!

Until now the Social option has been kept separate but Google has started integrating all of these results with the rest of your search starting with Google.com (see Official Google Blog: An update to Google Social Search http://googleblog.blogspot.com/2011/02/update-to-google-social-search.html). So if you are using Google.com and are logged into your Google account you will now start seeing results automatically from your social circle.


Underneath the social search entry it will say something like "Joe Bloggs shared this on Twitter". Hover over this and it tells you how you are connected.

I frequently use social media to search for information and advice but I much prefer to choose when and how to do it. I do not always want Google unilaterally deciding to add opinions from all and sundry in my network. Remember, this is not just the direct connections that you have chosen to make but others who are connected to them and over which you have no control. The only way you can shut this off completely is to log out of your Google account. I will reserve final judgement until I have used it more but after only 2 hours of experimentation I am already finding that some searches are dominated by results from Twitter, Google Reader and Flickr. The results look even more cluttered than before and I fear it gives Google even more opportunity to completely mess up the search.

Wednesday, 16 February 2011

Pancakes and Mash: mashed library unconference - places still available


There are 14 places remaining on Mashed Library "Pancakes and Mash", a library technology un-conference taking place at the University of Lincoln on 8 March 2011 (Shrove Tuesday).

You can claim one of the last 14 places at: http://mashlib.blogs.lincoln.ac.uk/booking/

Booking costs only £12 and will remain open until Monday, 28 February.

There's a growing list of delegates, drawn from a range of academic, public and commercial libraries at http://mashlib.blogs.lincoln.ac.uk/who/

The programme is taking shape and the organisers are confirming speakers/mashup 'experts' for the various workshop sessions. The keynote presentation will be from Gary Green (Technical Librarian, Surrey County Council Library Service) on: "Loud Library Voices: Campaigning, The Web, Journalists & The Offline World".

Further details and programme (in perpetual beta) at http://mashlib.blogs.lincoln.ac.uk/

For photos of previous events search Flickr.com for mashlib, haggisandmash, mashliv, mashspa, chipsandmash or middlemash just for starters!

Saturday, 12 February 2011

Google decides that coots are really lions

First of all let us make sure we all know the difference between lions and coots. As far as I can recall, lions are huge, snarly, growly, land animals that are liable to eat you if you cross their path. This appears to be confirmed by Wikipedia (http://en.wikipedia.org/wiki/Lions) but of course Wikipedia could be wrong. Coots are medium-sized water birds (http://en.wikipedia.org/wiki/Coots) and the worst that could befall you should you antagonise one is a severe pecking.

I was walking by the Thames in Caversham today and took several photos of the birds on the river. One was of two coots who were having what appeared to be a minor domestic or an argument over territory, but a friend suggested to me that what I saw was coot mating behaviour. What do you do in a situation such as this? You Google.

My search on coots mating behaviour came up with:


Where the [expletive deleted] did the lions come from?? I just do not understand how Google managed to replace coots with lions. One is a water bird with wings, feathers, and a beak and the other a large, aggressive land mammal with fur, claws and big teeth. But Google, yet again, has decided to go off and run its own search. (See my posting Oi! Google – you have seriously overstepped the mark http://www.rba.co.uk/wordpress/2011/01/03/oi-google-you-have-seriously-overstepped-the-mark/).

So did I get what I wanted by clicking on "Search instead for coots mating behaviour"? Yes I did, but Google still thinks I really want to search for lions and asks "Did you mean: lions mating behaviour". Google has totally lost the plot.


And the photo that started it all? That can be found on my Flickr account at http://www.flickr.com/photos/rbainfo/5438769506/. I think you will agree that coots are very different from lions (http://commons.wikimedia.org/wiki/File:P_l_Bleyenberghi.jpg)


Monday, 7 February 2011

Company registers: Slovenia and Croatia

I have added two more company registers - Slovenia and Croatia - to my Official Company Registers page at http://www.rba.co.uk/sources/registers.htm.

Slovenia

AJPES - Agency of the Republic of Slovenia for Public Legal Records and Related Services (http://www.ajpes.si/)
AJPES provides access to the Slovenian Business Register online (ePRS), and Annual Reports of companies, cooperatives and sole proprietors (JOLP) free of charge. ePRS currently has around 211,000 entries and includes companies (partnerships and corporations), sole proprietors, legal entities governed by private law, societies, natural persons performing registered or regulated activities, subsidiaries and other divisions of business entities and main offices of foreign business entities.

You can search on a range of criteria: name, identification number, tax number, address, activity, sector and legal form. The amount of information available on each business entity varies depending on its legal form but typically includes identification number, company name, tax number, details of representatives and founders, etc. Credit rating reports (eS.BON) are also available for a fee. The interface to the site is available in English, Slovene, German and Italian. The information is available in English and Slovene. Free registration is required to view the free documents.

Croatia

Biznet (http://www.biznet.hr/) is maintained by the Croatian Chamber of Commerce and provides information and statistics on the Croatian economy, and access to the Register of Business Entities. The Biznet home page is in Croatian and the link to the Register is under Registar poslovnih subjekata. You can view data for a single company or a selected set of companies grouped according to certain criteria, for example industry or company size. Available information includes identification number, short and long name, county, municipality, address, phone, fax, E-mail, company size, main activity, legal form, number of employees and membership of professional associations. The Register is available in both Croatian and English and is free of charge.

Many thanks to Tomaz Lajovic for the information on both services.

Although much of the data on both sites is available in English some of the detailed reports and guides are in the local languages. Google's Chrome automatically offers to translate texts and the Google toolbar for both Internet Explorer and Firefox has a translate option. If you are not allowed to install the toolbar then you can copy and paste the text or the URL of the page you wish to translate into Google's translation tools at http://www.google.co.uk/language_tools. The quality of Google's translations has improved significantly but you should still treat the results with caution. If the information is mission critical then find someone who is fluent in the language concerned to double check Google's effort.

Sunday, 6 February 2011

M&A Portal - free information from BvD

Bureau van Dijk's M&A portal (http://www.mandaportal.com/) is a gateway to news, events, research and analysis on mergers and acquisitions. The information on the portal home page is free of charge and there is a free search option for tracking down deals and rumours contained in BvD's Zephyr database. The deals can be sorted by value, date or status.


Basic information is free of charge but you can purchase the full details from the Zephyr database. The cost of the reports varies depending on the amount and type of information available.

The portal also has M&A news feeds for 36 industry sectors. Registered users - registration is free - can select the sectors that interest them most and create a tailored news tab.

Sunday, 30 January 2011

AROUND: Google proximity search operator

Several people have already blogged about Google's AROUND proximity operator: Digital Inspiration, ResearchBuzz, SearchReSearch and Phil Bradley to name just four. According to SearchReSearch the command has been available for 5-6 years, which raises the question "Why has no-one picked up on it before now?" Could it possibly be because the operator does not do what it says on the tin? Perish the thought and wash my brain out with soap and water for even considering such a thing.

The AROUND command allows you to specify the maximum number of words that separate your search terms. The syntax is firstword AROUND(n) secondword. For example oil AROUND(2) production.
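
If you run a lot of test searches, a short Python sketch like the following can build the strategy and a link to run it. The function names are hypothetical, and the base address and the q parameter are my assumptions about how a Google.co.uk search link is put together rather than anything Google documents for AROUND.

```python
from urllib.parse import urlencode


def around_query(first: str, second: str, n: int) -> str:
    """Build 'first AROUND(n) second'. The operator is written in capitals."""
    if n < 0:
        raise ValueError("n must be zero or greater")
    return f"{first} AROUND({n}) {second}"


def search_url(query: str, base: str = "https://www.google.co.uk/search") -> str:
    """Return a search link for the query.

    The base address and the 'q' parameter are assumptions for this sketch.
    """
    return f"{base}?{urlencode({'q': query})}"


query = around_query("oil", "production", 2)
print(query)
print(search_url(query))
```

The first line printed is the strategy exactly as you would type it into the search box; the second is the same strategy encoded for use in a link.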

The reason I have not commented on AROUND so far is that - how can I put this politely - I am finding it difficult to find a search in which it is of practical value. I shall illustrate with just one of my searches, macular degeneration, but my experiences with other test and "real" searches are similar. When testing search features the relevance of the documents that appear on the first few pages of the results is more important than the number of hits, especially as the latter are often guesstimates from Google and can vary enormously depending on which version of Google you use. Nevertheless, the numbers are interesting even if they only serve to confuse us further and I have included them with the screen shots. All of the following searches were run in Google.co.uk.

Let's kick off with a very basic version of my test search: macular degeneration

Number of results: 7,340,000


The results are relevant and as usual Google appears to be listing first those pages where the terms appear next to one another. If we did want to be more precise and reduce the number we could search for the phrase: "macular degeneration".

Number of results: 1,690,000


Not surprisingly the number of results has been reduced significantly to 1,690,000.

Let us now say that my enquirer has come back with an amendment to the original request. They have been told that there are several forms of macular degeneration, for example macular disciform degeneration, and they want a selection of articles covering as many of them as possible. I have a biomedical background and can easily identify the relevant phrases and run separate searches on them, but what if I didn't have a clue where to start? I could use Google's asterisk (*) between my two terms to stand in for one or more words.

The strategy macular * degeneration gives us a massive 21,500,000 results, far more than our first basic search if the numbers are to be believed.


In just the first 6 results we have picked up vitelliform and disciform degeneration, and more are picked up in the subsequent 20-30 results.

Google's search tips say "If you include * within a query, it tells Google to try to treat the star as a placeholder for any unknown term(s) and then find the best matches." It is not clear from this whether the asterisk stands in for one or more terms. Adding more asterisks to the search does not alter the number of results, which in any case are only an estimate. We do, though, see very different content and now variations on our terms (for example macula) are appearing emboldened in the page summaries.


We could try and force an exact match search by placing a plus sign before macular in our strategy, but let's try and keep this exercise simple.

Now for three searches using AROUND(n). Note that AROUND must be in capital letters, otherwise Google will treat it as just another search term. Specifying the number of separating words as 1, 2 and 3 gave me 1,710,000, 1,710,000 and 1,720,000 results respectively.


The results are very different from the searches incorporating the asterisk, and the results for AROUND(2) and AROUND(3) were identical. Also, it seems that with the AROUND operator Google is giving priority to documents where the terms are a phrase and not separated by any other words. It was only when I reached around the 650th result that I started to see phrases where my two terms were separated by one other word.
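
Checking by eye how many words separate the two terms in each page summary is tedious. A rough Python sketch such as this one would count the gaps in a piece of pasted text. The function names are hypothetical, and it rests on two simplifying assumptions of mine: it only looks for the first term followed by the second, and it counts whole words only. It says nothing about how Google itself measures proximity.

```python
import re


def word_gaps(text: str, first: str, second: str) -> list[int]:
    """For each occurrence of 'first', count the words before the next 'second'.

    Assumption: only 'first ... second' order is checked, not the reverse.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    first, second = first.lower(), second.lower()
    gaps = []
    for i, word in enumerate(words):
        if word != first:
            continue
        for j in range(i + 1, len(words)):
            if words[j] == second:
                gaps.append(j - i - 1)  # words strictly between the two terms
                break
    return gaps


def within(text: str, first: str, second: str, n: int) -> bool:
    """True if the terms appear with at most n words between them."""
    return any(gap <= n for gap in word_gaps(text, first, second))


snippet = "Macular disciform degeneration and macular degeneration"
print(word_gaps(snippet, "macular", "degeneration"))
print(within("macular disciform degeneration", "macular", "degeneration", 0))
```

A gap of 0 means the terms are a phrase; a gap of 1 means one other word sits between them, as in macular disciform degeneration.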

Using just AROUND without any number gave me 1,610,000 results that looked very similar to those obtained with AROUND(1).

Logically, one might think that macular AROUND(0) degeneration would be the same as a search on the phrase "macular degeneration". It isn't!


Not only is the number of results different (AROUND(0) comes back with 4,250,000 compared with 1,690,000 from the phrase search) but so is the content.

Finally, I decided to follow Phil Bradley's lead and see what happens when I try and exclude the phrase from the AROUND(0) search: macular AROUND(0) degeneration -"macular degeneration". I got 43,000 results in which the terms seemed to appear anywhere within the document, in any order and separated by any number of other words.

In conclusion, despite what I said earlier I think AROUND does work but it is difficult to test because Google always seems to give priority to pages in which your terms appear as a phrase and not separated by any other words. Its effect is probably more obvious if you are dealing with a topic that would otherwise return a very small number of results. The ranking and sorting of the results changes significantly, though, when you use AROUND so it might be worth trying if you are fed up with seeing the same documents and sites again and again. In all of the test searches I have carried out so far I still prefer the asterisk, especially if I want to be able to identify expanded phrases quickly and easily. But, as the saying goes, your mileage may vary. Feedback on your own experiences, please.