Wednesday 27 February 2013

Top tips for finding research information

This week I was in Canterbury leading a workshop and discussion on Google and Google Scholar for finding research information. Although the emphasis was on Google we also covered other specialist tools designed to search for scientific and research information. We also had an interesting discussion on h-index, other citation indices and services such as ORCID and ResearchGate. The slides for the session are available on authorSTREAM (http://www.authorstream.com/Presentation/karenblakeman-1706478-google-scholar-research-information/), Slideshare (http://www.slideshare.net/KarenBlakeman/scholar-research-information) and temporarily at http://www.rba.co.uk/as/.

Anyone who has attended one of my workshops knows that I ask the group to propose their top tips at the end of the session. These are the Canterbury group's top 10 tips.

1. What's going on?
Try to find out what's going on behind the scenes and how the different search tools work. For example, Google and Google Scholar are quite different in the way they manage your search. Understanding how they operate means that you can adapt your search strategy accordingly and also manage your expectations; for example, Google Scholar does not use the publishers' metadata so author and date searches are unreliable.

2. Personalisation and 'unpersonalisation'
Google personalises your search based on past activity, who is in your social networks, and a whole host of other 'stuff'. You can quickly 'unpersonalise' your results by using a separate browser window that does not use cookies or your web history as part of the search algorithm.

If you use Chrome as your browser, open what is called an incognito window. In the top right hand corner of your screen there is an icon with three lines. Click on it and from the drop down menu select New incognito window. Alternatively, press Ctrl+Shift+N on your keyboard.

If you use Firefox, from the menu at the top of the screen select Tools followed by Start Private Browsing.

In Internet Explorer select Tools followed by InPrivate Browsing. If you cannot see InPrivate under Tools try looking under the Safety option.

3. Advanced search commands
Use Google advanced commands such as filetype: to focus on PDFs, presentations or spreadsheets containing data, and site: to look for information on just one site or a range of sites such as UK government. Although the advanced search screen has boxes for you to fill in for the commands, the file format or filetype option is limited. It does not include options for the newer Microsoft Office formats such as .pptx and .xlsx. Use filetype: as part of your search strategy, for example:

nasa dark energy dark matter filetype:pptx

Google Scholar commands are more limited - see slide 28 of the presentation.
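As a rough illustration of how the filetype: and site: commands slot into a search, here is a minimal Python sketch that assembles the query text and a search URL. The function names are hypothetical, the https://www.google.com/search?q= URL form is an assumption about how Google accepts queries, and gov.uk is only my guess at a domain covering UK government sites.

```python
from urllib.parse import urlencode


def build_google_query(terms, filetype=None, site=None):
    """Combine plain search terms with optional filetype: and site: commands."""
    parts = [terms]
    if filetype:
        # e.g. pptx or xlsx, which the advanced search screen does not offer
        parts.append(f"filetype:{filetype}")
    if site:
        # a single site or a whole domain (gov.uk is an assumed example)
        parts.append(f"site:{site}")
    return " ".join(parts)


def google_search_url(query):
    """Assumed URL form: the query is passed in the q parameter."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_google_query("nasa dark energy dark matter", filetype="pptx")
print(query)
print(google_search_url(query))
print(build_google_query("biodiesel statistics", site="gov.uk"))
```

The first call reproduces the example search above; the last shows how a site: restriction would be appended in the same way.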

4. intext:
Google automatically looks for variations on your terms and sometimes omits words from your search if it thinks the number of results is too low. Prefixing a term with intext: tells Google that it must be included in your search and exactly as you have typed it in. For example:

UK public transport intext:biodiesel statistics

tells Google that biodiesel must be included in the search and exactly as typed in.
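If you regularly need to force particular terms into a search, the prefixing can be done mechanically. The sketch below uses a hypothetical helper, require_exact, which simply adds intext: to the words you nominate. It only rewrites the text of the query and makes no claim about what Google does with it.

```python
def require_exact(query, required_terms):
    """Prefix each word in `query` that is listed in `required_terms` with intext:."""
    required = {term.lower() for term in required_terms}
    words = []
    for word in query.split():
        if word.lower() in required:
            words.append("intext:" + word)
        else:
            words.append(word)
    return " ".join(words)


print(require_exact("UK public transport biodiesel statistics", ["biodiesel"]))
```

This prints the same query as the example above, with only biodiesel marked as a term that must appear exactly as typed.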

5. Reading Level
Use Reading level if Google is failing to return any research oriented documents for a query. Run the search and from the menu above the results select Search tools, then All results, and then from the drop down menu Reading level. Options for switching between basic, intermediate and advanced reading levels should then appear just above the results. Google does not give much away as to how it calculates the reading level and it has nothing to do with the reading age that publishers assign to publications. It seems to involve an analysis of sentence structure, the length of sentences, the length of the document and whether scientific or industry specific terminology appears in the page.

6. Date options
In Google web search, use the date options in the menus at the top of the results page to restrict your results to information that has been published within the last hour, day, week, month, year or your own date range. Click on Search tools, then Any time and select an option. This works best with news, discussion boards, blogs, and web sites that use blogging software to generate pages, but Google is getting better at identifying the correct date of a web page.

Google Scholar handles publication dates differently. On the results page you can select a date range from the menu on the left hand side of the page. Alternatively, you can run a Google Scholar advanced search and enter your publication years. However, Google Scholar looks for publication years in the area of the document where the date is most likely to be. As a result it may identify a page number or part of an author's address as a year!

7. Google Scholar alerts
Use these with caution: the searches periodically stop without warning, and so have to be set up again, and they sometimes include documents that are several years old. Whatever your search, you can set up an alert by selecting Create alert from the menu on the left hand side of the results page.

If the author has created a profile on Google Scholar, from their profile page you can follow new articles and/or new citations for that author. From past experience I warn you that this is not entirely reliable.


8. Metrics - top publications
Although it claims to search all scholarly literature, Google Scholar does not always cover all of the key journals in a subject area. There is no complete source list but there is a list of top publications for subjects and languages under the ‘Metrics’ link in the upper right hand corner of the Scholar home page.

9. Microsoft Academic Search - visualisations
Microsoft Academic Search (http://academic.research.microsoft.com/) is a direct competitor to Google Scholar. The site is sometimes slow to load and it often assigns authors to the wrong institution. Nevertheless, the visualisations such as the co-author and citation maps can be useful in identifying who else is working in a particular area of research. The visualisations can be accessed by clicking on the Citation Graph image to the left of the search results or author profile.



10. Mednar visual
Deep Web Technologies has developed, in conjunction with various institutions, a number of science and research specific portals, some of which are publicly available. The sources that they cover are different but they all have similar search and display options. Results are automatically ranked by relevance but this can be changed to date, title or author. In addition to the standard relevance ranked list of results, the portals create clusters of topics on the left hand side of the screen. The topics include broad subject headings, authors, publications, publishers, and year of publication and are a useful tool for narrowing down a search. Some of the portals, such as Mednar (http://mednar.com/), offer a clickable 'visual' of topics and sub-topics.

Sunday 24 February 2013

Tweets from the past

Embarrassed by some of your first tweets from 2007? Wish you hadn't got involved in that drunken virtual brawl on Twitter last Christmas? There was a time when you could safely assume that those ramblings would be lost in the mists of Twitter's archive never to be seen again. A search on Twitter would only give the last few days' worth of postings and Google no longer archives the whole of Twitter. True, the Library of Congress does keep copies of every single tweet for posterity but access is only allowed for serious research purposes. So far, the Library has received about 400 inquiries but has not yet been able to provide access (http://blogs.loc.gov/loc/2013/01/update-on-the-twitter-archive-at-the-library-of-congress/). So you can breathe easily again? Unfortunately not.

There are commercial organisations such as Datasift (http://datasift.com/) and Gnip (http://gnip.com/) that charge an arm and a leg for analysing tweets and other social media comments, but the cost puts their services out of the reach of the casual searcher. You may find, though, that your forthright hashtagged tweets at a conference have been recorded for all to see free of charge (Sharing (or Over-Sharing?) at #ILI2012, http://ukwebfocus.wordpress.com/2012/11/02/sharing-or-over-sharing-at-ili2012/). And Twitter, itself, is finally providing access to historical tweets.

You can now download your entire collection of tweets. Go to your Twitter home page, click on the cog wheel in the upper right hand corner and select Settings.


At the bottom of the Settings page is a link to request your archive. You should receive an email a few minutes later with a download link. The file is zipped and once you have unpacked it you can browse your tweets by year and month or search the archive using keywords or hashtags.






I have not been able to work out how often you are allowed to download your archive and, rather annoyingly, there is no top-up option.

Twitter also runs searches on its entire archive - sort of. There is no obvious date option at the moment, not even under advanced search, so it appears to be all or nothing, and it does not give you everything straight away. I thought I would have a look at the tweets on Internet Librarian International 2009, hashtag #ili2009, and was surprised that there seemed to be so few. I scrolled down to the bottom of the results and saw "You’ve reached the end of the Top Tweets for #ili2009" with a link to "View all tweets". Twitter then loaded the remaining tweets as I continued to scroll down the page. About Top Tweets Twitter says:

"We've built an algorithm that finds the Tweets that have caught the attention of other users. Top Tweets will refresh automatically and are surfaced for popularly-retweeted subjects based on this algorithm. We do not hand-select Top Tweets."

There are also links at the top of the results page that enable you to view Top, All, and tweets from just 'People you follow'.


There are in fact advanced search commands that can be used to include a date range in your search (see https://support.twitter.com/articles/71577 for details). Changing my search to #ili2009 since:2009-10-01 until:2009-10-31 did seem to work. I am not convinced, though, that Twitter is giving me everything, even when I choose 'All'. It's a start and long overdue, but I'm not going to abandon my own archiving strategies just yet.
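For anyone who runs this kind of date-limited search often, the since: and until: commands are easy to generate from proper dates. This is a minimal sketch with a hypothetical function name; it only builds the search text in the YYYY-MM-DD form used above and does not call Twitter in any way.

```python
from datetime import date


def twitter_date_query(terms, since, until):
    """Append since: and until: commands (YYYY-MM-DD) to the search terms."""
    if since > until:
        raise ValueError("since must not be later than until")
    return f"{terms} since:{since.isoformat()} until:{until.isoformat()}"


print(twitter_date_query("#ili2009", date(2009, 10, 1), date(2009, 10, 31)))
```

Using date objects rather than typed strings means an impossible date such as 31 September is rejected before the search is ever run.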

Tuesday 5 February 2013

Medicine search on Google

In November of last year Google announced that it was going to start showing a knowledge graph for searches on medicines (Look up medications more quickly and easily on Google, http://insidesearch.blogspot.co.uk/2012/11/look-up-medications-more-quickly-and.html). I am now seeing it in my search results but only on Google.com.

When I search on ibuprofen Google now gives me some key facts on the drug in a box to the right of the standard web results. The information includes indications for use, side effects, brand names, contraindications and other drugs that people also searched for. The sources it uses are the National Library of Medicine, US FDA, DailyMed and Micromedex.


Ibuprofen is the generic name for this painkiller and is one of the names under which it is sold in the UK and many other countries. Searching on the brand name Nurofen, which is not available in the US, brings up web search results with shopping options at the top. There is no knowledge graph this time.

I played around with a few other brand names and found that if it is on sale in the US, for example Motrin, Google is able to identify the active ingredient.

So Google's new medicine search is US-centric: US brand names and US sources of information. It will be interesting to see if and how they roll it out to other countries. Meanwhile, for those of us in the UK, NHS Choices provides better and more detailed information on medicines at http://www.nhs.uk/medicine-guides/, and if you are interested in a drug's physical or chemical properties Chemspider (http://www.chemspider.com/) is a good starting point.

Already appearing in UK Google results is the related medical conditions feature. Type in a symptom and Google lists possible related conditions at the top of the page.


If you are using Google.co.uk or are based in the UK, clicking on any of the conditions in the list brings up content that is UK focused. It will be interesting to see if they do the same with the medicines knowledge graph.

Monday 4 February 2013

Google Scholar author fail

Eight months after setting up my Google Scholar author profile and "claiming" my papers I have received my first alert. If you only use Google Scholar (http://scholar.google.com/) to search for papers you may not be aware that, if you have published papers, you can set up a Google Scholar author profile and add those papers to your profile. Google then creates a page showing a graph of when and how often your papers were cited and generates an h-index and i10-index for you.
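For anyone unsure what those two numbers mean: the h-index is the largest number h such that h of your papers have each been cited at least h times, and the i10-index is the number of your papers with at least 10 citations. The sketch below works both out from a list of citation counts. It follows the standard definitions rather than Google's own code, and the citation counts are made up for illustration.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations):
    """Number of papers with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


# Made-up citation counts for six papers, purely for illustration
counts = [25, 12, 10, 6, 3, 0]
print(h_index(counts))    # 4: four papers have at least 4 citations each
print(i10_index(counts))  # 3: three papers have 10 or more citations
```

Both figures depend entirely on which papers and citations are in the list, which is why gaps or errors in Scholar's database feed straight through to the indices it reports.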


This only covers the papers that Google Scholar has in its database and there are serious gaps in its coverage for some sectors. On the other hand, it does sometimes include articles, web sites and blog postings that are not peer reviewed in the conventional way. This can be a good thing because it may pick up some very useful grey literature. It can be a bad thing because it is possible to fool Scholar into adding a paper of dubious quality by mimicking the structure of an academic paper - title and author names in large font, affiliation, abstract, keywords, list of references etc.

Another feature of Scholar is that you can create alerts for keyword searches, new papers by an author or new citations to their articles. Needless to say I have set up alerts on my own name! Sadly, until last week I had received nothing so had to assume that no-one was interested in or citing my papers. Or perhaps the alerts do not work? Whatever the reason, I was delighted that at last someone had mentioned me in some way in an article. Clicking through to the item, though, led me to Katie Fraser's blog and "Communicating with postgraduate research students: some themes from the library literature" (http://www.chuukaku.com/blog/2013/01/communication-with-pgr.html). Was I mentioned or cited in the posting? No, but my own blog was listed in her blogroll to the left of the article.

Having got over the disappointment I turned my attention to working out why Scholar had picked up this particular post. Why wasn't I receiving alerts every time Katie updated her blog? The answer appears to be at the end of the posting in question: Katie has provided a list of references.


Another factor, I thought, might be that Katie has an author profile and has claimed her papers, but I could not see this posting anywhere in her profile.


On further investigation, and unfortunately for Katie, Google Scholar is unaware that she is the author of this article. According to Scholar, the author is someone called MA Lib.


This was confirmed when I clicked on the 'Cite' option. This presents you with formatted citations that you can cut and paste into an article or import into a bibliography manager. The author is definitely MA Lib.


Google Scholar has failed to recognise Katie Fraser as the author and has decided that the MA Lib link in the side menu of her blog is a person's name. There are many similar examples and it is well known that Scholar is unreliable when it comes to identifying authors. Péter Jacsó has written several articles detailing Scholar's shortcomings in this area (1, 2, 3, 4). Many of his articles are available as pre-prints (5).

What this means for Katie is that although Google Scholar believes her blog posting (6) is worthy of inclusion in its database, it is not listed in her author profile and does not contribute towards her h-index or i10-index. And in case you are wondering, yes, I have appended references to this posting to see if Google regards it as scholarly literature and adds it to the Scholar database.

Update: Katie Fraser has now "claimed" the posting for her profile but the Google Scholar database has not yet been updated to reflect this.

References

(1) Jacsó, Péter. "Metadata mega mess in Google Scholar." Online Information Review 34.1 (2010): 175-191.

(2) Jacsó, Péter. Newswire Analysis: Google Scholar's Ghost Authors, Lost Authors, and Other Problems [Online] 24 September 2009. http://www.libraryjournal.com/article/CA6698580.html [Accessed 4 February 2013]

(3) Jacsó, Péter. "Google Scholar Author Citation Tracker: is it too little, too late?" Online Information Review 36.1 (2012): 126-141.

(4) Jacsó, Péter. "Using Google Scholar for journal impact factors and the h-index in nationwide publishing assessments in academia–siren songs and air-raid sirens." Online Information Review 36.3 (2012): 462-478.

(5) Jacsó, Péter. Savvy Searching Columns, Online Information Review. http://www2.hawaii.edu/~jacso/savvy-mcb.htm [Accessed 4 February 2013]

(6) Lib, M. A. "www.chuukaku.com."