Information available to the researcher was originally created by someone with their own resources and money; it has therefore been paid for. Some data owners realise that after their initial use, the data findings may be of little value, and so results are offered free of charge. This is sometimes done as a public service and sometimes as a public relations gesture. In other cases, owners of information recognise that they have a clear asset and it would be unusual for them to give it away completely free of charge. Further sales take place in many ways; data can be bought directly from the data owner or through another company.
In the online business information segment, many free company information, government information, directory, and news sources have eroded the share taken by fee-based services in a movement that has been called ‘open data’. Open data is defined as ‘data that can be republished without fear of copyright restrictions’. It is particularly evident in government data sources such as those found at http://www.data.gov.uk in Britain and http://www.data.gov in the USA. These data owners are aware that they have so much valuable information that they could not possibly exploit it beyond the ways in which they already use it, and therefore invite third parties to find further uses. One commercial example is to offer smartphone apps with travel details, all derived from government transportation data.
An aggregator collects information from many different sources and might charge different rates for the content of its different databases to reflect the costs faced. Another may charge the same rate, regardless of the source, to create an easy-to-understand pricing structure for clients. Subscriptions can make some information appear to be free but a ‘pay-as-you-go’ option adds to the complexity. Paradoxically, this situation can mean that it is more expensive to buy directly from the company that collected the data to begin with. One thing is certain—the occasional user has reason to be confused and reluctant to invest in expensive services.
David Mort of IRN Research says: ‘The free sources are often used for basic information, or used at the start of an information or research process, with more detailed content obtained from the fee-paid services’.
Compiled by Nigel Bradley 2012.
Sources:
Mort, D. (2003) European online revenues on the rise, Research Information special report, http://www.researchinformation.info/special2003overview.html
Plosker, G. (2004) Making money as an aggregator, Online, March/April, 28(2), http://www.infotoday.com
Temperton, J. (2011) How to explore opendata, Computer Active, 23 June, pp. 52–53, 55.
Questions
1 Why are some data sets free or of low cost?
2 How have pricing structures changed in recent years?
3 Why are governments happy for anyone to take their data free of charge?
Part 2 Data collection
of a marketplace that is not possible from viewing on-screen material. There are several
important libraries in each country that offer a range of services. In the UK, many such archives
are in London. They include the British Library (http://www.bl.uk) and government resources
such as: the Office for National Statistics Service (http://www.statistics.gov.uk), Business Link
(http://www.businesslink.gov.uk), City Business Library (http://www.cityoflondon.gov.uk), and
Westminster Central Reference Library (http://www.westminster.gov.uk). There are also many
specialist libraries run by industry bodies, usually located at trade associations.
Two main means of location should be considered: both human searches and computer
searches. Each method may complement the other and may also provide duplicate
information. Researchers differ in their willingness to initiate their studies using one or the
other. For example, some people prefer to avoid making contact with other people until
documents have been inspected. Others prefer to ask someone to help guide them through
the masses of information available. When faced with many sources of information, people
employ mechanisms such as being selective, ignoring information, and also asking for help
from anyone who may have carried out similar searches before.
The amount of data available from secondary sources is enormous and, as each day
passes, more is added. For this reason, the desk researcher needs to set limits on the various
parameters before research takes place. This is sometimes known as ‘scoping’ or setting the
scope. There are limits that need to be set on:
• Time spent, cost expended
• Number of sources searched
Figure 3.5 The desk research process. Elements shown: set limits on time/cost/other; plan human and documentary searches together; locate people and documents; gain access and capture data; evaluate sources and data; submit regular progress reports; liaise with client.
Chapter 3 Secondary data
• Language to use (e.g. English only)
• Geographical parameters (e.g. UK only)
• Historical parameters (e.g. one year old, up to five years old)
• Format of data (e.g. bound report, online, on disk)
• Methodology used (e.g. quant or qual).
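The limits listed above can be written down explicitly before searching begins. The following is a minimal sketch, assuming a hypothetical list of candidate sources; the class names, field names, and example records are invented for illustration, and only some of the limits (language, geography, age, format, and number of sources) are modelled.

```python
from dataclasses import dataclass


@dataclass
class Scope:
    """Limits agreed before desk research starts (hypothetical fields)."""
    languages: set
    regions: set
    max_age_years: int
    formats: set
    max_sources: int


@dataclass
class Source:
    """One candidate document (hypothetical record)."""
    title: str
    language: str
    region: str
    age_years: int
    fmt: str


def within_scope(source, scope):
    """Return True if the source satisfies every limit in the scope."""
    return (
        source.language in scope.languages
        and source.region in scope.regions
        and source.age_years <= scope.max_age_years
        and source.fmt in scope.formats
    )


def select_sources(candidates, scope):
    """Keep in-scope sources, stopping at the limit on number of sources."""
    selected = [s for s in candidates if within_scope(s, scope)]
    return selected[: scope.max_sources]


# Invented example: English only, UK only, up to five years old.
scope = Scope(
    languages={"English"},
    regions={"UK"},
    max_age_years=5,
    formats={"online", "bound report"},
    max_sources=2,
)

candidates = [
    Source("Report A", "English", "UK", 1, "online"),
    Source("Report B", "French", "France", 2, "online"),
    Source("Report C", "English", "UK", 7, "bound report"),
    Source("Report D", "English", "UK", 4, "bound report"),
    Source("Report E", "English", "UK", 3, "online"),
]

print([s.title for s in select_sources(candidates, scope)])
# ['Report A', 'Report D']
```

Report E is within scope but is dropped because the limit on the number of sources has already been reached, which illustrates how scoping acts as a form of sampling.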
The researcher must attempt to impose some form of sampling on the documentation available. In this case, the sample may be an indication that further investigation is needed in a particular direction. It is impossible to search for the locations of all documents and it is therefore impossible to gain access to all documents. It is feasible to divide the two activities and allocate time (and therefore cost) to each. Once the procedures begin, the time spent on identification and time spent on accessing different resources should be monitored.
Human searches
By human searches, we are referring to the idea of asking someone for directions to sources; it may be that they have the information themselves. This includes visits to general and specialist libraries. The ways to contact a human, in order of efficiency, are by email, by phone, and in person. We are not contacting the human to carry out a qualitative depth interview, or to administer a fully structured questionnaire; we are looking to glean facts or sources for other facts. The person identified will act as a ‘guide’ to show the way, and also to give some interesting facts. The important thing is to find the right human to ask: someone who has already had a need for the information you seek. But who would that be? A journalist? A competitor? A scholar? Does a librarian know? Find the right person and you will save hours of searching archives; that person will be your signpost. Think of the saying ‘A wise man learns from experience’, and then consider this addition, ‘A very wise man learns from someone else’s experience’.
Paradoxically, direct contact with individuals who have knowledge of a particular field comes from looking at published sources. Existing sources are used to identify potential contacts, and expert interviewing then permits the researcher to identify fruitful sources, thereby saving time and money. Initially, it is instinctive for us to contact known people, but the desk researcher must not be afraid to contact strangers. Indeed, Flynn (2005) found, in a study into communication, that university reference librarians are more likely to contact ‘a slight pre-existing acquaintance’ when seeking assistance via email. But Flynn believes that email has made them more likely to contact unacquainted or loosely acquainted peers. The important thing is to plan human searches by identifying relevant people, then their names and contact details, before starting communications. The actual communications should not be as ambitious as a structured questionnaire nor as open as a topic guide; they should enable the ‘respondent’ to cooperate with ease.
Computer searches
Even when faced with the computer, it cannot be denied that human judgement comes into play by acting as a filter to reject, or to include, or even to modify, secondary data to make it useful for decision-making.
The Internet is a series of ‘interconnected networks’ and most of the content of these
networks is available as web pages or files that can be accessed using a standard browser. Desk
research will inevitably start with the Internet, so it is important to be familiar with some of the
devices available (see Figure 3.6).
Directories provide an ordered structure to the many websites in the world. A long-standing
automated example is Yahoo (http://dir.yahoo.com/). This is a logical organisation of data on
the web and can be browsed or searched. The famous Open Directory is created by human
editors and can be found at http://dmoz.org/. Directories are useful because they show similar
services (competitors) together in context.
Search engines provide users with a service of locating and retrieving information from
documents located on the Internet. Search engines can search all media, whether text, sound,
or images. Search engines differ from each other, so if one does not work for a sector, the
user can simply try another (see Table 3.8). The Search Engine Watch site
(http://www.searchenginewatch.com) gives information on the different search engines on offer,
showing their strengths.
Metasearch engines will perform a simultaneous search on several engines: for example,
http://www.search.com/ or http://www.metacrawler.com.
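The idea of a metasearch can be sketched in a few lines. This is an illustration only, assuming two hypothetical engines represented by stub functions that return fixed lists of invented URLs; it does not call any real search service. The same query is sent to each engine at the same time, and the results are merged with duplicates removed.

```python
from concurrent.futures import ThreadPoolExecutor


def engine_a(query):
    """Hypothetical engine: returns a fixed, invented result list."""
    return ["http://example.com/a1", "http://example.com/shared"]


def engine_b(query):
    """Hypothetical engine: returns a fixed, invented result list."""
    return ["http://example.com/shared", "http://example.com/b1"]


ENGINES = [engine_a, engine_b]


def metasearch(query):
    """Send the query to every engine at once and merge the results."""
    with ThreadPoolExecutor(max_workers=len(ENGINES)) as pool:
        # pool.map returns the result lists in the same order as ENGINES.
        result_lists = list(pool.map(lambda engine: engine(query), ENGINES))

    merged = []
    seen = set()
    for results in result_lists:
        for url in results:
            if url not in seen:  # drop duplicates reported by several engines
                seen.add(url)
                merged.append(url)
    return merged


print(metasearch("biotechnology"))
# ['http://example.com/a1', 'http://example.com/shared', 'http://example.com/b1']
```

The page found by both engines appears only once in the merged list, which is the main convenience a metasearch offers over running each search separately.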
Figure 3.6 Popular search engines. Labels shown: Yahoo, Alta Vista, ODP, Bing, and others; search types: web, images, and groups.
Blog search engines will search web-based diaries written by private individuals or companies: for example, http://www.blogsearchengine.com or http://www.google.com/blogsearch.
A slightly different device to the engine is an agent: the search agent allows users to search for information taking specific needs into account: it is a sort of intelligent search engine. This may adjust itself automatically, depending on previous actions carried out by the user, or the user can specify preferences.
Traditionally, Boolean logic operators help in searching databases. Internet searches have become more user-friendly and Boolean logic is largely hidden from the user, but here are a few points for users who may need these tools. The three main Boolean logic operators have three outcomes: one widens the search, one narrows the search, and the last one excludes items. Respectively, they are the words ‘or’, ‘and’, and ‘not’. For example, if the words ‘farmers or doctors’ are inserted into a search engine query, the search engine will find documents where either word (‘farmers’ or ‘doctors’) appears. This will find documents that only feature the word ‘doctors’. It will also find documents that only feature the word ‘farmers’. We have effectively widened the search. If we use the word ‘and’ (or the sign +) with the words ‘farmers’ and ‘doctors’ and insert that into a search engine, then the engine will find only documents where both words appear. It will not find documents that only feature the word ‘doctors’. We have effectively narrowed the search.
If we use the word ‘not’ (or the sign −), as in ‘farmers not doctors’, and insert that into a search engine, then the engine will find documents where the word ‘farmers’ appears but ‘doctors’ does not. It will not find documents that contain the word after ‘not’. Again, we have effectively narrowed the search. The sign * is known as a wildcard. It can be used for clipping (truncating) a word so that variant endings are also found. In summary, to broaden your search, use ‘or’; to narrow your search, use + or −.
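The three operators behave like set operations on the documents that contain each word. The following is a minimal sketch, assuming a small invented collection of four documents; real search engines work on far larger indexes and may interpret the operators differently.

```python
# Toy collection: document number -> text (invented for illustration).
documents = {
    1: "farmers market prices rise",
    2: "doctors report longer hours",
    3: "farmers and doctors meet rural health needs",
    4: "teachers plan new courses",
}


def docs_containing(word):
    """Return the set of document numbers whose text contains the word."""
    return {doc_id for doc_id, text in documents.items() if word in text.split()}


farmers = docs_containing("farmers")
doctors = docs_containing("doctors")

either = farmers | doctors        # farmers OR doctors: widens the search
both = farmers & doctors          # farmers AND doctors: narrows the search
only_farmers = farmers - doctors  # farmers NOT doctors: excludes items

print(sorted(either))        # [1, 2, 3]
print(sorted(both))          # [3]
print(sorted(only_farmers))  # [1]
```

‘Or’ returns the most documents, while ‘and’ and ‘not’ each return a smaller set, which matches the widening and narrowing described above.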
Table 3.8 Different types of search device

Type                  Description                                   Example
Search engine         Searches archives of images, sounds or        http://www.google.com
                      text on a computer network
Blog search engine    Searches web-based diaries written by         http://www.blogsearchengine.com
                      anyone                                        http://www.google.com/blogsearch
Metasearch engines    Seeks results from several search engines     http://www.savvysearch.com
                      simultaneously                                http://www.metacrawler.com
Any thesaurus will provide similar words that can be used in searches. Robot language
translators can convert these into other languages (see Table 3.9).
Ask your question as a question—for example, What is biotechnology?—and place it in
inverted commas (quotation marks), to become: ‘What is biotechnology?’. This will search
for the phrase intact, and if it appears in a page, the answer will probably be somewhere
Common mistakes