To make the search comprehensive, find all the relevant keywords for the topic by identifying:
- different spellings, tenses and word variants of keywords
- related concepts
- names of people or authors associated with these ideas
There are many ways to locate these terms, including:
- recommended readings, textbooks and other review articles that provide an overview of the field of research
- dictionaries, thesauri, handbooks and encyclopedias that provide definitions and general information about topics
- database thesauri or subject headings that tell you which terms are used in the databases and professional literature
- text mining tools that allow you to analyse large amounts of text or information and identify commonly used terms in the field
The process of searching will also reveal more terms that you should add to your list.
Comprehensive vs precise
There needs to be a balance in searching between making the search comprehensive enough to encompass everything on the topic and precise enough to only capture those results that are specifically relevant.
Both approaches have advantages and disadvantages.
| Type of Search | Broad search finds everything on topic | Specific to topic so results are more relevant |
|---|---|---|
| Advantages | Lessens chance of missing relevant papers | Easier to discard irrelevant results |
| Disadvantages | Too much information to process easily; many irrelevant results to discard | Not enough results; many relevant papers missed as topic too narrow |
Increasing the comprehensiveness (or sensitivity) of a search will reduce its precision and will retrieve more non-relevant articles.
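The trade-off can be illustrated with a small calculation. This is a minimal sketch using the usual definitions (precision is the share of retrieved records that are relevant; sensitivity is the share of all relevant records that were retrieved). The record counts are hypothetical and chosen only to show the pattern.

```python
def precision(relevant_retrieved: int, total_retrieved: int) -> float:
    """Share of retrieved records that are relevant."""
    return relevant_retrieved / total_retrieved


def sensitivity(relevant_retrieved: int, total_relevant: int) -> float:
    """Share of all relevant records that the search retrieved."""
    return relevant_retrieved / total_relevant


# Hypothetical assumption: 50 relevant papers exist on the topic.
TOTAL_RELEVANT = 50

# Narrow search: 40 records retrieved, 30 of them relevant.
narrow = (precision(30, 40), sensitivity(30, TOTAL_RELEVANT))

# Broad search: 400 records retrieved, 48 of them relevant.
broad = (precision(48, 400), sensitivity(48, TOTAL_RELEVANT))

print(f"narrow: precision={narrow[0]:.2f}, sensitivity={narrow[1]:.2f}")
print(f"broad:  precision={broad[0]:.2f}, sensitivity={broad[1]:.2f}")
```

In this made-up case the broad search finds almost every relevant paper, but most of what it returns has to be screened out.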
Using text mining to identify keywords
Text mining shows how often terms come up in the literature and helps identify related terms and subject headings that you may not have considered.
Text mining is a process used to look at large amounts of text and find relationships in the results by using computer programs designed to extract and analyse this data.
It is used to categorise information and identify trends and patterns which can be done across large documents or multiple sources (or both).
1. Mining for terms
Use these tools to find related alternative search terms by identifying how often keywords appear and which other terms most often appear with them.
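The general idea behind term mining can be sketched in a few lines of Python. This is an illustration only, not how any of the tools listed here work internally: the record titles, the stopword list and the function names are all hypothetical. It counts how many records each term appears in, then counts which terms appear alongside a chosen keyword.

```python
import re
from collections import Counter

# Hypothetical titles standing in for exported search results.
records = [
    "Physical activity and depression in older adults",
    "Exercise therapy for depression: a randomised trial",
    "Depression, anxiety and exercise in adolescents",
    "Physical activity guidelines for older adults",
]

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "and", "for", "in", "the", "of"}


def tokens(text: str) -> set[str]:
    """Lowercase a record and return its distinct non-stopword terms."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def term_counts(docs: list[str]) -> Counter:
    """Count how many records each term appears in."""
    counts = Counter()
    for doc in docs:
        counts.update(tokens(doc))
    return counts


def co_occurring(docs: list[str], keyword: str) -> Counter:
    """Count terms in the records that also contain `keyword`."""
    counts = Counter()
    for doc in docs:
        terms = tokens(doc)
        if keyword in terms:
            counts.update(terms - {keyword})
    return counts


print(term_counts(records).most_common(5))
print(co_occurring(records, "depression").most_common(3))
```

Terms that often appear with a keyword, such as "exercise" alongside "depression" in this sample, are candidates to add to the search.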
Ovid Reminer Tool
Upload a file of Medline results saved as a CSV or Excel file to analyse for term occurrence. Available fields include Author(s), Title, Source, Subject Headings, Abstract and Publication Type. Note that Embase, PsycINFO and CAB Abstracts are not available through JCU so cannot be used.
PubReMiner
Enter search terms and click the Start PubReMiner button. Note that Boolean search connectors should not be included in the search.
Yale MeSH Analyzer
Enter a list of PubMed identifier (PMID) numbers to see a list of MeSH headings and author keywords. Results can be downloaded to Excel for analysis.
2. Mine within the text
Locate terms within blocks of text (e.g. an article) to find word patterns and frequency. More frequent words are more likely to be relevant to the topic.
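Counting phrases within a single block of text can be sketched as follows. This is a minimal illustration under simple assumptions (lowercased words, phrases of a fixed length, no stopword handling), and the sample text and function name are hypothetical. It does not reproduce the method of any tool in this section.

```python
import re
from collections import Counter


def phrase_frequency(text: str, n: int = 2) -> Counter:
    """Count every run of n consecutive words in the text.

    Punctuation is ignored, so a phrase may span a sentence boundary.
    """
    words = re.findall(r"[a-z']+", text.lower())
    phrases = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return Counter(phrases)


# Hypothetical abstract text used only for illustration.
sample = (
    "Text mining supports systematic reviews. Text mining tools count "
    "terms, and systematic reviews benefit when text mining finds new terms."
)

for phrase, count in phrase_frequency(sample, n=2).most_common(3):
    print(count, phrase)
```

Setting `n=1` gives a single-word count, while larger values of `n` pick up longer phrases.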
TerMine
Enter a block of text or a file and the TerMine system will analyse the text to highlight key terms.
WriteWord Phrase frequency
Enter a block of text to count how often phrases are used in the work. A single word frequency counter is also linked from this page.
MeSH on Demand
MeSH on Demand identifies MeSH® terms in the submitted text (abstract or manuscript). MeSH on Demand also lists PubMed similar articles relevant to the submitted text.
JSTOR Text Analyser
Upload a document into the JSTOR Text Analyser and it will analyse the text to find the key topics and terms used. It then prioritises these terms and uses the ones it deems most important to find similar content in JSTOR. Refine the search results by adding, removing or adjusting the priority of terms.
3. Use visualising tools
These tools present search terms and results visually, for example as networks, Venn diagrams or clusters.
Coremine presents search results as a graphic network that describes relationships discovered through text-mining for a huge range of biomedical terms, including Medical Subject Heading, Gene Ontology, Pharmaceutical, Herbal medicine and Chemical as well as Gene and Protein terms.
PubVenn takes a complex PubMed search and divides it into its constituent parts. It then searches those parts separately and shows the sets of citations they return in a proportionally sized Venn diagram, indicating the amount published on any given topic and the overlap between different concepts.
Vizit is a visual bibliographic search tool that presents results as a visual network of related terms and then links to the relevant literature.
Carrot2 is an open source search results clustering engine. It can automatically organise small collections of documents into thematic categories and display them as folders, circles or foam trees to visualise the results.
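The overlap that a Venn-style display conveys can be approximated with plain set operations. This is a hedged sketch only: the concept names and record identifiers are hypothetical, and it does not query PubMed or reproduce how PubVenn works. It reports how many records each concept returns and how many are shared between each pair of concepts.

```python
# Hypothetical record identifiers returned by each part of a search.
results = {
    "exercise": {101, 102, 103, 104, 105},
    "depression": {103, 104, 105, 106},
    "older adults": {105, 106, 107},
}


def overlap_report(parts: dict[str, set[int]]) -> dict[str, int]:
    """Return the size of each concept and of each pairwise overlap."""
    report = {name: len(ids) for name, ids in parts.items()}
    names = list(parts)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            report[f"{first} AND {second}"] = len(parts[first] & parts[second])
    return report


report = overlap_report(results)
for label, size in report.items():
    print(f"{label}: {size}")

# Records matching all three concepts at once.
all_three = set.intersection(*results.values())
print("all concepts:", sorted(all_three))
```

A small overlap between two large sets suggests that combining those concepts will narrow the search sharply.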
These are just some of the text mining tools available on the web. There is also commercial and free software that can be downloaded and installed. The web pages linked below list yet more tools.
EPC Methods: An Exploration of the Use of Text-Mining Software in Systematic Reviews
Paynter, R., Bañez, L. L., Berliner, E., Erinoff, E., Lege-Matsuura, J., Potter, S., & Uhl, S. (2016). EPC methods: An exploration of the use of text-mining software in systematic reviews. Retrieved from https://www.ncbi.nlm.nih.gov/books/NBK362044/
We acknowledge the Australian Aboriginal and Torres Strait Islander peoples as the first inhabitants of the nation and acknowledge Traditional Owners of the lands where our staff and students live, learn and work.
Except where otherwise noted, this work is licensed under a Creative Commons Attribution-ShareAlike (CC BY-SA) 4.0 International License. Content from this Guide should be attributed to James Cook University Library. This does not apply to images, third party material (seek permission from the original owner) or any logos or insignia belonging to JCU or other bodies, which remain All Rights Reserved.