SELECTING USEFUL KEY WORDS AND PHRASES
Edward G. Gbur, Agricultural Statistics Lab, University of Arkansas, Fayetteville, AR
Bruce E. Trumbo, Department of Statistics, California State University, Hayward, CA
As the number of papers and journals in the statistical sciences grows, a decreasing proportion of papers is discovered by browsers of printed journals, and an increasing proportion is retrieved via bibliographic searches. Perhaps users of CIS/ED are uniquely qualified to understand the importance of selecting useful key words and phrases (KWs).
As an author, you should remember that the main purpose of KWs is to complement title and author information in helping others to retrieve your paper from a printed or computer-searchable bibliography. Try to consider all types of potential readers, and how they might look for papers such as yours. Following the ten suggestions below will increase the chances that your KWs will be used by bibliographers and will be useful to your intended audience.
- Use simple, specific noun phrases. Examples: `Variance estimator', not `Estimate the variance'; `Nonparametric test', not just `Nonparametric'; `Generalized exponential used in epidemiology' is better split into `Generalized exponential distribution' and `Epidemiology'; `Regression' alone may retrieve a dauntingly long list of papers; `Statistic' unmodified is next to useless.
- Do not repeat information already provided in the title. For example, `Weibull distribution' might be a good KW, but not if the title is `On the estimation of lifetimes when the underlying distribution is Weibull'.
- When possible, avoid plurals and possessives. For example, `Chebyshev polynomial' instead of `Chebyshev's polynomials'. But some phrases are usually or inherently plural, such as `Order statistics' and `Multiple comparisons'.
- Try to avoid unnecessary prepositions, especially `of' and `in', except in standard phrases. Examples: `Data quality', not `Quality of data'; `Reliability', not `Theory of reliability'. But `Analysis of variance', `Goodness of fit', and `Errors-in-variables' are standard phrases not to be tampered with.
- Avoid acronyms. `ANOVA', `ARIMA', `SPRT', etc. are widely recognized among statisticians, but an acronym may fall out of favor over the years, and may be unknown to those in related fields or puzzling to beginners. In fact, if you feel that you must use an acronym in your title, try to give the full form as a KW.
- Spell out Greek letters and try to avoid mathematical symbols. There may be a very few ideas so well known by their mathematical notation as to be exceptions. But then try to provide an alternate non-symbolic KW in addition. Remember that it is usually not practical to do a computer search for mathematical notation.
- Include names of people only if they are truly part of established terminology, as in `Pitman efficiency', `Weibull distribution', `Polya urn model', etc. Avoid making a `Smith-Chen-Rao estimator' into a KW if this just refers to something from a paper in your reference section. Do not refer to a result of your own in this way unless the terminology is truly well-established; the paper is already retrievable by your name(s) because you are listed as author(s).
- Where applicable, include crucial mathematical or computer techniques used to derive the results (such as `Generating function', `Chebyshev inequality', or `Monte Carlo'), and the statistical philosophy or approach (such as `Likelihood ratio test', `Bayes estimator', `Fiducial inference', or `Empirical Bayes').
- Where appropriate, note areas of applications (such as `Tumor growth', or `Labor force') and special attributes of the paper (such as a real-world `Dataset' which illustrates the present method and might be used to illustrate others, `Tables' that are of value beyond the exposition of the paper, or a useful `Computer algorithm' provided in the paper).
- Be especially alert to include alternate terminology. If a concept is, or has been, known by several terminologies, include any KW that might help a user doing a search across a span of time or from outside your subspecialty. This is perhaps the most difficult and the most important of the guidelines. Its implementation requires a wider view, and often benefits from consultation with knowledgeable colleagues.
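A few of the suggestions above are mechanical enough to be checked automatically. The following Python sketch is offered only as an illustration, not as a tool used by the authors or by CIS: the function name `check_keyword`, the warning labels, and the short word lists are all hypothetical, and the checks cover only title repetition, possessives, the prepositions `of' and `in', and acronyms. The remaining suggestions, especially the one on alternate terminology, call for human judgment.

```python
import re

# Hypothetical, deliberately small lists taken from the examples above;
# a real checker would need much fuller lists.
STANDARD_PHRASES = {"analysis of variance", "goodness of fit", "errors-in-variables"}
PREPOSITIONS = {"of", "in"}


def check_keyword(keyword, title):
    """Return a list of warning labels for one candidate key word or phrase."""
    warnings = []
    kw = keyword.strip().lower()
    words = re.findall(r"[a-z']+", kw)

    # Do not repeat information already provided in the title:
    # warn when every non-preposition word of the KW occurs in the title.
    title_words = set(re.findall(r"[a-z']+", title.lower()))
    content_words = [w for w in words if w not in PREPOSITIONS]
    if content_words and all(w in title_words for w in content_words):
        warnings.append("repeats title")

    # Avoid possessives.
    if any(w.endswith("'s") for w in words):
        warnings.append("possessive")

    # Avoid `of' and `in', except in standard phrases.
    if kw not in STANDARD_PHRASES and any(w in PREPOSITIONS for w in words):
        warnings.append("preposition")

    # Avoid acronyms: any run of two or more capitals in the original spelling.
    if re.search(r"\b[A-Z]{2,}\b", keyword):
        warnings.append("acronym")

    return warnings


if __name__ == "__main__":
    title = "On the estimation of lifetimes when the underlying distribution is Weibull"
    for kw in ["Weibull distribution", "Quality of data", "Analysis of variance", "ANOVA"]:
        print(kw, "->", check_keyword(kw, title))
```

A warning from such a script is only a prompt to reconsider a KW. A phrase that is standard but missing from the hypothetical list of standard phrases would be flagged wrongly, so the final decision stays with the author.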
This article is drawn from an article about keywords by the same authors (also available on the Web), which appeared in The American Statistician, February 1995, Vol. 49, pages 29-33.
Last updated: 02/13/2004