Tips - R & Stata & SAS: How to get orthogonal polynomial coefficient/vector/codes
When we use "contrast {lvl #1 #2 #3}" for trend analyses in Stata or other software with unequally spaced levels/categories, we need the orthogonal polynomial coefficients (#1 #2 #3), which are hard to find in books. We can get these coefficients using R, Stata, or SAS; my favorite software for this purpose is R. Below I show examples for each. Note: Stata also has a contrast operator for unequally spaced levels, p. (orthogonal polynomial in the level values), for example, "contrast p.lvl".
R
I mostly use R to get these coefficients:
> cntr <- poly(c(1, 2, 5, 6), 3)  # orthonormal columns: linear, quadratic, cubic
> cntr
Stata
Step 1: create a dataset with one variable [lvl]:
.input lvl
1. 1
2. 2
3. 5
4. 6
5. end
Step 2-b: use 'orthpoly'
.orthpoly lvl, generate(cntr1 cntr2 cntr3) degree(3)
The dataset now contains three new variables: 'cntr1' holds the degree-1 (linear) orthogonal polynomial coefficients, 'cntr2' the degree-2 (quadratic) coefficients, and 'cntr3' the degree-3 (cubic) coefficients.
SAS
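PROC IML;
  lvl = {1, 2, 5, 6};      /* same levels as the R and Stata examples, as a column vector */
  cntrl = ORPOL(lvl, 3);   /* columns: degree 0 (constant), 1 (linear), 2 (quadratic), 3 (cubic) */
  PRINT cntrl;
QUIT;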
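To show what these functions compute, here is a minimal Python sketch, assuming the usual construction: centre the level values, take the powers 1, x, x², x³, and orthonormalise them by Gram-Schmidt, dropping the constant column. The helper name `orthopoly` is hypothetical. Its columns should agree with R's poly() up to the sign of each column; for equally spaced levels 1 to 4 the linear column is proportional to the textbook coefficients (-3, -1, 1, 3).

```python
import math


def orthopoly(levels, degree):
    """Orthonormal polynomial contrast coefficients for (possibly unequally
    spaced) levels, via Gram-Schmidt on the centred powers of the levels.

    Returns `degree` vectors (linear, quadratic, ...), each of unit length,
    summing to zero and mutually orthogonal.
    """
    n = len(levels)
    if degree >= len(set(levels)):
        raise ValueError("degree must be less than the number of distinct levels")
    mean = sum(levels) / n
    centred = [x - mean for x in levels]
    # start the basis with the normalised constant column
    basis = [[1.0 / math.sqrt(n)] * n]
    for d in range(1, degree + 1):
        v = [c ** d for c in centred]
        # modified Gram-Schmidt: subtract the projection on each earlier column
        for b in basis:
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    return basis[1:]  # drop the constant column, as poly() does


if __name__ == "__main__":
    names = ("linear", "quadratic", "cubic")
    for name, coef in zip(names, orthopoly([1, 2, 5, 6], 3)):
        print(name, [round(c, 4) for c in coef])
```

Rescaling a column by a nonzero constant gives the same contrast test, so a rescaled version of a column can also be typed into "contrast {lvl ...}".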
Disclaimer: This blog site is intended solely for sharing of information. Comments are warmly welcome, but I make no warranties regarding the quality, content, completeness, suitability, adequacy, sequence, or accuracy of the information.
Monday, September 19, 2011
Friday, September 16, 2011
Tips: Stata - my first Stata program
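The program tabm runs svy: tabulate of the first variable against each of the remaining variables, passing the requested display options through.
capture program drop tabm
program tabm
    version 12
    // all options are optional and are passed through to svy: tabulate
    syntax varlist(min=2) [if] [in] [, cell count column row se ci ///
        cv percent proportion]
    local varnum : word count `varlist'
    local x : word 1 of `varlist'
    // cross-tabulate the first variable with each remaining variable
    forvalues i = 2/`varnum' {
        local y : word `i' of `varlist'
        svy: tabulate `x' `y' `if' `in', `cell' `count' `column' `row' ///
            `se' `ci' `cv' `percent' `proportion' format(%5.1f)
    }
end
Example:
. tabm sex race5grp diabetes, cell se percent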
Tips - Stata: why do I get an error message when using 'margins' with complex sampling data?
When I use 'margins' with complex sampling data after a logistic regression fitted on a subpopulation:
. svy, subpop(if suball==1): logit arthritis i.diabetes c.age i.sex i.bmi4grp
. margins diabetes, vce(unconditional) post
I sometimes get this error message:
"missing predicted values encountered within the estimation sample r(322)"
The solution is to specify the same subpopulation in the 'margins' command, so that the margins are computed over the subpopulation used in estimation:
. margins diabetes, subpop(if suball==1) vce(unconditional) post
HbA1c: what do the numbers really mean?
The Lancet, Volume 378, Issue 9796, Pages 1068 - 1069, 17 September 2011
The Comment by Shivani Misra and colleagues (April 30, p 1476)[1] addresses the topic of changing the way glycated haemoglobin (HbA1c) is reported from the traditional percentage units (used in the Diabetes Control and Complications Trial [DCCT] and UK Prospective Diabetes Study [UKPDS]) to the International Federation of Clinical Chemistry's (IFCC's) mmol/mol units. This is an important communication. Unfortunately, the Comment contains both misleading and erroneous information.
The remark about “variations of between 3% and 14% being reported” is misleading. The paper cited refers to between-laboratory coefficients of variation obtained from old (1996) data, before implementation of method standardisation by the National Glycohemoglobin Standardization Program (NGSP). Virtually all current methods have coefficients of variation of 5% or less, with some less than 2%.[2]
Moreover, Misra and colleagues advise clinicians not to convert the IFCC mmol/mol results to DCCT-aligned percentage units and claim that “the DCCT-aligned results are now untraceable and cannot be linked… to the original reference measurement, making them effectively meaningless”. This statement is completely incorrect. An established master equation with documented stability, which describes a linear relation between IFCC and NGSP/DCCT units, permits traceability of DCCT results to the IFCC reference system, and allows direct conversion of numbers between the two systems.[3] This is vital to allow health-care providers to compare a patient's HbA1c value to the large body of published outcome data that use DCCT-aligned results.
A third miscommunication is “One untimed… blood sample for diagnosis”. The guidelines[4] recommend that, in the absence of unequivocal hyperglycaemia (an uncommon finding), HbA1c be confirmed by repeat testing. It is essential for the medical community to understand these changes in HbA1c clearly to avoid negatively affecting care of diabetic patients.
We declare that we have no conflicts of interest.
References
1 Misra S, Hancock M, Meeran K, Dornhorst A, Oliver NS. HbA1c: an old friend in new clothes. Lancet 2011; 377: 1476-1477.
2 College of American Pathologists. GH2-A glycohemoglobin participant summary, 2011. Northfield, IL: CAP, 2011.
3 Geistanger A, Arends S, Berding C, et al. Statistical methods for monitoring the relationship between the IFCC reference measurement procedure for hemoglobin A1c and the designated comparison methods in the United States, Japan, and Sweden. Clin Chem 2008; 54: 1379-1385.
4 International Expert Committee. International Expert Committee report on the role of the A1C assay in the diagnosis of diabetes. Diabetes Care 2009; 32: 1327-1334.
The Lancet, Volume 378, Issue 9796, Pages 1069 - 1070, 17 September 2011
HbA1c: what do the numbers really mean? — Authors' reply
We do not believe that we have misled readers. The stated coefficients of variation refer to figures before the National Glycohemoglobin Standardization Program (NGSP) was implemented and were quoted to illustrate the different coefficients of variation in existence at the time of the Diabetes Control and Complications Trial (DCCT). Furthermore, the next paragraph clearly states that “harmonisation of results to DCCT-based calibrants in the 1990s partly alleviated this variation”. Although effective, the NGSP did not provide a reference measurement system, which has been the underlying driving force behind the International Federation of Clinical Chemistry (IFCC) standardisation.
In quoting “the DCCT-aligned results are now untraceable and cannot be linked… to the original reference measurement, making them effectively meaningless”, Randie Little and David Sacks chose to omit the phrase “through successive calibrations”. This statement referred to the use of DCCT-calibrated analysers, which are not in any way linked to the IFCC reference system. This practice would generate untraceable results. The consensus statement[1] clearly indicates that the IFCC reference represents the only valid anchor to standardisation. We acknowledge that the use of the IFCC-NGSP master equation does permit traceability to the IFCC reference system. However, there are some crucial limitations, which underpin our reluctance to encourage physicians to undertake this conversion routinely.
First, although a linear relation exists between the IFCC-standardised and DCCT-aligned results, the latter cannot be considered a “pure” HbA1c measurement.[2] Now that a pure HbA1c standard exists, one must question the validity of continuing to report DCCT-aligned results. To suggest that comparisons to outcome data necessitate interconversion is, in our opinion, ill-considered since the master equation can equally convert targets into new units.
Second, the use of the master equation generates further uncertainty in the derived DCCT-aligned values.[3] Irrespective of whether this is significant, should the use of an equation to derive values from a reference be considered as robust as a system in which an unbroken chain of calibrations links the reference to the designated comparison method?[4]
Third, in the UK, DCCT percentage units will cease to be reported from October, 2011. We therefore actively encourage clinicians to familiarise themselves with the new units now. This is a fundamental course of action to avoid confusion later, which would undoubtedly be detrimental to patients' care.
We accept that a single measurement is not proposed; however, Little and Sacks have misunderstood the message being conveyed. Since guidelines[5] advise repeat testing of an abnormal result by the same method, a second HbA1c measurement in a patient with an interfering factor will simply duplicate the error. It is important for clinicians to understand the limitations of a test, no matter how many times it is repeated.
References
1 Hanas R, John G. 2010 consensus statement on the worldwide standardization of the hemoglobin A1C measurement. Diabetes Care 2010; 33: 1903-1904.
2 European Association for the Study of Diabetes. Report of the ADA/EASD/IDF Working Group of the HbA1c Assay. London, UK, 20 January 2004. http://www.ifcchba1c.net/files/2004_Diabetologia2004_46_R53_54.pdf. (accessed Aug 3, 2011).
3 Geistanger A, Arends S, Berding C, et al. Statistical methods for monitoring the relationship between the IFCC reference measurement procedure for hemoglobin A1c and the designated comparison methods in the US, Japan and Sweden. Clin Chem 2008; 54: 1379-1385.
4 Joint Committee for Guides in Metrology. International vocabulary of metrology—basic and general concepts and associated terms. 3rd edn. http://www.bipm.org/utils/common/documents/jcgm/JCGM_200_2008.pdf. (accessed Aug 31, 2011).
5 WHO. Use of glycated haemoglobin (HbA1c) in the diagnosis of diabetes mellitus: abbreviated report of a WHO consultation. http://www.who.int/diabetes/publications/report-hba1c_2011.pdf. (accessed Aug 31, 2011).
Imperial Healthcare NHS Trust, Charing Cross Hospital, London W6 8RF, UK
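Both letters refer to the IFCC-NGSP master equation, a linear relation between the two unit systems. As a minimal Python sketch, assuming the commonly quoted coefficients NGSP/DCCT (%) = 0.09148 × IFCC (mmol/mol) + 2.152 (check them against the NGSP or IFCC documentation before relying on them), the conversion in both directions is just this line and its inverse. The function names are hypothetical.

```python
# Master equation as commonly quoted (assumption, see lead-in):
#   NGSP/DCCT (%) = SLOPE * IFCC (mmol/mol) + INTERCEPT
SLOPE = 0.09148
INTERCEPT = 2.152


def ifcc_to_ngsp(ifcc_mmol_mol):
    """Convert an IFCC HbA1c value (mmol/mol) to DCCT-aligned percent."""
    return SLOPE * ifcc_mmol_mol + INTERCEPT


def ngsp_to_ifcc(ngsp_percent):
    """Invert the same line: DCCT-aligned percent to IFCC mmol/mol."""
    return (ngsp_percent - INTERCEPT) / SLOPE


if __name__ == "__main__":
    print(round(ifcc_to_ngsp(48), 1))  # 6.5
    print(round(ngsp_to_ifcc(7.0)))    # 53
```

As the authors' reply points out, the same equation can convert clinical targets into the new units instead of converting each patient's result.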
Thursday, September 15, 2011
Bariatric Surgery and Obesity and Diabetes – International Journal of Obesity, 09/2011
Tuesday, September 13, 2011
Poisson regression and related
- Wikipedia. Poisson distribution, Poisson regression, Zero-inflated model, Negative binomial distribution, Exponential family, Count data
- Paul Allison (2012). Do we really need zero-inflated models?
- William Gould (2011). Use Poisson rather than regress; tell a friend
- Richard Williams (2016). Models for count outcomes (pdf)
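For readers who want to see what a Poisson regression fit does, here is a minimal Python sketch with one covariate, assuming the log link log(E[y]) = b0 + b1·x and estimating the coefficients by Newton-Raphson (which, for this canonical link, coincides with IRLS). The function name `poisson_fit` is hypothetical; Stata's poisson command and the other packages above may use different algorithms and convergence rules.

```python
import math


def poisson_fit(x, y, tol=1e-10, max_iter=50):
    """Fit log(E[y]) = b0 + b1*x by Newton-Raphson on the Poisson log-likelihood."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # score vector X'(y - mu)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # information matrix X'WX with W = diag(mu)
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (X'WX) delta = score
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1


if __name__ == "__main__":
    x = [0, 1, 2, 3, 4]
    counts = [1, 2, 2, 4, 6]
    b0, b1 = poisson_fit(x, counts)
    print("intercept", round(b0, 4), "slope", round(b1, 4))
    print("rate ratio per unit of x", round(math.exp(b1), 4))
```

Because the model includes an intercept, the fitted means always sum to the observed total, which is a quick check on any Poisson fit.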
Sunday, September 11, 2011
Multiple Imputation (MI)
- von Hippel (2019): How many imputations do you need?
- SAS (2016): Survey Data Imputation with PROC SURVEYIMPUTE
- A very good classic reference by Donald B. Rubin (1996). Multiple Imputation After 18+ Years
- Formula for Combining Results across the Multiply Imputed Datasets.
- Sterne (2009). Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls
- Kenward (2007). Multiple imputation: current perspectives or full text here
- Graham (2007). How Many Imputations are Really Needed? Some Practical Clarifications of Multiple Imputation Theory
- Books:
- Applied Missing Data Analysis, see Google book as well.
- Applied Survey Data Analysis is one of my favorite books on complex sampling data analysis, with a whole chapter about multiple imputation (Chapter 11). Its teaching style is very intuitive for an epidemiologist without a strong mathematical background.
- Paul Allison: Multiple Imputation for Missing Data: A Cautionary Tale
- Multiple Imputation Online has not been updated for a while, but the website still has some useful links. It lists several software packages for MI analyses but does not include Stata, which is my current first choice.
- SSCC: Multiple Imputation in Stata
- UCLA: Multiple Imputation in Stata
- Stata: How can I combine results other than coefficients in e(b) with multiply imputed data?
- Stata: Multiple imputation in Stata (2010) by Bill Rising. Multiple imputation using Stata’s mi command by Yulia Marchenko
- UCLA: How can I get margins and marginsplot with multiply imputed data?
- UCLA: How can I get margins for a multiply imputed survey logit model?
- Blog: Can 'margins' be used after 'mi estimate' for complex sampled data with 'svy' in Stata?
- Multiple Imputation FAQ by Joe Schafer
- SAS: Multiple Imputation for Missing Data
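The "formula for combining results" in the list above is Rubin's rules. A minimal Python sketch, assuming you already have one point estimate and its squared standard error from each of the m imputed datasets (the function name `rubin_combine` is hypothetical): the pooled estimate is the mean, and the total variance adds the between-imputation variance, inflated by (1 + 1/m), to the average within-imputation variance.

```python
import math
from statistics import mean, variance


def rubin_combine(estimates, variances):
    """Pool m estimates and their squared standard errors with Rubin's rules.

    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputed datasets")
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # within-imputation variance
    b = variance(estimates)      # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b      # total variance
    if b == 0:
        df = math.inf            # no between-imputation variability
    else:
        # Rubin's (1987) degrees of freedom for the t reference distribution
        df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2
    return q_bar, math.sqrt(t), df


if __name__ == "__main__":
    est, se, df = rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
    print(est, round(se, 4), round(df, 4))
```

In practice Stata's mi estimate (or SAS PROC MIANALYZE) does this pooling for you; the sketch only shows the arithmetic behind it.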
Wednesday, September 07, 2011
Writing, Speaking, and Reading
- 7ESL: English As A Second Language
- Paul Brians. Common Errors in English Usage
- Roy Peter Clark. Fifty Writing Tools
- William Strunk Jr. The Elements of Style
- Find the right journal for your manuscript:
- BioSemantics: Journal/Author Name Estimator - a useful biomedical site that may help you find the right journal for your manuscript.
- Springer: Springer Journal Selector.
- BioMed Central: Journal Selector
- Elsevier: Match your abstract to an Elsevier journal
- Elsevier (2014): Publishing Connect Webinar
- Joe Schall. Effective Technical Writing in the Information Age
- George D. Gopen. The Science of Scientific Writing (pdf, ppt)
- Edward Livingston (2012). Publishing in the High-Profile Literature
- Cyberseminar of HSR&D
- The Purdue Online Writing Lab (OWL): Writing with Statistics.
- Shaun Usher. Letters of Note
- How to write personal statements for school
- Writing the Personal Statement - OWL
- Writing Personal Statements for Graduate School (pdf) - University of Washington
- How to Write a Great Statement of Purpose - Vince Gotera
- Personal Statements: General Advice - Dartmouth College
- U.S. Government Printing Office Style Manual
- Zotero is a free reference management system to help you collect, organize, cite, and share your research sources.
- Citing Sources by Duke University
- The Chicago Manual of Style
- Intranet only: CDC Style Guide and Brand Identity Standards
- Dictionary:
- Merriam-Webster online dictionary.
- Dictionary.com and Thesaurus.com: Free Online Dictionary and Thesaurus of Synonyms and Antonyms.
- Stedman’s Online Medical dictionary.
- Dict.cn - Online English-Chinese Dictionary.
- nciku - Online English-Chinese Dictionary; supports handwriting (drawing) input.
- YouDao - Online Dictionary; supports handwriting (drawing) input.
- Urban Dictionary - the dictionary you wrote.
- Google Online Dictionary - Online Multi-language Dictionary.
- Blog: Scientific Communication
- Blog: How to organize a scientific meeting
- WHO (2005). Effective Media Communication during Public Health Emergencies
- Federal Plain Language Guidelines
- NCI: Making Data Talk: A Workbook
Thursday, September 01, 2011
Resampling and Monte Carlo Simulation
- Wikipedia. Monte Carlo method
- Monte Carlo Simulation Tutorial
- Monte Carlo Simulation in Excel: A Practical Guide
- The Basics of Monte Carlo Simulation
- Owen (2013). Monte Carlo theory, methods and examples (pdf)
- Powell. Markov Chains - a visual explanation
- Zhu: Markov Chain Monte Carlo (MCMC) for Computer Vision
- Stata: drawnorm
- SAS: RANDNORMAL function in SAS/IML or PROC SIMNORMAL in SAS/STAT, or macro %MVN
- R!: mvrnorm() function of the MASS package
- Statements
- simulate
- MVN: drawnorm
- putexcel (pdf):
- postfile
- matrix mkmat (pdf) — Convert variables to matrix and vice versa
- Baum. Monte Carlo Simulation in Stata (2007), Simulation for estimation and testing (2013)
- UCLA. Running a simulation using Stata
- How to simulate multilevel/longitudinal data
- Programming and Post-Estimation
- Rick Wicklin.
- Simulating data from common univariate distributions (from his book: Simulating Data with SAS)
- Eight tips to make your simulation run faster
- Simulation in SAS: The slow way or the BY way
- Simulate from the multinomial distribution in SAS
- Alternate ways to simulate multinomial data
- The best articles of 2013: Twelve posts from The DO Loop that merit a second look
- Turn off ODS when running simulations in SAS
- How to generate multiple samples from the multivariate normal distribution in SAS
- Procedures
- PROC MODEL
- MVN: RANDNORMAL function in SAS/IML or PROC SIMNORMAL in SAS/STAT, or macro %MVN
R!
- Robert. Introducing Monte Carlo Methods with R
- Functions
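Several of the links above (drawnorm, RANDNORMAL, PROC SIMNORMAL, mvrnorm) draw from a multivariate normal distribution. A minimal Python sketch of one common approach, assuming a symmetric positive definite covariance matrix: factor it as L·Lᵀ by Cholesky and set x = mean + L·z with z independent standard normals. The helper names are hypothetical, and the packages listed may use a different factorisation internally. A small Monte Carlo check compares the sample mean and covariance with the targets.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L * L^T == a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def draw_mvn(mean, cov, n, rng):
    """Draw n vectors from N(mean, cov) as mean + L z, z ~ iid N(0, 1)."""
    L = cholesky(cov)
    d = len(mean)
    draws = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        draws.append([mean[i] + sum(L[i][k] * z[k] for k in range(i + 1))
                      for i in range(d)])
    return draws


def sample_mean_cov(draws):
    """Sample mean vector and covariance matrix (n - 1 denominator)."""
    n, d = len(draws), len(draws[0])
    m = [sum(row[j] for row in draws) / n for j in range(d)]
    S = [[sum((row[i] - m[i]) * (row[j] - m[j]) for row in draws) / (n - 1)
          for j in range(d)] for i in range(d)]
    return m, S


if __name__ == "__main__":
    rng = random.Random(2011)  # fixed seed so the simulation is reproducible
    sample = draw_mvn([0.0, 1.0], [[1.0, 0.5], [0.5, 2.0]], 20000, rng)
    m, S = sample_mean_cov(sample)
    print("mean", [round(v, 3) for v in m])
    print("cov", [[round(v, 3) for v in row] for row in S])
```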