`In correspondence, Roger Newson (r.newson@imperial.ac.uk) described
``to me a method for computing a CI for a median with weighted,
``clustered, data. With his permission, I am posting his description
``and the do-file which accompanied it. (I have slightly edited both to
``delete some extraneous material.) The do file utilizes Roger's
``program -cendif- (part of -somersd- available from SSC) and his
``commands -keyby- ,-xcontract-, and -expgen- .
`
-Steve
From Roger Newson:

`I would use the following method to make cendif calculate confidence
``intervals for a median (possibly with clusters and/or weights). This
``method involves the concept of between-scenario rank statistics,
``exemplified by the Gini coefficient and the population attributable
``risk. (See Subsection 5.3 of Newson (2006).)
`

`I would use my own expgen package, downloadable from SSC, to
``duplicate every observation in the sample, replacing each observation
``with a pair of identical observations, and generating a copy
``identifier variable named scenario, equal to 1 for the first member
``of a pair and equal to 2 for the second member of a pair. I would
``then set the value of the variable whose median I wanted to know to
``zero in all the second members of all pairs (where scenario==2). I
``would then use cendif, with the options
`
by(scenario) funtype(vonmises)

`and with a cluster() option putting both members of each pair in the
``same cluster, to calculate a median difference between the subsamples
``with scenario==1 and scenario==2. This should definitely produce a
``valid confidence interval, with or without weights, for the median
``difference between individuals sampled from the world as we know it
``(Scenario 1) and a hypothetical fantasy world in which the variable
``whose median we are estimating is always zero (Scenario 2). This
``median difference is (of course) simply the median of the variable
``whose median we wanted to know.
`
To illustrate my method I have run an example do-file, appended.
I have set options
transf(iden) tdist

`which, based on my own simulations so far, are usually a good idea
``when estimating median differences. I also set options
`
by(scenario) funtype(vonmises)

`and the appropriate cluster options. To produce an "unclustered"
``estimate, I have used the option cluster(make), implying 1 cluster
``for each car model. (Therefore, each observation in the old auto data
``corresponds to a cluster of 2 observations in the duplicated
``dataset.) To produce a "clustered" estimate, I have used the option
``cluster(firm), where firm is a derived variable, containing the first
``word of make. This cluster variable implies that we are sampling
``firms from a population of firms, instead of sampling car models from
``a population of car models. For both the clustered and the
``unclustered case, I have produced an unweighted estimate, and also a
``weighted estimate, using pweights to weight non-American cars higher,
``and therefore producing a higher median mpg.
`
References
Newson R. Confidence intervals for rank statistics: Somers' D and
extensions. The Stata Journal 2006; 6(3): 309-334. Download
pre-publication draft from
http://www.imperial.ac.uk/nhli/r.newson/papers.htm
**************************CODE BEGINS**************************
#delim ;
version 10.1;
*
Demo of confidence interval for median using cendif.

` This program uses the SSC packages somersd, keyby, xcontract, and
``expgen.
`*;
clear;
set memory 1m;
sysuse auto, clear;
keyby foreign make;
gene firm=word(make,1);
lab var firm "Firm";
gene pwt=foreign + 0.25*!foreign;
lab var pwt "Probability weight";
desc;
xcontract foreign pwt, list(,);
xcontract firm, list(,);
preserve;
expgen =2, copy(scenario);
replace mpg=0 if scenario==2;
*
Unclustered analyses
*;

`cendif mpg, by(scenario) tdist transf(iden) funtype(vonmises) cluster
``(make);
``cendif mpg [pwei=pwt], by(scenario) tdist transf(iden) funtype
``(vonmises) cluster(make);
`*
Analyses clustered by firm
*;

`cendif mpg, by(scenario) tdist transf(iden) funtype(vonmises) cluster
``(firm);
``cendif mpg [pwei=pwt], by(scenario) tdist transf(iden) funtype
``(vonmises) cluster(firm);
`restore;
exit;
***************************CODE ENDS***************************;
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/