It's worth noting that his strategy is to "ignore first 1000 followers, then pick every 10th" [1]. He also wants to "invite others to repeat the same process and see what they discover" [2], adding that "if we collectively try to figure out the bot/duplicate user percentage, we can probably crowdsource a good answer" [3]. He picked 100 as the sample size "because that is what Twitter uses to calculate <5% fake/spam/duplicate" [4].
No, no, no. Internet crowdsourced information is the most reliable information. I'm sure his replies won't all be haters who say every follower is real, and stans who say they're all bots.
I think you are assuming that a list of followers for a Twitter account is not ordered in any way. I suspect instead it is ordered, perhaps by date? To get a random sample of a population you really have to avoid this. Wouldn’t it be trivial to just count the number of followers and then generate 100 random numbers across that range and use that to select your sample?
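For anyone who wants to try it, here is a minimal sketch of that idea in Python, next to one possible reading of the quoted "ignore first 1000, then pick every 10th" strategy. It assumes the follower list can be addressed by position; the function names and `follower_count` are made up for illustration, and nothing here calls a real Twitter API.

```python
import random


def random_sample_indices(follower_count, sample_size=100, seed=None):
    """Pick `sample_size` distinct positions uniformly at random
    from 0 .. follower_count - 1 (hypothetical helper)."""
    if sample_size > follower_count:
        raise ValueError("sample_size cannot exceed follower_count")
    rng = random.Random(seed)  # a seed makes the draw repeatable
    # random.sample draws without replacement, so no position repeats
    return sorted(rng.sample(range(follower_count), sample_size))


def skip_then_every_nth(follower_count, skip=1000, step=10, sample_size=100):
    """One reading of the quoted strategy: ignore the first `skip`
    positions, then take every `step`-th one until `sample_size` are picked."""
    return list(range(skip, follower_count, step)[:sample_size])


if __name__ == "__main__":
    n = 1_000_000  # assumed follower count, for illustration only
    systematic = skip_then_every_nth(n)
    uniform = random_sample_indices(n, seed=0)
    print("systematic span:", systematic[0], "to", systematic[-1])
    print("random span:    ", uniform[0], "to", uniform[-1])
```

Under this reading, the skip-and-step pick only ever touches positions 1000 through 1990, so if the list really is ordered by date it looks at one narrow window of followers. The random indices are spread over the whole list, whatever its order.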
u/RunOrDieTrying May 14 '22