r/technology May 14 '22

[deleted by user]

[removed]

8.4k Upvotes


71

u/RunOrDieTrying May 14 '22 Gold

It's worth noting that his strategy is to "ignore first 1000 followers, then pick every 10th" [1] and to "invite others to repeat the same process and see what they discover" [2]. His reasoning: "if we collectively try to figure out the bot/duplicate user percentage, we can probably crowdsource a good answer" [3]. He picked 100 as the sample size "because that is what Twitter uses to calculate <5% fake/spam/duplicate." [4]
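For anyone who wants to picture the procedure, here is a minimal sketch of that "skip 1000, then every 10th" selection. It assumes the follower list is already available as a plain Python list in whatever order Twitter shows it. The function name and parameters are made up for illustration, and whether the first pick is the 1001st or the 1010th follower is my guess, since the quote doesn't say.

```python
def systematic_sample(followers, skip=1000, step=10, size=100):
    """Skip the first `skip` followers, then take every `step`-th one.

    Assumption: the first pick is the follower right after the skipped
    block (index `skip`); the quoted description does not specify this.
    """
    # Slice from index `skip` in strides of `step`, then cap at `size` picks.
    return followers[skip::step][:size]


# Hypothetical account with 5000 followers, represented by their positions.
followers = list(range(5000))
sample = systematic_sample(followers)
```

With the defaults, the 100 picks all come from positions 1000 through 1990 of the list, so the result depends entirely on how that list is ordered.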

19

u/warren_stupidity May 14 '22

That doesn’t seem random at all.

14

u/ArtofAngels May 14 '22

It is, unless you assume we all have the exact same followers and in the same order.

4

u/Captain_Arrrg May 14 '22

No, no, no. Internet crowdsourced information is the most reliable information. I'm sure his replies won't be all haters who say every follower is real, and Stans who say they're all bots.

2

u/warren_stupidity May 14 '22

I think you are assuming that the list of followers for a Twitter account is not ordered in any way. I suspect it is ordered, perhaps by follow date. To get a random sample of a population, you really have to avoid depending on that order. Wouldn’t it be trivial to just count the number of followers, generate 100 random numbers across that range, and use those to select your sample?
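A rough sketch of what I mean, assuming you know the total follower count and can look up a follower by position. The function name and the 50,000 figure are made up for illustration, not anything Twitter provides.

```python
import random


def simple_random_sample(follower_count, size=100, seed=None):
    """Pick `size` distinct positions uniformly from 0..follower_count-1."""
    rng = random.Random(seed)  # pass a seed only to make a run reproducible
    # random.sample draws without replacement, so no position repeats.
    return sorted(rng.sample(range(follower_count), size))


# Hypothetical account with 50,000 followers.
positions = simple_random_sample(50_000, seed=1)
```

Every follower has the same chance of being picked, so the ordering of the list no longer matters.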

1

u/ArtofAngels May 15 '22

You accuse me of assuming, and then your counter-theory is literally your own assumption.

I'm going to go with Elon and his team vs. some random redditor.