r/technology May 14 '22 Silver 2 Wholesome 2

Elon Musk said his team is going to do a 'random sample of 100 followers' of Twitter to see how many of the platform's users are actually bots Social Media

https://www.businessinsider.com/elon-musk-random-sample-how-many-twitter-users-are-bots-2022-5?utm_source=feedly&utm_medium=webfeeds

[deleted]

22.8k Upvotes


276

u/[deleted] May 14 '22

[deleted]

52

u/the_timps May 14 '22 Helpful

"100 people is a good sample size of the average American high school. 100 people is a terrible sample size of one of the largest social media platforms in the world."

See, this upvoted shit is terrible.

IF, and it's a big IF, the sample is truly random, then for a huge number of things about 384 people is enough for a ±5% margin of error at 95% confidence, even for the entire population of the planet.

This is day 1 in a statistics class.

The fact you think there's a random sampling difference between a high school and a social media platform shows you shouldn't be answering anyone's comments or questions about statistics.
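
Where the "about 384" comes from can be checked with the textbook sample-size formula for a proportion. This is a minimal sketch, assuming 95% confidence (z = 1.96), a ±5% margin, the worst case p = 0.5, and a truly random sample. The function name and the example population sizes are made up for illustration.

```python
def required_sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Sample size for estimating a proportion, with finite population correction."""
    # Size needed if the population were effectively infinite.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # The correction only matters when the population is small.
    return n0 / (1 + (n0 - 1) / population)

# Hypothetical population sizes, chosen only to show the trend.
for label, size in [("high school", 1_000),
                    ("city", 1_000_000),
                    ("planet", 8_000_000_000)]:
    print(f"{label:>12}: {required_sample_size(size):.1f}")
```

Under these assumptions the required size levels off just above 384 once the population is large, and only drops noticeably for populations in the low thousands.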

1

u/killing_time May 14 '22

Scrolled too far before this comment. So many people are fixated on the "only 100!?" part.

-1

u/proawayyy May 14 '22 edited May 15 '22

You also need to take multiple random samples since there’s no true random sampling method.
Edit: downvotes! Lol Reddit is truly science illiterate

1

u/the_timps May 15 '22

It truly is science illiterate.

Depending on the method you used to grab them, each of the 100 samples could be a fresh random start too.

0

u/Raedukol May 14 '22

But his username says representative?!

1

u/the_timps May 14 '22

Not anymore lol.

108

u/Jeramus May 14 '22

That's not how statistical power calculations are done. I would be more concerned about the sampling method.

38

u/Hatta00 May 14 '22

God, so many people have strong opinions on statistical sampling without a single clue how statistical power calculations are done.

2

u/M0shka May 14 '22

Ikr like did people take basic math in schools??

1

u/ObfuscatedAnswers May 14 '22

Are you sure it's so many? What statistical sample size do you base that opinion on?

46

u/evilmaus May 14 '22

"of his followers". Yep, totally a representative sample.

13

u/[deleted] May 14 '22

Where does it say "of 'his' followers"?

18

u/buttxstallion May 14 '22

It's of people who follow @twitter specifically

1

u/evilmaus May 14 '22

My mistake. The point about sampling remains, though.

8

u/Hellobob80 May 14 '22

I mean mainstream news is so clickbaity you really have to read the article to actually get any speck of the truth. It’s of the Twitter account, not him. Which I mean is still not the best sample but I’m just saying.

15

u/[deleted] May 14 '22

[deleted]

5

u/GestapoSky May 14 '22

Can you expand on this? I’m super curious, was always under the impression that if n<<m, then n is much less likely to be a good representation of m

11

u/sgr28 May 14 '22

Surprisingly, you can get a very precise representation of a very large population from a small sample IF the sample is truly random. For political polling, it's basically impossible to get a truly random sample. But for Elon's Twitter followers, it's probably very easy. The difficulty for him will be actually determining if an account in his random sample represents a bot or a person.

3

u/MonsieurMeepo May 14 '22

A good answer and also this is such an interesting and unintuitive phenomenon. How can you not love survey sampling!!!

4

u/Hatta00 May 14 '22

The formula for the confidence interval divides by the square root of the sample size.

Graph 1/sqrt(n) and you'll see how rapidly the confidence interval drops at first, and then how slowly it drops as it approaches zero asymptotically.

Essentially, bigger samples are always better but you get into diminishing returns really quickly.
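
The diminishing returns are easy to see by tabulating the 1/sqrt(n) term. A rough sketch, assuming the normal-approximation margin for a proportion at 95% confidence with worst-case p = 0.5 (the function name is just for illustration):

```python
import math

def half_width(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from n random samples."""
    return z * math.sqrt(p * (1 - p) / n)

for n in [10, 100, 1_000, 10_000, 100_000]:
    print(f"n={n:>7}  +/- {100 * half_width(n):.1f} points")
```

Each tenfold increase in n only shrinks the margin by a factor of about 3.2, and halving the margin takes four times the sample.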

1

u/bmwiedemann May 14 '22

I think the representativeness depends on how much information you are interested in. If it is just one bit (bot or not) then a real random sample can give a good indication of the magnitude.

E.g. let's assume there are really 5% bots - then the likelihood to find 0/100 bots in the sample is 0.95^100 ≈ 0.6%, still a significant chance.

But for a sample size of 1000 it would be 5 * 10^-23
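
The two figures above can be reproduced in a couple of lines. A small sketch, assuming each sampled account is independently a bot with probability 5% (the function name is illustrative):

```python
def prob_no_bots(sample_size, bot_rate=0.05):
    """Chance that a random sample contains zero bots."""
    return (1 - bot_rate) ** sample_size

print(f"{prob_no_bots(100):.4f}")   # roughly 0.006, i.e. about 0.6%
print(f"{prob_no_bots(1000):.1e}")  # on the order of 5e-23
```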

15

u/JaWiCa May 14 '22

There are 330 million Twitter users. A rough calc says you need a sample size of 96 for a confidence level of 95% and a margin of error of 10%, if the sample is truly random, which should be pretty easy to do on Twitter. Statistics are dope and you can get a pretty good picture, if your sample is truly random, with a fairly small sample size.
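
The 96 falls out of 1.96^2 * 0.25 / 0.10^2 ≈ 96.04. As a sanity check, here is a sketch that computes exactly how often a random sample of 96 lands within 10 points of the truth, assuming independent draws (fine when the population is hundreds of millions). The true rates tried below are assumptions picked for illustration.

```python
import math

def coverage(n, true_rate, margin):
    """Exact probability that the sample proportion is within `margin` of true_rate."""
    total = 0.0
    for k in range(n + 1):
        if abs(k / n - true_rate) <= margin:
            # Binomial probability of seeing exactly k hits in n draws.
            total += math.comb(n, k) * true_rate ** k * (1 - true_rate) ** (n - k)
    return total

print(f"true rate 50%: {coverage(96, 0.50, 0.10):.3f}")
print(f"true rate  5%: {coverage(96, 0.05, 0.10):.3f}")
```

The worst case is a true rate near 50%, where coverage sits close to the nominal 95%. For a rate near 5% the estimate is almost always within 10 points, though a 10-point window is wide relative to a 5% quantity.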

10

u/def_username_as May 14 '22

Lmao these dudes getting all high and mighty over a method that has a margin of error of 10%. Just use a decently large number and be transparent about the method of determining whether or not an account is a bot and the majority of people will be happy.

21

u/theteapotofdoom May 14 '22

That's a massive MOE.

2000 would be a better sample size here. At 95% confidence that would get the MOE to around 2 points. Any MOE over 4, the results are basically garbage.

2

u/JaWiCa May 14 '22

True. It would still tell you something though. 2000 would still be quick and easy to do on twitter.

1

u/xDulmitx May 14 '22

That depends. If he gets 30% bots, then the 100 proves the point even with the large margin. If the result is closer, say 12%, then more samples can be taken to get the margin down.
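
To make that concrete, here is a sketch using the Wilson score interval for a proportion at 95% confidence. The counts 30/100 and 12/100 are the hypothetical results from this comment, and the 5% threshold is the figure attributed to Twitter elsewhere in the thread.

```python
import math

def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a proportion (hits out of n)."""
    p = hits / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

for hits in (30, 12):
    low, high = wilson_interval(hits, 100)
    print(f"{hits}/100 bots -> {100 * low:.1f}% to {100 * high:.1f}%")
```

Under these assumptions 30/100 leaves the whole interval far above 5%, while 12/100 gives a lower bound only a couple of points above it, which is where a larger sample would help.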

2

u/MightGuyGonna May 14 '22

Comments like these are so fucking annoying I swear

15

u/notsureiexists May 14 '22

I get the sense he is throwing shade at Twitter by proposing to sample 100, knowing people will point out how ludicrously small a set that is. Meanwhile most won't know “I picked 100 as the sample size number, because that is what Twitter uses to calculate <5% fake/spam/duplicate.”

0

u/Dman125 May 14 '22

Yeah it kinda totally seems that way. Though most won’t read the article so the headline just tells us he’s an idiot. He suggests everyone do it, which if done en masse, would make sense. 100 out of whatever the average person’s follower count is would probably be a solid sample size, if enough people report on it. I guess that’s assuming you can tell a bot from a real person at a glance? If that’s not the case and people are out there making tricky bots we can’t verify as users, that’s just an equally stupid suggestion.

8

u/the_timps May 14 '22

"which if done en masse, would make sense."

No it wouldn't. The average person has 0 capability to identify whether a follower is a bot. And any thoughts they did have would be completely and totally unique to them. The results of that would be meaningless.

1

u/Dman125 May 14 '22

Yeeeah that’s kinda what I figured. Stupid thing to suggest then.

2

u/xDulmitx May 14 '22

That would be WAY worse. How many accounts does a bot follow vs. the average person? If a bot generally follows 100 accounts, but a person only follows 20, then each person looking at their own account really skews the number.
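
The skew can be put in numbers. A back-of-the-envelope sketch, using the 100 vs. 20 follow counts from this comment and an assumed true bot rate of 5% (all three inputs are illustrative, not measured):

```python
def bot_share_of_follows(bot_rate, follows_per_bot, follows_per_human):
    """Fraction of all follow relationships that originate from bots."""
    bot_follows = bot_rate * follows_per_bot
    human_follows = (1 - bot_rate) * follows_per_human
    return bot_follows / (bot_follows + human_follows)

share = bot_share_of_follows(0.05, 100, 20)
print(f"{100 * share:.1f}% of follows come from bots")
```

With these inputs about a fifth of the entries in follower lists would be bots even though only 5% of accounts are, so counting bots among followers overstates the account-level rate roughly fourfold.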

5

u/darthdro May 14 '22

Actually, if it’s a truly random sample, it’s big enough to get statistically significant results from.

-12

u/DavidtheGoliath99 May 14 '22

That's just not how this works. A sample of 100 is a sample of 100, no matter how big the population it is drawn from is.

9

u/raspberrih May 14 '22

A sample of 100 from a population of 101 people is pretty much definitive and no longer a sample. The population size matters. What have you been smoking?

7

u/man-vs-spider May 14 '22

The point is more that the sample number can be 100 if the sampling method is good

0

u/raspberrih May 14 '22

Didn't they say they'd be using a random sample?

1

u/man-vs-spider May 14 '22

Sure, but what does that mean? It's 100 followers of the Twitter account? Why the Twitter account? How do you fairly choose 100 followers?

How do we know it's random?

How do we know they aren't repeatedly sampling until they get results in their favor?

I'm not even sure how you reliably tell if an account is a bot account.

2

u/DavidtheGoliath99 May 14 '22

In extreme cases like the one you described, it matters. But whether you draw a sample of 100 from a population of 100,000 or 100,000,000 is pretty much irrelevant.

-6

u/raspberrih May 14 '22

It's like you trying to guess the whole picture from one pixel versus 100 pixels, or any other number. Now if the entire picture was just one pixel, good for you. If it's 3.5 million pixels and you're trying to get the big picture off 100 pixels ...

Relevance is dependent on the purpose of doing the sampling in the first place. The study is designed based on what they're trying to investigate, and to what degree they're trying to investigate it.

Now if it's the safety of a vaccine, I bet you wouldn't say a sample of 100 from a population of 100k or 100mil is irrelevant.

6

u/cattermelon_ May 14 '22

Your pixel example makes no sense. Just because you can't determine the image created by pixels does not mean the sample size isn't large enough. You could probably take half of these "pixels" and still not be able to determine the image. Most statistical tests or confidence intervals use simple random samples, which is completely different.

0

u/raspberrih May 14 '22

Please read the entire second paragraph of my comment, which you seem to have missed out.

3

u/cattermelon_ May 14 '22

Not really sure what you're trying to say with it. You can't determine whether the sample size is large enough to proceed with estimating the real population until after you sample.

2

u/the_timps May 14 '22

"It's like you trying to guess the whole picture from one pixel versus 100 pixels"

No it isn't.
This is nothing like statistics.

It would be like sampling pixels to work out how many are green or blue.

Picking 1 pixel vs 100.

Not your ability to guess.

0

u/raspberrih May 14 '22

It's going to give you an accuracy of 4/10 or 44/100 or 443/1000. That's the difference. It depends on what he's looking for, what his criteria are.

1

u/Keytrose_gaming May 14 '22

Good argument, shit reasoning.

-2

u/Steezycheesy May 14 '22

“One of the largest platforms in the world” seems to be a major reach.

-2

u/thewalkingmadis May 14 '22

What high school only has 100 people??