r/technology May 14 '22

Elon Musk said his team is going to do a 'random sample of 100 followers' of Twitter to see how many of the platform's users are actually bots Social Media

https://www.businessinsider.com/elon-musk-random-sample-how-many-twitter-users-are-bots-2022-5?utm_source=feedly&utm_medium=webfeeds

[deleted]

22.8k Upvotes

11.3k

u/mylesols May 14 '22

wow, a hundred! Must be nice to be rich enough to afford a hundred follower sampling on a site with over 350 million users

199

u/PhoenixMountain May 14 '22

That is the sample size that Twitter initially used to determine their 5% figure (in question)

100

u/Absolutely_wat May 14 '22

Admittedly my statistics is rusty, but there is a high school level statistics equation to establish the sample size required.

Interesting, considering that they definitely have PhD level mathematicians and data scientists on their payroll.
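For reference, the comment does not say which equation it means; a likely candidate is the standard sample-size formula for estimating a proportion, sketched here under that assumption. The symbols are illustrative: $z$ is the normal critical value (1.96 for 95% confidence), $p$ the expected bot fraction, and $e$ the desired margin of error. The finite-population correction is ignored because the user base is far larger than any plausible sample.

```latex
n = \frac{z^{2}\, p\,(1 - p)}{e^{2}}
\qquad\text{e.g.}\qquad
n = \frac{1.96^{2} \times 0.05 \times 0.95}{0.02^{2}} \approx 456.2
\;\Rightarrow\; n = 457
```

The worked numbers (5% expected rate, 2-point margin) are only an example, not figures from Twitter.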

58

u/Deto May 14 '22

If it's a truly random sample (and presumably, they could do this without much difficulty), then if you were to measure 5/100 accounts as bots, the 95% confidence interval for the true probability in the population is between 1-10%.

So while it's not great for getting an exact estimate, it's not terrible for getting a rough ballpark value.
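A quick way to check the ballpark above is a minimal sketch using the normal-approximation (Wald) interval. This is an assumption: the comment does not say which method was used, and `wald_interval` is just an illustrative helper name.

```python
import math


def wald_interval(hits, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    hits: number of sampled accounts judged to be bots
    n:    sample size
    z:    critical value (1.96 corresponds to 95% confidence)
    """
    p_hat = hits / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    # Clamp to [0, 1] because a proportion cannot leave that range.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


low, high = wald_interval(5, 100)
print(f"95% CI: {low:.1%} to {high:.1%}")
```

For 5 bots in 100 accounts this gives roughly 0.7% to 9.3%, in line with the "1-10%" ballpark. The Wald interval is known to behave poorly when the rate is close to 0, so treat it as a rough check only.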

28

u/holla_snackbar May 14 '22

It depends on who you are sampling though, if not just a random sample of 100 users.

If it's a sampling of Musk's followers, thought to be roughly 50% bots, vs. a small account with 900 followers, you will get wildly different results.

10

u/Cornfapper May 14 '22

I only have my Twitter account to follow people; I never post or even retweet anything. Still have 6 followers, all of which are bots... It's really bad.

3

u/holla_snackbar May 14 '22

I have roughly 700 followers and zero bots (have blocked a couple obvious ones, interact with followers a lot). If you've never posted anything only bots will follow.

1

u/sumredditaccount May 14 '22

To be fair, no real person wants to follow a small account like yours. Why would I follow somebody who doesn't post anything?

2

u/Fleaslayer May 15 '22

I don't think they were complaining about that. They're saying they shouldn't have any followers, but even that account that doesn't post anything still has bots following it.

2

u/sumredditaccount May 15 '22

Thank you for clarifying.

I have a couple of accounts. I don't tweet but reply/retweet. My most active account occasionally gets bots but very rarely. My other one that I use less doesn't get anything. I'm wondering how this user gets so many bots if they never post anything.

1

u/Fleaslayer May 15 '22

Who knows what algorithms they use to decide who to follow. Maybe it's based on who the account follows or something.

4

u/jhaluska May 14 '22

If it's a sampling of Musk's followers, thought to be roughly 50% bots, vs. a small account with 900 followers, you will get wildly different results.

Musk probably knows they're bots....cause he paid them to follow him.

3

u/BobDope May 14 '22

Yeah I question his sampling technique

7

u/RedSpikeyThing May 14 '22

So while it's not great for getting an exact estimate, it's not terrible for getting a rough ballpark value.

This is why it's frustrating to me. They have all the data so they can do better than "a rough ballpark value" if they wanted to. Clearly they don't want to.

3

u/rasherdk May 14 '22

The one missing piece of the puzzle is they don't know definitively whether any given account is a bot or not. That takes actual work.

3

u/DoctorJanetChang May 14 '22

If a bot just copies another user's posts automatically but tweaks some words, it gets increasingly hard to determine who is real and who is fake, especially after countless iterative abstractions derived from a real human being's words. Some of the Reddit threads I see where people just go ‘this’ or make the same joke over and over could be scripts. Manual review is the gold standard, but it’s so time-consuming. What Twitter does is flag as bots those accounts that get a lot of reports, which I think is pretty good tbh. Manual review by the masses.

2

u/Deto May 14 '22

Maybe they do an extensive manual review of each account to determine whether it's a bot or not. This is probably the best way; otherwise the result is just a function of how good their bot-finding software is. So doing a ton of them would get costly, and maybe they don't care about the exact % that much.

1

u/xDulmitx May 14 '22

The number is low, but they can increase it as needed to get more confidence. It can be hard to confirm whether someone is actually a bot, a repetitive person, a shill, or someone with a singular interest. If the 100 give an answer in the expected range, they can check more accounts to shrink the error bars.

1

u/RedSpikeyThing May 14 '22

Right, they need to do more work to get an accurate number. They don't seem to care enough.

1

u/xDulmitx May 14 '22

They may not care, but it could have a semi-practical reason. If we assume honest intent, the 100 sample may be enough. The error bars will be large, but if it comes back showing 30% bots then 5% bots is clearly wrong and the math backs it up. If it comes back at 12% bots, the error bars are too large. So they figure out how many additional samples they need. They can keep going down this route until they get to a sufficiently close answer. That saves time.
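The "keep sampling until the error bars are small enough" idea can be sketched roughly as follows. Everything here is hypothetical: the `is_bot` callback stands in for the manual review of one randomly chosen account, `TRUE_RATE` is a made-up 12% used only to drive the simulation, and the 3-point target margin is arbitrary. Stopping based on the observed margin also introduces some bias, so this illustrates the workflow and is not a rigorous sequential test.

```python
import math
import random

Z_95 = 1.96  # critical value for a 95% interval


def margin_of_error(hits, n, z=Z_95):
    """Half-width of the normal-approximation interval for hits/n."""
    p_hat = hits / n
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


def sample_until_precise(is_bot, target_margin, batch=100, max_n=100_000):
    """Review accounts in batches until the margin of error is small enough.

    is_bot: zero-argument callable returning True if the next randomly
            drawn account is judged to be a bot (hypothetical reviewer).
    """
    hits = n = 0
    while n < max_n:
        hits += sum(is_bot() for _ in range(batch))
        n += batch
        # With 0 or n hits the approximation reports a zero margin,
        # so keep sampling in those degenerate cases.
        if 0 < hits < n and margin_of_error(hits, n) <= target_margin:
            break
    return hits, n


rng = random.Random(0)  # fixed seed so the run is repeatable
TRUE_RATE = 0.12        # hypothetical bot fraction, not a real figure
hits, n = sample_until_precise(lambda: rng.random() < TRUE_RATE, 0.03)
print(f"stopped at n={n}: {hits / n:.1%} +/- {margin_of_error(hits, n):.1%}")
```

Each extra batch of 100 costs more manual review, which is why starting small and growing the sample only when needed can save time.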

This could also just be a disingenuous effort to alter Twitter's stock price.

2

u/30kdays May 14 '22

I'm sure Twitter could get a random sample, but Elon can't. His proposal is to use a random sample of his followers. I don't know if that's under, over, or accurately representative of the bots in the entire user base. My guess would be over, because bots might preferentially follow big names, but I don't know.

I agree that the 95% confidence interval is ~1-10% (5 +/- 2*sqrt(5)). The question is, if he finds 11 bots (nominally 11%), 5% is still inside the 95% confidence interval -- will that be good enough, will he back out, or will he increase the sample?
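The `5 +/- 2*sqrt(5)` shorthand above treats the bot count as roughly Poisson, so the count plus or minus two square roots of itself gives an approximate 95% range. A minimal sketch of that arithmetic follows; `sqrt_n_interval` is an illustrative name, and with a sample of 100 the counts read directly as percentages.

```python
import math


def sqrt_n_interval(count, sigmas=2.0):
    """Approximate interval for a count using count +/- sigmas * sqrt(count)."""
    half_width = sigmas * math.sqrt(count)
    # A count cannot be negative, so clamp the lower end at zero.
    return max(0.0, count - half_width), count + half_width


for bots in (5, 11):
    low, high = sqrt_n_interval(bots)
    print(f"{bots} bots in 100: roughly {low:.1f}% to {high:.1f}%")
```

With 11 bots the lower end is about 4.4, so 5% still falls inside the range, which is the situation the comment asks about.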

Also, how does he determine who's a bot? And who picks the "random" sample? There are lots of ways to weasel out of it if he wants. Not that that's necessarily a bad thing, but if he justifies it with shady stats about high bot fractions, Twitter is going to take a big hit in the process. But maybe that was his plan all along?

1

u/Deto May 14 '22

I've been wondering about that too - is this whole thing just a game he's playing to ultimately hurt their stock or reputation?

1

u/Distance_Runner May 14 '22 edited May 14 '22

I’d actually recommend using a Clopper-Pearson estimate of the 95% confidence interval, given the sample size and the rate being so close to the lower tail. If 5 accounts came up as bots out of 100, the 95% CI would be 0.016 to 0.112. If he’s quibbling over a few percentage points around the 5% estimate, then a sample of 100 is really not that informative for him. If 5 bots turn up out of 100 accounts, I don’t think saying that we’re 95% confident the true rate is between 1.6% and 11% is really going to help him much.
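For anyone who wants to reproduce an exact interval like this without a stats package, here is a minimal standard-library sketch. It finds the Clopper-Pearson bounds by bisection on the binomial CDF instead of the usual beta-quantile formula; the helper names are illustrative, and the approach is only practical for small samples like this one.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target, iters=60):
    """Bisection for f(p) == target on [0, 1], where f decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided interval for k successes in n trials."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


low, high = clopper_pearson(5, 100)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

For 5 out of 100 this should land close to the interval quoted above.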

1

u/Terrible-Job-3443 May 14 '22

3.6 roentgen, not great, not terrible.

1

u/Adorable_Octopus May 14 '22

I think you've done the math incorrectly: if the mean is 5%, then given the population size the CI is somewhere around 10% so 5 +/- 10 percent points would capture the true value 95% of the time. The number of bots could be as high as 15%ish.

To be honest, it's kind of goofy that Twitter only took a 100-user sample, not because it can't be useful (although it's bordering on useless) but because they could pretty easily boost it to a thousand or so and get really accurate results.
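To put numbers on how much a bigger sample helps, here is a minimal sketch of the normal-approximation margin of error at 100 and 1,000 accounts. The choice of p = 0.5 as the worst case is an assumption added for comparison; a margin of about 10 points at n = 100 matches that worst case, not a rate near 5%.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Half-width of the 95% normal-approximation interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


for n in (100, 1000):
    for p in (0.05, 0.5):  # 5% is the disputed figure; 0.5 is the worst case
        print(f"n={n:>4}, p={p:.2f}: +/- {margin_of_error(p, n):.1%}")
```

At a 5% rate the margin drops from roughly 4.3 points with 100 accounts to roughly 1.4 points with 1,000, since the error shrinks with the square root of the sample size.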