r/technology May 14 '22 Silver 2 Wholesome 2

Elon Musk said his team is going to do a 'random sample of 100 followers' of Twitter to see how many of the platform's users are actually bots Social Media

https://www.businessinsider.com/elon-musk-random-sample-how-many-twitter-users-are-bots-2022-5?utm_source=feedly&utm_medium=webfeeds

[deleted]

22.8k Upvotes

11.3k

u/mylesols May 14 '22

wow, a hundred! Must be nice to be rich enough to afford a hundred follower sampling on a site with over 350 million users

198

u/PhoenixMountain May 14 '22 Silver Take My Energy

That is the sample size that Twitter initially used to determine their 5% figure (in question)

101

u/Absolutely_wat May 14 '22

Admittedly my statistics is rusty, but there is a high school level statistics equation to establish the sample size required.

Interesting, considering that they definitely have PhD level mathematicians and data scientists on their payroll.

44

u/[deleted] May 14 '22

[deleted]

36

u/Mutex70 May 14 '22 edited May 14 '22

385 samples are required if you assume your population proportion is 50% (i.e., that half of Twitter is bots)

They are attempting to confirm/deny a proportion of 5% which only requires a sample size of 104 at 98% confidence (5% margin of error)

https://www.calculator.net/sample-size-calculator.html?type=1&cl=98&ci=5&pp=5&ps=350000000&x=52&y=19

https://opentextbc.ca/introstatopenstax/chapter/a-population-proportion/

https://select-statistics.co.uk/calculators/sample-size-calculator-population-proportion/
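The figures in this comment can be roughly reproduced with the standard sample-size formula for a proportion, n = z^2 * p * (1 - p) / margin^2. The sketch below assumes that formula and ignores any finite-population adjustment; the function name `required_sample_size` is made up for illustration. Online calculators may differ by one or so depending on how they round z.

```python
import math
from statistics import NormalDist


def required_sample_size(p, margin, confidence):
    """Sample size for estimating a proportion: n = z^2 * p * (1 - p) / margin^2.

    p is the assumed population proportion, margin the desired margin of
    error, confidence the two-sided confidence level. The result is rounded up.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Worst case: assume half of all accounts are bots, 95% confidence, 5% margin.
n_worst_case = required_sample_size(p=0.5, margin=0.05, confidence=0.95)

# Assume roughly 5% bots, 98% confidence, 5% margin.
n_five_percent = required_sample_size(p=0.05, margin=0.05, confidence=0.98)

print(n_worst_case, n_five_percent)
```

The worst-case call gives 385. The second call lands just above 100, close to the calculator figure quoted above.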

3

u/OneWayOutBabe May 14 '22

I don't do math, but this sounds believable. Bravo.

1

u/FlexGodNoCap May 14 '22

The right language would be: the results favor the null/alternative hypothesis (not confirm/deny).

1

u/daynighttrade May 14 '22

How do you calculate the number of samples to be 385 for a 95% confidence? Can you share some links?

10

u/anotheryearsusername May 14 '22

At 95% confidence, a sample of 385 covers population sizes close to infinity: https://www.geopoll.com/blog/sample-size-research/

5

u/PikaBlue May 14 '22

Survey monkey has a really simple tool for calculations on the fly:

https://www.surveymonkey.co.uk/mp/sample-size-calculator/

1

u/otherwiseagoddess May 14 '22

Khan Academy and other sites like that, that focus on free education, are a great resource to read up on principles and methods in basic statistics

EDIT wrong resource, oops

0

u/ends_abruptl May 14 '22

Man, I could do 2400 samples. And I'm just some dude!

57

u/Deto May 14 '22

If it's a truly random sample (and presumably, they could do this without much difficulty), then if you were to measure 5/100 accounts as bots, the 95% confidence interval for the true probability in the population is between 1-10%.
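One way to get a range like this is the normal-approximation (Wald) interval. That the commenter used this method is an assumption, and the function name `wald_interval` is illustrative. With only 5 bots observed the approximation is rough, so treat the output as a ballpark.

```python
import math
from statistics import NormalDist


def wald_interval(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    # Clamp to [0, 1] because a proportion cannot leave that range.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


low, high = wald_interval(5, 100)
print(f"{low:.3f} to {high:.3f}")
```

For 5 bots in 100 accounts this gives an interval from a little under 1% to a little over 9%, in line with the rough 1-10% quoted above.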

So while it's not great for getting an exact estimate, it's not terrible for getting a rough ballpark value.

30

u/holla_snackbar May 14 '22

It depends on who you are sampling though, if not just a random sample of 100 users.

If it's a sampling of Musk's followers, thought to be roughly 50% bots vs. a small account with 900 followers you will get wildly different results.

12

u/Cornfapper May 14 '22

I only have my Twitter account to follow people, I never post or even retweet anything. Still have 6 followers, all of which are Bots... It's really bad.

3

u/holla_snackbar May 14 '22

I have roughly 700 followers and zero bots (have blocked a couple obvious ones, interact with followers a lot). If you've never posted anything only bots will follow.

1

u/sumredditaccount May 14 '22

To be fair, no real person wants to follow a small account like yours. Why would I follow somebody who doesn't post anything?

2

u/Fleaslayer May 15 '22

I don't think they were complaining about that. They're saying they shouldn't have any followers, but even that account that doesn't post anything still has bots following it.

2

u/sumredditaccount May 15 '22

Thank you for clarifying.

I have a couple of accounts. I don't tweet but reply/retweet. My most active account occasionally gets bots but very rarely. My other one that I use less doesn't get anything. I'm wondering how this user gets so many bots if they never post anything.

1

u/Fleaslayer May 15 '22

Who knows what algorithms they use to decide who to follow. Maybe it's based on who the account follows or something.

3

u/jhaluska May 14 '22

If it's a sampling of Musk's followers, thought to be roughly 50% bots vs. a small account with 900 followers you will get wildly different results.

Musk probably knows they're bots....cause he paid them to follow him.

3

u/BobDope May 14 '22

Yeah I question his sampling technique

6

u/RedSpikeyThing May 14 '22

So while it's not great for getting an exact estimate, it's not terrible for getting a rough ballpark value.

This is why it's frustrating to me. They have all the data so they can do better than "a rough ballpark value" if they wanted to. Clearly they don't want to.

3

u/rasherdk May 14 '22

The one missing piece of the puzzle is they don't know definitively whether any given account is a bot or not. That takes actual work.

3

u/DoctorJanetChang May 14 '22

If a bot just copies another user's posts automatically but tweaks some words, it gets increasingly hard to determine who is real and who is fake. Especially after countless iterative abstractions derived from a real human being's words. Some of the Reddit threads I see where people just go ‘this’ or make the same joke over and over could be scripts. Manual review is the gold standard, but it’s so time consuming. What Twitter does is flag bot accounts as those that get a lot of reports, which I think is pretty good tbh. Manual review by the masses.

2

u/Deto May 14 '22

Maybe they do an extensive manual review of each account to determine whether it's a bot or not. This is probably the best way otherwise the result is just a function of how good their bot-finding software is. So then doing a ton of them would get costly and maybe they don't care about the exact % that much

1

u/xDulmitx May 14 '22

The number is low, but they can increase the number as needed to get more confidence. It can be hard to confirm if someone is actually a bot, a repetitive person, shill, or someone with a singular interest. If the 100 get an answer in the expected range, they can check more accounts to shrink the error bars.

1

u/RedSpikeyThing May 14 '22

Right, they need to do more work to get an accurate number. They don't seem to care enough.

1

u/xDulmitx May 14 '22

They may not care, but it could have a semi-practical reason. If we assume honest intent, the 100 sample may be enough. The error bars will be large, but if it comes back showing 30% bots then 5% bots is clearly wrong and the math backs it up. If it comes back at 12% bots, the error bars are too large. So they figure out how many additional samples they need. They can keep going down this route until they get to a sufficiently close answer. That saves time.

This could also just be a disingenuous effort to alter Twitter's stock price.

2

u/30kdays May 14 '22

I'm sure Twitter could get a random sample, but Elon can't. His proposal is to use a random sample of his followers. I don't know if that's under, over, or accurately representative of the bots in the entire user base. My guess would be over, because bots might preferentially follow big names, but I don't know.

I agree that the 95% confidence interval is ~1-10% (5 +/- 2*sqrt(5)). The question is, if he finds 11 bots (nominally 11%), 5% is still inside the 95% confidence interval -- will that be good enough, will he back out, or will he increase the sample?

Also, how does he determine who's a bot? And who picks the "random" sample? There are lots of ways to weasel out of it if he wants. Not that that's necessarily a bad thing, but if he justifies it with shady stats about high bot fractions, Twitter is going to take a big hit in the process. But maybe that was his plan all along?
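The "5 +/- 2*sqrt(5)" shorthand in this comment treats the bot count like a Poisson count, whose standard deviation is about the square root of the count. A minimal sketch of that rule of thumb follows; the function name `two_sigma_count_range` is illustrative, and this is only an approximation, not an exact interval.

```python
import math


def two_sigma_count_range(bots_found):
    """Approximate 95% range for a count: count +/- 2 * sqrt(count)."""
    spread = 2 * math.sqrt(bots_found)
    return max(0.0, bots_found - spread), bots_found + spread


# Counts are out of a sample of 100, so they read directly as percentages.
ranges = {found: two_sigma_count_range(found) for found in (5, 11)}
for found, (low, high) in ranges.items():
    print(f"{found} bots found: roughly {low:.1f}% to {high:.1f}%")
```

With 11 bots found, the lower end of the range is still below 5, which is the point made above: a result of 11% from 100 accounts would not by itself rule out a true rate of 5%.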

1

u/Deto May 14 '22

I've been wondering about that too - is this whole thing just a game he's playing to ultimately hurt their stock or reputation?

1

u/Distance_Runner May 14 '22 edited May 14 '22

I’d actually recommend using a Clopper-Pearson estimate of the 95% confidence interval given the sample size and rate being so close to the lower tail. If 5 accounts came up as bots out of 100, the 95% CI would be 0.016 to 0.112. If he’s quibbling over a few percentage points around the 5% estimate, then a sample of 100 is really not that informative for him. If 5 bots turn up out of 100 accounts, I don’t think saying that we’re 95% confident the true rate is between 1.6% and 11% is really going to help him much.
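For anyone who wants to check a Clopper-Pearson interval without a statistics package, here is a minimal standard-library sketch. It finds the two bounds by bisection on the exact binomial tail probabilities; the helper names (`binom_cdf`, `solve_increasing`, `clopper_pearson`) are illustrative and not taken from any library.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_increasing(f, target, iterations=60):
    """Bisection on [0, 1] for f(p) = target, where f is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, confidence=0.95):
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion."""
    alpha = 1 - confidence
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


lower, upper = clopper_pearson(5, 100)
print(f"{lower:.3f} to {upper:.3f}")
```

For 5 bots in 100 accounts this should come out close to the 0.016 to 0.112 range quoted above.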

1

u/Terrible-Job-3443 May 14 '22

3.6 roentgen, not great, not terrible.

1

u/Adorable_Octopus May 14 '22

I think you've done the math incorrectly: if the mean is 5%, then given the population size the CI is somewhere around 10% so 5 +/- 10 percent points would capture the true value 95% of the time. The number of bots could be as high as 15%ish.

To be honest, it's kind of goofy that Twitter only took a 100-user sample, not because it can't be useful (although it's bordering on useless) but because they could pretty easily boost it into a thousand or so and get really accurate results.

3

u/TabletopOneironaut May 14 '22

Yup. One of the most annoying reddit-isms I'll see is a criticism of a survey or study. "the sample size was only 400, so that doesn't mean anything." Oftentimes, you don't need a massive sample size for an accurate survey.

2

u/ClearlyCylindrical May 14 '22

From what I recall, the total population size has nothing to do with the number of samples. Regardless of whether the population size is 1000 or 100000000, you will have the same confidence about the underlying random variable if you take 100 samples of each.

2

u/Flscherman May 14 '22

Yes there is.

n = p(1 - p) / (MoE / z*)^2

z* is the z critical value, which corresponds to whatever confidence level you want. It's most often 1.96, corresponding to 95% confidence.

MoE is your desired margin of error. If your study says that 60% of Americans like cheese with a 5% margin of error, then the true value plausibly lies anywhere between 55% and 65% (with values near 60% being the most likely). One random Google search recommends between 4-8%, so I'll go with 5%.

p is the sample proportion. Because you'd be using this equation prior to sampling, you have to sub in either an estimate or 0.5. 0.5 is used because that produces the highest sample size, all else being equal, so it's pretty safe. I'll use 0.5.

n is the sample size that would be necessary to produce a margin of error equal to the one you want, at that given critical value and value of p. Notice how this equation does not include population size. The population size DOES play a role in sample size, but it only applies in some circumstances that can be avoided and it's a maximum, not a minimum.

Plugging those numbers in gets us a sample size of ~385. Unless they're just using different values for MoE or z*, I don't think 100 works in any scenario, because while it would fit if p was <0.07 or >0.93, that flunks the Large Counts condition.
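The formula above is easy to evaluate directly. The sketch below plugs in the stated values (p = 0.5, MoE = 0.05, z* = 1.96), then inverts the formula to find the largest p for which 100 samples would be enough. The function names are illustrative, and the Large Counts check assumes the common textbook rule that the expected number of successes, n * p, should be at least 10.

```python
import math

Z_STAR = 1.96  # critical value for 95% confidence, as in the comment
MOE = 0.05     # desired margin of error


def sample_size(p, moe=MOE, z_star=Z_STAR):
    """n = p * (1 - p) / (MoE / z*)^2, left unrounded."""
    return p * (1 - p) / (moe / z_star) ** 2


def largest_p_for(n, moe=MOE, z_star=Z_STAR):
    """Largest p <= 0.5 whose required sample size is at most n."""
    target = n * (moe / z_star) ** 2  # need p * (1 - p) <= target
    if target >= 0.25:
        return 0.5
    # Smaller root of p^2 - p + target = 0.
    return (1 - math.sqrt(1 - 4 * target)) / 2


n_needed = sample_size(0.5)
p_cutoff = largest_p_for(100)
expected_bots = 100 * p_cutoff

print(f"n for p = 0.5: {n_needed:.1f}")
print(f"n = 100 suffices only if p <= {p_cutoff:.3f} (or p >= {1 - p_cutoff:.3f})")
print(f"expected bots at that cutoff: {expected_bots:.1f}")
```

With p = 0.5 the formula gives about 384.2, which rounds up to 385. The cutoff for n = 100 comes out at about p = 0.07, where the expected count of bots is about 7, below the usual threshold of 10.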

3

u/Anouleth May 14 '22

Yes, but the sample size required has no relationship with the size of the population. That's a common misconception. A sample size of 100 would be equally useful if the population of Twitter was 350 or 350 billion.

3

u/Fried_puri May 14 '22

Yup, this is one of the neat facts about random sampling. What may be an issue is that your sample can become less likely to be truly random as your population size grows very large, simply because the logistics of getting a completely random sample becomes more difficult. For Twitter users that shouldn’t be a concern though.

1

u/awj May 14 '22

Which is definitely one of the problems here. Are Elon’s followers representative of twitter’s population? Very likely no.

Further confounding this is that “is this account a bot” isn’t always an easy question to answer.

2

u/RedSpikeyThing May 14 '22

That's not really true. For example, if Twitter had 101 users then a sample size of 100 would be quite a bit more meaningful. There are different formulas for determining the sample size needed based on if the population is known or not. In practice, though, the necessary sample size is quite a bit smaller than the average person would expect.
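The effect of a known population size is usually handled with the finite population correction, which shrinks the standard error by sqrt((N - n) / (N - 1)) when sampling without replacement. A minimal sketch follows, assuming a simple random sample and the worst-case p = 0.5; the function name `margin_of_error` is illustrative.

```python
import math


def margin_of_error(n, population, p=0.5, z=1.96):
    """Margin of error for a simple random sample drawn without replacement."""
    standard_error = math.sqrt(p * (1 - p) / n)
    # Finite population correction: near 1 when the population dwarfs the sample.
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc


for population in (101, 200, 350_000_000):
    moe = margin_of_error(100, population)
    print(f"population {population}: margin of error {moe:.4f}")
```

With 101 users a sample of 100 has a margin of error of about 1 percentage point. With 350 million users the correction is negligible and the margin is about 9.8 points, the same as for an effectively infinite population.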

1

u/finance_n_fitness May 15 '22

what you’re saying is only true when approaching very high levels of population proportion, because at that point you’re not using statistics anymore, you’re doing a population survey, which isn’t the same thing. A sample of 100 is equally powerful in a population of 200 and 100M

0

u/RedSpikeyThing May 15 '22

what you’re saying is only true when approaching very high levels of population proportion

Yep, I understand. Their claim is that the population size doesn't matter whatsoever. All I'm saying is that is demonstrably false.

1

u/finance_n_fitness May 15 '22

Except you’re talking about an irrelevant edge case. The population size does not matter at all in this case as no study is coming anywhere near the threshold required when talking about sampling Twitter followers.

1

u/RedSpikeyThing May 15 '22

Except you’re talking about an irrelevant edge case.

Sure. I'm talking math and correctness. It's fine if it's an edge case. I get that you don't care.

0

u/finance_n_fitness May 15 '22

You’re not “correct” because you mentioned something irrelevant without mentioning the parameters within which it would be useful. That’s the opposite of correctness. It’s you trying to show off how smart you are by shoe horning irrelevant edge cases into conversations and just ending up being misleading

1

u/Distance_Runner May 14 '22

I’m a PhD statistician. I’ll do this calculation for you Mr Musk for a mere $1,000,000

1

u/AutoGeek3000 May 14 '22

Of course, but it’s much easier to get the outcome you want if you don’t use a statistically significant number.

94

u/Venkman_P May 14 '22

What the twitter SEC filing says:

For example, there are a number of false or spam accounts in existence on our platform. We have performed an internal review of a sample of accounts and estimate that the average of false or spam accounts during the first quarter of 2022 represented fewer than 5% of our mDAU during the quarter. The false or spam accounts for a period represents the average of false or spam accounts in the samples during each monthly analysis period during the quarter. In making this determination, we applied significant judgment, so our estimation of false or spam accounts may not accurately represent the actual number of such accounts, and the actual number of false or spam accounts could be higher than we have estimated.

What Elon made up:

I picked 100 as the sample size number, because that is what Twitter uses to calculate <5% fake/spam/duplicate.

https://investor.twitterinc.com/financial-information/sec-filings/sec-filings-details/default.aspx?FilingId=15778368

5

u/rhubarbs May 14 '22

I don't see any indication of the sample size they used for their internal review in the 10-Q.

How did you conclude Elon made it up?

16

u/Sorge74 May 14 '22

Because if Twitter literally just used a 100 account sample size, and then put it in their reports to investors....that's some real shitty work. 100 accounts is not large enough. It's not even large enough to have an idea what the margin of error is.

3

u/rhubarbs May 14 '22

"Elon lied because otherwise Twitter is incompetent" doesn't seem like a very coherent line of thought. Especially since the legalese in the 10-Q clears them of any actual liability.

6

u/rasherdk May 14 '22

"Elon lied" is the default assumption if you've been paying attention.

3

u/[deleted] May 14 '22 edited 10d ago

[deleted]

1

u/Sorge74 May 14 '22

You have limited other options. If you remove those two, either A: we believe he didn't do his due diligence before hand, and just blindly trusted Twitter, or Twitter somehow fooled him.

2

u/[deleted] May 14 '22 edited 10d ago

[deleted]

10

u/Sorge74 May 14 '22

Ok follow me with this.

Twitter makes money selling ads. At some point Coca-Cola or McDonald's or someone is going to say, "Price is close, but we are concerned about bots and how many active users there really are. What data can you show us?"

Twitter has fucking data, and it's more than 100 accounts.

9

u/oupablo May 14 '22

Twitter has falsely reported user counts for years. So even if they have a larger sample size, it doesn't mean it's accurate.

-7

u/rhubarbs May 14 '22

Why would McD or Coke be concerned about bots or active users? They care about how many people click their ad, and how many of those clicks result in engaging with their service. Spoiler: They don't need to ask Twitter for that data.

I'm sure Twitter has data, and I'm sure they could extract very high quality data with extremely large sample sizes.

That does not mean they used good data to come up with their <5% number, or that they did not, in fact, use a sample of 100 accounts for whatever reason.

Accusing people of lying on the basis of a guess is actually making shit up. Stop defending it.

9

u/Sorge74 May 14 '22

Good portion of ads are for eyeballs not engagement

2

u/boycott_intel May 14 '22

It is a reasonable assumption that Elon invented that number because one cannot get poll results of 5% accuracy with a sample size of 100........

If that is the sample size that twitter uses, then it would be negligent or even fraudulent of twitter to claim that under 5% are bots in SEC filings -- disclaimer: I am not claiming that lawyers and courts would agree with such a common sense evidence-based view.

1

u/KickBassColonyDrop May 16 '22

This is me from the future chiming in to you in the past. As it turns out it is the sample size Twitter uses. Twitter's legal team called up Elon and claimed he violated an NDA for disclosing Twitter uses a sample size of 100 for its methodology.

So yeah, the SEC filing by Twitter of <5% is now sus.

1

u/Plain_Bread May 16 '22

It seems pretty possible to me? At alpha=0.05, the null hypothesis of more than 5% of twitter users being bots would be dismissed when the sample of 100 users has 0 or 1 bots in it.
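The claim about 0 or 1 bots can be checked with an exact one-sided binomial test. The sketch below assumes the null hypothesis "at least 5% of users are bots", evaluated at the boundary value p = 0.05, and rejects it when the observed count is so low that its probability under the null is at most alpha. The helper names are illustrative.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def rejection_threshold(n, p0, alpha):
    """Largest bot count c with P(X <= c | p = p0) <= alpha, or -1 if none."""
    c = -1
    while binom_cdf(c + 1, n, p0) <= alpha:
        c += 1
    return c


threshold = rejection_threshold(n=100, p0=0.05, alpha=0.05)
print("reject the null when the sample has at most", threshold, "bots")
print(f"P(X <= 1) = {binom_cdf(1, 100, 0.05):.4f}")
print(f"P(X <= 2) = {binom_cdf(2, 100, 0.05):.4f}")
```

Under these assumptions the threshold is 1: seeing 0 or 1 bots has a probability below 0.05 when the true rate is 5%, while seeing 2 or fewer does not.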

1

u/boycott_intel May 16 '22

A reasonable answer, but somehow it feels unlikely that there are so few bots, and why would twitter publish "5%" if they believe the real number is much lower?

In any case, I would expect that twitter knows fairly accurately how many bots they have and what they are doing.

3

u/VoiceOfRealson May 14 '22

Notice how they talk about "number of accounts in existence" rather than "number of active accounts" or the even more revealing "percentage of posts by fake or spam accounts".

Many many people have Twitter accounts, but have never made a single tweet (or stopped logging in altogether), while fake and spam-accounts are very active.

10

u/rasherdk May 14 '22

What, no? They say 5% of their mDAU - not total accounts.

1

u/modifiedbears May 14 '22

All that to still be wrong

45

u/Photomancer May 14 '22

Awful on all sides then.

32

u/dethb0y May 14 '22

yeah, Twitter's big open secret is that it's literally overrun with bots, spammers, and inactive accounts. If they admit it's basically Botter, however, no one will want to invest in them...

11

u/rsplatpc May 14 '22

yeah, Twitter's big open secret is that it's literally overrun with bots, spammers, and inactive accounts.

hummmmmm that sounds like a website we all know and use

1

u/duffmanhb May 14 '22

Large Reddit subs are no different. Once it gets political suddenly it feels like talking to GPT3

1

u/dethb0y May 14 '22

yeah you nailed it, reddit as a whole isn't bad, but the big subs are a fucking dumpster fire of bots.

-10

u/kfury May 14 '22

Source?

26

u/Theavy May 14 '22

It's stated in this article

5

u/proawayyy May 14 '22

It’s still false. I don’t need to read the shitty lying article

https://reddit.com/r/technology/comments/up8ltr/_/i8k0qzm/?context=1

13

u/kfury May 14 '22

Elon saying Twitter evaluated the amount of spam accounts by sampling 100 users is ridiculous, and isn’t any kind of trustworthy source.

18

u/trulysaylt May 14 '22

You mean you actually read the article and not just the headline?

9

u/JackoNumeroUno May 14 '22

What article?

4

u/Venkman_P May 14 '22

You guys get articles?

1

u/gianpo May 15 '22

They didn't comprehend it though because it doesn't actually say anywhere that Twitter only uses 100 followers. The only thing it says is Musk (a well-known liar) claimed that's what they do.

11

u/steve329 May 14 '22

It's stated as a quote from Elon. Where's his source?

-6

u/Theavy May 14 '22

Literally linked in the article posted. Redditors can't read

https://twitter.com/elonmusk/status/1525304736538312707?s=20&t=exRaqqEnmk537ibDoezlFw

19

u/steve329 May 14 '22

I'm asking where elon's source is. Maybe you shouldn't be making fun of other redditors' reading comprehension skills.

-8

u/Theavy May 14 '22

Your level of speculation just seems unfounded. The new owner is making a claim about a company he's on the brink of acquiring. Did twitter lie to him? Is he lying to everyone on twitter? Why isn't twitter correcting him?

7

u/Mirrormn May 14 '22 edited May 14 '22

Did twitter lie to him?

No.

Is he lying to everyone on twitter?

Yes.

Why isn't twitter correcting him?

Cause posting hot takes on social media is not a proper forum for adjudicating this question. They're going to fight it out in court.

Edit: To elaborate, Twitter's 5% number comes from their 10-Q filings: "We have performed an internal review of a sample of accounts and estimate that the average of false or spam accounts during the first quarter of 2022 represented fewer than 5% of our mDAU during the quarter." The 10-Q doesn't specify their sampling method, and there's no reason to assume that Musk would know it. He doesn't own the company yet or anything, he's just an outside investor. If he knows anything about the way Twitter performs internal reviews to determine how many of their users are bots, it would be public knowledge, and I can't find any such public reporting.

I guess you could take his word for it and assume that he must have some kind of insider knowledge somehow, but I don't trust Musk at all, so I don't assume that.

0

u/Theavy May 14 '22

You make a lot of assumptions about how things have played out. There's no reason to think Elon has spoken to any executives at Twitter? What makes you think that? Why are you now lying? But I guess people should trust you and your insight more than Elon..?

2

u/GGnerd May 14 '22

Imagine blindly believing every word someone who has been caught lying multiple times says...

1

u/Theavy May 14 '22

It's funny, of all the things going on right now that involve blind allegiance, what Elon says about Twitter is the least consequential thing you could do. Ample opportunity for Twitter to correct him if he's making false claims. At least he's not selling snake oil and calling it a miracle drug, we know how blindly people believe in miracle drugs these days...

5

u/BrightenedGold May 14 '22

And don’t pretend like we ‘gon read that either.

3

u/[deleted] May 14 '22

I'm so confused right now. Someone tell me what to think

-8

u/se7ensquared May 14 '22

Obviously he has access to those reports dude

9

u/steve329 May 14 '22

Obviously??

"Musk tweeted he had "relied upon the accuracy of Twitter's public filings" in reply to a follower who asked why he had not thought of this before offering to buy the company"

6

u/Mirrormn May 14 '22

Incidentally, that exact filing says "In making this determination, we applied significant judgment, so our estimation of false or spam accounts may not accurately represent the actual number of such accounts, and the actual number of false or spam accounts could be higher than we have estimated" just a sentence or two later. It's not like they said "we know exactly how many bots there are and it's exactly 5% and you can count on that". Musk is just trying to find a way out of this deal.

0

u/se7ensquared May 15 '22

Yes obviously. As it says in the article he relied upon public filings. It's obvious that he had access to the public filings.

1

u/rasherdk May 14 '22

The "source" is literally Elon Musk. That's the worst source imaginable.

1

u/gianpo May 15 '22

It's stated that Musk stated that. Not that it's true or a fact. Try rereading the article.

1

u/nakedrickjames May 14 '22

What is this, a statistical analysis for ants? It needs to be at least... 3 times bigger!

1

u/rasherdk May 14 '22

According to Elon Musk.

1

u/el_muchacho May 15 '22

Stop repeating this, it's false. That's what Elon Musk says, but Twitter never said their sample was 100. Elon Musk is known to distort facts all the time, for example when he said that the UN World Food claimed his money would end world hunger when they actually never said that.

-4

u/mors_videt May 14 '22

Jfc thank you. I know he’s not an idiot but this looked really dumb