There seems to be consensus among folks that twitter is where one should go for “alpha” when talking about crypto. TikTok for the plebs, Reddit for the midwits and twitter for the self-proclaimed(?) geniuses, illustrated perfectly by this meme:
So, since twitter is obviously the place where the respected people in crypto generally hang out, I decided that it could be fun to analyze what has come out of it during the past decade, starting from 2013. This first post is going to focus on mentions of crypto keywords from a large number of accounts, where every account is in a tier. How did I choose which accounts to analyze?
Mapping out crypto twitter
I wanted a large list of accounts, but I also wanted to be sure that the list wasn’t filled with spambots and reply guys that don’t represent anything interesting. The best approach I could come up with was to choose around 200 accounts (you can find them here: handpicked_accounts) that I see as representative of CT, and take ALL of the accounts that are followed by these accounts and go from there. (NOTE: not the FOLLOWERS of these handpicked accounts, the accounts they are FOLLOWING). Starting out with these accounts, I ended up with 300k accounts, which I then filtered down to 200k by only keeping the ones that have 10+ follows from my handpicked accounts. The accounts are in different tiers, where the top tier is the accounts with 90+ follows from the handpicked accounts, and then there’s a linear decrease to the bottom tier where they have 5-10 follows from the handpicked accounts.
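To make the filtering and tiering a bit more concrete, here is a rough sketch of how it could look. The input shape (a dict mapping each handpicked account to the list of accounts it follows), the function names and the tier width of 10 follows are all my assumptions for illustration; the post only pins down the 10+ cutoff and the 90+ top tier.

```python
from collections import Counter

def count_follows(following_by_account):
    """For every followed account, count how many handpicked accounts follow it."""
    counts = Counter()
    for followed in following_by_account.values():
        counts.update(set(followed))  # set() guards against duplicate entries
    return counts

def assign_tier(follows, top_cutoff=90, step=10):
    """Tier 1 is top_cutoff+ follows; each lower tier covers the next `step` follows.

    The step of 10 is a hypothetical choice, not something stated in the post.
    """
    if follows >= top_cutoff:
        return 1
    return 1 + (top_cutoff - follows + step - 1) // step

def build_tiers(following_by_account, min_follows=10):
    """Keep accounts with at least min_follows handpicked followers and tier them."""
    counts = count_follows(following_by_account)
    return {acct: assign_tier(n) for acct, n in counts.items() if n >= min_follows}

# Toy data with made-up account names, just to show the shape of the input.
following = {"a": ["x", "y"], "b": ["x"], "c": ["x", "y", "y"]}
tiers = build_tiers(following, min_follows=3)
```

With the defaults, 90+ follows lands in tier 1, 80-89 in tier 2, and so on down to 10-19 in tier 9.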
I now have all the accounts I care about, and the next question is how to get the tweets? Unfortunately, twitter’s API is not generous enough so we’ll have to get them in another way. Now the scraping can start, and after a while my database has amassed a list of 50 million tweets, which are just waiting to be analyzed! (I didn’t initially think this would be so easy, but this scraping library is just really good).
How to analyze the tweets?
If we have a tweet and wanna analyze it for crypto stuff, what do? The obvious thing is to tokenize the tweets and then analyze the words, which is what I did with help from NLTK. I have not bothered to do any sentiment analysis, and only care about mentions of a set of keywords. I actually don’t think sentiment analysis would have been worth it, since more mentions of something is a good enough indicator of “bullishness”. With this out of the way, how do we choose the keywords?
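As a rough illustration of the tokenize-then-match idea, here is a small sketch. Note that it uses a plain regex tokenizer from the standard library as a stand-in, not NLTK, because it keeps cashtags and hashtags attached to their prefix in an obvious way. The function names, and the choice to count a keyword at most once per tweet, are my assumptions.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
# Plain words, plus cashtags ($btc), hashtags (#bitcoin) and handles (@user).
TOKEN_RE = re.compile(r"[$#@]?\w+")

def tokenize(tweet):
    """Lowercase a tweet, drop URLs, and split it into tokens."""
    text = URL_RE.sub(" ", tweet.lower())
    return TOKEN_RE.findall(text)

def count_mentions(tweets, keywords):
    """Count how many tweets mention each keyword (at most once per tweet)."""
    counts = Counter()
    for tweet in tweets:
        counts.update(set(tokenize(tweet)) & keywords)
    return counts

tweets = [
    "GM! Stacking $BTC and #Bitcoin, wagmi https://t.co/abc",
    "btc btc btc",
    "nothing here",
]
keywords = {"btc", "$btc", "#bitcoin", "wagmi"}
mentions = count_mentions(tweets, keywords)
```

One limitation of matching on single tokens like this: project names made up of several words would need extra handling.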
Token keywords
It’s obvious to me that mentions of projects and their tokens are of interest, so I retrieved all tokens that exist on coingecko from their API, and got back both the symbol for the token and the name for the project. For example, Ethereum and ETH, Bitcoin and BTC. Since coingecko’s token list is really big, I think it’s fair to assume that we now have basically all the tokens we care about and their keywords covered. For example, the keywords for BTC are: bitcoin, btc, $btc, #btc and #bitcoin. However, for something like the NFT marketplace LooksRare, the keywords are only LooksRare, $looks and #looksrare, since the word “looks” is too common and we can’t map every tweet mentioning “looks” to this project. If both the token symbol and the project name are too common, the only keyword left is the cashtag ($) followed by the symbol.
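The keyword rules above can be sketched roughly like this. How a word gets judged “too common” isn't spelled out here, so the `common_words` set is a placeholder assumption, as are the function name and the made-up “Gas”/GAS token in the last example.

```python
def build_keywords(name, symbol, common_words):
    """Build the keyword set for one token from its project name and symbol.

    common_words is a hypothetical set of everyday words that would cause
    false matches; bare and hashtag forms are dropped for anything in it.
    """
    name = name.lower().replace(" ", "")
    symbol = symbol.lower()
    keywords = {"$" + symbol}  # the cashtag is always kept
    if symbol not in common_words:
        keywords |= {symbol, "#" + symbol}
    if name not in common_words:
        keywords |= {name, "#" + name}
    return keywords

COMMON_WORDS = {"looks", "gas"}  # placeholder list, for illustration only

btc_keywords = build_keywords("Bitcoin", "BTC", COMMON_WORDS)
looks_keywords = build_keywords("LooksRare", "LOOKS", COMMON_WORDS)
gas_keywords = build_keywords("Gas", "GAS", COMMON_WORDS)  # made-up token
```

This reproduces the two examples from the text: Bitcoin gets all five keywords, LooksRare loses the bare “looks” and “#looks”, and a token where both name and symbol are common is left with only its cashtag.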
General crypto keywords
Apart from keywords related to a specific project, I thought it would be fun to also match general words associated with CT, such as “altseason”, “shitcoin”, “metaverse” and “flippening”, but also the famous abbreviations such as “wagmi” (when did it get killed? Looking at you mrs Zuck), “gm” and “HFSP”. For example, a general myth on CT is that when the flippening talk starts, we are close to the top. We can now see if that’s true.
Analyzing the tweets
We now have all the tweets we care about together with all the keywords we want to match them with. There is so much one can do with this data, and I’ll probably make more posts later exploring it more. For this post, I figure that it would be best to focus on the big picture stuff. Like, what was most talked about for each year?
General stuff we’ll explore in this post:
from 2013 to 2022, what does the leaderboard of mentions look like? how has it changed during the years?
what does the leaderboard of mentions look like in the years 2019, 2020 and 2021 respectively?
Specific stuff we’ll explore in this post:
“The Shill/Market Cap Ratio”, which is something I just came up with, where a high ratio means that a project is shilled on CT very much compared to its market cap.
Explore which accounts were early to three of the largest success stories in 2021: $SOL, $LUNA and $AVAX. (We’ll also find out who coined the term “$solunavax”.)
What generally “tops” first, the actual price or the twitter shilling of the token?
When does the talk about “altseason” start, is it when “the altseason” has already been going on for a while and we’re close to the top? Or is it before the tokens run?
Are we close to the top when people start talking about “the flippening”?
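Since the Shill/Market Cap Ratio is only described loosely above, here is one possible way to compute it: mentions per billion dollars of market cap. The exact formula, the scaling, the function names and the numbers below are all assumptions for illustration, not real data.

```python
def shill_ratio(mentions, market_cap_usd, per_usd=1_000_000_000):
    """Mentions per `per_usd` dollars of market cap (default: per $1B)."""
    if market_cap_usd <= 0:
        raise ValueError("market cap must be positive")
    return mentions * per_usd / market_cap_usd

def rank_by_shill_ratio(mentions_by_token, market_cap_by_token):
    """Rank tokens from most to least shilled relative to their market cap."""
    ratios = {
        token: shill_ratio(count, market_cap_by_token[token])
        for token, count in mentions_by_token.items()
        if token in market_cap_by_token  # skip tokens without a known market cap
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)

# Made-up tokens and numbers, only to show the idea.
mentions_by_token = {"aaa": 500, "bbb": 2000}
market_cap_by_token = {"aaa": 1_000_000_000, "bbb": 100_000_000_000}
ranking = rank_by_shill_ratio(mentions_by_token, market_cap_by_token)
```

In this toy example “aaa” ranks first even though “bbb” has four times the mentions, because “bbb” has a hundred times the market cap.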
What has been most popular on CT the last 8 years?
This basic barplot shows us the mentions during 2013-2022 at the top, followed by 2019, 2020 and 2021 below. What does it say? Basically nothing interesting, if I’m being honest. The corn is always mentioned the most (tada!), followed by the good ethereum. I guess one thing you can see is that Cardano is mentioned less than you would expect if you went by market cap.
The Shill/Market Cap Ratio
So I don’t have the time to keep doing this because I started working on a project that I find to be much more interesting. It’s a secret for now though! I will open source everything around this project soon so that someone can start where I left off.
Wen opensource?
+1 WEN opensource?