Category Archives: Analytics

Analytics: Optimal starting ease for core vocab in Anki

I’ve long wondered what the optimal starting ease setting is for learning vocabulary in Anki. Starting ease is the primary setting that affects accuracy, workload, and ultimately how much I can learn in a given time. SuperMemo has a theory page, but it’s not specific to Japanese vocabulary or even to language learning. I want to know the best settings for me and the deck I’m studying, so I decided to analyze my Anki learning data to find out.

The first scatter chart shows the relationship between a card’s ease and my accuracy answering the card. The blue data points are from when I first started studying core vocabulary and was using a lot of filtered decks. I’ve since realized that filtered decks aren’t as efficient as simply using Anki’s algorithm with sensible settings. I’m also guessing that there’s a learning effect that makes Japanese vocabulary easier to learn once I’m a few thousand words in. Either way, some combination of those factors seems to be making me more accurate lately (red) than when I started (blue).

The second chart shows what happens when I simulate my workload for various values along the combined best-fit curve. The blue line (left axis) is simply the combined line from the chart above. The red line (right axis) is the simulated workload, and the yellow line (right axis) is a smoothed version of the red line. As you can see on the left side of the chart, if I aim for high accuracy, my workload is twice what it could be if I accepted a lower accuracy. At an ease of around 210, my accuracy should be around 61%, but my workload is about half what it is at an ease of 130, which would let me study twice as many cards in the same amount of time.

The problem with the chart above is that the yellow line doesn’t show how much of the vocabulary I actually “know” at any given ease/accuracy setting. In other words, if I am getting 60% accuracy instead of 80%, I “know” 20 percentage points less of the vocabulary, but the chart above counts both cases the same. So the following chart is the same, except that the yellow workload line is adjusted for accuracy so that every point on the line represents the same number of known cards.
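To make the adjustment concrete, here is a minimal sketch of one way to express workload per known card. It assumes that the share of cards I “know” at any moment equals my review accuracy, which is a simplification and not necessarily the exact adjustment used for the chart. The function name and the numbers in the loop are made up for illustration.

```python
def workload_per_known_card(workload: float, accuracy: float) -> float:
    """Scale a simulated workload so it is expressed per card actually known.

    Assumption (a simplification, not taken from the charts): the share of
    cards "known" equals the review accuracy, so the cost per known card
    is workload / accuracy.
    """
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in (0, 1]")
    return workload / accuracy


# Hypothetical numbers, for illustration only: the same raw workload
# costs more per known card when accuracy is lower.
for accuracy in (0.8, 0.6):
    print(accuracy, workload_per_known_card(100.0, accuracy))
```

Dividing by accuracy pushes the low-accuracy end of the curve back up, which is why the adjusted line can have a minimum somewhere in the middle instead of simply falling as ease rises.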

Judging by this last chart, the most efficient starting ease for my core vocabulary deck is around 175, which should put my accuracy around 67%. Lately, I’ve had my ease set to rather easy settings because it makes the learning process a lot more fun when I feel like I’m winning. However, I realized that the slope of that yellow line is so steep that a small sacrifice in accuracy should result in a large decrease in workload, allowing me to add more cards. So I’ve decided to slowly raise my ease settings until I find a good compromise between accuracy, efficiency, and enjoyment.

Analytics: The difficulty of finding leeches

Anki Analytics: Card difficulty

Analytics: Leeches


This is a new series where I combine a few things that I am currently learning into a topic I have no business pretending to know anything about. In addition to teaching myself Japanese, I am also attempting to teach myself programming and data analysis. Although it’s going very slowly, I am hoping to figure out a few things that will make the ankiing a little more efficient.

My first target is those damn leeches. Leeches are what Anki calls those cards that you keep forgetting over and over. According to the SuperMemo site, around 50% of your time can be spent learning 2.5% of the material. That 2.5% taking half of your time is the leeches. Depending on your goals, wouldn’t it be nice to be able to identify that 2.5% of the material and spend that 50% of your time learning twice as much? Personally, I would rather learn 97.5% of core twice as fast before spending the time to learn that last 2.5%.

Unfortunately, we don’t know which vocab words make up that hard 2.5%, and even worse, Anki doesn’t give us nearly enough tools to find them. All that Anki gives us is a setting: once you fail a card more than a set number of times (default is 7), Anki will suspend that card. The thinking is that you are likely to learn a new card in less additional time than it would take to keep trying (and failing) to learn the one you’ve failed so many times already. But I’ve always wondered which setting has you learning the most material in the least amount of time.

This is the question I set out to answer. I wrote a small program that counts the number of reps it takes for a card to either be learned or become a leech. I considered a card to be “learned” once its interval surpassed 4 months. I did this for every card I’ve studied, then averaged the reps to learn a card and the reps to become a leech. The result is the average number of reps it would take to learn a card assuming a given leech threshold in Anki.
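For anyone curious, the counting idea can be sketched roughly like this. This is not my actual program, just a minimal illustration under some assumptions: each card’s history is a list of hypothetical (passed, interval_days) pairs, “4 months” is approximated as 120 days, every failed rep counts toward the leech threshold (Anki itself only counts lapses of review cards), and the final average is total reps spent on finished cards divided by the number of learned cards.

```python
from typing import List, Tuple

# One review: (answered correctly?, interval in days after the review).
# This record layout is hypothetical, not Anki's actual revlog schema.
Review = Tuple[bool, int]

LEARNED_DAYS = 120  # "4 months", approximated as 120 days (assumption)


def classify(history: List[Review], threshold: int) -> Tuple[str, int]:
    """Replay one card's reviews and report how it ends, and after how many reps."""
    lapses = 0
    for reps, (passed, interval) in enumerate(history, start=1):
        if not passed:
            lapses += 1
            if lapses >= threshold:
                return "leech", reps      # would have been suspended here
        elif interval > LEARNED_DAYS:
            return "learned", reps        # interval surpassed 4 months
    return "unfinished", len(history)     # neither outcome reached yet


def reps_per_learned_card(histories: List[List[Review]], threshold: int) -> float:
    """Total reps spent on finished cards, divided by the number learned."""
    total_reps = 0
    learned = 0
    for history in histories:
        outcome, reps = classify(history, threshold)
        if outcome == "unfinished":
            continue                      # skip cards still in progress
        total_reps += reps
        if outcome == "learned":
            learned += 1
    return total_reps / learned if learned else float("inf")


# Made-up histories, for illustration only.
easy_card = [(True, 1), (True, 30), (True, 130)]
hard_card = [(False, 0), (False, 0), (True, 5)]
for threshold in (1, 2):
    print(threshold, reps_per_learned_card([easy_card, hard_card], threshold))
```

Running the same histories through every threshold gives one point per threshold, which is the kind of curve the graph plots for each deck.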

The above graph shows the results for the 4 decks I’ve been studying. The first thing to notice is that my “core sentences” and “Japanese for busy people” decks are much easier than my “core vocabulary” and “kanji” decks. The other thing to notice is that for all decks except kanji, setting the leech threshold to the lowest setting results in learning the most cards in the fewest reps. Kanji appears to be most efficient with the leech threshold set to 8, but any number higher than 4 appears to be just fine. The final thing to notice is that all of the vocabulary and sentence decks appear to have a similar curve, and a very smooth one. I take this to suggest that for all vocabulary decks I study, setting the leech threshold to the lowest setting will result in learning the most vocab words in the fewest reps. However, this isn’t the only consideration.

The second graph shows the ratio of learned cards to suspended leeches for each deck and each leech threshold. As you can see with the “hard” vocab and kanji decks, at lower thresholds Anki is suspending more cards than I would be learning. In fact, setting the leech threshold to 1 for core vocab and kanji would result in learning only 18% of the vocab deck and 6% of the kanji deck. This is hardly desirable, but finding a good balance between efficiency and completeness might make sense for some people. For instance, setting the threshold to 9 for kanji and 6 for core vocab gets me into the 50-60% coverage range. That still seems less than optimal to me, but it’s something I have to think about, as unfortunately there is no clear-cut answer.
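The two numbers in play here, the learned-to-leech ratio and the coverage, are simple to compute once each card has been labeled as learned or leech for a given threshold. A minimal sketch, with hypothetical function names and made-up counts:

```python
def coverage(learned: int, leeches: int) -> float:
    """Share of finished cards that end up learned rather than suspended."""
    finished = learned + leeches
    return learned / finished if finished else 0.0


def learned_to_leech_ratio(learned: int, leeches: int) -> float:
    """Below 1.0 means more cards are suspended than learned."""
    return learned / leeches if leeches else float("inf")


# Made-up counts, for illustration only: 18 learned cards and 82 leeches.
print(coverage(18, 82))
print(learned_to_leech_ratio(18, 82))
```

A ratio below 1 corresponds to the situation described above, where a low threshold suspends more cards than it lets me learn.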

That’s it for now. Please put your thoughts, criticism, praise, and especially suggestions in the comments, as I’m happy to make this better with your help.

Deck	low	high	mean	median
Tae Kim	5	20	8.39	8
RTK	3	98	37.78	34.5
Core sentences	2	59	9.31	7
Core vocab	2	194	30.17	19
JFBP	5	164	19.60	7

Jon Ken Po

Learning Japanese from the beginning
