Prevalence and Incidence of Rare Diseases

I’ve now come close to doing something that, in my first post, I thought was impossible: knowing where the prevalence of ataxia stands among other rare diseases. I was drawn to this PDF because of the prevalence and incidence information it contains on 3,064 rare diseases, and I have made the data interactive: sortable by column and filterable.

Here is my page

Note that the data are estimates. For example, worldwide consistency cannot be expected; in some cases, studies may have been done only on high-risk populations; and so on. Also note that treating all types of diseases uniformly has its failings. For example, tuberculosis and SCA are vastly different, yet both are here.

Exact precision cannot be expected, here or anywhere else for that matter. Still, I find the information to be valuable and meaningful.

There are 3,064 diseases in the list, with 286 duplicates, for 3,209 entries in the list. The duplicates are mostly doubles, plus a few triples.

Main features of my page

The sort order of each sortable column is toggled when the column heading is clicked.

The table is striped (alternate rows are shaded) to enhance readability. Yes, I realize that strict alternation does not occur when the table is filtered. So be it.

Browser notes

Use a modern browser. Old browsers won’t work. Don’t bother with Microsoft Internet Explorer 6, 7, 8, 9, 10. I only use Google Chrome regularly. Microsoft Edge is terribly slow.

Use a big screen. Don’t bother with a phone. Phones should work but are too small to be realistic.

Overview—numerical data

The primary goal was to be able to sort numerical data numerically.
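To illustrate why this matters, here is a minimal Python sketch (not the code behind my page) with made-up cell values. It shows how a plain text sort and a numeric sort put the same values in different orders.

```python
# Toy prevalence values as they might appear as text in table cells.
cells = ["100", "20", "3", "0.5"]

# A plain string sort compares character by character, so "100" lands before "20".
as_text = sorted(cells)

# A numeric sort converts each cell to a number before comparing.
as_numbers = sorted(cells, key=float)

print(as_text)     # ['0.5', '100', '20', '3']
print(as_numbers)  # ['0.5', '3', '20', '100']
```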

When sorted on one of the numerical columns, the ranking index is a numbering of the rows, where every item with the same numerical prevalence or incidence gets the same number.
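The ranking index can be sketched as follows. This is an illustration only: the function name `ranking_index` is made up, and it assumes the numbering has no gaps after ties (1, 1, 2 rather than 1, 1, 3).

```python
def ranking_index(sorted_values):
    """Number already-sorted rows so that equal neighbouring values share a number."""
    ranks = []
    current = 0
    previous = object()  # sentinel that is not equal to any real value
    for value in sorted_values:
        if value != previous:
            # The value changed, so move on to the next rank number.
            current += 1
            previous = value
        ranks.append(current)
    return ranks

# Made-up prevalence values, already sorted from high to low.
prevalences = [50.0, 50.0, 20.0, 5.0, 5.0, 5.0, 1.0]
print(ranking_index(prevalences))  # [1, 1, 2, 3, 3, 3, 4]
```

The same idea works for any sorted column, because the function only checks whether a value differs from the one before it.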

Overview—disease names

The names are hyperlinked to a Google search on the name.

Though you can sort by name, that’s more of an afterthought. Sorting by name essentially scrambles the numerical data.

When the table is sorted on the names, the ranking index changes whenever the name changes. There are 286 diseases listed more than once.


When filter text is entered, only rows containing that text are shown, with the matching text highlighted.
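A rough Python sketch of this kind of filter is below. It is not the code behind my page. The function name `filter_rows`, the case-insensitive matching, the `<mark>` tags used for highlighting, and the sample names are all assumptions made for illustration.

```python
import re

def filter_rows(rows, text):
    """Keep rows containing `text` (ignoring case) and wrap each match in <mark> tags."""
    if not text:
        return list(rows)  # an empty filter shows everything
    # re.escape makes characters such as "(" or "." match literally.
    pattern = re.compile(re.escape(text), re.IGNORECASE)
    shown = []
    for row in rows:
        if pattern.search(row):
            shown.append(pattern.sub(lambda m: f"<mark>{m.group(0)}</mark>", row))
    return shown

# Illustrative names only.
names = ["Spinocerebellar ataxia type 1", "Tuberculosis", "Friedreich ataxia"]
print(filter_rows(names, "spino"))  # ['<mark>Spino</mark>cerebellar ataxia type 1']
```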

When filtering on only one character, it can take a few seconds to generate the filtered table. I find that it speeds up sufficiently by the second character.


Prevalence and incidence numbers can be shown per 100,000 or for a population given in millions. Some sample country population sizes are listed (and clickable) for conveniently computing prevalence for their sizes.
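The conversion is simple arithmetic. Here is a small sketch with a made-up function name and made-up numbers: a rate per 100,000 is scaled to a population given in millions.

```python
def cases_in_population(rate_per_100k, population_millions):
    """Scale a rate per 100,000 people to a population given in millions."""
    # One million people is ten groups of 100,000.
    return rate_per_100k * population_millions * 10

# Made-up example: 2 per 100,000 in a population of 10 million.
print(cases_in_population(2, 10))  # 200
```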

Overview—experimental section

Show duplicate names only. I wanted to do this, but it wasn’t worth the effort to make this a persistent filter that combines with the filter text. In fact, clicking this link first clears any filter text.

Also, importantly (since this is kludgy), if you care about sort order, you need to sort before invoking this experimental filter. When you sort, it brings back all the data and undoes this experimental filter.
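For what it’s worth, the duplicates-only view boils down to counting names and keeping the rows whose name occurs more than once. A minimal sketch, with a hypothetical function name and placeholder disease names:

```python
from collections import Counter

def duplicates_only(names):
    """Return the rows whose name appears more than once, in their original order."""
    counts = Counter(names)
    return [name for name in names if counts[name] > 1]

# Placeholder names: "A" is a triple, "B" is a double, "C" is unique.
names = ["A", "B", "A", "C", "B", "A"]
print(duplicates_only(names))  # ['A', 'B', 'A', 'B', 'A']
```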

Observations about ataxia

To filter on SCA, filter on “spinocerebellar” (just “spino” is enough). Except for the missing SCA types, this is quite revealing about prevalence. I know from the following PDF that the most prevalent SCA types are 1, 2, 3, 6, 7, and 8.
Page 7

(Unfortunately, the table is missing SCA types 4, 6, 7, 8, 9, 10, 15, 16, 24, 28, and 33.)

I didn’t realize that the rarer forms of SCA are so rare that, in a few cases, only a handful of people have them. The table doesn’t have prevalence or incidence data for most of the SCAs. That means they are so rare that only the number of people (or families) affected is recorded. Consider a case such as SCA28, with six worldwide (?) cases. That’s a prevalence of less than one per billion!
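The per-billion figure can be checked with a few lines of Python. The world population of roughly 8 billion is my assumption here, not a number from the PDF; with 7 billion the result is still below one per billion.

```python
cases = 6                          # worldwide (?) cases of SCA28 per the table
world_population = 8_000_000_000   # assumption: roughly 8 billion people

# Convert the raw count into cases per one billion people.
per_billion = cases * 1_000_000_000 / world_population

print(per_billion)  # 0.75
```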

Prevalence, incidence: desecrated

Unfortunately, the PDF data play fast and loose with the terms prevalence and incidence. And since I ran with the data as is, now I’m complicit!

Here’s the way it’s supposed to be:

  • Prevalence is how many people in a population are known to have something right now.
  • Incidence is the rate (i.e., per time period) at which new people are known to be getting it in a population.
  • Birth prevalence, intuitively, sounds like it’s how many babies are known to be born with something in a given time frame. At least for genetic defects like those that cause SCA, this equates to:
    • Birth prevalence = birth incidence = prevalence * birthrate

This document’s definitions contradict this a bit:

  • Incidence = prevalence / disease mean duration
    • Sort of: prevalence per year for the disease’s duration
  • Birth prevalence = prevalence / patient life expectancy
    • Sort of: prevalence per year for the patient’s duration

(I assume the durations are in years.)
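Written out as code, the document’s two definitions look like this. It is a sketch: the function names and the sample numbers are mine, and the durations are assumed to be in years.

```python
def incidence_from_prevalence(prevalence, mean_duration_years):
    """Document's definition: incidence = prevalence / disease mean duration."""
    return prevalence / mean_duration_years

def birth_prevalence_from_prevalence(prevalence, life_expectancy_years):
    """Document's definition: birth prevalence = prevalence / patient life expectancy."""
    return prevalence / life_expectancy_years

# Made-up numbers: prevalence of 20 per 100,000, a 10-year mean duration,
# and a 50-year patient life expectancy.
print(incidence_from_prevalence(20, 10))         # 2.0 per 100,000 per year
print(birth_prevalence_from_prevalence(20, 50))  # 0.4 per 100,000 per year
```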

I’m not sure I’ve grasped these definitions, yet I go ahead and treat all three of the numbers fairly similarly.

Example #1: Let’s say a disease is shown with incidence data only. If we infer that prevalence data is absent because it wouldn’t make sense, then in theory people get the disease but no one has it for long: it’s either quickly curable (or goes away) or quickly fatal; we can’t tell which.

Example #2: Let’s say a disease is shown with a prevalence of 20 per 100,000 and an incidence of 139 per 100,000. In 100,000 people, 139 get the disease in one year, but at any point in time, only 20 have it. Either the disease is often curable (or goes away) or it is fatal, in less than a year, but not always (which is why 20 have it). This is essentially “the opposite” of ataxia and just a curiosity on my part.
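If the document’s definition (incidence = prevalence / disease mean duration) is taken at face value, the numbers in Example #2 imply a mean duration of 20/139 of a year, which is a little over seven weeks. The sketch below does that arithmetic; the variable names are mine, and the incidence is assumed to be per year.

```python
prevalence_per_100k = 20
incidence_per_100k_per_year = 139  # assumption: the incidence figure is per year

# Rearranging incidence = prevalence / duration gives duration = prevalence / incidence.
mean_duration_years = prevalence_per_100k / incidence_per_100k_per_year
mean_duration_days = mean_duration_years * 365

print(round(mean_duration_years, 2), "years")
print(round(mean_duration_days, 1), "days")
```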

What’s next?

When I get around to it, I plan to put into words how ataxia fits in with other rare diseases.

Meanwhile, here’s a comic.
