9 Comments
Dec 16, 2021 · edited Dec 16, 2021 · Liked by T Coddington
author

ianbot wins the coveted "Reader of the Month" award! Your prize is a free subscription to I Numero 🤣😉... but seriously, thank you!

Age isn't a relevant criterion. It correlates because lots of old people have accumulated conditions caused by decades of poor choices; many don't. These underlying conditions are what produce the immune deficiencies that contribute to serious outcomes from infections. This virus is easy prey for healthy immune systems. The real scandal is that healthy immune systems are easy to maintain but not profitable for the medical-industrial complex. They have a name for obese diabetics who sit indoors watching sitcoms and so have low vitamin D: customers.

They're updating the census this year, apparently; they harassed me for months. Who runs that, the Commerce Department? They have the data you want, but not the data you need. I don't know if they'll give it to you.

Perhaps IPUMS has the data in a format you might find more usable: https://www.ipums.org/. Based on your comments, I think IPUMS NHGIS looks promising: https://www.nhgis.org/

That's where I was gonna say to go. County-level data is a mess in the CDC databases that I've seen. I was working with some of the datasets months ago and settled on state-level data, because cleaning the county data would have taken way longer than I had.

So is this no good? https://wonder.cdc.gov/bridged-race-population.html

I haven't looked at that one; I'd say download the CSV and give it a check. I'm not the best coder either, so at the time it was faster, for what I was doing, to use state-level data. I was looking at a few different variables, some of which were absent from some of the county databases.
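
For what "give it a check" might look like in practice, here is a minimal sketch (not tied to any CDC tool) that reports which expected columns are absent from a CSV export and how many values per column look missing. The function name `profile_csv`, the `MISSING_MARKERS` set and the column names in the usage are hypothetical; adjust the markers to whatever the downloaded file actually uses for blanks or suppressed values.

```python
import csv
import io
from collections import Counter

# Hypothetical placeholders for "no data"; check the real export and adjust.
MISSING_MARKERS = {"", "NA", "N/A", "Missing", "Suppressed"}


def profile_csv(text: str, expected: list[str]) -> dict:
    """Report absent columns and per-column missing-value counts for a CSV export."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing_counts = Counter()
    rows = 0
    for row in reader:
        rows += 1
        for col in header:
            # Short rows yield None for trailing columns; treat that as blank.
            value = (row.get(col) or "").strip()
            if value in MISSING_MARKERS:
                missing_counts[col] += 1
    return {
        "rows": rows,
        "absent_columns": [c for c in expected if c not in header],
        "missing_counts": {c: missing_counts[c] for c in header},
    }


if __name__ == "__main__":
    # Toy data for illustration only.
    sample = "County,Population,Median Age\nAutauga,55000,\nBaldwin,Suppressed,43.1\n"
    print(profile_csv(sample, ["County", "Population", "Median Age", "Poverty Rate"]))
```

Running it over each candidate file before committing to county-level work gives a quick read on whether the variables you need are present and populated.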

During the vetting phase, I tend to load any nominated dataset into a common review area that is modelled so that each field value lands in its own row. That way there is no need to design a dedicated receiving table: everything lands in a generic structure of run identity, field number and field value, with the value broken down by type (numeric, character, date, or JSON/array). This does not rule out dedicated receiving areas, one per candidate file type, but it does allow the data to be separated out before any decisions are made about the quality of its contents.

Eventually I expect to make the landing area sparse, so that repeated values take less space once the first occurrence of a value has been instantiated. I also expect eventually to create targeted receiving tables dynamically, based on the formats revealed by inspecting the incoming field values: a kind of scanner for data that can readily ingest new sources without prior design work.

However, it is not enough to load data sources; we also need to be able to annotate the records with a kind of watermark that carries data-quality measures and records lineage, so we can answer future questions.
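
A rough sketch of the generic landing structure described above, assuming a CSV source and plain Python in place of a database: every field of every row lands as one record keyed by run identity, a row number (an addition the comment doesn't mention, needed to reassemble records) and field number, with the value routed into a numeric, date, JSON or character slot. `infer_column_kinds` illustrates the "inspect the emerging values, then design the target table" step by proposing one type per field. All names here are hypothetical, the type rules (strict decimals, ISO dates only) are my own assumptions, and sparse storage and lineage watermarks are not shown.

```python
import csv
import io
import json
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Strict decimals only: no exponents, NaN/inf or leading zeros, so codes
# such as county FIPS "01001" stay character values.
_NUMERIC = re.compile(r"-?(0|[1-9]\d*)(\.\d+)?")
_ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


@dataclass
class LandedValue:
    """One field value from one row of one load run (hypothetical layout)."""
    run_id: str
    row_no: int
    field_no: int
    kind: str  # "numeric", "date", "json", "char" or "empty"
    num_value: Optional[float] = None
    date_value: Optional[date] = None
    json_value: Optional[str] = None
    char_value: Optional[str] = None


def classify(run_id: str, row_no: int, field_no: int, raw: str) -> LandedValue:
    """Route one raw string into exactly one typed value slot."""
    text = raw.strip()
    key = (run_id, row_no, field_no)
    if text == "":
        return LandedValue(*key, kind="empty")
    if _NUMERIC.fullmatch(text):
        return LandedValue(*key, kind="numeric", num_value=float(text))
    if _ISO_DATE.fullmatch(text):
        try:
            return LandedValue(*key, kind="date", date_value=date.fromisoformat(text))
        except ValueError:
            pass  # looks like a date but is not one, e.g. "2021-13-45"
    if text[0] in "[{":
        try:
            canonical = json.dumps(json.loads(text), sort_keys=True)
            return LandedValue(*key, kind="json", json_value=canonical)
        except json.JSONDecodeError:
            pass  # bracketed text that is not valid JSON
    return LandedValue(*key, kind="char", char_value=raw)


def land_csv(run_id: str, text: str) -> list[LandedValue]:
    """Land every field of every row (header included) with no table design."""
    landed = []
    for row_no, row in enumerate(csv.reader(io.StringIO(text))):
        for field_no, raw in enumerate(row):
            landed.append(classify(run_id, row_no, field_no, raw))
    return landed


def infer_column_kinds(landed: list[LandedValue], header_rows: int = 1) -> dict[int, str]:
    """Propose a target type per field: the single kind observed, else "char"."""
    kinds: dict[int, set[str]] = defaultdict(set)
    for v in landed:
        if v.row_no < header_rows:
            continue
        seen = kinds[v.field_no]  # creates the entry even if every value is empty
        if v.kind != "empty":
            seen.add(v.kind)
    return {f: (next(iter(k)) if len(k) == 1 else "char") for f, k in sorted(kinds.items())}


if __name__ == "__main__":
    sample = 'fips,cases,report_date,extra\n01001,10,2021-12-16,"{""a"": 1}"\n01003,,2021-12-15,"[1, 2]"\n'
    print(infer_column_kinds(land_csv("run-001", sample)))
    # prints {0: 'char', 1: 'numeric', 2: 'date', 3: 'json'}
```

Because nothing about the source's columns is declared up front, a new file can be landed immediately; the proposed kinds then become a starting point for a dedicated receiving table once the contents have been vetted.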

This is super helpful, thanks!
