CODERSBRAIN

Beyond the Spreadsheet: Powerful Secrets

Digital photos are stored as a bunch of numbers (left) that make no sense to your brain until you open them with suitable tools (right).

Example of the blue color channel data from a photo of my wooden floor, opened in MS Paint.

Ta-da! You’ve just done data visualization. The music swells as you discover that the power of data analysis was inside you all along.

But does this mean you’re ready to work as a professional analyst?

Not quite. There are some big differences between an amateur and a professional analyst.

Data pro vs amateur difference #1 — Software skills

Unlike most amateurs, the pro knows how to use software (e.g. Python and R) that allows them to interact with more data formats all in one place. While MS Paint only works for images, analytics software can handle images and tables and sounds and text and and and… and the kitchen sink.

Here’s what it looks like when you open that same image with Python:
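To make the idea concrete, here is a minimal sketch that uses only Python's standard library. It writes a tiny made-up image in the plain-text PPM format, reads it back, and pulls out the blue channel as a grid of numbers. The pixel values, the helper functions, and the file name `tiny_floor.ppm` are all invented for illustration. For a real photo you would more likely reach for an imaging package such as Pillow.

```python
from pathlib import Path
import tempfile

# A made-up 3x2 image: each pixel is (red, green, blue), values 0-255.
pixels = [
    [(120, 80, 40), (130, 90, 50), (125, 85, 45)],
    [(110, 70, 30), (140, 100, 60), (135, 95, 55)],
]

def write_ppm(path, rows):
    """Save pixel rows as a plain-text PPM (P3) image file."""
    height, width = len(rows), len(rows[0])
    lines = ["P3", f"{width} {height}", "255"]
    for row in rows:
        lines.append(" ".join(str(v) for pixel in row for v in pixel))
    path.write_text("\n".join(lines) + "\n")

def read_ppm(path):
    """Load a plain-text PPM (P3) file back into rows of (r, g, b) tuples."""
    tokens = path.read_text().split()
    assert tokens[0] == "P3", "this sketch only handles plain-text PPM"
    width, height = int(tokens[1]), int(tokens[2])
    values = [int(t) for t in tokens[4:]]  # tokens[3] is the max value
    flat = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]
    return [flat[r * width:(r + 1) * width] for r in range(height)]

def blue_channel(rows):
    """Keep only the third number (blue) of every pixel."""
    return [[b for (_, _, b) in row] for row in rows]

with tempfile.TemporaryDirectory() as folder:
    image_path = Path(folder) / "tiny_floor.ppm"  # hypothetical file name
    write_ppm(image_path, pixels)
    loaded = read_ppm(image_path)

blue = blue_channel(loaded)
for row in blue:
    print(row)
```

Each printed row is one row of the image, with one blue value per pixel: the same "bunch of numbers" that a viewer turns into a picture.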

And here’s the same image viewed with R:

Data pro vs amateur difference #2 — Handling lots of data with ease

The second difference is that a pro can work with obscene amounts of data. Even though I’ve been playing with data for more than two decades, I still prefer to open a single photo in my browser or even MS Paint rather than in R or Python. So, besides the flexibility of being able to open lots of different data types, what’s the selling point for learning the analytics pro tools? Well, what if you want to make sense of a million photos?

You *could* try to use MS Paint to make sense of them all, but at 1 second per image, a million photos adds up to about 278 hours, which is more than a month of full-time work. A pro can do it in minutes with the right tools by using code to process and summarize vast amounts of data.
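Here is a rough sketch of what "using code to process and summarize" can look like, again with only Python's standard library. It fakes a folder of 1,000 tiny grayscale images (random pixel values in the plain-text PGM format, standing in for real photos), then loops over every file and computes one summary number per image. The file names, image sizes, and the choice of average brightness as the summary are assumptions made for illustration.

```python
import random
import tempfile
from pathlib import Path

def write_fake_photo(path, rng, width=4, height=4):
    """Write a tiny random grayscale image as a plain-text PGM (P2) file."""
    values = [str(rng.randint(0, 255)) for _ in range(width * height)]
    path.write_text(f"P2\n{width} {height}\n255\n" + " ".join(values) + "\n")

def average_brightness(path):
    """Return the mean pixel value (0 = black, 255 = white) of one PGM file."""
    tokens = path.read_text().split()
    pixel_values = [int(t) for t in tokens[4:]]  # skip P2, width, height, max
    return sum(pixel_values) / len(pixel_values)

def summarize_folder(folder):
    """Map each image's file name to its average brightness."""
    return {p.name: average_brightness(p) for p in sorted(folder.glob("*.pgm"))}

rng = random.Random(0)  # fixed seed so the fake data is repeatable
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for i in range(1000):
        write_fake_photo(folder / f"photo_{i:04d}.pgm", rng)
    summary = summarize_folder(folder)

darkest = min(summary, key=summary.get)
brightest = max(summary, key=summary.get)
print(f"{len(summary)} images summarized")
print(f"darkest: {darkest}, brightest: {brightest}")
```

The loop doesn't care whether the folder holds a thousand files or a million. Only the waiting time changes, and nobody has to click through the images one by one.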

How do you start learning these tools? You look up how to install them (R and Python are free) and start playing with them. Just like MS Paint, but better. Simply do a Google search for whatever task you’re trying to achieve with them and read the results.

Here’s the first result that comes up in response to the search query above:

Boom. That’s all you need.

Well, if you’ve never used R before, your next search will need to be “How do I install a package in R?” but after that, you’re golden. Just copy-paste the code in the answer, replacing “my image” with the filename and filepath for your photo. Not sure what those terms mean? Do a search to look them up. When you’ve run out of things you have to look up, you will have mastered the task you set out to learn. Looking stuff up is how developers develop (pun intended).
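If "filename" and "filepath" are among the terms you'd have to look up, here is a small Python sketch of the difference. The path below is hypothetical; swap in wherever your own photo lives.

```python
from pathlib import PurePosixPath

# A hypothetical location for a photo (an assumption for illustration).
filepath = PurePosixPath("/home/me/photos/wooden_floor.jpg")

print(filepath)         # the full filepath: where the file lives
print(filepath.name)    # the filename: wooden_floor.jpg
print(filepath.parent)  # the folder that contains it: /home/me/photos
print(filepath.suffix)  # the extension: .jpg
```

The filepath is the full address, and the filename is just the last piece of it.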

Do a whole bunch of this and one day you’ll wake up to the realization that you’ve accidentally developed pro software skills.

One reason I love programming is that it’s a cross between magic spells and LEGO. To learn the abracadabra that gets your task done, you look it up on the internet… which is itself data analytics!

Seriously, you don’t need a course. Simply challenge yourself to look at as many new data formats as you can in R or Python (they’re both good), and, along the way, keep asking the internet how to overcome any hurdles that come up. After you open the data (here’s how to find data to look at), come up with a question that strikes your fancy and try to use the tool to get an answer. Start small and get more ambitious as you go along. There’s nothing stopping you! Have fun!

Data pro vs amateur difference #3 — Immunity to data science bias

In my opinion, learning the tools is the easy part. The hard part is adopting the analytics mindset, which is what the next differences are all about. Starting with this one: the expert has developed an all-encompassing disrespect for data. Yes, you heard me.

Only a newbie pronounces “data” with a capital “D” and treats it as something magical. Professionals have been burned and had their hearts broken enough times to learn the hard way that data is just some stuff that humans decided to write down in electronic form. (More here.)

The advantage of data is memory, not quality.

Sprinkling some numbers into a story to make it more “sciency” might win the trust of amateurs, but seasoned analysts know better. They are immune to what I call data science bias: trusting information more when it smells of the data sciences. Adding a pretty graph to a nonsense report doesn’t fool them.

Experts understand that the advantage of data is memory, not quality, so they’re as skeptical of formal datasets as they are of the sights and sounds they take in by strolling down the street.

“With data, you’re still just another person with an opinion.”

One of my favorite pioneers of statistics, W. Edwards Deming, famously said that “without data, you’re just another person with an opinion.” That is true, but unfortunately so is this: “With data, you’re still just another person with an opinion.” Expert analysts understand this in their very bones.

To start building the same immunity, stop treating data as special. You’ve already (hopefully*) learned how to be sensible and skeptical with photos. For example, you know better than to take anything you see on Instagram as a true, unaltered, unbiased representation of reality. If you didn’t take the photo, you won’t trust the photo. Right? Right.

Stop treating data as special!

All the common sense rules you’ve learned for navigating the sights and sounds you’re exposed to in the wild also apply to structured data (numbers in a table/matrix/spreadsheet).

Equating data with truth is the same thing as believing everything that’s written in a book without knowing anything about the author. If you keep your wits about you and maintain a healthy skepticism, you’re well on your way to good analytics.

*There are some darling people who seem to have reached adulthood without learning that not everything you find online is true. If that’s you, may I gently suggest that analytics might not be the best career choice for you?

In addition to more practice with professional tools, the professional analyst understands the, ahem, professional aspects of the profession, which we’ll cover in the next article in this series and in this article.