Philosophy
This page outlines my teaching philosophy and some implementation details. You are unlikely to find it interesting! I include it here for three reasons. First, I believe in transparency. I am happy for students to “see behind the curtain,” to understand the reasoning behind my pedagogical choices. Second, I want to provide teaching staff with links which they may find useful. Third, I always enjoy discussing pedagogy with colleagues. Let me know if you have comments!
First, the syllabus for my Harvard class included a “Course Philosophy” section which applies just as much to every class I teach. Highlights:
No course does a better job of increasing students’ odds of getting the future they want.
This is the central promise I make to every student. Do what I tell you and you will increase the odds of getting the internship/job/career that you want, as well as your odds of being successful once you are there. Every organization needs more people, especially more junior people, who can do things with data.
The central metaphor for this class is Ulysses and the Sirens. You are Ulysses. Ithaca is the future you want. The Sirens are the many distractions of the modern world. I am the rope. No course at Harvard does more to increase students’ chances of getting the future they want.
I am creating a “pit of success,” a structure in which students can’t help but spend 1 or 2 hours a day doing data science.
No Lectures: The worst method for transmitting information from my head to yours is for me to lecture you. There are no lectures. We work on problems together during class. You learn soccer with the ball at your feet. You learn about data with your hands on the keyboard.
Professionalism: We use professional tools. Your workflow will be very similar to the workflow involved in paid employment. Your problem sets and final project will be public, the better to impress others with your abilities. High-quality work will be shared with your classmates.
Second, the central goal of the class — and what should be the central goal of all introductory data science classes, IMHO — is to teach students how to take a real question and a pile of actual data, and then use the latter to answer the former in a professional manner. That is very hard! Some comments:
- I don’t teach students trivia. For example, every introductory textbook introduces the Hawthorne effect. Why? It is interesting and (maybe) important. But does it meaningfully help students to handle real-world data science projects? No! Or, at least, not nearly as well as other knowledge.
- Coding and professional software tools like Git and GitHub are absolutely critical to this goal. Any introductory course which doesn’t cover this material is failing its students.
- The best way to give beginners a simulacrum of professional experience is to provide them with a checklist, a list of tasks to perform which take them from the start of a problem to its conclusion. Although there are some data science project checklists, I can’t find any which are appropriate for beginners. Pointers welcome!
- My textbook and its associated tutorials are an attempt to provide, and teach, a checklist. Feedback is welcome! I use the Cardinal Virtues as a memory aid, and split each virtue into a few steps. Is this a particularly smart way of doing things? I don’t know! Suggestions welcome.
- The best way to learn a checklist is to apply it over and over again. The main focus of the textbook, and of my courses, is to provide many examples of applying the checklist, generally with increasing amounts of sophistication and many opportunities for students to practice.
- Any class which shares this real-question-plus-real-data approach should, as I do, require students to complete a solo final project in which they demonstrate their new abilities, both to themselves and to (potential) future employers.
Third, “Kill The Math and Let the Introductory Course Be Born” is an article (pdf) which explains why most of my courses include little (meaningful) mathematics. Abstract:
Our introductory classes in statistics and data science use too much mathematics. The key causal effect which our students want our classes to have is to improve their future performance and opportunities. The more professional their computing skills (in the context of data analysis), the greater their likely success. Introductory courses should feature almost no mathematical/statistical formulas beyond simple algebra.
Fourth, the heart of my courses is the set of tutorials which students complete on their own. This essay about how to write tutorials provides some background details. Highlights:
Imagine the shallowest possible learning curve. Almost every student should be able to answer almost every Exercise, albeit perhaps with the help of the Hint. There are no hard questions. In fact, there really aren’t questions at all. Instead, there are instructions: Do this. Do that. Next do this other thing.
Assume that you are giving the student a private lesson. You ask them a question. They give you an answer. What would you say next to them? What do you want to teach them, given that context?
Generally, students don’t do the assigned reading, at least in a large class. But they will complete required work. They will do the tutorials. Our promise: If you complete the tutorials, you will become a data scientist. There is simply no way not to.
Fifth, I will (try to) ensure that students are always engaged in class, always doing data science. Part of that is the “no lectures” philosophy. The rhythm of class centers on solving a specific problem. I say a few words and send the students off to work on it. Because students work so much in class, I end up saying fewer words in class than any other teacher.
In the physical classroom, students work in pairs, either in parallel, discussing the problem as they both type in some code, or as pair programmers, both looking at the same screen but only one student writing code.
On Zoom, students work in breakout rooms, each of which includes about four students. One student shares her screen, a different student each time we go to breakout rooms. Another student is the “guide,” the person who tells the screen-sharer what to type. The other two students should be coding along as well. Course staff moves from room to room, ensuring that the sharing and guiding students are different each time. Course staff will also often, when first entering a room, ask a different student to share his screen.
After a few minutes of working as a small group, I will bring the whole class back together. I then ask a random student to tell me what their group came up with. From my syllabus:
Cold Calling: I call on students during class. This keeps every student involved, makes for a more lively discussion and helps to prepare students for the real world, in which you can’t hide in the back row. Want to be left alone? Don’t take this course.
I share my screen with the class, typing what the randomly-selected student tells me to type, providing commentary on the approach and, perhaps, suggesting a different answer. My goal is to ensure that all the student groups are “caught up,” so that when I send them back to continue work on the problem, they can all start from, more or less, the same place.