rstudio::conf Trip Report

Feb 4, 2020

I interned for RStudio and then got to be an Instructor Teaching Assistant for the Applied Machine Learning Workshop at rstudio::conf. I also gave a lightning talk about my work over the summer on Data Science for Software Engineers. Here is my trip report, inspired by Amy Ko.

It’s me taking a selfie at the Applied Machine Learning Workshop! We had 180 people, all learning about tidymodels!

Day 1: Conferencing While Autistic

I’m finally starting to understand why conferencing is so fun, and also what to do so that I can enjoy conferences. I’ve included the Sensory-Friendly Conference Guide that I made for conference organizers; Carl Howe took on the burden of disseminating this information to more than just rstudio::conf, and I felt super welcome and accommodated.

Sensory-Friendly Conference Guide for Organizers. The transcript is here.

No matter what, people will reach out for a handshake. It’s a social and professional custom and I understand that. As I become more and more comfortable with the needs I have, and know that I am worthy of having those needs met, I just say “I don’t usually handshake. Nice to meet you, I’m Yim!” It actually serves as a nice icebreaker most of the time. Sometimes I do it compulsively when people pull me in with their strong handshake energy. Then I explain to them how I can feel their veins and how I’m truly committing to this handshake so they better appreciate it. That usually goes over fairly well if I also laugh, because I am funny. But also, I can feel your bones and I’m thinking about your hand muscles.

I love who I am. That has made the biggest difference, above and beyond whatever society could do. But that makes me able to teach a lot. I’m unafraid to ask for what I, and probably many others, could benefit from. I think one of the more difficult things to understand is that people like me can perform professionally on occasion. But a week-long conference tests our abilities. We have to conserve our energy just like you would for running a marathon or drinking a lot of beer. We have to distribute it across the time allotted. So I’m learning not to start off too strong. If I don’t want to make eye contact, I don’t. If I want to wiggle, I do. If I don’t understand a social phenomenon, I ask. If I need to get away, I do. Of course it’s not possible for everyone; and there are so many professional pressures that make us think we have to perform adequately or risk our social standing/career success. But I come first, and I’ve been learning that most people really respect that.

Day 1: Applied Machine Learning Workshop

The first hour was learning crisis management when your entire internet infrastructure fails. The Applied ML workshop was a two-day workshop taught by Max Kuhn and Davis Vaughn, and TAed by yours truly and other former interns. We wanted everyone to spin up an instance of RStudio Server Pro, so that they wouldn’t have to configure any environments or download any packages. Seems helpful, right? No need to go local, no need to clone the repo, no need to mess around with package installation.

Then the loading problems began.

No one could connect, nothing was loading, the workshop was starting, and hand after hand was going up. Turns out, we literally needed the hotel to allocate more bandwidth for the conference, and then have people refresh their IPs and reconnect once we did that. I got a lot of practice in calmly running around and assuring people it was all gonna be okay. And I got a lot of insight into what can go wrong no matter how much you plan out a lesson or a workshop or a conference.

The workshop itself covered intricacies of tidymodels. This was where I began to realize the differences between theory and practice. I’ve worked with regression models, decision trees, k-fold validation, stratification, principal component analysis, etc. But have I even touched a workflow for these? No! So here’s the magic of tidymodels. I used to wonder “ugh why would I waste my time learning all that?” But I forced myself to learn dplyr a few years ago, and I’ve been grateful ever since. tidymodels has a series of packages to make a workflow:

Pre-Process: rsample, recipes >> Train: parsnip >> Validate: yardstick

RStudio creates these packages to make your life easier… eventually. It is an investment, particularly for someone like me who has written most ML algorithms from scratch or relied on scikit-learn. But lately, a lot of my work has been in R, and in general I’d like to be well versed in whatever I can. The best part of learning new tools is getting to teach those tools to fellow PhD students and those who could really use them. In my opinion, what packages from RStudio excel at is readability. They write so that you can read, and that has not been my experience with Python ( pytorch, keras , pandas, etc.). Here is a Gentle Introduction to Tidymodels.
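To make that Pre-Process >> Train >> Validate chain concrete, here is a minimal sketch of what it can look like on the Ames data. This is not the workshop's code. It assumes the tidymodels and modeldata packages are installed, it uses the workflows package (not listed above) to bundle the recipe and model, and the three predictors are simply ones I picked for illustration.

```r
# Sketch only: assumes tidymodels and modeldata are installed.
library(tidymodels)

data(ames, package = "modeldata")

# Model price on the log10 scale (done up front so prediction stays simple)
ames <- ames %>% mutate(Sale_Price = log10(Sale_Price))

set.seed(123)

# Pre-process (rsample): stratified train/test split
ames_split <- initial_split(ames, prop = 0.8, strata = Sale_Price)
ames_train <- training(ames_split)
ames_test  <- testing(ames_split)

# Pre-process (recipes): describe the feature engineering steps
ames_rec <- recipe(Sale_Price ~ Gr_Liv_Area + Year_Built + Neighborhood,
                   data = ames_train) %>%
  step_log(Gr_Liv_Area, base = 10) %>%
  step_other(Neighborhood, threshold = 0.05) %>%  # lump rare neighborhoods
  step_dummy(all_nominal_predictors())

# Train (parsnip): a plain linear regression
lm_spec <- linear_reg() %>% set_engine("lm")

# workflows (an assumption here): bundle recipe + model, then fit
ames_fit <- workflow() %>%
  add_recipe(ames_rec) %>%
  add_model(lm_spec) %>%
  fit(data = ames_train)

# Validate (yardstick): RMSE on the held-out test set
ames_pred <- predict(ames_fit, new_data = ames_test) %>%
  bind_cols(ames_test %>% select(Sale_Price))

rmse(ames_pred, truth = Sale_Price, estimate = .pred)
```

Each stage is a separate, readable object, so you can swap the model or a recipe step without touching the rest.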

Day 1: My Hot Take on Housing Prices and the Iris Dataset

For the Applied Machine Learning Workshop, we relied on the Ames Housing dataset. As much as I really dislike these typical datasets, I need to admit why we use them. They are real world data, they are clean, they have lots of worked examples online, and they have relationships that help us learn various modeling techniques. But my recent research suggests that we might get a benefit from inputting our own data into the dataset. Even if we were to use housing prices as our data (which can be problematic for those who would never own a house or have little domain expertise)… You can imagine asking: “Do you own a house? Why did you get the price you got? What do you think mattered the most when you bid on the house in that particular market?” For those who don’t own a house (which for university level is most students), you might ask “How did you choose where to live? What were the constraints? Was the price always constant or did it change?” You can imagine all kinds of ways to include your learner in the data they are about to explore. Even taking 5 minutes to have your learners write it down, or add a new line in the dataframe that represents their own experience (if applicable), could help. My PhD work suggests that there might be a benefit of doing this for evaluating model performance, critiquing the data, and suggesting additional features to include in the model.

My latest brainchild is to run a workshop where I order a bunch of iris flowers. Actual iris flowers. A handful would be setosa and a handful would be versicolor and my students will actually measure the Sepal.Length by hand. Then they will try to figure out where they fit in the data, and classify themselves based on the classic iris dataset. You can learn a lot from bringing yourself into the context, from having fun, and from noticing the shortcomings of the data.
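Here is a rough sketch of what "classify yourself" could look like in base R. The measurements are made up (a hypothetical learner's flower), and the nearest-centroid rule is just the simplest classifier I could think of for illustration, not a recommendation.

```r
# Built-in iris data: 150 flowers, 3 species, 4 measurements each
data(iris)

# Hypothetical hand measurements (in cm) from one learner's flower
my_flower <- data.frame(
  Sepal.Length = 5.0, Sepal.Width = 3.4,
  Petal.Length = 1.5, Petal.Width = 0.2
)
features <- names(my_flower)
me <- unlist(my_flower)

# Mean of every measurement per species (the "centroids")
centroids <- aggregate(. ~ Species, data = iris, FUN = mean)

# Euclidean distance from my flower to each species centroid
distances <- apply(
  as.matrix(centroids[, features]), 1,
  function(centroid) sqrt(sum((centroid - me)^2))
)
names(distances) <- as.character(centroids$Species)

# The closest centroid is my predicted species
nearest <- names(which.min(distances))
print(distances)
print(nearest)

# Add myself to the dataset as a new row
my_flower$Species <- factor(nearest, levels = levels(iris$Species))
iris_plus_me <- rbind(iris, my_flower)
tail(iris_plus_me, 3)
```

From here a class can argue about it: was the ruler accurate enough, does one flower move the centroid, and what happens when someone brings an iris that is none of the three species?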

Photo of purple iris flowers in a field. Are they setosa or versicolor or virginica? I have no idea.

Day 2: The Buzzword Generator

The joke that started to emerge was how “buzzwordy” the tech world can be. Me and my fellow former intern and friend Maya Gans couldn’t stop ourselves from coming up with silly workshop titles that packed as many buzzwords as possible into one sentence. It seemed like everything around us was “at Scale” or “on the Cloud” or “in Docker Containers” etc. We jokingly started inventing really long buzzword titles for fake workshops, which inspired me to code up a simple website to randomly generate them. It’s not machine learned or anything, it’s just random combinations of various words mentioned in the rstudio::conf agenda. But they sure are hilarious. Click “Generate” and you get these long and buzzwordy workshop titles! You can find this at yimregister.github.io/whichworkshop. We laughed at each of these until it hurt, and that was a great time.

Fake workshop titles. “Big Data Visualization Blockchain with the Tidyverse in Docker Environments.”, “a Practical Intro to Deep Learned Survival Analysis with Kubernetes Packages”, and “Applied Shiny Machine Learning RMarkdown Databases for Excel Users with Kubernetes”.
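The idea behind the generator is small enough to sketch in a few lines of R. This is not the code behind the website, just an illustration of "random combinations of words"; the word lists are pulled from the phrases and titles quoted in this post, and the function name is made up.

```r
# Word lists taken from the buzzwords quoted in this post
openers <- c("Big Data", "Applied", "a Practical Intro to", "Deep Learned")
topics  <- c("Visualization", "Blockchain", "Survival Analysis",
             "Machine Learning", "Shiny", "RMarkdown", "Databases")
endings <- c("at Scale", "on the Cloud", "in Docker Containers",
             "with Kubernetes", "with the Tidyverse", "for Excel Users")

# Hypothetical helper: glue together one opener, 1-3 topics, 1-2 endings
generate_title <- function() {
  n_topics  <- sample(1:3, 1)
  n_endings <- sample(1:2, 1)
  paste(
    c(sample(openers, 1),
      sample(topics, n_topics),
      sample(endings, n_endings)),
    collapse = " "
  )
}

# Each call gives a different random title
for (i in 1:3) print(generate_title())
```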

Day 3: RStudio as a Benefit Corporation

Hey! Turns out RStudio is making a legal commitment to being a corporation that gives a crap about social good. I wasn’t sure of all these details, but JJ seemed to really know what he was talking about. He introduced the idea of a Benefit Corporation, which very basically is a “for-profit corporate entity that includes positive impact on society, workers, the community and the environment in addition to profit as its legally defined goals, in that the definition of “best interest of the corporation” is specified to include those impacts”. RStudio was a really, really good place to work. There seems to be a very core shared commitment to helping teach the world. Each person you meet from RStudio has some grander mission to help raise data literacy for social good. No company is perfect, and you never know how marketing teams can skew the dark side of any corporation. But my experience with RStudio was overwhelmingly positive (particularly with the Education team). And this was a pretty neat step for RStudio, with JJ repeating “you know, what if I die??” as a driving force behind legally declaring RStudio as responsible for their social impact. It shouldn’t be up to one person, it should be part of the core responsibilities of the company.

Image of J.J. Allaire (CEO of RStudio) talking about RStudio as a Public Benefit Corporation

Day 3: Google AI and Data Literacy (and Visualization Tools!)

Slide from Google AI’s keynote that reads “Training data — the key to ML. Debug your data first, not your program”.

Fernanda Viégas and Martin Wattenberg came from Google AI to talk about some really awesome visualization tools for inspecting your training data for neural network image classification. They demoed Facets, a tool for taking a look at all those 32x32 images and trying to see some human level patterns when all the training data is in front of you. They demoed a tool that shows how to avoid discrimination when using models to predict who will default on a loan, and how the model is optimizing for different values depending on what you tell it to do: Attack discrimination with smarter machine learning. And they showed some neat visualizations for seeing the MNIST dimensional space in a more intuitive way! This embedding projector shows both word embeddings and the classic MNIST dataset in a 3D PCA playground. Their hearts were in the right place, and my Slido question about self-advocacy on the ground got asked! Basically, I see how we build these tools for developers and consultants, but I’d like to see a bigger focus on trusting that the public can also learn to protect themselves and ask the right questions. Fernanda was super down with all that. Thanks Hadley for asking my question :D

Day 3: Teacups, Giraffes, and Statistics

Some of Desirée’s amazing artwork from the site, featuring a teacup giraffe in a forest.

Desirée De Leon is a true wizard. Her artwork is impeccable, her talk was fun, her stats knowledge is awesome, and her statistical communication is unparalleled. Welcome to the world of Teacup Giraffes and Statistics, a project she created to teach statistics in a fun and relatable way. The premise is that you are a grad student, and you need to investigate the differences in two populations of teacup giraffes. The series has interactive R coding, statistics instruction, narrative storytelling, and even a fun backstory for how the teacup giraffes came to be! You would be sorely missing out if you didn’t go revel in this incredible project: Teacups, Giraffes, and Statistics.

Day 3: On Misgendering, Sensory Overload, and PTSD While Conferencing

I notoriously start conferencing with lots of energy, aware that I’ll probably crash in the next few days. This time around, I was much more aware of what I needed to do in order to preserve my social abilities. However, there always comes a point. First, the misgendering is constant. I have lots of patience for a learning and growing society, who want to be inclusive. But they truly do fall short and it’s disappointing. There we are, with our little nametags with our pronouns listed as if it’s the hot new trend. It’s not enough to give your pronouns, you need to read mine too. Please. I do have patience, and of course I don’t fault people for trying and “slipping up”. But it is not enough to performatively participate in the pronoun discourse. You need to practice using them, and be mindful of your gendered assumptions. Lately I’ve been feeling pretty resilient, and often even feeling like I’m a burden for even asking anyways. But if you want to commit to inclusivity, you need to stop seeing us as binary genders. You need to try to understand who we are and what we stand for.

This is the first conference I’ve gotten away with not shaking hands, because everyone is afraid of the Coronavirus. You win some, you lose some. But a packed space with lots of clapping and lots of moving is still overwhelming. Stimming helps, so just remember if you ever see someone flapping their hands in a crowded space they might just be coping and don’t necessarily need anything. Don’t ask them why they’re doing that. My fellow interns simply asked what they could do, if anything. So we got a lunch table in the back and spent some quiet time together. This plus all of the PTSD trauma recovery work I’ve been doing while in the midst of resurfacing everything that happened to me…I took the afternoon off in my room! I had a dance party by myself for an hour. Oh, and finished my slides.

Day 3: Party

An image from the California Academy of Sciences; it’s 4 big jellyfish swimming around behind some glass!

We had a party at the California Academy of Sciences and it was really awesome! I’m usually not able to go to things like that, because the partying is hard for my senses. But I had my earplugs and I love science museums more than anything. I met a cool RStudio employee who stuck with me for a while, listening to lots of my experiences as an autistic adult navigating the world. I’m learning that my experiences can maybe help parents of kids with special sensory needs, and to be honest, any kids that get overwhelmed. I ended up going off on my own, plus earplugs and some wine, to just go look at fish and learn about rainforests and immerse myself in all the beautiful scientific imagery. And I got to learn all about RLadies-Seattle, which is “aggressively inclusive” and includes me and my non-binary self. See you soon, RLadies et al!

Day 4: Object of type ‘closure’ is not subsettable

“What you really see when you get an error message”. The words “error, failed, unable, cannot, no, warning” are all in red, and every other bit of information that is there is rewritten to say “blah blah blah”. Hahaha!

Jenny Bryan’s closing keynote on debugging was honestly so hilarious and relatable. She walked us through the steps you should take when debugging your programs, including “Turn it off and on again!” But for real, restart your R session and just try again as a first defense. She touched on some very real issues of feeling fear and incompetence when we make errors. She made us all realize that we will always have bugs. And having a good debugging strategy can save us some of that emotional turmoil. This, of course, was reflective of the cool work that Amy J. Ko and Dastyni Loksa have done in the spaces of debugging and metacognition. Amy has consistently reminded me that having a debugging strategy is much better than panicking and flailing! The funniest part of Jenny’s talk was the slide I included, which is “What we really see when we get error messages”. All of the scary warning words are lit up in red, and everything else is replaced with “blah blah blah”. Just remember, those messages are giving you some helpful information that you can use to debug your program! However, it really would be nice if it didn’t aggressively spit out all that jargon at us. The talk was originally jokingly named after the vague and confusing famous R error message “object of type ‘closure’ is not subsettable”. She ended up keeping it, because it is just too on point.
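If you have never met this error, here is one common way to trigger it, sketched in base R. In a fresh session there is no data frame called `df`, so R finds the built-in function `df()` (the F distribution density) instead, and you cannot subset a function. The `tryCatch()` wrapper is only there so the snippet keeps running and shows the message instead of stopping.

```r
# In a fresh session, `df` is a function (a "closure"), not a data frame.
# Forgetting to create your data frame and then writing df$price asks R
# to subset a function, which fails.
result <- tryCatch(
  df$price,
  error = function(e) conditionMessage(e)
)

# The "blah blah blah" part is actually the useful part: read it!
print(result)

# Checking what the name really points to is a good first debugging step
print(class(df))
```

After "turn it off and on again", checking `class()` of the thing you are subsetting is usually the next quickest way to find out what went wrong.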

Day 4: Data Science for Software Engineers! My Talk!

I’m mostly getting tired of writing this blog post now, so I’ll just skip to the part where I gave my talk! And it was good! And people participated in my “Software Mythbusters Game”! And people laughed! Hopefully Data Science for Software Engineers will attract some interested educators or software teams. It’s important that those who build software know how to evaluate what works and what doesn’t, and what is effective for the world and their practices. Greg Wilson’s mentorship and Maya Gans’ friendship were such gifts to my life. I’m forever inspired to keep contributing to the R world, share what I learn, stay open to learning from others, and hopefully see everyone next year!