Women in Data
Cutting-Edge Practitioners and Their Views on Critical Skills, Background, and Education
Cornelia Lévy-Bencheton and Shannon Cutt

Women in Data
by Cornelia Lévy-Bencheton and Shannon Cutt
Copyright © 2015 O’Reilly Media, Inc. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editor: Shannon Cutt
Production Editor: Nicole Shelby
Copyeditor: Jasmine Kwityn
Interior Designer: David Futato
Cover Designer: Ellie Volckhausen
Illustrator: Rebecca Demarest

February 2015: First Edition

Revision History for the First Edition
2015-01-26: First Release
2015-04-10: Second Release

While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-92301-6
[LSI]

Chapter Women in Data: Cutting-Edge Practitioners and Their Views on Critical Skills, Background, and Education

Introduction

Women in data and technology are no longer outliers or anomalies; they are entering the mainstream and excelling where technical skills, advanced education, and no small amount of personal tenacity and
brilliance are the minimum requirements. That said, women are still an underrepresented minority in the disciplines of science, technology, engineering, and math, known by the acronym STEM.

To investigate and understand how and why some women do extremely well, we interviewed 15 women in data to learn what got them to their current level of success, exactly what motivated them to get there, and their views about opportunities for women in tech. We were very keen on hearing their recommendations about what needs to get “fixed” to close the tech gender gap for others.

We think you will find the stories shared during these interviews both interesting and inspiring. They reveal insights that will widen the path for other women analysts, engineers, mathematicians, and data scientists. These insights include:

An update on the expanding role of the contemporary data scientist
New attitudes toward women in data among Millennials
Benefits of the data and STEM fields as a career choice for women
Much-needed and increasingly sought-after remedies for closing the gender gap

Wondering what’s new?
The gender gap in tech is not news, but here’s what is: it’s shrinking. The underrepresentation of women in tech has garnered tremendous attention and support of late, to the point where the continued existence of the numbers disparity has fostered a nationwide movement to bring more women into technical fields. Starting with the feeder pipeline of education (from kindergarten to university) and continuing through to diversity issues in and beyond the workplace, bridging the gender gap in STEM and tech is now a nationwide crusade and a very hot topic.

The groundswell of attention comes from every possible sector: public and private companies, national and local governments, associations, educators, parents, teachers, scientific organizations, media publicity, and trade groups. Emphasis is on correcting a range of loss and leakage issues that occur at multiple points along the career continuum. Extending from the type of coursework offered in schools, factors that discourage women from selecting and staying with tech include cultural bias, behavioral psychology, and gender stereotypes. Now, through increased publicity, there is a definite assault on the gender gap issue.

Our data practitioners confirm that dispelling myths of women’s inability to do well in math and tech is only a small part of the battle. Other challenges center on advancing the idea that gender diversity fuels creativity, innovation, and economic growth. Much work needs to be done to publicize these truths and change the prevailing mindset.

Because women represent over 55% of the workforce, it is striking that fewer than 25% of jobs in technical and computing fields are held by women. When 58% of bachelor’s degrees are being awarded to women, why are only 18% of computer science degrees going to women?
Silicon Valley companies are leading the way in looking into these disparities and opening up advancement into better-paying, higher-prestige leadership positions for their female employees. These jobs are also exciting and satisfying, and contribute handsomely to the bottom line.

Perhaps because big data has created a tsunami of new challenges and opportunities, or perhaps because of the well-publicized need to fill over 1.4 million new jobs in computer science by 2020 (jobs that will largely go unfilled), or perhaps because of a national sense of not wanting to fall behind on the world stage, closing the gender gap in tech is finally making it to the national priority list.

Our interviews with practitioners in data and STEM reveal that they are themselves the solution and model for the much-needed changes that will help close the gender gap in tech.

... employment interviews and for winning the support of executives on the job. Learning how to communicate sophisticated ideas in a digestible and memorable way is another equally vital component of each student’s success.

Given her multidisciplinary background ranging from academia and research to business consulting and course design, Skelly favors classifying data science as a “generalist’s” field. As such, she believes it leaves practitioners free to develop their careers from a base of technical skills and expand into business and real-life challenges.

When asked about her views on early education, Skelly commented that she’d love to see the math and science curriculum decoupled from goals of bygone days and better tailored to modern use cases and interests. For example, she recommends de-emphasizing geometry and trigonometry in favor of concepts from statistics or linear algebra.

In the work world, Skelly would also like to see more men advocating for and supporting women. In her mind, when male coworkers actively promote their female teammates as equals, they clear away unnecessary debris and help everyone to work at their peak
ability.

Kathleen Ting
Technical Account Manager, Cloudera
MS, Computer Engineering, Santa Clara University
BS, Computer Engineering, Santa Clara University

As a Technical Account Manager (TAM), Kathleen Ting helps Cloudera’s strategic customers deploy and use the Apache Hadoop platform in production. Acting as the customer’s technical advisor, she is a technical expert with a passion for customer management. A key aspect of success in this role is persistence: in managing expectations between the customer and Cloudera’s product development teams, in forming a relationship of trust with the customer, in anticipating customer needs, and in acting with agility in unexpected situations.

Thinking back to how she wound up in technology, Ting recalls that in high school she attended a week-long engineering camp held at Santa Clara University (SCU). This experience, and the professors she met there, convinced her to attend SCU and major in computer engineering. Furthermore, it was at SCU that she met two of the mentors who have helped shape her career.

At SCU, Ting had the opportunity to select Marie Wieck, an industry stalwart at IBM, as a mentor through MentorNet. After encouraging her to continue on for an advanced degree, Wieck passed Ting’s resume into IBM, leading to an interview, and Ting’s lifelong dream of working at IBM was fulfilled. After Ting worked at IBM on the mainframe for a few years, it was her former SCU Dean of Engineering, Dan Pitt, who piqued her interest in big data.

Although she lacked open source experience at the time, Ting persisted in trying to interview at Cloudera. Eventually it was her unconventional offer to work for free that landed her an interview at Cloudera, where she started in early 2011 as their first Hadoop support engineer.

At Cloudera, Ting sought out Apache Software Foundation (ASF) member Arvind Prabhakar, who became her mentor on all things Apache. In fact, her first conversation with Prabhakar was around the need for an
Apache Sqoop mainframe connector by which to easily move data from the mainframe to Hadoop. Ting credits this mentorship with leading her to become an Apache Sqoop Committer and Project Management Committee (PMC) member. Drawing from her work at Cloudera, Ting is a frequent speaker at data-related conferences as well as a published coauthor of the Apache Sqoop Cookbook (O’Reilly).

Appreciative of how mentorship has propelled her career, Ting tries to do her part in shrinking the tech gender gap by volunteering with the Society of Women Engineers as well as by speaking to young women at the She’s Geeky conference and to incoming SCU engineering freshmen. In addition to using mentoring to shrink the tech gender gap, we need to foster creativity in schools, to build the mindset of learning from failures (rather than being discouraged or ashamed), and to set mandatory programming requirements to build familiarity with and exposure to technology at an early age.

Renetta Garrison Tull
Associate Vice Provost, Director of PROMISE, University of Maryland Baltimore County (UMBC)
PhD, Speech Science, Northwestern University
MS, Electrical Engineering, Northwestern University
BS, Electrical Engineering, Howard University

Dr. Renetta Tull is a recognized expert in women and minorities in education, and in the STEM gender gap, both within and outside the academic environment. Dr. Tull is also an electrical engineer by training and is passionate about bringing more women into the field.

From her vantage point at the University of Maryland Baltimore County (UMBC) as Associate Vice Provost for Graduate Student Development and Postdoctoral Affairs, Dr. Tull concentrates on opportunities for graduate and postdoctoral professional development. As Director of PROMISE: Maryland’s Alliance for Graduate Education and the Professoriate (AGEP) program for the University System of Maryland (USM), Dr. Tull also has a unique perspective on the STEM subjects that students cover prior to attending the
university, within academia, and as preparation for the workforce beyond graduation.

Dr. Tull has been writing code since the seventh grade. Fascinated by the Internet, she “learned HTML before there were WYSIWYGs!” and remains heavily involved with the online world. “I’ve been politely chided in meetings for pulling out my phones (yes plural), sending texts, and updating our organization’s professional Twitter and Facebook status, while taking care of emails from multiple accounts. I manage several blogs, each for different audiences … friends, colleagues, and students.”

Dr. Tull has been involved with many strategic initiatives, including the following:

Liaison for Institutional Collaboration at Universidad Metropolitana in San Juan, Puerto Rico
Co-PI for the ADVANCE Hispanic Women in STEM Networking Conference
Co-PI Collaborative Research on diversity programs with the Quality of Life Technology Center (Carnegie Mellon University/University of Pittsburgh)
Leader, “Women in STEM” project for the Latin and Caribbean Consortium of Engineering Institutions (LACCEI)
Invited speaker for programs at MIT, Cornell, University of Maine System, Society for Hispanic Professional Engineers (SHPE), National Society of Black Engineers (NSBE), American Indian Science and Engineering Society (AISES), The National GEM Consortium, and others
Former Board Member of the Northeastern Association of Graduate Schools
Former Vice President of Operations for an emerging technologies firm

Dr. Tull believes that women thrive with support and a network. She recommends organizations such as Latinas in Computing, the ACM Richard Tapia Celebration of Diversity in Computing Conference, and the National Science Foundation’s Louis Stokes Alliance for Minority Participation, a grant-based program that provides student scholarships.

Dr. Tull works to increase the psychological sense of community and professional development opportunities for graduate students in Maryland through her involvement with PROMISE,
a local program in Maryland that is one of eight National Science Foundation AGEP programs in the United States for underrepresented STEM graduate students and postdocs. Programs include the Summer Success Institute (SSI), Professors-in-Training (PROFit), Dissertation House, Research Symposium, and others. According to Dr. Tull, these programs have contributed to an increase in applications, enrollments, and graduation rates of underrepresented graduate students in STEM fields in Maryland.

Dr. Tull sums up her thoughts about some of the changes and initiatives that need to be implemented across the STEM fields: “Women can be more involved with computing fields, and certainly, opportunities for ubiquitous computing limit restrictions for engagement. However, women need to know that they are invited to the table; they need to see images that show that they are not excluded and they need regular and continuous access to computing environments early, within the K–12 years. Many local, regional, and national groups are doing that. These groups need exposure within the schools, so that girls will take advantage of them. Further, the national media can contribute to a solution by showcasing images of girls working with computers, and women in computing. Some leaders are talking about making computer science a core standard as early as elementary school, in the same way that math and reading are taught at early ages. Building proficiency, and acceptance at a young age, can go a long way. This acceptance must then continue throughout college, graduate school, and the career.”

Hanna Wallach
Researcher, Microsoft; Assistant Professor, University of Massachusetts Amherst
PhD, Machine Learning, University of Cambridge
MS, Cognitive Science and Machine Learning, University of Edinburgh
BS, Computer Science, University of Cambridge

Hanna Wallach proudly cites a point of great distinction: “I’m probably the only person on the planet who has appeared in both Linux Format
and Glamour magazines.” Another example of bridging different worlds: Wallach works in research at Microsoft and also has academic responsibilities at the University of Massachusetts Amherst.

At UMass, Wallach is a core faculty member in the recently formed Computational Social Science Institute, where she develops new machine learning methods for uncovering fresh insights about the ways in which people interact. She collaborates with political scientists, sociologists, and journalists to learn how organizations (often those underlying the U.S. political system) work in practice by analyzing publicly available data. Her research contributes to machine learning, Bayesian statistics, and, in collaboration with social scientists, to the field of computational social science. Her work on infinite belief networks won the best paper award at AISTATS 2010, and she’s organized several workshops on Bayesian latent variable modeling and computational social science.

A Glamour Magazine “35 Women Under 35 Who Are Changing the Tech Industry” honoree, Wallach says that “at its core, my work is really all about using fancy math and fast computers to learn about social processes, such as those that underlie the U.S. government.” She adds that what enables her cutting-edge work is the massive quantity and diversity of data now publicly available. “We’ve had computers for a while. We’ve also had some of the math. But now we have massive amounts of data. What were once hypothetical theories about people and society are now being validated (or disproved!) by large-scale, data-driven analyses. This is extremely exciting.”

Is a researcher and computer science professor also a data professional?
“Yes, absolutely,” says Wallach, “because data science brings together people with a wide variety of skill sets and backgrounds.” Moreover, she believes that in order to succeed in the field, you need to be open-minded, interested, and continually learning; you need to have a “growth mindset.”

Wallach enjoys a strong network of professional women and men, who are at the ready to provide support, suggestions, and advice as role models, friends, and colleagues. Committed to giving back and supporting the cause of helping other women, Wallach has cofounded three groups that focus on helping the next generation, including the annual Women in Machine Learning Workshop (which is currently in its ninth year!).

From her vantage point, Wallach sees a few fixes to the educational system that would help bring more women into the data and STEM fields. She believes it’s critical to involve girls in science as early as possible (waiting until college is way too late, in her opinion). In addition, she encourages visible role models, promoting diversity and inclusion, and networking. Wallach strongly believes it’s important to embrace and seek out diversity: “There’s even considerable research about how teams with diverse viewpoints end up creating better ideas and products.”

Alice Zheng
Director of Data Science, Dato
Post Doc, Carnegie Mellon, Auton Lab and Parallel Data Lab
PhD, Electrical Engineering, emphasis in Communication, Computation and Statistics, University of California, Berkeley
BA, Mathematics and Computer Science, University of California, Berkeley

Alice Zheng is Director of Data Science at Dato, the Seattle-based startup that was awarded at the Startup Showcase at the Strata+Hadoop World conference in Fall 2014. At Dato, Zheng demonstrates her expertise in machine learning algorithms and applications; she loves playing with data and building tools that enable others to play with data.

How did this come about?
Zheng recalls being in elementary school in her native China, where math studies begin early, and bringing home a math grade that was below expectations. Her parents, both aeronautical engineers, were convinced she could do better. And after some intense tutoring from her father, she did, remaining at the top of all her math classes from then on.

To Zheng, data science is essentially about understanding data: “data science is about making cohesive sense of those many, many tiny pieces of information. It’s about detecting patterns that can then be used to help people make better decisions.” She adds that while many academic disciplines focus on the method, “data science focuses on the object of analysis. It’s a science (and art) all about the nails, not the hammer, and I like that!”

Zheng explains her feeling that the core of data science is problem solving: “I love solving problems, and I love solving problems through data. But even more fundamentally, I love gaining fresh perspectives of the world. There are as many perspectives as there are people, and data is as amorphous, as flexible, as mysterious, and as potentially impactful as human beings … I delight in the process of carefully deriving meaning and actionable information where it was previously dominated by chaos and confusion. It’s almost a secondary benefit to then be able to make decisions or produce outputs (like making recommendations of what else someone can buy), based on the understanding I gain. The insights are what I prize.”

Zheng believes that the key to success is removing obstacles, even one’s own inner obstacles: “it’s our perceived limitations that limit us, not our innate abilities. To encourage girls to take up math and science, I think we should be working on getting rid of all of their perceived limitations about themselves. This is subtly different than building confidence. Confidence is the positive booster, whereas perceived limitations means negative self-talk. As we look for positive feedback to build
confidence, we should also be eliminating the sources of negativity.”

Zheng also has a refreshing perspective on failure: she affirms that failing at something at the beginning is not a sign that one shouldn’t do it. While failure is certainly uncomfortable, Zheng believes it’s not where we start that should be the determining factor. She explains: “As human beings, we are very good at gauging position (where we are in terms of expertise level), but we are not so good at noticing velocity (how fast we are moving up the chain). It’s the latter that will determine the level of expertise we will eventually reach. This is why it’s important to ignore those uncomfortable failures at the beginning, because it may take us a while to ramp up and maybe even longer for us to notice the change …”

Margit Zwemer
Founder and CEO, LiquidLandscape
MFE, Financial Engineering, University of California, Berkeley
BS, Mathematics, Stanford University

Growing up in a family where both parents and grandparents had advanced degrees, it came naturally to Margit Zwemer to have a strong motivation toward higher education, and even to be comfortable with math and enjoy a challenge. Zwemer’s grandparents emigrated from Hungary; her grandfather arrived with a PhD in chemistry and rebuilt his business here in the United States. The value of a scientific education was instilled in Zwemer at an early age. In fact, it was her father who taught Zwemer how to code when she was still in elementary school.

Zwemer’s early professional experience was as an options trader (a “quant”) at Goldman Sachs, and later with Société Générale. When she joined tech startup Kaggle, she rebranded as a “data scientist.” Having worked with machine learning algorithms and the open data community at Kaggle, Zwemer later developed these interests and techniques further in the financial data visualization space. In early 2013, Zwemer cofounded her own financial technology startup, LiquidLandscape, a company focused on real-time
human/data interaction using high-volume financial time series data. “Founding a startup is decidedly an exciting time with ups and downs, predictably, but having the right cofounder is a huge part of building a strong company,” Zwemer shares.

Having her own business allows Zwemer to use all of her skills and interests: to apply math and engineering to solving real-world problems, and to use charts and visuals to enable understanding by her business constituents. Zwemer sees data visualizations “not just as pretty pictures to summarize the underlying data but as a tool for interacting with and manipulating a live data set and searching for patterns and anomalies that may not be easy to identify in the raw numbers.”

At LiquidLandscape, Zwemer is now “exploring a broad range of visualization environments, each of which has pros and cons for understanding certain aspects of the data. We use both traditional two-dimensional visualizations like line charts and heat maps to show the interactions between a small number of variables and three-dimensional screen graphics that allow the overlay of additional streams of data so that users can visualize the overall state of the market.” The company is currently exploring immersive data environments in virtual reality. Zwemer adds that what’s important for the company now is “being able to integrate these visual environments into a single core data pipeline; it’s a huge leap over traditional ways of monitoring and visualizing data and consequently making decisions, optimizing performance and reducing risk.”

Zwemer sees data science as an emerging field and a hybrid one that, paradoxically, lowers the risk associated with starting one’s own business. “From my perspective, being a hybrid field, it lessens the economic risk of looking for work because it is multifaceted. As one evolves and develops throughout one’s career, it’s always good to have a multiplicity of tools in the tool kit. Data science provides that. And since data scientists are in
demand, we can take more risks, career-wise, and that is very acceptable, even valued as a badge of courage. Data science enhances your risk tolerance with others and with your personal life and yourself.”

About the Authors

Cornelia Lévy-Bencheton is a communications strategy consultant and writer whose data-driven marketing and decision support work helps companies optimize their performance. As Principal of CLB Strategic Consulting, LLC, her focus is on the impact of disruptive technologies and their associated cultural challenges that open up new opportunities and necessitate refreshed strategies. She concentrates on big data, IT, Women in STEM, social media, and collaborative networking. Ms. Lévy-Bencheton has held senior marketing and strategy positions in well-known financial services firms, and is currently on the Board of The Data Warehouse Institute (TDWI), Tri-State Chapter, and the Board of the Financial Women’s Association (FWA). She earned her MA from Stanford University and her MBA from Pace University, and holds several advanced certificates from New York University.

Shannon Cutt is an O’Reilly editor focused on content related to data. She has more than seven years of experience developing and managing books, video, and other media. When not reading about data, she’s hiking all over the East Bay Area and caring for any stray animal who comes her way.