
Ending the UX Designer Drought Part 2 - Laying the Foundation
by Fred Beecher, June 23rd, 2015

The first article in this series, “A New Apprenticeship Architecture,” laid out a high-level framework for using the ancient model of apprenticeship to solve the modern problem of the UX talent drought. In this article, I get into details. Specifically, I discuss how to make the business case for apprenticeship and what to look for in potential apprentices. Let’s get started!

Defining the business value of apprenticeship

Apprenticeship is an investment. It requires an outlay of cash upfront for a return at a later date. Apprenticeship requires the support of budget-approving levels of your organization. For you to get that support, you need to clearly show its return by demonstrating how it addresses some of your organization’s pain points. What follows is a discussion of common pain points and how apprenticeship assuages them.

Hit growth targets

If your company is trying to grow but can’t find enough qualified people to do the work that growth requires, that’s the sweet spot for apprenticeship. Apprenticeship allows you to make the designers you’re having trouble finding. This is going to be a temporal argument, so you need to come armed with measurements to make it. And you’ll need help from various leaders in your organization to get them.

- UX team growth targets for the past 2-3 years (UX leadership)
- Actual UX team growth for the past 2-3 years (UX leadership)
- Average time required to identify and hire a UX designer (HR leadership)

Then you need to estimate how apprenticeship will improve these measurements. (Part 3 of this series, which will deal with the instructional design of apprenticeship, will offer details on how to make these estimates.)

- How many designers per year can apprenticeship contribute?
- How much time will be required from the design team to mentor apprentices?

Growth targets typically do not exist in a vacuum. You’ll likely need to combine this argument with one of the others.

Take advantage of more revenue opportunities

One of the financial implications of missing growth targets is not having enough staff to capitalize on all the revenue opportunities you have. For agencies, you might have to pass up good projects because your design team has a six-week lead time. For product companies, your release schedule might fall behind due to a UX bottleneck and push you behind your competition. The data you need to make this argument differ depending on whether your company sells time (agency) or stuff (product company). When doing the math about an apprenticeship program, agencies should consider:

- How many projects were lost in the past year due to UX lead time? (Sales leadership should have this information.)
- What is the estimated value of UX work on lost projects? (Sales leadership)
- What is the estimated value of other (development, strategy, management, etc.) work on lost projects? (Sales leadership)

Then, contrast these numbers with some of the benefits of apprenticeship:

- What is the estimated number of designers per year apprenticeship could contribute?
- What is the estimated amount of work these “extra” designers would be able to contribute in both hours and cash?
- What is the estimated profitability of junior designers (more) versus senior designers (less), assuming the same hourly rate?
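To make the agency version of this math concrete, here is a minimal back-of-the-envelope sketch in Python. Every figure and variable name below is a hypothetical placeholder, not a number from the article; substitute your own inputs from sales, HR, and UX leadership.

```python
# Back-of-the-envelope model for the agency revenue argument.
# All figures below are hypothetical placeholders for illustration.

lost_projects_per_year = 6            # projects passed up due to UX lead time
avg_ux_value_per_project = 40_000     # estimated UX fees per lost project ($)
avg_other_value_per_project = 60_000  # dev/strategy/PM fees per lost project ($)

apprentices_graduated_per_year = 3    # designers apprenticeship could contribute
billable_hours_per_designer = 1_400   # conservative first-year utilization
junior_hourly_rate = 100              # same rate assumed for junior and senior
cost_per_apprentice = 65_000          # salary, licenses, mentorship time ($)

lost_revenue = lost_projects_per_year * (avg_ux_value_per_project
                                         + avg_other_value_per_project)
added_capacity_revenue = (apprentices_graduated_per_year
                          * billable_hours_per_designer
                          * junior_hourly_rate)
program_cost = apprentices_graduated_per_year * cost_per_apprentice

print(f"Revenue currently lost to UX lead time: ${lost_revenue:,}")
print(f"Revenue new designers could capture:    ${added_capacity_revenue:,}")
print(f"Estimated program cost:                 ${program_cost:,}")
print(f"Net first-year impact:                  ${added_capacity_revenue - program_cost:,}")
```

Even a rough model like this gives budget approvers a single net figure to react to, which tends to move the conversation faster than a list of unquantified benefits.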
Product companies should consider:

- The ratio of innovative features versus “catch-up” features your competitors released last year. (Sales or marketing leadership should have this information.)
- The ratio of innovative features versus “catch-up” features you released in the past year. (Sales or marketing leadership)
- Any customer service and/or satisfaction metrics. (Customer service leadership)

Contrast this data with:

- The estimated number of designers per year you could add through apprenticeship.
- The estimated number of features they could’ve completed for release.
- The estimated impact this would have on customer satisfaction.

Avoid high recruiting costs

Recruiting a mid- to senior-level UX designer typically means finding them and poaching them from somewhere else. This requires paying significant headhunting fees on top of the person-hours involved in reviewing resumes and portfolios and interviewing candidates. All the data you need to make this argument can come from UX leadership and HR.

- Average cost per UX designer recruit
- Average number of hours spent recruiting a UX designer

Contrast this data with:

- Estimated cost per apprentice

To estimate this, factor in:

- Overhead per employee
- Salary (and benefits if the apprenticeship is long enough to qualify while still an apprentice)
- Software and service licenses
- Mentorship time from the current design team
- Mentorship/management time from the designer leading the program

Increase designer engagement

This one is tricky because most places don’t measure engagement directly. Measuring engagement accurately requires professional quantitative research. However, there are some signs that can point to low engagement. High turnover is the number one sign of low engagement. What kind of people are leaving—junior designers, seniors, or both? If possible, try to get exit interview data (as raw as possible) to develop hypotheses about how apprenticeship could help. Maybe junior designers don’t feel like their growth is supported… allowing them to leverage elements of an apprenticeship program for further professional development could fix that. Maybe senior designers are feeling burnt out. Consistent mentorship, like that required by apprenticeship, can be reinvigorating. Other signs of low engagement include frequently missing deadlines, using more sick time, missing or being late to meetings, and more. Investigate any signs you see, validate any assumptions you might take on, and hypothesize about how apprenticeship can help address these issues.

Help others

If your organization is motivated by altruism, that is wonderful! At least one organization with an apprenticeship program actually tries very hard not to hire their apprentices. Boston’s Fresh Tilled Soil places their graduated apprentices with their clients, which creates a very strong relationship with those clients. Additionally, this helps them raise the caliber and capacity of the Boston metro area when it comes to UX design.

Hiring great UX apprentices

Hiring apprentices requires a different approach to evaluating candidates than hiring established UX designers. Most candidates will have little to no actual UX design skills, so you have to evaluate them for their potential to acquire and hone those skills. Additionally, not everyone learns effectively through apprenticeship. Identifying the traits of a good apprentice in candidates will help your program run smoothly.

Evaluating for skill potential
Portfolio. Even though you’re evaluating someone who may never have designed a user experience before, you still need them to bring some examples of something they’ve made. Without this, it’s impossible to get a sense of what kind of process they go through to make things. For example, one apprentice candidate brought in a print brochure she designed. Her description of how she designed it included identifying business goals, balancing competing stakeholder needs, working within constraints, and getting feedback along the way, all of which are relevant to the process of UX design.

Mindset. The number one thing you must identify in a candidate is whether they already possess the UX mindset, the point of view that things are designed better when they’re designed with people in mind. This is usually the light bulb that goes off in people’s heads when they discover UX design. If that light hasn’t gone off, UX might not be the right path for that person. Apprenticeship is too much of an investment to risk that. Evaluating for this is fairly simple. It usually comes out in the course of a conversation. If not, asking outright “What does user experience design mean to you?” can be helpful. Pay careful attention to how people talk about how they’ve approached their work. Is it consistent with their stated philosophy? If not, that could be a red flag.

Intrinsic motivation. When people talk about having a “passion” for something, what that means is that they are intrinsically motivated to do that thing. This is pretty easy to evaluate for. What have they done to learn UX? Have they taken a class? That’s a positive sign. Have they identified and worked through a UX problem on their own? Even better! If a candidate hasn’t put in the effort to explore UX on their own, they are likely not motivated enough to do well in the field.

Self-education. While self-education is a sign of intrinsic motivation, it’s also important in its own right. Apprenticeship relies heavily on mentorship, but the responsibility for the direction and nature of that mentorship lies with the apprentice themselves. If someone is a self-educator, that’s a good predictor that they’ll be able to get the most out of mentorship. This is another fairly easy one to evaluate. Ask them to tell you about the most recent UX-related blog post or article they read. It doesn’t matter what it actually is, only whether they can quickly bring something to mind.

Professional skills. UX design is not a back-office field. UX designers talk with clients, customers, stakeholders, developers, and more. To be an effective UX designer, a candidate must possess basic professional skills such as dressing appropriately and communicating well. Simple things like sending a “thank you” email are a great indication of good professional skills. (Physically mailed thank-you notes get extra bonus points. One-off letterpressed mailed thank-you notes get even more!)

Collaboration. UX design is a collaborative discipline. If a candidate struggles with collaboration, they’ll struggle in the field. When discussing their work (especially class project work), be sure to ask what role they played on the project and how they interacted with other people. Complaining about others and taking on too much work themselves are warning signs that a candidate may have trouble with collaboration.

Evaluating for apprenticeship fit

Learning pattern. Some people learn best by gradually being exposed to a topic.
I call these people toe-dippers, as they prefer to dip their toes into something before diving in. Others prefer to barrel off the dock straight into the deep end and then struggle to the surface. I call these people deep-enders. While apprenticeship can be modified to work better for deep-enders, its gradual exposure can often frustrate them. It is much better suited for toe-dippers. Evaluating for this is tricky, though. If you ask people whether they prefer to dive in or learn gradually, they’ll say “dive in” because they think that’s what you want to hear. Asking them how they’ve approached learning other skills can give some insight, but this is not 100% reliable.

Learning by doing. Apprenticeship helps people acquire skills through experiential learning. If this is not how a person learns, apprenticeship may not be for them. Evaluating for this is very much like evaluating for intrinsic motivation. Has someone gone to the trouble of identifying and solving a design problem themselves? Have they practiced UX methods they have learned about? If so, it’s likely that learning by doing is effective for them.

Receptiveness to critique. Apprenticeship is a period of sustained critique. Someone whose response to criticism is defensiveness or despondency will not be successful as an apprentice. This is easy to identify in an interview within the context of discussing the work examples the candidate has brought. My favorite technique for doing this is to find something insignificant to critique and then hammer on it. This is not how I normally critique, of course; it’s a pressure test. If a candidate responds with openness and a desire to learn from this encounter, that’s a very positive sign. If they launch into a monologue defending their decisions, the interview is pretty much over.

If you’re fired up about UX apprenticeship (and how could you not be?), start making it happen in your organization! Do the research, find the data, and share your vision with your company’s leadership so they can see it too! When you get the go-ahead, you’ll be all ready to start looking for apprentices. If you follow these guidelines, you’ll get great apprentices who will grow into great designers. Stay tuned for Part 3 of this series, where I’ll get detailed about the instructional design of apprenticeship, pedagogy, mentorship, and tracking!

Building the Business Case for Taxonomy
Taxonomy of Spices and Pantries: Part 1
by Grace G Lau, September 1st, 2015

[XKCD comic strip about not being able to name all seven dwarfs from Snow White.]

How often have you found yourself on an ill-defined site redesign project? You know, the ones that you end up redesigning and restructuring every few years as you add new content. Or perhaps you spin up a new microsite because the new product/solution doesn’t fit in with the current structure, not because you want to create a new experience around it. Maybe your site has vaguely labelled navigation buckets like “More Magic”—which is essentially your junk drawer, your “everything else.” Your top concerns on such projects are:

- You can’t find anything.
- Your users can’t find anything.
- The navigation isn’t consistent.
- You have too much content.

Your hopeful answer to everything is to rely on an external search engine, not the one that’s on your site. Google will find everything for you.
A typical site redesign project might include refreshing the visual design, considering the best interaction practices, and conducting usability testing. But what’s missing? Creating the taxonomy. “Taxonomy is just tagging, right? Sharepoint/AEM has it—we’re covered!”

In the coming months, I will be exploring the what, why, and how of taxonomy planning, design, and implementation:

- Building the business case for taxonomy
- Planning a taxonomy
- The many uses of taxonomy
- Card sorting to validate a taxonomy
- Tree testing a taxonomy
- Taxonomy governance
- Best practices of enterprise taxonomies

Are you ready?

ROI of taxonomy

Although the word “taxonomy” is often used interchangeably with tagging, building an enterprise taxonomy means more than tagging content. It’s essentially a knowledge organization system, and its purpose is to enable the user to browse, find, and discover content. Spending the time on building that taxonomy empowers your site to better manage your content at scale, allow for meaningful navigation, expose long-tail content, reuse content assets, bridge across subjects, and provide more efficient product/brand alignment. In addition, a sound taxonomy in the long run will improve your content’s findability, support social sharing, and improve your site’s search engine optimization. (Thanks to Mike Atherton’s “Modeling Structured Content” workshop, presented at IA Summit 2013, for outlining the benefits.)

How do you explain taxonomy to get stakeholders on board? No worries, we won’t be going back to high school biology.

Explaining taxonomy

Imagine a household kitchen. How would you organize the spices? Consider the cooks: in-laws from northern China, mom from Hong Kong, and American-born Grace. I’ve moved four times in the past five years. My husband, son, and I live with my in-laws. I have a mother who still comes over to make her Cantonese herbal soups. We all speak different languages: English, Mandarin Chinese, and Cantonese Chinese. I have the unique need of organizing my kitchen for multiple users. My in-laws need to be able to find their star anise, peppercorn, tree ear mushrooms, and sesame oil. My mom needs a space to store her dried figs, dried shiitake mushrooms, dried goji berries, and snow fungus. I need to find a space for dried thyme and rosemary for the “American” food I try to make. Oh, and we all need a consistent place for salt and sugar.

People can organize their kitchen by activity zones: baking, canning, preparing, and cooking. Other ways to organize a kitchen successfully could include:

- attributes (shelf life, weight, temperature requirements)
- usage (frequency, type of use)
- seasonality (organic, what’s in season, local)
- occasion (hot pot dinners, BBQ parties)

You can also consider organizing by audience, such as for the five-year-old helper. I keep refining how the kitchen is organized each time we move. I have used sticky notes in Chinese and English with my in-laws and my mom as part of a card sorting exercise; I’ve tested the navigation around the kitchen to validate the results.

[A photo of pantry shelves labeled noodles, rice, garlic, and the like. Early attempts at organizing my pantry.]

If this is to be a data-driven taxonomy, I could consider attaching RFID tags to each spice container to track frequency and type of usage for a period of time to obtain some kitchen analytics. On the other hand, I could try guesstimating frequency by looking at the amount of grime or dust collected on the container.
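To make the idea of a faceted kitchen taxonomy a little more concrete, here is a minimal sketch of how terms with multilingual labels and the facets described above (zone, usage, occasion) might be modeled as structured data. The items, labels, and facet values are purely illustrative, not a prescribed schema.

```python
# A minimal, illustrative model of a faceted kitchen taxonomy.
# Terms carry multilingual labels (for the kitchen's different users)
# and facet values; none of this is a prescribed schema.

pantry = [
    {"labels": {"en": "star anise", "zh": "八角"},
     "zone": "cooking", "usage": "frequent", "occasion": ["braising"]},
    {"labels": {"en": "dried goji berries", "zh": "枸杞"},
     "zone": "preparing", "usage": "occasional", "occasion": ["herbal soup"]},
    {"labels": {"en": "dried rosemary", "zh": "迷迭香"},
     "zone": "cooking", "usage": "occasional", "occasion": ["roasting"]},
    {"labels": {"en": "kosher salt", "zh": "粗盐"},
     "zone": "cooking", "usage": "frequent", "occasion": ["everything"]},
]

def browse(items, **facets):
    """Return items whose facet values match every requested facet."""
    def matches(item):
        for facet, wanted in facets.items():
            value = item.get(facet)
            if isinstance(value, list):
                if wanted not in value:
                    return False
            elif value != wanted:
                return False
        return True
    return [item["labels"]["en"] for item in items if matches(item)]

print(browse(pantry, zone="cooking", usage="frequent"))
# ['star anise', 'kosher salt']
```

Frequency data from something like the RFID idea above would simply populate the usage facet with real observations instead of guesses.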
How often are we using chicken bouillon, and to make what dishes? Does it need to be within easy reach of the stovetop, or can it be relegated to a pantry closet three feet away?

[Photo of labeled spice jars in a drawer. From Home Depot.]

Understanding the users and their tasks and needs is a foundation for all things UX. Taxonomy building is not any different. How people think about and use their kitchen brings with it a certain closeness that makes taxonomy concepts easier to grasp. Who are the users? What are they trying to do? How do they currently tackle this problem? What works and what doesn’t? Watch, observe, and listen to their experience.

Helping the business understand the underlying concepts is one of the challenges I’ve faced with developing a solid taxonomy. We’re not just talking about tagging but breaking down the content by its attributes and metadata as well as by its potential usage and relation to other content. The biggest challenge is building the consensus and understanding around that taxonomy—taxonomy governance—and keeping the system you’ve designed well-seasoned!

Now, back to that site redesign project that you were thinking of: How about starting on that taxonomy? My next post will cover taxonomy planning.

How to determine when customer feedback is actionable
Merging statistics with product management
by Naira Musallam, Nis Frome, Michael Williams, and Tim Lawton, October 13th, 2015

One of the riskiest assumptions for any new product or feature is that customers actually want it. Although product leaders can propose numerous ‘lean’ methodologies to experiment inexpensively with new concepts before fully engineering them, anything short of launching a product or feature and monitoring its performance over time in the market is, by definition, not 100% accurate. That leaves us with a dangerously wide spectrum of user research strategies, and an even wider range of opinions for determining when customer feedback is actionable.

To the dismay of product teams desiring to ‘move fast and break things,’ their counterparts in data science and research advocate a slower, more traditional approach. These proponents of caution often emphasize an evaluation of statistical signals before considering customer insights valid enough to act upon. This dynamic has meaningful ramifications. For those who care about making data-driven business decisions, the challenge that presents itself is: How do we adhere to rigorous scientific standards in a world that demands adaptability and agility to survive? Having frequently witnessed the back-and-forth between product teams and research groups, it is clear that there is no shortage of misconceptions and miscommunication between the two. Only a thorough analysis of some critical nuances in statistics and product management can help us bridge the gap.

Quantify risk tolerance

You’ve probably been on one end of an argument that cited a “statistically significant” finding to support a course of action. The problem is that statistical significance is often equated with having relevant and substantive results, but neither is necessarily the case. Simply put, statistical significance exclusively refers to the level of confidence (measured from 0 to 1, or 0% to 100%) you have that the results you obtained from a given experiment are not due to chance. Statistical significance alone tells you nothing about the appropriateness of the confidence level selected nor the importance of the results.
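As a concrete illustration of what a significance figure does and does not tell you, here is a minimal sketch of a two-proportion z-test for a hypothetical A/B test; the visitor and conversion counts are invented for the example.

```python
# Minimal two-proportion z-test for a hypothetical A/B test.
# The counts below are invented for illustration.
from math import sqrt
from scipy.stats import norm

conversions_a, visitors_a = 120, 1_000   # control: 12.0% conversion
conversions_b, visitors_b = 138, 1_000   # variant: 13.8% conversion

p_a = conversions_a / visitors_a
p_b = conversions_b / visitors_b
p_pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)

se = sqrt(p_pooled * (1 - p_pooled) * (1 / visitors_a + 1 / visitors_b))
z = (p_b - p_a) / se
p_value = 2 * (1 - norm.cdf(abs(z)))     # two-sided test

print(f"observed lift: {p_b - p_a:.3f}")
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
# The p-value only says how surprising the difference would be under pure
# chance; whether a ~1.8-point lift matters is a separate business question.
```

Notice that the test outputs nothing about impact: the same p-value could attach to a lift that is worth millions or to one that is commercially irrelevant, which is exactly the gap the following sections explore.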
To begin, confidence levels should be context-dependent, and determining the appropriate confidence threshold is an oft-overlooked proposition that can have profound consequences. In statistics, confidence levels are closely linked to two concepts: type I and type II errors. A type I error, or false-positive, refers to believing that a variable has an effect that it actually doesn’t. Some industries, like pharmaceuticals and aeronautics, must be exceedingly cautious against false-positives. Medical researchers for example cannot afford to mistakenly think a drug has an intended benefit when in reality it does not. Side effects can be lethal so the FDA’s threshold for proof that a drug’s health benefits outweigh their known risks is intentionally onerous. A type II error, or false-negative, has to do with the flip side of the coin: concluding that a variable doesn’t have an effect when it actually does. Historically though, statistical significance has been primarily focused on avoiding false-positives (even if it means missing out on some likely opportunities) with the default confidence level at 95% for any finding to be considered actionable. The reality that this value was arbitrarily determined by scientists speaks more to their comfort level of being wrong than it does to its appropriateness in any given context. Unfortunately, this particular confidence level is used today by the vast majority of research teams at large organizations and remains generally unchallenged in contexts far different than the ones for which it was formulated. Matrix visualising Type I and Type II errors as described in text. But confidence levels should be representative of the amount of risk that an organization is willing to take to realize a potential opportunity. There are many reasons for product teams in particular to be more concerned with avoiding false-negatives than false-positives. Mistakenly missing an opportunity due to caution can have a more negative impact than building something no one really wants. Digital product teams don’t share many of the concerns of an aerospace engineering team and therefore need to calculate and quantify their own tolerance for risk. To illustrate the ramifications that confidence levels can have on business decisions, consider this thought exercise. Imagine two companies, one with outrageously profitable 90% margins, and one with painfully narrow 5% margins. Suppose each of these businesses are considering a new line of business. In the case of the high margin business, the amount of capital they have to risk to pursue the opportunity is dwarfed by the potential reward. If executives get even the weakest indication that the business might work they should pursue the new business line aggressively. In fact, waiting for perfect information before acting might be the difference between capturing a market and allowing a competitor to get there first. In the case of the narrow margin business, however, the buffer before going into the red is so small that going after the new business line wouldn’t make sense with anything except the most definitive signal. Although these two examples are obviously allegorical, they demonstrate the principle at hand. To work together effectively, research analysts and their commercially-driven counterparts should have a conversation around their organization’s particular level of comfort and to make statistical decisions accordingly. Focus on impact Confidence levels only tell half the story. 
They don’t address the magnitude to which the results of an experiment are meaningful to your business. Product teams need to combine the detection of an effect (i.e., the likelihood that there is an effect) with the size of that effect (i.e., the potential impact to the business), but this is often forgotten on the quest for the proverbial holy grail of statistical significance. Many teams mistakenly focus energy and resources on statistically significant but inconsequential findings.

A meta-analysis of hundreds of consumer behavior experiments sought to qualify how seriously effect sizes are considered when evaluating research results. They found that an astonishing three-quarters of the findings didn’t even bother reporting effect sizes “because of their small values” or because of “a general lack of interest in discovering the extent to which an effect is significant…” This is troubling, because without considering effect size, there’s virtually no way to determine what opportunities are worth pursuing and in what order. Limited development resources prevent product teams from realistically tackling every single opportunity. Consider, for example, how the answer to this question, posed by a MECLABS data scientist, changes based on your perspective:

“In terms of size, what does a 0.2% difference mean? For Amazon.com, that lift might mean an extra 2,000 sales and be worth a $100,000 investment… For a mom-and-pop Yahoo! store, that increase might just equate to an extra two sales and not be worth a $100 investment.”

Unless you’re operating at a Google-esque scale for which an incremental lift in a conversion rate could result in literally millions of dollars in additional revenue, product teams should rely on statistics and research teams to help them prioritize the largest opportunities in front of them.

Sample size constraints

One of the most critical constraints on product teams that want to generate user insights is the ability to source users for experiments. With enough traffic, it’s certainly possible to generate a sample size large enough to pass traditional statistical requirements for a production split test. But it can be difficult to drive enough traffic to new product concepts, and it can also put a brand unnecessarily at risk, especially in heavily regulated industries. For product teams that can’t easily access or run tests in production environments, simulated environments offer a compelling alternative. That leaves product teams stuck between a rock and a hard place: simulated environments require standing user panels that can get expensive quickly, especially if research teams seek sample sizes in the hundreds or thousands.

Unfortunately, strategies like these again overlook important nuances in statistics and place undue hardship on the user insight generation process. A larger sample does not necessarily mean a better or more insightful sample. The objective of any sample is for it to be representative of the population of interest, so that conclusions about the sample can be extrapolated to the population. It’s assumed that the larger the sample, the more likely it is going to be representative of the population. But that’s not inherently true, especially if the sampling methodology is biased. Years ago, a client fired an entire research team in the human resources department for making this assumption.
The client sought to gather feedback about employee engagement and tasked this research team with distributing a survey to the entire company of more than 20,000 global employees. From a statistical significance standpoint, only 1,000 employees needed to take the survey for the research team to derive defensible insights. Within hours after sending out the survey on a Tuesday morning, they had collected enough data and closed the survey. The problem was that only employees within a few time zones had completed the questionnaire, with a solid third of the company being asleep, and therefore ignored, during collection. Clearly, a large sample isn’t inherently representative of the population.

To obtain a representative sample, product teams first need to clearly identify a target persona. This may seem obvious, but it’s often not explicitly done, creating quite a bit of miscommunication for researchers and other stakeholders. What one person may mean by a ‘frequent customer’ could mean something entirely different to another person. After a persona is clearly identified, there are a few sampling techniques that one can follow, including probability and nonprobability sampling. A carefully selected sample size of 100 may be considerably more representative of a target population than a thrown-together sample of 2,000.

Research teams may counter with the need to meet statistical assumptions that are necessary for conducting popular tests such as a t-test or Analysis of Variance (ANOVA). These types of tests assume a normal distribution, which generally occurs as a sample size increases. But statistics has a solution for when this assumption is violated and provides other options, such as non-parametric testing, which work well for small sample sizes.

In fact, the strongest argument left in favor of large sample sizes has already been discounted. Statisticians know that the larger the sample size, the easier it is to detect small effect sizes at a statistically significant level (digital product managers and marketers have become soberly aware that even a test comparing two identical versions can find a statistically significant difference between the two). But a focused product development process should be immune to this distraction because small effect sizes are of little concern. Not only that, but large effect sizes are almost as easily discovered in small samples as in large samples. For example, suppose you want to test ideas to improve a form on your website that currently gets filled out by 10% of visitors. For simplicity’s sake, let’s use a confidence level of 95% to accept any changes. To identify just a 1% absolute increase to 11%, you’d need more than 12,000 users, according to Optimizely’s stats engine formula! If you were looking for a 5% absolute increase, you’d only need 223 users.

But depending on what you’re looking for, even that many users may not be needed, especially if you are conducting qualitative research. When identifying usability problems across your site, leading UX researchers have concluded that “elaborate usability tests are a waste of resources” because the overwhelming majority of usability issues are discovered with just five testers. An emphasis on large sample sizes can be a red herring for product stakeholders. Organizations should not be misled away from the real objective of any sample, which is an accurate representation of the identified target population.
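To see how sharply the required sample size falls as the effect you care about gets larger, here is a rough sketch using the classical two-proportion sample size formula. This is not Optimizely's sequential stats engine, so the exact figures will differ from the article's, and the baseline rate, target lifts, and 80% power assumption are illustrative choices.

```python
# Classical per-group sample size for detecting a lift in a conversion rate.
# This is a textbook two-proportion approximation, not Optimizely's stats
# engine, so it only illustrates how n scales with the size of the lift.
from math import sqrt, ceil
from scipy.stats import norm

def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha / 2)      # two-sided test
    z_beta = norm.ppf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

baseline = 0.10                            # form currently converts at 10%
for target in (0.11, 0.15):                # 1-point vs. 5-point absolute lift
    n = sample_size_per_group(baseline, target)
    print(f"{baseline:.0%} -> {target:.0%}: ~{n:,} users per variation")

# Detecting the 1-point lift requires on the order of 20x as many users as
# the 5-point lift, which is the prioritization point being made here.
```

The exact numbers matter less than the scaling: halving the effect you need to detect roughly quadruples the sample you need, which is why chasing tiny effects is so expensive.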
Research teams can help product teams identify necessary sample sizes and appropriate statistical tests to ensure that findings are indeed meaningful and cost-effectively attained.

Expand capacity for learning

It might sound like semantics, but data should not drive decision-making. Insights should. And there can be quite a gap between the two, especially when it comes to user insights. In a recent talk on the topic of big data, Malcolm Gladwell argued that “data can tell us about the immediate environment of consumer attitudes, but it can’t tell us much about the context in which those attitudes were formed.” Essentially, statistics can be a powerful tool for obtaining and processing data, but it doesn’t have a monopoly on research. Product teams can become obsessed with their Omniture and Optimizely dashboards, but there’s a lot of rich information that can’t be captured with these tools alone. There is simply no replacement for sitting down and talking with a user or customer. Open-ended feedback in particular can lead to insights that simply cannot be discovered by other means. The focus shouldn’t be on interviewing every single user, though, but rather on finding a pattern or theme from the interviews you do conduct.

One of the core principles of the scientific method is the concept of replicability—that the results of any single experiment can be reproduced by another experiment. In product management, the importance of this principle cannot be overstated. You’ll presumably need any data from your research to hold true once you engineer the product or feature and release it to a user base, so reproducibility is an inherent requirement when it comes to collecting and acting on user insights. We’ve far too often seen a product team wielding a single data point to defend a dubious intuition or pet project. But there are a number of factors that could, and almost always do, bias the results of a test without any intentional wrongdoing. Mistakenly asking a leading question or sourcing a user panel that doesn’t exactly represent your target customer can skew individual test results. Similarly, and in digital product management especially, customer perceptions and trends evolve rapidly, further complicating data. Look no further than the handful of mobile operating systems that undergo yearly redesigns and updates, leading to constantly elevated user expectations. It’s perilously easy to imitate Homer Simpson’s lapse in thinking: “This year, I invested in pumpkins. They’ve been going up the whole month of October and I got a feeling they’re going to peak right around January. Then, bang! That’s when I’ll cash in.”

So how can product and research teams safely transition from data to insights? Fortunately, we believe statistics offers insight into the answer. The central limit theorem is one of the foundational concepts taught in every introductory statistics class. It states that the distribution of averages tends to be Normal even when the distribution of the population from which the samples were taken is decidedly not Normal. Put as simply as possible, the theorem acknowledges that individual samples will almost invariably be skewed, but it offers statisticians a way to combine them to collectively generate valid data. Regardless of how confusing or complex the underlying data may be, by performing relatively simple individual experiments, the culminating result can cut through the noise. This theorem provides a useful analogy for product management.
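A quick simulation makes the theorem's point tangible: individual samples drawn from a heavily skewed distribution look nothing like a bell curve, yet the means of many such samples cluster tightly and symmetrically around the true value. The exponential distribution, sample size of 30, and 5,000 repetitions below are arbitrary choices for illustration.

```python
# Demonstrating the central limit theorem with a skewed population.
# Individual samples are skewed, but the distribution of their means
# is approximately Normal and centered on the true population mean.
import numpy as np

rng = np.random.default_rng(42)
population_mean = 2.0                        # exponential with scale=2 has mean 2

sample_means = [rng.exponential(scale=2.0, size=30).mean()
                for _ in range(5_000)]       # 5,000 small experiments of n=30

print(f"true mean:            {population_mean:.2f}")
print(f"mean of sample means: {np.mean(sample_means):.2f}")
print(f"std of sample means:  {np.std(sample_means):.2f}")
# Theory predicts the standard deviation of the mean is about
# 2 / sqrt(30) ~= 0.37, and a histogram of sample_means looks Normal
# even though each individual sample is strongly right-skewed.
```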
To derive value from individual experiments and customer data points, product teams need to practice substantiation through iteration. Even if the results of any given experiment are skewed or outdated, they can be offset by a robust user research process that incorporates both quantitative and qualitative techniques across a variety of environments. The safeguard against pursuing insignificant findings, if you will, is to be mindful not to consider data to be an insight until a pattern has been rigorously established.

Divide no more

The moral of the story is that the nuances in statistics actually do matter. Dogmatically adopting textbook statistics can stifle an organization’s ability to innovate and operate competitively, but ignoring the value and perspective provided by statistics altogether can be similarly catastrophic. By understanding and appropriately applying the core tenets of statistics, product and research teams can begin with a framework for productive dialog about the risks they’re willing to take, the research methodologies they can efficiently but rigorously conduct, and the customer insights they’ll act upon.

Planning a Taxonomy Project
Taxonomy of Spices and Pantries: Part 2
by Grace G Lau, October 20th, 2015

This is part 2 of “Taxonomy of Spices and Pantries,” in which I will be exploring the what, why, and how of taxonomy planning, design, and implementation:

- Building the business case for taxonomy
- Planning a taxonomy
- The many uses of taxonomy
- Card sorting to validate a taxonomy
- Tree testing a taxonomy
- Taxonomy governance
- Best practices of enterprise taxonomies

In part 1, I enumerated the business reasons for a taxonomy focus in a site redesign and gave a fun way to explain taxonomy. The kitchen isn’t going to organize itself, so the analogy continues. I’ve moved every couple of years and it shows in the kitchen. Half-used containers of ground pepper. Scattered bags of star anise. Multiple bags of ground and whole cumin. After a while, people are quick to stuff things into the nearest crammable crevice (until we move again and the IA is called upon to organize the kitchen).

Planning a taxonomy covers the same questions as planning any UX project. Understanding the users and their tasks and needs is a foundation for all things UX. This article will go through the questions you should consider when planning a kitchen, er, um…, a taxonomy project.

[Rumination of stuff in my kitchen and the kinds of users and stakeholders the taxonomy needs to be mindful of. Source: Grace Lau.]

Just as when designing any software, application, or website, you’ll need to meet with the stakeholders and ask questions:

- Purpose: Why? What will the taxonomy be used for?
- Users: Who’s using this taxonomy? Who will it affect?
- Content: What will be covered by this taxonomy?
- Scope: What’s the topic area and limits?
- Resources: What are the project resources and constraints?

(Thanks to Heather Hedden, “The Accidental Taxonomist,” p. 292)

What’s your primary purpose?

Why are you doing this? Are you moving, or planning to move? Is your kitchen so disorganized that you can’t find the sugar you needed for soy braised chicken? Is your content misplaced and hard to search? How often have you found just plain old salt in a different spot? How many kinds of salt do you have anyway: Kosher salt, sea salt, iodized salt, Hawaiian pink salt?
(Why do you have so many different kinds anyway? One of my favorite recipe books recommended using red Hawaiian sea salt for kalua pig. Of course, I got it.)

You might be using the taxonomy for tagging or, in librarian terms, indexing or cataloging. Maybe it’s for information search and retrieval. Are you building a faceted search results page? Perhaps this taxonomy is being used for organizing the site content and guiding the end users through the site navigation.

Establishing a taxonomy as a common language also helps build consensus and creates smarter conversations. On making baozi (steamed buns), I overheard a conversation between fathers:

Father-in-law: We need 酵母 [jiàomǔ] {noun}.
Dad: Yi-see? (Cantonese transliteration of yeast)
Father-in-law: (confused look)
Dad: Baking pow-daa? (Cantonese transliteration of baking powder)

Meanwhile, I look up the Chinese translation of “yeast” in Google Translate while my mother-in-law opens her go-to Chinese dictionary tool. I discover that the dictionary word for “yeast” is 发酵粉 [fājiàofěn] {noun}.

Father-in-law: Ah, so it rises flour: 发面的 [fāmiànde] {verb}

This discovery prompts more discussion about what yeast does and how it is used. There were at least 15 more minutes of discussing yeast in five different ways before the fathers agreed that they were talking about the same ingredient and its purpose. Eventually, we have this result in our bellies.

[Homemade steamed baozi. Apparently, they’re still investigating how much yeast is required for the amount of flour they used. Source: Grace Lau.]

Who are the users?

Are they internal? Content creators or editors, working in the CMS? Are they external users? What’s their range of experience in the domain? Are we speaking with homemakers and amateur cooks or seasoned cooks with many years at various Chinese restaurants? Looking at the users of my kitchen, I identified the following stakeholders:

- Content creators: the people who do the shopping and have to put away the stuff
- People who are always in the kitchen: my in-laws
- People who are sometimes in the kitchen: me
- Visiting users: my parents and friends who often come over for a BBQ/grill party
- The cleanup crew: my husband, who can’t stand the mess we all make

How do I create a taxonomy for them? First, I attempt to understand their mental models by watching them work in their natural environment and observing their everyday hacks as they complete their tasks. Having empathy for users’ end game—making food for the people they care for—makes a difference in developing the style, consistency, and breadth and depth of the taxonomy.

What content will be covered by the taxonomy?

In my kitchen, we’ll be covering sugars, salts, spices, and staples used for cooking, baking, braising, grilling, smoking, steaming, simmering, and frying.
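If it helps to capture answers to the planning questions above (purpose, users, content, scope, resources) in a form the whole team can review, here is a minimal sketch of a project brief as structured data. The field names and the kitchen examples are illustrative, not a standard template.

```python
# A minimal, illustrative taxonomy project brief; field names and values
# are examples only, not a standard template.
taxonomy_brief = {
    "purpose": "Guide navigation and faceted retrieval of pantry items",
    "users": [
        "content creators (the shoppers who put things away)",
        "daily cooks (in-laws)",
        "occasional cooks (me)",
        "visiting users (parents, BBQ guests)",
        "cleanup crew (husband)",
    ],
    "content": "sugars, salts, spices, and cooking/baking staples",
    "scope": "the kitchen and pantry only",
    "resources": "one IA, sticky notes in two languages, a few weekends",
}

for field, value in taxonomy_brief.items():
    print(f"{field.upper()}: {value}")
```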
Defining the business value of apprenticeship Apprenticeship is an investment. It requires an outlay of cash upfront for a return at a later date. Apprenticeship requires the support of budget-approving levels of your organization. For you to get that support, you need to clearly show its return by demonstrating how it addresses some of your organization’s pain points. What follows is a discussion of common pain points and how apprenticeship assuages them. Hit growth targets If your company is trying to grow but can’t find enough qualified people to do the work that growth requires, that’s the sweet spot for apprenticeship. Apprenticeship allows you to make the designers you’re having trouble finding. This is going to be a temporal argument, so you need to come armed with measurements to make it. And you’ll need help from various leaders in your organization to get them. UX team growth targets for the past 2-3 years (UX leadership) Actual UX team growth for the past 2-3 years (UX leadership) Average time required to identify and hire a UX designer (HR leadership) Then you need to estimate how apprenticeship will improve these measurements. (Part 3 of this series, which will deal with the instructional design of apprenticeship, will offer details on how to make these estimates.) How many designers per year can apprenticeship contribute? How much time will be required from the design team to mentor apprentices? Growth targets typically do not exist in a vacuum. You’ll likely need to combine this argument with one of the others. Take advantage of more revenue opportunities One of the financial implications of missing growth targets is not having enough staff to capitalize on all the revenue opportunities you have. For agencies, you might have to pass up good projects because your design team has a six-week lead time. For product companies, your release schedule might fall behind due to a UX bottleneck and push you behind your competition. The data you need to make this argument differ depending on whether your company sells time (agency) or stuff (product company). When doing the math about an apprenticeship program, agencies should consider: What number of projects have been lost in the past year due to UX lead time? (Sales leadership should have this information.) What is the estimated value of UX work on lost projects? (Sales leadership) What is the estimated value of other (development, strategy, management, etc.) work on lost projects? (Sales leadership) Then, contrast these numbers with some of the benefits of apprenticeship: What is the estimated number of designers per year apprenticeship could contribute? What is the estimated amount of work these “extra” designers would be able to contribute in both hours and cash? What is the estimated profitability of junior designers (more) versus senior designers (less), assuming the same hourly rate? Product companies should consider: The ratio of innovative features versus “catch-up” features your competitors released last year. (Sales or marketing leadership should have this information.) The ratio of innovative features versus “catch-up” features you released in the past year. (Sales or marketing leadership) Any customer service and/or satisfaction metrics. (Customer service leadership) Contrast this data with… The estimated number of designers per year you could add through apprenticeship. The estimated number of features they could’ve completed for release. The estimated impact this would have on customer satisfaction. 
Avoid high recruiting costs Recruiting a mid- to senior-level UX designer typically means finding them and poaching them from somewhere else. This requires paying significant headhunting fees on top of the person-hours involved in reviewing resumes and portfolios and interviewing candidates. All the data you need to make this argument can come from UX leadership and HR. Average cost per UX designer recruit Average number of hours spent recruiting a UX designer Contrast this data with: Estimated cost per apprentice To estimate this, factor in: Overhead per employee Salary (and benefits if the apprenticeship is long enough to qualify while still an apprentice) Software and service licenses Mentorship time from the current design team Mentorship/management time from the designer leading the program Increase designer engagement This one is tricky because most places don’t measure engagement directly. Measuring engagement accurately requires professional quantitative research. However, there are some signs that can point to low engagement. High turnover is the number one sign of low engagement. What kind of people are leaving—junior designers, seniors, or both? If possible, try to get exit interview data (as raw as possible) to develop hypotheses about how apprenticeship could help. Maybe junior designers don’t feel like their growth is supported… allowing them to leverage elements of an apprenticeship program for further professional development could fix that. Maybe senior designers are feeling burnt out. Consistent mentorship, like that required by apprenticeship, can be reinvigorating. Other signs of low engagement include frequently missing deadlines, using more sick time, missing or being late to meetings, and more. Investigate any signs you see, validate any assumptions you might take on, and hypothesize about how apprenticeship can help address these issues. Help others If your organization is motivated by altruism, that is wonderful! At least one organization with an apprenticeship program actually tries very hard not to hire their apprentices. Boston’s Fresh Tilled Soil places their graduated apprentices with their clients, which creates a very strong relationship with those clients. Additionally, this helps them raise the caliber and capacity of the Boston metro area when it comes to UX design. Hiring great UX apprentices Hiring apprentices requires a different approach to evaluating candidates than hiring established UX designers. Most candidates will have little to no actual UX design skills, so you have to evaluate them for their potential to acquire and hone those skills. Additionally, not everyone learns effectively through apprenticeship. Identifying the traits of a good apprentice in candidates will help your program run smoothly. Evaluating for skill potential Portfolio. Even though you’re evaluating someone who may never have designed a user experience before, you still need them to bring some examples of something they’ve made. Without this, it’s impossible to get a sense of what kind of process they go through to make things. For example, one apprentice candidate brought in a print brochure she designed. Her description of how she designed it included identifying business goals, balancing competing stakeholder needs, working within constraints, and getting feedback along the way, all of which are relevant to the process of UX design. Mindset. 
The number one thing you must identify in a candidate is whether they already possess the UX mindset, the point of view that things are designed better when they’re designed with people in mind. This is usually the light bulb that goes off in people’s heads when they discover UX design. If that light hasn’t gone off, UX might not be the right path for that person. Apprenticeship is too much of an investment to risk that. Evaluating for this is fairly simple. It usually comes out in the course of a conversation. If not, asking outright “What does user experience design mean to you” can be helpful. Pay careful attention to how people talk about how they’ve approached their work. Is it consistent with their stated philosophy? If not, that could be a red flag. Intrinsic motivation. When people talk about having a “passion” for something, what that means is that they are intrinsically motivated to do that thing. This is pretty easy to evaluate for. What have they done to learn UX? Have they taken a class? That’s a positive sign. Have they identified and worked through a UX problem on their own? Even better! If a candidate hasn’t put in the effort to explore UX on their own, they are likely not motivated enough to do well in the field. Self-education. While self-education is a sign of intrinsic motivation, it’s also important in its own right. Apprenticeship relies heavily on mentorship, but the responsibility for the direction and nature of that mentorship lies with the apprentice themselves. If someone is a self-educator, that’s a good predictor that they’ll be able to get the most out of mentorship. This is another fairly easy one to evaluate. Ask them to tell you about the most recent UX-related blog post or article they read. It doesn’t matter what it actually is, only whether they can quickly bring something to mind. Professional skills. UX design is not a back-office field. UX designers talk with clients, customers, stakeholders, developers, and more. To be an effective UX designer a candidate must possess basic professional skills such as dressing appropriately and communicating well. Simple things like sending a “thank you” email are a great indication of good professional skills. (Physically mailed thank you notes get extra bonus points. One-off letterpressed mailed thank you notes get even more!) Collaboration. UX design is a collaborative discipline. If a candidate struggles with collaboration, they’ll struggle in the field. When discussing their work (especially class project work), be sure to ask what role they played on the project and how they interacted with other people. Complaining about others and taking on too much work themselves are some warning signs that could indicate that a candidate has trouble with collaboration. Evaluating for apprenticeship fit Learning pattern. Some people learn best by gradually being exposed to a topic. I call these people toe-dippers, as they prefer to dip their toes into something before diving in. Others prefer to barrel off the dock straight into the deep end and then struggle to the surface. I call these people deep-enders. While apprenticeship can be modified to work better for deep-enders, its gradual exposure can often frustrate them. It is much better suited for toe-dippers. Evaluating for this is tricky, though. Asking people whether they prefer to dive in or learn gradually, they’ll say “dive in” because they think that’s what you want to hear. 
Asking them how they’ve approached learning other skills can give some insight, but this is not 100% reliable. Learning by doing. Apprenticeship helps people acquire skills through experiential learning. If this is not how a person learns, apprenticeship may not be for them. Evaluating for this is very much like evaluating for intrinsic motivation. Has someone gone to the trouble of identifying and solving a design problem themselves? Have they practiced UX methods they have learned about? If so, it’s likely that learning by doing is effective for them. Receptiveness to critique. Apprenticeship is a period of sustained critique. Someone whose response to criticism is defensiveness or despondency will not be successful as an apprentice. This is easy to identify in an interview within the context of discussing the work examples the candidate has brought. My favorite technique for doing this is to find something insignificant to critique and then hammer on it. This is not how I normally critique, of course; it’s a pressure test. If a candidate responds with openness and a desire to learn from this encounter, that’s a very positive sign. If they launch into a monologue defending their decisions, the interview is pretty much over. If you’re fired up about UX apprenticeship (and how could you not be?), start making it happen in your organization! Do the research, find the data, and share your vision with your company’s leadership so they can see it too! When you get the go-ahead, you’ll be all ready to start looking for apprentices. If you follow these guidelines, you’ll get great apprentices who will grow into great designers. Stay tuned for Part 3 of this series where I’ll get detailed about the instructional design of apprenticeship, pedagogy, mentorship, and tracking! Share this: EmailTwitter206RedditLinkedIn229Facebook20Google Posted in Big Ideas, Business Design, Education, Workplace and Career | 11 Comments » 11 Comments Building the Business Case for Taxonomy Taxonomy of Spices and Pantries: Part 1 by Grace G Lau September 1st, 2015 9 Comments XKCD comic strip about not being able to name all seven dwarfs from Snow White. How often have you found yourself on an ill-defined site redesign project? You know, the ones that you end up redesigning and restructuring every few years as you add new content. Or perhaps you spin up a new microsite because the new product/solution doesn’t fit in with the current structure, not because you want to create a new experience around it. Maybe your site has vaguely labelled navigation buckets like “More Magic”—which is essentially your junk drawer, your “everything else.” Your top concerns on such projects are: You can’t find anything. Your users can’t find anything. The navigation isn’t consistent. You have too much content. Your hopeful answer to everything is to rely on an external search engine, not the one that’s on your site. Google will find everything for you. A typical site redesign project might include refreshing the visual design, considering the best interaction practices, and conducting usability testing. But what’s missing? Creating the taxonomy. “Taxonomy is just tagging, right? 
Sharepoint/AEM has it—we’re covered!” In the coming months, I will be exploring the what, why, and how of taxonomy planning, design, and implementation: Building the business case for taxonomy Planning a taxonomy The many uses of taxonomy Card sorting to validate a taxonomy Tree testing a taxonomy Taxonomy governance Best practices of enterprise taxonomies Are you ready? ROI of taxonomy Although the word “taxonomy” is often used interchangeably with tagging, building an enterprise taxonomy means more than tagging content. It’s essentially a knowledge organization system, and its purpose is to enable the user to browse, find, and discover content. Spending the time on building that taxonomy empowers your site to better manage your content at scale, allow for meaningful navigation, expose long-tail content, reuse content assets, bridge across subjects, and provide more efficient product/brand alignment. In addition, a sound taxonomy in the long run will improve your content’s findability, support social sharing, and improve your site’s search engine optimization. (Thanks to Mike Atherton’s “Modeling Structured Content” workshop, presented at IA Summit 2013, for outlining the benefits.) How do you explain taxonomy to get stakeholders on board? No worries, we won’t be going back to high school biology. Explaining taxonomy Imagine a household kitchen. How would you organize the spices? Consider the cooks: In-laws from northern China, mom from Hong Kong, and American-born Grace. I’ve moved four times in the past five years. My husband, son, and I live with my in-laws. I have a mother who still comes over to make her Cantonese herbal soups. We all speak different languages: English, Mandarin Chinese, and Cantonese Chinese. I have the unique need of organizing my kitchen for multiple users. For my in-laws, they need to be able to find their star anise, peppercorn, tree ear mushrooms, and sesame oil. My mom needs a space to store her dried figs, dried shiitake mushrooms, dried goji berries, and snow fungus. I need to find a space for dried thyme and rosemary for the “American” food I try to make. Oh, and we all need a consistent place for salt and sugar. People can organize their kitchen by activity zones: baking, canning, preparing, and cooking. Other ways to organize a kitchen successfully could include: attributes (shelf-life, weight, temperature requirements) usage (frequency, type of use) seasonality (organic, what’s in season, local) occasion (hot pot dinners, BBQ parties) You can also consider organizing by audience such as for the five year old helper. I keep refining how the kitchen is organized each time we move. I have used sticky notes in Chinese and English with my in-laws and my mom as part of a card sorting exercise; I’ve tested the navigation around the kitchen to validate the results. A photo of pantry shelves labeled noodles, rice, garlic, and the like. Early attempts at organizing my pantry. If this is to be a data-driven taxonomy, I could consider attaching RFID tags to each spice container to track frequency and type of usage for a period of time to obtain some kitchen analytics. On the other hand, I could try guesstimating frequency by looking at the amount of grime or dust collected on the container. How often are we using chicken bouillon and to make what dishes? Does it need to be within easy reach of the stovetop or can it be relegated to a pantry closet three feet away? Photo of labeled spice jars in a drawer. From Home Depot. 
Understanding the users and their tasks and needs is a foundation for all things UX. Taxonomy building is no different. How people think about and use their kitchen brings with it a certain closeness that makes taxonomy concepts easier to grasp. Who are the users? What are they trying to do? How do they currently tackle this problem? What works and what doesn't? Watch, observe, and listen to their experience. Helping the business understand the underlying concepts is one of the challenges I've faced in developing a solid taxonomy. We're not just talking about tagging but breaking down the content by its attributes and metadata as well as by its potential usage and relation to other content. The biggest challenge is building the consensus and understanding around that taxonomy—taxonomy governance—and keeping the system you've designed well-seasoned! Now, back to that site redesign project that you were thinking of: How about starting on that taxonomy? My next post will cover taxonomy planning.

How to determine when customer feedback is actionable Merging statistics with product management by Naira Musallam, Nis Frome, Michael Williams, and Tim Lawton October 13th, 2015 One of the riskiest assumptions for any new product or feature is that customers actually want it. Although product leaders can propose numerous 'lean' methodologies to experiment inexpensively with new concepts before fully engineering them, anything short of launching a product or feature and monitoring its performance over time in the market is, by definition, not 100% accurate. That leaves us with a dangerously wide spectrum of user research strategies, and an even wider range of opinions for determining when customer feedback is actionable. To the dismay of product teams desiring to 'move fast and break things,' their counterparts in data science and research advocate a slower, more traditional approach. These proponents of caution often emphasize an evaluation of statistical signals before considering customer insights valid enough to act upon. This dynamic has meaningful ramifications. For those who care about making data-driven business decisions, the challenge that presents itself is: How do we adhere to rigorous scientific standards in a world that demands adaptability and agility to survive? Having frequently witnessed the back-and-forth between product teams and research groups, we can say there is no shortage of misconceptions and miscommunication between the two. Only a thorough analysis of some critical nuances in statistics and product management can help us bridge the gap. Quantify risk tolerance You've probably been on one end of an argument that cited a "statistically significant" finding to support a course of action. The problem is that statistical significance is often equated with having relevant and substantive results, but neither is necessarily the case. Simply put, statistical significance exclusively refers to the level of confidence (measured from 0 to 1, or 0% to 100%) you have that the results you obtained from a given experiment are not due to chance. Statistical significance alone tells you nothing about the appropriateness of the confidence level selected or the importance of the results. To begin, confidence levels should be context-dependent, and determining the appropriate confidence threshold is an oft-overlooked proposition that can have profound consequences. In statistics, confidence levels are closely linked to two concepts: type I and type II errors.
A type I error, or false-positive, refers to believing that a variable has an effect that it actually doesn't. Some industries, like pharmaceuticals and aeronautics, must be exceedingly cautious against false-positives. Medical researchers, for example, cannot afford to mistakenly think a drug has an intended benefit when in reality it does not. Side effects can be lethal, so the FDA's threshold for proof that a drug's health benefits outweigh its known risks is intentionally onerous. A type II error, or false-negative, has to do with the flip side of the coin: concluding that a variable doesn't have an effect when it actually does. Historically, though, significance testing has focused primarily on avoiding false-positives (even if it means missing out on some likely opportunities), with the default confidence level at 95% for any finding to be considered actionable. The reality that this value was arbitrarily determined by scientists speaks more to their comfort level with being wrong than to its appropriateness in any given context. Unfortunately, this particular confidence level is used today by the vast majority of research teams at large organizations and remains generally unchallenged in contexts far different than the ones for which it was formulated. Matrix visualising Type I and Type II errors as described in text. But confidence levels should be representative of the amount of risk that an organization is willing to take to realize a potential opportunity. There are many reasons for product teams in particular to be more concerned with avoiding false-negatives than false-positives. Mistakenly missing an opportunity due to caution can have a more negative impact than building something no one really wants. Digital product teams don't share many of the concerns of an aerospace engineering team and therefore need to calculate and quantify their own tolerance for risk. To illustrate the ramifications that confidence levels can have on business decisions, consider this thought exercise. Imagine two companies, one with outrageously profitable 90% margins, and one with painfully narrow 5% margins. Suppose each of these businesses is considering a new line of business. In the case of the high margin business, the amount of capital they have to risk to pursue the opportunity is dwarfed by the potential reward. If executives get even the weakest indication that the business might work, they should pursue the new business line aggressively. In fact, waiting for perfect information before acting might be the difference between capturing a market and allowing a competitor to get there first. In the case of the narrow margin business, however, the buffer before going into the red is so small that going after the new business line wouldn't make sense with anything except the most definitive signal. Although these two examples are obviously allegorical, they demonstrate the principle at hand. To work together effectively, research analysts and their commercially-driven counterparts should have a conversation around their organization's particular level of comfort and make statistical decisions accordingly. Focus on impact Confidence levels only tell half the story. They don't address the magnitude to which the results of an experiment are meaningful to your business.
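To make that gap concrete, here is a minimal sketch (Python standard library only; the conversion counts are hypothetical, not from the article) of a two-proportion z-test in which a result clears the 95% confidence bar even though the underlying lift is only 0.2 percentage points:

from statistics import NormalDist

def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for a classic pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Huge (hypothetical) sample, tiny lift: 10.0% vs 10.2% conversion.
p = two_proportion_p_value(conv_a=50_000, n_a=500_000, conv_b=51_000, n_b=500_000)
print(f"p-value: {p:.4f}")  # well under 0.05, so "statistically significant"
print("absolute lift: 0.2 percentage points")  # ...but is it worth building for?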
Product teams need to combine the detection of an effect (i.e., the likelihood that there is an effect) with the size of that effect (i.e., the potential impact to the business), but this is often forgotten in the quest for the proverbial holy grail of statistical significance. Many teams mistakenly spend energy and resources acting on statistically significant but inconsequential findings. A meta-analysis of hundreds of consumer behavior experiments sought to qualify how seriously effect sizes are considered when evaluating research results. They found that an astonishing three-quarters of the findings didn't even bother reporting effect sizes "because of their small values" or because of "a general lack of interest in discovering the extent to which an effect is significant…" This is troubling, because without considering effect size, there's virtually no way to determine what opportunities are worth pursuing and in what order. Limited development resources prevent product teams from realistically tackling every single opportunity. Consider, for example, how the answer to this question, posed by a MECLABS data scientist, changes based on your perspective: In terms of size, what does a 0.2% difference mean? For Amazon.com, that lift might mean an extra 2,000 sales and be worth a $100,000 investment…For a mom-and-pop Yahoo! store, that increase might just equate to an extra two sales and not be worth a $100 investment. Unless you're operating at a Google-esque scale, for which an incremental lift in a conversion rate could result in literally millions of dollars in additional revenue, product teams should rely on statistics and research teams to help them prioritize the largest opportunities in front of them. Sample size constraints One of the most critical constraints on product teams that want to generate user insights is the ability to source users for experiments. With enough traffic, it's certainly possible to generate a sample size large enough to pass traditional statistical requirements for a production split test. But it can be difficult to drive enough traffic to new product concepts, and it can also put a brand unnecessarily at risk, especially in heavily regulated industries. For product teams that can't easily access or run tests in production environments, simulated environments offer a compelling alternative. But simulated environments require standing user panels that can get expensive quickly, especially if research teams seek sample sizes in the hundreds or thousands. That leaves product teams stuck between a rock and a hard place. Unfortunately, strategies like these again overlook important nuances in statistics and place undue hardship on the user insight generation process. A larger sample does not necessarily mean a better or more insightful sample. The objective of any sample is for it to be representative of the population of interest, so that conclusions about the sample can be extrapolated to the population. It's often assumed that the larger the sample, the more likely it is to be representative of the population. But that's not inherently true, especially if the sampling methodology is biased. Years ago, a client fired an entire research team in the human resources department for making this assumption. The client sought to gather feedback about employee engagement and tasked this research team with distributing a survey to the entire company of more than 20,000 global employees.
From a statistical significance standpoint, only 1,000 employees needed to take the survey for the research team to derive defensible insights. Within hours after sending out the survey on a Tuesday morning, they had collected enough data and closed the survey. The problem was that only employees within a few time zones had completed the questionnaire, with a solid third of the company asleep, and therefore ignored, during collection. Clearly, a large sample isn't inherently representative of the population. To obtain a representative sample, product teams first need to clearly identify a target persona. This may seem obvious, but it's often not explicitly done, creating quite a bit of miscommunication between researchers and other stakeholders. What one person may mean by a 'frequent customer' could mean something entirely different to another person. After a persona is clearly identified, there are a few sampling approaches one can follow, including probability sampling and nonprobability sampling techniques. A carefully-selected sample size of 100 may be considerably more representative of a target population than a thrown-together sample of 2,000. Research teams may counter with the need to meet statistical assumptions that are necessary for conducting popular tests such as a t-test or Analysis of Variance (ANOVA). These types of tests assume a normal distribution, which generally occurs as a sample size increases. But statistics has a solution for when this assumption is violated and provides other options, such as non-parametric testing, which work well for small sample sizes. In fact, the strongest argument left in favor of large sample sizes has already been discounted. Statisticians know that the larger the sample size, the easier it is to detect small effect sizes at a statistically significant level (digital product managers and marketers have become soberly aware that even a test comparing two identical versions can find a statistically significant difference between the two). But a focused product development process should be immune to this distraction because small effect sizes are of little concern. Not only that, but large effect sizes are almost as easily discovered in small samples as in large samples. For example, suppose you want to test ideas to improve a form on your website that currently gets filled out by 10% of visitors. For simplicity's sake, let's use a confidence level of 95% to accept any changes. To identify just a 1% absolute increase to 11%, you'd need more than 12,000 users, according to Optimizely's stats engine formula! If you were looking for a 5% absolute increase, you'd only need 223 users. But depending on what you're looking for, even that many users may not be needed, especially if you're conducting qualitative research. When identifying usability problems across your site, leading UX researchers have concluded that "elaborate usability tests are a waste of resources" because the overwhelming majority of usability issues are discovered with just five testers. An emphasis on large sample sizes can be a red herring for product stakeholders. Organizations should not be misled away from the real objective of any sample, which is an accurate representation of the identified target population. Research teams can help product teams identify necessary sample sizes and appropriate statistical tests to ensure that findings are indeed meaningful and cost-effectively attained.
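To put ballpark numbers on the effect-size point above, here is a sketch of the standard fixed-horizon sample-size formula for comparing two proportions (Python standard library; the 80% power level and two-sided 95% confidence are my assumptions, and the results will not match Optimizely's sequential stats engine exactly, only its order of magnitude):

from math import ceil, sqrt
from statistics import NormalDist

def n_per_variation(p_baseline: float, p_variant: float,
                    alpha: float = 0.05, power: float = 0.80) -> int:
    """Rough users needed per variation to detect p_baseline -> p_variant."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided test
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p_baseline + p_variant) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * sqrt(p_baseline * (1 - p_baseline)
                                  + p_variant * (1 - p_variant))) ** 2
    return ceil(numerator / (p_variant - p_baseline) ** 2)

print(n_per_variation(0.10, 0.11))  # about 14,750 per variation for a 1-point lift
print(n_per_variation(0.10, 0.15))  # about 690 per variation for a 5-point lift

The exact figures move with the power level and the test you choose, but the shape of the curve is the point: small lifts are expensive to detect, while large effects surface with only a few hundred users.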
Expand capacity for learning It might sound like semantics, but data should not drive decision-making. Insights should. And there can be quite a gap between the two, especially when it comes to user insights. In a recent talk on the topic of big data, Malcolm Gladwell argued that "data can tell us about the immediate environment of consumer attitudes, but it can't tell us much about the context in which those attitudes were formed." Essentially, statistics can be a powerful tool for obtaining and processing data, but it doesn't have a monopoly on research. Product teams can become obsessed with their Omniture and Optimizely dashboards, but there's a lot of rich information that can't be captured with these tools alone. There is simply no replacement for sitting down and talking with a user or customer. Open-ended feedback in particular can lead to insights that simply cannot be discovered by other means. The focus shouldn't be on interviewing every single user, though, but rather on finding a pattern or theme in the interviews you do conduct. One of the core principles of the scientific method is the concept of replicability—that the results of any single experiment can be reproduced by another experiment. In product management, the importance of this principle cannot be overstated. You'll presumably need any data from your research to hold true once you engineer the product or feature and release it to a user base, so reproducibility is an inherent requirement when it comes to collecting and acting on user insights. We've far too often seen a product team wielding a single data point to defend a dubious intuition or pet project. But there are a number of factors that could and almost always do bias the results of a test without any intentional wrongdoing. Mistakenly asking a leading question or sourcing a user panel that doesn't exactly represent your target customer can skew individual test results. Similarly, and in digital product management especially, customer perceptions and trends evolve rapidly, further complicating data. Look no further than the handful of mobile operating systems, which undergo yearly redesigns and updates, leading to constantly elevated user expectations. It's perilously easy to imitate Homer Simpson's lapse in thinking, "This year, I invested in pumpkins. They've been going up the whole month of October and I got a feeling they're going to peak right around January. Then, bang! That's when I'll cash in." So how can product and research teams safely transition from data to insights? Fortunately, we believe statistics offers insight into the answer. The central limit theorem is one of the foundational concepts taught in every introductory statistics class. It states that the distribution of averages tends to be Normal even when the distribution of the population from which the samples were taken is decidedly not Normal. Put as simply as possible, the theorem acknowledges that individual samples will almost invariably be skewed, but offers statisticians a way to combine them to collectively generate valid data. No matter how confusing or complex the underlying data may be, relatively simple individual experiments can combine into a culminating result that cuts through the noise. This theorem provides a useful analogy for product management. To derive value from individual experiments and customer data points, product teams need to practice substantiation through iteration.
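The statistical side of that analogy is easy to see in a small simulation; here is a sketch (Python standard library, illustrative numbers only) in which single observations from a heavily skewed distribution swing wildly while the averages of many small "experiments" cluster tightly around the true value:

import random
from statistics import mean, stdev

random.seed(42)
true_mean = 10.0  # mean of a skewed (exponential) distribution

# 1,000 individual observations versus 1,000 averages of n=30 observations each.
single_draws = [random.expovariate(1 / true_mean) for _ in range(1000)]
sample_means = [mean(random.expovariate(1 / true_mean) for _ in range(30))
                for _ in range(1000)]

print(f"spread of single observations: {stdev(single_draws):.1f}")      # roughly 10
print(f"spread of per-experiment averages: {stdev(sample_means):.1f}")  # roughly 1.8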
Even if the results of any given experiment are skewed or outdated, they can be offset by a robust user research process that incorporates both quantitative and qualitative techniques across a variety of environments. The safeguard against pursuing insignificant findings, if you will, is to be mindful not to consider data to be an insight until a pattern has been rigorously established. Divide no more The moral of the story is that the nuances in statistics actually do matter. Dogmatically adopting textbook statistics can stifle an organization's ability to innovate and operate competitively, but ignoring the value and perspective provided by statistics altogether can be similarly catastrophic. By understanding and appropriately applying the core tenets of statistics, product and research teams can begin with a framework for productive dialog about the risks they're willing to take, the research methodologies they can efficiently but rigorously conduct, and the customer insights they'll act upon.

Planning a Taxonomy Project Taxonomy of Spices and Pantries: Part 2 by Grace G Lau October 20th, 2015 This is part 2 of "Taxonomy of Spices and Pantries," in which I will be exploring the what, why, and how of taxonomy planning, design, and implementation: Building the business case for taxonomy Planning a taxonomy The many uses of taxonomy Card sorting to validate a taxonomy Tree testing a taxonomy Taxonomy governance Best practices of enterprise taxonomies In part 1, I enumerated the business reasons for a taxonomy focus in a site redesign and gave a fun way to explain taxonomy. The kitchen isn't going to organize itself, so the analogy continues. I've moved every couple of years and it shows in the kitchen. Half-used containers of ground pepper. Scattered bags of star anise. Multiple bags of ground and whole cumin. After a while, people are quick to stuff things into the nearest crammable crevice (until we move again and the IA is called upon to organize the kitchen). Planning a taxonomy covers the same questions as planning any UX project. Understanding the users and their tasks and needs is a foundation for all things UX. This article will go through the questions you should consider when planning a kitchen, er, um…, a taxonomy project. Rumination of stuff in my kitchen and the kinds of users and stakeholders the taxonomy needs to be mindful of. Source: Grace Lau. Same as designing any software, application, or website, you'll need to meet with the stakeholders and ask questions: Purpose: Why? What will the taxonomy be used for? Users: Who's using this taxonomy? Who will it affect? Content: What will be covered by this taxonomy? Scope: What's the topic area and limits? Resources: What are the project resources and constraints? (Thanks to Heather Hedden, "The Accidental Taxonomist," p.292) What's your primary purpose? Why are you doing this? Are you moving, or planning to move? Is your kitchen so disorganized that you can't find the sugar you needed for soy braised chicken? Is your content misplaced and hard to search? How often have you found just plain old salt in a different spot? How many kinds of salt do you have anyway–Kosher salt, sea salt, iodized salt, Hawaiian pink salt? (Why do you have so many different kinds anyway? One of my favorite recipe books recommended using red Hawaiian sea salt for kalua pig. Of course, I got it.)
You might be using the taxonomy for tagging or, in librarian terms, indexing or cataloging. Maybe it's for information search and retrieval. Are you building a faceted search results page? Perhaps this taxonomy is being used for organizing the site content and guiding the end users through the site navigation. Establishing a taxonomy as a common language also helps build consensus and creates smarter conversations. On making baozi (steamed buns), I overheard a conversation between fathers: Father-in-law: We need 酵母 [Jiàomǔ] {noun}. Dad: Yi-see? (Cantonese transliteration of yeast) Father-in-law: (confused look) Dad: Baking pow-daa? (Cantonese transliteration of baking powder) Meanwhile, I look up the Chinese translation of "yeast" in Google Translate while mother-in-law opens her go-to Chinese dictionary tool. I discover that the dictionary word for "yeast" is 发酵粉 [fājiàofěn] {noun}. Father-in-law: Ah, so it rises flour: 发面的 [fāmiànde] {verb} This discovery prompts more discussion about what it does and how it is used. There was at least 15 more minutes of discussing yeast in five different ways before the fathers agreed that they were talking about the same ingredient and its purpose. Eventually, we have this result in our bellies. Homemade steamed baozi. Apparently, they're still investigating how much yeast is required for the amount of flour they used. Source: Grace Lau. Who are the users? Are they internal? Content creators or editors, working in the CMS? Are they external users? What's their range of experience in the domain? Are we speaking with homemakers and amateur cooks or seasoned cooks with many years at various Chinese restaurants? Looking at the users of my kitchen, I identified the following stakeholders: Content creators: the people who do the shopping and have to put away the stuff People who are always in the kitchen: my in-laws People who are sometimes in the kitchen: me Visiting users: my parents and friends who often come over for a BBQ/grill party The cleanup crew: my husband, who can't stand the mess we all make How do I create a taxonomy for them? First, I attempt to understand their mental models by watching them work in their natural environment and observing their everyday hacks as they complete their tasks. Having empathy for users' end game—making food for the people they care for—makes a difference in developing the style, consistency, and breadth and depth of the taxonomy. What content will be covered by the taxonomy? In my kitchen, we'll be covering sugars, salts, spices, and staples used for cooking, baking, braising, grilling, smoking, steaming, simmering, and frying. How did I determine that? Terminology from existing content. I opened up every cabinet and door in my kitchen and made an inventory. Search logs. How were users accessing my kitchen? Why? How were users referring to things? What were they looking for? Storytelling with users. How did you make this? People like to share recipes and I like to watch friends cook. Doing user interviews has never been more fun! What's the scope? Scope can easily get out of hand. Notice that I have not included in my discussion any cookbooks, kitchen hardware and appliances, pots and pans, or anything that's in the refrigerator or freezer. You may need a scope document early on to plan releases (if you need them).
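Before slicing that scope into releases, it can help to pin down what a single entry in the draft taxonomy will record. Here is a minimal sketch (the field names are hypothetical, not from the article) of one term with its preferred label, the multilingual synonyms that the yeast conversation above shows a need for, and a couple of facets:

from dataclasses import dataclass, field

@dataclass
class TaxonomyTerm:
    preferred_label: str                                    # the agreed "common language" name
    alt_labels: list[str] = field(default_factory=list)     # synonyms and translations
    category: str = ""                                      # e.g. salts, sugars, spices, staples
    usage: list[str] = field(default_factory=list)          # cooking methods it supports
    occasion: list[str] = field(default_factory=list)       # hot pot dinners, BBQ parties

star_anise = TaxonomyTerm(
    preferred_label="star anise",
    alt_labels=["八角", "bājiǎo"],
    category="spices",
    usage=["braising", "simmering"],
    occasion=["hot pot dinners"],
)

Even a handful of records like this makes the scope and labelling discussions concrete, because everyone is reacting to the same fields.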
Perhaps for the first release, I'll just deal with the frequent-use items. Then I'll move on to occasional-use items (soups and desserts). If the taxonomy you're developing is faceted—for example, allowing your users to browse your cupboards by particular attributes such as taste, canned vs dried, or weight—your scope should include only those attributes relevant to the search process. For instance, no one really searches for canned goods in my kitchen, so that's out of scope. What resources do you have available? My kitchen taxonomy will be limited. Stakeholders are multilingual, so items will need labelling in English, Simplified Chinese, and pinyin romanization. I had considered building a Drupal site to manage an inventory, but I have neither the funding nor the time to implement such a complex site. At the same time, what are users' expectations for the taxonomy? Considering the context of the taxonomy's usage is important. How will (or should) a taxonomy empower its users? It needs to be invisible; a good taxonomy shouldn't disrupt users' current workflow but should make it more efficient. Both fathers and my mom are unlikely to stop and use any digital technology to find and look things up. Most importantly, the completed taxonomy and actual content migration should not conflict with the preparation of the next meal. My baby needs a packed lunch for school, and it's 6 a.m. when I'm preparing it. There's no time to rush around looking for things. Time is limited, and a complete displacement of spices and condiments would disrupt the high-traffic flow in any household. Meanwhile, we're out of soy sauce again and I'd rather it not be stashed in yet another new home and forgotten. That's why we ended up with three open bottles of soy sauce from different brands. What else should you consider for the taxonomy? Understanding the scope of the taxonomy you're building can help prevent scope creep in a taxonomy project. In time, you'll realize that 80% of your time and effort is devoted to research, while 20% is spent actually developing the taxonomy. So, making time for iterations and validation through card sorting and other testing is important in your planning. In my next article, I will explore the many uses of taxonomy outside of tagging.
Apprenticeship allows you to make the designers you’re having trouble finding. This is going to be a temporal argument, so you need to come armed with measurements to make it. And you’ll need help from various leaders in your organization to get them. UX team growth targets for the past 2-3 years (UX leadership) Actual UX team growth for the past 2-3 years (UX leadership) Average time required to identify and hire a UX designer (HR leadership) Then you need to estimate how apprenticeship will improve these measurements. (Part 3 of this series, which will deal with the instructional design of apprenticeship, will offer details on how to make these estimates.) How many designers per year can apprenticeship contribute? How much time will be required from the design team to mentor apprentices? Growth targets typically do not exist in a vacuum. You’ll likely need to combine this argument with one of the others. Take advantage of more revenue opportunities One of the financial implications of missing growth targets is not having enough staff to capitalize on all the revenue opportunities you have. For agencies, you might have to pass up good projects because your design team has a six-week lead time. For product companies, your release schedule might fall behind due to a UX bottleneck and push you behind your competition. The data you need to make this argument differ depending on whether your company sells time (agency) or stuff (product company). When doing the math about an apprenticeship program, agencies should consider: What number of projects have been lost in the past year due to UX lead time? (Sales leadership should have this information.) What is the estimated value of UX work on lost projects? (Sales leadership) What is the estimated value of other (development, strategy, management, etc.) work on lost projects? (Sales leadership) Then, contrast these numbers with some of the benefits of apprenticeship: What is the estimated number of designers per year apprenticeship could contribute? What is the estimated amount of work these “extra” designers would be able to contribute in both hours and cash? What is the estimated profitability of junior designers (more) versus senior designers (less), assuming the same hourly rate? Product companies should consider: The ratio of innovative features versus “catch-up” features your competitors released last year. (Sales or marketing leadership should have this information.) The ratio of innovative features versus “catch-up” features you released in the past year. (Sales or marketing leadership) Any customer service and/or satisfaction metrics. (Customer service leadership) Contrast this data with… The estimated number of designers per year you could add through apprenticeship. The estimated number of features they could’ve completed for release. The estimated impact this would have on customer satisfaction. Avoid high recruiting costs Recruiting a mid- to senior-level UX designer typically means finding them and poaching them from somewhere else. This requires paying significant headhunting fees on top of the person-hours involved in reviewing resumes and portfolios and interviewing candidates. All the data you need to make this argument can come from UX leadership and HR. 
Average cost per UX designer recruit Average number of hours spent recruiting a UX designer Contrast this data with: Estimated cost per apprentice To estimate this, factor in: Overhead per employee Salary (and benefits if the apprenticeship is long enough to qualify while still an apprentice) Software and service licenses Mentorship time from the current design team Mentorship/management time from the designer leading the program Increase designer engagement This one is tricky because most places don’t measure engagement directly. Measuring engagement accurately requires professional quantitative research. However, there are some signs that can point to low engagement. High turnover is the number one sign of low engagement. What kind of people are leaving—junior designers, seniors, or both? If possible, try to get exit interview data (as raw as possible) to develop hypotheses about how apprenticeship could help. Maybe junior designers don’t feel like their growth is supported… allowing them to leverage elements of an apprenticeship program for further professional development could fix that. Maybe senior designers are feeling burnt out. Consistent mentorship, like that required by apprenticeship, can be reinvigorating. Other signs of low engagement include frequently missing deadlines, using more sick time, missing or being late to meetings, and more. Investigate any signs you see, validate any assumptions you might take on, and hypothesize about how apprenticeship can help address these issues. Help others If your organization is motivated by altruism, that is wonderful! At least one organization with an apprenticeship program actually tries very hard not to hire their apprentices. Boston’s Fresh Tilled Soil places their graduated apprentices with their clients, which creates a very strong relationship with those clients. Additionally, this helps them raise the caliber and capacity of the Boston metro area when it comes to UX design. Hiring great UX apprentices Hiring apprentices requires a different approach to evaluating candidates than hiring established UX designers. Most candidates will have little to no actual UX design skills, so you have to evaluate them for their potential to acquire and hone those skills. Additionally, not everyone learns effectively through apprenticeship. Identifying the traits of a good apprentice in candidates will help your program run smoothly. Evaluating for skill potential Portfolio. Even though you’re evaluating someone who may never have designed a user experience before, you still need them to bring some examples of something they’ve made. Without this, it’s impossible to get a sense of what kind of process they go through to make things. For example, one apprentice candidate brought in a print brochure she designed. Her description of how she designed it included identifying business goals, balancing competing stakeholder needs, working within constraints, and getting feedback along the way, all of which are relevant to the process of UX design. Mindset. The number one thing you must identify in a candidate is whether they already possess the UX mindset, the point of view that things are designed better when they’re designed with people in mind. This is usually the light bulb that goes off in people’s heads when they discover UX design. If that light hasn’t gone off, UX might not be the right path for that person. Apprenticeship is too much of an investment to risk that. Evaluating for this is fairly simple. 
It usually comes out in the course of a conversation. If not, asking outright “What does user experience design mean to you” can be helpful. Pay careful attention to how people talk about how they’ve approached their work. Is it consistent with their stated philosophy? If not, that could be a red flag. Intrinsic motivation. When people talk about having a “passion” for something, what that means is that they are intrinsically motivated to do that thing. This is pretty easy to evaluate for. What have they done to learn UX? Have they taken a class? That’s a positive sign. Have they identified and worked through a UX problem on their own? Even better! If a candidate hasn’t put in the effort to explore UX on their own, they are likely not motivated enough to do well in the field. Self-education. While self-education is a sign of intrinsic motivation, it’s also important in its own right. Apprenticeship relies heavily on mentorship, but the responsibility for the direction and nature of that mentorship lies with the apprentice themselves. If someone is a self-educator, that’s a good predictor that they’ll be able to get the most out of mentorship. This is another fairly easy one to evaluate. Ask them to tell you about the most recent UX-related blog post or article they read. It doesn’t matter what it actually is, only whether they can quickly bring something to mind. Professional skills. UX design is not a back-office field. UX designers talk with clients, customers, stakeholders, developers, and more. To be an effective UX designer a candidate must possess basic professional skills such as dressing appropriately and communicating well. Simple things like sending a “thank you” email are a great indication of good professional skills. (Physically mailed thank you notes get extra bonus points. One-off letterpressed mailed thank you notes get even more!) Collaboration. UX design is a collaborative discipline. If a candidate struggles with collaboration, they’ll struggle in the field. When discussing their work (especially class project work), be sure to ask what role they played on the project and how they interacted with other people. Complaining about others and taking on too much work themselves are some warning signs that could indicate that a candidate has trouble with collaboration. Evaluating for apprenticeship fit Learning pattern. Some people learn best by gradually being exposed to a topic. I call these people toe-dippers, as they prefer to dip their toes into something before diving in. Others prefer to barrel off the dock straight into the deep end and then struggle to the surface. I call these people deep-enders. While apprenticeship can be modified to work better for deep-enders, its gradual exposure can often frustrate them. It is much better suited for toe-dippers. Evaluating for this is tricky, though. Asking people whether they prefer to dive in or learn gradually, they’ll say “dive in” because they think that’s what you want to hear. Asking them how they’ve approached learning other skills can give some insight, but this is not 100% reliable. Learning by doing. Apprenticeship helps people acquire skills through experiential learning. If this is not how a person learns, apprenticeship may not be for them. Evaluating for this is very much like evaluating for intrinsic motivation. Has someone gone to the trouble of identifying and solving a design problem themselves? Have they practiced UX methods they have learned about? 
If so, it’s likely that learning by doing is effective for them. Receptiveness to critique. Apprenticeship is a period of sustained critique. Someone whose response to criticism is defensiveness or despondency will not be successful as an apprentice. This is easy to identify in an interview within the context of discussing the work examples the candidate has brought. My favorite technique for doing this is to find something insignificant to critique and then hammer on it. This is not how I normally critique, of course; it’s a pressure test. If a candidate responds with openness and a desire to learn from this encounter, that’s a very positive sign. If they launch into a monologue defending their decisions, the interview is pretty much over. If you’re fired up about UX apprenticeship (and how could you not be?), start making it happen in your organization! Do the research, find the data, and share your vision with your company’s leadership so they can see it too! When you get the go-ahead, you’ll be all ready to start looking for apprentices. If you follow these guidelines, you’ll get great apprentices who will grow into great designers. Stay tuned for Part 3 of this series where I’ll get detailed about the instructional design of apprenticeship, pedagogy, mentorship, and tracking! Share this: EmailTwitter206RedditLinkedIn229Facebook20Google Posted in Big Ideas, Business Design, Education, Workplace and Career | 11 Comments » 11 Comments Building the Business Case for Taxonomy Taxonomy of Spices and Pantries: Part 1 by Grace G Lau September 1st, 2015 9 Comments XKCD comic strip about not being able to name all seven dwarfs from Snow White. How often have you found yourself on an ill-defined site redesign project? You know, the ones that you end up redesigning and restructuring every few years as you add new content. Or perhaps you spin up a new microsite because the new product/solution doesn’t fit in with the current structure, not because you want to create a new experience around it. Maybe your site has vaguely labelled navigation buckets like “More Magic”—which is essentially your junk drawer, your “everything else.” Your top concerns on such projects are: You can’t find anything. Your users can’t find anything. The navigation isn’t consistent. You have too much content. Your hopeful answer to everything is to rely on an external search engine, not the one that’s on your site. Google will find everything for you. A typical site redesign project might include refreshing the visual design, considering the best interaction practices, and conducting usability testing. But what’s missing? Creating the taxonomy. “Taxonomy is just tagging, right? Sharepoint/AEM has it—we’re covered!” In the coming months, I will be exploring the what, why, and how of taxonomy planning, design, and implementation: Building the business case for taxonomy Planning a taxonomy The many uses of taxonomy Card sorting to validate a taxonomy Tree testing a taxonomy Taxonomy governance Best practices of enterprise taxonomies Are you ready? ROI of taxonomy Although the word “taxonomy” is often used interchangeably with tagging, building an enterprise taxonomy means more than tagging content. It’s essentially a knowledge organization system, and its purpose is to enable the user to browse, find, and discover content. 
Spending the time on building that taxonomy empowers your site to better manage your content at scale, allow for meaningful navigation, expose long-tail content, reuse content assets, bridge across subjects, and provide more efficient product/brand alignment. In addition, a sound taxonomy in the long run will improve your content’s findability, support social sharing, and improve your site’s search engine optimization. (Thanks to Mike Atherton’s “Modeling Structured Content” workshop, presented at IA Summit 2013, for outlining the benefits.) How do you explain taxonomy to get stakeholders on board? No worries, we won’t be going back to high school biology. Explaining taxonomy Imagine a household kitchen. How would you organize the spices? Consider the cooks: In-laws from northern China, mom from Hong Kong, and American-born Grace. I’ve moved four times in the past five years. My husband, son, and I live with my in-laws. I have a mother who still comes over to make her Cantonese herbal soups. We all speak different languages: English, Mandarin Chinese, and Cantonese Chinese. I have the unique need of organizing my kitchen for multiple users. For my in-laws, they need to be able to find their star anise, peppercorn, tree ear mushrooms, and sesame oil. My mom needs a space to store her dried figs, dried shiitake mushrooms, dried goji berries, and snow fungus. I need to find a space for dried thyme and rosemary for the “American” food I try to make. Oh, and we all need a consistent place for salt and sugar. People can organize their kitchen by activity zones: baking, canning, preparing, and cooking. Other ways to organize a kitchen successfully could include: attributes (shelf-life, weight, temperature requirements) usage (frequency, type of use) seasonality (organic, what’s in season, local) occasion (hot pot dinners, BBQ parties) You can also consider organizing by audience such as for the five year old helper. I keep refining how the kitchen is organized each time we move. I have used sticky notes in Chinese and English with my in-laws and my mom as part of a card sorting exercise; I’ve tested the navigation around the kitchen to validate the results. A photo of pantry shelves labeled noodles, rice, garlic, and the like. Early attempts at organizing my pantry. If this is to be a data-driven taxonomy, I could consider attaching RFID tags to each spice container to track frequency and type of usage for a period of time to obtain some kitchen analytics. On the other hand, I could try guesstimating frequency by looking at the amount of grime or dust collected on the container. How often are we using chicken bouillon and to make what dishes? Does it need to be within easy reach of the stovetop or can it be relegated to a pantry closet three feet away? Photo of labeled spice jars in a drawer. From Home Depot. Understanding the users and their tasks and needs is a foundation for all things UX. Taxonomy building is not any different. How people think about and use their kitchen brings with it a certain closeness that makes taxonomy concepts easier to grasp. Who are the users? What are they trying to do? How do they currently tackle this problem? What works and what doesn’t? Watch, observe, and listen to their experience. Helping the business understand the underlying concepts is one of the challenges I’ve faced with developing a solid taxonomy. 
We’re not just talking about tagging but breaking down the content by its attributes and metadata as well as by its potential usage and relation to other content. The biggest challenge is building the consensus and understanding around that taxonomy—taxonomy governance—and keeping the system you’ve designed well-seasoned! Now, back to that site redesign project that you were thinking of: How about starting on that taxonomy? My next post will cover taxonomy planning. How to determine when customer feedback is actionable Merging statistics with product management by Naira Musallam, Nis Frome, Michael Williams, and Tim Lawton October 13th, 2015 1 Comments One of the riskiest assumptions for any new product or feature is that customers actually want it. Although product leaders can propose numerous ‘lean’ methodologies to experiment inexpensively with new concepts before fully engineering them, anything short of launching a product or feature and monitoring its performance over time in the market is, by definition, not 100% accurate. That leaves us with a dangerously wide spectrum of user research strategies, and an even wider range of opinions for determining when customer feedback is actionable. To the dismay of product teams desiring to ‘move fast and break things,’ their counterparts in data science and research advocate a slower, more traditional approach. These proponents of caution often emphasize an evaluation of statistical signals before considering customer insights valid enough to act upon. This dynamic has meaningful ramifications. For those who care about making data-driven business decisions, the challenge that presents itself is: How do we adhere to rigorous scientific standards in a world that demands adaptability and agility to survive? Having frequently witnessed the back-and-forth between product teams and research groups, it is clear that there is no shortage of misconceptions and miscommunication between the two. Only a thorough analysis of some critical nuances in statistics and product management can help us bridge the gap. Quantify risk tolerance You’ve probably been on one end of an argument that cited a “statistically significant” finding to support a course of action. The problem is that statistical significance is often equated to having relevant and substantive results, but neither is necessarily the case. Simply put, statistical significance exclusively refers to the level of confidence (measured from 0 to 1, or 0% to 100%) you have that the results you obtained from a given experiment are not due to chance. Statistical significance alone tells you nothing about the appropriateness of the confidence level selected nor the importance of the results. To begin, confidence levels should be context-dependent, and determining the appropriate confidence threshold is an oft-overlooked proposition that can have profound consequences. In statistics, confidence levels are closely linked to two concepts: type I and type II errors. A type I error, or false-positive, refers to believing that a variable has an effect that it actually doesn’t. Some industries, like pharmaceuticals and aeronautics, must be exceedingly cautious against false-positives. Medical researchers for example cannot afford to mistakenly think a drug has an intended benefit when in reality it does not. Side effects can be lethal so the FDA’s threshold for proof that a drug’s health benefits outweigh their known risks is intentionally onerous. 
A type II error, or false-negative, has to do with the flip side of the coin: concluding that a variable doesn’t have an effect when it actually does. Historically though, statistical significance has been primarily focused on avoiding false-positives (even if it means missing out on some likely opportunities) with the default confidence level at 95% for any finding to be considered actionable. The reality that this value was arbitrarily determined by scientists speaks more to their comfort level of being wrong than it does to its appropriateness in any given context. Unfortunately, this particular confidence level is used today by the vast majority of research teams at large organizations and remains generally unchallenged in contexts far different than the ones for which it was formulated. Matrix visualising Type I and Type II errors as described in text. But confidence levels should be representative of the amount of risk that an organization is willing to take to realize a potential opportunity. There are many reasons for product teams in particular to be more concerned with avoiding false-negatives than false-positives. Mistakenly missing an opportunity due to caution can have a more negative impact than building something no one really wants. Digital product teams don’t share many of the concerns of an aerospace engineering team and therefore need to calculate and quantify their own tolerance for risk. To illustrate the ramifications that confidence levels can have on business decisions, consider this thought exercise. Imagine two companies, one with outrageously profitable 90% margins, and one with painfully narrow 5% margins. Suppose each of these businesses are considering a new line of business. In the case of the high margin business, the amount of capital they have to risk to pursue the opportunity is dwarfed by the potential reward. If executives get even the weakest indication that the business might work they should pursue the new business line aggressively. In fact, waiting for perfect information before acting might be the difference between capturing a market and allowing a competitor to get there first. In the case of the narrow margin business, however, the buffer before going into the red is so small that going after the new business line wouldn’t make sense with anything except the most definitive signal. Although these two examples are obviously allegorical, they demonstrate the principle at hand. To work together effectively, research analysts and their commercially-driven counterparts should have a conversation around their organization’s particular level of comfort and to make statistical decisions accordingly. Focus on impact Confidence levels only tell half the story. They don’t address the magnitude to which the results of an experiment are meaningful to your business. Product teams need to combine the detection of an effect (i.e., the likelihood that there is an effect) with the size of that effect (i.e., the potential impact to the business), but this is often forgotten on the quest for the proverbial holy grail of statistical significance. Many teams mistakenly focus energy and resources acting on statistically significant but inconsequential findings. A meta-analysis of hundreds of consumer behavior experiments sought to qualify how seriously effect sizes are considered when evaluating research results. 
They found that an astonishing three-quarters of the findings didn’t even bother reporting effect sizes “because of their small values” or because of “a general lack of interest in discovering the extent to which an effect is significant…” This is troubling, because without considering effect size, there’s virtually no way to determine what opportunities are worth pursuing and in what order. Limited development resources prevent product teams from realistically tackling every single opportunity. Consider for example how the answer to this question, posed by a MECLABS data scientist, changes based on your perspective: In terms of size, what does a 0.2% difference mean? For Amazon.com, that lift might mean an extra 2,000 sales and be worth a $100,000 investment…For a mom-and-pop Yahoo! store, that increase might just equate to an extra two sales and not be worth a $100 investment. Unless you’re operating at a Google-esque scale for which an incremental lift in a conversion rate could result in literally millions of dollars in additional revenue, product teams should rely on statistics and research teams to help them prioritize the largest opportunities in front of them. Sample size constraints One of the most critical constraints on product teams that want to generate user insights is the ability to source users for experiments. With enough traffic, it’s certainly possible to generate a sample size large enough to pass traditional statistical requirements for a production split test. But it can be difficult to drive enough traffic to new product concepts, and it can also put a brand unnecessarily at risk, especially in heavily regulated industries. For product teams that can’t easily access or run tests in production environments, simulated environments offer a compelling alternative. That leaves product teams stuck between a rock and a hard place. Simulated environments require standing user panels that can get expensive quickly, especially if research teams seek sample sizes in the hundreds or thousands. Unfortunately, strategies like these again overlook important nuances in statistics and place undue hardship on the user insight generation process. A larger sample does not necessarily mean a better or more insightful sample. The objective of any sample is for it to be representative of the population of interest, so that conclusions about the sample can be extrapolated to the population. It’s assumed that the larger the sample, the more likely it is going to be representative of the population. But that’s not inherently true, especially if the sampling methodology is biased. Years ago, a client fired an entire research team in the human resources department for making this assumption. The client sought to gather feedback about employee engagement and tasked this research team with distributing a survey to the entire company of more than 20,000 global employees. From a statistical significance standpoint, only 1,000 employees needed to take the survey for the research team to derive defensible insights. Within hours after sending out the survey on a Tuesday morning, they had collected enough data and closed the survey. The problem was that only employees within a few timezones had completed the questionnaire with a solid third of the company being asleep, and therefore ignored, during collection. Clearly, a large sample isn’t inherently representative of the population. To obtain a representative sample, product teams first need to clearly identify a target persona. 
This may seem obvious, but it’s often not explicitly done, creating quite a bit of miscommunication for researchers and other stakeholders. What one person may mean by a ‘frequent customer’ could mean something different entirely to another person. After a persona is clearly identified, there are a few sampling techniques that one can follow, including probability sampling and nonprobability sampling techniques. A carefully-selected sample size of 100 may be considerably more representative of a target population than a thrown-together sample of 2,000. Research teams may counter with the need to meet statistical assumptions that are necessary for conducting popular tests such as a t-test or Analysis of Variance (ANOVA). These types of tests assume a normal distribution, which generally occurs as a sample size increases. But statistics has a solution for when this assumption is violated and provides other options, such as non-parametric testing, which work well for small sample sizes. In fact, the strongest argument left in favor of large sample sizes has already been discounted. Statisticians know that the larger the sample size, the easier it is to detect small effect sizes at a statistically significant level (digital product managers and marketers have become soberly aware that even a test comparing two identical versions can find a statistically significant difference between the two). But a focused product development process should be immune to this distraction because small effect sizes are of little concern. Not only that, but large effect sizes are almost as easily discovered in small samples as in large samples. For example, suppose you want to test ideas to improve a form on your website that currently gets filled out by 10% of visitors. For simplicity’s sake, let’s use a confidence level of 95% to accept any changes. To identify just a 1% absolute increase to 11%, you’d need more than 12,000 users, according to Optimizely’s stats engine formula! If you were looking for a 5% absolute increase, you’d only need 223 users. But depending on what you’re looking for, even that many users may not be needed, especially if conducting qualitative research. When identifying usability problems across your site, leading UX researchers have concluded that “elaborate usability tests are a waste of resources” because the overwhelming majority of usability issues are discovered with just five testers. An emphasis on large sample sizes can be a red herring for product stakeholders. Organizations should not be misled away from the real objective of any sample, which is an accurate representation of the identified, target population. Research teams can help product teams identify necessary sample sizes and appropriate statistical tests to ensure that findings are indeed meaningful and cost-effectively attained. Expand capacity for learning It might sound like semantics, but data should not drive decision-making. Insights should. And there can be quite a gap between the two, especially when it comes to user insights. In a recent talk on the topic of big data, Malcolm Gladwell argued that “data can tell us about the immediate environment of consumer attitudes, but it can’t tell us much about the context in which those attitudes were formed.” Essentially, statistics can be a powerful tool for obtaining and processing data, but it doesn’t have a monopoly on research. 
Expand capacity for learning It might sound like semantics, but data should not drive decision-making. Insights should. And there can be quite a gap between the two, especially when it comes to user insights. In a recent talk on the topic of big data, Malcolm Gladwell argued that “data can tell us about the immediate environment of consumer attitudes, but it can’t tell us much about the context in which those attitudes were formed.” Essentially, statistics can be a powerful tool for obtaining and processing data, but it doesn’t have a monopoly on research. Product teams can become obsessed with their Omniture and Optimizely dashboards, but there’s a lot of rich information that can’t be captured with these tools alone. There is simply no replacement for sitting down and talking with a user or customer. Open-ended feedback in particular can lead to insights that simply cannot be discovered by other means. The focus shouldn’t be on interviewing every single user, though, but on finding a pattern or theme across the interviews you do conduct. One of the core principles of the scientific method is the concept of replicability—that the results of any single experiment can be reproduced by another experiment. In product management, the importance of this principle cannot be overstated. You’ll presumably need any data from your research to hold true once you engineer the product or feature and release it to a user base, so reproducibility is an inherent requirement when it comes to collecting and acting on user insights. We’ve far too often seen a product team wielding a single data point to defend a dubious intuition or pet project. But there are a number of factors that could, and almost always do, bias the results of a test without any intentional wrongdoing. Mistakenly asking a leading question or sourcing a user panel that doesn’t exactly represent your target customer can skew individual test results. Similarly, and in digital product management especially, customer perceptions and trends evolve rapidly, further complicating data. Look no further than the handful of mobile operating systems that undergo yearly redesigns and updates, leading to constantly elevated user expectations. It’s perilously easy to imitate Homer Simpson’s lapse in thinking, “This year, I invested in pumpkins. They’ve been going up the whole month of October and I got a feeling they’re going to peak right around January. Then, bang! That’s when I’ll cash in.” So how can product and research teams safely transition from data to insights? Fortunately, we believe statistics offers insight into the answer. The central limit theorem is one of the foundational concepts taught in every introductory statistics class. It states that the distribution of averages tends to be Normal even when the distribution of the population from which the samples were taken is decidedly not Normal. Put as simply as possible, the theorem acknowledges that individual samples will almost invariably be skewed, but offers statisticians a way to combine them to collectively generate valid data. Regardless of how noisy or complex the underlying data may be, a series of relatively simple individual experiments can culminate in a result that cuts through the noise. This theorem provides a useful analogy for product management. To derive value from individual experiments and customer data points, product teams need to practice substantiation through iteration. Even if the results of any given experiment are skewed or outdated, they can be offset by a robust user research process that incorporates both quantitative and qualitative techniques across a variety of environments. The safeguard against pursuing insignificant findings, if you will, is to be mindful not to consider data to be an insight until a pattern has been rigorously established.
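A quick numerical illustration of the theorem, using a deliberately skewed synthetic population (the numbers below are invented for the sketch):

```python
# Illustrative only: means of many small, skewed samples cluster tightly
# and symmetrically around the true value, even though any single sample
# is itself skewed.
import numpy as np

rng = np.random.default_rng(42)
true_mean = 2.0

# 500 small "experiments," each observing 30 values from a skewed population.
experiments = rng.exponential(scale=true_mean, size=(500, 30))
sample_means = experiments.mean(axis=1)

print(f"True population mean:         {true_mean:.2f}")
print(f"One experiment on its own:    {experiments[0].mean():.2f}")
print(f"Mean of all experiment means: {sample_means.mean():.2f}")
print(f"Spread of experiment means:   {sample_means.std(ddof=1):.2f} "
      f"(theory predicts ~{true_mean / 30 ** 0.5:.2f})")
```

Any single 30-observation experiment can land well off the true value; hundreds of them taken together do not.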
Divide no more The moral of the story is that the nuances in statistics actually do matter. Dogmatically adopting textbook statistics can stifle an organization’s ability to innovate and operate competitively, but ignoring the value and perspective provided by statistics altogether can be similarly catastrophic. By understanding and appropriately applying the core tenets of statistics, product and research teams can begin with a framework for productive dialog about the risks they’re willing to take, the research methodologies they can efficiently but rigorously conduct, and the customer insights they’ll act upon. Planning a Taxonomy Project Taxonomy of Spices and Pantries: Part 2 by Grace G Lau October 20th, 2015 No Comments This is part 2 of “Taxonomy of Spices and Pantries,” in which I will be exploring the what, why, and how of taxonomy planning, design, and implementation: Building the business case for taxonomy Planning a taxonomy The many uses of taxonomy Card sorting to validate a taxonomy Tree testing a taxonomy Taxonomy governance Best practices of enterprise taxonomies In part 1, I enumerated the business reasons for a taxonomy focus in a site redesign and gave a fun way to explain taxonomy. The kitchen isn’t going to organize itself, so the analogy continues. I’ve moved every couple of years and it shows in the kitchen. Half-used containers of ground pepper. Scattered bags of star anise. Multiple bags of ground and whole cumin. After a while, people are quick to stuff things into the nearest crammable crevice (until we move again and the IA is called upon to organize the kitchen). Planning a taxonomy covers the same questions as planning any UX project. Understanding the users and their tasks and needs is a foundation for all things UX. This article will go through the questions you should consider when planning a kitchen, er, um…, a taxonomy project. Rumination of stuff in my kitchen and the kinds of users and stakeholders the taxonomy needs to be mindful of. Source: Grace Lau. Just as when designing any software, application, or website, you’ll need to meet with the stakeholders and ask questions: Purpose: Why? What will the taxonomy be used for? Users: Who’s using this taxonomy? Who will it affect? Content: What will be covered by this taxonomy? Scope: What’s the topic area and limits? Resources: What are the project resources and constraints? (Thanks to Heather Hedden, “The Accidental Taxonomist,” p.292) What’s your primary purpose? Why are you doing this? Are you moving, or planning to move? Is your kitchen so disorganized that you can’t find the sugar you needed for soy braised chicken? Is your content misplaced and hard to search? How often have you found just plain old salt in a different spot? How many kinds of salt do you have anyway–Kosher salt, sea salt, iodized salt, Hawaiian pink salt? (Why do you have so many different kinds anyway? One of my favorite recipe books recommended using red Hawaiian sea salt for kalua pig. Of course, I got it.) You might be using the taxonomy for tagging or, in librarian terms, indexing or cataloging. Maybe it’s for information search and retrieval. Are you building a faceted search results page? Perhaps this taxonomy is being used for organizing the site content and guiding the end users through the site navigation. Establishing a taxonomy as a common language also helps build consensus and creates smarter conversations.
On making baozi (steamed buns), I overheard a conversation between fathers: Father-in-law: We need 酵母 [Jiàomǔ] {noun}. Dad: Yi-see? (Cantonese transliteration of yeast) Father-in-law: (confused look) Dad: Baking pow-daa? (Cantonese transliteration of baking powder) Meanwhile, I look up the Chinese translation of “yeast” in Google Translate while my mother-in-law opens her go-to Chinese dictionary tool. I discover that the dictionary word for “yeast” is 发酵粉 [fājiàofěn] {noun}. Father-in-law: Ah, so it rises flour: 发面的 [fāmiànde] {verb} This discovery prompted more discussion about what it does and how it is used. There were at least 15 more minutes of discussing yeast in five different ways before the fathers agreed that they were talking about the same ingredient and its purpose. Eventually, we have this result in our bellies. Homemade steamed baozi. Apparently, they’re still investigating how much yeast is required for the amount of flour they used. Source: Grace Lau. Who are the users? Are they internal? Content creators or editors, working in the CMS? Are they external users? What’s their range of experience in the domain? Are we speaking with homemakers and amateur cooks or seasoned cooks with many years at various Chinese restaurants? Looking at the users of my kitchen, I identified the following stakeholders: Content creators: the people who do the shopping and have to put away the stuff People who are always in the kitchen: my in-laws People who are sometimes in the kitchen: me Visiting users: my parents and friends who often come over for a BBQ/grill party The cleanup crew: my husband, who can’t stand the mess we all make How do I create a taxonomy for them? First, I attempt to understand their mental models by watching them work in their natural environment and observing their everyday hacks as they complete their tasks. Having empathy for users’ end game—making food for the people they care for—makes a difference in developing the style, consistency, and breadth and depth of the taxonomy. What content will be covered by the taxonomy? In my kitchen, we’ll be covering sugars, salts, spices, and staples used for cooking, baking, braising, grilling, smoking, steaming, simmering, and frying. How did I determine that? Terminology from existing content. I opened up every cabinet and door in my kitchen and made an inventory. Search logs. How were users accessing my kitchen? Why? How were users referring to things? What were they looking for? Storytelling with users. How did you make this? People like to share recipes and I like to watch friends cook. Doing user interviews has never been more fun! What’s the scope? Scope can easily get out of hand. Notice that I have not included in my discussion any cookbooks, kitchen hardware and appliances, pots and pans, or anything that’s in the refrigerator or freezer. You may need a scope document early on to plan releases (if you need them). Perhaps for the first release, I’ll just deal with the frequent-use items. Then I’ll move on to occasional-use items (soups and desserts). If the taxonomy you’re developing is faceted—for example, allowing your users to browse your cupboards by particular attributes such as taste, canned vs dried, or weight—your scope should include only those attributes relevant to the search process.
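As an aside, a faceted scope like this can be sketched in a few lines of code. The items and facet names below (cuisine, form, use) are hypothetical stand-ins for whatever attributes your research shows people actually browse by.

```python
# Illustrative only: a tiny faceted taxonomy and a browse-by-facet helper.
PANTRY = [
    {"item": "star anise", "cuisine": "northern chinese", "form": "whole", "use": "braising"},
    {"item": "cumin (whole)", "cuisine": "northern chinese", "form": "whole", "use": "braising"},
    {"item": "cumin (ground)", "cuisine": "american", "form": "ground", "use": "grilling"},
    {"item": "goji berries", "cuisine": "cantonese", "form": "dried", "use": "soup"},
    {"item": "rosemary", "cuisine": "american", "form": "dried", "use": "roasting"},
]

def browse(items, **facets):
    """Return the items matching every requested facet value."""
    return [entry["item"] for entry in items
            if all(entry.get(facet) == value for facet, value in facets.items())]

print(browse(PANTRY, form="whole"))                     # ['star anise', 'cumin (whole)']
print(browse(PANTRY, cuisine="cantonese", use="soup"))  # ['goji berries']
```

Attributes that nobody actually browses by don't earn a facet at all.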
For instance, no one really searches for canned goods in my kitchen, so that’s out of scope. What resources do you have available? My kitchen taxonomy will be limited. Stakeholders are multilingual so items will need labelling in English, Simplified Chinese, and pinyin romanization. I had considered building a Drupal site to manage an inventory, but I have neither the funding or time to implement such a complex site. At the same time, what are users’ expectations for the taxonomy? Considering the context in the taxonomy’s usage is important. How will (or should) a taxonomy empower its users? It needs to be invisible; as an indication of a good taxonomy, it shouldn’t affect their current workflow but make it more efficient. Both fathers and my mom are unlikely to stop and use any digital technology to find and look things up. Most importantly, the completed taxonomy and actual content migration should not conflict with the preparation of the next meal. My baby needs a packed lunch for school, and it’s 6 a.m. when I’m preparing it. There’s no time to rush around looking for things. Time is limited and a complete displacement of spices and condiments would disrupt the high-traffic flow in any household. Meanwhile, we’re out of soy sauce again and I’d rather it not be stashed in yet a new home and forgotten. That’s why we ended up with three open bottles of soy sauce from different brands. What else should you consider for the taxonomy? Understanding the scope of the taxonomy you’re building can help prevent scope creep in a taxonomy project. In time, you’ll realize that the 80% of your time and effort is devoted to research while 20% of the time and effort is actually developing the taxonomy. So, making time for iterations and validation through card sorting and other testing is important in your planning. In my next article, I will explore the many uses of taxonomy outside of tagging.The Freelance Studio Denver, Co. User Experience Agency Ending the UX Designer Drought Part 2 - Laying the Foundation by Fred Beecher June 23rd, 2015 11 Comments The first article in this series, “A New Apprenticeship Architecture,” laid out a high-level framework for using the ancient model of apprenticeship to solve the modern problem of the UX talent drought. In this article, I get into details. Specifically, I discuss how to make the business case for apprenticeship and what to look for in potential apprentices. Let’s get started! Defining the business value of apprenticeship Apprenticeship is an investment. It requires an outlay of cash upfront for a return at a later date. Apprenticeship requires the support of budget-approving levels of your organization. For you to get that support, you need to clearly show its return by demonstrating how it addresses some of your organization’s pain points. What follows is a discussion of common pain points and how apprenticeship assuages them. Hit growth targets If your company is trying to grow but can’t find enough qualified people to do the work that growth requires, that’s the sweet spot for apprenticeship. Apprenticeship allows you to make the designers you’re having trouble finding. This is going to be a temporal argument, so you need to come armed with measurements to make it. And you’ll need help from various leaders in your organization to get them. 
UX team growth targets for the past 2-3 years (UX leadership) Actual UX team growth for the past 2-3 years (UX leadership) Average time required to identify and hire a UX designer (HR leadership) Then you need to estimate how apprenticeship will improve these measurements. (Part 3 of this series, which will deal with the instructional design of apprenticeship, will offer details on how to make these estimates.) How many designers per year can apprenticeship contribute? How much time will be required from the design team to mentor apprentices? Growth targets typically do not exist in a vacuum. You’ll likely need to combine this argument with one of the others. Take advantage of more revenue opportunities One of the financial implications of missing growth targets is not having enough staff to capitalize on all the revenue opportunities you have. For agencies, you might have to pass up good projects because your design team has a six-week lead time. For product companies, your release schedule might fall behind due to a UX bottleneck and push you behind your competition. The data you need to make this argument differ depending on whether your company sells time (agency) or stuff (product company). When doing the math about an apprenticeship program, agencies should consider: What number of projects have been lost in the past year due to UX lead time? (Sales leadership should have this information.) What is the estimated value of UX work on lost projects? (Sales leadership) What is the estimated value of other (development, strategy, management, etc.) work on lost projects? (Sales leadership) Then, contrast these numbers with some of the benefits of apprenticeship: What is the estimated number of designers per year apprenticeship could contribute? What is the estimated amount of work these “extra” designers would be able to contribute in both hours and cash? What is the estimated profitability of junior designers (more) versus senior designers (less), assuming the same hourly rate? Product companies should consider: The ratio of innovative features versus “catch-up” features your competitors released last year. (Sales or marketing leadership should have this information.) The ratio of innovative features versus “catch-up” features you released in the past year. (Sales or marketing leadership) Any customer service and/or satisfaction metrics. (Customer service leadership) Contrast this data with… The estimated number of designers per year you could add through apprenticeship. The estimated number of features they could’ve completed for release. The estimated impact this would have on customer satisfaction. Avoid high recruiting costs Recruiting a mid- to senior-level UX designer typically means finding them and poaching them from somewhere else. This requires paying significant headhunting fees on top of the person-hours involved in reviewing resumes and portfolios and interviewing candidates. All the data you need to make this argument can come from UX leadership and HR. 
Average cost per UX designer recruit Average number of hours spent recruiting a UX designer Contrast this data with: Estimated cost per apprentice To estimate this, factor in: Overhead per employee Salary (and benefits if the apprenticeship is long enough to qualify while still an apprentice) Software and service licenses Mentorship time from the current design team Mentorship/management time from the designer leading the program Increase designer engagement This one is tricky because most places don’t measure engagement directly. Measuring engagement accurately requires professional quantitative research. However, there are some signs that can point to low engagement. High turnover is the number one sign of low engagement. What kind of people are leaving—junior designers, seniors, or both? If possible, try to get exit interview data (as raw as possible) to develop hypotheses about how apprenticeship could help. Maybe junior designers don’t feel like their growth is supported… allowing them to leverage elements of an apprenticeship program for further professional development could fix that. Maybe senior designers are feeling burnt out. Consistent mentorship, like that required by apprenticeship, can be reinvigorating. Other signs of low engagement include frequently missing deadlines, using more sick time, missing or being late to meetings, and more. Investigate any signs you see, validate any assumptions you might take on, and hypothesize about how apprenticeship can help address these issues. Help others If your organization is motivated by altruism, that is wonderful! At least one organization with an apprenticeship program actually tries very hard not to hire their apprentices. Boston’s Fresh Tilled Soil places their graduated apprentices with their clients, which creates a very strong relationship with those clients. Additionally, this helps them raise the caliber and capacity of the Boston metro area when it comes to UX design. Hiring great UX apprentices Hiring apprentices requires a different approach to evaluating candidates than hiring established UX designers. Most candidates will have little to no actual UX design skills, so you have to evaluate them for their potential to acquire and hone those skills. Additionally, not everyone learns effectively through apprenticeship. Identifying the traits of a good apprentice in candidates will help your program run smoothly. Evaluating for skill potential Portfolio. Even though you’re evaluating someone who may never have designed a user experience before, you still need them to bring some examples of something they’ve made. Without this, it’s impossible to get a sense of what kind of process they go through to make things. For example, one apprentice candidate brought in a print brochure she designed. Her description of how she designed it included identifying business goals, balancing competing stakeholder needs, working within constraints, and getting feedback along the way, all of which are relevant to the process of UX design. Mindset. The number one thing you must identify in a candidate is whether they already possess the UX mindset, the point of view that things are designed better when they’re designed with people in mind. This is usually the light bulb that goes off in people’s heads when they discover UX design. If that light hasn’t gone off, UX might not be the right path for that person. Apprenticeship is too much of an investment to risk that. Evaluating for this is fairly simple. 
It usually comes out in the course of a conversation. If not, asking outright “What does user experience design mean to you?” can be helpful. Pay careful attention to how people talk about how they’ve approached their work. Is it consistent with their stated philosophy? If not, that could be a red flag. Intrinsic motivation. When people talk about having a “passion” for something, what that means is that they are intrinsically motivated to do that thing. This is pretty easy to evaluate for. What have they done to learn UX? Have they taken a class? That’s a positive sign. Have they identified and worked through a UX problem on their own? Even better! If a candidate hasn’t put in the effort to explore UX on their own, they are likely not motivated enough to do well in the field. Self-education. While self-education is a sign of intrinsic motivation, it’s also important in its own right. Apprenticeship relies heavily on mentorship, but the responsibility for the direction and nature of that mentorship lies with the apprentice themselves. If someone is a self-educator, that’s a good predictor that they’ll be able to get the most out of mentorship. This is another fairly easy one to evaluate. Ask them to tell you about the most recent UX-related blog post or article they read. It doesn’t matter what it actually is, only whether they can quickly bring something to mind. Professional skills. UX design is not a back-office field. UX designers talk with clients, customers, stakeholders, developers, and more. To be an effective UX designer, a candidate must possess basic professional skills such as dressing appropriately and communicating well. Simple things like sending a “thank you” email are a great indication of good professional skills. (Physically mailed thank-you notes get extra bonus points. One-off letterpressed mailed thank-you notes get even more!) Collaboration. UX design is a collaborative discipline. If a candidate struggles with collaboration, they’ll struggle in the field. When discussing their work (especially class project work), be sure to ask what role they played on the project and how they interacted with other people. Complaining about others and taking on too much work themselves are some warning signs that could indicate that a candidate has trouble with collaboration. Evaluating for apprenticeship fit Learning pattern. Some people learn best by gradually being exposed to a topic. I call these people toe-dippers, as they prefer to dip their toes into something before diving in. Others prefer to barrel off the dock straight into the deep end and then struggle to the surface. I call these people deep-enders. While apprenticeship can be modified to work better for deep-enders, its gradual exposure can often frustrate them. It is much better suited for toe-dippers. Evaluating for this is tricky, though. If you ask people whether they prefer to dive in or learn gradually, they’ll say “dive in” because they think that’s what you want to hear. Asking them how they’ve approached learning other skills can give some insight, but this is not 100% reliable. Learning by doing. Apprenticeship helps people acquire skills through experiential learning. If this is not how a person learns, apprenticeship may not be for them. Evaluating for this is very much like evaluating for intrinsic motivation. Has someone gone to the trouble of identifying and solving a design problem themselves? Have they practiced UX methods they have learned about?
If so, it’s likely that learning by doing is effective for them. Receptiveness to critique. Apprenticeship is a period of sustained critique. Someone whose response to criticism is defensiveness or despondency will not be successful as an apprentice. This is easy to identify in an interview within the context of discussing the work examples the candidate has brought. My favorite technique for doing this is to find something insignificant to critique and then hammer on it. This is not how I normally critique, of course; it’s a pressure test. If a candidate responds with openness and a desire to learn from this encounter, that’s a very positive sign. If they launch into a monologue defending their decisions, the interview is pretty much over. If you’re fired up about UX apprenticeship (and how could you not be?), start making it happen in your organization! Do the research, find the data, and share your vision with your company’s leadership so they can see it too! When you get the go-ahead, you’ll be all ready to start looking for apprentices. If you follow these guidelines, you’ll get great apprentices who will grow into great designers. Stay tuned for Part 3 of this series where I’ll get detailed about the instructional design of apprenticeship, pedagogy, mentorship, and tracking! Building the Business Case for Taxonomy Taxonomy of Spices and Pantries: Part 1 by Grace G Lau September 1st, 2015 9 Comments XKCD comic strip about not being able to name all seven dwarfs from Snow White. How often have you found yourself on an ill-defined site redesign project? You know, the ones that you end up redesigning and restructuring every few years as you add new content. Or perhaps you spin up a new microsite because the new product/solution doesn’t fit in with the current structure, not because you want to create a new experience around it. Maybe your site has vaguely labelled navigation buckets like “More Magic”—which is essentially your junk drawer, your “everything else.” Your top concerns on such projects are: You can’t find anything. Your users can’t find anything. The navigation isn’t consistent. You have too much content. Your hopeful answer to everything is to rely on an external search engine, not the one that’s on your site. Google will find everything for you. A typical site redesign project might include refreshing the visual design, considering the best interaction practices, and conducting usability testing. But what’s missing? Creating the taxonomy. “Taxonomy is just tagging, right? Sharepoint/AEM has it—we’re covered!” In the coming months, I will be exploring the what, why, and how of taxonomy planning, design, and implementation: Building the business case for taxonomy Planning a taxonomy The many uses of taxonomy Card sorting to validate a taxonomy Tree testing a taxonomy Taxonomy governance Best practices of enterprise taxonomies Are you ready? ROI of taxonomy Although the word “taxonomy” is often used interchangeably with tagging, building an enterprise taxonomy means more than tagging content. It’s essentially a knowledge organization system, and its purpose is to enable the user to browse, find, and discover content.
Spending the time on building that taxonomy empowers your site to better manage your content at scale, allow for meaningful navigation, expose long-tail content, reuse content assets, bridge across subjects, and provide more efficient product/brand alignment. In addition, a sound taxonomy in the long run will improve your content’s findability, support social sharing, and improve your site’s search engine optimization. (Thanks to Mike Atherton’s “Modeling Structured Content” workshop, presented at IA Summit 2013, for outlining the benefits.) How do you explain taxonomy to get stakeholders on board? No worries, we won’t be going back to high school biology. Explaining taxonomy Imagine a household kitchen. How would you organize the spices? Consider the cooks: In-laws from northern China, mom from Hong Kong, and American-born Grace. I’ve moved four times in the past five years. My husband, son, and I live with my in-laws. I have a mother who still comes over to make her Cantonese herbal soups. We all speak different languages: English, Mandarin Chinese, and Cantonese Chinese. I have the unique need of organizing my kitchen for multiple users. My in-laws need to be able to find their star anise, peppercorn, tree ear mushrooms, and sesame oil. My mom needs a space to store her dried figs, dried shiitake mushrooms, dried goji berries, and snow fungus. I need to find a space for dried thyme and rosemary for the “American” food I try to make. Oh, and we all need a consistent place for salt and sugar. People can organize their kitchen by activity zones: baking, canning, preparing, and cooking. Other ways to organize a kitchen successfully could include: attributes (shelf-life, weight, temperature requirements), usage (frequency, type of use), seasonality (organic, what’s in season, local), and occasion (hot pot dinners, BBQ parties). You can also consider organizing by audience, such as for the five-year-old helper. I keep refining how the kitchen is organized each time we move. I have used sticky notes in Chinese and English with my in-laws and my mom as part of a card sorting exercise; I’ve tested the navigation around the kitchen to validate the results. A photo of pantry shelves labeled noodles, rice, garlic, and the like. Early attempts at organizing my pantry. If this is to be a data-driven taxonomy, I could consider attaching RFID tags to each spice container to track frequency and type of usage for a period of time to obtain some kitchen analytics. On the other hand, I could try guesstimating frequency by looking at the amount of grime or dust collected on the container. How often are we using chicken bouillon, and in which dishes? Does it need to be within easy reach of the stovetop or can it be relegated to a pantry closet three feet away? Photo of labeled spice jars in a drawer. Source: Home Depot. Understanding the users and their tasks and needs is a foundation for all things UX. Taxonomy building is not any different. How people think about and use their kitchen brings with it a certain closeness that makes taxonomy concepts easier to grasp. Who are the users? What are they trying to do? How do they currently tackle this problem? What works and what doesn’t? Watch, observe, and listen to their experience.
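The "kitchen analytics" idea above can be sketched as a toy usage tally. The log entries below are hypothetical, standing in for RFID reads or simply a week of honest note-taking.

```python
# Illustrative only: rank pantry items by observed usage and assign shelf space.
from collections import Counter

usage_log = [
    "soy sauce", "salt", "star anise", "soy sauce", "salt", "chicken bouillon",
    "salt", "soy sauce", "goji berries", "salt", "soy sauce", "thyme",
]

counts = Counter(usage_log)
KEEP_WITHIN_REACH = 3  # the top three items earn a spot next to the stovetop

for rank, (item, uses) in enumerate(counts.most_common(), start=1):
    shelf = "stovetop rack" if rank <= KEEP_WITHIN_REACH else "pantry closet"
    print(f"{item:16s} {uses:2d} uses -> {shelf}")
```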
Helping the business understand the underlying concepts is one of the challenges I’ve faced with developing a solid taxonomy. We’re not just talking about tagging but breaking down the content by its attributes and metadata as well as by its potential usage and relation to other content. The biggest challenge is building the consensus and understanding around that taxonomy—taxonomy governance—and keeping the system you’ve designed well-seasoned! Now, back to that site redesign project that you were thinking of: How about starting on that taxonomy? My next post will cover taxonomy planning. How to determine when customer feedback is actionable Merging statistics with product management by Naira Musallam, Nis Frome, Michael Williams, and Tim Lawton October 13th, 2015 1 Comment One of the riskiest assumptions for any new product or feature is that customers actually want it. Although product leaders can propose numerous ‘lean’ methodologies to experiment inexpensively with new concepts before fully engineering them, anything short of launching a product or feature and monitoring its performance over time in the market is, by definition, not 100% accurate. That leaves us with a dangerously wide spectrum of user research strategies, and an even wider range of opinions for determining when customer feedback is actionable. To the dismay of product teams desiring to ‘move fast and break things,’ their counterparts in data science and research advocate a slower, more traditional approach. These proponents of caution often emphasize an evaluation of statistical signals before considering customer insights valid enough to act upon. This dynamic has meaningful ramifications. For those who care about making data-driven business decisions, the challenge that presents itself is: How do we adhere to rigorous scientific standards in a world that demands adaptability and agility to survive? Having frequently witnessed the back-and-forth between product teams and research groups, we can say there is no shortage of misconceptions and miscommunication between the two. Only a thorough analysis of some critical nuances in statistics and product management can help us bridge the gap. Quantify risk tolerance You’ve probably been on one end of an argument that cited a “statistically significant” finding to support a course of action. The problem is that statistical significance is often equated with having relevant and substantive results, but neither is necessarily the case. Simply put, statistical significance exclusively refers to the level of confidence (measured from 0 to 1, or 0% to 100%) you have that the results you obtained from a given experiment are not due to chance. Statistical significance alone tells you nothing about the appropriateness of the confidence level selected nor the importance of the results. To begin, confidence levels should be context-dependent, and determining the appropriate confidence threshold is an oft-overlooked proposition that can have profound consequences. In statistics, confidence levels are closely linked to two concepts: type I and type II errors. A type I error, or false positive, refers to believing that a variable has an effect that it actually doesn’t. Some industries, like pharmaceuticals and aeronautics, must be exceedingly cautious about false positives. Medical researchers, for example, cannot afford to mistakenly think a drug has an intended benefit when in reality it does not. Side effects can be lethal, so the FDA’s threshold for proof that a drug’s health benefits outweigh its known risks is intentionally onerous.
A type II error, or false negative, has to do with the flip side of the coin: concluding that a variable doesn’t have an effect when it actually does. Historically, though, significance testing has focused primarily on avoiding false positives (even if it means missing out on some likely opportunities), with the default confidence level at 95% for any finding to be considered actionable. The reality that this value was arbitrarily determined by scientists speaks more to their comfort level with being wrong than it does to its appropriateness in any given context. Unfortunately, this particular confidence level is used today by the vast majority of research teams at large organizations and remains generally unchallenged in contexts far different than the ones for which it was formulated. Matrix visualising Type I and Type II errors as described in text. But confidence levels should be representative of the amount of risk that an organization is willing to take to realize a potential opportunity. There are many reasons for product teams in particular to be more concerned with avoiding false negatives than false positives. Mistakenly missing an opportunity due to caution can have a more negative impact than building something no one really wants. Digital product teams don’t share many of the concerns of an aerospace engineering team and therefore need to calculate and quantify their own tolerance for risk. To illustrate the ramifications that confidence levels can have on business decisions, consider this thought exercise. Imagine two companies, one with outrageously profitable 90% margins, and one with painfully narrow 5% margins. Suppose each of these businesses is considering a new line of business. In the case of the high-margin business, the amount of capital they have to risk to pursue the opportunity is dwarfed by the potential reward. If executives get even the weakest indication that the business might work, they should pursue the new business line aggressively. In fact, waiting for perfect information before acting might be the difference between capturing a market and allowing a competitor to get there first. In the case of the narrow-margin business, however, the buffer before going into the red is so small that going after the new business line wouldn’t make sense with anything except the most definitive signal. Although these two examples are obviously allegorical, they demonstrate the principle at hand. To work together effectively, research analysts and their commercially-driven counterparts should have a conversation about their organization’s particular level of comfort and make statistical decisions accordingly. Focus on impact Confidence levels only tell half the story. They don’t address the extent to which the results of an experiment are meaningful to your business. Product teams need to combine the detection of an effect (i.e., the likelihood that there is an effect) with the size of that effect (i.e., the potential impact to the business), but this is often forgotten on the quest for the proverbial holy grail of statistical significance. Many teams mistakenly focus energy and resources acting on statistically significant but inconsequential findings. A meta-analysis of hundreds of consumer behavior experiments sought to qualify how seriously effect sizes are considered when evaluating research results.
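To tie "quantify risk tolerance" and "focus on impact" together, here is a minimal, illustrative sketch of an A/B readout that reports the effect size alongside the p-value and then applies two different risk thresholds. The traffic figures and the more risk-tolerant 0.20 threshold are hypothetical.

```python
# Illustrative only: a hand-rolled two-proportion z-test that reports both
# the p-value and the absolute lift, then applies two different alphas.
from scipy.stats import norm

def ab_readout(conv_a, n_a, conv_b, n_b):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 2 * (1 - norm.cdf(abs(z)))  # two-sided
    return p_b - p_a, p_value

lift, p_value = ab_readout(conv_a=200, n_a=2000, conv_b=235, n_b=2000)
print(f"Absolute lift: {lift:+.1%}, p-value: {p_value:.3f}")

for alpha in (0.05, 0.20):  # a cautious team versus a risk-tolerant one
    decision = "ship" if p_value < alpha else "keep testing"
    print(f"alpha = {alpha:.2f}: {decision}")
```

The same result ships for the risk-tolerant team and keeps testing for the cautious one, and the reported lift is what lets either team judge whether the opportunity justifies the engineering cost.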
They found that an astonishing three-quarters of the findings didn’t even bother reporting effect sizes “because of their small values” or because of “a general lack of interest in discovering the extent to which an effect is significant…” This is troubling, because without considering effect size, there’s virtually no way to determine what opportunities are worth pursuing and in what order. Limited development resources prevent product teams from realistically tackling every single opportunity. Consider for example how the answer to this question, posed by a MECLABS data scientist, changes based on your perspective: In terms of size, what does a 0.2% difference mean? For Amazon.com, that lift might mean an extra 2,000 sales and be worth a $100,000 investment…For a mom-and-pop Yahoo! store, that increase might just equate to an extra two sales and not be worth a $100 investment. Unless you’re operating at a Google-esque scale for which an incremental lift in a conversion rate could result in literally millions of dollars in additional revenue, product teams should rely on statistics and research teams to help them prioritize the largest opportunities in front of them. Sample size constraints One of the most critical constraints on product teams that want to generate user insights is the ability to source users for experiments. With enough traffic, it’s certainly possible to generate a sample size large enough to pass traditional statistical requirements for a production split test. But it can be difficult to drive enough traffic to new product concepts, and it can also put a brand unnecessarily at risk, especially in heavily regulated industries. For product teams that can’t easily access or run tests in production environments, simulated environments offer a compelling alternative. That leaves product teams stuck between a rock and a hard place. Simulated environments require standing user panels that can get expensive quickly, especially if research teams seek sample sizes in the hundreds or thousands. Unfortunately, strategies like these again overlook important nuances in statistics and place undue hardship on the user insight generation process. A larger sample does not necessarily mean a better or more insightful sample. The objective of any sample is for it to be representative of the population of interest, so that conclusions about the sample can be extrapolated to the population. It’s assumed that the larger the sample, the more likely it is going to be representative of the population. But that’s not inherently true, especially if the sampling methodology is biased. Years ago, a client fired an entire research team in the human resources department for making this assumption. The client sought to gather feedback about employee engagement and tasked this research team with distributing a survey to the entire company of more than 20,000 global employees. From a statistical significance standpoint, only 1,000 employees needed to take the survey for the research team to derive defensible insights. Within hours after sending out the survey on a Tuesday morning, they had collected enough data and closed the survey. The problem was that only employees within a few timezones had completed the questionnaire with a solid third of the company being asleep, and therefore ignored, during collection. Clearly, a large sample isn’t inherently representative of the population. To obtain a representative sample, product teams first need to clearly identify a target persona. 
This may seem obvious, but it’s often not explicitly done, creating quite a bit of miscommunication for researchers and other stakeholders. What one person may mean by a ‘frequent customer’ could mean something different entirely to another person. After a persona is clearly identified, there are a few sampling techniques that one can follow, including probability sampling and nonprobability sampling techniques. A carefully-selected sample size of 100 may be considerably more representative of a target population than a thrown-together sample of 2,000. Research teams may counter with the need to meet statistical assumptions that are necessary for conducting popular tests such as a t-test or Analysis of Variance (ANOVA). These types of tests assume a normal distribution, which generally occurs as a sample size increases. But statistics has a solution for when this assumption is violated and provides other options, such as non-parametric testing, which work well for small sample sizes. In fact, the strongest argument left in favor of large sample sizes has already been discounted. Statisticians know that the larger the sample size, the easier it is to detect small effect sizes at a statistically significant level (digital product managers and marketers have become soberly aware that even a test comparing two identical versions can find a statistically significant difference between the two). But a focused product development process should be immune to this distraction because small effect sizes are of little concern. Not only that, but large effect sizes are almost as easily discovered in small samples as in large samples. For example, suppose you want to test ideas to improve a form on your website that currently gets filled out by 10% of visitors. For simplicity’s sake, let’s use a confidence level of 95% to accept any changes. To identify just a 1% absolute increase to 11%, you’d need more than 12,000 users, according to Optimizely’s stats engine formula! If you were looking for a 5% absolute increase, you’d only need 223 users. But depending on what you’re looking for, even that many users may not be needed, especially if conducting qualitative research. When identifying usability problems across your site, leading UX researchers have concluded that “elaborate usability tests are a waste of resources” because the overwhelming majority of usability issues are discovered with just five testers. An emphasis on large sample sizes can be a red herring for product stakeholders. Organizations should not be misled away from the real objective of any sample, which is an accurate representation of the identified, target population. Research teams can help product teams identify necessary sample sizes and appropriate statistical tests to ensure that findings are indeed meaningful and cost-effectively attained. Expand capacity for learning It might sound like semantics, but data should not drive decision-making. Insights should. And there can be quite a gap between the two, especially when it comes to user insights. In a recent talk on the topic of big data, Malcolm Gladwell argued that “data can tell us about the immediate environment of consumer attitudes, but it can’t tell us much about the context in which those attitudes were formed.” Essentially, statistics can be a powerful tool for obtaining and processing data, but it doesn’t have a monopoly on research. 
Product teams can become obsessed with their Omniture and Optimizely dashboards, but there’s a lot of rich information that can’t be captured with these tools alone. There is simply no replacement for sitting down and talking with a user or customer. Open-ended feedback in particular can lead to insights that simply cannot be discovered by other means. The focus shouldn’t be on interviewing every single user though, but rather on finding a pattern or theme from the interviews you do conduct. One of the core principles of the scientific method is the concept of replicability—that the results of any single experiment can be reproduced by another experiment. In product management, the importance of this principle cannot be overstated. You’ll presumably need any data from your research to hold true once you engineer the product or feature and release it to a user base, so reproducibility is an inherent requirement when it comes to collecting and acting on user insights. We’ve far too often seen a product team wielding a single data point to defend a dubious intuition or pet project. But there are a number of factors that could and almost always do bias the results of a test without any intentional wrongdoing. Mistakenly asking a leading question or sourcing a user panel that doesn’t exactly represent your target customer can skew individual test results. Similarly, and in digital product management especially, customer perceptions and trends evolve rapidly, further complicating data. Look no further than the handful of mobile operating systems which undergo yearly redesigns and updates, leading to constantly elevated user expectations. It’s perilously easy to imitate Homer Simpson’s lapse in thinking, “This year, I invested in pumpkins. They’ve been going up the whole month of October and I got a feeling they’re going to peak right around January. Then, bang! That’s when I’ll cash in.” So how can product and research teams safely transition from data to insights? Fortunately, we believe statistics offers insight into the answer. The central limit theorem is one of the foundational concepts taught in every introductory statistics class. It states that the distribution of averages tends to be Normal even when the distribution of the population from which the samples were taken is decidedly not Normal. Put as simply as possible, the theorem acknowledges that individual samples will almost invariably be skewed, but offers statisticians a way to combine them to collectively generate valid data. Regardless of how confusing or complex the underlying data may be, by performing relatively simple individual experiments, the culminating result can cut through the noise. This theorem provides a useful analogy for product management. To derive value from individual experiments and customer data points, product teams need to practice substantiation through iteration. Even if the results of any given experiment are skewed or outdated, they can be offset by a robust user research process that incorporates both quantitative and qualitative techniques across a variety of environments. The safeguard against pursuing insignificant findings, if you will, is to be mindful not to consider data to be an insight until a pattern has been rigorously established. Divide no more The moral of the story is that the nuances in statistics actually do matter. 
Dogmatically adopting textbook statistics can stifle an organization’s ability to innovate and operate competitively, but ignoring the value and perspective provided by statistics altogether can be similarly catastrophic. By understanding and appropriately applying the core tenets of statistics, product and research teams can begin with a framework for productive dialog about the risks they’re willing to take, the research methodologies they can efficiently but rigorously conduct, and the customer insights they’ll act upon. Share this: Planning a Taxonomy Project Taxonomy of Spices and Pantries: Part 2 by Grace G Lau October 20th, 2015 No Comments This is part 2 of “Taxonomy of Spices and Pantries,” in which I will be exploring the what, why, and how of taxonomy planning, design, and implementation: Building the business case for taxonomy Planning a taxonomy The many uses of taxonomy Card sorting to validate a taxonomy Tree testing a taxonomy Taxonomy governance Best practices of enterprise taxonomies In part 1, I enumerated the business reasons for a taxonomy focus in a site redesign and gave a fun way to explain taxonomy. The kitchen isn’t going to organize itself, so the analogy continues. I’ve moved every couple of years and it shows in the kitchen. Half-used containers of ground pepper. Scattered bags of star anise. Multiple bags of ground and whole cumin. After a while, people are quick to stuff things into the nearest crammable crevice (until we move again and the IA is called upon to organize the kitchen). Planning a taxonomy covers the same questions as planning any UX project. Understanding the users and their tasks and needs is a foundation for all things UX. This article will go through the questions you should consider when planning a kitchen, er, um…, a taxonomy project. Rumination of stuff in my kitchen and the kinds of users and stakeholders the taxonomy needs to be mindful of. Rumination of stuff in my kitchen and the kinds of users and stakeholders the taxonomy needs to be mindful of. Source: Grace Lau. Same as a designing any software, application, or website, you’ll need to meet with the stakeholders and ask questions: Purpose: Why? What will the taxonomy be used for? Users: Who’s using this taxonomy? Who will it affect? Content: What will be covered by this taxonomy? Scope: What’s the topic area and limits? Resources: What are the project resources and constraints? (Thanks to Heather Hedden, “The Accidental Taxonomist,” p.292) What’s your primary purpose? Why are you doing this? Are you moving, or planning to move? Is your kitchen so disorganized that you can’t find the sugar you needed for soy braised chicken? Is your content misplaced and hard to search? How often have you found just plain old salt in a different spot? How many kinds of salt do you have anyway–Kosher salt, sea salt, iodized salt, Hawaiian pink salt? (Why do you have so many different kinds anyway? One of my favorite recipe books recommended using red Hawaiian sea salt for kalua pig. Of course, I got it.) You might be using the taxonomy for tagging or, in librarian terms, indexing or cataloging. Maybe it’s for information search and retrieval. Are you building a faceted search results page? Perhaps this taxonomy is being used for organizing the site content and guiding the end users through the site navigation. Establishing a taxonomy as a common language also helps build consensus and creates smarter conversations. 
On making baozi (steamed buns), I overheard a conversation between fathers: Father-in-law: We need 酵母 [Jiàomǔ] {noun}. Dad: Yi-see? (Cantonese transliteration of yeast) Father-in-law: (confused look) Dad: Baking pow-daa? (Cantonese transliteration of baking powder) Meanwhile, I look up the Chinese translation of “yeast” in Google Translate while mother-in-law opens her go-to Chinese dictionary tool. I discover that the dictionary word for “yeast” is 发酵粉 [fājiàofěn] {noun}. Father-in-law: Ah, so it rises flour: 发面的 [fāmiànde] {verb} This discovery ensues more discussion about what it does and how it is used. There was at least 15 more minutes of discussing yeast in five different ways before the fathers agreed that they were talking about the same ingredient and its purpose. Eventually, we have this result in our bellies. Homemade steamed baozi. Apparently, they’re still investigating how much yeast is required for the amount of flour they used. Source: Grace Lau. Homemade steamed baozi. Apparently, they’re still investigating how much yeast is required for the amount of flour they used. Source: Grace Lau. Who are the users? Are they internal? Content creators or editors, working in the CMS? Are they external users? What’s their range of experience in the domain? Are we speaking with homemakers and amateur cooks or seasoned cooks with many years at various Chinese restaurants? Looking at the users of my kitchen, I identified the following stakeholders: Content creators: the people who do the shopping and have to put away the stuff People who are always in the kitchen: my in-laws People who are sometimes in the kitchen: me Visiting users: my parents and friends who often come over for a BBQ/grill party The cleanup crew: my husband who can’t stand the mess we all make How do I create a taxonomy for them? First, I attempt to understand their mental models by watching them work in their natural environment and observing their everyday hacks as they complete their tasks. Having empathy for users’ end game—making food for the people they care for—makes a difference in developing the style, consistency, and breadth and depth of the taxonomy. What content will be covered by the taxonomy? In my kitchen, we’ll be covering sugars, salts, spices, and staples used for cooking, baking, braising, grilling, smoking, steaming, simmering, and frying. How did I determine that? Terminology from existing content. I opened up every cabinet and door in my kitchen and made an inventory. Search logs. How were users accessing my kitchen? Why? How were users referring to things? What were they looking for? Storytelling with users. How did you make this? People like to share recipes and I like to watch friends cook. Doing user interviews has never been more fun! What’s the scope? Scope can easily get out of hand. Notice that I have not included in my discussion any cookbooks, kitchen hardware and appliances, pots and pans, or anything that’s in the refrigerator or freezer. You may need a scope document early on to plan releases (if you need them). Perhaps for the first release, I’ll just deal with the frequent use items. Then I’ll move on to occasional use items (soups and desserts). If the taxonomy you’re developing is faceted—for example, allowing your users to browse your cupboards by particular attributes such as taste, canned vs dried, or weight—your scope should include only those attributes relevant to the search process. 
For instance, no one really searches for canned goods in my kitchen, so that’s out of scope. What resources do you have available? My kitchen taxonomy will be limited. Stakeholders are multilingual so items will need labelling in English, Simplified Chinese, and pinyin romanization. I had considered building a Drupal site to manage an inventory, but I have neither the funding or time to implement such a complex site. At the same time, what are users’ expectations for the taxonomy? Considering the context in the taxonomy’s usage is important. How will (or should) a taxonomy empower its users? It needs to be invisible; as an indication of a good taxonomy, it shouldn’t affect their current workflow but make it more efficient. Both fathers and my mom are unlikely to stop and use any digital technology to find and look things up. Most importantly, the completed taxonomy and actual content migration should not conflict with the preparation of the next meal. My baby needs a packed lunch for school, and it’s 6 a.m. when I’m preparing it. There’s no time to rush around looking for things. Time is limited and a complete displacement of spices and condiments would disrupt the high-traffic flow in any household. Meanwhile, we’re out of soy sauce again and I’d rather it not be stashed in yet a new home and forgotten. That’s why we ended up with three open bottles of soy sauce from different brands. What else should you consider for the taxonomy? Understanding the scope of the taxonomy you’re building can help prevent scope creep in a taxonomy project. In time, you’ll realize that the 80% of your time and effort is devoted to research while 20% of the time and effort is actually developing the taxonomy. So, making time for iterations and validation through card sorting and other testing is important in your planning. The Freelance Studio Denver, Co. User Experience Agency Ending the UX Designer Drought Part 2 - Laying the Foundation by Fred Beecher June 23rd, 2015 11 Comments The first article in this series, “A New Apprenticeship Architecture,” laid out a high-level framework for using the ancient model of apprenticeship to solve the modern problem of the UX talent drought. In this article, I get into details. Specifically, I discuss how to make the business case for apprenticeship and what to look for in potential apprentices. Let’s get started! Defining the business value of apprenticeship Apprenticeship is an investment. It requires an outlay of cash upfront for a return at a later date. Apprenticeship requires the support of budget-approving levels of your organization. For you to get that support, you need to clearly show its return by demonstrating how it addresses some of your organization’s pain points. What follows is a discussion of common pain points and how apprenticeship assuages them. Hit growth targets If your company is trying to grow but can’t find enough qualified people to do the work that growth requires, that’s the sweet spot for apprenticeship. Apprenticeship allows you to make the designers you’re having trouble finding. This is going to be a temporal argument, so you need to come armed with measurements to make it. And you’ll need help from various leaders in your organization to get them. UX team growth targets for the past 2-3 years (UX leadership) Actual UX team growth for the past 2-3 years (UX leadership) Average time required to identify and hire a UX designer (HR leadership) Then you need to estimate how apprenticeship will improve these measurements. 
Even though you’re evaluating someone who may never have designed a user experience before, you still need them to bring some examples of something they’ve made. Without this, it’s impossible to get a sense of what kind of process they go through to make things. For example, one apprentice candidate brought in a print brochure she designed. Her description of how she designed it included identifying business goals, balancing competing stakeholder needs, working within constraints, and getting feedback along the way, all of which are relevant to the process of UX design. Mindset. The number one thing you must identify in a candidate is whether they already possess the UX mindset, the point of view that things are designed better when they’re designed with people in mind. This is usually the light bulb that goes off in people’s heads when they discover UX design. If that light hasn’t gone off, UX might not be the right path for that person. Apprenticeship is too much of an investment to risk that. Evaluating for this is fairly simple. It usually comes out in the course of a conversation. If not, asking outright “What does user experience design mean to you?” can be helpful. Pay careful attention to how people talk about how they’ve approached their work. Is it consistent with their stated philosophy? If not, that could be a red flag. Intrinsic motivation. When people talk about having a “passion” for something, what that means is that they are intrinsically motivated to do that thing. This is pretty easy to evaluate for. What have they done to learn UX? Have they taken a class? That’s a positive sign. Have they identified and worked through a UX problem on their own? Even better! If a candidate hasn’t put in the effort to explore UX on their own, they are likely not motivated enough to do well in the field. Self-education.
While self-education is a sign of intrinsic motivation, it’s also important in its own right. Apprenticeship relies heavily on mentorship, but the responsibility for the direction and nature of that mentorship lies with the apprentice themselves. If someone is a self-educator, that’s a good predictor that they’ll be able to get the most out of mentorship. This is another fairly easy one to evaluate. Ask them to tell you about the most recent UX-related blog post or article they read. It doesn’t matter what it actually is, only whether they can quickly bring something to mind. Professional skills. UX design is not a back-office field. UX designers talk with clients, customers, stakeholders, developers, and more. To be an effective UX designer, a candidate must possess basic professional skills such as dressing appropriately and communicating well. Simple things like sending a “thank you” email are a great indication of good professional skills. (Physically mailed thank-you notes get extra bonus points. One-off letterpressed thank-you notes sent by mail get even more!) Collaboration. UX design is a collaborative discipline. If a candidate struggles with collaboration, they’ll struggle in the field. When discussing their work (especially class project work), be sure to ask what role they played on the project and how they interacted with other people. Complaining about others and taking on too much work themselves are warning signs that a candidate may have trouble with collaboration. Evaluating for apprenticeship fit Learning pattern. Some people learn best by gradually being exposed to a topic. I call these people toe-dippers, as they prefer to dip their toes into something before diving in. Others prefer to barrel off the dock straight into the deep end and then struggle to the surface. I call these people deep-enders. While apprenticeship can be modified to work better for deep-enders, its gradual exposure can often frustrate them. It is much better suited for toe-dippers. Evaluating for this is tricky, though. If you ask people whether they prefer to dive in or learn gradually, they’ll say “dive in” because they think that’s what you want to hear. Asking them how they’ve approached learning other skills can give some insight, but this is not 100% reliable. Learning by doing. Apprenticeship helps people acquire skills through experiential learning. If this is not how a person learns, apprenticeship may not be for them. Evaluating for this is very much like evaluating for intrinsic motivation. Has someone gone to the trouble of identifying and solving a design problem themselves? Have they practiced UX methods they have learned about? If so, it’s likely that learning by doing is effective for them. Receptiveness to critique. Apprenticeship is a period of sustained critique. Someone whose response to criticism is defensiveness or despondency will not be successful as an apprentice. This is easy to identify in an interview within the context of discussing the work examples the candidate has brought. My favorite technique for doing this is to find something insignificant to critique and then hammer on it. This is not how I normally critique, of course; it’s a pressure test. If a candidate responds with openness and a desire to learn from this encounter, that’s a very positive sign. If they launch into a monologue defending their decisions, the interview is pretty much over.
If you’re fired up about UX apprenticeship (and how could you not be?), start making it happen in your organization! Do the research, find the data, and share your vision with your company’s leadership so they can see it too! When you get the go-ahead, you’ll be all ready to start looking for apprentices. If you follow these guidelines, you’ll get great apprentices who will grow into great designers. Stay tuned for Part 3 of this series where I’ll get detailed about the instructional design of apprenticeship, pedagogy, mentorship, and tracking! Building the Business Case for Taxonomy Taxonomy of Spices and Pantries: Part 1 by Grace G Lau September 1st, 2015 9 Comments XKCD comic strip about not being able to name all seven dwarfs from Snow White. How often have you found yourself on an ill-defined site redesign project? You know, the ones that you end up redesigning and restructuring every few years as you add new content. Or perhaps you spin up a new microsite because the new product/solution doesn’t fit in with the current structure, not because you want to create a new experience around it. Maybe your site has vaguely labelled navigation buckets like “More Magic”—which is essentially your junk drawer, your “everything else.” Your top concerns on such projects are: You can’t find anything. Your users can’t find anything. The navigation isn’t consistent. You have too much content. Your hopeful answer to everything is to rely on an external search engine, not the one that’s on your site. Google will find everything for you. A typical site redesign project might include refreshing the visual design, considering the best interaction practices, and conducting usability testing. But what’s missing? Creating the taxonomy. “Taxonomy is just tagging, right? SharePoint/AEM has it—we’re covered!” In the coming months, I will be exploring the what, why, and how of taxonomy planning, design, and implementation: Building the business case for taxonomy Planning a taxonomy The many uses of taxonomy Card sorting to validate a taxonomy Tree testing a taxonomy Taxonomy governance Best practices of enterprise taxonomies Are you ready? ROI of taxonomy Although the word “taxonomy” is often used interchangeably with tagging, building an enterprise taxonomy means more than tagging content. It’s essentially a knowledge organization system, and its purpose is to enable the user to browse, find, and discover content. Spending the time to build that taxonomy pays off: you can better manage your content at scale, provide meaningful navigation, expose long-tail content, reuse content assets, bridge across subjects, and align products and brands more efficiently. In addition, a sound taxonomy in the long run will improve your content’s findability, support social sharing, and improve your site’s search engine optimization. (Thanks to Mike Atherton’s “Modeling Structured Content” workshop, presented at IA Summit 2013, for outlining the benefits.) How do you explain taxonomy to get stakeholders on board? No worries, we won’t be going back to high school biology. Explaining taxonomy Imagine a household kitchen. How would you organize the spices? Consider the cooks: in-laws from northern China, mom from Hong Kong, and American-born Grace. I’ve moved four times in the past five years. My husband, son, and I live with my in-laws.
I have a mother who still comes over to make her Cantonese herbal soups. We all speak different languages: English, Mandarin Chinese, and Cantonese Chinese. I have the unique need of organizing my kitchen for multiple users. My in-laws need to be able to find their star anise, peppercorn, tree ear mushrooms, and sesame oil. My mom needs a space to store her dried figs, dried shiitake mushrooms, dried goji berries, and snow fungus. I need to find a space for dried thyme and rosemary for the “American” food I try to make. Oh, and we all need a consistent place for salt and sugar. People can organize their kitchen by activity zones: baking, canning, preparing, and cooking. Other ways to organize a kitchen successfully could include: attributes (shelf-life, weight, temperature requirements) usage (frequency, type of use) seasonality (organic, what’s in season, local) occasion (hot pot dinners, BBQ parties) You can also consider organizing by audience, such as for the five-year-old helper. I keep refining how the kitchen is organized each time we move. I have used sticky notes in Chinese and English with my in-laws and my mom as part of a card sorting exercise; I’ve tested the navigation around the kitchen to validate the results. A photo of pantry shelves labeled noodles, rice, garlic, and the like. Early attempts at organizing my pantry. If this is to be a data-driven taxonomy, I could consider attaching RFID tags to each spice container to track frequency and type of usage for a period of time to obtain some kitchen analytics. On the other hand, I could try guesstimating frequency by looking at the amount of grime or dust collected on the container. How often are we using chicken bouillon, and for what dishes? Does it need to be within easy reach of the stovetop or can it be relegated to a pantry closet three feet away? Photo of labeled spice jars in a drawer. From Home Depot. Understanding the users and their tasks and needs is a foundation for all things UX. Taxonomy building is not any different. How people think about and use their kitchen brings with it a certain closeness that makes taxonomy concepts easier to grasp. Who are the users? What are they trying to do? How do they currently tackle this problem? What works and what doesn’t? Watch, observe, and listen to their experience. Helping the business understand the underlying concepts is one of the challenges I’ve faced with developing a solid taxonomy. We’re not just talking about tagging but breaking down the content by its attributes and metadata as well as by its potential usage and relation to other content. The biggest challenge is building the consensus and understanding around that taxonomy—taxonomy governance—and keeping the system you’ve designed well-seasoned! Now, back to that site redesign project that you were thinking of: How about starting on that taxonomy? My next post will cover taxonomy planning. How to determine when customer feedback is actionable Merging statistics with product management by Naira Musallam, Nis Frome, Michael Williams, and Tim Lawton October 13th, 2015 1 Comment One of the riskiest assumptions for any new product or feature is that customers actually want it. Although product leaders can propose numerous ‘lean’ methodologies to experiment inexpensively with new concepts before fully engineering them, anything short of launching a product or feature and monitoring its performance over time in the market is, by definition, not 100% accurate.
That leaves us with a dangerously wide spectrum of user research strategies, and an even wider range of opinions for determining when customer feedback is actionable. To the dismay of product teams desiring to ‘move fast and break things,’ their counterparts in data science and research advocate a slower, more traditional approach. These proponents of caution often emphasize an evaluation of statistical signals before considering customer insights valid enough to act upon. This dynamic has meaningful ramifications. For those who care about making data-driven business decisions, the challenge that presents itself is: How do we adhere to rigorous scientific standards in a world that demands adaptability and agility to survive? Having frequently witnessed the back-and-forth between product teams and research groups, we can say there is no shortage of misconceptions and miscommunication between the two. Only a thorough analysis of some critical nuances in statistics and product management can help us bridge the gap. Quantify risk tolerance You’ve probably been on one end of an argument that cited a “statistically significant” finding to support a course of action. The problem is that statistical significance is often equated to having relevant and substantive results, but neither is necessarily the case. Simply put, statistical significance exclusively refers to the level of confidence (measured from 0 to 1, or 0% to 100%) you have that the results you obtained from a given experiment are not due to chance. Statistical significance alone tells you nothing about the appropriateness of the confidence level selected nor the importance of the results. To begin, confidence levels should be context-dependent, and determining the appropriate confidence threshold is an oft-overlooked proposition that can have profound consequences. In statistics, confidence levels are closely linked to two concepts: type I and type II errors. A type I error, or false-positive, refers to believing that a variable has an effect that it actually doesn’t. Some industries, like pharmaceuticals and aeronautics, must be exceedingly cautious against false-positives. Medical researchers, for example, cannot afford to mistakenly think a drug has an intended benefit when in reality it does not. Side effects can be lethal, so the FDA’s threshold for proof that a drug’s health benefits outweigh its known risks is intentionally onerous. A type II error, or false-negative, has to do with the flip side of the coin: concluding that a variable doesn’t have an effect when it actually does. Historically, though, statistical practice has primarily focused on avoiding false-positives (even if it means missing out on some likely opportunities), with the default confidence level at 95% for any finding to be considered actionable. The reality that this value was arbitrarily determined by scientists speaks more to their comfort level with being wrong than it does to its appropriateness in any given context. Unfortunately, this particular confidence level is used today by the vast majority of research teams at large organizations and remains generally unchallenged in contexts far different from the ones for which it was formulated. Matrix visualising Type I and Type II errors as described in text. But confidence levels should be representative of the amount of risk that an organization is willing to take to realize a potential opportunity.
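To make that trade-off concrete, here is a minimal simulation sketch. The numbers are assumptions chosen purely for illustration (a 10% baseline conversion rate, a two-point lift, 2,000 users per arm, and the two alpha levels); they are not from the article. It runs many simulated A/B tests, with and without a real effect, and counts how often each confidence threshold produces false positives and false negatives.

```python
import random
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test p-value for a difference between two conversion rates."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def significant_fraction(true_lift, alpha, n=2000, trials=1000, base_rate=0.10):
    """Fraction of simulated experiments declared significant at the given alpha."""
    hits = 0
    for _ in range(trials):
        conv_a = sum(random.random() < base_rate for _ in range(n))
        conv_b = sum(random.random() < base_rate + true_lift for _ in range(n))
        if two_proportion_p_value(conv_a, n, conv_b, n) < alpha:
            hits += 1
    return hits / trials

random.seed(42)
for alpha in (0.05, 0.20):
    false_positive_rate = significant_fraction(true_lift=0.00, alpha=alpha)  # no real effect
    power = significant_fraction(true_lift=0.02, alpha=alpha)                # real 2-point lift
    print(f"alpha={alpha:.2f}: false-positive rate ~{false_positive_rate:.0%}, "
          f"false-negative rate ~{1 - power:.0%}")
```

At these made-up settings, the stricter 95% threshold keeps false positives near 5% but misses the real lift roughly half the time, while relaxing the threshold trades a higher false-positive rate for far fewer missed opportunities. That is exactly the kind of trade-off a team should set deliberately rather than inherit by default.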
There are many reasons for product teams in particular to be more concerned with avoiding false-negatives than false-positives. Mistakenly missing an opportunity due to caution can have a more negative impact than building something no one really wants. Digital product teams don’t share many of the concerns of an aerospace engineering team and therefore need to calculate and quantify their own tolerance for risk. To illustrate the ramifications that confidence levels can have on business decisions, consider this thought exercise. Imagine two companies, one with outrageously profitable 90% margins, and one with painfully narrow 5% margins. Suppose each of these businesses is considering a new line of business. In the case of the high-margin business, the amount of capital they have to risk to pursue the opportunity is dwarfed by the potential reward. If executives get even the weakest indication that the business might work, they should pursue the new business line aggressively. In fact, waiting for perfect information before acting might be the difference between capturing a market and allowing a competitor to get there first. In the case of the narrow-margin business, however, the buffer before going into the red is so small that going after the new business line wouldn’t make sense with anything except the most definitive signal. Although these two examples are obviously allegorical, they demonstrate the principle at hand. To work together effectively, research analysts and their commercially-driven counterparts should have a conversation about their organization’s particular level of comfort and make statistical decisions accordingly. Focus on impact Confidence levels only tell half the story. They don’t address how meaningful the results of an experiment are to your business. Product teams need to combine the detection of an effect (i.e., the likelihood that there is an effect) with the size of that effect (i.e., the potential impact to the business), but this is often forgotten on the quest for the proverbial holy grail of statistical significance. Many teams mistakenly focus energy and resources on acting on statistically significant but inconsequential findings. A meta-analysis of hundreds of consumer behavior experiments sought to qualify how seriously effect sizes are considered when evaluating research results. It found that an astonishing three-quarters of the studies didn’t even report effect sizes “because of their small values” or because of “a general lack of interest in discovering the extent to which an effect is significant…” This is troubling, because without considering effect size, there’s virtually no way to determine which opportunities are worth pursuing and in what order. Limited development resources prevent product teams from realistically tackling every single opportunity. Consider, for example, how the answer to this question, posed by a MECLABS data scientist, changes based on your perspective: In terms of size, what does a 0.2% difference mean? For Amazon.com, that lift might mean an extra 2,000 sales and be worth a $100,000 investment…For a mom-and-pop Yahoo! store, that increase might just equate to an extra two sales and not be worth a $100 investment.
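A quick back-of-the-envelope sketch shows why the same effect deserves different treatment at different scales. Every figure below (visitor counts, profit per sale, build cost) is invented purely for illustration.

```python
def incremental_profit(monthly_visitors, conversion_lift, profit_per_sale):
    """Extra profit per month from an absolute lift in conversion rate."""
    return monthly_visitors * conversion_lift * profit_per_sale

# Hypothetical businesses considering the same 0.2-point conversion lift.
scenarios = {
    "large retailer":   dict(monthly_visitors=1_000_000, profit_per_sale=50, build_cost=100_000),
    "mom-and-pop shop": dict(monthly_visitors=1_000,     profit_per_sale=50, build_cost=100_000),
}

for name, s in scenarios.items():
    gain = incremental_profit(s["monthly_visitors"], 0.002, s["profit_per_sale"])
    months_to_recoup = s["build_cost"] / gain
    print(f"{name}: +${gain:,.0f}/month from a 0.2-point lift; "
          f"recoups a ${s['build_cost']:,} build in {months_to_recoup:,.1f} months")
```

The statistical question (is the lift real?) is identical in both rows; only the effect size multiplied by scale tells you whether acting on it is worth anything.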
Unless you’re operating at a Google-esque scale, where an incremental lift in a conversion rate could result in literally millions of dollars in additional revenue, you should rely on statistics and research teams to help you prioritize the largest opportunities in front of you. Sample size constraints One of the most critical constraints on product teams that want to generate user insights is the ability to source users for experiments. With enough traffic, it’s certainly possible to generate a sample size large enough to pass traditional statistical requirements for a production split test. But it can be difficult to drive enough traffic to new product concepts, and it can also put a brand unnecessarily at risk, especially in heavily regulated industries. For product teams that can’t easily access or run tests in production environments, simulated environments offer a compelling alternative. But simulated environments require standing user panels that can get expensive quickly, especially if research teams seek sample sizes in the hundreds or thousands. That leaves product teams stuck between a rock and a hard place. Unfortunately, strategies like these again overlook important nuances in statistics and place undue hardship on the user insight generation process. A larger sample does not necessarily mean a better or more insightful sample. The objective of any sample is for it to be representative of the population of interest, so that conclusions about the sample can be extrapolated to the population. It’s assumed that the larger the sample, the more likely it is to be representative of the population. But that’s not inherently true, especially if the sampling methodology is biased. Years ago, a client fired an entire research team in the human resources department for making this assumption. The client sought to gather feedback about employee engagement and tasked this research team with distributing a survey to the entire company of more than 20,000 global employees. From a statistical significance standpoint, only 1,000 employees needed to take the survey for the research team to derive defensible insights. Within hours after sending out the survey on a Tuesday morning, they had collected enough data and closed the survey. The problem was that only employees within a few time zones had completed the questionnaire, with a solid third of the company asleep, and therefore ignored, during collection. Clearly, a large sample isn’t inherently representative of the population. To obtain a representative sample, product teams first need to clearly identify a target persona. This may seem obvious, but it’s often not explicitly done, creating quite a bit of miscommunication for researchers and other stakeholders. What one person means by a ‘frequent customer’ could be something entirely different to another person. After a persona is clearly identified, there are a few sampling techniques one can follow, including probability and nonprobability sampling. A carefully selected sample of 100 may be considerably more representative of a target population than a thrown-together sample of 2,000. Research teams may counter with the need to meet statistical assumptions that are necessary for conducting popular tests such as a t-test or Analysis of Variance (ANOVA). These types of tests assume a normal distribution, an assumption that generally becomes safer as the sample size increases.
But statistics has a solution for when this assumption is violated and provides other options, such as non-parametric testing, which work well for small sample sizes. In fact, the strongest argument left in favor of large sample sizes has already been discounted. Statisticians know that the larger the sample size, the easier it is to detect small effect sizes at a statistically significant level (digital product managers and marketers have become soberly aware that even a test comparing two identical versions can find a statistically significant difference between the two). But a focused product development process should be immune to this distraction because small effect sizes are of little concern. Not only that, but large effect sizes are almost as easily discovered in small samples as in large samples. For example, suppose you want to test ideas to improve a form on your website that currently gets filled out by 10% of visitors. For simplicity’s sake, let’s use a confidence level of 95% to accept any changes. To identify just a 1% absolute increase to 11%, you’d need more than 12,000 users, according to Optimizely’s stats engine formula! If you were looking for a 5% absolute increase, you’d only need 223 users. But depending on what you’re looking for, even that many users may not be needed, especially if conducting qualitative research. When identifying usability problems across your site, leading UX researchers have concluded that “elaborate usability tests are a waste of resources” because the overwhelming majority of usability issues are discovered with just five testers. An emphasis on large sample sizes can be a red herring for product stakeholders. Organizations should not be misled away from the real objective of any sample, which is an accurate representation of the identified, target population. Research teams can help product teams identify necessary sample sizes and appropriate statistical tests to ensure that findings are indeed meaningful and cost-effectively attained. Expand capacity for learning It might sound like semantics, but data should not drive decision-making. Insights should. And there can be quite a gap between the two, especially when it comes to user insights. In a recent talk on the topic of big data, Malcolm Gladwell argued that “data can tell us about the immediate environment of consumer attitudes, but it can’t tell us much about the context in which those attitudes were formed.” Essentially, statistics can be a powerful tool for obtaining and processing data, but it doesn’t have a monopoly on research. Product teams can become obsessed with their Omniture and Optimizely dashboards, but there’s a lot of rich information that can’t be captured with these tools alone. There is simply no replacement for sitting down and talking with a user or customer. Open-ended feedback in particular can lead to insights that simply cannot be discovered by other means. The focus shouldn’t be on interviewing every single user though, but rather on finding a pattern or theme from the interviews you do conduct. One of the core principles of the scientific method is the concept of replicability—that the results of any single experiment can be reproduced by another experiment. In product management, the importance of this principle cannot be overstated. 
You’ll presumably need any data from your research to hold true once you engineer the product or feature and release it to a user base, so reproducibility is an inherent requirement when it comes to collecting and acting on user insights. We’ve far too often seen a product team wielding a single data point to defend a dubious intuition or pet project. But there are a number of factors that could and almost always do bias the results of a test without any intentional wrongdoing. Mistakenly asking a leading question or sourcing a user panel that doesn’t exactly represent your target customer can skew individual test results. Similarly, and in digital product management especially, customer perceptions and trends evolve rapidly, further complicating data. Look no further than the handful of mobile operating systems which undergo yearly redesigns and updates, leading to constantly elevated user expectations. It’s perilously easy to imitate Homer Simpson’s lapse in thinking, “This year, I invested in pumpkins. They’ve been going up the whole month of October and I got a feeling they’re going to peak right around January. Then, bang! That’s when I’ll cash in.” So how can product and research teams safely transition from data to insights? Fortunately, we believe statistics offers insight into the answer. The central limit theorem is one of the foundational concepts taught in every introductory statistics class. It states that the distribution of averages tends to be Normal even when the distribution of the population from which the samples were taken is decidedly not Normal. Put as simply as possible, the theorem acknowledges that individual samples will almost invariably be skewed, but offers statisticians a way to combine them to collectively generate valid data. Regardless of how confusing or complex the underlying data may be, by performing relatively simple individual experiments, the culminating result can cut through the noise. This theorem provides a useful analogy for product management. To derive value from individual experiments and customer data points, product teams need to practice substantiation through iteration. Even if the results of any given experiment are skewed or outdated, they can be offset by a robust user research process that incorporates both quantitative and qualitative techniques across a variety of environments. The safeguard against pursuing insignificant findings, if you will, is to be mindful not to consider data to be an insight until a pattern has been rigorously established. Divide no more The moral of the story is that the nuances in statistics actually do matter. Dogmatically adopting textbook statistics can stifle an organization’s ability to innovate and operate competitively, but ignoring the value and perspective provided by statistics altogether can be similarly catastrophic. By understanding and appropriately applying the core tenets of statistics, product and research teams can begin with a framework for productive dialog about the risks they’re willing to take, the research methodologies they can efficiently but rigorously conduct, and the customer insights they’ll act upon. 
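The central limit theorem invoked above is easy to see in a small simulation. In the sketch below, the exponential “population” and the sample sizes are arbitrary choices used only for illustration: individual draws are heavily skewed, but the averages of repeated samples become tighter and more symmetric as the sample size grows.

```python
import random
from statistics import mean, pstdev

random.seed(7)
# A deliberately skewed "population": many small values, a long tail of large ones.
population = [random.expovariate(1.0) for _ in range(100_000)]

def skewness(values):
    """Crude skewness estimate: 0 means symmetric, as in a normal distribution."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)

def means_of_samples(sample_size, num_samples=3_000):
    """Averages of repeated random samples drawn from the skewed population."""
    return [mean(random.sample(population, sample_size)) for _ in range(num_samples)]

print(f"population: mean={mean(population):.2f}, skewness={skewness(population):.2f}")
for n in (2, 10, 50):
    sample_means = means_of_samples(n)
    print(f"samples of {n:>2}: spread of averages={pstdev(sample_means):.2f}, "
          f"skewness={skewness(sample_means):.2f}")
```

The population never changes; only the act of averaging more observations per sample makes the summary behave more like a normal distribution, which is the property the argument above leans on when it recommends substantiating insights through repeated, imperfect experiments.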
Planning a Taxonomy Project Taxonomy of Spices and Pantries: Part 2 by Grace G Lau October 20th, 2015 No Comments This is part 2 of “Taxonomy of Spices and Pantries,” in which I will be exploring the what, why, and how of taxonomy planning, design, and implementation: Building the business case for taxonomy Planning a taxonomy The many uses of taxonomy Card sorting to validate a taxonomy Tree testing a taxonomy Taxonomy governance Best practices of enterprise taxonomies In part 1, I enumerated the business reasons for a taxonomy focus in a site redesign and gave a fun way to explain taxonomy. The kitchen isn’t going to organize itself, so the analogy continues. I’ve moved every couple of years and it shows in the kitchen. Half-used containers of ground pepper. Scattered bags of star anise. Multiple bags of ground and whole cumin. After a while, people are quick to stuff things into the nearest crammable crevice (until we move again and the IA is called upon to organize the kitchen). Planning a taxonomy covers the same questions as planning any UX project. Understanding the users and their tasks and needs is a foundation for all things UX. This article will go through the questions you should consider when planning a kitchen, er, um…, a taxonomy project. Rumination of stuff in my kitchen and the kinds of users and stakeholders the taxonomy needs to be mindful of. Source: Grace Lau. As with designing any software, application, or website, you’ll need to meet with the stakeholders and ask questions: Purpose: Why? What will the taxonomy be used for? Users: Who’s using this taxonomy? Who will it affect? Content: What will be covered by this taxonomy? Scope: What’s the topic area and limits? Resources: What are the project resources and constraints? (Thanks to Heather Hedden, “The Accidental Taxonomist,” p.292) What’s your primary purpose? Why are you doing this? Are you moving, or planning to move? Is your kitchen so disorganized that you can’t find the sugar you needed for soy-braised chicken? Is your content misplaced and hard to search? How often have you found just plain old salt in a different spot? How many kinds of salt do you have anyway–Kosher salt, sea salt, iodized salt, Hawaiian pink salt? (Why do you have so many different kinds anyway? One of my favorite recipe books recommended using red Hawaiian sea salt for kalua pig. Of course, I got it.) You might be using the taxonomy for tagging or, in librarian terms, indexing or cataloging. Maybe it’s for information search and retrieval. Are you building a faceted search results page? Perhaps this taxonomy is being used for organizing the site content and guiding the end users through the site navigation. Establishing a taxonomy as a common language also helps build consensus and creates smarter conversations. On making baozi (steamed buns), I overheard a conversation between fathers: Father-in-law: We need 酵母 [Jiàomǔ] {noun}. Dad: Yi-see? (Cantonese transliteration of yeast) Father-in-law: (confused look) Dad: Baking pow-daa? (Cantonese transliteration of baking powder) Meanwhile, I look up the Chinese translation of “yeast” in Google Translate while my mother-in-law opens her go-to Chinese dictionary tool. I discover that the dictionary word for “yeast” is 发酵粉 [fājiàofěn] {noun}. Father-in-law: Ah, so it makes the flour rise: 发面的 [fāmiànde] {verb} This discovery prompts more discussion about what yeast does and how it is used.
There were at least 15 more minutes of discussing yeast in five different ways before the fathers agreed that they were talking about the same ingredient and its purpose. Eventually, we have this result in our bellies. Homemade steamed baozi. Apparently, they’re still investigating how much yeast is required for the amount of flour they used. Source: Grace Lau. Who are the users? Are they internal? Content creators or editors, working in the CMS? Are they external users? What’s their range of experience in the domain? Are we speaking with homemakers and amateur cooks or seasoned cooks with many years at various Chinese restaurants? Looking at the users of my kitchen, I identified the following stakeholders: Content creators: the people who do the shopping and have to put away the stuff People who are always in the kitchen: my in-laws People who are sometimes in the kitchen: me Visiting users: my parents and friends who often come over for a BBQ/grill party The cleanup crew: my husband, who can’t stand the mess we all make How do I create a taxonomy for them? First, I attempt to understand their mental models by watching them work in their natural environment and observing their everyday hacks as they complete their tasks. Having empathy for users’ end game—making food for the people they care for—makes a difference in developing the style, consistency, and breadth and depth of the taxonomy. What content will be covered by the taxonomy? In my kitchen, we’ll be covering sugars, salts, spices, and staples used for cooking, baking, braising, grilling, smoking, steaming, simmering, and frying. How did I determine that? Terminology from existing content. I opened up every cabinet and door in my kitchen and made an inventory. Search logs. How were users accessing my kitchen? Why? How were users referring to things? What were they looking for? Storytelling with users. How did you make this? People like to share recipes and I like to watch friends cook. Doing user interviews has never been more fun! What’s the scope? Scope can easily get out of hand. Notice that I have not included in my discussion any cookbooks, kitchen hardware and appliances, pots and pans, or anything that’s in the refrigerator or freezer. You may need a scope document early on to plan releases (if you need them). Perhaps for the first release, I’ll just deal with the frequent-use items. Then I’ll move on to occasional-use items (soups and desserts). If the taxonomy you’re developing is faceted—for example, allowing your users to browse your cupboards by particular attributes such as taste, canned vs dried, or weight—your scope should include only those attributes relevant to the search process. For instance, no one really searches for canned goods in my kitchen, so that’s out of scope. (A small sketch of such a faceted, multilingual structure appears near the end of this article.) What resources do you have available? My kitchen taxonomy will be limited. Stakeholders are multilingual, so items will need labelling in English, Simplified Chinese, and pinyin romanization. I had considered building a Drupal site to manage an inventory, but I have neither the funding nor the time to implement such a complex site. At the same time, what are users’ expectations for the taxonomy? Considering the context of the taxonomy’s usage is important. How will (or should) a taxonomy empower its users?
It needs to be invisible: the mark of a good taxonomy is that it doesn’t disrupt users’ current workflow but makes it more efficient. Both fathers and my mom are unlikely to stop and use any digital technology to find and look things up. Most importantly, the completed taxonomy and actual content migration should not conflict with the preparation of the next meal. My baby needs a packed lunch for school, and it’s 6 a.m. when I’m preparing it. There’s no time to rush around looking for things. Time is limited, and a complete displacement of spices and condiments would disrupt the high-traffic flow in any household. Meanwhile, we’re out of soy sauce again and I’d rather it not be stashed in yet another new home and forgotten. That’s why we ended up with three open bottles of soy sauce from different brands. What else should you consider for the taxonomy? Understanding the scope of the taxonomy you’re building can help prevent scope creep in a taxonomy project. In time, you’ll realize that 80% of your time and effort is devoted to research, while 20% goes to actually developing the taxonomy. So, making time for iterations and validation through card sorting and other testing is important in your planning.
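To make the faceted, multilingual structure described above a bit more tangible, here is a minimal sketch of how a few pantry items might be modeled and filtered. The items, facet names, and labels are illustrative only, not an actual inventory or tool.

```python
# Each item carries trilingual labels plus only the facets that matter for retrieval
# (usage frequency and form), mirroring the scoping decisions above. All entries are examples.
pantry = [
    {"labels": {"en": "soy sauce", "zh-Hans": "酱油", "pinyin": "jiàngyóu"},
     "facets": {"usage": "frequent", "form": "liquid"}},
    {"labels": {"en": "star anise", "zh-Hans": "八角", "pinyin": "bājiǎo"},
     "facets": {"usage": "frequent", "form": "dried"}},
    {"labels": {"en": "dried goji berries", "zh-Hans": "枸杞", "pinyin": "gǒuqǐ"},
     "facets": {"usage": "occasional", "form": "dried"}},
    {"labels": {"en": "yeast", "zh-Hans": "酵母", "pinyin": "jiàomǔ"},
     "facets": {"usage": "occasional", "form": "dried"}},
]

def browse(items, **wanted):
    """Return items whose facet values match every requested facet."""
    return [item for item in items
            if all(item["facets"].get(facet) == value for facet, value in wanted.items())]

# "First release" scope: only the frequent-use items, each labelled three ways.
for item in browse(pantry, usage="frequent"):
    labels = item["labels"]
    print(f'{labels["en"]} / {labels["zh-Hans"]} [{labels["pinyin"]}]')
```

A flat list of dictionaries is obviously not a taxonomy management system, but it captures the planning decisions in miniature: which facets are in scope, which labels each audience needs, and how the first release can be carved out without touching everything else.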
In my next article, I will explore the many uses of taxonomy outside of tagging.