Palatable Computation: Recipe Generation Using Graph Embeddings

William Taylor Bakst
Stanford University
wbakst@stanford.edu

Jesse Andrew Strober
Stanford University
jstrobe@stanford.edu

Abstract

Our paper focuses on a tripartite graph connecting flavor compounds to ingredients and ingredients to recipes. On a fundamental level, our analysis revolves around a graph projection of recipes onto ingredients, called the complement network, which was established in previous research on this data set. To improve the complement network, we have created several novel metrics which we apply to the weighting of the graph. This allows us to evaluate ingredient relationships in light of a food pairing hypothesis, which asserts that Western cuisines use ingredients with similar flavors while Eastern cuisines use ingredients with disparate flavors. Furthermore, we construct the substitution network defined by Teng et al. without the use of user data, instead inferring which ingredients can be substituted for one another in recipes using our new metrics. Finally, we generate new recipes using the information these two networks provide, starting with a set of seed ingredients or a randomly chosen base ingredient and finding suitable additions based on our updated networks and hypotheses.

Introduction

Recipe development is an intricate, cultural, and creative process that aims to understand and produce palatable dishes with innovative combinations of ingredients. In an attempt to better understand this process, the role of ingredients and flavors in recipes has been questioned and explored. Several hypotheses have been made, only to be contested by contradictory assertions regarding which fundamental combinations create the best recipes. One such hypothesis is the "food pairing hypothesis," which simply asserts that ingredients that share common flavors combine well in recipes. Graph analysis has recently been used to
quantitatively analyze these roles, as it offers an accessible way to process large amounts of data, and it has inspired new assertions about how recipes are formulated. However, these studies have also brought more questions to light. Do ingredients of similar flavors or disparate flavors combine well in a recipe? Which ingredients can be substituted for one another without altering the underlying taste of a recipe? Can unconventional ingredients be combined to create palatable new creations? This study strives to answer these types of questions by analyzing the relationships among ingredients, flavors, and recipes, ultimately building upon previous network-oriented analyses.

Background and Related Work

Previous studies on this topic, using network-oriented analysis, have created a foundation for us to explore the data at hand and pose new questions. The following section briefly reviews three of them so that we can relate this study to previously observed results.

2.1 Flavor Network [1]

This study gathers data relating ingredients to recipes and flavor compounds to ingredients in order to analyze the general patterns that underlie ingredient and flavor use in modern recipes across various cultures. As the degree to which a recipe is palatable is largely due to its ingredients, this paper extends the analysis of ingredients to include the flavor compounds which make up these ingredients, providing a more precise understanding of why recipes use certain combinations of ingredients.

One primary goal of the paper is to evaluate what is called the food pairing hypothesis, which assumes that ingredient pairs with many common flavor compounds go well together in the same dish. This hypothesis has been a driving force behind the search for novel recipes and ingredient combinations. The authors also hypothesize that, while the food pairing hypothesis is prevalent in Western cuisines, it is much less so in East Asian cuisines.

This paper uses a graph projection of
flavors to ingredients, where ingredients are connected when they share common flavor compounds. These edges are weighted by the number of shared compounds. In order to characterize each regional cuisine by its flavor compounds, the paper uses an authenticity metric to compare the prevalence of certain ingredients in a specific cuisine to that in all cuisines. This metric showed that the ingredients in East Asian recipes tend to have disparate flavors, while the ingredients in North American and Western European recipes tend to have many flavor compounds in common, confirming the authors' hypothesis that the food pairing hypothesis holds only for a limited set of cuisines.

This paper also performs important preprocessing and evaluation of the data to make for smoother analysis. More specifically, it discusses sources of potential error, certain fundamental characteristics of the data, and limitations of the data sets. One primary example is a concern regarding ingredients that are common in recipes not because of flavor, but because of other roles such as the mechanical stability or the color of the recipe. Egg, flour, and paprika are examples of ingredients that can perform roles beyond flavor. The authors determine that these confounding factors can be filtered out systematically because of the large size of the data set and will not interfere with the analysis.

2.2 Recipe Recommendation [5]

In this paper, Teng et al. expand on the analysis and use of the food network conducted by Ahn et al. by introducing the bipartite relationship between ingredients and recipes and constructing a graph where an ingredient node and a recipe node have a connecting edge if the recipe uses that ingredient. They then fold the graph to construct an ingredient graph where ingredients are connected based on Pointwise Mutual Information (PMI), defined on pairs of ingredients (a, b):

\[ \mathrm{PMI}(a, b) = \log \frac{p(a, b)}{p(a)\, p(b)} \]

where

\[ p(a, b) = \frac{\#\text{ of recipes containing } a \text{ and } b}{\#\text{ of recipes}}, \qquad p(a) = \frac{\#\text{ of recipes containing } a}{\#\text{ of recipes}}, \qquad p(b) = \frac{\#\text{ of recipes containing } b}{\#\text{ of recipes}} \]

This PMI metric tells us how likely two ingredients are to appear together in the same recipe versus separately, where complementary ingredients occur together far more often than would be expected by chance. This graph, which they call the complement network, captures the co-occurrence frequency of two ingredients. They found that this network is composed of two large communities: one savory and one sweet.

Teng et al. also proposed a method for determining a substitution network by scraping user comments suggesting ingredient substitutions from allrecipes.com. This substitution network has edges between two ingredients (a, b) weighted by p(a | b) for all ingredient pairs (a, b), and it represents the ability to switch two ingredients in a recipe without making it any worse.

Ultimately, the paper uses its analysis of the food network to show that we can learn interesting insights about the underlying connections between ingredients from the vast data contained in online recipe sharing websites, and that we can use these insights to help predict user preferences and recommend recipes.

2.3 food2vec [2]

This article proposes learning embeddings for ingredients and using these embeddings to recommend additions to a given recipe, where the embeddings are learned using word2vec on 100,000 recipes. The concept behind this article is that embeddings enable us to learn the context of an ingredient with respect to the rest of the ingredients in the recipe. Once these ingredient embeddings are learned, we can use a distance function on the embedding vector space to find the k closest ingredients to the average of the ingredient embeddings in a given recipe and recommend them as additions to said recipe. An example of this is entering a recipe such as [bread, peanut butter, jelly, honey] and the system recommending strawberry as an additional ingredient.

The paper also found some interesting patterns in the
embedding vector space, primarily that cuisines of a certain locale were clustered together. However, it also found that Northern European and American ingredients had a seemingly random structure, perhaps due to the cultural diversity in these regions or overrepresentation in the data. Overall, this is an interesting application of modern natural language processing techniques to a recipe dataset in order to quantify the relationships between ingredients.

Data

The datasets we use are sourced from our first reaction paper. Three recipe lists are included, from epicurious.com, allrecipes.com, and menupan.com, comprising 57,691 recipes. In addition, we use a file connecting ingredients to each flavor compound associated with them, determined by the compound's presence past a certain threshold. There are 1,530 ingredients associated with a total of 1,107 flavor compounds in this file, but only 384 of these ingredients are included in the recipe data set. As such, only 384 ingredients and 1,107 flavor compounds are considered.

Network Representations of Data

To start, we can improve upon our understanding of recipes which share similar ingredients and of ingredients which can be substituted for each other in recipes. These two topics are analyzed by Teng et al., and we plan to expand upon this analysis in several ways. To do so, we first recreated the original complement network from Teng et al.'s paper. Next, we created three networks from our original tripartite graph by connecting flavors to ingredients and ingredients to recipes, which we describe below.

4.1 Food Pairing Hypothesis Network

The first is called the food pairing hypothesis network, which is a modification of the complement network created by Teng et al. Rather than simply folding recipes onto ingredients using the PMI metric to weight the edges, this network folds both recipes and flavors onto ingredients, where each node in the network is an ingredient. Edges in this network are weighted
by a metric explained below, considering co-occurrence in recipes in addition to shared flavors. We do this so that we can evaluate the findings from Ahn et al.'s paper, where they expand upon the original "food pairing hypothesis." To review, they assert that Western cuisines pair ingredients with similar flavors while Eastern cuisines pair ingredients with disparate flavors. The network described in Teng et al.'s paper does not take into account the flavor components of each ingredient; however, in light of Ahn et al.'s analysis, we believe that the edge weights in Teng et al.'s complement network should take into account how similar or disparate two ingredients are. Thus, we propose a new weighting scheme:

\[ RF(a, b) = \frac{|R_a \cap R_b|}{|R_a \cup R_b|}, \qquad FF(a, b) = \frac{|F_a \cap F_b|}{|F_a \cup F_b|} \]

\[ FPHF(a, b) = RF(a, b) \cdot \left( FF(a, b) - \mathrm{median}(FF) \right)^2 \]

Here, R_i indicates the set of recipes containing ingredient i, and F_i indicates the set of flavors contained in ingredient i. RF stands for the Recipe Factor, FF for the Flavor Factor, and FPHF for the Food Pairing Hypothesis Factor. This metric takes into account the analysis done by Ahn et al. because two very similar or very disparate ingredients will have larger edge weights if they also appear in the same recipes. Thus, disparate ingredients should have large edge weights only when they are actually paired together, and small edge weights otherwise.

4.2 Updated Complement Network

The second network we create is another modification of the complement network created by Teng et al. We propose a new method of weighting the edges in this complement network. In our food pairing hypothesis network, we specifically weight edges between nodes in order to examine the food pairing hypothesis, so differences or similarities in the flavors of ingredients heavily influence the weight of edges. In this updated complement network, we provide a balance between measuring co-occurrence of ingredients in recipes and their similarity or difference in flavors, in which co-occurrence of ingredients in recipes plays a
greater role in the weighting. We do so with the following metric:

\[ COF(a, b) = \mathrm{PMI}(a, b) + \sqrt{\left( FF(a, b) - \mathrm{median}(FF) \right)^2} \]

Here, we incorporate the PMI metric used in Teng et al.'s original complement network to capture recipe co-occurrence. We then weight this value by adding the square root of the relative strength of the difference or similarity of flavor profiles. This way, recipe co-occurrence is still the backbone of the network, supplemented by our evaluation of flavor profile.

4.3 Substitution Network

Our third network is called the substitution network, which folds both recipes and flavors onto ingredients, but with a separate metric to weight edges in order to evaluate how well ingredients can substitute for one another in a recipe. While Teng et al.'s paper uses information captured in user reviews and comments to build an ingredient substitution network, we believe that this substitution network can be inferred directly from the food network using the following metric that we propose:

\[ SF(a, b) = \frac{FF(a, b)}{1 + RF(a, b)} \]

Here, SF stands for the Substitution Factor, which is constructed around the assumption that we can likely substitute one ingredient for another if they do not often appear in the same recipes but have many common flavors. We have constructed the denominator of the substitution factor such that a perfect score is an identical flavor profile with no co-occurrence in recipes. We believe that this weighting metric will produce a network that is similar to the substitution network defined in Teng et al.'s paper. One primary difference is that Teng et al.'s substitution network is directed, whereas our network is undirected.

4.4 Analysis

The following sections provide an in-depth analysis of the results of our methods for each network.

4.4.1 Hypothesis

We have identified two types of communities within our graphs: Recipe Towns and Flavor Towns. Recipe Towns are communities in our complement network variants, and Flavor Towns are communities in our substitution network. We hypothesize that Flavor Towns and Recipe Towns will have very little connection or overlap. This is because Recipe Towns should include ingredients that go well together, whereas Flavor Towns should include ingredients that substitute well for each other and likely could not compose a recipe on their own.

4.4.2 Degree Distributions

[Figure 1: Degree Distributions. Panels: (a) Complement Networks, (b) Substitution Network. Axes: Node Degree vs. Number of Nodes with a Given Degree.]

[Figure 2: Weighted Degree Distributions. Panels include (c) Updated Complement and (d) Substitution. Axes: Node Degree vs. Number of Nodes with a Given Degree.]

In Figure 2, we compare the weighted degree distributions for each of the networks we have created, in addition to the original complement network. As the unweighted degree distributions are the same for the original complement network, the food pairing hypothesis network, and the updated complement network, this provides more insight into how our proposed weighting schemes change the structure of the networks.

In the original complement network and the updated complement network, the degree distributions are roughly linear, with a negative correlation between weighted degree and number of nodes. In the updated complement network, we also see a drop-off at higher weighted degrees, which indicates that a small number of ingredients appear in a large number of recipes with many other ingredients.

As the metric used to weight edges in the food pairing hypothesis network compares the number of shared flavors to the median number of shared flavors between ingredients, we can see that many ingredient pairings do not have specifically distinct or similar flavors. Thus, the degree distribution has a roughly logarithmic curve, where a high number of ingredients have a small weighted degree and an increasingly small number have a high weighted degree. This contradicts the updated "food pairing hypothesis" made by Ahn et al., as relatively few ingredients used together tend to have specifically similar or disparate flavors. Rather, this supports our hypothesis for recipe generation, which we describe in further detail in Section 5.1.

For our substitution network, the weighted degree distribution shows us that there are many ingredients that are not substitutable for anything and many that are substitutable for a large number of ingredients. Otherwise, this number is relatively consistent across the board.

4.4.3 PageRank

Below are the top 10 PageRank scores for each of our constructed networks:

(a) Complement Networks

Ingredient | PageRank
butter | 0.01428
wheat | 0.01327
onion | 0.01313
egg | 0.01278
garlic | 0.01245
vegetable oil | 0.01113
black pepper | 0.01100
cream | 0.01082
olive oil | 0.01081
vinegar | 0.01060

(b) Substitution Network

Ingredient | PageRank
black tea | 0.00623
orange | 0.00615
roasted beef | 0.00589
green tea | 0.00574
tea | 0.00571
jasmine tea | 0.00571
raw beef | 0.00564
beef | 0.00564
strawberry | 0.00553
soybean | 0.00549

Table 1: PageRank by Network

In Table 1, we analyze the rankings of ingredients in each of our networks according to an unweighted PageRank score. This metric provides insight into the most important ingredients, considering how many ingredients they are connected to and how important those ingredients are. The PageRank scores for the original complement network, the updated complement network, and the food pairing hypothesis network were all the same, as we use the same threshold on the number of recipes required for an edge to be created. While our food pairing hypothesis network had an additional threshold, this did not affect the results. As such, these are placed in the same category of "complement networks" in Table 1.

In our complement networks, the ingredients with the highest PageRank scores are widely used and serve as base or foundational ingredients in recipes. We see that butter, wheat, and egg are high scoring, which makes sense given that their applications span different cuisines and varieties of plates, such as appetizers, main dishes, and desserts. The PageRank metric works well in describing the ingredients with the highest scores, but it over-inflates ingredients that are used in fewer recipes, as they are likely to be connected to a base ingredient with a high score.

In the substitution network, the PageRank metric shows us which ingredients share more than the average number of flavors in common with many ingredients, but it does not necessarily indicate which ingredients are highly substitutable for other ingredients. This is because it is an aggregate metric and does not account for weighted edges.

4.4.4 Unconnected Ingredients

Below are randomly sampled ingredient pairs from each of our constructed networks that are unconnected (i.e., do not have an edge between them but may have a path):

(a) Original Complement

Ingredient | Ingredient
black pepper | eel
turnip | frankfurter
cherry | quince
huckleberry | beer
bartlett pear | date

(b) Food Pairing Hypothesis

Ingredient | Ingredient
pimento | artichoke
mandarin | roasted almond
sheep cheese | cream
litchi | raisin
tequila | mango

(c) Updated Complement

Ingredient | Ingredient
orange juice | chickpea
nectarine | lemongrass
wheat bread | sour cherry
cherry | thai pepper
kumquat | dill

(d) Substitution

Ingredient | Ingredient
vinegar | shallot
rhubarb | feta cheese
palm | walnut
grapefruit | onion
brassica | kumquat

Table 2: Unconnected Ingredients by Network (Random Sample)

By randomly sampling the unconnected ingredient pairs from (a) and (c) in Table 2, we can analyze which ingredients do not pass our threshold for the number of shared recipes required to justify an edge. This provides an interesting way to analyze which ingredients are unlikely to appear together in our randomly generated recipes, independent of flavor. As our food pairing hypothesis network also incorporates a threshold requiring either disparate or similar flavors, the requirements for appearing in the same recipes are more stringent. However, we must note that some disconnected ingredient pairs, such as tequila and mango, may be likely to appear in the same recipe but simply do not hit the recipe threshold. This will not be an issue for recipe generation, since there will likely be a path between these two ingredients even if there is no direct edge.

In our substitution network, our threshold for an edge is the number of flavors in common between two ingredients. As such, the pairs shown in Table 2 do not meet this requirement and, as expected, appear to share few or no common flavors.

4.4.5 Recipe Towns

Below are the largest Recipe Towns from each of our constructed networks:

[Figure 3: Recipe Towns. Panels: (a) Original Complement, (b) Food Pairing Hypothesis, (c) Updated Complement.]

By running community detection on each of our networks with the Clauset-Newman-Moore (CNM) algorithm [3] implemented in the Snap.py library, we can garner a better understanding of the structure of each network as a whole. In each network, there tend to be one or two Recipe Towns, and the rest of the ingredients are in their own communities. This seems to suggest that there are base ingredients, and accent ingredients that make each recipe unique. Without a threshold on the number of recipes required to make an edge, this structure would not be apparent.

In the visualizations shown in Figure 3, the size of each ingredient reflects the degree of that ingredient. As such, we can see that several of the ingredients displayed in larger font, such as egg, butter, and vegetable oil, also have high PageRank scores, as shown in Table 1.

4.4.6 Flavor Towns

Below are the two major Flavor Towns extracted from our substitution network:

[Figure 4: Flavor Towns]

For our substitution network, we also ran the Clauset-Newman-Moore community detection algorithm to find our Flavor Towns. Here, we found only two significant communities, one centered around a variety of cheeses and the other centered around berries and fruits. This indicates that ingredients with many variations or types have many possible substitutes, likely with greater weights between them, while other ingredients may have only a select few substitutes or none at all. Furthermore, part (d) of the table in the appendix shows us that various meats are also likely to have many suitable substitutions, particularly types of seafood.

Finally, we can confirm our hypothesis that our Recipe Towns and Flavor Towns do not resemble one another, as Recipe Towns contain ingredients that appear frequently with one another and Flavor Towns contain similar ingredients which are unlikely to co-occur in recipes.

4.4.7 Ingredient Relationships

See the table in the Appendix for the top ingredients by each metric, both overall and for each cuisine.

Recipe Generation

Our recipe generation engine is fundamentally an adaptation of node2vec, where embeddings are created for ingredients and we find similar ingredients using Euclidean distance. The Generation Architecture section provides a detailed description of the methodology behind the generation engine.

5.1 Hypothesis

While the hypothesis made by Ahn et al. asserts that pairs of ingredients in recipes from Eastern cuisines have significantly fewer shared flavor compounds than random pairs would have, we believe that this hypothesis lacks precision. While key ingredients may specifically have disparate flavors, we hypothesize that there are also ingredients that play the role of fillers, which do not contribute to the prominent flavors in a recipe. As such, there must be a reasonable proportion of accent ingredients to base (filler) ingredients in order to create a palatable recipe. Otherwise, there would simply be an eclectic group of outstanding flavors.

We predict that this will be an important factor in generating suitable recipes. As such, we test recipe generation from each of our networks, expecting to see groupings of overpowering flavors when drawing from our food pairing hypothesis network and more balanced recipes when drawing from the updated complement network. We hypothesize that these expectations will hold regardless of the preset cuisine choice or seed ingredients.

5.2 Generation Architecture

To start, we created our generation engine such that we could generate recipes from any of our networks. Thus, when running node2vec to create the pertinent embeddings, we set the parameters p and q so that our weighting metrics would be the driving force in determining the probability that a given node would be reached by a random walk [4]. The steps of our generation process are described below.

5.2.1 Seed Ingredients

The generation algorithm enables the user to ask for recipes starting from one of four possibilities:

1. Seed ingredients
2. Cuisine of choice
3. Random cuisine
4. Completely random

Seed ingredients and cuisine choices can be specified explicitly by the user if desired. Otherwise, the user can simply select random cuisine or completely random to test their luck. If the user does not specify seed ingredients, the generator randomly selects two seed ingredients based on which of options 2-4 above is chosen. Once the seed ingredients are selected, the algorithm first determines which embeddings to use based on the
user-specified network.

5.2.2 Centroid

For each iteration of choosing an ingredient, we calculate the centroid of the current set of ingredient embeddings. Our algorithm uses the distance to the centroid, rather than the average distance between the embedding of a new ingredient and each of the embeddings of the ingredients in the current recipe, because the former is a good approximation and is much faster to compute. The top ingredients with the smallest distance between their embedding and the centroid are then determined.

5.2.3 Choosing New Ingredient

Each time we want to choose a new ingredient, we first rank all of the remaining ingredients by the Euclidean distance between their embeddings and the centroid of this iteration. We then calculate a corresponding proportional probability distribution, where the proportional probability of choosing a particular ingredient is the reciprocal of the previously determined distance plus one. We then normalize this probability distribution by dividing by the sum of all proportional probabilities. We create the probability distribution in this way so that ingredients with smaller distances have a higher probability of being sampled.

5.2.4 Substitution

Another essential component of the generation model and its interface is the option for substituting ingredients. There are two options for doing so. First, if a user wants to specify allergies or foods they absolutely want to avoid, there is an input option to indicate these foods. When this option is used and one of the specified foods is randomly generated, we first determine the ten ingredients with the largest edge weights to the given ingredient in the substitution network. We then calculate the average Euclidean distance between the original complement network embedding of each potential substitute and those of each ingredient in the rest of the recipe. Finally, we sample from these top ten ingredients using a probability distribution defined by the weight of
the edge minus a fraction of the calculated average distance to the rest of the recipe.

The other option for substituting ingredients is designed to enable evaluation of potential replacements in the recipe due to preference. Once the generation algorithm has run, the user can run a substitution script which, for any specified ingredients and number of potential substitutes, returns the top substitutes according to edge weight in our substitution network. This can be seen as a manual, rather than automatic, method.

5.2.5 Generation

As stated above, once we have generated the probability distribution among the top ingredients by embedding distance, an ingredient is sampled from it and added to the recipe. This process is repeated until the desired length of the recipe is reached. The generation model also accepts several other arguments: the desired minimum and maximum length of the recipe, which network to use, and the number of accent ingredients desired.

The options for the network used are as follows: the original complement network, the updated complement network, the food pairing hypothesis network, and a combination of the original complement network and the food pairing hypothesis network. In this last option, base ingredients are chosen via the embeddings from the original complement network and accent ingredients are chosen via the embeddings from the food pairing hypothesis network. This option explores our proposed hypothesis that a palatable recipe must have both base ingredients and accent ingredients.

5.3 Results and Analysis

For the purpose of our analysis, we generated recipes according to each of our three different network generation methods with the same seed ingredient. This enables us to compare the quality of the recipes across different cuisines, as well as to compare the recipes generated by the different networks. Of course, a few generated recipes are not truly representative of the algorithm, but they provide enough information for an interesting discussion.
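The sampling loop described in Sections 5.2.2 through 5.2.5 can be illustrated with a minimal sketch. This is not the project's code: the function names, the `top_k` parameter, and the toy two-dimensional embeddings are hypothetical, and the weight 1 / (distance + 1) is one assumed reading of "the reciprocal of the previously determined distance plus one."

```python
import math
import random


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sampling_distribution(embeddings, recipe, top_k=10):
    """Return {ingredient: probability} over the top_k candidates nearest the centroid."""
    center = centroid([embeddings[name] for name in recipe])
    # Euclidean distance from every ingredient not yet in the recipe to the centroid.
    distances = {
        name: math.dist(vec, center)
        for name, vec in embeddings.items()
        if name not in recipe
    }
    nearest = sorted(distances, key=distances.get)[:top_k]
    # Assumed reading of the text: unnormalized weight = 1 / (distance + 1).
    weights = {name: 1.0 / (distances[name] + 1.0) for name in nearest}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def generate_recipe(embeddings, seeds, length, top_k=10, rng=None):
    """Grow a recipe from the seeds by repeated sampling until it reaches `length`."""
    rng = rng or random.Random()
    recipe = list(seeds)
    while len(recipe) < length:
        probs = sampling_distribution(embeddings, recipe, top_k)
        if not probs:  # no candidates left to add
            break
        names = list(probs)
        choice = rng.choices(names, weights=[probs[n] for n in names], k=1)[0]
        recipe.append(choice)
    return recipe


# Toy 2-D embeddings, invented purely for illustration (not learned with node2vec).
toy_embeddings = {
    "chicken": [0.0, 0.0],
    "onion": [2.0, 0.0],
    "garlic": [1.0, 0.0],
    "cocoa": [1.0, 4.0],
}
probs = sampling_distribution(toy_embeddings, ["chicken", "onion"])
recipe = generate_recipe(
    toy_embeddings, ["chicken", "onion"], length=3, rng=random.Random(0)
)
```

With these toy vectors, garlic sits exactly on the centroid of chicken and onion, so it receives a much larger share of the probability mass than cocoa, which matches the stated intent that closer ingredients are more likely to be sampled. In a real run the embeddings would come from node2vec on the chosen network.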
Furthermore, as the evaluation of recipes is largely subjective, much of this analysis must be qualitative.

5.3.1 Original Complement Network

Cuisine | Seed | Recipe
American | chicken | lima bean, yam, dill, onion, artichoke, almond
Japanese | chicken | olive oil, caraway, wasabi, sesame oil, roasted sesame seed, enokidake
African | chicken | basil, cardamom, bean, almond, peanut, honey
French | chicken | mushroom, lime juice, vegetable oil, coriander, tamarind, hazelnut

Table 3: Seed Generated Recipes

5.3.2 Food Pairing Hypothesis Network

Cuisine | Seed | Base
American | chicken | celery, cashew, egg
Japanese | chicken | bacon, tomato grapefruit, juice, dill, cauliflower, tabasco pepper, pepper, nira, cherry
African | chicken | starch, carrot, milk plum, wine, bassica black fat,
French | chicken | fenugreek, cottage cheese, cayenne, cashew, carrot, oyster

Table 4: Seed Generated Recipes

5.3.3 Updated Complement Network

Cuisine | Seed | Recipe
American | chicken | kale, coconut, tabasco pepper, champagne wine, cayenne, squash
Japanese | chicken | oyster, cider, milk, cocoa, sesame seed
African | chicken | rose, bell pepper, pea, olive, mustard, mace
French | chicken | beef, basil, lentil, plum, asparagus, bread

Table 5: Seed Generated Recipes

5.3.4 Base and Accent

OCN-FPH: Note that (a) stands for accent ingredient.

Cuisine | Seed | Base | Accent
American | chicken | white wine, oatmeal, cane molasses | beef broth, kale, romano cheese
Japanese | chicken | egg, black bean, lemon | rice, chinese cabbage, sesame seed
African | chicken | lime peel oil, chickpea, cane molasses | bell pepper, beet, black pepper
French | chicken | parsley, seed, bone oil, cardamom | brussels sprout, radish

Table 6: Seed Generated Recipes

First, we must note that all three generation networks have the potential to produce high-quality recipes; however, we have noticed through repeated generation that the original complement network and the updated complement network often tend to put multiple meats in the same recipe, whereas the Base and Accent generation method tends not to do this. This leads us to believe that our Base and Accent hypothesis produces better recipes on average, while the original complement network and updated complement network produce reasonably good recipes, and the food pairing hypothesis network produces low-quality recipes. As hypothesized, the food pairing hypothesis network lacks sufficient base ingredients to consistently create a coherent and balanced recipe.

[Figure: Average Pairwise Distance Within Recipe, by network (OCN, FPH, OCN_FPH, UCN)]

By comparing our results to the trends seen in the average pairwise distance figure, we can see that it is good to have some distance between ingredients, as the ingredients added through the food pairing hypothesis network are likely to lie further from the embeddings from the original complement network. However, the need for both base ingredients and some distance between ingredients is clear, as sampling from the food pairing hypothesis network alone gives poor results.

Conclusion

One remarkable conclusion we gathered was that although recipes created with both the original complement network and the food pairing hypothesis network were consistently of high quality, this was emphasized when the recipe was generated within a certain cuisine rather than completely at random. We believe that generating good recipes becomes easier when the cuisine is specified, since a specific cuisine has been developed and nurtured over the course of hundreds to thousands of years. Overall, the success of our recipe generation when considering both base and accent ingredients confirms that the updated food pairing hypothesis by Ahn et al. lacks the specification that the foundations of a recipe, regardless of the cuisine, do not rely on specifically similar or disparate flavors.

Further Research

One potential focus of further research revolves around the length of recipes. As the relationships
between ingredients, particularly the number of base versus accent ingredients, change with the length of recipes, our generation algorithm could be improved by incorporating a cutoff that judges whether superfluous ingredients have been added to the recipe at hand. Furthermore, a deeper understanding of the makeup of base and accent ingredients would also enable a more precise generation algorithm, as we conclude that this is an essential part of recipe generation.

Code

You can find the code for this project at: https://github.com/wbakst/food

References

[1] Y. Ahn, S. E. Ahnert, J. P. Bagrow, and A. Barabasi. Flavor network and the principles of food pairing. CoRR, abs/1111.6074, 2011.
[2] J. Altosaar. food2vec - augmented cooking with machine intelligence. https://jaan.io/food2vec-augmentedcooking-machine-intelligence/
[3] A. Clauset, M. E. J. Newman, and C. Moore. Finding community structure in very large networks. 2004.
[4] A. Grover and J. Leskovec. node2vec: Scalable feature learning for networks. CoRR, abs/1607.00653, 2016.
[5] C. Teng, Y. Lin, and L. A. Adamic. Recipe recommendation using ingredient networks. CoRR, abs/1111.3919, 2011.

A Appendix

See following pages for tables.

Table 7: Top 25 Ingredients For Each Metric. Panels: (a) PMI, (b) FPHF, (c) COF, (d) SF.

Ingredient pairs that survive in the text layer, in order:
Ingredient | Ingredient
turmeric | fenugreek
egg | wheat
coriander | fenugreek
butter | wheat
turmeric | coriander
milk | wheat
garlic | tomato
garlic | olive oil
wheat | vanilla
sesame oil | soy sauce
lavender | savory
garlic | cayenne
black pepper | onion
cumin | fenugreek
egg | vegetable oil
basil | oregano
onion | pepper
black pepper | garlic
ginger | soy sauce
olive oil | tomato
cayenne | onion
egg | vanilla
chive | cucumber
fennel | pork sausage
cumin | coriander

Score column that survives in the text layer, in order:
0.173, 0.124, 0.120, 0.111, 0.091, 0.084, 0.074, 0.073, 0.073, 0.069, 0.066, 0.064, 0.060, 0.059, 0.059, 0.059, 0.059, 0.058, 0.057, 0.057, 0.056, 0.055, 0.054, 0.054, 0.053

Tables 8 through 60 each have four panels: (a) FPHF, (b) COF, (c) SN, (d) PMI.

Table 8: Top For Cuisine: african
Table 9: Top For Cuisine: american
Table 10: Top For Cuisine: asian
Table 11: Top For Cuisine: austria
Table 12: Top For Cuisine: bangladesh
Table 13: Top For Cuisine: belgium
Table 14: Top For Cuisine: cajun creole
Table 15: Top For Cuisine: canada
Table 16: Top For Cuisine: caribbean
Table 17: Top For Cuisine: central southamerican
Table 18: Top For Cuisine: chinese
Table 19: Top For Cuisine: east-african
Table 20: Top For Cuisine: east asian
Table 21: Top For Cuisine: eastern-europe
Table 22: Top For Cuisine: easterneuropean russian
Table 23: Top For Cuisine: english scottish
Table 24: Top For Cuisine: french
Table 25: Top For Cuisine: german
Table 26: Top For Cuisine: greek
Table 27: Top For Cuisine: indian
Table 28: Top For Cuisine: indonesia
Table 29: Top For Cuisine: iran
Table 30: Top For Cuisine: irish
Table 31: Top For Cuisine: israel
Table 32: Top For Cuisine: italian
Table 33: Top For Cuisine: japanese
Table 34: Top For Cuisine: jewish
Table 35: Top For Cuisine: korean
Table 36: Top For Cuisine: lebanon
Table 37: Top For Cuisine: malaysia
Table 38: Top For Cuisine: mediterranean
Table 39: Top For Cuisine: mexican
Table 40: Top For Cuisine: middleeastern
Table 41: Top For Cuisine: moroccan
Table 42: Top For Cuisine: netherlands
Table 43: Top For Cuisine: north-african
Table 44: Top For Cuisine: pakistan
Table 45: Top For Cuisine: philippines
Table 46: Top For Cuisine: portugal
Table 47: Top For Cuisine: scandinavian
Table 48: Top For Cuisine: south-african
Table 49: Top For Cuisine: south-america
Table 50: Top For Cuisine: southern soulfood
Table 51: Top For Cuisine: southwestern
Table 52: Top For Cuisine: spain
Table 53: Top For Cuisine: spanish portuguese
Table 54: Top For Cuisine: switzerland
Table 55: Top For Cuisine: thai
Table 56: Top For Cuisine: turkey
Table 57: Top For Cuisine: uk-and-ireland
Table 58: Top For Cuisine: vietnamese
Table 59: Top For Cuisine: west-african
Table 60: Top For Cuisine: western
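
The average pairwise distance within a recipe, which Section 5.3 uses to compare the OCN, FPH, OCN_FPH, and UCN generation methods, can be illustrated with a short sketch. It assumes Euclidean distance between ingredient embedding vectors; the distance metric, the function name average_pairwise_distance, and the toy two-dimensional embeddings are assumptions made for illustration and are not taken from the project code.

```python
from itertools import combinations
import math


def average_pairwise_distance(recipe, embeddings):
    """Mean Euclidean distance over all unordered ingredient pairs in a recipe.

    `recipe` is a list of ingredient names and `embeddings` maps each name to
    a vector (a sequence of floats). Both are hypothetical inputs here.
    """
    vectors = [embeddings[name] for name in recipe]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        # A recipe with fewer than two ingredients has no pairs to compare.
        return 0.0
    return sum(math.dist(u, v) for u, v in pairs) / len(pairs)


# Toy 2-D embeddings, invented for illustration only.
toy_embeddings = {
    "chicken": (0.0, 0.0),
    "kale": (3.0, 4.0),
    "coconut": (0.0, 4.0),
}

# Pair distances are 5 (chicken-kale), 4 (chicken-coconut), 3 (kale-coconut).
print(average_pairwise_distance(["chicken", "kale", "coconut"], toy_embeddings))
```

With real embeddings (for example, ones learned with node2vec [4]), the same function could be applied to each generated recipe and averaged per generation method to obtain a comparison of the kind the figure reports.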