
NLP Rankings: Publication-based Ranking System and Platform for NLP Research


DOCUMENT INFORMATION

Basic information

Title: NLP Rankings: Publication-based Ranking System and Platform for NLP Research
Author: Chloe Lee
Committee: Jinho D. Choi (Adviser), Bree Ettinger (Committee Member), Jeremy Jacobson (Committee Member)
School: Emory University
Major: Mathematics
Type: Thesis
Year: 2020
Pages: 45
Size: 1.01 MB

Structure

  • 1. Introduction (9)
  • 2. Related Works (11)
    • 2.1. Generic University Rankings (11)
      • 2.1.1. U.S. News Rankings (11)
      • 2.1.2. QS World University Rankings (12)
    • 2.2. Publication-Based University Rankings (12)
      • 2.2.1. NTU Ranking (13)
      • 2.2.2. CSRankings (14)
  • 3. NLP Rankings (16)
    • 3.1. Data Collection (16)
    • 3.2. Author-University Matching (18)
    • 3.3. Scoring Mechanism (20)
  • 4. Demonstration (23)
    • 4.1. Rankings (23)
    • 4.2. Visualizations (24)
  • 5. Analysis (26)
    • 5.1. University-Level Analysis (26)
      • 5.1.1. Top 50 Universities in the United States (26)
      • 5.1.2. University Trend Clustering (27)
    • 5.2. Author-Level Analysis (30)
      • 5.2.1. Top Universities Attended by Top 100 NLP Authors (30)
      • 5.2.2. Author Success Evaluation: weight-contribution index (32)
    • 5.3. User Analysis (34)
      • 5.3.1. Log Data Statistics (34)
      • 5.3.2. Weight Customization (35)
      • 5.3.3. Re-Visit Frequency (35)
  • 6. Conclusion and Discussions (37)
  • 7. References (40)
  • 8. Appendix (41)

Content

Introduction

As the Information Age progresses, the demand for analyzing unstructured textual data has surged, highlighting the importance of Natural Language Processing (NLP) and leading to the establishment of various NLP programs in higher education. With each university offering unique strengths, prospective students and faculty often face challenges in selecting the right programs. While several university rankings exist to assess overall quality, none specifically focus on NLP. To address this gap, NLP Rankings has been developed as a publication-based ranking system, providing valuable insights into the research environments of academic institutions in the United States. The platform is publicly accessible, ensuring that the research community can easily access relevant information about NLP programs.

The ranking system evaluates academic research achievements in NLP by analyzing publications from reputable journals and conferences, specifically sourced from the ACL Anthology. This ranking should complement traditional generic rankings in assessing program performance and suitability, enabling researchers to make informed career decisions in the field of NLP.

This article reviews the advantages and drawbacks of current rankings, detailing the data collection and cleaning methods, along with the scoring systems used to generate NLP Rankings. Additionally, it showcases a user-friendly platform that allows customization of ranking weights based on various publication venues and conferences, enabling users to tailor the results to their specific needs and preferences.

Section 5 of the article offers valuable insights for prospective students and faculty candidates by presenting various levels of evaluation related to NLP Rankings. It also assesses the usefulness of these rankings for the research community. Finally, Section 6 concludes with a discussion of the findings and outlines future steps to enhance NLP Rankings.

Related Works

Generic University Rankings

Numerous reputable university rankings exist both in the United States and globally, typically assessing multiple indicators to evaluate university performance. A key factor in these rankings is the opinions of peers and professionals, which can introduce subjectivity and potential manipulation into the evaluation process.

Since 1983, U.S. News has been a leading source for university rankings, offering essential guidance to students in their educational choices. Its Best Graduate Schools rankings, like their undergraduate counterparts, utilize a range of indicators to evaluate institutions:

1. expert opinions about program excellence

2. statistical indicators that measure the quality of a school's faculty, research, and students

U.S. News rankings are derived from statistical surveys completed by academic professionals, including deans, program directors, and senior faculty, alongside various indicators such as admission criteria, student-faculty ratios, and job placement success. The final rankings are determined by U.S. News applying undisclosed weights to each indicator, reflecting its judgment of their importance. However, the rankings may be considered unreliable due to a tendency among experts to favor the institutions where they have studied or worked.

International university rankings adopt a standardized methodology to evaluate academic institutions. The QS World University Rankings, published annually by Quacquarelli Symonds (QS), was established in 2004 to address the demand for a global university ranking system. This ranking is based on six key metrics, each assigned a specific weight, to assess the performance of universities worldwide.

As shown in the metric weightings, 50% of the evaluation (Academic Reputation and Employer Reputation) is based on subjective opinions derived from the QS Academic and Employer Surveys. Additionally, institutional research quality contributes only 20% to the overall ranking, through the Citations per Faculty metric, which may be susceptible to "citation cartels"4.

4 "Citation cartel" refers to the phenomenon of authors within a specific group disproportionately citing each other.

Publication-Based University Rankings

On the other hand, as opposed to generic rankings that incorporate multiple factors, some university rankings focus on field-based evaluation, reflecting the level of research output within a discipline. Unlike traditional university rankings, which evaluate institutions on a broad spectrum, these rankings focus specifically on the volume and impact of scientific publications, emphasizing the importance of scholarly contributions to the academic community.

Established in 2007 by the Higher Education Evaluation and Accreditation Council of Taiwan (HEEACT) and continued by National Taiwan University (NTU) since 2012, the NTU Ranking5 utilizes bibliometric methods to assess universities based on their scientific publication performance. In addition to providing an overall ranking of international universities, the NTU Ranking also features subject-specific rankings across various academic disciplines.

The criteria and overall performance indicators, with their weightings, are as follows:

- Number of articles in previous years6 (10%)

- Number of articles in the current year (15%)

- Number of citations in previous years (15%)

- Number of citations in the last 2 years (10%)

- Average number of citations in previous years (10%)

- H-index7 of the last 2 years (10%)

- Number of highly cited papers in previous years (15%)

- Number of articles in the current year in high-impact journals (15%)

5 Also known as the Performance Ranking of Scientific Papers for World Universities; http://nturanking.lis.ntu.edu.tw/

6 Counted from 2008 to the current year

7 A metric that evaluates the cumulative scholarly impact of an author by measuring both the productivity and citation impact of one's academic publications

Publication-based rankings assess universities based on both the quantity and quality of their research outputs, with a significant focus on quality. In the NTU Ranking, the quality of publications is determined by the prestige of the journals and the frequency of citations, although the latter may lack objectivity.

CSRankings, developed by Emery Berger, is a widely recognized ranking system for Computer Science programs that evaluates multiple research areas. Unlike traditional rankings that rely on reputation and surveys, CSRankings adopts a metrics-based approach, focusing on universities' performance at prestigious publication venues. Each university's score is derived from the credits earned by its faculty, ensuring an unbiased evaluation. To prevent manipulation, citation-based metrics are excluded, and the ranking system uses respected conferences and journals as proxies to maintain integrity.

While popular university rankings like U.S. News and QS are widely used, they cater more to industry-focused students than to those pursuing NLP research. These rankings tend to be generic and subjective, lacking the specificity needed for academic interests. Although the NTU Ranking provides field-based evaluations across various subjects, it does not offer a dedicated ranking specifically for NLP research, highlighting a gap in resources for academics in this field.

Natural Language Processing (NLP) publication quality is often assessed using citation-based metrics, which can be misleading due to the prevalent "citation cartel" phenomenon in academia. While CSRankings serves as a valuable resource for those exploring various Computer Science programs, it exclusively evaluates publications from the ACL, EMNLP, and NAACL venues for ranking NLP programs. Additionally, CSRankings treats publications from different journals and conferences with equal weight, a practice that may not align with everyone's perspective on publication value.

NLP Rankings

Data Collection

The rankings are based on publications sourced from the ACL Anthology9, the largest open-source platform for NLP-related research. This includes both long and short papers published between 2010 and 2019 from the following reputable venues in the field of Natural Language Processing:

- Annual Meeting of the Association for Computational Linguistics (ACL)

- International Conference on Computational Linguistics (COLING)

- Conference on Computational Natural Language Learning (CoNLL)

- European Chapter of ACL (EACL)

- Conference on Empirical Methods in NLP (EMNLP)

- International Joint Conference on NLP (IJCNLP)

- North American Chapter of ACL (NAACL)

- Computational Linguistics (CL)

- Transactions of the Association for Computational Linguistics (TACL)

9 https://www.aclweb.org/anthology/

Workshop and demonstration papers (WS) exceeding four pages10 are also collected, encompassing workshops organized at ACL events, student research workshops, and system demonstrations at these venues. The collection further includes conferences and workshops hosted by Special Interest Groups (SIG), the Conference on Lexical and Computational Semantics, and the International Workshop on Semantic Evaluation (SemEval).

ACL | CL | EMNLP | NAACL | TACL | COLING | CoNLL | EACL | IJCNLP | WS | Total

\(W\): default weight; \(|V|\): number of proceedings or issues; \(|P|\): total number of publications; \(|A|\): total number of authors; \(\{A\}\): total number of unique authors; \(|P_u|\): number of publications by academic authors in the U.S.; \(|A_u|\): number of academic authors in the U.S.; \(\{A_u\}\): number of unique academic authors in the U.S.

Table 1: Statistics of the publications collected

All publications include a bibliography file (*.bib) with meta-information and a PDF file containing the publication's content. While these bibliography files are structured, their formats can vary across venues and years, so they are converted into a consistent JSON format. Additionally, PDF files are transformed into text files for further information extraction. Some publications are excluded when their PDF files are scanned images that cannot be converted into text, although this is rare for more recent papers.

10 As most workshop/demonstration papers under 4 pages (including references) are found to be incomplete, they are discarded for quality control

Figure 1.1: Number of NLP publications over the last 10 years

Figure 1.2: Number of NLP authors over the last 10 years

Over the past decade, the field of Natural Language Processing has seen a significant increase in both the number of publications and the number of authors. Notably, academic institutions in the United States contribute approximately 25-30% of the total publications and authorship in this growing area of research.

Author-University Matching

Authors often include email addresses from their primary organizations in publications to indicate institutional affiliation and provide a contact method. In academia, it is common for authors to be associated with multiple organizations, leading them to select the most relevant email address based on their authorship. For example, a professor who collaborates with an industrial company during the summer may opt to use the company's email address in their publication, thereby attributing the work to that organization.

Email addresses typically appear within the first 2,000 lines of the original PDF files, so they are extracted from that section of the converted text files. A comprehensive set of regular expressions is used to identify various email formats, including special cases where multiple emails are grouped and presented within brackets, such as {id1,id2}@institute.edu. This method covers 85.8% of publication authors' email addresses.

Emails in bibliography files are not consistently ordered according to the authors' names. To address this, the extracted email addresses are matched against pseudo-generated email addresses by minimizing the Levenshtein distance between them.
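As a rough illustration, the extraction step might look like the sketch below (hypothetical helper names, not the thesis's actual code); it scans the head of a converted text file and expands grouped addresses such as {id1,id2}@institute.edu into individual ones.

```python
import re

# Matches grouped addresses like {id1,id2}@institute.edu, and plain ones.
GROUPED = re.compile(r"\{([^}]+)\}@([\w.-]+\.\w+)")
PLAIN = re.compile(r"\b[\w.+-]+@[\w.-]+\.\w+\b")

def extract_emails(text_head: str) -> list[str]:
    """Extract email addresses from the head of a converted publication."""
    emails = []
    # Expand grouped addresses first: {id1,id2}@x.edu -> id1@x.edu, id2@x.edu
    for ids, domain in GROUPED.findall(text_head):
        emails += [f"{i.strip()}@{domain}" for i in ids.split(",")]
    # Blank out grouped spans so their fragments are not matched twice,
    # then collect the remaining plain addresses.
    emails += PLAIN.findall(GROUPED.sub(" ", text_head))
    return emails

head = "Jane Doe, John Roe\n{jdoe,jroe}@institute.edu\nalice@example.org"
print(extract_emails(head))
# ['jdoe@institute.edu', 'jroe@institute.edu', 'alice@example.org']
```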

Academic institutions typically use similar naming conventions for emails as follows (f/m/l: the initial of the first/middle/last name, (m) is optional):

To accurately match emails with authors, a matrix \( M \in \mathbb{R}^{e \times c} \) is constructed for each publication, where \( e \) is the number of extracted email addresses and \( c = n \cdot a \), with \( n = 6 \) naming conventions and \( a \) authors in the bibliography. Each column corresponds to a pseudo-generated email address created by substituting the author's first name, last name, and initials into a convention. The matrix is populated with the Levenshtein distance between each extracted email and each pseudo-generated address. Authors are then matched to emails by sequentially taking the \( \text{argmin} \) of the matrix, aligning each author with the most similar email. If there are more authors than emails, the contributions of unmatched authors are excluded from scoring.
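A minimal sketch of this matching step follows. The six convention templates here are assumptions for illustration (the thesis's exact templates are not reproduced in this summary), and the Levenshtein distance is implemented directly to keep the example self-contained.

```python
import numpy as np

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def candidate_ids(first: str, last: str) -> list[str]:
    """Six hypothetical naming-convention templates (n = 6)."""
    f, l = first[0], last[0]
    return [f"{first}.{last}", f + last, first + l, l + first, first, last]

def match_authors(emails, authors):
    """Greedily take the argmin of the distance matrix M (e x 6a)."""
    ids = [e.split("@")[0].lower() for e in emails]
    cols = [(k, cid) for k, (first, last) in enumerate(authors)
            for cid in candidate_ids(first.lower(), last.lower())]
    M = np.array([[levenshtein(i, cid) for _, cid in cols] for i in ids],
                 dtype=float)
    matched = {}
    while len(matched) < min(len(ids), len(authors)):
        e, c = np.unravel_index(np.argmin(M), M.shape)
        author_idx = cols[c][0]
        matched[author_idx] = emails[e]
        M[e, :] = np.inf                       # this email is consumed
        for k, (a_idx, _) in enumerate(cols):  # this author is consumed
            if a_idx == author_idx:
                M[:, k] = np.inf
    return matched  # authors left unmatched are excluded from scoring
```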

Scoring Mechanism

NLP Rankings provides a metric-based evaluation of universities in the United States, specifically tailored for prospective NLP students and current researchers. Unlike generic university rankings, NLP Rankings prioritizes research achievements through academic publications rather than expert opinions. It also avoids citation-based metrics, as used in the NTU Ranking, to prevent manipulation. Furthermore, NLP Rankings is uniquely focused on the field of NLP, incorporating various scoring features that distinguish it from other rankings like CSRankings.

In contrast to CSRankings, which treats all journals and conferences equally, NLP Rankings assigns specific weights to publications based on their venue and type. Major venues such as CL, TACL, ACL, NAACL, and EMNLP are given a weight of 3, other conferences a weight of 2, and workshops or demonstrations a weight of 1. Users can customize these weights on the NLP Rankings platform to suit their individual preferences. Additionally, similar to CSRankings, the credit for each publication is distributed evenly among all authors, so each author receives a score of \(w / a\), where \(w\) is the venue weight and \(a\) is the total number of authors of the publication. Each institution's score is then the sum of the weighted scores of all its authors, matched using the algorithm outlined in Section 3.2.
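To make the rule concrete, here is a small sketch under the default weights (the data layout and function names are assumptions, not the platform's code); it reproduces the credit split used in the example below.

```python
from collections import defaultdict

# Default venue weights (Section 3.3); users may override these.
WEIGHTS = {"CL": 3, "TACL": 3, "ACL": 3, "NAACL": 3, "EMNLP": 3,
           "COLING": 2, "CoNLL": 2, "EACL": 2, "IJCNLP": 2, "WS": 1}

def institution_scores(publications):
    """Each author of a publication earns w / a, where w is the venue
    weight and a the number of authors; an institution's score is the
    sum over the authors matched to it (Section 3.2)."""
    scores = defaultdict(float)
    for pub in publications:
        share = WEIGHTS[pub["venue"]] / len(pub["authors"])
        for author, institution in pub["authors"]:
            if institution is not None:   # unmatched authors are skipped
                scores[institution] += share
    return dict(scores)

pubs = [{"venue": "ACL",
         "authors": [("student1", "I1"), ("student2", "I1"),
                     ("prof1", "I1"), ("prof2", "I2")]}]
print(institution_scores(pubs))   # {'I1': 2.25, 'I2': 0.75}
```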

NLP Rankings offers a more comprehensive evaluation of university contributions by considering both faculty and student authorship, unlike CSRankings, which relies solely on faculty scores. For example, in a paper with four authors (two students and one professor from institution I1, and one professor from institution I2), NLP Rankings allocates 75% of the credit to I1 and 25% to I2. In contrast, CSRankings would assign 25% credit to both institutions, disregarding the significant role of student contributions, which can lead to misleading comparisons between universities.

NLP Rankings employs a scoring mechanism that accounts for institutional authorship, meaning that an author's scores do not transfer when they move to a different institution. While a well-regarded author with numerous high-quality publications may maintain strong performance at a new institution, this is not assured, given varying research environments and student quality. Instead, NLP Rankings reflects each author's activity level at each institution by noting the year of their most recent publication associated with that institution.

Demonstration

Rankings

Figure 2: NLP Rankings Homepage User Interface

A. Time Range: Starting Year and Ending Year filter the NLP publications used to calculate the rankings. The rankings refresh immediately after selection.

B. Display: By default, only the top 100 academic institutions and authors are displayed. Users may choose the number of academic institutions and researchers shown on the platform.

C. Weights: NLP Rankings allows users to weight publications based on their preference and understanding of the value of different publication venues and types. Users may increase or decrease the default credits, which updates both the institution and author rankings with the customized weights.

D. Academic Institution Ranking: Clicking an institution name opens a drop-down menu listing all the authors who have published for the institution, their respective scores within the selected time range, the latest year each author published for the institution, and the total number of publications each author wrote for the institution within the selected time range.

E. Author Ranking: This ranking shows the top academic researchers with their respective scores.

Visualizations

The Visualizations tab includes several interactive graphs that allow users to compare up to 5 universities across different aspects:

1. University Ranking Score Timeline: This graph shows the selected universities' ranking scores from 2010 to 2019 as a stacked bar chart, where the lighter shading represents the portion of the score contributed by the top 10% of the university's authors by ranking score.

2. Number of Authors in Various Publication Amount: This stacked bar chart shows the number of authors at each university who have published one, two, or three or more publications.

3. Average Publication Percent Contribution Over Time: This line graph shows the average publication contribution percentage from 2010 to 2019, indicating how often each university co-authors with other universities over time.

4. Average Number of Authors per Publication Over Time: This line graph shows the average number of authors per paper at each university from 2010 to 2019, reflecting how independently the researchers at each university work.

Analysis

University-Level Analysis

5.1.1 Top 50 Universities in the United States

Appendix A presents the top 50 universities from NLP Rankings between 2010 and 2019 based on the default weights described in Section 3.3. Carnegie Mellon University leads NLP research in the United States, ranking first among 216 universities, with a research population and ranking score roughly twice those of the second-ranked University of Washington. Notable score gaps also separate the other top institutions: Stanford University (3rd) and Johns Hopkins University (4th) differ by 97.54, roughly equivalent to 33 long papers; Johns Hopkins University (4th) and Columbia University (5th) differ by 72.01, roughly 24 long papers; and Columbia University (5th) and the Massachusetts Institute of Technology (6th) differ by 57.62, roughly 19 long papers. Beyond the sixth position, the gaps between consecutive ranks narrow, with subsequent ranks separated by only a few long papers' worth of score.

The overall ranking reflects the performance of universities over the past decade. To gain insight into their growth and changes, Appendix B details the annual rankings of the top 50 universities listed in Appendix A, highlighting several significant trends.

Over the past decade, prestigious institutions such as Carnegie Mellon University, the University of Washington, and Stanford University have consistently ranked among the top universities in NLP. Notably, Carnegie Mellon has held the first position for all ten years, highlighting its exceptional performance in the field, and the other top 10 universities have also remained strongly competitive throughout this period. The University of Washington and Stanford University began at lower ranks but moved upward in NLP Rankings year over year, ending 2019 ranked second and fourth, respectively.

In recent years, some prestigious universities have seen their rankings decline. Notably, the University of California, Berkeley, which ranked third in 2010, fell out of the top 20 by 2019, and Columbia University has similarly declined. Nevertheless, among the top 50 universities, more institutions improved than declined: 29 universities rose in rank from 2010 to 2019, 20 dropped, and one maintained its position. The average rank change over this period was 15.52, indicating substantial movement within the top 50.

While program rankings are a common reference for students applying to graduate NLP programs, several other factors play a crucial role in application decisions. One significant aspect is the trend in each university's ranking score, which serves as an indicator of both current and future performance. Since students typically apply to multiple universities, often combining similar and diverse programs, a hierarchical cluster analysis is conducted to group universities by their ranking trends. The analysis employs an agglomerative hierarchical clustering algorithm: every university starts in its own cluster, and clusters are merged step by step up the hierarchy until all universities form a single group.

Hierarchical clustering is preferred for grouping universities because it does not require the number of clusters \(k\) to be known in advance and it also identifies sub-clusters. This method effectively highlights universities with similar research interests and areas in Natural Language Processing (NLP).

Cluster analysis employs the Ward variance minimization algorithm to compute the distance between a newly formed cluster and each remaining cluster. When clusters \(c_i\) and \(c_j\) are merged, their distance to any other cluster \(c_k\) is defined as:

\[ d(c_i \cup c_j, c_k) = \sqrt{\frac{n_i + n_k}{n_i + n_j + n_k}\, d(c_i, c_k)^2 + \frac{n_j + n_k}{n_i + n_j + n_k}\, d(c_j, c_k)^2 - \frac{n_k}{n_i + n_j + n_k}\, d(c_i, c_j)^2} \]

where \(c_i\), \(c_j\), \(c_k\) are disjoint clusters with sizes \(n_i\), \(n_j\), and \(n_k\).
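In practice this corresponds to Ward linkage in standard libraries; below is a sketch with SciPy, assuming each university is represented by its vector of yearly ranking scores (random stand-in data here, not the thesis's dataset):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Rows: 216 universities; columns: yearly ranking scores for 2010-2019.
rng = np.random.default_rng(0)
score_trends = rng.random((216, 10))

# Agglomerative clustering with Ward variance minimization; SciPy applies
# the update rule shown above at every merge.
Z = linkage(score_trends, method="ward")

# Cut the dendrogram into three groups, mirroring Appendix C.
labels = fcluster(Z, t=3, criterion="maxclust")
print(np.bincount(labels)[1:])   # sizes of the three clusters
```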

Appendix C presents a dendrogram illustrating the clustering results for the 216 universities, categorizing them into three distinct groups. The red cluster represents high-tier universities with superior ranks and scores, while the green cluster denotes mid-to-low tier universities with lower rankings and fewer NLP publications. Notably, Carnegie Mellon University stands alone in the blue cluster as an outlier, demonstrating exceptional performance in NLP research compared to its peers.

All 26 high-tier universities are ranked within the top 30 of the overall NLP Rankings. Notably, the University of North Texas (26th), Brandeis University (28th), and Stony Brook University (29th) are in the top 30 but do not belong to the high-tier group. As shown in Appendix B, these universities have downward-sloping score trends, which may explain why they are not included in the high-tier group.

There are 189 mid-lower tier universities, many of which have minimal NLP publications. While some, such as Brown University of the Ivy League, have notable research achievements, their recent inactivity, as shown in Appendix B, has led to their classification in the mid-lower tier.

Exploring the sub-clusters within the high tier is worthwhile, as many students aspire to prestigious programs and often apply to similar institutions. As illustrated in Figure 3, one sub-cluster of four high-tier universities performed strongly starting in 2010, dipped until the mid-2010s, and then improved markedly between 2018 and 2019. This shared trend in publication scores suggests a common research interest: certain topics gain popularity in specific years, leading to an increase in published papers. The mid-2010s marked a surge of interest in neural networks within NLP, so NLP labs focused on this area saw their ranking scores rise. By 2016, all four universities had multiple publications on topic modeling and neural learning in NLP.

Clustering results can sometimes fail to reflect future research quality, as they remain influenced by historical performance. For example, the University of Texas at Dallas is categorized in the high tier, yet its performance over the most recent three years has declined, as evidenced by its drop from rank 13 (see Appendix B).

Figure 3: Score Trends of Clustered Universities

Author-Level Analysis

5.2.1 Top Universities Attended by Top 100 NLP Authors

From a young researcher's perspective, working in a lab alongside renowned scientists offers significant advantages. It provides an opportunity to adopt the thought processes of accomplished researchers and gain insight into publishing in high-impact journals. Additionally, collaboration with leading innovators enhances their publication experience. While universities have only some influence on researchers, the institutions where top researchers trained or currently work reflect high-standard research environments.

The Author Ranking in NLP Rankings evaluates leading academic researchers based on their cumulative publication scores, which aggregate all publication scores from the past decade for each institution. Among the 7,426 authors affiliated with universities in the United States, Figure 4 illustrates the institutions associated with the top 100 authors.

Figure 4: Universities Attended by Top 100 NLP Authors

In NLP Rankings, the correlation between author ranking and university ranking is evident, as a university's ranking score is derived from the cumulative scores of its contributing authors. A university achieves a higher ranking when it hosts strong researchers. Consequently, it is not surprising that the leading authors are predominantly affiliated with Carnegie Mellon University, followed by Stanford University and Columbia University.

The leading universities in NLP research align closely with the top of the university rankings. However, institutions hosting only one or two prominent NLP authors may indicate emerging research hubs in the field. Despite their lower overall rankings, these universities can offer supportive research environments, attracting top NLP talent to establish new laboratories. This focus on recruitment can provide significant opportunities for young researchers joining these programs, fostering growth and innovation in NLP.

5.2.2 Author Success Evaluation: weight-contribution index

University rankings are widely used by academic researchers to evaluate the quality of potential research environments and anticipate their research progress. However, these rankings can be misleading, as they operate at the university level, while researchers are primarily concerned with individual author performance.

The h-index, introduced by Jorge E. Hirsch in 2005, serves as a key metric for assessing the publication excellence of individual authors. It measures both publication productivity and citation impact by counting the number of papers with citation counts equal to or greater than a specific value, h. In essence, an author with an h-index of h has h publications that each received at least h citations, while the remaining publications have fewer than h citations each.

The index is widely recognized in academia as a measure of a researcher's achievements and is often used for research fellowships and university positions. Its mathematical simplicity makes it a strong indicator that promotes a high volume of quality publications. However, like all citation-based metrics, it has drawbacks: it is influenced by the specific field of study and is susceptible to manipulation.
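For reference, the h-index is straightforward to compute from an author's citation counts:

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h papers each have at least h citations."""
    cites = sorted(citations, reverse=True)
    h = 0
    while h < len(cites) and cites[h] >= h + 1:
        h += 1
    return h

print(h_index([10, 8, 5, 4, 3]))   # 4: four papers with >= 4 citations
```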

The weight-contribution index (wc-index) is a variation of the h-index proposed for NLP Rankings, which replaces citation impact with the prestige of publication venues while still rewarding productivity. In this framework, journals and conferences are assigned weights based on their type and venue, reflecting their relative prestige; the acceptance of an author's paper by a journal or conference committee is itself a significant mark of research and academic quality.

In the NLP Rankings scoring system, the credit for each publication is shared equally among all authors through individual contribution percentages. An author may have several publications in a year even if none were independently led or conducted. The contribution percentage is calculated as \(c = 1 / a\), where \(a\) is the total number of authors of the publication.

The wc-index combines assigned weights and contribution percentages: a publication is excluded if the product \(w \cdot c\) falls below 1. For example, if author A co-authors an ACL paper (weight 3) with two other authors, the contribution percentage is \(c = 1/3\), giving a product of \(3 \times 1/3 = 1\), so the paper counts toward A's wc-index. Conversely, the same paper co-authored with three others yields \(3 \times 1/4 = 0.75\), so it does not count. This rewards researchers who publish with few co-authors in prestigious journals and conferences, reflecting their academic success.
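Under this reading, the wc-index reduces to counting an author's publications whose weight times contribution percentage reaches 1; the sketch below follows that interpretation (one plausible reading of the description above, not a verbatim reproduction of the thesis's definition):

```python
def wc_index(papers: list[tuple[float, int]]) -> int:
    """papers: (venue_weight, num_authors) pairs for one author.
    A paper counts only if w * (1 / a) >= 1."""
    return sum(1 for w, a in papers if w / a >= 1)

# ACL paper (weight 3), 3 authors: 3 * 1/3 = 1.00 -> counts.
# ACL paper (weight 3), 4 authors: 3 * 1/4 = 0.75 -> excluded.
print(wc_index([(3, 3), (3, 4), (1, 1)]))   # 2
```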

The wc-index also reflects the behavior and current status of researchers. Computing cumulative wc-indices over expanding time ranges (e.g., the 2011 wc-index covers 2010-2011, the 2012 wc-index covers 2010-2012, and so on) shows how individual researchers have performed over the past decade.

Appendix D presents a comparison of the h-index and wc-index for the top 30 NLP authors since 2015, highlighting that the wc-index is more useful to prospective students for assessing current faculty engagement. Unlike the h-index, the wc-index incorporates a contribution component, allowing students to gauge faculty involvement in research activities. Additionally, the trend of an author's wc-index reveals their current status: flat trends may indicate faculty who are less active in academia, while upward trends suggest young researchers who are actively contributing to the research community.

User Analysis

NLP Rankings is an interactive platform that enables users to customize settings for personalized rankings according to their preferred weights and selected timeframes. By analyzing the choices users make during this customization, the platform gathers insights into user interests and behaviors.

In the 46 days from its launch on February 12, 2020, to March 28, 2020, NLP Rankings received 3,913 visits from 1,219 unique IP addresses. Notably, 97.3% of these accesses only viewed the rankings for the default period of 2010 to 2019, indicating that the majority of users engaged with NLP Rankings without delving deeper into the site's content.

Among users who did customize the time range, attention concentrates noticeably on recent years, particularly those since 2015. The years 2015 and 2016 are the most frequently examined, followed closely by 2018 and 2017. This trend implies that users often focus on the years corresponding to the start of their academic careers, and that 2015 marks the onset of growing interest in natural language processing.

Start Year | End Year | Count

Table 2: Time frame choices on the NLP Rankings platform

As described in Section 3.3, users can adjust the weights assigned to various venues and publication types, yet 99.2% of total accesses used the default weights for calculating NLP Rankings. This indicates broad agreement among users with the proposed values for different journals and conferences.

To assess the usefulness of the platform, logs from the same IP address on different days over this 46-day period were analyzed. Of the 1,219 unique IP addresses, 73.9% of users viewed the site only once and never returned, 18.7% used it twice, and 3.0% checked it on three different days.
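This re-visit analysis reduces to counting the distinct days on which each unique IP accessed the site; here is a pandas sketch with an assumed log layout (the column names are illustrative, not the platform's schema):

```python
import pandas as pd

# Hypothetical access log, one row per request.
log = pd.DataFrame({
    "ip": ["1.1.1.1", "1.1.1.1", "2.2.2.2", "1.1.1.1", "3.3.3.3"],
    "timestamp": pd.to_datetime([
        "2020-02-12 09:00", "2020-02-12 17:30", "2020-02-13 10:00",
        "2020-02-20 08:00", "2020-03-01 12:00"]),
})

# Distinct visit days per unique IP address.
days_per_ip = log.groupby("ip")["timestamp"].apply(
    lambda t: t.dt.date.nunique())

# Re-visit histogram: how many IPs visited on 1, 2, 3, ... distinct days.
print(days_per_ip.value_counts().sort_index())
```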

The histogram of unique-IP re-visit frequency indicates that NLP Rankings serves as a valuable resource for NLP researchers. However, given its brief operational period and the timing of its launch, NLP Rankings is expected to see increased usage during application seasons later in the year.

Conclusion and Discussions

NLP Rankings is a specialized ranking system and platform tailored for the field of Natural Language Processing, offering essential insights that are particularly beneficial for current and prospective researchers. Unlike general institutional rankings, NLP Rankings focuses on delivering relevant information that meets the specific needs of the NLP research community.

This article analyzes university and NLP author rankings by evaluating publications from the open-source ACL Anthology between 2010 and 2019. Focusing exclusively on U.S. universities, it covers the 216 institutions that contributed to NLP research over the past decade. The rankings are determined by the volume of publications and the weighted importance of each venue and issue type.

Natural Language Processing is a relatively new academic field compared to disciplines with longer histories and more established programs, so there are few references for objectively evaluating institutions. NLP Rankings provides valuable insights for researchers and students, helping them assess the quality of faculty and research productivity across academic institutions. Additionally, industry employers may use the rankings to evaluate potential candidates.

The analysis of the university rankings reveals that while top-tier institutions maintain their competitiveness, some programs are rising or falling in status. This is largely attributable to NLP's emergence as a new academic field, with many universities establishing their programs only recently. Established institutions continue to lead, attracting skilled young researchers. Despite the short ten-year evaluation period, ranking score trends can help students anticipate future research developments at these institutions.

The hierarchical clustering of universities reveals scoring trends over the past decade, categorizing the 216 institutions into three main groups. This approach offers students deeper insight than traditional rankings, which merely aggregate scores and do not forecast future research environments. The clustering results are sensible: Carnegie Mellon University is identified as an outlier, most high-ranking universities are grouped together, and lower-ranking institutions form a separate cluster.

Universities associated with renowned NLP authors highlight the potential for strong research programs, providing valuable insight beyond mere rankings and helping to identify promising emerging programs. The proposed weight-contribution index facilitates trend analysis of researchers' publication quality and quantity, offering a deeper understanding of individual research achievements.

An analysis of user log data over 46 days indicates that the NLP Rankings platform offers valuable information for the research community, as evidenced by user engagement and revisit frequency. However, since the data was collected before the application season, the site experienced limited traffic. To better assess the platform's effectiveness, additional analysis should be performed during subsequent application seasons.

NLP Rankings currently ranks universities based only on their research in Natural Language Processing as a whole. Yet the field encompasses many research areas, and students interested in an academic career must weigh different research interests and focuses. To enhance NLP Rankings, cluster analysis can help identify the key research interests of each institution, while trend analysis and topic modeling can reveal trending research topics from the past decade. These extensions aim to deliver the insights the NLP research community seeks.

Appendix

A NLP Rankings: Top 50 Universities in the United States (2010 – 2019)

Rank | Institution | # of Authors | Score
7 | University of Illinois at Urbana-Champaign | 163 | 337.72
15 | University of Texas at Austin | 86 | 209.71
23 | University of North Carolina at Chapel Hill | 34 | 106.03
24 | University of California, Santa Barbara | 37 | 103.51
27 | Toyota Technological Institute at Chicago | 20 | 92.98
31 | University of Illinois at Chicago | 47 | 88.55
33 | University of California, San Diego | 56 | 81.35
36 | University of California, Santa Cruz | 45 | 66.05
37 | University of California, Los Angeles | 51 | 64.19
44 | The City University of New York | 33 | 52.48

B NLP Rankings: Top 50 Universities in the United States, Rank Change (2010 – 2019)

Rank | Institution | Rank in Year: 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
15 | University of Texas at Austin | …
44 | The City University of New York | …

C Hierarchical Clustering Dendrogram based on Scoring Trends
