
Statistics, data mining, and machine learning in astronomy



9 Classification

“One must always put oneself in a position to choose between two alternatives.” (Talleyrand)

In chapter 6 we described techniques for estimating joint probability distributions from multivariate data sets and for identifying the inherent clustering within the properties of sources. We can think of this approach as the unsupervised classification of data. If, however, we have labels for some of these data points (e.g., an object is tall, short, red, or blue) we can utilize this information to develop a relationship between the label and the properties of a source. We refer to this as supervised classification.

The motivation for supervised classification comes from the long history of classification in astronomy. Possibly the most well known of these classification schemes is that defined by Edwin Hubble for the morphological classification of galaxies based on their visual appearance; see [7]. This simple classification scheme, subdividing the types of galaxies into seven categorical subclasses, was broadly adopted throughout extragalactic astronomy. That such a simple classification became so predominant, when subsequent works on the taxonomy of galaxy morphology (often with a better physical or mathematical grounding) did not, argues for the need to keep the models for classification simple. This agrees with the findings of George Miller who, in 1956, proposed that the number of items that people are capable of retaining within their short-term memory was 7±2 (“The magical number 7±2” [10]). Subsequent work by Herbert Simon suggested that we can increase this number if we implement a partitioned classification system (much like telephone numbers) with a chunk size of three. Simple schemes have more impact, a philosophy we will adopt as we develop this chapter.

9.1 Data Sets Used in This Chapter

In order to demonstrate the strengths and weaknesses of these classification techniques, we will use two astronomical data sets throughout this chapter.

RR Lyrae

First is the set of photometric observations of RR Lyrae stars in the SDSS [8]. The data set comes from SDSS Stripe 82, and combines the Stripe 82 standard stars (§1.5.8),

Posted: 20/11/2022, 11:19