Foundations of Info-Metrics: Modeling, Inference, and Imperfect Information

Foundations of Info-Metrics
MODELING, INFERENCE, AND IMPERFECT INFORMATION

Amos Golan

Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and certain other countries.

Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America

© Oxford University Press 2018

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by license, or under terms agreed with the appropriate reproduction rights organization. Inquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above.

You must not circulate this work in any other form and you must impose this same condition on any acquirer.

Library of Congress Cataloging-in-Publication Data
Names: Golan, Amos.
Title: Foundations of info-metrics: modeling, inference, and imperfect information / Amos Golan.
Other titles: Foundations of info-metrics
Description: New York, NY: Oxford University Press, [2018] | Includes bibliographical references and index.
Identifiers: LCCN 2016052820 | ISBN 9780199349524 (hardback: alk. paper) | ISBN 9780199349531 (pbk.: alk. paper) | ISBN 9780199349548 (updf) | ISBN 9780199349555 (ebook)
Subjects: LCSH: Measurement uncertainty (Statistics) | Inference | Mathematical statistics ‘Information Measurement’ and ‘Mathematical Modeling.’
Classification: LCC T50.G64 2017 | DDC 519.5/4—dc23
LC record available at https://lccn.loc.gov/2016052820

Paperback printed by WebCom, Inc., Canada
Hardback printed by Bridgeport National
Bindery, Inc., United States of America

To my grandparents, Sara and Jacob Spiegel and Dora and Benjamin Katz, and my children, Maureen and Ben

CONTENTS

List of Figures xiii
List of Tables xv
List of Boxes xvii
Acknowledgments xix

Introduction
The Problem and Objectives
Outline of the Book

Rational Inference: A Constrained Optimization Framework 10
Inference Under Limited Information 11
Qualitative Arguments for Rational Inference 11
Probability Distributions: The Object of Interest 12
Constrained Optimization: A Preliminary Formulation 15
The Basic Questions 18
Motivating Axioms for Inference Under Limited Information 19
Axioms Set A: Defined on the Decision Function 20
Axioms Set B: Defined on the Inference Itself 20
Axioms Set C: Defined on the Inference Itself 21
Axioms Set D: Symmetry 22
Inference for Repeated Experiments 22
Axioms Versus Properties 24

The Metrics of Info-Metrics 32
Information, Probabilities, and Entropy 32
Information Fundamentals 32
Information and Probabilities 37
Information and Entropy 39
Information Gain and Multiple Information Sources 43
Basic Relationships 43
Entropy and the Grouping Property 44
Relative Entropy 46
Mutual Information 47
Axioms and Properties 49
Shannon’s Axioms 49
Properties 49

Entropy Maximization 59
Formulation and Solution: The Basic Framework 60
Information, Model, and Solution: The Linear Constraints Case 60
Model Specification 60
The Method of Lagrange Multipliers: A Simple Derivation 61
Information, Model, and Solution: The Generalized Constraints Case 68
Basic Properties of the Maximal Entropy Distribution 71
Discussion 72
Uniformity, Uncertainty, and the Solution 72
Conjugate Variables 74
Lagrange Multipliers and Information 76
The Concentrated Framework 79
Examples in an Ideal Setting 82
Geometric Moment Information 82
Arithmetic Moment Information 83
Joint Scale and Scale-Free Moment Information 87
Likelihood, Information, and Maximum Entropy: A Qualitative Discussion 87

Inference in the Real World 107
Single-Parameter Problems 108
Exponential Distributions and Scales 108
Distribution of Rainfall 108
The Barometric Formula 110
Power and Pareto Laws: Scale-Free Distributions 112
Distribution of Gross Domestic Products 113
Multi-Parameter Problems 114
Size Distribution: An Industry Simulation 114
Incorporating Inequalities: Portfolio Allocation 117
Ecological Networks 122
Background 123
A Simple Info-Metrics Model 124
Efficient Network Aggregation 126

Advanced Inference in the Real World 135
Interval Information 136
Theory 136
Conjugate Variables 139
Weather Pattern Analysis: The Case of New York City 140
Treatment Decision for Learning Disabilities 143
Background Information and Inferential Model 143
A Simulated Example 145
Brain Cancer: Analysis and Diagnostics 147
The Information 148
The Surprisal 151
Bayesian Updating: Individual Probabilities 154

Efficiency, Sufficiency, and Optimality 165
Basic Properties 166
Optimality 166
Implications of Small Variations 167
Efficiency 169
Statistical Efficiency 169
Computational Efficiency 175
Sufficiency 176
Concentration Theorem 178
Conditional Limit Theorem 180
Information Compression 180

Prior Information 194
A Preliminary Definition 195
Entropy Deficiency: Minimum Cross Entropy 196
Grouping Property 200
Surprisal Analysis 209
Formulation 209
Extension: Unknown Expected Values or Dependent Variables 211
Transformation Groups 211
The Basics 212
Simple Examples 215
Maximum Entropy Priors 221
Empirical Priors 221
Priors, Treatment Effect, and Propensity Score Functions 222

A Complete Info-Metrics Framework 231
Information, Uncertainty, and Noise 232
Formulation and Solution 234
A Simple Example with Noisy Constraints 242
The Concentrated Framework 245
A Framework for Inferring Theories and Consistent Models 249
Examples in an Uncertain Setting 250
Theory Uncertainty and Approximate Theory: Markov Process 250
Example: Mixed Models in a Non-Ideal Setting 254
Uncertainty 259
The Optimal Solution 259
Lagrange Multipliers 261
The Stochastic Constraints 262
The Support Space 262
The Cost of Accommodating Uncertainty 264

Index

call strike price option, 413
Caticha, A., 28, 287, 288, 300
causal inference, 322–323 definition, 6, 307–308 info-metrics, 318–319
causal inference via constraint satisfaction, 308–324 causal inference, 318–319 causality, inference, and Markov transition probabilities, 319–324 fundamentals, 319 inferred causal influence, 322–323 model, 320–322 simulated example, 323b–324b definitions, 308–309 nonmonotonic reasoning, 309–317 fundamentals, 309–314 grouping and, 314–316, 315f principle of causation, 316–317 six-sided die, default logic, 312b–313b typicality, 316
causality inference, 254, 319 inference, Markov transition probabilities and, 319–324 inferred causal influence, 322–323 model, 320–322 simulated example, 323b–324b philosophy, 325 science, 325–326 study, evolution, 326
causation principle, 316–317 probabilistic, 308
chain rule for entropies, 43
chapter dependency chart, 7, 7f
characteristics, individuals’, 339
Chernozhukov, V., 362–363, 387
Clausius, R., 37
closed world assumption, 310
coding, efficient, 36b–37b
complementary approach, 28
complete info-metrics framework, 232–274, 446 concentrated framework, 245–249 efficiency and optimality, 271–274 optimality, 272–274 (see also optimality) examples justice, written text interpretation, 233 simple, with noisy constraints, 242–245, 246f examples, uncertain setting, 250–259 Markov process, theory uncertainty and approximate theory, 250–254, 251t mixed models, non-ideal setting, 254–259, 260f theoretical information, incorporating, strategies, 255b–258b formulation and solution, 234–242 graphical representations, 239, 240f–242f inferring theories and consistent models, 234–235 information, uncertainty, and noise, 232–234 noisy inverse problems, 232, 234 priors, adding, 268–269, 270f uncertainty, 259–264 cost of accommodating uncertainty, 260f, 264 Lagrange multipliers, 261–262
optimal solution, 259–261 stochastic constraints, 262 support space, 262–264 visual representation, 264–268, 265f–267f
completeness, 26 assumption, 310, 318
complexity, 24 level, 138
compression information, 180–182, 183b–184b sufficient statistics, 182
computational efficiency, 175–176
computerized tomography, image reconstruction, 121b–122b
concentrated framework, 78–82, 81b, 245–249
concentrated model, 138 economic entropy, 296–297 prior information, 197
concentration theorem, 28, 178–180, 316 chi square, 184–186 numerical examples and implications, 185b optimality, 273–274
conditional entropy, 43
conditional limit theorem, 180
conditional mean function, 359
conditioning entropy, 48
confidence interval, 97
conjugate variables, 66 entropy maximization, 74–76, 75b real world inference, advanced, 139–140 temperature, 74b
conjunction, constant, 326
conservation rules (laws), 60, 78, 137
conservative information processing rule, 24–25
consistency axioms, 27
consistency conditions, 21
consistent information, 311
constant conjunction, 326
constant-weight aggregation, 128
constrained optimization inference under limited information, 15–18, 16b–17b, 17f statistical inference, continuous problems, 361–364 basic model, 361–363 general information-theoretic model, 363–364 statistical inference, discrete, 341–343 unified framework, 1–2
constraints, 12 See also specific types budget, 293–294 conservation rules, 60, 78 definition, 195 formulation, optimal, 446 generalized, 68–72, 69f information, 285–286 linear, 60–68 method of Lagrange multipliers, simple derivation, 61–68, 65f, 66f, 67b–68b, 68f model specification, 60–61 vs priors, 195–196 satisfaction, causal inference via, 308–324 (see also causal inference via constraint satisfaction) specification, vs decision function choice, 447 from statistical requirements, stochastic moment conditions, 383b from theory nonlinear, inequality and, 383b–386b stochastic moment conditions, 383b–386b
consumption, 290, 292
continuous entropy, 98
continuous problems, statistical inference, 358–403 See also statistical inference, continuous problems
continuous random variables Bayesian method of moments, 398 generalized method of moments, 393 stochastic moment conditions, 377–378 support spaces, 377, 380–381 T-dimensional probability distribution, 361, 368
continuous regression models, statement of problem, 358–359
contours, entropy distributions of equal entropy, 64, 66f, 241f distributions of equal entropy deficiency, 198f, 199–200, 199f, 270f geometrical view, decision-making for underdetermined problems, 16b–17b primal-dual graphical relationships, 81b six-sided die, default logic, 312b–313b three assets, 119, 120f upper envelope curve, 121
convexity, normalization function, 91–93
coordinate invariance, 26
coronary artery disease, 418
coronary artery disease prediction, 418–426 analyses and results, 420–426 complete sample, 420–423, 420t, 421t, 422f out-of-sample, 421t, 423, 423f sensitivity analysis and simulated scenarios, 424, 424t data and definitions, 419 implications and significance, 425–426
counterfactual conditions, 327–328
covariance, 71, 91–93
covariance matrix, 72 derivations, 401
covariates, 339
Cover, T. M., 42
Cramér-Rao inequality, 173–174
Cramér-Rao lower bound, 169–170, 174 relative entropy, 174, 174b–175b statistical efficiency and optimality, 272–273
Cressie, N., 365
Cressie-Read entropy, 365, 366–367, 366f, 370b
Cressie-Read function, 372–373
criterion function, 196, 391 Boltzmann-Gibbs-Shannon entropy, 319, 372 embedded information, minimizing, 388 empirical likelihood, 372–373, 375 priors in, 18, 196 statistical inference, continuous problems, 372–373, 375 choice, 387–388, 391, 440 using wrong, 387–388
cross entropy, 21, 46–47 definition, 196–197 minimum, 196–200, 198f, 199f nonuniform priors, 199–200, 199f uniform priors, 198, 198f
Csiszar, I., 20, 25, 27, 42
data aggregation, 126
Deaton, A., 383b
decision criteria, 11 See also specific applications
decision function, 11–12, 13
decreasing returns to scale, 128
deductive reasoning, vs inferential reasoning, 309
default logic, 309–311 extreme conditional probabilities, 310 six-sided die, 312b–313b
degeneracy of event k (nk), 202–203, 202f
degree distributions, 122
demand, 292
De Moivre, A., 4b
density function, 74, 82, 83, 213 joint, 216b normalized, 84, 98, 111 probability (pdf), 12, 15 scaled and scale-free moments, 87 Weibull distribution, 87
dependence, normalization function Ω, 138, 139b
dependent variables, surprisal analysis, 211
description problem, 326
design matrix, 60
diagnostics, inference and, quantitative formulation, 96–98
differential entropy, 98
directed graph, 123b–124b
discrete choice models (problems), 45, 161, 335–339 definition and examples, 335 example, die and discrete choice models, 335–339, 337b, 338f info-metrics benefits for inference, 351
discrete distributions, multivariate, 204, 205b
discrete problems, statistical inference, 334–353 See also statistical inference, discrete problems
discrimination information, 21, 46–47
distribution Boltzmann, 44, 111 Bose-Einstein, 138, 139b degree, 122 ecological degree, 122–123, 123b–124b exponential, 108–111, 109f rainfall, 108–110, 109f gross domestic product, 113, 113f marginal, 48, 141, 141f, 154 maximal entropy, 71–72 Maxwell-Boltzmann, 74 prior, 203–204 probability, 12–15, 14b hypothesis testing, 94 inference under limited information, 12–15, 14b joint, 43 unobserved (inferred), 13–15 sampling, 172 scale-free, 112–113 definition, 112 gross domestic products, 113, 113f size industry simulation, 114–117, 116f optimization problem, 115–117, 116f prior construction, 204–208, 206b–207b Weibull, 87
Doctrine of Chances (De Moivre), 4b
Donoho, D., 402–403
dosage decisions, 432–433
dose effect prediction, drug-induced liver injury, 432–439 data and definitions, 434 implications and significance, 439 inference and predictions, 434–439 analyzing residuals, extreme events, 437–439, 438f linear model, 434–436, 436t medical background and objective, 433–434
dual problem, 79–81, 81b primal-dual graphical relationship, 80, 81b
dummy variables, 145
ecological degree distribution, 122–123, 123b–124b
ecological networks, 122–125 background, 123–124, 123b–124b simple info-metrics model, 124–125
economic entropy, concentrated model, 296–297
economic statistical equilibrium, 294
economy, 288
efficiency, 169–176 coding, 36b–37b computational, 175–176 network aggregation, 126–130, 127f, 129f statistical, 169–175, 170b–175b optimality, 272–273
efficient information processing rule, 24
Einstein, A., 23–24, 179–180
election prediction, using priors on individuals, 426–432 analyses and results, 427–431 data, 427–428 priors and analyses, 428–431, 429t, 430f, 431f implications and significance, 431–432
elementary events, 212
elementary outcomes grouping property, 44, 200, 201 prior information, two-dice example, 201–204, 202b, 205b social science examples, 208
empirical likelihood, 367, 368–369, 370b, 371 vs Shannon entropy, 371–376, 374f
empirical priors, 221–222
empirical weight, 362, 369
entities, 284–285
entropy α-, 400 Boltzmann-Gibbs-Shannon, 3, 19, 21, 26, 88 statistical inference, continuous problems, 364, 370b, 372, 375, 377, 388 chain rule, 43 conditional, 43 continuous, 50, 98 Cressie-Read, 365, 366–367, 366f, 370b cross, minimum, 196–200, 198f, 199f deficiency, minimum cross entropy, 196–200, 198f, 199f definition, 37, 39 derivation, Stirling’s approximation, 186 economic, concentrated model, 296–297 generalized, 364–367, 365b, 366f grouping property, 42b, 44–46, 50 information and, 39–42, 40f, 41b–42b joint, 43 Kullback-Leibler relative, 364 maximum (see maximum entropy) mixing, 45, 203 mutual, 43 non-negativity, 50 positivity, 50 relative, 21, 196–200, 198f, 199f, 294–295 Cramér-Rao lower bound, 174, 174b–175b information gain and multiple information sources, 46–47 Rényi, 103, 364, 365b Shannon, 186, 364, 365b vs empirical likelihood, 371–376, 374f simple symmetry, 50 statistical properties, fundamental, 37 Tsallis, 365, 366–367, 366f
entropy concentration theorem, 180
entropy maximization, 59–101 concentrated framework, 78–82, 81b continuous entropy, notes, 98 discussion, 72–78 conjugate variables, 74–76, 75b Lagrange multipliers and information, 76–78 uniformity, uncertainty, and solution, 72–74 examples, ideal setting, 82–87 arithmetic moment information, 83–86, 85b–86b geometric moment information, 82–83 joint scale and scale-free moment information, 87 formulation and solution, basic framework, 59–72 information, model, and solution: generalized constraints case, 68–72, 69f information, model, and solution: linear constraints case, 60–68 hypothesis test notes, qualitative discussion, 93–96, 95b inference and diagnostics, quantitative formulation, 96–98 likelihood, information, and maximum entropy, qualitative discussion, 87–88, 89b uniqueness, convexity, and covariance, 90–93 convexity of normalization function and covariance, 91–93 uniqueness, 90–91
entropy ratio statistic, 96–97
ε-equilibrium, 261, 387
equilibrium, 181 economic statistical, 294 ε-, 261, 387 mixed-strategy equilibrium condition, 256b Nash, 257b, 294 “noisy,” 387 option pricing, inferring distribution, 416–418 statistical, social science example, 294–296 Walrasian, 294
estimating equations, 170b, 360
estimating functions, 170b
estimator, 169, 171 unbiased, 171–172
Euclidean likelihood, 369–371, 370b
events composite, entropy of, 43 entropies of, weighted sum, 316 entropy, 200 (see also entropy) exponential distributions and scales, 108 extreme, analyzing residuals, 420, 432, 434, 437–439, 438f, 440 grouping property, 200–204, 202f, 205b, 208, 212 k, 314 Laplace principle of indifference, 217 likelihood, 14b, 143 maximum entropy principle, 213 mutually exclusive, 200 one-time, 28 predicting future, 282–283 probabilities and frequencies, 14b problem description, 326 vs problems, 213 questions, 41b–42b rare observing, 40f tail function, 108 repeated experiments, 22 surprisal, 38–39, 152, 209 three-event problem mixed-theory, 259, 260f nonuniform priors, 270f
exogeneity, 308
expectation operation, 15
experiment See also specific types information and data from, 1–2 repeated inference, 22–24 inference under limited information, motivating axioms, 19–20 probabilities and frequencies, equivalence, 14b
expiration time, 413
exponential distribution, 108–111, 109f
exponential distributions and scales, 108–111 barometric formula, 110–111 rainfall distribution, 108–110, 109f
exponential probability density function, 73
extreme conditional probabilities, 310
extreme value modeling, 437
fair coins, constructing random variable from, 183b–184b
false positive, probability, 86b
falsification, 286–288
Fisher, R. A., 176
Fisher information, 173, 175b
Fisher information matrix, 72, 93
flat curvature, 173
fluctuation, 24
food web definition, 122 ecological network, 122–123, 123b–124b
formulation and solution, basic framework, 59–72 information, model, and solution, generalized constraints case, 68–72, 69f information, model, and solution: linear constraints case, 60–68
framework, complete info-metrics, 232–274, 446 See also complete info-metrics framework
French, K., 119
frequency predicted, 14b probability equivalence, 14b
functional form, 195
fundamental structure, 195
game, 255b
game theory, 255b–258b, 285–286
Gell-Mann, M., 14b
general information-theoretic model, 363–364
generalized constraints case, 68–72, 69f
generalized empirical likelihood, 367
generalized entropies, 364–367, 365b, 366f Cressie-Read, 365, 366–367, 366f Kullback-Leibler relative entropy, 364 Rényi relative, 103, 364, 365b Shannon, 364, 365b Tsallis, 365, 366–367, 366f
generalized likelihood, 343–345
generalized method of moments (GMM), 371, 393–397 background, 393–394 definition and traditional formulation, 394–395 example, ideal setting, 396–397 extension, info-metrics framework and, 397 information-theoretic solution, 395–396
generalized Pareto distribution (GPD), 437–438, 438f
“genesis” problem, 327
geometric mean constraint, 318, 388
geometric moment information, 82–83
Gibbs, J. W., 37, 39
glioblastoma multiforme, 145–157 See also brain cancer
Golan, A., 262, 288, 377, 383b, 384b
Goldszmidt, M., 316–317
goodness of fit, 97
grand potential function, 82, 247
Granger, C. W., 323
graph See also specific topics bipartite, 123b–124b complete info-metrics framework, 239, 240f–242f directed, 123b–124b inequality constraints, 67, 68f log inequality, 69, 69f maximum entropy solution geometrical view (simplex representation), 66f inequality constraints, 67, 68b–69b two-dimensional, 65, 65f parallel coordinates, 314 primal-dual relationship, 1b, 80 undirected, 123b–124b
Greene, W. H., 92
gross domestic product distribution, 113, 113f
grouping property definition, 44 degeneracy of event k (nk), 202–203, 202f entropy, 42b, 44–46, 50 nonmonotonic reasoning and, 314–316, 315f prior distribution, 203–204 prior information, 200–209 natural sciences, 200–201 physics, 201, 202–203 Shannon’s grouping property, 201 size distribution, 204–208, 206b–207b social sciences examples, 208 two-dice example, 200, 201–202, 202f, 204, 205b universal information or constraints, 208–209
guessing, information and, 35b–36b
Gutsche, T. L., 426
Hall, A. R., 387, 394
Hammerich, A. D., 299b
Hansen, L. P., 393–394
hard information, 233
Harte, J., 298, 301
Harte model constraints, 285 entities of interest, 286 simple, 284 validation and falsification, 287
Hartley, R. V. L., 33, 34, 233
Hartley’s formula, 34–35, 35b, 40
Heckman, J. J., 323
Hessian, 90, 91–93, 173, 248
historical perspective, 4b–5b
home mortgage lending, racial discrimination, 347–351 inference, marginal effects, prediction, and discrimination, 348–351, 349t, 350t loans, minorities, and sample size, background, 347–348
hot fusion plasma, soft x-ray emissivity, 122b
Hunter, D., 327–328
hypothesis as constraint within optimization problem, 93 statistical, 76–77
hypothesis testing logic, 93–94 maximum entropy problem, 94 notes, qualitative discussion, entropy maximization, 93–96, 95b probability distribution, 94 probability distributions, 94 qualitative discussion, 93–96, 95b
I-divergence, 21, 46–47
Imbens, G. W., 323, 396
independence subset, 21, 26 system, 20–21, 26
individual preferences, 292–293
inequality (constraints) Cramér-Rao, 173–174 graphical representation, 67, 68f incorporation, portfolio allocation, 117–122, 120f, 121b–122b nonlinear constraints from theory, 383b–386b
inference See also specific topics and types causal, 6 complexity level, 138 diagnostics, quantitative formulation, 96–98 information-theoretic methods, 271 inverse, 4b prior information, 196 (see also prior information (priors)) rational, 10–30 under uncertainty (see also specific topics) history, 4b
inference, real world, 107–131 multi-parameter problems, 114–130 ecological networks, 122–125, 123b–124b efficient network aggregation, 126–130, 127f, 129f inequality incorporation, portfolio allocation, 117–122, 120f, 121b–122b size distribution, industry simulation, 114–117, 116f single-parameter problems, 108–113 exponential distributions and scales, 108–111, 109f power and Pareto laws, scale-free distributions, 112–113, 113f
inference, real world, advanced, 136–162 brain cancer, 145–157 Bayesian updating, individual probabilities, 154–157, 157b fundamentals, 147–148 information, 148–151, 148b–149b surprisal, 208 data transformation from intervals to integers, 158–159, 159t interval information, 136–143 Bose-Einstein distribution, 138, 139b conjugate variables, 139–140 theory, 136–138, 139b weather pattern analysis, 140–143, 141f, 142f learning disabilities, treatment decision, 143–147 background information and inferential model, 143–145 simulated example, 145–147, 146b, 147t treatment decision data generation, 159–161
inference under limited information, 11–22 constrained optimization, 15–18, 16b–17b, 17f motivating axioms, 19–22 set A: defined on decision function, 20 set B: defined on inference itself, 20–21 set C: defined on inference itself, 21–22 set D: symmetry, 22 probability distributions, 12–15, 14b qualitative arguments for rational inference, 11–12
inferential reasoning nonmonotonic, 309–310 (see also nonmonotonic reasoning) vs pure deductive reasoning, 309
inferred causal influence, 322–323
info-metrics, 1–2 See also specific topics
information See also specific types bits, 33–34 compression, 180–182, 183b–184b conservation, 317 consistent, 311 constraints and, 285–286 definition, 32–33, 232 discrimination, 21, 46–47 entropy and, 39–42, 40f, 41b–42b guessing and, 35b–36b hard, 233 input, 32–33, 88, 232–233 insufficient, 1 interval, 136–143 (see also interval information) marginal additional, 47 matrix, 93 model, solution, and generalized constraints case, 68–72, 69f linear constraints case, 60–68 (see also linear constraints case) mutual, 47–49, 48f objective, 32–33 output, 88 partial (noisy), 11 prior, 33, 85b, 88, 150, 152, 194–226, 233 (see also prior information) probabilities and, 37–39 processing, 1 sets, 181 soft, 233–234 subjective, 33 Wiener’s derivation, 51–52
information gain, multiple information sources and, 43–49 basic relationships, 43 entropy and grouping property, 42b, 44–46, 50 mutual information, 47–49, 48f relative entropy, 46–47
information processing rule, 21 as algorithm, 21 concise specification, 25–26 conservative, 24–25, 189 efficient, 24, 181 simple, 24 symmetry, 22 uniform, 22
information-theoretic methods of inference, 271, 358 stochastic moment conditions, 376–386 examples, simulated, 381–382, 382f examples, simulated, with extensions, 381–386, 383b–386b support spaces, 380–381 zero-moment conditions, 367–371, 370b
information-theoretic model, general, 363–364
input information, 32–33, 88, 232–233
insufficient information, 1
insufficient reason, principle of, 4b, 26
interval information, 136–143 Bose-Einstein distribution, 138, 139b conjugate variables, 139–140 theory, 136–138, 139b weather pattern analysis, 140–143, 141f, 142f
interval to integer data transformation, 158–159, 159t
“intervention” problem, 326–327
invariance, 20, 214
inverse inference, 4b
inverse probability, 4b
Jaynes, E. T., 4b–5b, 20, 27, 174, 180, 195, 211, 212, 213, 284, 300
Jaynes model, simple, 284 constraints, 285 entities of interest, 286 validation and falsification, 287
Johnson, P., 396
Johnson, R. W., 20, 21, 25, 27
joint distribution, 21
joint entropy, 43, 321, 378, 385b
joint probability distribution, 43, 73
joint scale, 87
Judge, G., 262, 377, 383b
K See outcomes (K)
Kakavand, H., 402–403
Karloff, H., 126
Khinchin, A. I., 42
kinetic energy, gas molecules, 74b
Kitamura, Y., 395, 412
Kolmogorov complexity, 403
Koopmans, T. C., 26–27
Korner, J., 42
Kullback, S., 176–177
Kullback-Leibler informational distance function, 21, 46–47
Kullback-Leibler relative entropy, 364
Lagrange multipliers, 62, 137–138 approach, simple derivation, 61–68, 65f, 66f, 67b–68b, 68f brain cancer, 148, 149b, 150–153, 155–156 complete info-metrics framework, 237, 248 complexity level, 138 conjugate variables, 139–140 information and, 76–78 interval information, 137–138 learning disability treatment, 144–145, 146 production, 291–292 social sciences, 297–298 surprisal analysis, 211 uncertainty, 261–262
Lagrangian (function), 137 complete info-metrics framework, 237 generalized likelihood, 344 Hessian of, 92 prior information, 197
Laplace, P., 4b, 26
Lawless, J., 170b
learning disabilities, treatment decision, 143–147 background information and inferential model, 143–145 simulated example, 145–147, 146b, 147t
least squares criterion priors, 228 variance and maximum entropy, 173b
least squares method, unconstrained models, 359–360
Leibler, R. A., 176–177
Levine, R. D., 21–22, 27, 76, 300
likelihood, 14b, 47 See also maximum likelihood approach (method) empirical, 367, 368–369, 370b, 371 vs Shannon entropy, 371–376, 374f Euclidean, 369–371, 370b function, 73, 85b generalized, 343–345 generalized empirical, 367 information, maximum entropy, and qualitative discussion, 87–88, 89b
linear constraints case, 60–68 method of Lagrange multipliers, simple derivation, 61–68, 65f, 66f, 67b–68b, 68f model specification, 60–61
liver injury, drug-induced, 433–434 dose effect prediction, 432–439 (see also dose effect prediction, drug-induced liver injury)
Lloyd, S., 14b
logarithm base, 36b–37b
log inequality, graphical representation, 69, 69f
logit, 432
macro state, 13
Mammen, J., 402–403
Mandal, L., 419
marginal additional information, 47
marginal distribution, 48, 141, 141f, 154
marginal effects, 358
marginal entropy, 43
Markov chain, 177
Markov process statistical inference, continuous problems, 387 theory uncertainty and approximate theory, 250–254, 251t
Markov transition probabilities, causality, inference and, 319–324 simulated example, 323b–324b
Martian seaweed example, 310, 312b, 313–314, 337b, 338f
maturity date, 413
maximum entropy inference “recipe,” 89b Jaynes’ formalism, 4b–5b method, 181 optimization procedure, 61 priors, 221 problem, hypothesis testing, 94 random variable from fair coins, 183b–184b Shannon limit, 182, 183b statistics and, 170b–171b variance and, 172b–173b
maximum entropy solution, 72–73, 117 See also specific applications generalized method of moments, information-theoretic reformulation, 395–396 graphical representations geometrical view (simplex representation), 66f inequality constraints, 67, 68b–69b two-dimensional, 65, 65f prior information, 198 uniformity, uncertainty, and, 72–74
maximum likelihood approach (method), 73, 77, 87–88, 393 continuous problems, 359–360 unconstrained model as, 340–341
maximum likelihood logit, 162 discrete problems, 343, 344, 352, 428
Maxwell-Boltzmann distribution, 74
mean free path, 108–111, 109f
mean squared error (MSE), 275, 382
meta-theory, 5, 232, 282
method of moments Bayesian, 398–399 generalized, 371, 393–397 (see also generalized method of moments (GMM))
metrics, 32–53 axioms and properties, 49–51 properties, 49–51 Shannon’s axioms, 49 information, probabilities, and entropy, 32–42 information, logarithm base, and efficient coding, 36b–37b information and entropy, 39–42, 40f, 41b–42b information and guessing, 35b–36b information and probabilities, 37–39 information fundamentals, 32–35, 35b–37b information gain and multiple information sources, 43–49 basic relationships, 43 entropy and grouping property, 42b, 44–46, 50 mutual information, 47–49, 48f relative entropy, 46–47 units, 33 Wiener’s derivation of information, 51–52
Michelangelo, 448
micro RNA (miRNA) definition and function, 148b–149b glioblastoma multiforme, identifying, 148–151 prediction accuracy, significance level and, 157b
micro state, 13
Miller, D., 262, 377, 383b
minimum cross entropy, 196–200, 198f, 199f
misspecification statistical inference, continuous problems, 386–387 traditional, 386 zero-moment conditions, 362–363
mixed models, non-ideal setting, 254–259, 260f
mixed-strategy equilibrium condition, 256b
mixed-theory, three-event problem, simplex representation, 259, 260f
mixing entropy, 45, 203
modeling and theories, 282–304 See also specific types basic building blocks, 284–288 information and constraints, 285–286 prediction, 288 priors, incorporating, 286 problem and entities, 284–285 validation and falsification, 286–288 core questions, 282–284 definitions, 282, 283 example, social science, 288–299 (see also social science example, detailed) examples, classic other, 300–301 single-lane traffic, toy model, 299b–300b parsimony, 282 theories vs models, 283
mole fraction, 44
Montroll, E., 299b
“most probable,” 28
Muellbauer, J., 383b
multi-parameter problems, 114–130 ecological networks, 122–124 background, 123–124, 123b–124b
simple info-metrics model, 124–125 efficient network aggregation, 126–130, 127f, 129f inequality incorporation, portfolio allocation, 117–122, 120f, 121b–122b size distribution, industry simulation, 114–117, 116f multiple information sources, 43–49 See also information gain, multiple information sources and multivariate discrete distributions, 204, 205b mutual entropy, 43 mutual information, 47–49, 48f Nash equilibrium, 257b, 294 nats, 38 natural base, 37b natural scale, 108 nested choice models, 45 network, 123 aggregation, efficient, 126–130, 127f, 129f ecological, 122–125 background, 123–124, 123b–124b simple info-metrics model, 124–125 New York City, weather pattern analysis, 140–143, 141f, 142f nk, 202–203, 202f node, unified, 126 noise, 235 constraints, simple example, 242–245, 246f equilibrium, 387 information, 11 inverse inferential problems, 232 inverse problems, 232, 234 uncertainty, information, and, 232–234 non-ideal setting definition, 250 mixed models, 254–259, 260f nonlinear constraints from theory, inequality and, 383b–386b nonmonotonic reasoning, 309–317 causation, principle, 316–317 fundamentals, 309–314 nonmonotonic reasoning and grouping, 314–316, 315f six-sided die, default logic, 312b–313b typicality and info-metrics, 316 non-negativity, entropy, 50 Index } 461 normalization condition, 61 normalization constraint, 15 normalization function convexity, 91–93 Ω, dependence, 138, 139b objective information, 32–33 object of interest, 12–15, 14b optimality, 166–168, 272–274 concentration theorem, 273–274 statistical efficiency, 272–273 uncertainty, 259–261 optimization, 27 problems, 16b, 17f process, 12 optimization, constrained, 195 inference under limited information, 15–18, 16b–17b, 17f statistical inference, continuous problems, 361–364 basic model, 361–363 general information-theoretic model, 363–364 statistical inference, discrete problems, 341–343 unified framework, 1–2 option, 412 call, 413 option pricing, 412–418 generalized case, inferring 
equilibrium distribution, 416–418 implications and significance, 418 simple case, one option, 413–416 organization, downsizing, 127f, 128–130, 129f outcomes (K), 34–35, 48, 184 See also specific types discrete, as proposition, 48 elementary, 44, 200–205, 208 (see also elementary outcomes) Wiener’s derivation of information, 52 output information, 88 parallel coordinates, 314 Pareto, V., 112 Pareto distribution, gross domestic products, 113 Pareto laws, power and, 112–113, 113f Pareto principle, 112 parsimony, 282 partial information, 11 Pearl, J., 308–309, 323 penguin example, 310 “perfect” priors, 221–222 Perloff, J. M., 384b permutation invariance, 26 political message tailoring and negative messages’ impact testing, 345–347 congressional race and survey, background, 346 inference, prediction, and effect of different messages, 346–347 Popper, K. R., 286 portfolio allocation, 117–122, 120f, 121b–122b positive predictive power, 428–431 positivity, entropy, 50 potential, for problem, 81–82 power, positive predictive, 428–431 power law, 112, 217 See also specific types definition, 112 distribution, scale choice invariance, 217 geometric mean constraint, 318 geometric mean constraint and Boltzmann-Gibbs-Shannon entropy, 388 maximum entropy framework, 82–83 Pareto laws and, scale-free distributions, 112–113, 113f priors, 217, 221 self-similarity, 112 Shannon entropy vs empirical likelihood, 371–376, 374f predicted frequencies, 14b prediction, 288 accuracy, significance level, miRNA and, 157b social science example, detailed, 298 table, simulated example, 146, 146b, 147t preferences individual, 292–293 social science example, 297–298 prices, 297–298 primal-dual relationship, 81b, 247 primal model, 79–81, 81b principle of indifference, Bernoulli-Laplace-Keynes, 212–213 principle of insufficient reason, 212–213 Bernoulli’s and Laplace’s, 4b, 26 Jaynes’s generalization, 4b–5b prior distribution, 203–204 prior information (priors), 18, 33, 85b, 88, 150, 152, 194–226, 233 
brain cancer, 150, 152 complete info-metrics framework, adding to, 268–269, 270f vs constraints, 195–196 constructing and quantifying, challenges, 195 definition, preliminary, 195–196 election prediction, improved, 426–432 empirical, 221–222 entropy deficiency, minimum cross entropy, 196–200, 198f, 199f grouping property, 200–209, 202f, 205b–207b (see also grouping property) improper, 215–216 incorporating, 286 incorrect, 196 maximum entropy, 221 “perfect,” 221–222 social science example, 298 for straight line, 217, 218b–220b surprisal analysis, 209–211 brain cancer, 151–153, 208 extension, unknown expected values or dependent variables, 211 formulation, 209–211 theoretical, 222 transformation groups, 211–220 basics, 212–215 simple examples, 215–220, 216b, 218b–220b treatment effect, propensity score functions and, 222–223 probabilistic causation, 308 probabilities frequency equivalence, 14b information and, 37–39 inverse, 4b subjective, 14b probability distribution hypothesis testing, 94 inference under limited information, 12–15, 14b joint, 43, 73 risk-neutral, 412 inferred, 414–418, 415f unobserved (inferred), 13–15 probit, 432 problem See also specific types characterization, social science example, 288–289 entities and, 284–285 processing See information processing production, 291–292 definition, 290 function, 114 propensity score functions, priors, treatment effect and, 222–223 matching, 222 properties, basic, 166–180 See also specific types concentration theorem, 178–180 conditional limit theorem, 180 efficiency, 169–176 computational, 175–176 statistical, 169–175, 170b–175b optimality, 166–168 sufficiency, 176–178 Pseudo-R2, 97 Qin, J., 170b questions, core, 282–284 racial discrimination, home mortgage lending, 347–351 inference, marginal effects, prediction, and discrimination, 348–351, 349t, 350t loans, minorities, and sample size, background, 347–348 rainfall distribution, exponential, 108–110, 109f 
randomness (noise), complete info-metrics framework, 233, 234 See also noise random variable See also specific topics continuous Bayesian method of moments, 398 generalized method of moments, 393 stochastic moment conditions, 377–378 support spaces, 377, 380–381 T-dimensional probability distribution, 361, 368 discrete binary, information, entropy, and probability relationships, 40f joint entropy and conditional entropy, 43 mutual information, 47–49, 48f outcomes, as proposition, 48 relative entropy, 46–47 Shannon’s axioms, 49–50 fair coins, 183b–184b generalized entropies, 364, 365b info-metrics solution, three possible outcomes, 241f X, constrained optimization, 15 rational inference, 10–30, 25 axioms set B: concise specification, 25–26 axioms vs properties, 24–25 basic questions, 18–19 inference for repeated experiments, 22–24 inference under limited information, 11–18 constrained optimization, 15–18, 16b–17b, 17f probability distributions, object of interest, 12–15, 14b qualitative arguments for rational inference, 11–12 motivating axioms for inference under limited information, 19–22 set A: defined on decision function, 20 set B: defined on inference itself, 20–21 set C: defined on inference itself, 21–22 set D: symmetry, 22 notes, 26–28 real world inference, 107–131 See also inference, real world advanced, 136–162 (see also inference, real world, advanced) regularity conditions, 94 Reiss, H., 299b Reiter, R., 310 relative entropy, 21, 196–200, 198f, 199f, 294–295 Cramér-Rao lower bound and, 174, 174b–175b information gain and multiple information sources, 46–47 Rényi, A., 364 Rényi entropy, 103, 364, 365b Rényi function, 372 repeated experiments inference for, 22–24 inference under limited information, motivating axioms, 19–20 probabilities and frequencies, equivalence, 14b resource web, complete, 125 risk-neutral probabilities, 412 distribution, 412 option pricing, inferred, 414–418, 415f Ruben, D. 
B, 323 Sairam, N., 419 sampling distribution, 172 scale decreasing returns to, 128 exponential, 108–111, 109f joint, 87 natural, 108 scale-free distributions, 112–113 definition, 112 economics, wealth/GDP, 216–217 gross domestic products, 113, 113f universal law, 371 scale-free moment information, 87 scale invariant See scale-free distributions scale parameter, 74, 213 scaling, 21, 26 scientific theory, 282 second-best theory, 447 second law of thermodynamics, 181 self-consistency, 414 self-similarity, 112 semantic, 39 Shannon, C. E., 4b, 27, 33, 37–38, 42, 195, 233 Shannon entropy, 186, 364, 365b vs empirical likelihood, 371–376, 374f Shannon-Khinchin axioms, 42, 49 Shannon limit, 182, 183b Shannon’s axioms, 49 Shannon’s grouping property, 201 nonmonotonic reasoning, 314–316, 315f Shen, Z., 384b Shirley, K. E., 126 Shore, J. E., 20, 21, 25, 27 significance level, 97 simple, 24 simple symmetry, entropy, 50 simplex representation, maximum entropy solution, 66f Simpson, T., 4b single-parameter problems, 108–113 arithmetic moment information, 83 exponential distributions and scales, 108–111 barometric formula, 110–111 rainfall distribution, 108–110, 109f power and Pareto laws, scale-free distributions, 112–113 gross domestic products distribution, 113, 113f size distribution industry simulation, 114–117, 116f optimization problem, 115–117, 116f prior construction, 204–208, 206b–207b Skilling, J., 20, 25, 27 social science example, detailed, 288–299 basic entities, 289–291 economic entropy, concentrated model, 296–297 information and constraints, 291–294 budget constraints, 293–294 consumption, 292 individual preferences, 292–293 production, 291–292 supply and demand, 292 model summary, 299 prices, Lagrange multipliers, and preferences, 297–298 priors, validation, and prediction, 298 problem characterization, 288–289 statistical equilibrium, 294–296 soft information, 233–234 soft x-ray emissivity, hot fusion plasma, 122b spacing See also support space unequal, 380 
Spady, R. H., 396 spot price, 413, 418 state, 184 statistic, 171 statistical efficiency, 169–175, 170b–175b optimality, 272–273 statistical equilibrium economic, 294 social science example, 288 statistical hypothesis, 76–77 statistical inference, continuous problems, 358–403 Bayesian method of moments, 398–399 constrained optimization, 361–364 basic model, 361–363 general information-theoretic model, 363–364 continuous problems, info-metrics benefits, 388–390 continuous regression models, statement of problem, 358–359 definitions and problem specification, 359 generalized entropies, 364–367, 365b, 366f generalized method of moments, 371, 393–397 background, 393–394 definition and traditional formulation, 394–395 example, ideal setting, 396–397 extension, info-metrics framework and, 397 information-theoretic solution, 395–396 information and model comparison, 390–392 information-theoretic methods, stochastic moment conditions, 376–386 example, simulated, 381–382, 382f examples, simulated, constraints from statistical requirements, 383b examples, simulated, inequality and nonlinear constraints from theory, 383b–386b examples, simulated, with extensions, 381–386 support spaces, 380–381 information-theoretic methods, zero-moment conditions, 367–371 specific cases, empirical and Euclidean likelihoods, 367–371, 370b misspecification, 386–387 power law, Shannon entropy vs empirical likelihood, 371–376, 374f unconstrained models in traditional inference, 359–360 statistical inference, discrete problems, 334–353 constrained optimization, 341–343 definitions and problem specification, 339 discrete choice models, statement of problem, 335–339 example, die and discrete choice models, 335–339, 337b, 338f discrete choice problems, info-metrics benefits in, 351 generalized likelihood, info-metrics framework, 343–345 real-world examples, 345–351 political message tailoring and negative messages’ impact testing, 345–347 racial discrimination in home mortgage lending, 
347–351, 349t, 350t unconstrained model as maximum likelihood, 340–341 Stirling’s approximation, 179, 182, 186 stochastic constraints, uncertainty, 262 stochastic moment conditions, information-theoretic methods, 376–386 See also under information-theoretic methods of inference straight line, prior for, 217, 218b–220b strike price, 413, 415 structural equations, 285 structural form, 195 Stutzer, M., 395, 412, 413 subjective information, 33 subjective probabilities, 14b subset independence, 21, 26 sufficiency, 176–178 summary tree, 126 supply and demand, 292 support space, 237, 239, 262–264 definition, 239 information-theoretic methods, stochastic moment conditions, 380–381 specifying, 243–245, 247 symmetric-about-zero, 247, 252, 262, 381 unequal spacing, 380 surprisal (analysis), 38–39 applications, 152 brain cancer, 151–153, 208 definition, 38, 211 deviation terms, 210 extension, unknown expected values or dependent variables, 211 formulation, 209–211 prior information, 209–211 survival function, 112 symbols, 449–451 symmetry, 22 simple, entropy, 50 system independence, 20–21, 26 tail function, 112 temperature, conjugate variable and, 74b theories See also modeling and theories; specific theories definitions, 282, 283 inferring consistent models and, framework, 234–235 theoretical information incorporation, strategies, 255b–258b vs models, 283 Popper on, 286 scientific, 282 uncertainty, approximate theory and, Markov process, 250–254, 251t theory of large deviations, 316 thermodynamics, second law, 181 Thomas, J. A., 42 three-constraint problem, arithmetic moment information, 83–86 three-sigma rule, 380 three-state transition matrix, schematic representation, 251, 251t Tikochinsky, Y., 21–22, 27 Tishby, N. 
Z., 21–22, 27 tomography, info-metrics and, 121b–122b transformation groups, 211–220 applications, 212 basics, 212–215 examples, simple, 215–220, 216b, 218b–220b economics, wealth/GDP distribution, 216–217 Poisson process, random variable X, 216 prior for straight line, 217, 218b–220b summary, 217 two-variable, 215–216, 216b motivation, 212 transitivity, 26 treatment decision data generation, 159–161 learning disabilities, 143–147, 146b, 147t treatment effects models, 326 propensity score functions, priors, and, 222–223 tree diagram, 41b–42b, 126 trial parameter, 82 Tribus, M., 73 trophic species, 124 Tsallis, C., 365–367, 366f type, 45, 137 typical groups, 316 typicality, 316 typical set, 316 Uffink, J., 21, 28 unbiased estimator, 171–172 uncertainty, 259–264 accommodating, cost, 260f, 264 information, noise and, 232–234 Lagrange multipliers, 261–262 optimal solution, 259–261 stochastic constraints, 262 support space, 262–264 uniformity and solution, 72–74 unconstrained models as maximum likelihood, 340–341 in traditional inference, 359–360 unconstrained problem, 96–97 underdetermined problems decision-making, 16b–17b, 17f definition, 15 undirected graph, 123b–124b unequal spacing, 380 unified constrained optimization framework, 1–2 unified node, 126 uniform, 21 uniformity, uncertainty, solution, and, 72–74 uniqueness, 20, 26, 90–91 universal law, scale-free, 371 unknown probability density function (pdf), 12 unobserved probability distributions, 13–15 utility function, 11–12 validation, 286–288 social science example, 298 variance, maximum entropy and, 172b–173b variational principle, 27 von Neumann, J., 283–284 Walrasian equilibrium, 294 weather pattern analysis, New York City, 140–143, 141f, 142f Weibull distribution, 87 weight, empirical, 362 weight matrix, 395 Wiener, N., 33, 51–52, 233 Wiener’s derivation of information, 51–52 Zadran, S., 147, 153, 155–156, 157b Zellner, A., 28, 398–399 zero-moment conditions definition and example, 362 generalized method of 
moments, 393 information-theoretic methods, 367–371, 370b misspecification, 362–363 specification, assumptions, 362 Zipf’s law, 371

Format
Pages	489
Size	21.33 MB