
Introduction to Computational Health Informatics
Chapman & Hall/CRC
Data Mining and Knowledge Discovery Series
Series Editor: Vipin Kumar

Data Mining with R


Learning with Case Studies, Second Edition
Luís Torgo

Social Networks with Rich Edge Semantics


Quan Zheng and David Skillicorn

Large-Scale Machine Learning in the Earth Sciences


Ashok N. Srivastava, Ramakrishna Nemani and Karsten Steinhaeuser

Data Science and Analytics with Python


Jesus Rogel-Salazar

Feature Engineering for Machine Learning and Data Analytics


Guozhu Dong and Huan Liu

Exploratory Data Analysis Using R


Ronald K. Pearson

Human Capital Systems, Analytics and Data Mining


Robert C. Hughes

Industrial Applications of Machine Learning


Pedro Larrañaga et al.

Privacy-Aware Knowledge Discovery


Novel Applications and New Techniques
Francesco Bonchi and Elena Ferrari

Knowledge Discovery for Counterterrorism and Law Enforcement


David Skillicorn

Multimedia Data Mining


A Systematic Introduction to Concepts and Theory
Zhongfei Zhang and Ruofei Zhang

For more information about this series please visit:


https://www.crcpress.com/Chapman--HallCRC-Data-Mining-and-Knowledge-Discovery-Series/
book-series/CHDAMINODIS
Introduction to Computational
Health Informatics
By Arvind Kumar Bansal
Kent State University

Javed Iqbal Khan


Kent State University

S. Kaisar Alam
President & Chief Engineer at
Imagine Consulting Services
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2020 by Taylor & Francis Group, LLC


CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper

International Standard Book Number-13: 978-1-4987-5663-1 (Paperback)


International Standard Book Number-13: 978-0-367-43478-6 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to
publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or
the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced
in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright
material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any
form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and
recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.
copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400.
CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been
granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification
and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Names: Bansal, Arvind Kumar, author. | Khan, Javed I. (Professor of


computer science), author. | Alam, S. Kaisar, author.
Title: Introduction to computational health informatics / by Arvind Kumar
Bansal, Javed Iqbal Khan, S. Kaisar Alam.
Other titles: Chapman & Hall/CRC data mining and knowledge discovery
series.
Description: Boca Raton : CRC Press, [2020] | Series: Chapman & Hall/CRC
data mining and knowledge discovery series | Includes bibliographical
references and index.
Identifiers: LCCN 2019042593 | ISBN 9780367434786 (hardback : alk. paper) |
ISBN 9781498756631 (paperback : alk. paper) | ISBN 9781003003564 (ebook)

Subjects: MESH: Medical Informatics


Classification: LCC R855.3 | NLM W 26.5 | DDC 610.285--dc23
LC record available at https://lccn.loc.gov/2019042593

Visit the Taylor & Francis Web site at


http://www.taylorandfrancis.com

and the CRC Press Web site at


http://www.crcpress.com
This book is dedicated to all the visionaries who have worked incessantly to free this world from
pain and misery.
Contents

Preface xxiii
Chapter Outlines xxv
Classroom Use of this Textbook xxix
Acknowledgments xxxi
About the Authors xxxiii

1 Introduction 1
1.1 Informatics 2
1.2 Modeling Healthcare Information 3
1.2.1 Data Abstraction 3
1.2.2 Raw Data to Information to Knowledge 4
1.2.3 Inference and Learning 4
1.3 Medical Informatics 5
1.3.1 Health Informatics 5
1.3.2 Clinical Informatics 5
1.3.2.1 Nursing informatics 6
1.3.2.2 Pharmacoinformatics 6
1.3.3 Patients’ Privacy and Confidentiality 6
1.4 Computational Health Informatics 7
1.4.1 Acceptance and Adoption 7
1.4.2 Emulating Human–Human Interactions 8
1.4.3 Improving Clinical Interfaces 8
1.4.4 Privacy and Security 9
1.5 Motivation and Learning Outcomes 9
1.6 Overview of Computational Health Informatics 10
1.6.1 Medical Databases 10
1.6.1.1 Electronic medical records 10
1.6.1.2 Information retrieval issues 12
1.6.1.3 Information de-identification 12
1.6.1.4 Maintaining patient privacy 13
1.6.1.5 Standardized medical knowledge bases 13
1.6.1.6 Automated data collection 13
1.6.2 Medical Information Exchange 14
1.6.2.1 Standards for information exchange 14
1.6.2.2 Types of connectivity 15
1.6.3 Integration of Electronic Health Records 16
1.6.3.1 Accessing from heterogeneous databases 16
1.6.3.2 Heterogeneity and interoperability 17
1.6.4 Knowledge Bases for Health Vocabulary 18
1.6.4.1 LOINC 18
1.6.4.2 MedDRA 18


1.6.4.3 SNOMED 18
1.6.4.4 ICD 18
1.6.5 Concept Similarity and Ontology 19
1.6.6 Interfaces 20
1.6.6.1 Visual interfaces 20
1.6.6.2 Natural language interfaces 20
1.6.7 Intelligent Modeling and Data Analysis 21
1.6.7.1 Hidden Markov model 21
1.6.7.2 Uncertainty-based reasoning 22
1.6.7.3 Fuzzy logic 22
1.6.7.4 Bayesian probabilistic network 22
1.6.7.5 Speech-to-text conversion 22
1.6.7.6 Text analysis and generation 22
1.6.7.7 Heuristic reasoning 23
1.6.8 Machine Learning and Knowledge Discovery 23
1.6.8.1 Clustering 23
1.6.8.2 Regression analysis 24
1.6.8.3 Decision trees 25
1.6.8.4 Data mining 25
1.6.9 Medical Image Processing and Transmission 25
1.6.9.1 Image processing techniques 27
1.6.9.2 Medical image transmission 27
1.6.10 Biosignal Processing 27
1.6.10.1 ECG 28
1.6.10.2 EEG 28
1.6.10.3 MEG 28
1.6.11 Clinical Data Analytics 29
1.6.11.1 Evidence-based medicine 29
1.6.11.2 Survivability and hazard analysis 30
1.6.11.3 Randomized clinical trials 30
1.6.11.4 Clinical decision support system (CDSS) 30
1.6.11.5 Biomarkers discovery 30
1.6.12 Pervasive Health Care 31
1.6.12.1 Patient-care coordination 31
1.6.13 Bioinformatics for Disease and Drug Discovery 32
1.6.13.1 Biochemical reactions pathways 32
1.6.13.2 Genetic disease discovery 33
1.6.13.3 Vaccine development 33
1.6.13.4 Drug discovery 34
1.6.14 Pharmacokinetics and Drug Efficacy 34
1.6.14.1 Pharmacogenetics 35
1.7 Summary 35
1.8 Assessment 37
1.8.1 Concepts and Definitions 37
1.8.2 Problem Solving 37
1.8.3 Extended Response 40
Further Reading 41

2 Fundamentals 47
2.1 Data Modeling 47
2.1.1 Basic Data Structures 47

2.1.1.1 Histograms 48
2.1.1.2 Records in database 48
2.1.2 Modeling N-Dimensional Feature-Space 48
2.1.2.1 Proximity of multidimensional data 50
2.1.3 Modeling Graphs 51
2.1.3.1 Modeling graphs as matrices 51
2.1.3.2 Modeling graphs as a set of vertices 52
2.1.3.3 Modeling graphs as a set of edges 52
2.1.4 Trees for Database Search 53
2.1.4.1 Interval-based search 54
2.1.4.2 Limitations of binary trees 54
2.1.4.3 B+ trees for database access 54
2.1.4.4 PATRICIA tree – fast string-based search 55
2.1.5 Spatial Trees for Multidimensional Data 56
2.1.5.1 Quad tree 56
2.1.5.2 K-D (K-dimensional) tree 57
2.1.5.3 R (rectangular) tree 58
2.1.5.4 SS (similarity search) tree 59
2.1.5.5 VP (vantage-point) tree 60
2.1.6 Trees for Multidimensional Database Search 61
2.1.6.1 K-D-B tree and variants 61
2.1.7 Time-Series Data 63
2.1.7.1 Representing time-series data 63
2.1.7.2 Indexing structure 64
2.1.7.3 ISAX-based indexing 64
2.1.8 Trees for Spatiotemporal Access 65
2.1.8.1 Time-parameterized R-trees 65
2.2 Digitization of Sensor Data 66
2.2.1 Analog to Digital Conversion 66
2.2.1.1 Standardized sound format 67
2.2.1.2 Error correction and preprocessing 67
2.2.2 Digital Representation of Images 68
2.2.2.1 Proximity preserving image representation 68
2.2.2.2 Standardized image formats 69
2.2.2.3 Standardized video formats 69
2.2.3 Image Compression 70
2.2.3.1 Huffman coding 71
2.2.3.2 Segmentation and image compression 71
2.2.3.3 Compression in digital image formats 71
2.3 Approximate String Matching 71
2.3.1 Hamming Distance 72
2.3.2 Edit-Distance 72
2.3.2.1 Jaro–Winkler distance 72
2.3.2.2 Levenshtein edit-distance 73
2.3.3 Applications of Approximate String Matching 74
2.3.3.1 Dynamic programming 74
2.4 Statistics and Probability 75
2.4.1 Statistics 75
2.4.1.1 Basic metrics 76
2.4.1.2 Correlation 76

2.4.2 Probability 77
2.4.2.1 Bayes’ theorem 78
2.4.3 Probability Distribution Functions 78
2.4.3.1 Gaussian distribution 78
2.4.3.2 Bivariate Gaussian distribution 79
2.4.3.3 Other distributions 79
2.4.4 Hypothesis and Verification 80
2.4.4.1 Confidence intervals and margin-of-errors 80
2.4.4.2 Hypothesis testing 81
2.4.5 Curve Fitting 81
2.4.5.1 Fitting a straight line 82
2.5 Modeling Multimedia Feature Space 83
2.5.1 Texture Modeling 83
2.5.1.1 Histogram as texture 84
2.5.1.2 Gradients as texture 84
2.5.1.3 Run-length matrices 85
2.5.1.4 Hurst operator 86
2.5.1.5 Cooccurrence (SGLD) matrix 87
2.5.1.6 Local binary pattern 88
2.5.1.7 Gabor filters 89
2.5.1.8 Wavelets 89
2.5.2 Shape Modeling 90
2.5.2.1 Contour-based techniques 90
2.6 Similarity-Based Search Techniques 92
2.6.1 Matching Query and Database Entity 93
2.6.2 Tree Traversal Techniques 93
2.6.2.1 Traversing R+ tree 93
2.6.2.2 Traversing SS-tree 94
2.6.2.3 Traversing K-D and K-D-B trees 94
2.6.2.4 Traversing VP trees 94
2.7 Temporal Abstraction and Inference 95
2.7.1 Modeling Time 95
2.7.2 Time Interval-Based Matching 95
2.7.2.1 Dynamic time warping 97
2.7.3 Temporal Analysis 98
2.7.4 Knowledge-Directed Temporal Analysis 99
2.8 Types of Databases 100
2.8.1 Relational Database 100
2.8.1.1 Limitations of relational databases 101
2.8.2 Object-Based Databases 101
2.8.2.1 Types of object-based databases 102
2.8.3 Multimedia Databases 102
2.8.4 Temporal Databases 102
2.8.4.1 Queries in temporal databases 103
2.8.4.2 Issues in temporal databases 103
2.8.5 Knowledge Bases 103
2.8.6 Distributed Databases and Knowledge Bases 103
2.9 Middleware for Information Exchange 104
2.9.1 eXtensible Markup Language (XML) 104
2.9.2 SOAP and Message Envelope 105

2.10 Human Physiology 105


2.10.1 Modeling Human Body 106
2.10.1.1 Cardiopulmonary system 106
2.10.1.2 Liver 106
2.10.1.3 Kidney 107
2.10.2 Heart 107
2.10.2.1 Depolarization of the heart-cells 108
2.10.2.2 ECG 108
2.10.3 Brain and Electrical Activity 109
2.10.3.1 Brain waveforms 109
2.11 Genomics and Proteomics 109
2.11.1 Genome Structure 110
2.11.1.1 Chromosomes 110
2.11.1.2 Proteins 110
2.11.1.3 Protein functions 110
2.11.2 Biochemical Reactions Pathways 111
2.11.3 Gene Mutations and Abnormalities 112
2.11.4 Antigens and Antibodies 112
2.12 Summary 113
2.13 Assessment 117
2.13.1 Concepts and Definitions 117
2.13.2 Problem Solving 118
2.13.3 Extended Response 121
Further Reading 122

3 Intelligent Data Analysis Techniques 129


3.1 Dimension Reduction 130
3.1.1 Principal Component Analysis 130
3.1.2 Linear Discriminant Analysis 131
3.1.3 Independent Component Analysis 131
3.2 Heuristic Search Techniques 131
3.2.1 Goal-Directed Global Search 132
3.2.1.1 Best-first search 132
3.2.1.2 A* search 133
3.2.2 Local Search – Hill Climbing 133
3.2.3 Combining a Local and Stochastic Search 133
3.2.3.1 Simulated annealing – stochastic + local search 134
3.2.4 Genetic Algorithms 134
3.2.5 Nature-Inspired Metaheuristics 135
3.2.5.1 Ant colony optimization 136
3.2.5.2 Bee search and firefly search 136
3.2.5.3 Particle swarm-based optimization 136
3.3 Inferencing Techniques 137
3.3.1 Forward Chaining 137
3.3.2 Backward Chaining 137
3.3.3 Fuzzy Reasoning 138
3.3.3.1 Fuzzy logic 139
3.3.4 Uncertainty-Based Reasoning 140
3.3.5 Inductive Reasoning 140

3.4 Machine Learning 140


3.4.1 Supervised Learning 141
3.4.2 Reinforced Learning 141
3.4.3 Unsupervised Learning 141
3.5 Classification Techniques 142
3.5.1 Clustering Techniques 142
3.5.1.1 Hard-clustering vs soft clustering 142
3.5.1.2 K-means clustering 143
3.5.1.3 Hierarchical clustering 145
3.5.1.4 Incremental K-means clustering 146
3.5.1.5 Density-based clustering 147
3.5.1.6 Spanning-tree–based clustering 147
3.5.1.7 Clustering time-series data 148
3.5.2 Support Vector Machine 149
3.5.3 Decision Trees 150
3.5.4 Artificial Neural Network (ANN) 151
3.5.4.1 Training an ANN 152
3.6 Regression Analysis 153
3.6.1 Linear Regression Analysis 153
3.6.2 Multilinear Regression Analysis 155
3.6.3 Logistic Regression Analysis 155
3.6.4 Bayesian Statistics and Regression Analysis 155
3.6.4.1 Bayesian regression analysis 156
3.6.5 Issues in Regression Analysis 156
3.7 Probabilistic Reasoning over Time 157
3.7.1 Bayesian Decision Network 158
3.7.1.1 Temporal Bayesian network 159
3.7.2 Markov Model 159
3.7.2.1 First-order Markov model 160
3.7.2.2 Training Markov models 161
3.7.3 Hidden Markov Model 161
3.7.3.1 Modeling HMM 162
3.7.3.2 Probability of transition sequence 162
3.7.3.3 Queries in HMM 163
3.8 Data Mining 164
3.8.1 Interestingness 164
3.8.2 Associative Learning 165
3.8.2.1 Associative rules formation 165
3.8.2.2 Apriori algorithm 166
3.8.3 Temporal Data Mining 168
3.8.3.1 Identifying meaningful patterns 168
3.9 Interoperability and Ontology 169
3.9.1 Entities and Relationships 170
3.9.1.1 Unified medical language system 170
3.9.1.2 MedDRA and ontology 171
3.9.2 Similarity Measures 171
3.9.2.1 Path-based similarity 171
3.9.2.2 Content-based similarity 172
3.9.3 Ontology Formation 172
3.9.3.1 Corpus building 172

3.9.3.2 Semantic normalization 172


3.9.3.3 Concept formation 173
3.10 Intelligent Interfacing Techniques 173
3.10.1 Natural Language Understanding 173
3.10.1.1 Characteristics of speech waveforms 173
3.10.1.2 Speech recognition techniques 174
3.10.1.3 Word-sense disambiguation 174
3.10.1.4 Multilingual translation 175
3.10.2 Intelligent Text-Extraction 176
3.10.2.1 Latent semantic analysis 176
3.10.2.2 Topic modeling 177
3.10.3 Automated Text Summarization 178
3.10.4 Event Extraction from Clinical Text 179
3.10.5 Case Studies 180
3.10.5.1 cTAKES system 180
3.10.5.2 MedKAT system 180
3.11 Summary 181
3.12 Assessment 184
3.12.1 Concepts and Definitions 184
3.12.2 Problem Solving 185
3.12.3 Extended Response 189
Further Reading 190

4 Healthcare Data Organization 197


4.1 Modeling Medical Data 198
4.1.1 Modeling Electronic Health Record 198
4.1.1.1 Modeling multimedia objects 198
4.1.2 Modeling Biosignals 198
4.1.3 Medical Image Compression 199
4.1.3.1 Lossy vs lossless compression 199
4.2 Automated Data Acquisition and Interfaces 200
4.2.1 Virtual Medical Devices 200
4.2.1.1 Static model of VMD 200
4.2.1.2 Dynamic model of VMD 200
4.2.2 Data Interface Model and Technology 201
4.2.2.1 Adapters 201
4.2.2.2 Transformers 202
4.2.2.3 Message flow 202
4.3 Healthcare Data Archival and Interchange 202
4.3.1 Datamarts 203
4.3.2 Data Warehouse 203
4.3.2.1 Design and operations 203
4.3.3 Information Clearing House 204
4.4 Health Information Flow 204
4.4.1 Health Service Bus 205
4.4.2 Clinical Information Flow 205
4.4.3 Clinical Interface Technology 207
4.4.4 XML-Based Representation 208
4.5 Electronic Medical Records 209
4.5.1 Components of Clinical Information System 212
4.5.1.1 Ordering system 212

4.5.1.2 Portals for data-access 212


4.5.1.3 Interfaces 213
4.5.2 EMR Views 213
4.5.2.1 Patient’s view 213
4.5.2.2 Healthcare providers’ view 213
4.5.2.3 Pharmacist’s view 215
4.5.2.4 Billing and claims view 215
4.5.2.5 Case study of OPENEMR 216
4.6 Heterogeneity in Medical Databases 216
4.6.1 Master Patient Index 216
4.6.1.1 MPI generation 217
4.6.1.2 MPI-based operations 218
4.6.2 Virtual Medical Record 218
4.6.2.1 Retrieval from heterogeneous databases 219
4.6.3 Multimedia Information in Databases 219
4.6.3.1 Indexing multimedia objects 219
4.6.3.2 Characterizing tumors 219
4.6.4 Temporal Information in Medical Databases 220
4.6.4.1 Time-stamping 220
4.6.4.2 Temporal inferencing 221
4.7 Information Retrieval 221
4.7.1 Matching Similar Records 222
4.7.1.1 Probabilistic record-linkage 222
4.7.2 Content-Based Image Retrieval 223
4.7.2.1 Histogram-based quick retrieval 223
4.7.2.2 Matching ranked images 223
4.7.2.3 Image alignment 224
4.7.2.4 Invariance 224
4.8 Interoperability and Standardizations 225
4.8.1 Knowledge Bases of Medical Terms 225
4.8.2 SNOMED – Medical Code Standardization 226
4.8.2.1 SNOMED structure 226
4.8.2.2 SNOMED concepts 227
4.8.2.3 SNOMED representation 227
4.8.3 LOINC – Lab and Clinical Data Standardization 228
4.8.3.1 LOINC code structure 228
4.8.4 ICD (International Classification of Diseases) 229
4.8.4.1 ICD classification 229
4.8.4.2 ICD code structure 230
4.8.5 ARDENML – Executable Clinical Information 230
4.8.5.1 Medical logical module 230
4.9 Standard for Transmitting Medical Information 231
4.9.1 Reference Information Management Model (RIMM) 231
4.9.2 Clinical Document Architecture 232
4.9.3 HL7 Based Information Exchange 233
4.9.3.1 HL7 message structure 234
4.9.3.2 Translating HL7 messages to SOAP 234
4.9.4 WSDL (Web Services Description Language) 235
4.10 HIPAA – Protecting Patients’ Privacy Rights 235
4.10.1 HIPAA Violations and Protection 236
4.10.2 Role Layers and Patient Data Protection 237

4.10.3 HIPAA and Privacy in EMRs 238


4.10.4 Data De-Identification 239
4.10.4.1 K-anonymity criterion 239
4.11 Integrating the Healthcare Enterprise 240
4.11.1 Patient Care Coordination 240
4.12 Summary 241
4.13 Assessment 243
4.13.1 Concepts and Definitions 243
4.13.2 Problem Solving 244
4.13.3 Extended Response 247
Further Reading 248

5 Medical Imaging Informatics 255


5.1 Radiography (X-Ray) 256
5.1.1 Components of X-Ray Machine 257
5.1.2 Applications of Radiography 258
5.2 Computed Tomography 259
5.2.1 Fundamentals of Computed Tomography 259
5.2.2 Application of CT Scans 261
5.3 Magnetic Resonance Imaging 261
5.3.1 Fundamentals of NMR 261
5.3.2 MRI Resolution 262
5.3.3 MRI System 263
5.3.4 Applications of MRI Analysis 263
5.4 Ultrasound 264
5.4.1 Principles of Ultrasound Waves 265
5.4.2 Instrumentation 265
5.4.2.1 Pulse generation 266
5.4.3 Overall Functioning 267
5.4.4 Image Formation 267
5.4.4.1 Resolution 268
5.4.4.2 Signal attenuation 269
5.4.4.3 Reflection and refraction 270
5.4.4.4 Probe–skin interface 272
5.4.4.5 Time-gain compensation (TGC) 272
5.4.5 Doppler Ultrasound 272
5.4.6 Applications of Ultrasonic Image Analysis 273
5.5 Nuclear Medicine 273
5.5.1 Applications of PET 274
5.6 Alternate Modalities 275
5.7 Medical Image Archiving and Transmission 275
5.7.1 Medical Image Compression 276
5.7.2 Medical Image Retrieval 276
5.7.3 Medical Image Transmission 279
5.8 Medical Image Analysis 279
5.8.1 Medical Image Segmentation and Feature Extraction 279
5.8.1.1 Thresholding 280
5.8.1.2 Region growing 280
5.8.1.3 Classifier methods 281
5.8.1.4 Markov random field model 281

5.8.1.5 ANN and segmentation 281


5.8.1.6 Deformable models 281
5.8.1.7 Other segmentation techniques 282
5.8.1.8 Multimodal fusion and segmentation 282
5.8.1.9 Applications of image segmentation 282
5.8.2 Image Segmentation for Lesion Identification 283
5.8.2.1 Enhancing image quality 284
5.8.2.2 Binarization 285
5.8.2.3 Ranking, decision, and segmentation 285
5.9 Computer-Aided Diagnosis 287
5.9.1 Cancer Identification from Medical Images 287
5.9.1.1 Identification using CT images 287
5.9.1.2 Identification using MR images 287
5.9.1.3 Identification using ultrasound 288
5.9.2 Quantitative Descriptors 288
5.9.3 Classifiers 289
5.9.4 Machine Learning Approaches 289
5.9.5 A Case Study 290
5.10 Summary 291
5.11 Assessment 293
5.11.1 Concepts and Definitions 293
5.11.2 Problem Solving 293
5.11.3 Extended Response 296
Further Reading 297

6 DICOM – Medical Image Communication 301


6.1 Overview of Network and Application Layers 301
6.1.1 Physical-Link Layer 302
6.1.2 Network-Ended Layers 302
6.1.3 Application-Ended Layers 303
6.2 Modeling Medical Imaging Information Using DICOM 304
6.2.1 Entity-Relationship Model 304
6.2.2 DICOM’s Entity Relationship Model 305
6.2.3 Medical Imaging 305
6.2.4 Life Cycle of Medical Imaging 307
6.2.4.1 Service-episode 309
6.2.4.2 Imaging service request 309
6.2.4.3 Requested-procedure 309
6.2.4.4 Modality scheduled procedure-step 309
6.2.4.5 Procedure-plan 310
6.2.4.6 Protocol 310
6.2.4.7 Performed procedure-step 310
6.2.4.8 Clinical document 311
6.2.5 Representation of Information Entities 312
6.2.6 Information Objects 315
6.2.6.1 Concatenation 316
6.2.6.2 Dimension organization 316
6.2.6.3 Types of data elements 317
6.3 Network Encoding and Communication 317
6.3.1 Network Encoding Levels and Formats 317
6.3.2 IOD Encoding 319

6.3.3 Communication Services at Application Level 320


6.3.4 Types of DIMSE (DICOM Messaging Service) 321
6.3.5 Association Service 322
6.3.5.1 DICOM upper layer protocol 322
6.3.5.2 Application identity 323
6.3.6 Negotiation 325
6.3.6.1 Capacity negotiation 325
6.3.6.2 Role selection negotiation 326
6.3.6.3 Communication window negotiation 327
6.4 Verification Service 328
6.5 Storage Services 329
6.5.1 Storage Services Implementation 329
6.5.2 Levels of Service 329
6.5.3 Coercion 330
6.6 Storage Commitment Service 330
6.7 Query/Retrieval Service 331
6.7.1 QR Information Model 331
6.7.2 Commands for QR Service 335
6.7.3 Tree Maintenance Requirements 335
6.7.4 Types of Matching 336
6.7.4.1 Temporal range matching 337
6.7.4.2 Sequence matching 338
6.8 Search and Retrieval Process 338
6.8.1 Search Process 338
6.8.1.1 Actions on response to search 338
6.8.1.2 Hierarchical search 338
6.8.2 Retrieval Process 339
6.9 Medical Image Security 339
6.9.1 DICOM Standards and Image Security 340
6.9.2 Watermarking 341
6.10 Summary 342
6.11 Assessment 344
6.11.1 Concepts and Definitions 344
6.11.2 Problem Solving 345
6.11.3 Extended Response 347
Further Reading 348

7 Bioelectric and Biomagnetic Signal Analysis 351


7.1 Electrocardiograph (ECG) Fundamentals 353
7.1.1 Polarization and Electrical Conductance 354
7.1.2 Measurement Planes and Leads Placement 355
7.1.3 Polarization Vectors 356
7.1.4 Variations in P-QRS-T Waveforms 357
7.1.5 ECG Metrics 357
7.2 ECG Analysis 358
7.2.1 Noise Removal 359
7.2.2 P-QRS-T Waveforms Detection 360
7.2.2.1 Wavelet approach 360
7.2.2.2 Model-based approach 361
7.2.2.3 HMM-based approach 361

7.2.2.4 Artificial neural network approach 361


7.2.2.5 Identification of P and T waveforms 361
7.3 Heart Diseases and ECG 362
7.3.1 Cardiomyopathy 363
7.3.1.1 Heart enlargement 363
7.3.2 Ectopic-Nodes and Arrhythmia 365
7.3.2.1 Supraventricular arrhythmia 365
7.3.2.2 Ventricular arrhythmia 367
7.3.3 Ischemia and Myocardial Infarction 368
7.3.4 Electrolyte Imbalance 370
7.4 Computational Techniques for Detecting Heart Diseases 370
7.4.1 Morphological Analysis 371
7.4.1.1 Limitations of morphology analysis 371
7.4.2 Applying Markov Models 372
7.4.2.1 Matching PTG with Markov models 373
7.4.3 Neural Network-Based Analysis 373
7.4.4 Other Techniques 374
7.5 Electroencephalography (EEG) 374
7.5.1 Lead Placements for EEG Measurement 374
7.5.2 EEG Recordings 375
7.5.3 EEG Artifacts and Noise Removal 375
7.5.4 EEG Waveforms 376
7.5.5 EEG Analysis 378
7.6 Magnetoencephalography (MEG) 379
7.6.1 MEG Recording 379
7.6.2 Localization of Neural Activities 379
7.7 Identifying Brain Abnormalities 380
7.7.1 Predicting Dementia 380
7.7.2 Localizing Brain Tumors and Lesions 380
7.7.3 Predicting Epilepsy and Epileptic Attacks 380
7.8 Electromyography (EMG) 381
7.8.1 EMG Measurement and Noise Removal 382
7.8.2 EMG Analysis and Classification 383
7.8.3 EMG Application and Muscular Abnormalities 384
7.9 Summary 384
7.10 Assessment 387
7.10.1 Concepts and Definitions 387
7.10.2 Problem Solving 387
7.10.3 Extended Response 390
Further Reading 391

8 Clinical Data Analytics 397


8.1 Clinical Data Classification 398
8.1.1 Clustering of Clinical Data 399
8.2 Biostatistics in Clinical Research 399
8.2.1 Sensitivity, Specificity and Accuracy 400
8.2.2 Application 401
8.3 Randomized Clinical Trials 401
8.3.1 Hidden Inherent Clusters and Bias 401
8.3.2 Cluster Contamination 402

8.3.3 Correlation 402


8.3.4 Meta-analysis 403
8.3.4.1 Control-group vs. intervention-group 403
8.3.4.2 Fixed vs. random effect 403
8.4 Survivability and Risk Analysis 404
8.4.1 Nonparametric Estimates 404
8.4.2 Parametric Estimates 405
8.4.3 Risk Analysis 406
8.5 Clinical Decision Support System 407
8.5.1 Clinical Knowledge-Based Systems 407
8.5.2 Neural Network-Based Diagnosis 408
8.5.3 Clinical Data Mining 409
8.5.4 Effect on Clinical Practices 410
8.6 Clinical Process Mining 410
8.6.1 LDA-Based Derivation 411
8.6.2 HMM-Based Modeling 411
8.7 Disease Management and Identification 412
8.7.1 Genetic Disease Identification 413
8.7.2 Identifying Biomarkers 413
8.8 Application of AI Techniques 414
8.8.1 Case Study I: Cancer Detection 414
8.8.2 Case Study II: Dynamic Organ-Failure in ICU 415
8.8.3 Case Study III: Detecting Fatty Liver Disease 416
8.9 Summary 417
8.10 Assessment 418
8.10.1 Concepts and Definitions 418
8.10.2 Problem Solving 419
8.10.3 Extended Response 421
Further Reading 422

9 Pervasive Health and Remote Care 427


9.1 Pervasive Healthcare Taxonomy 428
9.1.1 Context-Aware Systems 428
9.1.2 Remote Monitoring 430
9.1.3 Automated Data Collection 431
9.1.4 Telemedicine and Teleconsultation 431
9.1.5 Home Health Care 433
9.1.6 Acute Care Technologies 434
9.2 Wearable Health-Monitoring Systems 434
9.2.1 Wearable Sensors Placement 435
9.2.2 Wearable Sensors Design 436
9.2.3 Wireless Connectivity and Data Collection 436
9.2.4 Issues in Wearable Devices 437
9.2.5 Sensor Data Fusion 437
9.2.6 Privacy and Confidentiality 438
9.3 Patient Care Coordination 439
9.3.1 Radio Frequency Identification (RFID) 439
9.3.1.1 RFID collision 440
9.3.2 RFID-Based Localization and Tracking 441
9.3.2.1 Modeling RFID-based localization 441

9.4 Home-Based Monitoring 442


9.4.1 Activity Recognition 443
9.4.2 Fall Detection 444
9.4.3 Localization and Tracking 445
9.4.3.1 Indoor tracking 446
9.4.3.2 Outdoor tracking 446
9.5 Healthcare Network and Interfaces 446
9.5.1 WBAN (Wireless Body Area Network) 447
9.5.2 WPAN (Wireless Personal Area Network) 448
9.5.3 Gateway to Wide Area Network 449
9.5.4 Application Software Interface 449
9.6 Standards for Data Communication 449
9.6.1 Zigbee 450
9.6.2 MICS (Medical Implant Communication Service) 450
9.6.3 WMTS (Wireless Medical Telemetry Service) 451
9.6.4 UWB (Ultrawide Bandwidth) 451
9.6.5 Clouds in Health care 451
9.7 Acceptance and Adoption 451
9.8 Summary 452
9.9 Assessment 453
9.9.1 Concepts and Definitions 453
9.9.2 Problem Solving 453
9.9.3 Extended Response 457
Further Reading 458

10 Disease Prediction and Drug Development 465


10.1 Genomics 467
10.1.1 Genome Structure 470
10.1.1.1 Mitosis 470
10.1.1.2 Meiosis 471
10.1.2 Gene Structure 471
10.1.3 Gene-to-Protein Translation 472
10.1.4 Protein–Protein Interactions 474
10.1.5 Protein–DNA Interactions 475
10.1.6 Structure and Function 475
10.1.6.1 Domain conservation and functionality 476
10.1.6.2 Sequence matching 478
10.1.6.3 Multifunctional proteins 480
10.2 Proteomics and Pathways 480
10.2.1 Metabolic Pathways 481
10.2.1.1 Modeling metabolic pathways 481
10.2.1.2 Modeling metabolic reactions 483
10.2.1.3 Modeling flux analysis 485
10.2.2 Cell Signaling Pathways 486
10.3 Genome Analysis Tools 488
10.3.1 BLAST Search for Similar Genes/Proteins 488
10.3.2 SNP (Single-Nucleotide Polymorphism) 488
10.3.3 Linkage Analysis and GWAS 489
10.3.4 SAGE (Serial Analysis of Gene Expressions) 489
10.3.5 CAGE (CAP Analysis of Gene Expressions) 490
10.3.6 Microarray Analysis of Gene-Expressions 490

10.4 Genome and Proteome Analysis 490


10.4.1 Identifying Genes 491
10.4.2 Finding Gene-Groups 491
10.4.3 Deriving Metabolic Pathways 492
10.4.4 Deriving Disease-Related Pathways 492
10.4.4.1 Correlation-based analysis 492
10.4.4.2 Bayesian network-based analysis 492
10.4.5 Host–Pathogen Interactions 494
10.4.6 Biomarker Discovery 494
10.4.7 Case Studies in Disease Discovery 495
10.4.7.1 Prostate cancer detection using GWAS 495
10.4.7.2 Breast cancer metastasis profiling 496
10.4.7.3 Liver cancer detection 496
10.5 Drug Development 496
10.5.1 Structure and Function of Antibodies 497
10.5.2 Virtual Screening in Drug Design 498
10.6 Immunoinformatics – Vaccine Development 498
10.6.1 Identifying Target Genes 501
10.6.2 Epitope Analysis 501
10.7 Pharmacokinetics and Pharmacodynamics 504
10.7.1 Drug Distribution and Excretion 505
10.7.2 Drug Efficacy and Toxicity 505
10.7.3 Drug–Drug Interactions 506
10.7.3.1 Techniques to study drug–drug interactions 506
10.8 Summary 506
10.9 Assessment 508
10.9.1 Concepts and Definitions 508
10.9.2 Problem Solving 509
10.9.3 Extended Response 511
Further Reading 512

11 End-User’s Emotion and Satisfaction 519


Contributed by Leon Sterling
11.1 Need for End-Users’ Emotional Well-Being and Satisfaction 520
11.2 Modeling Emotion During Software Design 521
11.3 Collecting Information and Providing Support 523
11.3.1 Study I: Software for Insomnia 523
11.3.2 Study II: Software for Prescreening for Severity of Depression 524
11.3.3 Study III: Website for Empowering Psychosis Patients 525
11.3.4 Study IV: Emergency Alarm Services for Elderly Persons 525
11.3.5 Study V: A Tele-Health Hearing Aid System 526
11.4 Summary 526
Acknowledgments 527
11.5 Assessment 527
11.5.1 Concepts and Definitions 527
11.5.2 Extended Response 527
Further Reading 527

12 Conclusion 531
12.1 Evolution of Health Informatics 532
12.2 Evolution of Standards 533

12.3 Elderly Care and Adoption 534


12.4 Health Informatics in Developing Countries 535
12.5 Current Status of Technology Adoption 535
12.6 Future Development 536
Further Reading 537

Appendix I: Websites for Healthcare Standards 541


Appendix II: Healthcare-Related Conferences and Journals 543
Appendix III: Health Informatics Related Organizations 553
Appendix IV: Health Informatics Database Resources 555
Appendix V: Selected Companies in Healthcare Industry 557
Index 561
Preface

Since the dawn of civilization, doctors and nurses have striven to relieve people of their pain. Continuous
improvement in science and information technology has enhanced the efforts of doctors and nurses by
giving them better tools to archive, analyze and transmit clinical data. Information technology promises
to provide available medical information seamlessly to providers and caregivers so they can optimize
their efforts for the best possible care.
In the last two decades, the increasing presence of computer processing has rendered health informa-
tion widely available. Combined with computational modeling and the development of distributed data-
bases, clinical data is being archived and analyzed using machine learning techniques and data mining,
generating a form of knowledge never seen before. This knowledge is improving life expectancy through
better disease management and the development of new vaccines and drugs with shorter development-cycle times.
It is envisioned that, in the future, the seamless integration of information technology, intelligent
analysis techniques and medical science will provide quality care at an affordable price by incorporating
better clinical data analysis, providing pervasive care, removing duplicate medical treatment and labora-
tory data analysis and making data available electronically to collaborating healthcare providers.
The flow of information has raised many issues such as data-format standardizations, adoption of
technology and the need for intelligent user-friendly interfaces for the end users such as patients, doctors,
hospitals, nurses, pharmacies, insurance providers, policy makers and clinical researchers.
Despite the exponential growth of this multidisciplinary field, there has not been a single textbook
that presents the computational aspects of health informatics for both software developers and a new
generation of “Health Informatics Scientists”—the books written by clinical scientists present the topic
from the perspective of a clinical practitioner. There is a need for a textbook in Computational Health
Informatics that can prepare computer science or information technology students to understand the com-
putational techniques used in health informatics, along with the related medical concepts.
This book describes various computational techniques, including biostatistics, heterogeneous data-
bases, artificial intelligence, signal analysis, bioinformatics, image analysis, data communication for
transmission of clinical data and medical images and their application to clinical data analysis, as well as
management of electronic health records and their seamless integration to connect healthcare providers.
The book also discusses emerging areas of telemedicine, pervasive care, remote monitoring and bioinfor-
matics for the discovery of drugs, including pharmacokinetics and pharmacodynamics.
This textbook is based upon the Computational Health Informatics course that I have been teaching
since 2012, first to graduate students and then to senior-level undergraduate students beginning in 2014.
The course content evolved along with my understanding of the knowledge and concepts that students lack
when they set out to develop software for health informatics. As I started writing the book, the course material also
evolved along with my knowledge. I included new material based upon my research on the ongoing evolu-
tion of Computational Health Informatics.
Javed, Kaisar and I committed to writing this textbook in 2015. It has taken four long years to come
to fruition, due to our other commitments as well as the need to do extensive research of the scattered
material available across multiple disciplines. When Leon Sterling came to know about our efforts, he
graciously contributed an important chapter about the need for new technology to meet patients’
emotional needs and satisfaction before it can be successfully adopted. The book itself has gone through
two revisions.
This textbook will assist (1) computer science students to understand concepts needed to develop tech-
niques and healthcare software; and (2) medical students and practitioners to understand the computational
background and concepts for healthcare software and data management. The material is sufficient for one
semester at a senior or graduate-freshman level course. The book dwells on concepts and techniques;
however, specific in-depth algorithms have been avoided. We believe that the knowledge of the concepts
and techniques discussed will prepare the students to follow the necessary algorithms.
In my classes, I could cover Chapters 1 through 7, followed by Chapters 8 and 9. Other instructors
may find other combinations, including Chapter 10 on Bioinformatics for Drug Discovery, to be useful.
Because of the diversity of topics, I recommend sufficient classroom interactions between the instructors
and the students.
We hope that this book will provide a solid foundation to generate a new class of medical technocrats
who will understand and apply computational methods to facilitate patient-friendly automation in health-
care and improve the interpretation of clinical data.

Arvind Kumar Bansal


Kent State University, Kent, Ohio, USA
Chapter Outlines

We assume that students will have a background of two semesters of programming, introductory knowl-
edge of data structure concepts, and some knowledge of statistics and computer networks. The book
assumes that students can write at least 300 lines of code for developing projects. The book is divided into
12 chapters, including a concluding chapter. We have explained concepts in simple intuitive language at
an abstract level. We have described examples and case studies as needed.

Chapter 1 introduces informatics and data modeling, and describes modeling of health-related infor-
mation using the computational techniques to archive, retrieve, transmit, and analyze clinical and
patient-centric data. It introduces classifications of medical informatics and defines computational health
informatics. It describes the components of electronic health records, including medical images and trans-
parent integration of medical data from heterogeneous sources, including healthcare providers, medical
data warehouse, patients, pharmacy, hospitals and insurance agencies. It describes the need for secure
transmission of medical data between sources and user-friendly human–computer interfaces.

Chapter 2 describes foundational concepts derived from the needs of archiving, retrieving, transmitting
and intelligently analyzing data to extract human comprehensible knowledge from clinical data. It describes
data abstractions such as trees, graphs, strings and their matching; image modeling, matching analysis
techniques; formats to represent images; image compression techniques; basics of probability and statistics
needed for data analytics; curve fitting needed for data analysis; concepts in statistics; different types of
databases such as relational databases, object-based databases, multimedia databases, temporal databases;
knowledge bases and techniques to keep privacy and security in databases; middleware for data communi-
cation; basics of human physiology needed for health informatics; and basics of genomics and proteomics.

Chapter 3 describes various artificial intelligence and machine learning techniques that are used for the
automation of human–computer interactions, text extraction and summarization, form filling from doctors’
natural language dictation about patients’ conditions, monitoring patients’ conditions, and clinical data
analysis to derive new information and knowledge from the huge amount of data generated. It describes
artificial intelligence techniques such as heuristic search, probabilistic reasoning and modeling, deduction
and induction; and machine learning techniques such as data clustering, regression analysis, neural networks,
support vector machines, Markov processes, Bayesian networks, hidden Markov models and data mining.
It also describes the analysis and clustering of time-series data. It describes ontology and medical
dictionaries used to understand and compare assessments of a patient’s condition made by various specialist
doctors. It also briefly describes techniques for automated information extraction, event analysis and
summarization from natural language texts.

Chapter 4 discusses the organization of healthcare data that removes the duplication of patient records
while preserving the privacy of the patients as protected by HIPAA (Health Insurance Portability and
Accountability Act). HIPAA prevents providers and insurers from exposing information to others without
necessity and the patient’s consent. This chapter also describes automatic acquisition of data from medi-
cal sensors, conversion of data for automated archiving and retrieval from the heterogeneous healthcare
databases, and many popular standards for the exchange of health information electronically over the
Internet. It discusses interoperability and transformation of data to make them compatible with heteroge-
neous databases containing multimedia objects and temporal objects. It also discusses different views of
electronic medical records.

Chapter 5 describes various medical imaging modalities used to acquire and analyze medical images, such as
X-ray radiography, computed tomography (CT), magnetic resonance imaging (MRI), ultrasound, positron
emission tomography (PET), and other nuclear medicine and optical techniques. The analysis of medical
images can noninvasively offer significant insights to clinicians. The chapter also describes various formats
and compression techniques for medical image archival, retrieval and transmission. Finally, the chapter
describes applications of medical image analysis with a focus on cancer detection and computer-assisted
treatment monitoring.

Chapter 6 describes DICOM, the standard for communicating digital images between medical databases.
The chapter describes data structures, modeling a medical process using entity-relationship modeling,
transmission protocols and various network levels involved in an image transmission. It also describes
briefly the security issues in transmitting medical images.

Chapter 7 describes various signal analysis techniques for electrocardiograms (ECG) to analyze and monitor
heart-related diseases, electroencephalograms (EEG) to understand brain-related diseases and
electromyograms (EMG) to analyze muscle-related abnormalities. It also describes how the artificial
intelligence techniques described in Chapter 3 can be applied to extract and analyze ECG and EEG signals.
It discusses various applications of computational analysis to identify diseases related to the heart, brain
and muscles.

Chapter 8 describes the application of various artificial intelligence techniques such as clustering,
regression analysis, time-series data analysis, neural networks and data mining to the analysis of clinical
data derived from clinical trials. It discusses statistical and computational techniques to study drug
efficacy, survivability and risk analysis. It describes some applications of clinical decision support
systems utilizing knowledge-based systems and artificial neural networks. It discusses techniques to
identify and improve clinical processes and biomarkers for the cost-effectiveness of treatments. Finally,
three applications of clinical data analytics are discussed: cancer detection, detection and management of
dynamic organ failure, and detection of fatty-liver disease.

Chapter 9 discusses the concepts, techniques and some algorithms for remote care: automated monitoring
and transmission of signals, biosignal analysis, archiving the derived data for future analysis, and managing
information security and patients’ privacy during data transmission and archiving. Remote monitoring is
becoming important for handling the shortage of medical practitioners, providing elder care and identifying
refractory conditions that are important in determining patients’ disease states.

Chapter 10 describes bioinformatics and its application to drug discovery, efficacy analysis of drugs and
derivation of drug dosage and toxicity using pharmacokinetics and pharmacodynamics. This chapter
discusses the biological concepts necessary for explaining bioinformatics, the causes of various diseases,
including genetic diseases and pathway aberration-related diseases, as well as vaccine development and
improvement in the efficacy of drugs. The analysis techniques described include similarity-based search,
genome alignment techniques, dynamic programming techniques, SNP (single-nucleotide polymorphism)
analysis, GWAS (genome-wide association studies) and microarray analysis to identify signaling pathways.
This chapter briefly describes the structure of antibodies and computational techniques to improve the
binding affinity of antibodies and, hence, drug effectiveness.

Chapter 11 discusses software developers’ frequent lack of understanding of the emotional needs of
healthcare providers, caretakers and patients. The potential for health informatics software to improve
health outcomes for patients is enormous. However, the effective utilization of health informatics software
depends upon the adoption and appropriation of the software by a wide range of stakeholders with a wide
range of abilities and motivations. The emotional aspect of this interaction is vital. Yet software developers
are often unaware of patients’ emotional needs, experiences, and physical and emotional impairments, and
thus ignore these needs in the developed software. This chapter also describes five case studies where
emotional factors have been taken into consideration during software development for healthcare
applications.

Chapter 12 describes the evolution of health informatics, its impact on an aging society and the need of the
developing world to provide quality care affordably. It also discusses issues in developing standards and in
technology adoption. Finally, this chapter describes some future directions in computational health
informatics.

There are five appendices at the end of the book that describe various sources of healthcare-related
standards, conferences and journals, organizations, databases and companies related to healthcare. These
lists are representative subsets and are not meant to be exhaustive. The purpose of the appendices is to
provide data sources for the research and project reports needed for the course. Appendix I
describes the websites for major standards and formats described in this book. Appendix II summarizes
the list of the conferences and journals that were the source of material for this book. These conferences
and journals are rich sources for graduate research and course projects. The list is still not comprehensive,
and students should also find other sources for research. Appendix III lists major funding and databank
agencies, which are a rich source of data and are also involved in policy decisions regarding healthcare.
Appendix IV lists some major national and international databases that will be helpful in graduate and
undergraduate students’ research and projects. The list is certainly not comprehensive and misses many
research databanks from individual research groups and universities, yet is a major source of archived
data sufficient for research and projects. Finally, Appendix V lists a small representative subset of compa-
nies involved in the healthcare industry. It is divided into different classes such as EHR, medical imaging
devices and diagnostics, wearable devices and pervasive care and drug discovery.
Classroom Use of this Textbook

Based upon experience teaching the Health Informatics course, this textbook is suitable for a one-semester
senior-level undergraduate course or a freshman-level graduate course. For graduate-level offerings, the
course needs to be augmented with the research articles listed at the end of each chapter and with the
journals and conferences in the area (see Appendix II). A suggested distribution of the effort is given below:

CHAPTER       SUGGESTED MINIMUM    75-MINUTE    45-MINUTE    COVERAGE FOR A
              TIME IN MINUTES      LECTURES     LECTURES     SEMESTER-LONG COURSE
Chapter 1     150                  2.0          3.0          Full
Chapter 2     300                  4.0          6.5          Full
Chapter 3     250                  3.5          5.5          Full
Chapter 4     250                  3.5          5.5          Full
Chapter 5     180                  2.5          4.0          At least Sections 5.8 and 5.9
Chapter 6     180                  2.5          4.0          Full
Chapter 7     180                  2.5          4.0          At least Sections 7.1–7.5
Chapter 8     150                  2.0          3.0          At least Sections 8.1, 8.5, and 8.6
Chapter 9     150                  2.0          3.0          At least Sections 9.1–9.4
Chapter 10    180                  2.5          4.0          Based on class makeup
Chapter 11    75                   1.0          1.0          At least Sections 11.1 and 11.2
Total         approx. 2100         28           43

Acknowledgments

I thank Kent State University for the “Kent State University-Summa Health System Collaborative
Research Grant” that started my collaboration with Dr. Jeffrey Neilson (MD), who steered me to com-
putational health informatics from a medical practitioner’s perspective. I also acknowledge Jeff for gra-
ciously accepting my request to deliver guest lectures in my first offering of a graduate-level course during
Fall 2012.
I acknowledge all the researchers in this fast-growing field for their valuable contributions that
became an invaluable source of knowledge and learning. I must acknowledge Javed Iqbal Khan, who
nudged me to write this textbook with a promise to contribute the chapter on DICOM. I acknowledge
Kaisar Alam for contributing the chapter on medical image informatics. I also acknowledge the acquisi-
tion editor, Randi Cohen, for her constant encouragement and support throughout this long process of
writing and improving the text. I acknowledge the reviewers who raised the bar with useful comments. I
acknowledge Siemens Healthcare, research groups, researchers, publishers and medical practitioners who
permitted their copyrighted images and drawings to be included in this book.
My former PhD student, Dr. Purva Gawde, contributed to the teaching of the material and provided
feedback that was immensely helpful. She also developed an online version of the course material from an
earlier unpublished version of this book, which she and I both taught. Finally, I acknowledge my PhD
advisor and friend, Leon Sterling, who graciously contributed a valuable chapter raising the important point
that any health technology must ultimately be human-friendly and easy to use if it is to be adopted.

Arvind Kumar Bansal

Having long worked in medical informatics (medical image processing, high-fidelity and complex video
communication, information coding, computation for radiation treatment planning, HIPAA, DICOM, HL7,
and medical IoT security), I have felt the need for a common compiled source of knowledge in this rich,
vast and multidisciplinary area of computational health informatics. Each time I started working on a topic,
it involved extensive self-learning in a seemingly different wilderness of knowledge. There has never been
an ideal textbook in this highly important area that could prepare students from a computer science and
engineering background for the field.
Over the course of a year, discussions along these lines with Prof. Arvind Kumar Bansal eventually
resulted in this project. I am glad to see that finally that dream textbook is here. Given the rapid growth in
this area and its highly challenging multidisciplinary topical composition, it inevitably has many deficien-
cies. However, I am hopeful that, with feedback, this project will become perfect in a few years. More
importantly, it will now pave the way to allow students and practitioners to delve deeper into the area of
medical informatics with much sharper technical tools than has been possible previously.
I gratefully acknowledge the contribution of my advisor, David Y. Y. Yun, who introduced me to
the world of medical computing and affirmed the strength of seeking the bigger picture rising above the
individual subareas of computing.

Javed Iqbal Khan


I have been working in the area of medical imaging informatics and computer-aided diagnosis since the early 1990s
and would like to thank Arvind for initiating this much-needed project. I believe that this book will fill a
conspicuous void and will be very useful to the practitioners in this area.
I would like to thank six individuals whom I consider both friends and mentors: my PhD advisor,
Kevin Parker; my postdoc supervisor, Jonathan Ophir (deceased); my former supervisor, Ernie Feleppa;
my former supervisor, Fred Lizzi (deceased); Kazi Khairul Islam; and Brian Garra. Finally, I would like
to thank my Creator, my parents (deceased), my wife, our two children, my two siblings and all my friends
and family. I really appreciate all your encouragement and support.

S. Kaisar Alam
About the Authors

Arvind Kumar Bansal is a full professor of Computer Science at Kent State
University. He received a B. Tech (1979) in Electrical Engineering and an
M. Tech (1983) in Computer Engineering and Science from the Indian Institute
of Technology at Kanpur (IITK), India, and a PhD (1988) in Computer Science
from Case Western Reserve University (CWRU), Cleveland, Ohio, USA.
He has been a faculty member of Computer Science at Kent State University,
Kent, Ohio, USA, since 1988 and has taught undergraduate and graduate-level
courses in the areas of artificial intelligence, computational health informatics,
multimedia languages and systems and programming languages. He also directs
the “Artificial Intelligence Laboratory” at Kent State University and has been
teaching “Computational Health Informatics” regularly since 2012.
His research contributions are in the areas of artificial intelligence, bioinfor-
matics, proteomics, biological computing models, massive parallel knowledge bases, program analysis,
ECG analysis, social robotics and multimedia languages and systems. He has published over 75 refereed
articles in journals and international conferences. His research has been funded by NASA and the US Air
Force. He has also served on many program committees in the areas of artificial intelligence, bioinformat-
ics, logic programming, multimedia, parallel programming and programming languages. In addition, he
has been an area editor in the international journal Tools with Artificial Intelligence and is a member of
IEEE and ACM.

Javed Iqbal Khan is a full professor of Computer Science at Kent State University.
He received his B. Tech (1987) in Electrical Engineering from the Bangladesh
University of Engineering and Technology (BUET), Bangladesh, and his MS (1990)
and PhD (1995) in Electrical Engineering (Computer Track) from the University of
Hawaii at Manoa, Hawaii, USA. He has been a faculty member of Computer Science
at Kent State University, Kent, Ohio, USA, since 1997. He has regularly taught under-
graduate and graduate courses in the areas of Internet engineering, peer-to-peer
systems, artificial intelligence, algorithms and networking.
His research contributions are in Internet Engineering, artificial intelligence,
automated knowledge extraction, routing and network decision-making with medi-
cal data, perceptual enhancement through eye-tracking, cyber infrastructure for medical-image com-
munication, and networking for education. He has published over 100 articles in refereed international
conferences and journals and has been in NSF panels, many program committees and the executive
committee of IEEE Internet Engineering. He also led a team that designed and implemented two national
educational networks as a part of UN-funded project. His research has been funded by World Bank, NSF,
DARPA and NASA. As well, he has been a Fulbright scholar and has served as a senior specialist on high-
performance education networking in the Fulbright National Roster of experts. He is an associate editor
of International Journal of Computer Networks and Applications and is a member of IEEE and ACM.


S. Kaisar Alam received his PhD (1996) in Electrical Engineering from the
University of Rochester, New York, USA. His research publications and teaching
are in signal/image processing with applications to medical imaging. He was a
Principal Investigator at Riverside Research, New York from 1998 to 2013 and the
Chief Research Officer at an upcoming tech startup in Singapore from 2013 to 2017.
He has been a visiting professor at the Center for Computational Biomedicine Imaging
and Modeling (CBIM), Rutgers University, Piscataway, New Jersey (since 2013)
and an adjunct faculty member at The College of New Jersey (TCNJ), Ewing, New Jersey
(since 2017). Currently, he runs his own consulting company specializing in medi-
cal image analysis and diagnostic and therapeutic applications of ultrasound. He is a Fellow of the American
Institute of Ultrasound in Medicine (AIUM) and a senior member of IEEE and has served in the AIUM
Technical Standards Committee and the Ultrasound Coordinating Committee of the RSNA-QIBA. He is
an associate editor of Ultrasonics (Elsevier) and Ultrasonic Imaging (Sage). Dr. Alam has been a recipient
of the prestigious Fulbright Scholar award.

1 Introduction
Providing health care is a complex task. Multiple medical practitioners specialize in various aspects of health care and collaborate to treat a patient. Many cooperating hospitals provide surgery and health maintenance. Insurance agencies pay and regulate patients' service bills. Government agencies collect demographics-based medical data, analyze them and formulate healthcare policies. Congress uses the statistical data provided by healthcare agencies to formulate privacy laws and laws to avoid and contain threats of contagious diseases and epidemics, and to allocate budgets for a healthier society.
A patient goes through multiple specialists and hospitals for treatment and moves across diverse geographical locations. In recent years, people have become more mobile due to business necessities and the availability of better transport. Asking a patient to repeat the same set of tests duplicates effort, resulting in inconvenience, increased medical cost and a burden on resources such as beds and medical staff. Human life expectancy has been steadily increasing and now exceeds 80 years in many developed countries. Certain diseases become more pronounced in old age, and elderly patients need remote long-term health care in home settings to contain medical costs, conserve resources and reduce patient movement. Besides, elderly patients prefer to remain in the comforting environment of their homes.
In the last three decades, computers and their applications have exploded. Computers are good for:
1) archiving and retrieving data; 2) intelligent analysis for hypothesis formation; 3) efficient and accu-
rate automation of processes that require a huge amount of human resources and are nearly impossible
for humans; 4) providing innovative ways to probe into abnormalities; 5) efficiently communicating infor-
mation across geographical locations; 6) mimicking speech for human-like interaction; 7) activating an
automated process in response to the human voice and 8) analyzing complex signals and images for aber-
rations and abnormalities. Computers have been employed in the healthcare industry in a variety of ways
such as radiology image analysis for noninvasive disease detection; analysis of biosignals (electrocardio-
gram [ECG], electroencephalogram [EEG] and electromyogram [EMG]) to identify diseases related to
vital organs (heart, lung, brain and muscles) and in robotic surgery, etc. Every complex medical device
has an embedded computer in it, and this trend will grow.
Providing long-term health care in hospitals is expensive, and is not preferred by patients unless essen-
tial. Due to the increased cost and the problem of transportation, elderly people are staying at home and need
to be monitored remotely. Providing remote care requires periodic monitoring and automated transmission
of certain vital signs and physiological data, such as blood sugar and blood pressure, to a medical care pro-
vider. Sending medical images requires high-quality, undistorted lossless transmission for an accurate inter-
pretation. Privacy should be maintained during data archiving, retrieval and transmission over a secure line.
The use of computers and automation leads to: 1) improved data management; 2) reduced loss of infor-
mation; 3) reduced human errors by overburdened medical staff and in data entry, resulting in fewer acci-
dents; 4) streamlined medical care, resulting in reduced wastage and better utilization of medical resources;
5) efficient search and retrieval capability, allowing doctors to use content-based search to compare previ-
ous occurrences of similar diseases and their treatment; 6) better visualization for diagnosis; 7) more effi-
cient and accurate analysis of data for disease diagnosis and hypothesis formation; 8) improved long-term
archiving and sharing of medical data among healthcare providers; 9) reduced duplication of procedures and
lab tests; 10) portability and transparency of data to improve patients’ trust; 11) improved intelligent analysis
of data for discovering new knowledge related to disease prediction and identification; 12) enhanced policy
formulation based upon statistical evidence; 13) improved mobility of people without sacrificing health care due to the availability of medical data and 14) reduction of paper consumption. For example, automated image analysis can diagnose malignancies in mammograms and brain MRIs (magnetic resonance imaging). Automated analysis of ECGs can facilitate the work of medical practitioners in the treatment of various heart-related diseases. Automated analysis of EEGs can predict an impending epileptic seizure.

FIGURE 1.1 Types of interacting automation in health care
There are some disadvantages to automation:

1. Automated interpretation cannot replace the wisdom of human healthcare providers due to
inherent limitations in modeling techniques that can affect overall accuracy.
2. The availability of automation encourages excessive and often meaningless data generation.
Handling a large amount of data is difficult and error prone.
3. Computer programmers introduce and enforce unnecessary checks and mundane questions in
the human–computer interface that take away additional time from healthcare providers, mak-
ing the automation less attractive. This issue has plagued the adoption of automation tools by
medical practitioners.

One reason for the slow adoption of automated healthcare systems is that doctors and programmers do not understand each other's needs. Healthcare providers find the software overly imposing despite understanding its advantages. Both need to be educated and trained: healthcare providers should understand computational health informatics better, and information scientists should understand the needs of care providers and patients to make their software user-friendly.
Although there are a few disadvantages, the perceived advantages are significant. Due to automation,
healthcare services, pharmacy, medicine dispensing and nursing are getting seamlessly integrated. As shown
in Figure 1.1, there are many components of health-automation: patients’ electronic records, archiving and
analysis of physiological data, improvement in time-management of nursing, pharmacy, medicine dispensing
and billing. Computer-based system automation has seamlessly integrated these components in the last decade.

1.1 INFORMATICS
Informatics handles different aspects of information such as modeling a process, digitization of the
information, efficient electronic archiving and retrieval of the information, transferring the information,
grouping and classifying the information for enhanced data analysis and knowledge extraction, statisti-
cal analysis to identify data patterns, analysis of time-series data to identify a trend, learning from the

patterns to create simple rules and new medical knowledge, interacting with other information sources to
enhance existing knowledge and keeping the information current.
The key factor is the improvement of overall system efficiency, with a significant reduction in processing time and required resources. In terms of health care, the resources are: 1) availability of healthcare providers; 2) availability of hospital beds and 3) support-personnel hours spent handling duplications. The increase in system efficiency appears as: 1) a greater number of patients treated; 2) faster patient recovery due to improved coordination and better diagnosis and 3) increased productivity.

1.2 MODELING HEALTHCARE INFORMATION


Healthcare information from patients is collected using multiple sensor devices such as oximeters (devices that measure the oxygen saturation level of the blood), ECG machines (devices that record waveforms from the heartbeat), EEG machines (devices that record brain waveforms), spectrometers and other machines. These machines provide raw data and should be interfaced with each other and with central computers to exchange information. Different healthcare providers maintain their own data repositories, and the information may be stored using different formats and different coding schemes. For the exchange of information, there must be a common meta-level platform to which the other data formats can be translated.
Raw analog data is first digitized. The digitization of medical data is done using multiple incom-
patible formats. Software adapters translate data in one format to another. The digitized raw data is
denoised, images are enhanced for better resolution and the denoised digital data is structured to gather
information. This structured data is stored in an indexed database. Multiple indexed databases are
connected using common index keys. This network of information is then analyzed and data-mined to derive knowledge.
One of the major discrepancies between human communication and structured database representation is the absence of natural language in structured databases. Natural language is comprehensible to humans. However, it suffers from ambiguity and context sensitivity, and its meaning is affected by the background knowledge of the person trying to interpret it. Two experts may use different sets of phrases to communicate similar information. While speaking to a nonprofessional or a beginner, experts simplify their statements to convey a similar meaning. There must be a means of comparing and finding similarity between two documents having the same meaning. To find similarity between two documents, one must handle ontology. An ontology uses descriptions of an entity to associate different meanings with the same word or the same meaning with different words. Different words and phrases having the same meaning can be matched to each other using a dictionary of synonyms and antonyms, or by analyzing the overall meaning.

1.2.1 Data Abstraction


Healthcare domains have multiple entities such as patients, doctors, nurses, hospitals, medications, hospital beds, lab tests, vital organs' signals such as ECG and EEG, administrators, diseases, radiology equipment, physiological test equipment, wearable devices and monitoring devices. Each entity has multiple attributes.
An entity is modeled using only the subset of attribute-value pairs needed to solve the problem. The abstraction uses a well-defined data structure to represent this subset of attribute-value pairs. Programming languages need data-abstraction capabilities so that the abstract models, and the corresponding software, are easy to implement. The software operates on a structured data format.

Example 1.1
A patient is modeled abstractly as (personal information, list of healthcare providers, insurance provider,
disease history, list of medicines causing side-effects, symptoms, diagnosis, prognosis). Each field is fur-
ther decomposed into many subfields. For example, patients’ personal information is modeled as a tuple
(patient’s unique identifier such as social security number, name, address, emergency contact).

Example 1.2
A hospital bed is modeled as a tuple: (bed-id, location, patient-id, doctor-on-duty, nurse-on-duty,
patient’s entry-time, patient’s condition, patient’s lab data, list of medications administered, list of
signals monitored). A hospital department is modeled as an array of beds for inpatient treatment. Each
monitoring device is modeled as a tuple (type of signal, signal output, frequency of signal output,
archival format of the signal).
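
For concreteness, a minimal sketch of these abstractions in Python follows; the class and field names are illustrative simplifications of Examples 1.1 and 1.2, not a real EMR schema.

from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified abstractions; real records carry many more fields.
@dataclass
class PersonalInfo:
    patient_id: str          # unique identifier, e.g. a social security number
    name: str
    address: str
    emergency_contact: str

@dataclass
class Patient:
    personal: PersonalInfo
    providers: List[str] = field(default_factory=list)
    insurance_provider: str = ""
    disease_history: List[str] = field(default_factory=list)
    side_effect_medicines: List[str] = field(default_factory=list)
    symptoms: List[str] = field(default_factory=list)
    diagnosis: str = ""
    prognosis: str = ""

@dataclass
class HospitalBed:
    bed_id: str
    location: str
    patient_id: str
    doctor_on_duty: str
    nurse_on_duty: str
    entry_time: str
    condition: str
    lab_data: List[str] = field(default_factory=list)
    medications: List[str] = field(default_factory=list)
    monitored_signals: List[str] = field(default_factory=list)

# A hospital department can then be modeled as a list (array) of beds.
cardiology_ward = [HospitalBed("B-101", "Cardiology", "P-001", "Dr. A", "Nurse B",
                               "2020-01-01T08:00", "stable")]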

1.2.2 Raw Data to Information to Knowledge


The raw data coming out of patient monitoring systems, such as ECG machines, EEG machines, echograms and charts showing drug response to a disease, is stored in databases using a structured format. Data is collected using multiple modalities, and each modality is represented using a different data format. For example, images are stored in “JPEG” format; sketches and figures are stored in “GIF” format; textual information is stored in text formats; video is stored in “MPEG” format; and sound is stored in “WAV” format. This raw data is denoised and enhanced before archiving.
Structured data that conveys some pattern is called information. The derived information is further analyzed to: 1) identify associations between different patterns or 2) learn generic rules that connect one or more parameters to the overall pattern. This generic form of rules or associations is called knowledge. A key goal, given the large amount of available clinical data, is to extract knowledge that can improve patients' treatments or inform public health policies. Generally, the formation of associations or rules requires machine-learning techniques that employ statistical analysis.
Exchanging information becomes difficult due to heterogeneous structuring formats. To avoid
this problem, standardized universal formats are designed by information scientists. Adapters and
data-transformers convert structured data represented in one format to a universal format and back to
another structured data format.

1.2.3 Inference and Learning


Knowledge can be derived by: 1) logical inference – deriving new knowledge rules by combining existing rules, or inductively forming a rule from various examples; 2) data mining – identifying common patterns in the data using association rules; 3) supervised learning – having prior knowledge of patterns and deriving similarity with the known patterns using intelligent comparison techniques; and 4) unsupervised learning – characterizing data elements by some features and grouping data elements based upon the proximity of feature values using some similarity criterion. Similarity is based on the notion of distance between feature values. The distance criterion could be an identical Boolean value, an identical fuzzy value, or a Euclidean distance below some threshold for real and integer values. There are multiple artificial intelligence techniques to derive new knowledge. Collectively, the discovery and learning of new knowledge is called machine learning.
Derivation of new knowledge has a certain amount of uncertainty due to: 1) the lack of knowledge
of all the parameters affecting the outcome; 2) the presence of noise in the measurements; 3) inherent
approximation present in artificial intelligence techniques used to derive knowledge and 4) limitations
caused by observation, sensor resolution, noise and data-processing errors. The level of uncertainty is modeled using probabilistic reasoning, which has its roots in statistical analysis.
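
As a small illustration of the distance-based grouping used in unsupervised learning, the following Python sketch (with made-up, already-normalized feature vectors) places data elements into the same group when their Euclidean distance falls below a threshold; real clustering algorithms such as k-means are more sophisticated.

import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def group_by_threshold(points, threshold):
    """Greedy grouping: a point joins the first group whose representative
    (the group's first member) lies within the distance threshold."""
    groups = []
    for p in points:
        for g in groups:
            if euclidean(p, g[0]) < threshold:
                g.append(p)
                break
        else:
            groups.append([p])
    return groups

# Hypothetical feature vectors, e.g. (blood pressure, blood sugar) after normalization
samples = [(0.9, 0.8), (0.92, 0.83), (0.2, 0.1), (0.25, 0.15)]
print(group_by_threshold(samples, threshold=0.2))   # two groups of nearby samples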

1.3 MEDICAL INFORMATICS


Medical informatics is a broad term: any informatics related to medical science qualifies as medical informatics. It involves informatics related to patients' health; informatics related to medical discovery, such as drug discovery and management and the discovery of new surgical techniques, tools and procedures; informatics related to data analytics of patients' symptoms and lab tests; and informatics related to public-health policy decisions, including gender-based health and policies such as maternity, age-based health and policies such as vaccination, and ethnicity-based health and policies. The National Library of Medicine (NLM) defines medical informatics as “providing a scientific and theoretical basis for the application of the computer and automated information systems to biomedicine and health affairs.”

1.3.1 Health Informatics


The United States NLM defines health informatics as “the interdisciplinary study of the design, development, adoption and application of IT-based innovations in healthcare services delivery, management and planning.” The overall goal of health informatics is to integrate the information related to health services so as to improve health management in a cost-effective and seamless manner. Health informatics is directly concerned with efficient patient care and recovery; the use of computational and automated devices to manage diseases and patients' recovery; the discovery of new procedures and medications using data analytics; and error-free archiving and transmission of health records to remove duplications and reduce the overall cost of health care.
Health informatics is a multidisciplinary field comprising information science/technology, biomedi-
cal technology, communication technology, computer science, social science, behavioral science, man-
agement science and statistical analysis for large-scale data analysis and hypothesis verification. The tools
include clinical terminologies and guidelines, and information and communication systems, including wireless communication, to improve patient-care delivery by ensuring high-quality patient-specific data generation, archiving, retrieval and transmission.
Health informatics utilizes: 1) the archiving, retrieval, analysis, transfer and visualization of
medical data to improve patient–doctor interaction; 2) machine learning techniques to form a new
hypothesis about diseases and drugs, classify data for intelligent analysis and to improve patient moni-
toring and 3) automation techniques to reduce the time needed to record and analyze the medical data.
Automation includes software development and patient-centric human–computer interaction. Since health informatics facilitates the improvement of care providers' effectiveness and improves the efficiency of patients' recovery, we should study: 1) the psychology and sociology associated with human acceptance and adoption of new techniques; 2) human perception and comprehension of data collection and analysis; 3) the performance of the system for optimality; 4) maintenance of the privacy of patient-specific data during information exchange and transmission; and 5) emulation of human–human interaction to avoid wasting medical practitioners' time on unnecessary technical complexity.

1.3.2 Clinical Informatics


Clinical informatics is a subfield of health informatics concerned with delivering healthcare services to patients. AMIA (the American Medical Informatics Association) defines clinical informatics
as “application of informatics and information technology to deliver healthcare services.” Clinical
informatics involves many subfields such as radiology, ophthalmology, pathology, dermatology and

psychology. Clinical informatics involves health signal monitoring, nursing care of inpatients, management of patient–doctor encounters, physiological data analysis, radiology image analysis, ECG signal analysis, ophthalmological data analysis, and managing the treatment records of patients, including the medication-dispensing record and the procedures involved in patients' treatment.
There are many further subcategories of clinical informatics such as dental informatics, pharmaceuti-
cal informatics, nursing informatics and primary care informatics. Primary care informatics involves
all aspects of family practice, general internal medicine, educating patients, pediatrics, geriatrics and
advanced nursing. Dental informatics involves all aspects of dental care, including dental surgery and
prosthetics.

1.3.2.1 Nursing informatics


Nursing informatics is a subfield of clinical informatics that integrates nursing science with informatics.
The informatics comprises management of records about admitting and discharging patients, patient data
collection for archiving, hospital bed management, catheterization, pain management, signal monitoring,
medication dispensing, management of therapy charts such as respiratory therapy, patient emergency
response, patient recovery analysis, emergency alert system, nurse procedure charting, nursing educa-
tion, improvement of human technology interfaces and developing models for integrated patient-care
management.

1.3.2.2 Pharmacoinformatics
AMIA defines pharmacoinformatics as covering all aspects of medication use by patients. Pharmacoinformatics includes research, analytics and the development of computational techniques, including decision support systems, for prescribing, verifying, dispensing, administering and monitoring medications and for educating patients and care providers about them. Prescription support includes streamlining and automating the processes of prescription, administration, verification and billing so that a medication, once prescribed by the physician, is automatically checked for side-effects, duplication and insurance authorization before dispensing, as sketched below.
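
A minimal sketch of such an automated prescription check, assuming Python and illustrative in-memory tables for side-effect conflicts and the insurance formulary, might look as follows; real systems query curated drug-interaction and formulary databases.

# Hypothetical knowledge tables; the drug and allergy names are placeholders.
KNOWN_SIDE_EFFECT_CONFLICTS = {("drug_a", "penicillin_allergy")}
INSURANCE_FORMULARY = {"drug_a", "drug_b"}

def check_prescription(drug, patient_allergies, current_prescriptions):
    """Return a list of issues; an empty list means the drug may be dispensed."""
    issues = []
    for allergy in patient_allergies:
        if (drug, allergy) in KNOWN_SIDE_EFFECT_CONFLICTS:
            issues.append(f"possible side-effect: {drug} conflicts with {allergy}")
    if drug in current_prescriptions:
        issues.append(f"duplication: {drug} is already prescribed")
    if drug not in INSURANCE_FORMULARY:
        issues.append(f"not authorized: {drug} is not covered by the insurer")
    return issues

print(check_prescription("drug_a", ["penicillin_allergy"], ["drug_b"]))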

1.3.3 Patients’ Privacy and Confidentiality


Each patient is an individual, and their state of health is their private information. It cannot be disclosed to any other person or organization unless that person needs to know. Even when a person knows the information, it is provided for a specific purpose and cannot be used for any other purpose.
This regulation protects patients against discrimination by insurance companies, employers and society. For example, an employer may illegally decline to employ a patient with cancer; an insurance agency may refuse to insure a person with a preexisting condition; a community may bar a person with HIV (Human Immunodeficiency Virus) from public places.
The regulation that controls this privacy is HIPAA (the Health Insurance Portability and Accountability Act), passed by the US Congress in 1996. HIPAA protects patients by restricting the use of health information held by entities such as doctors, other healthcare providers and their business associates. The privacy rule permits the disclosure of health information needed for patient care, important legal purposes and national security, including epidemic management.
Maintaining patients' privacy and confidentiality is closely related to how patient-specific data is archived, transmitted, disclosed to third parties not involved in the patient's treatment, and made portable. Data must be secured whether it is archived in computer databases, transmitted electronically or shared among the team of medical practitioners treating a patient.

1.4 COMPUTATIONAL HEALTH INFORMATICS


Health informatics has various computational aspects related to individual patient care such as: 1) the col-
lection of data from medical equipment to electronic databases; 2) archiving, retrieval and transmission of
patient-related data in an automated computer database so it becomes independent of format of any spe-
cific installation; 3) data aggregation and intelligent analysis of data and resource usage for discovering
new diseases and medications, treatment automation and automated patient monitoring; 4) disease diagno-
sis, disease management and drug distribution; 5) statistical data analysis of patients’ recovery data with
similar disease; 6) quantitatively studying the effect of medicines before approval for drug administration
and 7) statistical data analysis of patient signals such as ECG – heart-related signal, EEG – brain-related
signals and EMG – muscle-related signals.
Computational Health Informatics pertains to the use of computers in health informatics. Modeling techniques, computational techniques, algorithms and software development that improve the integration of patient-related medical information, the use of resources for medical care, medical data handling, the effectiveness of medical practitioners, efficiency and the automation of health care all come within the realm of Computational Health Informatics.
Computational Health Informatics is concerned with the development and application of computa-
tional techniques for: 1) health-related data collection from the sensors and other sources and their secure
archiving in medical databases; 2) retrieval of health-related information from the distributed databases
and knowledge bases; 3) secure transmission of health-related information; 4) visualization and intelligent
analysis of health-related data for automated diagnosis and discovery of new knowledge; 5) converting the
data back and forth from a human comprehensible format to structured data format suitable for archiving
in large databases; 6) performing data analysis to improve diagnosis and treatment and 7) remote care of
different types of patients and elderly persons. Lately, pervasive health care, bioinformatics and pharma-
cokinetics have found significant overlap with computational health informatics due to: 1) the extensive
use of computational techniques in these fields; 2) discovery of genomic causes of diseases and abnor-
malities using computational techniques; 3) improved automation in drug and vaccine development and
4) improved remote care using the computational techniques, including intelligent analysis of automatically
collected data, automated tracking of patients, and automated secure transmission and archiving of data.
Computational health informatics requires integration of many computer science subfields such as
database management, computer algorithms, artificial intelligence and machine learning, signal analysis,
image processing, data security and encryption, software engineering and computer networking, includ-
ing wireless and sensor networks, Internet engineering and embedded computing.

1.4.1 Acceptance and Adoption


There are certain key factors for computational health informatics to succeed: 1) acceptance and adoption
by the medical practitioners; 2) compliance with privacy laws; 3) acceptance and adoption by the patients
and their care-providers, including close relatives; 4) cost factor to update and upgrade; 5) compatibility
with the previous system; 6) support for integration with heterogeneous systems and 7) ease of training.
A key factor in developing health-informatics software is not to make doctors and medical practitioners slaves to processes established by software designers and programmers for industrial settings. Rather, for better acceptance and adoption, and to maximize medical practitioners' time, additional effort is needed to develop intelligent software that collects data from medical practitioners in a natural form, such as voice dictation or handwritten text, automatically converts natural language to a structured data format and transforms the structured data format back to natural language. The intelligent software should also interact with medical practitioners and patients as a human counterpart would, including human courtesy and emotional intelligence.

Another key factor is HIPAA compliance: when information is transferred from the central database to an end user, who could be a patient, a healthcare provider or a pharmacy, proper software filters must remove the information not needed by that end user. Due to the privacy constraints, the archived data needs additional security and encryption.
A third important aspect is acceptance by patients and their relatives. Electronic devices are viewed with suspicion for many reasons: 1) violation of privacy; 2) presumed lack of response by the human care provider when an alert occurs; 3) fear of failure to operate at a critical time; 4) cumbersome entanglement and interaction with the human body in terms of form factor (weight and size); 5) technical complexity and lack of standardization, resulting in a learning curve to operate; 6) lack of human courtesy and human-like interaction and 7) lack of empathy, especially for elderly patients. Devices still have a large form factor. Recent wireless sensors are better. However, there is no technology to assess the pain and emotion of the patient just by watching the patient.
Technology is expensive to upgrade and to integrate with the rest of the information system. Because of this limitation, hospitals are slow to upgrade their technology. A new technology also requires training the staff and patients to use it. Unfortunately, due to the lack of standardization and backward compatibility, it is difficult to keep up with changes in technology.

1.4.2 Emulating Human–Human Interactions


Emulating human–human interaction is very important because humans are used to interacting with emotional intelligence and courtesy, and to avoiding repeated details and verification. Another important requirement is to maintain the privacy of patient-specific information in clinical data collection, archiving and transmission. Archiving, processing and transfer of medical data need to maintain patients' privacy. A healthcare provider or an insurance agency should know the patient's condition only on a “need to know” basis, according to the privacy laws of a country.
Traditionally, doctors record dictations on recorders, and the dictations are later converted into textual form. One would like to employ computers and intelligent techniques to understand doctors' dictations and handwriting and convert them into a structured format that can be easily processed by computers. However, different doctors express the same condition of a patient using different medical phrases and words, depending upon their expertise and the intended audience. To understand multiple phrases carrying similar meanings, various medical dictionaries, thesauri and cross-language dictionaries are needed, along with natural language understanding, generation and translation software.

1.4.3 Improving Clinical Interfaces


There are multiple ways for a physician to specify the diagnosis. However, textual descriptions do not
accurately localize the abnormalities. Information about an organ defect can be conveyed by marking
on the appropriate diagram of an organ. There are multiple situations where these visual interfaces are
needed such as: 1) heart abnormality; 2) lung abnormality; 3) kidney abnormality; 4) liver abnormality;
5) spine abnormality; 6) neck muscle abnormality; 7) blood-flow problems in different parts of the body to describe thrombosis (blood clotting); 8) bone fractures and 9) tumor localization in the brain. Such interfaces should automatically be translated into a textual description and archived in a multimedia database.

Example 1.3
Figure 1.2 shows a cross-sectional view into a heart showing various chambers and heart valves. A
physician when prompted with this visual interface can easily mark multiple heart-related abnormali-
ties on a computer screen.

FIGURE 1.2 A cross-sectional view of heart diagram for visual interface (Figure courtesy © Dr. Purva Gawde,
part of her PhD dissertation, used with permission).

1.4.4 Privacy and Security


Secure data collection, archiving and transmission facilitate: 1) remote availability of health care for patients, including elderly care, long-term rehabilitation and chronic disease management; 2) sharing of data between healthcare providers, reducing test duplications and improving the mobility of people; 3) improved data collection using mobile devices to record transient health conditions that cannot be reproduced; 4) integrated real-time care coordination, especially for intensive care unit patients and seriously injured patients, where paramedics can transmit monitored data to the main hospital in real time and 5) portability of data for a patient.

1.5 MOTIVATION AND LEARNING OUTCOMES


The motivation of this course is to prepare the students who want to develop computational techniques
and software in “Computational Health Informatics.” Automation of health informatics requires deep
understanding and integration of various subfields of computer science along with deeper knowledge of
physiological processes and clinical data analysis.
Educators and software developers find a large gap in this fast-evolving automation of the health industry, where definitions of various subfields are still evolving. The market has created many interdisciplinary jobs that integrate health science, data science, computer science and healthcare management, including the application of computers in hospital management.
This book and the related course in “Computational Health Informatics” will reduce the gap by:

1. providing background knowledge of various clinical concepts that need automation;


2. providing knowledge of various computational concepts and intelligent techniques used for the
analysis of clinical data; and
3. providing knowledge of various computational concepts and techniques used in the automation
of electronic archiving, retrieval, transfer and analysis of patient-centered data.

This book is intended for a first course in “Computational Health Informatics,” suitable for senior undergraduate students and fresh graduate students. It is also suitable for researchers in one discipline, such as computer scientists seeking to understand the concepts and issues involved in “Computational Health Informatics,” or physicians getting educated in the computational techniques involved in the automation of healthcare.
This book prepares students and researchers to explore further in this fast-evolving field; it does not provide detailed algorithms and software for the various approaches described throughout the book. Detailed algorithms can be looked up in the cited research articles and explored further.

1.6 OVERVIEW OF COMPUTATIONAL HEALTH INFORMATICS
This section summarizes various subfields of computational health informatics subsequently described
in the following chapters. The scope of computational health informatics keeps evolving. This book
describes: 1) automated data collection and medical databases; 2) seamless integration of medical data
from heterogeneous sources; 3) various medical knowledge bases related to disease and medical pro-
cedures; 4) intelligent and data mining techniques used for data analysis and knowledge discovery;
5) radiology image analysis and transmission; 6) major biosignals such as ECG, EEG and EMG, their role
in the related diseases and their computational analysis; 7) analysis of clinical data and the role of decision
support systems in helping medical practitioners; 8) techniques and issues used in monitoring patients
remotely; 9) computational techniques in analyzing genomic data and 10) pharmacokinetics and related
computational techniques for studying the efficacy of administered drugs.

1.6.1 Medical Databases


Medical data can be clinical data such as radiology images of internal organs and their motion, bone
fractures, pathology, medication, disease-diagnosis and data related to a large voluntary group of
patients being treated for similar abnormalities. Very large databases are used to archive: 1) patient-
related information; 2) medication billing and payment data; 3) multiyear EMR (Electronic Medical
Record); 4) multiyear medical images such as MRI, computerized tomography (CAT) scans or X-ray
images; 5) timed signal and signal analysis data such as ECG, EEG and EMG; 6) videos of surgery
or movement of internal organs and 7) actual voice and written notes by the medical practitioners for
authentication. These records are used for future reference and diagnosis, to avoid duplication when
a patient gets second opinion from another healthcare provider and to save cost. The information is
exchanged regularly between the medical organizations and medical practitioners to facilitate the treat-
ment of patients.
Medical databases have huge memory requirement, and the data should be accessed quickly over the
Internet. Medical images stored in a medical database should have high resolution for accurate diagnosis.

1.6.1.1 Electronic medical records


Different organizations require different types of data. For example, a healthcare provider needs the patient's history, including diseases, surgeries, medications and their side-effects on the patient, dosages, lab tests and their results. In contrast, pharmacies only need to know the name of the doctor,

medicines and their dosages. The billing department only needs to know the procedure codes that the medical practitioner performed, the length of stay in the hospital and the type of treatment (regular or emergency; inpatient or outpatient). Insurance companies need to know the procedure codes, the names of the patients, any duplication of the procedure codes, and whether the procedure codes are allowed. Medical practitioners should be able to query by date, by range of dates and by content-based similarity. This requires innovative information archiving/retrieval techniques.
With the availability of cheap computational power and ever-growing computer networking, it has become possible to put all the health informatics data into large databases and share the information electronically using a secure computer network, either by transmitting it or by remotely accessing a central database. This automation of data requires interoperability among the multiple types of databases used by health organizations. In addition, database records should be carried over the Internet to other organizations, medical practitioners or patients.
EMR is a networked database that contains: personal information of patients, physicians’ infor-
mation, patient’s physician-related information, patients’ history, information about patient-doctor
encounters, including prescription and lab-results, information about patient monitoring, patient-
reminders, information about dispensing of the medicine by the pharmacists, list of pharmacies and
their information, information about prescriptions sent to pharmacies, information about hospital
facilities, information about lab facilities, information about insurance agencies, information about
billing, billing-audit-related information, addresses of the entities (patients, physicians, clinics, hospitals, etc.), treatment eligibility information and drug-related information. The information is related to
each other using primary keys such as patient-ids, provider-id, hospital-id, insurance-id, pharmacy-id,
encounter-id, etc. Each of these information components has multiple fields. For example, patient information includes fields such as (name, gender, title, occupation, employer, patient-id, social security number, driver's license, date of birth, new/repeat status, address, phone(s), insurance company, insurance type, emergency contact(s), etc.). Figure 1.3 shows an interconnection of a subset of these information
components related to an electronic health record (EHR) database. The textboxes show different rela-
tional tables.
The relational tables are connected to other relational tables through shared keys. For example, the textbox “patient personal information” contains the set of information for each individual patient; the textbox “physician information” contains the set of information for each individual physician; and “pharmacy information” contains the set of information for each individual pharmacy. “Patient-physician encounter” is a relational table containing all the information about an appointment of a patient with a physician. The edges between the relational tables show their connections through a common field that uniquely identifies individual records (tuples) in at least one of the relational tables.

FIGURE 1.3 A small subset of EHR database connectivity



For example, “patient-id” is the unique id for the relational table “patient's personal information,” and “physician-id” is the unique id for the relational table “physician information.” The field “patient-id” connects the relational table “patient-physician encounter” with the relational table “patient's personal information” so that records related to a patient can be retrieved. Similarly, the relational table “patient-physician encounter” is connected to the relational table “physician information” so that physician-related information can be retrieved. Alternatively, we can answer a query such as: which patients has a physician seen over a given period?
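
The key-based connections described above can be sketched with two relational tables and a join query. The schema below is an illustrative fragment using Python's built-in sqlite3 module, not the actual EHR schema of Figure 1.3.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (patient_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE encounter (encounter_id TEXT PRIMARY KEY,
                        patient_id TEXT REFERENCES patient(patient_id),
                        physician_id TEXT, encounter_date TEXT);
""")
conn.executemany("INSERT INTO patient VALUES (?, ?)",
                 [("P-001", "Alice"), ("P-002", "Bob")])
conn.executemany("INSERT INTO encounter VALUES (?, ?, ?, ?)",
                 [("E-1", "P-001", "D-9", "2019-05-01"),
                  ("E-2", "P-002", "D-9", "2019-06-12")])

# Which patients has physician D-9 seen over a given period?
rows = conn.execute("""
    SELECT DISTINCT p.patient_id, p.name
    FROM encounter e JOIN patient p ON e.patient_id = p.patient_id
    WHERE e.physician_id = ? AND e.encounter_date BETWEEN ? AND ?
""", ("D-9", "2019-01-01", "2019-12-31")).fetchall()
print(rows)   # [('P-001', 'Alice'), ('P-002', 'Bob')]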
Medical databases can be geographically separated as each hospital has its own proprietary informa-
tion about the patients that cannot be shared with others, other than the patient, without proper authoriza-
tion as permitted by HIPAA. The cross-references about the same patient-ids are stored in a centralized
database so that the records from other hospitals can be retrieved with no duplications as described in
Section 3.3.2.
The advantages of EMR are: 1) instant access to integrated data to avoid duplication of the tests;
2) data-archiving for a long period to study the recovery of the patients; 3) analysis of disease specific data
to identify patterns of parameters that can cause the diseases; 4) data analytics to study the effectiveness,
toxicity (harmful effects) and proper dosage of medications; 5) integration of real-time monitoring of the
patients’ vital signs, lab results, diagnosis and medicine dissemination and 6) providing remote health
care to elderly patients. The overall effect is to reduce the cost of medical care while providing optimum use of resources, measured as patients served per healthcare provider, patients treated per hospital bed and patients served per nurse.

1.6.1.2 Information retrieval issues


Medical databases contain a huge amount of heterogeneous data. Database records are complex with
multiple values, and they are tagged with a unique key (primary-key). Database records are stored in the
secondary storage. Images like X-rays have two dimensions; MRIs have three dimensions; and data characterized by N features (N > 1) has N dimensions. Computational health informatics deals with images, large records and N-dimensional feature spaces because diseases, health conditions and biomarkers are characterized by N variables.
Health informatics requires similarity-based comparison to find similar radiology images, or similar feature values in pathology reports that indicate similar disease states. Hence, any content-based search mechanism should incorporate similarity-based matching. A search can also be initiated using temporal logic, such as “ECG two years before myocardial infarction.” Hence, the search should be capable of handling abstract temporal information and matching record times against the query using temporal abstraction.
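
A sketch of such a temporally abstracted query, assuming Python and an illustrative list of timestamped records for one patient, might filter ECG records relative to a diagnosis event as follows.

from datetime import date, timedelta

# Hypothetical timestamped records for one patient
records = [
    {"type": "ECG", "date": date(2015, 3, 10)},
    {"type": "ECG", "date": date(2016, 7, 2)},
    {"type": "diagnosis", "name": "myocardial infarction", "date": date(2017, 6, 1)},
]

def ecg_before_event(records, event_name, years_before, tolerance_days=180):
    """Return ECG records taken roughly `years_before` years before the named event."""
    event_dates = [r["date"] for r in records
                   if r["type"] == "diagnosis" and r.get("name") == event_name]
    hits = []
    for event_date in event_dates:
        target = event_date - timedelta(days=365 * years_before)
        for r in records:
            if r["type"] == "ECG" and abs((r["date"] - target).days) <= tolerance_days:
                hits.append(r)
    return hits

print(ecg_before_event(records, "myocardial infarction", years_before=2))
# -> the ECG from March 2015, roughly two years before the event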

1.6.1.3 Information de-identification


A patient's treatment is related to the patient's background and history and to knowledge about medications and medical procedures. The process of knowledge formation requires statistical analysis, and statistical analysis is not patient specific. Hence, all information that can identify an individual patient needs to be removed before statistical analysis. Such information includes the patient's name, address, geographical location, age, gender, ethnicity and medical history. This is important because such information can be unlawfully used against a specific patient by hospitals, employers, insurance agencies and individual communities. Techniques have been developed to: 1) remove identification from the patient's data before the data is statistically analyzed; 2) prohibit unauthorized access to and updates of patients' data and 3) ensure the authenticity of data before updating the medical databases. The topic has been
discussed in detail in Section 4.10.4.
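
A minimal de-identification sketch in Python is shown below; the list of identifying fields is illustrative, whereas the HIPAA Safe Harbor rule enumerates a longer official list of identifiers.

import copy

# Illustrative set of direct identifiers to strip before statistical analysis.
IDENTIFYING_FIELDS = {"name", "address", "location", "date_of_birth", "ssn", "phone"}

def de_identify(record):
    """Return a copy of the record with direct identifiers removed
    so that it can be pooled for statistical analysis."""
    clean = copy.deepcopy(record)
    for field_name in IDENTIFYING_FIELDS:
        clean.pop(field_name, None)
    return clean

record = {"name": "Alice", "ssn": "123-45-6789", "address": "1 Main St",
          "diagnosis": "type 2 diabetes", "hba1c": 7.9}
print(de_identify(record))   # only clinical fields remain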

FIGURE 1.4 A schematic of trust-management system for maintaining HIPAA

1.6.1.4 Maintaining patient privacy


HIPAA can be violated by unauthorized access to the database. The database should be protected against: 1) illegal accesses; 2) hackers and 3) unauthorized access within a hospital setting, where the doctors on duty keep changing with the shift. If a doctor is not on duty, HIPAA compliance prohibits him/her from seeing patient-specific data.
To comply with HIPAA in a hospital situation where the doctor-on-duty keeps changing, a person should be authenticated as the current healthcare provider by a trust-management unit. The doctor-on-duty can open the documents after an authentication key is supplied to him/her. This privilege is either kept dynamic or limited to certain monitors where the doctors can view the patient-specific data. The overall schema is shown in Figure 1.4, and a minimal authorization check is sketched below.
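
A minimal sketch of such a trust-management check, assuming Python and a hypothetical in-memory duty roster, is shown below; a real trust-management unit would consult the hospital's scheduling and authentication systems.

from datetime import datetime

# Hypothetical duty roster mapping (doctor, patient) to an on-duty time window.
DUTY_ROSTER = {
    ("D-9", "P-001"): (datetime(2020, 1, 1, 8), datetime(2020, 1, 1, 20)),
}

def may_view_record(doctor_id, patient_id, when=None):
    """Grant access only if the doctor is currently on duty for this patient."""
    when = when or datetime.now()
    window = DUTY_ROSTER.get((doctor_id, patient_id))
    return window is not None and window[0] <= when <= window[1]

print(may_view_record("D-9", "P-001", datetime(2020, 1, 1, 10)))   # True
print(may_view_record("D-9", "P-001", datetime(2020, 1, 2, 10)))   # False: shift is over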

1.6.1.5 Standardized medical knowledge bases


Knowledge bases contain, besides basic databases, rules that can derive additional information by combining two or more database relations. Different types of medical knowledge bases interact with each other. Each hospital and provider's office uses large standardized knowledge bases of:

1. Procedures for the diagnosis, treatment, surgeries and billing that require efficient compression
and transmission over the Internet;
2. Medicines administered to patients in different diseases;
3. Synonyms and antonyms used for automated text analysis so that similarity between two seem-
ingly different texts can be understood; and
4. Images and ECG histories of patients to compare the progression/remission rates of diseases.

1.6.1.6 Automated data collection


There are multiple sensor devices that collect patients’ vital sign data, perform automated analysis of
lab-data and create radiology images. This data should be automatically archived into a medical data-
base. Each hardware device has a software-based virtual device that controls the hardware operations
and collects sensor data from the electronic circuitry. Laboratory test results, data collected by virtual devices and hospital information systems are linked using a common clinical interface format. The associated software infrastructure is called the Clinical Enterprise Service Bus (or Health Enterprise Service Bus). It links different virtual devices and medical databases, each having a different standard and format. The adapters transform data back and forth between the common format, the data formats of individual virtual devices and the structured data suitable for archiving in a medical database. There are multiple types of adapters providing these data transformations, as illustrated in Figure 1.5.

FIGURE 1.5 A schematic for automated data acquisition using health service bus
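
The adapter idea can be sketched as a pair of translation functions around an assumed common format; the field names below are illustrative and do not correspond to any real device or bus standard.

# Hypothetical vendor-specific reading from a virtual pulse-oximeter device
vendor_reading = {"dev": "oxi-42", "ts": "2020-01-01T08:00:00", "spo2_pct": 97}

def to_common_format(reading):
    """Device adapter: vendor format -> assumed common bus format."""
    return {"device_id": reading["dev"],
            "timestamp": reading["ts"],
            "signal": "SpO2",
            "value": reading["spo2_pct"],
            "unit": "%"}

def to_database_record(message, patient_id):
    """Database adapter: common bus format -> structured archival record."""
    return (patient_id, message["device_id"], message["timestamp"],
            message["signal"], message["value"], message["unit"])

msg = to_common_format(vendor_reading)
print(to_database_record(msg, "P-001"))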

1.6.2 Medical Information Exchange


Medical databases are stored in different organizations that use a variety of software developed by various vendors. These software packages are incompatible due to differences in data formats. This incompatibility necessitates a common platform and corresponding software to convert data between formats, supporting information exchange as follows: format1 ↔ common platform ↔ format2. To avoid duplication of patients' data, there must be a mechanism to cross-reference the records of the same patient archived in different institutions.
Many organizations are involved as end-points of data transfer: a hospital, a clinic (including a remote clinic), a health provider's office, a patient's computer or mobile device, a mobile computer on the battlefield, a pharmacy, a billing office or an insurance company. All these end-points are connected to each other over the Internet using a secure communication protocol. This communication pathway is called a Health Information Bus or Medical Enterprise Bus, and it is similar to the bus inside a computer, where different peripheral devices are connected through a common bus.

1.6.2.1 Standards for information exchange


Internet-based transmission is based upon XML (eXtensible Markup Language) based tools such as SOAP (Simple Object Access Protocol), which embed the patient-related information for transmission to the destination, where the information is extracted again. XML is an intermediate-layer Internet-based language for information storage and computation, based upon encoding nested data into a flat format using user-defined tags. Using XML, complex relational databases (databases expressed as a network of two-dimensional relational tables), image databases and complex graph structures can be encoded and transmitted over the Internet with ease, and can be interfaced with other web-based languages such as JavaScript, Java, Python and C#. SOAP is an XML-based communication protocol that acts as an envelope for transmitting information from a source to a destination.
FIGURE 1.6 A schematic of Internet-based medical information exchange using HL7 format (sender and receiver exchange HL7 messages at the application layer, adapters and a web interface operate at the web layer, and SOAP messages travel at the transport layer over a secure Internet transmission protocol)

Current technology uses Java-based or XML-based relational/object-based databases. Medical data are transmitted over the Internet using wrapper languages. Many medical XML-based wrappers have been developed for transferring medical data securely between two health organizations, or between a health organization and a billing organization.
HL7 (Health Level 7) is a clinical markup standard that specifies the structure and semantics of the clinical documents to be transferred over the Internet. Information from a database is embedded in the HL7 format using interface software at the source end and is converted back at the destination end using software. HL7 is built on top of the SOAP protocol. A common interface and transmission standard provides interoperability between heterogeneous databases and knowledge bases implemented by different vendors across different medical organizations. This interoperability eases information transfer and cuts cost by avoiding duplication. The overall transmission layering is shown in Figure 1.6.
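
The envelope idea can be illustrated with Python's standard xml.etree.ElementTree module; the element names below are simplified placeholders and are not the actual HL7 or SOAP schemas.

import xml.etree.ElementTree as ET

def wrap_clinical_message(patient_id, observation, value):
    """Build a simplified SOAP-style envelope carrying a clinical payload."""
    envelope = ET.Element("Envelope")
    body = ET.SubElement(envelope, "Body")
    doc = ET.SubElement(body, "ClinicalDocument")   # placeholder, not real HL7
    ET.SubElement(doc, "PatientId").text = patient_id
    obs = ET.SubElement(doc, "Observation", {"name": observation})
    obs.text = str(value)
    return ET.tostring(envelope, encoding="unicode")

xml_text = wrap_clinical_message("P-001", "blood_pressure_systolic", 128)
print(xml_text)

# The receiving end parses the flat XML back into structured data.
root = ET.fromstring(xml_text)
print(root.find("./Body/ClinicalDocument/PatientId").text)   # P-001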

1.6.2.2 Types of connectivity


There are different types of communication networks to transmit the data:

1. Medical Information Bus (MIB), which is used to send sensor data to a central monitoring station, where the data can be analyzed for any emergency condition and visualized by the nurse-on-duty.
MIB is used to gather data from the sensors used to monitor the patients post-surgery, in inten-
sive care units and during the surgery. Different sensors are developed by different vendors
using their own proprietary data-format. The task of the MIB is to provide a common data-
format for information exchange for data coming from different sensors. The advantage of MIB
is that devices from different vendors become plug-and-play. The current standard for MIB is
IEEE 11073, and it has seven layers: physical layer, data-link layer, network layer, transport
layer, session layer, presentation layer and application layer. The communication standard is
based upon an intermediate XML message-based language called MDDL – Medical Device
Data Language. Each physical device is described as “Virtual Medical Device” (VMD) in
MDDL. MDDL codes for: i) medical devices; ii) different types of alerts and iii) units for mea-
surements. More details about clinical interfaces are given in Sections 3.1.3–3.1.5.
2. Communication between different units within the same organization with homogeneous data-
bases is done using HL7 over the Intranet or secure cloud.

3. Communication between different organizations having heterogeneous databases is done using HL7 and various adapters (interfaces). The structured data in a database is converted to HL7 format at the source end, and is retrieved and converted back to structured data at the destination end, as illustrated in Figure 1.6.
4. An enterprise bus or cloud is used to communicate between different types of organizations such as hospitals, data warehouses, pharmacies, patients, insurance agencies, data analytics centers, academic research centers and healthcare providers outside the hospital network. The advantage of using a cloud is that a large amount of data, including images, can be stored in the cloud and accessed on demand. There are different types of cloud: 1) a secure private cloud within the firewall and 2) public clouds. An important issue is interfacing the information from secure clouds within the firewall with the public cloud outside the firewall. To protect the privacy and confidentiality of data going out of the secured private cloud, the data must be sufficiently encrypted, and it can only be decrypted at the destination after verifying the encryption key or a password linked to the encryption key (a minimal encryption sketch follows this list). Some of these authentication needs are dynamic. For example, if a healthcare provider has stopped providing care to a patient, then he or she should no longer be able to see the patient-related data.
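
A minimal sketch of encrypting a record before it leaves a private cloud, assuming the third-party Python package cryptography, is shown below; key distribution and user authentication are outside its scope.

# Requires the third-party package: pip install cryptography
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # shared secret, distributed out of band
cipher = Fernet(key)

record = b'{"patient_id": "P-001", "diagnosis": "hypertension"}'
token = cipher.encrypt(record)       # safe to transmit through a public cloud

# Only a destination holding the key (e.g. an authorized provider) can decrypt.
print(cipher.decrypt(token))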

1.6.3 Integration of Electronic Health Records


Multiple hospitals and medical datamarts hold patients’ data and have ownership rights on these data. A
patient goes to more than one hospital, medical practitioner, or pharmacy and may have more than one
insurance company that pays for their bills. All these institutions need to access relevant patient-related
information to treat the patient and handle billing information. There are many issues in integrating the
records of patients such as: 1) problem of heterogeneity and interoperability; 2) patients being indexed
differently by different health organizations; and 3) use of different incompatible software tools from dif-
ferent vendors by the health organizations.
Heterogeneity is caused by the use of different operating systems, multiple incompatible standards and incompatible data formats to archive and process data. Due to this incompatibility in data formats, there is a need for: 1) interface software that converts data from one format to another to make it understandable across organizations and 2) middleware like Java (and the Java Virtual Machine) or XML-based standard formats, which have become the basis for transmitting data from one organization to another.

1.6.3.1 Accessing from heterogeneous databases


Another major problem in implementing an integrated record system is that health organizations have different indexing systems to access patients' records in their databases. Besides HIPAA privacy restrictions and health organizations' ownership rights, the different indexing systems make it very difficult to share the record of the same patient across organizations. There must be one unique index for each patient across multiple organizations to share the patient information. To solve this problem of cross-sharing patients' information, two types of indexes are created: 1) a unique index for a patient and 2) an index within each organization. The unique index is cross-referenced with the indexes within the organizations.
To transfer information from one organization to another, the following steps are taken: 1) the index
of the requesting organization is converted to the unique index using a cross-reference table stored in a
centralized database in a data warehouse; 2) this unique index is converted to the index of the patient in
the second health organization supplying the patient information; 3) the information is retrieved; 4) infor-
mation is converted into a common format and 5) at the requesting organization, information is converted
back to the structured format and stored in the local database or viewed. Information can be sent directly
between the organizations or through the datamart.

FIGURE 1.7 A scheme of data sharing between two heterogeneous databases

Figure 1.7 shows a schema for information exchange using the unique patient index. The two databases correspond to the requesting party and the sending party. Each database is connected through incoming and outgoing data interfaces to the cross-reference database, which stores patients’ unique patient-ids, while the database of an individual health organization stores its own local indexing for the patients.
The cross-reference database has two types of tables: 1) a table to look up the unique patient-index given a local index and 2) a table to look up the local patient-id given the unique patient-index. Using these tables, the local patient-id from the requesting database is converted to the local patient-id of the sending database and vice versa. The outgoing data interfaces convert the local data format to a common data format such as HL7, and the incoming data interfaces convert the HL7 format back to the local data format.
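A minimal sketch of this cross-referencing, with hypothetical organization names, patient-ids and table contents, can be written as follows:

# A minimal sketch of unique-index cross-referencing between two organizations.
# Organization names, patient-ids and table contents are hypothetical.

# Table 1: (organization, local patient-id) -> unique patient-index
local_to_unique = {
    ("HospitalA", "A-1021"): "U-900017",
    ("HospitalB", "B-77"):   "U-900017",   # same patient, different local id
}

# Table 2: unique patient-index -> {organization: local patient-id}
unique_to_local = {
    "U-900017": {"HospitalA": "A-1021", "HospitalB": "B-77"},
}

def resolve_remote_id(req_org, req_local_id, sending_org):
    """Convert the requesting organization's local id to the sending
    organization's local id via the unique patient-index."""
    unique_id = local_to_unique[(req_org, req_local_id)]   # step 1: local -> unique
    return unique_to_local[unique_id][sending_org]         # step 2: unique -> remote local

# Example: HospitalA asks HospitalB for the record of its patient A-1021.
print(resolve_remote_id("HospitalA", "A-1021", "HospitalB"))   # -> B-77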

1.6.3.2 Heterogeneity and interoperability


Institutions use software developed by different vendors. There is no consensus on a standard format
between vendors for many reasons such as: 1) competition; 2) evolution of the standards based upon the
improvement in the technology; 3) variations in the requirements of different institutions; 4) variations
in how different medical practitioners provide the medical information and diagnostics; and 5) privacy
laws needed to protect patients’ privacies. Attempts are regularly made to develop a consensus format for
middle-ware such as medical procedure code, lab-test code, disease code, medical information exchange
format and medicine code. However, these formats also evolve as the systems and procedures continu-
ously change in medical industry, and new information is created.
Another problem is the difference in perception between software developers and medical practitioners. Current software tools and interfaces are not sufficiently intelligent, and are template-driven. This creates adoption problems for these tools. The tools are altered based upon feedback from the medical practitioners. However, this causes multiple incompatible versions of formats and libraries across the medical industry. Incompatibility in documents is also caused by doctors using different words and phrases to describe the same medical condition to different audiences (expert vs patient), and by recommendations and medicines that have the same basic chemical/biochemical compound but are sold under different brand names. Wordings also change with a change in the medical domain.

To solve these problems of incompatible operating systems, standards, formats and language libraries, and of the multitude of health vocabularies with similar meanings, intelligent knowledge-based software is needed to interface records and to develop dictionaries that provide interoperability, and intelligent software is needed to analyze documents by identifying and matching the meaning of sentences and phrases.

1.6.4 Knowledge Bases for Health Vocabulary


We need multiple dictionaries to standardize health vocabulary such as medical procedures, including
surgical procedures, medicines, disease states and cross-referencing tables to identify that two names refer
to the same entity. There are many popular dictionaries such as LOINC (Logical Observation Identifiers
Names and Codes), MedDRA (Medical Dictionary for Regulatory Activities), SNOMED (Systematized
Nomenclature of Medicine) and ICD (International Classification of Diseases).

1.6.4.1 LOINC
LOINC is a dictionary of universal code names for medical terminology related to EHRs. The use of
universal standardized codes facilitates the electronic exchange of medical information such as medical
procedures, including surgical procedures, lab-tests, devices used in lab-tests and clinical observations.
Data-exchange standards such as HL7 and IHE (Integrating the Healthcare Enterprise) interface with
LOINC codes. More details of LOINC have been described in Section 4.8.3.

1.6.4.2 MedDRA
MedDRA is a tool to encode and communicate the pharmaceutical terminologies related to the clinical
tests of drugs, vaccines and drug-delivery devices under development. It covers the tests and outcomes
starting from the clinical stage up to the marketing stage. Much of the clinical information, such as diseases and disorders, observed signs and symptoms, drug efficacy (effectiveness of the drug), side-effects, adverse events, social relationships and family history, is encoded and communicated using MedDRA.
These data are pooled, analyzed, compared and verified using standardized data analytics tools. The
medical terminology of MedDRA is extensive, and is structured hierarchically.
The advantage of MedDRA is to provide encoding free from language and cultural barriers. The
results and outcomes are shared among the clinical researchers. It is a rich, highly specific, hierarchical,
medically oriented and rigorously maintained terminology designed to meet the needs of drug regulators
and the pharmaceutical industry as a shared international standard.

1.6.4.3 SNOMED
SNOMED CT (Systematized Nomenclature of Medicine − Clinical Terms) describes universal codes
for medical terms such as medicine, procedures, microorganisms, diseases, synonyms, anatomy of
where a disease occurs, functions and structure of medicines, chemical agents or microorganisms caus-
ing a disease, chemical name of the drugs, disease diagnosis, devices and activities used in treating
diseases and social relationships associated with disease conditions. SNOMED codes are transmitted
over the Internet using exchange standards such as HL7. The coding structure of SNOMED has been
discussed in Section 4.8.2.

1.6.4.4 ICD
ICD is a world standard for the classification of diseases, their symptoms, their diagnostics, abnormal
findings, origin and spread, complaints and social circumstances. It is supported by WHO (World Health
Organization), and is used worldwide to collect statistics of the treatment, symptoms and fatalities caused
by various diseases. ICD keeps getting updated as the medicines, treatments and medical procedures get
updated. The current version is ICD-10. More details of ICD codes and their structure are discussed in
Section 4.8.4.

1.6.5 Concept Similarity and Ontology


Ontology is described using a hierarchical networked structure (directed acyclic graph) involving classes, instances, attributes, relations, semantics within a domain, restrictions and events that may affect the entity attributes and relations. Ontology is used to identify equivalent terms or phrases that convey a similar meaning, even though they may be stated differently depending upon the context.
Given two separate databases from different medical sources, an ontological structure needs to be
created to relate the concepts within the databases. Some terms may define a concept that may be sub-
sumed by another term using subclass relationship, while a combination of two terms may define a con-
cept that has not been explicitly stated. This hierarchical (directed acyclic graph) structure is an integral
part of ontology. Using transitivity between various relationships, two terms are related to each other as
illustrated in Example 1.5 and Figure 1.8.
Ontology has been widely used to relate multiple heterogeneous medical databases and knowledge
bases to each other by first transforming databases to ontological structure, and then equating two different
looking terms using this hierarchical ontological structure.

Example 1.4
The word “growth” in the context of tumor diagnosis will mean malignancy of the tumor, and will be
a cause of much anxiety. However, growth within the context of a child will be a healthy and welcome
aspect. Multiple medical terminologies with similar meanings are used within the same dictionary and across different dictionaries.

Example 1.5
A cardiologist describes to a physician that a patient has “arrhythmia” (a condition describing an irregular beat-pattern of the heart); another cardiologist describes it as “tachycardia”; a third cardiologist describes it to the patient as “fast heartbeat”; and the first cardiologist describes it to another cardiologist as “ventricular arrhythmia.”

FIGURE 1.8 An illustration of hierarchical structure for the ontology in Example 1.5

A close scrutiny of the phrases shows that all four conditions are related semantically using a hierar-
chical network. The phrase “fast heartbeat” is related to “tachycardia” using the relation is-meaning-of;
the entity “tachycardia” is related to the entity “arrhythmia” using the subclass relation is-a; the entity
“ventricular arrhythmia” is related to the entity “arrhythmia” using the subclass relationship is-a; the entity
“tachycardia” is related to the entity “ventricular arrhythmia” using a relationship can-be. A hierarchical
structure with entities as nodes and relations as edges relates all four terms using transitivity of relations as
illustrated in Figure 1.8.
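A small sketch of the hierarchy in Example 1.5, represented as a directed graph with the relations above, illustrates how transitivity relates two differently worded terms; the traversal code is illustrative, not a standard ontology API:

# Minimal sketch of the ontology in Example 1.5 as a directed graph of relations.
edges = {
    "fast heartbeat":         [("is-meaning-of", "tachycardia")],
    "tachycardia":            [("is-a", "arrhythmia"), ("can-be", "ventricular arrhythmia")],
    "ventricular arrhythmia": [("is-a", "arrhythmia")],
}

def related(term1, term2):
    """Return True if term2 is reachable from term1 by following relations transitively."""
    stack, seen = [term1], set()
    while stack:
        term = stack.pop()
        if term == term2:
            return True
        if term in seen:
            continue
        seen.add(term)
        stack.extend(target for _, target in edges.get(term, []))
    return False

print(related("fast heartbeat", "arrhythmia"))   # True: related via "tachycardia"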
In Section 1.6.4, many standards were described. These standards have overlapping domains. One
database may use medical codes using one standard such as LOINC while the other may use SNOMED.
Ontology will relate the two terms from different databases or match two natural language descriptions.
More details about ontology description and their role in integrating information in the heterogeneous
databases are described under Section 3.9.

1.6.6 Interfaces
When operating in a heterogeneous environment in a culturally diverse country, the data has to be inter-
faced to: 1) patient; 2) medical practitioners, including specialists, surgeons, nurses, pharmacists, para-
medics and radiologists; 3) billing personnel; 4) appointment and social interaction staff and 5) data
analysts and academic researchers.
The data have multiple types such as patients’ family history, encounter with the medical practitio-
ners, medication history, diagnosis from the symptoms, history of the symptoms, pre- and post-history
radiology images, billing amounts along with codes. The data is exchanged between multiple hetero-
geneous databases (often placed behind the firewalls) and between medical practitioners using differ-
ent medical terminology based upon their specialization. This information exchange requires medical
interfaces.
Medical interfaces provide: 1) a seamless transition of image and data between heterogeneous data-
bases and 2) interaction between different actors and the data collection system. The second type of
interface is a user interface and varies based upon the actor (patient, doctor, nurse), type of disorder (eye
disorder, heart disorder, lung disorder, various types of fractures, and brain tumors) and type of interven-
tion (biopsy, surgery, simulation of surgery, etc.).

1.6.6.1 Visual interfaces


The major idea is to provide a natural and effortless interaction with the actor. For example, it would be much easier for a heart specialist to see an image of a heart with arteries and mark the blockage at the exact spot instead of giving just a textual explanation. Similarly, a patient would like a simple template-based interface to provide information about themselves. A surgeon would like to know the exact location of the disorder to plan the surgery, and may practice using a simulation with a realistic three-dimensional interface. The image-based interface should have the capability to zoom and to show one or more views so that the region of interest can be accurately located.

1.6.6.2 Natural language interfaces


Physicians and patients would like to understand the diagnosis in natural language, and patients would like to understand the diagnosis in simple terms devoid of complex medical jargon. That means there is a need to develop user interfaces that accept clinicians’ natural language speech, parse the sentences and translate them into structured data in the database that can be used by computer applications. Similarly, after retrieving information from a database, the user interfaces need the capability to generate an equivalent context-aware natural language description.

An important shortcoming of present-day user interfaces is the limited understanding of human–computer interaction and of the cognitive aspects of human comprehension in information presentation and collection. Cognitive science studies human comprehension and performance, and will help in developing better user interfaces. The development of interfaces requires middleware tools that integrate well with image-based representations and with web-based and/or cloud-based databases.

1.6.7 Intelligent Modeling and Data Analysis


Artificial intelligence (or computational intelligence) simulates intelligent reasoning by integrating mathematical techniques (such as probabilistic reasoning, heuristic reasoning and statistical reasoning) and computational techniques (search, deductive reasoning and machine learning). The major advantages of computational intelligence are: 1) reduced computation time for hard problems with multiple constraints that cannot be handled easily using traditional algorithmic techniques; 2) pruning the search space to yield solutions efficiently; 3) providing the most plausible solution and explanation by storing the deduction steps; 4) analyzing data and learning from solution steps to improve performance and 5) analyzing data to discover new knowledge.
Healthcare data analytics means: 1) analysis of lab results for disease diagnosis and recovery;
2) analysis of the lab results for better diagnosis; 3) analysis of the digitized sensor data in the emer-
gency room, during surgery and post-surgery data to identify any alert condition; 4) analysis of medication
response patterns to identify the optimum dosage that causes minimal toxicity − harmful effects on the
body; 5) analysis of medical images such as MRI, CAT (computerized axial tomography) scan, X-ray, PET
(positron emission tomography) scan to identify the abnormalities and diseases in vital human organs with
minimum invasiveness; 6) text analysis of doctors’ notes to store the relevant medical data in the electronic
medical database and 7) development of various user interfaces and adapters that transform data from one
format to another format to provide compatibility between distributed databases and knowledge bases.
Artificial intelligence techniques have been applied to: 1) predict patients’ recovery; 2) predict effective medicine dosage; 3) identify subclasses of heart diseases using ECG analysis; 4) automatically mine clinical data to identify association patterns; 5) automatically understand a doctor’s speech; 6) automatically analyze and match textual diagnoses and notes written by physicians; 7) automatically identify cancer and other malignancies by analyzing X-rays, CAT scans, MRIs and ultrasounds, even before the malignancies become painful and harmful; 8) discover new biomarkers indicative of complex diseases using data-analytic techniques, including statistical analysis; and 9) discover the genetic causes of diseases.
The purpose of using computational intelligence is to: 1) automate the data analytics; 2) provide bet-
ter accuracy in identifying the causes and outcomes for diseases and recoveries; 3) identify meaningful
patterns that will promote health care, including remote care; 4) provide an interface between the end
users and the EHR for time-efficient performance; 5) provide compatibility between various data-formats
and 6) predict the effect of a treatment on the patient’s survivability based on clinical trials.
Intelligent Data Analytics (IDA) utilizes multiple artificial intelligence techniques such as uncertainty-
based reasoning, temporal abstractions, probability-based reasoning, Markov models, including Hidden
Markov Models (HMM), Bayesian networks, Fast Fourier Transforms (FFT), fuzzy reasoning, clustering,
regression analysis, speech recognition, natural language understanding, data mining and visualization
tools to automatically (or semi-automatically) identify meaningful patterns embedded in medical data that
benefit health management, including diagnosis, patient recovery, prognosis and improving treatment.

1.6.7.1 Hidden Markov model


HMM is a probabilistic abstract transition machine whose state changes probabilistically with time from one state to one of multiple states. In an HMM, the machine states are not directly observable. Instead, the state sequence is probabilistically derived from the measured evidence (the observations). The probability of
transition is derived by statistical analysis of a large sample set of examples with known outcomes. HMM is described in Section 3.7.3. It has been used to model many phenomena where time-series data involving periodic measurement of values are available. Some applications in computational health informatics are ECG analysis, recovery response to medication, speech recognition, natural language understanding, gene detection during genome analysis, etc.
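A minimal sketch of HMM inference is shown below; the two states, the observation symbols and all probabilities are made-up values, and the forward algorithm computes the likelihood of an observation sequence under the model:

import numpy as np

# Hypothetical two-state HMM for beat rhythm; states and probabilities are illustrative only.
states = ["normal", "arrhythmic"]
start  = np.array([0.8, 0.2])                  # initial state probabilities
trans  = np.array([[0.9, 0.1],                 # P(next state | current state)
                   [0.3, 0.7]])
emit   = np.array([[0.7, 0.2, 0.1],            # P(observation | state), 3 observation symbols
                   [0.1, 0.3, 0.6]])

def forward_likelihood(observations):
    """Forward algorithm: probability of the observation sequence under the model."""
    alpha = start * emit[:, observations[0]]
    for obs in observations[1:]:
        alpha = (alpha @ trans) * emit[:, obs]
    return alpha.sum()

print(forward_likelihood([0, 0, 2, 2]))        # likelihood of an observed symbol sequence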

1.6.7.2 Uncertainty-based reasoning


Uncertainty-based reasoning associates a qualitative uncertainty factor with an outcome, given an input, where the same input may lead to multiple outcomes. Uncertainty-based reasoning is important because an observed phenomenon can have multiple outcomes, and all the input parameters or the mapping function between the input parameters and the outcome may not be known. Uncertainty-based reasoning is more than rule-based reasoning. It differs somewhat from probability-based reasoning because the uncertainty values may not add up to 1.0, whereas probability is based on the statistical analysis of a large sample of data.
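One classical way to operationalize such uncertainty factors is the MYCIN-style certainty-factor combination sketched below; this is an illustrative formulation rather than the specific scheme developed later in the book, and the values are hypothetical:

# One classical formulation of uncertainty-based reasoning (MYCIN-style certainty
# factors); shown only as an illustration, and the rule certainties are hypothetical.
def combine_cf(cf1, cf2):
    """Combine two certainty factors (both assumed in [0, 1]) supporting the same conclusion."""
    return cf1 + cf2 * (1.0 - cf1)

# Two independent rules each suggest the same diagnosis with partial certainty.
print(combine_cf(0.6, 0.5))   # 0.8: combined belief, which need not sum to 1 with alternatives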

1.6.7.3 Fuzzy logic


Fuzzy reasoning divides a range of values into a small, finite number of perceptive values. The advantage of fuzzy logic is that it is based upon approximate human perception and cuts down the search space significantly. Fuzzy logic is useful because not all the parameters of a phenomenon are known, and measurements are not always accurate. Fuzzy logic is used during patient–doctor interaction to describe the medical conditions of a patient qualitatively. For example, a patient may state that, on a scale of 1...10, the pain is eight. It is described in more detail in Section 3.3.3.
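A minimal sketch of fuzzifying such a 1...10 pain score into a few perceptive categories is shown below; the category boundaries and membership shapes are assumptions made only for illustration:

# A minimal sketch of fuzzifying a 1..10 pain score into perceptive categories.
# The category boundaries and membership shapes are illustrative assumptions.
def fuzzify_pain(score):
    """Map a numeric pain score to fuzzy membership degrees for three categories."""
    mild     = max(0.0, min(1.0, (4 - score) / 3))        # full below 1, zero above 4
    moderate = max(0.0, 1 - abs(score - 5) / 3)           # peaks at 5
    severe   = max(0.0, min(1.0, (score - 6) / 3))        # zero below 6, full at 9 and above
    return {"mild": round(mild, 2), "moderate": round(moderate, 2), "severe": round(severe, 2)}

print(fuzzify_pain(8))   # {'mild': 0.0, 'moderate': 0.0, 'severe': 0.67}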

1.6.7.4 Bayesian probabilistic network


A Bayesian network is used to model a probabilistic phenomenon with multiple input parameters affecting one or more outcomes. A dynamic Bayesian network repeatedly unfolds the static Bayesian network over time. Dynamic Bayesian networks are used to model time-series data involving multiple variables, such as the effect of regular periodic medications or the treatment of a chronic disease where the medication is given over a long term to contain the progression of the disease.
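A full (dynamic) Bayesian network normally requires a dedicated library; the two-node sketch below (a disease node and a test node) only illustrates the underlying Bayesian update, and all the probabilities are hypothetical:

# A two-node sketch (Disease -> Test) of Bayesian reasoning; all numbers are hypothetical.
p_disease       = 0.01     # prior P(disease)
p_pos_given_d   = 0.95     # test sensitivity, P(positive | disease)
p_pos_given_not = 0.05     # false-positive rate, P(positive | no disease)

# Bayes rule: P(disease | positive) = P(positive | disease) P(disease) / P(positive)
p_positive = p_pos_given_d * p_disease + p_pos_given_not * (1 - p_disease)
p_disease_given_pos = p_pos_given_d * p_disease / p_positive
print(round(p_disease_given_pos, 3))   # about 0.161, despite a 95%-sensitive test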

1.6.7.5 Speech-to-text conversion


Speech-to-text conversion analyzes the speech, separates it into individual words separated by embedded silence periods, performs temporal and frequency-domain analysis of the sound of individual words and looks up a dictionary to identify the corresponding syllables. These syllables are joined and analyzed further in context to identify the words. The words are analyzed further to extract the information in textual form.
Speech-to-text conversion is a well-developed technology, and commercial software is available. It is used to convert a doctor’s speech to text form. The text is further analyzed automatically to extract information related to the patient's diagnosis and the physician’s prescription.
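The sketch below illustrates only the first step of this pipeline, separating word-like chunks at embedded silence using a short-time energy threshold on a synthetic signal; the sampling rate, frame size and threshold are assumptions:

import numpy as np

# A simplified sketch of the first step only: splitting speech into word-like chunks
# at embedded silence, using a short-time energy threshold on a synthetic signal.
fs = 8000                                          # assumed sampling rate (Hz)
t  = np.arange(fs) / fs
signal = np.concatenate([np.sin(2*np.pi*200*t[:2000]),   # "word" 1
                         np.zeros(1000),                 # silence
                         np.sin(2*np.pi*300*t[:3000])])  # "word" 2

frame = 200                                        # 25 ms frames
energy = np.array([np.mean(signal[i:i+frame]**2)
                   for i in range(0, len(signal) - frame, frame)])
voiced = energy > 0.01                             # silence threshold (assumed)

# Report contiguous voiced runs as candidate word regions (in frame indices).
boundaries, start = [], None
for i, v in enumerate(voiced):
    if v and start is None:
        start = i
    elif not v and start is not None:
        boundaries.append((start, i)); start = None
if start is not None:
    boundaries.append((start, len(voiced)))
print(boundaries)                                  # [(0, 10), (15, 29)]: two word-like regions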

1.6.7.6 Text analysis and generation


Traditionally, doctors and nurses write handwritten notes when reporting the case history of a patient. Substituting these notes with computer-generated forms having a limited number of fields would limit physicians’ options. Physicians use different health-related vocabulary to explain the same concept based upon their expertise and their assessment of the patient’s condition. Natural language text is also needed to explain the condition to a patient and their relatives in an easy-to-comprehend manner. This
requires that healthcare software be able to convert a natural language summary into structured data in the database, generate natural language from structured data and find the equivalence between two textual summaries.
Textual analysis for extracting information requires detection of health-domain specific words and
the corresponding values (including fuzzy values). Text generation uses various templates to generate
natural language, and equivalence of the two sentences is found using concept similarity and ontology.
The technique has been described in Section 3.10.
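A simplified sketch of extracting health-domain terms and associated fuzzy or numeric values from free text is shown below; the vocabulary and the sample note are made up for illustration:

import re

# A simplified sketch of extracting health-domain terms and associated (possibly fuzzy)
# values from free text; the vocabulary and the sample note are illustrative only.
vocabulary = {"pain", "fever", "heartbeat", "blood pressure"}

note = "Patient reports severe pain around 8 on a scale of 10 and a fast heartbeat."

found_terms  = [term for term in vocabulary if term in note.lower()]
fuzzy_values = re.findall(r"\b(mild|moderate|severe|fast|slow)\b", note.lower())
numbers      = re.findall(r"\b\d+(?:\.\d+)?\b", note)

print(found_terms)    # ['pain', 'heartbeat'] (order may vary: vocabulary is a set)
print(fuzzy_values)   # ['severe', 'fast']
print(numbers)        # ['8', '10']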

1.6.7.7 Heuristic reasoning


Most resource-allocation and scheduling problems are hard to solve algorithmically in a realistic duration. The solution to such problems can be expressed as a state-space search problem where a state is a tuple of variable-value pairs. The solution space is modeled as a huge graph where each state is a node in the graph. Each action that changes the value of one or more variables becomes an edge between the two corresponding states. Because the graph can be huge, any blind search would waste a lot of computational time to find a solution.
An alternative approach is to use a mathematical function to estimate the distance from the current state to the final state and then make sure that the next move does not increase the estimated distance. This estimation of the distance using a mathematical function is called a heuristic. The use of heuristics keeps the search focused and avoids traversal to nodes that do not lead to final states or that increase the distance. This approach reduces the computational time significantly.
In computational health informatics, the allocation of resources, such as planning bed allocation for a maximum number of patient recoveries, reducing the overall cost of treatment, and maximizing the utilization of physicians and equipment, involves optimization problems and requires heuristic reasoning. Heuristics-based intelligent search techniques are described in Section 3.2.
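A minimal sketch of heuristic (A*-style) search over a toy state graph is shown below; the states, action costs and heuristic estimates are made up for illustration:

import heapq

# A minimal sketch of heuristic (A*-style) search on a toy state graph.
graph = {                      # state -> [(neighbor, action cost)]
    "S": [("A", 2), ("B", 5)],
    "A": [("G", 6)],
    "B": [("G", 2)],
    "G": [],
}
heuristic = {"S": 6, "A": 5, "B": 2, "G": 0}   # estimated distance to the goal G

def a_star(start, goal):
    """Expand the state with the smallest cost-so-far plus heuristic estimate."""
    frontier = [(heuristic[start], 0, start, [start])]
    visited = set()
    while frontier:
        _, cost, state, path = heapq.heappop(frontier)
        if state == goal:
            return path, cost
        if state in visited:
            continue
        visited.add(state)
        for nxt, step in graph[state]:
            heapq.heappush(frontier, (cost + step + heuristic[nxt], cost + step, nxt, path + [nxt]))
    return None, float("inf")

print(a_star("S", "G"))        # (['S', 'B', 'G'], 7): the cheaper of the two paths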

1.6.8 Machine Learning and Knowledge Discovery


Machine learning is a computational intelligence concept used to: 1) identify different classes of the enti-
ties such that members in each class behave in a similar manner or have similar attributes; 2) identify a
pattern in the data that may predict some known associated property or outcome; 3) learn to improve the
solution to a task by shortening the search time in a state-space graph; 4) predict the outcome given the
parameters and 5) derive new knowledge by observing and modeling a phenomenon. Machine learning
is based upon characterizing an entity by the set of important attributes called features, and mapping the
feature-values based upon some well-known previous mapping or similarity of feature-values. Similarity
analysis is done using a notion of distance between the feature-values.
Many patients may have similar disease symptoms and yet recover with different sets of medications due to differences in gender, age, ethnicity or other coexisting disease symptoms. To find the common causes of disease, recovery factors and medication dosages, a large set of data is analyzed using machine learning techniques.
Many machine learning techniques have been employed in computational health informatics. Popular techniques are variants of clustering, regression analysis, decision trees, HMM, probabilistic decision networks, neural networks, associative data mining and support vector machines. Various machine learning techniques used in computational health informatics are described in Chapter 3. This section briefly describes a few popular techniques to give an intuitive understanding of machine learning.

1.6.8.1 Clustering
Clustering is an automated unsupervised learning technique for the classification of the data ele-
ments based upon modeling data in an N-dimensional space where each feature of a data element is a dimension.

FIGURE 1.9 An example of clusters in two dimensions

Points are grouped together if the distance between the coordinate-vector for each point in
a group is less than a threshold. The underlying assumption is that two entities having similar feature-
values have other common attributes and behavior. Clustering has been used to automatically learn the
classes of entities exhibiting similar behavior. Many types of clustering techniques have been discussed
in Section 3.5.1.
Many notions of distance are used to derive the similarity between two feature-vectors. Popular ones are: Euclidean distance, Manhattan distance and weighted Euclidean distance. Euclidean distance finds the shortest straight-line path between two points. Given two points in an N-dimensional space as $\langle x_{11}, x_{12}, \ldots, x_{1N}\rangle$ and $\langle x_{21}, x_{22}, \ldots, x_{2N}\rangle$, the Euclidean distance between the points is given by $\sqrt{\sum_{i=1}^{N} (x_{1i} - x_{2i})^2}$. Manhattan distance finds the sum of the absolute differences of the values of the same coordinates between two points. Given the coordinate vectors $\langle x_{11}, x_{12}, \ldots, x_{1N}\rangle$ and $\langle x_{21}, x_{22}, \ldots, x_{2N}\rangle$, the Manhattan distance is given by $\sum_{i=1}^{N} |x_{1i} - x_{2i}|$. Weighted Euclidean distance adds different weights to the individual distance components contributed by different dimensions. Given the coordinate vectors $\langle x_{11}, x_{12}, \ldots, x_{1N}\rangle$ and $\langle x_{21}, x_{22}, \ldots, x_{2N}\rangle$, the weighted Euclidean distance is given by $\sqrt{\sum_{i=1}^{N} w_i (x_{1i} - x_{2i})^2}$, where $w_i$ is the weight of the ith parameter. The rationale is that different parameters have different importance.
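These three distance measures can be written directly from their definitions; the feature-vectors and weights below are arbitrary examples:

import math

# Direct implementations of the three distance measures; the vectors and weights are examples.
def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def weighted_euclidean(p, q, w):
    return math.sqrt(sum(wi * (a - b) ** 2 for wi, a, b in zip(w, p, q)))

p, q = (1.0, 2.0, 3.0), (4.0, 6.0, 3.0)
print(euclidean(p, q))                        # 5.0
print(manhattan(p, q))                        # 7.0
print(weighted_euclidean(p, q, (1, 0.5, 2)))  # sqrt(9 + 8 + 0), approximately 4.12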

Example 1.6
Figure 1.9 illustrates the concept for a popular type of clustering called K-means clustering of nine
data elements. Each data element is a vector of two feature-values. Each feature has become a dimen-
sion. Thus, we mark the data elements as points in a two-dimensional plane. Two groups of points are
close to each other, and the distance between them is below a threshold. These two groups are called
clusters. The assumption is that all the points within the same cluster share common properties.
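A sketch of K-means clustering of nine two-feature data elements, assuming the scikit-learn library is available, is shown below; the data values are made up:

import numpy as np
from sklearn.cluster import KMeans

# A sketch of K-means clustering of nine two-feature data elements (values are made up).
points = np.array([[1, 1], [1.2, 0.8], [0.9, 1.1], [1.1, 1.3], [0.8, 0.9],
                   [5, 5], [5.2, 4.8], [4.9, 5.3], [5.1, 5.1]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)           # cluster assignment of each point
print(kmeans.cluster_centers_)  # the two cluster centroids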

1.6.8.2 Regression analysis


Regression analysis is used to predict a trend by fitting a curve to a set of data points. Using regression analysis, the dependent variable can be predicted as a function of the independent variable, as shown in Figure 1.10. A curve is fitted that minimizes the experimental error. The curve can be linear or nonlinear depending upon the problem. Linear curve fitting is called Linear Regression Analysis. Linear regression analysis is popular due to the simple reasoning involved.
In Figure 1.10, the small circles are experimental points. Using these points, a line is drawn such that it has the minimum average error distance from all the experimental data. This optimum line is used to predict the value of the dependent variable for any value of the independent variable within the valid range. Regression analysis has found many applications, such as the efficacy and toxicity analysis of the medications in a treatment regimen, and the effect of dosage adjustment on the recovery of patients.

FIGURE 1.10 An example of prediction using regression analysis
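
A minimal sketch of linear regression on made-up dosage/response data, using least-squares line fitting from NumPy, is shown below:

import numpy as np

# A sketch of linear regression (least-squares line fitting) on made-up dosage/response data.
dosage   = np.array([10, 20, 30, 40, 50], dtype=float)    # independent variable
response = np.array([2.1, 3.9, 6.2, 7.8, 10.1])           # dependent variable

slope, intercept = np.polyfit(dosage, response, deg=1)    # fit response = slope*dosage + intercept
predicted = slope * 35 + intercept                        # predict within the valid range
print(round(slope, 3), round(intercept, 3), round(predicted, 2))   # fitted line and a prediction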

1.6.8.3 Decision trees


A decision tree is an automated classification technique in which the decision-making process is based upon a tree traversal using one parameter at a time. The most discriminating parameter is toward the root of the decision tree, and is checked first. Using a decision tree, we can classify different disease conditions based upon parameter values, as explained in Section 3.5.3.
An entity is modeled as a set of attribute-value pairs. A data element moves down to one of the branches based on the comparison of one of the attribute values (parameter values). The process is repeated until the entity is placed in one of the classes at the end of the tree (a leaf-node). This technique classifies a sample set into multiple groups based upon their common properties.
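A sketch of decision-tree classification on toy data, assuming the scikit-learn library is available, is shown below; the feature values and labels are made up:

from sklearn.tree import DecisionTreeClassifier

# A sketch of decision-tree classification on toy data: [temperature (°C), heart rate (bpm)].
# The feature values and the "ill"/"healthy" labels are made up for illustration.
features = [[36.6, 70], [36.8, 75], [39.1, 110], [38.7, 105], [36.5, 68], [39.4, 120]]
labels   = ["healthy", "healthy", "ill", "ill", "healthy", "ill"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(features, labels)
print(tree.predict([[38.9, 100]]))    # classify a new patient record, e.g. ['ill']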

1.6.8.4 Data mining


Data mining is about deriving knowledge, patterns and association between two or more parameters
affecting the outcome. It uses statistical analysis techniques to identify the association patterns between
different parameters. If the values of one or more independent variables increase, then the values of the
dependent variables change in a predictable way.

Example 1.7
A large sample of patients is given the same medication, and some get side-effects. Suppose that an ethnicity-based analysis establishes that 90% of the patients with side-effects are Afro-Americans; then there is an associative pattern between the side-effect and Afro-American ethnicity.

In health care, a large amount of data is collected from sensors monitoring the patients, lab results, medication reports, and diagnoses based on the lab results and symptoms. This data is mined to: 1) identify new sets of parameter values that cause diseases; 2) derive effective dosages for different classes of patients based upon age, gender and ethnicity; 3) identify new disease patterns; and 4) identify biomarkers − biomolecules in body fluids that indicate the presence of diseases before other detectable symptoms appear. Data mining is described in Section 3.8.
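In the spirit of Example 1.7, the sketch below measures an association pattern using support and confidence; the patient records are entirely made up:

# A sketch of measuring an association pattern (support and confidence); the records are made up.
records = [
    {"ethnicity": "A", "side_effect": True},
    {"ethnicity": "A", "side_effect": True},
    {"ethnicity": "A", "side_effect": True},
    {"ethnicity": "B", "side_effect": False},
    {"ethnicity": "B", "side_effect": True},
    {"ethnicity": "A", "side_effect": False},
]

group_a    = [r for r in records if r["ethnicity"] == "A"]
both       = [r for r in group_a if r["side_effect"]]
support    = len(both) / len(records)           # fraction with ethnicity A AND side-effect
confidence = len(both) / len(group_a)           # P(side-effect | ethnicity A)
print(round(support, 2), round(confidence, 2))  # 0.5 0.75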

1.6.9 Medical Image Processing and Transmission


Medical images play a major role in healthcare due to their minimally invasive nature. Without
surgery, using various imaging techniques, the images of the internal organs and their functions are
captured. These images are used for diagnosis such as bone fracture and deformations; ligament tears;
spine injuries; tuberculosis; cancer; hidden cysts and tumors in vital organs such as brain, lungs, heart,
liver, pancreas and kidney; malfunctioning internal organs; and state of the fetus during pregnancies.
These medical images could be a two-dimensional still image, as in an X-ray of a bone fracture; a cascaded sequence of images of different slices of an organ used to computationally create a 3D structure, as in a CAT scan or MRI; or a video of images to record or model motion, such as heart-wall motion during blood pumping. Before archiving, an image is preprocessed to remove noise and enhance the image quality using image processing techniques.

Example 1.8
Figure 1.11 shows an X-ray of a fractured ankle that has been operated upon. The X-ray shows whether there is any remaining problem in the healing process without performing any surgery. Similarly, Figure 1.12 shows the MRI scan of a brain, which can be checked for any abnormality, such as a tumor, without performing any invasive surgery. These images can be used to plan future surgery more accurately.

FIGURE 1.11 An X-ray of a fractured ankle (Images provided by Siemens Healthcare, used with written permission).

FIGURE 1.12 MRI scan of a brain (Images provided by Siemens Healthcare, used with written permission).

1.6.9.1 Image processing techniques


Image processing is used for automated texture analysis and compression of images in X-rays, CAT scans and MRI scans. When an image is transmitted from a hospital to a physician, or from a radiologist to the central database, it needs to be compressed to avoid congesting the transmission line. However, the compression should be such that the image is not distorted; otherwise, disease-related information will be lost. This type of compression is called lossless compression. Another important concept in image processing is to compare two images, one from a past recording and one from the current recording, and find the changes in texture at the same physical spot. A change in texture indicates anomalies.
Before performing any image analysis, image quality is enhanced. Improving the quality requires
noise removal that involves removal of the spurious pixel intensities in the image. Image analysis can
be of many types such as: 1) comparing the image of a patient’s organ with an image of a healthy per-
son’s organ to identify disease-states; 2) comparing the image of a patient’s organ with a past image
of the same organ of the same patient to identify the progression of disease or remission of a disease
such as tumor.
Before comparing two images, they must be aligned. This process is called image registration. After image registration, the intensities and textures of corresponding pixels are compared, and changes are recorded. The extent of enclosed image segments with similar patterns (texture and/or intensity) is identified. The process of identifying homogeneous image-regions with the same intensity, color or texture is called segmentation. Radiology images are black-and-white images; hence, segmentation of radiology images relies on texture or intensity. A detailed discussion of medical image analysis is given in Chapter 5.
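A minimal sketch of intensity-based segmentation on a tiny synthetic image is shown below; real medical image segmentation uses far richer texture and region models:

import numpy as np

# A sketch of intensity-based segmentation on a tiny synthetic "image": pixels brighter
# than a threshold are grouped into one region; the pixel values and threshold are made up.
image = np.array([[ 10,  12,  11, 200, 210],
                  [  9,  14, 205, 215, 198],
                  [ 11,  13,  12,  10, 202],
                  [  8,   9,  11,  12,  10]])

threshold = 100                            # assumed intensity threshold
mask = image > threshold                   # boolean mask of the bright region
print(mask.astype(int))                    # 1 marks the segmented (bright) region
print("bright pixels:", int(mask.sum()))   # 6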

1.6.9.2 Medical image transmission


The images are transmitted using DICOM (Digital Imaging and Communication in Medicine). DICOM
is an international standard for the archiving and transmission of clinical images to provide compatibil-
ity and interoperability between heterogeneous sources. The standard describes the format to store and
exchange medical images and associated information between different units within a hospital and across
multiple health-providers.
DICOM interfaces are available for different devices used in radiology such as MRI, CAT scan,
X-ray, PET, echogram and photographic films. DICOM stores the personal information of patients in the
image-header and supports protecting the patients’ personal information using user-specified encryption
and authentication. A detailed description of DICOM standard and image-exchange protocols is given in
Chapter 6.
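A minimal sketch of reading and lightly de-identifying a DICOM file, assuming the pydicom library is installed, is shown below; the file path is a placeholder and the header fields accessed are common but not exhaustive:

import pydicom

# A sketch of reading a DICOM file with the pydicom library (assumed installed); the file
# path is hypothetical, and the header fields accessed are common but not exhaustive.
ds = pydicom.dcmread("example_ct_slice.dcm")      # hypothetical local file

print(ds.Modality)                                # e.g. "CT"
print(ds.PatientName)                             # patient identity stored in the image header
pixels = ds.pixel_array                           # the image itself, as a NumPy array

ds.PatientName = "ANONYMIZED"                     # simple (non-exhaustive) de-identification
ds.save_as("example_ct_slice_anon.dcm")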

1.6.10 Biosignal Processing


Biosignal analysis develops techniques and tools to analyze different types of signals. In health informatics, the signals can be ECG, EEG, MEG and EMG. Signals could also be speech signals coming from a doctor’s dictation about a patient’s condition, which need to be converted into text form to be stored within the database for further matching with past data.
Signal processing and image analysis are important for noninvasive visualization of internal
organs, their functioning, their motion and any defect. This information helps in accurate diagnosis,
planning the surgery, intervention before the problem becomes acute and monitoring the recovery of
the patients.
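A minimal sketch of basic biosignal pre-processing, band-pass filtering a synthetic ECG-like signal with SciPy, is shown below; the sampling rate and cut-off frequencies are assumptions:

import numpy as np
from scipy.signal import butter, filtfilt

# A sketch of basic biosignal pre-processing: band-pass filtering a synthetic "ECG-like"
# signal to suppress baseline wander and high-frequency noise. Cut-off choices are assumed.
fs = 250                                          # sampling rate (Hz), assumed
t = np.arange(0, 10, 1 / fs)
ecg_like = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.random.randn(t.size)   # synthetic signal

b, a = butter(N=3, Wn=[0.5, 40], btype="bandpass", fs=fs)
filtered = filtfilt(b, a, ecg_like)               # zero-phase filtering
print(filtered[:5])                               # first few samples of the cleaned signal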
