Data


Data and Information
– The Latin word ‘data’ is the plural of ‘datum,’ meaning ‘thing given.’
– The word ‘data’ was first used in English in the 1640s.
– The term ‘data processing’ was first used in 1954.
– ‘Data’ is treated as a mass noun when used as a synonym for information.
– Some style guides recommend using whichever form of ‘data’ (singular or plural) suits the target audience.
Data, information, knowledge, and wisdom are closely related concepts.
Data becomes information once it has been analyzed.
– The amount of information in a data stream can be measured by its Shannon entropy.
– Knowledge is an entity's awareness of its environment.
Data is often considered the least abstract concept, information the next least, and knowledge the most abstract.
Data can be seen as the smallest units of factual information; thematically connected data presented in a relevant context can be viewed as information.
– Contextually connected pieces of information can be described as data insights or intelligence.
– The stock of insights and intelligence resulting from the synthesis of data into information is called knowledge.
As a general concept, data refers to existing information or knowledge that has been represented or coded in a form suitable for better usage or processing.
– Advances in computing technologies have led to the advent of big data and the field of data science.
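The Shannon entropy bullet in this section can be made concrete with a short calculation. Entropy is H = −Σ p·log2(p) over the distinct symbols in a stream, where p is each symbol's probability, and it is measured in bits per symbol. This is a minimal sketch that assumes a finite stream and estimates each p from the observed symbol frequencies (a plug-in estimate chosen here for illustration, not something the source prescribes); `shannon_entropy` is a hypothetical helper name.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def shannon_entropy(stream: Iterable[Hashable]) -> float:
    """Hypothetical helper: empirical Shannon entropy of a finite stream, in bits per symbol.

    Assumption: probabilities are estimated from observed frequencies in the
    stream itself, not from a known model of the data source.
    """
    counts = Counter(stream)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty stream carries no information
    # H = -sum(p * log2(p)), written as sum(p * log2(1/p)) so the result is never -0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    print(shannon_entropy("aaaa"))      # 0.0  (one symbol, fully predictable)
    print(shannon_entropy("abab"))      # 1.0  (two equally likely symbols)
    print(shannon_entropy("abcdefgh"))  # 3.0  (eight equally likely symbols)
```

Under this measure a stream that repeats a single symbol tells the reader nothing new, while a stream of eight equally likely symbols needs three bits per symbol.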

Types of Data
Data can represent abstract ideas or concrete measurements.
– Examples of data sets include price indices, unemployment rates, and census data.
Data can be collected using techniques such as measurement, observation, and analysis.
– Field data is collected in an uncontrolled environment, while experimental data is generated in a controlled scientific experiment.
– Raw data is typically cleaned before analysis by removing outliers and correcting errors.
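The cleaning step above does not name a specific method, so the following is only a minimal sketch under stated assumptions. It treats a decimal comma typed in place of a decimal point as the example of an obvious data entry error, discards entries that still cannot be read as numbers, and removes outliers with Tukey's fences (keep values within 1.5 × IQR of the quartiles), a common rule swapped in here for illustration. The helpers `correct_entry` and `clean` are hypothetical.

```python
import statistics
from typing import Iterable, List


def correct_entry(raw: str) -> str:
    """Hypothetical helper: fix one illustrative kind of obvious entry error,
    stray whitespace and a decimal comma (e.g. ' 3,5' -> '3.5').
    Assumes commas are never used as thousands separators in this data."""
    return raw.strip().replace(",", ".")


def clean(raw_values: Iterable[str], k: float = 1.5) -> List[float]:
    """Hypothetical helper: correct entries, drop unreadable ones, then remove
    outliers with Tukey's fences: keep x with Q1 - k*IQR <= x <= Q3 + k*IQR."""
    values: List[float] = []
    for raw in raw_values:
        try:
            values.append(float(correct_entry(raw)))
        except ValueError:
            continue  # still not a number after correction; discard it
    if len(values) < 2:
        return values  # quartiles need at least two points
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Keep the surviving values in their original order
    return [x for x in values if low <= x <= high]


if __name__ == "__main__":
    raw = ["10.1", " 9,8", "10.4", "9.9", "abc", "10.0", "250"]
    print(clean(raw))  # [10.1, 9.8, 10.4, 9.9, 10.0]
```

Here "abc" is discarded as unreadable and 250 falls far outside the fences, so it is treated as an outlier. In practice the outlier rule and the corrections would be chosen to suit the instrument and the domain.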

Data in the Digital Economy
Data has been described as ‘the new oil of the digital economy.’
– Big data refers to very large quantities of data, usually at the petabyte scale.
– Traditional data analysis methods may be difficult or impossible to apply to big data.
Data science uses machine learning and AI methods to analyze big data efficiently.
– Theoretically, infinite data would yield infinite information, which would make extracting insights or intelligence impossible.

Data Collection and Longevity
Data can be gathered through primary or secondary sources.
Data analysis methodologies include data triangulation and data percolation.
– The data is percolated using pre-determined steps to extract relevant information.
The longevity of data is an important issue in computer science, technology, and library science.
– Scientific data stored on hard drives or optical discs may become unreadable after a few decades.
Data accessibility is a problem because much scientific data is never published or deposited in data repositories.
– One survey found that fewer than one in five of the studies examined were able or willing to provide the requested data.
– More than half of the datasets in the Dryad repository lacked the details needed to reproduce the research results.

FAIR Data, Metadata, and Data Representation
– FAIR data is data that is Findable, Accessible, Interoperable, and Reusable.
– Requiring FAIR data can improve reproducibility and advance science and technology.
– Applying the concept of data in other fields, such as the humanities, may introduce assumptions that are counterproductive there.
– The term ‘capta’ has been introduced as an alternative to ‘data’ for visual representations in the humanities.
– The humanities affirm knowledge production as situated, partial, and constitutive.
– Metadata is a description of other data.
– The library catalog is a prototypical example of metadata.
– Mechanical computing devices are classified based on how they represent data.
– Analog computers represent data as physical quantities like voltage or distance.
– Digital computers represent data as a sequence of symbols from a fixed alphabet.

Source: https://en.wikipedia.org/wiki/Data
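The summary above contrasts analog and digital representation. As a small illustration of ‘a sequence of symbols from a fixed alphabet,’ this sketch encodes text as UTF-8 bytes and writes each byte as eight binary digits, so the result uses only the two symbols 0 and 1. UTF-8 and the helper names `to_bits` and `from_bits` are illustrative choices, not something the source specifies.

```python
def to_bits(text: str) -> str:
    """Hypothetical helper: encode text as UTF-8 bytes, then write each byte
    as 8 binary digits. The result uses only the alphabet {'0', '1'}."""
    return "".join(format(byte, "08b") for byte in text.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Hypothetical helper: invert to_bits by grouping digits into bytes
    and decoding them as UTF-8."""
    if len(bits) % 8 != 0 or set(bits) - {"0", "1"}:
        raise ValueError("expected a multiple of 8 binary digits")
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


if __name__ == "__main__":
    bits = to_bits("Hi")
    print(bits)             # 0100100001101001
    print(from_bits(bits))  # Hi
```

The same text can be recovered exactly from the symbol sequence, which is what distinguishes this kind of representation from an analog quantity such as a voltage.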

Data (Wikipedia)

In common usage, data (US: /ˈdætə/; UK: /ˈdeɪtə/) is a collection of discrete or continuous values that convey information, describing the quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted formally. A datum is an individual value in a collection of data. Data is usually organized into structures such as tables that provide additional context and meaning, and which may themselves be used as data in larger structures. Data may be used as variables in a computational process. Data may represent abstract ideas or concrete measurements. Data is commonly used in scientific research, economics, and virtually every other form of human organizational activity. Examples of data sets include price indices (such as the consumer price index), unemployment rates, literacy rates, and census data. In this context, data represents the raw facts and figures from which useful information can be extracted.
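The description of a datum, and of tables that are themselves used inside larger structures, can be sketched in a few lines of Python. The column names, region labels, and numbers below are hypothetical placeholders rather than real statistics; the point is only that each datum gains context from its row and column, and that a whole table can be a single value in a larger structure.

```python
from typing import Dict, List, Union

Datum = Union[str, float]  # one individual value
Row = Dict[str, Datum]     # column name -> datum
Table = List[Row]          # rows give each datum its context

# Hypothetical table; placeholder values, not real statistics.
rates: Table = [
    {"region": "Region A", "unemployment_rate_pct": 4.2},
    {"region": "Region B", "unemployment_rate_pct": 5.1},
]

# The whole table used as one piece of data inside a larger structure.
dataset: Dict[str, Table] = {"unemployment": rates}

# Selecting a single datum: an individual value in the collection.
datum = dataset["unemployment"][1]["unemployment_rate_pct"]
print(datum)  # 5.1
```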


Data is collected using techniques such as measurement, observation, query, or analysis, and is typically represented as numbers or characters which may be further processed. Field data is data that is collected in an uncontrolled in-situ environment. Experimental data is data that is generated in the course of a controlled scientific experiment. Data is analyzed using techniques such as calculation, reasoning, discussion, presentation, visualization, or other forms of post-analysis. Prior to analysis, raw data (or unprocessed data) is typically cleaned: Outliers are removed and obvious instrument or data entry errors are corrected.

Data can be seen as the smallest units of factual information that can be used as a basis for calculation, reasoning, or discussion. Data can range from abstract ideas to concrete measurements, including, but not limited to, statistics. Thematically connected data presented in some relevant context can be viewed as information. Contextually connected pieces of information can then be described as data insights or intelligence. The stock of insights and intelligence that accumulates over time as a result of synthesizing data into information can then be described as knowledge. Data has been described as "the new oil of the digital economy". As a general concept, data refers to existing information or knowledge that has been represented or coded in a form suitable for better usage or processing.

Advances in computing technologies have led to the advent of big data, which refers to very large quantities of data, usually at the petabyte scale. With traditional data analysis methods and computing, working with such large (and growing) datasets is difficult, even impossible. (Theoretically speaking, infinite data would yield infinite information, which would render extracting insights or intelligence impossible.) In response, the relatively new field of data science uses machine learning and other artificial intelligence (AI) methods that allow analytic methods to be applied efficiently to big data.
