Overview of the Data Sets (Website Version)

Author: Jon Yearsley
Affiliation: School of Biology and Environmental Science, UCD
Published: January 2025

BEEKEEPER.TXT

This is a text file in which the variables are separated by blank spaces. The first six rows contain a brief description of the data. The data can be downloaded from http://drjonyearsley.github.io/Data/BEEKEEPER.TXT

This data set comes from a field experiment testing the impact of a pesticide on a field’s flower production. Two fields were used; each field had three experimental treatments (pesticide, water procedural control and control) and four replicate plots per treatment.

Variable name Definition of the variable
Field = Field ID variable
Treatment = Treatment ID variable
. (1=pesticide, 2=water procedural control, 3=control)
Plot = Experimental plot ID variable
Flowers = Number of flowers recorded in each experimental plot
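A minimal R sketch for reading the file and labelling the Treatment codes. It assumes that row 7, immediately after the six description rows, holds the variable names; adjust `skip` if the layout differs. The function name `read_beekeeper()` is hypothetical.

```r
# Read BEEKEEPER.TXT and attach labels to the Treatment codes.
# Assumption: a header row (Field, Treatment, Plot, Flowers) follows the
# six description rows.
read_beekeeper <- function(path = "http://drjonyearsley.github.io/Data/BEEKEEPER.TXT") {
  bees <- read.table(path, header = TRUE, skip = 6)  # blank-space delimited
  # Treatment codes: 1 = pesticide, 2 = water procedural control, 3 = control
  bees$Treatment <- factor(bees$Treatment, levels = 1:3,
                           labels = c("pesticide", "procedural control", "control"))
  bees$Field <- factor(bees$Field)
  bees
}

# Example (requires an internet connection):
# bees <- read_beekeeper()
# str(bees)
```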

DRUG.CSV

This is a CSV file of data from the paper by Marcone et al (2024) (https://doi.org/10.3390/cancers16234007), which studies whether a new compound, Pyrazinib (P3), makes oesophageal cancer cells more sensitive to radiation treatment. The study focuses on the effect of attaching P3 to gold nanoparticles (AuNP) as a possible means of drug delivery.

The file contains the fraction of cells surviving 24 hours after irradiation for two cell lines under various treatment conditions. Dimethyl sulfoxide (DMSO) is used as the vehicle in all treatments and controls.

Variable name Definition of the variable
CellType = The name of the oesophageal adenocarcinoma cell line used (OE33P=a radiosensitive cell line, OE33R=a radioresistant cell line)
Radiation = The radiation dose applied to the cells [units Grays, Gy]
AuNP = Gold nanoparticle treatment (Yes=nanoparticles present with the cells, No=no nanoparticles present)
P3 = Pyrazinib treatment (Yes=P3 present with the cells, No=no P3 present)
Treatment = The combination of radiation, AuNP and P3 treatments (5 levels)
Replicate 1 = The fraction of cells surviving treatment after 24 hours (replicate 1)
Replicate 2 = The fraction of cells surviving treatment after 24 hours (replicate 2)
Replicate 3 = The fraction of cells surviving treatment after 24 hours (replicate 3)
Replicate 4 = The fraction of cells surviving treatment after 24 hours (replicate 4)
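Each row of DRUG.CSV holds the four replicate survival fractions in separate columns, while many R model functions expect one observation per row. The sketch below stacks the replicates into a single column. It assumes `read.csv()` has converted the names `Replicate 1` to `Replicate 4` into `Replicate.1` to `Replicate.4` (R's default `check.names = TRUE`), so it selects columns by the prefix `Replicate`; `drug_to_long()` is a hypothetical helper.

```r
# Stack the replicate columns of DRUG.CSV into one Survival column.
# Assumption: replicate columns are the only ones whose names start with
# "Replicate"; every other column is treated as an identifier.
drug_to_long <- function(drug) {
  rep_cols <- grep("^Replicate", names(drug), value = TRUE)
  id_cols <- setdiff(names(drug), rep_cols)
  pieces <- lapply(seq_along(rep_cols), function(i) {
    piece <- drug[id_cols]                 # CellType, Radiation, AuNP, P3, Treatment
    piece$Replicate <- i                   # replicate number
    piece$Survival <- drug[[rep_cols[i]]]  # fraction of cells surviving
    piece
  })
  long <- do.call(rbind, pieces)
  rownames(long) <- NULL
  long
}

# Example (file path is an assumption):
# drug <- read.csv("DRUG.CSV")
# drug_long <- drug_to_long(drug)
```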

GENE_EXPRESSION.CSV

This is a CSV file containing gene expression data from a study of an enterotoxigenic strain of the bacterium Staphylococcus aureus (strain BW10) that causes food poisoning. The aim of the study was to discover genes that are up-regulated or down-regulated in response to environmental stressors (glucose and nitrate). Gene expression was measured using RNAseq, and the experiment was replicated three times.

Variable name Definition of the variable
ID = A unique gene identifier
GENE = The gene name
TREATMENT = The environmental treatment (Ctrl=No stressor, Glucose=High glucose concentration, Nitrate=High nitrate concentration)
EXPRESSION = The expression level of the gene (read counts)
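Because the experiment was replicated three times, each gene is expected to appear several times per treatment. A minimal sketch that averages the read counts over replicates with `aggregate()`, assuming each replicate is a separate row sharing the same `ID` and `TREATMENT` values. `mean_expression()` is a hypothetical name, and this is not necessarily how the original study summarised the counts.

```r
# Mean read count for each gene (ID) under each treatment, averaging over
# replicate rows. Assumption: replicates are separate rows with identical
# ID and TREATMENT values.
mean_expression <- function(genes) {
  aggregate(EXPRESSION ~ ID + TREATMENT, data = genes, FUN = mean)
}

# Example (file path is an assumption):
# genes <- read.csv("GENE_EXPRESSION.CSV")
# head(mean_expression(genes))
```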

HEATSHOCK.XLSX

This is an Excel file containing the data from the paper by Gao et al (2014) (https://doi.org/10.1098/rspb.2014.1135). The dataset contains the expression of the heat shock protein HSP70, measured by western blot, under seven temperature treatments for two reptile species (T. septentrionalis and P. sinensis) and two bird species (C. coturnix and A. platyrhynchos).

Variable name Definition of the variable
Species Name of the species
T (°C) Temperature treatment (Units degrees centigrade)
LC Loading control. The total density on the gel of all proteins
HSP70 The total density on the gel of HSP70 protein
relative HSP70 Relative expression of HSP70 (HSP70 / LC)
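The relative HSP70 column is the HSP70 density divided by the loading control. A short sketch of that calculation, which can be used to check the `relative HSP70` column. Reading the Excel file assumes the readxl package is installed and that the column names match the table above; `relative_hsp70()` is hypothetical.

```r
# Relative HSP70 expression: HSP70 band density divided by the loading
# control (LC), the total density of all proteins on the gel.
relative_hsp70 <- function(hsp70, lc) {
  stopifnot(length(hsp70) == length(lc), all(lc > 0, na.rm = TRUE))
  hsp70 / lc
}

# Example, assuming readxl is installed and the column names are as listed:
# heat <- readxl::read_excel("HEATSHOCK.XLSX")
# all.equal(heat$`relative HSP70`, relative_hsp70(heat$HSP70, heat$LC))
```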

HEIGHT.CSV

This is a comma-separated values (CSV) file, available at http://drjonyearsley.github.io/Data/HEIGHT.CSV. It contains a subset of data from the 2012 U.S. Army Anthropometric database.

The file contains data on US Army recruits aged 20 to 40 years. The full data set is available at https://phc.amedd.army.mil/topics/workplacehealth/ergo/Pages/Anthropometric-Database.aspx

Variable name Definition of the variable
AGE = Age of the individual when data was collected [units=years]
SEX = Sex of the individual (Male, Female)
YEAR = The year when an observation was collected (2010, 2011, 2012)
HANDED = Handedness when writing (Left, Right, None)
HEIGHT = Height of the individual [units=m]
WEIGHT = Weight of the individual [units=kg]
WRIST_CIRC = Wrist circumference [units=mm]
HEAD_CIRC = Head circumference [units=mm]
HAND_LENGTH = Hand length [units=mm]
SPAN = The distance between the tips of the middle
. fingers of horizontally outstretched arms [units=mm]
FOOT_LENGTH = The distance from the back of the heel (pternion) to the
. landmark at the first metatarsophalangeal protrusion on
. the ball of the right foot [units=mm]
EAR = The distance between the side of the head and the outside
. edge of the right ear at its furthest point [units=mm]
EYES = The distance between the two pupils [units=mm]

INSECT.TXT

This file is a text file of TAB delimited variables. The first three rows contain a brief description of the data.

The data can be downloaded from http://drjonyearsley.github.io/Data/INSECT.TXT

This data set contains the number of insects counted within a quadrat from experimental fields that had been treated with one of six different insecticide sprays.

Variable name Definition of the variable
Spray = the name of each insecticide spray (A-F)
Count = the number of insects counted in a standard quadrat from an experimental field
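A minimal sketch for reading INSECT.TXT into R, assuming a header row with the variable names follows the three description rows; `read_insect()` is a hypothetical helper.

```r
# Read the TAB-delimited INSECT.TXT file.
# Assumption: a header row (Spray, Count) follows the three description rows.
read_insect <- function(path = "http://drjonyearsley.github.io/Data/INSECT.TXT") {
  insects <- read.table(path, header = TRUE, sep = "\t", skip = 3)
  insects$Spray <- factor(insects$Spray)  # sprays A-F as a factor
  insects
}

# Example (requires an internet connection):
# insects <- read_insect()
# tapply(insects$Count, insects$Spray, mean)  # mean count per spray
```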

IRISH_METEO.CSV

This is a CSV file containing the daily meteorological data for Ireland from 1970 until 2022. All meteorological variables are averaged across all weather stations on the island of Ireland.

The data can be downloaded from http://drjonyearsley.github.io/Data/IRISH_METEO.CSV

Variable name Definition of the variable
date The date for the day corresponding to the meteorological data
year The year for the day corresponding to the meteorological data
doy The day of the year (1 = 1st Jan)
meanTemp Daily mean air temperature (degrees C)
minTemp Daily minimum air temperature (degrees C)
maxTemp Daily maximum air temperature (degrees C)
dailyRainfall Total daily rainfall (mm)
meanPressure Daily mean sea level pressure (Pa)
meanHumidity Daily mean humidity (%)
meanWindSpeed Daily mean wind speed (m s-1)
meanSolarRadiation Daily mean solar radiation (W m-2)
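The `year` and `doy` columns can be recomputed from `date`, which is a quick way to check that the dates were parsed correctly. This sketch assumes dates are stored as YYYY-MM-DD; the helper `add_year_doy()` and its output columns `year_check` and `doy_check` are hypothetical.

```r
# Recompute year and day of year (1 = 1st Jan) from the date column.
# Assumption: dates are stored as YYYY-MM-DD; change date_format otherwise.
add_year_doy <- function(meteo, date_format = "%Y-%m-%d") {
  d <- as.Date(meteo$date, format = date_format)
  meteo$year_check <- as.integer(format(d, "%Y"))
  meteo$doy_check <- as.integer(format(d, "%j"))
  meteo
}

# Example (requires an internet connection):
# meteo <- read.csv("http://drjonyearsley.github.io/Data/IRISH_METEO.CSV")
# meteo <- add_year_doy(meteo)
# all(meteo$doy == meteo$doy_check)
```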

MALIN_HEAD.TXT

This is a TAB delimited text file containing monthly rainfall data for Malin Head (Ireland). The data were copied from https://www.met.ie/climate/available-data and pasted into a text file. The text file can be downloaded from http://drjonyearsley.github.io/Data/MALIN_HEAD.TXT

The data are organised as they appear on the website, in a wide format where each month’s rainfall is in a separate column. Month and Rainfall are therefore not stored as single columns: the month names are column headings and the rainfall values are spread across those columns.

Variable name Definition of the variable
Year = Year in which data was collected
Month = Month for which data was collected (Jan-Dec)
Rainfall = Monthly rainfall (mm)
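Because the rainfall values are spread across the month columns, most analyses in R first need the table in long format, with one row per year and month. This sketch assumes the month columns are named with R's built-in three-letter abbreviations (`month.abb`, Jan to Dec); any other columns are dropped. `malin_to_long()` is a hypothetical helper.

```r
# Convert the wide MALIN_HEAD table (one column per month) to long format.
# Assumption: month columns are named Jan, Feb, ..., Dec (R's month.abb).
malin_to_long <- function(wide) {
  months <- intersect(month.abb, names(wide))
  long <- data.frame(
    Year = rep(wide$Year, times = length(months)),
    Month = factor(rep(months, each = nrow(wide)), levels = month.abb),
    Rainfall = unlist(wide[months], use.names = FALSE)  # column by column
  )
  long <- long[order(long$Year, long$Month), ]  # chronological order
  rownames(long) <- NULL
  long
}

# Example (requires an internet connection):
# wide <- read.table("http://drjonyearsley.github.io/Data/MALIN_HEAD.TXT",
#                    header = TRUE, sep = "\t")
# rain <- malin_to_long(wide)
```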

MLY532.CSV

This is a CSV file containing monthly meteorological data (monthly totals, means and extremes) for Dublin Airport from 1941 until 2024. There are 12 variables (month, year and 10 meteorological variables).

The first 19 lines of the file contain metadata. Line 20 contains a header giving the names of the variables and the data itself starts on line 21.

The data can be downloaded from Met Éireann at https://cli.fusio.net/cli/climate_data/webdata/mly532.zip

Met Éireann data and its derivatives must be accredited with the following 5 statements:

  1. Copyright statement: Copyright Met Éireann
  2. Source https://www.met.ie
  3. Licence Statement: This data is published under a Creative Commons Attribution 4.0 International (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
  4. Disclaimer: Met Éireann does not accept any liability whatsoever for any error or omission in the data, their availability, or for any loss or damage arising from their use.
  5. Where applicable, an indication if the material has been modified and an indication of previous modifications.

Variable name Definition of the variable Units
year Year of observation
month Month of observation
rain Precipitation Amount in a month mm
meant Mean Air Temperature in a month degrees C
maxtp Maximum Air Temperature in a month degrees C
mintp Minimum Air Temperature in a month degrees C
mnmax Mean of Daily Maximum Air Temperature in a month degrees C
mnmin Mean of Daily Minimum Air Temperature in a month degrees C
gmin Grass Minimum Temperature degrees C
wdsp Mean Wind Speed in a month knot
mxgt Highest Gust in a month knot
sun Sunshine duration in a month hours
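Because the variable names are on line 20, `read.csv()` needs to skip the 19 metadata lines. A minimal sketch, assuming the file has been unzipped to a local CSV; `read_mly532()` and the added `date` column (the first day of each month) are hypothetical conveniences for time-series plots.

```r
# Read MLY532.CSV: lines 1-19 are metadata, line 20 holds the variable names.
read_mly532 <- function(path) {
  dub <- read.csv(path, skip = 19, header = TRUE)
  # Hypothetical convenience column: first day of each month as a Date
  dub$date <- as.Date(sprintf("%04d-%02d-01", dub$year, dub$month))
  dub
}

# Example (local file name is an assumption):
# dublin <- read_mly532("mly532.csv")
# plot(meant ~ date, data = dublin, type = "l")
```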

MOSQUITO.XLSX

This is an Excel file that is available on the Zenodo data repository (https://doi.org/10.5281/zenodo.1296744). The data are from a study by Vantaux et al. (https://doi.org/10.1101/207183) that looks at whether the malaria parasite Plasmodium falciparum can manipulate a mosquito’s host choice.

Variable name Definition of the variable
village = Village where data were collected (Klesso, Samendeni, Soumousso)
collection = The method used to collect mosquitoes (obet = OBET, spray, tent = BNT)
origin = The location of the collection site (MH = human dwellings, MI = unoccupied houses, CA = animal shed, PA = calf-baited trap, PH = human-baited trap)
fed = Has a mosquito had a blood meal? (yes, no)
parity = Reproductive status of mosquito (parous=has laid eggs, nulliparous=has not laid eggs, undetermined)
species.mol = Mosquito species from molecular data (a = Anopheles arabiensis, c = Anopheles coluzzii, g = Anopheles gambiae, NA = undetermined)
oocyst = Number of malarial oocysts in the mosquito
spz = Presence of sporozoite in the mosquito (0=absent, 1=present)
infection = Infection status of the mosquito (oocyst = oocyst infected only, spz = sporozoite infected with or without oocysts, uninfected)
infection1 = Infection status of the mosquito (oocspz = oocyst and sporozoite infected, spz = sporozoite infected only, oocyst = oocyst infected only, uninfected)
choice = Host choice of the mosquito (H = human only or mixed human and animal choice, other = animal only choice)
RS_tech = Technique used to detect blood meal (ELISA, pcr)
spz_tech = Technique used to detect infection status (ELISA, qpcr)

msleep

msleep is a built-in data set that comes with the ggplot2 package. The data can be loaded with the following commands (the ggplot2 package must be installed):

library('ggplot2')      # Load ggplot2 package
data(msleep)            # Make msleep data available

The data give morphological measurements and sleep patterns for 83 mammal species from 19 orders.

The data are from the publication:
Savage, VM and West, GB (2007) A quantitative, theoretical framework for understanding mammalian sleep. Proceedings of the National Academy of Sciences, 104 (3):1051-1056.
http://www.pnas.org/content/104/3/1051.abstract

The msleep dataset has 11 variables:

Variable name Definition of the variable
name = species common name
genus = genus name
vore = feeding type (carnivore/herbivore/omnivore/insectivore)
order = name of the order
conservation = the IUCN conservation status of the species (lc/nt/en/domesticated/vu/cd)
sleep_total = total sleep time (hours)
sleep_rem = REM sleep time (hours)
sleep_cycle = length of sleep cycle (hours)
awake = time spent awake (hours)
brainwt = brain weight (kg)
bodywt = body weight (kg)

PARMA_ETAL_2017.XLSX

This is an Excel file containing the data from the paper by Parma et al (2017) (https://doi.org/10.1038/s41598-017-16827-y).

The dataset contains the handedness (left/right) of 29 individuals from Italy, recorded when they were 9 years old, together with quantitative measurements from ultrasound scans taken before these individuals were born.

Variable name Definition of the variable
ID ID code for an individual
PostnatalHandedness_Code Handedness (0=right handed, 1=left handed)
PostnatalHandedness Handedness (left / right)
Gender_Code Gender of individual (0=male, 1=female)
Gender Gender of individual (male / female)
GW_Code Gestation weeks code (1=14 weeks, 2=18 weeks, 3=22 weeks)
GW Gestation weeks (14, 18, 22 weeks)
Target Code Ultrasound Test Target (1=eye, 2=mouth, 3=wall)
Target Ultrasound Test Target (eye/mouth/wall)
Movement_Code Movement direction (0=right, 1=left)
Movement Movement direction (right/left)
MT Movement time (ms)
TPV Time to peak velocity (percentage)

Recruitment_data_97-07.txt

This file is a TAB delimited text file. The data can be downloaded from the Ecological Archives repository (http://esapubs.org/archive/ecol/E090/039/default.htm#data)

The data are from a long-term study of the recruitment of marine coastal organisms: brown algae, mussels and barnacles. Metadata describing the data are available at http://esapubs.org/archive/ecol/E090/039/metadata.htm

Variable name Definition of the variable
Year = Year in which the data point was collected
Bay = Bay from which the data point was collected
Site = Site within the bay from which the data point was collected
Size = The size of the clearing around the experimental site (main treatment)
. (0 = No clearing, 1 = 1 m, 2 = 2 m, 4=4 m, 8=8 m clearings)
Fucoid_F = Recruitment of brown algae (Number of fucoid zygotes and germlings on a total of five flats, Number per 1.25 cm2)
Fucoid_G = Recruitment of brown algae (Number of fucoid zygotes and germlings in a total of five grooves, Number per 1.25 cm)
Fucoid_T = Recruitment of brown algae (Sum of Fucoid_F and Fucoid_G)
Barnacle = Recruitment of barnacles (Number of S. balanoides cyprids and metamorphs per plate, Number per 39.6 cm2)
M_300 = Recruitment of mussels (Subsample of juvenile M. edulis trapped on a 300 µm sieve, Number per 40 cm2)
M_425 = Recruitment of mussels (Subsample of juvenile M. edulis trapped on a 425 µm sieve, Number per 40 cm2)
M_WT = Dry weight of mussel pad (g)
F_start = Date fucoid tiles placed out (Month/Day/Year)
F_end = Date fucoid tiles taken in (Month/Day/Year)
B_start = Date barnacle plates placed out (Month/Day/Year)
B_end = Date barnacle plates taken in (Month/Day/Year)
M_start = Date mussel pads placed out (Month/Day/Year)
M_end = Date mussel pads taken in (Month/Day/Year)
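The six date columns are stored as Month/Day/Year text. A sketch that converts them to R `Date` objects and computes how many days each fucoid tile, barnacle plate and mussel pad was deployed. It assumes four-digit years (use `"%m/%d/%y"` for two-digit years); `add_deployment_days()` and the new columns `F_days`, `B_days` and `M_days` are hypothetical.

```r
# Convert the Month/Day/Year columns to Date objects and compute deployment
# lengths in days. Assumption: four-digit years in the date strings.
add_deployment_days <- function(recruit, date_format = "%m/%d/%Y") {
  for (col in c("F_start", "F_end", "B_start", "B_end", "M_start", "M_end")) {
    recruit[[col]] <- as.Date(as.character(recruit[[col]]), format = date_format)
  }
  # Hypothetical new columns: days between placing out and taking in
  recruit$F_days <- as.numeric(recruit$F_end - recruit$F_start)
  recruit$B_days <- as.numeric(recruit$B_end - recruit$B_start)
  recruit$M_days <- as.numeric(recruit$M_end - recruit$M_start)
  recruit
}

# Example (local file name as given above):
# recruit <- read.table("Recruitment_data_97-07.txt", header = TRUE, sep = "\t")
# recruit <- add_deployment_days(recruit)
```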

WOLF.CSV

This is a comma-separated values (CSV) file. The data can be downloaded from http://dx.doi.org/10.5061/dryad.5fp5m

The data in this file are from the publication:
Bryan H, Smits J, Koren L, Paquet P, Musiani M, Wynne-Edwards K (2014) Heavily hunted wolves have higher stress and reproductive steroids than wolves with lower hunting pressure. Functional Ecology 29(3): 347-356. http://dx.doi.org/10.1111/1365-2435.12354

This dataset includes measurements of cortisol, testosterone and progesterone in wolf hair samples obtained from hunters in the tundra-taiga and northern boreal forest of Canada. Additional samples were collected from wolves killed as part of a control program in the boreal forest (population 3).

This dataset has seven variables:

Variable name Definition of the variable
Individual = the ID of each individual (1-178)
Sex = the sex of each individual (M=male, F=female)
Population = the population that each individual belongs to
. (1=boreal forest, lightly hunted, 2=tundra-taiga, heavily hunted, 3=boreal forest, heavily hunted).
Colour = coat colour of each individual (D=dark, W=light, blank=missing data)
Cpgmg = concentration of cortisol in a hair sample [units=pg/mg of hair]
Tpgmg = concentration of testosterone in a hair sample, males only [units=pg/mg of hair]
Ppgmg = concentration of progesterone in a hair sample, females only [units=pg/mg of hair]
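A minimal sketch for reading WOLF.CSV that turns blank `Colour` entries into `NA` and attaches labels to the `Population` codes. `Sex` and `Colour` are read as text so that a column containing only `F` values is never converted to the logical value `FALSE`; `read_wolf()` and the label wording are hypothetical.

```r
# Read WOLF.CSV with blank cells as NA and labelled Population codes.
read_wolf <- function(path) {
  wolf <- read.csv(path, na.strings = c("", "NA"),
                   colClasses = c(Sex = "character", Colour = "character"))
  # 1 = boreal forest, lightly hunted; 2 = tundra-taiga, heavily hunted;
  # 3 = boreal forest, heavily hunted
  wolf$Population <- factor(wolf$Population, levels = 1:3,
                            labels = c("boreal, lightly hunted",
                                       "tundra-taiga, heavily hunted",
                                       "boreal, heavily hunted"))
  wolf$Sex <- factor(wolf$Sex, levels = c("M", "F"))
  wolf$Colour <- factor(wolf$Colour, levels = c("D", "W"),
                        labels = c("dark", "light"))
  wolf
}

# Example (local file name is an assumption):
# wolf <- read_wolf("WOLF.CSV")
# boxplot(Cpgmg ~ Population, data = wolf)
```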