2  Demographic and Clinical Variables

The steps taken to harmonise the data columns for demographic and clinical variables are described here.

1 Age

age_years is the harmonised positive integer data field that denotes the age of the patient at the time of the CT scan.

It is harmonised as follows:

Table S1: Harmonisation process of age_years.

Cohort ID | Original Response | Harmonisation Response
Cohort A | Column age of positive integer values. 0 is used to indicate unknown values. | Value of 0 in age will be changed to NA. age_years will take the values of age.
Cohort B | Column Age of positive integer values. | age_years will take the values of Age.
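
The age rule in Table S1 can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name `harmonise_age` and the flag `zero_means_unknown` are hypothetical, and Python's `None` stands in for NA.

```python
from typing import Optional


def harmonise_age(raw_age: Optional[int], zero_means_unknown: bool) -> Optional[int]:
    """Return age_years for one patient; None stands in for NA."""
    if raw_age is None:
        return None
    # Cohort A uses 0 as a placeholder for an unknown age.
    if zero_means_unknown and raw_age == 0:
        return None
    return raw_age


# Cohort A: 0 becomes NA. Cohort B: values are copied unchanged.
cohort_a = [harmonise_age(a, zero_means_unknown=True) for a in [54, 0, 67]]
cohort_b = [harmonise_age(a, zero_means_unknown=False) for a in [61, 48]]
```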

2 Sex

sex is the harmonised data field that denotes the sex of the patient at the time of the CT scan.

It holds the following values:

Table S2: Harmonised values of sex.

Value | Description
0 | female
1 | male
-1 | unknown

It is harmonised as follows:

Table S3: Harmonisation process of sex.

Cohort ID | Original Response | Harmonisation Response
Cohort A | Column sex with F as female and M as male. | Change the values of sex as follows: F as 0, M as 1.
Cohort B | Column Sex with Female as female and Male as male. | Map the values of Sex to sex as follows: Female as 0, Male as 1.
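
A minimal sketch of the mapping in Table S3 is shown below. The lookup table `SEX_CODES` and the function `harmonise_sex` are hypothetical names. Two choices here are assumptions not stated in the source: matching is case-insensitive, and missing or unrecognised values are sent to -1 (unknown, per Table S2).

```python
# Raw labels from both cohorts, lower-cased, mapped to the harmonised codes.
SEX_CODES = {"f": 0, "female": 0, "m": 1, "male": 1}


def harmonise_sex(raw):
    """Map a raw sex label to 0 (female), 1 (male) or -1 (unknown)."""
    if raw is None:
        return -1  # assumption: missing values are coded as unknown
    return SEX_CODES.get(str(raw).strip().lower(), -1)
```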

3 Height, Weight, BMI and BSA

height_cm is the harmonised positive real data field that denotes the height in cm of the patient at the time of the CT scan.

weight_kg is the harmonised positive real data field that denotes the weight in kg of the patient at the time of the CT scan.

bsa_m2 is the harmonised positive real data field that denotes the body surface area in m2 of the patient at the time of the CT scan.

bmi is the harmonised positive real data field that denotes the body mass index of the patient at the time of the CT scan.

All values with more than two decimal places are rounded to two decimal places.

They are harmonised as follows:

Table S4: Harmonisation process of height_cm, weight_kg, bsa_m2 and bmi.

Cohort ID | Original Response | Harmonisation Response
Cohort A | Column height in cm of positive real numeric values in one decimal place. Column weight in kg of positive real numeric values in one decimal place. | height_cm will take the values of height. weight_kg will take the values of weight. bsa_m2 and bmi are calculated using data fields height_cm and weight_kg. All values are then converted to two decimal places.
Cohort B | Column Height in cm of positive integer values. Column Weight in kg of positive integer values. | height_cm will take the values of Height. weight_kg will take the values of Weight. bsa_m2 and bmi are calculated using data fields height_cm and weight_kg. All values are then converted to two decimal places.
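
The source does not state which formulas were used to derive bmi and bsa_m2. The sketch below assumes the standard definition of body mass index (weight in kg divided by the square of height in m) and, purely as an assumption, the Du Bois formula for body surface area; other formulas such as Mosteller's would give slightly different values. The function names are hypothetical.

```python
def compute_bmi(height_cm: float, weight_kg: float) -> float:
    """Body mass index in kg/m^2, rounded to two decimal places."""
    height_m = height_cm / 100
    return round(weight_kg / height_m ** 2, 2)


def compute_bsa_du_bois(height_cm: float, weight_kg: float) -> float:
    """Body surface area in m^2 via the Du Bois formula (an assumption),
    rounded to two decimal places."""
    return round(0.007184 * weight_kg ** 0.425 * height_cm ** 0.725, 2)


bmi = compute_bmi(170, 70)
bsa_m2 = compute_bsa_du_bois(170, 70)
```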

4 Smoking History

smoke_current is the harmonised data field that denotes whether the patient is a current smoker at the time of the CT scan. smoke_past is the harmonised data field that denotes whether the patient is a past smoker at the time of the CT scan.

They hold the following values:

Table S5: Harmonised values of smoke_current and smoke_past.

Value | Description
0 | no
1 | yes
-1 | unknown

They are harmonised as follows:

Table S6: Harmonisation process of smoke_current and smoke_past.

Cohort ID | Original Response | Harmonisation Response
Cohort A | Column smoke_current_good with 0 as no, 1 as yes and -1 as unknown. Column smoke_past_good with 0 as no, 1 as yes and -1 as unknown. | smoke_current will take the values of smoke_current_good. smoke_past will take the values of smoke_past_good.
Cohort B | Column Smoke History with non-smoker as non-smoker, past smoker as a past smoker, current smoker as a current smoker and NA as unknown. | Map the values of Smoke History to smoke_current as follows: non-smoker and past smoker as 0, current smoker as 1, NA as -1. Map the values of Smoke History to smoke_past as follows: non-smoker and current smoker as 0, past smoker as 1, NA as -1.

After harmonisation, we validate the values of smoke_current and smoke_past to ensure that there can only be the following cases:

Table S7: Valid values of smoke_current and smoke_past.

Description | smoke_current | smoke_past
Non-smoker | 0 | 0
Past smoker | 0 | 1
Current smoker | 1 | 0
Unknown | -1 | -1
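
The Cohort B mapping for Smoke History and the validation against Table S7 can be sketched together. The names below are hypothetical, `None` stands in for NA, and sending unrecognised labels to (-1, -1) is an assumption rather than something the source specifies.

```python
# Smoke History label -> (smoke_current, smoke_past)
SMOKE_HISTORY_CODES = {
    "non-smoker": (0, 0),
    "past smoker": (0, 1),
    "current smoker": (1, 0),
}

# The only combinations permitted by Table S7.
VALID_SMOKE_PAIRS = {(0, 0), (0, 1), (1, 0), (-1, -1)}


def harmonise_smoke_history(raw):
    """Return (smoke_current, smoke_past) for one Smoke History value."""
    if raw is None:
        return (-1, -1)
    # Assumption: labels outside the three known values count as unknown.
    return SMOKE_HISTORY_CODES.get(str(raw).strip().lower(), (-1, -1))


def is_valid_smoke_pair(smoke_current, smoke_past):
    """Check a harmonised pair against the valid cases."""
    return (smoke_current, smoke_past) in VALID_SMOKE_PAIRS
```

A pair such as (1, 1) or (0, -1) fails the check, which flags rows where the two fields disagree about whether the status is known.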

5 Have Shortness of Breath

have_sob is the harmonised data field that denotes whether the patient has shortness of breath at the time of the CT scan.

It holds the following values:

Table S8: Harmonised values of have_sob.

Value | Description
0 | no
1 | yes
-1 | unknown

have_sob is harmonised as follows:

Table S9: Harmonisation process of have_sob.

Cohort ID | Original Response | Harmonisation Response
Cohort A | Column have_sob with 0 as no and 1 as yes. | have_sob remains unchanged.
Cohort B | Column Dyspnea with no as no and yes as yes. | Map the values of Dyspnea to have_sob as follows: no as 0, yes as 1.

6 Have Chest Pain

have_chest_pain is the harmonised data field that denotes whether the patient has chest pain at the time of the CT scan.

It holds the following values:

Table S10: Harmonised values of have_chest_pain.

Value | Description
0 | no
1 | yes
-1 | unknown

have_chest_pain is harmonised as follows:

Table S11: Harmonisation process of have_chest_pain.

Cohort ID | Original Response | Harmonisation Response
Cohort A | Column chest_pain_type with 0 as no chest pain, 1 as typical chest pain, 2 as atypical chest pain and 3 as nonanginal chest pain. | Map the values of chest_pain_type to have_chest_pain as follows: If chest_pain_type has a value of 1, 2 or 3, have_chest_pain will be 1. Else, if chest_pain_type has a value of 0, have_chest_pain will be 0.
Cohort B | Column Chest Pain Character with no chest pain as no chest pain, typical as typical chest pain, atypical as atypical chest pain and nonanginal as nonanginal chest pain. | Map the values of Chest Pain Character to have_chest_pain as follows: If Chest Pain Character has a value of typical, atypical or nonanginal, have_chest_pain will be 1. Else, if Chest Pain Character has a value of no chest pain, have_chest_pain will be 0.
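
The two mappings to have_chest_pain can be sketched as a pair of small functions, one per cohort. The function names are hypothetical. The source only defines the outcome for the listed values, so returning -1 (unknown) for anything else is an assumption.

```python
def have_chest_pain_cohort_a(chest_pain_type):
    """Cohort A: numeric chest_pain_type code -> have_chest_pain."""
    if chest_pain_type in (1, 2, 3):
        return 1
    if chest_pain_type == 0:
        return 0
    return -1  # assumption: any other or missing value is unknown


def have_chest_pain_cohort_b(chest_pain_character):
    """Cohort B: Chest Pain Character label -> have_chest_pain."""
    if chest_pain_character is None:
        return -1  # assumption: missing value is unknown
    label = str(chest_pain_character).strip().lower()
    if label in ("typical", "atypical", "nonanginal"):
        return 1
    if label == "no chest pain":
        return 0
    return -1  # assumption: unrecognised label is unknown
```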

7 Symptoms

symptoms is the harmonised data field that denotes the patient’s symptoms at the time of the CT scan.

It holds the following values:

Table S12: Harmonised values of symptoms.

Value | Description
0 | asymptomatic
1 | chest pain
2 | only dyspnea
3 | others
-1 | unknown

Chest pain, dyspnea and other symptoms are prioritised as follows:

  • If a patient has all three symptoms, chest pain will take the highest priority. Hence, symptoms = 1.
  • If a patient has both dyspnea and other symptoms (not chest pain related), dyspnea will take the higher priority. Hence, symptoms = 2.

The general approach is to assume that patients are asymptomatic (symptoms = 0) unless the data indicate chest pain (symptoms = 1), dyspnea (symptoms = 2) or other symptoms such as heart palpitations (symptoms = 3), or unless all symptom-related data fields are missing (symptoms = -1).

symptoms is harmonised as follows:

Table S13: Harmonisation process of symptoms.

Cohort ID | Original Response | Harmonisation Response
Cohort A | Column have_sob with 0 as no and 1 as yes. Column chest_pain_type with 0 as no chest pain, 1 as typical chest pain, 2 as atypical chest pain and 3 as nonanginal chest pain. | Map the values of chest_pain_type and have_sob to symptoms as follows: If chest_pain_type has a value of 1, 2 or 3, symptoms will be 1. Else, if chest_pain_type has a value of 0 and have_sob has a value of 1, symptoms will be 2. Else, symptoms will be 0.
Cohort B | Column Dyspnea with no as no and yes as yes. Column Chest Pain Character with no chest pain as no chest pain, typical as typical chest pain, atypical as atypical chest pain and nonanginal as nonanginal chest pain. | Map the values of Chest Pain Character and Dyspnea to symptoms as follows: If Chest Pain Character has a value of typical, atypical or nonanginal, symptoms will be 1. Else, if Chest Pain Character has a value of no chest pain and Dyspnea has a value of yes, symptoms will be 2. Else, symptoms will be 0.
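
The priority order for symptoms (chest pain first, then dyspnea, then other symptoms) can be sketched as a single function. This is an illustration of the general approach, not the study code. Each input is assumed to be already reduced to 1 (present), 0 (absent) or `None` (missing), and the `other` argument is hypothetical, since neither cohort in Table S13 provides a column for other symptoms.

```python
def harmonise_symptoms(chest_pain, dyspnea, other=None):
    """Return the harmonised symptoms code.

    Each argument is 1 (present), 0 (absent) or None (missing).
    """
    # Unknown only when every symptom-related field is missing.
    if chest_pain is None and dyspnea is None and other is None:
        return -1
    if chest_pain == 1:
        return 1  # chest pain outranks everything else
    if dyspnea == 1:
        return 2  # dyspnea outranks other symptoms
    if other == 1:
        return 3
    return 0  # default assumption: asymptomatic
```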

8 Chest Pain Type

chest_pain_type is the harmonised data field that denotes the patient’s chest pain type at the time of the CT scan.

It holds the following values:

Table S14: Harmonised values of chest_pain_type.

Value | Description
0 | no symptoms
1 | typical
2 | atypical
3 | nonanginal
4 | dyspnea
-1 | unknown

Chest pain, dyspnea and other symptoms are prioritised as follows:

  • If a patient has both chest pain (typical, atypical or nonanginal) and dyspnea, chest pain will take the higher priority. Hence, chest_pain_type will be 1, 2 or 3.
  • If a patient has both dyspnea and other symptoms (not chest pain related), dyspnea will take the higher priority. Hence, chest_pain_type will be 4.
  • If a patient has other symptoms that are neither chest pain nor dyspnea, like heart palpitations, chest_pain_type will be -1.

The general approach is to assume that patients are asymptomatic (chest_pain_type = 0) unless the data indicate a specific type of chest pain (chest_pain_type = 1, 2 or 3), dyspnea (chest_pain_type = 4) or other symptoms such as heart palpitations (chest_pain_type = -1), or unless all symptom-related data fields are missing (chest_pain_type = -1).

chest_pain_type is harmonised as follows:

Table S15: Harmonisation process of chest_pain_type.

Cohort ID | Original Response | Harmonisation Response
Cohort A | Column have_sob with 0 as no and 1 as yes. Column chest_pain_type with 0 as no chest pain, 1 as typical chest pain, 2 as atypical chest pain and 3 as nonanginal chest pain. | Map the original values of chest_pain_type and have_sob to the harmonised chest_pain_type as follows: If the original chest_pain_type has a value of 1, chest_pain_type will be 1. Else, if the original chest_pain_type has a value of 2, chest_pain_type will be 2. Else, if the original chest_pain_type has a value of 3, chest_pain_type will be 3. Else, if the original chest_pain_type has a value of 0 and have_sob has a value of 1, chest_pain_type will be 4. Else, chest_pain_type will be 0.
Cohort B | Column Dyspnea with no as no and yes as yes. Column Chest Pain Character with no chest pain as no chest pain, typical as typical chest pain, atypical as atypical chest pain and nonanginal as nonanginal chest pain. | Map the values of Chest Pain Character and Dyspnea to chest_pain_type as follows: If Chest Pain Character has a value of typical, chest_pain_type will be 1. Else, if Chest Pain Character has a value of atypical, chest_pain_type will be 2. Else, if Chest Pain Character has a value of nonanginal, chest_pain_type will be 3. Else, if Chest Pain Character has a value of no chest pain and Dyspnea has a value of yes, chest_pain_type will be 4. Else, chest_pain_type will be 0.
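
The Cohort B rule chain for chest_pain_type can be sketched as below. The names `CHEST_PAIN_CODES` and `harmonise_chest_pain_type` are hypothetical, and case-insensitive matching is an assumption. The sketch follows the listed rules literally, so any input that matches none of the earlier conditions falls through to 0.

```python
# Chest Pain Character label -> harmonised chest_pain_type code
CHEST_PAIN_CODES = {"typical": 1, "atypical": 2, "nonanginal": 3}


def harmonise_chest_pain_type(chest_pain_character, dyspnea):
    """Map Cohort B's Chest Pain Character and Dyspnea to chest_pain_type."""
    label = str(chest_pain_character).strip().lower()
    # A specific chest pain type always takes priority over dyspnea.
    if label in CHEST_PAIN_CODES:
        return CHEST_PAIN_CODES[label]
    if label == "no chest pain" and str(dyspnea).strip().lower() == "yes":
        return 4
    return 0  # final "Else" branch of the rules
```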

After harmonisation, we validate the values of chest_pain_type and symptoms to ensure that there can only be the following cases:

Table S16: Valid values of symptoms and chest_pain_type.

Description | symptoms | chest_pain_type
Asymptomatic | 0 | 0
Have chest pain | 1 | 1, 2 or 3
Only dyspnea | 2 | 4
Other symptoms | 3 | -1
Unknown | -1 | -1
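
The cross-field validation of symptoms and chest_pain_type can be sketched as a lookup against the set of permitted pairs. The names are hypothetical. The set below codes other symptoms as symptoms = 3, following the value definitions in Table S12.

```python
# Permitted (symptoms, chest_pain_type) combinations.
VALID_SYMPTOM_PAIRS = {
    (0, 0),                   # asymptomatic
    (1, 1), (1, 2), (1, 3),   # chest pain of a specific type
    (2, 4),                   # only dyspnea
    (3, -1),                  # other symptoms
    (-1, -1),                 # unknown
}


def find_invalid_rows(rows):
    """Return the indices of (symptoms, chest_pain_type) pairs that are not permitted."""
    return [i for i, pair in enumerate(rows) if tuple(pair) not in VALID_SYMPTOM_PAIRS]


bad = find_invalid_rows([(0, 0), (1, 4), (2, 4), (3, -1), (2, -1)])
```

Rows flagged this way point to a disagreement between the two harmonised fields and would need to be checked against the original cohort data.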