Determining the severity of ischemic stroke on non-contrast CT is difficult because of the low signal-to-noise ratio, which leads to variable interpretation of stroke severity. We investigate the level of agreement between four methods of identifying early ischemic changes within the brain, one of which is an automated system. For the evaluation, we divide the middle cerebral artery territory of each hemisphere into ten regions defined according to the Alberta Stroke Programme Early CT Score (ASPECTS). The automated system uses a specialised Convolutional Neural Network (CNN)-based regressor to produce voxel-level confidence masks indicating which voxels are suspected of showing early ischemic change, and from these masks we compute the score. We also obtain the score from three other methods, all involving trained human graders. We compare the level of agreement between these methods at both the patient level and the territory level using Simultaneous Truth and Performance Level Estimation (STAPLE) and Cohen’s kappa coefficient. We analyse possible causes of disagreement between the methods and statistically validate the performance of the CNN model against that of clinical staff. We find that, at the patient level, the CNN produces scores that correlate most strongly with its training data, but that the training data could be improved to strengthen the correlation with the professional standard.
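The two computations described above, turning voxel-level confidences into an ASPECTS score and measuring region-level agreement with Cohen's kappa, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method. The rule for deciding that a region is affected (a voxel confidence threshold of 0.5 and a minimum flagged fraction of 0.1) is hypothetical, as are the function names and the per-region input format; the conventional ASPECTS convention of starting at 10 and subtracting one point per affected region is assumed. STAPLE is not shown.

```python
# Minimal sketch (hypothetical rule and names): ASPECTS from per-region voxel
# confidences, plus Cohen's kappa between two raters' per-region decisions.
from collections import Counter

# The ten ASPECTS regions of one MCA territory.
ASPECTS_REGIONS = ["C", "L", "IC", "I", "M1", "M2", "M3", "M4", "M5", "M6"]


def region_affected(confidences, voxel_threshold=0.5, min_fraction=0.1):
    """Hypothetical rule: flag a region if at least `min_fraction` of its
    voxels have confidence >= `voxel_threshold`."""
    if not confidences:
        return False
    flagged = sum(1 for c in confidences if c >= voxel_threshold)
    return flagged / len(confidences) >= min_fraction


def aspects_score(region_confidences, **kwargs):
    """Start at 10 and subtract one point per affected region.

    `region_confidences` maps region names to lists of voxel confidences;
    missing regions are treated as unaffected."""
    affected = [
        region_affected(region_confidences.get(r, []), **kwargs)
        for r in ASPECTS_REGIONS
    ]
    return 10 - sum(affected), affected


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    # Observed agreement: fraction of items given the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if p_expected == 1.0:
        # Both raters used one identical label for every item; kappa is
        # undefined here, so by convention we report perfect agreement.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    cnn_conf = {"C": [0.9, 0.8, 0.2], "I": [0.7, 0.6], "M2": [0.1, 0.2]}
    score, cnn_regions = aspects_score(cnn_conf)
    print(score)  # 8: regions C and I are flagged
    grader_regions = [True, False, False, True, False,
                      True, False, False, False, False]
    print(round(cohens_kappa(cnn_regions, grader_regions), 3))  # 0.737
```

In this toy example the two raters agree on 9 of 10 regions, but because most regions are unaffected the chance agreement is already 0.62, so kappa (about 0.74) is noticeably lower than raw agreement. This is why a chance-corrected statistic is preferred over simple percentage agreement for territory-level comparisons.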