Abstract: Vision-language foundational models have achieved commendable results on related tasks. However, their application to medical tasks is still limited due to issues arising from data biases.