ImageFolder [Trying Out Pneumonia Classification]

KAU 2021. 2. 18. 18:09

It has been a while, so here is some fun machine learning time.

We need a dataset, so we download an X-ray dataset from Kaggle.

www.kaggle.com/paultimothymooney/chest-xray-pneumonia

Chest X-Ray Images (Pneumonia)

5,863 images, 2 categories


I also prepared Wuhan pneumonia (COVID-19) data.

We will be using Colab, so upload the dataset to Google Drive.
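Once the files are on Google Drive, the notebook has to mount the drive before it can read them. The following is a minimal sketch, assuming a standard Colab runtime; the helper name mount_google_drive is hypothetical, and outside Colab it simply skips the mount.

```python
def mount_google_drive(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab; do nothing elsewhere."""
    try:
        # google.colab is only importable inside a Colab runtime
        from google.colab import drive
    except ImportError:
        print("Not running in Colab; skipping Drive mount.")
        return False
    drive.mount(mount_point)  # asks for authorization on first use
    return True


mounted = mount_google_drive()
```

After mounting, files on the drive appear under /content/drive/MyDrive, which is the prefix used by the model path later in this post.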

Time for some hands-on practice, let's go!

import torchvision
from torchvision import transforms

from torch.utils.data import DataLoader

Import the torchvision library and the DataLoader class.

from matplotlib.pyplot import imshow
%matplotlib inline

Also import matplotlib, the visualization library.

In Colab you can also use basic Linux commands to move between directories.
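In a notebook cell, %cd changes the working directory for later cells, while !cd runs in a temporary subshell and does not persist. The same move can be sketched in plain Python with os.chdir; the folder below is a temporary stand-in, not the folder used in this post.

```python
import os
import tempfile

# Hypothetical stand-in for a project folder such as one under /content/drive/MyDrive
project_dir = tempfile.mkdtemp()

start_dir = os.getcwd()
os.chdir(project_dir)          # same effect as `%cd <path>` in a notebook cell
current_dir = os.getcwd()
print(current_dir)

os.chdir(start_dir)            # go back so later relative paths are unaffected
```

Relative paths such as custom_data/origin_data are resolved against the current working directory, which is why moving into the right folder first matters.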

trans = transforms.Compose([
    transforms.Resize((64,128))
])

train_data = torchvision.datasets.ImageFolder(root='custom_data/origin_data', transform=trans)

This resizes every image in the dataset to 64x128 (height x width).
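Besides applying the transform, ImageFolder takes the labels from the names of the sub-folders under root: it sorts the folder names and numbers them from 0. The sketch below reproduces only that naming rule in plain Python; the class folder names are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical layout: one sub-folder per class under the dataset root
root = tempfile.mkdtemp()
for class_name in ["PNEUMONIA", "NORMAL"]:
    os.makedirs(os.path.join(root, class_name))

# ImageFolder sorts the class folder names and numbers them from 0
classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
class_to_idx = {name: index for index, name in enumerate(classes)}
print(class_to_idx)
```

On a real dataset object the same mapping is available as train_data.class_to_idx, which is a quick way to check which class ended up as label 0.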

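import os

# Create the output folders first; save() fails if they do not exist
os.makedirs('custom_data/train_data/gray', exist_ok=True)
os.makedirs('custom_data/train_data/red', exist_ok=True)

for num, (data, label) in enumerate(train_data):
    print(num, data, label)

    # data is the resized PIL image; label is the class index assigned by ImageFolder
    if label == 0:
        data.save('custom_data/train_data/gray/%d_%d.jpeg' % (num, label))
    else:
        data.save('custom_data/train_data/red/%d_%d.jpeg' % (num, label))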

Saving the images this way can also convert the original files to the jpeg extension.

Now let's train a model using a CNN.

Import the required libraries.

import torch
import torch.nn as nn
import torch.nn.functional as F

import torch.optim as optim
from torch.utils.data import DataLoader

import torchvision
import torchvision.transforms as transforms

device = 'cuda' if torch.cuda.is_available() else 'cpu'

torch.manual_seed(777)
if device =='cuda':
    torch.cuda.manual_seed_all(777)

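# The default batching only works on tensors, so train_data must use a transform that ends with transforms.ToTensor()
data_loader = DataLoader(dataset=train_data, batch_size=8, shuffle=True, num_workers=2)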

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(3,6,5),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(6,16,5),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.layer3 = nn.Sequential(
            nn.Linear(16*13*29, 120),
            nn.ReLU(),
            nn.Linear(120,2)
        )
        
    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.view(out.shape[0], -1)
        out = self.layer3(out)
        return out
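The 16*13*29 input size of the first Linear layer follows from the image size and the layer settings. The arithmetic can be checked with a short sketch, assuming 64x128 inputs, no padding, stride 1 for the convolutions, and stride 2 for the pooling layers (the defaults of the layers above); the helper conv_out is hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling window along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


height, width = 64, 128                      # input size after Resize((64, 128))
for name, kernel, stride in [("conv1", 5, 1), ("pool1", 2, 2),
                             ("conv2", 5, 1), ("pool2", 2, 2)]:
    height = conv_out(height, kernel, stride)
    width = conv_out(width, kernel, stride)
    print(name, height, width)

flat_features = 16 * height * width          # 16 output channels from conv2
print(flat_features)
```

If the Resize target changes, this number changes with it, and the first Linear layer has to be updated to match.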

If you are curious about how a CNN is structured, see the following post.

metar.tistory.com/entry/CNNConvolutional-Neural-Network

CNN[Convolutional Neural Network]


# Create the model before handing its parameters to the optimizer
net = CNN().to(device)
optimizer = optim.Adam(net.parameters(), lr=0.00005)
loss_func = nn.CrossEntropyLoss().to(device)

You have to delete the original data and keep only the labeled data.

(It would have been simpler to save it to a different folder from the start..)
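With an optimizer and a loss function in place, training usually comes down to a loop over the batches. The training code itself is not shown in this post, so the following is only a sketch of a typical loop: the random tensors, the small stand-in model, and the epoch count are all assumptions that make the block runnable on its own.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(777)
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Stand-in data: 16 random "images" of shape 3x64x128 with labels 0 or 1
images = torch.randn(16, 3, 64, 128)
labels = torch.randint(0, 2, (16,))
data_loader = DataLoader(TensorDataset(images, labels), batch_size=8, shuffle=True)

# Stand-in model; in this post it would be the CNN class defined earlier
net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 128, 2)).to(device)
optimizer = optim.Adam(net.parameters(), lr=0.00005)
loss_func = nn.CrossEntropyLoss().to(device)

epochs = 2  # hypothetical value
epoch_losses = []
for epoch in range(epochs):
    running_loss = 0.0
    for inputs, targets in data_loader:
        inputs, targets = inputs.to(device), targets.to(device)

        optimizer.zero_grad()               # clear gradients from the previous step
        outputs = net(inputs)               # forward pass: logits of shape (batch, 2)
        loss = loss_func(outputs, targets)  # cross-entropy against integer labels
        loss.backward()                     # backpropagate
        optimizer.step()                    # update the weights

        running_loss += loss.item()
    epoch_losses.append(running_loss / len(data_loader))
    print('[%d] loss: %.3f' % (epoch + 1, epoch_losses[-1]))
```

Printing the average loss per epoch is a simple way to see whether the model is learning at all before moving on to saving it.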

torch.save(net.state_dict(), "/content/drive/MyDrive/Colab Notebooks/xray/model/model.pth")

new_net = CNN().to(device)

# map_location lets a model saved on the GPU also load on a CPU-only runtime
new_net.load_state_dict(torch.load('/content/drive/MyDrive/Colab Notebooks/xray/model/model.pth', map_location=device))

print(net.layer1[0])
print(new_net.layer1[0])

print(net.layer1[0].weight[0][0][0])
print(new_net.layer1[0].weight[0][0][0])

net.layer1[0].weight[0] == new_net.layer1[0].weight[0]
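The comparison above returns a tensor of element-wise booleans for a single filter. To check the whole model in one go, every tensor in the two state dicts can be compared with torch.equal. This is a sketch with a small stand-in model and a temporary file path, both assumptions, rather than the CNN and Drive path used in this post.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Stand-in model; the post saves and reloads its CNN in the same way
net = nn.Sequential(nn.Conv2d(3, 6, 5), nn.ReLU())
new_net = nn.Sequential(nn.Conv2d(3, 6, 5), nn.ReLU())

path = os.path.join(tempfile.mkdtemp(), 'model.pth')  # hypothetical location
torch.save(net.state_dict(), path)
new_net.load_state_dict(torch.load(path, map_location='cpu'))

# Compare every tensor in the two state dicts
all_equal = all(
    torch.equal(param, new_net.state_dict()[name])
    for name, param in net.state_dict().items()
)
print(all_equal)
```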

Time to test, let's go!
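The test code is not shown in this post. A common way to measure accuracy is sketched below: the helper evaluate, the random test set, and the stand-in model are all assumptions, and the printed number says nothing about the real X-ray data.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, device='cpu'):
    """Return the fraction of samples whose predicted class matches the label."""
    model.eval()                      # switch off training-only behavior
    correct, total = 0, 0
    with torch.no_grad():             # no gradients needed for testing
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            predicted = model(inputs).argmax(dim=1)   # class with the highest logit
            correct += (predicted == targets).sum().item()
            total += targets.size(0)
    return correct / total


# Stand-in test set and model, only to make the sketch runnable
torch.manual_seed(777)
test_loader = DataLoader(
    TensorDataset(torch.randn(10, 3, 64, 128), torch.randint(0, 2, (10,))),
    batch_size=5,
)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 128, 2))
accuracy = evaluate(model, test_loader)
print('accuracy: %.2f' % accuracy)
```

For a meaningful result, the test images should come from a folder that was not used for training.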

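Perhaps a CNN this simple is just not enough to classify these images well.

Or the dataset may have been too small.

(The pneumonia dataset originally has about 5,000 images, but I used only 200.)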
