[Data Mining] Introduction to Data Mining -1-


incastle의 콩나물


incastle 2019. 4. 26. 21:56

Data Mining, Professor 진창호 (2019, 1st semester)

Why Mine Data? Commercial Viewpoint

- Large amounts of data are being collected and stored.

- Computers keep getting cheaper, and computing power keeps getting stronger.

- Competitive pressure is strong (provide better services! customize them!)

Why Mine Data? Scientific Viewpoint

- Data is being collected faster and in larger volumes.

- Raw data is hard to process with traditional techniques.

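Data, Information, Knowledge

- Data: observed raw facts

- Information: meaningful patterns derived from raw data

>> A frequently cited example: a phone number is information, but it is not a target of analysis

>> The main reason is that it is not implicit (it is stated explicitly, so there is nothing hidden to discover).

- Knowledge: a generalization of information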

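What is Data Mining?

- The non-trivial extraction of implicit, previously unknown, and potentially useful information from data

e.g.) Grouping similar web pages among search results. (O)

Searching a website for 'Amazon'. (X): this is a simple lookup, so nothing implicit is discovered.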
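To make the web-page grouping example above concrete, here is a minimal sketch that groups search-result snippets by word overlap. The snippets, the Jaccard similarity measure, the greedy one-pass grouping, and the `threshold` value are all assumptions for illustration, not the method from the lecture.

```python
def tokenize(text):
    """Lowercase a snippet and split it into a set of words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity: size of the intersection / size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_pages(snippets, threshold=0.2):
    """Greedy one-pass grouping: each page joins the first group whose first
    member is similar enough, otherwise it starts a new group."""
    groups = []  # list of (representative word set, member indices)
    for i, text in enumerate(snippets):
        words = tokenize(text)
        for representative, members in groups:
            if jaccard(words, representative) >= threshold:
                members.append(i)
                break
        else:  # no existing group was similar enough
            groups.append((words, [i]))
    return [members for _, members in groups]


# Hypothetical search-result snippets for the query "amazon"
snippets = [
    "amazon rainforest river basin south america",
    "amazon online shopping books deals",
    "rainforest wildlife amazon river species",
    "shopping cart online orders books",
]
print(group_pages(snippets))  # [[0, 2], [1, 3]]
```

The rainforest pages and the shopping pages land in separate groups even though all of them mention "amazon"; that grouping is the hidden structure a plain keyword search does not reveal.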

Preprocessing

- Transforming data so that it is better suited for data mining

- There are many techniques => we will look at them in detail later.
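As a rough preview, here is a sketch of two common preprocessing steps: filling missing values with the mean and min-max scaling to [0, 1]. These two steps are my own picks for illustration (the lecture covers specific techniques later), and the `ages` list is hypothetical.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages = [20, None, 40, 30]  # hypothetical raw attribute with a missing value
clean = min_max_scale(impute_mean(ages))
print(clean)  # [0.0, 0.5, 1.0, 0.5]
```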

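Postprocessing

- For integrating data mining results into decision support systems

- In other words, preparing DM results so they can be used for decision making (e.g., filtering, visualizing, and interpreting the discovered patterns).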
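A minimal sketch of one kind of postprocessing, assuming the mining step has already produced rules as `(antecedent, consequent, support, confidence)` tuples: weak patterns are filtered out, the rest are ranked, and each is turned into a readable statement for a decision maker. The rules, thresholds, and "recommend" wording are all hypothetical.

```python
# Hypothetical output of a mining step: (antecedent, consequent, support, confidence)
mined_rules = [
    ("beer", "diapers", 0.6, 1.0),
    ("bread", "milk", 0.6, 0.75),
    ("milk", "eggs", 0.1, 0.9),
    ("bread", "beer", 0.4, 0.5),
]


def postprocess(rules, min_support=0.3, min_confidence=0.7):
    """Drop weak patterns, then rank the rest by confidence (strongest first)."""
    kept = [r for r in rules if r[2] >= min_support and r[3] >= min_confidence]
    return sorted(kept, key=lambda r: r[3], reverse=True)


# Turn the surviving patterns into statements a decision maker can act on
for lhs, rhs, sup, conf in postprocess(mined_rules):
    print(f"If a customer buys {lhs}, recommend {rhs} (support={sup}, confidence={conf})")
```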

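Why not use traditional methods instead of DM?

- The volume of data has grown too large

- Data has too many dimensions

- Distributed nature of data

>> How do we integrate results analyzed from multiple sources?

>> Data security (this seems to mean that data scattered across many locations is more vulnerable; e.g., data collected from mobile phones?)

>> How can we reduce the amount of communication needed to perform the distributed computation?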
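One common way to cut communication (an assumption here; the lecture does not name a technique) is to have each site send a small summary instead of its raw records. For a global mean, a `(sum, count)` pair per site is enough. The site names and values below are hypothetical.

```python
# Each site keeps its own raw records (hypothetical values)
site_data = {
    "site_a": [10, 20, 30],
    "site_b": [40, 50],
    "site_c": [60],
}


def local_summary(values):
    """Runs at each site: only two numbers leave the site, not the raw data."""
    return (sum(values), len(values))


def combine(summaries):
    """Runs centrally: merges (sum, count) pairs into the global mean."""
    total = sum(s for s, _ in summaries)
    count = sum(n for _, n in summaries)
    return total / count


summaries = [local_summary(v) for v in site_data.values()]
global_mean = combine(summaries)
print(global_mean)  # 35.0
```

The result is identical to pooling all raw records first, but each site transmits a constant amount of data regardless of how many records it holds.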

Challenges of Data Mining (extends the list above)

- Scalability

>> Does the algorithm still run well as the amount of data grows?

>> But don't overthink it: investing unnecessary resources is not acceptable

>> A problem for the future

- Dimensionality

>> A problem for the present

- Complex and Heterogeneous Data: attributes come in many different types

- Data Quality

>> Outliers, missing values, and the like

>> Data in different formats is not a data quality problem => it is a preprocessing problem

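- Data Ownership and Distribution: data is not stored in one location

>> Getting permission to use the data is difficult

>> How do we integrate data obtained and analyzed from different sources?

>> Not stored in one location or not owned by one organization

- Privacy Preservation

>> Some data cannot be collected because it contains personal information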

- Streaming data

>> Capability of dealing with a large amount of data in real time
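For the streaming item above, one standard technique (chosen here as an example, not taken from the lecture) is Welford's online algorithm: it updates the mean and variance one value at a time in constant memory, so the stream never has to be stored. The input values stand in for a hypothetical stream.

```python
class RunningStats:
    """Welford's online algorithm for the running mean and variance."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new value into the statistics in O(1) time and memory."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of all values seen so far."""
        return self._m2 / self.n if self.n else 0.0


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:  # stands in for an unbounded stream
    stats.update(value)
print(stats.mean, stats.variance)
```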

Prediction Methods

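- Use some variables to predict unknown or future values of other variables

- Assumption: the future will resemble the past => if the pattern does not hold, prediction is impossible.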
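A minimal sketch of that assumption in action: fit a straight-line trend to past values by ordinary least squares and extend it one step ahead. The linear model and the sales numbers are assumptions for illustration; the forecast is only as good as the assumption that the past trend continues.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b


months = [1, 2, 3, 4, 5]
sales = [10, 12, 14, 16, 18]  # hypothetical history with a steady trend
a, b = fit_line(months, sales)
forecast = a + b * 6  # assumes month 6 follows the same pattern
print(forecast)  # 20.0
```

If sales suddenly stop following the trend, this forecast becomes meaningless, which is exactly the failure case described above.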

Description Methods

- Derive patterns that summarize the underlying relationship in data

- Find human-interpretable patterns that describe the data

- Explaining the data by working backward from the prediction results and examining the attributes
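Association rules are one example of a description method that yields human-interpretable patterns (my choice of example; the lecture does not name one here). The sketch counts item pairs in market-basket transactions and keeps rules whose support and confidence clear the given thresholds; the transactions and threshold values are made up.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def pair_rules(transactions, min_support=0.4, min_confidence=0.8):
    """Return (lhs, rhs, support, confidence) for single-item rules lhs -> rhs."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


for rule in pair_rules(transactions):
    print(rule)  # ('beer', 'diapers', 0.6, 1.0)
```

A rule like "customers who buy beer also buy diapers" describes the data directly, rather than predicting a target value.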

Hunch: a question asking for examples of the challenges of data mining will probably come up. Prepare examples in advance!
