Winter is coming !!: 2011

Monday, November 21, 2011

IT Terminology Definitions

[ Solutions ]

EAI (enterprise application integration)
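EAI organically links all interrelated applications within an enterprise to build an environment in which the information they need can be integrated, managed, and used centrally; it is basic infrastructure for e-business. With conventional point-to-point interfaces, there was a practical limit on the number of applications, maintenance was difficult, and adding an application cost a great deal of time and money. With EAI-based interfaces, introducing a new application requires only an adapter, so the system is easy to extend.
[Source] Naver Knowledge Encyclopedia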

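SSO (single sign-on)

SSO is single authentication: once a user authenticates to one system, authentication to the related systems the user is authorized to access is handled automatically. For users, this reduces the burden of password management and removes the service interruptions caused by re-authentication. For an enterprise that adopts SSO, it reduces the IT cost of authentication management, strengthens security at the level of the whole system, and enables central management of authentication and account information.

[Source] Understanding Single Sign-On (SSO) I (Concept and When to Adopt) | Author: tomatosoft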

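SLO (Single Log On)

SSO vs SLO :: Both are concepts in which a single authentication gives access to all systems.

With Single Sign On, account information exists separately in each system and there is only one common authentication module; with Single Log On, account information exists in one system and each system has its own separate authentication part.

Example) SSO: logging in to daum.net also logs you in to naver.com

SLO: logging in to mail.naver.com also logs you in to blog.naver.com

[Source] Blog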

Enterprise Portal
A system introduced to tie together an enterprise's distributed computer systems and management information such as HR and finance in an Internet environment, so that users can carry out the work they need anytime, anywhere, from a single personalized screen.
Depending on the functions the portal provides, portals are classified as Enterprise Information Portal (EIP), Knowledge-based Enterprise Portal (KEP), and Enterprise Application Portal (EAP).
Portal functions include Security, Personalization, Collaboration, Application Integration, Unified Search, Categorization, and Openness.
[Source] Wikipedia | Nate Knowledge | Naver Encyclopedia

X-Internet

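Defined by Forrester Research.

The concept combines the 'eXecutable Internet' and the 'eXtended Internet'.

The term refers to the Internet after the Web, one that is highly executable and extended. It means a new Internet architecture that overcomes the limitations of the Web architecture and the client/server architecture while keeping only their strengths.

However, X-Internet products that run on the client are not standard technology, and because they run embedded in the browser they are limited by their dependence on the browser.

MiPlatform and TrustForm are the representative products used in Korea. They are implemented on ActiveX and therefore suffer from many of ActiveX's typical problems with product compatibility, version management, and debugging.

[Source] Naver search and the author's own notes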

EDMS (Electronic Document Management System)

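A system that manages the entire life cycle of document files, from creation to disposal.

It has attracted attention as a means of raising productivity by realizing the 'paperless office'.

When an enterprise applies EDMS, a wide variety of documents can be shared within a workgroup regardless of environment.

EDMS vendors active in Korea include foreign companies such as Eastman Software and Korean companies such as Keystone Technology, TritonTech, and CyberDigm.

Advantages include minimized data duplication, maintained consistency, and standardized document management. Disadvantages include slower performance caused by data concentration and weak indexing of multimedia material (search is supported through metadata).

[Source] Naver Knowledge Encyclopedia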

BPM (business process management)

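According to analyses of how productively organizations process work, only 10% of the total time spent on a task goes to the task itself; the remaining 90% is said to be spent on the links and transitions between tasks.

(According to BPMi.org) A business process is the set of all related activities, occurring sequentially or concurrently, by which an enterprise provides value (services, products, and so on) to all its partners, including other enterprises and individual customers.

BPM therefore means the collection of technologies for properly building and managing the business processes that can occur inside and outside the enterprise and with customers.

In other words, by managing business processes at the system level, BPM ultimately aims to automate processes, increase productivity and efficiency, and accumulate knowledge about processes so that they can be analyzed and improved.

BPMS refers to the software system or platform that makes BPM technically possible.

@,,@) A business process management system that links processes previously handled separately, such as enterprise resource planning (ERP), customer relationship management (CRM), and management information systems (MIS), to a BPMS so that they can be managed efficiently (who, which department, after which preceding or following step, and so on).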

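IM (identity management)

A security solution that more comprehensively extends single sign-on (SSO), which lets users access a variety of systems or Internet services with one login, and extranet access management (EAM), a solution centered on application access rights. As infrastructure for defining and managing business processes, it is a business tool that promises clear returns by maximizing work efficiency, productivity, and security. It establishes the security policies the organization needs and automatically creates accounts for users; users are granted appropriate accounts and privileges according to their position and job function, and changes to or deletions of user information are reflected in real time. When an employee joins or leaves, that employee's accounts on all applications are automatically created or deleted. When a user with an account accesses a system, IM also handles authentication and authorization and stores the related data. The stored audit records help the enterprise deal efficiently with security issues later.

@,,@) Simply put, an account management solution: it consolidates accounts centrally and synchronizes account information to the subsystems in real time.

[Source] Naver Knowledge Encyclopedia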

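[ Systems ]

DR (disaster recovery)

A disaster recovery system preserves data and recovers it automatically even in catastrophes such as natural disasters or terrorism; a typical approach is to set aside space outside the company for data storage. A separate data center is built at a remote site to protect information assets such as systems and data, and when a disaster occurs it immediately takes over from the main data center so that the enterprise's business activities can continue.

[Source] Naver Knowledge Encyclopedia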

RAC (Oracle Real Application Cluster)

RAC is Oracle's disk clustering technology, or optional software, and was introduced in 2001 with Oracle9i.

RAC lets multiple computers (DB instances) run the Oracle RDBMS software while accessing the same (single) database, providing a clustered database.

In a non-RAC Oracle database, a single instance accesses a single database.

The database is the collection of data files, control files, and redo logs located on disk.

The instance comprises the collection of Oracle-related memory and operating-system processes that run on the computer system.

RAC is configured this way to achieve fault tolerance, load balancing, and scalability.

[Source] http://driden.tistory.com/23

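SAN (Storage Area Network)

A Fibre Channel based network that connects servers and storage devices.

Storage devices are attached to the network rather than to a server, but every server on the network can use them.

A network that connects servers or workstations to devices, typically over Fibre Channel, a general-purpose high-speed transport.

The SAN model moves storage onto its own dedicated network, taking data storage off both the server-to-disk SCSI bus and the main user network.

A SAN includes one or more hosts that provide the point of interface with LAN users, one or more fabric switches (in large SANs), and SAN hubs to support a large number of storage devices.

A network containing shared storage that is visible to every server on the SAN.

As information technology (IT) developed rapidly, one of the biggest concerns for enterprises was how to store large amounts of data efficiently.

The conventional approach was DAS (direct attached storage), in which storage is attached directly to a machine, but because the stored data and the newly added data share one space, data transfer speed suffers.

SAN was developed from the late 1990s to overcome this drawback and within just a few years emerged as a new data storage technique.

Different kinds of storage devices are connected together so that all users can share them; backup, restore, archiving, and retrieval are possible; and data can be moved from one storage device to another.

Besides SAN, there are other ways to manage stored data over a separately configured LAN or network, such as NAS (network attached storage), but as of 2002 SAN had become the mainstream technique, accounting for more than 50% of the market.

Moreover, SANs keep growing in scale and show strong growth, and the total world market was projected to reach 43 billion dollars in 2006.

[Source] Naver Encyclopedia | Symantec Glossary | Naver blog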

RAID (redundant array of independent [또는 inexpensive] disks)
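RAID is used mainly on servers that hold important data; it is a way of storing the same data redundantly in different places on multiple hard disks. Placing data on multiple disks lets I/O operations overlap in a balanced way, which improves overall performance. Because multiple disks increase the mean time between failures (MTBF), storing data redundantly also increases fault tolerance.
A RAID appears to the operating system as a single logical hard disk. RAID employs striping, which partitions each drive's storage space into units ranging from one sector (512 bytes) up to several megabytes. The stripes of all the disks are interleaved and addressed in order.
In a single-user system that stores large records, such as medical or other scientific images, the stripes are typically set small (around 512 bytes) so that a single record spans all the disks and can be accessed quickly by reading all the disks at the same time.
In a multi-user system, better performance comes from a stripe wide enough to hold the maximum-size record, which allows disk I/O to overlap across drives.
Not counting the non-redundant array RAID-0, there are nine more types of RAID.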

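*RAID-0 : This type has striping but does not store data redundantly. It therefore offers the highest performance but no fault tolerance at all, so it is hard to call it a true RAID.
*RAID-1 : This type, often called disk mirroring, consists of at least two drives holding duplicated data. There is no striping, and read performance improves because each drive can be read at the same time. Write performance is the same as for a single disk drive. RAID-1 provides the best performance and the best fault tolerance in a multi-user system.
*RAID-2 : This type uses striping across disks, with some disks storing ECC information used to detect and correct errors. It has no advantage over RAID-3.
*RAID-3 : This type uses striping and dedicates one separate drive to storing parity information. The embedded ECC information is used to detect errors. Data recovery is performed by calculating the XOR of the information recorded on the other drives. Because an I/O operation addresses all drives at the same time, RAID-3 cannot overlap I/O. For this reason RAID-3 suits single-user systems whose workloads use many large records.
*RAID-4 : This type uses large stripes, which means a record can be read from any single drive. This allows overlapped I/O for reads. Because every write must update the parity drive, writes cannot be overlapped. RAID-4 has no advantage over RAID-5.
*RAID-5 : This type includes a rotating parity array and thereby removes the write limitation of RAID-4, so all read and write operations can be overlapped. RAID-5 stores parity information but does not store data redundantly (the parity information can, however, be used to reconstruct data). RAID-5 usually requires three to five disks for the array. It suits multi-user systems in which performance is not critical and writes are few.
*RAID-6 : This type is similar to RAID-5 but includes a second parity scheme distributed across different drives, giving it very high fault tolerance. At present there are few commercial RAID-6 products.
*RAID-7 : This type uses an embedded real-time operating system as the controller and includes caching over a high-speed bus and other characteristics of a stand-alone computer. Only one vendor currently offers this system.
*RAID-10 : This type provides an array of stripes in which each stripe is a RAID-1 array of drives. It offers higher performance than RAID-1, but at a higher cost.
*RAID-53 : This type provides an array of stripes in which each stripe is a RAID-3 array of disks. It offers higher performance than RAID-3, but at a higher cost.
[Source] Terms | Naver blog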
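
The XOR-based recovery mentioned for RAID-3 and RAID-5 can be illustrated with a minimal sketch. The block contents and the helper name `xor_blocks` are made up for illustration; a real controller works on fixed-size stripes on disk rather than Python byte strings.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks, one per data drive.
data = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block is the XOR of all data blocks.
parity = xor_blocks(data)

# Simulate losing the second drive: XOR of the survivors and the parity
# yields the missing block, because x ^ x == 0.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

The same identity explains why a single parity drive tolerates exactly one failed drive: with two blocks missing, the XOR of the survivors no longer determines either of them.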

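Tuesday, November 15, 2011

Requirements Analysis

3.1 Introduction

Analysis
Breaking something down in order to understand it, and identifying the characteristics, functions, and relationships of its components.
Ways to describe the analysis: requirements definition, functional modeling, static modeling, dynamic modeling

Tasks in the analysis phase

1) Requirements definition
Investigating the requirements for the system: input, output, processing, performance, security, and so on
2) Functional modeling
A description of the information system's work from the viewpoint of users and the environment.
Identifies the functions the system offers to the outside in units called use cases
The process of representing business processes in units of activities
3) Static modeling
Expresses the structure of the data that supports the business processes and of the bundles of operations (classes) that use the data
Class diagrams show the relationships among classes
CRC (Class-Responsibility-Collaboration) cards are used to discover the roles of classes.
4) Dynamic modeling
Shows the dynamic aspects inside the system that supports the business processes.
Interaction diagrams, state diagrams, and so on

The analysis phase ends when a detailed proposal for the system is presented at a meeting attended by users, managers, and other stakeholders. Such a meeting is called a walk-through. Presenting the conceptual content of the new system to the decision makers helps them understand it clearly, see what improvements are needed, and decide whether to continue the project.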

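3.2 Requirements Definition

Requirement

A statement of what the system must do or what characteristics it must have.

Written with a focus on what the system does from the business point of view.

Analysis-phase requirements - business requirements, user requirements

Design-phase requirements - system requirements (written from the developer's point of view)

Business requirements and system requirements

Requirements are divided into functional and nonfunctional, and the requirements analysis document contains both.

Functional Requirement

Relates directly to a process the system must perform or information it must hold.

Example) product search, shopping cart, product ordering, mileage points, wish list, inventory lookup, and so on

Nonfunctional Requirement

Characteristics the system must have as it operates, such as performance, usability, maintainability, and security

Example) average response time, availability, privacy protection, throughput, data recovery, and so on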

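Requirements definition

A report that lists the functional and nonfunctional requirements in outline form.

Requirements are written as outline items and numbered so that each one is clearly distinguishable.

- Divided into functional and nonfunctional requirements

- Grouped by requirement type or function

- Prioritized according to business needs (high, medium, low, or release 1, release 2, release 3, and so on)

Depending on the priorities in the requirements definition, different requirements can be assigned to different versions of the system.

This is particularly important with RAD (Rapid Application Development) methods, which develop the system incrementally and deliver the requirements in batches.

The most obvious purpose of the requirements definition is to provide the information needed for the other deliverables of the analysis phase, such as process models and data models; its most important purpose is to define the scope of the system.

Determining requirements

Lack of user involvement is one of the common reasons IT projects fail. To define requirements, the requirements themselves must first be determined from both the business and the IT perspective. The most effective approach is for business stakeholders and analysts to work together.

Creating the requirements definition
Determine functional and nonfunctional requirements -> gather requirements -> define the requirements list (with priorities)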

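3.3 Business Process Analysis

Business Process Automation (BPA)
Applied when computer technology needs to be introduced into a particular business process to satisfy the business requirements stated in the system request.
Problem analysis + root cause analysis

Business Process Improvement (BPI)
Making moderate changes to the way the organization operates in order to take advantage of new opportunities offered by technology or to copy what competitors are doing. It can raise efficiency and effectiveness.
Duration analysis + activity-based costing

Business Process Reengineering (BPR)
Discards the current way of doing business and changes it radically using new ideas and technology.
Outcome analysis + technology analysis + activity elimination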

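3.4 Requirements-Gathering Techniques

Requirements gathering in practice
- The analyst should recognize that the requirements-gathering process has important side effects
(building supporters for the project and establishing trust between the project team and the system's users).
- The analyst should decide carefully who will take part in requirements gathering.
- Show respect for the time participants give to the process.

Interviews
The most commonly used requirements-gathering technique.
Usually conducted one person at a time.
1) Select interviewees - choose people from a range of levels so that both upper-level and lower-level perspectives are captured
2) Write interview questions - closed-ended questions, open-ended questions, probing questions
3) Prepare for the interview - question order, anticipated answers, interview plan, priorities, and so on
4) Conduct the interview - take notes, separate fact from opinion
5) Follow up - write the interview report

JAD (Joint Application Development)
An information-gathering technique that helps the project team, users, and managers work together to identify the requirements for the system. IBM developed it in the 1970s, and it is known as an effective way of collecting information from users.
A technique in which 10 to 20 users meet under the direction of a facilitator.
1) Select participants
2) Design the session
3) Prepare for the session
4) Conduct the session
5) Follow up

Questionnaires
A set of written questions for obtaining information from individuals.
Used when information or opinions are needed from many people.
Mainly used when the system is being developed for use by outside organizations or when collecting the opinions of geographically dispersed users.
1) Select respondents
2) Design the questionnaire
3) Administer the questionnaire - explain its motivation and purpose to raise the response rate
4) Follow up - results report

Document analysis
Analyzing documents to understand the current system. Most systems, however, are not well documented.

Observation
Watching the work as it is done; a good way to collect information about the current system.

Selecting a requirements elicitation technique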

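3.5 Documenting Requirements

Writing the requirements analysis document

1) Problem addressed or summary

2) Background

3) Environment and system model

4) Functional requirements

5) Nonfunctional requirements

Reviewing the requirements analysis document

1) Return on the development cost

2) Solves the problem currently faced

3) Clear and uniform expression

4) No ambiguity

5) Consistency

6) Leads to a high-quality system

7) Feasibility

8) Verifiability

9) Identifiable names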

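Wednesday, November 9, 2011

Project Planning

The work of the planning phase can be divided as follows.

1. Set business goals
2. Define the system request
3. Feasibility analysis
4. Estimate the development schedule and cost
5. Write the plan

1.1 Setting Business Goals

Strategic planning

Q : What are you doing now?
A1 : I am cutting stone.
A2 : I am building a palace that will go down in history.

System analysts should focus on the broad, strategic role of IT systems, and SWOT analysis (Strength, Weakness, Opportunity, Threat) is sometimes used when drawing up a strategic plan.

Business goals

A company or institution has a mission statement or vision statement based on its purpose, vision, and values. The mission statement is the starting point of project planning. The institution has goals set in order to accomplish its mission, and will have set short-term objectives to achieve those goals. The short-term objectives amount to tactical plans.

1.2 Project Selection

System request - a formal written request stating the need for system development that arises from various factors inside or outside the company

Six main factors that drive system requests

Improved service
Better performance
Support for new products or services
More information
Stronger controls
Reduced cost

Elements of the system request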

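1.3 Feasibility Analysis

Technical feasibility

- Check whether the users and analysts are familiar with the application area.
- Check whether they are familiar with the technology.
- Consider the project size (staff, duration, functions, and so on).
- Review compatibility with legacy systems.

Economic feasibility

Identify cost and benefit items
|
Assign values to costs and benefits
|
Calculate returns by year
|
Calculate the project's economic value

1) Identify cost and benefit items
- Development costs : salaries, consulting, hardware/software, installation, office furnishings, data conversion, and so on
- Operational costs : licenses, upgrades, repairs, operations salaries, communication charges, training, and so on
- Tangible benefits : increased sales, reduced labor costs, reduced warehouse costs, reduced IT costs, and so on
- Intangible benefits : increased market share/brand recognition, higher quality, improved service, improved business procedures, and so on

2) Assign values to costs and benefits
- If costs and benefits are hard to predict, an expected value can be calculated.
Example) With a 30% chance of a 500 million won sales increase, a 50% chance of 650 million won, and a 20% chance of 750 million won:
(500 million * 0.3) + (650 million * 0.5) + (750 million * 0.2) = 625 million won

- Intangible benefits can be estimated by converting them to a monetary value.
Example) Improved service reduces customer complaints by 10% over three years
The call center that takes customer complaints costs 20 million won a year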
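
The expected-value calculation above can be written as a short sketch. The function name `expected_value` and the probability check are additions for illustration; the amounts and probabilities are the ones from the example.

```python
def expected_value(scenarios):
    """Probability-weighted average of (amount, probability) pairs."""
    total_probability = sum(p for _, p in scenarios)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(amount * p for amount, p in scenarios)

# Sales-increase scenarios from the example, in won.
sales_scenarios = [
    (500_000_000, 0.3),
    (650_000_000, 0.5),
    (750_000_000, 0.2),
]
expected_increase = expected_value(sales_scenarios)
print(f"Expected sales increase: {expected_increase:,.0f} won")
```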

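3) Calculate returns by year
Cost-benefit analysis shows the trend of costs and benefits over three to five years in order to show cash flow. Rates of increase or decrease (labor costs, sales, and so on) must also be reflected.

4) Calculate economic value
- Return on Investment (ROI)
ROI = (total benefits - total costs) / total costs

- Break-even point : the time it takes to recover the investment

Taking the year in which the project turns profitable relative to the investment as the reference:

Fraction of the break-even year = (net benefit in the year the project turns profitable - cumulative net benefit relative to the investment) / net benefit in that year

- Present value

ROI and the break-even point ignore changes in the value of money; present value reflects those changes by converting amounts to today's value.

Present value = amount / (1 + interest rate)^n, where n is the number of years

Net present value = present value of benefits - present value of costs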
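
As a minimal sketch, the formulas above translate directly into code. The function names and the cash-flow figures are hypothetical; the lists are indexed by year, with index 0 meaning "today" (an assumption of this sketch, not something the text specifies).

```python
def roi(total_benefits, total_costs):
    """ROI = (total benefits - total costs) / total costs."""
    return (total_benefits - total_costs) / total_costs

def present_value(amount, rate, year):
    """Present value = amount / (1 + rate)^year."""
    return amount / (1 + rate) ** year

def net_present_value(benefits, costs, rate):
    """PV of benefits minus PV of costs; list index is the year."""
    pv_benefits = sum(present_value(b, rate, n) for n, b in enumerate(benefits))
    pv_costs = sum(present_value(c, rate, n) for n, c in enumerate(costs))
    return pv_benefits - pv_costs

def break_even_fraction(year_net_benefit, cumulative_net_benefit):
    """Fraction of the turnaround year that passes before break-even."""
    return (year_net_benefit - cumulative_net_benefit) / year_net_benefit

# Hypothetical figures: invest 100 today, then earn 60 and spend 10 each year.
benefits = [0, 60, 60, 60]
costs = [100, 10, 10, 10]
npv_at_5_percent = net_present_value(benefits, costs, 0.05)
simple_roi = roi(sum(benefits), sum(costs))
```

Comparing `simple_roi` with `npv_at_5_percent` shows the point the text makes: the undiscounted ROI overstates the return because later benefits are worth less today.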

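Organizational feasibility

- Strategic alignment : matching the project goals to the business goals
- Analysis of the project stakeholders

1.4 Size Estimation

Project trade-offs

Shortening the schedule requires reducing the size of the system or adding more people.

A simple way to predict how long building the system will take is to estimate the duration of the whole project from the time spent in the planning phase.
According to industry statistics, projects typically spend 15% of their time in planning, 20% in analysis, 35% in design, and 30% in implementation.

Example) If planning took 4 months, the whole project takes roughly 4 / 0.15 = 26.67 months, so the remaining phases take about 22.67 months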
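
The estimate above can be sketched as follows, using the phase percentages quoted in the text. The names `PHASE_SHARE` and `estimate_schedule` are illustrative.

```python
# Share of total project time per phase, as quoted in the text.
PHASE_SHARE = {
    "planning": 0.15,
    "analysis": 0.20,
    "design": 0.35,
    "implementation": 0.30,
}

def estimate_schedule(planning_months):
    """Scale the observed planning time up to a per-phase schedule."""
    total = planning_months / PHASE_SHARE["planning"]
    return {phase: total * share for phase, share in PHASE_SHARE.items()}

schedule = estimate_schedule(4)
total_months = sum(schedule.values())
remaining_months = total_months - schedule["planning"]

for phase, months in schedule.items():
    print(f"{phase:15s} {months:6.2f} months")
print(f"total {total_months:.2f}, remaining {remaining_months:.2f}")
```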

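1.5 Scheduling

Once the project size and a rough schedule have been estimated, a work plan is drawn up.
The work plan holds key information such as when each task must be completed, who will do it, and what deliverables are expected.

To complete the work plan, the project manager goes through the following steps.

Identify tasks - build the WBS (Work Breakdown Structure) - present it as a Gantt chart

* Milestone
A point at which the project's interim results are checked
ex) requirements analysis review, design review, completion of the system prototype, and so on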

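1.6 Staffing

How many people should be assigned to the project?
What skills does each person need to carry out the project?
How will participants be motivated to achieve the goals?
How will conflict among project members or organizations be minimized?
Who will take part in the project?
What are the reporting structure, performance goals, and rules?

Team organization

The first step in staffing is to determine the average number of staff the project needs.
Average staff = estimated effort in person-months / duration
Example) To finish 40 person-months of work in 10 months: 40 / 10 = 4 (four full-time staff are needed)

Adding people in order to shorten the schedule does not necessarily shorten it.
Team size and productivity are not proportional, because communication paths become more complex as staff are added.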
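
A small sketch of both points: the average-staff formula from the text, and a common way to quantify the growth in communication paths. The pairwise formula n(n-1)/2 is not from the source text; it is the standard count of distinct pairs in a team and is used here only as an illustration.

```python
def average_staff(person_months, duration_months):
    """Average staff = estimated effort in person-months / duration."""
    return person_months / duration_months

def communication_paths(team_size):
    """Number of distinct pairs of people who may need to talk."""
    return team_size * (team_size - 1) // 2

staff = average_staff(40, 10)

# Doubling the team more than quadruples the number of paths.
for size in (4, 8, 16):
    print(size, "people ->", communication_paths(size), "paths")
```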

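Motivation

The factor with the greatest influence on participants' productivity.

What not to do when motivating people

Source : System Analysis and Design Using UML (Eun Man Choi, Saengneung Publishing)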

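Investigating State-of-the-Art XML Compression Techniques

These days XML data is widely used for communication between systems and between clients and servers.
XML has many advantages as a standard technology, but the XML tags also inflate the data size, which is a real drawback.
In fact, after our next-generation system went live in 2010, we struggled with HTTP 500 errors for half a year.
Looking back, even on the affected PCs the 500 error occurred only for events with large payloads.
We tried several workarounds.
We edited the PC registry to adjust the packet size between the PC and the LAN,
and in some cases replacing shared equipment on the network segment solved it.
Changing firewall settings also lowered the error rate at times.
But we never found a workaround on the application side.
Today, while browsing, I happened to learn that there are application-side alternatives as well.
Had I known earlier, I would not have spent six months chasing the problem...

Introduction

Frequently used acronyms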

CDATA: Character data
DTD: Document Type Definition
GPS: Global positioning system
HTML: HyperText Markup Language
PPM: Prediction by Partial Match
SAX: Simple API for XML
W3C: World Wide Web Consortium
XML: Extensible Markup Language

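XML is one of the most useful and important technologies to emerge as a result of the immense popularity of HTML and the World Wide Web. It solves numerous problems by providing a neutral data representation between different architectures, bridging the gap between software systems with minimal effort, and storing large volumes of semi-structured data.

XML is often called self-describing data because it is designed so that the schema is repeated for each record in the document. This self-describing nature gives XML great flexibility, but it also makes XML verbose, so documents become excessively large. As XML use keeps growing and large repositories of XML documents are now pervasive, the demand for efficient XML compression tools is strong.

Figure 1 shows the benefit of using an XML compressor to reduce the cost of transmitting XML data over a network. To deal with the size of large XML documents, several XML-conscious compressors exploit the well-known structure of XML documents to achieve better compression ratios than general text compressors. The many benefits of XML compression tools include reduced network bandwidth for data exchange, reduced disk space for storage, and minimized main-memory requirements for processing and querying XML documents.

Figure 1. An example of the benefit of using an XML compressor when transmitting XML data over a network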

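In principle, XML compressors can be classified by two main characteristics. Figure 2 illustrates the first classification, which is based on awareness of the structure of XML documents. By this classification, compressors fall into two main groups.

General text compressors. Because XML data is stored as text files, the first logical approach to compressing XML documents was to use traditional general-purpose text compression tools (for example gzip, bzip2). Compressors in this group are XML-blind: they treat XML documents as ordinary text documents and apply traditional text compression techniques.
XML-conscious compressors. Compressors in this group are designed to exploit awareness of the XML document structure to achieve better compression ratios than general text compressors. They can be further classified by their dependence on the availability of schema information for the XML documents:
- Schema-dependent compressors. Both the encoder and the decoder must have access to the document schema information to complete the compression process (for example rngzip).
- Schema-independent compressors. The availability of schema information is not required to complete the encoding and decoding processes (for example XMill, SCMPPM).

Figure 2. Classification of XML compressors according to their awareness of the structure of XML documents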

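Figure 3 shows the second classification of XML compressors, which is based on their ability to support queries.

Non-queriable (archival) XML compressors. Compressors in this group cannot process any queries over the compressed format (for example gzip, bzip2, XMill). The main goal of this group is to achieve the highest compression ratio. By default, general-purpose text compressors belong to the non-queriable group.
Queriable XML compressors. Compressors in this group can process queries over their compressed formats. Their compression ratio is usually worse than that of the archival XML compressors, but the main goal of this group is to avoid decompressing the whole document during query execution. The ability to query compressed XML directly is important for many applications hosted on resource-limited computing devices such as mobile devices and GPS systems. By default, all queriable compressors are also XML-conscious. Based on how they encode the structural and data parts of the XML document, compressors in this group can be further classified:
- Homomorphic compressors. The original structure of the XML document is retained, and the compressed format can be accessed and parsed in the same way as the original format (for example XGrind).
- Non-homomorphic compressors. The encoding process separates the structural part of the XML document from the data part (for example XQueC), so the structure of the compressed format differs from the structure of the original XML document.

Figure 3. Classification of XML compressors according to their support for query execution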

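General text compressors

XML represents tree-structured data as text. A simple, logical approach to compressing XML documents is to use traditional general-purpose text compression tools. Over the past decades, numerous algorithms have been devised to compress text data efficiently. The most popular and efficient representatives of this group are the gzip, bzip2, and PPM compressors.

The gzip compressor is based on the DEFLATE lossless data compression algorithm, which combines the LZ77 algorithm with Huffman coding. LZ77 achieves compression by replacing portions of the data with references to matching data that has already passed through both the encoder and the decoder. Huffman coding uses a specific method for choosing the representation of each symbol, in which the most common symbols are encoded with shorter bit strings than the less common ones.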
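
To make the Huffman idea concrete, here is a minimal sketch that computes only the code length of each symbol. It is a textbook construction, not the exact canonical-code procedure DEFLATE uses, and the function name and sample string are made up for illustration.

```python
import heapq
from collections import Counter

def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a Huffman code over text."""
    freq = Counter(text)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries are (weight, tie_breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie_breaker = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, tie_breaker, merged))
        tie_breaker += 1
    return heap[0][2]

sample = "<item><name>a</name></item>" * 20
lengths = huffman_code_lengths(sample)
for symbol, bits in sorted(lengths.items(), key=lambda kv: kv[1]):
    print(repr(symbol), bits)
```

In XML-like input the angle brackets and tag letters dominate, so they end up with the shortest codes, which is one reason even generic compressors do reasonably well on XML.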

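The bzip2 compressor uses the Burrows-Wheeler transform to convert frequently recurring character sequences into strings of identical letters, then applies a move-to-front (MTF) transform, and finally Huffman coding. The Burrows-Wheeler transform reorders the characters so that if the original string contains several substrings that occur often, the transformed string contains several places where a single character is repeated many times in a row. This is useful for compression because strings with runs of repeated characters are easy to compress. In practice, bzip2 compresses files to a higher compression ratio than gzip, but it is slower.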
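
The two transforms can be sketched naively as follows. This is an illustration of the idea only: the sorted-rotations approach is quadratic and is not how bzip2 is implemented, and the sentinel character `"\0"` is an assumption of this sketch.

```python
def bwt(text, end="\0"):
    """Burrows-Wheeler transform via sorted rotations (naive version)."""
    s = text + end  # sentinel marks the end and sorts before other characters
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)

def move_to_front(text):
    """Replace each character by its index in a self-adjusting list."""
    alphabet = sorted(set(text))
    output = []
    for ch in text:
        index = alphabet.index(ch)
        output.append(index)
        alphabet.insert(0, alphabet.pop(index))
    return output

transformed = bwt("banana")
ranks = move_to_front(transformed)
```

For "banana" the transform groups the repeated letters next to each other, and move-to-front then turns each repeat into a zero, which the final entropy coder can represent very cheaply.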

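PPM is an adaptive statistical data compression technique based on context modeling and prediction. It uses a finite-context statistical modeling technique that blends several fixed-order context models to predict the next character in the input sequence. The prediction probabilities for each context in the model are calculated from frequency counts, which are updated adaptively. The symbol that actually occurs is encoded relative to its predicted distribution using arithmetic coding. PPM is simple and the most efficient of the compressors presented so far, but it is also the most computationally expensive.

In practice, general text compressors are used for archival purposes or to reduce network bandwidth during data exchange. In general, these compressors trade off two main metrics: compression ratio and compression/decompression time. On the one hand, PPM achieves the best compression ratio and gzip the worst. On the other hand, gzip has the best compression/decompression time, whereas PPM takes much longer. bzip2 sits in the middle on both metrics. Users therefore choose a compressor according to how their usage scenario weighs these two metrics.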
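
The trade-off can be observed with Python's standard library, which ships DEFLATE (`zlib`) and bzip2 (`bz2`); there is no PPM implementation in the standard library, so it is left out. The XML payload below is synthetic and the measured sizes and timings depend on the machine and the data, so treat this as a rough experiment rather than a benchmark.

```python
import bz2
import time
import zlib

# Synthetic, repetitive XML payload (hypothetical data).
records = "".join(
    f'<customer id="{i}"><firstName>Name{i}</firstName>'
    f"<lastName>Family{i % 7}</lastName></customer>"
    for i in range(2000)
)
xml_bytes = f"<customers>{records}</customers>".encode("utf-8")

codecs = {
    "zlib (DEFLATE, as in gzip)": (zlib.compress, zlib.decompress),
    "bz2 (bzip2)": (bz2.compress, bz2.decompress),
}

results = {}
for name, (compress, decompress) in codecs.items():
    start = time.perf_counter()
    packed = compress(xml_bytes)
    elapsed = time.perf_counter() - start
    assert decompress(packed) == xml_bytes  # lossless round trip
    results[name] = (len(packed), elapsed)
    ratio = len(packed) / len(xml_bytes)
    print(f"{name}: {len(packed)} bytes ({ratio:.1%} of original), {elapsed * 1000:.1f} ms")
```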

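Non-queriable (archival) XML compressors

XML compressors in this group cannot process any queries over the compressed format, because their main goal is the highest compression ratio. This section describes representative compressors of the two main classes in this group:

Schema-independent compressors
Schema-dependent compressors

Schema-independent compression schemes

In this class of compression schemes, schema information is not required for the encoding and decoding processes. XMill was the first implementation of an XML-conscious compressor. It introduced the novel idea of separating the structure of an XML document from its data and grouping the data values into homogeneous containers based on their relative path in the tree and their data type.

In XMill, both the structural and the data-value parts of the source XML document are collected and compressed separately. In the structure part, XML tags and attributes are encoded in a dictionary-based fashion before being passed to a back-end general text compression scheme. XMill's structure encoding assigns each distinct element and attribute name an integer code, which serves as the key into the element and attribute name dictionary. In the data part, data values are grouped by path and data type into homogeneous, semantically related containers. Each container is then compressed separately with a specialized compressor suited to the container's data type. This grouping localizes repetition and therefore improves the compression ratio.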
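
The separation of structure and data can be sketched roughly as below. This is a toy illustration of the idea, not XMill's actual file format or API: the function name, the `-1` end marker, and the sample document are assumptions, mixed content and whitespace are ignored, and every container is simply compressed with `zlib` instead of a type-specific compressor.

```python
import xml.etree.ElementTree as ET
import zlib
from collections import defaultdict

def split_structure_and_data(xml_text):
    names = {}                       # element/attribute name -> integer code
    structure = []                   # integer codes; -1 marks an end tag
    containers = defaultdict(list)   # path -> data values found at that path

    def code(name):
        return names.setdefault(name, len(names))

    def walk(elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        structure.append(code(elem.tag))
        for attr, value in elem.attrib.items():
            structure.append(code("@" + attr))
            containers[f"{path}/@{attr}"].append(value)
        if elem.text and elem.text.strip():
            containers[path].append(elem.text.strip())
        for child in elem:
            walk(child, path)
        structure.append(-1)

    walk(ET.fromstring(xml_text), "")
    return names, structure, dict(containers)

doc = """<customers>
  <customer id="1"><firstName>Ann</firstName><lastName>Lee</lastName></customer>
  <customer id="2"><firstName>Bob</firstName><lastName>Kim</lastName></customer>
</customers>"""

names, structure, containers = split_structure_and_data(doc)

# Each container holds values of one kind, so it is compressed on its own.
compressed = {
    path: zlib.compress("\n".join(values).encode("utf-8"))
    for path, values in containers.items()
}
```

On a document this small the per-container overhead outweighs any gain; the benefit described in the text appears when each container holds many similar values.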

In the latest version of the XMill source distribution, the intermediate binary data of the compressed format can be passed to one of three alternative back-end general-purpose compressors: gzip, bzip2, or PPM. Figure 4 gives an overview of the general architecture of the XMill compressor, including the XML parser, the structure and data containers, one or more compression schemes, and the compressed XML file (with compressed structure and compressed data).

Figure 4. General architecture of the XMill compressor

Figure 5 shows an example of splitting an XML file into structure and data containers. The element and attribute tables store the structure container of the XML document. The values at each distinct path (element or attribute) are stored in a separate table (container), so the values in each container are of the same kind and can be compressed more efficiently. In the example there are separate tables for elements (customers, customers/customer, customers/customer/firstName, customers/customer/lastName, customers/customer/invoice, customers/customer/invoice/items, customers/customer/invoice/items/item) and for attributes (customers/customer/@id, customers/customer/invoice/@total).

Figure 5. An example of splitting an XML file into structure and data containers

Figure 6 shows the command-line options of the XMill compressor.

Figure 6. Command-line options of the XMill compressor

Figure 7 shows the effect of compressing an XML document (tpc.xml, 282KB) with the XMill compressor: the output file (tpc.xmi, 41KB) is about 15% of the size of the original XML file.

Figure 7. Result of compressing an XML file with the XMill compressor

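XMLPPM is a streaming XML compressor that uses a multiplexed hierarchical PPM model called MHM. It is considered an adaptation of the general-purpose Prediction by Partial Match (PPM) compression scheme. In XMLPPM, an XML file is first parsed with a SAX parser to generate a stream of SAX events. Each event is encoded using a bytecode representation called ESAX (encoded SAX). The ESAX bytecodes are encoded using one of several multiplexed PPM models, chosen according to the syntactic structure (elements, characters, attributes, and miscellaneous symbols). The SCMPPM compressor has been proposed as a variant of XMLPPM. SCMPPM combines a technique called structure context modeling (SCM) with the PPM compression scheme. It uses a larger set of PPM models than XMLPPM because it uses a separate model to compress the text content under each element symbol.

The compression ratio and compression/decompression time of XMill and XMLPPM depend heavily on their back-end general-purpose compressor (gzip, bzip2, or PPM). They therefore show the same trade-offs as the general-purpose back-end compressors.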

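Schema-dependent compression schemes

Compressors in this class require the schema information of the XML document to be available during encoding and decoding. For example, the XAUST XML compressor converts the schema information of the DTD into a set of deterministic finite automata (DFA), one for each element in the DTD. Each transition is labeled by an element, and the action associated with a transition is a call to a simulator for the DFA of the element that labels the transition. XAUST groups all data for the same element into a single container and compresses that container incrementally using a single model for an order-4 arithmetic compressor. Using the DTD schema information, XAUST can track the structure of the document and accurately predict the expected symbols. When the expected symbol is unique, there is no need to encode it at all, because the decoder can build the same model from the DTD and so generate the unique expected symbol itself.

The RNGzip XML tool compresses XML documents that conform to a given Relax NG schema. With RNGzip, the sender and the receiver must agree in advance on exactly the same schema. In this sense the schema is like a shared key for encryption and decryption. RNGzip uses a Relax NG schema validator to build a deterministic tree automaton from the specified schema. Then, given an XML document, it checks whether the automaton accepts the XML. Given this automaton, the receiver can reconstruct the entire XML document from very little transmitted information. Where the automaton has a choice point, RNGzip transmits only which transition was taken; where there is a text transition, the matching text is transmitted.

In theory, schema-dependent compressors can achieve slightly higher compression ratios than schema-independent ones. However, they are not preferred or commonly used, because schema information for XML documents is not always available, and relying on it gives up the flexibility of XML to represent semi-structured data. Compressors of this type are effective only when used to compress XML documents that have a predefined schema.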

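Queriable XML compressors

The main goal of queriable XML compressors is to allow queries to be evaluated directly over the compressed format without decompressing the whole document. Their compression ratio is usually worse than that of the archival XML compressors. This type of compressor is very important for many applications hosted on resource-limited computing devices such as mobile devices and GPS systems. This section describes representative compressors of the two main classes in this group:

Homomorphic compressors
Non-homomorphic compressors

Homomorphic queriable XML compressors

Compressors in this class retain the original structure of the XML document in the compressed format, so it can be accessed and parsed in the same way as the original format. XGrind was the first XML-conscious compression scheme to support queries without fully decompressing the compressed XML document. XGrind does not separate data from structure, so the original structure of the XML document is preserved.

The homomorphic nature of the XGrind compressed format enables several interesting features:

The compressed XML document can be viewed as the original XML document with its tags and element/attribute values replaced by their encodings. XGrind can therefore be regarded as an extended SAX parser.
XML indexing techniques can be built over the compressed document in much the same way as over regular XML documents.

In XGrind, element and attribute names are encoded using a dictionary-based encoding scheme, and character data is compressed using semi-adaptive Huffman coding. XGrind's query processor can handle only exact-match and prefix-match queries on compressed values, and partial-match and range queries on decompressed values. Several operations, such as non-equality selections in the compressed domain, are not supported. XGrind therefore cannot perform joins, aggregations, nested queries, or construct operations.

XPress is another homomorphic queriable XML compressor, which uses a combination of features to compress and retrieve XML data efficiently. To encode the labels and paths of XML documents, XPress uses a reverse arithmetic encoding method over distinct intervals. Path expressions are evaluated on the compressed XML data using the containment relationships among the intervals. XPress's compression scheme is semi-adaptive: it uses a preliminary scan of the input file to collect statistical information, and the encoding rules for data are independent of the data's location. In addition, the scheme uses encoders suited to the data values based on automatically inferred type information.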

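Non-homomorphic queriable XML compressors

Compressors in this class separate the structural part from the data part while encoding the XML document. Unlike the homomorphic class, therefore, the structure of the compressed format differs from that of the original XML document, and it must be parsed in a different way during decompression. In return, higher compression ratios can be achieved. For example, XSeq is a grammar-based queriable XML compression scheme, considered an adaptation of Sequitur, a well-known grammar-based compression algorithm for text strings.

In XSeq, the tokens of the input XML file are separated into a set of containers, each of which is compressed using Sequitur. Sequitur is a linear-time online algorithm that forms a context-free grammar for a given input string. XSeq uses the resulting context-free grammars to avoid sequentially scanning irrelevant compressed data and to process only the data values matched by a given query. In addition, the context-free grammars let XSeq process queries directly over the compressed file without full or partial decompression. To correlate data values stored in different containers and to speed up query evaluation, XSeq uses a set of indexes that are stored inside the compressed file and loaded into memory before the rule contents are processed. For example, XSeq uses a structural index that lets it quickly locate each data value in a container without decompression, while a header index holds a list of pointers to the entry of each container in the file.

TREECHOP is another queriable XML compressor. It begins compression with a SAX-based parse of the XML document, and the parsed tokens are written to the compressed stream in depth-first order. The codeword for each node is prefixed by the codeword of its parent, and two nodes in the XML document tree share the same codeword if they have the same path. A binary codeword is assigned to each CDATA section, comment, processing instruction, and non-leaf node. The codeword is assigned uniquely based on the path of the tree node. Because the tree node encodings are written to the compressed stream in depth-first order, the decompressor can regenerate the original XML document incrementally using the adaptive encoding information. TREECHOP can answer exact-match and range queries with a single scan through the compressed stream.

The XQueC system uses a fragmentation and storage model for compressed XML documents based on separating structure from content within the document. In addition, XQueC depends on a proper choice of how to group containers, so that containers in the same group also appear together in query predicates. To evaluate predicates in the compressed domain, the system ensures that the containers involved in a predicate belong to the same group and are compressed with an algorithm that supports that predicate in the compressed domain. Information about predicates is inferred from the available query workload. XQueC uses the query workload information to partition the containers into sets according to the source model and to assign the most suitable compression algorithm to each set. XQueC has also devised an algebra for evaluating XML queries. This algebra is exploited by a cost-based optimizer that can freely mix regular operators and compression-aware operators.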

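Conclusion

This overview article has surveyed state-of-the-art XML compression techniques. The first implementation in this domain, XMill, introduced the key innovations of XML compression mechanisms: separating the structural part of the XML document from the data part, and then grouping related data items into homogeneous containers that can be compressed separately. Because redundancy is easier to detect, this separation improves the subsequent step of compressing these homogeneous containers with general-purpose compressors or other compression mechanisms.

Most of the other XML compressors have simulated this idea in different ways. Compression time and decompression time play a crucial role in differentiating XML compression techniques. In principle, schema-dependent XML compressors are not preferred or commonly used in practice, because schema information for XML documents is not always available, or is not in the required form (DTD, XML Schema, RELAX NG). Queriable XML compressors are very important for many applications, but no solid implementations of grammar-based XML compression techniques or queriable XML compressors are publicly available. These two areas offer many interesting avenues for further research and development.

Source : IBM (original article)

Reference : http://www.http-compression.com/

Best Practices for Speeding Up Your Web Site

An interesting article is posted on the Yahoo site.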

Minimize HTTP Requests

tag: content

80% of the end-user response time is spent on the front-end. Most of this time is tied up in downloading all the components in the page: images, stylesheets, scripts, Flash, etc. Reducing the number of components in turn reduces the number of HTTP requests required to render the page. This is the key to faster pages.

One way to reduce the number of components in the page is to simplify the page's design. But is there a way to build pages with richer content while also achieving fast response times? Here are some techniques for reducing the number of HTTP requests, while still supporting rich page designs.

Combined files are a way to reduce the number of HTTP requests by combining all scripts into a single script, and similarly combining all CSS into a single stylesheet. Combining files is more challenging when the scripts and stylesheets vary from page to page, but making this part of your release process improves response times.

CSS Sprites are the preferred method for reducing the number of image requests. Combine your background images into a single image and use the CSS background-image and background-position properties to display the desired image segment.

Image maps combine multiple images into a single image. The overall size is about the same, but reducing the number of HTTP requests speeds up the page. Image maps only work if the images are contiguous in the page, such as a navigation bar. Defining the coordinates of image maps can be tedious and error prone. Image maps used for navigation are also not accessible, so they are not recommended.

Inline images use the data: URL scheme to embed the image data in the actual page. This can increase the size of your HTML document. Combining inline images into your (cached) stylesheets is a way to reduce HTTP requests and avoid increasing the size of your pages. Inline images are not yet supported across all major browsers.
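
As a rough sketch of what an inline image looks like, the helper below builds a `data:` URL from raw bytes. The function name is made up, and the byte string is only the PNG file signature used as a placeholder, not a complete image.

```python
import base64

def to_data_uri(data: bytes, mime_type: str) -> str:
    """Build a base64 data: URL that can be placed in src or url(...)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

# Placeholder bytes: the 8-byte PNG signature, not a displayable image.
placeholder = b"\x89PNG\r\n\x1a\n"
uri = to_data_uri(placeholder, "image/png")
print(uri)

# Base64 output is about 4/3 the size of the input, which is why inlining
# can increase the size of the document that carries it.
overhead = len(base64.b64encode(placeholder)) / len(placeholder)
```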

Reducing the number of HTTP requests in your page is the place to start. This is the most important guideline for improving performance for first time visitors. As described in Tenni Theurer's blog post Browser Cache Usage - Exposed!, 40-60% of daily visitors to your site come in with an empty cache. Making your page fast for these first time visitors is key to a better user experience.

Use a Content Delivery Network

tag: server

The user's proximity to your web server has an impact on response times. Deploying your content across multiple, geographically dispersed servers will make your pages load faster from the user's perspective. But where should you start?

As a first step to implementing geographically dispersed content, don't attempt to redesign your web application to work in a distributed architecture. Depending on the application, changing the architecture could include daunting tasks such as synchronizing session state and replicating database transactions across server locations. Attempts to reduce the distance between users and your content could be delayed by, or never pass, this application architecture step.

Remember that 80-90% of the end-user response time is spent downloading all the components in the page: images, stylesheets, scripts, Flash, etc. This is the Performance Golden Rule. Rather than starting with the difficult task of redesigning your application architecture, it's better to first disperse your static content. This not only achieves a bigger reduction in response times, but it's easier thanks to content delivery networks.

A content delivery network (CDN) is a collection of web servers distributed across multiple locations to deliver content more efficiently to users. The server selected for delivering content to a specific user is typically based on a measure of network proximity. For example, the server with the fewest network hops or the server with the quickest response time is chosen.

Some large Internet companies own their own CDN, but it's cost-effective to use a CDN service provider, such as Akamai Technologies, EdgeCast, or Level3. For start-up companies and private web sites, the cost of a CDN service can be prohibitive, but as your target audience grows larger and becomes more global, a CDN is necessary to achieve fast response times. At Yahoo!, properties that moved static content off their application web servers to a CDN (both 3rd party as mentioned above as well as Yahoo's own CDN) improved end-user response times by 20% or more. Switching to a CDN is a relatively easy code change that will dramatically improve the speed of your web site.

Add an Expires or a Cache-Control Header

tag: server

There are two aspects to this rule:

For static components: implement "Never expire" policy by setting far future Expires header
For dynamic components: use an appropriate Cache-Control header to help the browser with conditional requests

Web page designs are getting richer and richer, which means more scripts, stylesheets, images, and Flash in the page. A first-time visitor to your page may have to make several HTTP requests, but by using the Expires header you make those components cacheable. This avoids unnecessary HTTP requests on subsequent page views. Expires headers are most often used with images, but they should be used on all components including scripts, stylesheets, and Flash components.

Browsers (and proxies) use a cache to reduce the number and size of HTTP requests, making web pages load faster. A web server uses the Expires header in the HTTP response to tell the client how long a component can be cached. This is a far future Expires header, telling the browser that this response won't be stale until April 15, 2010.

      Expires: Thu, 15 Apr 2010 20:00:00 GMT

If your server is Apache, use the ExpiresDefault directive to set an expiration date relative to the current date. This example of the ExpiresDefault directive sets the Expires date 10 years out from the time of the request.

      ExpiresDefault "access plus 10 years"

Keep in mind, if you use a far future Expires header you have to change the component's filename whenever the component changes. At Yahoo! we often make this step part of the build process: a version number is embedded in the component's filename, for example, yahoo_2.0.6.js.
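As a rough illustration of that build step, here is a minimal JavaScript sketch that embeds a version number in a filename. The helper name versionedFilename is made up for this example and is not part of any Yahoo! tool; a real build would also rewrite the references in your HTML.

```javascript
// Hypothetical helper: puts a version string in front of the file extension,
// so a changed component gets a new URL and bypasses the far future cache entry.
function versionedFilename(filename, version) {
  const dot = filename.lastIndexOf(".");
  // No extension (or a dotfile): just append the version at the end.
  if (dot <= 0) {
    return filename + "_" + version;
  }
  return filename.slice(0, dot) + "_" + version + filename.slice(dot);
}

console.log(versionedFilename("yahoo.js", "2.0.6")); // yahoo_2.0.6.js
```

Because the old filename is never reused, browsers holding the old copy simply request the new one.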

Using a far future Expires header affects page views only after a user has already visited your site. It has no effect on the number of HTTP requests when a user visits your site for the first time and the browser's cache is empty. Therefore the impact of this performance improvement depends on how often users hit your pages with a primed cache. (A "primed cache" already contains all of the components in the page.) We measured this at Yahoo! and found the number of page views with a primed cache is 75-85%. By using a far future Expires header, you increase the number of components that are cached by the browser and re-used on subsequent page views without sending a single byte over the user's Internet connection.

Gzip Components

tag: server

The time it takes to transfer an HTTP request and response across the network can be significantly reduced by decisions made by front-end engineers. It's true that the end-user's bandwidth speed, Internet service provider, proximity to peering exchange points, etc. are beyond the control of the development team. But there are other variables that affect response times. Compression reduces response times by reducing the size of the HTTP response.

Starting with HTTP/1.1, web clients indicate support for compression with the Accept-Encoding header in the HTTP request.

      Accept-Encoding: gzip, deflate

If the web server sees this header in the request, it may compress the response using one of the methods listed by the client. The web server notifies the web client of this via the Content-Encoding header in the response.

      Content-Encoding: gzip

Gzip is the most popular and effective compression method at this time. It was developed by the GNU project and standardized by RFC 1952. The only other compression format you're likely to see is deflate, but it's less effective and less popular.

Gzipping generally reduces the response size by about 70%. Approximately 90% of today's Internet traffic travels through browsers that claim to support gzip. If you use Apache, the module configuring gzip depends on your version: Apache 1.3 uses mod_gzip while Apache 2.x uses mod_deflate.

There are known issues with browsers and proxies that may cause a mismatch in what the browser expects and what it receives with regard to compressed content. Fortunately, these edge cases are dwindling as the use of older browsers drops off. The Apache modules help out by adding appropriate Vary response headers automatically.

Servers choose what to gzip based on file type, but are typically too limited in what they decide to compress. Most web sites gzip their HTML documents. It's also worthwhile to gzip your scripts and stylesheets, but many web sites miss this opportunity. In fact, it's worthwhile to compress any text response including XML and JSON. Image and PDF files should not be gzipped because they are already compressed. Trying to gzip them not only wastes CPU but can potentially increase file sizes.

Gzipping as many file types as possible is an easy way to reduce page weight and accelerate the user experience.
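To make the "what to compress" decision concrete, here is a small Node.js sketch. It assumes Node's built-in zlib module; the list of compressible types and the function names shouldGzip and maybeGzip are assumptions for illustration, not a server's actual defaults. The Accept-Encoding check is deliberately simplified and ignores quality values such as gzip;q=0.

```javascript
const zlib = require("zlib");

// Assumed list of text-based types worth compressing; adjust for your site.
const COMPRESSIBLE = [
  /^text\//,
  /^application\/(javascript|x-javascript|json|xml)$/,
];

// True when the client advertises gzip and the response is a text type.
function shouldGzip(contentType, acceptEncoding) {
  const type = (contentType || "").split(";")[0].trim().toLowerCase();
  const clientAccepts = /\bgzip\b/i.test(acceptEncoding || "");
  return clientAccepts && COMPRESSIBLE.some((re) => re.test(type));
}

// Returns the (possibly compressed) body plus the headers to add.
function maybeGzip(body, contentType, acceptEncoding) {
  if (!shouldGzip(contentType, acceptEncoding)) {
    return { body: Buffer.from(body), headers: {} };
  }
  return {
    body: zlib.gzipSync(body),
    headers: { "Content-Encoding": "gzip", "Vary": "Accept-Encoding" },
  };
}
```

Images and PDFs fall through untouched, which matches the advice above about not recompressing already-compressed files.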

Put Stylesheets at the Top

tag: css

While researching performance at Yahoo!, we discovered that moving stylesheets to the document HEAD makes pages appear to be loading faster. This is because putting stylesheets in the HEAD allows the page to render progressively.

Front-end engineers that care about performance want a page to load progressively; that is, we want the browser to display whatever content it has as soon as possible. This is especially important for pages with a lot of content and for users on slower Internet connections. The importance of giving users visual feedback, such as progress indicators, has been well researched and documented. In our case the HTML page is the progress indicator! When the browser loads the page progressively the header, the navigation bar, the logo at the top, etc. all serve as visual feedback for the user who is waiting for the page. This improves the overall user experience.

The problem with putting stylesheets near the bottom of the document is that it prohibits progressive rendering in many browsers, including Internet Explorer. These browsers block rendering to avoid having to redraw elements of the page if their styles change. The user is stuck viewing a blank white page.

The HTML specification clearly states that stylesheets are to be included in the HEAD of the page: "Unlike A, [LINK] may only appear in the HEAD section of a document, although it may appear any number of times." Neither of the alternatives, the blank white screen or flash of unstyled content, is worth the risk. The optimal solution is to follow the HTML specification and load your stylesheets in the document HEAD.

Put Scripts at the Bottom

tag: javascript

The problem caused by scripts is that they block parallel downloads. The HTTP/1.1 specification suggests that browsers download no more than two components in parallel per hostname. If you serve your images from multiple hostnames, you can get more than two downloads to occur in parallel. While a script is downloading, however, the browser won't start any other downloads, even on different hostnames.

In some situations it's not easy to move scripts to the bottom. If, for example, the script uses document.write to insert part of the page's content, it can't be moved lower in the page. There might also be scoping issues. In many cases, there are ways to work around these situations.

An alternative suggestion that often comes up is to use deferred scripts. The DEFER attribute indicates that the script does not contain document.write, and is a clue to browsers that they can continue rendering. Unfortunately, Firefox doesn't support the DEFER attribute. In Internet Explorer, the script may be deferred, but not as much as desired. If a script can be deferred, it can also be moved to the bottom of the page. That will make your web pages load faster.

Avoid CSS Expressions

tag: css

CSS expressions are a powerful (and dangerous) way to set CSS properties dynamically. They were supported in Internet Explorer starting with version 5, but were deprecated starting with IE8. As an example, the background color could be set to alternate every hour using CSS expressions:

      background-color: expression( (new Date()).getHours()%2 ? "#B8D4FF" : "#F08A00" );

As shown here, the expression method accepts a JavaScript expression. The CSS property is set to the result of evaluating the JavaScript expression. The expression method is ignored by other browsers, so it is useful for setting properties in Internet Explorer needed to create a consistent experience across browsers.

The problem with expressions is that they are evaluated more frequently than most people expect. Not only are they evaluated when the page is rendered and resized, but also when the page is scrolled and even when the user moves the mouse over the page. Adding a counter to the CSS expression allows us to keep track of when and how often a CSS expression is evaluated. Moving the mouse around the page can easily generate more than 10,000 evaluations.

One way to reduce the number of times your CSS expression is evaluated is to use one-time expressions, where the first time the expression is evaluated it sets the style property to an explicit value, which replaces the CSS expression. If the style property must be set dynamically throughout the life of the page, using event handlers instead of CSS expressions is an alternative approach. If you must use CSS expressions, remember that they may be evaluated thousands of times and could affect the performance of your page.

Make JavaScript and CSS External

tag: javascript, css

Many of these performance rules deal with how external components are managed. However, before these considerations arise you should ask a more basic question: Should JavaScript and CSS be contained in external files, or inlined in the page itself?

Using external files in the real world generally produces faster pages because the JavaScript and CSS files are cached by the browser. JavaScript and CSS that are inlined in HTML documents get downloaded every time the HTML document is requested. This reduces the number of HTTP requests that are needed, but increases the size of the HTML document. On the other hand, if the JavaScript and CSS are in external files cached by the browser, the size of the HTML document is reduced without increasing the number of HTTP requests.

The key factor, then, is the frequency with which external JavaScript and CSS components are cached relative to the number of HTML documents requested. This factor, although difficult to quantify, can be gauged using various metrics. If users on your site have multiple page views per session and many of your pages re-use the same scripts and stylesheets, there is a greater potential benefit from cached external files.

Many web sites fall in the middle of these metrics. For these sites, the best solution generally is to deploy the JavaScript and CSS as external files. The only exception where inlining is preferable is with home pages, such as Yahoo!'s front page and My Yahoo!. Home pages that have few (perhaps only one) page view per session may find that inlining JavaScript and CSS results in faster end-user response times.

For front pages that are typically the first of many page views, there are techniques that leverage the reduction of HTTP requests that inlining provides, as well as the caching benefits achieved through using external files. One such technique is to inline JavaScript and CSS in the front page, but dynamically download the external files after the page has finished loading. Subsequent pages would reference the external files that should already be in the browser's cache.

Reduce DNS Lookups

tag: content

The Domain Name System (DNS) maps hostnames to IP addresses, just as phonebooks map people's names to their phone numbers. When you type www.yahoo.com into your browser, a DNS resolver contacted by the browser returns that server's IP address. DNS has a cost. It typically takes 20-120 milliseconds for DNS to look up the IP address for a given hostname. The browser can't download anything from this hostname until the DNS lookup is completed.

DNS lookups are cached for better performance. This caching can occur on a special caching server, maintained by the user's ISP or local area network, but there is also caching that occurs on the individual user's computer. The DNS information remains in the operating system's DNS cache (the "DNS Client service" on Microsoft Windows). Most browsers have their own caches, separate from the operating system's cache. As long as the browser keeps a DNS record in its own cache, it doesn't bother the operating system with a request for the record.

Internet Explorer caches DNS lookups for 30 minutes by default, as specified by the DnsCacheTimeout registry setting. Firefox caches DNS lookups for 1 minute, controlled by the network.dnsCacheExpiration configuration setting. (Fasterfox changes this to 1 hour.)

When the client's DNS cache is empty (for both the browser and the operating system), the number of DNS lookups is equal to the number of unique hostnames in the web page. This includes the hostnames used in the page's URL, images, script files, stylesheets, Flash objects, etc. Reducing the number of unique hostnames reduces the number of DNS lookups.

Reducing the number of unique hostnames has the potential to reduce the amount of parallel downloading that takes place in the page. Avoiding DNS lookups cuts response times, but reducing parallel downloads may increase response times. My guideline is to split these components across at least two but no more than four hostnames. This results in a good compromise between reducing DNS lookups and allowing a high degree of parallel downloads.

Minify JavaScript and CSS

tag: javascript, css

Minification is the practice of removing unnecessary characters from code to reduce its size, thereby improving load times. When code is minified all comments are removed, as well as unneeded white space characters (space, newline, and tab). In the case of JavaScript, this improves response time performance because the size of the downloaded file is reduced. Two popular tools for minifying JavaScript code are JSMin and YUI Compressor. The YUI Compressor can also minify CSS.

Obfuscation is an alternative optimization that can be applied to source code. It's more complex than minification and thus more likely to generate bugs as a result of the obfuscation step itself. In a survey of ten top U.S. web sites, minification achieved a 21% size reduction versus 25% for obfuscation. Although obfuscation has a higher size reduction, minifying JavaScript is less risky.

In addition to minifying external scripts and styles, inlined <script> and <style> blocks can and should also be minified. Even if you gzip your scripts and styles, minifying them will still reduce the size by 5% or more. As the use and size of JavaScript and CSS increases, so will the savings gained by minifying your code.

Avoid Redirects

tag: content

Redirects are accomplished using the 301 and 302 status codes. Here's an example of the HTTP headers in a 301 response:

      HTTP/1.1 301 Moved Permanently
      Location: http://example.com/newuri
      Content-Type: text/html

The browser automatically takes the user to the URL specified in the Location field. All the information necessary for a redirect is in the headers. The body of the response is typically empty. Despite their names, neither a 301 nor a 302 response is cached in practice unless additional headers, such as Expires or Cache-Control, indicate it should be. The meta refresh tag and JavaScript are other ways to direct users to a different URL, but if you must do a redirect, the preferred technique is to use the standard 3xx HTTP status codes, primarily to ensure the back button works correctly.

The main thing to remember is that redirects slow down the user experience. Inserting a redirect between the user and the HTML document delays everything in the page since nothing in the page can be rendered and no components can start being downloaded until the HTML document has arrived.

One of the most wasteful redirects happens frequently and web developers are generally not aware of it. It occurs when a trailing slash (/) is missing from a URL that should otherwise have one. For example, going to http://astrology.yahoo.com/astrology results in a 301 response containing a redirect to http://astrology.yahoo.com/astrology/ (notice the added trailing slash). This is fixed in Apache by using Alias or mod_rewrite, or the DirectorySlash directive if you're using Apache handlers.

Connecting an old web site to a new one is another common use for redirects. Others include connecting different parts of a website and directing the user based on certain conditions (type of browser, type of user account, etc.). Using a redirect to connect two web sites is simple and requires little additional coding. Although using redirects in these situations reduces the complexity for developers, it degrades the user experience. Alternatives for this use of redirects include using Alias and mod_rewrite if the two code paths are hosted on the same server. If a domain name change is the cause of using redirects, an alternative is to create a CNAME (a DNS record that creates an alias pointing from one domain name to another) in combination with Alias or mod_rewrite.

Remove Duplicate Scripts

tag: javascript

It hurts performance to include the same JavaScript file twice in one page. This isn't as unusual as you might think. A review of the ten top U.S. web sites shows that two of them contain a duplicated script. Two main factors increase the odds of a script being duplicated in a single web page: team size and number of scripts. When it does happen, duplicate scripts hurt performance by creating unnecessary HTTP requests and wasted JavaScript execution.

Unnecessary HTTP requests happen in Internet Explorer, but not in Firefox. In Internet Explorer, if an external script is included twice and is not cacheable, it generates two HTTP requests during page loading. Even if the script is cacheable, extra HTTP requests occur when the user reloads the page.

In addition to generating wasteful HTTP requests, time is wasted evaluating the script multiple times. This redundant JavaScript execution happens in both Firefox and Internet Explorer, regardless of whether the script is cacheable.

One way to avoid accidentally including the same script twice is to implement a script management module in your templating system. The typical way to include a script is to use the SCRIPT tag in your HTML page.

      <script type="text/javascript" src="menu_1.0.17.js"></script>

An alternative in PHP would be to create a function called insertScript.

      <?php insertScript("menu.js") ?>

In addition to preventing the same script from being inserted multiple times, this function could handle other issues with scripts, such as dependency checking and adding version numbers to script filenames to support far future Expires headers.
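The same idea can be sketched in JavaScript for a server-side templating layer. This is only an illustration under assumptions: createScriptManager, the versions map, and the name_version.js naming scheme are invented here, and the PHP insertScript mentioned above may work quite differently.

```javascript
// Hypothetical script manager: one instance per page render.
// `versions` maps a script name to the version embedded in its filename.
function createScriptManager(versions) {
  const inserted = new Set();
  return function insertScript(name) {
    if (inserted.has(name)) {
      return ""; // already emitted on this page, so emit nothing
    }
    inserted.add(name);
    const version = versions[name];
    // "menu.js" with version "1.0.17" becomes "menu_1.0.17.js"
    const src = version ? name.replace(/(\.js)$/, "_" + version + "$1") : name;
    return '<script type="text/javascript" src="' + src + '"></script>';
  };
}

const insertScript = createScriptManager({ "menu.js": "1.0.17" });
console.log(insertScript("menu.js")); // the SCRIPT tag for menu_1.0.17.js
console.log(insertScript("menu.js")); // empty string: duplicate suppressed
```

Templates call insertScript wherever they need a script, and the manager guarantees each file is emitted only once per page.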

Configure ETags

tag: server

Entity tags (ETags) are a mechanism that web servers and browsers use to determine whether the component in the browser's cache matches the one on the origin server. (An "entity" is another word for a "component": images, scripts, stylesheets, etc.) ETags were added to provide a mechanism for validating entities that is more flexible than the last-modified date. An ETag is a string that uniquely identifies a specific version of a component. The only format constraint is that the string be quoted. The origin server specifies the component's ETag using the ETag response header.

      HTTP/1.1 200 OK
      Last-Modified: Tue, 12 Dec 2006 03:03:59 GMT
      ETag: "10c24bc-4ab-457e1c1f"
      Content-Length: 12195

Later, if the browser has to validate a component, it uses the If-None-Match header to pass the ETag back to the origin server. If the ETags match, a 304 status code is returned, reducing the response by 12195 bytes for this example.

      GET /i/yahoo.gif HTTP/1.1
      Host: us.yimg.com
      If-Modified-Since: Tue, 12 Dec 2006 03:03:59 GMT
      If-None-Match: "10c24bc-4ab-457e1c1f"
      HTTP/1.1 304 Not Modified
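The validation step in the exchange above can be sketched as a small JavaScript function. This is a simplified model, not how any particular server implements it: the function name and return shape are assumptions, header names are assumed to be lower-cased, and weak validators (the W/ prefix) are ignored.

```javascript
// Decides how to answer a conditional GET. `currentETag` is the quoted ETag
// of the component as this server sees it right now.
function respondToConditionalGet(requestHeaders, currentETag, bodyLength) {
  const ifNoneMatch = requestHeaders["if-none-match"];
  if (ifNoneMatch !== undefined) {
    // The header may carry several comma-separated ETags, or "*".
    const candidates = ifNoneMatch.split(",").map((tag) => tag.trim());
    if (candidates.includes("*") || candidates.includes(currentETag)) {
      return { status: 304, headers: { ETag: currentETag }, bytesSent: 0 };
    }
  }
  return {
    status: 200,
    headers: { ETag: currentETag, "Content-Length": String(bodyLength) },
    bytesSent: bodyLength,
  };
}
```

When the ETag sent by the browser equals the server's current one, the body is skipped entirely. If a different server computes a different ETag for the same file, the comparison fails and the full 200 response is sent.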

The problem with ETags is that they typically are constructed using attributes that make them unique to a specific server hosting a site. ETags won't match when a browser gets the original component from one server and later tries to validate that component on a different server, a situation that is all too common on Web sites that use a cluster of servers to handle requests. By default, both Apache and IIS embed data in the ETag that dramatically reduces the odds of the validity test succeeding on web sites with multiple servers.

The ETag format for Apache 1.3 and 2.x is inode-size-timestamp. Although a given file may reside in the same directory across multiple servers, and have the same file size, permissions, timestamp, etc., its inode is different from one server to the next.

IIS 5.0 and 6.0 have a similar issue with ETags. The format for ETags on IIS is Filetimestamp:ChangeNumber. A ChangeNumber is a counter used to track configuration changes to IIS. It's unlikely that the ChangeNumber is the same across all IIS servers behind a web site.

The end result is ETags generated by Apache and IIS for the exact same component won't match from one server to another. If the ETags don't match, the user doesn't receive the small, fast 304 response that ETags were designed for; instead, they'll get a normal 200 response along with all the data for the component. If you host your web site on just one server, this isn't a problem. But if you have multiple servers hosting your web site, and you're using Apache or IIS with the default ETag configuration, your users are getting slower pages, your servers have a higher load, you're consuming greater bandwidth, and proxies aren't caching your content efficiently. Even if your components have a far future Expires header, a conditional GET request is still made whenever the user hits Reload or Refresh.

If you're not taking advantage of the flexible validation model that ETags provide, it's better to just remove the ETag altogether. The Last-Modified header validates based on the component's timestamp. And removing the ETag reduces the size of the HTTP headers in both the response and subsequent requests. This Microsoft Support article describes how to remove ETags. In Apache, this is done by simply adding the following line to your Apache configuration file:

      FileETag none

Make Ajax Cacheable

tag: content

One of the cited benefits of Ajax is that it provides instantaneous feedback to the user because it requests information asynchronously from the backend web server. However, using Ajax is no guarantee that the user won't be twiddling his thumbs waiting for those asynchronous JavaScript and XML responses to return. In many applications, whether or not the user is kept waiting depends on how Ajax is used. For example, in a web-based email client the user will be kept waiting for the results of an Ajax request to find all the email messages that match their search criteria. It's important to remember that "asynchronous" does not imply "instantaneous".

To improve performance, it's important to optimize these Ajax responses. The most important way to improve the performance of Ajax is to make the responses cacheable, as discussed in Add an Expires or a Cache-Control Header. Some of the other rules in this list also apply to Ajax.

Let's look at an example. A Web 2.0 email client might use Ajax to download the user's address book for autocompletion. If the user hasn't modified her address book since the last time she used the email web app, the previous address book response could be read from cache if that Ajax response was made cacheable with a future Expires or Cache-Control header. The browser must be informed when to use a previously cached address book response versus requesting a new one. This could be done by adding a timestamp to the address book Ajax URL indicating the last time the user modified her address book, for example, &t=1190241612. If the address book hasn't been modified since the last download, the timestamp will be the same and the address book will be read from the browser's cache eliminating an extra HTTP roundtrip. If the user has modified her address book, the timestamp ensures the new URL doesn't match the cached response, and the browser will request the updated address book entries.
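The timestamp technique can be sketched in a few lines of JavaScript. The endpoint path, the user parameter, and the function name are assumptions for illustration; only the t parameter comes from the example above.

```javascript
// Builds the address book URL. `lastModified` is the time (in seconds) at
// which the user last changed her address book, as known to the page.
function addressBookUrl(baseUrl, userId, lastModified) {
  const params = new URLSearchParams({ user: userId, t: String(lastModified) });
  return baseUrl + "?" + params.toString();
}

// Same timestamp, same URL: the browser can answer from its cache.
console.log(addressBookUrl("/addressbook", "alice", 1190241612));
// /addressbook?user=alice&t=1190241612
```

A later edit to the address book changes lastModified, which changes the URL, so the cached response is no longer matched and a fresh copy is requested.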

Even though your Ajax responses are created dynamically, and might only be applicable to a single user, they can still be cached. Doing so will make your Web 2.0 apps faster.

Flush the Buffer Early

tag: server

When users request a page, it can take anywhere from 200 to 500ms for the backend server to stitch together the HTML page. During this time, the browser is idle as it waits for the data to arrive. In PHP you have the function flush(). It allows you to send your partially ready HTML response to the browser so that the browser can start fetching components while your backend is busy with the rest of the HTML page. The benefit is mainly seen on busy backends or light frontends.

A good place to consider flushing is right after the HEAD because the HTML for the head is usually easier to produce and it allows you to include any CSS and JavaScript files for the browser to start fetching in parallel while the backend is still processing.

Example:

      ... <!-- css, js -->
    </head>
    <?php flush(); ?>
    <body>
      ... <!-- content -->

Yahoo! search pioneered research and real user testing to prove the benefits of using this technique.

Use GET for AJAX Requests

tag: server

The Yahoo! Mail team found that when using XMLHttpRequest, POST is implemented in the browsers as a two-step process: sending the headers first, then sending data. So it's best to use GET, which only takes one TCP packet to send (unless you have a lot of cookies). The maximum URL length in IE is 2K, so if you send more than 2K data you might not be able to use GET.

An interesting side effect is that POST without actually posting any data behaves like GET. Based on the HTTP specs, GET is meant for retrieving information, so it makes sense (semantically) to use GET when you're only requesting data, as opposed to sending data to be stored server-side.
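Here is one way the GET-versus-POST decision could look in JavaScript for requests that only read data. The 2000-character threshold is an assumed conservative stand-in for the roughly 2K IE limit mentioned above, and chooseAjaxRequest is a made-up name.

```javascript
// Assumed safety margin below IE's roughly 2K URL limit.
const MAX_GET_URL_LENGTH = 2000;

// Picks GET when the full URL fits, otherwise falls back to POST.
// Intended only for requests that retrieve data, not ones that store it.
function chooseAjaxRequest(url, params) {
  const query = new URLSearchParams(params).toString();
  const getUrl = query ? url + "?" + query : url;
  if (getUrl.length <= MAX_GET_URL_LENGTH) {
    return { method: "GET", url: getUrl, body: null };
  }
  return { method: "POST", url: url, body: query };
}

console.log(chooseAjaxRequest("/search", { q: "yui" }).method); // GET
```

The returned method, url, and body can be passed straight to XMLHttpRequest's open() and send().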

Post-load Components

tag: content

You can take a closer look at your page and ask yourself: "What's absolutely required in order to render the page initially?". The rest of the content and components can wait.

JavaScript is an ideal candidate for splitting before and after the onload event. For example if you have JavaScript code and libraries that do drag and drop and animations, those can wait, because dragging elements on the page comes after the initial rendering. Other places to look for candidates for post-loading include hidden content (content that appears after a user action) and images below the fold.

Tools to help you out in your effort: YUI Image Loader allows you to delay images below the fold and the YUI Get utility is an easy way to include JS and CSS on the fly. For an example in the wild take a look at the Yahoo! Home Page with Firebug's Net Panel turned on.

It's good when the performance goals are in line with other web development best practices. In this case, the idea of progressive enhancement tells us that JavaScript, when supported, can improve the user experience but you have to make sure the page works even without JavaScript. So after you've made sure the page works fine, you can enhance it with some post-loaded scripts that give you more bells and whistles such as drag and drop and animations.

Preload Components

tag: content

Preload may look like the opposite of post-load, but it actually has a different goal. By preloading components you can take advantage of the time the browser is idle and request components (like images, styles and scripts) you'll need in the future. This way when the user visits the next page, you could have most of the components already in the cache and your page will load much faster for the user.

There are actually several types of preloading:

Unconditional preload - as soon as onload fires, you go ahead and fetch some extra components. Check google.com for an example of how a sprite image is requested onload. This sprite image is not needed on the google.com homepage, but it is needed on the consecutive search result page.
Conditional preload - based on a user action you make an educated guess where the user is headed next and preload accordingly. On search.yahoo.com you can see how some extra components are requested after you start typing in the input box.
Anticipated preload - preload in advance before launching a redesign. It often happens after a redesign that you hear: "The new site is cool, but it's slower than before". Part of the problem could be that the users were visiting your old site with a full cache, but the new one is always an empty cache experience. You can mitigate this side effect by preloading some components before you even launch the redesign. Your old site can use the time the browser is idle and request images and scripts that will be used by the new site.
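An unconditional preload of images might be sketched as follows. The image factory is passed in as a parameter (an assumption made so the sketch also runs outside a browser); in a page you would pass a function that returns new Image(). The URL in the usage comment is hypothetical.

```javascript
// Requests each URL by assigning it to an image object's src.
// `createImage` is injected; in a browser use () => new Image().
function preloadImages(urls, createImage) {
  return urls.map((url) => {
    const img = createImage();
    img.src = url; // in a browser, setting src starts the download
    return img;
  });
}

// Browser usage (hypothetical URL), fired once the page has finished loading:
// window.addEventListener("load", function () {
//   preloadImages(["/img/results-sprite.png"], () => new Image());
// });
```

The downloaded images land in the browser cache, so the next page that references them can skip the request, provided the responses are cacheable.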

Reduce the Number of DOM Elements

tag: content

A complex page means more bytes to download and it also means slower DOM access in JavaScript. It makes a difference if you loop through 500 or 5000 DOM elements on the page when you want to add an event handler for example.

A high number of DOM elements can be a symptom that there's something that should be improved with the markup of the page without necessarily removing content. Are you using nested tables for layout purposes? Are you throwing in more <div>s only to fix layout issues? Maybe there's a better and more semantically correct way to do your markup.

A great help with layouts are the YUI CSS utilities: grids.css can help you with the overall layout, fonts.css and reset.css can help you strip away the browser's default formatting. This is a chance to start fresh and think about your markup, for example use <div>s only when it makes sense semantically, and not because it renders a new line.

The number of DOM elements is easy to test. Just type this in Firebug's console:

      document.getElementsByTagName('*').length

And how many DOM elements are too many? Check other similar pages that have good markup. For example the Yahoo! Home Page is a pretty busy page and still under 700 elements (HTML tags).

Split Components Across Domains

tag: content

Splitting components allows you to maximize parallel downloads. Make sure you're using not more than 2-4 domains because of the DNS lookup penalty. For example, you can host your HTML and dynamic content on www.example.org and split static components between static1.example.org and static2.example.org.
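One detail worth getting right when splitting is that a given component should always map to the same hostname, otherwise the browser ends up caching it once per host. A minimal sketch, assuming the two static hostnames from the example and a simple string hash chosen purely for illustration:

```javascript
const STATIC_HOSTS = ["static1.example.org", "static2.example.org"];

// Maps a path to one of the hosts using a small deterministic string hash,
// so the same component is always requested from the same hostname.
function hostForPath(path, hosts) {
  let hash = 0;
  for (let i = 0; i < path.length; i++) {
    hash = (hash * 31 + path.charCodeAt(i)) >>> 0; // keep within 32 bits
  }
  return hosts[hash % hosts.length];
}

function staticUrl(path) {
  return "http://" + hostForPath(path, STATIC_HOSTS) + path;
}
```

Calling staticUrl("/img/logo.png") on every page yields the same URL each time, which keeps the cached copy usable across pages.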

For more information check "Maximizing Parallel Downloads in the Carpool Lane" by Tenni Theurer and Patty Chi.

Minimize the Number of iframes

tag: content

Iframes allow an HTML document to be inserted in the parent document. It's important to understand how iframes work so they can be used effectively.

<iframe> pros:

Helps with slow third-party content like badges and ads
Security sandbox
Download scripts in parallel

<iframe> cons:

Costly even if blank
Blocks page onload
Non-semantic

No 404s

tag: content

HTTP requests are expensive so making an HTTP request and getting a useless response (i.e. 404 Not Found) is totally unnecessary and will slow down the user experience without any benefit.

Some sites have helpful 404s "Did you mean X?", which is great for the user experience but also wastes server resources (like database, etc). Particularly bad is when the link to an external JavaScript is wrong and the result is a 404. First, this download will block parallel downloads. Next the browser may try to parse the 404 response body as if it were JavaScript code, trying to find something usable in it.

Reduce Cookie Size

tag: cookie

HTTP cookies are used for a variety of reasons such as authentication and personalization. Information about cookies is exchanged in the HTTP headers between web servers and browsers. It's important to keep the size of cookies as low as possible to minimize the impact on the user's response time.

For more information check "When the Cookie Crumbles" by Tenni Theurer and Patty Chi. The take-home of this research:

Eliminate unnecessary cookies
Keep cookie sizes as low as possible to minimize the impact on the user response time
Be mindful of setting cookies at the appropriate domain level so other sub-domains are not affected
Set an Expires date appropriately. An earlier Expires date or none removes the cookie sooner, improving the user response time
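
To make the points above concrete, here is a minimal sketch that builds a Set-Cookie header value scoped to a single host with an explicit Expires date, and estimates how many bytes a cookie adds to each request. Both helper names are hypothetical, and the size estimate assumes an ASCII cookie name.

```javascript
// Build a Set-Cookie header value with an explicit domain and expiry date.
function buildSetCookie(name, value, { domain, expires, path = "/" }) {
  const parts = [name + "=" + encodeURIComponent(value)];
  if (domain) parts.push("Domain=" + domain);
  parts.push("Path=" + path);
  if (expires) parts.push("Expires=" + expires.toUTCString());
  return parts.join("; ");
}

// Approximate bytes a name=value pair adds to the Cookie header of every
// matching request (assumes an ASCII name; the value is URL-encoded).
function cookieRequestBytes(name, value) {
  return name.length + 1 + encodeURIComponent(value).length;
}
```

Only the name=value pair travels back on each request; the Domain, Path, and Expires attributes are sent once, in the response that sets the cookie.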

Use Cookie-free Domains for Components

tag: cookie

When the browser makes a request for a static image and sends cookies together with the request, the server doesn't have any use for those cookies. So they only create network traffic for no good reason. You should make sure static components are requested with cookie-free requests. Create a subdomain and host all your static components there.

If your domain is www.example.org, you can host your static components on static.example.org. However, if you've already set cookies on the top-level domain example.org as opposed to www.example.org, then all the requests to static.example.org will include those cookies. In this case, you can buy a whole new domain, host your static components there, and keep this domain cookie-free. Yahoo! uses yimg.com, YouTube uses ytimg.com, Amazon uses images-amazon.com and so on.

Another benefit of hosting static components on a cookie-free domain is that some proxies might refuse to cache the components that are requested with cookies. On a related note, if you wonder if you should use example.org or www.example.org for your home page, consider the cookie impact. Omitting www leaves you no choice but to write cookies to *.example.org, so for performance reasons it's best to use the www subdomain and write the cookies to that subdomain.
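
The domain scoping described above can be illustrated with a simplified check of whether a cookie set for a given domain would be sent to a given host. This is a sketch in the spirit of the domain-matching rule in RFC 6265; it ignores host-only cookies, paths, and the Secure flag, and the function name is an assumption.

```javascript
// A cookie set for `cookieDomain` is sent to `host` when the two are equal
// or when `host` is a subdomain of `cookieDomain`.
function cookieSentTo(cookieDomain, host) {
  const domain = cookieDomain.replace(/^\./, "").toLowerCase(); // drop a leading dot
  const h = host.toLowerCase();
  return h === domain || h.endsWith("." + domain);
}

// Cookies written to example.org reach the static subdomain as well...
const leaks = cookieSentTo("example.org", "static.example.org");
// ...while cookies written to www.example.org do not.
const scoped = cookieSentTo("www.example.org", "static.example.org");
```

Here `leaks` is true and `scoped` is false, which is why writing cookies to the www subdomain keeps static.example.org cookie-free.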

Minimize DOM Access

tag: javascript

Accessing DOM elements with JavaScript is slow so in order to have a more responsive page, you should:

Cache references to accessed elements
Update nodes "offline" and then add them to the tree
Avoid fixing layout with JavaScript
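
The first two points can be sketched as follows: the list element is looked up once and reused, and new nodes are assembled in a detached DocumentFragment before being added to the tree in a single step. The `appendItems` helper and the "results" id are hypothetical; `doc` is assumed to be a DOM Document.

```javascript
// Build all list items "offline" in a fragment, then attach them at once.
function appendItems(doc, list, labels) {
  const fragment = doc.createDocumentFragment(); // lives outside the live tree
  for (const label of labels) {
    const item = doc.createElement("li");
    item.textContent = label;
    fragment.appendChild(item); // no effect on the rendered page yet
  }
  list.appendChild(fragment); // a single insertion into the live tree
  return list;
}

// Browser usage (assumes a hypothetical <ul id="results"> in the page):
//   const results = document.getElementById("results"); // cached reference
//   appendItems(document, results, ["one", "two", "three"]);
```

Passing the document in as a parameter is only a convenience for trying the function outside a browser; in a page you would pass `document` directly.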

For more information check the YUI theatre's "High Performance Ajax Applications" by Julien Lecomte.

Develop Smart Event Handlers

tag: javascript

Sometimes pages feel less responsive because of too many event handlers attached to different elements of the DOM tree which are then executed too often. That's why using event delegation is a good approach. If you have 10 buttons inside a div, attach only one event handler to the div wrapper, instead of one handler for each button. Events bubble up so you'll be able to catch the event and figure out which button it originated from.
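
A minimal sketch of the delegation idea described above: one handler sits on the wrapper and walks up from the event target to find the button that was clicked. The `delegateButtonClicks` name and the "toolbar" id are assumptions made for illustration.

```javascript
// Returns a single handler for the wrapper element. It walks up from the
// event target until it finds a BUTTON (or reaches the wrapper).
function delegateButtonClicks(wrapper, onButton) {
  return function handler(event) {
    let node = event.target;
    while (node && node !== wrapper) {
      if (node.tagName === "BUTTON") {
        onButton(node, event); // report which button the event came from
        return;
      }
      node = node.parentNode;
    }
  };
}

// Browser usage (assumes a hypothetical <div id="toolbar"> wrapping the buttons
// and a browser that supports addEventListener):
//   const toolbar = document.getElementById("toolbar");
//   toolbar.addEventListener("click", delegateButtonClicks(toolbar, function (btn) {
//     console.log("clicked " + btn.id);
//   }));
```

Walking up the parent chain means a click on an element nested inside a button (an icon, for example) is still attributed to that button.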

You also don't need to wait for the onload event in order to start doing something with the DOM tree. Often all you need is the element you want to access to be available in the tree. You don't have to wait for all images to be downloaded. DOMContentLoaded is the event you might consider using instead of onload, but until it's available in all browsers, you can use the YUI Event utility, which has an onAvailable method.

For more information check the YUI theatre's "High Performance Ajax Applications" by Julien Lecomte.

Choose <link> over @import

tag: css

One of the previous best practices states that CSS should be at the top in order to allow for progressive rendering.

In IE @import behaves the same as using <link> at the bottom of the page, so it's best not to use it.

Avoid Filters

tag: css

The IE-proprietary AlphaImageLoader filter aims to fix a problem with semi-transparent true color PNGs in IE versions < 7. The problem with this filter is that it blocks rendering and freezes the browser while the image is being downloaded. It also increases memory consumption and is applied per element, not per image, so the problem is multiplied.

The best approach is to avoid AlphaImageLoader completely and use gracefully degrading PNG8 images instead, which are fine in IE. If you absolutely need AlphaImageLoader, use the underscore hack _filter so as not to penalize your IE7+ users.

Optimize Images

tag: images

After a designer is done with creating the images for your web page, there are still some things you can try before you FTP those images to your web server.

You can check the GIFs and see if they are using a palette size corresponding to the number of colors in the image. With imagemagick this is easy to check:
identify -verbose image.gif
When you see an image using 4 colors and 256 color "slots" in the palette, there is room for improvement.
Try converting GIFs to PNGs and see if there is a saving. More often than not, there is. Developers often hesitate to use PNGs due to the limited support in browsers, but this is now a thing of the past. The only real problem is alpha-transparency in true color PNGs, but then again, GIFs are not true color and don't support variable transparency either. So anything a GIF can do, a palette PNG (PNG8) can do too (except for animations). This simple imagemagick command results in totally safe-to-use PNGs:
convert image.gif image.png
"All we are saying is: Give PiNG a Chance!"
Run pngcrush (or any other PNG optimizer tool) on all your PNGs. Example:
pngcrush image.png -rem alla -reduce -brute result.png
Run jpegtran on all your JPEGs. This tool does lossless JPEG operations such as rotation and can also be used to optimize and remove comments and other useless information (such as EXIF information) from your images.
jpegtran -copy none -optimize -perfect src.jpg dest.jpg

Optimize CSS Sprites

tag: images

Arranging the images in the sprite horizontally as opposed to vertically usually results in a smaller file size.
Combining similar colors in a sprite helps you keep the color count low, ideally under 256 colors so as to fit in a PNG8.
"Be mobile-friendly" and don't leave big gaps between the images in a sprite. This doesn't affect the file size as much but requires less memory for the user agent to decompress the image into a pixel map. A 100x100 image is 10 thousand pixels, whereas a 1000x1000 image is 1 million pixels.
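
The pixel arithmetic in the last point can be worked through with a small sketch. The figure of 4 bytes per pixel (a 32-bit RGBA pixel map) is an assumption for illustration; actual memory use depends on the user agent.

```javascript
const BYTES_PER_PIXEL = 4; // assumption: 32-bit RGBA pixel map

// Number of pixels the user agent has to hold for a decoded image.
function spritePixels(width, height) {
  return width * height;
}

// Rough size of the decoded pixel map, regardless of the compressed file size.
function decodedBytes(width, height, bytesPerPixel = BYTES_PER_PIXEL) {
  return spritePixels(width, height) * bytesPerPixel;
}

const small = spritePixels(100, 100);   // 10 thousand pixels
const large = spritePixels(1000, 1000); // 1 million pixels
```

Under that assumption, the 1000x1000 sprite needs about 4 million bytes once decoded, even if the empty gaps compress to almost nothing in the file.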

Don't Scale Images in HTML

tag: images

Don't use a bigger image than you need just because you can set the width and height in HTML. If you need
<img width="100" height="100" src="mycat.jpg" alt="My Cat" />
then your image (mycat.jpg) should be 100x100px rather than a scaled down 500x500px image.

Make favicon.ico Small and Cacheable

tag: images

The favicon.ico is an image that stays in the root of your server. It's a necessary evil because even if you don't care about it the browser will still request it, so it's better not to respond with a 404 Not Found. Also since it's on the same server, cookies are sent every time it's requested. This image also interferes with the download sequence, for example in IE when you request extra components in the onload, the favicon will be downloaded before these extra components.

So to mitigate the drawbacks of having a favicon.ico make sure:

It's small, preferably under 1K.
It has an Expires header set as far in the future as you feel comfortable with (since you cannot rename the file if you decide to change it). You can probably safely set the Expires header a few months in the future. You can check the last modified date of your current favicon.ico to make an informed decision.

Imagemagick can help you create small favicons.

Keep Components under 25K

tag: mobile

This restriction is related to the fact that iPhone won't cache components bigger than 25K. Note that this is the uncompressed size. This is where minification is important because gzip alone may not be sufficient.

For more information check "Performance Research, Part 5: iPhone Cacheability - Making it Stick" by Wayne Shea and Tenni Theurer.

Pack Components into a Multipart Document

tag: mobile

Packing components into a multipart document is like an email with attachments: it helps you fetch several components with one HTTP request (remember: HTTP requests are expensive). When you use this technique, first check if the user agent supports it (iPhone does not).

Avoid Empty Image src

tag: server

An image with an empty string src attribute occurs more often than one would expect. It appears in two forms:

straight HTML
<img src="">
JavaScript
var img = new Image();
img.src = "";

Both forms cause the same effect: the browser makes another request to your server.

Internet Explorer makes a request to the directory in which the page is located.
Safari and Chrome make a request to the actual page itself.
Firefox 3 and earlier versions behave the same as Safari and Chrome, but version 3.5 addressed this issue (bug 444931) and no longer sends a request.
Opera does not do anything when an empty image src is encountered.

Why is this behavior bad?

Cripple your servers by sending a large amount of unexpected traffic, especially for pages that get millions of page views per day.
Waste server computing cycles generating a page that will never be viewed.
Possibly corrupt user data. If you are tracking state in the request, either by cookies or in another way, you have the possibility of destroying data. Even though the image request does not return an image, all of the headers are read and accepted by the browser, including all cookies. While the rest of the response is thrown away, the damage may already be done.

The root cause of this behavior is the way that URI resolution is performed in browsers. This behavior is defined in RFC 3986 - Uniform Resource Identifiers. When an empty string is encountered as a URI, it is considered a relative URI and is resolved according to the algorithm defined in section 5.2. This specific example, an empty string, is listed in section 5.4. Firefox, Safari, and Chrome are all resolving an empty string correctly per the specification, while Internet Explorer is resolving it incorrectly, apparently in line with an earlier version of the specification, RFC 2396 - Uniform Resource Identifiers (this was obsoleted by RFC 3986). So technically, the browsers are doing what they are supposed to do to resolve relative URIs. The problem is that in this context, the empty string is clearly unintentional.
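
The resolution behavior described above can be observed with the standard URL class, which is available in current browsers and in Node.js. Note that URL follows the WHATWG URL Standard rather than RFC 3986 verbatim, although the two agree for this case. The helper names and the example page address are assumptions made for illustration.

```javascript
// Resolve a (possibly empty) reference against the address of the page.
function resolveReference(reference, pageUrl) {
  return new URL(reference, pageUrl).href;
}

// Defensive check before assigning to img.src: skip empty or blank values.
function hasUsableSrc(src) {
  return typeof src === "string" && src.trim() !== "";
}

const page = "http://www.example.org/dir/page.html";
// An empty reference resolves to the page itself, hence the extra request.
const resolvedEmpty = resolveReference("", page);
// A normal relative reference resolves against the page's directory.
const resolvedImage = resolveReference("pic.png", page);
```

A check like `hasUsableSrc` before setting `img.src` in script is one way to keep the empty string from ever reaching the browser's URL resolver.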

HTML5 adds to the description of the <img> tag's src attribute to instruct browsers not to make an additional request in section 4.8.2:

The src attribute must be present, and must contain a valid URL referencing a non-interactive, optionally animated, image resource that is neither paged nor scripted. If the base URI of the element is the same as the document's address, then the src attribute's value must not be the empty string.

Hopefully, browsers will not have this problem in the future. Unfortunately, there is no such clause for <script src=""> and <link href="">. Maybe there is still time to make that adjustment to ensure browsers don't accidentally implement this behavior.

This rule was inspired by Yahoo!'s JavaScript guru Nicholas C. Zakas. For more information check out his article "Empty image src can destroy your site".

Source: Yahoo (original article)
